“Resolved” Is One of the Most Misleading Words in Engineering

I keep a lot of thoughts in my head that don’t really fit anywhere. They show up when I’m staring at an alert dashboard at 2am, or when something breaks “for no reason,” or when I’m trying to explain to someone why the graph looks fine but still makes me nervous. Most of those thoughts just… sit there. Half-formed. Unwritten. Slowly rotting. This blog is me trying to stop that.

I work on large systems for a living. The kind where nobody fully understands how things actually behave, even if we all pretend we do. On paper, everything has an owner, an SLO, a dependency graph. In reality, incidents start with one weird blip, then spread sideways in ways that only make sense after you’ve already lost an hour of your life.

Here’s a strong opinion I’ve had for a while: most postmortems are lies. Not malicious ones, just tidy stories we tell ourselves because the truth is too uncomfortable. The real cause is usually some awkward combination of timing, missing context, and humans making reasonable decisions with bad information. We compress all of that into a clean root cause because messiness doesn’t fit in a doc.

I’ve watched alerts fire, resolve, fire again, then quietly disappear, only to turn into a real incident later. The system technically “recovered.” Everyone relaxed. Something still felt off. When something feels off, it usually is. That gap, that uneasy feeling between “green” and “safe,” is where I spend most of my energy.

Lately I’ve been thinking a lot about models. Not ML hype models. Mental ones. Virtual ones. The fake-but-useful representations we build to reason about reality. A fault chain isn’t reality. An incident timeline isn’t reality. They’re sketches. Approximations. And honestly, that’s fine, as long as we don’t confuse the sketch with the thing itself.

One thing I didn’t expect: becoming a parent sharpened this instinct. When you’re watching a kid, you learn quickly that “nothing is wrong” and “everything is fine” are not the same state. Silence can mean sleep, or trouble. You learn to read weak signals without overreacting to every noise. Operations isn’t that different.

This blog is where I want to put those thoughts. The half-baked ones. The opinions that don’t survive slide decks. The ideas that feel obvious only after you’ve lived through them. Some posts will be technical. Some won’t. Some might be me arguing with myself in public. Thinking privately doesn’t scale, and pretending certainty where there isn’t any just makes systems, and people, worse.

I don’t know exactly who this is for. Probably people who’ve looked at a “resolved” incident and thought, yeah… but not really. If that’s you, stick around. I’m not here to teach. I’m here to think. Let’s see what falls out.

March 20, 2026, Mill Valley. Resolved.