The Difference Between Monitoring and Watching

Every AI team says they monitor their models. When you ask what that means, the answer is almost always: there is a dashboard. That is not monitoring. That is watching.

Every AI team says they monitor their models. Ask what that means, and the answer is almost always the same: there's a dashboard. Somebody set it up. It shows accuracy metrics, maybe some drift indicators, maybe inference latency. It exists. Somewhere.

That's not monitoring. That's watching.

Watching is passive

Watching means information is available if someone goes looking for it. A Grafana dashboard with model performance metrics is watching. A weekly report that gets emailed to a distribution list is watching. A Slack channel where alerts post but nobody has explicitly agreed to act on them is watching.

Watching feels productive because the infrastructure exists. The dashboard is real. The metrics are real. But watching only works if someone happens to look at the right chart at the right time and recognizes that something is wrong. In practice, that almost never happens. People are busy. Dashboards accumulate. The one with model accuracy sits next to fifteen other dashboards, and nobody has it open unless they're preparing for a review.

Watching is a system that depends on human initiative to function. Which means it functions intermittently, inconsistently, and usually too late.

Monitoring is active

Monitoring means the system tells you when something is wrong. Not "the data is available if you look" — the system reaches out and interrupts someone. Specifically:

Defined thresholds. Not "accuracy looks low." A number. Below this number, an alert fires.
Alerts that route to a specific person. Not a team channel. A person. With a name. Who has agreed that this is their responsibility.
A runbook. That person knows what to do when the alert fires. The steps are written down. They don't have to figure it out in the moment.
Documented action. When the person acts, they record what they did, when, and why.
Review of the action. Someone checks whether the remediation worked. The loop closes.

That's monitoring. Every step in the chain is defined, owned, and documented. If any link breaks — the threshold isn't set, the alert goes to a channel instead of a person, the person doesn't have a runbook, nobody reviews the outcome — you don't have monitoring. You have pieces of monitoring with gaps that only become visible when something goes wrong.

How organizations end up watching instead of monitoring

Nobody sets out to build a watching system. What happens is this: the team builds the model, deploys it, sets up a dashboard because that's good practice, and moves on to the next project. The dashboard was the last step of the build phase. It was never the first step of an operations phase, because there was no operations phase. There was build, deploy, and move on.

The thresholds don't get set because nobody agreed on what "bad" looks like in production. The alerts don't route to a person because nobody was designated as the model's owner post-deployment. The runbook doesn't exist because the team that built the model assumed someone else would handle production issues, and that someone else was never identified.

The gap between watching and monitoring is not a tooling gap. It's an ownership gap. The tools exist. The question is whether someone is accountable for using them.

What happens when you're only watching

The pattern repeats everywhere. A model degrades in production. The degradation is visible in the metrics — it's right there on the dashboard, in retrospect. But nobody notices for weeks. Sometimes months. The degradation was slow enough that no single day looked alarming. It only became obvious in trend, and nobody was looking at the trend because nobody was assigned to look at the trend.

By the time someone notices — usually because a downstream consumer complains, or a business metric moves in the wrong direction — the remediation is urgent. What could have been a routine retraining becomes an incident. The incident triggers a post-mortem. The post-mortem recommends "better monitoring." The team adds another dashboard.

The cycle repeats.

The test

Here's a simple test for whether you have monitoring or watching: If your best-performing model started returning random outputs at 2 AM on a Saturday, how long would it take for a specific human being to know about it, and would that person know exactly what to do?

If the answer involves someone checking a dashboard on Monday morning, you're watching.

If the answer is "the on-call gets paged within fifteen minutes and opens the runbook," you're monitoring.

Most organizations, if they're honest, are watching. That's not a moral failing. It's a structural one. And the fix isn't more dashboards. The fix is ownership: thresholds, alerts, people, runbooks, and closed loops.

When I was an Army observer/controller running units through training lanes, the discipline we enforced was articulation. A unit couldn't just look at what was in front of them. Someone had to say, out loud and on the record, what they saw, what it meant, and what they were going to do about it — with the rest of the unit accountable to the answer. Production AI needs that same discipline. Most programs have dashboards. Nobody's been assigned to speak.