The Documentation Standard Nobody Follows — and Why It Matters

Ask an AI team if they document their models and they'll say yes. Ask to see the documentation and you'll get one of three things: a README that was written during development and hasn't been touched since, a model card template that was filled out once to satisfy a compliance review, or a Confluence page whose author no longer works at the company.

None of these are documentation. They're artifacts of a moment. Documentation is a living record that tells the next person everything they need to know to understand, operate, and maintain the system. By that definition, most AI projects are undocumented.

What's missing

The documentation that exists in most AI projects covers the what: what the model does, what architecture it uses, what the accuracy metrics were at launch. Sometimes it covers the how: how the data was processed, how the model was trained, how it was deployed.

What's almost never documented is the why.

Why was this training data selected? What other datasets were considered? What were the tradeoffs? Were there known gaps in the training data, and if so, why was the decision made to proceed despite them?

Why were these features chosen? What alternatives were explored? What was the rationale for the final feature set? Were there features that were excluded for ethical, legal, or practical reasons?

What are the known limitations? Not the theoretical limitations from a research paper — the specific, observed limitations of this model in this deployment context. What inputs does it handle poorly? What populations does it underserve? What happens when it encounters data that doesn't match the training distribution?

What happens when it breaks? Who is notified? What's the escalation path? What's the rollback procedure? Is there a manual fallback process, and if so, who owns it?

These questions have answers. They were discussed in meetings, decided in Slack threads, debated in pull request comments. But they weren't captured in a place that survives the departure of the people who had those conversations. The decisions were made. The rationale evaporated.

Why it matters: the departure test

At the FDIC, every examination finding had to be documented well enough to survive the next examiner inheriting the file. The same standard should apply to AI systems: if the person who built this model leaves tomorrow, can someone else understand it well enough to maintain it, troubleshoot it, and make informed decisions about its future?

In most organizations, the answer is no. The knowledge lives in someone's head. When that person leaves (and they will leave; turnover among data scientists is commonly reported to be high), the organization has to reverse-engineer its own system.

Consider a common pattern: a model is running in production, and the person who built it leaves. Six months later, the model starts behaving unexpectedly. The team that inherited it opens the repository and finds a README that says "Classification model for customer segmentation. See notebook for details." The notebook has no comments. The training data pipeline references a table that's been renamed. The configuration file has parameters that nobody on the current team understands.

The team can't determine whether the model's current behavior is a bug or a feature. They can't retrain it confidently because they don't know why the original training data was selected. They can't modify it because they don't understand the design decisions behind the architecture. They're left with two options: treat the model as a black box and hope it keeps working, or rebuild it from scratch.

Both options are expensive. Both are avoidable. The fix is documentation that was maintained as a living practice, not a one-time deliverable.
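One way to make "living practice" checkable is to compare when the documentation was last touched with when the system last changed. The sketch below is a minimal illustration, not a prescribed tool: the function name, the 30-day grace period, and the idea of feeding it dates from version control are all assumptions made for the example.

```python
from datetime import date, timedelta


def docs_are_stale(doc_updated: date, system_changed: date,
                   grace: timedelta = timedelta(days=30)) -> bool:
    """Return True when the system changed more than `grace` after the docs were last updated.

    Both dates are supplied by the caller (hypothetically, from commit history);
    the 30-day default is an arbitrary illustrative threshold.
    """
    return system_changed - doc_updated > grace


# Docs last edited in January, model pipeline changed in June: flagged as stale.
print(docs_are_stale(date(2024, 1, 10), date(2024, 6, 1)))

# Docs edited twelve days before the latest change: within the grace period.
print(docs_are_stale(date(2024, 5, 20), date(2024, 6, 1)))
```

A check like this says nothing about whether the documentation is any good. It only surfaces the case described above, where the record stopped moving while the system kept changing.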

Documentation isn't bureaucracy

The objection I hear most often is that documentation slows teams down. It adds process. It's busywork. Data scientists should be building models, not writing documents.

This is a false economy. The time saved by not documenting is borrowed against the future — against the day when someone needs to understand a decision that was made months ago by a person who's no longer available to explain it. The cost of that deferred work compounds silently.

Documentation is the part of an AI project that carries context through personnel changes. Code survives too, but code without context is a puzzle, not a guide. Documentation provides the context that makes the code intelligible to someone who wasn't there when it was written.

More fundamentally, documentation is the foundation of governance. You cannot govern what you cannot describe. You cannot audit what isn't recorded. You cannot hold an organization accountable for decisions that were never written down. Every governance capability — accountability, traceability, compliance, incident response — depends on documentation existing and being accurate.

The standard

The standard is not complicated. For every AI system in production, there should be documentation that answers:

What does this system do, and what business problem does it solve?
What data does it use, and why was that data selected?
What are its known limitations and failure modes?
Who owns it in production?
How is it monitored, and what triggers remediation?
What's the process when it needs to be updated or retrained?

Six questions. If you can answer them with documented evidence — not institutional memory, not "ask the team" — you have the foundation. If you can't, you have a system running in production that your organization cannot fully account for.
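As a rough sketch, the six questions can be treated as a record with one field per question, plus a check that reports which ones have no written answer. The class name, field names, and sample values here are hypothetical; a real record would hold links to evidence rather than one-line strings.

```python
from dataclasses import dataclass, fields


@dataclass
class SystemRecord:
    """One field per question in the standard (field names are illustrative)."""
    purpose: str = ""         # What does it do, and what business problem does it solve?
    data_rationale: str = ""  # What data does it use, and why was that data selected?
    limitations: str = ""     # What are its known limitations and failure modes?
    owner: str = ""           # Who owns it in production?
    monitoring: str = ""      # How is it monitored, and what triggers remediation?
    update_process: str = ""  # What's the process for updating or retraining?


def unanswered(record: SystemRecord) -> list[str]:
    """Return the names of the questions that have no written answer."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical record where only two of the six questions are answered.
record = SystemRecord(
    purpose="Segments customers for marketing campaigns.",
    owner="growth-ml team",
)
print(unanswered(record))
# → ['data_rationale', 'limitations', 'monitoring', 'update_process']
```

An empty result does not prove the answers are accurate, only that someone wrote them down. That is still more than "ask the team."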

Most organizations can't answer all six for most of their deployed models. That's not a criticism of the people. It's a criticism of the practice — or the absence of it. The standard is clear. Following it is a choice.