Why I Started Thinking About AI Like a Bank Examiner

The discipline I learned examining banks is exactly what AI programs are missing. Not bureaucracy. The actual discipline: evidence, accountability, traceability.

I didn't set out to become the person who applies bank examination discipline to AI programs. It happened because I watched something ship that shouldn't have shipped the way it shipped, and the feeling I had — the specific discomfort — was one I recognized from a different career.

The moment

I was close to a large enterprise AI program. The team had built a model that influenced downstream decisions affecting customers. The model worked, the accuracy was reasonable, and the engineering was competent. It shipped.

And when it shipped, I noticed what wasn't there.

No documentation of the training data selection process. No record of what alternatives were considered and why they were rejected. No designated owner for post-deployment monitoring. No defined process for what happens when the model's performance degrades. No accountability chain — no single person you could point to and say "this person is responsible for this model's behavior in production."

The team had built and shipped a system that would make decisions affecting real people, and the organizational infrastructure around it was vapor. If you asked "who owns this model?" the answer would have been a team name, not a person. If you asked "what happens when it drifts?" the answer would have been "we'll figure it out." If you asked "show me the documentation," you'd get a README that was written during development and never updated after the first week.

I'd seen this exact pattern before. But not in technology.

The bank examination parallel

When I was at the FDIC, I examined banks. The job was to assess whether a financial institution was operating safely and soundly — whether the controls it claimed to have actually existed, actually functioned, and could be evidenced on demand.

The core principles were simple and non-negotiable:

If you can't evidence a control, the control doesn't exist. A bank can tell you it has a loan review process. If it can't produce the documentation — the policy, the review records, the exception tracking — then the process doesn't exist for examination purposes. The claim is irrelevant. The evidence is everything.

If you can't trace a decision to a person, nobody made the decision. "The committee approved it" only counts if there are minutes, if the committee members are named, if their authority to approve is documented. Institutional decisions require institutional accountability. A decision without a named decision-maker is a decision that nobody owns and nobody can explain.

If you can't produce documentation on request, you're not governed. Governance isn't what you say you do. It's what you can demonstrate you do, to a skeptical reviewer, under time pressure, with consequences for gaps. The documentation exists not for the team's benefit — it exists so that someone who wasn't in the room can reconstruct what happened and why.

Watching that AI model ship without any of this infrastructure, I felt the same thing I felt when I'd open a bank's files and find gaps: this institution is operating on trust and good intentions instead of evidence and accountability. It works until it doesn't. And when it doesn't, nobody can reconstruct what went wrong, because nobody documented what was supposed to go right.

The realization

The discipline I learned examining banks is exactly what AI programs are missing.

Not bureaucracy. Not twelve-page approval forms. Not compliance theater where someone fills out a template to satisfy an auditor. The actual discipline is three things: evidence, accountability, traceability.

Evidence is the work you can show. Why was this training data selected? What alternatives were considered? What does the model do when it encounters inputs outside its training distribution? If the answers live in institutional memory — "ask Sarah, she built it" — you don't have evidence. You have folklore. And folklore doesn't survive turnover, scrutiny, or an incident.

Accountability is a named person who owns the system's behavior in production. Not a team. Not a committee. Someone who is notified when the model drifts, who answers when it causes harm, who authorizes retraining. The bank examiners I worked with never accepted "the committee approved it" as the end of the conversation. They asked which member made the call, and pulled the minutes to verify. The same standard applies to AI in production, and most programs would fail it.

Traceability is the chain from business intent to model design to deployment to monitoring to incident response. Each link documented, dated, and connected to the next. When a link is missing and something goes wrong, you can't reconstruct what was supposed to happen, and that's the moment governance is actually tested.
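One way to make these three properties concrete is to treat them as a record that can be checked for gaps. What follows is a minimal sketch in Python, not a framework: the stage names come from the chain described above, but `TraceLink`, `GovernanceRecord`, `find_gaps`, the field names, and the example values are all hypothetical, chosen only for illustration. It checks that each link is present, has a document, and has a named owner. It does not try to judge whether the evidence is any good.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# The five stages of the chain, in order. Names are illustrative.
STAGES = [
    "business_intent",
    "model_design",
    "deployment",
    "monitoring",
    "incident_response",
]


@dataclass
class TraceLink:
    """One documented, dated link in the chain (hypothetical structure)."""
    stage: str
    document: str   # where the evidence lives, e.g. a file path or ticket ID
    dated: date     # when the document was last updated
    owner: str      # a named person, not a team


@dataclass
class GovernanceRecord:
    """Everything a skeptical reviewer would ask to see for one model."""
    model_name: str
    accountable_owner: Optional[str] = None
    links: Dict[str, TraceLink] = field(default_factory=dict)


def find_gaps(record: GovernanceRecord) -> List[str]:
    """Return a list of human-readable gaps; an empty list means none found."""
    gaps: List[str] = []
    if not record.accountable_owner:
        gaps.append("no named accountable owner")
    for stage in STAGES:
        link = record.links.get(stage)
        if link is None:
            gaps.append(f"missing link: {stage}")
            continue
        if not link.document:
            gaps.append(f"no evidence for: {stage}")
        if not link.owner:
            gaps.append(f"no named owner for: {stage}")
    return gaps


# Example: a model that shipped with only a design document on file.
record = GovernanceRecord(model_name="example-model")
record.links["model_design"] = TraceLink(
    stage="model_design",
    document="docs/design.md",
    dated=date(2024, 1, 15),
    owner="Named Person",
)
gaps = find_gaps(record)
for gap in gaps:
    print(gap)
```

The point of the sketch is that the question "who owns this model?" becomes something you can answer by looking a thing up, not by asking around. A real program would also need to check that the documents exist and are current, which this sketch does not attempt.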

Why this isn't a compliance pitch

I want to be clear about what I'm arguing and what I'm not.

I'm not arguing that every AI model needs to go through a formal examination process. I'm not arguing for regulation-grade documentation on every proof of concept. I'm not selling a framework or a consulting engagement.

What I'm arguing is simpler: the standard that exists in banking — where you have to be able to evidence your controls, trace your decisions, and demonstrate accountability to a skeptical reviewer — is the right standard for AI systems that affect people.

Not because a regulator requires it. Because it's the standard that prevents the failure mode I keep watching: systems that work until they don't, and when they don't, nobody can explain what happened because nobody documented what was supposed to happen.

The organizations that build this discipline into their AI programs — evidence, accountability, traceability, as operating practice rather than compliance checkbox — are building programs that can survive scrutiny. Not just regulatory scrutiny. The scrutiny of an incident. The scrutiny of a leadership review. The scrutiny of a customer asking "why did your system do this to me?"

The organizations that don't build this discipline will keep shipping models into a void. The models will work for a while. Some will drift. Some will cause harm. And when someone asks what happened, the answer will be the same answer I've heard in banks with weak controls: "We thought we had a process for that."

You did. You just couldn't evidence it.