Eight Questions That Tell You Whether Your AI Is Actually Governed

Most AI governance programs start and end with the model. They ask: Is the model accurate? Is it biased? Is it explainable? Those are real questions. They're also about 25% of the problem.

The model is one component in a system that includes data pipelines, human workflows, organizational ownership, and production monitoring. Govern the model and ignore the rest, and you've built a compliance story with blind spots large enough to drive an incident through.

This pattern shows up in financial institutions preparing for examination and in enterprises that discovered their "governed" AI system had no owner, no monitoring, and no clear definition of what it was supposed to do in the first place. The model worked fine. Everything around it was unaccounted for.

Here are eight questions that cover the full surface. If you can answer all eight for every AI system in production, your AI might actually be governed. Most organizations can answer three.

1. What problem is this system solving, and what decision will it influence?

This is the use case question. It sounds basic. It is not.

A good answer sounds like: "This system triages incoming support tickets by severity and routes them to the appropriate team. It influences queue priority but does not make final resolution decisions."

A red flag sounds like: "We're using AI to improve efficiency in operations." That's a press release, not a governance artifact. If you can't state what the system does and what decisions it touches, you cannot assess its risk — because you haven't defined what "working correctly" means, and you definitely haven't defined "failing dangerously."

2. Who is affected, and what's the impact if this goes wrong?

This is the context question. Same model, different context, completely different risk.

A good answer: "This summarization tool generates internal draft briefs for our legal team. If it hallucinates, an attorney catches it before anything goes out. Impact magnitude is low."

A red flag: "This summarization tool generates customer-facing notices." Same model. Radically different risk profile. The governance response should be different — different testing requirements, different review workflows, different monitoring thresholds.

If your governance treats every deployment of the same model identically, you've confused model governance with system governance. The system includes the humans, the stakes, and the blast radius.

3. Where does the data come from, and is it fit for this purpose?

Data is not a supporting actor in AI governance. It's a first-class control point.

A good answer includes: data provenance, representativeness assessment, identification of sensitive fields, and lawful basis for processing. "Our training data comes from three internal sources, was assessed for demographic representativeness in Q3, and excludes PII by design."

A red flag: "The vendor handles the data." That's not governance. It's delegation without oversight. AI providers and deployers both carry data governance obligations under the major frameworks now in play, so "the vendor handles it" is not a defensible answer.

4. What kind of model is this, and how explainable is it?

The model question isn't just "what algorithm." It's whether your governance controls match the model's risk characteristics.

A good answer: "We're using a fine-tuned transformer model for document classification. Explainability is limited to attention weights and confidence scores. For high-stakes decisions, a human reviewer validates every output above the risk threshold."

A red flag: "We're using the best available model." Best by what metric? Evaluated how? A deep learning model making consequential decisions with no explainability pathway and light oversight is a governance failure waiting for a trigger.

5. How do you test this system — in the lab and in the field?

Testing that only covers benchmark performance is testing for the demo, not for production.

A good answer: "We run scenario-based tests that simulate the actual failure modes we've identified — including adversarial inputs, edge cases from our domain, and performance on underrepresented subgroups. We retest after every model update."

A red flag: "We tested it before launch." Past tense is the danger word. Systems degrade. Data distributions shift. User behavior changes. If your testing stopped at deployment, your governance has a best-before date, and it's passed.
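
One way to make "performance on underrepresented subgroups" something you can rerun after every model update is a per-subgroup accuracy check with a floor. The sketch below is a minimal illustration, not a prescribed method: the record layout, the function names, the sample data, and the 0.80 floor are all assumptions made for the example.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Compute accuracy per subgroup from (subgroup, predicted, actual) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subgroup, predicted, actual in records:
        total[subgroup] += 1
        if predicted == actual:
            correct[subgroup] += 1
    return {group: correct[group] / total[group] for group in total}


def failing_subgroups(records, floor=0.80):
    """Return subgroups whose accuracy is below the floor (an assumed threshold)."""
    scores = subgroup_accuracy(records)
    return sorted(group for group, acc in scores.items() if acc < floor)


# Made-up evaluation records for illustration: (subgroup, predicted, actual).
RECORDS = [
    ("group_a", "high", "high"),
    ("group_a", "low", "low"),
    ("group_a", "low", "low"),
    ("group_a", "high", "high"),
    ("group_b", "high", "low"),
    ("group_b", "low", "low"),
    ("group_b", "low", "high"),
    ("group_b", "high", "high"),
]

print(failing_subgroups(RECORDS))  # prints ['group_b']
```

An aggregate accuracy of 75% would hide the fact that one group sits at 50%. Running a check like this on every model update, and failing the release when the list is not empty, is what turns "we tested it" into "we test it."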

6. Who sees the output, and what happens next?

The workflow question is where most harm actually originates — not from a broken model, but from how a working model's output gets used.

A good answer: "Model output goes to a reviewer who has authority to override, context to evaluate, and time to do both. The review step is documented and auditable."

A red flag: "The output goes directly into the workflow." No review. No override mechanism. No documentation of how the output was used. Human oversight isn't human awareness. It's the ability to intervene, with the authority and context to do so.
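
A "documented and auditable" review step can be as small as one record per reviewed output. The following is a sketch under stated assumptions: the `ReviewRecord` fields and the `record_review` helper are hypothetical names chosen for illustration, and a real system would persist these records somewhere durable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ReviewRecord:
    """One auditable entry describing how a model output was used."""
    item_id: str
    model_output: str
    reviewer: str
    final_decision: str
    overridden: bool
    reviewed_at: str  # ISO 8601 timestamp in UTC


def record_review(item_id: str, model_output: str, reviewer: str,
                  reviewer_decision: Optional[str] = None) -> ReviewRecord:
    """Log a review; the reviewer's decision, if given, wins over the model's."""
    if not reviewer.strip():
        raise ValueError("a named reviewer is required")
    final = model_output if reviewer_decision is None else reviewer_decision
    return ReviewRecord(
        item_id=item_id,
        model_output=model_output,
        reviewer=reviewer,
        final_decision=final,
        overridden=final != model_output,
        reviewed_at=datetime.now(timezone.utc).isoformat(),
    )
```

The design choice worth noting is that the reviewer's decision always takes precedence and the override is recorded explicitly. An override rate that stays at zero for months is itself a signal: either the model is very good, or the review step has become awareness rather than oversight.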

7. Who owns this — the use case, the model, the monitoring, and the incident response?

Ownership requires four named roles at minimum. Not four teams. Four people with names and accountability.

A good answer: "The use case owner is [name]. The model owner is [name]. Production monitoring is owned by [name]. Incident response is led by [name]. Each has documented responsibilities and escalation paths."

A red flag: "The AI team owns it." Which means nobody owns it. Ownership is not about blame — it's about speed and clarity. When something goes wrong in production, you need to know in seconds who makes the call, not schedule a meeting to figure out who's responsible. Provider obligations and deployer obligations are not the same thing, and the regulatory frameworks now make that distinction operational, not theoretical.
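
The four roles can be captured in a per-system ownership record and checked mechanically. This is a minimal sketch: the role keys and the `missing_owners` helper are assumed names for illustration, and the check only catches empty entries. It cannot tell a person's name from a team name, so that part still needs a human reading the record.

```python
# Hypothetical role keys mirroring the four roles named above.
REQUIRED_ROLES = (
    "use_case_owner",
    "model_owner",
    "monitoring_owner",
    "incident_response_lead",
)


def missing_owners(ownership):
    """Return the required roles that have no named person assigned."""
    return [
        role for role in REQUIRED_ROLES
        if not str(ownership.get(role) or "").strip()
    ]


# Illustrative record for one system; the names are placeholders.
ticket_triage = {
    "use_case_owner": "A. Person",
    "model_owner": "",
    "monitoring_owner": "B. Person",
}

print(missing_owners(ticket_triage))  # prints ['model_owner', 'incident_response_lead']
```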

8. Are you monitoring performance, drift, bias, and failures in production?

Monitoring is where governance either operates or becomes fiction.

A good answer: "We track model performance weekly against defined thresholds. Drift detection runs automatically. Bias metrics are reviewed quarterly. Every failure above severity 2 triggers the incident response protocol. All of this is documented with triggers, actors, and actions."

A red flag: "We'll set up monitoring after we scale." You won't. And even if you do, you'll have a gap between launch and monitoring where you were flying blind. Post-market monitoring is the requirement that separates governed AI from deployed AI — and governance that stops at launch isn't governance.
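
"Drift detection runs automatically" can mean many things. One common choice, used here purely as an illustration, is the Population Stability Index (PSI), which compares the share of inputs falling into each bin at baseline against the share seen in production. In this sketch the 0.10 and 0.25 cutoffs are widely used rules of thumb rather than requirements, and the action names are hypothetical stand-ins for your own triggers, actors, and actions.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population Stability Index over matching bins of proportions."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against log(0) and division by zero
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


def drift_action(score, warn=0.10, alert=0.25):
    """Map a PSI score to an action; thresholds are assumed rules of thumb."""
    if score >= alert:
        return "open_incident"
    if score >= warn:
        return "notify_monitoring_owner"
    return "none"


# Made-up bin proportions for one input feature.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.10, 0.20, 0.30, 0.40]

print(drift_action(psi(baseline, current)))  # prints notify_monitoring_owner
```

The point is less the specific statistic than the shape: a number computed on a schedule, a threshold agreed in advance, and a named action that fires when the threshold is crossed.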

The diagnostic

Count your honest answers. Not the ones you plan to have — the ones you have now, documented, for every AI system in production.

Eight out of eight: You're ahead of most organizations. Pressure-test the quality of each answer.

Five to seven: You have governance infrastructure, but you have blind spots. The gaps are probably in workflow, ownership, or monitoring — the dimensions that require organizational commitment, not just technical capability.

Fewer than five: You have a governance narrative, not a governance program. The risk isn't that your AI will fail. It's that when it does, you won't be able to explain what went wrong, who was responsible, or what you were monitoring — because you weren't.
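
The scoring rule above is strict: a question only counts if it is answered, today, for every system in production. A minimal sketch of that arithmetic follows. The data layout, the `diagnose` function, and the band labels are assumptions for illustration.

```python
QUESTIONS = range(1, 9)  # the eight questions, numbered 1 through 8


def diagnose(systems):
    """Score the diagnostic.

    systems maps a system name to {question_number: True if documented today}.
    A question counts only if it is True for every system.
    """
    if not systems:
        raise ValueError("at least one production system is required")
    answered = [
        q for q in QUESTIONS
        if all(answers.get(q, False) for answers in systems.values())
    ]
    gaps = [q for q in QUESTIONS if q not in answered]
    score = len(answered)
    if score == 8:
        band = "eight_of_eight"
    elif score >= 5:
        band = "five_to_seven"
    else:
        band = "fewer_than_five"
    return score, band, gaps


# Made-up inventory: two systems with different gaps.
inventory = {
    "ticket_triage": {1: True, 2: True, 3: True, 4: True,
                      5: True, 6: True, 7: True, 8: False},
    "brief_summarizer": {1: True, 2: True, 3: True, 4: True,
                         5: True, 6: True, 7: False, 8: False},
}

print(diagnose(inventory))  # prints (6, 'five_to_seven', [7, 8])
```

Note how the second system pulls question 7 into the gap list even though the first system has an owner. One ungoverned system is enough to lose the point.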

The eight questions aren't a maturity model. They're a minimum coverage map. Skip any one, and you've created a blind spot. Stack all eight, and you have the foundation for governance that can survive an audit, an incident, and an honest conversation with your board.