I Spent a Year Inside an Enterprise AI Program. Here's What I'd Do Differently.

A practitioner retrospective — the kind of honest accounting that usually happens over drinks at a conference but rarely makes it into writing.

I spent the better part of this year close to a large enterprise AI program. I won't name the company. What I will do is tell you what I saw, what worked, what didn't, and what I'd change if I could rewind to January.

This isn't a framework. It's not a maturity model. It's a plain retrospective, and year-end seems like the right time for it.

Start with the data, not the model

Every project that started with "what model should we use?" before asking "is our data ready?" failed. Not eventually. Early.

The pattern was consistent. A team would identify a promising use case, select a model — usually whichever foundation model was generating the most hype that quarter — and begin building. Weeks in, they'd discover that the data they needed was fragmented across three systems, had no consistent schema, and contained quality issues that nobody had cataloged because nobody had looked.

The model selection was the easy part. The data work was the real project. And because the team had organized around the model, not the data, the data work felt like an obstacle instead of the foundation.

If I could go back, I'd push for every AI project to begin with a data readiness assessment — not a checkbox, but a genuine evaluation of whether the data exists, is accessible, is clean enough, and is governed. No model selection until that assessment is complete. The teams that did this shipped. The teams that skipped it burned months discovering what they should have known in week one.
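To make "no model selection until that assessment is complete" concrete, here is a minimal sketch of the gate as code. It assumes the four questions above (exists, accessible, clean enough, governed) are recorded as simple yes/no answers. The class and function names are hypothetical, and in practice each answer would come from real investigation rather than a boolean someone types in.

```python
from dataclasses import dataclass, fields


@dataclass
class DataReadiness:
    """Hypothetical record of the four data readiness questions."""
    exists: bool        # Does the data the use case needs actually exist?
    accessible: bool    # Can the team reach it across the systems that hold it?
    clean_enough: bool  # Has someone looked at quality and found it acceptable?
    governed: bool      # Is there an owner and a usage policy for it?

    def gaps(self) -> list:
        """Return the names of criteria that are not yet satisfied, in field order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def select_model(readiness: DataReadiness, candidate: str) -> str:
    """Refuse to record a model choice until every readiness criterion passes."""
    missing = readiness.gaps()
    if missing:
        raise RuntimeError(f"Data not ready; unresolved: {', '.join(missing)}")
    return candidate


if __name__ == "__main__":
    assessment = DataReadiness(exists=True, accessible=False, clean_enough=False, governed=True)
    try:
        select_model(assessment, "any-foundation-model")
    except RuntimeError as err:
        print(err)  # Data not ready; unresolved: accessible, clean_enough
```

The point of the design is that the model choice is downstream of the data answers: the function cannot return a model until the gaps list is empty.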

Put operators in the room from day one

Every project that was designed by engineers for engineers and then handed to operators failed to stick. The handoff was where value went to die.

The engineers built systems that were technically sound. Elegant architectures. Clean APIs. Good documentation, even. But the people who would use the system every day — the claims adjusters, the analysts, the customer service teams — weren't in the room when design decisions were made. So the system worked in the lab and failed in the field. Not because the technology was wrong, but because the workflow assumptions were wrong.

The operators knew things the engineers didn't. They knew that a four-second delay on the data entry screen was unacceptable because they were on the phone with a customer while they waited. They knew that the confidence score was meaningless without context about the specific case type. They knew that the "edge cases" the engineers had deprioritized were actually 30% of the workload.

If I could go back, I'd argue for operator representation in every design review. Not as feedback after the fact — as participants in the decision. The projects that included operators from the start shipped tools that people actually used. The projects that didn't shipped tools that people worked around.

Build governance into the workflow, not on top of it

Every project that bolted governance on after launch failed to maintain it.

The pattern: ship the system, then add governance. Create the risk assessment after deployment. Write the documentation after the architecture is locked. Implement monitoring after the system is in production. This approach treats governance as a post-hoc compliance exercise — something you do to satisfy an audit, not something that improves the system.

The result is predictable. The governance artifacts are disconnected from the actual system. The risk assessment describes the system as it was designed, not as it operates. The documentation is immediately stale. The monitoring tracks what was easy to instrument, not what matters.

The projects that got governance right embedded it into the development workflow. Risk assessment was a gate before architecture approval. Documentation was updated as part of the deployment pipeline. Monitoring requirements were defined during design, not after launch. Governance wasn't a separate workstream — it was how the work got done.
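One way to picture governance as part of the pipeline rather than a parallel workstream is a release check that blocks deployment when the governance artifacts are missing or out of date. This is a minimal sketch under assumed names: the release record, its fields, and the idea of comparing a documentation version to the release version are hypothetical illustrations, not a description of any specific tool.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseCandidate:
    """Hypothetical release record that a deployment pipeline might inspect."""
    version: str
    risk_assessment_approved: bool
    docs_version: str  # the release version the documentation was last updated for
    monitoring_metrics: list = field(default_factory=list)  # defined during design


def governance_gate(rc: ReleaseCandidate) -> list:
    """Return blocking problems; an empty list means the release may proceed."""
    problems = []
    if not rc.risk_assessment_approved:
        problems.append("risk assessment not approved")
    if rc.docs_version != rc.version:
        # Stale documentation fails the build instead of drifting silently.
        problems.append(f"docs describe {rc.docs_version}, release is {rc.version}")
    if not rc.monitoring_metrics:
        problems.append("no monitoring metrics defined")
    return problems
```

Because the check runs on every release, the documentation and risk assessment cannot fall behind the system without someone noticing.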

If I could go back, I'd insist on never separating "building" from "governing." They're the same activity. The moment you create a governance workstream that runs parallel to the development workstream, you've guaranteed they'll diverge.

Own the monitoring or you don't own the system

Every system whose team shipped it and moved on eventually drifted, and nobody noticed.

Deployment is not the finish line. It's the starting line. But enterprise incentives don't work that way. The team that built the system gets credit for shipping it. Then they move to the next project. The system enters "maintenance mode," which in practice means "nobody's actively watching."

Drift is silent. The model's performance degrades gradually. The data distribution shifts. The users develop workarounds that change the input patterns. None of this triggers an alarm because nobody defined what an alarm looks like. Six months later, someone runs an audit and discovers the system has been underperforming for months. The business impact is real but invisible — death by a thousand small wrong answers.

If I could go back, I'd push for tying the development team to the system for a minimum of six months post-deployment. Not in a maintenance role — in an active monitoring role. Defined metrics, defined thresholds, defined escalation paths. The team doesn't move on until monitoring is operational and owned by a named successor. Ship and forget is not a deployment strategy. It's a liability.
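"Defined metrics, defined thresholds, defined escalation paths" can be written down as a small monitoring contract. The sketch below assumes a single numeric quality metric measured at deployment (the baseline) and a tolerated absolute drop; the metric name, owner, and numbers are hypothetical. It also treats missing data as an alarm, since silence is exactly how drift goes unnoticed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class MonitoringRule:
    """Hypothetical monitoring contract: a metric, a threshold, and a named owner."""
    metric: str       # illustrative name, e.g. "weekly_accuracy"
    baseline: float   # value measured at deployment
    max_drop: float   # tolerated absolute drop before escalating
    owner: str        # named person or team who receives the escalation


def check(rule: MonitoringRule, recent_values: List[float]) -> Optional[str]:
    """Return an escalation message if the recent average breaches the threshold."""
    if not recent_values:
        # No data is itself a failure of monitoring, so escalate it too.
        return f"ESCALATE to {rule.owner}: no recent values for {rule.metric}"
    current = mean(recent_values)
    if rule.baseline - current > rule.max_drop:
        return (f"ESCALATE to {rule.owner}: {rule.metric} fell from "
                f"{rule.baseline:.2f} to {current:.2f}")
    return None


if __name__ == "__main__":
    rule = MonitoringRule("weekly_accuracy", baseline=0.91, max_drop=0.05, owner="claims-ml-oncall")
    print(check(rule, [0.90, 0.89, 0.91]))  # within tolerance, so nothing to escalate
    print(check(rule, [0.84, 0.83, 0.85]))  # breach, routed to the named owner
```

A real deployment would pick metrics that reflect the business outcome, not just what is easy to instrument, but the shape stays the same: every alarm has a threshold someone agreed to and a person it goes to.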

The hardest part isn't technical

This is the lesson that took me the longest to internalize, even though it should have been obvious from day one.

The technology worked, the models were capable, and the infrastructure scaled. The engineering talent was strong. None of that was the bottleneck.

The bottleneck was organizational. It was the meeting where three departments couldn't agree on who owned the data. It was the handoff where the development team's definition of "done" didn't match the operations team's definition of "ready." It was the accountability gap where everyone assumed someone else was monitoring the system. It was the executive sponsor who championed the project but couldn't protect it from competing priorities.

The meetings, the handoffs, the accountability gaps. The misaligned incentives. The turf wars dressed up as technical disagreements. That's where AI programs die — not in the code, but in the spaces between teams.

If I were starting over, I'd spend less time on model selection and more time on organizational design. Who owns what. Who decides what. What happens when teams disagree. How accountability flows from the executive sponsor to the operator. Those questions are less interesting than architecture questions, but they determine whether the architecture ever delivers value.

The closing

The technology was never the hard part. The hard part was getting humans to agree on what "ready" means, who's responsible when it breaks, and what happens next. That's governance — and it's the only thing that makes the technology work.