Building Best Practices in a Data Engineering Team That Didn't Have Any

Most data engineering teams don't have a best practices problem. They have a 'nobody ever stopped to write it down' problem.

Most data engineering teams don't have a best practices problem. They have a "nobody ever stopped to write it down" problem.

The team ships pipelines. The pipelines work. New people join, look at the existing code, and build something that looks approximately like what's already there. Patterns emerge by accident. Standards form by imitation. And for a while, that's fine — until it isn't.

The moment it stops being fine is predictable. It's the moment the team scales past the point where everyone can fit in one room and ask the person who built the original pipeline why they did it that way. Once that person leaves, goes on vacation, or just gets too busy to answer questions, the institutional knowledge evaporates and you're left with a codebase full of implicit decisions that nobody can explain.

That's the moment you need best practices. Not before — because nobody will follow them when the team is small enough to just talk. And not six months after — because by then the drift has compounded into something that takes a quarter to untangle.

What best practices actually are

Teams think they have best practices because someone wrote a standards doc and put it in Confluence. The doc existed. The pipelines didn't reflect it. Nobody had opened the doc in four months. That's not a standard. That's a fiction.

The teams where standards actually stuck had something different: documented agreements that were specific enough to resolve arguments, flexible enough to survive contact with reality, and enforced through the development process itself rather than through willpower. The standard lived in the CI pipeline and the PR template, not in a wiki page.

Where to Start

The temptation is to boil the ocean — document everything, standardize everything, create a comprehensive style guide. That fails every time. People don't adopt standards they didn't help create, and they don't follow docs that are longer than what they're trying to build.

Start with the three things that cause the most rework:

1. Pipeline structure. How are pipelines organized? What does a well-structured pipeline look like in your stack? Not abstractly — show the template. If every pipeline follows the same skeleton (ingestion layer, transformation layer, quality checks, output), new engineers don't have to guess. They clone the template and fill it in. The structure IS the documentation.

2. Testing expectations. What does "tested" mean on this team? If the answer is "it ran once and didn't error," that's not testing — that's hoping. Define the minimum: data quality checks on inputs, row count validation, schema validation on outputs, at least one integration test that runs against representative data. Don't mandate 90% coverage on day one. Mandate that every pipeline has at least one test that catches the failure mode that bit the team last quarter.

3. Code review standards. What does a reviewer check for? If reviews are just "looks good, approved," they're theater. Define what a review should catch: hardcoded values that should be config, missing error handling on external calls, SQL that doesn't handle nulls, transformations that assume data shapes that aren't guaranteed. Make it a checklist — not because engineers can't think, but because humans skip things when they're reviewing their fifth PR before lunch.

How adoption actually happens

Best practices initiatives die the same way repeatedly. Someone writes a document, presents it to the team, and asks everyone to follow it. The engineers nod. They file it somewhere. They keep doing what they were doing. Six months later, somebody asks "whatever happened to that standards doc?" and nobody can find it.

Teams that actually adopt standards did it because they felt the pain first. The approach that worked: take the three worst incidents from the last six months. Walk through each one as a team. Ask one question: "What practice, if it had existed, would have caught this before production?" The answers become your first three practices — and they have buy-in because the team derived them from their own failures, not from a document someone handed them.

The other thing I learned: a best practice that requires remembering is a best practice that gets forgotten. The teams that made it stick put the pipeline template in the repo as a scaffold. They put the testing expectations in the CI pipeline as gates. They put the code review checklist in the PR template. The practice ran automatically. Compliance wasn't optional because it wasn't a choice — it was the path of least resistance.

The Practices That Matter Most

Across large data engineering organizations, a pattern emerges. The practices that actually reduce rework and incidents fall into three categories:

Structural consistency. Every pipeline looks the same from the outside, even if the logic inside is different. New engineers can navigate any pipeline because the layout is familiar. Debugging is faster because you know where to look.

Explicit data contracts. The team documents what the pipeline expects as input and what it guarantees as output — schemas, freshness, volume ranges, null handling. When an upstream source changes, the contract breaks loudly instead of failing silently three layers downstream.

Observability by default. Every pipeline emits the same set of operational metrics: rows processed, duration, error count, data quality score. Not as an afterthought — baked into the template. When something goes wrong at 2am, the on-call engineer doesn't have to read the code to understand what happened.

What Changes

The immediate change is boring: fewer incidents, faster onboarding, less rework. A new engineer ships their first pipeline in days instead of weeks because the template and the review process catch the mistakes that used to make it to production.

The less obvious change is cultural. When a team has shared standards, code review becomes collaborative instead of adversarial. "This doesn't match the pattern" is a factual observation, not a personal criticism. Disagreements about approach get resolved by pointing at the agreed standard, not by who has more seniority or who argues louder.

The change that matters most takes longer to see: the team starts treating their data pipelines as products with SLAs instead of scripts that happen to run. That shift — from "it works" to "it's reliable, observable, and maintainable" — is the difference between a data engineering team and a group of people who write SQL.

The Part Nobody Tells You

Building best practices is a leadership problem disguised as a technical one. The practices themselves are straightforward. The hard part is creating the conditions where the team adopts them — which means protected time to refactor existing pipelines, patience when the first sprint is slower because people are learning the new patterns, and the discipline to enforce the standards even when a deadline is pressing.

The moment you let a deadline override the standard "just this once," you've communicated that the standard is optional. It takes one exception to undo six months of adoption. That's the real best practice: the standard is the standard, especially when it's inconvenient.