Your AI Feature Is Only As Reliable As The ETL Pipeline Nobody Owns
The AI feature has the dashboard. The prompt has the version control. The eval suite has the on-call rotation. And then there is the upstream cron job, written in 2022, owned by a team that rotated out of analytics two reorgs ago, that produces the CSV your retrieval index is built from. That cron job has no SLA. That CSV has no schema contract. The team that owns it does not know it feeds an AI feature. When it changes — and it will change — the AI team will spend three weeks debugging a prompt that did nothing wrong.
The AI quality regression you are about to chase is almost never an AI problem. It is an ETL problem wearing an AI costume. The discipline that has to land is the seam between the two — the contract, the lineage, the freshness signal, the paired on-call — and the team that does not formalize it ships an AI feature whose reliability is bounded by the least-loved cron job in the company.
The invisible dependency
An AI feature is a pipeline. The last 20 percent of that pipeline — the prompt, the model, the eval harness — is where the engineering investment is concentrated, and where every retrospective begins. The first 80 percent — the ingestion job, the normalization step, the deduplication pass, the daily snapshot, the column that gets joined in from a third upstream system — was built before anyone said the word "agent" in a meeting. It is owned by people who think they own a data warehouse for analytics consumers. They do not know an LLM is downstream.
This invisibility is the failure mode. The data team treats their pipeline like a Tableau backend: a column rename is a routine cleanup, a daily run that slips to every-other-day is an acceptable degradation, a schema "improvement" is a Slack announcement to the analytics channel. None of those communications reach the AI team, because the AI team was never enrolled as a consumer. There is no contract that says the embedding pipeline depends on customer_segment being a string and not an integer. There is no consumer registry that pages the data team when a downstream RAG index is reading their output.
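The missing contract can be made concrete with very little code. The sketch below is illustrative only: the `CONTRACT` mapping, the `customer_id` column, and the `contract_violations` helper are hypothetical names, and the only rule taken from the text is that `customer_segment` must be a string. It assumes the batch has already been parsed into dictionaries with native Python types.

```python
from typing import Any, Callable

# Hypothetical contract: column name -> predicate every value must satisfy.
# Only the customer_segment rule comes from the article; customer_id is assumed.
CONTRACT: dict[str, Callable[[Any], bool]] = {
    "customer_id": lambda v: isinstance(v, int),
    "customer_segment": lambda v: isinstance(v, str) and v != "",
}


def contract_violations(rows: list[dict[str, Any]]) -> list[str]:
    """Return readable violations; an empty list means the batch conforms."""
    problems = []
    for i, row in enumerate(rows):
        for column, is_valid in CONTRACT.items():
            if column not in row:
                problems.append(f"row {i}: missing column {column!r}")
            elif not is_valid(row[column]):
                problems.append(
                    f"row {i}: bad value for {column!r}: {row[column]!r}"
                )
    return problems


if __name__ == "__main__":
    batch = [
        {"customer_id": 1, "customer_segment": "enterprise"},
        {"customer_id": 2, "customer_segment": 3},  # segment became an integer
    ]
    for problem in contract_violations(batch):
        print(problem)
```

Run as a gate before the embedding job, a check like this turns a silent type change into a failed build that names the offending column.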
The AI team, meanwhile, treats the upstream as ground truth. Their evals are run against a snapshot of the data taken at some point in the past. Their retrieval works because the columns are where they expect them to be. Their fine-tune was trained on a distribution that they assume is stationary. Every single one of those assumptions is a contract that was never signed, and the upstream team is free to violate every one of them this afternoon, because nobody told them they had agreed to anything.
The failure modes that get logged as "model regressions"
The pattern repeats so often it is almost a genre. The AI team notices that quality is down four points on the weekly eval. Latency is fine, error rates are fine, the model version did not change, the prompt did not change. They spend a week tuning the prompt. They spend another week trying a different chunking strategy. They spend a third week running ablations against the retrieval pipeline. Eventually somebody traces a sample of bad outputs back to a specific document, finds the document, looks at when it was last ingested, and discovers that the upstream pipeline started filtering out a category of records two weeks ago because of a "harmless cleanup" that removed records flagged as "internal." The AI feature was relying on those records.
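A filtered-out category is cheap to detect if consecutive snapshots are compared before indexing. The following is a minimal sketch, assuming each record carries a `category` field and that a drop of more than half is worth flagging; the field name, the `category_drops` helper, and the threshold are all assumptions rather than anything the pipeline described here actually provides.

```python
from collections import Counter
from typing import Any


def category_drops(
    previous: list[dict[str, Any]],
    current: list[dict[str, Any]],
    key: str = "category",
    max_drop: float = 0.5,
) -> dict[str, tuple[int, int]]:
    """Return categories whose record count fell by more than max_drop.

    The result maps category -> (old_count, new_count).
    """
    before = Counter(row[key] for row in previous)
    after = Counter(row[key] for row in current)
    flagged = {}
    for category, old_count in before.items():
        new_count = after.get(category, 0)
        if new_count < old_count * (1 - max_drop):
            flagged[category] = (old_count, new_count)
    return flagged


if __name__ == "__main__":
    last_week = [{"category": "internal"}] * 4 + [{"category": "public"}] * 4
    this_week = [{"category": "public"}] * 4  # "internal" records were removed
    print(category_drops(last_week, this_week))
```

The comparison says nothing about whether the removal was intentional. It only guarantees that someone on the consuming side hears about it on the day it happens rather than three weeks later.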
A second pattern: the upstream pipeline begins emitting a column at a different precision. A timestamp that used to be in milliseconds is now in seconds. The retrieval layer was using the timestamp to break ties on relevance ranking. At second precision many more documents share the same timestamp, so ties that the timestamp used to resolve now fall through to arbitrary ordering. The tie-breaking becomes non-deterministic, the same query returns different documents on different days, and the eval suite begins to oscillate. The model is fine. The retrieval code is fine. The data is off by a factor of a thousand.
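Two small guards address this pattern. The sketch below is a heuristic under stated assumptions, not a description of any real retrieval layer: it assumes the timestamps are Unix epoch integers, guesses the unit from magnitude, and adds a final tie-break on a hypothetical `doc_id` field so that ranking stays deterministic even when timestamps collide.

```python
from typing import Any, Iterable


def infer_epoch_unit(value: int) -> str:
    """Guess whether an epoch timestamp is in seconds or milliseconds.

    Heuristic: 1e11 seconds is roughly the year 5100, while 1e11
    milliseconds is early 1973, so present-day values fall cleanly
    on either side of the threshold.
    """
    return "ms" if abs(value) >= 100_000_000_000 else "s"


def unit_changed(previous: Iterable[int], current: Iterable[int]) -> bool:
    """True if the set of inferred units differs between two batches."""
    return {infer_epoch_unit(v) for v in previous} != {
        infer_epoch_unit(v) for v in current
    }


def rank(hits: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort by score (desc), then timestamp (desc), then doc_id.

    The doc_id key makes the order independent of input order when
    score and timestamp are both tied.
    """
    return sorted(hits, key=lambda h: (-h["score"], -h["ts"], h["doc_id"]))


if __name__ == "__main__":
    print(unit_changed([1_700_000_000_000], [1_700_000_000]))
    hits = [
        {"doc_id": "b", "score": 0.9, "ts": 1_700_000_000},
        {"doc_id": "a", "score": 0.9, "ts": 1_700_000_000},
    ]
    print([h["doc_id"] for h in rank(hits)])
```

The unit check catches the upstream change itself; the extra sort key limits the damage if a change slips through anyway.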
A third: the upstream pipeline's run cadence drops from hourly to daily, because a cost-cutting initiative deprioritized non-critical jobs. The RAG index is now stale by up to 24 hours. The AI feature begins answering questions about "the latest" with information that is a day behind. No alert fires anywhere, because every stage succeeded: the pipeline ran, the index was updated, the model responded. The only signal is that customer satisfaction quietly drops over a quarter.
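The gap here is that job success says nothing about data age. A freshness check measures age directly. The sketch below assumes the index build can report the timestamp of the newest source record it ingested; the two-hour budget and the function names are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def staleness(
    last_source_update: datetime, now: Optional[datetime] = None
) -> timedelta:
    """Age of the newest source record, measured against `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    return now - last_source_update


def is_stale(
    last_source_update: datetime,
    max_age: timedelta = timedelta(hours=2),  # assumed freshness budget
    now: Optional[datetime] = None,
) -> bool:
    """True when the data is older than the agreed freshness budget."""
    return staleness(last_source_update, now) > max_age


if __name__ == "__main__":
    newest_record = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    if is_stale(newest_record, now=check_time):
        print(f"index is stale by {staleness(newest_record, check_time)}")
```

The important design choice is what the check measures: the timestamp of the data, not the timestamp of the last successful run. A daily job that succeeds every day passes a run-based check and fails this one.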
A fourth: the upstream pipeline silently truncates a long string field at 256 characters because somebody changed the warehouse column type. The RAG index now contains chunks that are missing the second half of every long document. Retrieval still returns chunks. The chunks are just incomplete. The model answers based on incomplete context. Hallucination rates rise. The team blames the model.
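Truncation leaves a recognizable fingerprint: natural text lengths vary, so an unusual number of values sitting at exactly the maximum observed length suggests a cap. The sketch below is a heuristic with assumed thresholds (`min_share`, `min_count`) and a hypothetical helper name; it flags a field for a human to inspect rather than proving truncation.

```python
def looks_truncated(
    values: list[str], min_share: float = 0.05, min_count: int = 3
) -> tuple[bool, int, float]:
    """Flag a text field whose values pile up at the maximum length.

    Returns (suspicious, max_length, share_of_values_at_max_length).
    """
    if not values:
        return (False, 0, 0.0)
    max_len = max(len(v) for v in values)
    at_max = sum(1 for v in values if len(v) == max_len)
    share = at_max / len(values)
    suspicious = at_max >= min_count and share >= min_share
    return (suspicious, max_len, share)


if __name__ == "__main__":
    # Ten long documents cut at 256 characters, plus ninety shorter ones.
    capped = ["x" * 256] * 10 + ["y" * n for n in range(1, 91)]
    print(looks_truncated(capped))
```

Running this per text column before chunking would surface a changed column type as a spike at a single length, usually a round number such as 256.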
In every one of these cases, the AI team's first three theories were about the model. None of those theories were correct. The fourth theory, eventually, was about the data. The data is almost always the answer, and almost never the first place anyone looks.
The contract layer that has to exist
