Skip to main content

The Retention Policy That Erased Context Your Model Was Still Reading

· 12 min read
Tian Pan
Software Engineer

A nightly retention worker deletes any user message older than thirty days. A long-running enterprise support session, opened in early March, is still active in late May. On the request that comes in at turn 41, your prompt assembler reads from the same messages table the retention worker has been quietly pruning. Turns 1 through 28 are gone. The model receives a conversation that starts at turn 29 with no signal that earlier turns ever existed. The user asks "what was the SLA we agreed on earlier?" and the model confidently invents a number, because the actual answer was in turn 4 — which the retention worker erased the night before.

This is not a model failure. The model did exactly what it was supposed to: produce a plausible answer from the context it was handed. The failure happened upstream, in the gap between two teams that each thought they owned the messages table.

The Messages Table Is Two Different Things

If you ask your privacy or security team what the messages table is, they will tell you it is a store of user data subject to a retention policy. The retention policy says: anything older than thirty days, delete. The worker runs nightly, the policy is enforced, the compliance dashboard goes green.

If you ask your AI platform team what the messages table is, they will tell you it is the context window. The prompt assembler reads from it at request time, hydrates the model with the conversation so far, and lets the model continue the thread.

Both teams are right about what the table is to them. Neither team is right about what the table actually is, which is a multi-consumer dataset whose lifecycle is now governed by the team with the most aggressive policy. Retention has the deletion key. Prompt assembly has only a read path. So retention wins every conflict by default — and the conflict surfaces as a hallucination at request time, not as an error at deletion time.

The asymmetry matters because the retention team will never see the bug. They run their nightly job, the rows go away, their SLO is satisfied. The bug only appears when someone reads the now-truncated conversation back through a prompt-build path, and that read happens in a different system, owned by a different team, instrumented against different metrics. The two teams are working on the same data, but their telemetry doesn't intersect until a customer complains.

Why The Model Cannot Tell That History Is Missing

The instinct is to push the problem onto the model: "the model should notice that the conversation starts mid-thread and ask for clarification." This fails for a structural reason. The model has no way to know that turns 1 through 28 ever existed. The prompt-assembler builds a context array, hands it to the model, and the model's view of the world is whatever is in that array. There is no "turns deleted" sentinel because the assembler never put one there. The assembler did what it was told: read from the messages table, format whatever comes back as a conversation, send to the model.

From the model's perspective, a session that begins at turn 29 is indistinguishable from a session that began at turn 29. The numbering is your bookkeeping, not the model's. If your assembler renumbers turns to start at 1, even your telemetry will fail to flag the truncation. The model's confident answer about an SLA that was discussed in a turn it never saw is not hallucination in the colloquial sense — it is the model successfully completing the task you gave it, with the inputs you gave it, against the conversational frame those inputs imply.

This is why fixing it inside the model is the wrong move. The model is not the layer that knows the conversation was truncated. The assembler is. The assembler is the place that observes the gap between "expected turn 1" and "actual oldest row is turn 29." That observation has to become a signal before it reaches the model, or the signal is lost forever.

The Two Teams Problem

Almost every team I have seen hit this bug has the same org chart. Privacy and security own the data warehouse, the retention worker, the deletion audit. AI platform owns the prompt assembler, the inference path, the eval suite. They sit in different sub-orgs and ship on different cadences. They have separate incident channels. They have separate roadmaps.

In any reasonable reading of responsibility, retention belongs to privacy. They are the ones who carry the regulatory exposure if data sits around too long. They are also the ones who know which fields are sensitive, which jurisdictions require shorter horizons, which contracts require explicit per-tenant deletion windows. Moving retention to the AI platform team would be a worse outcome.

The actual problem is that retention was treated as a property of the store, not as a property of the read path. The retention worker is correct that the row is past its horizon. The retention worker is also wrong that deleting it is safe, because "safe" here means "no consumer still depends on the row," and the prompt-assembler is a consumer that nobody told the retention worker about. The assembler joined the table years after the retention policy was written. The deletion path was never updated to model the new consumer.

The blast radius compounds because chat-history retention is exactly the kind of policy that gets set once and forgotten. It was likely written when "the messages table" meant "the thing the support UI displays to the agent who is reading the ticket." Display consumers tolerate truncation gracefully — the agent just sees a shorter thread. Prompt consumers do not, because they pass the truncation through a stochastic process that fills the gap with plausible fabrication.

What "Session-Aware Retention" Actually Has To Do

Loading…
References:Let's stay in touch and Follow me for more thoughts and updates