Skip to main content

The Citation URL That Resolved But No Longer Said What the Model Quoted

· 10 min read
Tian Pan
Software Engineer

A RAG agent answers a customer's regulatory question with a tidy paragraph and a citation. The verification layer fetches the URL, sees a 200 OK, ticks the box, and ships. Six months later a compliance audit pulls the transcript, clicks the same link, and finds a page that now says the opposite of what the agent quoted. The URL is fine. The quote is fine in the transcript. The two no longer match. The customer's compliance officer asks whether the agent fabricated the quote, and the team cannot prove it didn't, because the only surviving evidence of what the URL used to say is the agent's own assertion of what it said.

This is not a hallucination in the usual sense. The model retrieved real content, faithfully extracted a real sentence, and emitted a real URL that still resolves. Every link-checker on earth would call this citation valid. The audit fails anyway, because the verification layer was measuring the wrong property. Reachability is not fidelity. A URL is a pointer to a mutable document under someone else's editorial control, and the moment the document changes, every transcript that quoted it becomes a hallucination report waiting to happen.

A 2016 study of scholarly references found that roughly three out of four URI references in academic publications lead to content that has materially changed since citation. That number predates LLMs by half a decade. Add an agent that quotes from those URLs to thousands of customers a day and the audit trail decays at the same rate as the open web, which is to say: faster than your retention policy is written for.

URL Durability Is Not Source Durability

The verification layer most teams ship treats a citation as a tuple of (claim, URL) and verifies it by fetching the URL and checking the response code. This is a category error. The claim is a statement about the document at a specific moment in time. The URL is a name that points to whatever currently sits at that location. The two are correlated at quote time and decorrelated forever after.

Three failure modes hide inside this conflation. The page is silently edited in place with no version indicator and the URL keeps resolving — this is the bread-and-butter editorial workflow of most news sites, most regulatory portals, most company-controlled documentation. The page is moved and a redirect returns a different document under the same logical name — common with CMS migrations and acquisitions. The page is removed and replaced with a soft 404 that returns 200 OK but says nothing about the original claim — common in compliance contexts where pulled content is the point.

In all three cases, the verification check passes. The cited claim no longer exists at the cited location. The transcript is the only artifact that remembers what the model thought it was quoting, and the transcript's authority is precisely the property in dispute.

Why the Audit Frame Makes This Worse

In a low-stakes consumer product, citation drift is a UX problem: users click through, find something different from what the agent said, and shrug. In a regulated context, the asymmetry is brutal. The team that built the agent has every incentive to surface a citation. The team that audits the agent — sometimes the same team's compliance counterpart, sometimes a regulator, sometimes a customer's legal department — has every incentive to verify against the live source, because that is the only source whose authenticity is independent of the agent's claim about it.

The auditor's reasonable position is that a citation that does not match the cited source is evidence of hallucination. The team's position — that the source used to say what we quoted — is a claim with no evidence other than the transcript, which is the artifact under investigation. The team loses this argument every time, not because the agent fabricated the quote, but because the team's verification layer never preserved the only evidence that would have closed the loop.

The deeper problem is that the team's compliance posture was built on an assumption that does not survive contact with the open web: that a citation's referent is stable for the lifetime of the transcript. Nobody wrote this assumption down. Nobody priced what it would cost if it broke. The contract with the customer says the agent's answers are auditable. The architecture quietly inherits the open web's mutability without telling anyone.

Snapshot at Quote Time, Not at Audit Time

The fix that does the most work for the least architectural cost is to capture the cited content at the moment of quotation and store it alongside the response. When the agent emits a citation, the system fetches the specific paragraph (or larger span) the quote came from, hashes it, and persists both the content and the hash with the transcript. The audit comparison is then between the transcript's quote and the snapshot, not between the transcript's quote and whatever the URL serves today.

This pattern has prior art. The Memento protocol, an HTTP extension that has been around since 2009 and underpins the Wayback Machine, formalizes the distinction between a URI-R (the original resource, mutable) and a URI-M (a memento, a fixed version of the resource at a specific datetime). A citation that points to a URI-M is durable in a way a citation that points to a URI-R is not. Most RAG systems emit URI-Rs because that is what the source page hands them. The system that pairs the URI-R with a private snapshot at quote time is doing in-house what the Memento protocol does institutionally: anchoring the citation to a moment.

Three implementation notes matter. First, snapshot the smallest span that still contains the claim, not the whole page — page-level snapshots inflate storage and complicate audit by including content the model never referenced. Second, hash the snapshot with a content-addressable scheme so the audit can prove the snapshot itself has not been tampered with by the team holding it. Third, the snapshot is now sensitive data — if the source page is paywalled, copyrighted, or contains PII, the team has just absorbed a content-licensing or privacy problem they did not have before. Plan for it.

Drift Detection as a First-Class Job

Snapshotting at quote time solves the audit problem, but it does not solve the user-facing problem: the customer who clicks the citation today and finds something different from what the agent said. For that, a re-verification job that periodically re-fetches citations and compares them against the stored snapshot is the right primitive.

The output of this job is not a binary pass/fail. It is a drift signal with a few useful gradations:

  • Identical: source content matches the snapshot byte-for-byte. The citation is as durable as the underlying page.
  • Cosmetic drift: whitespace, formatting, or surrounding boilerplate changed; the cited span is intact. The citation is still trustworthy.
  • Material drift: the cited span has changed, been moved, or been removed. The transcript's quote is no longer findable at the URL. The user-facing UI should surface this as a "source has changed since this answer was generated" warning, ideally with a link to the snapshot the agent actually quoted from.
  • Reversal: the cited span has been edited to say something different in a way that changes the claim's meaning. This is the audit-incident case. The transcript should be flagged for review, and any downstream artifact that derived from this response (a summary, a recommendation, a contract clause) should be re-evaluated.

A team that runs this job nightly across the citation index gets a steady signal of how much of the corpus they are quoting is actively eroding under them. That signal is also a procurement input — sources with high drift rates are sources the agent should rely on less, or cite with explicit caveats.

Retention and the Lifetime of a Claim

The architectural question that the snapshot pattern forces is also the one teams least like to answer: how long should the snapshot live? Citations have a natural retention period — the lifetime of any response that references them. If the team retains transcripts for seven years to satisfy a regulator, the cited snapshots have to be retained for seven years too. The team that snapshots at quote time but cleans up snapshots on a thirty-day rolling window has built an audit trail that decays at the rate of the storage budget rather than at the rate of the obligation.

The procurement contract that names the agent's outputs as auditable is also, by implication, a retention contract on every piece of evidence required to audit them. The team's data engineers and the team's compliance officers usually have not had this conversation. The first time it comes up is when the storage bill exceeds the model bill and the finance team asks why a chatbot is paying for a document warehouse. The honest answer is that the chatbot inherited a regulator's evidentiary requirements the moment it shipped a citation. The storage cost is the cost of being able to defend the answer.

Citation as a Claim About a Moment

The architectural realization is that a citation in an LLM response is not a pointer to a source. It is a claim about what the source said at a specific moment, with a URL attached as a convenience. The URL is the easiest part of the claim to verify and the least informative thing about it. The hard part — and the part the verification layer has to actually do — is preserving the content that gave the claim its meaning, in a form that survives the source's editor.

Teams that treat URLs as the unit of citation have shipped an audit trail whose half-life is set by the editorial cadence of every site they quote. Teams that treat the cited span as the unit of citation, snapshot it at quote time, hash it, and retain it alongside the transcript have shipped an audit trail that survives the source. The difference is invisible on the day the citation is emitted. It is the only thing that matters on the day the audit lands.

The fix is not expensive. The expensive thing is the conversation that has to happen first: that the architectural intent of "the agent cites its sources" is a claim the team has been making on the open web's behalf, and the open web has not signed up to be bound by it. The team that closes that gap before the audit lands has built a system that means what its citations say. The team that does not has built a hallucination report that hasn't fired yet.

References:Let's stay in touch and Follow me for more thoughts and updates