Why Your RAG Citations Are Lying: Post-Hoc Rationalization in Source Attribution
Show a user an AI answer with a link at the end of each sentence, and the needle on their trust meter swings halfway across the dial before they have read a single cited passage. That is the whole marketing pitch of enterprise RAG: "grounded," "sourced," "verifiable." It is also the most-shipped, least-tested claim in AI engineering. Recent benchmarks find that between 50% and 90% of LLM responses are not fully supported — and sometimes contradicted — by the sources they cite. On adversarial evaluation sets, up to 57% of citations from state-of-the-art models are unfaithful: the model never actually used the document it is pointing at. The citation was attached after the fact, to rationalize an answer the model had already decided to give.
This is not a retrieval bug. You can have perfect retrieval and still get lying citations, because the failure is architectural. The generator writes prose first and stitches links on second. The links look like evidence. They are decoration.
The industry has been so focused on whether cited documents are relevant that it has skipped past a more uncomfortable question: does the cited span actually entail the claim it is attached to? The answer, at production scale, is frequently no. And the more polished your UI makes the citations look — footnote superscripts, hoverable previews, colored highlights — the more decisively users stop checking.
Correctness Is Not Faithfulness
The research community is finally drawing a line between two things that enterprise RAG products treat as one: citation correctness and citation faithfulness.
- Correctness asks: does the cited document support the statement? You can measure this with a natural language inference (NLI) model asking "does passage P entail claim C?" (a minimal check is sketched just after this list).
- Faithfulness asks: did the model actually derive the claim from the cited document, or did it generate the claim from parametric memory and then hunt for a passage that looks compatible?
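Correctness, at least, is mechanically checkable. A minimal sketch of that NLI check, assuming an off-the-shelf MNLI-finetuned checkpoint from the Hugging Face hub (the model name is an illustrative choice, not a recommendation):

```python
# Minimal citation-correctness check: does passage P entail claim C?
# Assumes an MNLI-style sequence-classification checkpoint; the exact
# model name is illustrative.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-large-mnli"  # any entail/neutral/contradict NLI model works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_label(passage: str, claim: str) -> str:
    """Return the NLI label for (premise=passage, hypothesis=claim)."""
    inputs = tokenizer(passage, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Read label names from the model config instead of hard-coding an order.
    return model.config.id2label[int(logits.argmax(dim=-1))]

print(entailment_label(
    "The 2019 audit found no material weaknesses in internal controls.",
    "The audit identified several material weaknesses.",
))  # a well-calibrated NLI model should label this as contradiction
```

Faithfulness has no equally tidy check; as the next paragraph argues, a post-rationalized citation can pass this test and still be a lie.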
A post-rationalized citation is indistinguishable from a faithful one at the output level. It may even be technically correct — the passage really does support the claim — but the model ignored the passage when generating the answer. That makes the whole chain of trust a lie of omission. The user (or the downstream agent) assumes the retrieved evidence drove the answer. It did not. The model's pre-training drove the answer, and retrieval was theater.
This matters because the failure mode is silent. If your generator confidently asserts something plausible, attaches a real-looking citation, and the cited span does loosely relate to the topic, no amount of "check the sources" UI will catch it. Humans skim. Agents treat a passing link as confirmation. The hallucination is laundered through the citation step.
How Architecture Bakes In the Lie
Look at how most RAG pipelines are wired and the post-rationalization becomes predictable, almost inevitable.
The dominant pattern is generate-then-retrieve-then-cite or retrieve-then-generate-then-cite. In both, retrieval runs, generation runs, and a third step — often a separate prompt, sometimes a separate model — assigns citations to the already-written text. By the time the citation step runs, the generator has no mechanical connection to any specific passage. It chose tokens based on the blended distribution of (prompt instructions) × (parametric memory) × (loosely attended retrieved context). The citer then does the only thing it can: similarity-match each sentence of the output to the nearest chunk of retrieved context. "Nearest" is not "causal."
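Stripped to essentials, that post-hoc citer is a similarity match and nothing more. A sketch of the common P-Cite assignment step, assuming sentence-transformers for the embeddings (the model name and function names are illustrative):

```python
# Sketch of a typical post-hoc (P-Cite) citation assigner: for each sentence of
# the already-generated answer, attach the retrieved chunk with the highest
# embedding similarity. Nothing here asks whether the generator actually used
# that chunk -- "nearest" is not "causal".
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def assign_citations(answer_sentences: list[str], chunks: list[str]) -> list[int]:
    """For each already-written sentence, return the index of the most similar chunk."""
    sent_emb = embedder.encode(answer_sentences, convert_to_tensor=True)
    chunk_emb = embedder.encode(chunks, convert_to_tensor=True)
    sims = util.cos_sim(sent_emb, chunk_emb)  # shape: [num_sentences, num_chunks]
    return sims.argmax(dim=1).tolist()        # "nearest" chunk per sentence
```

Every sentence gets an index back, whether or not the generator ever attended to that chunk.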
That architectural seam is where faithfulness dies. Recent work comparing generation-time citation (G-Cite) against post-hoc citation (P-Cite) finds the tradeoff baked into the design: P-Cite achieves higher citation coverage (it can find some passage that matches almost any claim) but lower semantic precision, while G-Cite commits to evidence during decoding and is stingier about what it will cite at all. On the FEVER fact-verification task, G-Cite hit 94% correctness with 27% coverage; P-Cite balanced at 75%/75%. Coverage is what marketing wants — every sentence gets a footnote. Precision is what users need.
The other architectural culprit is chunk-level retrieval paired with sentence-level citation. Your retriever returns a 512-token chunk. Your generator writes a sentence. The citer pins the chunk ID to the sentence. The chunk contains twelve claims; only one of them (maybe) supports the written sentence. The user sees "[3]" and clicks; they land on a paragraph containing the keyword; their brain files the claim as "grounded." Nobody verified that the specific sentence they read is entailed by any specific span of the cited chunk. This is why sub-sentence and span-level citation research has exploded recently — coarse-grained pointers are, functionally, misinformation.
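A hedged sketch of the span-level alternative: instead of pinning the whole chunk, score each of its sentences against the claim and cite only the narrowest span that entails it. The MNLI checkpoint and the naive regex splitter are illustrative stand-ins:

```python
# Sketch: narrow a chunk-level citation to a sentence-level span by NLI-scoring
# each sentence of the cited chunk against the claim.
import re
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "microsoft/deberta-large-mnli"  # illustrative NLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def best_entailing_span(chunk: str, claim: str) -> str | None:
    """Return the sentence of the cited chunk that best entails the claim, if any."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    best_sentence, best_prob = None, 0.0
    for sentence in sentences:
        inputs = tokenizer(sentence, claim, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1)[0]
        label_id = int(probs.argmax())
        label = model.config.id2label[label_id].upper()
        if label.startswith("ENTAIL") and float(probs[label_id]) > best_prob:
            best_sentence, best_prob = sentence, float(probs[label_id])
    return best_sentence  # None means no sentence in the chunk entails the claim
```

If the function returns None, the honest move is to drop the citation, not to keep the chunk ID.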
The Citation-Faithfulness Eval
The fix on the measurement side is to stop treating "is there a citation?" as a pass/fail check and start treating citation-level entailment as a first-class eval.
A minimal pipeline:
- Decompose the answer into atomic claims (one verifiable assertion per claim).
- For each claim, extract the cited span (not just the cited document).
- Run an NLI model: does the cited span entail the claim? Label it SUPPORTS, CONTRADICTS, or NEUTRAL.
- Roll up to citation precision (the fraction of citations that actually entail the claim they back) and citation recall (the fraction of claims with at least one entailing citation); a sketch of this rollup follows the list.
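A minimal sketch of the roll-up, assuming claim decomposition has already been done (typically with an LLM prompt or an off-the-shelf claim splitter, not shown) and each cited span has been labeled by an NLI check like the one sketched earlier; the data shape is illustrative:

```python
# Roll-up of per-citation NLI labels into citation precision and recall.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    citation_labels: list[str]  # one label per cited span: SUPPORTS / CONTRADICTS / NEUTRAL

def citation_precision(claims: list[Claim]) -> float:
    """Fraction of (claim, cited span) pairs where the span entails the claim."""
    labels = [label for claim in claims for label in claim.citation_labels]
    return sum(label == "SUPPORTS" for label in labels) / max(len(labels), 1)

def citation_recall(claims: list[Claim]) -> float:
    """Fraction of claims backed by at least one entailing citation."""
    supported = sum(any(l == "SUPPORTS" for l in c.citation_labels) for c in claims)
    return supported / max(len(claims), 1)
```

Report both numbers. Precision is the one that catches post-rationalized citations; recall alone rewards a citer that decorates every sentence.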
