

The Citation Index Your Chunker Shifted by One When It Started Prefixing Line Numbers

11 min read
Tian Pan
Software Engineer

The chunker started prepending [line N] to every chunk. The eval went green. Every citation the model produced after that day pointed to the paragraph one position before the actual evidence, on every document, in a product that serves a regulated industry. The team did not find out from the eval. The team found out from an auditor who looked at the cited sentence, read it, and pointed out that it contradicted the claim it was supposed to support.

This is the kind of regression that survives a code review, a manual QA pass on three sample documents, and a feature-flag rollout. None of those checks was wrong in isolation. They were all asking the same question (does a citation appear where one is expected?), and none of them was asking the question the auditor asked: does the citation point at the sentence the claim came from? The gap between those two questions is where the off-by-one lived for as long as it lived.
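The two questions can be written as two different checks. The sketch below is illustrative only: the `[n]` citation format, the function names `has_citation` and `citation_supports_claim`, the 0-based indexing, and the substring match standing in for a real support check are all assumptions, not the team's actual eval.

```python
import re

# Hypothetical citation format: the answer cites a paragraph as "[n]".
CITATION = re.compile(r"\[(\d+)\]")


def has_citation(answer: str) -> bool:
    """Presence check: does a citation appear where one is expected?"""
    return CITATION.search(answer) is not None


def citation_supports_claim(answer: str, paragraphs: list[str], evidence: str) -> bool:
    """Grounding check: does the cited paragraph contain the expected evidence?

    Substring matching is a crude stand-in for an entailment model or a
    human label; it only shows the shape of the question being asked.
    """
    match = CITATION.search(answer)
    if match is None:
        return False
    index = int(match.group(1))  # assumed 0-based for this sketch
    if not 0 <= index < len(paragraphs):
        return False
    return evidence in paragraphs[index]


paragraphs = [
    "Refunds are not available after 30 days.",
    "Refunds are available within 30 days of purchase.",
]
evidence = "available within 30 days"

correct_answer = "Refunds are possible within 30 days [1]."
shifted_answer = "Refunds are possible within 30 days [0]."  # off by one

# Both answers pass the presence check; only one passes the grounding check.
print(has_citation(correct_answer), has_citation(shifted_answer))
print(
    citation_supports_claim(correct_answer, paragraphs, evidence),
    citation_supports_claim(shifted_answer, paragraphs, evidence),
)
```

In this toy case the shifted citation lands on a paragraph that says the opposite of the claim, which is the situation the auditor ran into. A presence-only eval scores both answers identically.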

What makes this failure mode worth a separate write-up is not the bug itself. Off-by-one errors are old news. The interesting part is that the failure was produced by two systems that continued to agree on the structure of an integer while silently disagreeing about what the integer meant.
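One way such a disagreement can arise is sketched below. This is a guess at the shape of the problem, not the actual code: `chunk_with_prefix`, `resolve_citation`, and the choice of a 0-based label read by a 1-based resolver are hypothetical, picked because they reproduce a citation that lands one paragraph before the evidence.

```python
import re


def chunk_with_prefix(paragraphs: list[str]) -> list[str]:
    """Hypothetical chunker: labels each paragraph with its 0-based position."""
    return [f"[line {i}] {text}" for i, text in enumerate(paragraphs)]


def resolve_citation(cited: int, paragraphs: list[str]) -> str:
    """Hypothetical resolver: assumes cited numbers are 1-based."""
    return paragraphs[cited - 1]


paragraphs = [
    "Intro.",
    "The limit is 10 mg.",
    "The limit is 50 mg for adults.",
]
chunks = chunk_with_prefix(paragraphs)

# The model copies the label from the chunk that holds the evidence.
evidence_position = 2
label = int(re.match(r"\[line (\d+)\]", chunks[evidence_position]).group(1))

# The resolver accepts the integer without complaint and returns the
# paragraph one position earlier.
resolved = resolve_citation(label, paragraphs)
print(label, "->", resolved)
```

Both sides handle a well-formed integer and nothing raises, so a check on the type or range of the citation passes. Only a comparison between the resolved text and the claim exposes the shift.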