The Freshness-Relevance Tradeoff in RAG: Why You Can't Optimize Both at Query Time
A user asks your assistant what the company's parental leave policy is. The bot returns 12 weeks, with a citation. The cited document was the right answer in 2023; HR posted an update last quarter that took it to 16. Both versions are in your knowledge base. Cosine similarity scored the 2023 version 0.87 and the 2024 version 0.84, because the older page has the cleaner phrasing and fewer hedges. The fresher document loses by three percentage points and the user gets a wrong answer that looks audited.
This is the freshness-relevance tradeoff, and the uncomfortable part is that it has no clean solution at query time. If you weight recency, you bias retrieval toward whatever was edited yesterday — which in most knowledge bases is the noisy, high-churn surface area that should not be the source of truth. If you don't weight recency, you ship answers grounded in documents that were superseded months ago. There is no single global knob that gets both right, and most teams discover this only after a few embarrassing answers leak past their eval suite.
