Provenance Debt in AI Knowledge Bases: When Your RAG System Learns From Itself
Your RAG system is probably indexing its own outputs. You just don't know it yet.
It starts innocuously: someone adds a quarterly summary document to the knowledge base. That summary was written by the same LLM that queries the knowledge base. Six months later, a developer adds AI-generated release notes. Then auto-generated support FAQs. Then a synthesized onboarding guide. None of these documents are labeled as AI-generated. To the retrieval system, they look identical to human-written primary sources. Now when your model retrieves context to answer a question, a significant portion of that context is the compressed, possibly-distorted output of a prior model run — and your accuracy metrics are still green.
This is provenance debt: the accumulation of AI-generated content in retrieval corpora without source markers, creating a feedback loop where each generation of model outputs becomes raw material for the next.
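The feedback loop is easy to simulate. The sketch below is a toy model, not a real RAG stack: `retrieve` stands in for vector search and `model_answer` for LLM generation, and the `provenance` field and "quarterly summary" cadence are illustrative assumptions. The point it demonstrates is structural: if each generation's output is ingested without a label, the AI-derived share of the corpus only grows.

```python
import random

random.seed(0)

# Start from a purely human-written corpus.
corpus = [{"text": f"human doc {i}", "provenance": "human"} for i in range(20)]

def retrieve(corpus, k=5):
    # Stand-in for vector search: sample k documents as "context".
    return random.sample(corpus, k)

def model_answer(context):
    # Stand-in for LLM generation: a lossy compression of the retrieved context.
    return "summary of: " + ", ".join(d["text"][:12] for d in context)

# Each quarter, an AI-generated summary is added back with no source marker,
# so future retrievals treat it exactly like a human-written primary source.
for quarter in range(12):
    ctx = retrieve(corpus)
    corpus.append({"text": model_answer(ctx), "provenance": "unlabeled-ai"})

ai_docs = sum(1 for d in corpus if d["provenance"] != "human")
print(f"{ai_docs}/{len(corpus)} corpus docs are AI-derived")
# → 12/32 corpus docs are AI-derived
```

In the toy model we can count AI-derived documents only because the simulation tracks `provenance` for us; the debt in a real system is precisely that this field was never recorded, so the equivalent audit is impossible after the fact.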
