Skip to main content

3 posts tagged with "information-retrieval"

View all tags

The Multilingual RAG Retrieval Gap: Why Cross-Lingual Queries Silently Fail Your Vector Search

· 11 min read
Tian Pan
Software Engineer

A team builds a RAG system. English retrieval hits 94% recall. They ship. Three months later, support tickets from French and German users pile up — the chatbot keeps returning irrelevant results or nothing at all. The engineers look at their monitoring dashboard. Overall recall: 91%. Nothing looks broken.

The corpus is English. The embedding model is English-only. The users are not. Every French query gets embedded into a vector space that was never designed to share coordinates with the English documents it's searching against. The cosine similarities aren't bad — they're geometrically meaningless. And because aggregate metrics aggregate, the problem is invisible until users complain loudly enough.

This is the multilingual RAG retrieval gap, and it's one of the most common silent failure modes in production AI systems serving non-English audiences.

Reranking Is the Real Work: Why Your Retrieval System's Bottleneck Is Never the Index

· 10 min read
Tian Pan
Software Engineer

Teams building RAG systems almost universally hit the same wall: they spend a week tuning their HNSW index parameters, add product quantization, push recall@100 from 0.81 to 0.87 — and then watch LLM output quality barely budge. The assumption baked into months of effort is that a better index equals better answers. It doesn't. The bottleneck was never the index.

The actual chokepoint is the ranking step between your candidate set and your context window. What you put into the LLM determines what comes out, and the job of ranking is to ensure that the most genuinely relevant documents, not just the most semantically similar ones, make it through. That distinction matters more than any HNSW configuration you'll ever tune.

Corpus Curation at Scale: Why Your RAG Quality Ceiling Is Your Document Quality Floor

· 10 min read
Tian Pan
Software Engineer

There's a belief embedded in most RAG architectures that goes something like this: if retrieval returns the right chunks, the LLM will produce correct answers. Teams invest heavily in embedding model selection, hybrid retrieval strategies, and reranking pipelines. Then, three months after deploying to production, answer quality quietly degrades — not because the model changed, not because query patterns shifted dramatically, but because the underlying corpus rotted.

Enterprise RAG implementations fail at a roughly 40% rate, and the failure mode that practitioners underestimate most isn't hallucination or poor retrieval recall. It's document quality. One analysis found that a single implementation improved search accuracy from 62% to 89% by introducing document quality scoring — with no changes to the embedding model or retrieval algorithm. The corpus was the variable. The corpus was always the variable.