Skip to main content

50 posts tagged with "retrieval"

View all tags

Cross-Encoder Reranking in Practice: What Cosine Similarity Misses

· 10 min read
Tian Pan
Software Engineer

Your RAG pipeline retrieves the top 10 documents and your LLM still gives a wrong answer. You increase the retrieval count to 50. Still wrong. The frustrating part: the correct document was in your vector store the whole time—it was just ranked 23rd. This is not a recall problem. It's a ranking problem, and cosine similarity is the culprit.

Vector search does a decent job of finding semantically adjacent content. But "semantically adjacent" and "most useful for this specific query" are not the same thing. Cosine similarity measures the angle between two vectors in embedding space, and that angle only captures a coarse notion of topical proximity. What it cannot capture is the fine-grained interaction between the specific words in your query and the specific words in a document—the difference between "how to prevent buffer overflows" and "buffer overflow exploit techniques" is subtle at the vector level but critical for your retrieval system.

GraphRAG vs. Vector RAG: The Architecture Decision Teams Make Too Late

· 12 min read
Tian Pan
Software Engineer

Most teams discover they need GraphRAG six months too late — after they've already explained to users why the AI got the relationship wrong, why it confused two entities that share similar embeddings, or why it confidently cited a document that contradicts the actual answer. Vector RAG is genuinely good at what it does. The problem is that teams treat it as good at everything, and keep piling on retrieval hacks when the underlying architecture has hit a mathematical ceiling.

Fewer than 15% of enterprises have deployed graph-based retrieval in production as of 2025. This is not because the technology is immature. It's because the failure signals for vector-only RAG are subtle: the system runs, the LLM responds, and only careful inspection reveals that the retrieved context was plausible but wrong.

Retrieval Monoculture: Why Your RAG System Has Systematic Blind Spots

· 10 min read
Tian Pan
Software Engineer

Your RAG system's evals look fine. NDCG is acceptable. The demo works. But there's a category of failure no single-metric eval catches: the queries your retriever never even gets close on, consistently, because your entire embedding space was never equipped to handle them in the first place.

That's retrieval monoculture. One embedding model. One similarity metric. One retrieval path — and therefore one set of systematic blind spots that look like model errors, hallucination, or user confusion until you actually examine the retrieval layer.

The fix is not a bigger model or more data. It's understanding that different query structures need different retrieval mechanisms, and building a system that stops routing everything through the same funnel.

Knowledge Graph vs. Vector Store: Choosing Your Retrieval Primitive

· 9 min read
Tian Pan
Software Engineer

Most teams stumble into vector stores because they're easy to start with, then discover a category of queries that simply won't work no matter how well they tune chunk size or embedding model. That's not a tuning problem — it's an architectural mismatch. Vector similarity and graph traversal are fundamentally different retrieval mechanisms, and the gap matters more as your queries get harder.

This is not a "use both" post. There are real trade-offs, and getting the choice wrong costs months of engineering time. Here's what the decision actually looks like in practice.

Document Extraction Is Your RAG System's Hidden Ceiling

· 10 min read
Tian Pan
Software Engineer

A compliance contractor builds a RAG system to answer questions against a 400-page policy document. The system passes internal QA. It retrieves correctly against single-topic queries. Then it goes live and starts returning confident, well-structured, wrong answers on anything involving exception clauses.

The debugging loop looks familiar: swap the embedding model, tune similarity thresholds, experiment with chunk sizes, add a reranker. Weeks pass. The improvement is marginal. The real problem is that a key exception clause was split across two chunks at a paragraph boundary — not because of chunking strategy, but because the PDF extractor silently broke the paragraph in two when it misread the layout. Neither chunk, in isolation, is retrievable or interpretable. The system cannot hallucinate its way to a correct answer because the correct information never entered the index cleanly.

This is the extraction ceiling: the point beyond which no downstream optimization can compensate for corrupted or missing input data.

GraphRAG vs. Vector RAG: When Knowledge Graphs Beat Embeddings

· 9 min read
Tian Pan
Software Engineer

Most teams reach for vector embeddings when building RAG pipelines. It's the obvious default: embed documents, embed queries, find the nearest neighbors, feed results to the LLM. It works well enough on the demos. Then they deploy to a compliance team or a scientific literature corpus, and accuracy falls off a cliff. Not gradually — abruptly. On queries involving five or more entities, vector RAG accuracy in enterprise analytics benchmarks drops to zero. Not 50%. Not 20%. Zero.

This isn't a configuration problem. It's an architectural mismatch. Vector retrieval treats documents as points in semantic space. Knowledge graphs treat them as nodes in a relational structure. When your queries require traversing relationships — not just finding similar content — the topology of your retrieval architecture is what determines whether you get the right answer.

When Embeddings Aren't Enough: A Decision Framework for Hybrid Retrieval Architecture

· 11 min read
Tian Pan
Software Engineer

Most RAG implementations start the same way: spin up a vector database, embed documents with a decent model, run cosine similarity at query time, and ship it. The demo looks great. Relevance feels surprisingly good. Then you deploy it to production and discover that "Error 221" retrieves documents about "Error 222," that searching for a specific product SKU surfaces semantically similar but wrong items, and that adding a date filter causes retrieval quality to crater.

Vector search is a genuinely powerful tool. It's also not sufficient on its own for most production retrieval workloads. The teams winning with RAG in 2025 aren't choosing between dense embeddings and keyword search — they're using both, deliberately.

This is a decision framework for when hybrid retrieval is worth the added complexity, and how to build each layer without destroying your latency budget.

The Knowledge Contamination Problem: When Your RAG System Ignores Its Own Retrieval

· 8 min read
Tian Pan
Software Engineer

A team ships a RAG pipeline for internal documentation. Retrieval looks solid — the right passages come back. But in production, users keep getting stale answers. They dig into the logs and find the model is returning facts from its training data, not from the documents it was handed. The retrieval worked. The model just didn't use it.

This is the knowledge contamination problem: the model's parametric memory — the knowledge baked into its weights during training — overrides the retrieved context. It's quiet, it's confident, and it's one of the most common failure modes in production RAG systems.

Poisoned at the Source: RAG Corpus Decay and Data Governance for Vector Stores

· 11 min read
Tian Pan
Software Engineer

Your RAG system was working fine at launch. Three months later it's confidently wrong about a third of user queries — and your traces show nothing broken. The retriever is fetching documents. The model is generating responses. The pipeline looks healthy. The problem is invisible: every vector in your store still has a similarity score, but half of them are pointing to facts that no longer exist.

This is corpus decay. It doesn't throw errors. It doesn't trigger alerts. It accumulates quietly in the background, and by the time you notice it through user complaints or quality degradation, your vector store has become a liability.

The RAG Eval Antipattern That Hides Retriever Bugs

· 10 min read
Tian Pan
Software Engineer

There's a failure mode common in RAG systems that goes undetected for months: your retriever is returning the wrong documents, but your generator is good enough at improvising that end-to-end quality scores stay green. You keep tuning the prompt. You upgrade the model. Nothing helps. The bug is three layers upstream and your metrics are invisible to it.

This is the retriever eval antipattern — evaluating your entire RAG pipeline as a single unit, which lets the generator absorb and hide retrieval failures. The result is a system where you cannot distinguish between "the generator failed" and "the retriever failed," making systematic improvement nearly impossible.

The Discovery Problem: Why Semantic Search Fails Browsing Users

· 9 min read
Tian Pan
Software Engineer

Vector search is eating the world. Embedding-based retrieval now powers product search at every major e-commerce platform, drives the retrieval layer of RAG systems, and sits at the core of most AI-powered search rewrites. But there is a category of user that these systems fail silently and consistently: the browsing user. Not because the embeddings are bad. Because they were built to solve a different problem.

The fundamental assumption behind semantic search is that users arrive with a query that approximates what they want. Optimize for proximity in embedding space to that query, and you win. But a significant fraction of real users arrive with something closer to curiosity than a query — and for them, the nearest neighbors in vector space are exactly the wrong answer.

Semantic Search as a Product: What Changes When Retrieval Understands Intent

· 11 min read
Tian Pan
Software Engineer

Most teams building semantic search start from a RAG proof-of-concept: chunk documents, embed them, store vectors, query with cosine similarity. It works well enough in demos. Then they ship it to users, and half the queries fail in ways that have nothing to do with retrieval quality.

The reason is that RAG and user-facing semantic search are solving different problems. RAG asks "given a question, retrieve context for an LLM to answer it." Semantic search asks "given a user's query, surface results that match what they actually want." The second problem has a layer of complexity that RAG benchmarks systematically ignore — and that complexity lives almost entirely before retrieval begins.