34 posts tagged with "retrieval"

No Results Is Not Absence: Why Agents Treat Retrieval Failure as Proof

· 10 min read
Tian Pan
Software Engineer

The most dangerous sentence in an agent transcript is not a hallucination. It is four calm words: "I could not find it." The agent sounds epistemically humble. It sounds like due diligence. It sounds, to any downstream reader or caller, exactly like a fact. And yet the statement carries no information about whether the thing exists. It only carries information about what happened when a specific tool, invoked with a specific query, consulted a specific index that the agent happened to have access to at that moment.

Between those two readings lies a production incident waiting to happen. A support agent tells a customer "we have no record of your order" because a replication lag delayed the write to the read replica by ninety seconds. A coding agent declares "there are no tests for this module" because it searched a directory that did not contain the test folder. A compliance agent replies "no prior violations on file" because the audit index had not ingested last week's report. In each case the agent's output is grammatically a negation, but epistemically it is a shrug that has been re-typed as a claim.
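One way to make that distinction mechanical is to have the retrieval tool return a structured outcome instead of a bare empty list, so downstream code can tell "no record exists" apart from "this index, at this moment, returned nothing." A minimal sketch; the names `RetrievalOutcome` and `index_fresh` are illustrative, not from any particular framework:

```python
from dataclasses import dataclass

@dataclass
class RetrievalOutcome:
    """Distinguishes 'found nothing' from 'nothing exists'."""
    hits: list
    index_name: str   # which index was actually consulted
    query: str        # the literal query that was run
    index_fresh: bool # e.g. replication lag / ingestion status known-good

    @property
    def proves_absence(self) -> bool:
        # An empty result supports a negative claim only when the
        # index is known to be complete and fresh for this query.
        return not self.hits and self.index_fresh

def describe(outcome: RetrievalOutcome) -> str:
    if outcome.hits:
        return f"found {len(outcome.hits)} result(s)"
    if outcome.proves_absence:
        return "no record exists"
    return (f"no results from index '{outcome.index_name}' for query "
            f"'{outcome.query}'; index may be stale or incomplete")
```

The point of the wrapper is that the calm sentence "I could not find it" can only be emitted when `proves_absence` holds; otherwise the agent must report the shrug as a shrug.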

Popularity Bias in Vector Retrieval: Why the Same Five Chunks Dominate Every Query

· 10 min read
Tian Pan
Software Engineer

Pull a week of retrieval logs from any mature RAG system and sort chunks by how often they were returned. The shape is almost always the same: a small cluster of chunks appears in thousands of queries while the vast majority of your corpus shows up a handful of times or never at all. The system isn't broken. It's doing exactly what its index was built to do — and that is the problem.

This is popularity bias in vector retrieval, and it gets worse as your corpus grows. A few chunks become gravity wells that win retrieval across queries that have little to do with each other, while your long tail quietly disappears below the top-k cutoff. Your RAG system starts feeling "generic" — users ask specific questions and get answers that sound like they were written for someone else. By the time product complains, the distribution has already been lopsided for weeks.
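Measuring the skew is cheap once you have the logs. A minimal sketch, assuming your retrieval logs can be reduced to per-query lists of returned chunk ids:

```python
from collections import Counter

def top_chunk_share(logged_results, top_n=5):
    """Fraction of all returned result slots occupied by the
    top_n most-frequently-returned chunks.

    logged_results: list of per-query lists of chunk ids,
    e.g. a week of retrieval logs.
    """
    counts = Counter(cid for result in logged_results for cid in result)
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total if total else 0.0
```

A healthy corpus keeps this number modest; a handful of gravity-well chunks pushes it toward 1.0 long before anyone files a "the answers feel generic" ticket.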

Your RAG Chunker Is a Database Schema Nobody Code-Reviewed

· 11 min read
Tian Pan
Software Engineer

The first time a retrieval quality regression lands in your on-call channel, the debugging path almost always leads somewhere surprising. Not the embedding model. Not the reranker. Not the prompt. The culprit is a one-line change to the chunker — a tokenizer swap, a boundary rule tweak, a stride adjustment — that someone merged into a preprocessing notebook three sprints ago. The change touched zero lines of production code. It rebuilt the index overnight. And now accuracy is down four points across every tenant.

The chunker is a database schema. Every field you extract, every boundary you draw, every stride you pick defines the shape of the rows that land in your vector index. Change any of them and you have altered the schema of an index that other parts of your system — retrieval logic, reranker features, evaluation harnesses, downstream prompts — depend on as if it were stable. But because the chunker usually lives in a notebook or a small Python module that nobody labels as "infrastructure," these changes ship with the rigor of a config tweak and the blast radius of an ALTER TABLE.
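If the chunker is a schema, it can be versioned like one. A minimal sketch (the function names and config keys are illustrative): hash the canonicalized chunker config, stamp the hash on every index build, and refuse to query across a mismatch, so a silent tweak surfaces as a version error instead of a mystery regression:

```python
import hashlib
import json

def chunker_schema_version(config: dict) -> str:
    """Hash the chunker config so any change to tokenizer, boundary
    rules, or stride produces a new schema version."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def check_compatible(index_version: str, config: dict) -> None:
    """Raise if the running code's chunker config no longer matches
    the config the index was built with."""
    current = chunker_schema_version(config)
    if current != index_version:
        raise RuntimeError(
            f"chunker config changed (index built with {index_version}, "
            f"code now produces {current}); rebuild the index or pin the config")
```

This is the ALTER TABLE discipline the notebook never had: the change can still ship, but it ships as an explicit migration.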

Chunking Strategy Is the Hidden Load-Bearing Decision in Your RAG Pipeline

· 10 min read
Tian Pan
Software Engineer

Most RAG quality conversations focus on the wrong things. Teams debate embedding model selection, tweak retrieval top-K, and experiment with prompt templates — while a single architectural decision made during ingestion quietly caps how good the system can ever be. That decision is chunking strategy: how you cut documents into pieces before indexing them.

A 2025 benchmark study found that chunking configuration has at least as much influence on retrieval quality as embedding model choice. And yet teams routinely pick a default — 512 tokens with RecursiveCharacterTextSplitter, usually — and then spend months wondering why their retrieval precision keeps disappointing them. The problem was baked in at index time. Swapping models cannot fix it.
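The whole decision often reduces to two numbers: chunk size and overlap (equivalently, stride). A minimal sliding-window sketch over pre-tokenized input; real pipelines would use the embedding model's own tokenizer rather than raw token lists:

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Fixed-size sliding-window chunking.

    size and overlap are the two index-time parameters that
    silently cap retrieval quality: too large and chunks blur
    topics, too small and they lose the context needed to answer.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

Even this toy version makes the trade-off visible: shrinking `stride` increases recall of boundary-straddling facts at the cost of index size and duplicated near-identical chunks.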

When Vector Search Fails: Why Knowledge Graphs Handle Queries Embeddings Can't

· 9 min read
Tian Pan
Software Engineer

Vector search has become the default retrieval primitive for RAG systems. Embed your documents, embed the query, find nearest neighbors — it's simple, fast, and works surprisingly well for a wide class of questions. But production deployments keep hitting the same wall: certain queries return garbage results despite high similarity scores, certain multi-document reasoning tasks fail silently, and certain entity-heavy queries degrade to random noise as complexity grows.

The issue isn't embedding quality or index size. It's that semantic similarity is the wrong abstraction for a significant class of retrieval problems. Knowledge graphs aren't a replacement for vector search — they solve a structurally different problem. Understanding which problems belong to which tool is what separates a brittle RAG pipeline from one that holds up in production.
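The structural difference is easy to see in miniature. A query like "who manages the owner of service X?" is a two-hop traversal over explicit relations, which nearest-neighbor search cannot express but a graph answers trivially. A toy sketch using a plain adjacency map (the entity and relation names are made up for illustration):

```python
def multi_hop(graph, start, relation_path):
    """Follow a chain of explicit relations from a start entity.

    graph: {entity: {relation: [entities]}} adjacency map.
    Returns the set of entities reachable via the relation path.
    """
    frontier = {start}
    for rel in relation_path:
        frontier = {dst
                    for src in frontier
                    for dst in graph.get(src, {}).get(rel, [])}
    return frontier
```

No similarity score is computed anywhere; either the edge exists or it does not, which is exactly the determinism that entity-heavy, multi-document questions need.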

RAG Position Bias: Why Chunk Order Changes Your Answers

· 8 min read
Tian Pan
Software Engineer

You've spent weeks tuning your embedding model. Your retrieval precision looks solid. Chunk size, overlap, metadata filters — all dialed in. And yet users keep reporting that the system "ignores" information it clearly has access to. The relevant passage is in the top-5 retrieved results every time. The model just doesn't seem to use it.

The culprit is often position bias: a systematic tendency for language models to over-rely on information at the beginning and end of their context window, while dramatically under-attending to content in the middle. In controlled experiments, moving a relevant passage from position 1 to position 10 in a 20-document context produces accuracy drops of 30–40 percentage points. Your retriever found the right content. The ordering killed it.
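A common mitigation is to stop passing chunks to the model in raw retriever order and instead place the strongest chunks at the edges of the context, where attention is highest. A minimal sketch of that reordering:

```python
def edge_order(chunks_ranked):
    """Reorder a best-first ranked list so the strongest chunks sit
    at the start and end of the context and the weakest in the
    middle, where models under-attend.

    chunks_ranked: list from the retriever, best first.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_ranked):
        (front if i % 2 == 0 else back).append(chunk)
    # front holds ranks 1,3,5,...; reversed back puts rank 2 last.
    return front + back[::-1]
```

With five chunks ranked 1..5, this yields the order 1, 3, 5, 4, 2: the top two results bracket the context and the weakest sits dead center.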

The Reranker Gap: Why Most RAG Pipelines Skip the Most Important Layer

· 8 min read
Tian Pan
Software Engineer

Most RAG pipelines have an invisible accuracy ceiling, and the engineers who built them don't know it's there. You tune your chunking strategy, upgrade your embedding model, swap vector databases — and the system still returns plausible but subtly wrong documents for a stubborn class of queries. The retrieval looks reasonable. The LLM sounds confident. But downstream accuracy has quietly plateaued at a level that no amount of prompt engineering will break through.

The gap almost always traces to the same missing piece: a reranker. Specifically, the absence of a cross-encoder in a second retrieval stage. It's the layer that's technically optional, practically expensive to skip, and systematically omitted from the canonical "embed, index, query" tutorials that most RAG pipelines are built from.

Corpus Architecture for RAG: The Indexing Decisions That Determine Quality Before Retrieval Starts

· 12 min read
Tian Pan
Software Engineer

When a RAG system returns the wrong answer, the post-mortem almost always focuses on the same suspects: the retrieval query, the similarity threshold, the reranker, the prompt. Teams spend days tuning these components while the actual cause sits untouched in the indexing pipeline. The failure happened weeks ago when someone decided on a chunk size.

Most RAG quality problems are architectural, not operational. They stem from decisions made at index time that silently shape what the LLM will ever be allowed to see. By the time a user complains, the retrieval system is doing exactly what it was designed to do — it's just that the design was wrong.

Cross-Encoder Reranking in Practice: What Cosine Similarity Misses

· 10 min read
Tian Pan
Software Engineer

Your RAG pipeline retrieves the top 10 documents and your LLM still gives a wrong answer. You increase the retrieval count to 50. Still wrong. The frustrating part: the correct document was in your vector store the whole time—it was just ranked 23rd. This is not a recall problem. It's a ranking problem, and cosine similarity is the culprit.

Vector search does a decent job of finding semantically adjacent content. But "semantically adjacent" and "most useful for this specific query" are not the same thing. Cosine similarity measures the angle between two vectors in embedding space, and that angle only captures a coarse notion of topical proximity. What it cannot capture is the fine-grained interaction between the specific words in your query and the specific words in a document—the difference between "how to prevent buffer overflows" and "buffer overflow exploit techniques" is subtle at the vector level but critical for your retrieval system.
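The two-stage shape is simple to sketch. Cosine similarity scores two independently produced vectors; a reranker re-scores the candidate pool with a function that sees query and document together. Here `score_fn` stands in for a real cross-encoder, which is an assumption of this sketch rather than a working model:

```python
import math

def cosine(u, v):
    """Angle between two vectors: a coarse, query-independent
    notion of topical proximity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rerank(query, candidates, score_fn, top_k=10):
    """Second retrieval stage: re-score the first stage's candidate
    pool with a joint query-document scorer (e.g. a cross-encoder)
    and keep the best top_k."""
    return sorted(candidates,
                  key=lambda doc: score_fn(query, doc),
                  reverse=True)[:top_k]
```

The division of labor is the point: the cheap cosine stage only has to get the right document *somewhere* into the pool (even at rank 23), and the expensive joint scorer fixes the order.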

GraphRAG vs. Vector RAG: The Architecture Decision Teams Make Too Late

· 12 min read
Tian Pan
Software Engineer

Most teams discover they need GraphRAG six months too late — after they've already explained to users why the AI got the relationship wrong, why it confused two entities that share similar embeddings, or why it confidently cited a document that contradicts the actual answer. Vector RAG is genuinely good at what it does. The problem is that teams treat it as good at everything, and keep piling on retrieval hacks when the underlying architecture has hit a mathematical ceiling.

Fewer than 15% of enterprises have deployed graph-based retrieval in production as of 2025. This is not because the technology is immature. It's because the failure signals for vector-only RAG are subtle: the system runs, the LLM responds, and only careful inspection reveals that the retrieved context was plausible but wrong.

Retrieval Monoculture: Why Your RAG System Has Systematic Blind Spots

· 10 min read
Tian Pan
Software Engineer

Your RAG system's evals look fine. NDCG is acceptable. The demo works. But there's a category of failure no single-metric eval catches: the queries your retriever never even gets close on, consistently, because your entire embedding space was never equipped to handle them in the first place.

That's retrieval monoculture. One embedding model. One similarity metric. One retrieval path — and therefore one set of systematic blind spots that look like model errors, hallucination, or user confusion until you actually examine the retrieval layer.

The fix is not a bigger model or more data. It's understanding that different query structures need different retrieval mechanisms, and building a system that stops routing everything through the same funnel.
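In its smallest form, breaking the monoculture is a router in front of multiple retrieval paths. The heuristics below are illustrative placeholders, not a shipped classifier; the idea is only that query structure, not a single embedding space, picks the mechanism:

```python
import re

def route_query(query: str) -> str:
    """Toy router: choose a retrieval path from surface structure.

    Quoted phrases and ticket-style ids want lexical/keyword search;
    explicit multi-entity relations want graph traversal; everything
    else falls through to the default vector path.
    """
    if re.search(r'"[^"]+"', query) or re.search(r"\b[A-Z]{2,}-\d+\b", query):
        return "keyword"
    if " vs " in query or "relationship between" in query:
        return "graph"
    return "vector"
```

Even a crude router like this turns "one funnel" into a system whose blind spots are per-path and inspectable, rather than global and invisible.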

Knowledge Graph vs. Vector Store: Choosing Your Retrieval Primitive

· 9 min read
Tian Pan
Software Engineer

Most teams stumble into vector stores because they're easy to start with, then discover a category of queries that simply won't work no matter how well they tune chunk size or embedding model. That's not a tuning problem — it's an architectural mismatch. Vector similarity and graph traversal are fundamentally different retrieval mechanisms, and the gap matters more as your queries get harder.

This is not a "use both" post. There are real trade-offs, and getting the choice wrong costs months of engineering time. Here's what the decision actually looks like in practice.