Skip to main content

61 posts tagged with "retrieval"

View all tags

GraphRAG vs. Vector RAG: The Architecture Decision Teams Make Too Late

· 12 min read
Tian Pan
Software Engineer

Most teams discover they need GraphRAG six months too late — after they've already explained to users why the AI got the relationship wrong, why it confused two entities that share similar embeddings, or why it confidently cited a document that contradicts the actual answer. Vector RAG is genuinely good at what it does. The problem is that teams treat it as good at everything, and keep piling on retrieval hacks when the underlying architecture has hit a mathematical ceiling.

Fewer than 15% of enterprises have deployed graph-based retrieval in production as of 2025. This is not because the technology is immature. It's because the failure signals for vector-only RAG are subtle: the system runs, the LLM responds, and only careful inspection reveals that the retrieved context was plausible but wrong.

Retrieval Monoculture: Why Your RAG System Has Systematic Blind Spots

· 10 min read
Tian Pan
Software Engineer

Your RAG system's evals look fine. NDCG is acceptable. The demo works. But there's a category of failure no single-metric eval catches: the queries your retriever never even gets close on, consistently, because your entire embedding space was never equipped to handle them in the first place.

That's retrieval monoculture. One embedding model. One similarity metric. One retrieval path — and therefore one set of systematic blind spots that look like model errors, hallucination, or user confusion until you actually examine the retrieval layer.

The fix is not a bigger model or more data. It's understanding that different query structures need different retrieval mechanisms, and building a system that stops routing everything through the same funnel.

Knowledge Graph vs. Vector Store: Choosing Your Retrieval Primitive

· 9 min read
Tian Pan
Software Engineer

Most teams stumble into vector stores because they're easy to start with, then discover a category of queries that simply won't work no matter how well they tune chunk size or embedding model. That's not a tuning problem — it's an architectural mismatch. Vector similarity and graph traversal are fundamentally different retrieval mechanisms, and the gap matters more as your queries get harder.

This is not a "use both" post. There are real trade-offs, and getting the choice wrong costs months of engineering time. Here's what the decision actually looks like in practice.

Document Extraction Is Your RAG System's Hidden Ceiling

· 10 min read
Tian Pan
Software Engineer

A compliance contractor builds a RAG system to answer questions against a 400-page policy document. The system passes internal QA. It retrieves correctly against single-topic queries. Then it goes live and starts returning confident, well-structured, wrong answers on anything involving exception clauses.

The debugging loop looks familiar: swap the embedding model, tune similarity thresholds, experiment with chunk sizes, add a reranker. Weeks pass. The improvement is marginal. The real problem is that a key exception clause was split across two chunks at a paragraph boundary — not because of chunking strategy, but because the PDF extractor silently broke the paragraph in two when it misread the layout. Neither chunk, in isolation, is retrievable or interpretable. The system cannot hallucinate its way to a correct answer because the correct information never entered the index cleanly.

This is the extraction ceiling: the point beyond which no downstream optimization can compensate for corrupted or missing input data.

GraphRAG vs. Vector RAG: When Knowledge Graphs Beat Embeddings

· 9 min read
Tian Pan
Software Engineer

Most teams reach for vector embeddings when building RAG pipelines. It's the obvious default: embed documents, embed queries, find the nearest neighbors, feed results to the LLM. It works well enough on the demos. Then they deploy to a compliance team or a scientific literature corpus, and accuracy falls off a cliff. Not gradually — abruptly. On queries involving five or more entities, vector RAG accuracy in enterprise analytics benchmarks drops to zero. Not 50%. Not 20%. Zero.

This isn't a configuration problem. It's an architectural mismatch. Vector retrieval treats documents as points in semantic space. Knowledge graphs treat them as nodes in a relational structure. When your queries require traversing relationships — not just finding similar content — the topology of your retrieval architecture is what determines whether you get the right answer.

When Embeddings Aren't Enough: A Decision Framework for Hybrid Retrieval Architecture

· 11 min read
Tian Pan
Software Engineer

Most RAG implementations start the same way: spin up a vector database, embed documents with a decent model, run cosine similarity at query time, and ship it. The demo looks great. Relevance feels surprisingly good. Then you deploy it to production and discover that "Error 221" retrieves documents about "Error 222," that searching for a specific product SKU surfaces semantically similar but wrong items, and that adding a date filter causes retrieval quality to crater.

Vector search is a genuinely powerful tool. It's also not sufficient on its own for most production retrieval workloads. The teams winning with RAG in 2025 aren't choosing between dense embeddings and keyword search — they're using both, deliberately.

This is a decision framework for when hybrid retrieval is worth the added complexity, and how to build each layer without destroying your latency budget.

The Knowledge Contamination Problem: When Your RAG System Ignores Its Own Retrieval

· 8 min read
Tian Pan
Software Engineer

A team ships a RAG pipeline for internal documentation. Retrieval looks solid — the right passages come back. But in production, users keep getting stale answers. They dig into the logs and find the model is returning facts from its training data, not from the documents it was handed. The retrieval worked. The model just didn't use it.

This is the knowledge contamination problem: the model's parametric memory — the knowledge baked into its weights during training — overrides the retrieved context. It's quiet, it's confident, and it's one of the most common failure modes in production RAG systems.

Poisoned at the Source: RAG Corpus Decay and Data Governance for Vector Stores

· 11 min read
Tian Pan
Software Engineer

Your RAG system was working fine at launch. Three months later it's confidently wrong about a third of user queries — and your traces show nothing broken. The retriever is fetching documents. The model is generating responses. The pipeline looks healthy. The problem is invisible: every vector in your store still has a similarity score, but half of them are pointing to facts that no longer exist.

This is corpus decay. It doesn't throw errors. It doesn't trigger alerts. It accumulates quietly in the background, and by the time you notice it through user complaints or quality degradation, your vector store has become a liability.

The RAG Eval Antipattern That Hides Retriever Bugs

· 10 min read
Tian Pan
Software Engineer

There's a failure mode common in RAG systems that goes undetected for months: your retriever is returning the wrong documents, but your generator is good enough at improvising that end-to-end quality scores stay green. You keep tuning the prompt. You upgrade the model. Nothing helps. The bug is three layers upstream and your metrics are invisible to it.

This is the retriever eval antipattern — evaluating your entire RAG pipeline as a single unit, which lets the generator absorb and hide retrieval failures. The result is a system where you cannot distinguish between "the generator failed" and "the retriever failed," making systematic improvement nearly impossible.

The Discovery Problem: Why Semantic Search Fails Browsing Users

· 9 min read
Tian Pan
Software Engineer

Vector search is eating the world. Embedding-based retrieval now powers product search at every major e-commerce platform, drives the retrieval layer of RAG systems, and sits at the core of most AI-powered search rewrites. But there is a category of user that these systems fail silently and consistently: the browsing user. Not because the embeddings are bad. Because they were built to solve a different problem.

The fundamental assumption behind semantic search is that users arrive with a query that approximates what they want. Optimize for proximity in embedding space to that query, and you win. But a significant fraction of real users arrive with something closer to curiosity than a query — and for them, the nearest neighbors in vector space are exactly the wrong answer.

Semantic Search as a Product: What Changes When Retrieval Understands Intent

· 11 min read
Tian Pan
Software Engineer

Most teams building semantic search start from a RAG proof-of-concept: chunk documents, embed them, store vectors, query with cosine similarity. It works well enough in demos. Then they ship it to users, and half the queries fail in ways that have nothing to do with retrieval quality.

The reason is that RAG and user-facing semantic search are solving different problems. RAG asks "given a question, retrieve context for an LLM to answer it." Semantic search asks "given a user's query, surface results that match what they actually want." The second problem has a layer of complexity that RAG benchmarks systematically ignore — and that complexity lives almost entirely before retrieval begins.

Knowledge Graphs as a RAG Alternative: When Structured Retrieval Beats Embeddings

· 9 min read
Tian Pan
Software Engineer

Most RAG implementations fail in exactly the same way: the vector search retrieves something plausible but not what the user actually needed, the LLM wraps it in confident prose, and the user gets an answer that's approximately right but specifically wrong. The frustrating part is that the failure mode is invisible — cosine similarity scores look fine, the retrieved passages mention the right topics, but the answer is still wrong because the question required reasoning across relationships, not just semantic proximity.

Vector embeddings are excellent at one thing: finding text that sounds like your query. That's a powerful capability, and it covers an enormous range of production use cases. But it breaks predictably when the question depends on how entities connect to each other rather than how closely their descriptions match. For those queries, a knowledge graph — a property graph you traverse with Cypher or SPARQL — is not an optimization. It's a fundamentally different kind of retrieval that solves a different class of problem.