171 posts tagged with "rag"

Knowledge Graph vs. Vector Store: Choosing Your Retrieval Primitive

April 18, 2026 · 9 min read

Tian Pan

Software Engineer

Most teams stumble into vector stores because they're easy to start with, then discover a category of queries that simply won't work no matter how well they tune chunk size or embedding model. That's not a tuning problem — it's an architectural mismatch. Vector similarity and graph traversal are fundamentally different retrieval mechanisms, and the gap matters more as your queries get harder.

This is not a "use both" post. There are real trade-offs, and getting the choice wrong costs months of engineering time. Here's what the decision actually looks like in practice.

Retrieval Debt: Why Your RAG Pipeline Degrades Silently Over Time

April 18, 2026 · 10 min read

Tian Pan

Software Engineer

Six months after you shipped your RAG pipeline, something changed. Users aren't complaining loudly — they're just trusting the answers a little less. Feedback ratings dropped from 4.2 to 3.7. A few support tickets reference "outdated information." Your engineers look at the logs and see no errors, no timeouts, no obvious regression. The retrieval pipeline looks healthy by every metric you've configured.

It isn't. It's rotting.

Retrieval debt is the accumulated technical decay in a vector index: stale embeddings that no longer represent current document content, tombstoned chunks from deleted records that pollute search results, and semantic drift between the encoder version that indexed your corpus and the encoder version now computing query embeddings. Unlike code rot, retrieval debt produces no stack traces. It produces subtly wrong answers with confident-looking citations.
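
The three kinds of decay named above can be detected mechanically, but only if you record enough metadata at index time. The sketch below assumes each vector is stored with a hash of the text it was embedded from and the encoder version that produced it. That is an assumption: many pipelines store neither. `IndexedChunk`, `audit_index`, and the version strings are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    chunk_id: str
    content_hash: str      # hypothetical field: hash of the text when it was embedded
    encoder_version: str   # hypothetical field: embedding model version used at index time


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_index(chunks, source_texts, current_encoder):
    """Classify each indexed chunk as ok, stale, tombstoned, or encoder-drifted.

    source_texts maps chunk_id to the current text in the source of truth;
    a missing key means the source record was deleted. Checks run in order,
    so a chunk that is both stale and drifted is reported as stale.
    """
    report = {"ok": [], "stale": [], "tombstoned": [], "encoder_drift": []}
    for c in chunks:
        text = source_texts.get(c.chunk_id)
        if text is None:
            report["tombstoned"].append(c.chunk_id)     # source deleted, vector still searchable
        elif sha256(text) != c.content_hash:
            report["stale"].append(c.chunk_id)          # source edited since embedding
        elif c.encoder_version != current_encoder:
            report["encoder_drift"].append(c.chunk_id)  # indexed with an older encoder
        else:
            report["ok"].append(c.chunk_id)
    return report


# Example with hypothetical data
chunks = [
    IndexedChunk("a", sha256("refund window is 30 days"), "enc-v1"),
    IndexedChunk("b", sha256("old pricing"), "enc-v1"),
    IndexedChunk("c", sha256("deleted page"), "enc-v1"),
    IndexedChunk("d", sha256("shipping policy"), "enc-v0"),
]
current = {"a": "refund window is 30 days", "b": "new pricing", "d": "shipping policy"}
report = audit_index(chunks, current, current_encoder="enc-v1")
# report == {"ok": ["a"], "stale": ["b"], "tombstoned": ["c"], "encoder_drift": ["d"]}
```

An audit like this can run as a scheduled job. None of these conditions produces an error at query time, so something has to look for them explicitly.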

Choosing a Vector Database for Production: What Benchmarks Won't Tell You

April 17, 2026 · 10 min read

Tian Pan

Software Engineer

When engineers evaluate vector databases, they typically pull up ANN benchmark results and pick whichever database tops the recall-at-10 chart. Three months later, they're filing migration tickets. The benchmarks measured query throughput on a static, perfectly indexed dataset with a single client. Production looks nothing like that.

This guide covers the five dimensions that predict whether a vector database holds up under real workloads — and a decision framework for matching those dimensions to your stack.
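
It helps to be precise about what the recall-at-10 number actually measures. As it is typically reported, recall is the overlap between the ANN index's top-k and the exact, brute-force top-k for the same query. The sketch below computes that number in pure Python. Nothing in the calculation involves concurrent writes, metadata filters, or index rebuilds. The function names and toy vectors are illustrative assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: ids of the k vectors most similar to the query."""
    ranked = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [vid for vid, _ in ranked[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate index returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy example: the ANN result list is made up for illustration
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
truth = exact_top_k([1.0, 0.0], vectors, k=2)   # ["a", "b"]
print(recall_at_k(["a", "d"], truth))           # 0.5
```

A high score here says the index finds the right neighbors on a frozen dataset. It says nothing about how that holds up under production load.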

Document Extraction Is Your RAG System's Hidden Ceiling

April 17, 2026 · 10 min read

Tian Pan

Software Engineer

A compliance contractor builds a RAG system to answer questions against a 400-page policy document. The system passes internal QA. It retrieves correctly against single-topic queries. Then it goes live and starts returning confident, well-structured, wrong answers on anything involving exception clauses.

The debugging loop looks familiar: swap the embedding model, tune similarity thresholds, experiment with chunk sizes, add a reranker. Weeks pass. The improvement is marginal. The real problem is that a key exception clause was split across two chunks at a paragraph boundary that never existed in the source. The PDF extractor misread the layout and silently broke the paragraph in two, so the split was never a chunking-strategy problem at all. Neither chunk, in isolation, is retrievable or interpretable. The system cannot hallucinate its way to a correct answer because the correct information never entered the index cleanly.

This is the extraction ceiling: the point beyond which no downstream optimization can compensate for corrupted or missing input data.
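
One cheap way to catch this class of failure is to inspect extractor output before it reaches the chunker. The sketch below is an illustrative heuristic, not a standard check. It flags adjacent text blocks where the first block ends without terminal punctuation and the next begins with a lowercase letter, a pattern that often indicates one paragraph was broken in two. The function name and punctuation set are assumptions.

```python
# Characters that normally close a sentence or clause in extracted prose (assumed set).
TERMINAL_CHARS = ".!?:;)\"'"


def suspicious_splits(blocks):
    """Return index pairs (i, i + 1) where block i looks cut off mid-sentence."""
    flagged = []
    for i in range(len(blocks) - 1):
        cur = blocks[i].rstrip()
        nxt = blocks[i + 1].lstrip()
        if cur and nxt and cur[-1] not in TERMINAL_CHARS and nxt[0].islower():
            flagged.append((i, i + 1))
    return flagged


# Example with hypothetical extractor output
blocks = [
    "Refunds are allowed within 30 days, except where",
    "the item was purchased on clearance.",
    "Shipping is free on orders over $50.",
]
print(suspicious_splits(blocks))  # [(0, 1)]
```

Flagged pairs are candidates for human review or for re-extraction with a different parser. The heuristic will miss splits that happen to land after a comma-free clause ending in punctuation.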

Enterprise RAG Governance: The Org Chart Behind Your Retrieval Pipeline

April 17, 2026 · 11 min read

Tian Pan

Software Engineer

Forty to sixty percent of enterprise RAG deployments fail to reach production. The culprit is almost never the retrieval algorithm—HNSW indexing works fine, embeddings are reasonably good, and vector similarity search is a solved problem. The breakdown happens upstream and downstream: no document ownership, no access controls enforced at query time, PII sitting unprotected in vector indexes, and a retrieval corpus that diverges from reality within weeks of launch. These are governance failures, and most engineering teams treat them as someone else's problem right up until a compliance team, a security audit, or a user who received another tenant's data makes it their problem.

This is the organizational and technical anatomy of a governed RAG knowledge base—written for engineers who own the pipeline, not executives who approved the budget.
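
"Access controls enforced at query time" can be made concrete with a small sketch. It assumes each chunk carries the tenant and reader groups of its source document, copied in at index time. `Chunk`, `User`, and `authorize` are hypothetical names, and group-based ACLs are one possible model among several.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    tenant_id: str
    allowed_groups: frozenset  # groups permitted to read the source document (assumed metadata)
    text: str


@dataclass
class User:
    user_id: str
    tenant_id: str
    groups: frozenset


def authorize(user, retrieved):
    """Drop every retrieved chunk the user may not read.

    Runs after vector search and before prompt assembly, so a chunk from
    another tenant never reaches the LLM even if it scored highest.
    """
    return [
        c for c in retrieved
        if c.tenant_id == user.tenant_id and user.groups & c.allowed_groups
    ]


# Example with hypothetical data
alice = User("alice", "acme", frozenset({"finance"}))
hits = [
    Chunk("c1", "acme", frozenset({"finance"}), "Q3 budget..."),
    Chunk("c2", "acme", frozenset({"hr"}), "Salary bands..."),
    Chunk("c3", "globex", frozenset({"finance"}), "Another tenant's budget..."),
]
print([c.chunk_id for c in authorize(alice, hits)])  # ['c1']
```

Filtering after retrieval is the simplest safe default, but it can leave fewer than k results. Where the vector store supports metadata filters, pushing the same predicate into the query avoids that.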

GraphRAG vs. Vector RAG: When Knowledge Graphs Beat Embeddings

April 17, 2026 · 9 min read

Tian Pan

Software Engineer

Most teams reach for vector embeddings when building RAG pipelines. It's the obvious default: embed documents, embed queries, find the nearest neighbors, feed results to the LLM. It works well enough in demos. Then they deploy to a compliance team or a scientific literature corpus, and accuracy falls off a cliff. Not gradually, but abruptly. On queries involving five or more entities, vector RAG accuracy in enterprise analytics benchmarks drops to zero. Not 50%. Not 20%. Zero.

This isn't a configuration problem. It's an architectural mismatch. Vector retrieval treats documents as points in semantic space. Knowledge graphs treat them as nodes in a relational structure. When your queries require traversing relationships — not just finding similar content — the topology of your retrieval architecture is what determines whether you get the right answer.
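
The difference is easiest to see on a multi-hop question. A vector store can return passages similar to each entity, but it has no representation of the edges between them. A graph answers by walking those edges. The sketch below is a breadth-first traversal over a toy adjacency list. The graph, its relation names, and `multi_hop` are all hypothetical.

```python
from collections import deque


def multi_hop(graph, start, max_hops):
    """Breadth-first traversal: map each entity reachable from `start` within
    `max_hops` edges to the (subject, relation, object) path that reached it."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in graph.get(node, []):
            if neighbor not in paths:  # first visit is the shortest path in BFS
                paths[neighbor] = paths[node] + [(node, relation, neighbor)]
                queue.append(neighbor)
    del paths[start]
    return paths


# Toy graph: "Which regulations affect Acme's supply chain?" needs three hops
graph = {
    "Acme": [("supplies", "Widget X")],
    "Widget X": [("used_in", "Product Y")],
    "Product Y": [("regulated_by", "Rule Z")],
}
for step in multi_hop(graph, "Acme", max_hops=3)["Rule Z"]:
    print(step)
```

The returned path doubles as a citation chain. With `max_hops=2`, `"Rule Z"` is never reached, which shows how the hop budget bounds both cost and recall.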

When Embeddings Aren't Enough: A Decision Framework for Hybrid Retrieval Architecture

April 17, 2026 · 11 min read

Tian Pan

Software Engineer

Most RAG implementations start the same way: spin up a vector database, embed documents with a decent model, run cosine similarity at query time, and ship it. The demo looks great. Relevance feels surprisingly good. Then you deploy it to production and discover that "Error 221" retrieves documents about "Error 222," that searching for a specific product SKU surfaces semantically similar but wrong items, and that adding a date filter causes retrieval quality to crater.

Vector search is a genuinely powerful tool. It's also not sufficient on its own for most production retrieval workloads. The teams winning with RAG in 2025 aren't choosing between dense embeddings and keyword search — they're using both, deliberately.

This is a decision framework for when hybrid retrieval is worth the added complexity, and how to build each layer without destroying your latency budget.
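
A common way to combine keyword and dense results is reciprocal rank fusion (RRF). The excerpt does not specify a fusion method, so RRF is an assumed choice here. Each document scores the sum of 1 / (k + rank) across the ranked lists it appears in, with k = 60 as the constant usually paired with RRF. The document ids below are made up to mirror the "Error 221" example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of doc ids into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results for the query "Error 221"
keyword = ["err-221", "err-221-faq", "err-222"]  # exact token match ranks 221 first
dense = ["err-222", "err-221", "err-223"]        # embeddings blur 221 and 222
print(reciprocal_rank_fusion([keyword, dense]))
# ['err-221', 'err-222', 'err-221-faq', 'err-223']
```

RRF uses only ranks, so it needs no score normalization between BM25 and cosine similarity. It adds one extra retrieval call and a cheap merge to the latency budget.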

The Knowledge Contamination Problem: When Your RAG System Ignores Its Own Retrieval

April 17, 2026 · 8 min read

Tian Pan

Software Engineer

A team ships a RAG pipeline for internal documentation. Retrieval looks solid — the right passages come back. But in production, users keep getting stale answers. They dig into the logs and find the model is returning facts from its training data, not from the documents it was handed. The retrieval worked. The model just didn't use it.

This is the knowledge contamination problem: the model's parametric memory — the knowledge baked into its weights during training — overrides the retrieved context. It's quiet, it's confident, and it's one of the most common failure modes in production RAG systems.
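
One inexpensive signal for contamination is to check whether each sentence of the answer is lexically supported by the retrieved context. The sketch below is a crude heuristic, not an entailment model. It flags answer sentences whose content words mostly do not appear in the context, which suggests the model may be drawing on parametric memory. The stopword list, threshold, and function names are assumptions.

```python
import re

WORD = re.compile(r"[a-z0-9]+")
# Minimal illustrative stopword list; a real one would be larger.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in", "and",
             "or", "for", "on", "it", "that", "this", "with", "be", "by", "as"}


def content_words(text):
    return {w for w in WORD.findall(text.lower()) if w not in STOPWORDS}


def ungrounded_sentences(answer, context, min_overlap=0.5):
    """Return answer sentences whose content words mostly do not occur in the context."""
    ctx = content_words(context)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if words and len(words & ctx) / len(words) < min_overlap:
            flagged.append(sentence)
    return flagged


# Hypothetical example
context = "The v3 API rate limit is 500 requests per minute."
answer = "The rate limit is 500 requests per minute. Version 2 allowed unlimited bursts."
print(ungrounded_sentences(answer, context))  # ['Version 2 allowed unlimited bursts.']
```

Paraphrased but faithful answers can trip a lexical check like this. Treat flags as items to sample and review, not as automatic failures.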

Knowledge Cutoff Is a Silent Production Bug

April 17, 2026 · 11 min read

Tian Pan

Software Engineer

Most production AI failures are loud. The model returns a 5xx. The schema validation throws. The eval suite catches the regression before it ships. But there is a category of failure that is completely silent — no error, no exception, no alert fires — because the system is working exactly as designed. It is just working with a snapshot of reality from 18 months ago.

Your LLM has a knowledge cutoff. That cutoff is not a documentation footnote. It is a slowly widening gap between what your model believes to be true and what is actually true, and it compounds every day you keep the same model in production. Teams celebrate launch, then watch user trust quietly erode over the next six months as the world moves and the model stays still.

Live Web Grounding in Production: Why Calling a Search API Is Only the Beginning

April 17, 2026 · 10 min read

Tian Pan

Software Engineer

Most engineers discover the limits of live web grounding the same way: they wire up a search API in an afternoon, ship it to production, and spend the next three weeks explaining why the latency is six seconds, the answers are wrong about recent events, and users are occasionally getting directed to fake phone numbers.

The underlying assumption — that search-augmented LLMs are just "regular RAG but with fresh data" — is the source of most of the pain. Live web grounding shares almost nothing with static retrieval beyond the word "retrieval." It is a distributed systems problem wearing an NLP hat.

Poisoned at the Source: RAG Corpus Decay and Data Governance for Vector Stores

April 17, 2026 · 11 min read

Tian Pan

Software Engineer

Your RAG system was working fine at launch. Three months later it's confidently wrong about a third of user queries — and your traces show nothing broken. The retriever is fetching documents. The model is generating responses. The pipeline looks healthy. The problem is invisible: every vector in your store still has a similarity score, but half of them are pointing to facts that no longer exist.

This is corpus decay. It doesn't throw errors. It doesn't trigger alerts. It accumulates quietly in the background, and by the time you notice it through user complaints or quality degradation, your vector store has become a liability.
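
One way to make decay visible is to give every source type a freshness budget and treat chunks past it as expired rather than servable. The sketch below assumes each chunk records when its source was last verified. The budgets, field names, and `partition_by_freshness` are hypothetical and would need tuning to how fast each source actually changes.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source freshness budgets
MAX_AGE = {
    "pricing": timedelta(days=7),
    "policy": timedelta(days=90),
    "default": timedelta(days=180),
}


def partition_by_freshness(chunks, now=None):
    """Split chunks into (fresh, expired) by time since their source was last verified.

    Each chunk is a dict with 'source_type' and a timezone-aware 'last_verified'.
    Expired chunks should be re-verified or re-embedded, not silently served.
    """
    now = now or datetime.now(timezone.utc)
    fresh, expired = [], []
    for c in chunks:
        budget = MAX_AGE.get(c["source_type"], MAX_AGE["default"])
        (fresh if now - c["last_verified"] <= budget else expired).append(c)
    return fresh, expired


# Example with hypothetical data
now = datetime(2026, 4, 17, tzinfo=timezone.utc)
chunks = [
    {"id": "p1", "source_type": "pricing", "last_verified": datetime(2026, 4, 1, tzinfo=timezone.utc)},
    {"id": "r1", "source_type": "policy", "last_verified": datetime(2026, 3, 1, tzinfo=timezone.utc)},
    {"id": "f1", "source_type": "faq", "last_verified": datetime(2025, 6, 1, tzinfo=timezone.utc)},
]
fresh, expired = partition_by_freshness(chunks, now=now)
print([c["id"] for c in fresh], [c["id"] for c in expired])  # ['r1'] ['p1', 'f1']
```

Counting expired chunks over time also gives a decay metric you can alert on, long before the decay shows up in user complaints.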

The RAG Eval Antipattern That Hides Retriever Bugs

April 17, 2026 · 10 min read

Tian Pan

Software Engineer

There's a failure mode common in RAG systems that goes undetected for months: your retriever is returning the wrong documents, but your generator is good enough at improvising that end-to-end quality scores stay green. You keep tuning the prompt. You upgrade the model. Nothing helps. The bug is three layers upstream, and it is invisible to your metrics.

This is the retriever eval antipattern — evaluating your entire RAG pipeline as a single unit, which lets the generator absorb and hide retrieval failures. The result is a system where you cannot distinguish between "the generator failed" and "the retriever failed," making systematic improvement nearly impossible.
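
Separating the two stages only requires labeling which documents actually contain each answer. With that label, every eval example falls into one of four buckets. The bucket the end-to-end score hides is "right answer, wrong evidence." The sketch below assumes gold document ids and a per-example correctness judgment are already available. The field and bucket names are hypothetical.

```python
from collections import Counter


def attribute_failures(examples, k=5):
    """Bucket eval examples by which stage failed.

    Each example is a dict with:
      'gold_ids'       set of doc ids that contain the answer
      'retrieved_ids'  ranked list of doc ids returned by the retriever
      'answer_ok'      bool from the end-to-end judge
    A retrieval hit means at least one gold doc appears in the top k.
    """
    buckets = Counter()
    for ex in examples:
        hit = bool(set(ex["retrieved_ids"][:k]) & ex["gold_ids"])
        if hit and ex["answer_ok"]:
            buckets["ok"] += 1
        elif hit:
            buckets["generator_failure"] += 1
        elif ex["answer_ok"]:
            buckets["masked_retriever_miss"] += 1  # generator improvised past a retrieval miss
        else:
            buckets["retriever_failure"] += 1
    return buckets


# Hypothetical eval set, one example per bucket
examples = [
    {"gold_ids": {"d1"}, "retrieved_ids": ["d1", "d2"], "answer_ok": True},
    {"gold_ids": {"d1"}, "retrieved_ids": ["d1"], "answer_ok": False},
    {"gold_ids": {"d3"}, "retrieved_ids": ["d9"], "answer_ok": True},
    {"gold_ids": {"d3"}, "retrieved_ids": ["d9"], "answer_ok": False},
]
print(attribute_failures(examples))
```

In this toy set, end-to-end accuracy is 50%, yet one of the two passing answers was produced without the right evidence. A single pipeline-level score cannot show that.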
