GraphRAG in Production: When Vector Search Hits Its Ceiling

9 min read
Tian Pan
Software Engineer

Your vector search looks great on benchmarks. Users are still frustrated.

The failure mode is subtle: a user asks "Which of our suppliers have been involved in incidents that affected customers in the same region as the Martinez account?" Your embeddings retrieve the incident records. They retrieve the supplier contracts. They retrieve the customer accounts. But they retrieve them as disconnected documents, and the LLM has to figure out the relationships in context — relationships that span three hops across your entity graph. At five or more entities per query, accuracy without relational structure drops toward zero. With it, performance stays stable.

This is the ceiling that knowledge graph augmented retrieval — GraphRAG — is built to address. It is not a drop-in replacement for vector search. It is a different system with a different cost structure, different failure modes, and a different class of queries where it wins decisively.

What GraphRAG Actually Is (It's Not One Thing)

The term "GraphRAG" covers a spectrum of architectures, not a single system. Understanding which variant you're evaluating matters as much as the benchmarks themselves.

Type 1: Graph-Enhanced Vector Search. The lightest-weight option. You add graph metadata — entity tags, relationship labels — as structured filters on top of your existing vector index. No LLM-powered entity extraction required. The graph is not queried directly; it constrains vector similarity search. This gets you significant gains for queries involving known entity types at a fraction of the complexity.
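A minimal sketch of the Type 1 pattern, assuming chunks already carry graph-derived entity tags (the `Chunk` schema and `filtered_search` helper are illustrative, not any particular vector store's API):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    embedding: list[float]                            # precomputed elsewhere
    entities: set[str] = field(default_factory=set)   # graph-derived tags

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def filtered_search(query_emb, required_entities, index, k=3):
    # Graph metadata constrains the candidate set; vector similarity
    # only ranks what survives the filter.
    candidates = [c for c in index if required_entities <= c.entities]
    return sorted(candidates,
                  key=lambda c: cosine(query_emb, c.embedding),
                  reverse=True)[:k]
```

In a real deployment the filter would be pushed down into the vector index (most stores support metadata pre-filtering), but the division of labor is the same: the graph decides eligibility, the embedding decides rank.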

Type 2: Graph-Guided Retrieval. The system identifies entities in the user's query, traverses the graph to collect connected entities within N hops, then retrieves documents associated with those entities. Vector similarity is replaced by or augmented with traversal. This wins on cross-document reasoning — finding evidence spread across three documents that share no semantic overlap but share a critical entity relationship.
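The traversal step can be sketched as a bounded breadth-first search over an adjacency map; the `graph` and `doc_index` shapes below are illustrative assumptions, not a specific graph database's API:

```python
from collections import deque

def n_hop_entities(graph: dict[str, set[str]], seeds: set[str], n: int) -> set[str]:
    """Collect every entity reachable within n hops of the seed entities."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == n:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def graph_guided_retrieve(query_entities, graph, doc_index, hops=3):
    # doc_index maps entity -> set of document ids mentioning it.
    entities = n_hop_entities(graph, query_entities, hops)
    return {doc for e in entities for doc in doc_index.get(e, ())}
```

This is how a three-hop query like the Martinez example resolves: documents about the supplier share no semantic overlap with the query, but they are three edges away from an entity the query names.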

Type 3: Microsoft-Style Full GraphRAG. The system extracts entities and relationships from every document using an LLM, runs community detection over the resulting graph, and builds hierarchical summaries of each community. At query time, it routes to the appropriate community level based on query scope. This is the architecture behind Microsoft's open-source GraphRAG release and the one that gets cited in most benchmark papers.

Type 4: Temporal Knowledge Graphs. Purpose-built for agent memory. Every fact carries a timestamp; contradicting facts create explicit supersedes relationships. This is what Zep's Graphiti is designed for. It is not a document retrieval system — using it for Q&A over a static document corpus is the wrong tool for the job.
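A toy sketch of the supersedes mechanic described above (not any library's actual API): asserting a new fact on the same subject and predicate links the old fact to its replacement instead of deleting it, so history stays queryable.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: int                         # e.g. a Unix timestamp
    superseded_by: "Fact | None" = None

class TemporalStore:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject, predicate, obj, ts):
        new = Fact(subject, predicate, obj, ts)
        # Mark any live fact on the same (subject, predicate) as superseded
        # rather than deleting it: the old value remains for point-in-time queries.
        for f in self.facts:
            if (f.subject, f.predicate) == (subject, predicate) and f.superseded_by is None:
                f.superseded_by = new
        self.facts.append(new)
        return new

    def current(self, subject, predicate):
        for f in self.facts:
            if (f.subject, f.predicate) == (subject, predicate) and f.superseded_by is None:
                return f
        return None
```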

Most teams evaluating "GraphRAG" are actually comparing Type 3 to plain vector RAG, then wondering why it costs so much.

The Cost Reality Is Uncomfortable

For a 500-page document corpus, the indexing costs break down roughly like this:

| Approach | Indexing Cost | Time |
| --- | --- | --- |
| Vector RAG | Under $5 | Minutes |
| LightRAG | ~$0.50 | ~3 minutes |
| Microsoft GraphRAG | $50–200 | ~45 minutes |

The cost gap in Type 3 GraphRAG comes entirely from LLM-powered entity extraction, which consumes about 58% of total indexing tokens. You are paying an LLM to read every document and produce a structured entity-relationship graph. That process does not scale linearly — it scales with both document count and entity density.
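As a back-of-envelope illustration of that scaling: every constant below (the density multiplier, the token price) is a placeholder, and only the 58% extraction share comes from the figure above.

```python
def estimate_indexing_cost(n_docs: int, avg_tokens_per_doc: int,
                           entity_density: float,
                           price_per_1k_tokens: float = 0.01) -> float:
    """Rough Type 3 indexing cost in dollars.

    entity_density: approximate entities per 1k tokens. Denser documents
    force longer structured extraction outputs, so cost grows with both
    document count and density rather than with document count alone.
    """
    base = n_docs * avg_tokens_per_doc
    extraction = base * (1 + 0.1 * entity_density)   # hypothetical density multiplier
    total = extraction / 0.58                        # extraction is ~58% of indexing tokens
    return total / 1000 * price_per_1k_tokens
```

The point of the exercise is the shape, not the numbers: doubling entity density raises cost even when the corpus size is unchanged.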

Microsoft addressed this in January 2025 with Dynamic Community Selection, which reduced token usage by 79% while maintaining answer quality. Their LazyGraphRAG variant defers community summarization until query time, achieving comparable answer quality at roughly 0.1% of the standard indexing cost. The tradeoff: query latency increases by 2–8 seconds as the system builds summaries on demand.

For teams with corpora that change frequently, LazyGraphRAG's approach is often more practical than pre-building summaries that go stale. For read-heavy systems with stable corpora, the pre-computed summaries pay off.

LightRAG sits in a pragmatic middle ground — roughly 70–90% of full GraphRAG performance at 1/100th the cost, using a simpler flat graph structure rather than hierarchical communities. Its weakness is broad analytical queries ("Summarize the themes across all our incident reports for Q3") where the hierarchical community structure of full GraphRAG wins.

Failure Modes Unique to Graph Retrieval

Vector RAG failures are boring and uniform: the right chunk wasn't retrieved because the query didn't match semantically. You can find these failures by looking at retrieval recall.

Graph RAG failures are varied and harder to instrument:

Entity extraction drift. The LLM extracting entities from documents achieves 60–85% accuracy depending on domain specificity. In general English text, it performs well. In specialized domains — medical billing codes, legal contract clauses, supply chain part numbers — extraction accuracy degrades significantly, and the errors are not random. The same extraction errors appear consistently across similar document types, so your coverage gaps are systematic, not random.

Graph decay. Production knowledge graphs without automated refresh drift 15–20% from ground truth per quarter. The maintenance burden is consistently underestimated: the initial build takes weeks, but ongoing freshness requires automated pipelines to detect document changes, re-extract affected entities, and update relationships without corrupting the existing graph. Teams report spending 2–3x more engineering hours on graph freshness than on the initial build.

Community summary staleness. If you're using pre-built community summaries, those summaries reflect the document state at index time. A customer complaint that changed a supplier relationship in your data system does not update the community summary that includes that supplier. The graph's structural links update (if you refresh them), but the LLM-generated summaries do not update automatically — they require re-running community detection and summarization.

Routing failures in hybrid systems. Most production deployments use vector RAG for simple queries and graph retrieval for complex ones. The routing logic — deciding which path to take — is itself a failure point. A misrouted complex query hits vector search and returns a plausible but incomplete answer. Users do not see the routing decision; they see a confident but wrong response.
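A deliberately simplified router makes the failure point concrete. Production routers are usually a small classifier rather than keyword matching, and the cue list here is invented for illustration, but the failure mode is identical: any query the router misjudges goes down the vector path and returns a confident partial answer.

```python
# Hypothetical relational cues; a real system would learn these, not hardcode them.
RELATIONAL_CUES = ("affected by", "connected to", "same region",
                   "which of", "across", "related to")

def route(query: str) -> str:
    """Toy heuristic: send queries with relational phrasing to the graph
    path, everything else to vector search. The router itself is now a
    component whose errors are invisible to the user."""
    q = query.lower()
    return "graph" if any(cue in q for cue in RELATIONAL_CUES) else "vector"
```

This is why routing decisions deserve their own logging: without recording which path each query took, misroutes are indistinguishable from genuine retrieval failures.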

What the Benchmarks Actually Show

The performance split between vector RAG and GraphRAG is not "GraphRAG wins everywhere." The numbers are more specific:

  • For specific document search (retrieve the exact policy, find the exact contract clause), vector RAG wins: 54% vs. 35% for GraphRAG.
  • For aggregation queries (how many suppliers have had incidents in the last 12 months?), GraphRAG retrieves relevant results 3× more often: 23% vs. 8%.
  • For cross-document reasoning (which customers are affected by this supplier's incident given their regional overlap?), GraphRAG wins by 4×: 33% vs. 8%.

The 3.4× overall accuracy advantage that appears in some GraphRAG benchmarks is real but misleading without this breakdown. If your query distribution skews toward specific document retrieval, that advantage does not apply. If your queries require multi-hop reasoning across entity relationships, the advantage is dramatic.

When to Actually Use Graph RAG

A practical decision framework:

Start with vector RAG if your domain has fewer than 1,000 entities with simple relationships. Well-governed metadata filters on a vector index will outperform GraphRAG at roughly one-tenth the cost. This covers the majority of enterprise use cases.

Add Type 1 graph enhancement (metadata filters derived from existing structured data — your CRM, org charts, product catalogs) when you see retrieval failures around known entity types. This is the cheapest path to graph-informed retrieval with no entity extraction cost.

Evaluate Type 2 graph-guided retrieval when specific multi-hop query patterns emerge in your logs and vector search is demonstrably failing them. Do not architect for hypothetical future queries — find the real ones in your query logs first.

Consider LightRAG or full GraphRAG when you need global summarization across a large, semi-structured corpus and the specific entity types in your domain benefit from explicit relationship modeling. Healthcare, legal, financial compliance, and supply chain are domains where this threshold is often reached.

Use temporal graph architectures (Graphiti) only for agent memory systems where you need to track how knowledge evolves over time. This is a different problem than document retrieval.

The Graph Maintenance Problem That Kills Pilots

GraphRAG pilots succeed. GraphRAG production deployments often fail quietly — not because the retrieval quality degrades immediately, but because the graph maintenance overhead is underestimated.

Every document update requires updating entity extraction. Every relationship change requires updating graph traversal structure. Every new document type requires tuning the entity extraction prompt for that type. The graph is not a data store that absorbs writes automatically; it is a derived artifact that must be rebuilt or incrementally updated when the underlying data changes.
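One piece of that incremental pipeline, change detection, can be as simple as comparing content hashes recorded at index time. A sketch, assuming documents are plain strings keyed by id:

```python
import hashlib

def detect_changed(docs: dict[str, str], index_hashes: dict[str, str]) -> list[str]:
    """Return ids of documents whose content hash no longer matches the
    hash recorded at index time — these need entity re-extraction."""
    changed = []
    for doc_id, text in docs.items():
        h = hashlib.sha256(text.encode()).hexdigest()
        if index_hashes.get(doc_id) != h:
            changed.append(doc_id)
    return changed
```

Detection is the easy half; the hard half is re-extracting only the affected entities and merging the result into the live graph without clobbering relationships contributed by unchanged documents.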

Teams that succeed in production treat the knowledge graph as a first-class data product — with schema governance, quality monitoring, freshness SLAs, and automated pipelines for detection and refresh. Teams that treat it as infrastructure they deploy once and maintain occasionally discover that the graph's accuracy has drifted below usefulness within two quarters.

A Staged Migration Path

The expensive mistake is architecting full GraphRAG before you know which queries actually need it. The cheaper path:

  1. Instrument your current vector RAG system. Log queries, retrieved chunks, and user satisfaction signals. Identify the specific query patterns where users are expressing frustration or rephrasing the same question multiple times.

  2. Classify failure modes. Distinguish between semantic mismatch failures (wrong chunks retrieved) and relational reasoning failures (right chunks retrieved but relationships missed). Only relational failures benefit from graph augmentation.

  3. Start with Type 1. If you have structured entity data in existing systems, add it as metadata to your vector index. This is a few days of work and often resolves 30–50% of relational failures.

  4. Run a targeted Type 2 pilot. For the query patterns that Type 1 does not fix, build a narrow graph covering only the entity types involved in those queries. Do not build a comprehensive enterprise knowledge graph. Build the smallest graph that resolves your highest-impact failures.

  5. Evaluate maintenance cost before expanding. After 90 days, measure how much engineering effort went into keeping that narrow graph fresh. Extrapolate that cost to the broader graph you are considering. If the maintenance math works, expand. If it does not, you have found your ceiling.
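The step-2 distinction between semantic and relational failures can be sketched as a small offline classifier, assuming you can label the gold chunks for failed queries (the log schema here is hypothetical):

```python
def classify_failure(gold_chunk_ids: set[str],
                     retrieved_chunk_ids: set[str],
                     answer_correct: bool) -> str:
    """Classify a logged query from offline labels.

    - Gold chunks missing from retrieval -> semantic mismatch: better
      embeddings or chunking may help; a graph will not.
    - All gold chunks retrieved but answer still wrong -> relational
      failure: the candidate class for graph augmentation.
    """
    if answer_correct:
        return "ok"
    if not gold_chunk_ids <= retrieved_chunk_ids:
        return "semantic_mismatch"
    return "relational_failure"
```

Running this over a few weeks of logged queries gives you the ratio that step 3 and step 4 depend on: if relational failures are rare, the graph investment has no payoff regardless of how well it benchmarks.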

GraphRAG is not a RAG upgrade — it is a different system with different failure modes, different costs, and different operational requirements. Used on the right query distribution with the right maintenance investment, it resolves problems that vector search cannot. Applied universally before you understand your query distribution, it is an expensive way to discover that most of your queries worked fine before.
