When Vector Search Fails: Why Knowledge Graphs Handle Queries Embeddings Can't
Vector search has become the default retrieval primitive for RAG systems. Embed your documents, embed the query, find nearest neighbors — it's simple, fast, and works surprisingly well for a wide class of questions. But production deployments keep hitting the same wall: certain queries return garbage results despite high similarity scores, certain multi-document reasoning tasks fail silently, and certain entity-heavy queries degrade to random noise as complexity grows.
The issue isn't embedding quality or index size. It's that semantic similarity is the wrong abstraction for a significant class of retrieval problems. Knowledge graphs aren't a replacement for vector search — they solve a structurally different problem. Understanding which problems belong to which tool is what separates a brittle RAG pipeline from one that holds up in production.
The Two Primitives and What They Actually Do
Vector search answers the question: what documents are semantically similar to this query? It compresses text into a high-dimensional point, then finds nearby points. The fundamental operation is distance in embedding space.
Knowledge graphs answer a different question: what entities exist, how are they related, and what can I infer by traversing those relationships? The fundamental operation is graph traversal: following edges between nodes.
These are not equivalent operations on the same data. Distance and connectivity are orthogonal properties. A document about "treatment protocols for Type 2 diabetes" and a document about "insulin resistance in adolescents" may be far apart in embedding space yet critically connected through a patient entity who has both conditions. Vector search will miss that connection unless the query itself bridges the gap. A graph traversal finds it by following edges.
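The distinction is easy to see in miniature. The sketch below uses toy three-dimensional "embeddings" and a hand-built edge map (both illustrative, not real model output) to show two documents that score as dissimilar by cosine distance yet are one hop apart through a shared patient entity:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 3-d "embeddings": the two documents point in different directions.
doc_a = [0.9, 0.1, 0.0]   # "treatment protocols for Type 2 diabetes"
doc_b = [0.1, 0.0, 0.9]   # "insulin resistance in adolescents"
print(cosine(doc_a, doc_b))  # low similarity: vector search won't pair them

# The same documents, linked through a shared patient entity in a graph.
edges = {
    "patient:123": {"doc_a", "doc_b"},  # the patient has both conditions
}
connected = "doc_a" in edges["patient:123"] and "doc_b" in edges["patient:123"]
print(connected)  # True: one hop through the patient node finds the link
```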
Where Semantic Similarity Breaks Down
Multi-hop relationship queries are the most obvious failure mode. "Find all papers that cite papers which cite Smith 2019" requires traversing node → node → node. No amount of embedding quality makes this expressible as a nearest-neighbor search. The query is a graph traversal by definition. The same applies to organizational hierarchy queries, supply chain tracing, citation networks, and any question with "of the X that Y" structure.
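The citation example can be written directly as a two-hop traversal. This is a minimal in-memory sketch (a plain adjacency map rather than a graph database), but the shape of the operation is the point: it is edge-following, not distance computation.

```python
# paper -> set of papers that cite it (toy data, illustrative only)
cited_by = {
    "smith2019": {"p1", "p2"},
    "p1": {"p3", "p4"},
    "p2": {"p4"},
}

def citers_of_citers(paper, graph):
    """Papers that cite papers which cite `paper`: two edge hops."""
    first_hop = graph.get(paper, set())
    if not first_hop:
        return set()
    return set().union(*(graph.get(p, set()) for p in first_hop))

print(sorted(citers_of_citers("smith2019", cited_by)))  # ['p3', 'p4']
```

In a production system the same query would be one Cypher or Gremlin pattern over a graph database; no embedding model, however good, expresses it as a nearest-neighbor lookup.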
Entity disambiguation under noise is a subtler failure. When the same real-world entity appears under different surface forms — "JPMorgan", "JP Morgan Chase", "JPM", "the bank" — vector search handles it poorly without explicit entity resolution. Embeddings compare mention vectors to candidate entity vectors but ignore structural context: which other entities co-occur, what relationships exist, what the local graph neighborhood looks like. EDEGE and similar hybrid approaches that combine semantic embeddings with subgraph structure consistently outperform pure embedding disambiguation, because the graph structure provides global semantic context that a single embedding vector cannot capture.
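One way to see why structural context helps: blend the embedding similarity score with the overlap between the mention's co-occurring entities and each candidate's graph neighborhood. The scoring function, candidate data, and blending weight below are all illustrative assumptions, not taken from any specific system.

```python
# Hypothetical disambiguation scoring: semantic similarity blended with
# graph-neighborhood overlap. All names and numbers are illustrative.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

candidates = {
    "JPMorgan Chase (bank)": {"sim": 0.62, "neighbors": {"Jamie Dimon", "NYSE", "Chase"}},
    "J. P. Morgan (person)": {"sim": 0.58, "neighbors": {"US Steel", "Gilded Age"}},
}
mention_context = {"Jamie Dimon", "NYSE"}  # entities co-occurring with the mention "JPM"

def score(c, alpha=0.5):
    # alpha blends embedding similarity with structural overlap
    return alpha * c["sim"] + (1 - alpha) * jaccard(mention_context, c["neighbors"])

best = max(candidates, key=lambda name: score(candidates[name]))
print(best)  # the bank wins on structural context despite close sim scores
```

The embedding scores alone are nearly tied; the neighborhood overlap is what separates the candidates.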
Aggregation queries collapse entirely. "How many documents mention both X and Y in the context of Z?" is a structured query over document metadata and content. Vector search returns similar documents, not an answer to a combinatorial question. Research benchmarks show graph-based retrieval performing 3x better on aggregation queries than vector RAG, specifically because graph traversal can count edges, filter by node properties, and aggregate across relationships.
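The aggregation case reduces to set operations over entity-document edges plus a property filter, which is trivial for a graph and inexpressible as a nearest-neighbor lookup. A minimal sketch with made-up data:

```python
# entity -> documents that mention it (toy data)
mentions = {
    "X": {"d1", "d2", "d3"},
    "Y": {"d2", "d3", "d5"},
}
doc_topics = {"d1": "Z", "d2": "Z", "d3": "W", "d5": "Z"}  # context property per doc

# "How many documents mention both X and Y in the context of Z?"
count = sum(1 for d in mentions["X"] & mentions["Y"] if doc_topics[d] == "Z")
print(count)  # 1: only d2 mentions both X and Y in the context of Z
```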
Cross-document reasoning at scale degrades as the number of entities grows. In controlled benchmarks, vector RAG accuracy drops to near 0% when queries involve five or more distinct entities. GraphRAG maintains stable performance at ten or more entities, because it explicitly models relationships rather than relying on cosine similarity to discover connections that span documents.
There are also silent failure modes that are harder to catch. A version mismatch between the model that embedded the query and the model that built the index produces valid-looking similarity scores backed by meaningless comparisons. Rare terms, SKUs, specific identifiers, and short queries without semantic context all suffer from overgeneralization: the embedding encodes "similar meaning" when you need an exact match.
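The version-mismatch failure is cheap to guard against: store the embedding model identifier alongside the index and refuse to compare vectors from a different model. A defensive sketch, with illustrative field names and a placeholder embedder:

```python
# Metadata stored with the index; field names are an assumption.
INDEX_METADATA = {"embedding_model": "text-embedder-v2", "dim": 768}

def embed_query(text, model_id="text-embedder-v2"):
    # Fail loudly instead of silently comparing incompatible vectors.
    if model_id != INDEX_METADATA["embedding_model"]:
        raise ValueError(
            f"query model {model_id!r} != index model "
            f"{INDEX_METADATA['embedding_model']!r}: scores would be meaningless"
        )
    return [0.0] * INDEX_METADATA["dim"]  # placeholder for the real model call

vec = embed_query("find the Q3 SKU report")          # OK: models match
# embed_query("...", model_id="text-embedder-v3")    # would raise ValueError
```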
What Knowledge Graphs Bring to Retrieval
A knowledge graph represents entities as nodes and relationships as typed edges. "Author A wrote Paper B, which cites Paper C, which was funded by Organization D" becomes a traversable structure. Queries that require following these chains — breadth-first or depth-first — become expressible operations instead of approximations.
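The chain in that sentence can be stored as typed-edge triples and walked edge type by edge type. A minimal sketch with a linear scan standing in for a real graph index:

```python
# The chain "A wrote B, B cites C, C funded_by D" as typed edges (triples).
triples = [
    ("AuthorA", "WROTE", "PaperB"),
    ("PaperB", "CITES", "PaperC"),
    ("PaperC", "FUNDED_BY", "OrgD"),
]

def follow(node, rel):
    """Return targets reachable from `node` via edges of type `rel`."""
    return [t for (s, r, t) in triples if s == node and r == rel]

# Who funded the work cited by AuthorA's paper? Three typed hops.
paper = follow("AuthorA", "WROTE")[0]
cited = follow(paper, "CITES")[0]
print(follow(cited, "FUNDED_BY"))  # ['OrgD']
```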
Microsoft's GraphRAG research demonstrated this concretely on news corpora. Using thousands of Russian and Ukrainian news articles, GraphRAG discovered entities like "Novorossiya" and traced relationship chains across documents where baseline vector RAG returned nothing relevant. The difference wasn't retrieval quality on individual documents — it was the ability to connect information that was distributed across documents and linked only through entity relationships.
The architecture has two stages: first, an LLM extracts entities and relationships from source documents; second, graph community detection generates domain-specific summaries at multiple granularities. The retrieval path then uses graph traversal rather than nearest-neighbor search, while still supporting vector and full-text search where those are appropriate.
For entity-heavy domains — healthcare records, legal documents, financial filings, technical specifications — this architecture produces measurable results. A healthcare implementation connecting patient records, research literature, and treatment protocols reported 18% improvement in diagnostic accuracy for complex cases and 31% reduction in treatment plan development time. The improvement came from surfacing relationships that existed in the data but were invisible to semantic similarity.
Hybrid Retrieval: When You Need Both
The practical architecture for most production systems isn't a choice between graphs and vectors — it's running both and merging results.
HybridRAG pipelines operate in three stages:
- Vector search finds semantically relevant entities and document chunks
- Graph traversal explores relationships between the entities the vector search returned
- Weighted merging combines results using a scoring system that accounts for both similarity and graph distance
This addresses the core weakness of each approach in isolation. Pure vector search surfaces relevant individual documents but misses relationships. Pure graph retrieval is precise about relationships but depends entirely on the quality of entity extraction and graph construction. Hybrid retrieval uses the vector component to find entry points into the graph, then uses graph traversal to follow the connections that semantics alone can't surface.
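The weighted-merging stage can be sketched in a few lines. This assumes each retriever returns `{doc_id: score}` maps already normalized to [0, 1]; the weights are illustrative and would be tuned per corpus.

```python
def merge(vector_hits, graph_hits, w_vec=0.6, w_graph=0.4):
    """Linear weighted merge of two normalized score maps, best first."""
    docs = set(vector_hits) | set(graph_hits)
    scored = (
        (d, w_vec * vector_hits.get(d, 0.0) + w_graph * graph_hits.get(d, 0.0))
        for d in docs
    )
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

vector_hits = {"d1": 0.9, "d2": 0.7}   # semantic entry points
graph_hits = {"d2": 0.8, "d3": 0.9}    # documents one hop away in the graph
print(merge(vector_hits, graph_hits))
# d2 ranks first: relevant both semantically and structurally
```

A document that scores moderately in both channels can outrank one that scores highly in only one, which is exactly the behavior hybrid retrieval is after.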
The technology stack for a hybrid pipeline typically involves parallel indexing: a vector index (Pinecone, Qdrant, Weaviate) handles semantic retrieval; a graph database (Neo4j, Amazon Neptune) handles relationship traversal; a BM25 index handles keyword and identifier exact match. The retrieval layer executes all three, merges results with a weighted scorer, and passes the combined context to the generation step.
Research benchmarks on this hybrid approach show +11% context relevance and +8% factual correctness compared to either approach alone. The gains come from different query types playing to each component's strengths.
The Construction Problem
The main reason teams stick with vector search isn't theoretical — it's operational. Building and maintaining a knowledge graph is significantly more expensive than maintaining an embedding index.
Entity resolution is the hardest part. Unresolved and semantically duplicated entities create inconsistent graphs where "JPMorgan" and "JP Morgan" are separate nodes with no connecting edge, making relationship queries fail. Traditional knowledge graph construction is often domain-dependent, semi-automated, and requires predefined entity taxonomies with extensive manual annotation.
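The simplest layer of entity resolution is canonicalizing surface forms before node creation. The alias table below is a hand-maintained illustration; real pipelines layer fuzzy matching, embedding similarity, and human review on top of it.

```python
import re

# Hand-maintained alias table; illustrative, not a standard resource.
ALIASES = {
    "jpmorgan": "JPMorgan Chase",
    "jp morgan": "JPMorgan Chase",
    "jp morgan chase": "JPMorgan Chase",
    "jpm": "JPMorgan Chase",
}

def canonicalize(mention):
    # Lowercase, strip punctuation, look up; fall back to the raw form.
    key = re.sub(r"[^a-z0-9 ]", "", mention.lower()).strip()
    return ALIASES.get(key, mention)

nodes = {canonicalize(m) for m in ["JPMorgan", "JP Morgan Chase", "JPM"]}
print(nodes)  # one node instead of three disconnected ones
```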
LLM-assisted construction has made this substantially more tractable. Tools like Neo4j's LLM Graph Builder can convert unstructured text — PDFs, documents, transcripts — to knowledge graphs using multiple LLM providers without manual annotation. The accuracy limitations (LLMs hallucinate relationships, miss nuanced entity types outside their training distribution) require validation pipelines and human review for high-stakes domains, but the operational cost is now an order of magnitude lower than it was before 2024.
Dynamic updates remain unsolved at scale. Embedding indexes can be updated incrementally by adding new vectors. Knowledge graphs require more complex update logic: new entities must be resolved against existing ones, new relationships must be validated for consistency, and graph structure changes can affect traversal paths in non-obvious ways. For rapidly changing corpora, this is a genuine constraint.
The temporal dimension is also worth calling out explicitly. Vector search has no concept of time — a 2015 document and a 2025 document are equidistant from the query if their embeddings are similar. Hybrid systems with temporal metadata filters handle this by restricting retrieval to documents within specified time windows, but this is bolted on rather than intrinsic to the retrieval primitive.
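A bolted-on temporal filter is typically just a metadata predicate applied before or after scoring. A sketch, assuming each hit carries a `year` field (the field name and data are illustrative):

```python
hits = [
    {"doc": "report-2015", "year": 2015, "score": 0.91},
    {"doc": "report-2024", "year": 2024, "score": 0.88},
    {"doc": "report-2025", "year": 2025, "score": 0.86},
]

def filter_window(results, start, end):
    """Keep only hits whose year falls inside the inclusive window."""
    return [h for h in results if start <= h["year"] <= end]

recent = filter_window(hits, 2023, 2025)
print([h["doc"] for h in recent])  # ['report-2024', 'report-2025']
```

Note that the 2015 document had the highest similarity score; only the metadata filter, not the retrieval primitive itself, keeps it out.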
Decision Framework
The question of when to use which approach comes down to what the query structure actually requires:
Use vector search when:
- Queries are semantic and open-ended ("what documents discuss X?")
- The corpus is unstructured and relationship extraction would be expensive
- Retrieval latency is a hard constraint
- The domain doesn't have well-defined entities with consistent naming
Use knowledge graphs when:
- Queries require multi-hop traversal ("find all X that are related to Y through Z")
- Entity disambiguation across documents is critical
- Aggregation over relationships is needed
- The domain has well-defined entities (healthcare, legal, technical specifications)
- Cross-document reasoning is the primary use case
Use hybrid when:
- The query mix includes both semantic and relational patterns
- Entity-heavy queries must work alongside free-form semantic questions
- The corpus is large enough that neither approach alone covers all retrieval needs
What This Means for Your Architecture
The failure mode to watch for in production is confident retrieval of incorrect context. Vector search returns high-similarity results even when those results don't actually answer the query — the similarity score tells you something about semantic relatedness, not about whether the retrieved content contains the answer. For multi-hop queries, the retrieved documents are often individually relevant but fail to surface the connecting information. The system appears to be working until you run evaluations on the query types that actually matter for your use case.
If your query distribution is dominated by simple semantic questions, vector search is the right default. It's cheaper to build, easier to maintain, and handles the majority of retrieval workloads adequately.
If your query distribution includes multi-hop reasoning, entity disambiguation, or aggregation across relationships — and most enterprise knowledge bases eventually encounter all three — the hybrid architecture is the right investment. Start by instrumenting retrieval failures against real queries to identify which failure mode you're actually hitting. Entity-heavy queries degrading with scale and relationship queries returning irrelevant but semantically similar context are the clearest signals that graph traversal needs to enter the picture.
The underlying principle is that retrieval primitives should match query structure. Semantic similarity is a powerful primitive for a specific class of questions. Relationship traversal is a different primitive for a different class. Getting them confused is the root cause of most retrieval failures that can't be fixed by scaling the embedding model.