Cross-Encoder Reranking in Practice: What Cosine Similarity Misses
Your RAG pipeline retrieves the top 10 documents and your LLM still gives a wrong answer. You increase the retrieval count to 50. Still wrong. The frustrating part: the correct document was in your vector store the whole time—it was just ranked 23rd. This is not a recall problem. It's a ranking problem, and cosine similarity is the culprit.
Vector search does a decent job of finding semantically adjacent content. But "semantically adjacent" and "most useful for this specific query" are not the same thing. Cosine similarity measures the angle between two vectors in embedding space, and that angle only captures a coarse notion of topical proximity. What it cannot capture is the fine-grained interaction between the specific words in your query and the specific words in a document—the difference between "how to prevent buffer overflows" and "buffer overflow exploit techniques" is subtle at the vector level but critical for your retrieval system.
