The Query Rewriting Layer Your RAG Pipeline Skipped
When a RAG system answers wrong, the first instinct on most teams is to blame the encoder. Swap to a bigger embedding model. Try a domain-tuned one. Bump the dimension count. Three sprints later the recall curve has nudged a few points and the user complaints look the same.
The diagnosis was wrong. Most retrieval failures aren't embedding failures. They're query-shape failures — and no amount of vector tuning fixes a vocabulary mismatch that exists before the encoder ever runs.
A user types "how do I cancel." The relevant document is titled "Subscription Lifecycle Management" and uses words like "termination," "billing cycle close," and "service deactivation." There is no encoder in the world that pulls those two strings into the same neighborhood by lexical luck. The cosine similarity gap is real, and it lives in the input, not the model. The query rewriting layer that goes ahead of retrieval is the thing most pipelines skip and then spend a quarter trying to compensate for downstream.
