Skip to main content

15 posts tagged with "vector-database"

View all tags

Choosing a Vector Database for Production: What Benchmarks Won't Tell You

· 10 min read
Tian Pan
Software Engineer

When engineers evaluate vector databases, they typically load ANN benchmarks and pick whoever tops the recall-at-10 chart. Three months later, they're filing migration tickets. The benchmarks measured query throughput on a static, perfectly-indexed dataset with a single client. Production looks nothing like that.

This guide covers the five dimensions that predict whether a vector database holds up under real workloads — and a decision framework for matching those dimensions to your stack.

Vector Store Access Control: The Row-Level Security Problem Most RAG Teams Skip

· 11 min read
Tian Pan
Software Engineer

Most teams building multi-tenant RAG systems get authentication right and authorization wrong. They validate that users are who they claim to be, then retrieve documents from a shared vector index and filter the results before sending them to the LLM. That filter—the post-retrieval kind—is security theater. By the time you remove unauthorized documents from the list, they're already in the model's context window.

The real problem runs deeper than a misplaced filter. Most RAG systems treat document authorization as an ingest-time concern ("can this user upload this document?") but fail entirely to enforce it at query time ("can this user see documents matching this query?"). The gap between those two checkpoints is where silent data leakage lives—and it's where most production incidents originate.

The RAG Freshness Problem: How Stale Embeddings Silently Wreck Retrieval Quality

· 12 min read
Tian Pan
Software Engineer

Your RAG system launched three months ago with impressive retrieval accuracy. Today, it's confidently wrong about a third of what users ask — and nothing in your monitoring caught the change. No errors logged. No latency spikes. The semantic similarity scores look healthy. But the documents being retrieved are outdated, and the model answers with full confidence because the retrieved context looks authoritative.

This is the RAG freshness problem: semantic similarity does not care about time. An embedding of a deprecated API reference scores just as high as a current one. A policy document from last quarter retrieves ahead of the updated version. The system doesn't know and can't tell. Most teams discover their index is weeks or months stale only after a user complaint — and by then, users have already quietly stopped trusting it.