17 posts tagged with "vector-database"

The PII Leak in Your RAG Pipeline: Why Your Chatbot Knows Things It Shouldn't

· 10 min read
Tian Pan
Software Engineer

Your new internal chatbot just told an intern the salary bands for the entire engineering department. The HR director didn't configure anything wrong. No one shared a link they shouldn't have. The system just... retrieved it, because the intern asked about "compensation expectations for engineers."

This is the RAG privacy failure mode that most teams don't see coming. It's not a bug in the traditional sense—it's a fundamental mismatch between how retrieval works and how access control is supposed to work.

Knowledge Graph vs. Vector Store: Choosing Your Retrieval Primitive

· 9 min read
Tian Pan
Software Engineer

Most teams stumble into vector stores because they're easy to start with, then discover a category of queries that simply won't work no matter how well they tune chunk size or embedding model. That's not a tuning problem — it's an architectural mismatch. Vector similarity and graph traversal are fundamentally different retrieval mechanisms, and the gap matters more as your queries get harder.

This is not a "use both" post. There are real trade-offs, and getting the choice wrong costs months of engineering time. Here's what the decision actually looks like in practice.

Choosing a Vector Database for Production: What Benchmarks Won't Tell You

· 10 min read
Tian Pan
Software Engineer

When engineers evaluate vector databases, they typically load ANN benchmarks and pick whichever one tops the recall-at-10 chart. Three months later, they're filing migration tickets. The benchmarks measured query throughput on a static, perfectly indexed dataset with a single client. Production looks nothing like that.

This guide covers the five dimensions that predict whether a vector database holds up under real workloads — and a decision framework for matching those dimensions to your stack.

Vector Store Access Control: The Row-Level Security Problem Most RAG Teams Skip

· 11 min read
Tian Pan
Software Engineer

Most teams building multi-tenant RAG systems get authentication right and authorization wrong. They validate that users are who they claim to be, then retrieve documents from a shared vector index and filter the results before sending them to the LLM. That filter—the post-retrieval kind—is security theater. By the time you remove unauthorized documents from the list, they're already in the model's context window.

The real problem runs deeper than a misplaced filter. Most RAG systems treat document authorization as an ingest-time concern ("can this user upload this document?") but fail entirely to enforce it at query time ("can this user see documents matching this query?"). The gap between those two checkpoints is where silent data leakage lives—and it's where most production incidents originate.
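To make the query-time check concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the post's implementation: the `Chunk` type, the `allowed_groups` field, the `search_authorized` function, and the toy in-memory index are all hypothetical names invented for this example. The idea it shows is that the caller's groups narrow the candidate set before any similarity ranking happens, so an unauthorized chunk never enters the result list at all.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed chunk carrying its own access-control metadata."""
    doc_id: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_authorized(
    index: list[Chunk],
    query_embedding: tuple[float, ...],
    user_groups: set[str],
    k: int = 3,
) -> list[Chunk]:
    """Rank only the chunks the caller is allowed to see.

    Authorization is applied BEFORE ranking, at query time, rather than
    as a filter over an already-retrieved top-k list.
    """
    visible = [c for c in index if c.allowed_groups & user_groups]
    ranked = sorted(
        visible,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Toy index (assumed data, for illustration only).
INDEX = [
    Chunk("salary-bands", (0.9, 0.1), frozenset({"hr"})),
    Chunk("eng-handbook", (0.7, 0.3), frozenset({"hr", "engineering"})),
    Chunk("lunch-menu", (0.1, 0.9), frozenset({"hr", "engineering"})),
]

intern_hits = search_authorized(INDEX, (1.0, 0.0), {"engineering"})
hr_hits = search_authorized(INDEX, (1.0, 0.0), {"hr"})
```

In this toy data the best semantic match for the query is the HR-only chunk, yet it is absent from `intern_hits` because it was never a candidate. A production vector store would typically express the same constraint as a metadata filter attached to the query itself; the exact mechanism depends on the database.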

The RAG Freshness Problem: How Stale Embeddings Silently Wreck Retrieval Quality

· 12 min read
Tian Pan
Software Engineer

Your RAG system launched three months ago with impressive retrieval accuracy. Today, it's confidently wrong about a third of what users ask — and nothing in your monitoring caught the change. No errors logged. No latency spikes. The semantic similarity scores look healthy. But the documents being retrieved are outdated, and the model answers with full confidence because the retrieved context looks authoritative.

This is the RAG freshness problem: semantic similarity does not care about time. An embedding of a deprecated API reference scores just as high as a current one. A policy document from last quarter retrieves ahead of the updated version. The system doesn't know and can't tell. Most teams discover their index is weeks or months stale only after a user complaint — and by then, users have already quietly stopped trusting it.
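Because similarity scores carry no notion of time, staleness has to be detected from metadata instead. The sketch below is one hedged way to do that in Python, assuming each indexed document records when it was embedded and when its source last changed. The `IndexedDoc` type, its field names, and the `find_stale` function are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IndexedDoc:
    """Hypothetical bookkeeping record kept alongside each embedding."""
    doc_id: str
    embedded_at: datetime
    source_updated_at: datetime


def find_stale(
    docs: list[IndexedDoc],
    now: datetime,
    max_age: timedelta,
) -> list[tuple[str, str]]:
    """Return (doc_id, reason) pairs for embeddings that need a refresh."""
    stale = []
    for doc in docs:
        if doc.source_updated_at > doc.embedded_at:
            # The source changed after we embedded it: the vector is outdated.
            stale.append((doc.doc_id, "source changed after embedding"))
        elif now - doc.embedded_at > max_age:
            # Nothing changed that we know of, but the embedding is old
            # enough that it should be re-verified against the source.
            stale.append((doc.doc_id, "embedding older than max_age"))
    return stale


# Assumed sample data, for illustration only.
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
DOCS = [
    IndexedDoc("api-reference", datetime(2024, 3, 1, tzinfo=timezone.utc),
               datetime(2024, 5, 15, tzinfo=timezone.utc)),
    IndexedDoc("policy", datetime(2024, 1, 1, tzinfo=timezone.utc),
               datetime(2023, 12, 1, tzinfo=timezone.utc)),
    IndexedDoc("faq", datetime(2024, 5, 20, tzinfo=timezone.utc),
               datetime(2024, 5, 1, tzinfo=timezone.utc)),
]

report = find_stale(DOCS, NOW, max_age=timedelta(days=90))
```

Running a check like this on a schedule and alerting on the size of `report` gives monitoring a signal that errors, latency, and similarity scores do not provide. The 90-day threshold here is an arbitrary placeholder, not a recommendation.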