5 posts tagged with "data-residency"

The Data Residency Contract Your Provider Honored at the API Boundary and Broke at the Cache

June 3, 2026 · 9 min read

Tian Pan

Software Engineer

Your residency audit traced every outbound request from the tenant's traffic, watched it terminate on a hostname in Frankfurt, and signed off. The audit was correct about everything it measured. It was also looking at the wrong layer. The request went to the EU. The bytes that satisfied the request — the cached prefix the provider hashed and pulled from the nearest available node — lived in us-east-1. Your regional endpoint promised you a destination. The cache promised nothing, because the cache was a different product, governed by a different SLA, designed for cost rather than for compliance.

The customer's auditor caught it. Not yours. A different vendor's incident report mentioned that prompt cache placement was decoupled from inference region, and the customer's GRC team asked the obvious follow-up question: where do our prefixes go? The contract amendment to close the gap took ninety days. The renewal got suspended. The team that wrote the integration had done nothing wrong by the documentation they were handed.

The PII Redactor That Scrubbed the User's Question and Left the Prompt Cache Untouched

June 3, 2026 · 11 min read

Tian Pan

Software Engineer

A customer audit finds eleven months of verbatim user PII sitting in a Redis cluster nobody on the residency team knew existed. No system was compromised. No attacker got in. The data was written there on purpose, by a service the inference team built and named "prompt cache," as a performance optimization. The redactor on the analytics path worked perfectly the entire time. The redactor was simply not on this path.

The breach is real anyway. Under GDPR, retention beyond the contracted thirty days is enough; the data does not need to have leaked to trigger Article 33 notification obligations. The residency team's inventory listed every log, every warehouse, every queue — and missed the cache because the cache was on the inference team's side of the org chart. The privacy boundary that everyone trusted ran straight down the analytics pipeline and stopped at the wall where the LLM stack began.

Retrieval Pipeline Residency: The Embedding That Crossed the Border Your LLM Call Didn't

June 2, 2026 · 9 min read

Tian Pan

Software Engineer

The team that ships "AI for EU customers" usually ships exactly one residency control: an inference endpoint pinned to an EU region. The procurement team gets a DPA, the architecture diagram gets a green checkmark next to "model hosted in Frankfurt," and the launch proceeds. What the diagram doesn't show is that the customer's verbatim query gets vectorized by a US-hosted embedding API on its way to the model, that the vector store the query is matched against has its operational plane in us-east-1, that the rerank model is a third-party SaaS deployed wherever the vendor chose, that the prompt cache is keyed regionally on hits and globally on misses, and that the trace store logging the retrieved chunks has a 30-day retention bucket that replicates cross-region for redundancy.

The inference layer respects residency. The retrieval pipeline doesn't even know it's a participant.

This is the gap where most "GDPR-compliant" RAG deployments fail an audit the team didn't realize was coming. The fix isn't another control on the model call — it's recognizing that data residency is a property of every component the customer's bytes touch, and that the team owning "the LLM" owns at most one of the six surfaces involved.
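One way to make "every component the customer's bytes touch" concrete is to keep the surfaces in a single inventory and check each one against the allowed regions. The following is a minimal sketch, not a real audit tool: the `Surface` type, the component names, and the region assignments are hypothetical placeholders modeled on the six surfaces listed above, and the AWS-style region names are an assumption about how a provider labels locations.

```python
from dataclasses import dataclass

# Regions the contract permits (illustrative; eu-central-1 is Frankfurt).
ALLOWED = frozenset({"eu-central-1", "eu-west-1"})


@dataclass(frozen=True)
class Surface:
    """One component that customer bytes pass through (hypothetical model)."""
    name: str
    regions: frozenset  # every place the component may store or process data


def violations(surfaces, allowed=ALLOWED):
    """Map each out-of-policy surface to its sorted list of offending regions."""
    found = {}
    for surface in surfaces:
        extra = surface.regions - allowed
        if extra:
            found[surface.name] = sorted(extra)
    return found


# Illustrative inventory only; a real one comes from vendor docs and contracts.
PIPELINE = [
    Surface("inference", frozenset({"eu-central-1"})),
    Surface("embedding_api", frozenset({"us-east-1"})),
    Surface("vector_store", frozenset({"eu-central-1", "us-east-1"})),
    Surface("reranker", frozenset({"unknown"})),
    Surface("prompt_cache", frozenset({"eu-central-1", "global"})),
    Surface("trace_store", frozenset({"eu-central-1", "us-west-2"})),
]

if __name__ == "__main__":
    for name, regions in violations(PIPELINE).items():
        print(f"{name}: outside policy in {', '.join(regions)}")
```

Treating "unknown" and "global" as ordinary region values is deliberate: a surface whose placement nobody can state fails the check the same way a US-hosted one does.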

The Inference Region Your Data Residency Policy Forgot to Pin

June 2, 2026 · 9 min read

Tian Pan

Software Engineer

The compliance audit always starts with the same question and your team always answers it the same way. "Where is customer data processed?" In the EU region, the slide deck says, and the SDK config screenshot confirms it, and the DPA promises it. Then the auditor pulls a sample of last quarter's request logs, joins them to the provider's per-request region header, and the room gets quiet. Something like four percent of EU enterprise prompts were served by a US-region inference node during a forty-minute capacity event the team did not know happened. The cache that holds reusable prefixes was in the global pool. The trace store the support team queries is in us-east. The DPA was a slide deck. The contract was a routing hint.

This is the kind of incident that does not show up in a postmortem because no service degraded. The model returned an answer, the user got a response, the latency graph stayed flat. The thing that broke is a thing the dashboards were never wired to see: the geographic path of the request through the provider's infrastructure. Engineers who would never take a us-east-1 URL as proof that "the request actually executed in us-east-1" routinely make that exact mistake at the LLM API layer, because the provider's region parameter looks like the AWS one, behaves like the AWS one in the happy path, and silently degrades to "best effort" the moment the preferred region runs out of GPU.
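The auditor's join is small enough to run before the auditor does. The sketch below assumes each log record already carries the tenant's contracted region and the region the provider reported serving from; the field names `tenant_region` and `served_region` are hypothetical, and a given provider may expose the served region differently or not at all.

```python
from collections import Counter

# Regions the EU contract permits (illustrative values).
EU_REGIONS = frozenset({"eu-central-1", "eu-west-1"})


def out_of_region(records, allowed=EU_REGIONS):
    """Return (share, breakdown) for EU-tenant requests served outside `allowed`.

    `records` is an iterable of dicts with the hypothetical keys
    'tenant_region' and 'served_region'. A record with no served region
    is counted as a violation, because an unknown location cannot be
    shown to be compliant.
    """
    eu_requests = [r for r in records if r["tenant_region"] in allowed]
    strays = [r for r in eu_requests if r.get("served_region") not in allowed]
    share = len(strays) / len(eu_requests) if eu_requests else 0.0
    breakdown = Counter(r.get("served_region") for r in strays)
    return share, breakdown
```

Run on a schedule against sampled request logs, a check like this turns the geographic path into something a dashboard can alert on, rather than something discovered in the audit room.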

Multi-Region AI Deployment: Data Residency, Model Parity, and the Latency Tax Nobody Budgets

May 3, 2026 · 10 min read

Tian Pan

Software Engineer

When engineers budget for multi-region AI deployments, they typically account for two variables: infrastructure cost per region and replication overhead. What they consistently underestimate — sometimes catastrophically — are three costs that only appear once you're live: model parity gaps that make your EU cluster produce different outputs than your US cluster, KV cache isolation penalties that make every token in GDPR territory more expensive to generate, and silent compliance violations that trigger when your retry logic routes a French user's data through Virginia.
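The retry case is the easiest of the three to guard against in code. Here is a minimal sketch of a retry loop that fails closed instead of crossing a border, assuming a caller-supplied `send(region)` function that raises `ConnectionError` on a regional failure. `ResidencyError` and `call_with_pinned_retry` are hypothetical names for illustration, not part of any provider SDK.

```python
class ResidencyError(RuntimeError):
    """Raised when no permitted region can serve the request."""


def call_with_pinned_retry(send, candidate_regions, allowed_regions):
    """Try regions in order, skipping any that the tenant's policy forbids.

    `send` is a caller-supplied callable taking a region name. Regions
    outside `allowed_regions` are never attempted, so exhausting the
    permitted list surfaces as an error rather than as a silent
    cross-border fallback.
    """
    permitted = [r for r in candidate_regions if r in allowed_regions]
    if not permitted:
        raise ResidencyError("no permitted region configured")
    last_error = None
    for region in permitted:
        try:
            return send(region)
        except ConnectionError as exc:
            last_error = exc  # remember the failure, try the next permitted region
    raise ResidencyError(f"all permitted regions failed: {permitted}") from last_error
```

The design choice is that an outage in the permitted regions becomes a visible failure for that tenant. Whether a failed request is preferable to a noncompliant one is a policy decision, but it should be made explicitly rather than by the default fallback list.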

A German bank spent 14 months deploying a large open-source model on-premises to satisfy GDPR requirements. That's not unusual. What's unusual is that the engineers who proposed the architecture understood the compliance constraint upfront. Most don't until an incident report forces the conversation.
