
DLP Belongs in Your AI Gateway, Not Bolted Into Every App

· 11 min read
Tian Pan
Software Engineer

The first internal LLM gateway is almost always built for the boring reasons: cost attribution so finance can answer "which team spent the inference budget," rate limiting so one runaway script doesn't burn the monthly quota, provider failover so an OpenAI hiccup doesn't take down the assistant. Data loss prevention shows up on the slide deck, but it ships as "each app team should redact sensitive fields before they call the model." Six months later there are nine apps in production, three half-maintained redaction libraries with subtly different regex sets, two prototypes that bypass the gateway entirely "just for testing," and a customer-data-in-prompt incident that everyone's middleware was supposed to prevent and nobody's did, because nobody's middleware was the canonical egress point.

This is not a tooling problem. It is an architectural mistake. DLP is an egress control, and egress controls only work when the path is mandatory. The moment you let app teams own redaction, you've ceded the property that makes DLP function — that there is exactly one place sensitive data can leave, and you can prove what crossed it. The 2025 LayerX Security report puts the scale of the problem in numbers most teams haven't internalized: GenAI-related DLP incidents more than doubled in early 2025 and now make up 14% of all data-security incidents across SaaS traffic, with employees averaging 6.8 pastes into GenAI tools per day, more than half of which contain corporate information. The shadow path is winning by default.

Why "Each App Should Redact" Always Loses

The "redact in the app" pattern has a seductive cleanness on a whiteboard. The team that owns the data knows which fields are sensitive. The redaction logic lives next to the business logic that produced the data. Latency is amortized across calls the app already makes. It feels like the right separation of concerns until you watch what actually happens.

App teams ship a redactor. Three months later a new use case needs a slightly different rule — maybe internal employee names should be allowed but external customer names should not — and somebody adds a flag. Six months later there's a pull request that "temporarily disables redaction for the QA harness so we can reproduce the bug" and never gets reverted. A new app launches that copies the redactor from the older app's repo and freezes it; the older app's redactor evolves; the two drift. A vendor SDK gets added that calls the model directly because it was easier than threading the gateway URL through the SDK's config. None of these decisions is unreasonable in isolation. Each one removes a single brick from the wall.

Compare this with how mature organizations handle other egress-sensitive flows. Outbound email goes through one SMTP relay with DKIM, SPF, and bounce handling enforced at the relay rather than per-app. Database access goes through a connection broker that audits queries. External API calls in regulated environments go through a forward proxy. Nobody seriously argues that "each microservice should do its own DKIM signing." DLP for model traffic is the same shape of problem and deserves the same shape of solution: one egress checkpoint, mandatory traversal, policy enforced where the data actually leaves the trust boundary.

The "DRY for redaction" framing also misses the asymmetry of failure. A bug in cost-attribution shows up as a wrong line on a dashboard the FinOps team will reconcile next month. A bug in DLP shows up as customer data sitting in a third-party model provider's retention store, often surfaced first by a regulator or a customer support escalation, occasionally by the discovery that a pretrained completion suggested another customer's PII as a likely autocomplete. The blast radius justifies the centralization in a way that pure code-reuse arguments never do.

What the Gateway-as-Egress-Checkpoint Actually Owns

A gateway designed for DLP, not retrofitted with it, has four properties that distinguish it from a routing proxy with a regex bolt-on.

Mandatory traversal, including dev and CI. The single hardest org commitment in the whole architecture is that there is no second path. No "dev shortcut" that hits the provider's API directly with a personal key. No vendor SDK whose default base URL is the upstream provider. No notebook on a researcher's laptop that gets a one-off API key for an experiment that becomes a quarterly initiative. Network egress policy enforces this — the firewall blocks model provider domains except through the gateway's egress IP. Identity policy enforces it — provider API keys are issued to the gateway, not to humans. Build policy enforces it — CI environments get gateway credentials, not raw provider credentials. The reason this is the hardest commitment is that it makes engineers' lives marginally less convenient, and somebody senior has to be willing to defend the policy when a frustrated team asks for an exception.
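
The build-policy half of this can be partly automated. Here is a minimal sketch of a CI check that fails closed on any model base URL that does not point at the gateway. The hostname `llm-gateway.internal.example`, the config keys, and the `violations` helper are all hypothetical, and a check like this complements the firewall rule rather than replacing it.

```python
from urllib.parse import urlsplit

# Hypothetical gateway hostname; substitute whatever your gateway actually uses.
GATEWAY_HOSTS = frozenset({"llm-gateway.internal.example"})


def violations(config: dict[str, str]) -> list[str]:
    """Return the config keys whose URL does not resolve to a gateway host.

    Fails closed: a URL with no parseable hostname counts as a violation.
    """
    bad = []
    for key, url in config.items():
        host = (urlsplit(url).hostname or "").lower()
        if host not in GATEWAY_HOSTS:
            bad.append(key)
    return bad


if __name__ == "__main__":
    # Illustrative config; in CI this would come from the repo's settings files.
    config = {
        "SUMMARIZER_BASE_URL": "https://llm-gateway.internal.example/v1",
        "PROTOTYPE_BASE_URL": "https://api.openai.com/v1",
    }
    offenders = violations(config)
    if offenders:
        raise SystemExit(f"direct provider access configured in: {offenders}")
```

An allowlist is used instead of a blocklist of provider domains so that a provider nobody thought to list is still rejected.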

Per-route classifier policies. Not every model call needs the same DLP profile. A customer-support summarization endpoint should redact customer PII before sending to a third-party model and reverse the redaction on the way back. An internal code-review agent might be allowed to send proprietary source but must strip customer database snapshots from any logs. A marketing copy generator should refuse if the input contains any customer field at all. Per-route policy means the gateway exposes named routes (not just "the model API") and each route has a declared classifier configuration, retention policy, and provider allowlist. This is the schema that lets security and product reason about the same artifact.
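
To make "named routes with a declared configuration" concrete, here is one possible shape for that schema. This is a sketch under assumptions, not a reference design: the route names, the `customer_pii` detector name, the provider labels, the retention values, and the `resolve` helper are all invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    TOKENIZE = "tokenize"  # swap detected entities for reversible tokens
    REFUSE = "refuse"      # reject the request if the detector fires
    ALLOW = "allow"        # pass the content through unchanged


@dataclass(frozen=True)
class RoutePolicy:
    name: str
    classifiers: dict[str, Action]  # detector name -> action when it fires
    retention_days: int
    provider_allowlist: frozenset[str]


# Hypothetical routes mirroring two of the examples in the text.
ROUTES = {
    "support-summarize": RoutePolicy(
        name="support-summarize",
        classifiers={"customer_pii": Action.TOKENIZE},
        retention_days=30,
        provider_allowlist=frozenset({"provider-a"}),
    ),
    "marketing-copy": RoutePolicy(
        name="marketing-copy",
        classifiers={"customer_pii": Action.REFUSE},
        retention_days=7,
        provider_allowlist=frozenset({"provider-a", "provider-b"}),
    ),
}


def resolve(route: str, provider: str) -> RoutePolicy:
    """Look up a route's policy, rejecting unknown routes and unlisted providers."""
    policy = ROUTES.get(route)
    if policy is None:
        raise PermissionError(f"unknown route: {route}")
    if provider not in policy.provider_allowlist:
        raise PermissionError(f"provider {provider!r} is not allowed on {route}")
    return policy
```

Because an unknown route raises rather than falling back to a default, a new use case cannot reach a provider until someone has written its policy down.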

Structured redaction with reversible vault tokens. The crude version of DLP replaces detected PII with a literal [REDACTED]. This wrecks model output quality, because the model can no longer tell that two [REDACTED] tokens refer to the same person, and it makes responses unusable when the app needs to address the user by name. The 2025 NoPII benchmark of 109 tests found that placeholder masking like [PERSON] or [SSN] dropped output quality to 54-68%, while deterministic tokenization — each entity gets its own unique opaque token like entity_7a3f — preserved 91-96% of quality. The architecture pattern is: detect PII, swap it for a vault token, store the mapping in a short-lived encrypted vault keyed to the request ID, send the tokenized prompt to the provider, then re-hydrate the response on the way back. The provider sees entity_7a3f. The user sees their name. The audit log records that entity_7a3f corresponded to a specific customer record.
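
The detect, tokenize, and re-hydrate loop can be sketched in a few lines. Treat everything here as an assumption made for illustration: the email regex stands in for a real PII classifier, `RequestVault` is a hypothetical in-memory class where a real vault would be encrypted, short-lived, and keyed to the request ID, and the four-hex-digit token is only sized to match the `entity_7a3f` example.

```python
import re
import secrets

# Stand-in detector (assumption): a real gateway would run a PII classifier here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TOKEN = re.compile(r"entity_[0-9a-f]{4}")


class RequestVault:
    """Token map for a single request, held in memory for illustration only."""

    def __init__(self) -> None:
        self._by_value: dict[str, str] = {}
        self._by_token: dict[str, str] = {}

    def _token_for(self, value: str) -> str:
        # Deterministic within the request: the same entity always maps to the same token.
        token = self._by_value.get(value)
        if token is None:
            token = f"entity_{secrets.token_hex(2)}"
            while token in self._by_token:  # retry on the rare collision
                token = f"entity_{secrets.token_hex(2)}"
            self._by_value[value] = token
            self._by_token[token] = value
        return token

    def tokenize(self, text: str) -> str:
        """Replace each detected entity with its opaque token."""
        return EMAIL.sub(lambda m: self._token_for(m.group(0)), text)

    def rehydrate(self, text: str) -> str:
        """Restore original values; tokens this vault never issued are left alone."""
        return TOKEN.sub(lambda m: self._by_token.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    vault = RequestVault()
    prompt = "Reply to ana@example.com and cc bo@example.com about ana@example.com"
    masked = vault.tokenize(prompt)      # this is what the provider would see
    restored = vault.rehydrate(masked)   # this is what the user would see
    print(masked)
    print(restored == prompt)
```

Both occurrences of the first address receive the same token, which is the property that lets the model tell that two mentions refer to the same person.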
