Data-Sensitivity-Tier Model Routing: Governing Which Model Sees Which Data

11 min read
Tian Pan
Software Engineer

Your AI system routed a patient query to a self-hosted model at 9 AM. At 11 AM, that model's pod restarted during a deployment. The request queue backed up, the router detected a timeout, and it fell back to the cloud LLM you use for generic queries. The query completed successfully. No alerts fired. Your monitoring dashboard showed green. Somewhere in that exchange, protected health information traveled to a vendor with whom you have no Business Associate Agreement.

That's not a hypothetical. It's the default behavior of nearly every AI routing stack that wasn't explicitly designed to prevent it.

Most AI routing today is a cost and latency optimization problem. Route easy queries to cheap fast models, hard queries to expensive capable ones, and use a fallback when the primary is unavailable. This logic is correct for a commodity content use case and completely wrong for an enterprise system that handles regulated data.

The missing dimension is data sensitivity. Which model is allowed to see this request matters more than which model is cheapest or fastest for it. Most teams know this in principle and ignore it in practice until a compliance review forces the question.

The Routing Decision Everyone Is Already Making — Just Not Explicitly

Every AI product makes routing decisions, even if those decisions are "always use GPT-4o" or "always use our hosted Llama instance." The difference is whether sensitivity is part of that decision.

Current routing optimizes for three variables: cost per token, latency percentiles, and capability match for query complexity. Tools like OpenRouter, AI gateways, and custom routers expose these knobs well. Intelligent provider selection can reduce inference costs by 20-40%. These are real wins worth pursuing.

But there's a fourth variable that most routers treat as out-of-scope: the privacy classification of the data in the request. Is this a public help center query? An internal employee question that includes HR context? A customer query that surfaces account details? A clinical note with PHI?

The routing decision for each of these is different — not because they need different capability levels, but because they have different permission sets for which models can legally and contractually see them.

What Data Sensitivity Tiers Actually Look Like

Enterprise data classification frameworks converge on three to four tiers:

  • Public: No sensitivity constraints. Generic help center content, product documentation, anonymized FAQ data. Any capable model is fine.
  • Internal: Business context that isn't regulated but isn't meant to leave the organization. Employee names in org chart queries, internal tool documentation, non-regulated financial summaries. Prefer private endpoints; cloud fallback acceptable under most data agreements.
  • Confidential: PII, financial records, business-sensitive communications. GDPR-scope data for EU users. Requires private or contractually-covered endpoints. No uncontrolled cloud fallback.
  • Restricted: PHI under HIPAA, credentials, biometrics, data with sector-specific regulations (PCI, FedRAMP, ITAR). Must run on private or on-premise infrastructure. Explicit vendor contracts required. Failover to uncovered endpoints is a compliance violation, not a degradation.

The classification of a specific piece of data isn't always obvious, and context collapses tiers upward. A name alone is internal. A name plus a diagnosis is PHI-level. A name plus a financial account is confidential. Your classification layer has to reason about combinations, not just individual field presence.
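A rough sketch of combination-aware classification (the entity names, rules, and tier ordering are illustrative, not a complete taxonomy): each rule maps a combination of detected entity types to the minimum tier that combination implies, and the request takes the highest tier of any matching rule.

```python
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Illustrative combination rules: a set of co-occurring entity
# types implies a minimum tier for the whole request.
COMBINATION_RULES = {
    frozenset({"person_name"}): Tier.INTERNAL,
    frozenset({"person_name", "account_number"}): Tier.CONFIDENTIAL,
    frozenset({"person_name", "diagnosis"}): Tier.RESTRICTED,  # PHI-level
}

def classify(entities: set[str]) -> Tier:
    """Return the highest tier implied by any rule whose entity
    combination is fully present in the request."""
    tier = Tier.PUBLIC
    for combo, combo_tier in COMBINATION_RULES.items():
        if combo <= entities:  # all entities in the rule are present
            tier = max(tier, combo_tier)
    return tier
```

The key property is that tiers only escalate: a name plus a diagnosis matches both the name-only rule and the name-plus-diagnosis rule, and the max wins.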

The critical point: classification must propagate forward through the data pipeline. When a source column is tagged as PII, that tag needs to reach the LLM router — not stay behind in the data catalog where it was originally applied.

The Gap Between Classification and Routing

Most organizations that have made any investment in AI governance have two systems that don't talk to each other: a data classification system and an AI routing system.

The data catalog knows which datasets contain sensitive information. Databricks Unity Catalog, Atlan, Alation, and similar tools classify data at rest, propagate tags through data lineage, and enforce access policies for analytics workloads. This is mature, well-understood infrastructure for the data warehouse.

The AI router knows which models are available, their costs, and their latency profiles. It doesn't know anything about what's in the requests it's routing.

The gap between them is where compliance risk accumulates silently. A request arrives at the routing layer carrying PHI. The router evaluates cost and latency and sends it to the cheapest available provider. The classification system never saw the request; the routing system never saw the classification.

Closing this gap requires a classification layer that operates on in-flight requests, not just at-rest datasets. This means PII detection running in the request path — Named Entity Recognition models, regex patterns for known identifier formats, and rules that account for field combinations. The detected sensitivity tier is then passed to the routing decision as a first-class input, alongside cost and capability signals.
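A minimal sketch of the request-path detection step, using regex patterns for well-known identifier formats. The patterns and the NER hook are illustrative; a production classifier would add an NER model (e.g., spaCy or Microsoft Presidio) and organization-specific rules on top.

```python
import re

# Illustrative patterns for common identifier formats. Real
# deployments maintain many more and pair them with an NER model.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_entities(text: str) -> set[str]:
    """Return the set of entity types detected in free text."""
    found = {name for name, pat in PATTERNS.items() if pat.search(text)}
    # An NER model hook would go here, adding entity types such as
    # "person_name" or "diagnosis" that regexes cannot catch.
    return found
```

The output set feeds the combination rules above the router, so the detected tier arrives as a first-class routing input rather than an afterthought.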

What Enforcement Actually Looks Like

Once you have sensitivity-aware routing, you have to decide what happens when the right model isn't available. This is where most implementations make a mistake.

Advisory mode: the router prefers the appropriate tier but falls back to a less restrictive tier if unavailable, and maybe logs a warning. This is the wrong default for anything above public data. Advisory enforcement is identical to no enforcement when the private endpoint goes down at 2 AM on a busy night.

Enforcement mode: requests with confidential or restricted sensitivity fail explicitly if the appropriate model tier is unavailable. The request returns a 503. Nothing is logged as "succeeded." An alert fires. Your on-call team investigates.

This feels wrong when you think about availability. It feels correct when you think about compliance. A HIPAA violation that stays silent isn't less of a violation because your system showed green. The observable event to the compliance auditor is not "did your AI product stay available" but "did PHI travel to an uncovered vendor." Explicit failure is the right response.

Hard enforcement requires that sensitivity constraints be encoded as blocking conditions, not preferences. In practice this looks like:

  • Routing rules that evaluate sensitivity tier as a hard constraint before cost/latency optimization
  • Fallback chains that enumerate only same-tier or higher-privacy options — a restricted-tier request can fall back to a different restricted-tier endpoint (e.g., a secondary on-premise cluster), but never to a confidential-tier or public endpoint
  • Audit logs that capture every routing decision with its sensitivity tier and the model selected — not just errors, but all decisions, so you can demonstrate to auditors which requests went where
  • Explicit configuration distinguishing "this model is preferred for cost reasons" from "this model is the only permitted option for this tier"
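Sketched as code (endpoint names, the tier-to-endpoint mapping, and the availability check are all hypothetical), the hard-constraint evaluation happens before any cost or latency scoring ever runs:

```python
# Hypothetical tier -> permitted-endpoint mapping. Order within a
# list is the fallback chain; fallbacks never cross tier boundaries.
PERMITTED = {
    "restricted":   ["private-llm-cluster-a", "private-llm-cluster-b"],
    "confidential": ["private-llm-cluster-a", "cloud-with-baa"],
    "internal":     ["private-llm-cluster-a", "cloud-generic"],
    "public":       ["cloud-generic", "cloud-cheap"],
}

class TierUnavailableError(Exception):
    """Raised instead of falling back to an uncovered endpoint.
    The caller maps this to an HTTP 503 and fires an alert."""

def route(tier: str, is_available) -> str:
    # Hard constraint first: only endpoints permitted for this tier
    # are candidates. Cost/latency ranking may reorder candidates
    # within this set -- it must never widen the set.
    for endpoint in PERMITTED[tier]:
        if is_available(endpoint):
            return endpoint
    raise TierUnavailableError(f"no permitted endpoint for tier={tier}")
```

The shape matters more than the details: the sensitivity filter produces the candidate set, and optimization only chooses within it.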

Where the Classification Taxonomy Breaks Down

Even a well-designed system hits friction at the edges.

Dynamic content combinations are hard. A request might start as innocuous and become sensitive halfway through an agentic workflow. An agent that starts with a public-tier user intent might pull in customer account data through a tool call, suddenly making the full context restricted-tier. If classification only runs at request ingress, this gets missed. Classification needs to run on tool call responses too, and the sensitivity tier of the overall context needs to ratchet upward when any component is restricted.
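One way to express that ratchet, assuming a numeric ordering of the four tiers described earlier (the class and method names are illustrative):

```python
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

class AgentContext:
    """Tracks the sensitivity tier of everything an agent has seen.
    The tier only ratchets upward; it never relaxes mid-workflow."""

    def __init__(self, initial: Tier = Tier.PUBLIC):
        self.tier = initial

    def ingest(self, content_tier: Tier) -> Tier:
        # Classification runs on each tool-call response, producing
        # content_tier; the overall context takes the maximum.
        self.tier = max(self.tier, content_tier)
        return self.tier
```

Every subsequent model call routes on `context.tier`, so a public-intent conversation that pulls in restricted data is treated as restricted from that point on.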

Metadata propagation across system boundaries is harder than it sounds. If your classification system lives in the data warehouse and your AI gateway lives in the application layer, getting tags to propagate cleanly requires deliberate integration work that often falls through the cracks of multiple team ownership boundaries. The data team owns the catalog; the platform team owns the gateway; nobody owns the handoff.

The classification model itself can fail. ML-based PII detection misses obfuscated identifiers and catches false positives that inflate costs. Rule-based patterns require maintenance as identifier formats evolve. Neither is reliable enough to be the sole enforcement mechanism, which is why most mature implementations use defense in depth: request-level classification as a signal, contract-level controls at the vendor layer (BAAs, DPAs, ZDR agreements), and data minimization upstream to reduce what reaches the AI layer in the first place.

The Audit Discovery Problem

The reason most teams build sensitivity-tier routing reactively is that the failure mode is invisible until it isn't.

Routing PHI to an uncovered vendor doesn't produce an error. It produces a successful response. The failure is a compliance state, not a system state — and compliance states only become visible during audits, breach investigations, or regulatory inquiries. By that point you're explaining historical decisions that your systems have no record of.

The pattern that recurs in post-audit assessments: an organization discovers they've been routing sensitive data to the wrong tier only after an external audit surfaces a gap, because their monitoring was measuring system health (latency, error rates, model availability) rather than compliance health (which sensitivity tier went to which endpoint, whether any routing decisions violated policy).

The instrumentation investment is small compared to the remediation cost. A routing decision log that captures request timestamp, detected sensitivity tier, model selected, and whether any hard constraints were evaluated is a few hundred bytes per request. That log is also your compliance evidence when an auditor asks whether PHI ever traveled to a non-covered vendor.
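Those few hundred bytes might look like this; the field names are illustrative, and append-only JSON lines is one common shape for this kind of log:

```python
import json
import time
import uuid

def audit_record(tier: str, endpoint: str, constraints_evaluated: bool) -> str:
    """Serialize one routing decision as a JSON line for the
    compliance log -- written for every request, not just errors."""
    return json.dumps({
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "sensitivity_tier": tier,
        "endpoint_selected": endpoint,
        "hard_constraints_evaluated": constraints_evaluated,
    })
```

Because every decision is recorded, answering "did any restricted-tier request ever reach a cloud endpoint" becomes a log query instead of a forensic reconstruction.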

Building the Enforcement Stack

A sensitivity-tier routing implementation has three components that need to exist before the routing logic can work:

Classification at the request layer. A PII detection step in the request path that produces a sensitivity tier. This is distinct from, but ideally synchronized with, your data catalog's classification. The request-layer classifier needs to handle unstructured text, not just structured field names.

A routing policy that treats sensitivity as a hard constraint. This is configuration, not code. It looks like: tier=restricted → must use endpoint set {private-llm-cluster-a, private-llm-cluster-b}; if none available → fail with explicit error. Tier=confidential → prefer private endpoints; cloud fallback permitted only if BAA confirmed for that provider. Tier=public → optimize for cost and latency.
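One possible declarative shape for that policy (field names, endpoint names, and condition keys are all illustrative; the point is the distinction between "required" and "preferred"):

```python
# Illustrative declarative policy -- configuration, not routing code.
# "required" is a hard constraint: fail explicitly if unavailable.
# "preferred" allows alternatives that still satisfy the stated
# contractual condition.
ROUTING_POLICY = {
    "restricted": {
        "mode": "required",
        "endpoints": ["private-llm-cluster-a", "private-llm-cluster-b"],
        "on_unavailable": "fail_503",
    },
    "confidential": {
        "mode": "preferred_private",
        "endpoints": ["private-llm-cluster-a"],
        "cloud_fallback_if": "baa_confirmed",
    },
    "public": {
        "mode": "optimize",
        "objective": ["cost", "latency"],
    },
}
```

Keeping this as data rather than code means compliance can review and sign off on the policy without reading the router implementation.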

Audit logging that captures routing decisions. Every request, with its detected tier and the endpoint selected. Separate from your operational latency/error logs, because it serves a different consumer: the compliance function, not the on-call engineer.

The third piece is consistently where implementations cut corners. Audit log storage is boring infrastructure with no visible ROI until you need it. Build it anyway.

Where the Market Is

The major cloud AI platforms are adding sensitivity-aware routing features, which is a useful signal that the practice is becoming expected. Azure AI Foundry's model router supports policy-driven selection with region and sensitivity constraints. AWS SageMaker Catalog added restricted classification terms that trigger routing to compliance-approved project boundaries. OpenRouter's sovereign routing restricts inference to specific geographic regions for data residency compliance. Vercel's AI Gateway added zero data retention routing that pins requests to providers with ZDR policies.

None of these eliminate the need to design the enforcement logic yourself. They provide the enforcement primitives; you still have to decide which sensitivity tiers map to which model pools, what failover behavior looks like, and how to integrate your existing data classification infrastructure with the request path.

Roughly 40% of enterprises with formal AI governance programs have some form of routing policy. Fewer than 15% integrate sensitivity-tier classification into that routing. The gap is mostly an integration and ownership problem, not a technical capability problem — the tools exist, but no single team owns both halves of the pipeline.

That number will compress quickly once the first wave of post-audit remediation stories becomes public. The question is whether your organization builds this before or after the compliance review that makes it mandatory.

What to Do Now

If you're starting from scratch: build the audit log first. Before you have classification and enforcement, at minimum log which model endpoints are handling requests for which user roles and data contexts. That gives you retroactive defensibility and reveals which flows are actually sensitive in practice before you invest in automated classification.

If you have classification but no routing integration: define the tier-to-endpoint mapping as an explicit policy document before you automate it. Teams that skip this step end up with classification tags that never influence routing: the gateway sends everything to the cheapest model because it was never told what the tiers mean.

If you have routing but no enforcement: flip advisory to hard-fail for your top tier first. The operational disruption of explicit failures for restricted data is a feature, not a bug — it surfaces the cases where private infrastructure isn't reliable enough to be the only option, forcing the conversation about backup capacity before an audit forces it for you.

The silent compliance failure is not a hard problem to prevent. It's a hard problem to prioritize before it's already happened.
