AI Shadow IT: When Product Teams Build Their Own LLM Proxy

· 11 min read
Tian Pan
Software Engineer

The shadow IT incident your platform team is going to investigate in Q3 already happened in January. It looks like this: a senior engineer on a product team has a launch this month. The platform team's "official" LLM gateway is on the roadmap for "next quarter." So the engineer opens an OpenAI account on a corporate credit card, drops the API key into a .env file, ships the feature, and hits the public deadline. The launch is a success. Six months later, the FinOps team finds three vendor accounts nobody can attribute, the security team finds prompts containing customer data routed to a region not covered by the data processing agreement, and the platform team discovers that the gateway it spent two quarters building has 14% adoption because every team that needed AI shipped without it.

This is not a security failure or a discipline failure. It is a platform-product velocity mismatch, and treating it as anything else guarantees the next gateway you ship will have the same adoption problem.

The 2026 numbers say this is no longer a rare pattern. Surveys of knowledge workers consistently show 75%+ usage of generative AI tools at work, with roughly half engaging in unauthorized AI behaviors that the security team has no visibility into. The State of FinOps 2026 Report puts AI spend as a managed line item for 98% of respondents — up from a fraction of that two years ago. Average enterprise AI budgets have moved from $1.2M annually in 2024 to roughly $7M in 2026 even as per-token costs dropped by orders of magnitude, because the volume scaled faster than the discount. None of those numbers describe a fringe problem. They describe an operating model that hasn't caught up with the demand it's supposed to support.

Why the Side-Channel Wins on the Day It's Built

The platform team's gateway and the product team's .env file are not competing on technical merit on the day the side-channel ships. They are competing on three friction surfaces, and the platform team usually loses on all three.

Time to first token. A product engineer who needs to call an LLM today wants to be calling one inside the next hour. A vendor signup, a credit card, and a copy-pasted SDK example get them there. A platform gateway that requires a JIRA ticket, a security review, a per-team key provisioning workflow, and an internal SDK migration does not get them there in the same hour. By the time the gateway is ready, the side-channel has already been load-tested in production and the team has built around its quirks.

Capability surface. The vendor's native SDK exposes every feature the day the vendor ships it: a new model, a new tool-calling format, a new prompt cache breakpoint, a new embedding endpoint. The platform gateway exposes a curated subset, lagging the vendor by some number of weeks. For a feature that depends on a capability the gateway doesn't proxy yet, the gateway is not a constraint — it's a blocker. The product team correctly observes that the gateway is "not ready" for their use case, and the side-channel is the only path forward.

Failure attribution. When a request through the gateway fails, the engineer now has two systems to blame and two on-call rotations to navigate. When a request through the direct vendor SDK fails, there's one stack trace and one status page. In an incident, the gateway is a layer you have to first prove innocent before you can move on. Engineers respond to that incentive by routing their hottest path around it.

The platform team's instinct is to treat the side-channel as a discipline problem and prohibit it. The side-channel exists because the platform's friction was higher than the alternative's friction at the moment of decision. Prohibition without reducing the friction just moves the side-channel further underground.

What Actually Breaks When the Side-Channel Wins

The damage from a shadow LLM proxy is not abstract. It compounds across four surfaces, and each one shows up to a different team weeks or months after the original decision.

Cost attribution becomes archaeology. The FinOps team gets a vendor invoice with a single line item: $87,000 in monthly token spend, no team tags, no per-feature breakdown, no idea which 12 services account for 90% of the bill. Without per-request attribution, there is no path to chargeback, no way to flag a runaway loop in someone's coding agent, and no way to rank which features have unit economics that work. The standard for enterprise gateways in 2026 is per-request cost attribution with input, output, and reasoning tokens broken out by team, project, customer, and feature — and that data archived to a queryable store. The shadow account ships with none of it.
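As a rough illustration of what per-request attribution involves, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `UsageRecord` fields, the `example-model` name, and the prices are hypothetical and do not come from any vendor's price sheet or SDK.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens (assumption, not real vendor pricing).
PRICE_PER_MTOK = {
    "example-model": {"input": 2.00, "output": 8.00, "reasoning": 8.00},
}


@dataclass(frozen=True)
class UsageRecord:
    """One gateway request, tagged with the dimensions FinOps wants to slice by."""
    team: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0


def request_cost(rec: UsageRecord) -> float:
    """Cost of a single request in USD, with each token type priced separately."""
    prices = PRICE_PER_MTOK[rec.model]
    total = (
        rec.input_tokens * prices["input"]
        + rec.output_tokens * prices["output"]
        + rec.reasoning_tokens * prices["reasoning"]
    )
    return total / 1_000_000


def cost_by(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Sum request costs grouped by one attribution dimension, e.g. 'team'."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += request_cost(rec)
    return dict(totals)


records = [
    UsageRecord("search", "autocomplete", "example-model", 1_000_000, 500_000),
    UsageRecord("support", "ticket-summary", "example-model", 2_000_000, 250_000, 250_000),
]
print(cost_by(records, "team"))
print(cost_by(records, "feature"))
```

The point of the sketch is the shape of the record, not the arithmetic: once every request carries team and feature tags at the moment it is made, chargeback and runaway-loop detection become group-by queries instead of invoice forensics.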

The audit trail is the application log. Compliance frameworks like SOC 2, GDPR, ISO 27001, and the EU AI Act assume the organization can answer "what prompt and what output, for which user, against which model version, on which date, by whose authority" for any AI interaction. With the EU AI Act's high-risk system obligations scheduled to apply from August 2026, and the Act's top penalty tier reaching €35M or 7% of global annual turnover, that question is no longer rhetorical. A direct vendor call from application code stores the prompt-response pair only if the application explicitly logs it, in a format the compliance team has to reverse-engineer to query, with retention controlled by whatever defaults the developer chose at 2 AM on launch night. The gateway, done right, makes that record a side effect of the call.
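One way to picture "a side effect of the call" is a wrapper that writes the audit record whether the vendor call succeeds or fails. The sketch below is illustrative only: `audited_call`, its parameters, and the in-memory `sink` list are hypothetical names, and a real gateway would write to durable, access-controlled storage rather than a Python list.

```python
import json
import time
import uuid
from typing import Callable


def audited_call(
    call_model: Callable[[str, str], str],  # hypothetical vendor call: (model_version, prompt) -> output
    sink: list,                             # stand-in for a durable audit store (assumption)
    *,
    user_id: str,
    model_version: str,
    authorized_by: str,
    prompt: str,
) -> str:
    """Call the model and always append one audit record, on success or failure."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "model_version": model_version,
        "authorized_by": authorized_by,
        "prompt": prompt,
        "output": None,
        "error": None,
    }
    try:
        record["output"] = call_model(model_version, prompt)
        return record["output"]
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        # Runs on both paths, so the caller cannot forget to log.
        sink.append(json.dumps(record))


def fake_model(model_version: str, prompt: str) -> str:
    """Stub standing in for a real vendor SDK call."""
    return prompt.upper()


audit_log: list = []
audited_call(
    fake_model,
    audit_log,
    user_id="user-123",
    model_version="example-model-2026-01",
    authorized_by="policy/support-summaries",
    prompt="summarize this ticket",
)
print(audit_log[0])
```

Because the record is written in the `finally` block, the application team gets the prompt, output, user, model version, and authority captured in one consistent format without having to remember to log anything.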

Data-residency contracts become hopes. The DPA with the vendor specifies which regions are allowed to process which categories of data. A direct call from application code does not enforce that contract: the developer has to manually pick the right endpoint, remember which dataset they're working with, and not regress under deadline pressure. Sending personal data to an LLM region not covered by a processing agreement can amount to a personal data breach that must be reported under GDPR Article 33, and the team that finds out is usually the legal team three weeks after the regulator's letter arrives. A gateway can route by data classification automatically; a .env file cannot.
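Routing by data classification can be sketched as a small policy table plus a resolver that fails closed. The classifications, region names, and endpoint URLs below are all hypothetical placeholders chosen for the example; a real deployment would derive the table from the actual DPA.

```python
from typing import Optional

# Hypothetical policy: which regions may process which data classification.
ALLOWED_REGIONS = {
    "public": ["us-east", "eu-west"],
    "customer_pii": ["eu-west"],
}

# Hypothetical regional endpoints (placeholder URLs, not real services).
ENDPOINTS = {
    "us-east": "https://us-east.llm.example.invalid/v1",
    "eu-west": "https://eu-west.llm.example.invalid/v1",
}


class ResidencyError(Exception):
    """Raised when no permitted region exists for the request."""


def resolve_endpoint(classification: str, preferred_region: Optional[str] = None) -> str:
    """Return an endpoint the policy allows for this data classification.

    Unknown classifications and disallowed regions raise instead of falling
    back to a default, so a mistake fails closed rather than leaking data.
    """
    allowed = ALLOWED_REGIONS.get(classification)
    if not allowed:
        raise ResidencyError(f"no residency policy for classification {classification!r}")
    if preferred_region is None:
        region = allowed[0]
    elif preferred_region in allowed:
        region = preferred_region
    else:
        raise ResidencyError(
            f"region {preferred_region!r} may not process {classification!r} data"
        )
    return ENDPOINTS[region]


print(resolve_endpoint("public", "us-east"))
print(resolve_endpoint("customer_pii"))
```

The design choice worth noting is the absence of a default: a request whose classification is missing or unknown is rejected, which moves the decision out of the individual developer's memory and into a table the legal and platform teams can review.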
