4 posts tagged with "ai-gateway"

The traceparent header your gateway dropped between LLM call and tool execution

June 3, 2026 · 11 min read

Tian Pan

Software Engineer

A user reports that the agent answered correctly but the database update never happened. You open your observability tool, search for the trace ID stamped on the user-facing conversation, and find a clean tree — five LLM calls, four tool decisions, a final response. No errors. Then you search for the tool service that owns the database write, and you find another trace, with the same wall-clock window but a different trace ID, a different root span, and no link back. You search the gateway logs. Three more orphan traces. The agent run that looked like a single coherent interaction in the chat UI fragmented, in your tracing backend, into a forest.

The header that should have stitched it together is traceparent. It is a 55-byte W3C-standard string that every span in a distributed system uses to identify its parent. It is also, in most production LLM agent stacks, dropped at least once between the user's request and the side effect the user actually wanted.
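To make the header concrete, here is a minimal sketch of how a gateway hop could validate an incoming traceparent and derive the header for its outbound call. It assumes the version 00 layout from the W3C Trace Context specification (version, trace ID, parent ID, flags). The function names `parse_traceparent` and `child_traceparent` are hypothetical, and starting a new trace with the sampled flag set is an illustrative choice, not something the post prescribes.

```python
import re
import secrets

# Version 00 layout: 2 hex (version) - 32 hex (trace ID) - 16 hex (parent ID) - 2 hex (flags).
# 2 + 1 + 32 + 1 + 16 + 1 + 2 = 55 characters.
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff is reserved as invalid, and all-zero IDs are not allowed.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def child_traceparent(incoming):
    """Build the outbound header: keep the trace ID, mint a fresh span ID."""
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        # Assumption: with no valid parent, start a new sampled trace.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-{flags}"
```

A hop that forwards the result of `child_traceparent(request_header)` on every outbound call, including tool executions, keeps the downstream spans under the same trace ID. A hop that forwards nothing produces the orphan traces described above.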

Build vs Buy for the AI Gateway: The Decision That Locks in Your Next 18 Months

May 14, 2026 · 11 min read

Tian Pan

Software Engineer

The build-vs-buy decision for an AI gateway is almost never made on a framework. It is made on instinct in week one by an engineer who likes the problem, and then revisited in month nine by a director who is tired of the bill. Neither moment is when the decision should actually be made, and neither party is evaluating the choice on the axes that matter eighteen months from now.

The seductive thing about the build path is that month one is cheap. A two-hundred-line proxy in front of OpenAI, a switch statement that routes "claude" requests to Anthropic, a retry loop, and the team has shipped what looks like a gateway. Month nine, that proxy is twelve thousand lines of half-finished retry logic, prompt caching with broken invalidation, cost attribution that nobody trusts, fallback routing that triggered the wrong way during the last incident, an observability schema that diverged from the rest of the stack, and per-tenant rate limiting bolted on after the first enterprise customer asked. Every feature is a worse copy of something the buy path would have shipped on day one. The engineer who wrote the original two hundred lines has left.
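For a sense of how small the month-one version really is, the core of such a proxy can be sketched in a couple of dozen lines. Everything here is hypothetical: the `ROUTES` table, the `send` callable standing in for an HTTP client, and the backoff parameters are illustrative assumptions, not any vendor's API.

```python
import time

# Hypothetical routing table: model-name prefix -> provider label.
ROUTES = {"claude": "anthropic", "gpt": "openai"}


def pick_provider(model):
    """Route by model-name prefix, the 'switch statement' of a first proxy."""
    for prefix, provider in ROUTES.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"no route for model {model!r}")


def call_with_retry(send, provider, payload, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry on any exception with exponential backoff; re-raise the last failure."""
    for attempt in range(attempts):
        try:
            return send(provider, payload)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Note what the sketch leaves out: it retries every exception identically, knows nothing about rate limits or tenants, and records no cost or trace data. Those gaps are roughly where the other twelve thousand lines come from.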

The AI Gateway Is the SPOF Nobody Named

May 14, 2026 · 10 min read

Tian Pan

Software Engineer

The pitch sounded responsible. "Let's not hardcode OpenAI everywhere — we'll put a thin abstraction in front, then we can swap providers if we need to." Two years later, that thin abstraction is a service with its own deploy pipeline, its own SRE on-call, an eval gate that blocks bad prompts, a semantic cache that saves seven figures a year, a retry policy with provider-specific backoffs, an observability schema every dashboard depends on, and a key vault holding the credentials for six model vendors. Every AI feature in the company terminates there.

It is also, almost by accident, the single point of failure with the worst blast radius in the stack. When the primary LLM provider goes down (and in 2025 OpenAI was tracked as having 294 outage events since January, with Anthropic logging 184.5 hours of total customer impact in December alone), the gateway routes around it and most users never notice. When the gateway itself dies, every AI feature in every product simultaneously stops, the failover that was supposed to fire never gets a chance, and the postmortem opens with "the abstraction layer we built to insulate us from provider outages was the outage."

DLP Belongs in Your AI Gateway, Not Bolted Into Every App

April 26, 2026 · 11 min read

Tian Pan

Software Engineer

The first internal LLM gateway is almost always built for the boring reasons: cost attribution so finance can answer "which team spent the inference budget," rate limiting so one runaway script doesn't burn the monthly quota, provider failover so an OpenAI hiccup doesn't take down the assistant. Data loss prevention shows up on the slide deck, but it ships as "each app team should redact sensitive fields before they call the model." Six months later there are nine apps in production, three half-maintained redaction libraries with subtly different regex sets, two prototypes that bypass the gateway entirely "just for testing," and a customer-data-in-prompt incident that everyone's middleware was supposed to prevent and nobody's did, because nobody's middleware was the canonical egress point.

This is not a tooling problem. It is an architectural mistake. DLP is an egress control, and egress controls only work when the path is mandatory. The moment you let app teams own redaction, you've ceded the property that makes DLP function — that there is exactly one place sensitive data can leave, and you can prove what crossed it. The 2025 LayerX Security report puts the scale of the problem in numbers most teams haven't internalized: GenAI-related DLP incidents more than doubled in early 2025 and now make up 14% of all data-security incidents across SaaS traffic, with employees averaging 6.8 pastes into GenAI tools per day, more than half of which contain corporate information. The shadow path is winning by default.
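As a rough illustration of what "exactly one place" means in code, the sketch below puts a single pattern set and a single audit record on the gateway's outbound path. The two regexes, the `egress` function, and the `send` and `audit_log` parameters are all hypothetical stand-ins; a real deployment would need a vetted detector set and durable audit storage.

```python
import re

# Hypothetical patterns for illustration only; not a complete or vetted set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace matches with a typed placeholder; return (clean_text, findings)."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            findings.append((label, count))
    return text, findings


def egress(app_id, prompt, send, audit_log):
    """The only path to a provider: redact, record what crossed, then forward."""
    clean, findings = redact(prompt)
    audit_log.append({"app": app_id, "findings": findings})
    return send(clean)
```

Because every app calls the same `egress` function, there is one regex set to maintain and one log that shows what left. The guarantee only holds if the network path makes the gateway mandatory, which is the architectural point above.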
