3 posts tagged with "build-vs-buy"

Build vs Buy for the AI Gateway: The Decision That Locks in Your Next 18 Months

· 11 min read
Tian Pan
Software Engineer

The build-vs-buy decision for an AI gateway is almost never made on a framework. It is made on instinct in week one by an engineer who likes the problem, and then revisited in month nine by a director who is tired of the bill. Neither moment is when the decision should actually be made, and neither party is evaluating the choice on the axes that matter eighteen months from now.

The seductive thing about the build path is that month one is cheap. A two-hundred-line proxy in front of OpenAI, a switch statement that routes "claude" requests to Anthropic, a retry loop, and the team has shipped what looks like a gateway. Month nine, that proxy is twelve thousand lines of half-finished retry logic, prompt caching with broken invalidation, cost attribution that nobody trusts, fallback routing that triggered the wrong way during the last incident, an observability schema that diverged from the rest of the stack, and per-tenant rate limiting bolted on after the first enterprise customer asked. Every feature is a worse copy of something the buy path would have shipped on day one. The engineer who wrote the original two hundred lines has left.
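For a sense of scale, the month-one proxy described above might look roughly like the sketch below. It is an illustration under assumptions, not anyone's production gateway: the provider callables, `TransientError`, `pick_provider`, and `complete` are all hypothetical names, and the real HTTP calls to each vendor are left out so the routing and retry shape stay visible.

```python
import time
from typing import Callable, Dict

# Hypothetical provider callables: each takes (model, prompt) and returns text.
# In a real proxy these would wrap the vendor HTTP clients.
Provider = Callable[[str, str], str]


class TransientError(Exception):
    """Stand-in for a retryable upstream failure (timeout, 429, 5xx)."""


def pick_provider(model: str) -> str:
    # The "switch statement": route on the model-name prefix.
    if model.startswith("claude"):
        return "anthropic"
    return "openai"


def complete(providers: Dict[str, Provider], model: str, prompt: str,
             max_attempts: int = 3, base_delay: float = 0.5) -> str:
    """Send one request, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    provider = providers[pick_provider(model)]
    for attempt in range(1, max_attempts + 1):
        try:
            return provider(model, prompt)
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Everything in the month-nine list (caching, cost attribution, fallback routing, per-tenant limits) is absent here, which is the point: the sketch is cheap precisely because it leaves out the expensive parts.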

We Already Have That: When AI Features Reinvent Code You Already Own

· 11 min read
Tian Pan
Software Engineer

A team I worked with shipped a "smart" date extractor last quarter. The model parsed natural-language phrases like "next Tuesday" and "two weeks from the 14th," ran in production behind a feature flag, and cost about three cents per request at the chosen tier. Six weeks later, a backend engineer wandered into a design review and mentioned, casually, that the company already had a date parser. It had been written in 2019, lived in a utility module nobody on the AI team had read, handled 99.4% of the same inputs at sub-millisecond latency, and ran for free. The AI feature did not get pulled. It got rationalized — "the model handles the long tail" — and the team moved on, having shipped a more expensive, slower, less accurate version of something the company already owned.

This is not a one-off story. It is the dominant failure mode for AI features inside companies older than the AI team. The pattern repeats: a smart classifier duplicates a regex pipeline written years ago, a retrieval system fetches a vendor list that an internal service has been maintaining as a typed table, an agent learns to extract entities a parser already extracts deterministically. The AI feature ships with a quality bar lower than that of the deterministic system its builders didn't know existed, and the team that built the deterministic system finds out at a cross-team meeting.

Build vs Buy for Guardrails: The Moderation API Is Now on Your Safety-Critical Path

· 10 min read
Tian Pan
Software Engineer

The hosted moderation API you bought to ship faster is now a synchronous external dependency on your safety-critical path. That sentence isn't an opinion — it's the architecture diagram, redrawn honestly. On the day the vendor degrades, you have two choices and both of them are bad: fail open and the guardrail is useless precisely when something is probably wrong, or fail closed and a guardrail outage becomes a feature outage. Most teams discover which one they picked during the incident, not before.
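One way to make that choice before the incident rather than during it is to write the failure policy into the call site. The sketch below is a minimal illustration under assumptions: `FailPolicy`, `is_allowed`, and the `check` callable (which stands in for the vendor's moderation call and returns True when the text is acceptable) are hypothetical names, and the half-second timeout is an arbitrary placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable


class FailPolicy(Enum):
    OPEN = "open"      # guardrail unavailable: let the content through
    CLOSED = "closed"  # guardrail unavailable: block the content


# Small shared pool so a slow vendor call can be abandoned after a deadline.
_pool = ThreadPoolExecutor(max_workers=4)


def is_allowed(text: str, check: Callable[[str], bool],
               policy: FailPolicy, timeout_s: float = 0.5) -> bool:
    """Run the (hypothetical) moderation check under a timeout.

    If the check answers in time, its verdict is returned. If it raises
    or times out, the explicit policy decides instead of an accident.
    """
    future = _pool.submit(check, text)
    try:
        return bool(future.result(timeout=timeout_s))
    except Exception:
        # Covers both vendor errors and the timeout raised by result().
        return policy is FailPolicy.OPEN
```

Neither branch makes the outage painless. The sketch only forces the fail-open versus fail-closed decision to appear in code review, where it can be argued about per endpoint.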

The reason teams reach for a vendor here isn't laziness. Building a content classifier, a prompt-injection detector, and a PII redactor in-house looks like a six-month detour from the actual product, and the vendor has a free tier and a five-minute integration. The integration is genuinely fast. The architectural consequence is that a third party now sits in the request path of every user-facing generation, with availability, latency, and behavioral characteristics you don't control and didn't model.

This post is about treating that decision as an architectural one rather than a procurement one.