
12 posts tagged with "api-design"


Prompt Deprecation Contracts: Why a Wording Cleanup Is a Breaking Change

· 9 min read
Tian Pan
Software Engineer

A four-word edit to a system prompt — "respond using clean JSON" replacing "output strictly valid JSON" — produced no eval movement, shipped on a Thursday, and was rolled back at 4am Friday after structured-output error rates went from 0.3% to 11%. The prompt did not get worse. It got different, and the parsers downstream of it had been pinned, without anyone noticing, to the literal phrase "strictly valid."

This is the failure mode that most prompt-engineering teams have not yet built tooling for: the prompt was treated as text the author owned, when it was in fact a contract with consumers the author never met. Some of those consumers are other prompts that quote the original verbatim. Some are tool descriptions whose JSON schema fields anchor on a particular adjective. Some are evals whose rubrics ask the judge to check for "the strictly valid format." And some are parsers — the most brittle category — whose regexes were calibrated to the exact preamble the model used to emit.

A "small wording cleanup" silently breaks parsers, shifts judge calibration, and invalidates weeks of eval runs. None of these failures show up on the PR. All of them show up on the dashboard a week later as drift.

Agent Traffic Is Not Human Traffic: Designing APIs for Two Species of Caller

· 11 min read
Tian Pan
Software Engineer

The API you shipped two years ago was designed for a single species of caller: a person, behind a browser or a mobile client, clicking once and waiting for a response. That assumption is now wrong for roughly half the traffic on every interesting endpoint. The other half of the traffic is agents — your own, your customers', third-party integrations using your endpoints as tools — and they have different physics. They burst. They retry forever. They parallelize. They parse error strings literally. They act on behalf of a human who will not be available to clarify intent when something breaks.

Most of the production weirdness landing in postmortems this year traces back to one architectural mistake: treating both species as the same caller class. Rate limits sized for human pacing get blown apart by an agent's parallel fanout. Error messages designed to be human-readable get parsed wrong by an agent that retries forever on a 400. Idempotency assumptions that humans satisfy by default get violated when an agent retries the same payload from a recovered checkpoint. Auth logs lose the ability to distinguish "the user did this" from "the user's agent did this on the user's behalf."

The fix is not a smarter WAF or a bigger rate-limit bucket. It is a deliberate API design that names two caller classes, treats their traffic as different shapes, and records the delegation chain so accountability survives the indirection.
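As a minimal sketch of what "naming two caller classes" can look like at the edge, assuming an illustrative X-Delegation-Chain header rather than any established standard:

```python
from dataclasses import dataclass

HUMAN, AGENT = "human", "agent"

@dataclass
class CallerContext:
    caller_class: str            # "human" or "agent"
    on_behalf_of: str            # the end user accountability traces back to
    delegation_chain: list[str]  # e.g. ["user:42", "agent:planner", "agent:executor"]

def classify(headers: dict[str, str]) -> CallerContext:
    raw = headers.get("X-Delegation-Chain", "")
    chain = [hop.strip() for hop in raw.split(",") if hop.strip()]
    caller = AGENT if chain and chain[-1].startswith("agent:") else HUMAN
    return CallerContext(
        caller_class=caller,
        on_behalf_of=chain[0] if chain else headers.get("X-User-Id", "anonymous"),
        delegation_chain=chain,
    )

# Different physics, different budgets: (requests/min, burst). Numbers are
# placeholders; the point is that the two classes never share one bucket.
RATE_LIMITS = {HUMAN: (60, 10), AGENT: (600, 200)}
```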

Conversation State Is Not a Chat Array: Multi-Turn Session Design for Production

· 10 min read
Tian Pan
Software Engineer

Most multi-turn LLM applications store conversation history as an array of messages. It works fine in demos. It breaks in production in ways that take days to diagnose because the failures look like model problems, not infrastructure problems.

A user disconnects mid-conversation and reconnects to a different server instance—session gone. An agent reaches turn 47 in a complex task and the payload quietly exceeds the context window—no error, just wrong answers. A product manager asks "can we let users try a different approach from step 3?"—and the engineering answer is "no, not with how we built this." These are not edge cases. They are the predictable consequences of treating conversation state as a transient array rather than a first-class resource.
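A minimal sketch of that "first-class resource" framing, with an in-memory dict standing in for whatever shared store (Redis, Postgres) a real deployment would use:

```python
import uuid

# Conversation state as an addressable resource rather than a transient
# array living in one server process. All names here are illustrative.
STORE: dict = {}

def create_session(user_id: str) -> str:
    session_id = str(uuid.uuid4())
    STORE[session_id] = {"user": user_id, "turns": []}
    return session_id

def append_turn(session_id: str, role: str, content: str) -> int:
    turns = STORE[session_id]["turns"]
    turns.append({"role": role, "content": content})
    return len(turns) - 1  # turns have stable indices, so they are addressable

def branch_from(session_id: str, turn_index: int) -> str:
    # "Try a different approach from step 3" becomes a cheap prefix copy
    # instead of an impossible feature request.
    new_id = create_session(STORE[session_id]["user"])
    STORE[new_id]["turns"] = STORE[session_id]["turns"][: turn_index + 1]
    return new_id
```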

The ORM Impedance Mismatch for AI Agents: Why Your Data Layer Is the Real Bottleneck

· 9 min read
Tian Pan
Software Engineer

Most teams building AI agents spend weeks tuning prompts and evals, benchmarking model choices, and tweaking temperature — while their actual bottleneck sits one layer below: the data access layer that was designed for human developers, not agents.

The mismatch isn't subtle. ORMs like Hibernate, SQLAlchemy, and Prisma, combined with REST APIs that return paginated, single-entity responses, produce data access patterns that are exactly wrong for autonomous AI agents. The result is token waste, rate-limit failures, cascading N+1 database queries, and agents that hallucinate simply because they can't afford to load the context they need.

This post is about the structural problem — and what an agent-optimized data layer actually looks like.
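For a taste of the shape difference, here is a minimal sketch of an agent-facing read path, with in-memory dicts standing in for the ORM and an invented orders schema:

```python
# Hypothetical data; in production these would be database tables.
ORDERS = {7: {"id": 7, "status": "shipped", "total": 129.90}}
CUSTOMERS = {7: {"name": "Acme Corp", "tier": "enterprise"}}
ORDER_ITEMS = {7: [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]}

def order_bundle(order_id: int, max_items: int = 50) -> dict:
    # One round trip, one denormalized payload, explicitly bounded;
    # the alternative is three sequential tool calls plus pagination
    # cursors, each costing the agent a full prompt of context.
    items = ORDER_ITEMS.get(order_id, [])
    return {
        "order": ORDERS[order_id],
        "customer": CUSTOMERS[order_id],
        "items": items[:max_items],
        "truncated": len(items) > max_items,  # tell the agent, don't hide it
    }
```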

AI-Native API Design: Why REST Breaks When Your Backend Thinks Probabilistically

· 11 min read
Tian Pan
Software Engineer

Most backend engineers can recite the REST contract from memory: client sends a request, server processes it, server returns a status code and body. A 200 means success. A 4xx means the client did something wrong. A 5xx means the server broke. The response is deterministic, the timeout is predictable, and idempotency keys guarantee safe retries.

LLM backends violate every one of those assumptions. A 200 OK can mean your model hallucinated the entire response. A successful request can take twelve minutes instead of twelve milliseconds. Two identical requests with identical parameters will return different results. And if your server times out mid-inference, you have no idea whether the model finished or not.

Teams that bolt LLMs onto conventional REST APIs end up with a graveyard of hacks: timeouts that kill live agent tasks, clients that treat hallucinated 200s as success, retry logic that charges a user's credit card three times because idempotency keys weren't designed for probabilistic operations. This post walks through where the mismatch bites hardest and what the interface patterns that actually hold up in production look like.
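A minimal sketch of two of those patterns, submit-and-poll plus idempotency keys that map to a stored job rather than a fresh model run; the in-memory stores and field names are illustrative:

```python
import uuid

JOBS: dict = {}
IDEMPOTENCY_CACHE: dict = {}

def submit(prompt: str, idempotency_key: str) -> dict:
    # For a probabilistic operation, "idempotent" has to mean "same key,
    # same job"; re-running the model and hoping the output matches is
    # exactly how the credit card gets charged three times.
    if idempotency_key in IDEMPOTENCY_CACHE:
        return IDEMPOTENCY_CACHE[idempotency_key]
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "running", "result": None}
    response = {"job_id": job_id, "status": "accepted"}  # think HTTP 202
    IDEMPOTENCY_CACHE[idempotency_key] = response
    return response

def poll(job_id: str) -> dict:
    # A twelve-minute inference outlives any sane request timeout, so the
    # client polls (or subscribes) instead of holding a connection open.
    return JOBS[job_id]
```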

API Design for AI-Powered Endpoints: Versioning the Unpredictable

· 8 min read
Tian Pan
Software Engineer

Your /v1/summarize endpoint worked perfectly for eighteen months. Then you upgraded the underlying model. The output format didn't change. The JSON schema was identical. But your downstream consumers started filing bugs: the summaries were "too casual," the bullet points were "weirdly specific," the refusals on edge cases were "different." Nothing broke in the traditional sense. Everything broke in the AI sense.

This is the versioning problem that REST and GraphQL were never designed to solve. Traditional API contracts assume determinism: the same input always produces the same output. An AI endpoint's contract is probabilistic — it includes tone, reasoning style, output length distribution, and refusal thresholds, all of which can drift when you swap or update the underlying model. The techniques that work for database-backed APIs are necessary but not sufficient for AI-backed ones.
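One way to make that probabilistic contract visible is a behavior fingerprint alongside the schema version; a minimal sketch, in which the model ID, prompt, and header name are all placeholders:

```python
import hashlib

# The fingerprint changes whenever anything that can shift output behavior
# changes, not just the JSON schema.
MODEL_ID = "provider/model-2024-08-01"
SYSTEM_PROMPT = "Summarize the document in three bullet points."
SCHEMA_VERSION = "v1"  # the part REST versioning already covers

def behavior_version() -> str:
    fingerprint = hashlib.sha256(
        f"{MODEL_ID}\n{SYSTEM_PROMPT}".encode()
    ).hexdigest()[:12]
    return f"{SCHEMA_VERSION}+behavior.{fingerprint}"

# A consumer can pin the full string and alert on drift even when the
# schema-visible contract is unchanged, e.g. via a response header like:
# X-Behavior-Version: v1+behavior.3f6c1a9d02be
```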

Behavioral SLAs for AI-Powered APIs: Writing Contracts for Non-Deterministic Outputs

· 10 min read
Tian Pan
Software Engineer

Your payment service has a 99.9% uptime SLA. Requests either succeed or fail with a documented error code. When something breaks, you know exactly what broke.

Now imagine you've shipped a smart invoice-parsing API that wraps an LLM. One Monday morning, your largest customer calls: "Your API returned a valid JSON object, but the total_amount field is off by a factor of ten on invoices with foreign currencies." Your service returned HTTP 200. Your uptime dashboard is green. By every traditional SLA metric, you didn't break anything. But you absolutely broke something — and you have no contractual language to even describe what went wrong.

This is the gap at the center of most AI API deployments today. The contract that governs what your API promises was written for deterministic systems, and LLMs are not deterministic systems.
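A minimal sketch of the contractual language that gap calls for: a behavioral SLO scored on sampled traffic, with the invoice-total rule and the 98% floor invented for illustration:

```python
# The scorer is a stand-in; in practice it might be a labeled holdout,
# a deterministic rules check, or an LLM judge.
SLO_FLOOR = 0.98  # "total_amount within 1% on at least 98% of sampled invoices"

def within_tolerance(parsed: dict, truth: dict) -> bool:
    expected = truth["total_amount"]
    return abs(parsed["total_amount"] - expected) <= 0.01 * abs(expected)

def behavioral_slo_met(samples: list[tuple[dict, dict]]) -> bool:
    # samples: (parsed_response, ground_truth) pairs drawn from production.
    accuracy = sum(within_tolerance(p, t) for p, t in samples) / len(samples)
    return accuracy >= SLO_FLOOR  # HTTP 200 and a green dashboard are not enough
```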

What Semantic Versioning Actually Means for AI Agents

· 10 min read
Tian Pan
Software Engineer

Your customer service agent has been running reliably for three months. A routine model update rolls in on a Tuesday. By Wednesday afternoon, three downstream services are silently parsing the wrong fields from the agent's responses—the JSON keys shifted subtly but nothing returned an error. By Thursday you've traced a drop in order completions to a JSON field renamed from "status" to "current_state". The model updated, the agent stayed at v2.1.0, and nobody got paged.

This is the versioning gap that nobody in traditional API design had to solve. Semver works when you can deterministically reproduce outputs from a specification. AI agents can't make that promise. Yet downstream services depend on their behavior just as critically as they depend on any microservice API. The gap between "we tagged a release" and "downstream consumers are protected" has never been wider.
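A minimal sketch of the kind of gate that catches the rename before Thursday: replay golden prompts against both models and treat any change in schema-visible keys as a major bump. The canned outputs below stand in for real agent calls:

```python
import json

# Canned outputs replace real agent invocations; the rename is the
# "status" -> "current_state" shift from the story above.
CANNED = {
    ("Where is order 7?", "agent-v2.1.0"): '{"status": "shipped", "eta_days": 2}',
    ("Where is order 7?", "candidate"): '{"current_state": "shipped", "eta_days": 2}',
}

def run_agent(prompt: str, model: str) -> str:
    return CANNED[(prompt, model)]

def key_set(raw: str) -> frozenset:
    return frozenset(json.loads(raw).keys())

def requires_major_bump(golden_prompts: list[str], old: str, new: str) -> bool:
    return any(
        key_set(run_agent(p, old)) != key_set(run_agent(p, new))
        for p in golden_prompts
    )

assert requires_major_bump(["Where is order 7?"], "agent-v2.1.0", "candidate")
```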

Documenting Probabilistic Features: The Missing Layer Between Model Behavior and Developer Onboarding

· 10 min read
Tian Pan
Software Engineer

Your documentation says the /summarize endpoint returns a concise summary. That is true. It returns a different concise summary every time, sometimes misses a key point, occasionally returns structured JSON when you forgot to specify format in the prompt, and degrades silently after a model update you didn't know happened. None of this appears in the docs.

Traditional API documentation captures contracts: given input X, expect output Y. AI-powered features break that model at its foundation. There is no stable contract to document. The same prompt, same model, same parameters — different output. And yet teams ship these features with the same style of documentation they'd write for a database query: a function signature, a return type, maybe a sentence about error codes.

The gap between what your docs say and what your feature actually does is where developer trust goes to die.
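A minimal sketch of what documenting the distribution could look like for the /summarize example: a machine-readable behavior appendix kept next to the schema docs and regenerated from eval runs. Every number here is invented:

```python
SUMMARIZE_BEHAVIOR = {
    "endpoint": "/summarize",
    "deterministic": False,
    "observed": {
        "length_tokens_p50": 96,
        "length_tokens_p95": 210,
        "key_point_recall": 0.91,      # from the last eval run
        "unprompted_json_rate": 0.02,  # returns JSON without being asked
    },
    "known_failure_modes": [
        "misses a key point on documents over ~8k tokens",
        "format drifts after provider-side model updates",
    ],
    "last_verified_model": "provider/model-2024-08-01",
}
```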

The Enterprise API Impedance Mismatch: Why Your AI Agent Wastes 60% of Its Tokens Before Doing Anything Useful

· 8 min read
Tian Pan
Software Engineer

Your AI agent is brilliant at reasoning, planning, and generating natural language. Then you point it at your enterprise SAP endpoint and it spends 4,000 tokens trying to understand a SOAP envelope. Welcome to the impedance mismatch — the quiet tax that turns every enterprise AI integration into a token bonfire.

The mismatch isn't just about XML versus JSON. It's a fundamental collision between how LLMs think — natural language, flat key-value structures, concise context — and how enterprise systems communicate: deeply nested schemas, implementation-specific naming, pagination cursors, and decades of accumulated protocol conventions. Unlike a human developer who reads WSDL documentation once and moves on, your agent re-parses that complexity on every single invocation.
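A minimal sketch of the obvious countermeasure: an adapter that pays the parsing tax once, server-side, and hands the agent a flat key-value view. The toy envelope is illustrative:

```python
import xml.etree.ElementTree as ET

ENVELOPE = """<Envelope><Body><OrderResponse>
  <Header><OrderID>7</OrderID><Status>SHIPPED</Status></Header>
</OrderResponse></Body></Envelope>"""

def flatten(elem: ET.Element, prefix: str = "") -> dict:
    # Collapse a deeply nested enterprise payload into dotted key paths,
    # which is far closer to how an LLM consumes context.
    out = {}
    for child in elem:
        key = f"{prefix}{child.tag}".lower()
        if len(child):  # still nested: recurse with a dotted path
            out.update(flatten(child, f"{key}."))
        else:
            out[key] = child.text
    return out

print(flatten(ET.fromstring(ENVELOPE)))
# {'body.orderresponse.header.orderid': '7',
#  'body.orderresponse.header.status': 'SHIPPED'}
```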

LLM Output as API Contract: Versioning Structured Responses for Downstream Consumers

· 10 min read
Tian Pan
Software Engineer

In 2023, a team at Stanford and UC Berkeley ran a controlled experiment: they submitted the same prompt to GPT-4 in March and again in June. The task was elementary — identify whether a number is prime. In March, GPT-4 was right 84% of the time. By June, using the exact same API endpoint and the exact same model alias, accuracy had fallen to 51%. No changelog. No notice. No breaking change in the traditional sense.

That experiment crystallized a problem every team deploying LLMs in multi-service architectures eventually hits: model aliases are not stable contracts. When your downstream payment processor, recommendation engine, or compliance system depends on structured JSON from an LLM, you've created an implicit API contract — and implicit contracts break silently.
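A minimal sketch of both defenses, with the contract expressed as a plain type map (a real setup might reach for jsonschema or Pydantic); field names are illustrative:

```python
import json

PINNED_MODEL = "gpt-4-0613"  # an exact snapshot, never a floating alias
CONTRACT_V2 = {"status": str, "total_amount": float, "currency": str}

def validate_v2(raw: str) -> dict:
    # Gate the LLM's output against a versioned schema before any
    # downstream service sees it, so contract breaks are loud, not silent.
    payload = json.loads(raw)
    for field, typ in CONTRACT_V2.items():
        if not isinstance(payload.get(field), typ):
            raise ValueError(f"contract v2 violated: {field} missing or wrong type")
    return payload  # only now is it safe to hand downstream
```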

Agent-Friendly APIs: What Backend Engineers Get Wrong When AI Becomes the Client

· 11 min read
Tian Pan
Software Engineer

In 2024, automated bot traffic surpassed human traffic on the internet for the first time. Gartner projects that more than 30% of new API demand by 2026 will come from AI agents and LLM tools. And yet only 24% of organizations explicitly design APIs with AI clients in mind.

That gap is where production systems break. Not because the LLMs are bad, but because APIs built for human developers have assumptions baked in that silently fail when an autonomous agent is the caller. The agent can't ask for clarification, can't read a doc site, and can't decide on its own whether a 422 means "fix your request" or "try again in a few seconds."
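A minimal sketch of an error body that resolves exactly that 422-versus-retry ambiguity; the shape loosely follows RFC 7807 problem+json, and the extension fields are illustrative:

```python
def problem(status: int, title: str, *, retryable: bool,
            retry_after: int | None = None) -> dict:
    body = {
        "status": status,
        "title": title,
        "retryable": retryable,  # the agent's whole decision, made explicit
    }
    if retry_after is not None:
        body["retry_after_seconds"] = retry_after
    return body

FIX_YOUR_REQUEST = problem(422, "invoice_date must be ISO 8601", retryable=False)
TRY_AGAIN_LATER = problem(429, "rate limit exceeded", retryable=True, retry_after=30)
```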

This post is for the backend engineer who just found out their service is being called by an AI agent — or who is about to build one that will be.