25 posts tagged with "api-design"

The Finish Reason Your Code Never Inspects

· 10 min read
Tian Pan
Software Engineer

Your handler did everything right. The HTTP status was 200. The body parsed. The text field had characters in it. You incremented responses_succeeded, appended the message to the conversation, returned the JSON down to the client, and moved on. The user got a sentence that ended mid-clause, a redacted answer dressed up as a normal one, or a polite refusal phrased as a completion. Your dashboard does not know any of that happened. The provider told you. You did not read the field.

Every major inference API returns a stop signal alongside the text: OpenAI calls it finish_reason, Anthropic calls it stop_reason, Gemini calls it finishReason. The field is small. It is one enum value per response. It is also the only out-of-band channel the model has for telling you whether the response you just shipped is the answer or a fragment of one. Treating it as cosmetic is the same shape of bug as ignoring HTTP status codes — except your monitoring caught the HTTP one a decade ago and has no opinion about this one.
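
A minimal sketch of reading the field before counting a response as a success. The function name `classify_stop` and the four buckets are illustrative, not part of any SDK, and the enum values listed are assumptions taken from the providers' public docs; check them against the API version you actually call.

```python
# Stop values grouped by what they mean for the caller.
# These sets are assumptions based on public provider docs; providers add
# values over time, so anything unrecognized falls through to "unknown".
COMPLETE = {"stop", "end_turn", "stop_sequence", "STOP"}
TRUNCATED = {"length", "max_tokens", "MAX_TOKENS"}
BLOCKED = {"content_filter", "refusal", "SAFETY", "RECITATION"}
TOOL = {"tool_calls", "tool_use"}


def classify_stop(reason):
    """Map a provider stop value to a handling bucket (hypothetical helper)."""
    if reason in COMPLETE:
        return "complete"
    if reason in TRUNCATED:
        return "truncated"
    if reason in BLOCKED:
        return "blocked"
    if reason in TOOL:
        return "tool_call"
    # Missing or unrecognized values are not treated as success.
    return "unknown"
```

Only the "complete" bucket should increment a success counter; the others deserve their own metric, and "unknown" is worth alerting on, since it usually means the provider added a value your code has not seen.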

The Rate Limit You Set for Humans an Agent Saturates in Three Seconds

· 10 min read
Tian Pan
Software Engineer

The rate limit was never a fairness primitive. It was a sales-engineering quote that grew up — a number a solutions engineer typed into a docs page during onboarding three years ago, copied into a tier definition, and never revisited because no one ever hit it. The limit said "100 requests per minute" and it meant "more than any sane integration will ever need," because every integration on the platform was a backend service driven by a human at a keyboard, and humans do not type a hundred times a minute.

Then a paying tenant pointed an agent at the endpoint. The agent did not type. It did not pause to read responses. It did not have a UI to render between requests. It executed a planning loop that called the API once per reasoning step, and one reasoning step took the model about thirty milliseconds of wall time to formulate. The agent hit the per-minute ceiling in three seconds, the per-hour ceiling in three minutes, and the daily quota before the on-call engineer's coffee had cooled. The support escalation landed before the throttle dashboard had updated.
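
The arithmetic behind those numbers is short enough to write down. This sketch assumes the agent issues one request per reasoning step with no other delay; the hourly ceiling of 6,000 is an assumption (100 per minute times 60), since the text only gives the per-minute figure.

```python
def seconds_to_exhaust(limit_requests, step_milliseconds):
    """Seconds until a caller issuing one request per step uses up a limit."""
    return limit_requests * step_milliseconds / 1000


# 100 requests per minute at 30 ms per step.
per_minute = seconds_to_exhaust(100, 30)

# Assumed hourly ceiling of 6,000 requests at the same pace.
per_hour = seconds_to_exhaust(6000, 30)

print(per_minute, "seconds to hit the per-minute ceiling")
print(per_hour / 60, "minutes to hit the assumed per-hour ceiling")
```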

The Pointer Your Agent Mistook for a Value: Reference vs Value in Tool Outputs

· 11 min read
Tian Pan
Software Engineer

A search tool returns ten document IDs. An asset tool returns an S3 presigned URL. A database tool returns a row handle. A file tool returns a path. Each of those returns is, formally, a pointer — a small string that names a value the agent does not yet possess. The model's downstream behavior depends entirely on whether it knows that and dereferences before reasoning, or whether it treats the pointer as if it were already the thing.

The failure mode is invisible from the trace. The tool call succeeded. The return is well-formed. The model emitted plausible-looking output. Nothing in the log says "the agent reasoned about a filename and called it a document." The pointer-vs-value confusion sits underneath the visible behavior, in a layer your tool schema never named.
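
One way to give that layer a name is to make the distinction part of the tool's return type. The sketch below is an assumption about how that could look, not a description of any framework: `Ref`, `Value`, and `resolve` are hypothetical names, and `fetch` stands in for whatever actually loads the content.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Ref:
    """A pointer: names a value the agent does not yet possess."""
    kind: str      # e.g. "document_id", "presigned_url", "file_path"
    locator: str


@dataclass(frozen=True)
class Value:
    """The thing itself, safe to reason about directly."""
    content: str


ToolOutput = Union[Ref, Value]


def resolve(output: ToolOutput, fetch: Callable[[Ref], str]) -> Value:
    """Return a Value, dereferencing through `fetch` when given a Ref."""
    if isinstance(output, Ref):
        return Value(fetch(output))
    return output
```

With a tagged return, a trace can at least show whether the model reasoned over a `Ref` or a `Value`, which is exactly the fact the untyped string hides.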

The Degradation Signals Your Agent Never Receives

· 9 min read
Tian Pan
Software Engineer

When a downstream API starts to wobble, a human operator finds out a dozen ways before anything actually breaks. The status page flips to yellow. A changelog email lands in the inbox. A warning banner appears in the provider's dashboard. The on-call channel lights up with a 429 someone spotted in the logs. A teammate posts "anyone else seeing slow writes?" None of these are responses to a request. They are the ambient operational signal that surrounds the API, and a human absorbs it almost passively.

An agent calling the same API receives exactly one thing: the response to the request it just made. Status code, headers, body. That is the entire channel. It has no inbox, no dashboard, no Slack, no peripheral vision. It cannot notice that the last ten calls each took twice as long as the ten before. It cannot read the status page, because nobody handed it the URL and it has no standing instruction to look. When the dependency degrades, the agent is the last party in the system to find out — and it usually finds out by failing.

This asymmetry is not a model capability problem. A smarter model does not fix it. The agent is blind to operational signals because the plumbing never delivers them, and most agent stacks ship without anyone noticing the plumbing is missing.
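
As a sketch of what that plumbing might look like, the tool layer can watch its own latency and hand the agent a plain-language advisory alongside the result. Everything here is an assumption for illustration: the class name `LatencyWatch`, the window of ten calls, and the 2x threshold simply mirror the "twice as long as the ten before" example above.

```python
from collections import deque
from statistics import mean


class LatencyWatch:
    """Compare the most recent calls against the calls just before them."""

    def __init__(self, window=10, slowdown_factor=2.0):
        self.window = window
        self.slowdown_factor = slowdown_factor
        # Keep two windows' worth of samples: older half, recent half.
        self.samples = deque(maxlen=2 * window)

    def record(self, seconds):
        self.samples.append(seconds)

    def advisory(self):
        """Return a note for the agent, or None if nothing looks wrong."""
        if len(self.samples) < 2 * self.window:
            return None
        ordered = list(self.samples)
        older = ordered[: self.window]
        recent = ordered[self.window:]
        if mean(recent) >= self.slowdown_factor * mean(older):
            return (
                "Recent calls to this dependency are much slower than usual. "
                "It may be degrading; avoid non-essential calls."
            )
        return None
```

The advisory travels in the tool result, which is the one channel the agent is guaranteed to read.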

The Idempotency Key Your Agent Never Sent

· 11 min read
Tian Pan
Software Engineer

A customer once got refunded three times for a single return. Not because the model hallucinated a policy, not because a human fat-fingered a form — because the refund tool timed out twice, the agent retried both times, and every retry carried a fresh request with no way for the payment processor to know it had seen this work before. Three clean HTTP 200s. Three real movements of money. The agent did exactly what it was told: when a call fails, try again.

The bug was not in the model. The bug was in a header that was never sent.

Retrying is the single most natural thing an agent does. A tool call returns an error, or worse, returns nothing at all, and the loop's instinct — encoded in the framework, the prompt, or the model's own training — is to try the action again. That instinct is correct for reads and catastrophic for writes. The difference between a resilient agent and one that double-charges customers is not intelligence. It is whether every state-changing tool call carries an idempotency key, and whether the system on the other end actually honors it.
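
A minimal sketch of both halves, under stated assumptions. The key is derived from the logical operation (tool name, arguments, and a task identifier), so a retry reproduces the same key instead of minting a fresh one. `idempotency_key` and `FakeProcessor` are hypothetical; the in-memory processor only stands in for a real payment system that honors the key.

```python
import hashlib
import json


def idempotency_key(tool_name, args, task_id):
    """Derive a stable key from the logical operation, not from the attempt."""
    payload = json.dumps(
        {"tool": tool_name, "args": args, "task": task_id}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class FakeProcessor:
    """In-memory stand-in for a processor that honors idempotency keys."""

    def __init__(self):
        self.seen = {}
        self.refunds_issued = 0

    def refund(self, key, amount_cents):
        # A repeated key returns the stored result without moving money again.
        if key in self.seen:
            return self.seen[key]
        self.refunds_issued += 1
        result = {"refund_id": self.refunds_issued, "amount_cents": amount_cents}
        self.seen[key] = result
        return result
```

Three attempts with the same key produce one refund and three identical responses, which is the outcome the agent's retry loop assumed it was getting all along.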

Your Internal API Became a Public API the Day an Agent Called It

· 10 min read
Tian Pan
Software Engineer

Internal APIs survive on a quiet arrangement: nobody writes the contract down because everybody already knows it. The fields that happen to be there, the error you throw that a caller secretly parses, the endpoint that returns 200 with an empty list instead of 404 — these are load-bearing behaviors held together by the fact that you can name every caller and Slack them before you change anything. That arrangement works right up until it doesn't.

It stops working the day you wire an agent to that API. Not because the agent is malicious or careless, but because the agent is a caller you cannot reach. It has no Slack handle. It did not read your migration note. It depends on response shapes it absorbed from an example payload or a schema snapshot, and it will keep depending on them long after you've moved on.

The uncomfortable truth is that "internal" was never a property of the API. It was a property of the caller list. Shorten that list to people you know and the API is internal; add one participant you can't coordinate with and the API is public — with all the discipline that word implies, and none of the infrastructure you'd have built if you'd known.

The Tool Schema You Changed Without Telling the Agent

· 11 min read
Tian Pan
Software Engineer

A backend engineer renames a field. user_id becomes customer_id, because the team finally standardized on the word "customer" across services. They add one more argument, region, because billing now needs it. The change ships behind a normal pull request with two approvals. Every downstream service that calls the endpoint gets updated in the same release. The integration tests are green. By every measure a backend team uses, this is a routine, well-executed API change.

A week later, support tickets start climbing. The agent that places orders is occasionally placing them with no customer attached, or attaching them to the wrong region. Nobody changed the agent. Nobody changed the prompt. The model is the same version it was last month. And yet the agent is now wrong in a way it was not wrong before.

The cause is not a bug in the model and not a bug in the backend. It is that the tool schema has two consumers, and only one of them was in the room when the change was reviewed.
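
A sketch of a check that puts the second consumer in the room: compare the schema the agent was handed against the schema the backend now enforces, and fail the release if they have drifted. The function `schema_drift` is hypothetical, and it assumes both schemas are JSON-Schema-style dicts with `properties` and `required` keys.

```python
def schema_drift(agent_schema, backend_schema):
    """Report fields the agent knows that are gone, and fields it has never seen."""
    agent_props = set(agent_schema.get("properties", {}))
    backend_props = set(backend_schema.get("properties", {}))
    backend_required = set(backend_schema.get("required", []))
    return {
        # The agent will keep sending these; the backend no longer reads them.
        "removed": sorted(agent_props - backend_props),
        # The backend accepts these; the agent does not know they exist.
        "added": sorted(backend_props - agent_props),
        # The backend demands these; the agent cannot supply them.
        "newly_required": sorted(backend_required - agent_props),
    }
```

Run against the change described above, the report would list `user_id` as removed and both `customer_id` and `region` as added and newly required, which is the whole incident in three lines.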

You Can't Email a Changelog to a Model: Why API Deprecation Breaks When the Caller Is an LLM

· 10 min read
Tian Pan
Software Engineer

API deprecation is a communication protocol that assumes the receiver can read. You publish a changelog, send an email to registered developers, add a Deprecation header, give six months of notice, and trust that a human on the other end will see the warning, file a ticket, and migrate before the sunset date. That entire workflow quietly stopped working the moment your most active caller became a language model.

An LLM does not subscribe to your developer newsletter. It does not have a Slack channel where someone pastes your migration guide. It rediscovers your API on every single call — from a tool description it was handed, a documentation page that may be eighteen months stale, or a memory of how your API looked in its training data. There is no persistent client you can version, notify, or page. Each request is a fresh negotiation with an entity that has no memory of your last announcement and no obligation to read your next one.

This is not a hypothetical. As agents become the dominant consumers of internal and external APIs, the deprecation playbook every backend team has used for fifteen years is failing in a specific, diagnosable way — and most teams discover it only when a "deprecated for six months" endpoint is still serving an agent in production with no path to make it stop.
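
One partial mitigation is to move the warning into the only channel the model reads: the tool result. The sketch below assumes the server sets `Deprecation` and `Sunset` response headers and that your tool layer can see them. The helper names and the wording of the notice are hypothetical, and the header values are passed through as opaque strings rather than parsed.

```python
def deprecation_notice(headers):
    """Build a plain-language notice from response headers, or return None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    deprecation = lowered.get("deprecation")
    sunset = lowered.get("sunset")
    if deprecation is None and sunset is None:
        return None
    parts = ["This endpoint is marked deprecated by the server."]
    if sunset:
        parts.append(f"It is scheduled to stop responding after {sunset}.")
    parts.append("Prefer a replacement tool if one is available.")
    return " ".join(parts)


def wrap_tool_result(body, headers):
    """Attach the notice to the tool result so the model actually sees it."""
    notice = deprecation_notice(headers)
    if notice is None:
        return {"result": body}
    return {"result": body, "notice": notice}
```

This does not make the agent migrate, but it turns a header no one reads into text the caller cannot avoid, and the same notice can be logged for the humans who own the tool description.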

Hyrum's Law for Streamed Reasoning: Pacing, Pauses, and Intermediate Tokens Are an Undocumented Contract

· 11 min read
Tian Pan
Software Engineer

A team upgrades from a frontier model to its faster successor. The eval suite is green. Final answers match. Tool-call schemas are identical. The structured outputs validate against the same JSON schema they always did. They ship. Within a day, support tickets pile up: "the assistant feels rushed," "it's not really thinking anymore," "something is off." The product manager pulls telemetry and finds task-completion rates unchanged. The engineering team double-checks the eval and the schema and finds nothing wrong. The complaint is real, but the contract — as the team defined it — is intact.

What changed is the texture of the stream. The old model paused for 800 milliseconds before calling a tool, emitted a "Let me check that..." preamble, and dribbled tokens at roughly 35 per second with natural-feeling clusters around clause boundaries. The new model emits tokens at 90 per second, never pauses, and skips the preamble entirely. None of that was in any documented contract. All of it was load-bearing.

This is Hyrum's law, and streaming makes its surface area enormous. Any observable behavior of your system will be depended on by somebody — and a streaming AI surface exposes far more observable behavior than the team realizes.
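
If the texture is load-bearing, it can be measured like any other behavior. A minimal sketch, assuming you can record an arrival timestamp (in seconds) for each streamed token; `stream_profile` is a hypothetical helper, and rate and longest pause are only two of the properties a user might have come to depend on.

```python
def stream_profile(timestamps):
    """Summarize pacing from per-token arrival times, given in seconds."""
    if len(timestamps) < 2:
        return {"tokens_per_second": 0.0, "max_gap_seconds": 0.0}
    # Gaps between consecutive tokens; the largest one is the longest pause.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    duration = timestamps[-1] - timestamps[0]
    rate = (len(timestamps) - 1) / duration if duration > 0 else 0.0
    return {"tokens_per_second": rate, "max_gap_seconds": max(gaps)}
```

Comparing these profiles between the old and new model before a rollout would surface a jump in rate or a vanished pause that an answer-level eval cannot see.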

API Documentation Is Reliability Infrastructure: How Your Docs Determine Agent Success Rates

· 10 min read
Tian Pan
Software Engineer

Most engineering teams think of API documentation as a developer experience concern — something you improve to reduce support tickets and onboarding time. That framing made sense when your primary consumer was a human reading docs in a browser. It is no longer adequate.

When an AI agent calls your API via tool use, your documentation stops being a guide and becomes runtime behavior. A vague parameter description isn't a UX inconvenience — it is a direct instruction to the model that produces hallucinated values. A missing error code isn't a gap in your reference docs — it is an ambiguous signal that can send an agent into a retry loop with no exit condition. The documentation you wrote three years ago for a human audience is now being parsed by a stateless language model that will execute confidently regardless of whether it understood correctly.

Agent as User: Why Your Product Analytics Break When Bots Become Your Power Users

· 10 min read
Tian Pan
Software Engineer

Automated internet traffic grew 23.5% year-over-year in 2025 — eight times faster than human traffic. Agent-driven interactions alone grew 7,851%. If you're building a product that handles any meaningful volume of API traffic, there's a reasonable chance your heaviest "users" are not human. The uncomfortable truth is that your product analytics almost certainly have no idea.

This isn't a bot detection problem. It's an instrumentation architecture problem. When an AI agent books travel, files expense reports, queries your database, or calls your payment API, it leaves a completely different behavioral signature than a human doing the same thing — and your session funnels, NPS surveys, and cohort retention charts are quietly telling you lies.

AI-Native API Design: Building Backends That Agents Can Actually Use

· 10 min read
Tian Pan
Software Engineer

Your REST API works fine. Documentation is thorough. Error codes are consistent. Every human-authored client you've ever tested handles it well. Then your team integrates an AI agent and within an hour it's generated 2,000 failed requests by retrying variations of an endpoint that doesn't exist — bulk_search_users, search_all_users, bulk_user_search — each attempt triggering real downstream processing.

This isn't a prompt engineering failure. It's an API design failure.

REST APIs were built for clients that parse documentation, respect contracts, and call exactly what's specified. AI agents are different: they reason about what an endpoint probably does based on names and descriptions, retry without tracking state, and treat error messages as instructions rather than diagnostic codes. Designing an API for an agentic caller requires rethinking assumptions that most backend engineers have never had to question.
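
If the agent treats an error message as an instruction, the error can be written as one. The sketch below is an assumption about what such a response could contain, not a standard: the operation list, the field names, and `unknown_operation_error` are all hypothetical. It uses the standard library's `difflib.get_close_matches` to suggest real operations that resemble the guessed name.

```python
import difflib

# Hypothetical catalog of operations this backend actually exposes.
KNOWN_OPERATIONS = ["search_users", "get_user", "list_orders"]


def unknown_operation_error(requested):
    """Build an error body that tells an agent what to do next."""
    suggestions = difflib.get_close_matches(
        requested, KNOWN_OPERATIONS, n=3, cutoff=0.5
    )
    return {
        "error": "unknown_operation",
        "requested": requested,
        # Explicitly say that retrying variations of the name will not help.
        "retryable": False,
        "did_you_mean": suggestions,
        "available_operations": KNOWN_OPERATIONS,
    }
```

A guess such as `bulk_search_users` then comes back with `search_users` as a suggestion and an explicit `retryable: False`, instead of a bare 404 that invites another guess.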