
Agent-Friendly APIs: What Backend Engineers Get Wrong When AI Becomes the Client

· 11 min read
Tian Pan
Software Engineer

In 2024, automated bot traffic surpassed human traffic on the internet for the first time. Gartner projects that more than 30% of new API demand by 2026 will come from AI agents and LLM tools. And yet only 24% of organizations explicitly design APIs with AI clients in mind.

That gap is where production systems break. Not because the LLMs are bad, but because APIs built for human developers have assumptions baked in that silently fail when an autonomous agent is the caller. The agent can't ask for clarification, can't read a doc site, and can't decide on its own whether a 422 means "fix your request" or "try again in a few seconds."

This post is for the backend engineer who just found out their service is being called by an AI agent — or who is about to build one that will be.

The Core Problem: APIs Are Designed for Humans, Not Machines

When a human developer hits a vague error, they open the documentation, check the status page, and infer from experience what to do next. When an AI agent hits the same error, it has to reason from what's in front of it: the HTTP status code, the response body, and whatever context it accumulated during the task. If those signals are ambiguous, the agent is stuck guessing, and it guesses at machine speed.

Consider a Python SDK that calls response.raise_for_status() immediately, before reading the response body. The agent receives: "HTTP Error 422: Unprocessable Entity". What it needed was: "Field 'first_name' required — provide a first_name string". When the SDK was fixed to surface the full error body, the agent immediately self-corrected. With the vague error, it looped. This is not an edge case — it is the default behavior of most HTTP error handling patterns in most languages.
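A minimal sketch of the fix described above, assuming the server returns a JSON body with a `message` key. The names `ApiError` and `check_response` are hypothetical and not taken from any particular SDK; the point is only that the body is read before the exception is raised.

```python
import json


class ApiError(Exception):
    """Carries the server's explanation, not just the status line."""

    def __init__(self, status: int, reason: str, detail: str):
        super().__init__(f"HTTP Error {status}: {reason}: {detail}")
        self.status = status
        self.reason = reason
        self.detail = detail


def check_response(status: int, reason: str, body: str) -> None:
    """Raise ApiError for 4xx/5xx responses, reading the body first."""
    if status < 400:
        return
    try:
        payload = json.loads(body)
        # Assumption: the server puts its explanation under "message".
        detail = payload.get("message") or body
    except (ValueError, AttributeError):
        # Body was not JSON, or was JSON but not an object: pass it through raw.
        detail = body or "(empty response body)"
    raise ApiError(status, reason, detail)
```

With this shape, the text the agent sees includes the field-level explanation alongside the status line, which is what it needs to correct the request on the next attempt.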

The same logic applies to pagination, authentication flows, rate limiting, and schema design. Every assumption your API makes about the caller having human judgment is a potential failure mode in agentic systems.

Errors Are API Surface, Not Afterthoughts

The single highest-leverage change you can make for AI clients is improving your error responses. Not the happy path — agents follow happy paths fine. It's the error path where agents diverge from human behavior catastrophically.

A good error response for an AI agent needs to be machine-parseable and action-prescribing. The gold standard, popularized by Stripe, structures errors with: a machine-readable code string (not just an HTTP status), a human-readable message, a param field naming the specific input that caused the problem, and a doc_url linking directly to recovery documentation. Extend this with two more fields that agents specifically need: is_retriable (boolean) and retry_after_seconds (integer).

With those fields, an agent's error handling becomes a deterministic decision tree: read is_retriable → if false, stop and escalate; if true, wait retry_after_seconds, retry with the same idempotency key. Without them, the agent has to infer retriability from status codes alone — and the inference is frequently wrong. A 409 Conflict might mean "the resource already exists, don't retry," "there was a concurrent write, retry immediately," or "you have a version conflict, fetch the latest state first." Three completely different recovery paths, one ambiguous code.
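The decision tree above can be sketched in a few lines. This assumes the error arrives as a parsed dict using the field names from this section; the `Decision` type and the two `code` strings in the usage note are hypothetical. Treating a missing `is_retriable` as "do not retry" is a conservative assumption, not something the fields themselves dictate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str            # "retry" or "escalate"
    wait_seconds: int = 0  # how long to wait before retrying


def decide(error: dict) -> Decision:
    """Map a structured error to a recovery action without guessing from status codes."""
    if error.get("is_retriable") is True:
        # Retry with the same idempotency key after the advertised delay.
        return Decision("retry", int(error.get("retry_after_seconds") or 0))
    # Anything not explicitly marked retriable is escalated.
    return Decision("escalate")
```

Two 409 responses that differ only in their body now resolve differently: `decide({"code": "resource_already_exists", "is_retriable": False})` escalates, while `decide({"code": "concurrent_write", "is_retriable": True, "retry_after_seconds": 0})` retries immediately.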

RFC 9457 (application/problem+json) standardizes the response envelope. Adopting it costs a day of work and immediately makes your error surface parseable by any agent, SDK, or monitoring system that understands the standard.
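As a rough illustration, here is one way to build such a body. The members `type`, `title`, `status`, and `detail` come from RFC 9457, which also allows extension members; carrying `is_retriable`, `retry_after_seconds`, and `param` as extensions is this sketch's assumption, and `problem_json` is a hypothetical helper name.

```python
import json

PROBLEM_CONTENT_TYPE = "application/problem+json"


def problem_json(status: int, title: str, detail: str,
                 type_uri: str = "about:blank", **extensions) -> str:
    """Serialize an RFC 9457 problem object with optional extension members."""
    body = dict(extensions)
    # Standard members are written last so an extension cannot overwrite them.
    body.update({
        "type": type_uri,   # "about:blank" is the RFC's default type
        "title": title,
        "status": status,
        "detail": detail,
    })
    return json.dumps(body)
```

A handler would send the result with the `Content-Type` header set to `PROBLEM_CONTENT_TYPE`, for example `problem_json(422, "Missing required field", "Field 'first_name' required", param="first_name", is_retriable=False)`.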

Idempotency Keys Are Not Optional for Agentic Callers

AI agents retry tool calls roughly 15-30% of the time, based on production telemetry from multiple agent frameworks. The retries happen for reasons the agent can't always control: timeouts mid-request, transient network errors, model uncertainty about whether a previous call completed, and orchestrator restarts. If your state-mutating endpoints don't support idempotency keys, every retry is a potential duplicate — a duplicate payment, a duplicate email, a duplicate database record.

The implementation pattern is well-established: clients generate a UUID before the first attempt, send it as an Idempotency-Key header on every retry, and servers cache the response for that key for 24 hours, returning the cached result for duplicate submissions. The key detail that trips teams up in production: in-memory caches don't survive pod restarts. Distributed agent workloads require Redis or an equivalent datastore with atomic SET NX semantics, so two pods receiving simultaneous requests with the same key only execute one.
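The server side of that pattern might look like the sketch below. It is deliberately a single-process stand-in: the lock-protected dict plays the role that Redis `SET key value NX EX <ttl>` would play in production, and as noted above it would not survive a pod restart. `IdempotencyStore`, `handle_request`, and the `request_in_progress` response are hypothetical names chosen for illustration.

```python
import threading
import time
import uuid
from typing import Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # cache responses for 24 hours


def new_idempotency_key() -> str:
    """Client side: generate once, before the first attempt, and reuse on retries."""
    return str(uuid.uuid4())


class IdempotencyStore:
    """In-memory stand-in for a shared store with atomic set-if-absent."""

    def __init__(self, now: Callable[[], float] = time.time):
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, Optional[dict]]] = {}
        self._now = now

    def claim(self, key: str) -> bool:
        """Return True only for the first caller of an unexpired key (like SET NX)."""
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[0] > self._now():
                return False
            self._entries[key] = (self._now() + TTL_SECONDS, None)
            return True

    def save(self, key: str, response: dict) -> None:
        with self._lock:
            expires_at, _ = self._entries[key]
            self._entries[key] = (expires_at, response)

    def get(self, key: str) -> Optional[dict]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry[0] > self._now():
                return entry[1]
            return None


def handle_request(store: IdempotencyStore, key: str,
                   operation: Callable[[], dict]) -> dict:
    """Run the operation at most once per key; replay the stored response otherwise."""
    if store.claim(key):
        response = operation()
        store.save(key, response)
        return response
    cached = store.get(key)
    if cached is None:
        # Claimed by another request that has not finished yet (assumed behavior).
        return {"status": 409, "code": "request_in_progress", "is_retriable": True}
    return cached
```

This sketch leaves a key claimed if the operation raises; a real implementation would need to decide whether to release the claim or store the failure.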

Marking operations by their retry semantics matters too. "Safe to retry" is not the same as "idempotent": a GET is safe to retry by nature, while a write is safe to retry only when the endpoint supports idempotency keys. Agents need to know which operations they can re-issue without escalating to a human for confirmation.

Pagination That Doesn't Assume a Human Is Watching

Offset-based pagination has a subtle failure mode that human API consumers rarely notice but that agents hit reliably: when records are created or deleted between pages, items shift in the result set. Page 2 can return items that were already on page 1, or skip items that should have appeared. For a human pagination widget, this is an occasional visual glitch. For an agent autonomously traversing a full dataset, it's silent data corruption.
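The shift is easy to reproduce with a toy dataset. In this sketch a Python list stands in for the table and `offset_page` for a LIMIT/OFFSET query; all names are illustrative.

```python
from typing import List


def offset_page(records: List[str], offset: int, limit: int) -> List[str]:
    """Return one page by position, the way LIMIT/OFFSET does."""
    return records[offset:offset + limit]


def traverse_with_delete() -> List[str]:
    records = ["a", "b", "c", "d", "e", "f"]
    seen = offset_page(records, 0, 3)     # page 1: a, b, c
    records.remove("a")                   # concurrent delete shifts items left
    seen += offset_page(records, 3, 3)    # page 2 starts at "e"; "d" is skipped
    return seen


def traverse_with_insert() -> List[str]:
    records = ["a", "b", "c", "d", "e", "f"]
    seen = offset_page(records, 0, 3)     # page 1: a, b, c
    records.insert(0, "z")                # concurrent insert shifts items right
    seen += offset_page(records, 3, 3)    # page 2 starts at "c", already seen
    return seen
```

Neither traversal raises an error or returns a short page, which is why the agent has no signal that its view of the dataset is wrong.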

Cursor-based pagination eliminates this class of failure. A cursor encodes a stable position in the result set, unaffected by writes happening concurrently. The required contract for agents is minimal and consistent: every paginated response needs data (the array of results), next_cursor (a string, null when exhausted), and has_more (boolean). With those three fields, an agent can write a single generic pagination loop that works across every endpoint in your API.
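A generic loop over that three-field contract could look like the following. The `fetch_page` callable is an assumption standing in for whatever HTTP call the agent makes, and `make_fake_endpoint` is a hypothetical in-memory endpoint whose cursor is simply the last returned `id`; real cursors are usually opaque strings.

```python
from typing import Callable, Iterator, List, Optional


def iterate_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record from any endpoint that returns data/next_cursor/has_more."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page["has_more"] or page["next_cursor"] is None:
            return
        cursor = page["next_cursor"]


def make_fake_endpoint(records: List[dict], page_size: int) -> Callable[[Optional[str]], dict]:
    """In-memory endpoint for illustration: the cursor is the last id returned."""
    def fetch_page(cursor: Optional[str]) -> dict:
        ordered = sorted(records, key=lambda r: r["id"])
        remaining = [r for r in ordered if cursor is None or r["id"] > int(cursor)]
        data = remaining[:page_size]
        has_more = len(remaining) > page_size
        return {
            "data": data,
            "next_cursor": str(data[-1]["id"]) if has_more else None,
            "has_more": has_more,
        }
    return fetch_page
```

Because the cursor names a position by identity rather than by offset, deleting an already-returned record between pages does not shift what the next page contains.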

The fragility comes from inconsistency. If half your endpoints use cursor pagination and half use offset, agents cannot write a single reusable traversal pattern. Every new endpoint is a bespoke implementation. Standardizing pagination behavior across your whole API surface is worth more than any individual optimization.

Rate Limits as Real-Time State, Not Post-Hoc Punishment
