Agent-Friendly APIs: What Backend Engineers Get Wrong When AI Becomes the Client
In 2024, automated bot traffic surpassed human traffic on the internet for the first time. Gartner projects that more than 30% of new API demand by 2026 will come from AI agents and LLM tools. And yet only 24% of organizations explicitly design APIs with AI clients in mind.
That gap is where production systems break. Not because the LLMs are bad, but because APIs built for human developers have assumptions baked in that silently fail when an autonomous agent is the caller. The agent can't ask for clarification, can't read a doc site, and can't decide on its own whether a 422 means "fix your request" or "try again in a few seconds."
This post is for the backend engineer who just found out their service is being called by an AI agent — or who is about to build one that will be.
The Core Problem: APIs Are Designed for Humans, Not Machines
When a human developer hits a vague error, they open the documentation, check the status page, and infer from experience what to do next. When an AI agent hits the same error, it has to reason from what's in front of it: the HTTP status code, the response body, and whatever context it accumulated during the task. If those signals are ambiguous, the agent is stuck guessing — and guesses at machine speed.
Consider a Python SDK that calls response.raise_for_status() immediately, before reading the response body. The agent receives: "HTTP Error 422: Unprocessable Entity". What it needed was: "Field 'first_name' required — provide a first_name string". When the SDK was fixed to surface the full error body, the agent immediately self-corrected. With the vague error, it looped. This is not an edge case — it is the default behavior of most HTTP error handling patterns in most languages.
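A minimal stdlib-only sketch of the fix: instead of calling raise_for_status() and discarding the body, raise an exception that carries the parsed error payload so the agent sees the field-level detail. The APIError and check_response names are illustrative, not from any particular SDK.

```python
import json

class APIError(Exception):
    """Error that carries the full response body, not just the status line."""
    def __init__(self, status: int, body: str):
        try:
            self.detail = json.loads(body)   # structured error, if present
        except ValueError:
            self.detail = {"message": body}  # fall back to raw text
        super().__init__(f"HTTP {status}: {self.detail}")

def check_response(status: int, body: str) -> None:
    """Raise APIError with the body attached, instead of the
    raise_for_status() pattern that throws the body away."""
    if status >= 400:
        raise APIError(status, body)
```

With this in place, the 422 above surfaces as `HTTP 422: {'message': "Field 'first_name' required — provide a first_name string"}` — exactly the signal the agent needs to self-correct.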
The same logic applies to pagination, authentication flows, rate limiting, and schema design. Every assumption your API makes about the caller having human judgment is a potential failure mode in agentic systems.
Errors Are API Surface, Not Afterthoughts
The single highest-leverage change you can make for AI clients is improving your error responses. Not the happy path — agents follow happy paths fine. It's the error path where agents diverge from human behavior catastrophically.
A good error response for an AI agent needs to be machine-parseable and action-prescribing. The gold standard, popularized by Stripe, structures errors with: a machine-readable code string (not just an HTTP status), a human-readable message, a param field naming the specific input that caused the problem, and a doc_url linking directly to recovery documentation. Extend this with two more fields that agents specifically need: is_retriable (boolean) and retry_after_seconds (integer).
With those fields, an agent's error handling becomes a deterministic decision tree: read is_retriable → if false, stop and escalate; if true, wait retry_after_seconds, retry with the same idempotency key. Without them, the agent has to infer retriability from status codes alone — and the inference is frequently wrong. A 409 Conflict might mean "the resource already exists, don't retry," "there was a concurrent write, retry immediately," or "you have a version conflict, fetch the latest state first." Three completely different recovery paths, one ambiguous code.
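That decision tree is small enough to sketch directly. The field names (is_retriable, retry_after_seconds) follow the schema above; handle_error, retry_call, and escalate are hypothetical names standing in for whatever your agent framework provides:

```python
import time

def handle_error(error: dict, retry_call, escalate):
    """Deterministic recovery from a structured error body.
    `error` is the parsed JSON error object; `retry_call` re-issues
    the request with the SAME idempotency key; `escalate` hands
    control back to a human or supervisor."""
    if not error.get("is_retriable", False):
        return escalate(error)                       # terminal: stop guessing
    time.sleep(error.get("retry_after_seconds", 1))  # server-prescribed wait
    return retry_call()                              # same idempotency key
```

No status-code inference, no guessing: the server prescribes the recovery path and the agent follows it.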
RFC 9457 (application/problem+json) standardizes the response envelope. Adopting it costs a day of work and immediately makes your error surface parseable by any agent, SDK, or monitoring system that understands the standard.
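A hedged example of what such an envelope might look like, combining RFC 9457's standard members (type, title, status, detail) with the agent-oriented extension fields described above. The URLs and values are illustrative; the response would be served with Content-Type: application/problem+json.

```json
{
  "type": "https://api.example.com/errors/missing-field",
  "title": "Unprocessable Entity",
  "status": 422,
  "detail": "Field 'first_name' required: provide a first_name string",
  "code": "missing_field",
  "param": "first_name",
  "doc_url": "https://api.example.com/docs/errors#missing-field",
  "is_retriable": false,
  "retry_after_seconds": null
}
```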
Idempotency Keys Are Not Optional for Agentic Callers
AI agents retry tool calls roughly 15-30% of the time, based on production telemetry from multiple agent frameworks. The retries happen for reasons the agent can't always control: timeouts mid-request, transient network errors, model uncertainty about whether a previous call completed, and orchestrator restarts. If your state-mutating endpoints don't support idempotency keys, every retry is a potential duplicate — a duplicate payment, a duplicate email, a duplicate database record.
The implementation pattern is well-established: clients generate a UUID before the first attempt, send it as an Idempotency-Key header on every retry, and servers cache the response for that key for 24 hours, returning the cached result for duplicate submissions. The key detail that trips teams up in production: in-memory caches don't survive pod restarts. Distributed agent workloads require Redis or an equivalent datastore with atomic SET NX semantics, so two pods receiving simultaneous requests with the same key only execute one.
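A sketch of the server side, assuming a redis-py-style client (anything exposing `set(..., nx=True, ex=...)` and `get` works). The key prefix, function name, and "pending" sentinel are illustrative choices:

```python
import json

DAY_SECONDS = 86_400  # cache responses per key for 24 hours

def idempotent_execute(store, key: str, operation):
    """Execute `operation` at most once per idempotency key.
    SET NX atomically claims the key, so of two pods receiving
    simultaneous requests with the same key, only one executes."""
    claimed = store.set(f"idem:{key}", "pending", nx=True, ex=DAY_SECONDS)
    if not claimed:
        cached = store.get(f"idem:{key}:result")
        if cached is not None:
            return json.loads(cached)  # replay the first response
        raise RuntimeError("operation in flight; retry later")
    result = operation()
    store.set(f"idem:{key}:result", json.dumps(result), ex=DAY_SECONDS)
    return result
```

The two-key scheme (claim key, then result key) keeps the claim atomic while still letting duplicates replay the cached response once the first execution finishes.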
Marking operations by their retry semantics matters too. "Safe to retry" is a broader property than "naturally idempotent": a GET is safe because it has no side effects, while a POST becomes safe only once the server deduplicates on an idempotency key. Agents need to know which operations they can safely re-issue without escalating to a human for confirmation.
Pagination That Doesn't Assume a Human Is Watching
Offset-based pagination has a subtle failure mode that human API consumers rarely notice but that agents hit reliably: when records are created or deleted between pages, items shift in the result set. Page 2 can return items that were already on page 1, or skip items that should have appeared. For a human pagination widget, this is an occasional visual glitch. For an agent autonomously traversing a full dataset, it's silent data corruption.
Cursor-based pagination eliminates this class of failure. A cursor encodes a stable position in the result set, unaffected by writes happening concurrently. The required contract for agents is minimal and consistent: every paginated response needs data (the array of results), next_cursor (a string, null when exhausted), and has_more (boolean). With those three fields, an agent can write a single generic pagination loop that works across every endpoint in your API.
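That single generic loop can be sketched in a few lines; `fetch_page` here stands in for whatever HTTP call retrieves one page of results:

```python
def paginate(fetch_page):
    """Generic traversal over any endpoint implementing the
    data / next_cursor / has_more contract. `fetch_page(cursor)`
    returns the parsed JSON for one page; pass None for the first."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page["has_more"]:
            break
        cursor = page["next_cursor"]  # stable position, immune to concurrent writes
```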
The fragility comes from inconsistency. If half your endpoints use cursor pagination and half use offset, agents cannot write a single reusable traversal pattern. Every new endpoint is a bespoke implementation. Standardizing pagination behavior across your whole API surface is worth more than any individual optimization.
Rate Limits as Real-Time State, Not Post-Hoc Punishment
Humans check rate limit status on dashboards after they've been throttled. Agents cannot. The only way an agent knows it's approaching a rate limit is if you tell it on every response — not just on the 429 that fires when it's too late.
Return X-RateLimit-Remaining, X-RateLimit-Limit, and X-RateLimit-Reset on every response, not just on 429s. An agent reading these headers can throttle itself proactively, adding delays when the remaining budget drops below a threshold, before it hits a hard 429 that blocks its entire workflow.
When a 429 does occur, Retry-After must be a number of seconds, not prose. "Please try again in a few minutes" is unactionable. Retry-After: 47 tells the agent exactly when to retry. Include the same value in a structured JSON body field (retry_after_seconds) alongside the current quota state, so agents that don't parse headers can still recover deterministically.
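A sketch of the client-side proactive throttle, assuming X-RateLimit-Reset carries a Unix timestamp (some APIs return seconds-until-reset instead; adjust accordingly). The threshold of 10 is an arbitrary illustration:

```python
import time

REMAINING_FLOOR = 10  # start self-throttling below this budget

def proactive_delay(headers: dict, now=None) -> float:
    """Return how long the agent should pause before its next request,
    based on live X-RateLimit-* headers, so it slows itself down
    before a 429 ever fires."""
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1000"))
    if remaining >= REMAINING_FLOOR:
        return 0.0  # plenty of budget left; proceed at full speed
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))
    window_left = max(reset_at - now, 0.0)
    # Spread the remaining budget evenly across the rest of the window.
    return window_left / max(remaining, 1)
```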
Multi-agent systems introduce a coordination problem that single-agent rate limiting can't solve. In a documented production incident, nine agents sharing one API quota burned through a 5,000-requests/hour allowance in under eight minutes because of synchronized retries. A low-priority background polling agent was starving a high-priority architecture agent: priority inversion at the quota layer. The thundering herd from synchronized 429 retries cascaded into 60+ failures and, within 90 seconds, a full API lockout.
The mitigation is not purely on the API side. Agent frameworks need jitter in their retry logic, priority-tiered access to shared quota pools, and proactive throttling based on live rate limit headers. But API design can help: return enough state in 429 responses that an agent framework can implement backpressure without needing a separate quota tracking service.
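The jitter half of that mitigation is tiny to implement. This sketch uses the "full jitter" variant, where each retry picks a uniformly random delay under an exponentially growing cap, so a fleet of agents that all hit a 429 at the same instant desynchronizes instead of retrying in lockstep:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: a random delay in
    [0, min(cap, base * 2**attempt)]. Desynchronizes retries
    across agents sharing one quota."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```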
OpenAPI Descriptions Are Semantic Routing Signals
When an agent is deciding which tool to call, it reads your OpenAPI descriptions. The model is not looking at your schema structure — it's reading your natural language descriptions to match intent to capability. A description that says "Get invoices" gives an agent almost no signal. A description that says "Returns a paginated list of invoices filtered by status, customer_id, and date range, sorted by created_at descending. Requires billing:read scope." tells the agent when to call this endpoint, what parameters to prepare, and whether it has the right authorization.
Every parameter needs semantic description too. Not just its type, but what valid values mean, what the format is (ISO 8601? UUID? comma-separated list?), and when an optional parameter should be included. Every return field needs to specify what it represents and under what conditions it's null. Enums need descriptions for each member.
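A hedged sketch of what that looks like in an OpenAPI document, using the invoice endpoint from above; the path, scope name, and enum values are invented for illustration:

```yaml
paths:
  /invoices:
    get:
      operationId: listInvoices
      summary: List invoices
      description: >-
        Returns a paginated list of invoices filtered by status,
        customer_id, and date range, sorted by created_at descending.
        Requires the billing:read scope.
      parameters:
        - name: status
          in: query
          required: false
          schema:
            type: string
            enum: [draft, open, paid, void]
          description: >-
            Invoice lifecycle state. Omit to return invoices in all
            states; pass "open" to find invoices awaiting payment.
        - name: created_after
          in: query
          required: false
          schema:
            type: string
            format: date-time
          description: >-
            ISO 8601 timestamp; only invoices created at or after
            this instant are returned.
```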
This is tedious to write and almost nobody does it. It's also the primary reason agents call the wrong tool or pass incorrect parameters — not because the model failed, but because the description was too thin for the model to distinguish between two similar endpoints.
Anthropic's internal data on this is instructive: providing input_examples in tool definitions (showing minimal, partial, and full parameter usage) improved accuracy on complex parameter handling from 72% to 90%. The schema defines structure; examples capture behavioral patterns that structure cannot express, like "include the escalation field only when status is 'critical'."
Machine-to-Machine Authentication
OAuth authorization code flow requires a browser. Agents don't have browsers. If your only supported authentication method involves a redirect to a login screen, agents cannot integrate with your API at all, regardless of how well-designed everything else is.
The viable authentication patterns for agents are: API keys (bearer tokens, simple and stateless), OAuth client credentials flow (machine-to-machine, no user interaction required), and mutual TLS. If you support only interactive OAuth, you've excluded the entire agentic caller population.
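For client credentials specifically, the token exchange is a single form-encoded POST (RFC 6749 §4.4). This sketch builds the request rather than sending it, so the wire format is visible and no real endpoint is assumed; passing the secret in the body is one of the two client-authentication options the RFC allows (HTTP Basic is the other):

```python
import urllib.parse

def client_credentials_request(token_url: str, client_id: str,
                               client_secret: str, scope: str):
    """Build the OAuth 2.0 client credentials token request:
    no browser, no redirect, no human in the loop. Returns
    (url, form-encoded body) ready for any HTTP client."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # request the minimum scope the task needs
    })
    return token_url, body
```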
The related failure mode is over-scoped credentials. In a documented incident, an agent was given a privileged token with read access to all repositories in an organization. A malicious prompt in a public GitHub issue caused the agent to exfiltrate private repository contents into a public pull request. The blast radius of any prompt injection attack is bounded by the permissions of the agent's credentials. Agents should receive the minimum scoped, revocable credentials needed for their specific task — the same principle as least-privilege service accounts, applied to autonomous systems.
The Discovery Layer: llms.txt
Cloudflare noted in production that exposing thousands of REST endpoints to an agent is simply unworkable — the descriptions alone fill the agent's context window before it can reason about which endpoint to call. The llms.txt convention, proposed in September 2024 and now adopted by over 844,000 websites including Stripe and Anthropic, addresses this at the discovery level.
A /llms.txt file at your domain root is a Markdown document that curates the most important information for LLM agents: the preferred integration patterns, the deprecated endpoints to avoid, the authentication setup path, and explicit "Instructions for Large Language Model Agents" that surface footguns not obvious from the OpenAPI spec alone. It's the API equivalent of a good README — one afternoon of writing that eliminates entire categories of integration mistakes.
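A hedged sketch of such a file, following the llms.txt convention (an H1 title, a blockquote summary, then H2 sections of annotated links); ExampleCo and its URLs are invented:

```markdown
# ExampleCo API

> REST API for payments and invoicing. Agents should prefer the
> v2 endpoints; v1 is deprecated and scheduled for removal.

## Docs

- [Authentication](https://api.example.com/docs/auth): client
  credentials flow for machine-to-machine callers
- [Errors](https://api.example.com/docs/errors): RFC 9457
  problem+json envelope, is_retriable semantics

## Instructions for Large Language Model Agents

- Always send an Idempotency-Key header on POST requests.
- The GraphQL endpoint returns 200 even on failure; check the
  errors array in the response body.
```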
It won't replace thorough OpenAPI documentation. But it dramatically reduces the context cost of an agent understanding your API surface, and it's the right place to document agent-specific quirks (like "our GraphQL endpoint always returns 200, check the errors array in the response body") that don't fit neatly anywhere else.
Designing for the Client You'll Actually Have
The Stytch engineering team framed this well: "If an AI agent can't figure out how your API works, neither can your users." Agent-readability is a leading indicator of overall API quality, because agents expose ambiguity that humans route around through experience and documentation lookups.
The concrete changes are not exotic. Idempotency keys, structured errors with is_retriable, cursor-based pagination, rate limit headers on every response, OpenAPI descriptions with semantic content, machine-to-machine auth options — none of these require re-architecting your API. They're additive constraints on top of what you probably already have.
What they require is treating the error path as API surface, not as an afterthought. Treating your schema descriptions as semantic contracts, not as comment fields you fill in once and forget. And recognizing that the caller that cannot ask for clarification deserves more precise signals, not fewer — because in the absence of precision, agents don't stop. They guess.
Sources
- https://arxiv.org/abs/2502.17443
- https://www.apideck.com/blog/api-design-principles-agentic-era
- https://medium.com/@pol.avec/handling-http-errors-in-ai-agents-lessons-from-the-field-4d22d991a269
- https://www.tamirdresher.com/blog/2026/03/21/rate-limiting-multi-agent
- https://nordicapis.com/how-ai-agents-are-changing-api-rate-limit-approaches/
- https://zuplo.com/learning-center/token-based-rate-limiting-ai-agents
- https://www.answer.ai/posts/2024-09-03-llmstxt.html
- https://authzed.com/blog/timeline-mcp-breaches
- https://dzone.com/articles/idempotency-in-ai-tools-most-expensive-mistake
- https://www.anthropic.com/engineering/advanced-tool-use
