Your APIs Assumed One Human at a Time. Parallel Agents Broke the Contract.
A backend engineer I know spent a Tuesday afternoon staring at a Datadog graph that had never spiked before: the per-user 429 counter on their internal calendar service. The customer complaining had not changed their behavior. They had simply turned on the assistant feature, which now spawned eight planning threads in parallel against the same calendar API every time the user said "find me time next week." The rate limiter — a perfectly reasonable 60 requests per minute per user, written years ago against a UI that physically could not click that fast — was firing within the first three seconds of every request and silently corrupting half the assistant's responses.
The rate limit was not the bug. The contract was the bug. That backend, like most internal services written before 2024, had a quietly enforced assumption baked into every layer: one user means one stream of activity, paced by a human's reaction time, with one cookie jar, one CSRF token, and one set of credentials that could be re-prompted if anything went wrong. Agents shred all five of those assumptions at once, and the failures show up as a constellation of unrelated incidents — 429 storms, last-write-wins corruption, audit logs you can't subpoena, re-auth loops that hang headless workers — that nobody connects until the pattern is named.
The shorthand I have been using with platform teams is this: every backend you own has an undocumented contract with its callers, and that contract was negotiated with humans. Agents are now showing up to renegotiate. You can either do the renegotiation deliberately, in code review, or you can do it during your next incident.
The five assumptions agents break (and why they're invisible until they fire)
Single-user-at-human-speed is not one assumption. It is at least five, layered through different parts of the stack, owned by different teams, each defensible in isolation:
- Rate limits are shaped for steady human cadence. A real person clicks, reads, types, clicks again. A 100 req/min limit is generous for that. An agent fan-out planner can dispatch 500 requests in ten seconds and then be silent for five minutes. Averaged over the five-minute window the volume is well under the limit, yet the limiter fires constantly during the burst, because the cadence is wrong, not the volume.
- Idempotency is treated as the client's problem. "If you double-submit, that's on you" works when "you" is a human who notices the double-charge and complains. When "you" is a planning agent that retries on a transient 502 by re-running its tool call from the top, the server will quietly create two of everything and the agent will report success. The Idempotency-Key header has been an IETF draft since 2021 and most internal APIs still treat it as optional.
- Sessions and CSRF tokens assume one cookie jar. Single-page-app session models lean on a per-browser cookie and a CSRF token bound to that session. Spawn ten parallel agent workers against the same logical user and you have ten cookie jars or one shared jar with ten concurrent writers — both modes break things the original auth designer never tested.
- Audit logs record an action, not a chain of authority. "User U updated record R at timestamp T" was sufficient when U was a person who could be asked what happened. When U is an OAuth principal acting on behalf of a human acting on behalf of a service account that the human authorized last quarter, "user U did it" is a lie of omission that compliance will eventually catch.
- Locking semantics are last-write-wins because two-tab humans were rare. A user opening the same record in two browser tabs and editing both was an edge case worth ignoring. Three agents writing to the same record in the same second is now the modal case, and your "we'll just use last-write-wins" decision from 2019 is now silently dropping data.
None of these are exotic. Each one is something a senior engineer would defend on its own. The problem is that all five hold simultaneously, and an agent workload tests all five at once on the very first day it is enabled.
The rate-shape mismatch is not a tuning problem
The first instinct when 429s spike is to raise the limit. This is wrong, and it is wrong in a way that costs more than it saves.
Consider what the rate limiter is actually for. Two jobs, mostly: protect the backend from a single tenant exhausting capacity, and constrain abuse from compromised credentials. Both of those jobs are denominated in the same unit — requests per minute per principal — because for a human user, requests per minute is a reasonable proxy for resource consumption and for "is this account behaving like an account, or like an exfiltration script."
Agents decouple those two units. Resource consumption per minute is now bursty and high; behavioral signal is now meaningless because every account looks like a script. Raising the per-minute limit to accommodate the burst means your abuse heuristic is gone and your capacity protection is wishful thinking.
The redesign is to split the budget into two dimensions. A concurrency budget caps how many requests can be in flight at once for a given principal — this is what protects the backend, because in-flight requests directly map to thread pools, database connections, and downstream API quotas. A token bucket still caps work over time, but you set it generously, because you have already capped the worst-case fan-out via concurrency. A planner trying to spawn 500 parallel threads against a service with a concurrency cap of 8 will either queue, get fast 503s with retry hints, or — best — get a 429 with a Retry-After header that the agent's executor knows how to honor. The graph stops being a saw-tooth of false positives.
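The split described above can be sketched as a small per-principal budget object — a bounded semaphore for the in-flight cap plus a generous token bucket. The class name, cap of 8, and 600 tokens/minute are illustrative assumptions, not recommendations:

```python
import threading
import time
from collections import defaultdict

class PrincipalBudget:
    """Two-dimensional budget: a hard cap on in-flight requests plus a
    generously sized token bucket over time. Illustrative sketch only."""

    def __init__(self, max_in_flight=8, tokens_per_minute=600):
        self.sem = threading.BoundedSemaphore(max_in_flight)
        self.capacity = float(tokens_per_minute)
        self.tokens = float(tokens_per_minute)
        self.refill_rate = tokens_per_minute / 60.0  # tokens per second
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def try_acquire(self):
        """Returns (ok, retry_after_seconds). Map a refusal to a 429 with
        a Retry-After header so the agent's executor can back off sanely."""
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.refill_rate)
            self.last = now
            if self.tokens < 1:
                # Out of time-budget: tell the caller exactly when to retry.
                return False, (1 - self.tokens) / self.refill_rate
            if not self.sem.acquire(blocking=False):
                # Concurrency cap hit: fast refusal with a short retry hint.
                return False, 0.1
            self.tokens -= 1
            return True, 0.0

    def release(self):
        """Call when the request finishes, freeing an in-flight slot."""
        self.sem.release()

# One budget per principal; defaultdict is a stand-in for real state storage.
budgets = defaultdict(PrincipalBudget)
```

The key property: a 500-request fan-out no longer looks like abuse, it simply queues behind the concurrency cap, while the token bucket stays large enough that steady cadence never trips it.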
The second piece is per-tool quotas separate from per-principal quotas. Tool catalogs have wildly different blast radii — a search call costs a millisecond against a public-facing list endpoint, while a "send email" call costs an external API charge and your deliverability reputation. Treating both as "1 unit per request" against the same per-user budget is exactly the abstraction failure that lets a buggy agent burn through your transactional email quota in fifteen minutes.
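One minimal way to express that difference is a weighted budget, where each tool drains the quota in proportion to its blast radius. The tool names and cost values here are hypothetical:

```python
# Hypothetical per-tool costs: externally visible, hard-to-undo tools drain
# the budget far faster than cheap read-only ones. Values are illustrative.
TOOL_COSTS = {
    "search": 1,          # cheap, read-only
    "create_event": 10,   # durable write
    "send_email": 100,    # external side effect + deliverability risk
}

class ToolQuota:
    """Weighted quota for one principal over one window."""

    def __init__(self, budget_per_window=500):
        self.remaining = budget_per_window

    def charge(self, tool: str) -> bool:
        """Deduct the tool's cost; refuse the call once the budget is spent.
        Unknown tools get a conservative default rather than a free pass."""
        cost = TOOL_COSTS.get(tool, 10)
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True
```

Under this shape, one buggy loop can send at most a handful of emails per window while still being free to run hundreds of searches.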
Idempotency is now a contract, not a feature
A pattern I keep seeing in postmortems: an agent gets a 502 from a backend, retries from the top of its planning loop instead of the failed call, and the backend ends up with two of whatever was being created. The fix is always the same — make the endpoint accept an Idempotency-Key header and store the result of the first attempt — and the response is always the same: "we'll add it to the backlog."
That backlog item should be a P1, because the absence of idempotency is no longer a latent risk. With human users, double-submit was a sometimes-thing that the user noticed. With agent users, retry-on-error is the default behavior of every agent framework on the market. Stripe figured this out a decade ago for payments because the cost of getting it wrong was money; backends that touch any kind of external state — sending a notification, creating a calendar event, modifying a record — are about to learn the lesson on their own time.
The implementation is shockingly small. Accept an Idempotency-Key header on every mutating endpoint. Hash the request body alongside the key. On a duplicate key with a matching body hash, return the original response. On a duplicate key with a different body hash, return 422 — that is a client bug. Keep keys for 24 hours. That is the entire spec, and it is now a non-negotiable for any service you expect agents to call.
The harder part is making it mandatory at the gateway. An agent that "forgets" to send an idempotency key will silently corrupt data the moment retries fire. Treat a missing key on a mutating call from an agent principal as a 400 — make the contract explicit, refuse the request, force the bug to surface in development.
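Put together, the spec above fits in a few dozen lines. This is a minimal in-memory sketch of the gateway-side check — a real deployment would back `_store` with something shared like Redis, and `execute` stands in for the actual handler:

```python
import hashlib
import json
import time

# In-memory stand-in for a shared store; keys are kept for 24 hours.
_store = {}
KEY_TTL = 24 * 3600

def handle_mutation(idempotency_key, body, execute):
    """Idempotency contract from the text: missing key -> 400; same key and
    same body -> replay the stored response; same key with a different
    body -> 422 (client bug). `execute(body)` performs the real work."""
    if not idempotency_key:
        return 400, {"error": "Idempotency-Key required for mutating calls"}
    body_hash = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    entry = _store.get(idempotency_key)
    if entry and time.time() - entry["at"] < KEY_TTL:
        if entry["hash"] != body_hash:
            return 422, {"error": "Idempotency-Key reused with different body"}
        return entry["status"], entry["response"]  # replay first result
    status, response = execute(body)
    _store[idempotency_key] = {"hash": body_hash, "status": status,
                               "response": response, "at": time.time()}
    return status, response
```

The body hash is what turns "same key twice" from a guess into a decision: a retry replays safely, a key-reuse bug fails loudly instead of corrupting data.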
"On behalf of" needs to mean something
The audit-log column called `user_id` was never meant to answer the question "who actually authorized this." It was meant to answer "whose session was this." When sessions belonged to humans and humans authorized themselves, those two questions had the same answer.
For agents the questions diverge irreversibly. RFC 8693 (OAuth 2.0 Token Exchange) and the on-behalf-of flow already model this: the token carries a sub claim for the human who delegated, an act claim for the agent that is acting, and act claims can chain — agent-of-agent-of-user, with each layer recording what it added to the request. The standards are not new; the tooling around enforcing and recording them is.
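Concretely, a delegated token under RFC 8693 might carry claims shaped like this — the outermost `act` identifies the current actor, and the nested `act` records the prior actor in the chain. The identifiers are illustrative:

```json
{
  "sub": "alice@example.com",
  "scope": "calendar.read calendar.write",
  "act": {
    "sub": "agent:subtask-worker",
    "act": {
      "sub": "agent:planning-assistant"
    }
  }
}
```

Read bottom-up: Alice delegated to the planning assistant, which delegated to a subtask worker, which is the identity actually making this call.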
What needs to change in your backends:
- The audit log schema grows three first-class fields. `principal` (the human or service that ultimately authorized the action), `actor` (the immediate caller — usually an agent identity), and `delegation_chain` (the JSON path of `act` claims, so a forensic auditor can replay how authority flowed). One column is no longer enough. If you keep stuffing the agent identity into `user_id`, you have lost the ability to tell, six months from now, whether a deleted record was deleted by a person, by their assistant, or by a third-party agent the assistant invoked.
- Headers propagate, or the chain breaks at the first hop. The internal services downstream of your edge gateway need to receive and forward the principal/actor headers, not strip them. This is exactly the kind of cross-team plumbing that nobody owns until an incident makes it everyone's problem.
- Authorization decisions key off the actor, not just the principal. A human may have permission to delete a record. An agent acting on their behalf may not — or may need a separate consent — depending on policy. The `if user_can_delete(record)` check is no longer the right shape. It needs to be `if actor_can_delete(record, on_behalf_of=principal)`, with the policy engine able to reason about the actor identity and the delegation scope.
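A sketch of that reshaped check, under a hypothetical policy where the human's own permission is necessary but an acting agent also needs an explicit delete scope — the `Caller` type, scope name, and ownership rule are all illustrative:

```python
from dataclasses import dataclass

@dataclass
class Caller:
    principal: str          # the human who ultimately authorized the action
    actor: str              # the immediate caller, e.g. an agent identity
    delegation_scopes: set  # scopes the principal granted to the actor

def actor_can_delete(caller: Caller, record_owner: str) -> bool:
    """Hypothetical policy: deleting requires the principal's own permission,
    and, when the actor is an agent, an explicit delete scope on top."""
    # Stand-in for the old user_can_delete check: owners may delete.
    principal_allowed = caller.principal == record_owner
    if caller.actor == caller.principal:
        return principal_allowed  # human acting directly: old rule applies
    return principal_allowed and "records:delete" in caller.delegation_scopes
```

The shape matters more than the rule: the policy engine now sees both identities and the delegation scope, so "the human could have done it" and "this agent may do it" become separately answerable questions.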
The pushback I hear on this is "we'll add it later, the audit log is fine." It is not fine. The day a regulator or an incident-response team needs to reconstruct who authorized a write, "the audit log says user U" with no actor distinction is a hole you cannot patch retroactively, because the data was never captured. Add the columns now while the table is small.
The "agent-friendly" trap
There is a temptation, when this list of changes lands on a platform team's roadmap, to bundle it all under a label like "agent-friendly v1" and ship it as a feature. This framing is wrong, and it is wrong in a way that will produce a worse outcome than doing nothing.
"Agent-friendly" sounds like a sticker you add to the docs after some optional work — like "GraphQL support" or "webhooks v2." It implies a parallel API surface for agents that humans can ignore, and a long migration where some endpoints are agent-aware and others are not. In practice, every endpoint that an agent can reach (which is to say, every endpoint behind your auth boundary) is now part of the agent contract whether you like it or not. The choice is not "do we support agents" — they are already calling — but "is our contract written down and enforced, or is it being discovered, one incident at a time, in production."
The framing that works is to treat the contract as a renegotiation across every API your auth layer fronts. A small and mandatory set of changes — concurrency budgets, idempotency keys, principal/actor split in the audit log, OAuth on-behalf-of for delegated calls, per-tool quotas — applied to every service, with the gateway enforcing the new contract uniformly. Optional features can come later. Mandatory ones come first, because the failure mode of "this service is the only one that ignores idempotency" is silent data corruption, not a graceful degradation.
The teams I see getting this right have an explicit AAA charter (Authentication, Authorization, Audit) for non-human callers, owned by the platform team, with a quarterly review of every backend's compliance. The teams getting it wrong are still tuning rate limits in response to last week's pager.
What to do on Monday
If your backends were designed before 2024 and you are now turning on agentic features, the renegotiation will happen. The choice is whether you drive it or it drives you. Three concrete moves that pay back the fastest:
- Audit your rate limiters for the burst-vs-cadence mismatch. Instrument the time-distribution of requests, not just the rate. Find the principals where the variance is high — those are the agent users — and add a concurrency cap before raising the per-minute limit.
- Pick the ten endpoints with the largest blast radius and require idempotency keys at the gateway. Sending email, charging cards, writing to external systems, creating durable records. Make the absence of a key a 400 for agent principals. The retries are coming whether you are ready or not.
- Add `actor` and `principal` columns to your audit log schema and start populating them, even if the policy engine doesn't yet use them. Backfill is impossible; capture is cheap. You will want this data the first time an agent does something a human did not authorize.
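For the first item on the list, the burst-vs-cadence instrumentation can start as simply as looking at inter-arrival gaps per principal. A human produces roughly regular gaps; an agent produces tight bursts separated by long silences. The coefficient-of-variation heuristic and its threshold here are illustrative starting points, not tuned values:

```python
import statistics

def looks_like_agent(request_timestamps, cv_threshold=2.0):
    """Flag principals whose request timing is highly irregular: bursts
    followed by silence. Uses the coefficient of variation (stdev / mean)
    of inter-arrival gaps; threshold is an illustrative starting point."""
    gaps = [b - a for a, b in zip(request_timestamps, request_timestamps[1:])]
    if len(gaps) < 2:
        return False  # not enough data to say anything
    mean = statistics.mean(gaps)
    if mean == 0:
        return True   # everything arrived in the same instant: pure burst
    return statistics.stdev(gaps) / mean > cv_threshold
```

Run this over the principals already generating 429s: the ones it flags are the accounts that need a concurrency cap, not a bigger per-minute limit.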
Agent workloads are not waiting for your roadmap. The contract you negotiated with humans is being renegotiated this quarter, line by line, in incident reviews. The teams that win will be the ones that wrote the new contract down before they had to argue about it at 3 AM.
- https://zuplo.com/learning-center/token-based-rate-limiting-ai-agents
- https://medium.com/@rameshkannanyt0078/built-a-fastapi-backend-for-ai-agents-in-2026-heres-what-broke-fa6c5b4d2c25
- https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
- https://stripe.com/blog/idempotency
- https://datatracker.ietf.org/doc/html/rfc8693
- https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-on-behalf-of-flow
- https://blog.christianposta.com/explaining-on-behalf-of-for-ai-agents/
- https://www.loginradius.com/blog/engineering/auditing-and-logging-ai-agent-activity
- https://www.apistronghold.com/blog/ai-agents-stateless-audit-trail
- https://www.augmentcode.com/guides/multi-agent-outputs-n-pass-enterprise-audit
- https://block.github.io/goose/blog/2026/01/05/agentic-guardrails-and-controls/
- https://stytch.com/blog/ai-agent-authentication-methods/
- https://workos.com/blog/best-oauth-oidc-providers-for-authenticating-ai-agents-2025
