Skip to main content

Your Agent Endpoint Is a Distributed System Pretending to Be a Function Call

· 9 min read
Tian Pan
Software Engineer

The most dangerous line of code in a modern AI application looks completely innocent:

result = await agent.run(user_query)

It reads like a function call. It has a name, it takes an argument, it returns a value. Your IDE autocompletes it. Your type checker is satisfied. And that single await is hiding a remote, multi-hop, partially-failing distributed system behind the syntax of a local procedure. The gap between what the code looks like and what it actually does is where most production agent incidents live.

We have decades of muscle memory telling us that a function call either returns or throws. It runs on our machine, it is fast, and if it fails, it fails loudly and immediately. None of that is true for an agent endpoint. Underneath that await sit network round trips, a remote service with its own rate limits and outages, retries that may duplicate side effects, streams that can die halfway through, and tool calls that fan out into yet more remote services. The SDK is doing you a quiet disservice by making all of that look like len(my_list).

This is not a new lesson. It is the fallacies of distributed computing wearing a new outfit. The network is not reliable, latency is not zero, the remote endpoint is not always up. We learned this for microservices and wrapped every cross-service call in timeouts, retries, and circuit breakers. Then the AI SDKs arrived with their friendly .run() and .create() methods, and a generation of application code forgot the lesson on contact.

The SDK Method Is a Network Boundary in Disguise

Treat every model call and every tool call as what it is: an RPC across a network boundary. The moment you accept that framing, a checklist you already own becomes mandatory.

A normal RPC to an internal microservice gets a timeout budget, a bounded retry policy, a circuit breaker, and a metric. Your agent.run() call deserves exactly the same treatment, and usually gets none of it, because it does not look like an RPC. It looks like a method on an object you imported.

The tell is in the failure modes. A local function does not return HTTP 429. A local function does not have a p99 latency that is ten times its p50. A local function does not get slower because someone else's traffic spiked. An agent endpoint does all three. The default client timeout in most HTTP stacks, proxies, and load balancers is 30 seconds — and that is frequently shorter than a real agent response that chains several tool calls and a long completion. So the first thing the disguise costs you is a timeout that fires in the wrong place, killing a request the model would have finished in 45 seconds.

The fix is not clever. It is the boring infrastructure you would never skip for a payments service:

  • An explicit timeout budget per call, sized to the actual latency distribution, not the library default.
  • A bounded retry policy with exponential backoff and jitter, so a provider blip does not turn into a synchronized retry storm from every one of your instances at once.
  • A circuit breaker that stops hammering a provider that is clearly down and fails fast instead.
  • A metric per call site, because you cannot reason about latency you do not measure.

None of this is innovative. It is table stakes for any network call, and the only reason it gets skipped is that the SDK made the call look local.

The Difference Between "Failed" and "I Don't Know"

Here is the distinction that the function-call mental model erases entirely, and it is the most expensive one.

A function call has two outcomes: it returned, or it threw. A network call has three: it succeeded, it failed, or you don't know. The third state — the request that timed out, the connection that dropped after you sent the bytes but before you got a response — is the one that does real damage, because your code has no way to distinguish "the model never saw my request" from "the model processed my request, ran three tools, and the acknowledgment got lost on the way back."

This is the Two Generals' Problem, and it is not solvable. No amount of confirmation messages closes the gap. When an agent call times out, you genuinely cannot know whether the work happened. You can only decide what to do about not knowing.

For a read-only call — "summarize this document" — not knowing is cheap. Retry it; worst case you pay for two completions. For a call that took an action — the agent that already sent the email, charged the card, filed the refund, or deleted the record — not knowing is the whole problem. Retry blindly and you do the irreversible thing twice. Don't retry and you might have done nothing at all.

The function-call model gives you no vocabulary for this. It tells you the call "failed," and "failed" quietly implies "nothing happened." For an agent that invokes tools with side effects, "failed" much more often means "something happened and I lost the receipt."

Retries Are Where Agents Quietly Charge the Card Twice

Agent retries are both more frequent and less visible than the human-triggered kind, which makes them more dangerous.

More frequent, because agents are chatty. A single user query can fan out into dozens of tool calls, and each one is an independent opportunity for a transient failure that triggers a retry. More invisible, because no human is sitting there clicking "submit" again — the retry happens inside an orchestration loop, three layers below your application code, and the only evidence is a duplicate row that someone notices a week later.

The canonical incident is depressingly consistent across teams. An agent calls a tool that charges a customer. The downstream payment service is slow because of its own external dependency. The agent's timeout fires. The retry logic — correct, well-intentioned retry logic — fires a second charge. The first charge succeeded the whole time; the agent just never got told. The customer is double-billed, and every layer of the system behaved exactly as designed.

The fix is to make the tool call safe to repeat, and that means idempotency keys threaded through every tool that mutates state. Generate a stable key when the agent first decides to take an action. Pass it to the tool. The tool's backend records the key on first execution and, on any call with a key it has already seen, returns the original result without re-running the side effect. A replay becomes a no-op. The retry storm becomes harmless.

Idempotency has to be a first-class requirement of your tool layer, not a polish item. The rule of thumb:

  • Read-only tools (search, fetch, compute) are safe to retry aggressively as-is.
  • Mutating tools must accept and honor an idempotency key before an agent is ever allowed to call them.
  • Tools that genuinely cannot be made idempotent — a third-party API with no dedup support, a physical action — need to be fenced behind explicit human confirmation, not handed to an autonomous retry loop.

Design the action layer so that "run it again" is always safe. Then retries stop being scary and become what they should be: a routine response to the unreliable network you already knew you had.

Reliability Compounds, and Agents Multiply the Chain

There is a piece of arithmetic that the function-call framing hides completely. When you chain remote calls, their reliability multiplies. Five services at 99.9% availability each give you a composite of roughly 99.5% — the chain is always less reliable than its weakest link, and usually worse than all of them.

An agent is a reliability chain that builds itself at runtime. You do not know in advance whether a query will make one tool call or forty. Each hop multiplies more failure probability into the result. A workload where the median request is rock-solid can still have a tail where enough hops line up that something, somewhere, fails — and because the agent decides the chain dynamically, you cannot eyeball it from the code.

This is why per-call-site instrumentation matters more for agents than for ordinary services. You need to see each hop: which tool, how long, succeeded or failed or unknown. Without that, a degraded agent looks like "it's slow sometimes" instead of "the calendar tool's p99 tripled and it sits in 80% of our chains." The model is not the only flaky downstream dependency in the system — every tool it can reach is one too.

Treat the Model Like a Flaky Downstream, Because It Is One

The mental shift is small and it changes everything. Stop thinking of the agent endpoint as a smart function. Start thinking of it as a remote service you do not own, do not control, and cannot fully trust — one with rate limits, outages, latency spikes, and a non-zero rate of responses that never arrive.

That reframing tells you exactly what to build, because you have built it before for every other unreliable dependency:

  • Wrap every model and tool call in a timeout budget sized to reality.
  • Make retries bounded, jittered, and reserved for idempotent operations.
  • Thread idempotency keys through every tool that touches state.
  • Add circuit breakers so a provider outage fails fast instead of cascading.
  • Instrument every hop so the dynamically-built chain is visible.
  • Treat a timeout as "I don't know," not "it failed" — and decide deliberately what to do about not knowing.

None of this is exotic. It is the standard discipline for any RPC across a network boundary, and the only reason agent code so often lacks it is that a tidy SDK method talked you out of it. The await is comforting. The comfort is the bug. The sooner your code admits that agent.run() is a distributed system wearing a function's clothing, the sooner it stops surprising you in production.

References:Let's stay in touch and Follow me for more thoughts and updates