Skip to main content

The Degradation Signals Your Agent Never Receives

· 9 min read
Tian Pan
Software Engineer

When a downstream API starts to wobble, a human operator finds out a dozen ways before anything actually breaks. The status page flips to yellow. A changelog email lands in the inbox. A warning banner appears in the provider's dashboard. The on-call channel lights up with a 429 someone spotted in the logs. A teammate posts "anyone else seeing slow writes?" None of these are responses to a request. They are the ambient operational signal that surrounds the API, and a human absorbs it almost passively.

An agent calling the same API receives exactly one thing: the response to the request it just made. Status code, headers, body. That is the entire channel. It has no inbox, no dashboard, no Slack, no peripheral vision. It cannot notice that the last ten calls each took twice as long as the ten before. It cannot read the status page, because nobody handed it the URL and it has no standing instruction to look. When the dependency degrades, the agent is the last party in the system to find out — and it usually finds out by failing.

This asymmetry is not a model capability problem. A smarter model does not fix it. The agent is blind to operational signals because the plumbing never delivers them, and most agent stacks ship without anyone noticing the plumbing is missing.

What the agent actually sees

Picture the narrowest possible window into a dependency: a single HTTP response. That is what a tool call returns to an agent. If the call succeeds, the agent sees a 200 and a body. If it fails, it sees a 500 or a timeout. In between — the entire spectrum of "working, but not well" — the agent sees almost nothing it is trained to act on.

Human operators live in that in-between space. They have learned that systems rarely fail cleanly; they degrade first. Latency creeps up. Error rates tick from 0.1% to 2%. A queue backs up. The provider posts "elevated error rates, investigating." An operator reads those weak signals and adapts: slows down, batches differently, switches to a fallback, or just waits. The adaptation happens long before the hard failure.

The agent has no equivalent. Its loop is request, response, decide, repeat. Each iteration is stateless with respect to operational health unless you deliberately make it otherwise. The response from call number 200 carries no memory that calls 190 through 199 were getting slower. The agent treats a 1.8-second response and a 180-millisecond response as the same outcome: success. So it does the same thing it did before, only now it is doing it into a system that is asking — quietly, in ways the agent cannot hear — to be left alone.

The brownout the agent runs straight into

The most dangerous failure mode is not the API going down. It is the API staying up while getting slow. Engineers call this a brownout: the service still returns 200s, but every request costs more time, more connections, more capacity. A brownout is a system pleading for less load while technically still serving.

A human sees a brownout immediately, because a human watches the clock. An agent does not watch the clock unless you build it a clock. So when a dependency browns out, the agent keeps calling at full speed. Worse, if the brownout tips a few requests into timeouts, the agent's retry logic kicks in — and retries are the accelerant.

The arithmetic is brutal in any multi-layer system. If each layer of a five-service call chain retries three times, one original user request can fan out into 3^5 = 243 backend calls. A single 429 triggers retries, those retries consume the same scarce quota, that consumption triggers more 429s, and the loop tightens into a retry storm. The agent, seeing only individual responses, has no idea it is the one generating the storm. It experiences the incident as "the API is being flaky" and responds with exactly the behavior — more requests — that is causing the flakiness.

The fix everyone reaches for is exponential backoff with jitter, and it is necessary. Double the wait between attempts, add randomness so a fleet of agents does not synchronize into a thundering herd, cap the attempts at five to seven. But backoff only triggers after a failure. It is a reaction to the 429, not a way to avoid it. An agent that backs off correctly still walked into the wall first. To not hit the wall, the agent needs to see the wall coming — and that information exists. It is just being thrown away.

The headers already in the response, ignored

Here is the part that should bother you: the agent does not need a new data source to see the brownout coming. The signal is already in the response it is holding.

Well-behaved APIs broadcast their health on every single call. The emerging IETF standard defines RateLimit headers carrying the remaining quota and the reset window; many providers still ship the older X-RateLimit-Remaining, X-RateLimit-Limit, and X-RateLimit-Reset triad. A 429 response is supposed to carry Retry-After, and when both Retry-After and a reset header are present, Retry-After wins. GitHub's REST API publishes your remaining quota on every response and tells you the exact second it resets. Zendesk, Jira, and most large platforms do the same. The dependency is telling the agent, on call number 190, that only three requests of quota remain before the reset — ten calls before the 429 it is about to trigger.

The agent ignores all of it. Most tool integrations parse the body and discard the headers before the response ever reaches the model. The agent is handed the letter with the warning torn off the top. It then sails through requests 191 to 199, hits the limit on 200, and only now — via a hard 429 — discovers a fact it has been holding in its hand for ten calls.

The same waste applies to latency. Nothing stops the tool layer from timing every call and noticing that the trailing-ten-call average just doubled. Nothing stops it from reading a Sunset header announcing an endpoint's retirement, or a deprecation notice in a response field. These are not exotic signals. They are standard, structured, and present. They simply are not on the path between the HTTP client and the model's context.

Build the agent a side-channel

The fix is an architectural decision, not a prompt. Treat operational metadata as a first-class input to the agent, on equal footing with the response body. Concretely, that means the tool layer — the wrapper, gateway, or MCP server between the model and the real API — stops being a dumb pass-through and starts being a translator. Its job is to convert human-facing operational signals into something the agent can act on.

A few things that layer should do:

  • Surface rate-limit state into context. When a response carries RateLimit-Remaining or X-RateLimit-Remaining, do not discard it. Attach a plain-language note to the tool result: "Quota: 3 of 5000 requests remain; resets in 47s." Now the model can reason about it — defer non-urgent calls, batch, or simply wait — instead of discovering the limit by hitting it.
  • Track and report degradation. Have the tool layer keep a short rolling window of latency and error rate per dependency. When the trailing average crosses a threshold, annotate results: "This API's latency is 3x its normal baseline; consider reducing call volume." The agent cannot compute this itself across stateless calls; the wrapper can.
  • Honor Retry-After mechanically, below the model. The model should not be the thing parsing Retry-After and sleeping. Enforce it in the tool layer with real backoff and jitter, and trip a circuit breaker after a threshold of consecutive 429s so a misbehaving loop cannot keep firing. The model's job is the task; the plumbing's job is the protocol.
  • Give the agent a health feed, not just responses. For critical dependencies, expose a dependency_health tool the agent can call, or inject a periodic health summary into context. A status page, a /health endpoint, and a known-incidents list are all machine-readable or close to it. The agent needs the dependency's health as an input it can poll — not a fact it backs into after a failure.
  • Translate deprecation into instruction. A Sunset header or a deprecation field means nothing to a model as raw metadata. The tool layer should turn it into an actionable line: "This endpoint retires on 2026-08-01; a replacement tool search_v2 is available." Metadata the model cannot act on is noise; an instruction it can act on is signal.

The MCP ecosystem is inching toward some of this — 2026 roadmap work on server metadata and well-known discovery URLs is a step toward dependencies that describe their own health. But you do not need to wait for a protocol. The translation layer is something any team can build today, in the wrapper code they already own.

The principle underneath

An agent is only as situationally aware as its context makes it. It does not have peripheral vision, a pager, or a colleague who mentions the API looked slow this morning. Everything a human operator picks up from the environment, an agent picks up only if you put it in the context window.

So the question to ask of every dependency your agent touches is simple: how would a human know this service is degrading — and is the agent receiving that same information? If the answer is a status page, a changelog email, or a header the tool layer quietly drops, then the agent is flying blind, and it will keep flying blind straight into the next brownout. The work is not making the model smarter. It is building the side-channel that carries the signal the model was never given a way to hear.

References:Let's stay in touch and Follow me for more thoughts and updates