Your OAuth Tokens Expire Mid-Task: The Silent Failure Mode of Long-Running Agents

· 11 min read
Tian Pan
Software Engineer

The first time a production agent runs for forty minutes and hits a 401 on step 27 of 40, the incident review is almost always the same. Someone in the room asks why the token wasn't refreshed. Someone else points out that the refresh logic exists, but it lives in the HTTP client the agent's tool wrapper was never wired into. A third person notices that even if the refresh had fired, two of the agent's parallel tool calls would have tried to rotate the same refresh token at the same instant and blown up the session anyway. Everyone nods. Then the team spends the next week retrofitting credential lifecycle into an architecture that assumed requests finish in 800 milliseconds.

OAuth was designed for a world where an access token outlives the request that uses it. Long-running agents inverted that assumption. The request — really, a chain of tens or hundreds of tool calls orchestrated across minutes or hours — now outlives the token. The industry spent a decade building libraries, proxies, and refresh flows around the short-request assumption, and almost none of it transplants cleanly to agent loops.

This post is about what that architectural mismatch looks like in production: the failure modes that ride on top of it, the refresh patterns that actually work at agent timescales, and the auth-context drift problems nobody warns you about until the audit log shows your agent taking actions on behalf of a user who was offboarded forty-five minutes ago.

The Failure Mode Is Architectural, Not a Bug

When a web request carries a soon-to-expire token, the worst outcome is usually a 401 that the user retries with a click. The blast radius is one request, one user, one screen. Agents break that assumption at every boundary. A single long-running task fans out to dozens of tool calls, each of which might hit a different authenticated API, each of which might need its own refresh. The task has no user sitting in front of it to re-click. There is no browser to redirect to an authorization endpoint. There is often no single process to hold state across the entire run.

The typical retrofitted agent codebase has three layers that each made sense alone. The HTTP client knows how to refresh a token if it gets a 401. The tool wrapper knows how to call the HTTP client. The agent orchestrator knows how to sequence tool calls. None of them own the credential's lifecycle end-to-end. The 401 handling in the HTTP client covers a single retry and then surfaces the error upward; the tool wrapper turns that error into a tool-call failure; the orchestrator interprets the failure as a task failure, or worse, prompts the model to "try something else," which the model cheerfully does — sometimes with a tool call that did not require auth and appears to succeed.

The silent variant is the worst one. An agent hits a 401 on a write, the retry also fails because the refresh never fired, and the agent decides the task was "complete enough" based on the steps that did succeed. The audit log shows half a task. The user sees a success notification. The on-call engineer finds out next week from a customer ticket.

Proactive Plus Reactive, Not Either-Or

The refresh pattern that holds up at agent timescales is not a single strategy. Production systems combine two layers of discipline that cover different failure modes.

Proactive refresh is the baseline: rotate the access token before it expires, typically when it has used 70–80% of its lifetime, driven by the issuer's expires_in rather than a client-side guess. This alone prevents most mid-task 401s because the agent is always operating on a token with meaningful remaining life. It also avoids the cascade of retry logic firing against tokens that were about to expire anyway.

Reactive refresh is the safety net for the cases proactive misses: clock skew, a refresh that fired but the new token hadn't propagated to the node making the request, an access token that was server-side-revoked between being minted and being used. When a 401 comes back, the wrapper refreshes once and retries once. Exactly once in each direction — anything more and you are building a DDoS against your own auth server while also masking real auth failures.

The place teams most often get this wrong is by picking one and skipping the other. Proactive-only breaks on revoked tokens and clock drift. Reactive-only means every first request after expiry eats a refresh round-trip, and every downstream tool call batched behind it stalls. The combination is not a matter of engineering taste; it is what the combined failure modes demand.
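A minimal single-process sketch of the two layers working together (all names are hypothetical; a real client would persist tokens and handle issuer errors):

```python
import time


class TokenManager:
    """Sketch: proactive refresh at ~75% of token lifetime, plus a
    single reactive refresh-and-retry on 401. Illustrative only."""

    REFRESH_AT = 0.75  # rotate once 75% of expires_in has elapsed

    def __init__(self, refresh_fn):
        self._refresh_fn = refresh_fn  # calls the issuer; returns (token, expires_in)
        self._token = None
        self._issued_at = 0.0
        self._expires_in = 0.0

    def _needs_proactive_refresh(self):
        if self._token is None:
            return True
        elapsed = time.monotonic() - self._issued_at
        return elapsed >= self._expires_in * self.REFRESH_AT

    def _do_refresh(self):
        self._token, self._expires_in = self._refresh_fn()
        self._issued_at = time.monotonic()

    def get_token(self):
        if self._needs_proactive_refresh():
            self._do_refresh()
        return self._token

    def call_with_retry(self, request_fn):
        """Reactive safety net: on a 401, refresh once and retry once."""
        status, body = request_fn(self.get_token())
        if status == 401:
            self._do_refresh()
            status, body = request_fn(self._token)
        return status, body
```

The proactive path covers the common case; `call_with_retry` only fires when the proactive path missed (skew, revocation, propagation lag), and it deliberately gives up after one refresh so real auth failures surface instead of looping.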

Token Refresh Is a Concurrency Problem

OAuth 2.0 refresh tokens are single-use under any sane configuration. Refresh token rotation — where each use invalidates the previous token and issues a new pair — is the default in OAuth 2.1 and is mandatory in the MCP authorization spec. This is good for security and terrible for naive refresh code in an environment where an agent might issue ten parallel tool calls against the same upstream API.

The race condition is brutally simple. Two concurrent tool calls read the same stored credentials. Both see the access token is expired. Both call the refresh endpoint. The first one succeeds and invalidates the previous refresh token; the second one gets a refresh_token_reused error, which most issuers treat as a breach signal and respond to by revoking the entire session. Now every other tool call in the agent's loop is about to fail, and there is no graceful way to re-auth a non-interactive process.

The fix is coordination, and the level depends on your deployment. For a single-process agent runtime, an in-memory mutex around the read-refresh-write cycle is enough: one call rotates the token, the rest wait and then read the fresh one. For a multi-process or multi-node runtime, you need a distributed lock — Redis with a short-TTL SETNX is the standard recipe — plus a "re-read before refresh" step inside the lock, so the second holder of the lock notices that someone already rotated and skips its own refresh. Without that re-read, the lock just serializes the same race.
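For the single-process case, the critical detail is the re-read inside the lock: without it, the mutex merely serializes the double rotation instead of preventing it. A sketch, assuming an in-memory store and an illustrative `rotate_fn` that calls the issuer:

```python
import threading
import time


class RotatingCredential:
    """Single-process sketch of the read-refresh-write cycle under a
    mutex, with a re-read inside the lock so exactly one caller rotates."""

    def __init__(self, store, rotate_fn):
        self._lock = threading.Lock()
        self._store = store          # dict: {"access", "refresh", "exp"}
        self._rotate_fn = rotate_fn  # (refresh_token) -> (access, refresh, expires_in)

    def get_access_token(self):
        creds = dict(self._store)
        if creds["exp"] > time.monotonic():
            return creds["access"]          # fast path: still valid
        with self._lock:
            # Re-read: another caller may have rotated while we waited.
            creds = dict(self._store)
            if creds["exp"] > time.monotonic():
                return creds["access"]      # someone else already rotated
            access, refresh, expires_in = self._rotate_fn(creds["refresh"])
            self._store.update(
                access=access, refresh=refresh,
                exp=time.monotonic() + expires_in,
            )
            return access
```

The multi-node version has the same shape: swap the `threading.Lock` for a short-TTL distributed lock and the dict for shared storage, and keep the re-read exactly where it is.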

The on-call pattern that tells you this is biting you: refresh_token_reused spikes correlated with fan-out steps in your agent workflows, followed by a cluster of "reauthenticate your account" notifications to users who never explicitly logged out.

Auth Context Drift Is the Hardest Part

The refresh mechanics are solvable engineering. The identity model is the part most teams get wrong because it requires deciding whose authority the agent is acting under, and when that authority ends.

Picture a typical long-running agent: a user triggers it at 9:00 AM; the agent is still running at 9:45; at 9:30 the user was offboarded. The refresh token is still valid. The agent continues. Every tool call continues to succeed. Every action is attributed to a user who no longer works at the company. The audit log is technically accurate and forensically useless.

Variants of this drift show up everywhere once you look. The user rotates their password because they think it was phished — the active session continues. The user revokes the agent's grant in their settings page — depending on the provider, the access token keeps working until it expires. The user logs out — same thing. None of these actions were designed with "my workflow is still executing" as a consideration.

The architectural pattern that actually handles this treats each long-running agent run as a workflow with its own credential lifecycle, not as an extension of the user's session. That looks like: the agent gets a delegation token at task start whose scope is the minimum needed for the declared task, whose lifetime is the task's expected duration plus a small buffer, and whose refresh is gated on "is the originating user still an authorized principal" rather than just "does the refresh token still work." When the user-state check fails — offboarded, credential-rotated, grant-revoked — the agent's next refresh attempt fails closed, the task pauses, and a human (the user, their manager, or an admin) is pulled in to confirm or cancel.
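The gate itself is small once the user-state check exists. A hedged sketch (function and exception names are hypothetical; the real check would hit your directory or IdP, not a callback):

```python
class TaskPaused(Exception):
    """Raised when a refresh fails closed on user-state, not on network."""


def refresh_delegation_token(issuer_refresh, user_is_authorized,
                             refresh_token, user_id):
    """Gate every refresh on the originating user's current status, so
    offboarding / revocation stops the task at the next rotation."""
    if not user_is_authorized(user_id):
        # Fail closed: mint nothing; surface for human confirm-or-cancel.
        raise TaskPaused(f"principal {user_id} is no longer authorized")
    return issuer_refresh(refresh_token)
```

The point of the structure is that "does the refresh token still work" and "should this task still be running" are evaluated separately, and the second question wins.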

This is strictly more work than letting the OAuth library's default retry logic run, and it is the work. Agents that act with human-equivalent authority over long horizons need credential machinery that treats authority as revocable in-flight. Anything less is a compliance incident waiting for its trigger.

Designing the Credential Lifecycle Explicitly

Putting the pattern into practice is less about picking a library and more about deciding which layer owns which part of the contract. The layering that has held up across multiple agent codebases looks roughly like this.

The tool wrapper owns nothing about tokens. It receives a principal handle from the runtime and makes HTTP calls; it never reads from disk or touches the refresh endpoint. If it gets a 401, it surfaces it as a typed error.

The agent runtime owns the credential. It holds the refresh token, the access token, the expiry, and the user-state check. It exposes a get_access_token(principal, scopes) primitive that implements both proactive rotation and the concurrent-refresh lock. It also exposes a pause_task(reason) primitive for when a refresh fails closed due to user-state, not network.

The orchestrator owns policy. It decides what scopes the task needs at start, how long the task's token should live, how to handle pause_task — escalate to the user, escalate to an admin, hard-fail the task, or something finer-grained depending on the action's reversibility.

The auth server owns the hard guarantees: short access-token lifetimes (15–60 minutes is the range the MCP spec settled on for good reason), rotated refresh tokens, per-principal revocation that propagates in seconds, and audience-scoped tokens so a compromised tool cannot spend tokens outside its intended surface.

The property that emerges is that no single layer has to reason about everything. The tool doesn't know about refresh; the runtime doesn't know about scopes; the orchestrator doesn't know about HTTP. The failure modes still exist, but they each have a single owner, and the audit log can actually answer the question "what happened" after the fact.
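The ownership boundaries above can be sketched as interfaces. Everything here is illustrative (the `Runtime` is stubbed where real code would do rotation, locking, and the user-state check), but the shape is the point: the wrapper raises a typed error, the runtime hands out tokens, the orchestrator decides what a surfaced failure means.

```python
class AuthRequired(Exception):
    """Typed error the tool wrapper surfaces on a 401. It owns nothing else."""


class Runtime:
    """Owns the credential. Stub: real code does rotation + locking here."""
    def __init__(self, tokens, paused):
        self._tokens, self.paused = tokens, paused

    def get_access_token(self, principal, scopes):
        return self._tokens[principal]

    def pause_task(self, reason):
        self.paused.append(reason)


def tool_call(runtime, principal, http_call):
    """Tool wrapper: receives a principal handle, never reads credentials."""
    status, body = http_call(
        runtime.get_access_token(principal, scopes=("tickets:read",)))
    if status == 401:
        raise AuthRequired(body)
    return body


def orchestrate(runtime, principal, http_call):
    """Orchestrator: owns policy for what a surfaced auth failure means."""
    try:
        return tool_call(runtime, principal, http_call)
    except AuthRequired:
        runtime.pause_task("auth failed; escalate to user")
        return None
```

Swapping any one layer (a different escalation policy, a different token store) leaves the other two untouched, which is what makes the failure modes individually ownable.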

Treat Long-Running Agents as Their Own Trust Boundary

The shortest version of everything above: if your agents routinely run for longer than your access token lives, your access token's lifetime is a product decision you never made on purpose. The default — borrowed from a decade of short-request OAuth integrations — is that access tokens drift to 60 minutes because that's what the docs say. For an agent that completes most tasks in four minutes, that's oversized and a security cost. For an agent that runs for three hours, that's undersized and an operational cost. Neither default helps you.

The teams that do this well stop thinking of agent authentication as a library configuration and start thinking of it as a workflow with its own lifecycle events: task start, token mint, periodic revalidation, task complete, token revoke. They set access-token lifetimes to match task shapes, scope tokens narrowly to the tools the task actually declares, and wire the pause-and-escalate path before they need it. The work shows up nowhere on the feature roadmap, and it is the difference between an agent platform that passes its first real audit and one that does not.

Your agents are going to outlive their tokens. Build for that now, so the first time it happens in production is not also the first time anyone thought about it.
