The Revoked Tool Your Agent Kept Calling Because the Registry Cache Was an Hour Stale

June 3, 2026 · 11 min read

Software Engineer

A user opens the integrations page, finds the Stripe connector they installed last month, clicks Remove, and closes the tab. They believe they have just rescinded an authority. What they have actually done is update a row in a database that the agent currently talking to them will not read again for another forty-three minutes. In the interval, the agent will try to call that Stripe tool, the registry's authorization layer will correctly say no, the agent's harness will treat the denial as a transient downstream blip and retry three times, and the user's own Stripe audit log will record three unauthorized access attempts arriving from a vendor they thought they had just severed.

The user's escalation will read, almost verbatim: "Your platform kept trying to access my Stripe after I removed it." That is exactly what happened, and the root cause sits one layer deeper than the bug report ever reaches. The tool registry was the source of truth for what the agent was allowed to do. The agent did not read the source of truth. It read a cache.

The cache had a one-hour TTL because the team that built the harness was worried about hammering the registry service on every turn of a long-running session. They picked a TTL by reasoning about registry load. They did not pick a TTL by reasoning about the consequences of the agent acting on a revoked authority. Those are different questions with different answers, and a team that conflates them ends up with a system where "I removed access" means "I removed access in an hour."

The Cache Is a Promise You Did Not Mean to Make

Every cache encodes a tolerance for staleness. When the cached value is a product name or a feature flag default, an hour of staleness is mostly a minor UX wrinkle. When the cached value is the set of capabilities an agent may invoke on the user's behalf, an hour of staleness is a promise to the user that you almost certainly did not mean to make: we will continue acting under the authority you just revoked, for up to sixty minutes, and there is nothing you can do from your side to make us stop sooner.
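To make the size of that promise concrete, here is a minimal sketch of the arithmetic, assuming a per-session manifest cache with a fixed TTL. The names `TtlManifestCache` and `stale_window_seconds` are hypothetical, and the 17-minute revocation time in the usage note is an assumption chosen to reproduce the forty-three minutes from the opening scenario.

```python
from dataclasses import dataclass, field


@dataclass
class TtlManifestCache:
    """Hypothetical per-session cache of the tool manifest with a fixed TTL."""

    ttl_seconds: float
    fetched_at: float = 0.0
    tools: frozenset = field(default_factory=frozenset)

    def refresh(self, registry_tools, now: float) -> None:
        # Copy the registry's current tool set and remember when we read it.
        self.tools = frozenset(registry_tools)
        self.fetched_at = now

    def is_fresh(self, now: float) -> bool:
        return now - self.fetched_at < self.ttl_seconds


def stale_window_seconds(cache: TtlManifestCache, revoked_at: float) -> float:
    """Seconds during which the session still treats a revoked tool as callable."""
    expires_at = cache.fetched_at + cache.ttl_seconds
    return max(0.0, expires_at - revoked_at)
```

With a one-hour TTL, a manifest fetched at time 0, and a revocation 17 minutes later, `stale_window_seconds` returns 2580 seconds, which is 43 minutes. Lowering `ttl_seconds` shrinks that number, but no positive TTL brings it to zero for a revocation that lands just after a refresh.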

The user has no way to discover this promise from the surface of the product. The "Remove" button does not say "takes effect within the hour." The confirmation toast does not say "active sessions retain the previous authority until their next cache refresh." The user's mental model of a remove button is the model every remove button has ever taught them: the thing is gone now.

The platform's mental model is the one the harness code happens to encode, and that encoding came from a load-shedding decision made on a Wednesday afternoon a year before this user ever heard of the product. The mismatch between those two models is where the incident lives.

Removal Is an Event, Not a Polling Result

The pattern that closes this gap is not a shorter TTL. A shorter TTL trades one window of staleness for a smaller window of staleness, and the window can never reach zero without abandoning the cache entirely. The pattern is the one the recent revisions of the Model Context Protocol have been quietly converging toward: treat the change in available tools as an event that the registry pushes to active sessions, not as a value that sessions poll for.

A server that declares the listChanged capability is supposed to emit a notifications/tools/list_changed message the moment its tool set changes. The active client receives the notification, drops its cached manifest, and re-reads the registry before the next turn. The semantics are not "we promise the cache will refresh inside an hour." The semantics are "we promise that any change to what the user has authorized is visible on the next agent action."
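A minimal sketch of the client side of that contract, assuming JSON-RPC messages arrive as strings and that a hypothetical `fetch_tools` callable stands in for a `tools/list` request. `SessionToolCache` is an illustrative name and not part of any MCP SDK.

```python
import json
from typing import Callable, List, Optional


class SessionToolCache:
    """Hypothetical per-session manifest cache invalidated by push notifications."""

    def __init__(self, fetch_tools: Callable[[], List[str]]):
        # fetch_tools is assumed to perform a tools/list request against the registry.
        self._fetch_tools = fetch_tools
        self._tools: Optional[List[str]] = None

    def handle_message(self, raw: str) -> None:
        msg = json.loads(raw)
        # JSON-RPC notifications carry a method and no id.
        is_notification = "id" not in msg
        if is_notification and msg.get("method") == "notifications/tools/list_changed":
            # Drop the manifest; the next read goes back to the registry.
            self._tools = None

    def tools(self) -> List[str]:
        if self._tools is None:
            self._tools = list(self._fetch_tools())
        return self._tools
```

The harness would call `tools()` at the start of every turn. The cache still absorbs per-turn reads while nothing changes, and a revocation costs one extra registry read on the turn after the notification arrives.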

The shift in framing matters more than the wire protocol does. Once you treat the registry as a source of events rather than a source of pollable values, the bound on staleness stops being a function of your TTL choices and becomes a function of the round trip between the registry and the active session. The blast radius of a stale cache shrinks from up to one hour to the time it takes to deliver one notification. That is the difference between a UX problem you discover at 3am and a UX problem that does not exist.

This pattern is not free. It requires the harness to maintain an open channel back to the registry for every active session, which is operationally heavier than a polling cache, and it requires the client side to actually act on the notifications when they arrive. Several mainstream agent clients still do not. The MCP spec defines the server-side notification, but client behavior is a gap that has shown up in issue trackers across multiple major implementations. A team that adopts the pattern needs to verify both ends; declaring the capability on the server while the client silently ignores the notifications is worse than not declaring it at all, because it teaches the user to trust a guarantee no one is enforcing.

"Removed" and "Unreachable" Are Not the Same Denial

The second failure mode in the Stripe incident was not the cache. It was the retry. The harness called the tool, the registry's authorization layer correctly returned a denial, and the harness treated the denial as a transient downstream error and retried three times.

The harness was not malfunctioning. It was applying the retry policy the team had configured for the tool-call path, which was the policy every distributed systems textbook recommends for any network-dependent call: assume the first failure is transient, retry with backoff, give up after a small number of attempts. That policy is correct for the population of errors it was designed against. It is wrong for the specific error the registry returned, because that error was permanent by construction.

The contract between the registry's authorization layer and the harness did not distinguish "the tool no longer exists for this user" from "the tool's backend is briefly unreachable." Both surfaced to the harness as "tool call failed." The harness, lacking a way to tell them apart, applied the only policy it had, which was the one that retries on the assumption that the failure is recoverable.

Every retry of a tool the user has just revoked is a second violation of the user's authority decision. The first call could be excused as a race between the user's click and the cache refresh. The retries cannot. By the third retry, the system is acting on a denial it has already received, which is qualitatively worse than acting on a cache it has not yet refreshed.

The fix is not in the retry logic. The fix is in the denial vocabulary. The registry needs to return enough information for the harness to classify the denial, and the harness needs to branch on the classification:

Removed, revoked, no longer authorized: do not retry, surface to the user, log the attempted call as an audit event.
Service unreachable, timeout, internal error: retry with backoff under a budget, fall back to telling the user the tool is temporarily unavailable.

The two branches look similar from the harness's perspective, but they correspond to fundamentally different underlying conditions. One is a state the user created and expects to be honored on the next action. The other is a condition outside the user's control that may resolve on its own. Treating them as the same error is what produces the audit-log signature that triggered the customer's own security alerting in the first place.
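The branching can be sketched as follows, assuming the registry attaches a machine-readable classification to every denial. `Denial`, `ToolCallError`, and `call_with_policy` are hypothetical names, and the three-attempt budget and backoff values are assumptions, not a recommendation.

```python
import time
from enum import Enum
from typing import Callable


class Denial(Enum):
    REVOKED = "revoked"          # permanent: the user removed the authority
    UNREACHABLE = "unreachable"  # transient: backend timeout or internal error


class ToolCallError(Exception):
    def __init__(self, denial: Denial, detail: str = ""):
        super().__init__(detail or denial.value)
        self.denial = denial


def call_with_policy(
    call: Callable[[], object],
    audit: Callable[[str], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry transient failures under a budget; never retry a permanent denial."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ToolCallError as err:
            if err.denial is Denial.REVOKED:
                # Log the attempted call as an audit event and stop immediately.
                audit(f"denied:{err.denial.value}")
                raise
            if attempt == max_attempts:
                raise  # budget exhausted; caller reports "temporarily unavailable"
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

Under this policy a revoked tool is called at most once per stale cache entry, so the user's external audit log would show a single denied attempt instead of a burst of retries.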

The Audit Surface Has to Be Visible to the User Whose Authority Was Honored Late

A subtle aspect of the Stripe incident is how the user discovered it. They did not see it through the agent's UI. They saw it in their own Stripe audit log, after their own security tooling alerted them to a string of unauthorized access attempts arriving from a vendor they thought they had cut off.

That sequence is bad in two ways. It means the platform owed the user a notification of "we attempted to call a tool you had revoked, here is the audit trail of what we tried" and did not deliver it. And it means the customer's first source of truth about the platform's behavior was an external system, which converts the conversation from a routine support ticket into a security incident the customer is now obligated to escalate inside their own organization.

The audit surface a platform owes a user after this kind of incident is not optional. It is the only artifact that lets the user verify, after the fact, that the revocation they performed was eventually honored and that no further unauthorized actions occurred. It needs to be visible from the user's side of the product, not buried in a server log only the platform's engineers can read. It needs to enumerate every attempted call made under the previously cached authority, what the registry decided, and when the cache finally caught up.
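One possible shape for that record, as a sketch only. The class names, field names, and decision strings such as "denied:revoked" are all assumptions made for illustration; the point is that the three items listed above each map to a field the user can read.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class AttemptedCall:
    tool: str
    attempted_at: datetime
    registry_decision: str  # e.g. "denied:revoked"; the vocabulary is an assumption


@dataclass
class RevocationAudit:
    """Hypothetical user-visible trail for one revoked tool."""

    tool: str
    revoked_at: datetime
    cache_caught_up_at: Optional[datetime] = None
    attempts: List[AttemptedCall] = field(default_factory=list)

    def record(self, attempt: AttemptedCall) -> None:
        # Only calls to this tool made at or after the revocation belong here.
        if attempt.tool == self.tool and attempt.attempted_at >= self.revoked_at:
            self.attempts.append(attempt)

    def user_view(self) -> dict:
        caught_up = self.cache_caught_up_at
        lag = (caught_up - self.revoked_at).total_seconds() if caught_up else None
        return {
            "tool": self.tool,
            "revoked_at": self.revoked_at.isoformat(),
            "attempts_after_revocation": [
                {"at": a.attempted_at.isoformat(), "decision": a.registry_decision}
                for a in self.attempts
            ],
            "cache_caught_up_at": caught_up.isoformat() if caught_up else None,
            "honored_late_by_seconds": lag,
        }
```

`user_view()` returns plain data on purpose, so the same structure can back a page in the product and an export the customer hands to their own security team.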

A platform that builds this surface as a first-class part of the revocation flow has a different relationship with the user when something goes wrong. The user can verify the system's behavior themselves, the support ticket becomes informational rather than adversarial, and the platform's own engineering team gets a high-fidelity feed of where the cache-vs-registry skew is producing visible incidents.

The Architectural Realization

The tool registry is not a data store. It is a control-plane surface. The values it holds are not facts about the world that are expensive to read and therefore reasonable to cache. They are the user's most recent decisions about what your system is allowed to do on their behalf, and the cost of treating those decisions as eventually consistent is paid by the user, in the currency of their own audit logs and their own trust.

A team that picks a registry TTL based on registry-service load is optimizing the wrong objective. The objective the user cares about is that their last action on the authority surface is reflected in the system's behavior on the next action it takes. The objective the platform's reliability team cares about is that the registry service does not get hammered by per-turn reads from millions of active sessions. Those two objectives are not in conflict if you stop expressing them in the same dimension. You handle the user's objective by treating revocation as an event you push to active sessions. You handle the reliability team's objective by sizing the registry to handle event broadcasts and on-demand reads for the relatively small population of sessions that actually need fresh data.

The trap is to think you can solve both objectives with a single TTL knob. You cannot. The TTL knob trades the user's interest against the platform's interest along a single axis, and the only honest answer is that there is no setting of that knob where both interests are served. A team that recognizes the framing problem stops tuning the knob and starts redesigning the surface so the question does not have to be asked.

The longer-term version of this realization is that anything in your agent's tool surface that the user can revoke needs to be treated as a piece of control-plane state, not a piece of cacheable data. Tool installations, scope grants, permission flags, even feature entitlements: each of them is a decision the user has made about what your system may do, and each of them carries the same hazard the Stripe tool did. A platform that gets this right for one surface and wrong for the next ten has not yet learned the lesson; it has only patched the case its last incident named.

The next time someone on your team proposes a TTL on any surface that gates an agent's authority, the question to ask is not "what should the TTL be?" The question to ask is "what does this TTL promise the user about the freshness of their own decisions, and are we prepared to defend that promise the next time a user removes a tool and watches our agent keep calling it?"

About Tian Pan
