
Credentials Residue: The Agent You Retired Is Still Logged Into Production

10 min read
Tian Pan
Software Engineer

Six months after you sunset an agent, a security auditor pings the team Slack: "Why does this OAuth app still have read access to the company Google Workspace?" Nobody recognizes the app name. Someone greps the codebase — no hits. Someone checks the deploy manifests — no hits. Eventually a former PM remembers: that was the meeting-summarizer prototype, the one that got killed in Q3. The user-facing surface was deleted. The OAuth grant, the service account in BigQuery, the Pinecone index, the Slack alert routing, the Datadog dashboard, the Splunk saved search, the eval dataset full of customer transcripts — all still there, all still authenticated, all still billing.

This is the credentials residue problem, and it is the dominant operational failure of the agent era. Every agent you ship provisions a halo of resources across vendors, internal services, and data systems. When you retire the agent by deleting its code, you remove maybe a fifth of what it created. The rest sits in production as ghost infrastructure, attributable to nobody, owned by nobody, and — most dangerously — still credentialed.

The scale problem makes this worse than it sounds. Industry telemetry now puts the ratio of non-human identities to human identities in the average enterprise at roughly 50 to 1, and at the high end past 140 to 1. Each retired agent doesn't just leave one credential behind; it leaves a delegation chain — the OAuth app talks to a service account, which holds a key to a vector database, which is read by an inference proxy, which writes to a logging pipeline. Pulling one thread does not collapse the others. Each is its own quiet liability.

Code Removal Is Not Decommissioning

The default mental model for retiring a feature is git revert. That model is structurally incompatible with agents.

A web feature retired by reverting the PR removes the route, the handler, the migration, and the test. The feature's footprint is fully inside the repository. An agent retired by the same gesture removes the prompt, the tool definitions, the orchestration loop, and the API client. The feature's footprint is mostly outside the repository: in vendor consoles, in IAM policies, in cloud resource catalogs, in observability tools the developer never opens.

The asymmetry compounds because agents are creative. A web feature provisions resources at deploy time through declarative infra-as-code. An agent provisions resources at runtime — the first time it needs a vector index, the first time a developer wires up a new MCP server, the first time someone snapshots a production transcript into an eval set "just to debug this one regression." None of those are in Terraform. None of them get torn down by a Terraform apply.

The result is that agent retirement, done by code revert, removes the call sites of the agent's resources without touching the resources themselves. Revocation terminates an active token; deprovisioning removes the underlying identity, the credentials, and any delegated scope chains it holds. Most "agent retirement" sprints do the first and skip the second.
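To make the distinction concrete, here is a minimal sketch against AWS IAM with boto3, assuming the agent runs as a dedicated IAM user (the user name is illustrative). The first loop is revocation; everything after it is deprovisioning.

```python
import boto3

iam = boto3.client("iam")
AGENT_USER = "meeting-summarizer-agent"  # hypothetical IAM user dedicated to the agent

# Revocation: the credentials stop working, but the identity and its policies survive.
for key in iam.list_access_keys(UserName=AGENT_USER)["AccessKeyMetadata"]:
    iam.update_access_key(
        UserName=AGENT_USER, AccessKeyId=key["AccessKeyId"], Status="Inactive"
    )

# Deprovisioning: the keys, the attached policies, and the identity itself go away.
# (delete_user also requires removing group memberships and inline policies, omitted here.)
for key in iam.list_access_keys(UserName=AGENT_USER)["AccessKeyMetadata"]:
    iam.delete_access_key(UserName=AGENT_USER, AccessKeyId=key["AccessKeyId"])
for policy in iam.list_attached_user_policies(UserName=AGENT_USER)["AttachedPolicies"]:
    iam.detach_user_policy(UserName=AGENT_USER, PolicyArn=policy["PolicyArn"])
iam.delete_user(UserName=AGENT_USER)
```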

The Inventory You Wish You Had

The reason teams skip the second step is not laziness. It is that they cannot enumerate what to revoke. By the time an agent ships its third version, the resources it touches look like this:

  • OAuth and SSO: a registered app with a vendor (Google, Slack, Notion, Salesforce), with one or more refresh tokens stored in your secret manager, granting scopes the original prototype needed and never tightened.
  • Cloud identity: one or more service accounts or IAM roles, with policies attached, with keys issued, with workload identity federation bindings.
  • Data systems: warehouse roles, vector index credentials, queue subscriptions, S3 prefixes with bucket policies that name the agent's role.
  • Provider accounts: an Anthropic or OpenAI workspace project with a budget, with API keys, with logging enabled, with custom evals registered.
  • MCP and tool registries: server entries in an internal MCP gateway, tool registrations in an agent registry, scoped tokens for each tool the agent calls.
  • Eval and telemetry: production transcripts snapshotted into an eval bucket, an evaluation run history, dashboards pinned to specific trace queries, alerts routed to channels.
  • Feature gating: a flag in your experimentation platform whose state nobody remembers, sometimes the only thing protecting users from a half-deleted code path.

A retirement playbook that lists "delete the code" as the first step never reaches half of these. A playbook that starts with "produce the inventory" reveals the impossible task that the team has been quietly avoiding for two quarters.
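One way to stop avoiding it is to write the inventory down as a structured manifest at provisioning time instead of reconstructing it at sunset. A sketch of what that record can hold; the field names and example values are ours, not a standard:

```python
from dataclasses import dataclass, field

@dataclass
class AgentManifest:
    """Everything the agent provisioned, recorded when it was provisioned."""
    agent_id: str
    owner: str                                                  # a named human, not a team alias
    oauth_apps: list[str] = field(default_factory=list)        # vendor app / client IDs
    service_accounts: list[str] = field(default_factory=list)  # cloud IAM identities and roles
    data_resources: list[str] = field(default_factory=list)    # indexes, prefixes, warehouse roles
    provider_projects: list[str] = field(default_factory=list) # LLM workspace projects and keys
    mcp_entries: list[str] = field(default_factory=list)       # gateway and tool-registry entries
    eval_datasets: list[str] = field(default_factory=list)     # snapshots with PII obligations
    telemetry: list[str] = field(default_factory=list)         # dashboards, alerts, saved searches
    feature_flags: list[str] = field(default_factory=list)

manifest = AgentManifest(
    agent_id="meeting-summarizer-v2",
    owner="jane.doe",
    oauth_apps=["google:oauth-client-example"],
    eval_datasets=["s3://evals/meeting-summarizer-v2/transcripts-2024-q3.jsonl"],
)
```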

The Discipline Starts at Launch, Not Sunset

The only sustainable fix is to make the inventory cheap to produce by tagging every resource the agent provisions with a stable agent ID at the moment of creation. This is the architectural prerequisite that turns decommissioning from an archaeology project into a tag query.

Concretely: every cloud resource gets agent-id=<id>, every OAuth app gets the agent ID in its client name or metadata, every vector index uses the agent ID in its namespace, every API key is created in a project named for the agent, every dashboard and alert has the agent ID in its tags or saved-search name, every eval dataset has the agent ID in its manifest. The convention has to be enforced where the resources are created — in the IaC module, in the provisioning script, in the registration helper — not as an afterthought audit.
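One way to enforce it is a thin provisioning wrapper that refuses to create anything untagged. A sketch with boto3 and an S3 bucket as the example resource; the helper and the names are illustrative:

```python
import boto3

REQUIRED_TAG = "agent-id"

def provision_bucket(bucket_name: str, tags: dict[str, str]) -> None:
    """Create an S3 bucket, refusing to proceed without the agent-id tag."""
    if REQUIRED_TAG not in tags:
        raise ValueError(f"refusing to provision {bucket_name}: missing '{REQUIRED_TAG}' tag")
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=bucket_name)  # region configuration omitted for brevity
    s3.put_bucket_tagging(
        Bucket=bucket_name,
        Tagging={"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]},
    )

provision_bucket("meeting-summarizer-evals", {"agent-id": "meeting-summarizer-v2"})
```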

Tag-based lifecycle management is the standard discipline for cloud cost management; the same primitive solves the agent residue problem. A resource without the tag is treated as nonexistent during decommissioning, which means a resource without the tag is also a violation that should fail review at provisioning time.
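On AWS, the tag query itself is one API away. A sketch using the Resource Groups Tagging API, which covers most taggable resources in an account; vendor-side resources such as OAuth apps and provider projects still need their own per-vendor listing:

```python
import boto3

def resources_for_agent(agent_id: str) -> list[str]:
    """Enumerate every taggable AWS resource carrying this agent's ID."""
    client = boto3.client("resourcegroupstaggingapi")
    arns: list[str] = []
    paginator = client.get_paginator("get_resources")
    for page in paginator.paginate(
        TagFilters=[{"Key": "agent-id", "Values": [agent_id]}]
    ):
        arns.extend(r["ResourceARN"] for r in page["ResourceTagMappingList"])
    return arns

print(resources_for_agent("meeting-summarizer-v2"))
```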

The version of this that fails predictably is "we'll start tagging new resources with agent IDs going forward." The team adds the tag, ships three new agents that use it, and a year later, when one of those agents is retired, the playbook works for that one and does nothing for the dozen older agents that were never tagged. Decommissioning is a backstop for the discipline you established at launch. If you didn't establish it, the backstop is a weekend of grepping the codebase and clicking through vendor consoles.

The Eval Dataset Is Regulated Data

The eval dataset built from production transcripts deserves a specific call-out, because it is the residue category most likely to become a compliance incident rather than a security one.

The flow looks innocuous at the time. A regression slips through; an engineer wants to reproduce it; they snapshot a few hundred real production conversations into a JSONL and check it into the evals folder. Six months later that JSONL contains personally identifiable information from users who have, in the meantime, exercised their right to erasure under GDPR or its analogues. The deletion request was honored in the production database. The eval snapshot was not in the deletion pipeline because nobody told the privacy team it existed.

Right-to-erasure obligations require removing user data from production databases, backups, analytics pipelines, and training and evaluation datasets. An eval snapshot derived from production transcripts is a regulated data asset whose retention obligations the engineering team did not realize they were taking on at snapshot time.

The decommissioning playbook for evals therefore cannot be "delete the dataset." It has to be one of two paths: delete with a record of what was deleted and when, or move to a long-term retention tier with an explicit owner who has signed up for the deletion-request workflow. The default of "leave it in the bucket because storage is cheap" is the path that produces the audit finding.
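A sketch of the first path, where the deletion leaves behind an auditable record of what was removed and when; the tombstone format and location are illustrative:

```python
import json
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def delete_eval_dataset_with_record(dataset_path: Path, agent_id: str, approver: str) -> None:
    """Delete an eval snapshot but keep an auditable tombstone describing it."""
    record = {
        "agent_id": agent_id,
        "dataset": str(dataset_path),
        "sha256": hashlib.sha256(dataset_path.read_bytes()).hexdigest(),
        "size_bytes": dataset_path.stat().st_size,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "approved_by": approver,
    }
    tombstone = dataset_path.with_suffix(dataset_path.suffix + ".deleted.json")
    tombstone.write_text(json.dumps(record, indent=2))
    dataset_path.unlink()
```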

The Decommissioning Checklist

The output of all of this is an explicit, executable checklist tied to the agent's manifest. The shape that works in practice:

  • Identity: revoke OAuth refresh tokens; delete the OAuth app registration with the vendor; delete service accounts and IAM roles; remove workload identity bindings; remove the agent's user from any internal IAM groups.
  • Provider: revoke API keys; close out the provider workspace project (or zero its budget and rename it RETIRED-<id>); disable logging exports.
  • Data: drop vector indices; revoke warehouse role grants; remove queue subscriptions; delete or archive S3 prefixes with explicit retention; remove bucket policies referencing the agent role.
  • Tools and MCP: remove server entries from the MCP registry; revoke per-tool tokens; remove tool registrations from the agent registry.
  • Eval and telemetry: review eval datasets for PII retention obligations and delete or transfer ownership; archive or delete dashboards and saved searches; delete alerts or transfer routing; export trace data if required for retention.
  • Feature gating: remove the flag from the experimentation platform after confirming no code references remain; remove the flag from config.
  • Record: write a decommissioning record into your audit log naming the agent ID, the resources removed, the resources retained with reason, and the human who signed off.

The checklist is boring. That is the point. The interesting work is the architectural decision to tag resources with the agent ID at creation, which turns this list from an investigation into a script that iterates over a tag query.
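For completeness, what that script can look like at its simplest, reusing the resources_for_agent helper sketched earlier; needs_retention_review and teardown are hypothetical dispatch points, since the actual deletion call differs per resource type:

```python
import json
from datetime import datetime, timezone

def decommission(agent_id: str, approver: str) -> str:
    """Enumerate the agent's tagged resources, tear them down, and emit the record."""
    removed, retained = [], []
    for arn in resources_for_agent(agent_id):     # the tag query from the earlier sketch
        if needs_retention_review(arn):           # hypothetical: eval buckets, audit exports
            retained.append({"resource": arn, "reason": "retention review pending"})
        else:
            teardown(arn)                         # hypothetical per-resource-type dispatch
            removed.append(arn)
    record = {
        "agent_id": agent_id,
        "decommissioned_at": datetime.now(timezone.utc).isoformat(),
        "approved_by": approver,
        "removed": removed,
        "retained": retained,
    }
    return json.dumps(record, indent=2)
```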

What the Org Has to Recognize

The deepest version of this is an organizational realization: an agent is not a feature, it is a network of provisioned resources distributed across vendors, internal services, and data systems. The team whose retirement playbook is "git revert the feature" is not retiring the agent. They are retiring the user-facing surface and leaving the backend identity, data, and telemetry residue for someone else to find.

Two roles emerge from taking this seriously. The first is owner of record for every agent — a named human, not a team, who is on the hook for the lifecycle from provisioning through decommissioning. Without that, the agent's resources outlive every reorg and become genuinely orphaned. The second is a non-human identity owner inside the security or platform org who can audit the inventory across all agents, enforce tag conventions at provisioning time, and run the periodic discovery sweeps that catch the resources that slipped through.

Neither role is glamorous. Both are load-bearing. The teams that staff them spend their first year cleaning up the residue trail of agents that were retired before the discipline existed, and emerge with an inventory they trust and a decommissioning script that runs in minutes. The teams that don't staff them spend the same year explaining to auditors what an OAuth app called meeting-summarizer-v2-prod is doing in their Google Workspace tenant, and which long-departed engineer registered it.

The forward-looking version of this is straightforward: assume every agent you ship today will be retired within two years. The retirement is going to happen during a sprint where everyone is busy with the next thing. The only mechanism that survives that distraction is one where the agent's resources can be enumerated by a tag query and removed by a script. Build that mechanism before you ship the second agent. After the fifth, the cost of catching up exceeds the cost of doing it right from the start.
