AI Agents Graduate to First-Class Platform Citizens, With RBAC, Quotas, and Governance Policies. Your IAM Was Designed for Humans. Are You Ready for Non-Human Workers?

Your platform engineering team just spent six months building an internal developer portal with RBAC, quotas, and governance policies for human engineers. Last week, three different teams deployed AI agents that bypass all of it.

The agents authenticate with static API keys shared across services. They inherit broad permissions from the systems they connect to. Nobody knows what they’re accessing, who “owns” them, or how to revoke their access when something goes wrong.

This isn’t a hypothetical. According to recent research, while 80.9% of technical teams have moved AI agents into active testing or production, only 14.4% report all agents going live with full security/IT approval. The identity crisis is real: only 18% of security leaders are highly confident their current IAM systems can effectively manage agent identities.

The Human-Centric Architecture Trap

Our IAM infrastructure was designed with humans in mind:

  • Users have email addresses and can reset passwords
  • Role-based access control assumes humans understand their job responsibilities
  • Session timeouts protect against forgotten logins
  • Audit trails track who did what for compliance

But AI agents don’t fit this model:

  • They don’t have email addresses — they authenticate via API keys, service accounts, or OAuth tokens
  • RBAC grants overly broad permissions — defining granular, task-specific access for dynamic agents is operationally complex
  • They don’t “forget” to log out — agents run continuously and need persistent access
  • Audit trails show “service account” activity — not which agent, which model version, or which human authorized it

When 44% of agents use static API keys, 43% use username/password combinations, and 35% rely on shared service accounts, we’re essentially treating them like legacy batch jobs from 2010. The infrastructure hasn’t caught up to the reality that these are autonomous, decision-making entities that need their own identity category.

The Governance Gap: When Adoption Outpaces Control

Here’s the uncomfortable truth: platform engineering teams are being bypassed. Teams deploy agents using their existing credentials, cloud IAM roles, or developer accounts. The agents “just work” — until they don’t.

The practical consequences:

  • No inventory — You can’t govern what you can’t see. How many agents are running? Which systems are they accessing?
  • No ownership — When an agent misbehaves, who’s responsible? The developer who deployed it? The team that owns the model? The security team that should have caught it?
  • No boundaries — Most agents inherit broad permissions from the systems they connect to, with no zero-trust boundaries governing what they can actually reach
  • No audit trail — When an incident happens, you see a service account made 10,000 API calls. But which agent? For which task? Authorized by whom?

Research shows 40% of organizations are increasing identity and security budgets specifically to address AI agent risks, while 34% have established dedicated budget lines for agent governance. The market is signaling this is a real problem.

Three Approaches to Agent Identity

The industry is converging on treating agents as first-class identity primitives, not as users or services. Here’s what that looks like:

1. Agent Identity Gateways

Purpose-built infrastructure that sits between agents and your platform:

  • Dynamic authentication — On-behalf-of (OBO) token exchange instead of static keys
  • Runtime authorization — Policy evaluation at request time, not role assignment at deployment time
  • Continuous traceability — Every agent action logged with model version, prompt context, and human authorizer
  • Unified orchestration — Single control plane for all agent identities across your infrastructure
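In practice, the OBO pattern is an RFC 8693 token exchange: the agent presents its own credential (actor) plus the human's token (subject), and receives a short-lived token scoped to one downstream system instead of holding a static key. A minimal sketch in Python; the endpoint URL and token values are illustrative, not a specific vendor's API:

```python
from urllib.parse import urlencode

# Hypothetical internal token endpoint; all names and values are illustrative.
TOKEN_URL = "https://auth.internal/oauth2/token"

def build_obo_exchange(agent_token, user_token, audience):
    """RFC 8693 token-exchange request: the agent's own identity travels as
    actor_token, the human it acts for as subject_token, and the result is a
    short-lived token scoped to a single downstream audience."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_token,   # the human the agent acts on behalf of
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "actor_token": agent_token,    # the agent's own identity
        "actor_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "audience": audience,          # the one downstream system being called
    }

payload = build_obo_exchange("agent-jwt", "alice-jwt", "https://billing.internal")
body = urlencode(payload)  # POST this to TOKEN_URL with any HTTP client
```

The key property: nothing long-lived ever sits in the agent's environment, and the audit trail records both the agent and the human it acted for.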

2. Fine-Grained Authorization (FGA)

Extending beyond traditional RBAC to handle hierarchical, resource-scoped access:

  • Not just “Can this agent access the database?” but “Can this agent query the customer table, filtered to the accounts it has been assigned?”
  • Context-aware permissions based on agent task, time of day, data sensitivity
  • Temporary permission grants with automatic expiration
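The temporary-grant idea can be sketched with an in-memory store; a real deployment would back this with a policy engine and persistent storage, and the names here are illustrative:

```python
import time

# In-memory sketch of time-boxed, resource-scoped grants.
class GrantStore:
    def __init__(self):
        self._grants = {}  # (agent_id, resource) -> expiry timestamp

    def grant(self, agent_id, resource, ttl_seconds):
        """Temporary permission grant with automatic expiration."""
        self._grants[(agent_id, resource)] = time.time() + ttl_seconds

    def is_allowed(self, agent_id, resource):
        expiry = self._grants.get((agent_id, resource))
        return expiry is not None and time.time() < expiry

store = GrantStore()
# Scope the grant to one resource, not "the database":
store.grant("support-agent-v2", "customer_table:acct-42", ttl_seconds=300)
store.is_allowed("support-agent-v2", "customer_table:acct-42")  # True for 5 minutes
store.is_allowed("support-agent-v2", "customer_table:acct-99")  # False: never granted
```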

3. Infrastructure-Level Guardrails

Default safe by design:

  • Mandatory inventory — Agents can’t deploy without registering identity, purpose, owner
  • Default deny boundaries — Agents get minimal permissions by default, must request escalation
  • Automated alerts — New identities without defined owners trigger security review
  • Graduated policies — Low-risk agents get streamlined approval, high-risk agents require security sign-off
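The first three guardrails reduce to a default-deny gate at deploy time. A sketch, with illustrative registry contents and messages:

```python
# Default-deny deployment gate: unregistered agents are blocked outright,
# registered agents without an owner trigger a security review.
REGISTRY = {
    "log-summarizer": {"owner": "alice@example.com", "purpose": "summarize logs"},
    "orphan-agent": {"owner": None, "purpose": "unknown"},
}

def deployment_decision(agent_id):
    entry = REGISTRY.get(agent_id)
    if entry is None:
        return "BLOCK: not registered (mandatory inventory)"
    if not entry["owner"]:
        return "REVIEW: no defined owner, security alerted"
    return "ALLOW: minimal default permissions, escalation on request"

deployment_decision("ghost-agent")      # blocked: not in the inventory
deployment_decision("orphan-agent")     # flagged: ownerless identity
deployment_decision("log-summarizer")   # allowed with default-deny boundaries
```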

The Question Platform Engineering Teams Need to Answer

If your CI/CD pipeline blocks deployments that fail security scans, should it also block agents that exceed permission boundaries?

Cisco, Strata, and others are betting the answer is yes. The emerging pattern: preventive controls that treat agent deployment like code deployment — with gates, approvals, and automated policy enforcement.

But this raises uncomfortable questions:

  • Who sets the permission thresholds? Security? Platform? Individual teams?
  • How do you balance innovation velocity with governance requirements?
  • What’s the appeals process when a legitimate agent is blocked?
  • How do you avoid creating a bureaucratic bottleneck that teams route around?

What We’re Doing (And What We’re Still Figuring Out)

We’re three months into implementing agent identity governance at our company. Here’s our current approach:

What’s working:

  • Required agent registration in our service catalog (lightweight form: purpose, owner, system access)
  • Integrated agent identity checks into our existing IAM review process
  • Created an “Agent” identity type in our RBAC system with tighter default permissions

What we’re still figuring out:

  • How to trace agent behavior back to specific model versions and prompts (our audit logs show “agent_id” but not enough context)
  • Whether to build custom infrastructure or adopt a vendor solution (we’re evaluating Strata, WorkOS FGA, and Curity)
  • How to handle agents that need cross-system orchestration (do they get one identity per system or federated identity?)
  • The organizational question: Should platform, security, or individual teams own agent governance?

The Bigger Question

Are we ready for non-human workers with the same access privileges as senior engineers?

Because that’s what we’re building. These aren’t scripts. They’re autonomous systems that make decisions, access sensitive data, and take actions with real business consequences.

Our IAM systems were designed for a world where humans were the only actors. That world is gone. The question is whether we update our infrastructure to match reality — or keep pretending agents are just another API integration.

What’s your organization doing about agent identity? Are you treating them as users, services, or something new entirely?

This hits close to home. We’re dealing with this exact issue in financial services, where the stakes are even higher because of regulatory compliance.

The Authorization Creep Problem

What you described — agents inheriting broad permissions from existing systems — is what we call “authorization creep” in our security reviews. An agent deployed by a developer with database admin rights suddenly has the same access. But unlike the developer, the agent doesn’t understand the difference between test and production data. It doesn’t know that certain tables contain PII that requires extra logging.

We had an incident last month: An agent deployed by our fraud detection team accessed customer transaction data using the team lead’s service account. The agent worked perfectly for its intended task (flagging suspicious patterns). But it also cached customer data in a temporary Redis instance that didn’t have encryption at rest. Our compliance team found it during a routine audit.

The developer who deployed it had no idea. The agent’s framework handled caching automatically. The service account had Redis access. Everything “just worked” — until it violated our data residency policies.

What We’re Implementing: Graduated Agent Policies

We’ve adopted a three-tier agent classification system:

Tier 1: Read-Only Observers

  • Can query internal APIs and databases (read-only)
  • No write permissions, no external API calls
  • Lightweight approval: Team lead sign-off
  • Example: Agents that summarize logs or generate reports

Tier 2: Internal Actors

  • Can write to internal systems, orchestrate workflows
  • No access to customer PII or financial data without explicit grants
  • Moderate approval: Security review + team lead
  • Example: Agents that create Jira tickets, update internal dashboards

Tier 3: External-Facing Agents

  • Can interact with customer data, external APIs, or financial systems
  • Require full security review, data classification audit, and ongoing monitoring
  • Heavy approval: Security + Compliance + VP sign-off
  • Example: Agents that respond to customer inquiries, execute trades

The key is we block by default. An agent without a tier classification can’t deploy to production. Period.
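That block-by-default rule is small enough to express as a CI gate. A sketch; the approver sets mirror the three tiers above and are illustrative:

```python
# Block-by-default tier gate: an agent with no tier never deploys, and each
# tier requires its full set of sign-offs. Approver names are illustrative.
APPROVALS_REQUIRED = {
    1: {"team_lead"},                                   # read-only observers
    2: {"team_lead", "security"},                       # internal actors
    3: {"team_lead", "security", "compliance", "vp"},   # external-facing
}

def can_deploy(tier, signoffs):
    required = APPROVALS_REQUIRED.get(tier)
    if required is None:
        return False  # no tier classification -> no production deploy, period
    return required <= set(signoffs)  # every required sign-off must be present

can_deploy(None, {"team_lead"})            # False: unclassified agent
can_deploy(1, {"team_lead"})               # True
can_deploy(3, {"team_lead", "security"})   # False: missing compliance + VP
```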

The Hard Part: Dynamic Permissions

The challenge we’re still wrestling with is agents that need context-aware permissions.

For example, our customer support agent needs to access customer account data — but only for the specific customer in the conversation. Traditional RBAC would give it access to all customers (too broad) or require us to create role assignments dynamically for each conversation (operationally impossible).

We’re experimenting with attribute-based access control (ABAC) where permissions are evaluated at runtime:

Allow: agent_id=support-agent-v2
  AND resource.type=customer_account
  AND resource.customer_id IN request.context.conversation.participants

But implementing this requires rebuilding how our authorization layer works. We’re not there yet.
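Conceptually, though, the runtime check is small. A sketch of evaluating that policy; the request and resource shapes are illustrative, not our production schema:

```python
# Runtime ABAC evaluation of the policy above: the decision depends on
# attributes of the request context, not on a pre-assigned role.
def evaluate(policy_agent_id, request, resource):
    return (
        request["agent_id"] == policy_agent_id
        and resource["type"] == "customer_account"
        and resource["customer_id"] in request["context"]["conversation"]["participants"]
    )

request = {
    "agent_id": "support-agent-v2",
    "context": {"conversation": {"participants": ["cust-123"]}},
}
in_convo = {"type": "customer_account", "customer_id": "cust-123"}
other = {"type": "customer_account", "customer_id": "cust-999"}
evaluate("support-agent-v2", request, in_convo)  # True: customer is in this conversation
evaluate("support-agent-v2", request, other)     # False: different customer
```

The hard part isn't this function; it's threading the conversation context through every layer between the agent and the data store.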

The Organizational Question You Raised

Should platform, security, or individual teams own agent governance?

In my experience, it has to be a partnership with clear ownership boundaries:

  • Platform team owns the infrastructure (agent identity gateway, registration system, default policies)
  • Security team owns risk classification and approval workflows
  • Individual teams own agent behavior and accountability

Where we got burned: We initially made security the sole owner. They became a bottleneck. Teams started deploying agents in “shadow IT” mode using personal API keys. That was worse than no governance at all.

Now we have a shared responsibility model: Platform provides the tools, security defines the rules, teams own the outcomes.

What I’d Ask Your Team

You mentioned you’re evaluating vendor solutions (Strata, WorkOS FGA, Curity). Here’s what I’d dig into based on our experience:

  1. How do they handle temporal permissions? (Agent needs access for 5 minutes during a task, not forever)
  2. Can they integrate with your existing RBAC? (We couldn’t rip-and-replace our entire IAM stack)
  3. What’s the developer experience? (If it’s too painful, teams will route around it)
  4. Do they support policy-as-code? (We need GitOps workflows for agent permissions, not web UIs)

The fact that you’re tackling this now — three months in — is ahead of the curve. Most organizations I talk to are still in the “agents are just API keys” phase. By the time they realize the problem, they have hundreds of untracked agents in production.

This is such an important conversation, and I appreciate how you’re framing it as an infrastructure and organizational design problem, not just a security add-on.

What struck me most in your post: “Only 14.4% report all agents going live with full security/IT approval.” That’s not a technical failure — that’s a systemic misalignment between how teams want to work and what infrastructure makes easy.

Why Teams Are Bypassing Governance

I’ve seen this pattern before with containerization, microservices, and now AI agents. When the official path is too slow or complicated, teams create workarounds. The question isn’t “How do we force compliance?” but “Why is non-compliance easier than compliance?”

In our organization, I talked to the teams deploying agents without going through platform/security. Here’s what they told me:

  1. “We didn’t know we needed approval” — Agents felt like code, not infrastructure. Why would we file a security review for code?
  2. “The approval process would take 3 weeks” — Our existing IAM provisioning process assumes human employees with onboarding timelines. Agents get deployed in hours, not weeks.
  3. “We’re just experimenting” — The agent only runs locally, or in dev, or “just for our team.” (Spoiler: It always escapes to production.)
  4. “We use our own API keys, so it’s secure” — Developers genuinely believed using personal credentials was the safe option because they “own” the risk.

None of these are malicious actors. They’re smart engineers trying to ship value quickly. The infrastructure made the wrong thing easy and the right thing hard.

Rethinking Agent Identity as Product Design

What if we designed agent identity systems with the same user experience rigor we apply to developer tools?

The “happy path” should be the secure path. If getting an agent identity takes 10 minutes and deploying with a personal API key takes 5 minutes, we’ll lose. The secure option needs to be faster, not slower.

Here’s what we’re building:

  • Self-service agent provisioning — Developer requests agent identity via Slack command or CLI tool, gets approval + credentials in <10 minutes for Tier 1 agents (read-only)
  • Default-deny with easy escalation — Agents start with minimal permissions. Need more? agent-cli request-permission --resource=database --justification="customer support queries" triggers automated review
  • Embedded in existing workflows — Agent identity request built into our CI/CD pipeline. If you deploy something that makes external calls, the pipeline prompts: “Is this an agent? Register it now.”
  • Clear feedback loops — When an agent is blocked, the error message explains why and how to fix it with a one-click escalation link
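The automated review behind that request-permission flow can be sketched in a few lines; the tier rules and the auto-approval policy here are assumptions for illustration:

```python
# Sketch of an automated permission review: Tier 1 read-only requests are
# approved instantly, everything else gets a human look. Policy is illustrative.
def review_request(tier, resource, justification):
    if not justification:
        return "rejected: justification required"
    if tier == 1 and resource.endswith(":read"):
        return "auto-approved"              # Tier 1 read-only: minutes, not weeks
    return "queued for security review"     # higher risk gets human review

review_request(1, "database:read", "customer support queries")  # auto-approved
review_request(2, "database:write", "update dashboards")        # queued for review
```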

We borrowed this philosophy from how we think about developer experience: Remove friction from the right thing, add friction to the risky thing.

The Inventory Problem Is Bigger Than Technical

You mentioned “How many agents are running? Which systems are they accessing?” as a discovery problem. That’s true, but it’s also an organizational accountability problem.

Who owns the agent?

  • The developer who wrote the code?
  • The team that deployed it?
  • The product manager who requested it?
  • The exec who approved the budget?

In traditional software, ownership is clear: This service is owned by Team X, documented in the service catalog, with an on-call rotation. But agents are more fluid. They’re spawned from frameworks, shared across teams, versioned rapidly.

We created an “Agent Registry” modeled on our service catalog:

  • Every agent has a human owner (not just a team — a specific person who’s accountable)
  • Agents require quarterly certification: “Is this still running? Still needed? Still compliant?”
  • Agents without active owners get automatic deprovisioning warnings (30-day countdown)
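The certification and countdown rules reduce to a small status check. A sketch, with illustrative field names; the thresholds follow the rules above (30-day countdown, quarterly certification taken as 90 days):

```python
from datetime import date, timedelta

# Registry status check: ownerless agents get a 30-day deprovisioning
# countdown, stale certifications trigger re-certification.
def registry_status(entry, today):
    if entry["owner"] is None:
        deadline = entry["owner_removed_on"] + timedelta(days=30)
        if today >= deadline:
            return "deprovision"
        return "warning: deprovision in %d days" % (deadline - today).days
    if today - entry["last_certified"] > timedelta(days=90):
        return "recertify"
    return "ok"

orphan = {"owner": None, "owner_removed_on": date(2025, 1, 1),
          "last_certified": date(2025, 1, 1)}
registry_status(orphan, date(2025, 1, 10))  # warning with the countdown
registry_status(orphan, date(2025, 2, 15))  # "deprovision"
```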

This isn’t just governance theater. When we had an incident with an agent making unexpected API calls, we knew exactly who to page. They knew the context, intent, and could debug quickly. That wouldn’t have happened if the agent was just “owned by the engineering team.”

The Uncomfortable Trade-off: Velocity vs. Control

Here’s the tension every engineering leader faces: Tight governance slows innovation. Loose governance creates risk.

The answer isn’t to choose one. It’s to segment by risk and optimize differently:

  • Low-risk agents (internal, read-only, no PII) → Fast approval, self-service, minimal monitoring
  • Medium-risk agents (internal writes, orchestration) → Automated security review, standard approval SLA
  • High-risk agents (external-facing, PII access, financial impact) → Full compliance review, ongoing monitoring, incident response plans

Where we failed initially: We treated all agents as high-risk because we were scared of the unknown. That made the approval process so painful that teams bypassed it entirely.

Where we’re succeeding now: We made the low-risk path almost effortless (2-minute registration, instant approval) so teams developed the muscle memory of registering agents. By the time they needed high-risk agents, going through governance felt normal, not punitive.

What I’d Add to Your “Still Figuring Out” List

You mentioned struggling with tracing agent behavior back to model versions and prompts. This is huge. When an agent misbehaves, we need to reconstruct:

  • What version of the model made the decision?
  • What was the prompt or context that led to the behavior?
  • What data did it access to reach that conclusion?

We’re experimenting with structured logging requirements for agent deployments:

  • Every agent logs: agent_id, model_version, prompt_hash, decision_rationale, data_accessed
  • These logs feed into our existing observability stack (we use Datadog, but any APM works)
  • We can trace from “this API call happened” back to “this agent with this model version made this decision based on this context”
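A sketch of what one such log line could look like; the field names follow the bullets above, and hashing the prompt rather than logging it raw is a design choice (your compliance requirements may differ):

```python
import datetime
import hashlib
import json

# One structured log record per agent decision: who (agent + model version),
# what (rationale, data touched), and a hash linking back to the exact prompt.
def agent_log_record(agent_id, model_version, prompt, rationale, data_accessed):
    return json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent_id": agent_id,
        "model_version": model_version,
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "decision_rationale": rationale,
        "data_accessed": data_accessed,
    })

line = agent_log_record("support-agent-v2", "model-2025-04", "Summarize ticket 42",
                        "matched refund policy", ["tickets:42"])
# Ship `line` to your observability stack (Datadog, or any APM that ingests JSON).
```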

It’s not perfect — some frameworks don’t expose enough detail — but it’s better than the black box we had before.

The Real Question: Treating Agents as Employees or Tools?

You asked: “Are we ready for non-human workers with the same access privileges as senior engineers?”

I think the framing is slightly off. The question isn’t whether we’re ready — we’re already there. The question is: Do we treat agents like employees (with identity, permissions, accountability) or like tools (with usage policies and owner responsibility)?

My take: Agents are neither employees nor tools. They’re a new category: autonomous delegates.

They act on behalf of humans but make independent decisions. They need identity systems that reflect that hybrid nature — more sophisticated than API keys, but more flexible than employee IAM.

Thanks for starting this conversation. I’d love to hear how other platform leaders are thinking about this.

Reading this as someone who’s not a platform engineer but has built agents for our design team, I have a very different perspective:

I Had No Idea This Was Supposed to Be Governed

Seriously. I built an agent that reads Figma files, summarizes design changes, and posts updates to Slack. It uses:

  • Figma API (my personal token)
  • Slack webhook (from our team channel)
  • OpenAI API (company account, but I set it up)

I thought of this as “automating a workflow,” not “deploying an autonomous system that requires IAM governance.” It runs on my laptop. It’s Python scripts. It felt like the kind of thing I’d write in an afternoon, not something that needs security review.

But reading your post… I’m exactly the problem you’re describing. I bypassed all the governance (that I didn’t know existed) because it was easier to ship than to ask permission.

Why Designers (and Other Non-Engineers) Are Blind to IAM

Here’s what I didn’t understand:

  1. That my Figma token has broad permissions — I thought “it’s just my files” but it actually accesses everything I have permission to see, including client work and internal designs
  2. That Slack webhooks outlive me — if I leave the company, that webhook keeps working until someone manually disables it
  3. That the OpenAI API sees all the prompts — I’m sending design feedback (which includes client names and project details) to an external service
  4. That “on my laptop” still counts as production — The agent runs 24/7 via cron. It’s not “just a script” anymore.

If your platform engineering team built the perfect agent identity system, would I have used it? Honest answer: Only if someone told me it existed and why I needed it.

What Would Have Made Me Compliant

I’m not trying to bypass security. I just didn’t know I was doing anything risky. Here’s what would have helped:

1. A “Should I register this?” self-assessment quiz

Something like:

  • Does your script access APIs with your personal credentials? → Yes
  • Does it run automatically (cron, webhook, scheduled)? → Yes
  • Does it access data outside your personal workspace? → Yes
  • → This should be registered as an agent. Here’s how: [link]

Make it visual, make it fast, make it feel helpful (not punitive).
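The quiz itself could be a few lines of tooling. A toy sketch; the questions are copied from above, and the "any yes means register" threshold is an assumption:

```python
# Toy self-assessment: if any answer is yes, the script should be registered
# as an agent. The cutoff policy is illustrative.
QUESTIONS = [
    "Does your script access APIs with your personal credentials?",
    "Does it run automatically (cron, webhook, scheduled)?",
    "Does it access data outside your personal workspace?",
]

def should_register(answers):
    """Any 'yes' means: register this as an agent."""
    return any(answers)

should_register([True, True, True])     # the Figma-Slack script: register it
should_register([False, False, False])  # a one-off manual script: no need
```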

2. Built-in agent scaffolding

What if your CLI tool had:

agent init design-slack-updater
# → Creates agent template with identity registration
# → Walks through permission scoping
# → Sets up logging and monitoring by default

Instead of me writing a Python script from scratch with requests.get() and hardcoded tokens, I start with a template that does the right thing.

3. Friendly error messages when I do it wrong

My Figma API token could have returned:

Error 403: This personal access token is being used in an automated context.
Personal tokens are for manual use only.
Create an agent identity instead: https://docs.internal/agent-setup

Instead, it just worked. There was no signal that I was doing anything wrong.

The Design-Engineering Gap on “Agents”

I think there’s a vocabulary mismatch here. When engineers say “agents,” they mean autonomous AI systems that need governance. When I say “agent,” I mean “a script that saves me 30 minutes a day.”

To me, this isn’t AI. It’s automation. It’s the same category as Zapier or IFTTT. I wouldn’t think to file a security review for a Zapier integration (though maybe I should?).

How do you bridge that gap? How do you make non-engineers understand that their “just a script” is actually “an autonomous system with access to production data”?

The Brutal Practical Question

You wrote: “If your CI/CD pipeline blocks deployments that fail security scans, should it also block agents that exceed permission boundaries?”

From a design/product perspective, my answer is: Yes, but only if the error message teaches me how to fix it.

If the pipeline blocks my agent and says “Permission denied,” I’ll assume it’s broken and work around it (run it locally, use a different token, whatever).

If the pipeline blocks my agent and says:

🚫 This agent requires database access but hasn't been approved.
Reason: Agents with data access need security review to prevent accidental leaks.
Next steps:
  1. Register agent: agent-cli register design-slack-updater
  2. Request permission: agent-cli request-permission --resource=database --justification="summarize design changes"
  3. Estimated approval time: 10 minutes for read-only, 2 days for write access

Need help? Slack #agent-help or https://docs.internal/agent-faq

Now I understand why it’s blocked, what to do, and how long it takes. I’m way more likely to follow the process.

What I Wish Platform Teams Understood

We’re not trying to be malicious or reckless. We’re trying to ship value quickly and didn’t realize we were touching infrastructure that needs governance.

The best way to get us compliant isn’t stricter policies — it’s making the secure path obvious and easy.

  • If I have to read 20 pages of docs to register an agent, I won’t do it.
  • If I can run agent init and answer 5 questions, I will.

The gap isn’t between “governed” and “ungoverned.” It’s between “people who know agents need governance” and “people who don’t even know their script is an agent.”

Make the education part of the tooling, not a separate prerequisite.


(Also, uh, I should probably go register my Figma-Slack agent now. :sweat_smile: Where do I start?)