The skills gap discussion surfaced something nobody wants to say out loud, so I’ll say it:
AI agents need “clear governance and data access controls”—but nobody’s published a working framework yet. What are you actually implementing?
The Gap Between Talk and Action
Every platform engineering article says:
- “Implement RBAC for AI agents”
- “Add audit logging and governance policies”
- “Create human-in-loop approval gates”
Great. HOW?
Our compliance team is asking questions I can’t answer:
- What data can this AI agent access?
- Who’s responsible when it makes a mistake?
- How do we audit agent decisions?
- Can we prove compliance to regulators?
What We’re Trying (The Messy Reality)
Attempt 1: Treat agents like service accounts
- Created RBAC policies for agents
- Result: Too restrictive—agents couldn’t do their jobs
- Agents discover what they need at runtime, so they need dynamic access, not static permissions
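To make the Attempt 1 failure concrete, here's a minimal sketch of the static service-account model. The agent names, permission strings, and policy table are all illustrative, not our real setup:

```python
# Attempt 1 sketch: static RBAC, where an agent either has a permission
# up front or the task fails. Names here are hypothetical.

STATIC_POLICY = {
    "deploy-agent": {"read:manifests", "write:staging"},
}

def is_allowed(agent: str, permission: str) -> bool:
    """Static check: no way for the agent to acquire access mid-task."""
    return permission in STATIC_POLICY.get(agent, set())

# The failure mode: mid-task the agent discovers it also needs prod logs,
# which nobody anticipated when the policy was written.
is_allowed("deploy-agent", "write:staging")   # granted
is_allowed("deploy-agent", "read:prod-logs")  # denied, task dies here
```

The policy is only as good as what you predicted the agent would need, which is exactly what you can't predict.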
Attempt 2: Wide open access with logging
- Let agents access what they need, log everything
- Result: Security nightmare, compliance team shut it down
- One agent racked up $8K in API costs in 3 hours
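Attempt 2 in sketch form, to show why logging alone isn't control. The logger setup and agent/action names are placeholders for whatever sink you actually ship to:

```python
# Attempt 2 sketch: no permission checks, just record everything.
# Hypothetical names; the point is the structure, not the details.

import json, logging, time

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent-audit")

def logged_call(agent, action, fn, *args, **kwargs):
    """Execute fn unconditionally, recording who did what and how long it took."""
    start = time.time()
    result = fn(*args, **kwargs)
    audit.info(json.dumps({
        "agent": agent,
        "action": action,
        "duration_s": round(time.time() - start, 3),
    }))
    # By the time this log line exists, the money is already spent.
    return result

cost = logged_call("research-agent", "llm.call", lambda: 4.20)
```

An audit trail tells you what the agent did after it did it. With a runaway loop, that's a forensic record of your $8K, not a brake.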
Attempt 3: Human-in-loop for everything
- Require human approval for all agent actions
- Result: Defeats the purpose of AI agents (no faster than humans doing the work manually)
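And Attempt 3 as a sketch: an approval gate in front of every action. The queue here is a stand-in for whatever ticketing or chat-ops flow you'd wire in; the names are made up:

```python
# Attempt 3 sketch: human-in-loop for everything. Every agent action
# parks in a queue until a human signs off. Illustrative only.

from dataclasses import dataclass, field

@dataclass
class ApprovalGate:
    pending: list = field(default_factory=list)

    def request(self, agent: str, action: str) -> int:
        """Agent stops here; returns a ticket id to wait on."""
        self.pending.append((agent, action))
        return len(self.pending) - 1

    def approve(self, ticket: int) -> tuple:
        """A human eventually gets around to this."""
        return self.pending[ticket]

gate = ApprovalGate()
ticket = gate.request("deploy-agent", "restart checkout service")
# ...agent idles at human speed...
agent, action = gate.approve(ticket)
```

The structural problem is visible in the shape of the code: the agent's throughput is capped at the approver's throughput, so you've paid for automation and gotten a ticket queue.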
Current state: Somewhere between Attempt 2 and 3
- Agents can act autonomously in sandbox/dev
- Prod requires human approval for destructive actions
- Cost limits per agent per hour ($100 ceiling)
- Full audit trails in Datadog
But this feels like duct tape, not a framework.
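For what it's worth, the duct tape above roughly reduces to one authorization function: autonomous in dev, human approval for destructive prod actions, hard stop at the hourly cost ceiling. Thresholds, action names, and the destructive-action list are illustrative:

```python
# Sketch of our current guardrails, not a framework. All names and
# thresholds are placeholders for whatever your platform enforces.

import time
from collections import defaultdict

COST_CEILING_PER_HOUR = 100.00          # per agent, hard stop
DESTRUCTIVE = {"delete", "drop-table", "scale-to-zero"}

spend = defaultdict(list)               # agent -> [(timestamp, dollars), ...]

def record_spend(agent, dollars, now=None):
    spend[agent].append((now or time.time(), dollars))

def hourly_spend(agent, now=None):
    now = now or time.time()
    return sum(d for t, d in spend[agent] if now - t < 3600)

def authorize(agent, action, env, now=None):
    """Returns 'allow', 'needs-approval', or 'deny'."""
    if hourly_spend(agent, now) >= COST_CEILING_PER_HOUR:
        return "deny"                   # cost ceiling beats everything
    if env == "prod" and action in DESTRUCTIVE:
        return "needs-approval"         # human-in-loop only where it hurts
    return "allow"                      # sandbox/dev: fully autonomous
```

Notice everything this sketch punts on: where the policy lives, who edits it, how an agent appeals a denial, and how you prove any of it to an auditor. Which is the whole point of the questions below.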
The Questions I Need Answers To
For those who’ve shipped AI agents to production:
- Permission models: Static RBAC? Dynamic policies? How do agents “request” access they discover they need?
- Blast radius control: How do you limit damage from a misbehaving agent? Rollback mechanisms?
- Cost governance: What’s your approach? Hard limits? Soft warnings? How do you balance cost control with agent effectiveness?
- Audit/compliance: What are you logging? How do you prove to auditors that agent actions were appropriate?
- Failure attribution: When an agent breaks something, who’s responsible? The agent? The platform team? The developer who invoked it?
My Fear
We’re all making this up as we go, building ad-hoc policies, and creating tomorrow’s compliance disasters.
Has anyone actually built a comprehensive AI agent governance framework? Or are we all just shipping and hoping for the best?
Because if AI governance and data controls are critical but no working playbook exists, we need to start sharing notes.
Michelle Washington | CTO | Seattle, WA