I just got approval from our board to build what I’m calling our “AI Control Plane”—and I know some of you are going to think I’m creating bureaucracy where we need speed. But hear me out.
Three months ago, we had 50+ engineers using 12 different AI coding assistants, LLM APIs, and agent frameworks. Zero visibility. Zero governance. Zero idea what data was being sent where. When our security team asked “can you audit AI tool usage?” the answer was “…we’ll get back to you.”
Last week, I mandated that all AI tool usage goes through a centralized control plane managed by our platform team. The reaction was… mixed. Some engineers accused me of killing innovation. Others thanked me for finally addressing what they saw as a compliance disaster waiting to happen.
The 10× Retrofit Cost Is Real
Here’s what convinced me we couldn’t wait: I talked to three CTOs who retrofitted AI governance after an incident. One had an agent leak PII into training data. Another had a developer accidentally expose API keys through an AI chat log. The third had a compliance audit fail because they couldn’t demonstrate data lineage for AI-generated code.
In all three cases, the retrofit cost 10× what proactive governance would have cost. Not just in engineering time—in legal reviews, customer trust rebuilding, talent drain from frustrated engineers, and opportunity cost while everything was locked down.
Our Architecture: Centralized Control, Decentralized Innovation
I’m not trying to control which AI tools engineers use. I’m trying to ensure that whatever they use goes through a governed layer. Our architecture:
Centralized (Platform Team Owns):
- Authentication and authorization for all AI services
- Centralized logging and observability of AI interactions
- Policy enforcement (data classification, PII filtering, rate limiting)
- Security scanning of prompts and responses
- Cost tracking and budget controls
Decentralized (Engineering Teams Choose):
- Which AI coding assistant (GitHub Copilot, Cursor, etc.)
- Which LLM API for specific use cases
- Agent frameworks and implementation patterns
- Tool selection within approved categories
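To make the split concrete, here is a minimal sketch of what the centralized layer could do to a single request, whichever tool the team picked. Everything in it is an assumption for illustration: the `ControlPlane` class, the two regex patterns, and the per-team budget model are hypothetical, and a real deployment would use a vetted PII detector rather than two regular expressions.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; not a complete PII detector.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class ControlPlane:
    """Toy governed layer: PII filtering, budget control, audit logging."""
    budgets_usd: dict                      # team name -> remaining budget
    audit_log: list = field(default_factory=list)

    def redact(self, prompt):
        """Replace matched PII with a placeholder and report what was found."""
        found = []
        for label, pattern in PII_PATTERNS.items():
            prompt, count = pattern.subn(f"[{label.upper()}]", prompt)
            if count:
                found.append(label)
        return prompt, found

    def handle(self, team, tool, prompt, est_cost_usd):
        """Return the redacted prompt to forward, or None if over budget."""
        clean, found = self.redact(prompt)
        allowed = self.budgets_usd.get(team, 0.0) >= est_cost_usd
        if allowed:
            self.budgets_usd[team] -= est_cost_usd
        # Every request is logged, allowed or not, so usage can be audited.
        self.audit_log.append({
            "ts": time.time(),
            "team": team,
            "tool": tool,
            "pii_found": found,
            "cost_usd": est_cost_usd,
            "allowed": allowed,
        })
        return clean if allowed else None
```

The tool name is only recorded here, never checked, which reflects the intent above: the platform team governs the path, not the choice of assistant.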
We’re implementing what I’m calling the “four pillars” based on recent CNCF research:
- Golden paths: Pre-approved AI tool configurations
- Guardrails: Policy enforcement at runtime
- Safety nets: Monitoring and alerting for anomalies
- Manual review: Human-in-the-loop for high-risk operations
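One way to picture how the four pillars interact is as a single decision function with four possible outcomes. This is a rough sketch, not our actual policy: the tool names, data classes, action names, and alert threshold below are all hypothetical placeholders.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"                      # golden path: pre-approved configuration
    BLOCK = "block"                      # guardrail: policy violation at runtime
    ALLOW_AND_ALERT = "allow_and_alert"  # safety net: let through, flag for monitoring
    NEEDS_REVIEW = "needs_review"        # manual review: human in the loop


# Hypothetical policy data; real values would come from the platform team.
GOLDEN_PATHS = {("copilot", "code-completion"), ("cursor", "code-completion")}
BLOCKED_DATA_CLASSES = {"restricted"}
HIGH_RISK_ACTIONS = {"deploy", "delete-data"}


def decide(tool, action, data_class, requests_last_hour, alert_threshold=500):
    """Map one AI request onto one of the four pillars."""
    if data_class in BLOCKED_DATA_CLASSES:
        return Decision.BLOCK
    if action in HIGH_RISK_ACTIONS:
        return Decision.NEEDS_REVIEW
    if requests_last_hour > alert_threshold:
        return Decision.ALLOW_AND_ALERT
    if (tool, action) in GOLDEN_PATHS:
        return Decision.ALLOW
    # Off the golden path: allowed but monitored, so tool choice stays
    # with the team. This default is an assumption, not a recommendation.
    return Decision.ALLOW_AND_ALERT
```

The order of the checks is itself a design choice: guardrails are evaluated first so that a pre-approved tool still cannot send restricted data.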
The Galileo Agent Control release from earlier this year is inspiring our approach: write behavioral policies once, then enforce them across all agent deployments. We’re building on Kong API Gateway, Open Policy Agent, and Datadog for observability.
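For a sense of what "write once, enforce everywhere" could look like with Open Policy Agent, here is a sketch of a gateway-side helper asking OPA for a decision over its REST Data API. The policy path `ai/gateway/allow`, the input fields, and the local URL are assumptions made up for this example; only the general request shape (POST a JSON `input` document, read back `result`) comes from OPA itself.

```python
import json
import urllib.request

# Assumed local OPA instance and a hypothetical policy path.
OPA_URL = "http://localhost:8181/v1/data/ai/gateway/allow"


def build_opa_request(user, tool, data_class):
    """Build the POST request carrying the input document for the policy."""
    body = json.dumps(
        {"input": {"user": user, "tool": tool, "data_class": data_class}}
    ).encode("utf-8")
    return urllib.request.Request(
        OPA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_opa_response(raw):
    """Treat anything other than an explicit true result as a denial."""
    return json.loads(raw).get("result") is True


def is_allowed(user, tool, data_class, timeout=2.0):
    """Ask OPA for a decision; fail closed if OPA is unreachable or replies badly."""
    request = build_opa_request(user, tool, data_class)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_opa_response(response.read())
    except (OSError, ValueError):
        return False
```

Failing closed is the conservative choice for a compliance-driven setup, but it also makes the policy engine a hard dependency for every AI call, which is part of the bottleneck concern.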
What I’m Struggling With
- The bottleneck risk: Will centralized governance slow down engineering velocity so much that we lose our competitive edge?
- The staffing challenge: Who owns this? Platform team is already stretched. Security team doesn’t have AI expertise. Do we hire a dedicated AI governance team?
- The tool integration problem: Not all AI tools have APIs we can intercept. How do you govern local IDE assistants that call LLMs directly?
- The buy-in problem: How do I get engineers excited about governance instead of seeing it as a blocker?
For those of you who’ve built centralized AI governance: What does your architecture look like? What am I missing? What should I prioritize in the first 90 days?
And for those who think I’m making a mistake—tell me why. I’d rather hear the counterarguments now than discover them the hard way six months from now.
Related reading that shaped my thinking: