I’ve been a security engineer long enough to remember when security policies lived in Word documents on SharePoint, reviewed annually by a committee that hadn’t written code in a decade. Policy-as-Code is the antidote to that world, and after two years of implementing it, I’m convinced it’s the future of security enforcement. But it comes with a set of problems that the tooling vendors don’t talk about.
What Policy-as-Code Actually Means
Policy-as-Code means defining security policies — network rules, access controls, compliance checks, deployment requirements — as actual code that’s versioned in Git, tested in CI, and deployed automatically. Instead of a PDF that says “all containers must run as non-root,” you write a Rego policy (Open Policy Agent), a Kyverno rule (Kubernetes), a Sentinel policy (Terraform), or a Cedar policy (AWS) that enforces that requirement automatically.
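To make the non-root example concrete, here is a minimal sketch of the rule's logic, written in Python rather than Rego or Kyverno so that it runs on its own. The function name `non_root_violations` is hypothetical, and the check is deliberately narrow: it only looks at `runAsNonRoot` on regular containers and ignores init containers and `runAsUser`.

```python
from typing import Any


def non_root_violations(pod: dict[str, Any]) -> list[str]:
    """Return one message per container that is not forced to run as non-root.

    Hypothetical helper for illustration; it inspects a Pod manifest that
    has already been parsed into a dict.
    """
    spec = pod.get("spec", {})
    # The pod-level securityContext acts as the default for every container.
    pod_default = spec.get("securityContext", {}).get("runAsNonRoot")
    violations = []
    for container in spec.get("containers", []):
        ctx = container.get("securityContext", {})
        # A container-level setting overrides the pod-level default.
        effective = ctx.get("runAsNonRoot", pod_default)
        if effective is not True:
            violations.append(
                f"container '{container.get('name', '?')}' must set runAsNonRoot: true"
            )
    return violations
```

An empty list means the manifest passes; anything else is a list of reasons that can be shown to the developer.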
The tooling ecosystem has matured significantly:
- Open Policy Agent (OPA) with Rego for general-purpose policy evaluation: Kubernetes admission control, API authorization, data filtering
- Kyverno for Kubernetes-native policy enforcement: simpler syntax than Rego, native resource mutation
- Sentinel for the HashiCorp ecosystem: Terraform plan validation, Vault access policies
- Cedar (AWS) for fine-grained application authorization: IAM-style permit/forbid policies designed to be formally analyzable
The Promise Delivered
At my company, we implemented OPA for Kubernetes admission control, and the results have been transformative. Before OPA, security violations were caught in post-deployment audits — meaning non-compliant workloads ran in production for days or weeks before someone noticed. Now, 95% of security violations are caught before deployment, at the admission control layer.
No containers running as root. No pods without resource limits. No services exposed without TLS. No images from untrusted registries. All enforced automatically, consistently, 24/7. No exceptions, no drift, no “I’ll fix it next sprint.”
The developer experience improved too. Instead of getting a security review rejection 3 days after submitting a PR, developers get immediate feedback: “Your deployment was rejected because policy X requires Y.” They fix it, redeploy, and move on. Security reviews that used to take days now take seconds.
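That immediate feedback is carried in the admission response itself. As a rough sketch, this is approximately the shape of the `AdmissionReview` body a validating webhook sends back to the Kubernetes API server. The helper name `admission_response` and the policy name in the message are made up for illustration; tools such as OPA Gatekeeper or Kyverno build this response for you.

```python
from typing import Any


def admission_response(uid: str, violations: list[str], policy: str) -> dict[str, Any]:
    """Build an AdmissionReview (admission.k8s.io/v1) response body.

    `policy` is a hypothetical policy identifier used only in the message.
    """
    allowed = not violations
    response: dict[str, Any] = {"uid": uid, "allowed": allowed}
    if not allowed:
        # This message is what the developer sees from kubectl or the CD tool.
        response["status"] = {
            "code": 403,
            "message": f"rejected by policy '{policy}': " + "; ".join(violations),
        }
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

The quality of that `message` string largely determines whether a rejection is something a developer can fix alone or something that turns into a ticket.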
The New Problem: Who Writes the Policies?
Here’s where it gets complicated. The security team writes the policies, but we don’t understand all the application requirements. When a policy blocks a legitimate deployment, developers come to us and say “fix your policy.” We say “change your architecture.” Neither side understands the other’s constraints, and the result is friction.
Example: we wrote a network policy that restricted cross-namespace communication in Kubernetes. Perfectly reasonable from a security standpoint — blast radius containment. But a new feature required a service in the orders namespace to communicate with a service in the inventory namespace. Our policy blocked it. The development team didn’t know the policy existed until their deployment failed in staging. They escalated. We debated. It took two days to resolve — a policy exception plus an architecture review.
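The orders-to-inventory case can be modeled as default-deny between namespaces plus an explicit, reviewable list of exceptions. The sketch below is an assumption about how such a list might be structured, not how our policy is actually written; `NamespaceException` and `is_traffic_allowed` are hypothetical names, and real enforcement would live in Kubernetes NetworkPolicy objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceException:
    """A reviewed, one-directional exception to namespace isolation (hypothetical schema)."""
    source: str
    destination: str
    reason: str


def is_traffic_allowed(source_ns: str, dest_ns: str,
                       exceptions: list[NamespaceException]) -> bool:
    """Default-deny across namespaces unless an exception covers this direction."""
    # Traffic inside a single namespace is allowed by the baseline.
    if source_ns == dest_ns:
        return True
    # Cross-namespace traffic needs an explicit exception in that direction.
    return any(
        e.source == source_ns and e.destination == dest_ns for e in exceptions
    )


# Example exception list; the reason text is illustrative.
EXCEPTIONS = [
    NamespaceException("orders", "inventory", "stock lookup for new feature"),
]
```

Keeping the exception one-directional and attaching a reason makes the blast-radius trade-off visible in the PR instead of in an escalation thread.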
This happens weekly.
The Policy Testing Challenge
Policies need tests, just like application code. But testing a policy means anticipating every possible input. You’re not testing a function that takes integers — you’re testing a rule that evaluates against the entire space of possible Kubernetes manifests, Terraform plans, or API requests.
We test our policies against a library of known-good and known-bad configurations. But the space of edge cases is effectively unbounded. We missed one in a network policy: it blocked the cross-namespace traffic that a new observability pipeline depended on, and the result was a 4-hour production incident. The policy was correct according to its specification. The specification just didn’t account for observability infrastructure that legitimately needed cross-namespace access.
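A fixture library of this kind can be sketched as a table of named manifests with the decision each one should produce. Everything here is illustrative: `has_resource_limits` is a placeholder policy, the fixtures are invented, and in an OPA setup the equivalent tests would be written in Rego and run with `opa test`.

```python
from typing import Any, Callable

Manifest = dict[str, Any]


def has_resource_limits(pod: Manifest) -> bool:
    """Placeholder policy: every container must declare resource limits."""
    containers = pod.get("spec", {}).get("containers", [])
    return bool(containers) and all(
        c.get("resources", {}).get("limits") for c in containers
    )


# Fixture library: (name, manifest, expected decision). Contents are invented.
FIXTURES: list[tuple[str, Manifest, bool]] = [
    ("limits-set",
     {"spec": {"containers": [{"name": "a", "resources": {"limits": {"cpu": "1"}}}]}},
     True),
    ("limits-missing",
     {"spec": {"containers": [{"name": "a"}]}},
     False),
    ("one-of-two-missing",
     {"spec": {"containers": [
         {"name": "a", "resources": {"limits": {"memory": "64Mi"}}},
         {"name": "b"},
     ]}},
     False),
]


def run_fixtures(policy: Callable[[Manifest], bool],
                 fixtures: list[tuple[str, Manifest, bool]]) -> list[str]:
    """Return the names of fixtures where the policy disagrees with the expectation."""
    return [name for name, manifest, expected in fixtures
            if policy(manifest) != expected]
```

The limitation is visible in the structure: the harness can only catch the cases someone thought to add to `FIXTURES`.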
Testing policies against static fixtures isn’t enough. You need to test against production-like configurations, which means maintaining a realistic test environment that evolves with your infrastructure. It’s a significant investment.
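One way to approximate production-like testing is to replay a candidate policy over a snapshot of manifests exported from a real cluster and list what it would newly reject before enforcement is switched on. This is a sketch under assumptions: how the snapshot is collected is out of scope, policies are modeled as plain functions that return True to admit, and `newly_rejected` is a hypothetical name.

```python
from typing import Any, Callable

Manifest = dict[str, Any]
Policy = Callable[[Manifest], bool]  # True means "admit"


def newly_rejected(current: Policy, candidate: Policy,
                   snapshot: list[Manifest]) -> list[str]:
    """Names of workloads the current policy admits but the candidate would reject."""
    names = []
    for manifest in snapshot:
        if current(manifest) and not candidate(manifest):
            meta = manifest.get("metadata", {})
            names.append(
                f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
            )
    return names
```

A non-empty result does not mean the candidate policy is wrong, only that someone should look at each listed workload before the policy starts blocking deployments.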
The Governance Gap
Application code has clear ownership: the feature team. Infrastructure code has clear ownership: the platform team. But policy code sits in a no-man’s-land between security, platform, and application teams. Nobody owns it definitively.
When a policy needs to change:
- Security team says “we set the security intent”
- Platform team says “we implement the enforcement mechanism”
- Application team says “we need exceptions for our use case”
Who approves the change? Who reviews the PR? Who’s on-call when a policy causes an incident? In practice, all three teams need to be involved, which means policy changes move slowly and require coordination across multiple teams.
My Proposal: Co-Owned Policy Repos
I’m pushing for a model where:
- Security team defines policy intent and sets minimum security baselines
- Platform team implements enforcement (OPA/Kyverno configuration, admission controller setup)
- Application teams can submit exception requests as PRs, reviewed by both security and platform
- All policies have owners, expiration dates, and mandatory review cycles
- Policy changes require approval from at least one security engineer and one platform engineer
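The ownership, expiration, and approval rules above are simple enough to check mechanically, for example in CI on the policy repo. The sketch below assumes a metadata record stored next to each policy; the `PolicyRecord` schema, the team labels, and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyRecord:
    """Metadata kept next to each policy file (hypothetical schema)."""
    name: str
    owner: str
    expires: date


def overdue(policies: list[PolicyRecord], today: date) -> list[str]:
    """Policies whose expiration date has passed and that need a review."""
    return [p.name for p in policies if p.expires < today]


def change_approved(approver_teams: set[str]) -> bool:
    """A change needs at least one security and one platform approval."""
    return {"security", "platform"} <= approver_teams
```

A check like `overdue` turns the mandatory review cycle from a calendar reminder into a failing build, which is harder to ignore.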
It’s not perfect — it’s slow and coordination-heavy. But it ensures that policies reflect real-world requirements, not just theoretical security postures.
Is anyone doing policy-as-code well? How do you handle the ownership question? And how do you test policies against the combinatorial explosion of possible inputs? I’d love to hear what’s working and what isn’t.