Infrastructure-as-Intent — Spacelift Bypasses Terraform Entirely With Natural Language Cloud Provisioning

The demo that turned heads at KubeCon this year wasn’t a new container orchestrator or a service mesh — it was Spacelift Intent translating a plain English sentence directly into cloud provider API calls. No HCL. No Terraform plan. No generated intermediate code. You describe what you want:

“Create an S3 bucket with versioning enabled, encrypted with a KMS key, and a lifecycle policy that moves objects to Glacier after 90 days.”

And the system provisions it directly through the AWS API. It runs as an MCP (Model Context Protocol) server that plugs into AI coding assistants — Cursor, Claude Code, Windsurf — while enforcing the same policy guardrails and audit trails as traditional Infrastructure-as-Code pipelines.
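Spacelift hasn't published its internal plan format, but as a mental model, the sentence above expands into an ordered sequence of AWS API operations. The operation names below are real S3 and KMS API actions; the plan structure itself is illustrative, not Spacelift's:

```python
# Hypothetical sketch of the API-call plan an intent engine might derive
# from the bucket request above. Operation names are real AWS actions;
# the (service, operation, params) structure is illustrative only.
plan = [
    ("kms", "CreateKey", {"Description": "bucket encryption key"}),
    ("s3", "CreateBucket", {"Bucket": "example-bucket"}),
    ("s3", "PutBucketVersioning",
     {"Bucket": "example-bucket",
      "VersioningConfiguration": {"Status": "Enabled"}}),
    ("s3", "PutBucketEncryption",
     {"Bucket": "example-bucket",
      "ServerSideEncryptionConfiguration": {"Rules": [
          {"ApplyServerSideEncryptionByDefault":
               {"SSEAlgorithm": "aws:kms"}}]}}),
    ("s3", "PutBucketLifecycleConfiguration",
     {"Bucket": "example-bucket",
      "LifecycleConfiguration": {"Rules": [
          {"ID": "to-glacier", "Status": "Enabled",
           "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]}]}}),
]

for service, op, _params in plan:
    print(f"{service}:{op}")
```

Five API calls for one English sentence: that compression is the entire pitch.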

Let me explain why this is conceptually significant before I tear into the practical problems.

The Promise: Skipping the Code Layer

Infrastructure-as-Code was a revolution because it replaced clicking through the AWS console with version-controlled, reviewable, repeatable configurations. Terraform and CloudFormation gave us reproducibility and auditability. But IaC has its own substantial problems that we’ve collectively normalized:

  • HCL is a domain-specific language that takes months to learn. It’s not programming, it’s not configuration, it’s something in between — and it has its own weird quirks (count vs. for_each, anyone?).
  • Terraform state management is error-prone. State file locking, state drift, importing existing resources, moving resources between state files — these are entire categories of infrastructure incidents.
  • The gap between “what I want” and “how to express it in HCL” is significant. You want a VPC with public and private subnets across 3 AZs. The mental model is simple. The Terraform code is 200+ lines of resource blocks, data sources, and variable references.

Infrastructure-as-Intent skips the code layer entirely. You express intent, and the system figures out the implementation. This is the same conceptual leap that SQL made over procedural data access — you declare what you want, not how to get it.

My Honest Evaluation After Three Weeks of Testing

The happy path demos are genuinely impressive. I tested common infrastructure patterns — VPCs, Application Load Balancers, RDS instances, S3 buckets with various configurations, ECS services, CloudFront distributions — and they provisioned correctly from natural language descriptions about 85% of the time. For standardized patterns, this works.

But edge cases are where it falls apart, and infrastructure is mostly edge cases in production:

1. Complex networking configurations require precision that natural language doesn’t convey well. When I said “create a VPC with three private subnets using 10.0.0.0/16,” it created the subnets — but it chose its own CIDR blocks for each subnet. My team has a specific CIDR allocation scheme that maps to our network topology documentation. Natural language can express this, but it becomes so verbose and detailed that you’ve essentially written a specification document, not a casual description. At that point, you might as well write the Terraform.
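The precision problem is easy to see in code. A deterministic allocator built on Python's standard ipaddress module carves the same subnets out of 10.0.0.0/16 every single time; an intent engine that picks its own blocks offers no such guarantee. This is a sketch of what "a specific CIDR allocation scheme" means in practice, not a description of Spacelift's behavior:

```python
import ipaddress

def allocate_subnets(vpc_cidr: str, count: int, new_prefix: int = 24):
    """Deterministically carve `count` subnets out of a VPC CIDR.

    Identical inputs always yield identical blocks -- the property a
    team's CIDR allocation scheme depends on, and exactly the property
    a casual natural-language description fails to pin down.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = list(vpc.subnets(new_prefix=new_prefix))
    if len(subnets) < count:
        raise ValueError("VPC block too small for requested subnets")
    return [str(s) for s in subnets[:count]]

print(allocate_subnets("10.0.0.0/16", 3))
# → ['10.0.0.0/24', '10.0.1.0/24', '10.0.2.0/24']
```

Ten lines of code encode a contract that would take a paragraph of careful English, which is the whole problem.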

2. Drift detection and state management remain unsolved. If someone modifies the infrastructure manually — and they will, during incidents — how does the intent system know? Terraform tracks state explicitly through its state file. Intent-based systems need a fundamentally different reconciliation model. Spacelift’s approach is to scan existing infrastructure and compare it against the “intent history,” but this is fuzzy matching at best. Terraform’s state comparison is deterministic; intent-based reconciliation is probabilistic.
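The deterministic/probabilistic distinction is concrete. Terraform-style reconciliation is a field-by-field diff of recorded state against live state, and each field either matches or it doesn't. A minimal sketch, with resource attributes modeled as plain dicts rather than Terraform's actual state format:

```python
def diff_resource(desired: dict, actual: dict) -> dict:
    """Deterministic drift detection: compare recorded desired state
    against live attributes field by field. Every divergence surfaces
    as an explicit (expected, actual) pair -- no fuzzy matching."""
    drift = {}
    for key in desired.keys() | actual.keys():
        if desired.get(key) != actual.get(key):
            drift[key] = (desired.get(key), actual.get(key))
    return drift

# Someone flipped versioning off in the console during an incident:
desired = {"bucket": "logs", "versioning": True, "encryption": "aws:kms"}
actual  = {"bucket": "logs", "versioning": False, "encryption": "aws:kms"}
print(diff_resource(desired, actual))
# → {'versioning': (True, False)}
```

An intent-based system has no `desired` dict to diff against, only a history of English sentences, which is why its reconciliation can only ever approximate this.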

3. Debugging failures is significantly harder. When a Terraform plan fails, I can read the HCL, understand the dependency graph, and identify the issue. When intent-based provisioning fails, the error lives in the gap between my description and the system’s interpretation — a much harder debugging surface. I described a Lambda function with a VPC configuration, and it created the function but put it in the wrong subnets. Understanding why required reconstructing the AI’s interpretation of “the private subnets,” which isn’t inspectable the way a Terraform resource reference is.

The Governance Question

Intent-based provisioning actually has potential security advantages. The system can enforce policies at the intent level — “no public S3 buckets ever,” “all databases must have encryption at rest” — rather than at the code level, where you’re scanning for specific HCL patterns that might be expressed differently across modules. Policy enforcement against intent is conceptually cleaner than policy enforcement against implementation.
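A policy like "no public S3 buckets ever" expressed against a structured intent reduces to a one-line predicate, instead of a lint rule that has to hunt down every HCL spelling of public access. A sketch, assuming intents are normalized into dicts; the schema and rules here are hypothetical, not Spacelift's policy language:

```python
def check_policies(intent: dict) -> list:
    """Evaluate guardrails against a normalized intent rather than
    against HCL source. Schema and rule set are illustrative."""
    violations = []
    if intent.get("resource") == "s3_bucket":
        if intent.get("public_access", False):
            violations.append("no public S3 buckets ever")
    if intent.get("resource") in ("rds_instance", "s3_bucket"):
        if not intent.get("encrypted", False):
            violations.append("all data stores must be encrypted at rest")
    return violations

intent = {"resource": "s3_bucket", "public_access": True, "encrypted": True}
print(check_policies(intent))
# → ['no public S3 buckets ever']
```

The predicate runs before anything is provisioned, which is the cleanest place a guardrail can sit.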

But it introduces a new trust boundary. You’re trusting the AI to interpret your intent correctly and not provision something you didn’t ask for. In traditional IaC, you review the plan before applying — terraform plan shows you exactly what will be created, modified, or destroyed. In intent-based provisioning, the “plan” is the AI’s internal interpretation, which is opaque. Spacelift shows a preview of the API calls it intends to make, which helps, but it’s not the same as reading a Terraform plan that maps directly to your code.

Who This Is Actually For

Quick prototyping and development environments? Absolutely. Standardized infrastructure patterns where the intent is clear and the implementation is well-understood? Yes. Spinning up a demo environment with standard components? Perfect use case.

For production infrastructure with complex requirements, regulatory constraints, and precise networking needs? Traditional IaC is still safer. The determinism and auditability of Terraform outweigh the convenience of natural language for high-stakes infrastructure.

The question I keep coming back to: would you trust natural language to provision your production infrastructure, or does IaC fundamentally need the code layer for safety?

The skills gap angle is the one that resonates most with me as someone managing engineering teams at scale.

We have 30 engineers but only 4 who are comfortable writing Terraform. Infrastructure changes are bottlenecked on those 4 people. Every sprint, we have tickets sitting in the “waiting for infra” column because someone needs a new environment, a modified security group, or an additional cache cluster, and the Terraform-capable engineers are already overcommitted. The backlog isn’t a prioritization problem — it’s a skills concentration problem.

If intent-based provisioning could handle 60-70% of common infrastructure requests — spinning up development environments, configuring standard services, creating monitoring dashboards, provisioning standard database instances — it would unblock the other 26 engineers for self-service infrastructure. They know what they need. They just can’t express it in HCL, and the ramp-up time to learn Terraform well enough to not create security issues is 3-6 months.

The risky 30-40% — networking changes, security group modifications, production database configurations, IAM policy updates — would still go through the Terraform experts with full plan review and approval workflows. This isn’t a novel concept. We already use a tiered approach for database access: engineers can query read replicas self-service, but production write access requires DBA approval. Apply the same pattern to infrastructure provisioning.

The key insight is that you don’t need to trust intent-based provisioning for everything to get value from it. If it handles the routine 70% reliably, you’ve dramatically reduced the bottleneck on your infrastructure team while keeping expert review for the high-risk 30%. That’s a pragmatic middle ground between “AI provisions everything” and “only Terraform experts touch infrastructure.”

The real question is whether the boundary between “safe for self-service” and “requires expert review” can be enforced programmatically, or whether it requires human judgment to determine which category a request falls into.
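For the clear-cut cases, the programmatic version of that boundary can be surprisingly simple: classify a request by the resource types it touches, and route anything flagged or production-bound to expert review by default. A sketch with a hypothetical request shape; the categorization itself remains the hard judgment call:

```python
# Resource types that always require expert review under the tiered
# model described above. This set is the judgment call, encoded once.
EXPERT_REVIEW = {"security_group", "iam_policy", "vpc", "route_table",
                 "production_database"}

def route_request(resource_types: set, environment: str) -> str:
    """Route an infrastructure request to self-service or expert
    review. Fails safe: production environments and any flagged
    resource type go to the experts."""
    if environment == "production" or resource_types & EXPERT_REVIEW:
        return "expert-review"
    return "self-service"

print(route_request({"s3_bucket", "ecs_service"}, "development"))
# → self-service
print(route_request({"security_group"}, "development"))
# → expert-review
```

The hybrid answer to the question is probably this: the routing is programmatic, but a human maintains the deny set.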

The audit trail question is the one that keeps me up at night with these intent-based tools.

In traditional IaC, the audit trail is clear and complete:

  1. Commit — someone wrote specific HCL code and submitted it
  2. PR review — another engineer reviewed the exact changes
  3. Plan output — Terraform showed exactly what would be created, modified, or destroyed
  4. Apply log — a complete record of every API call made and its result

Every decision is documented in version control. If an auditor asks “why does this S3 bucket have public read access?”, I can trace it back to a specific commit, a specific PR, a specific reviewer, and a specific apply. The chain of accountability is unbroken.

With intent-based provisioning, the trail becomes:

  1. Natural language description → 2. ??? → 3. Infrastructure exists

What happened in step 2? What specific API calls were made? What permissions were set? What default values did the AI choose when the natural language description was ambiguous? What the AI “decided” isn’t captured in a reviewable format that satisfies SOC 2, HIPAA, or PCI-DSS audit requirements.

Before any organization adopts this for production — and I mean any production workload, not just high-security ones — they need:

(1) Complete API call logging independent of the tool. AWS CloudTrail, GCP Audit Logs, Azure Activity Log. Don’t trust the provisioning tool to log its own actions — trust the cloud provider’s immutable audit log. This should already be in place, but it becomes critical when the provisioning mechanism is opaque.

(2) Post-provisioning compliance scanning. Does what was created actually match what was intended? Run a tool like Prowler, ScoutSuite, or Checkov against the provisioned infrastructure immediately after creation. The intent was “private S3 bucket” — verify it’s actually private. The gap between intent and implementation is exactly where security vulnerabilities hide.
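For the "private S3 bucket" case, the post-provisioning check can be this direct: read back the bucket's public-access configuration and assert every flag. The dict below mirrors the shape of S3's PublicAccessBlock settings, but the check runs against stubbed data here; a real scan would call the cloud API or one of the tools named above:

```python
# The four S3 PublicAccessBlock flags that must all be True for a
# bucket to be meaningfully private.
REQUIRED_FLAGS = ("BlockPublicAcls", "IgnorePublicAcls",
                  "BlockPublicPolicy", "RestrictPublicBuckets")

def verify_private(public_access_block: dict) -> list:
    """Return the public-access flags that are NOT locked down.
    Intent said 'private bucket'; this verifies the implementation."""
    return [f for f in REQUIRED_FLAGS
            if not public_access_block.get(f, False)]

# Stubbed read-back of what the intent engine actually provisioned:
provisioned = {"BlockPublicAcls": True, "IgnorePublicAcls": True,
               "BlockPublicPolicy": False, "RestrictPublicBuckets": True}
print(verify_private(provisioned))
# → ['BlockPublicPolicy']
```

A non-empty result here means the intent-to-implementation gap bit you, and it bit you within minutes of provisioning instead of at audit time.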

(3) Human approval gates for anything touching production. The convenience of natural language doesn’t justify losing the human review step. If the intent-based system generates a “plan” of API calls it intends to make, that plan needs human review before execution. This is non-negotiable for production infrastructure.
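The gate itself is a small wrapper: surface the proposed API calls, and execute nothing until a named human has approved. A sketch under assumed shapes; the plan format and approver record are hypothetical, not Spacelift's:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProvisioningPlan:
    """A previewable list of API calls behind an explicit approval
    step. Nothing executes until a named reviewer has approved."""
    calls: list            # list of (operation, params) tuples
    approved_by: Optional[str] = None

    def preview(self) -> list:
        return [op for op, _ in self.calls]

    def approve(self, reviewer: str) -> None:
        self.approved_by = reviewer

    def execute(self, runner) -> list:
        if self.approved_by is None:
            raise PermissionError("plan not approved by a human reviewer")
        return [runner(op, params) for op, params in self.calls]

plan = ProvisioningPlan([("s3:CreateBucket", {"Bucket": "audit-logs"})])
print(plan.preview())
# → ['s3:CreateBucket']
plan.approve("jane@example.com")
print(plan.execute(lambda op, params: f"executed {op}"))
# → ['executed s3:CreateBucket']
```

The `approved_by` field doubles as the audit record: every executed plan carries the name of the human who signed off.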

The convenience of skipping the code layer doesn’t justify losing the auditability that IaC provides. If anything, the opacity of AI interpretation increases the need for independent verification, not decreases it.

I see this as the natural evolution, not a replacement. And the history of infrastructure management supports this view.

We went from manual provisioning (clicking in consoles) → scripts (bash/Python automating API calls) → IaC (Terraform/CloudFormation with declarative configs) → GitOps (ArgoCD/Flux with git as the source of truth). Each layer built on the previous one and didn’t eliminate it. We still write scripts for edge cases. We still use raw API calls for debugging. We still click in the console during incident response when we need to act faster than a pipeline allows.

Intent-based provisioning is the next layer. It will handle the 70% of infrastructure that’s well-understood patterns — the standard VPCs, the common database configurations, the typical load balancer setups — and IaC will handle the 30% that requires precision, custom networking, and regulatory compliance.

The Spacelift approach of running as an MCP server within existing AI workflows is strategically smart. It meets developers where they already are — inside their AI coding assistants — rather than requiring them to adopt a new tool, learn a new interface, or change their workflow. The best infrastructure tooling is invisible; it’s embedded in the workflow developers are already using.

What I’d watch for is how the ecosystem evolves around intent validation. Just like Terraform developed terraform validate, tflint, and Sentinel policies, intent-based systems will need their own validation layer: “Does this natural language description unambiguously specify the intended infrastructure?” If the description is ambiguous, the system should ask clarifying questions rather than making assumptions. The quality of the system will ultimately be measured not by how well it handles clear descriptions, but by how gracefully it handles ambiguous ones.
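A first cut at that validation layer is a required-fields check over the parsed intent: anything the description left unspecified becomes a clarifying question instead of a silent default. A sketch with a hypothetical intent schema, not Spacelift's validator:

```python
# Fields an S3 intent must specify before provisioning, paired with
# the question to ask when the description leaves one ambiguous.
# Hypothetical schema for illustration.
REQUIRED = {
    "region": "Which region should the bucket live in?",
    "versioning": "Should versioning be enabled?",
    "public_access": "Should the bucket allow any public access?",
}

def clarifying_questions(parsed_intent: dict) -> list:
    """Return a question for every required field the natural-language
    description left unspecified, instead of filling in defaults."""
    return [q for field, q in REQUIRED.items() if field not in parsed_intent]

# "Create an S3 bucket with versioning enabled" parses to:
parsed = {"resource": "s3_bucket", "versioning": True}
print(clarifying_questions(parsed))
# → ['Which region should the bucket live in?',
#    'Should the bucket allow any public access?']
```

Asking two questions up front is cheap; discovering a defaulted region or an accidentally public bucket later is not.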

The companies that win in this space will be the ones that build the trust layer — showing users exactly what the system interpreted, what it plans to do, and getting explicit confirmation before executing. Trust is the bottleneck, not technology.