LLMs Invent Plausible-Looking Kubernetes API Fields That Pass Linting But Fail in Production — Non-Deterministic Code Validation is Broken

I’ve been auditing AI-generated Kubernetes manifests for clients, and I keep finding the same problem: LLMs confidently generate YAML fields that don’t exist.

The Problem

AI coding assistants generate Kubernetes manifests that:

  • Look syntactically correct
  • Pass basic YAML linting
  • Use plausible-sounding API fields
  • Fail silently or cause unexpected behavior in production

Real Examples I’ve Found

Example 1: Invented field names

spec:
  containers:
    - name: app
      resources:
        guaranteedMemory: "512Mi"  # Not a real field

The actual fields are requests and limits. The AI invented guaranteedMemory because it sounds reasonable. Kubernetes ignores unknown fields in many contexts — no error, just no memory guarantee.
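For comparison, here is what the real schema looks like. Worth noting: setting requests equal to limits (for every resource, in every container) is what actually gets you the Guaranteed QoS class the invented field seems to promise. A corrected sketch:

```yaml
spec:
  containers:
    - name: app
      resources:
        requests:
          memory: "512Mi"   # real field: the scheduler reserves this much
        limits:
          memory: "512Mi"   # real field: equal requests and limits across all
                            # resources/containers yield the Guaranteed QoS class
```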

Example 2: Wrong API version features

apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
      pauseSeconds: 30  # Not a real field in apps/v1

The AI combined knowledge from different API versions or related projects. pauseSeconds isn’t a thing, but it sounds like it should be.
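If the intent behind pauseSeconds was to slow the rollout down, the closest real knob in apps/v1 is minReadySeconds, which lives on the Deployment spec rather than inside rollingUpdate. A sketch (the 30-second value is illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  minReadySeconds: 30     # real field: a new pod must stay Ready this long
                          # before the rollout proceeds to the next pod
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
```

For a true pause you'd use spec.paused or kubectl rollout pause, which halts the rollout entirely rather than pacing it.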

Example 3: Plausible security settings

spec:
  securityContext:
    enforceSELinux: true  # Not a real field
    auditMode: strict     # Not a real field

These sound like security best practices. An engineer who doesn’t know the exact SecurityContext schema might accept them. The cluster accepts them (and ignores them).
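For reference, the fields that actually exist in the PodSecurityContext schema look like this. SELinux is configured through labels, not an on/off switch, and there is no auditMode at all (values below are illustrative):

```yaml
spec:
  securityContext:
    runAsNonRoot: true          # real field: reject containers that run as root
    seLinuxOptions:             # real field: SELinux is set via labels
      level: "s0:c123,c456"
    seccompProfile:
      type: RuntimeDefault      # real field: apply the runtime's default seccomp profile
```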

Why This Is Dangerous

1. Silent failures:
Kubernetes often ignores unknown fields rather than rejecting them. Your “security hardening” might be doing nothing.

2. False confidence:
You think you configured something correctly. Tests pass (because the field is ignored). Production behaves unexpectedly.

3. Non-deterministic validation:
The same AI might generate different fields for the same request. You can’t rely on consistency.

The Validation Gap

Traditional validation catches:

  • Syntax errors
  • Type mismatches
  • Missing required fields

Traditional validation misses:

  • Unknown fields (often silently accepted)
  • Semantically wrong values
  • Fields that exist but don’t do what you think

What I Recommend

  1. Strict schema validation - Use kubectl apply --dry-run=server or similar to validate against the actual cluster API
  2. Schema-aware linting - Tools like Kubeconform or Datree that know the real API schema
  3. Human review of AI-generated infra - Don’t trust the AI more than you’d trust a junior engineer
  4. Test in staging first - Obvious, but critical when you can’t fully trust your manifests

Questions for Platform Engineers

  1. How do you validate AI-generated Kubernetes configs?
  2. Have you encountered “invented field” issues in production?
  3. What tooling helps catch these problems?

@security_sam, this is exactly the kind of thing that keeps platform engineers up at night. Let me share what we’ve implemented to catch these issues.

Our validation pipeline:

We have a multi-layer validation approach in our GitOps pipeline:

Layer 1: Schema validation (Kubeconform)

kubeconform -strict -schema-location default -schema-location 'https://raw.githubusercontent.com/...'

The -strict flag is crucial — it rejects unknown fields instead of ignoring them.

Layer 2: Policy validation (OPA/Gatekeeper)
We have policies that enforce our specific requirements:

  • Required labels
  • Resource limits within bounds
  • Security context requirements
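As a concrete example of the first policy, here's the kind of constraint we use. This sketch assumes the stock K8sRequiredLabels ConstraintTemplate from the Gatekeeper policy library is installed; the constraint name and label are illustrative:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    labels: ["team"]   # every matched Deployment must carry a "team" label
```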

Layer 3: Server-side dry-run

kubectl apply --dry-run=server -f manifest.yaml

This catches things that schema validation misses because it validates against the actual cluster API.

What we’ve caught:

Since implementing strict validation:

  • 47 “invented field” issues in the first month
  • 12 API version mismatches
  • 8 deprecated field usages

The cultural challenge:

The hardest part wasn’t the tooling — it was getting developers to not be annoyed when their AI-generated manifests fail validation. We had to frame it as “the validation caught a real issue” rather than “the validation is being pedantic.”

My recommendations:

  1. Make validation fast — if it takes 30 seconds, people will skip it
  2. Provide actionable error messages — “Field ‘guaranteedMemory’ is not valid. Did you mean ‘resources.requests.memory’?”
  3. Integrate into IDE — catch issues before commit, not in CI
  4. Track metrics — how many AI-generated configs fail validation? This tells you whether AI quality is improving.

One warning:

Server-side dry-run requires cluster access from CI. This has security implications. We use a dedicated, limited-permission service account.

This hits close to home. Last month I spent two hours debugging why a CronJob wouldn’t trigger. The LLM-generated YAML had schedule: "0 */6 * * *", which looked fine, and it also included concurrencyPolicy: Forbid, successfulJobsHistoryLimit: 3, and failedJobsHistoryLimit: 1, all valid fields.

The actual bug? It generated startingDeadlineSeconds: 0 instead of omitting it or using a reasonable value. With 0, if the controller missed the scheduled time by even a millisecond (which happens during cluster upgrades), the job would never run. Kubernetes silently skipped it.
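For anyone hitting the same thing, the fix was to either drop the field entirely (unset means no deadline, so missed runs are still started late) or give it a generous window. A sketch of the corrected spec (name and window are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-job               # hypothetical name
spec:
  schedule: "0 */6 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  startingDeadlineSeconds: 300   # tolerate up to 5 minutes of controller lateness;
                                 # with 0, any delay at all causes the run to be skipped
```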

What I’ve started doing:

  1. Diff against known-good configs - I keep a library of production-tested manifests and diff LLM output against them
  2. Question “helpful” additions - LLMs love adding fields. If I didn’t explicitly ask for startingDeadlineSeconds, why is it there?
  3. Test in kind clusters first - Spin up a local cluster, apply, and actually watch the behavior

The frustrating part is these aren’t syntax errors. kubectl apply --dry-run=server passed. The schema was valid. The semantics were wrong.

For anyone using Copilot or similar tools for K8s work: treat every generated field as a potential landmine. The model doesn’t understand what these fields do, only what patterns it’s seen.

From a leadership perspective, this is a training and process problem, not just a tooling problem.

We’ve had to rethink how we onboard engineers to use AI coding assistants for infrastructure work. The mental model shift is crucial: LLMs are pattern matchers, not reasoning engines. They’ll generate configurations that look statistically likely based on training data, not configurations that are semantically correct for your specific use case.

What we’ve implemented:

  1. Mandatory review checklist for AI-generated K8s configs - Every PR with generated manifests requires the author to annotate which fields they explicitly requested vs which the LLM “helpfully” added

  2. Production vs development divergence tracking - We noticed LLMs tend to generate configs that work in dev (permissive resource limits, no pod disruption budgets) but fail production requirements

  3. “Explain this field” culture - If you can’t explain why a field exists and what value it should have, it gets removed. LLMs add noise; humans must curate.

The deeper issue Sam raised about non-determinism is real. We’ve seen the same prompt produce different outputs across team members, leading to config drift before anything even hits the cluster.

My guidance to the team: use AI for boilerplate scaffolding, then strip it down to essentials. Start minimal and add fields with intention, rather than accepting a “complete” config and hoping nothing’s wrong.