
Structured Output Is Not Validated Output

Tian Pan
Software Engineer

The day your team turns on schema-constrained decoding feels like a milestone. The parsing errors stop. The JSONDecodeError alerts go quiet. The flaky regex that scraped fields out of prose gets deleted. Someone says "the model returns valid JSON now" in standup, and the structured-output ticket gets closed.

That sentence is where the trouble starts. "The model returns valid JSON now" is the beginning of correctness work, not the end of it. JSON mode and constrained decoding guarantee the shape of a response — that quantity is an integer, that status is one of three enum values, that the object has the keys you asked for. They guarantee nothing about whether quantity is the right number, whether status reflects what actually happened, or whether the sku field points at a product that exists in your catalog.

The schema is a type signature. It tells you the function returns an int. It does not tell you the int is correct. Treating a passing schema check as a passing correctness check is the single most common way structured-output pipelines ship well-formed nonsense to production — and because the nonsense is well-formed, your parser, your type checker, and your dashboards all stay green while it happens.

What the schema actually proves

Constrained decoding works by masking tokens. At each generation step, the decoder computes which next tokens would keep the output on a path that can still satisfy the grammar, sets the probability of every other token to zero, and renormalizes. The model literally cannot emit a curly brace in the wrong place or a string where the schema demands a number.
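Here is a minimal sketch of a single constrained decoding step over a toy vocabulary. It assumes the grammar engine has already worked out which tokens are allowed at the current position. Real implementations derive that set from a compiled grammar or JSON Schema, and this sketch simply takes it as given. The token strings and logit values are invented for illustration.

```python
import math


def constrained_distribution(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Mask disallowed tokens, then renormalize with a softmax over the rest."""
    if not allowed:
        raise ValueError("grammar allows no token at this position")
    # Masking: disallowed tokens get a -inf logit, and exp(-inf) == 0.0.
    masked = {tok: (lg if tok in allowed else -math.inf) for tok, lg in logits.items()}
    # Subtract the largest allowed logit before exponentiating, for numerical stability.
    peak = max(lg for tok, lg in masked.items() if tok in allowed)
    weights = {tok: math.exp(lg - peak) for tok, lg in masked.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Hypothetical position: the decoder has emitted '{"quantity": ' and the schema says integer.
logits = {'"': 2.0, "3": 1.0, "12": 0.5, "}": -1.0}
probs = constrained_distribution(logits, allowed={"3", "12"})
print(probs)  # the '"' and '}' entries are 0.0
```

The model's favourite token, the opening quote of a string, ends up with probability zero, and the two digit tokens split all of the remaining mass between them. Nothing in the computation asks whether either digit is a good answer.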

This is a real guarantee, and it is worth having. But notice exactly what it covers: structural conformance. The grammar knows that price must be a number. It does not know that the price of this product is 49.99. It knows currency must be a three-letter enum. It does not know the order was placed in euros. The constraint operates on token syntax; the correctness lives in semantics the grammar was never given.

There is a subtler problem hiding inside the mask. The model computes its next-token probabilities without any awareness of the constraint; the decoder applies the mask afterwards. So when the model "wants" to say something the grammar forbids, constrained decoding doesn't make it reconsider. It simply takes whichever allowed token is most probable (or samples among the allowed tokens, if you are sampling). Enforcing a constraint can change not only the format of an answer but its substance. A model that, left alone, would have written "unknown" for a field it genuinely cannot determine may instead be forced to pick a concrete enum value, because "unknown" isn't in the enum and the grammar has to put something there. You asked for a clean schema and got a confident wrong answer in exchange.
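The enum case can be sketched the same way. This is a deliberately simplified model, not how any particular decoder is implemented. It treats each candidate value as a single token, assumes greedy decoding, and uses made-up preference numbers for a hypothetical status field whose enum omits "unknown".

```python
def coerce_to_enum(preferences: dict[str, float], enum_values: set[str]) -> tuple[str, float]:
    """Greedy constrained choice: drop values outside the enum, renormalize, take the argmax."""
    allowed = {value: p for value, p in preferences.items() if value in enum_values}
    mass = sum(allowed.values())  # probability the model put on legal values at all
    choice = max(allowed, key=allowed.get)
    return choice, allowed[choice] / mass


# Hypothetical: the model mostly "wants" to say it cannot tell.
prefs = {"unknown": 0.70, "shipped": 0.20, "pending": 0.10}
value, renormalized = coerce_to_enum(prefs, {"pending", "shipped", "cancelled"})
print(value, round(renormalized, 2))  # shipped 0.67
```

The emitted record says "shipped", and the renormalized probability even looks respectable, yet 70% of the model's mass sat on a value the schema could not express. Giving the schema an explicit way to say "don't know", such as a nullable field or an "unknown" member, is one design choice this suggests. It does not remove the need to check the value afterwards.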

The canonical illustration is a numeric field. You define speed as a number. The model has internalized that the answer is "fast" — a descriptive notion, not a measured one. Without a constraint it writes {"speed": "fast"}. With one it cannot, so it emits {"speed": 9999} or {"speed": 0} — a number, schema-valid, and meaningless. The format error you used to catch is now a semantic error you won't.

The schema-on-read trap

Here is the failure pattern I see most often. A team adopts structured outputs, wires the response straight into a Pydantic model or a TypeScript interface, and the moment that deserialization succeeds, the data is treated as trusted. The validator and the parser have been silently merged into one step. Parsing succeeded, therefore the data is good.
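One way to keep the parser and the validator from merging is to give their outputs different types, so that code which needs trusted data cannot accept merely parsed data. The sketch below uses plain dataclasses rather than Pydantic so it runs with the standard library alone. The field names, the catalog set, and the positive-quantity rule are hypothetical. Python itself does not enforce the distinction at runtime; a static checker such as mypy is what would flag passing a ParsedOrder where a ValidatedOrder is required.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedOrder:
    """Shape-checked only: the bytes matched the types. Not yet trusted."""
    sku: str
    quantity: int


@dataclass(frozen=True)
class ValidatedOrder:
    """Only constructed by validate(); downstream code should accept only this type."""
    sku: str
    quantity: int


def parse(raw: str) -> ParsedOrder:
    """Shape check only, the moral equivalent of a schema validator or Pydantic model."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    # type(...) is int rejects JSON true/false, which Python treats as ints.
    if not isinstance(data.get("sku"), str) or type(data.get("quantity")) is not int:
        raise TypeError("response does not match schema")
    return ParsedOrder(sku=data["sku"], quantity=data["quantity"])


def validate(order: ParsedOrder, known_skus: set[str]) -> ValidatedOrder:
    """Semantic check, kept as a separate step from parsing."""
    problems = []
    if order.sku not in known_skus:
        problems.append(f"unknown sku {order.sku!r}")
    if order.quantity <= 0:
        problems.append(f"non-positive quantity {order.quantity}")
    if problems:
        raise ValueError("; ".join(problems))
    return ValidatedOrder(sku=order.sku, quantity=order.quantity)


def fulfil(order: ValidatedOrder) -> str:
    # Typed to accept only ValidatedOrder, so a merely parsed record is a type error.
    return f"ship {order.quantity} x {order.sku}"
```

With this split, parse('{"sku": "PRD-44182", "quantity": 3}') succeeds, but validate() on that result raises ValueError against a catalog that lacks PRD-44182, and the record never reaches fulfil().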

But parsing only proves the bytes matched the type. It is "schema-on-read" in the loosest sense: you read the field as a string and now you believe it. Call this the well-formed nonsense class of bug. Every value is the right type. Every required key is present. Every enum is a legal enum. And the record is still wrong:

  • sku is a perfectly formatted string — "PRD-44182" — that matches no row in the products table.
  • ship_date is a well-formed YYYY-MM-DD string, 2026-02-30, that matches the date pattern but does not exist on any calendar.
  • discount_pct is a number, 140, when the only meaningful range is 0 to 100.
  • assignee is a legal enum member your prompt listed, but it names a downstream service that was decommissioned last quarter, so nothing routes to it anymore.
  • total is a number, and it does not equal sum(line_items), because the model computed it independently.
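Each bullet above is cheap to check once you decide to check it. A minimal sketch follows, assuming a hypothetical flat record in which line_items is a plain list of amounts. The catalog and the set of routable assignees are passed in as sets here; in practice they would come from your database and service registry.

```python
from datetime import date
from decimal import Decimal


def semantic_errors(record: dict, known_skus: set[str], active_assignees: set[str]) -> list[str]:
    """Checks a schema-valid record against facts the schema never knew."""
    errors = []
    if record["sku"] not in known_skus:
        errors.append(f"sku {record['sku']!r} not in catalog")
    try:
        date.fromisoformat(record["ship_date"])  # raises ValueError for 2026-02-30
    except ValueError:
        errors.append(f"ship_date {record['ship_date']!r} is not a real calendar date")
    if not 0 <= record["discount_pct"] <= 100:
        errors.append(f"discount_pct {record['discount_pct']} outside 0-100")
    if record["assignee"] not in active_assignees:
        errors.append(f"assignee {record['assignee']!r} is not currently routable")
    # Recompute the total ourselves; Decimal(str(x)) avoids binary float rounding noise.
    expected = sum(Decimal(str(x)) for x in record["line_items"])
    if Decimal(str(record["total"])) != expected:
        errors.append(f"total {record['total']} != sum(line_items) {expected}")
    return errors


record = {
    "sku": "PRD-44182",
    "ship_date": "2026-02-30",
    "discount_pct": 140,
    "assignee": "legacy-billing",  # hypothetical decommissioned target
    "line_items": [19.99, 29.99],
    "total": 52.00,
}
for problem in semantic_errors(record, known_skus={"PRD-10001"}, active_assignees={"billing"}):
    print(problem)
```

Every check fires on this record, even though each value has the right type and would pass a type-only schema check.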

None of these trip a schema check. All of them are bugs. The schema validated that you received an object of the right kind; it could not validate that the object describes something real. The trap is that the green parse result looks like a validation pass, so the team that would never trust an unvalidated user form will happily trust an unvalidated model output — because the model output came with a schema attached.
