7 posts tagged with "structured-output"

Streaming Structured Output: Why Your Parser Hangs on Token 47

May 9, 2026 · 11 min read

Tian Pan

Software Engineer

The first time a team builds a streaming AI feature with structured output, the bug is always the same. The model is generating fine. The chunks are arriving fine. But somewhere around token 47, the parser hangs, the UI freezes, or, worse, a half-formed enum value gets routed to a downstream tool that quietly does the wrong thing. The team adds a try/catch around JSON.parse, considers itself done, and ships. Two weeks later, a sibling team complains that the streaming UI feels janky once the response gets long. A quarter later, an incident review asks why a "Delete" tool call fired on a record while the model was still emitting "DeleteIfEmpty."

The bug is not in any single token. The bug is that token-streaming and structured output are architecturally at odds, and most frameworks paper over the conflict with prayer. A schema says "this is a complete object." A token stream says "here are the bytes one at a time." Every intermediate state between those two endpoints is, by definition, invalid against the schema. The team's job is to decide what to do during those intermediate states — and most teams have not made that decision explicitly.
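One explicit decision a team can make for those intermediate states is to withhold a string field until its closing quote has arrived. The sketch below is illustrative only: `completedStringField` is a hypothetical helper rather than part of any framework, and it assumes the field name is a plain identifier that does not also appear inside another string value.

```typescript
// Hypothetical helper: returns a string field's value only once its closing
// quote has arrived in the stream; returns null while it is still partial.
function completedStringField(buffer: string, field: string): string | null {
  // Assumption: `field` contains no regex metacharacters.
  const pattern = new RegExp(`"${field}"\\s*:\\s*"((?:[^"\\\\]|\\\\.)*)"`);
  const match = pattern.exec(buffer);
  if (match === null) return null;
  // Re-parse the captured text as a JSON string to resolve escape sequences.
  return JSON.parse(`"${match[1]}"`) as string;
}

// Simulated token stream: the enum value is split across three chunks.
const actionChunks = ['{"action": "Del', "ete", "IfEmpty", '", "id": 7}'];
let actionBuffer = "";
const observed: (string | null)[] = [];
for (const chunk of actionChunks) {
  actionBuffer += chunk;
  observed.push(completedStringField(actionBuffer, "action"));
}
// observed is [null, null, null, "DeleteIfEmpty"]
```

After the second chunk the buffer contains the text "Delete", but the helper still reports null, so nothing downstream can act on the prefix.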

Streaming JSON Parsers: The Gap Between Tokens and Typed Objects

April 27, 2026 · 12 min read

Tian Pan

Software Engineer

The model is emitting JSON token by token. Your UI wants to render fields the moment they materialize — a confidence score before the long answer body, the arguments of a tool call as the model fills them in. Then someone wires up JSON.parse on every chunk and the whole thing falls over, because JSON.parse is all-or-nothing. It needs a balanced document to return anything. Until the model emits the closing brace, you have nothing to show.

This is not a parser problem you can fix with a try/catch. The standard JSON parser was designed for complete documents, such as an HTTP response body whose length is known up front. Partial input is not a state it models; it is simply an input error. When you treat a token stream as if it were an HTTP body, you inherit decades of "the document is either complete or invalid," and your UI pays the bill.
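To make the gap concrete, here is a minimal sketch of one naive workaround: close whatever string, array, or object is still open, then hand the repaired text to JSON.parse. `parsePartial` is a hypothetical name, and the repair is deliberately simplistic. It gives up on dangling keys, trailing commas, and half-written literals, which a production-grade streaming parser would need to handle.

```typescript
// Best-effort parse of a JSON prefix. Returns undefined when the prefix ends
// at a point this naive repair cannot fix (dangling key, trailing comma,
// half-written literal such as `tru`).
function parsePartial(prefix: string): unknown {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix.charAt(i);
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  // A prefix that ends mid-escape cannot be closed as-is; drop the backslash.
  const body = escaped ? prefix.slice(0, -1) : prefix;
  const repaired = body + (inString ? '"' : "") + closers.reverse().join("");
  try {
    return JSON.parse(repaired);
  } catch {
    return undefined;
  }
}

// The confidence score is available before the answer body has finished.
const partialView = parsePartial('{"confidence": 0.92, "answer": "The par');
// partialView is { confidence: 0.92, answer: "The par" }
```

Note that this surfaces truncated strings and possibly truncated numbers. That may be acceptable for progressive rendering, but it is not safe for values that drive control flow.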

Structured Output Reliability in Production: Why JSON Mode Is Not a Contract

April 20, 2026 · 8 min read

Tian Pan

Software Engineer

A team ships a document extraction pipeline. It uses JSON mode. QA passes. Monitoring shows near-zero parse errors. Six weeks later, a silent failure surfaces: every risk assessment in the corpus has been marked "low" — valid JSON, correct field names, wrong answers. The pipeline has been confidently lying in a schema-compliant format for weeks.

This is the core problem with treating JSON mode as a reliability guarantee. Structural conformance and semantic correctness are different properties of a system, and confusing them is one of the most expensive mistakes in production AI engineering.

LLMs as Universal Protocol Translators: The Middleware Pattern Nobody Planned For

April 12, 2026 · 11 min read

Tian Pan

Software Engineer

Every integration engineer has stared at two systems that refuse to talk to each other. One speaks SOAP XML from 2008. The other expects a REST JSON payload designed last quarter. The traditional fix — write a custom parser, maintain a mapping layer, pray nobody changes the schema — works until the third or fourth system enters the picture. Then you're maintaining a combinatorial explosion of translation code that nobody wants to own.

Teams are now dropping an LLM into that gap. Not as a chatbot, not as a code generator, but as a runtime protocol translator that reads one format and writes another. It works disturbingly well for certain use cases — and fails in ways that are genuinely dangerous for others. Understanding the boundary between those two zones is the entire game.

JSON Mode Won't Save You: Structured Output Failures in Production LLM Systems

April 9, 2026 · 9 min read

Tian Pan

Software Engineer

When developers first wire up JSON mode, the response feels like solving a problem. The LLM stops returning markdown fences, prose apologies, and curly-brace-adjacent gibberish. The output parses. The tests pass. Production ships.

Then, three weeks later, a background job silently fails because the model returned {"status": "complete"} when the schema expected {"status": "completed"}. A data pipeline crashes because an optional field came back as null instead of being omitted. An agent tool-call loop terminates early because the model embedded a stray newline inside a string value and the downstream parser choked on it.

JSON mode guarantees syntactically valid JSON. It does not guarantee that the JSON means what you think it means, contains the fields your application expects, or maintains semantic consistency across requests. These are different problems, and they require different solutions.
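The "complete" versus "completed" failure above can be caught by a check that runs after parsing. The following is a rough sketch under assumed names: the `JobStatus` values and `validateJobResult` are hypothetical, and a real system would more likely use a schema validation library than hand-written checks.

```typescript
// Assumed enum for illustration; the source only names "completed".
type JobStatus = "pending" | "running" | "completed";
const JOB_STATUSES: readonly string[] = ["pending", "running", "completed"];

type ValidationResult =
  | { ok: true; status: JobStatus }
  | { ok: false; reason: string };

function validateJobResult(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, reason: "expected a JSON object" };
  }
  const status = (parsed as Record<string, unknown>)["status"];
  if (typeof status !== "string") {
    return { ok: false, reason: "status is missing or not a string" };
  }
  // Syntactically valid JSON can still carry a value outside the enum.
  if (JOB_STATUSES.indexOf(status) === -1) {
    return { ok: false, reason: `unknown status "${status}"` };
  }
  return { ok: true, status: status as JobStatus };
}

const nearMiss = validateJobResult('{"status": "complete"}');
// nearMiss.ok is false: the payload parses, but the value is not in the enum.
```

A check like this covers only the structural half of the problem. It would still accept a well-formed payload whose value is simply wrong.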

Structured Generation: Making LLM Output Reliable in Production

March 3, 2026 · 10 min read

Tian Pan

Software Engineer

There is a silent bug lurking in most LLM-powered applications. It doesn't show up in unit tests. It doesn't trigger on the first thousand requests. It waits until a user types something with a quote mark in it, or until the model decides — for no apparent reason — to wrap its JSON response in a markdown code block, or to return the field "count" as the string "three" instead of the integer 3. Then your production pipeline crashes.
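As a rough illustration of two of those failure modes, the sketch below pulls the outermost object out of a reply that may be wrapped in a markdown fence, then rejects a "count" that is not an integer. `CountResult` and `parseCountReply` are hypothetical names, and the brace-scanning approach is an assumption made for brevity. It would misbehave on replies that contain several objects.

```typescript
// Hypothetical extraction schema: one integer field named "count".
interface CountResult {
  count: number;
}

// Pull the outermost {...} span out of a reply that may be wrapped in a
// markdown fence or surrounded by prose, then check the field's type.
function parseCountReply(reply: string): CountResult {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in reply");
  }
  // The slice starts with "{", so a successful parse yields an object.
  const parsed = JSON.parse(reply.slice(start, end + 1)) as Record<string, unknown>;
  const count = parsed["count"];
  if (typeof count !== "number" || !isFinite(count) || Math.floor(count) !== count) {
    throw new Error("count must be an integer, got " + JSON.stringify(count));
  }
  return { count };
}

// A reply wrapped in a markdown fence (backticks written as \u0060).
const fencedReply = '\u0060\u0060\u0060json\n{"count": 3}\n\u0060\u0060\u0060';
const fromFence = parseCountReply(fencedReply);
// fromFence is { count: 3 }
```

Calling `parseCountReply('{"count": "three"}')` throws instead of letting the string reach downstream code.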

The gap between "LLMs are text generators" and "my application needs structured data" is where most reliability problems live. Bridging that gap is not a prompt engineering problem. It's an infrastructure problem, and in 2026 we finally have the tools to solve it correctly.

Structured Output in Production: Getting LLMs to Return Reliable JSON

November 10, 2025 · 8 min read

Tian Pan

Software Engineer

At some point in production, every LLM-powered application needs to stop treating model output as prose and start treating it as data. The moment you try to reliably extract a JSON object from a language model — and feed it downstream into a database, API call, or UI — you discover just how many ways this can go wrong. The model wraps JSON in markdown fences. It generates a valid object but omits required fields. It formats dates inconsistently across calls. It hallucinates enum values. Any one of these failures silently corrupts downstream state.

Structured output has evolved from an afterthought into a first-class concern for production LLM systems. This post covers the three main mechanisms for enforcing it, where each breaks down, and how to design schemas that keep quality high under constraint.
