Streaming JSON Parsers: The Gap Between Tokens and Typed Objects
The model is emitting JSON token by token. Your UI wants to render fields the moment they materialize — a confidence score before the long answer body, the arguments of a tool call as the model fills them in. Then someone wires up JSON.parse on every chunk and the whole thing falls over, because JSON.parse is all-or-nothing. It needs a balanced document to return anything. Until the model emits the closing brace, you have nothing to show.
This is not a parser problem you can fix with a try/catch. The standard JSON parser was designed for complete documents, such as an HTTP response body whose length is known up front. Partial input is not a state it models; it is simply an input error. When you treat a token stream as if it were an HTTP body, you inherit decades of "the document is either complete or invalid," and your UI pays the bill.
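A minimal TypeScript sketch makes the all-or-nothing behavior concrete. The sample document and the helper name tryParse are illustrative assumptions, not part of any library; the only real API used is JSON.parse.

```typescript
// Hypothetical helper: returns undefined instead of throwing on bad input.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Illustrative document; the field names are assumptions.
const fullDocument = '{"confidence": 0.92, "answer": "Paris"}';

// Count how many proper prefixes (everything short of the full text) parse.
let usablePrefixes = 0;
for (let end = 1; end < fullDocument.length; end++) {
  if (tryParse(fullDocument.slice(0, end)) !== undefined) {
    usablePrefixes++;
  }
}

// Only the complete document yields a value.
const parsedAtEnd = tryParse(fullDocument);
```

Every proper prefix is rejected, so usablePrefixes stays at 0 and the confidence field becomes visible only after the final brace. The try/catch suppresses the exception but produces no data to render.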
The fix is not to "parse harder." The fix is to recognize that structured output and streaming output are not orthogonal features you compose at runtime. They are a single design problem the model API and the client library have to solve together — and the team that ships first usually does it by writing their own partial parser, then writes it again three months later when they discover their first version was O(n²).
The All-or-Nothing Trap and What It Actually Costs
The first instinct of every team that hits this is the same: "Just buffer until we see a closing brace, then parse." It works in dev. It works in your eval set. It fails in production for two reasons.
The latency tax is bigger than you think. A 12 KB structured response from a frontier model takes seconds to fully emit. If your UI waits for the last token, the user stares at a spinner for the entire duration. Industry write-ups put the perceived-latency improvement from progressive rendering at roughly 60–70% — not because the response is faster, but because the user sees motion. The work happens at the same speed; the waiting changes shape. A field that fills in over five seconds feels alive. A spinner that resolves in five seconds feels broken.
The naive "incremental" fix is worse than the wait. The most common mistake teams make is to keep buffering, but call JSON.parse (or a JSON-repair library) on every new chunk to "see what we have so far." That gives you a parser that re-parses the entire prefix on every token. For a stream that arrives in 5-character chunks, processing a 12 KB response means walking roughly 14.4 million characters (the prefix lengths 5, 10, 15, and so on up to 12,000, summed over 2,400 chunks) when you only have 12,000: classic O(n²) behavior. One detailed engineering write-up measured this directly: the naive approach took 16.7 seconds for a 12 KB document and the final chunks were taking 19–20 milliseconds each; a stateful incremental parser ran the same workload in 43 milliseconds, with per-chunk latency under 30 microseconds. That is a 388x speedup, and the naive version became unusable around the 5 KB mark, the "janky zone" where typing visibly stalled.
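The quadratic cost can be checked with a few lines of arithmetic. The sketch below only counts characters scanned; it does not measure wall-clock time, and the function name is a hypothetical one chosen for illustration.

```typescript
// Counts the characters a "re-parse the whole prefix on every chunk"
// strategy has to scan. Hypothetical helper, for illustration only.
function charsScannedByReparsing(totalChars: number, chunkSize: number): number {
  let scanned = 0;
  let received = 0;
  while (received < totalChars) {
    received = Math.min(received + chunkSize, totalChars);
    scanned += received; // each new chunk triggers a full scan of the prefix
  }
  return scanned;
}

const naiveScanned = charsScannedByReparsing(12000, 5); // 14406000
const incrementalScanned = 12000; // a stateful parser touches each character once
const wasteFactor = naiveScanned / incrementalScanned; // 1200.5
```

Doubling the response size roughly quadruples the naive count, which is why the approach looks fine at 200 bytes and stalls at a few kilobytes.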
The trap is not that the naive parser is wrong. The trap is that it works fine in your prototype, where the responses are 200 bytes, and degrades silently as your prompts grow. By the time it is bad, you have shipped it.
What a Real Partial Parser Has to Do
A partial JSON parser is not a relaxed JSON parser. It is a state machine that explicitly models the open positions in the document: which string is still open, which arrays are mid-element, which objects are still waiting on a closing brace. From that state, it has to make decisions the standard parser never had to make.
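As a rough illustration of "modeling the open positions," the following sketch tracks only the container stack and whether the stream is currently inside a string. It is not a full parser (it builds no values), and the class and method names are assumptions made for this example.

```typescript
type Container = "object" | "array";

// Hypothetical tracker: remembers what is still open between chunks.
class OpenPositionTracker {
  private stack: Container[] = [];
  private inString = false;
  private escaped = false;

  // Consumes only the new characters; earlier input is never rescanned.
  push(chunk: string): void {
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk[i];
      if (this.inString) {
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') this.inString = true;
      else if (ch === "{") this.stack.push("object");
      else if (ch === "[") this.stack.push("array");
      else if (ch === "}" || ch === "]") this.stack.pop();
    }
  }

  // Reports which containers are open and whether a string is unfinished.
  snapshot(): { open: Container[]; midString: boolean } {
    return { open: this.stack.slice(), midString: this.inString };
  }
}

const tracker = new OpenPositionTracker();
tracker.push('{"answer": "Sure, the capi');
const afterFirstChunk = tracker.snapshot(); // object open, mid-string
tracker.push('tal", "tags": [1, ');
const afterSecondChunk = tracker.snapshot(); // object and array open, not mid-string
```

Because the state survives between calls, each chunk costs time proportional to its own length rather than to the length of everything received so far.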
Incremental tokenization that survives mid-string truncation. When the stream cuts off in the middle of a value — "answer": "Sure, the capi — the parser has to decide what to yield. Library options here include "yield the partial string as-is," "yield the partial string with a marker that it is incomplete," or "do not yield this field until the closing quote arrives." Different applications want different answers. A chat UI probably wants partial strings. A function-call argument probably does not, because executing a tool with a half-typed argument is worse than waiting one more chunk.
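The three options can be expressed as an explicit policy parameter. This is a sketch under assumptions: the policy names, the return shape, and both function names are invented for illustration, and the input is assumed to be the text following an opening quote in otherwise valid JSON.

```typescript
type TruncationPolicy = "partial" | "marked" | "withhold";
type YieldedString = { value: string; incomplete?: boolean } | undefined;

// Reads a JSON string body (the text after the opening quote).
// Stops at the closing quote, or at the last complete character or escape.
function readPartialString(body: string): { text: string; complete: boolean } {
  let i = 0;
  let safeEnd = 0;
  while (i < body.length) {
    const ch = body[i];
    if (ch === '"') {
      return { text: JSON.parse('"' + body.slice(0, i) + '"') as string, complete: true };
    }
    if (ch === "\\") {
      const escapeLength = body[i + 1] === "u" ? 6 : 2;
      if (i + escapeLength > body.length) break; // escape cut off mid-sequence
      i += escapeLength;
    } else {
      i += 1;
    }
    safeEnd = i;
  }
  return { text: JSON.parse('"' + body.slice(0, safeEnd) + '"') as string, complete: false };
}

// Applies the caller's policy to a possibly truncated string value.
function yieldString(body: string, policy: TruncationPolicy): YieldedString {
  const { text, complete } = readPartialString(body);
  if (complete) return { value: text };
  if (policy === "partial") return { value: text };
  if (policy === "marked") return { value: text, incomplete: true };
  return undefined; // "withhold": wait for the closing quote
}

const forChatUi = yieldString("Sure, the capi", "partial");
const forToolCall = yieldString("Sure, the capi", "withhold");
```

A chat UI would take the "partial" branch and render forChatUi.value immediately, while a tool-call dispatcher would pass "withhold" and see undefined until the closing quote arrives.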
In-flight schema validation. If the model emits {"score": "high" and your schema says score is a number, you do not want to wait until the document closes to find out. A schema-aware streaming parser can flag the violation the moment the type is decided — which matters because the model has not yet emitted the rest of the document, and a re-prompt or retry can be cheap. Catching it after the closing brace is a wasted full generation.
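One way to see why the check can fire early: in JSON, the first character of a value already determines its type. The sketch below uses that fact with a deliberately naive regular-expression lookup, which would misfire if the key text also appeared inside a string value; a real implementation would hook into the parser's own state. All names here are hypothetical.

```typescript
type JsonType = "string" | "number" | "boolean" | "null" | "object" | "array";
type Verdict = "ok" | "violation" | "undecided";

// The first character of a JSON value fixes its type.
function typeFromFirstChar(ch: string): JsonType | undefined {
  if (ch === '"') return "string";
  if (ch === "{") return "object";
  if (ch === "[") return "array";
  if (ch === "t" || ch === "f") return "boolean";
  if (ch === "n") return "null";
  if (ch === "-" || (ch >= "0" && ch <= "9")) return "number";
  return undefined;
}

// Naive lookup (assumption: the key does not also occur inside a string value).
function checkField(prefix: string, key: string, expected: JsonType): Verdict {
  const escapedKey = key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const match = new RegExp('"' + escapedKey + '"\\s*:\\s*(\\S)').exec(prefix);
  if (match === null) return "undecided"; // value has not started yet
  const first = match[1];
  if (first === undefined) return "undecided";
  return typeFromFirstChar(first) === expected ? "ok" : "violation";
}

const earlyVerdict = checkField('{"score": "high"', "score", "number");
const validVerdict = checkField('{"score": 0.9', "score", "number");
const pendingVerdict = checkField('{"sco', "score", "number");
```

Here earlyVerdict is "violation" as soon as the opening quote of "high" arrives, long before the document closes, which is the point at which a retry is still cheap.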
Repair strategies for the truncation case. When the upstream connection drops mid-document, you have a choice: discard the partial, return whatever sub-structure parses cleanly, or attempt to close open structures heuristically. The third option is what libraries like best-effort-json-parser and partial-json-parser-js exist to do: return the parsed structure from a truncated input such as [1, 2, {"a": "apple" rather than throwing on the missing brace and bracket. The right default is application-dependent, and it should be a parameter, not a parser-internal assumption.
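A minimal sketch of the "close open structures" heuristic, with the strategy passed in as a parameter. This is not how the libraries named above are implemented; the function names and the mode values are assumptions. It covers only two of the three options (discard and repair) and gives up, returning undefined, on cases it does not handle, such as a dangling key or a half-written literal.

```typescript
type TruncationMode = "discard" | "repair";

// Appends the quote and brackets needed to balance a truncated prefix.
function closeOpenStructures(prefix: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  let repaired = prefix;
  if (inString) {
    if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
    repaired += '"';
  } else {
    repaired = repaired.replace(/,\s*$/, ""); // drop a trailing comma
  }
  return repaired + closers.reverse().join("");
}

// The truncation strategy is an explicit argument, not a hidden default.
function parseTruncated(prefix: string, mode: TruncationMode): unknown {
  if (mode === "discard") return undefined;
  try {
    return JSON.parse(closeOpenStructures(prefix));
  } catch {
    return undefined; // repair heuristic was not enough
  }
}

const repairedValue = parseTruncated('[1, 2, {"a": "apple"', "repair");
const discardedValue = parseTruncated('[1, 2, {"a": "apple"', "discard");
```

With "repair", the truncated array comes back as a three-element array whose last element is the object with key a; with "discard", the caller gets undefined and can decide to retry.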
Type-aware progressive yield. The parser should emit each field as soon as the field is decided, meaning the value's type is fixed and the value is either complete or growing in a known direction (a string getting longer, an array getting more elements). This is what lets the UI bind to fields individually and re-render as each one settles. The Vercel AI SDK's streamObject function exposes a partialObjectStream that does exactly this: each yielded object is the schema-shaped state-so-far, with later fields populated as they arrive. Its useObject hook surfaces the same partial object to UI components.
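On the consuming side, binding to fields individually can be as simple as comparing each snapshot with the previous one. The sketch below is not the Vercel AI SDK API: the snapshots are hand-written stand-ins for what a partial object stream might yield, and changedFields is a hypothetical helper.

```typescript
type Snapshot = Record<string, unknown>;

// Returns the top-level keys whose value differs from the previous snapshot.
// Uses JSON.stringify for comparison, so it is sensitive to key order in
// nested objects; good enough for a sketch.
function changedFields(prev: Snapshot, next: Snapshot): string[] {
  const changed: string[] = [];
  for (const key of Object.keys(next)) {
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) {
      changed.push(key);
    }
  }
  return changed;
}

// Stand-in for successive schema-shaped partial objects.
const snapshots: Snapshot[] = [
  { confidence: 0.92 },
  { confidence: 0.92, answer: "Sure, the capi" },
  { confidence: 0.92, answer: "Sure, the capital is Paris." },
];

let previous: Snapshot = {};
const updates: string[][] = [];
for (const snapshot of snapshots) {
  updates.push(changedFields(previous, snapshot)); // fields to re-render
  previous = snapshot;
}
```

The confidence field is reported once and then left alone, while answer is reported on each step as the string grows, so only the component bound to answer needs to re-render.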
These are not independent features. A parser that does incremental tokenization but no schema validation is a tokenizer with extra steps. A parser that validates types but cannot yield partial strings is useless for chat UIs. The shape of the parser falls out of the shape of the contract you want to give the application.
Why Existing JSON Tools Were the Wrong Starting Point
- https://www.aha.io/engineering/articles/streaming-ai-responses-incomplete-json
- https://github.com/promplate/partial-json-parser-js
- https://github.com/easyagent-dev/streamjson
- https://github.com/JacksonKearl/gjp-4-gpt
- https://github.com/st3w4r/openai-partial-stream
- https://github.com/beenotung/best-effort-json-parser
- https://github.com/1000ship/incomplete-json-parser
- https://news.ycombinator.com/item?id=45518033
- https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-object
- https://ai-sdk.dev/docs/ai-sdk-ui/object-generation
- https://platform.openai.com/docs/api-reference/responses-streaming
- https://platform.claude.com/docs/en/build-with-claude/streaming
- https://blog.vllm.ai/2025/01/14/struct-decode-intro.html
- https://github.com/guidance-ai/llguidance
- https://arxiv.org/pdf/2411.15100
