Generative UI as a Production Discipline: When the Model Renders the Screen
The button label that shipped to your users last Tuesday was never seen by a copywriter, never reviewed in Figma, never QA'd, and didn't exist until inference time. It was generated by a model that decided, mid-conversation, that the right way to collect a shipping address was a six-field form rendered inline rather than three more turns of prose. The form worked. The label was fine. Nobody on the team can tell you which model run produced it, because the trace was rotated out of hot storage and the eval suite tests text outputs, not component graphs.
This is generative UI in production: the model is no longer just a text generator that occasionally invokes a tool. It is a UI compiler whose output is a component tree, and the design system is now a contract the model is constrained to rather than a guideline a human loosely follows. The shift breaks an entire stack of assumptions — QA against static specs, accessibility audits of fixed layouts, copy review of finalized strings, design-system adherence checks at build time — and most teams ship the feature before they have replaced any of them.
The pattern is quietly everywhere now. Agent products render dynamic forms to gather slot values instead of asking in prose. Conversational dashboards compose per-session from a primitive library — chart, table, KPI tile, filter chip — chosen by the model based on the question. Onboarding flows skip the static spec entirely: the agent decides which fields to ask for, which to skip, and how to lay them out, based on the user's stated goal. Open standards like A2UI define a declarative format where the agent emits a flat list of typed components and the client renders them against a trusted catalog. Frameworks like Vercel's json-render and the AI SDK's RSC streaming have made the wiring almost trivial. The wiring is not the hard part. The discipline around the wiring is the hard part, and it is where teams without a plan accumulate quiet defects faster than they can find them.
The Design System Becomes a Type System
The first thing that has to land is that your component vocabulary stops being suggestions and starts being a schema the model is forced to emit against. Nobody who has shipped this in production lets the model output free HTML or arbitrary React. The blast radius is too large: a free-form output channel means prompt injection can render arbitrary controls, accessibility regressions are unbounded, and design review becomes a Sisyphean diff against an output space the model can re-roll on every request.
The working pattern is a constrained component catalog — Card, Button, TextField, Select, List, Row, Column, with explicit props and explicit allowed children — exposed as a JSON Schema or Zod definition that the model emits structured output against. A2UI codifies this as an adjacency-list of typed components plus a client-defined catalog the agent cannot escape. Vercel's json-render uses Zod schemas for both the component catalog and the actions a button is allowed to invoke. The mental model is the same: the model picks from a finite vocabulary, the validator rejects anything outside it, and the renderer is a pure function from validated tree to DOM.
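As a rough illustration of the "finite vocabulary plus validator" idea, here is a minimal sketch in plain TypeScript. The catalog entries, prop names, and the `validateTree` helper are all hypothetical; a production setup would more likely express the same rules in JSON Schema or Zod, as described above.

```typescript
// Hypothetical catalog: component names, props, and child rules are illustrative only.
type Spec = {
  props: string[];           // every listed prop is required; nothing else is allowed
  allowedChildren: string[]; // an empty array means the component is a leaf
};

const CATALOG: { [name: string]: Spec } = {
  Column: { props: [], allowedChildren: ["Card", "TextField", "Button"] },
  Card: { props: ["title"], allowedChildren: ["TextField", "Button"] },
  TextField: { props: ["label", "bind"], allowedChildren: [] },
  Button: { props: ["label", "action"], allowedChildren: [] },
};

type UINode = {
  type: string;
  props: { [key: string]: unknown };
  children?: UINode[];
};

// Own-property lookup, so names like "constructor" cannot slip through the prototype chain.
function inCatalog(name: string): boolean {
  return Object.prototype.hasOwnProperty.call(CATALOG, name);
}

// Returns human-readable errors; an empty array means the tree is renderable.
function validateTree(node: UINode, path: string = "root"): string[] {
  if (!inCatalog(node.type)) {
    return [`${path}: unknown component "${node.type}"`];
  }
  const spec = CATALOG[node.type];
  const errors: string[] = [];

  for (const prop of spec.props) {
    if (node.props[prop] === undefined) {
      errors.push(`${path}: ${node.type} is missing prop "${prop}"`);
    }
  }
  for (const key of Object.keys(node.props)) {
    if (spec.props.indexOf(key) === -1) {
      errors.push(`${path}: ${node.type} does not accept prop "${key}"`);
    }
  }

  (node.children ?? []).forEach((child, i) => {
    const childPath = `${path}.${node.type}[${i}]`;
    if (spec.allowedChildren.indexOf(child.type) === -1) {
      errors.push(`${childPath}: ${child.type} is not allowed inside ${node.type}`);
    }
    errors.push(...validateTree(child, childPath));
  });
  return errors;
}
```

The renderer would only ever receive trees for which `validateTree` returns an empty array, which is what lets it stay a pure function from validated tree to DOM.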
Three things follow from this discipline that surprise teams the first time:
- Schema validation is a runtime gate, not a build-time check. The model can produce an unrenderable component combination on any request — a List whose children are not list items, a Select with no options, a TextField labeled but unbound. The validator runs on every output, not just in CI, and the fallback path is a first-class product surface, not an exception page.
- The catalog has to be small enough for the model to hold in working memory. A 200-component design system is too wide; the model picks suboptimal components or hallucinates props. Production catalogs converge on 20–40 primitives plus a handful of composed patterns, with the rest of the design system reachable only through composition.
- Props are part of the contract, not an afterthought. "A Button can have an onClick" is not a contract; the contract is "a Button has an action prop that names a registered handler from a closed enum." If the model can emit an arbitrary string as a click target, you have re-introduced the unsafe-eval problem in a new form.
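The runtime gate and the closed action enum can be sketched together. This is a simplified illustration that handles a single Button rather than a full tree; the action names, the `Fallback` shape, and the `gateButton` function are assumptions made for the example, not part of any framework mentioned here.

```typescript
// Hypothetical closed set of registered handlers; the names are illustrative.
const ACTIONS = ["submit_form", "cancel", "open_help"] as const;
type ActionName = (typeof ACTIONS)[number];

function isActionName(value: unknown): value is ActionName {
  return (
    typeof value === "string" &&
    (ACTIONS as readonly string[]).indexOf(value) !== -1
  );
}

type ButtonSpec = { type: "Button"; label: string; action: ActionName };
type Fallback = { type: "Fallback"; reason: string };

// Gate for untrusted model output. It never throws: every failure becomes a
// Fallback value that the product renders on purpose (for example, a prose question).
function gateButton(raw: string): ButtonSpec | Fallback {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { type: "Fallback", reason: "output was not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { type: "Fallback", reason: "output was not an object" };
  }
  const obj = data as { [key: string]: unknown };
  const label = obj["label"];
  const action = obj["action"];
  if (obj["type"] !== "Button" || typeof label !== "string") {
    return { type: "Fallback", reason: "not a well-formed Button" };
  }
  if (!isActionName(action)) {
    return { type: "Fallback", reason: "action is not a registered handler" };
  }
  return { type: "Button", label, action };
}
```

Because the gate returns a value instead of throwing, the caller has to decide what to render for `Fallback`, which is one way to keep the fallback path a designed surface rather than an error page.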
Accessibility Is Not Something the Model Will Get Right
Audits of AI-generated frontend code keep finding the same thing: when the model is allowed to emit raw markup, it produces <div onClick> instead of <button>, missing ARIA state attributes, and custom controls with no keyboard handling. The training data is the cause — the public corpus of React is dominated by <div> patterns — and no amount of prompt engineering reliably fixes it. CSS can make a <div> look like a button, but only HTML semantics can make it be one.
In generative UI, this stops being a frontend hygiene problem and becomes an architectural one. The components in your catalog must be accessible by construction, because the model cannot be relied upon to apply the right roles, focus order, and labels. The teams that get this right pick a primitive library — Radix, React Aria, Headless UI — that ships with the semantics baked in, then expose only those primitives to the model. The model picks which control to render and what to label it; the primitive guarantees that the rendered control is operable by a screen reader, navigable by keyboard, and announces state changes correctly.
This shifts where the accessibility audit happens. You do not audit a fixed page anymore — there is no fixed page. You audit the catalog. Each primitive has a one-time, high-rigor accessibility certification, and the model's freedom is bounded by that certification. The eval suite then verifies that the model picks semantically correct primitives in context (a "submit" affordance is rendered as a Button, not a Card-with-onClick), but the heavy a11y lifting is in the component library, not the runtime check.
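An eval check for "the submit affordance is a Button" might look like the sketch below. It assumes a hypothetical catalog in which containers such as Card may carry an action prop, that inputs are TextField or Select, and that the submit handler is named "submit_form"; none of these conventions come from a specific library.

```typescript
type UINode = {
  type: string;
  props: { [key: string]: unknown };
  children?: UINode[];
};

// Hypothetical conventions for this sketch.
const INPUT_TYPES = ["TextField", "Select"];
const SUBMIT_ACTION = "submit_form";

// Depth-first flatten so the check can scan every node in the tree.
function flatten(node: UINode): UINode[] {
  let out: UINode[] = [node];
  for (const child of node.children ?? []) {
    out = out.concat(flatten(child));
  }
  return out;
}

// Returns a list of failures; an empty list means the sample passes this eval.
function checkSubmitAffordance(root: UINode): string[] {
  const nodes = flatten(root);
  const failures: string[] = [];

  const hasInputs = nodes.some((n) => INPUT_TYPES.indexOf(n.type) !== -1);
  const submitters = nodes.filter((n) => n.props["action"] === SUBMIT_ACTION);

  if (hasInputs && submitters.length === 0) {
    failures.push("form has inputs but no submit affordance");
  }
  for (const n of submitters) {
    if (n.type !== "Button") {
      failures.push(`submit action attached to ${n.type}, expected Button`);
    }
  }
  return failures;
}
```

A check like this runs over a corpus of recorded model outputs, so it measures how often the model picks the wrong primitive rather than guarding any single request.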
Eval Coverage on UI-as-Output
A text output gets evaluated for factual correctness, tone, and refusal behavior. A UI output needs all of that, plus four additional axes that text evals do not capture:
- Functional correctness — does the rendered tree actually let the user accomplish the task? A form that asks for the right fields in the wrong order is wrong.
- Design-system adherence — does the output use sanctioned components, sanctioned props, sanctioned spacing tokens? A surface that looks right but reaches outside the catalog is a slow-burning regression that destabilizes the design system over time.
- Layout integrity — does the output render correctly across breakpoints, locales, and right-to-left scripts? A model that has never seen RTL traffic in its training distribution will confidently emit layouts that break under Arabic or Hebrew rendering.
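Two of these axes lend themselves to static checks on the emitted tree. The sketch below is illustrative only: the sanctioned component list, the spacing token names, the gap prop, and both function names are assumptions. Layout integrity is deliberately left out, since it generally requires actually rendering the tree at each breakpoint and locale.

```typescript
type UINode = {
  type: string;
  props: { [key: string]: unknown };
  children?: UINode[];
};

// Hypothetical sanctioned sets; a real suite would load these from the design system.
const SANCTIONED_COMPONENTS = ["Column", "Card", "TextField", "Button"];
const SPACING_TOKENS = ["space-1", "space-2", "space-4"];

function flatten(node: UINode): UINode[] {
  let out: UINode[] = [node];
  for (const child of node.children ?? []) {
    out = out.concat(flatten(child));
  }
  return out;
}

// Design-system adherence: every component and every gap value must be sanctioned.
function adherenceFailures(root: UINode): string[] {
  const failures: string[] = [];
  for (const n of flatten(root)) {
    if (SANCTIONED_COMPONENTS.indexOf(n.type) === -1) {
      failures.push(`unsanctioned component: ${n.type}`);
    }
    const gap = n.props["gap"];
    if (gap !== undefined && (typeof gap !== "string" || SPACING_TOKENS.indexOf(gap) === -1)) {
      failures.push(`unsanctioned spacing on ${n.type}: ${String(gap)}`);
    }
  }
  return failures;
}

// Functional correctness: bound fields must appear in the expected order.
function fieldOrderMatches(root: UINode, expected: string[]): boolean {
  const actual = flatten(root)
    .filter((n) => n.type === "TextField")
    .map((n) => String(n.props["bind"]));
  return actual.length === expected.length && actual.every((b, i) => b === expected[i]);
}
```

For example, an address form eval could assert `fieldOrderMatches(tree, ["name", "street", "city"])` alongside an empty `adherenceFailures(tree)` for each recorded sample.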
