
The AI Accessibility Audit Nobody Runs

11 min read
Tian Pan
Software Engineer

Open your agent product, turn on VoiceOver, and hit send on any prompt. If you have a typical streaming UI with an inline reasoning trace, what you will hear in the next thirty seconds is not your product. It is a torrent of partial tokens, mid-word reflows, status changes nobody announced, and a reasoning monologue your sighted users opted into but your blind users cannot escape. The interface that demoed beautifully on stage is, to a screen reader, a denial-of-service attack delivered as speech.

This is the audit nobody on the AI team runs. The design review approved the streaming animation. The eval suite measured answer quality. The latency dashboard tracked time-to-first-token. None of those instruments noticed that the affordance making the product feel fast and thoughtful for one cohort makes it unusable for another. And that omission is starting to show up in pro-se lawsuit filings — the same federal courts that have been processing accessibility complaints against e-commerce sites for a decade are now seeing AI-interface complaints rise sharply, with one tracker reporting a 40% year-over-year increase in 2025 alone.

The accessibility problem with agentic UX is not the same problem accessibility has been for static web content. The web had decades to standardize on landmarks, heading hierarchies, and form labels. Screen readers got good at announcing those. What screen readers were never designed for is text that mutates token by token over forty seconds, status that changes from "thinking" to "calling tool" to "writing" without a focusable boundary, and a parallel inline track of intermediate reasoning that is neither final output nor decoration but something in between. The traditional a11y playbook does not have a page for any of that, and the AI teams shipping these interfaces are reinventing the failure modes that the web took a decade to fix.

The streaming announcement problem

The first thing that breaks is the live region. The convention for dynamic content is aria-live="polite", which tells the screen reader to wait until its current utterance finishes, then announce the change. That works fine when the change is "your message has been sent" — one short string, fired once. It does not work when the change is twenty tokens per second over a forty-second response, each token a separate DOM mutation. Screen readers see a queue of two hundred announcements, and depending on the implementation, they either announce every single one in a glitchy slurry or they batch-replay the same partial string repeatedly because the live region keeps re-firing.

The fix is more nuanced than "set polite and forget." During the streaming phase, the live region should be marked aria-busy="true", which tells the assistive layer to suppress announcements while the batch is in flight. When the response completes, you clear aria-busy, and at that moment the screen reader announces the final content as a single coherent utterance. This is one line of code that almost no AI product ships. The reason is not that engineers do not know about aria-busy — they have probably never tested with a screen reader, so they have never heard the failure mode and never thought to look.
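
As a rough sketch of what that looks like in practice, here is a plain-DOM TypeScript version. The element id and the token iterator are placeholders for whatever your rendering layer actually uses, not an implementation lifted from any particular product:

```ts
// The assistant message container doubles as a polite live region.
const region = document.getElementById("assistant-message")!;
region.setAttribute("role", "status"); // role="status" implies aria-live="polite"

async function renderStreamingResponse(tokens: AsyncIterable<string>) {
  // While tokens are in flight, tell the assistive layer to hold announcements.
  region.setAttribute("aria-busy", "true");
  let text = "";
  for await (const token of tokens) {
    text += token;
    region.textContent = text; // sighted users still see the stream
  }
  // Clearing aria-busy lets the screen reader announce the finished response
  // as one coherent utterance instead of two hundred partial ones.
  region.setAttribute("aria-busy", "false");
}
```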

A subtler version of the same problem happens with the visible "Thinking…" status indicator. Sighted users see a spinner change from "thinking" to "calling search tool" to "writing response." That state change is meaningful. Screen reader users get either nothing — because the status indicator is not in a live region — or three announcements interrupting whatever the screen reader was reading. Neither is right. The fix is to keep status in a polite live region, debounce the announcements so micro-state-changes do not fire, and provide a way for users to mute the status track entirely if they only want the final response.
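
A minimal version of that debounce, again with hypothetical element ids and an arbitrary 800 ms window standing in for whatever threshold your product settles on:

```ts
// A separate polite live region just for agent status.
const statusRegion = document.getElementById("agent-status")!;
statusRegion.setAttribute("role", "status");

let pending: number | undefined;
let statusMuted = false; // wired to a user preference, not hardcoded

function announceStatus(label: string) { // e.g. "Calling search tool"
  if (statusMuted) return;
  window.clearTimeout(pending);
  // Only the last state change inside the window actually reaches the
  // live region, so micro-transitions never fire an announcement.
  pending = window.setTimeout(() => {
    statusRegion.textContent = label;
  }, 800);
}
```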

The reasoning trace is a noise channel

Inline reasoning is the affordance that broke the assumptions hardest. The pattern is now ubiquitous: the model emits a stream of intermediate thought, and the UI renders it in a collapsible panel, often with a different typographic treatment to mark it as not-final. The intent is product-friendly — show the work, build trust, let users follow along. For sighted users it works. For screen reader users, the reasoning trace is the same DOM stream as the final answer, often inside the same scrollable container, and the assistive layer cannot tell the difference between "this is the model thinking out loud" and "this is the actual response the user asked for."

The result is that a blind user asking "What is the capital of France?" hears the model verbally narrate seventeen seconds of self-reflection — "the user is asking about France, I should consider whether they mean the capital of metropolitan France or…" — before getting to the word Paris. For users who use AI as an assistive technology rather than a curiosity, this is the difference between a tool that helps them and a tool that wastes their morning.

The discipline is to give the reasoning trace its own ARIA role and announcement behavior. Sighted users get the panel; screen reader users get an option, surfaced prominently, to suppress reasoning content from the live announcement track while still keeping it available on demand. Some products do this with a keyboard shortcut to read the reasoning if the user wants it; others suppress reasoning by default and announce only the final response. Either is defensible. What is not defensible is rendering both streams identically and trusting the screen reader to figure out which one is the answer.
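
One way to sketch that separation, assuming an illustrative Alt+R shortcut and made-up element ids, is to keep the reasoning panel out of any live region and route it through the answer's announcement channel only on request:

```ts
// The final answer is the only content that announces automatically.
const answerRegion = document.getElementById("final-answer")!;
answerRegion.setAttribute("aria-live", "polite");

// The reasoning trace is a labeled region with no aria-live: sighted users
// can watch it stream, screen readers hear nothing until asked.
const reasoningPanel = document.getElementById("reasoning-trace")!;
reasoningPanel.setAttribute("role", "region");
reasoningPanel.setAttribute("aria-label", "Model reasoning");

document.addEventListener("keydown", (event) => {
  // Alt+R reads the reasoning on demand through the one announcement
  // channel the interface already has.
  if (event.altKey && event.key.toLowerCase() === "r") {
    answerRegion.textContent = reasoningPanel.textContent;
  }
});
```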

The keyboard breaks at the second turn

Walk through your agent product using only a keyboard. The first turn usually works. You can tab to the input, type a prompt, and hit Enter. Now read the response. Can you tab to the suggested follow-up chips? In half the products on the market, the answer is no — those chips were rendered dynamically and the focus order was not updated, so the keyboard user has to tab through twenty hidden elements to reach them, or worse, the focus is trapped inside a modal that should have closed.

The deeper issue is that conversational interfaces shift focus in patterns the keyboard convention was never tuned for. When a new message arrives, where should focus go? If you move it to the new message, you interrupt whatever the user was doing. If you do not move it, the user has to manually navigate to the new content, which screen reader users will not even know is there unless your live region fired correctly. Most products do nothing — focus stays where it was — and the result is a keyboard user who has to tab and arrow-key through the entire conversation log every time the model responds.

The fix is a focus management contract that is explicit about conversational state. Focus stays in the input by default. New messages announce via the live region. A clearly labeled keyboard shortcut moves focus to the latest response if the user wants to interact with it. Tool calls and intermediate states do not steal focus, ever. And the modal patterns that hold confirmation dialogs or settings panels follow the actual accessible-modal contract — focus trap, Escape to close, return focus to the trigger — which the AI team has probably violated three times because the patterns were lifted from a component library that was last audited two years ago.
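
Sketched as code, with a made-up Ctrl+J shortcut and element ids standing in for the real product's markup, the contract reduces to a few rules:

```ts
const input = document.getElementById("prompt-input") as HTMLTextAreaElement;
const log = document.getElementById("conversation-log")!;

function appendAssistantMessage(html: string) {
  const message = document.createElement("article");
  message.tabIndex = -1; // reachable by script, skipped by Tab
  message.innerHTML = html;
  log.appendChild(message);
  // Deliberately no message.focus() here: new messages never steal focus,
  // and focus stays in the input by default.
}

document.addEventListener("keydown", (event) => {
  // Ctrl+J jumps to the latest response only when the user asks for it.
  if (event.ctrlKey && event.key.toLowerCase() === "j") {
    (log.lastElementChild as HTMLElement | null)?.focus();
  }
});

function closeModal(modal: HTMLElement, trigger: HTMLElement) {
  modal.remove();
  trigger.focus(); // the accessible-modal contract: return focus to the trigger
}
```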

The audit nobody runs has a price now

The reason this matters more in 2026 than it did in 2024 is that the regulatory and litigation environment shifted faster than most AI product teams noticed. Pro-se ADA filings in federal court rose roughly forty percent year over year in 2025, and the AI tools that made it trivial to draft a complaint are the same tools that are now being named as defendants. California's algorithmic accessibility assessment requirement took effect in January 2026 for public-facing AI systems. Courts are increasingly citing WCAG 2.2 as the de facto standard, and WCAG 2.2 has criteria that streaming AI interfaces routinely fail — focus visible, focus not obscured, dragging movements, target size — even before you get to the live region issues that are not in WCAG at all because the standard was written before this UX existed.

The cost of running the audit is small. The cost of not running it is a class of failures that does not show up in your eval suite, does not show up in your latency dashboard, and shows up first when an organized plaintiff's-bar firm decides AI products are the next category to scrape with a Lighthouse-equivalent tool and file in bulk. Teams that have been through the e-commerce accessibility cycle know what that looks like. Teams that have not are about to learn.

What the audit actually contains

A real AI accessibility audit is not a checkbox sweep. It is a structured walk through the product with assistive technology turned on, recorded, and reviewed. The minimum is three passes.

The screen reader pass is run with VoiceOver on macOS or NVDA on Windows, headphones on, with your eyes closed if you can manage it. You send three prompts: a short one, a long one, and one that triggers tool use. You write down every announcement that is wrong, missing, redundant, or interrupted. You note where the reasoning trace bled into the live region. You note where status changes were silent. You note where the final response was announced before it was complete because aria-busy was never set. This pass alone usually surfaces fifteen to thirty distinct issues in a product that nobody has been auditing.

The keyboard pass is run with the trackpad and mouse unplugged. You navigate the entire conversation using only Tab, Shift-Tab, Enter, Escape, and arrow keys. You attempt every flow a sighted mouse user can complete — sending a message, editing a previous message, copying a response, opening settings, switching conversations, regenerating. You note every flow that is impossible, every focus trap that does not trap correctly, every place where focus disappears because a dynamic element was removed without focus being moved first.

The cognitive load pass is the one most teams skip and the one that catches what the other two miss. You time how long it takes a screen reader user to get to the answer for a simple question, then compare to a sighted user. If the ratio is worse than two-to-one — and it usually is, by a lot — you have a usability problem even if every individual ARIA attribute is technically correct. The fix is rarely more ARIA. The fix is usually less: a quieter interface, suppressed reasoning, condensed status, and a clear path from prompt to answer that does not require listening to your model's inner monologue.

The team that does this changes what they build

There is a secondary effect of running this audit that the teams who do it tend to discover late. The interface that is good for screen readers is also a better interface for the sighted user who is multitasking, the user driving with a voice assistant, the user on a slow connection where streaming animations look broken, and the user who simply wants the answer without watching the model think. The accessibility audit is not only a compliance exercise. It is a forcing function for the kind of interface restraint that AI products tend to lose track of when every new model capability becomes a new UI flourish.

The team that ships an agent product without running this audit in 2026 is making a bet that nobody on the disability side of their user base will complain, that the regulators will not catch up, and that the design choices made for the demo will not turn out to be the design choices that cause a federal filing. The bet has been losing in the e-commerce category for a decade. There is no structural reason it will keep winning in the AI category, and several reasons — the lawsuit volume, the regulatory shift, the assistive-technology user base who depend on AI more than the average user — to think it will lose faster.

The audit is a half-day of work. The findings are a sprint of fixes. The product is better, the lawsuit risk is smaller, and the cohort the design review forgot can now use the thing you built. There is no version of the AI product roadmap in 2026 where this is not worth doing.
