The Demo Loop Bias: How Your Dev Process Quietly Optimizes for Impressive Failures

10 min read
Tian Pan
Software Engineer

There is a particular kind of meeting that happens on every AI product team, usually on Thursdays. Someone shares their screen, drops a prompt into a notebook, and runs three or four examples. The room reacts. People say "wow." Someone takes a screenshot for Slack. A decision gets made: ship it, swap models, change the temperature. No one writes down the failure rate, because no one measured it.

This is the demo loop, and it has a structural bias that almost no team accounts for: it does not select for the best output. It selects for the most legible output. Over weeks and months, your prompt evolves to produce answers that land in a meeting — confident, fluent, well-formatted, on-topic. Whether they are correct is a separate variable, and it is one your process is not measuring.

The result is what I call charismatic failure: outputs that are wrong in ways your demo loop has been trained, by selection pressure, to ignore.
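
To put a rough number on how little three or four demo examples say about a failure rate, here is a minimal sketch that computes a Wilson score interval for a binomial proportion. The function name `wilson_interval` and the 95% level (`z = 1.96`) are illustrative choices for this sketch, not something the post prescribes, and it assumes each example can be graded as a simple pass or fail.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the true failure rate.

    failures: number of graded examples that were wrong
    n:        total number of graded examples
    z:        normal quantile (1.96 is roughly a 95% interval; an assumption here)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= failures <= n:
        raise ValueError("failures must be between 0 and n")

    p_hat = failures / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom

    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # A demo meeting: four examples, none visibly wrong.
    print("demo loop (0 failures / 4):", wilson_interval(0, 4))
    # A hypothetical measured eval set: 200 graded examples, none wrong.
    print("eval set (0 failures / 200):", wilson_interval(0, 200))
```

Under these assumptions, four clean demo runs are still consistent with a true failure rate of up to about 49%, while 200 clean graded examples bring the upper bound down to about 2%. The interval only covers sampling noise; it does nothing about the selection problem of which examples get chosen for the demo.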