The Demo Account Eval Set Your Sales Team Is Running Without You
The most expensive eval set in your company isn't in your repo. It's in a slide deck a sales engineer assembled six months ago, plus three demo accounts named after your top-five logos, plus a half-remembered script that says "click here, ask the agent to summarize last quarter, watch the magic happen." It runs once or twice a week, in front of prospects worth six or seven figures. Nobody on the AI team has ever scored a run.
Then you ship a model migration on a Tuesday. On Thursday at 4 PM, the sales engineer pings the on-call channel: the summary output now starts with "Certainly! Here is a summary…" instead of jumping into the bullet points, the numbers are spelled out instead of digits, and the prospect — a Fortune 500 CFO who scheduled this meeting four weeks ago — just asked whether the product is always this chatty. The release notes called it a 1.2-percentage-point eval lift.
