
The AI Standup Where Yesterday's Status Is a Lie

Tian Pan
Software Engineer

The team meets at 10am. The first engineer reports what their agents finished overnight. Except the eval suite that kicked off at 7am hasn't returned, the PR the agent opened at 3am is waiting on a review from another agent whose queue depth is unknown, and the long-running refactor agent is on hour eleven of an estimated four-hour run with no signal that it's stuck and no signal that it's healthy. Yesterday's status is not "done" and not "in progress." Yesterday's status is unknowable from inside the room.

The standup was a synchronous ritual built for synchronous human work. Each person did a thing, finished it, slept on it, and reported it the next morning. The unit of work was a workday. The unit of reporting was a person. The cadence matched the substrate. None of that holds anymore. The unit of work is now an agent run that started before you went to bed and may finish during the meeting or three hours after. The unit of reporting is a fleet, not a person. And the cadence, a 9- to 15-minute round-robin at 10am sharp, samples a system that doesn't produce its events on a daily schedule.

The Round-Robin Has No Answer

The Scrum question is "what did you do yesterday, what will you do today, what is blocking you." Every word of it assumes the engineer is the agent. When the engineer's actual role is queue management — kicking off runs, reviewing outputs, deciding what to merge, deciding what to redo, deciding what to abandon — none of the three questions has a clean answer.

"What did you do yesterday" is the wrong shape because the engineer didn't do yesterday's work in any direct sense. They queued it. The agents did it, or are still doing it, or attempted it and gave up four steps in without raising a flag. The honest answer is closer to "I queued seven runs; three landed PRs, two are still going, one bailed at step 3, one looks like it succeeded but the eval suite hasn't certified it yet." That's a portfolio report, not a personal report, and the standup format doesn't have a slot for it.
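The portfolio answer is mechanical enough to generate rather than recall. Below is a minimal sketch, assuming each run exposes a status from a hypothetical vocabulary (`landed`, `running`, `bailed`, `awaiting_eval`) and, for bailed runs, the step where it stopped. The `Run` type and `portfolio_report` function are illustrative names, not any real platform's API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Run:
    run_id: str
    status: str                           # hypothetical states: "landed", "running", "bailed", "awaiting_eval"
    bailed_at_step: Optional[int] = None  # set only when status == "bailed"

# Fixed display order; statuses outside this list count toward the total but are not itemized.
ORDER = ["landed", "running", "bailed", "awaiting_eval"]

def portfolio_report(runs: list[Run]) -> str:
    """Summarize one engineer's queued runs as a single portfolio line."""
    counts = Counter(r.status for r in runs)
    parts = []
    for status in ORDER:
        if counts[status] == 0:
            continue
        label = f"{counts[status]} {status.replace('_', ' ')}"
        if status == "bailed":
            steps = sorted(r.bailed_at_step for r in runs
                           if r.status == "bailed" and r.bailed_at_step is not None)
            if steps:
                label += " (step " + ", ".join(map(str, steps)) + ")"
        parts.append(label)
    return f"{len(runs)} queued: " + ", ".join(parts)

# The overnight queue described in the paragraph above.
runs = [Run("r1", "landed"), Run("r2", "landed"), Run("r3", "landed"),
        Run("r4", "running"), Run("r5", "running"),
        Run("r6", "bailed", bailed_at_step=3), Run("r7", "awaiting_eval")]
print(portfolio_report(runs))
# 7 queued: 3 landed, 2 running, 1 bailed (step 3), 1 awaiting eval
```

The report is about the queue, not the person: nothing in it requires the engineer to remember anything.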

"What will you do today" is the wrong shape because today's work is dictated by what comes back from yesterday's queue. If three agents finish their refactors before lunch, today is review-and-merge day. If two of them get blocked on a flaky integration test, today is debugging the test. If the long-running migration agent throws at step 47, today is rescuing it. The engineer can't state today's plan in advance because today depends on agent state they haven't checked yet when the standup starts.

"What is blocking you" is the wrong shape because the blockers are no longer human handoffs. The blocker is an eval suite that takes three hours to run, a review queue with twelve pending PRs and no triage, an agent that's been stuck waiting on a CAPTCHA for forty minutes, a downstream API that's browning out in a way the agent didn't notice. None of these are visible to the round-robin. They live in dashboards no one opens before 10am.
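Some of these blockers become visible once runs emit a progress timestamp, which also addresses the opening scene's refactor agent with "no signal that it's stuck and no signal that it's healthy." A minimal sketch, assuming a hypothetical `last_progress_at` field per run and thresholds chosen purely for illustration: a run silent for too long is stalled; a run still progressing but far past its own estimate is overrunning.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed thresholds for illustration; a real fleet would tune these.
STALL_AFTER = timedelta(minutes=30)  # no completed step for this long means "stalled"
OVERRUN_FACTOR = 2.0                 # elapsed time beyond 2x the estimate means "overrunning"

@dataclass
class RunProbe:
    run_id: str
    started_at: datetime
    estimated: timedelta        # the run's own duration estimate
    last_progress_at: datetime  # hypothetical field: when the agent last completed a step

def classify(run: RunProbe, now: datetime) -> str:
    """Return "stalled", "overrunning", or "healthy" for one in-flight run."""
    if now - run.last_progress_at > STALL_AFTER:
        return "stalled"       # silent for too long, regardless of the estimate
    if now - run.started_at > run.estimated * OVERRUN_FACTOR:
        return "overrunning"   # still making progress, but far past its estimate
    return "healthy"

now = datetime(2026, 1, 15, 10, 0)
refactor = RunProbe("refactor", now - timedelta(hours=11), timedelta(hours=4),
                    now - timedelta(minutes=5))
captcha = RunProbe("browser-task", now - timedelta(hours=1), timedelta(hours=2),
                   now - timedelta(minutes=40))
print(classify(refactor, now), classify(captcha, now))  # overrunning stalled
```

Checking stalls before overruns is a deliberate choice: a run that has gone silent needs attention no matter how generous its estimate was.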

The Substrate Has Moved, the Ritual Has Not

The ritual was right for its substrate. When the work was synchronous — type code, run tests, open PR, get review, merge — the workday was the natural unit, and end-of-day was a meaningful boundary. End-of-day meant the engineer had stopped working. The engineer's mental cache was warm. They could report what they did without needing a tool to remember it.

Async agent work has none of those properties. The work doesn't stop when the engineer goes home. The engineer's mental cache is cold by 10am because the agents have continued for ten hours without them. The boundary between "yesterday's work" and "today's work" is no longer the engineer's sleep cycle — it's an arbitrary cut through a stream of agent runs that started at 11pm and will finish at noon.

The empirical signal here is loud. Team-level delivery is slowing even as individual generation accelerates. PR volume nearly doubles. Review latency stretches by a comparable factor. Code churn rises. The cost is borne by the parts of the pipeline the standup is supposed to surface — review, integration, quality — and those parts have shifted from human bottlenecks to system bottlenecks the standup format wasn't designed to read.

What the standup measures is the engineer. What needs measuring is the system the engineer operates: how many agent runs are active, how many are blocked on review, how many failed silently, how deep is the eval-suite backlog, how long has the longest-running PR been waiting, how many runs were abandoned mid-execution with no cleanup. The ritual reports on agency that has migrated elsewhere.

Queue Snapshots, Not Status Updates

The shift the room needs to make is from status updates to queue snapshots. A queue snapshot is a small set of numbers that describe the state of the work-in-flight, read off a dashboard the agents are emitting into, not narrated from memory by a person who wasn't there.

The minimum useful queue snapshot for a small team running agents has roughly five numbers. Active agent runs and what tier of work they're on — refactor, bug, feature, eval. PRs open and waiting on review, broken into "ready to land" and "needs human judgment." Runs that bailed mid-execution in the last 24 hours and have not been retried, triaged, or written off. Eval-suite runtime debt — how many hours of evals are queued versus how many hours of capacity exist before the next release decision. Cost burn against budget for agent compute, because at 30+ runs a day the bill is a real signal of fleet health.
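The five numbers can be computed from records the fleet already emits. A minimal sketch, assuming hypothetical `Run` and `PR` records: the status vocabulary, the `triaged` flag, and expressing eval-suite runtime debt as queued hours minus available hours are all illustrative choices, not a known tool's schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Run:
    tier: str                            # "refactor", "bug", "feature", or "eval"
    status: str                          # hypothetical states: "active", "bailed", "done"
    ended_at: Optional[datetime] = None
    triaged: bool = False                # retried, triaged, or written off by a human

@dataclass
class PR:
    needs_judgment: bool                 # False means "ready to land"

@dataclass
class QueueSnapshot:
    active_by_tier: dict
    prs_ready: int
    prs_need_judgment: int
    untriaged_bails_24h: int
    eval_debt_hours: float               # queued eval hours minus capacity before the release decision
    budget_used: float                   # fraction of the agent-compute budget already spent

def take_snapshot(runs, prs, eval_queued_h, eval_capacity_h, spend, budget, now):
    day_ago = now - timedelta(hours=24)
    return QueueSnapshot(
        active_by_tier=dict(Counter(r.tier for r in runs if r.status == "active")),
        prs_ready=sum(not p.needs_judgment for p in prs),
        prs_need_judgment=sum(p.needs_judgment for p in prs),
        untriaged_bails_24h=sum(
            r.status == "bailed" and not r.triaged
            and r.ended_at is not None and r.ended_at >= day_ago
            for r in runs),
        eval_debt_hours=eval_queued_h - eval_capacity_h,
        budget_used=spend / budget,
    )

now = datetime(2026, 1, 15, 10, 0)
runs = [Run("refactor", "active"), Run("refactor", "active"), Run("bug", "active"),
        Run("feature", "bailed", now - timedelta(hours=3)),
        Run("feature", "bailed", now - timedelta(hours=30)),           # older than 24h
        Run("bug", "bailed", now - timedelta(hours=2), triaged=True),  # already handled
        Run("eval", "done")]
prs = [PR(False), PR(False), PR(True)]
print(take_snapshot(runs, prs, eval_queued_h=14, eval_capacity_h=8,
                    spend=420.0, budget=600.0, now=now))
```

With this sample data, the snapshot shows three active runs, two PRs ready and one needing judgment, one untriaged bail, six hours of eval debt, and 70% of the budget used.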

The point of these numbers is not to be exhaustive. The point is to be portfolio state — what's happening across the system, not what each person did. When the room reads the snapshot together, the conversation that follows is triage, not narration. Why are there nine PRs waiting on review? Who's clearing them? Why did four runs bail at step 3 in the last day — is something upstream broken? The eval queue is six hours deep and we ship Friday; what gets cut?

This is closer to an incident standup than a Scrum standup. It assumes a system that does work and a room of humans who steer it. It treats the daily meeting as a brief, structured pull from the dashboard, followed by the few decisions only humans can make. The open cadence question is whether 10am daily is even right for that. For some teams, meeting twice a day, before and after the long-running eval batch, fits better than meeting once a day before the morning agent queue has flushed.

The Productivity-Theater Trap

The wrong adaptation is the one most teams are already drifting into. Managers, reaching for a number, fall back on counting agent PR throughput as a productivity proxy. PRs per engineer per week becomes the headline. Engineers who babysit fast agents get rewarded. Engineers who design the system that makes the fleet healthier (better tools, better evals, better retry logic, better cleanup) become invisible.

This is the same failure mode the industry already knows about for human work, accelerated. PR count was a bad proxy for engineer value when humans wrote PRs. It is a worse proxy when agents write them, because the agent can be told to open more, smaller PRs at essentially zero marginal cost. The number goes up. The signal it carries goes down. Within a quarter, the metric is gamed; within two, the team has selected for the wrong skill.

The leadership question is not "how many PRs did each engineer's agents land." The leadership question is "is the system that produces and reviews those PRs healthy." Those are different questions, and they call for different metrics. Throughput is a property of the fleet. Health is a property of the system around the fleet: review SLA, abandon rate, mean time to triage a stuck run, eval-suite freshness, cost per landed change. If the standup or the dashboard behind it doesn't show those, the team is reporting on shadow productivity while the real work degrades unobserved.
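The five health metrics can be computed without counting a single PR per engineer. A minimal sketch, assuming plain inputs a team could pull from its tracker and billing; the precise definitions (review SLA as the share of reviews inside a threshold, freshness as time since the last completed eval run) are illustrative interpretations, not standard formulas.

```python
from datetime import datetime, timedelta

def _ratio(num, den):
    return num / den if den else None   # None means "no data", not zero

def system_health(review_waits, review_sla, runs_started, runs_abandoned,
                  triage_delays, last_eval_at, now, spend, landed_changes):
    """Health of the system around the fleet, not throughput of the fleet itself."""
    return {
        # share of PRs that got a review within the SLA
        "review_within_sla": _ratio(sum(w <= review_sla for w in review_waits), len(review_waits)),
        # share of started runs abandoned mid-execution
        "abandon_rate": _ratio(runs_abandoned, runs_started),
        # average gap between a run getting stuck and a human triaging it
        "mean_time_to_triage": _ratio(sum(triage_delays, timedelta()), len(triage_delays)),
        # how stale the last completed eval-suite run is
        "eval_age": now - last_eval_at,
        # agent-compute spend divided by changes that actually landed
        "cost_per_landed_change": _ratio(spend, landed_changes),
    }

now = datetime(2026, 1, 15, 10, 0)
health = system_health(
    review_waits=[timedelta(hours=h) for h in (2, 5, 30, 1)],
    review_sla=timedelta(hours=24),
    runs_started=40, runs_abandoned=6,
    triage_delays=[timedelta(minutes=20), timedelta(minutes=100)],
    last_eval_at=now - timedelta(hours=26),
    now=now, spend=900.0, landed_changes=18,
)
print(health)
```

Note that telling agents to open more, smaller PRs leaves every one of these numbers unchanged, which is the property a PR count lacks.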

The Standup as Triage, Not Roll Call

The standup that survives the transition stops being a round-robin. It becomes a short triage meeting against an agent-work dashboard, with a fixed agenda: read the snapshot, name the two or three things that are off, decide who chases each. The role of the engineer in the room shifts from reporting their own work to interpreting the system's state and making the calls the system can't make on its own.
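The "name the two or three things that are off" step can be pre-computed so the meeting starts at the decision. A minimal sketch, assuming every metric is higher-is-worse and each has a positive limit the team chose; the limits below are invented for illustration, and the sample values echo the nine waiting PRs, four bailed runs, and six-hour eval queue discussed earlier.

```python
def whats_off(metrics: dict, limits: dict, top_n: int = 3) -> list:
    """Return up to top_n metrics over their limit, furthest over first.

    Assumes every metric is "higher is worse" and every limit is positive.
    """
    over = [(metrics[name] / limit, name)
            for name, limit in limits.items()
            if name in metrics and metrics[name] > limit]
    over.sort(reverse=True)  # largest overshoot ratio first
    return [f"{name}: {metrics[name]:g} (limit {limits[name]:g})" for _, name in over[:top_n]]

metrics = {"prs_waiting_review": 9, "untriaged_bails_24h": 4,
           "eval_queue_hours": 6, "budget_used": 0.7}
limits = {"prs_waiting_review": 5, "untriaged_bails_24h": 1,
          "eval_queue_hours": 4, "budget_used": 0.9}
for item in whats_off(metrics, limits):
    print(item)   # a human still decides who chases each one
```

Ranking by ratio rather than raw difference keeps a count of 4 against a limit of 1 ahead of a count of 9 against a limit of 5. Deciding who owns each item stays a human call.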

This works because it acknowledges the substrate honestly. There is no "what I did yesterday" that means what it used to. There is "what the queue did since the last meeting, and what we should do about it." The artifact the room reads is real-time and shared. The decisions the room makes are about resource allocation across a fleet, not about individual progress reports.
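"What the queue did since the last meeting" is a diff between two snapshots rather than a recollection. A minimal sketch, assuming snapshots are stored as flat dictionaries of numeric metrics; the metric names are hypothetical.

```python
def since_last_meeting(prev: dict, curr: dict) -> dict:
    """Per-metric change between two queue snapshots, omitting metrics that did not move.

    Metrics absent from the previous snapshot are treated as starting from 0;
    metrics absent from the current snapshot are ignored.
    """
    return {name: value - prev.get(name, 0)
            for name, value in curr.items()
            if value != prev.get(name, 0)}

yesterday = {"active_runs": 7, "prs_waiting_review": 4, "untriaged_bails_24h": 0, "eval_queue_hours": 6}
today = {"active_runs": 5, "prs_waiting_review": 9, "untriaged_bails_24h": 4, "eval_queue_hours": 6}
print(since_last_meeting(yesterday, today))
# {'active_runs': -2, 'prs_waiting_review': 5, 'untriaged_bails_24h': 4}
```

Dropping unchanged metrics keeps the room's attention on what moved overnight.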

Teams making this shift early are getting two things at once. They get back the half hour a day that was being spent narrating things the dashboard already knew. And they catch failure modes the round-robin format was structurally blind to — abandoned runs, silent regressions in the eval suite, review queues that quietly grew to thirty PRs, agent infrastructure that's been flaky for a week without anyone naming it.

The teams still running 2018's standup against 2026's substrate are not reporting on their work. They are reporting on a fiction of their work, narrated from memory, while the actual work happens in a queue no one in the room has opened. The ritual is intact. The information it carries is not. The day-one move is not to abandon the meeting; it's to point it at what the team actually does now: steer a fleet whose code they didn't write, against a clock they don't fully control, on a substrate they're still learning to observe.
