The Coding Agent CI Bill That Doubled Without a Postmortem

June 2, 2026 · 10 min read

Software Engineer

The line item climbed 130% over six weeks and nobody on the engineering team noticed. PRs were landing faster. Per-PR CI cost on the dashboard looked the same as last quarter. The agent's branches went green on the first try more often than the humans' branches did, which actually pulled the median CI duration down. Finance found it during quarterly review, flagged it as an unexplained variance, and asked engineering for the postmortem. Engineering had nothing to write — no incident, no regression, no failed deploy. Just a budget line that had quietly doubled while every dashboard reported normal.

That postmortem-shaped hole is the artifact. The cost shifted from a labor-dominant curve to an infrastructure-dominant curve, and the team that owned the labor budget was not the team that owned the infrastructure budget. The agent didn't break anything. It just changed which line on the P&L absorbed the work.

The numbers at the platform level tell the same story at a different scale. GitHub reported PRs opened by AI agents rising from roughly 4 million in September 2025 to more than 17 million in March 2026, and Actions compute climbing from 500 million minutes per week in 2023 to 2.1 billion minutes in a single week of 2026. The June 1, 2026 shift of Copilot to usage-based billing, along with the move to charge Copilot code review against Actions minutes at the same per-minute rate as any other workflow, is the provider repricing the same curve your finance team is staring at. The bill is moving because the workflow moved. The dashboard is the last place it shows up because the dashboard was designed around the workflow that no longer dominates.

CI Was an O(commits) Cost; The Agent Made It O(plan steps)

A human engineer pushes a commit when they think the change is ready, or close to ready. Maybe two or three CI runs per PR — one when they first push, one after they address review comments, one after a rebase. The cost of CI scales with the number of commits the human authored, and the number of commits is bounded by how often a human types git push.

A coding agent doesn't push when something is ready. It pushes when it wants to find out if something is ready. Tests are not the gate at the end of the work; tests are the feedback signal during the work. Each iteration of the agent's plan triggers a CI run because that run is the cheapest way for the agent to verify whether its last edit moved toward the goal. Ten plan steps in one PR is not unusual, and neither is twenty. Cost per PR looked unchanged on the dashboard, but per-PR was always the wrong denominator. The quantity that moved was runs per outcome, and the agent multiplied it.

This is the part the engineering dashboard hides. Dashboards almost always normalize CI cost by "per PR" or "per commit" or "per merged change." Each of those denominators implicitly assumes that the unit of authorship is human-paced. When the author is an agent, every one of those denominators inflates in lockstep with the numerator, and the ratio stays flat. The bill goes up. The ratio is unchanged. The dashboard is technically correct and operationally useless.

A useful denominator measures the cost of CI against something the agent can't inflate: shipped features, customer-visible fixes, or external requirements like compliance attestations. Once you switch to that denominator, the curve becomes visible immediately, and the conversation with finance shifts from "why did infra spend grow" to "what was the per-feature cost of agent-authored work, and is that the rate we want to pay."
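To make the denominator argument concrete, here is a minimal arithmetic sketch. Every number in it is invented for illustration, including the per-minute rate, and the names Quarter and cost_per are hypothetical. It only shows how a 130% rise in minutes can leave a per-PR ratio flat when PR volume rises by the same factor.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    ci_minutes: int        # total runner minutes billed
    merged_prs: int        # the denominator the dashboard already uses
    shipped_features: int  # a denominator the agent cannot inflate


def cost_per(minutes: int, units: int, rate_per_minute: float) -> float:
    """CI spend divided by whichever denominator the caller picks."""
    return minutes * rate_per_minute / units


RATE = 0.008  # assumed dollars per runner minute, illustrative only

# Made-up quarters: minutes and PRs both grow 2.3x, features grow 1.2x.
before = Quarter(ci_minutes=100_000, merged_prs=500, shipped_features=50)
after = Quarter(ci_minutes=230_000, merged_prs=1_150, shipped_features=60)

per_pr_before = cost_per(before.ci_minutes, before.merged_prs, RATE)
per_pr_after = cost_per(after.ci_minutes, after.merged_prs, RATE)
per_feature_before = cost_per(before.ci_minutes, before.shipped_features, RATE)
per_feature_after = cost_per(after.ci_minutes, after.shipped_features, RATE)

print(f"per PR:      {per_pr_before:.2f} -> {per_pr_after:.2f}")
print(f"per feature: {per_feature_before:.2f} -> {per_feature_after:.2f}")
```

With these invented inputs the per-PR figure is the same in both quarters while the per-feature figure nearly doubles, which is the gap between the dashboard and the bill.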

The Attribution Gap Is the Failure Mode

If your CI logs do not distinguish agent-authored commits from human-authored commits, you cannot do the analysis your finance team is asking for. You cannot answer "is the growth coming from the agent" because the data does not contain the column.

The fix is straightforward at the metadata level and tedious at the implementation level. Every job that runs in CI should be tagged with the authorship class of the commit that triggered it. That tag has to be applied at the moment the job is enqueued — retroactive attribution from git log will always under-count because some commits get squashed, some agents commit-as-the-human, and some plan steps run on ephemeral branches that never land. Capture it at trigger time, store it on the job, and ship it to the same dashboard that tracks runner minutes.

The same attribution problem shows up at the LLM-cost layer. Practitioners who have wired up production agent cost monitoring tend to converge on a single rule: tags get attached at request creation, never reconstructed from logs. Anthropic's usage API now lets you tag each call with project, team, and task identifiers; the equivalent move for CI is to tag each job at enqueue with actor=agent or actor=human and propagate that tag through every downstream metric. Without the tag, you can audit cost. With the tag, you can govern it.
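Tagging at enqueue could look something like the sketch below. It assumes the job runs on GitHub Actions, where GITHUB_ACTOR, GITHUB_TRIGGERING_ACTOR, GITHUB_RUN_ID, and GITHUB_REF are default environment variables. The AGENT_LOGINS set and the Agent-Authored commit trailer are hypothetical conventions you would have to establish yourself; the trailer is one possible way to catch an agent that commits under a human's identity.

```python
from typing import Mapping

# Hypothetical: logins your organization knows to be coding agents.
AGENT_LOGINS = {"example-coding-agent[bot]"}
# Hypothetical commit trailer the agent harness is configured to add,
# so that commits pushed under a human identity are still classified.
AGENT_TRAILER = "agent-authored: true"


def classify_actor(env: Mapping[str, str], commit_message: str) -> str:
    """Return 'agent' or 'human' for the commit that triggered this job."""
    login = env.get("GITHUB_TRIGGERING_ACTOR") or env.get("GITHUB_ACTOR", "")
    # GitHub App identities end in "[bot]"; note this also matches
    # non-agent automation such as dependency updaters.
    if login in AGENT_LOGINS or login.endswith("[bot]"):
        return "agent"
    if AGENT_TRAILER in commit_message.lower():
        return "agent"
    return "human"


def job_tags(env: Mapping[str, str], commit_message: str) -> dict:
    """Tags to store on the job record at trigger time."""
    return {
        "actor": classify_actor(env, commit_message),
        "run_id": env.get("GITHUB_RUN_ID", ""),
        "ref": env.get("GITHUB_REF", ""),
    }
```

In a real workflow you would pass os.environ and the head commit message, then ship the resulting dictionary to wherever runner minutes are already recorded.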

GitHub's June 2026 introduction of cost centers and per-user budgets exists for exactly this reason. The platform is offering you a column. The work is wiring your CI configuration to populate it correctly — and noticing when an agent runs as the human's identity, which silently mis-classifies the row.

A Per-Author Budget Is Not Punishment; It Is a Signal Channel

The instinct, when finance flags a variance, is to cap something. Cap the agent's runs, cap the per-PR minutes, cap the model the agent is allowed to use. The cap stops the bleeding, but it also stops the work, and it does not tell the team what to change.

A per-author CI budget has a different purpose. It is a signal channel. It tells the agent — or the human supervising the agent — that the inner loop has become expensive, and it does that early enough to change the loop rather than retroactively after a quarter-end review. Three structural patterns produce the signal without breaking the workflow.

The first is a tiered CI configuration where the agent's inner-loop runs use a fast, cheap test subset, and the full suite is reserved for the moment the PR is marked ready for human review. This mirrors the way fast monorepo build systems like Bazel — and dynamic pipelines on Buildkite — let you compute the affected target set from the diff and run only the tests that intersect it. The agent gets fast feedback. The full suite still runs before merge. The cost of "the agent iterates twenty times" goes down by an order of magnitude because nineteen of those iterations don't run the slow integration tier.
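A rough sketch of the tier selection follows. It is a simplification: it maps changed paths to test targets by top-level directory, whereas a build system like Bazel derives the affected set from its dependency graph. The directory names, the FAST_TESTS map, and the select_tests function are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical map from a top-level source directory to its fast tests.
FAST_TESTS = {
    "billing": ["tests/billing/unit"],
    "auth": ["tests/auth/unit"],
}
# Hypothetical tiers; the full suite runs only when the PR is ready.
UNIT_TIER = ["tests/unit"]
FULL_SUITE = ["tests/unit", "tests/integration", "tests/e2e"]


def select_tests(changed_files: Iterable[str], ready_for_review: bool) -> List[str]:
    """Pick test targets for one CI run of an agent's branch."""
    if ready_for_review:
        return list(FULL_SUITE)
    selected: List[str] = []
    for path in changed_files:
        parts = PurePosixPath(path).parts
        top = parts[0] if parts else ""
        if top not in FAST_TESTS:
            # Unknown area: run the whole unit tier rather than skip tests.
            return list(UNIT_TIER)
        for target in FAST_TESTS[top]:
            if target not in selected:
                selected.append(target)
    return selected
```

The fallback for unmapped paths is a deliberate choice here: a cheap inner loop should never mean an untested one.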

The second is a cost signal exposed back to the agent itself. If the agent can read the cost of its last CI run as part of its observation, it can choose cheaper verification strategies on subsequent steps: running a subset of tests, deferring the slow tier, or reading the source instead of running a probe. Most teams skip this because plumbing cost back to the agent feels like over-engineering. It is the single highest-leverage piece of plumbing once the agent's run rate exceeds the human team's.
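One way to picture the cost signal is as extra fields on the observation the agent already receives after a CI run. The sketch below is an assumption about how such a harness might be shaped, not a description of any particular agent framework; CIObservation, render_observation, next_verification, and the strategy names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CIObservation:
    passed: bool
    minutes_used: float         # runner minutes consumed by the last run
    task_minutes_so_far: float  # cumulative minutes for this task


def render_observation(obs: CIObservation, task_budget_minutes: float) -> str:
    """Text appended to the agent's context after each CI run."""
    remaining = max(task_budget_minutes - obs.task_minutes_so_far, 0.0)
    status = "passed" if obs.passed else "failed"
    return (
        f"CI {status}; last run used {obs.minutes_used:.1f} min; "
        f"{remaining:.1f} min of budget remaining"
    )


def next_verification(obs: CIObservation, task_budget_minutes: float) -> str:
    """A simple default policy the harness could suggest to the agent."""
    remaining = task_budget_minutes - obs.task_minutes_so_far
    if remaining <= 0:
        return "read_source"  # no budget left: inspect code, do not run CI
    if obs.minutes_used > remaining:
        return "fast_subset"  # a run like the last one would exceed budget
    return "same_tier"
```

Whether the policy lives in the harness or is left to the model is a design choice; the point of the sketch is only that the cost reaches the loop at all.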

The third is a hard cap that fires not at the per-run level but at the per-task level. A budget that says "this PR has used 40 minutes of Actions time across iterations; the next push from this branch requires a human sign-off" gives the human a place to intervene without preemptively forbidding iteration. The cap is not a refusal. It is a checkpoint, and checkpoints are what let you trust an autonomous loop with a real budget.
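The checkpoint behavior can be sketched as a small per-task ledger. This is a minimal illustration under assumed names (TaskBudget, record_run, sign_off); where the ledger is stored and how the push is actually blocked depend on your CI system and are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TaskBudget:
    """Cumulative CI minutes for one PR or task, with a human checkpoint."""
    limit_minutes: float
    used_minutes: float = 0.0
    sign_offs: List[Tuple[str, float]] = field(default_factory=list)

    def record_run(self, minutes: float) -> None:
        self.used_minutes += minutes

    def next_push_allowed(self) -> bool:
        # The cap applies to the task total, not to any single run.
        return self.used_minutes < self.limit_minutes

    def sign_off(self, reviewer: str, extra_minutes: float) -> None:
        # A human extends the budget; the grant is kept for later audit.
        self.sign_offs.append((reviewer, extra_minutes))
        self.limit_minutes += extra_minutes


budget = TaskBudget(limit_minutes=40.0)
for _ in range(5):
    budget.record_run(8.0)            # five iterations of eight minutes
blocked = not budget.next_push_allowed()
budget.sign_off("alice", 20.0)        # hypothetical reviewer
resumed = budget.next_push_allowed()
```

Because a sign-off raises the limit rather than disabling it, the checkpoint fires again if the extended budget is also spent.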

The Forecasting Model Was the Hidden Assumption

The thing that broke is not the CI bill. The CI bill is doing exactly what you would predict given the new workload. The thing that broke is the forecasting model the FP&A team is running. That model was built when CI cost grew linearly with headcount and shipped feature volume, because a human can only push so many commits per day and only opens a PR when the work is roughly done. The constants in that model (minutes per engineer per week, runs per PR, retries per failed deploy) were stable enough that quarterly variance was a noise term.

Once an agent is authoring commits, those constants are no longer stable. Minutes per engineer per week becomes minutes per engineer-supervised agent loop per week, and the multiplier on that depends on how many concurrent agents the engineer can supervise, which is itself a function of how good your review tooling and your agent's planning loop have gotten this quarter. The forecasting model has a new independent variable, and it is one finance was not told about because the rollout looked like a productivity tool, not a cost-curve change.
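The change in the forecast can be written down as two small functions. This is a sketch of the shape of the model, not a calibrated one: the parameter names are assumptions and the example inputs are invented.

```python
def weekly_minutes_human(
    engineers: int, prs_per_engineer: int, runs_per_pr: int, minutes_per_run: int
) -> int:
    """The old model: CI minutes scale with headcount."""
    return engineers * prs_per_engineer * runs_per_pr * minutes_per_run


def weekly_minutes_agent(
    engineers: int,
    agents_per_engineer: int,  # the new independent variable
    tasks_per_agent: int,
    runs_per_task: int,
    minutes_per_run: int,
) -> int:
    """The added term: CI minutes scale with supervised agent loops."""
    return (
        engineers
        * agents_per_engineer
        * tasks_per_agent
        * runs_per_task
        * minutes_per_run
    )


# Invented inputs for a 20-engineer team.
human = weekly_minutes_human(20, prs_per_engineer=5, runs_per_pr=3, minutes_per_run=10)
agent = weekly_minutes_agent(
    20, agents_per_engineer=2, tasks_per_agent=5, runs_per_task=15, minutes_per_run=4
)
total = human + agent
```

With these invented inputs the human term is 3,000 minutes a week and the agent term is 12,000, with headcount unchanged. The agent term moves whenever agents_per_engineer or runs_per_task moves, which is why a headcount-driven forecast cannot see it.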

The conversation engineering should be having with finance is not "we found the variance and capped it." It is "the cost model assumed labor was the binding constraint and that is no longer true; here is the new model, here is the new run-rate assumption, and here is the per-feature cost we are now willing to pay because the throughput moved." Without that conversation, finance is forecasting against a labor-dominant cost model that doesn't exist anymore, and engineering is treating each quarterly variance as a separate surprise. The variance is not a surprise. It is the new normal expressing itself through a model that hasn't been updated.

Treat the Coding Agent as a Cost-Curve Shift, Not a Productivity Tool

The discipline the team that owned this incident wishes they had practiced earlier is small and uncomfortable. Before rolling out the coding agent broadly, write down which line items you expect it to move and by how much. CI minutes and LLM token spend are the obvious ones. Less obvious: artifact storage if the agent's iterations produce more build artifacts, secret-scanning and dependency-review costs because they run on every push, code-review tool costs that meter by event volume, and observability ingestion costs because the agent's traces are not free.

Then wire the attribution before the rollout. Tag jobs at enqueue. Add cost centers. Stand up the per-author dashboard before the first agent-authored PR lands, not after the second quarterly variance review. Decide on the per-feature denominator that finance and engineering will both agree to track against. Pre-commit, in writing, to the run-rate you are buying — so that when the rate is reached, the conversation is about renegotiating the rate, not explaining why the variance happened.
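Once jobs carry an actor tag, the per-author dashboard reduces to a group-by plus a comparison against the run-rate you committed to. The sketch below assumes job records are dictionaries with hypothetical "actor" and "minutes" keys; the committed figures are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def minutes_by_actor(jobs: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Sum runner minutes per authorship class from tagged job records."""
    totals: Dict[str, float] = defaultdict(float)
    for job in jobs:
        totals[str(job["actor"])] += float(job["minutes"])
    return dict(totals)


def reached_committed_rate(
    totals: Mapping[str, float], committed: Mapping[str, float]
) -> List[str]:
    """Actor classes whose usage has reached the pre-committed run-rate."""
    return sorted(
        actor
        for actor, used in totals.items()
        if used >= committed.get(actor, float("inf"))
    )


# Invented job records and committed weekly minutes.
jobs = [
    {"actor": "agent", "minutes": 30.0},
    {"actor": "human", "minutes": 10.0},
    {"actor": "agent", "minutes": 25.0},
]
totals = minutes_by_actor(jobs)
flagged = reached_committed_rate(totals, {"agent": 50.0, "human": 40.0})
```

An actor class with no committed rate is never flagged in this sketch, so an untracked class stays silent; you may prefer the opposite default.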

The coding agent is not a tool the team adopts. It is a workflow that shifts which budget pays for which work, and the org that doesn't notice the shift is going to keep finding the cost in places it forgot to instrument. The postmortem you want to write is the one that happens before the variance — the one that says: the labor curve is flattening, the infrastructure curve is bending up, and the team that pays for each has agreed on what that trade is worth.


About Tian Pan
