24 posts tagged with "coding-agents"

The Coding Agent CI Bill That Doubled Without a Postmortem

· 10 min read
Tian Pan
Software Engineer

The line item climbed 130% over six weeks and nobody on the engineering team noticed. PRs were landing faster. Per-PR CI cost on the dashboard looked the same as last quarter. The agent's branches went green on the first try more often than the humans' branches did, which actually pulled the median CI duration down. Finance found it during quarterly review, flagged it as an unexplained variance, and asked engineering for the postmortem. Engineering had nothing to write: no incident, no regression, no failed deploy. Just a budget line that had more than doubled while every dashboard reported normal.

That postmortem-shaped hole is the artifact. The cost shifted from a labor-dominant curve to an infrastructure-dominant curve, and the team that owned the labor budget was not the team that owned the infrastructure budget. The agent didn't break anything. It just changed which line on the P&L absorbed the work.

The PR Description Your Coding Agent Generated That Humans Stopped Reading

· 11 min read
Tian Pan
Software Engineer

A year ago your team adopted a PR description template. It had a ## Summary, a ## Changes, a ## Test plan, and a row of checkboxes. Reviewers loved it: every PR had context, every PR had a test plan, every PR had structure. Six months later the coding agent learned to fill it in. Now every PR has a ## Summary, a ## Changes, a ## Test plan, and a row of checkboxes — and reviewers no longer read past the title. The format that once focused attention now signals that there is nothing worth focusing on. The structure outlived the signal it carried.

This is not a code-quality problem. The code in those PRs is often fine. The problem is that the act of writing a description has been amputated from the act of thinking about the change, and the description is the artifact reviewers used to triage what to spend their finite attention on. When that artifact becomes uniformly formatted, plausibly worded, and indistinguishable from every other PR, the reviewer's attention triage breaks. The system that used to surface the unusual now flattens everything into the same shape.

The Branch State Your Coding Agent Forgot to Check

· 10 min read
Tian Pan
Software Engineer

Your coding agent does not know which branch it is on. It thinks it does. It saw a git status output twelve turns ago, it has a CLAUDE.md in its context that mentions the branch name the session opened against, and it watched a tool result list five files that were the right files at the time. The agent has been quietly reasoning against that snapshot ever since. Meanwhile, in a second terminal, you ran git checkout main. The agent's diff lands cleanly on the file system because the OS does not care which branch the bytes belong to. The diff is semantically wrong because the agent's mental model of the branch is stale by three hundred commits and the parent it was reasoning against no longer exists in your working tree.

This is branch-state drift, and it is the coding-agent analog of a read-modify-write race in a database. The agent reads the world at turn N, modifies its plan across turns N+1 through N+k, and writes back to disk at turn N+k+1 — and somewhere in that window the world changed underneath it. No exception fires. No tool returns an error. The patch applies. The harm shows up downstream: a PR opened against the wrong base, a hand-written commit that silently reverts an intervening fix, a feature implemented against a schema that was migrated yesterday.
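One way to picture a guard against this race is an optimistic-concurrency check: re-read the repository state immediately before writing and refuse to apply the patch if it no longer matches what the plan was built on. The sketch below is a minimal illustration, not a description of any particular agent harness. `RepoSnapshot`, `checkDrift`, and `applyIfFresh` are hypothetical names, and the state reader is injected so the example stays self-contained; a real harness would presumably populate it from `git rev-parse --abbrev-ref HEAD` and `git rev-parse HEAD`.

```typescript
// Hypothetical shape: what the agent believed about the repo when it planned.
interface RepoSnapshot {
  branch: string;  // e.g. from `git rev-parse --abbrev-ref HEAD`
  headSha: string; // e.g. from `git rev-parse HEAD`
}

type DriftCheck =
  | { ok: true }
  | { ok: false; reason: string };

// Compare the snapshot taken at planning time (turn N) with a fresh read
// taken immediately before the write (turn N+k+1).
function checkDrift(planned: RepoSnapshot, current: RepoSnapshot): DriftCheck {
  if (planned.branch !== current.branch) {
    return { ok: false, reason: `branch changed: ${planned.branch} -> ${current.branch}` };
  }
  if (planned.headSha !== current.headSha) {
    return { ok: false, reason: `HEAD moved: ${planned.headSha} -> ${current.headSha}` };
  }
  return { ok: true };
}

// Guarded write: re-read state, and only apply the patch when nothing drifted.
// `readCurrent` and `apply` are injected (assumptions for illustration).
function applyIfFresh(
  planned: RepoSnapshot,
  readCurrent: () => RepoSnapshot,
  apply: () => void,
): DriftCheck {
  const result = checkDrift(planned, readCurrent());
  if (result.ok) {
    apply();
  }
  return result;
}
```

The point of the sketch is that the stale case turns into an explicit refusal with a reason, rather than a patch that applies cleanly and is wrong.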

The Coding Agent That Passes Locally and Fails in CI

· 11 min read
Tian Pan
Software Engineer

The agent's diff was green on your laptop. Tests passed, lint passed, the dev server hot-reloaded clean. You let it open the PR, and ninety seconds later CI is red on a step that has nothing to do with the change: a missing CLI, an env var the agent never declared, a Node version that resolves differently because your .nvmrc resolves through a global shim that the runner does not have. The agent did not write a broken diff. It wrote a diff that depends on your machine, and your machine and the runner are not the same computer.

"Works on my machine" was a human bug. The fix was discipline — pin versions, write Dockerfiles, read the CI logs. Coding agents inherited the bug at scale and removed the discipline that used to compensate, because the agent does not know which of the things it relied on came from the repo and which came from the warm sediment of your shell history. Every developer's laptop is a uniquely configured environment that the agent absorbs without naming. Then the same agent runs in a runner that is none of those things, and the failure mode looks like the agent's fault when it is actually an environmental contract that nobody wrote down.

The Hot-Reload Loop Your Coding Agent Polluted

· 12 min read
Tian Pan
Software Engineer

A coding agent and a hot-module-replacement dev server are both, individually, magic. Put them in the same working directory and they become a producer-consumer pair with no synchronization primitive between them. The agent writes a file. The watcher fires. The dev server reloads to a state that exists for ninety milliseconds before the agent's next write replaces it. The error overlay reflects a snapshot the file system already moved past. The agent reads that overlay, treats it as ground truth, and writes a fix for a problem the next save will erase anyway.

You don't notice this on a one-line edit. You notice it when the agent is doing a coordinated multi-file change — renaming a prop across a component, threading a new field through a hook, splitting a module — and every intermediate state between "start" and "done" is, by construction, broken. The watcher does not know the difference between an intermediate state and a final one. The agent, observing the watcher's output, cannot tell the difference between a real error and an artifact of its own in-flight work.
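To make the missing synchronization primitive concrete, here is a rough sketch of one possible shape for it: a write counter that lets the agent discard overlay output produced from an intermediate state. Everything here is hypothetical. `WriteEpoch` is an invented name, and the sketch assumes the harness can tag each dev-server observation with the counter value that was current when the rebuild started, which real HMR servers do not do out of the box.

```typescript
// Hypothetical coordinator between agent writes and dev-server observations.
class WriteEpoch {
  private epoch = 0;       // incremented on every agent write
  private inBatch = false; // true while a multi-file change is in flight

  beginBatch(): void {
    this.inBatch = true;
  }

  recordWrite(): void {
    this.epoch += 1;
  }

  endBatch(): void {
    this.inBatch = false;
  }

  current(): number {
    return this.epoch;
  }

  // An overlay error is only worth acting on if it was produced from the
  // latest write and no coordinated change is still in progress.
  isTrustworthy(observedAtEpoch: number): boolean {
    return !this.inBatch && observedAtEpoch === this.epoch;
  }
}
```

Under these assumptions, an error captured mid-rename is rejected twice over: the batch is still open, and the counter has moved on by the time the agent reads it.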

The Idiom Your Coding Agent Wrote Around Instead Of Using

· 11 min read
Tian Pan
Software Engineer

A senior engineer on a payments team I work with told me a story that I think every team running coding agents will eventually live through. Their codebase has a Result<T, E> wrapper — homegrown, sits in a single core/result.ts file, used in roughly two hundred call sites across the service. New code is expected to thread Result through every function that can fail; throwing is reserved for genuinely unexpected states. It's not enforced by a lint rule. It is the dialect.

Six months into shipping with a coding agent, they audited the diffs the agent had merged. About a third of the new functions ignored Result entirely. The agent had reached for try/catch, returned T | null, thrown Error subclasses with descriptive messages — every one of those choices is correct in some imagined codebase. None of them was correct in this one. The code typechecked. The tests passed. Reviewers approved it because nothing in it looked wrong line by line. But the file the agent touched no longer fit the file it lived next to, and the team had quietly grown a second dialect inside their own service.

This is the failure mode I want to talk about: not bugs, not hallucinations, not lint violations — idiomatic drift. The agent ships code that compiles, runs, and passes tests, in a style your codebase does not speak. Over enough merges, the codebase bifurcates into agent-style zones and human-style zones, and the cost shows up in places no dashboard is watching.
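For readers who have not worked in a codebase like this, the contrast between the two dialects looks roughly like the sketch below. The team's actual `core/result.ts` is not shown here, so this `Result` definition is an assumed minimal version, and `ParseError`, `parseAmountThrowing`, and `parseAmount` are hypothetical names invented for illustration.

```typescript
// Assumed minimal Result type; the real core/result.ts may differ.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Hypothetical error type for the example.
type ParseError =
  | { kind: "empty" }
  | { kind: "not_a_number"; input: string };

// Agent-style dialect: typechecks and passes tests, but the failure path
// is invisible in the signature and surfaces as a thrown exception.
function parseAmountThrowing(input: string): number {
  const n = Number(input);
  if (input.trim() === "" || isNaN(n)) {
    throw new Error(`bad amount: ${input}`);
  }
  return n;
}

// House dialect: every expected failure is threaded through Result,
// so callers are forced to handle it before touching the value.
function parseAmount(input: string): Result<number, ParseError> {
  if (input.trim() === "") {
    return { ok: false, error: { kind: "empty" } };
  }
  const n = Number(input);
  if (isNaN(n)) {
    return { ok: false, error: { kind: "not_a_number", input } };
  }
  return { ok: true, value: n };
}
```

Both functions are defensible in isolation, which is why line-by-line review does not catch the first one. Only the second composes with two hundred call sites that expect a `Result`.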

The Inner Loop Your Coding Agent Quietly Broke

· 8 min read
Tian Pan
Software Engineer

The productivity claim around coding agents is that they remove the typing bottleneck. The bottleneck the engineer actually hits in practice is different. The engineer can no longer hold the system in their head, because the agent is editing files faster than the engineer can read them, writing tests faster than the engineer can reason about coverage, and refactoring abstractions faster than the engineer can verify they still type-check at the design level rather than just the compiler level.

The tight inner loop — hypothesize, change, observe, refine — that defines competent engineering quietly collapses into a different loop. The engineer is now reviewing agent output rather than building intuition about the system. A METR randomized controlled trial from mid-2025 found experienced open-source developers were 19% slower on familiar codebases when using AI assistants, while reporting they felt 20% faster. The 39-point gap between perceived and actual productivity is not a measurement error. It is the sound of comprehension being silently traded for throughput.

The Library Version Your Coding Agent Remembers Wrong

· 10 min read
Tian Pan
Software Engineer

The diff looks clean. The agent imported the right module, called what looks like the right function, and TypeScript stayed quiet. The PR description even cites the docs. Then the build runs in CI and the call explodes with TypeError: x is not a function — because the function was split into two in a minor bump eight months ago, and the agent generated against the version of the library that existed inside its training data, not the version installed in your package.json.

This is not the kind of failure the "LLMs hallucinate" frame prepares you for. The model isn't inventing an API that never existed. It's remembering an API that existed once and doesn't anymore. The mental model the agent is reasoning from is a snapshot frozen at training time. The world has moved on. The codebase has moved on. And the agent has no idea, because nobody told it.

The PR Description Your Coding Agent Cannot Write

· 10 min read
Tian Pan
Software Engineer

Your coding agent finished the task. The diff is small, the tests are green, the lint is clean, and the PR body says, in its entirety, "Fixes the bug in module X." A reviewer six time zones away opens the page, reads the diff in isolation, sees nothing wrong with it, and approves a technically correct change that solves the wrong problem. The change ships. Two days later a customer asks why the workaround they had been relying on stopped working, and you discover that the bug your agent fixed was not the bug the ticket was about.

The code was fine. The reviewer was conscientious. The agent did exactly what it was asked. The artifact between them — the pull request — was empty of everything that would have caught the mistake.

What You Deleted Is Invisible to Your Coding Agent

· 10 min read
Tian Pan
Software Engineer

You spent Tuesday afternoon deleting a dead utility module. You cleaned up the imports, ran the type checker, watched CI go green, and merged the PR. Wednesday morning, a fresh agent session looks at the same code, decides the codebase is "missing" a small helper, and writes the dead module back in — same name, same shape, slightly different style. The reviewer who approved the deletion yesterday now has to remember why they killed it, find the conversation that justified it, and explain it again. The agent is not malfunctioning. It is doing exactly what its context says to do.

This is the structural reliability problem of coding agents that nobody is solving with prompt engineering: the agent's context starts from the repository's current state, but not from the history of why that state is what it is. The file you removed leaves no trace the agent can see. The dependency you migrated away from is just another package on npm. The flaky test you intentionally deleted is a coverage gap waiting to be "fixed." Absence — the negative space of decisions you made — is invisible.

The Test the Agent Wrote That Tests Nothing

· 10 min read
Tian Pan
Software Engineer

Ask a coding agent to "add tests for this module" and you will get tests. They will be neatly formatted, they will follow your project's conventions, and they will pass. Coverage will tick up. The PR will look like diligence. And a meaningful fraction of those tests will not be able to catch a single bug you might plausibly introduce.

This is not a story about a model being dumb. The agent did exactly what it was asked. The problem is that "add tests" and "add tests that constrain the behavior" are different requests, and only one of them is easy to verify at a glance. A green checkmark looks identical whether it sits on top of a real assertion or a tautology.

The result is a suite that grows in line count and shrinks in power. You end up with more files, more CI minutes, more things to maintain — and roughly the same probability of shipping a regression as before you started.
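A small sketch makes the difference visible. Below, the same two checks are run against a correct function and against a plausible regression of it. The discount rule, `applyDiscount`, `buggyDiscount`, and both test functions are hypothetical, made up for illustration, and the tests are written as plain functions rather than in any particular test framework.

```typescript
type Discount = (totalCents: number) => number;

// Correct behavior (assumed rule): 10% off orders of 100.00 or more.
const applyDiscount: Discount = (totalCents) =>
  totalCents >= 10000 ? Math.round(totalCents * 0.9) : totalCents;

// A plausible regression: the threshold comparison got flipped.
const buggyDiscount: Discount = (totalCents) =>
  totalCents <= 10000 ? Math.round(totalCents * 0.9) : totalCents;

// Tautological test: asserts that the function returns a number and that
// it agrees with itself. No implementation that returns a number and is
// deterministic can fail it.
function tautologicalTest(fn: Discount): boolean {
  const result = fn(20000);
  return typeof result === "number" && result === fn(20000);
}

// Constraining test: pins concrete expected values above, below, and at
// the threshold, so a flipped comparison cannot pass.
function constrainingTest(fn: Discount): boolean {
  return fn(20000) === 18000 && fn(5000) === 5000 && fn(10000) === 9000;
}
```

Both tests are green against `applyDiscount`. Only `constrainingTest` goes red against `buggyDiscount`, and that is the property a coverage number cannot report.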

The Mixed PR Queue: Reviewer Throughput Is Now the Binding Constraint

· 10 min read
Tian Pan
Software Engineer

For the last twenty years, the Theory of Constraints answer in software delivery was the same: the bottleneck is producing code. We tooled around that assumption — pair programming, IDE autocompletion, faster CI, smaller services, all designed to push more code through a fixed-width review pipe. Then coding agents arrived, the production side of the pipe got 5–10x wider, and the review pipe stayed exactly the same width. A senior engineer who used to open three PRs a week now supervises a fleet that opens thirty in an afternoon. The team's velocity is no longer set by how fast anyone writes code. It's set by how fast a human can read it.

This is not a future problem. Median PR review time has been measured as up 441% year over year in some samples, and 31% more PRs are merging with zero review, not by policy but because reviewers gave up trying to keep pace. Stripe is shipping over a thousand agent-produced PRs per week. Feature-branch throughput grew 59% YoY in one benchmark while main-branch throughput fell 7%: code is being written, but it is not getting promoted, because it is stuck in review.