The Coding Agent That Passes Locally and Fails in CI

June 1, 2026 · 11 min read

Software Engineer

The agent's diff was green on your laptop. Tests passed, lint passed, the dev server hot-reloaded clean. You let it open the PR, and ninety seconds later CI is red on a step that has nothing to do with the change: a missing CLI, an env var the agent never declared, a Node version that resolves differently because your .nvmrc resolves through a global shim that the runner does not have. The agent did not write a broken diff. It wrote a diff that depends on your machine, and your machine and the runner are not the same computer.

"Works on my machine" was a human bug. The fix was discipline: pin versions, write Dockerfiles, read the CI logs. Coding agents inherited the bug at scale and removed the discipline that used to compensate, because the agent does not know which of the things it relied on came from the repo and which came from the warm sediment of your shell history. Every developer's laptop is a uniquely configured environment that the agent absorbs without naming. Then the agent's change runs on a runner that shares none of that configuration, and the failure looks like the agent's fault when it is actually a missing environmental contract that nobody wrote down.

The Warm-Environment Tax Nobody Itemized

Your laptop is a warm environment. You have a shell that sources a dotfile that exports AWS_PROFILE, a ~/.npmrc with a private registry token, a global Node installed by Homebrew that your nvm shim resolves to when the repo's .nvmrc is missing a minor version, a Docker image from six months ago still sitting in your local image cache, gh already authenticated, kubectl pointing at a cluster, and a Postgres running on localhost because you brew-installed it on a Thursday in 2024.

The agent did not install any of that. It inherited it. When it ran npm install, the private registry resolved because the token was already in your shell. When it ran make seed, the container started because the image was already pulled. When it ran the test suite, the database was already there. None of that machinery is in the repo. The runner has none of it.
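One way to make that inherited state visible is to inventory it. The sketch below checks for a few of the ambient inputs named above (an exported AWS_PROFILE, a ~/.npmrc, tools such as gh and kubectl on PATH) and reports which ones exist on the current machine. The function name, the three lists, and the report shape are illustrative assumptions, not an existing tool, and the lists would need to be extended for a real setup.

```python
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional

# Illustrative lists only: ambient inputs of the kind described in the text.
AMBIENT_ENV_VARS = ["AWS_PROFILE"]
AMBIENT_HOME_FILES = [".npmrc"]
AMBIENT_TOOLS = ["gh", "kubectl", "docker", "psql"]


def inventory_ambient_state(
    env: Mapping[str, str],
    home: Path,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Report which machine-level inputs exist outside the repo."""
    return {
        # Variables exported by the shell, not declared by the repo.
        "env_vars": [name for name in AMBIENT_ENV_VARS if name in env],
        # Files in the home directory that commands read silently.
        "home_files": [f for f in AMBIENT_HOME_FILES if (home / f).is_file()],
        # Tools that happen to be installed on this machine.
        "tools": [t for t in AMBIENT_TOOLS if which(t) is not None],
    }
```

Calling `inventory_ambient_state(os.environ, Path.home())` on a laptop gives a rough list of what a fresh runner would probably be missing. It only finds what it was told to look for, so an empty report does not prove the change is portable.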

The tax compounds because the agent does not see what it is using. A human developer who runs npm install and sees a line like "npm warn deprecated using cached token" scroll past at least has a chance to notice. An agent reading the same output as a tool result treats the success as the contract: the command works, the test passes, the task is done. The agent then writes a commit message that asserts the change is complete. The CI runner reads the same repo, runs the same commands, and learns that none of it is true.

The Runner Is a Stranger To Your Shell

CI runners are deliberately bland. They have a stock shell, a minimum tool set, no cached credentials, scrubbed env vars, a fresh filesystem, and a network policy that does not silently let your private registry through. This is the right design — the runner exists to certify that the artifact in git is sufficient. Anything the runner can do that the artifact in git cannot describe is a hole in the certification.
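That blandness can be approximated locally by rerunning a command with an emptied environment and a throwaway home directory. This is a minimal sketch, assuming a POSIX-like system and that a short allowlist of variables is enough for the command to start. The allowlist and the function names are hypothetical. It does not reproduce the runner's fresh filesystem or network policy, and because PATH is kept, globally installed tools are still found.

```python
import os
import subprocess
import tempfile
from typing import Mapping, Sequence

# Assumed allowlist: just enough for most commands to start.
ALLOWED_VARS = ("PATH", "LANG")


def bland_env(parent: Mapping[str, str], home: str) -> dict:
    """Keep only allowlisted variables and point HOME at the given directory."""
    env = {k: v for k, v in parent.items() if k in ALLOWED_VARS}
    env["HOME"] = home
    return env


def run_bland(cmd: Sequence[str], cwd: str = ".") -> int:
    """Run cmd without the caller's dotfiles, tokens, or exported variables."""
    with tempfile.TemporaryDirectory() as fake_home:
        # The temporary HOME is empty, so ~/.npmrc and similar files are absent.
        result = subprocess.run(
            list(cmd), cwd=cwd, env=bland_env(os.environ, fake_home)
        )
        return result.returncode
```

For example, `run_bland(["npm", "install"])` returning a nonzero exit code while the same command succeeds in your normal shell suggests the install depends on something outside the repo.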

For human developers, the runner's blandness is annoying but legible. You read the failure, you find the missing piece, you commit it. For coding agents, the runner's blandness is invisible because the agent never met the runner. The agent met your shell, ran your commands, observed your outputs, and concluded the task was done. The runner only exists to the agent as a failure log it has to interpret after the fact, often a turn or two later when the human paste-bombs the CI output back into the conversation.

The drift is asymmetric. Local-only success can ship; CI-only success cannot. Every dev-env affordance the agent absorbed becomes a tax the CI runner pays in the form of a red build, and the agent has no native channel for "did this work in the environment that decides whether it ships."

Why The Agent Cannot Smell What It Inherited

A senior engineer who runs npm install knows, at some level, that the install used a private registry, that the credentials came from somewhere, that the lockfile resolved to a specific Node version, and that some of those facts are not in the repo. The senior engineer has an unwritten model of "what would a stranger need to reproduce this." That model is what makes the engineer commit a .tool-versions file, or write a make bootstrap target, or push the credential setup into a script.

The agent does not have that model unless somebody wrote it down. The agent reads the repo, sees a package.json, runs npm install, gets success, and moves on. There is no internal signal that says "the command succeeded for reasons not contained in the codebase you just read." The agent's training optimizes for "complete the task." Surfacing dependencies the task did not explicitly declare is not the task.

This is where the failure mode becomes architectural. The agent is not being lazy. It is operating exactly as designed — convert a goal into actions, observe the outputs, declare done when the outputs are green. The piece nobody designed is the layer that asks "would this still be green if you were not on this developer's machine."
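That missing layer can be sketched as a comparison: run the same check once in the inherited environment and once in a scrubbed one, and call the work done only when both are green. The names below are hypothetical. The two runners are passed in as plain callables that return an exit code, so the sketch makes no assumption about how the scrubbed run is produced (a container, an emptied environment, or a remote runner).

```python
from enum import Enum
from typing import Callable, Sequence


class Verdict(Enum):
    PORTABLE = "green locally and in the scrubbed environment"
    MACHINE_DEPENDENT = "green only because of this machine"
    BROKEN = "red locally"


def classify(
    cmd: Sequence[str],
    run_local: Callable[[Sequence[str]], int],
    run_scrubbed: Callable[[Sequence[str]], int],
) -> Verdict:
    """Compare exit codes from the two environments (0 means success)."""
    if run_local(cmd) != 0:
        # Not even green on the developer's machine.
        return Verdict.BROKEN
    if run_scrubbed(cmd) != 0:
        # Green locally, red without the machine's ambient state.
        return Verdict.MACHINE_DEPENDENT
    return Verdict.PORTABLE
```

Under this scheme an agent would be allowed to declare the task complete only on `Verdict.PORTABLE`. A `MACHINE_DEPENDENT` result is a prompt to find and declare the undeclared dependency before opening the PR.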

Patterns That Close The Gap

The fix is not "make the agent smarter." It is to give the agent a contract it can read about what the environment must contain, and then to verify the contract before editing.
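A minimal sketch of such a contract and its check, under stated assumptions: the contract lists required tools and environment variables, and verification happens before any edit. The `EnvContract` shape, the `verify_contract` name, and the example entries (including `DATABASE_URL`) are all illustrative, not a format prescribed here or by any particular tool.

```python
import os
import shutil
from dataclasses import dataclass, field
from typing import Callable, List, Mapping, Optional


@dataclass
class EnvContract:
    """Hypothetical contract: what the environment must contain."""
    tools: List[str] = field(default_factory=list)
    env_vars: List[str] = field(default_factory=list)


def verify_contract(
    contract: EnvContract,
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a readable list of violations; an empty list means satisfied."""
    problems = [f"missing tool: {t}" for t in contract.tools if which(t) is None]
    # An unset or empty variable counts as missing.
    problems += [
        f"missing env var: {v}" for v in contract.env_vars if not env.get(v)
    ]
    return problems


# Example contract; the entries are illustrative, not taken from a real repo.
CONTRACT = EnvContract(tools=["node", "npm", "docker"], env_vars=["DATABASE_URL"])
```

An agent harness could call `verify_contract(CONTRACT)` as its first step and refuse to edit while the list is non-empty. Running the same check as the first CI step keeps the laptop and the runner held to one written contract.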

About Tian Pan
