The Codebase Index Your Coding Agent Rebuilt From a Checkout Three Weeks Behind Main

10 min read
Tian Pan
Software Engineer

A coding agent on your team opens a pull request that calls parseUserToken() four times across two files. The function does not exist in the repository, has not existed for nineteen days, and was replaced by decodeSessionClaim() in a commit your engineers all remember reviewing. The agent did not invent the name. It read the name from its semantic index — a vector store rebuilt from a working copy that was twenty-one days behind main. The agent's edit step, by contrast, ran git pull at session start and operated on fresh code. Two views of the same repository, three weeks apart, and the agent confidently bridged them with code that does not compile against anything real.

This is the failure mode that doesn't announce itself. The agent ran. The tests appeared to pass. The PR landed. The first reviewer noticed only because a stubbed-out function shared a name with an unrelated helper and tripped the linter. By then the agent had spent a full sprint writing against a phantom version of the codebase, and no one on the team — including the agent — had any signal that something was wrong.

The seam between an agent's understanding of the codebase and the codebase's actual state is a coherence boundary that nobody draws on the architecture diagram. The index is updated by one team on one cadence. The working tree is fetched by a different team on a different cadence. The agent's reasoning lives on the first surface. The agent's actions land on the second. When the two diverge by even a few commits — let alone three weeks — the result is an agent that grounds confident claims on a repository that no longer exists.

How the Drift Gets In

The index and the working tree are two ways the platform answers the question "what code is in this repo." Most agent platforms keep them in different places, refresh them on different schedules, and assume the result is the same. It often isn't.

The most common source of drift is a mirror cache. To absorb GitHub rate limits and accelerate cold clones, agent platforms hold a local mirror of each tracked repository that refreshes from origin on an interval: every few minutes, every hour, sometimes longer. Mirrors like this have been CDN-style infrastructure for years; tools like gitcache and git-cache-clone are explicit about it. The interval is configurable, and that's where the problem lives. If the refresh job is paused, throttled, misconfigured, or forgotten during a platform migration, the mirror lags origin invisibly. The index pulls from the mirror; the working tree pulls from origin directly. The two diverge.
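
One way to make that divergence visible is to stamp each view with the commit it was built from and compare the two before the agent grounds anything on the index. The sketch below is a minimal illustration, not a description of any particular platform: `RepoView`, `check_coherence`, and the one-hour `max_lag` budget are all hypothetical. In practice the SHA and committer time could be read with `git rev-parse HEAD` and `git log -1 --format=%ct` in each checkout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RepoView:
    """One surface's answer to "what code is in this repo" (hypothetical type)."""
    name: str              # e.g. "index" or "working_tree"
    commit_sha: str        # commit this view was built from
    commit_time: datetime  # committer timestamp of that commit


def check_coherence(index: RepoView, tree: RepoView,
                    max_lag: timedelta = timedelta(hours=1)) -> tuple[bool, str]:
    """Return (ok, reason); ok is False when the index lags the tree past max_lag."""
    if index.commit_sha == tree.commit_sha:
        return True, "index and working tree are at the same commit"
    lag = tree.commit_time - index.commit_time
    if lag <= max_lag:
        # Also covers the case where the index is ahead of the tree (negative lag).
        return True, f"index lags by {lag}, within budget"
    return False, f"index lags working tree by {lag}; refusing to ground on it"
```

Fed the numbers from the opening example, an index built twenty-one days behind the working tree fails the check, and the reason string reports the lag instead of leaving it invisible.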

Other vectors are subtler. A semantic index is durable across sessions for performance reasons — embeddings are expensive to recompute, and most agent platforms reuse them aggressively. After a large branch swap, a dependency bump that rewrites thousands of files, a generated-code regeneration, or an LFS pull, the index can carry orphaned entries pointing at code that no longer exists. Cursor's documentation acknowledges this: when autocomplete starts referencing files you deleted three branches ago, the index has stale entries. The Merkle-tree change detector that Cursor uses to incrementally re-index is efficient at finding modified files, but it doesn't always invalidate aggressively enough on structural rewrites.
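
Orphaned entries can be found by comparing what the index recorded against what the fresh checkout actually contains. The following is a rough sketch under stated assumptions: it supposes the index stores a content hash per file at embedding time, and `content_hash` and `find_stale_entries` are hypothetical names. It is not how Cursor or any other tool implements invalidation.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file contents; any stable digest would do for this sketch."""
    return hashlib.sha256(data).hexdigest()


def find_stale_entries(indexed: dict[str, str],
                       live: dict[str, str]) -> dict[str, list[str]]:
    """Compare hashes recorded at embedding time against the fresh checkout.

    Both arguments map repo-relative path -> content hash (hypothetical layout).
    """
    # In the index, but the file is gone from the working tree.
    orphaned = sorted(p for p in indexed if p not in live)
    # Present in both, but the content has changed since it was embedded.
    changed = sorted(p for p in indexed if p in live and indexed[p] != live[p])
    # In the working tree, but never embedded.
    unindexed = sorted(p for p in live if p not in indexed)
    return {"orphaned": orphaned, "changed": changed, "unindexed": unindexed}
```

A non-empty `orphaned` list after a branch swap or a large regeneration is a signal to evict those entries before the agent searches again, rather than waiting for incremental re-indexing to catch up.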

A third vector is failed indexing without failed retrieval. The indexer crashes mid-run, leaves a partial index in place, and the agent platform keeps serving search results from whatever was indexed last. The agent has no way to know its results are now selectively missing the last six commits' worth of changes. Search returns confidently. Confidence is the problem.
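
A partial index can at least be made to fail loudly. One possible approach, sketched here with a hypothetical `manifest.json` and hypothetical helper names, is for the indexer to mark the run `in_progress` before it touches any entries and `complete` only at the end, while the retrieval side refuses to serve unless the manifest is complete and matches the commit the agent is editing.

```python
import json
import os
import tempfile
from pathlib import Path

MANIFEST = "manifest.json"  # hypothetical file name


def write_manifest(index_dir: Path, commit_sha: str, status: str) -> None:
    """Atomically record which commit the index targets and whether the run finished."""
    fd, tmp_name = tempfile.mkstemp(dir=index_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump({"commit_sha": commit_sha, "status": status}, tmp)
    # os.replace swaps the file in one step, so readers never see a half-written manifest.
    os.replace(tmp_name, index_dir / MANIFEST)


def servable_commit(index_dir: Path, expected_sha: str) -> str:
    """Return the indexed commit, or raise rather than serve partial or stale results."""
    path = index_dir / MANIFEST
    if not path.exists():
        raise RuntimeError("no manifest: the indexer has never run here")
    manifest = json.loads(path.read_text())
    if manifest["status"] != "complete":
        raise RuntimeError(f"index run is {manifest['status']}, not complete")
    if manifest["commit_sha"] != expected_sha:
        raise RuntimeError(
            f"index built at {manifest['commit_sha']}, expected {expected_sha}"
        )
    return manifest["commit_sha"]
```

If the indexer crashes mid-run, the manifest stays at `in_progress` and search raises an error instead of returning confident results with the latest commits missing.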

What the Agent Does With a Stale View

A coding agent uses its index for two things: search ("find me the function that handles auth") and grounding ("here's the function signature I'm going to call"). Both are vulnerable to staleness in different ways.

Search degrades quietly. A stale index returns the file that used to implement a feature, and the agent reads it as ground truth. If the file has been split, renamed, or merged into another module, the agent's mental map of where logic lives is wrong from the first search. Subsequent searches don't repair this — they often compound it, because the agent's follow-up queries are conditioned on what the first search returned.

Grounding fails more loudly, but only sometimes. If the agent calls a function that no longer exists, the compiler usually catches it — and that should end the story. But the modern coding-agent loop is "iterate until tests pass," and many implementations of that loop have a fallback when the build fails: stub out the call, mock the return, comment out the failing block. The agent's prompt told it to make tests pass, so it makes the tests pass, by removing the thing that was failing. The stub gets shipped. The reviewer sees a passing CI run and a clean diff and approves.

The most insidious case is partial overlap. The function the agent calls does exist, but with a different signature than the index remembers. The argument order changed. A required parameter was added. A return type was narrowed. The code compiles in some cases and crashes at runtime in others. The agent never gets a clear signal that its grounding was wrong, because the wrongness is intermittent.
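
Partial overlap is checkable if the agent re-reads the live definition before calling it. As a hedged illustration, the sketch below uses Python's `ast` module to pull a function's positional parameters from the fresh source and compare them with what the index remembers. The helper names are hypothetical, the example is Python-only, and a real codebase would need the equivalent parser for its own language.

```python
import ast
from typing import Optional


def live_parameters(source: str, func_name: str) -> Optional[list[str]]:
    """Return positional parameter names of func_name in source, or None if absent."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            return [a.arg for a in node.args.posonlyargs + node.args.args]
    return None


def grounding_problem(indexed_params: list[str], source: str,
                      func_name: str) -> Optional[str]:
    """Describe how the index disagrees with the working tree, or return None if it agrees."""
    live = live_parameters(source, func_name)
    if live is None:
        return f"{func_name} no longer exists in the working tree"
    if live != indexed_params:
        return f"{func_name} signature drifted: index has {indexed_params}, tree has {live}"
    return None
```

This only catches changes in parameter names and order. A narrowed return type or a changed default would need a richer comparison, such as type annotations or a type checker run against the fresh tree.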

Why "Stale Index" Doesn't Show Up in Reviews
