Cloud Agents Are Rewriting How Software Gets Built

· 7 min read
Tian Pan
Software Engineer

The first time an AI coding agent broke a team's CI pipeline—not by writing bad code, but by generating pull requests faster than GitHub Actions could process them—it became clear something fundamental had shifted. We were no longer talking about a smarter autocomplete. We were talking about a different model of software production entirely.

The arc of AI-assisted coding has moved quickly. Autocomplete tools changed how individuals typed. Local agents changed what a single session could accomplish. Cloud agents are now changing how teams build software—parallelizing work across multiple asynchronous threads, running tests before handing off PRs, and increasingly handling 3-hour tasks while developers sleep or move on to other problems.

This shift has real implications for how you design systems, manage context, and think about the developer's role. Here's what the transition actually looks like in practice.

Three Eras, Three Different Abstractions

It helps to think about AI coding tooling through three phases, each representing a different unit of work:

Era 1: Tab completion — The unit of work is a line or block. The developer drives; the AI suggests. Latency matters a lot. Developers who adopted this heavily saw 30–55% faster task completion on individual coding exercises, though the gains were uneven across experience levels.

Era 2: Local agents — The unit of work is a task, typically resolved in 2–5 minutes. The developer describes intent; the agent iterates. Context window limits create a hard ceiling on what fits in a single session. This is where "chat with your codebase" products lived, and where the first friction with real-world codebases started showing up.

Era 3: Cloud agents — The unit of work is a feature or ticket, resolved over minutes to hours. The agent operates in a full VM, runs its own tests, produces a PR with a short video demonstration, and posts back to Slack. The developer reviews output rather than driving execution. Agents now represent 35% of merged production pull requests in teams that have fully adopted this model.

Each era required a different relationship between human and machine. Cloud agents demand the biggest adjustment: you're no longer the driver; you're a reviewer and strategist.

What Cloud Agents Actually Run On

The infrastructure differences matter more than they might initially seem.

Earlier agent experiments used "blank VM" setups: minimal containers that ran code but lacked the full environment a developer would use. This limited what agents could do—no GUI, no secrets management, no way to preview a running app visually.

Cloud agents now provision full desktop VMs pre-loaded with development tools. The agent gets VNC access to a real desktop, can run a browser, execute terminal commands, upload files, and interact with services. When something breaks, developers can drop into the same VM directly, inspect state, issue corrections, and hand back control. This tight feedback loop—more like pair programming with an async partner than delegating a ticket—is what enables tasks measured in hours rather than minutes.

The other key architectural change is how model selection works. Rather than routing every request to a single model, current systems run best-of-N execution across multiple models in parallel and select the best result. Long-running tasks spawn subagents: faster, cheaper models for exploration and file search; stronger models for implementation decisions. This manages context window limits at agent boundaries rather than trying to compress everything into one session.
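The best-of-N pattern can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation: `call_model` and `score` are hypothetical placeholders standing in for a real model API and a real evaluator (tests passed, lint results, and so on).

```python
# Sketch of best-of-N execution: fan the same prompt out to several models
# in parallel, then keep the highest-scoring candidate.
import concurrent.futures


def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real system would call a model API here.
    return f"{model}: patch for '{prompt}'"


def score(candidate: str) -> int:
    # Placeholder: real scoring would run tests, linters, or a judge model.
    return len(candidate)


def best_of_n(prompt: str, models: list[str]) -> str:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Each model call runs concurrently; results come back in input order.
        candidates = list(pool.map(lambda m: call_model(m, prompt), models))
    return max(candidates, key=score)
```

The same skeleton extends naturally to subagent routing: the `models` list becomes a mix of cheap exploration models and expensive implementation models, chosen per subtask.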

The Parallelism Shift

The most underappreciated change cloud agents bring isn't raw speed—it's parallelism.

The instinct when handed a faster assistant is to expect each individual task to finish sooner. But the real leverage is running many tasks simultaneously. A developer with access to cloud agents doesn't wait longer for individual features; they manage a queue of concurrent work, checking in on each thread every 90–120 seconds rather than spending hours sequentially on each one.
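The queue-of-concurrent-work model maps directly onto async fan-out. The sketch below is illustrative only: `run_agent_task` is a hypothetical stand-in for dispatching a ticket to a cloud agent, with sleep times scaled down so the example runs instantly.

```python
# Sketch of managing concurrent agent tasks: launch everything at once,
# then review results as they complete rather than working sequentially.
import asyncio


async def run_agent_task(name: str, minutes: float) -> str:
    # Placeholder: a real system would dispatch to a cloud agent and
    # await its PR. Sleep is scaled down (minutes -> ms) for the sketch.
    await asyncio.sleep(minutes / 1000)
    return f"PR ready: {name}"


async def manage_queue(tasks: dict[str, float]) -> list[str]:
    running = [asyncio.create_task(run_agent_task(n, m))
               for n, m in tasks.items()]
    # Completions arrive in finish order, not submission order --
    # the developer reviews whichever thread is done first.
    return [await done for done in asyncio.as_completed(running)]


prs = asyncio.run(manage_queue({"auth-refactor": 45, "flaky-test-fix": 12}))
```

Note that total wall-clock time is bounded by the slowest task, not the sum, which is the whole point of the parallelism shift.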

This changes the bottleneck from execution to review. As agents generate code faster, code review becomes the constraint. One internal data point: agent-generated documentation PRs achieve an 82% acceptance rate; new feature PRs come in at 66%—still well above what most developers expect, but lower, reflecting the additional judgment calls involved in greenfield work.

Teams adapting to this have moved review from a line-by-line diff exercise to a multi-layer strategy: watch a 30-second video demonstration of the feature running, then review critical paths at the code level, then run security and performance checks as a final gate. The video layer alone eliminates a surprising amount of back-and-forth on behavioral questions.
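The layered strategy above can be expressed as a simple short-circuiting gate pipeline. This is a hypothetical sketch of the pattern, not a real review tool; the check names and the `pr` fields are invented for illustration.

```python
# Sketch of a multi-layer review gate: cheap behavioral checks first,
# expensive code-level and security review last, stopping at first failure.
def review_pr(pr: dict) -> bool:
    layers = [
        ("video demo approved", lambda p: p["demo_ok"]),
        ("critical paths reviewed", lambda p: p["critical_ok"]),
        ("security/perf checks passed", lambda p: p["scans_ok"]),
    ]
    for name, check in layers:
        if not check(pr):
            print(f"blocked at layer: {name}")
            return False
    return True
```

Ordering matters: putting the 30-second video check first means behavioral misses are caught before anyone spends time on a line-by-line diff.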

Slack becomes the coordination layer. Instead of "are you working on this?" the conversation shifts to "the agent is working on this—do we like the direction?" Teams collaboratively refine prompts as a shared artifact. The change in social dynamics around code ownership is real and worth thinking through deliberately.

The Hard Problems That Remain

Cloud agents are powerful, but the failure modes are specific and worth knowing before you run into them.

Context window exhaustion is the number-one failure mode. Studies on long-horizon coding tasks show over 50% failure rates attributable to context limits, tool budget exhaustion, or agents getting stuck in repetition loops once context exceeds 100k tokens. Agents start repeating prior actions rather than synthesizing new plans. SWE-Bench PRO—a benchmark designed for long-horizon tasks—shows pass rates below 23% even for current top models, compared to much higher scores on shorter-context tasks.

This isn't a model problem that will resolve automatically with bigger context windows. It's an architectural problem. The agents that perform best use structured context management: summarizing completed work before spawning new subagents, keeping active context focused on the current subtask, and explicitly checkpointing state.
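The checkpoint-and-summarize pattern looks roughly like this. Both functions are hypothetical placeholders: a production system would use a cheap model for summarization rather than the naive truncation shown here.

```python
# Sketch of structured context management: compress completed work into a
# checkpoint before spawning a subagent, instead of forwarding the full
# transcript and exhausting the context window.
def summarize(history: list[str], max_items: int = 3) -> str:
    # Placeholder: real systems ask a cheap model to compress history;
    # here we just keep the most recent items.
    return " | ".join(history[-max_items:])


def spawn_subagent(history: list[str], subtask: str) -> list[str]:
    # The subagent starts from a compact checkpoint plus its own subtask,
    # keeping its active context focused rather than inheriting everything.
    checkpoint = summarize(history)
    return [f"checkpoint: {checkpoint}", f"subtask: {subtask}"]
```

The key property is that context grows per subtask, not per session, which is what keeps long-horizon runs from drifting into the repetition loops described above.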

Repository onboarding is slow and fragile. Agents need to know how to start your backend, run your tests, check service health, and navigate your monorepo. This knowledge is usually tribal—in developers' heads or scattered across READMEs. Current onboarding approaches handle initial setup but break when dependencies change, credentials expire, or deployment targets shift. Building an AGENTS.md or equivalent that stays maintained is operational work that most teams haven't done yet.
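A minimal AGENTS.md might look like the sketch below. The commands and service names are invented examples; the point is the shape: exact commands, known pitfalls, and conventions, stated in agent-readable form.

```markdown
# AGENTS.md (illustrative example)

## Setup
- `pnpm install` at repo root (workspaces; do NOT install per-package)

## Run
- `make dev` is broken; start services separately:
  - `pnpm --filter api dev`
  - `pnpm --filter web dev`
  - `docker compose up postgres`

## Test
- `pnpm test` for unit tests; `pnpm e2e` needs postgres running
- Health check: `curl localhost:8080/healthz` should return 200

## Conventions
- All errors wrapped via `lib/errors.ts`; never throw raw strings
```

Treating this file as code—reviewed in PRs, updated when commands change—is what keeps it from rotting the way READMEs do.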

Codebase-specific conventions don't persist. An agent working on your codebase today has no memory of conventions it learned last week. It doesn't know you always use a specific pattern for error handling, or that make dev doesn't work and you need to run three separate services. This knowledge gets re-communicated repeatedly unless you systematically document it in agent-readable form.

Token Economics and What Comes Next

The cost curve matters here. Developer-facing AI tool costs have been declining, but cloud agents with hours-long tasks and best-of-N model routing consume dramatically more tokens than autocomplete did. Token spend per developer could move from tens of dollars monthly to hundreds or thousands as agent leverage scales up.

There's a Jevons Paradox dynamic: as agents get cheaper and more capable, teams use them more, not less. Lower cost-per-line doesn't mean lower total cost—it means more lines get written. Teams planning infrastructure spend should model this curve explicitly rather than assuming AI tooling costs stay flat.
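Modeling the curve explicitly can be as simple as a few multiplications. Every number below is an illustrative assumption, not real pricing or real usage data.

```python
# Toy cost-curve model for per-developer monthly token spend.
# All inputs are illustrative assumptions, not actual vendor pricing.
def monthly_token_cost(tasks_per_dev: int, tokens_per_task: int,
                       best_of_n: int, usd_per_mtok: float) -> float:
    total_tokens = tasks_per_dev * tokens_per_task * best_of_n
    return total_tokens / 1_000_000 * usd_per_mtok


# Autocomplete era: many tiny completions, single model.
autocomplete = monthly_token_cost(2000, 500, 1, 3.0)        # $3.00/month
# Cloud-agent era: fewer but much larger tasks, best-of-3 routing.
agents = monthly_token_cost(60, 2_000_000, 3, 3.0)          # $1080.00/month
```

Even with these rough assumptions, the two-to-three-order-of-magnitude gap between eras falls straight out of the arithmetic, which is why flat-budget planning breaks.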

Looking forward, the trajectory is toward more autonomy with tighter verification loops. The current pattern—human writes spec, agent implements, human reviews—will likely shift toward agents that handle spec refinement, parallelized implementation across hypotheses, and automated verification against acceptance criteria. The human role increasingly resembles a product and architecture function: providing taste, setting constraints, making judgment calls that require understanding user context.

The teams building well on this are thinking about it as infrastructure work, not just tool adoption. Invest in agent-readable documentation. Build CI/CD that can handle parallelized PR generation without breaking. Establish a review process that scales with throughput. And accept that the skills for working effectively with cloud agents—prompt clarity, system decomposition, verification design—are different from the skills that made you productive with tab completion.

The era of AI as a typing accelerator is ending. The era of AI as a parallel collaborator has already started.
