I have been running multi-agent coding workflows for the past four months, starting with Cursor’s multi-file editing and recently experimenting with Claude Code running parallel tasks. I want to share what I have actually experienced versus what the headlines claim, because the gap is significant.
What 12,000 Lines Per Day Actually Means (And Does Not Mean)
First, yes, it is possible to generate enormous quantities of code with parallel agents. On my best day, I generated roughly 8,000 lines across multiple agents working on a new React + Node.js application. But here is the context that matters:
- About 3,000 of those lines were boilerplate: type definitions, component scaffolding, API route handlers following established patterns
- About 2,500 were test files that the agents generated to match the implementations
- About 1,500 were actual business logic implementations
- About 1,000 were configuration, imports, and plumbing
Of the 1,500 lines of actual business logic, approximately 400 needed significant revision after review. The agents got the general approach right but missed edge cases, made incorrect assumptions about business rules, or chose suboptimal data structures.
So the “true” output for that day was roughly 1,100 lines of production-quality business logic plus 6,500 lines of correct-but-mundane supporting code. That is genuinely impressive – it would take me 3-4 days to produce the same output manually. But it is not the 10x multiplier the marketing implies.
What Actually Works Well
Scaffolding and boilerplate: Multi-agent systems excel at generating repetitive code that follows patterns. CRUD endpoints, database models, form components, test setups – anything where the pattern is well-established and the variation is minimal.
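To make concrete what "well-established pattern, minimal variation" means, here is a sketch of the kind of generic CRUD code agents scaffold reliably. This is an illustrative in-memory store, not from any specific framework; the names (`createCrudStore`, `Entity`) are mine:

```typescript
// A minimal in-memory CRUD store: the pattern is fixed, only the
// entity type varies - exactly the shape of work agents scaffold well.
interface Entity { id: number }

function createCrudStore<T extends Entity>() {
  const items = new Map<number, T>();
  let nextId = 1;
  return {
    create(data: Omit<T, "id">): T {
      // Assign a fresh id and store the record.
      const item = { ...data, id: nextId++ } as unknown as T;
      items.set(item.id, item);
      return item;
    },
    read(id: number): T | undefined {
      return items.get(id);
    },
    update(id: number, data: Partial<Omit<T, "id">>): T | undefined {
      const existing = items.get(id);
      if (!existing) return undefined;
      const updated = { ...existing, ...data } as T;
      items.set(id, updated);
      return updated;
    },
    remove(id: number): boolean {
      return items.delete(id);
    },
  };
}

interface User extends Entity { name: string; email: string }
const users = createCrudStore<User>();
const alice = users.create({ name: "Alice", email: "alice@example.com" });
```

Every entity in an application needs some version of this, which is why parallel agents can churn out thousands of such lines that are genuinely correct.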
Test generation: When you have a clear function signature and documented behavior, agents generate excellent test suites. I had an agent produce 400 lines of property-based tests for a data validation module that caught two bugs I had missed in my specification.
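For readers unfamiliar with property-based testing, here is a dependency-free sketch of the idea. Agent-generated suites would normally use a library such as fast-check; the validator (`normalizeEmail`) and the properties checked are my own illustrative choices, not the module from my project:

```typescript
// Hand-rolled property-based check for a hypothetical email normalizer.
// Instead of fixed examples, we generate many random inputs and assert
// invariants that must hold for all of them.
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

function randomEmail(rng: () => number): string {
  const chars = "abcXYZ123";
  const pick = (n: number) =>
    Array.from({ length: n }, () => chars[Math.floor(rng() * chars.length)]).join("");
  const pad = " ".repeat(Math.floor(rng() * 3)); // random leading whitespace
  return `${pad}${pick(5)}@${pick(4)}.com`;
}

// Properties: normalization is idempotent, and the output never has
// surrounding whitespace or uppercase letters.
function checkProperties(runs: number): boolean {
  // Deterministic Park-Miller pseudo-random generator for repeatability.
  let seed = 42;
  const rng = () => {
    seed = (seed * 48271) % 2147483647;
    return seed / 2147483647;
  };
  for (let i = 0; i < runs; i++) {
    const input = randomEmail(rng);
    const once = normalizeEmail(input);
    if (normalizeEmail(once) !== once) return false; // idempotence
    if (once !== once.trim()) return false;          // no stray whitespace
    if (once !== once.toLowerCase()) return false;   // fully lowercased
  }
  return true;
}
```

Tests in this style are how the agent-generated suite caught bugs in my specification: random inputs exercise combinations no hand-written example list would cover.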
Code migration: Moving code between frameworks or updating API versions across many files. This is tedious human work that agents handle well because the transformation rules are consistent.
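The "consistent transformation rules" point can be made concrete with a toy codemod. Real migrations use AST-based tools (e.g. jscodeshift) rather than regexes, and the renames below are hypothetical, but the shape of the work is the same:

```typescript
// A toy codemod: rewrite call sites from deprecated API names to their
// replacements. The rename table is illustrative, not a real API.
const RENAMES: Record<string, string> = {
  "fetchUserSync": "fetchUser",
  "request.get": "httpClient.get",
};

function migrateSource(source: string): string {
  let out = source;
  for (const [oldName, newName] of Object.entries(RENAMES)) {
    // Escape regex metacharacters in the old name, then match it only
    // when immediately followed by an opening parenthesis (a call site).
    const escaped = oldName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    out = out.replace(new RegExp(`${escaped}\\(`, "g"), `${newName}(`);
  }
  return out;
}

const migrated = migrateSource(`const u = fetchUserSync(id);\nhandler(request.get(url));`);
```

Because the rule is mechanical, an agent can apply it across hundreds of files without fatigue, which is precisely where humans make transcription mistakes.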
Documentation: Generating JSDoc comments, README files, and API documentation from code. Agents are remarkably good at reading code and describing what it does.
What Does Not Work Well
Complex state management: Anything involving distributed state, race conditions, or complex lifecycle management. Agents consistently generate code that works in happy-path scenarios but fails under concurrent access or error conditions.
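The failure mode is usually a check-then-act sequence with an await in between. Here is a distilled sketch of the pattern (the account example and names are mine, not code from my project):

```typescript
// Happy-path code that fails under concurrency: the balance check and
// the subtraction are separated by an await, so the check can go stale.
let balance = 100;

async function withdraw(amount: number): Promise<boolean> {
  if (balance < amount) return false; // check...
  await Promise.resolve();            // ...simulated async I/O yields here...
  balance -= amount;                  // ...then act on a possibly stale check
  return true;
}

// Two concurrent withdrawals of 80 both pass the check before either
// subtracts, so the account goes negative - a lost-update bug that no
// sequential test would catch.
async function demo(): Promise<{ ok: boolean[]; balance: number }> {
  balance = 100;
  const ok = await Promise.all([withdraw(80), withdraw(80)]);
  return { ok, balance };
}
```

A sequential test calling `withdraw` twice in a row passes, which is exactly why these bugs survive agent-generated test suites and surface only in production.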
Cross-module integration: When a change requires understanding how multiple modules interact, agents working independently often produce code that is internally consistent but fails at integration points. This is the architectural coherence problem that others have discussed.
Implicit business rules: Requirements that are not explicitly documented – the kind of knowledge that lives in senior developers’ heads. Agents cannot infer what they have not been told, and in complex business domains, the undocumented rules are often the most important ones.
Performance-sensitive code: Agents rarely optimize for performance unless specifically instructed. The generated code is correct but may use O(n^2) algorithms where O(n log n) solutions exist, or make unnecessary database queries that are fine at small scale but fail under load.
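A minimal illustration of the complexity problem, using duplicate detection as the stand-in task. Both versions are correct and indistinguishable on test-sized inputs; only the second survives production-scale data:

```typescript
// The quadratic version agents tend to write: correct, passes tests,
// but O(n^2) comparisons.
function hasDuplicatesQuadratic(items: number[]): boolean {
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) return true;
    }
  }
  return false;
}

// The linear version: one pass with a Set, O(1) expected lookups.
function hasDuplicatesLinear(items: number[]): boolean {
  const seen = new Set<number>();
  for (const item of items) {
    if (seen.has(item)) return true;
    seen.add(item);
  }
  return false;
}
```

On a 100-element test fixture the two are indistinguishable; on a million-element production dataset the quadratic version does roughly half a trillion comparisons. That gap is invisible to correctness-only review, which is why it has to be caught by a human who knows the expected data scale.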
My Practical Workflow
Here is the workflow I have settled on after four months of iteration:
- I write the specification and tests first (30-60 minutes): Clear function signatures, expected behavior, edge cases. This is the highest-leverage human work.
- Agents implement against my specs (10-20 minutes): I run 3-5 parallel agents, each working on a different module or component. More than 5 creates diminishing returns due to merge conflicts and context management overhead.
- I review and integrate (45-90 minutes): Read each agent’s output, verify against specs, resolve conflicts, fix integration issues.
- Agents handle follow-up work (15-30 minutes): Documentation, additional test cases, code style cleanup.
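To show what the spec-first step actually produces, here is a sketch: a precise signature with its contract spelled out before any agent runs. The function (`applyDiscount`) and its rules are hypothetical; in the real workflow the implementation body would come from the agent, but one is included here so the example runs:

```typescript
/**
 * Spec written before handing off to an agent:
 * apply a percentage discount to a price in cents.
 * - percent outside [0, 100] throws a RangeError
 * - the result is rounded to the nearest cent
 * - a 0% discount returns the price unchanged
 */
function applyDiscount(priceCents: number, percent: number): number {
  // Reference implementation satisfying the spec above; in practice
  // this body is what the agent writes against the contract.
  if (percent < 0 || percent > 100) {
    throw new RangeError(`percent out of range: ${percent}`);
  }
  return Math.round(priceCents * (1 - percent / 100));
}
```

The edge cases in the doc comment (range check, rounding, identity at 0%) are exactly the assumptions agents get wrong when left implicit, which is why writing them down is the highest-leverage 30-60 minutes of the cycle.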
Total cycle: 2-3 hours for what would previously take 1-2 days. That is a 3-4x productivity multiplier – real and valuable, but well short of the 10x that gets quoted in articles.
The Honest Assessment
Multi-agent coding is a genuine productivity improvement for experienced developers who know what they want to build and can write clear specifications. It is not a replacement for engineering judgment, architectural thinking, or domain expertise.
The DORA finding of 9% more bugs is consistent with my experience: agents introduce subtle errors that pass tests but manifest in production. The bugs are different from human bugs – they tend to be logic errors from incorrect assumptions rather than typos or off-by-one mistakes.
The question for every developer and team is: does the productivity gain outweigh the quality cost and the investment in reviewing agent output? For me, the answer is yes for certain types of work and no for others.
What are other practitioners actually experiencing? I would love to hear from people who have been doing this for more than a month, not just weekend projects.