
10 posts tagged with "coding-agents"


The Unmergeable Agentic Refactor: Why Multi-File Diffs Break at the Seam

· 9 min read
Tian Pan
Software Engineer

A 40-file refactor from a coding agent lands on your desk. You open the PR, scroll through the diff, and every hunk looks fine. The rename is consistent, the imports are tidy, the tests compile in isolation. You merge. Forty minutes later, CI on main goes red because two call sites in a sibling package still pass three arguments to a function that now takes four, and the type checker that would have caught it was never part of the agent's inner loop.

This is the most common failure mode in agent-authored refactors today, and it has almost nothing to do with the quality of the individual edits. Each file, reviewed on its own, looks like something a careful human would have written. The bug lives at the seams — the boundaries where edits from different files have to agree. File-level review hides seam-level correctness, and most review workflows were designed around files.
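To make the seam concrete, here is a minimal Python sketch with hypothetical function and package names. In the real PR the two functions would live in different packages; each file reads cleanly on its own, and only a repo-wide type check or a CI run on main notices the arity mismatch.

```python
# Hypothetical sketch of a seam failure. In the actual refactor these two
# functions live in different packages; they are shown together only to keep
# the snippet self-contained.

def compute_total(subtotal: float, tax_rate: float, discount: float,
                  currency: str) -> str:
    """Refactored by the agent: gained a `currency` parameter.

    Every call site in its own package was updated, so this file reviews
    cleanly in isolation.
    """
    total = (subtotal - discount) * (1 + tax_rate)
    return f"{total:.2f} {currency}"


def monthly_summary(subtotal: float) -> str:
    # Call site in a sibling package the refactor never visited: it still
    # passes three arguments. A repo-wide mypy run or CI on main catches it;
    # reviewing either file alone does not.
    return compute_total(subtotal, 0.08, 5.0)  # TypeError: missing `currency`
```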

AI Coding Agents on Legacy Codebases: What Works and What Backfires

· 10 min read
Tian Pan
Software Engineer

Most AI coding demos show an agent building a greenfield Todo app or implementing a clean API from scratch. Your codebase, however, is a fifteen-year-old monolith with undocumented implicit contracts, deprecated dependencies that three teams depend on in ways nobody fully understands, and a service layer that started as a single class and now spans forty files. The gap between demo and reality is not just a size problem — it's a structural one, and understanding it before you hand your agents the keys prevents a specific category of subtle, expensive failures.

AI coding agents genuinely help with legacy systems, but only within certain task boundaries. Outside those boundaries, they don't just fail noisily — they produce plausible-looking, syntactically valid, semantically wrong changes that slip through code review and surface in production.

The AI-Generated Code Maintenance Trap: What Teams Discover Six Months Too Late

· 11 min read
Tian Pan
Software Engineer

The pattern is almost universal across teams that adopted coding agents in 2023 and 2024. In month one, velocity doubles. In month three, management holds up the productivity metrics as evidence that AI investment is paying off. By month twelve, the engineering team can't explain half the codebase to new hires, refactoring has become prohibitively expensive, and engineers spend more time debugging AI-generated code than they would have spent writing it by hand.

This isn't a story about AI code being secretly bad. It's a story about how the quality characteristics of AI-generated code systematically defeat the organizational practices teams already had in place — and how those practices need to change before the debt compounds beyond recovery.

Coding Agents in the Monorepo: Why Context Windows and 50-Service Repos Don't Mix

· 9 min read
Tian Pan
Software Engineer

Here's a failure mode that happens silently: you ask a coding agent to update the authentication service's token refresh endpoint. The agent produces clean-looking code — confident, well-commented, type-safe. It also calls a method whose signature was renamed three months ago in a shared library three directories up. The tests for that endpoint pass because the mock still uses the old signature. The bug surfaces in staging when the real library gets pulled in.

This isn't a hallucination in the abstract sense. The model knew about that method — it existed somewhere in the training data or was briefly visible in context. The problem is architectural: the agent never had access to the current version of the interface it was calling.
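A minimal sketch of how a mocked test hides the stale call, using hypothetical names for the handler and the shared client. A Mock accepts any attribute access, so the old method name "exists" in the test even though the real library removed it.

```python
from unittest import mock

# Hypothetical sketch: `client` stands in for a shared auth library whose
# `refresh(token)` method was renamed to `refresh_token(token, *, scope=...)`
# three months ago. The agent-generated handler still calls the old name.

def handle_refresh(client, token: str) -> str:
    # Stale call: the current library no longer exposes `.refresh`.
    return client.refresh(token)

def test_handle_refresh() -> None:
    # The test mocks the client, and a Mock happily serves any attribute,
    # so the renamed method appears to exist and the test passes in CI.
    client = mock.Mock()
    client.refresh.return_value = "new-token"
    assert handle_refresh(client, "stale") == "new-token"

test_handle_refresh()  # passes; against the real library this raises AttributeError
```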

The Deprecated API Trap: Why AI Coding Agents Break on Library Updates

· 10 min read
Tian Pan
Software Engineer

Your AI coding agent just generated a pull request. The code looks right. It compiles. Tests pass. You merge it. Two days later, your CI pipeline in staging starts throwing AttributeError: module 'openai' has no attribute 'ChatCompletion'. The agent used an API pattern that was deprecated a year ago and removed in the latest major version.

This is the deprecated API trap, and it bites teams far more often than the conference talks about AI code quality suggest. An empirical study evaluating seven frontier LLMs across 145 API mappings found that most models exhibit API Usage Plausibility (AUP) below 30% across popular Python libraries. When explicitly given deprecated context, all tested models demonstrated 70–90% deprecated usage rates. The problem is structural, not a quirk of a particular model or library.
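The error above maps to the openai Python SDK's 1.0 rewrite, which removed the module-level ChatCompletion class in favor of a client object. A sketch of the two patterns, with an illustrative model name:

```python
# What the agent tends to generate (valid for openai < 1.0, removed in 1.x):
#
#   import openai
#   response = openai.ChatCompletion.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": "hello"}],
#   )
#   text = response["choices"][0]["message"]["content"]
#
# What the current 1.x client expects instead:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # model name is illustrative
    messages=[{"role": "user", "content": "hello"}],
)
text = response.choices[0].message.content
```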

Machine-Readable Project Context: Why Your CLAUDE.md Matters More Than Your Model

· 8 min read
Tian Pan
Software Engineer

Most teams that adopt AI coding agents spend the first week arguing about which model to use. They benchmark Opus vs. Sonnet vs. GPT-4o on contrived examples, obsess over the leaderboard, and eventually pick something. Then they spend the next three months wondering why the agent keeps rebuilding the wrong abstractions, ignoring their test strategy, and repeatedly asking which package manager to use.

The model wasn't the problem. The context file was.

Every AI coding tool — Claude Code, Cursor, GitHub Copilot, Windsurf — reads a project-specific markdown file at the start of each session. These files go by different names: CLAUDE.md, .cursor/rules/, .github/copilot-instructions.md, AGENTS.md. But they share the same purpose: teaching the agent what it cannot infer from reading the code alone. The quality of this file now predicts output quality more reliably than the model behind it. Yet most teams write them once, badly, and never touch them again.

Measuring Real AI Coding Productivity: The Metrics That Survive the 90-Day Lag

· 9 min read
Tian Pan
Software Engineer

Most teams adopting AI coding tools hit the same wall. Month one looks like a success story: PR throughput is up, sprint velocity is climbing, and the engineering manager is putting together a slide deck to share with leadership. By month three, something has quietly gone wrong. Incidents creep up. Senior engineers are spending more time in review. A simple bug fix now requires understanding code nobody on the team actually wrote. The productivity gains have evaporated — but the measurement system never caught it.

The problem is that the metrics most teams reach for first — lines generated, PRs merged, story points burned — are the wrong unit of measurement for AI-assisted development. They measure the cost of producing code, not the cost of owning it. And AI has made production nearly free while leaving ownership costs untouched.

Agentic Coding in Production: What SWE-bench Scores Don't Tell You

· 11 min read
Tian Pan
Software Engineer

When a frontier model scores 80% on SWE-bench Verified, it sounds like a solved problem. Four out of five real GitHub issues, handled autonomously. Ship it to your team. Except: that same model, on SWE-bench Pro — a benchmark specifically designed to resist contamination with long-horizon tasks from proprietary codebases — scores 23%. And a rigorous controlled study of experienced developers found that using AI coding tools made them 19% slower, not faster.

These numbers aren't contradictions. They're the gap between what benchmarks measure and what production software engineering actually requires. If you're building or buying into agentic coding tools, that gap is the thing worth understanding.

CLAUDE.md and AGENTS.md: The Configuration Layer That Makes AI Coding Agents Actually Follow Your Rules

· 9 min read
Tian Pan
Software Engineer

Your AI coding agent doesn't remember yesterday. Every session starts cold — it doesn't know that you use yarn, not npm; that you ban TypeScript's any type; or that the src/generated/ directory is sacred and should never be edited by hand. So it generates code with the wrong package manager, introduces any where you've banned it, and occasionally overwrites generated files you'll spend an hour recovering. You correct it. Tomorrow it makes the same mistake. You correct it again.

This is not a model quality problem. It's a configuration problem — and the fix is a plain Markdown file.

CLAUDE.md, AGENTS.md, and their tool-specific cousins are the briefing documents AI coding agents read before every session. They encode what the agent would otherwise have to rediscover or be corrected on: which commands to run, which patterns to avoid, how your team's workflow is structured, and which directories are off-limits. They're the equivalent of a thorough engineering onboarding document, compressed into a form optimized for machine consumption.
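A minimal sketch of what such a briefing file might contain for the team described above. The commands, paths, and rules are illustrative, not prescriptive:

```markdown
# CLAUDE.md — project conventions (illustrative sketch)

## Commands
- Install dependencies with `yarn`, never `npm` or `pnpm`.
- Run tests with `yarn test`; type-check with `yarn tsc --noEmit` before committing.

## Code style
- TypeScript only. Do not introduce `any`; prefer `unknown` plus narrowing.

## Off-limits
- Never edit files under `src/generated/`; they are produced by codegen
  and will be overwritten on the next build.
```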

The 80% Problem: Why AI Coding Agents Stall and How to Break Through

· 10 min read
Tian Pan
Software Engineer

A team ships 98% more pull requests after adopting AI coding agents. Sounds like a success story — until you notice that review times grew 91% and PR sizes ballooned 154%. The code was arriving faster than anyone could verify it.

This is the 80% problem. AI coding agents are remarkably good at generating plausible-looking code. They stall, or quietly fail, when the remaining 20% requires architectural judgment, edge case awareness, or any feedback loop more sophisticated than "did it compile?" The teams winning with coding agents aren't the ones who prompted most aggressively. They're the ones who built better feedback loops, shorter context windows, and more deliberate workflows.