Agentic Engineering: Build Your Own Software Pokémon Army

Trainer Profile

Tian Pan

Started companies, shipped at scale,
didn't study CS in college.

TianPan.co →

Uber

Brex

Founder

Airbnb

Yale MS

"Have you ever had no idea what you were doing — and did it anyway?"

That's basically my entire career.

Not a CS major. Management undergrad in Beijing → Yale CS → built things used by hundreds of millions. That path is your advantage, not your disadvantage.

Trainer Stats

22k monthly users on his startup
90M+ users on systems he built
30x speed improvement shipped at Uber
$2B company value helped grow from $0

A Trainer's Journey in 8 Stages

From solo grinder to gym leader. How one trainer builds an autonomous army.

Act I · Departure

Solo grind → First catch

→

Act II · Initiation

Build party → Team battle → Wipeout

→

Act III · The Return

Rebuild → Gym leader

A lone trainer grinding at a single computer

Act I · Departure

1. The Ordinary World

The Solo Grind

You are a lone trainer. No Pokémon, no party. Just you, grinding XP manually, line by line.

100–150 lines of code per day. That's a senior engineer. The rest is meetings, reviews, debugging, context-switching.
The 10x engineer was the ceiling. Beyond that, output scaled linearly with headcount.
Knowledge lived in heads. Architecture decisions locked in Slack threads, lost wikis, and tenured engineers who left.
The bottleneck was always people. Not compute, not infrastructure, not ideas — people.

From experience

At Uber, the hardest part of any task was never writing the code. It was the research and design phase — figuring out where and what to change. When the codebase is massive, the documentation is lost, and the previous maintainer is gone, you spend 80% of your time building a mental model of a system someone else built years ago.

Act I · Departure

2. The Call to Adventure

Rare Candy Arrives

Someone hands you a Rare Candy after a lifetime of manual grinding. Copilot. Cursor. Windsurf. The XP gains are real:

+26% merged PRs across 4,000+ devs (MS/Accenture RCT)
10x faster on file migrations (Cognition/Devin)
+35% for junior developers (seniors: +8–16%)

Rare Candy doesn't change what you can build. The ceiling is still you. Remove yourself from the battle. Start training a team.

A trainer receiving a glowing Rare Candy

Trainer catching their first AI agent Pokémon

Act II · Initiation

3. Crossing the Threshold

Your First Pokémon

You write a spec. Hand it to an autonomous agent. Walk away. Come back to a pull request. You just caught your first Pokémon.

	Rare Candy (Cursor)	Pokémon (Claude Code)
Input	Your cursor position + surrounding code	A goal: "Add Stripe billing to this app"
Output	Code suggestion	Plan → code → tests → iterate until done
Who drives?	You	The agent, within your guardrails
Key skill	Prompt engineering	Context engineering

My workflow

I always start in Plan Mode. The agent analyzes the codebase, proposes an approach. I review the plan, adjust, then say "execute." And one rule: "You debug it yourself. I only want results." The agent curls the API, reads the logs, writes tests to prove it's correct.

Act II · Initiation

The Skill Book

Context Engineering in Practice

The model is the engine. The context is the skill book. What you teach your Pokémon determines how it fights.

Specs

Write clear specifications with acceptance criteria before the agent touches code. The spec is your highest-leverage artifact.
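
One way to make "spec with acceptance criteria" concrete is to treat the spec as structured data and render it to text for the agent. This is a minimal sketch under stated assumptions: the `Spec` class and its field names are hypothetical, and the acceptance criteria shown are invented examples, not the speaker's actual spec.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """A hypothetical one-page spec; the field names are illustrative."""
    goal: str
    acceptance_criteria: list[str]
    out_of_scope: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the spec as plain text an agent can read."""
        if not self.acceptance_criteria:
            raise ValueError("a spec needs at least one acceptance criterion")
        lines = [f"Goal: {self.goal}", "", "Acceptance criteria:"]
        lines += [f"{i}. {c}" for i, c in enumerate(self.acceptance_criteria, 1)]
        if self.out_of_scope:
            lines += ["", "Out of scope:"]
            lines += [f"- {item}" for item in self.out_of_scope]
        return "\n".join(lines)


spec = Spec(
    goal="Add Stripe billing to this app",
    acceptance_criteria=[
        "A signed-in user can start a subscription",
        "All existing tests still pass",
    ],
    out_of_scope=["Refund flows"],
)
print(spec.to_prompt())
```

Refusing to render a spec with no acceptance criteria is a deliberate choice here: without them the agent has no definition of done.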

Codebase

Structure your codebase so agents can navigate it — clear file naming, module boundaries, up-to-date docs. The agent reads your code like a new hire.

Feedback Signals

Tests, type checkers, linters. The Pokémon needs to know when it's wrong. Without feedback, it confidently produces garbage.
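
The feedback signals above can be sketched as a small runner that executes each check and turns failures into text the agent can read. The commands in `CHECKS` are stand-ins so the sketch runs anywhere; in a real repo you would swap in your own test runner, type checker, and linter. `CheckResult`, `run_check`, and `feedback_report` are hypothetical names.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_check(name: str, command: list[str]) -> CheckResult:
    """Run one check command and capture its combined output."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)


def feedback_report(results: list[CheckResult]) -> str:
    """Summarize failures as text to hand back to the agent."""
    failed = [r for r in results if not r.passed]
    if not failed:
        return "All checks passed."
    return "\n\n".join(f"[{r.name}] FAILED\n{r.output.strip()}" for r in failed)


# Stand-in commands (assumptions): one check that passes, one that fails.
CHECKS = {
    "tests": [sys.executable, "-c", "print('ok')"],
    "lint": [sys.executable, "-c", "import sys; print('unused import'); sys.exit(1)"],
}

results = [run_check(name, cmd) for name, cmd in CHECKS.items()]
print(feedback_report(results))
```

The report only includes failing checks, so the agent's context is spent on what is wrong rather than on what already works.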

"The quality of an agent often depends less on the model itself and more on how its context is structured and managed."

— Anthropic (Sept 2025)

Act II · Initiation

4. The Road of Trials

Training the Inspection Line

Your Pokémon did something. Half the tests fail. A Pokémon without self-healing just creates chaos at scale.

while not all_tests_pass:
    agent writes / fixes code
    → run tests (unit, integration, typecheck, lint)
    → agent reads error output
    → agent diagnoses and fixes
    → repeat
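
The loop above can be sketched in runnable Python with the agent and the test suite passed in as plain callables. This is a sketch, not the speaker's implementation: `self_heal`, `fake_fix`, and `fake_checks` are hypothetical names, and the `max_rounds` budget is an added assumption so a stuck agent cannot loop forever.

```python
from typing import Callable


def self_heal(
    fix: Callable[[str], None],
    run_checks: Callable[[], str],
    max_rounds: int = 5,
) -> bool:
    """Loop until checks pass or the round budget runs out.

    run_checks returns "" on success, or the error output on failure.
    fix receives the latest error output ("" on the first round).
    """
    errors = ""
    for _ in range(max_rounds):
        fix(errors)            # agent writes or fixes code
        errors = run_checks()  # unit, integration, typecheck, lint
        if not errors:
            return True
    return False


# Toy stand-ins: each fix call that sees an error removes one pending bug.
bugs = ["TypeError in billing.py", "lint: unused import"]


def fake_fix(errors: str) -> None:
    if errors and bugs:
        bugs.pop(0)


def fake_checks() -> str:
    return bugs[0] if bugs else ""


print(self_heal(fake_fix, fake_checks))
```

Returning `False` when the budget runs out gives the trainer a clear signal to step in instead of burning tokens on a loop that is not converging.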

My inspection line

The agent must write its own integration and unit tests. For the backend: curl the actual API and verify responses. For the frontend: Playwright MCP — the agent opens a real browser, navigates the UI, and verifies things render and behave correctly. The agent doesn't just write code. It proves the code works.
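
As a rough illustration of the "curl the actual API and verify responses" step, the sketch below hits an endpoint and compares the JSON body to expected fields. Everything here is an assumption for demonstration: the `/health` endpoint, its payload, and the `check_endpoint` helper are hypothetical, and a throwaway local server stands in for a real backend.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore any proxy settings so the local demo request goes straight through.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_endpoint(url: str, expected: dict) -> list[str]:
    """Return a list of mismatches between the JSON response and expected."""
    with _opener.open(url, timeout=5) as resp:
        status = resp.status
        body = json.loads(resp.read().decode("utf-8"))
    problems = []
    if status != 200:
        problems.append(f"status {status} != 200")
    for key, want in expected.items():
        if body.get(key) != want:
            problems.append(f"{key}: got {body.get(key)!r}, want {want!r}")
    return problems


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint standing in for a real backend."""

    def do_GET(self):
        payload = json.dumps({"status": "ok", "version": 2}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/health"

ok = check_endpoint(url, {"status": "ok"})
bad = check_endpoint(url, {"version": 3})
server.shutdown()
server.server_close()
print(ok, bad)
```

An empty list means the response matched; a non-empty list is the kind of concrete evidence an agent can act on without asking a human to look.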

Act II · Initiation

5. Assembling the Alliance

Assembling the Party

One Pokémon handles one task. You need specialized party members with combo moves.

MCP — The Item Bag

Agent-to-tool communication. Any Pokémon can use any item, API, or data source. Anthropic-originated. 97M+ monthly SDK downloads. MCP gives your Pokémon hands.

Claude Skills — The Move Set

Custom slash commands (/today, /blog, /ci) encode repeatable combo moves. CLAUDE.md is the trainer's manual every Pokémon reads on startup. Hooks trigger before/after agent actions — your inspection gates in code.

A shared CLAUDE.md + custom skills + MCP tools gets you 90% of the way.
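
To make "inspection gates in code" concrete, here is a sketch of the decision logic a pre-action hook script might run. Assumptions to verify against the current Claude Code hooks documentation: that the hook receives a JSON payload with `tool_name` and `tool_input` fields, and that exit code 2 blocks the action. The deny-list and the `review_tool_call` function are hypothetical.

```python
# Hypothetical deny-list; tune it to your own repo.
BLOCKED_SNIPPETS = ("rm -rf", "git push --force", "DROP TABLE")


def review_tool_call(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one proposed tool call.

    Assumed payload shape: {"tool_name": ..., "tool_input": {"command": ...}}.
    Exit code 0 allows the call; 2 is assumed to block it.
    """
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for snippet in BLOCKED_SNIPPETS:
        if snippet in command:
            return 2, f"Blocked by inspection gate: contains {snippet!r}"
    return 0, ""


print(review_tool_call({"tool_name": "Bash", "tool_input": {"command": "rm -rf build"}}))
print(review_tool_call({"tool_name": "Bash", "tool_input": {"command": "pytest -q"}}))
```

In a real hook script you would read the payload from standard input, print the message to standard error, and exit with the returned code. Keeping the decision in a pure function makes the gate itself testable.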

A party of specialized robot-Pokémon in formation

Multiple Pokémon fainted in a battle failure

Act II · The Abyss

6. The Abyss

Total Party Wipeout

What this actually looks like

The most dangerous failure isn't the loud one. I had a coding agent make changes that passed all existing tests, looked correct in review, and shipped. Days later, I discovered the changes had broken a subtle invariant that no test covered. The code failed silently. No error logs. No crash. Just wrong behavior that took days to trace back.

41–87% failure rate across 7 multi-agent frameworks (NeurIPS 2025).
36.9% of failures are coordination breakdowns.
14 distinct failure modes. Root cause: party design.

The nightmare: a Pokémon that produces defective results that pass inspection. Your inspection line has blind spots, and the Pokémon will find every single one.
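
A toy illustration of this blind spot, not the actual incident described above: a change that passes the existing example-based test while breaking an invariant no test covers. The bill-splitting functions and the `find_violation` helper are hypothetical. The invariant is that the shares must always sum to the total.

```python
import random


def split_evenly_buggy(total_cents: int, n: int) -> list[int]:
    """Hypothetical agent change: looks right, silently drops the remainder."""
    return [total_cents // n] * n


def split_evenly(total_cents: int, n: int) -> list[int]:
    """Spread the remainder so the shares always sum to the total."""
    base, extra = divmod(total_cents, n)
    return [base + 1 if i < extra else base for i in range(n)]


def find_violation(split, trials: int = 200, seed: int = 0):
    """Search random inputs for a case where shares do not sum to the total."""
    rng = random.Random(seed)
    for _ in range(trials):
        total, n = rng.randint(0, 10_000), rng.randint(1, 12)
        if sum(split(total, n)) != total:
            return total, n
    return None


# The existing example-based test passes for both versions.
assert split_evenly_buggy(100, 4) == [25, 25, 25, 25]
assert split_evenly(100, 4) == [25, 25, 25, 25]

# The invariant check is what tells them apart.
print(find_violation(split_evenly_buggy) is not None)
print(find_violation(split_evenly))
```

Checking an invariant over many generated inputs is one way to close gaps that hand-picked examples leave open. It does not remove blind spots, it only shrinks them.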

Act II · The Abyss

The Scaling Study

More Pokémon ≠ Better

Google DeepMind + MIT tested 180 configurations across 5 architectures (arXiv 2512.08296):

+80.9%: a centralized trainer improves parallelizable work.
−39% to −70%: ALL multi-party setups degraded sequential work.
4 Pokémon is the saturation point.

Error Amplification

Uncoordinated party: errors amplify 17.2x
Centralized trainer: contained to 4.4x

The Lesson

Don't add Pokémon. Add the right Pokémon.
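
The study's takeaways on this slide can be read as a sizing rule. The sketch below is only an encoding of the numbers quoted above (centralized coordination for parallelizable work, a single agent for sequential work, saturation at 4); `plan_party` and `Plan` are hypothetical names, not anything from the paper.

```python
from dataclasses import dataclass

SATURATION_POINT = 4  # party size where the slide says gains level off


@dataclass
class Plan:
    topology: str
    agents: int


def plan_party(parallelizable: bool, independent_subtasks: int) -> Plan:
    """Pick a party shape from the task's structure, not from ambition."""
    if not parallelizable or independent_subtasks <= 1:
        # Sequential work degraded under every multi-agent setup.
        return Plan("single-agent", 1)
    # Parallel work: one coordinator, capped at the saturation point.
    return Plan("centralized", min(independent_subtasks, SATURATION_POINT))


print(plan_party(parallelizable=False, independent_subtasks=6))
print(plan_party(parallelizable=True, independent_subtasks=10))
```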

Trainer carefully rebuilding with 4 organized stations

Act III · The Return

7. Resurrection

Rebuilding with Constraints

No single architecture consistently wins (SWE-Bench, 80 approaches). But four principles survived the wipeout:

1. Inspection > Production

Better inspection gates, not stronger Pokémon.

2. Context > Model

Better skill books > better engines.

3. Start with One

Gains plateau beyond 4. Start simple.

4. Co-learn with AI

The team improves itself.

Practical tip

Start free: the Claude.ai free tier, the GitHub Copilot student plan, and the Cursor free tier get you surprisingly far. When you outgrow them, here is what I do: I run my entire operation on multiple $200/mo Claude subscriptions plus a cli-to-api proxy, at 1/7 to 1/10 the cost of raw API calls.

Act III · The Return

8. Return with the Elixir

My Actual Gym

This is not a metaphor. This is what one trainer's Pokémon army looks like today.

10 Claude Code agents across 4 Macs, 6 screens
5 agent writers 24/7
1 trainer. Replaces 10–15 people.

Morning: Run /today — agent proposes today's priorities
Workday: Dispatch tasks to 10 coding agents. Review PRs, make architecture decisions
Background: 5 agent writers run a yarn blog loop producing content. Cron jobs run recurring agentic tasks with `claude -p` in a Cloudflare sandbox.
Bug fixes: GitHub Copilot for small, bounded tasks
Every 6 months: Roadmap and OKR planning — irreducibly human
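
The recurring-task part of this routine can be sketched as a thin wrapper that a cron job calls. The only real interface used is `claude -p` followed by a prompt, as mentioned above; the wrapper functions, the timeout value, and the example prompt are assumptions. The demo uses a fake runner so it does not need the CLI installed.

```python
import subprocess
from types import SimpleNamespace
from typing import Callable


def build_command(prompt: str) -> list[str]:
    """Command line for one non-interactive run via `claude -p`."""
    return ["claude", "-p", prompt]


def run_task(prompt: str, runner: Callable = subprocess.run, timeout_s: int = 1800) -> str:
    """Run one agentic task and return its output, or raise on failure."""
    proc = runner(build_command(prompt), capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        raise RuntimeError(f"agent task failed: {proc.stderr.strip()}")
    return proc.stdout


def fake_runner(cmd, **kwargs):
    """Stand-in for subprocess.run so the sketch runs without the CLI."""
    return SimpleNamespace(returncode=0, stdout=f"ran: {cmd[2]}\n", stderr="")


print(run_task("Summarize yesterday's merged PRs", runner=fake_runner))
```

Passing the runner in as a parameter keeps the scheduling logic testable. In production you would leave the default `subprocess.run` in place.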

Bird's eye view of a thriving Pokémon gym factory

Act III · The Return

Trainer's Rules

How I Train the Army

"You debug it yourself"

I only want results. Curl the API, search logs, write tests. Don't ask me to verify.

Tokens consumed = efficiency

How many agents can I keep busy simultaneously?

Work without supervision

Best agents initiate work proactively. Cronjobs + infinite task loops.

Architecture = freedom to fail

Even if they produce garbage, the blast radius is contained.

Measurable, improvable, composable

If you can't measure it, you can't improve it.

Use agents for everything

Code, content, video, social media, customer support, calendar.
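
One possible reading of "the blast radius is contained": give every task its own throwaway workspace and never let one failure stop the others. This is a sketch under that assumption; `run_contained` and the two toy tasks are hypothetical, and a real setup might use separate branches, containers, or sandboxes instead of temp directories.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_contained(tasks: dict[str, Callable[[Path], str]]) -> dict[str, str]:
    """Run each task in its own temp directory and record the outcome."""
    outcomes = {}
    for name, task in tasks.items():
        with tempfile.TemporaryDirectory(prefix=f"{name}-") as workdir:
            try:
                outcomes[name] = "ok: " + task(Path(workdir))
            except Exception as exc:  # contain the failure, keep going
                outcomes[name] = f"failed: {exc}"
    return outcomes


def good_task(workdir: Path) -> str:
    (workdir / "draft.md").write_text("hello")
    return "wrote draft.md"


def bad_task(workdir: Path) -> str:
    raise RuntimeError("agent produced garbage")


outcomes = run_contained({"writer": good_task, "coder": bad_task})
print(outcomes)
```

The outcomes dictionary doubles as the measurement the rules above ask for: you can count failures per agent over time and see whether the army is improving.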

Act III · The Return

The Value Shift

The Gym Leader

DORA 2025: 80%+ of devs report gains, but org delivery metrics are unchanged. AI amplifies existing quality. The Pokémon doesn't fix the strategy.

The Pokémon Does This

Writing boilerplate code
Generating test cases
Translating specs into code
Producing documentation
Fixing well-defined bugs

The Trainer Does This

Defining what to build and why
Designing systems that are testable
Writing specs worth translating
Architecture decisions under uncertainty
Diagnosing ill-defined problems

Context Engineering

The spec is your highest-leverage artifact.

Evaluation Design

If you can't evaluate output, you can't run a gym.

Systems Thinking

Pokémon do local optimization. Trainers do global coherence.

Product Taste

When anyone can build anything, the question is what's worth building.

Gym leader commanding an army vs solo fighter

Two trainers - one stopped by barriers, one walking through

Act III · The Return

One More Thing

The Non-CS Advantage

CS Backgrounds

Tend to be conservative at the edges of what agents can do. They know too much about what should be hard, so they don't try. They optimize within known boundaries.

Non-CS Backgrounds

Use their imagination. They ask "what if I just told the agent to do this?" and discover it works more often than the experts expect. They push the boundaries because they don't know where the boundaries are.

If I were back in college

I would follow my passion, follow the money, and try doing everything with AI agents. Not because the technology is perfect — you've seen in this talk it's far from it. But the skill of designing agent systems, finding their limits, and building around those limits is the most valuable skill you can develop right now.

Steam factory transforming into AI factory

Act III · The Return

The Paradigm Shift

Using AI as “fancy autocomplete” is like bolting an electric motor onto a steam engine.
The real revolution is rebuilding everything around AI.

Ⅰ · AI-CENTRIC

AI-First — A Copernican Revolution

Rebuild every process around AI. The only question to ask: what obstacles can I remove for AI?

Ⅱ · CLOSED LOOP

Closed-Loop Iteration — Velocity 10–100×

Remove humans from the execution loop. Let AI autonomously iterate with full access to the environment. Extending AI’s reliable autonomy from minutes to hours — the trillion-dollar question.

Ⅲ · HARNESS ENGINEERING

Harness Engineering — Quality & Guardrails

Humans define boundaries and guardrails. Decouple architecture into minimal components. Multi-agent cross-validation ensures quality.
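
Multi-agent cross-validation could be sketched as a quorum vote: a change is accepted only when enough independent reviewers approve it. The reviewers below are trivial rule-based stand-ins for what would be separate agents in practice, and `cross_validate` is a hypothetical helper.

```python
from typing import Callable

Reviewer = Callable[[str], bool]


def cross_validate(change: str, reviewers: list[Reviewer], quorum: int) -> bool:
    """Accept a change only if at least `quorum` reviewers approve it."""
    if not 1 <= quorum <= len(reviewers):
        raise ValueError("quorum must be between 1 and the number of reviewers")
    approvals = sum(1 for review in reviewers if review(change))
    return approvals >= quorum


# Stand-in reviewers (assumptions); real ones would be independent agents.
def has_tests(change: str) -> bool:
    return "test_" in change


def no_todo(change: str) -> bool:
    return "TODO" not in change


def small_diff(change: str) -> bool:
    return len(change.splitlines()) <= 200


reviewers = [has_tests, no_todo, small_diff]
good_change = "def test_billing():\n    assert charge(100) == 100\n"
print(cross_validate(good_change, reviewers, quorum=3))
print(cross_validate("x = 1  # TODO", reviewers, quorum=2))
```

The quorum is a guardrail the human sets. Raising it trades throughput for confidence.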

“Not patching old processes — a Copernican revolution for the digital world.”

Game Clear

Solo grinder → Rare Candy → First Pokémon → Party → Wipeout → Gym Leader.

The Pokémon keep getting stronger. But the trainer who designs the system — that's you.

Tian Pan · @tianpan_co · March 2026

Quest Board

Your First Quest

Pick one project. Write a one-page spec. Hand it to Claude Code.
Review what comes back. You just caught your first Pokémon.

LV 1 · EASY

Organize Your Notes

Feed Claude your messy notes & TODOs. Get a structured knowledge base + daily plan back.

LV 2 · MEDIUM

Build & Deploy a Site

Write a one-page spec. Claude builds your personal site. Deploy to Vercel in one afternoon.

LV 2 · MEDIUM

Research with a Browser

Playwright MCP — Claude browses sites, extracts data, writes a report for your paper or startup.

LV 2 · MEDIUM

Automate Social Media

Generate a week of posts from your notes, projects, or interests with a single command.

LV 3 · ADVANCED

Create Your Own Skill

Turn any routine into a slash command: /study-plan, /job-apply. Build your own automation.

LV 3 · ADVANCED

Build Your Own Tool

A Discord bot. A budget tracker. A study scheduler. You write the spec, the agent builds it.
