Trainer Profile
Tian Pan
Started companies, shipped at scale,
didn't study CS in college.
Uber
Brex
Founder
Airbnb
Yale MS
"Have you ever had no idea what you were doing — and did it anyway?"
That's basically my entire career.
Not a CS major. Management undergrad in Beijing → Yale CS → built things used by hundreds of millions. That path is your advantage, not your disadvantage.
Trainer Stats
22k
monthly users on his startup
90M+
users on systems he built
30x
speed improvement shipped at Uber
$2B
company value helped grow from $0
A Trainer's Journey in 8 Stages
Agentic Engineering:
Build Your Own Software Pokémon Army
From solo grinder to gym leader. How one trainer builds an autonomous army.
[Image: Trainer overlooking a digital Pokémon army landscape]
Act I · Departure
Solo grind → First catch
Act II · Initiation
Build party → Team battle → Wipeout
Act III · The Return
Rebuild → Gym leader
[Image: A lone trainer grinding at a single computer]
Act I · Departure
1. The Ordinary World
The Solo Grind
You are a lone trainer. No Pokémon, no party. Just you, grinding XP manually, line by line.
From experience

At Uber, the hardest part of any task was never writing the code. It was the research and design phase — figuring out where and what to change. When the codebase is massive, the documentation is lost, and the previous maintainer is gone, you spend 80% of your time building a mental model of a system someone else built years ago.

Act I · Departure
2. The Call to Adventure
Rare Candy Arrives
Someone hands you a Rare Candy after a lifetime of manual grinding. Copilot. Cursor. Windsurf. The XP gains are real:
+26%
merged PRs across 4,000+ devs
MS/Accenture RCT
10x
faster on file migrations
Cognition/Devin
+35%
for junior developers
seniors: +8–16%
Rare Candy doesn't change what you can build. The ceiling is still you. Remove yourself from the battle. Start training a team.
[Image: A trainer receiving a glowing Rare Candy]
[Image: Trainer catching their first AI agent Pokémon]
Act II · Initiation
3. Crossing the Threshold
Your First Pokémon
You write a spec. Hand it to an autonomous agent. Walk away. Come back to a pull request. You just caught your first Pokémon.
            | Rare Candy (Cursor) | Pokémon (Claude Code)
Input       | Your cursor position + surrounding code | A goal: "Add Stripe billing to this app"
Output      | Code suggestion | Plan → code → tests → iterate until done
Who drives? | You | The agent, within your guardrails
Key skill   | Prompt engineering | Context engineering
My workflow

I always start in Plan Mode. The agent analyzes the codebase, proposes an approach. I review the plan, adjust, then say "execute." And one rule: "You debug it yourself. I only want results." The agent curls the API, reads the logs, writes tests to prove it's correct.

Act II · Initiation
The Skill Book
Context Engineering in Practice
The model is the engine. The context is the skill book. What you teach your Pokémon determines how it fights.
Specs

Write clear specifications with acceptance criteria before the agent touches code. The spec is your highest-leverage artifact.

Codebase

Structure your codebase so agents can navigate it — clear file naming, module boundaries, up-to-date docs. The agent reads your code like a new hire.

Feedback Signals

Tests, type checkers, linters. The Pokémon needs to know when it's wrong. Without feedback, it confidently produces garbage.
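
One way to make the Specs card concrete is to keep the spec as structured data and render it into the text you hand the agent. This is only a minimal sketch: the `Spec` class, its field names, and the example criteria are hypothetical illustrations, not part of any tool mentioned in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """A hypothetical one-page spec: a goal plus checkable acceptance criteria."""
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the spec as plain text that can be handed to an agent."""
        lines = [f"Goal: {self.goal}", "", "Acceptance criteria:"]
        lines += [f"  {i}. {c}" for i, c in enumerate(self.acceptance_criteria, 1)]
        if self.out_of_scope:
            lines += ["", "Out of scope:"]
            lines += [f"  - {item}" for item in self.out_of_scope]
        return "\n".join(lines)


# Example values are made up for illustration.
spec = Spec(
    goal="Add Stripe billing to this app",
    acceptance_criteria=[
        "A signed-in user can start a subscription from the settings page",
        "Failed payments surface an error message instead of failing silently",
        "All new code paths are covered by unit and integration tests",
    ],
    out_of_scope=["Changing the existing login flow"],
)
prompt = spec.to_prompt()
print(prompt)
```

Numbering the criteria gives the agent, and you, something specific to check off when the pull request comes back.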

"The quality of an agent often depends less on the model itself and more on how its context is structured and managed."
— Anthropic (Sept 2025)
[Image: A glowing skill book with magical runes]
[Image: Pokémon self-healing in a testing arena]
Act II · Initiation
4. The Road of Trials
Training the Inspection Line
Your Pokémon did something. Half the tests fail. A Pokémon without self-healing just creates chaos at scale.
while not all_tests_pass:
    agent writes / fixes code
    run tests (unit, integration, typecheck, lint)
    agent reads error output
    agent diagnoses and fixes
    repeat
My inspection line

The agent must write its own integration and unit tests. For the backend: curl the actual API and verify responses. For the frontend: Playwright MCP — the agent opens a real browser, navigates the UI, and verifies things render and behave correctly. The agent doesn't just write code. It proves the code works.
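
The loop above can be sketched in Python roughly as follows. Treat it as an illustration under assumptions: `fix` is a hypothetical stand-in for whatever call hands failure output back to your agent, and the commands in `DEFAULT_CHECKS` are placeholders for the checks your project really runs.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical check commands; swap in whatever your project actually uses.
DEFAULT_CHECKS: list[list[str]] = [
    ["pytest", "-q"],
    ["mypy", "."],
    ["ruff", "check", "."],
]


def run_checks(checks: Sequence[Sequence[str]]) -> list[str]:
    """Run each check command and return the output of every failing one."""
    failures = []
    for cmd in checks:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return failures


def inspection_loop(
    fix: Callable[[list[str]], None],
    checks: Sequence[Sequence[str]] = DEFAULT_CHECKS,
    max_rounds: int = 5,
) -> bool:
    """Alternate between running checks and asking the agent to fix failures.

    `fix` stands in for the agent: it receives the failure output and edits
    the code. Returns True once every check passes, False if the checks still
    fail after `max_rounds` runs.
    """
    for round_number in range(1, max_rounds + 1):
        failures = run_checks(checks)
        if not failures:
            return True
        if round_number < max_rounds:
            fix(failures)  # hand the raw error output back to the agent
    return False
```

The `max_rounds` cap is a design choice of this sketch: an agent that cannot converge should stop and report rather than loop forever.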

Act II · Initiation
5. Assembling the Alliance
Assembling the Party
One Pokémon handles one task. You need specialized party members with combo moves.
MCP — The Item Bag

Agent-to-tool communication. Any Pokémon can use any item, API, or data source. Anthropic-originated. 97M+ monthly SDK downloads. MCP gives your Pokémon hands.

Claude Skills — The Move Set

Custom slash commands (/today, /blog, /ci) encode repeatable combo moves. CLAUDE.md is the trainer's manual every Pokémon reads on startup. Hooks trigger before/after agent actions — your inspection gates in code.

A shared CLAUDE.md + custom skills + MCP tools gets you 90% of the way.
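
As a rough starting point, the script below scaffolds a shared manual and one custom command. It assumes the commonly documented Claude Code layout, with `CLAUDE.md` at the project root and one markdown file per slash command under `.claude/commands/`; check the current documentation before relying on it. The function name and the file contents are hypothetical examples.

```python
from pathlib import Path


def scaffold_agent_config(project_root: Path) -> list[Path]:
    """Create a starter CLAUDE.md and one custom slash command, if missing."""
    files = {
        project_root / "CLAUDE.md": (
            "# Project rules\n"
            "- Run the test suite before declaring a task done.\n"
            "- Debug failures yourself; report results, not questions.\n"
        ),
        # Assumed layout: one markdown file per command under .claude/commands/
        project_root / ".claude" / "commands" / "ci.md": (
            "Run the tests, type checker, and linter. "
            "Fix any failures and repeat until everything passes.\n"
        ),
    }
    created = []
    for path, content in files.items():
        if not path.exists():  # never overwrite an existing manual or command
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold_agent_config(Path("."))` from a project root would create both files on the first run and leave them untouched on later runs.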
[Image: A party of specialized robot-Pokémon in formation]
[Image: Multiple Pokémon fainted in a battle failure]
Act II · The Abyss
6. The Abyss
Total Party Wipeout
What this actually looks like

The most dangerous failure isn't the loud one. I had a coding agent make changes that passed all existing tests, looked correct in review, and shipped. Days later, I discovered the changes had broken a subtle invariant that no test covered. The code failed silently. No error logs. No crash. Just wrong behavior that took days to trace back.

41–87%
Failure rate across 7 multi-agent frameworks.
NeurIPS 2025
36.9%
of failures are coordination breakdowns.
14
distinct failure modes. Root cause: party design.
The nightmare: a Pokémon that produces defective results that pass inspection. Your inspection line has blind spots, and the Pokémon will find every single one.
Act II · The Abyss
The Scaling Study
More Pokémon ≠ Better
Google DeepMind + MIT tested 180 configurations across 5 architectures (arXiv 2512.08296):
+80.9%
Centralized trainer improves parallelizable work.
−39% to −70%
ALL multi-party setups degraded sequential work.
4
Pokémon is the saturation point.
Error Amplification

Uncoordinated party: errors amplify 17.2x
Centralized trainer: contained to 4.4x

The Lesson

Don't add Pokémon. Add the right Pokémon.
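
To illustrate the idea of a centralized trainer, here is a small sketch in which one coordinator fans independent tasks out to a capped number of workers and gates every result before anything is accepted. It is not the architecture from the study: `worker` and `validate` are hypothetical callables you would supply, and the cap of 4 only mirrors the saturation point quoted above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_WORKERS = 4  # the saturation point quoted above; treat it as a starting cap


def run_party(
    tasks: list[str],
    worker: Callable[[str], str],
    validate: Callable[[str, str], bool],
) -> tuple[dict[str, str], list[str]]:
    """Fan independent tasks out to workers, then gate every result centrally.

    Returns (accepted results by task, tasks whose output was rejected).
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        outputs = list(pool.map(worker, tasks))  # map preserves task order

    accepted: dict[str, str] = {}
    rejected: list[str] = []
    for task, output in zip(tasks, outputs):
        if validate(task, output):
            accepted[task] = output
        else:
            rejected.append(task)  # contained here instead of flowing downstream
    return accepted, rejected
```

Because only the coordinator decides what is accepted, a bad output from one worker is stopped at the gate instead of becoming another worker's input.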

[Image: Too many Pokémon crowded in chaos]
[Image: Trainer carefully rebuilding with 4 organized stations]
Act III · The Return
7. Resurrection
Rebuilding with Constraints
No single architecture consistently wins (SWE-Bench, 80 approaches). But four principles survived the wipeout:
1. Inspection > Production
Better inspection gates, not stronger Pokémon.
2. Context > Model
Better skill books > better engines.
3. Start with One
Gains plateau beyond 4. Start simple.
4. Co-learn with AI
The team improves itself.
Practical tip

Start free: Claude.ai free tier, GitHub Copilot student plan, and Cursor free tier get you surprisingly far. When you outgrow them, here is my setup: multiple $200/mo Claude subscriptions plus a CLI-to-API proxy run my entire operation at 1/7 to 1/10 the cost of raw API calls.

Act III · The Return
8. Return with the Elixir
My Actual Gym
This is not a metaphor. This is what one trainer's Pokémon army looks like today.
10
Claude Code agents across 4 Macs, 6 screens
5
agent writers 24/7
1
trainer. Replaces 10–15 people.
[Image: Bird's eye view of a thriving Pokémon gym factory]
[Image: Stone tablet with glowing trainer rules]
Act III · The Return
Trainer's Rules
How I Train the Army
"You debug it yourself"
I only want results. Curl the API, search logs, write tests. Don't ask me to verify.
Tokens consumed = efficiency
How many agents can I keep busy simultaneously?
Work without supervision
Best agents initiate work proactively. Cronjobs + infinite task loops.
Architecture = freedom to fail
Even if they produce garbage, the blast radius is contained.
Measurable, improvable, composable
If you can't measure it, you can't improve it.
Use agents for everything
Code, content, video, social media, customer support, calendar.
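
The "work without supervision" and "freedom to fail" rules can be combined in a small task loop like the one below. It is a sketch under assumptions: `run_agent` is a hypothetical stand-in for launching an agent on one task, and the in-memory queue stands in for whatever backlog you actually use.

```python
import queue
from typing import Callable, Optional


def task_loop(
    tasks: "queue.Queue[str]",
    run_agent: Callable[[str], str],
    idle_timeout: float = 1.0,
    max_tasks: Optional[int] = None,
) -> dict[str, str]:
    """Pull tasks until the queue stays empty, containing each failure.

    An exception in one task is recorded and the loop moves on, so a bad run
    cannot stall the rest of the backlog.
    """
    results: dict[str, str] = {}
    while max_tasks is None or len(results) < max_tasks:
        try:
            task = tasks.get(timeout=idle_timeout)
        except queue.Empty:
            break  # nothing left; a cron job could simply start the loop again
        try:
            results[task] = run_agent(task)
        except Exception as exc:  # keep the blast radius to this one task
            results[task] = f"FAILED: {exc}"
    return results
```

Recording failures per task, rather than letting them propagate, is what keeps one garbage result from taking the whole loop down.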
Act III · The Return
The Value Shift
The Gym Leader
DORA 2025: 80%+ of devs report gains, but org-level delivery metrics are unchanged. AI amplifies existing quality. The Pokémon doesn't fix the strategy.
The Pokémon Does This
  • Writing boilerplate code
  • Generating test cases
  • Translating specs into code
  • Producing documentation
  • Fixing well-defined bugs
The Trainer Does This
  • Defining what to build and why
  • Designing systems that are testable
  • Writing specs worth translating
  • Architecture decisions under uncertainty
  • Diagnosing ill-defined problems
Context Engineering
The spec is your highest-leverage artifact.
Evaluation Design
If you can't evaluate output, you can't run a gym.
Systems Thinking
Pokémon do local optimization. Trainers do global coherence.
Product Taste
When anyone can build anything, the question is what's worth building.
[Image: Gym leader commanding an army vs solo fighter]
[Image: Two trainers - one stopped by barriers, one walking through]
Act III · The Return
One More Thing
The Non-CS Advantage
CS Backgrounds

Tend to be conservative at the edges of what agents can do. They know too much about what should be hard, so they don't try. They optimize within known boundaries.

Non-CS Backgrounds

Use their imagination. They ask "what if I just told the agent to do this?" and discover it works more often than the experts expect. They push the boundaries because they don't know where the boundaries are.

If I were back in college

I would follow my passion, follow the money, and try doing everything with AI agents. Not because the technology is perfect — you've seen in this talk it's far from it. But the skill of designing agent systems, finding their limits, and building around those limits is the most valuable skill you can develop right now.

[Image: Steam factory transforming into AI factory]
Act III · The Return
The Paradigm Shift
Using AI as “fancy autocomplete” is like bolting an electric motor onto a steam engine.
The real revolution is rebuilding everything around AI.
Ⅰ · AI-CENTRIC
AI-First — A Copernican Revolution
Rebuild every process around AI. The only question to ask: what obstacles can I remove for AI?
Ⅱ · CLOSED LOOP
Closed-Loop Iteration — Velocity 10–100×
Remove humans from the execution loop. Let AI autonomously iterate with full access to the environment. Extending AI’s reliable autonomy from minutes to hours — the trillion-dollar question.
Ⅲ · HARNESS ENGINEERING
Harness Engineering — Quality & Guardrails
Humans define boundaries and guardrails. Decouple architecture into minimal components. Multi-agent cross-validation ensures quality.
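
One simple reading of multi-agent cross-validation is a vote: several independent reviewers judge the same artifact, and it passes only if enough of them approve. The sketch below assumes that reading. The `cross_validate` name, the reviewer callables, and the default quorum are all hypothetical.

```python
from typing import Callable, Sequence


def cross_validate(
    artifact: str,
    reviewers: Sequence[Callable[[str], bool]],
    quorum: float = 0.5,
) -> bool:
    """Accept an artifact only if more than `quorum` of reviewers approve it."""
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    approvals = sum(1 for review in reviewers if review(artifact))
    return approvals / len(reviewers) > quorum
```

In practice each reviewer would be a separate agent or check with its own context, so that they do not all share the same blind spot.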

“Not patching old processes — a Copernican revolution for the digital world.”

Game Clear
Solo grinder → Rare Candy → First Pokémon → Party → Wipeout → Gym Leader.
The Pokémon keep getting stronger. But the trainer who designs the system — that's you.
Tian Pan · @tianpan_co · March 2026
Quest Board
Your First Quest
Pick one project. Write a one-page spec. Hand it to Claude Code.
Review what comes back. You just caught your first Pokémon.
LV 1 · EASY
Organize Your Notes
Feed Claude your messy notes & TODOs. Get a structured knowledge base + daily plan back.
LV 2 · MEDIUM
Build & Deploy a Site
Write a one-page spec. Claude builds your personal site. Deploy to Vercel in one afternoon.
LV 2 · MEDIUM
Research with a Browser
Playwright MCP — Claude browses sites, extracts data, writes a report for your paper or startup.
LV 2 · MEDIUM
Automate Social Media
Generate a week of posts from your notes, projects, or interests with a single command.
LV 3 · ADVANCED
Create Your Own Skill
Turn any routine into a slash command: /study-plan, /job-apply. Build your own automation.
LV 3 · ADVANCED
Build Your Own Tool
A Discord bot. A budget tracker. A study scheduler. You write the spec, the agent builds it.