Last week I left an AI coding agent running while I went to lunch.
Came back to 47 commits spread across 8 files. The code worked. Tests passed. But I spent the next hour trying to understand why it made certain architectural decisions.
This is the new reality we’re navigating: agentic coding agents that run for hours without human input. Not minutes. Hours.
The Rakuten Reality Check
There’s a case study making the rounds: Rakuten engineers gave Claude Code a task involving a 12.5-million-line codebase spanning multiple programming languages. The agent ran autonomously for 7 hours and completed the implementation with 99.9% numerical accuracy. Seven. Hours.
No human wrote a single line of code during that time.
When I first read this, my immediate thought was: “Okay, but would I trust that? Would I hit merge?”
The Trust Paradox We’re All Living
Here’s what the data tells us: developers now use AI in roughly 60% of their daily work. That’s massive adoption. But here’s the kicker—we can only fully delegate 0-20% of tasks.
Think about that gap. We’re using AI constantly, but we’re not trusting it to work unsupervised for most things.
In my design systems work, I see this play out constantly:
- Component generation? Agent crushes it. Give it a design spec, it creates React components with proper TypeScript types, follows our naming conventions, even adds basic tests.
- Accessibility decisions? Nope. Still need human judgment for ARIA labels, keyboard navigation patterns, focus management. The agent makes reasonable guesses, but “reasonable” isn’t good enough when you’re building for inclusive access.
The agents are incredible at execution. But the judgment calls? That’s still us.
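To make the judgment-call gap concrete, here’s a minimal sketch of the kind of keyboard-navigation logic I mean. Everything here is illustrative, not from our actual component library: the function name, the options, and especially the wrap behavior, which is exactly the sort of decision an agent will guess at and a human has to own.

```typescript
// Roving-tabindex navigation for a hypothetical menu component.
// Whether focus wraps from the last item back to the first is a
// design judgment, not something an agent can infer from a spec.

type NavKey = "ArrowDown" | "ArrowUp" | "Home" | "End";

interface NavOptions {
  wrap: boolean; // wrap focus at the ends of the list?
}

// Given the currently focused index, return the index that should
// receive focus after the key press.
function nextFocusIndex(
  current: number,
  itemCount: number,
  key: NavKey,
  { wrap }: NavOptions
): number {
  switch (key) {
    case "ArrowDown":
      if (current < itemCount - 1) return current + 1;
      return wrap ? 0 : current;
    case "ArrowUp":
      if (current > 0) return current - 1;
      return wrap ? itemCount - 1 : current;
    case "Home":
      return 0;
    case "End":
      return itemCount - 1;
  }
}
```

An agent will happily emit either branch of that `wrap` ternary; only a human who knows the component’s context (menu vs. listbox vs. toolbar) can say which one is right.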
Are We Building Trust or Just Better Debugging?
Here’s my honest question: After six months of using agentic tools daily, I’m not sure if I trust the AI more… or if I’ve just gotten better at reviewing and debugging AI-generated code.
Those are not the same thing.
Trust means I can look away. Debugging means I’m still in the loop, just in a different phase. Maybe that’s fine! Maybe that’s exactly where we should be. But let’s be clear about what we’re actually doing.
When I let an agent run for an hour on a component library refactor, I’m not trusting it the way I’d trust a senior engineer. I’m trusting that:
- It won’t break anything critical (tests + CI will catch that)
- The blast radius is contained (it’s working in a feature branch)
- I can review and course-correct after
That’s not the same as “I trust this will be production-ready when I get back from lunch.”
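That containment is enforced by process, not by faith. As a sketch only (a hypothetical GitHub Actions job, not our actual pipeline), the guardrails look something like:

```yaml
# Hypothetical guardrail: agent commits live on feature branches,
# and nothing merges until CI passes and a human approves the PR.
name: agent-branch-checks
on:
  pull_request:
    branches: [main] # agents never push to main directly
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test # the "tests + CI will catch that" layer
      - run: npm run lint
```

Pair that with branch protection requiring one human review and the blast radius stays where it belongs: in the branch, not in production.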
The Changing Nature of Our Work
What I am seeing change is where I spend my time:
- Less time writing boilerplate and repetitive code
- More time reviewing architectural decisions
- Less time debugging syntax errors
- More time validating accessibility, performance, edge cases
The craft is shifting. I’m becoming more of an architect and critic than a builder. And honestly? For design systems work, that’s probably the right direction. My value isn’t typing React code—it’s knowing which components should exist and how they should behave across contexts.
But I’m curious: Are we ready for this shift? Or are we pretending to trust AI while secretly just becoming really good debuggers of autonomous systems?
What’s your experience been? Where do you draw the line between delegation and supervision?
Related: We’re also seeing agents complete 20+ actions autonomously before requiring human input—literally double what was possible six months ago. The capabilities are accelerating faster than our frameworks for using them.