Shopify's "Prove AI Can't Do It Before Hiring" Is 8 Months Old. The Industry Copied It. Has Anyone Counted What It Actually Costs?

I’ve been watching the Shopify AI-first hiring policy play out in real time, and I need to talk about what I’m seeing—both the parts that work and the parts nobody wants to discuss.

The Background

For those who missed it: in April 2025, Tobi Lütke sent an internal memo mandating that every Shopify team must prove AI cannot do a job before requesting new headcount. It became public on X, went viral, and within months Box, Fiverr, Duolingo, and half the leadership teams in the Fortune 500 had adopted some version of the same policy.

The stated goal was elegant: reflexive AI usage as a “baseline expectation,” baked into performance reviews and prototyping workflows. Shopify claimed teams were achieving “100X the work done” through AI integration.

Eight months later, we have real data. Job postings requiring AI skills nearly doubled in a single year, from 5% to 9%. Workers in occupations requiring AI fluency grew from 1 million to 7 million. And Shopify itself has reduced headcount by 34% since 2022.

What I’m Seeing at My Company

I run engineering at an EdTech startup that’s scaling from 25 to 80+ engineers—or at least, that was the plan. After the Shopify memo, our board started asking the same question: “Have you proven AI can’t do this before opening a req?”

Here’s what happened in practice:

The good: We got genuinely rigorous about where human judgment matters vs. where AI tooling can handle the work. Our prototyping velocity doubled. We killed 3 roles that were honestly glorified data entry.

The uncomfortable: We also delayed hiring a second SRE for 4 months while “proving” our AI-assisted monitoring couldn’t replace them. Then we had a production incident at 2 AM that required human judgment about whether to wake up customers or silently fix the data. One person. Single point of failure. The AI monitoring flagged the anomaly but couldn’t make the business decision.

The quietly devastating: Two senior engineers left during that period. Not because of AI—because the team was stretched too thin to do interesting work. They were spending 70% of their time firefighting instead of building. When you structurally can’t hire, you structurally can’t give your best people the breathing room to stay.

The Second-Order Effects Nobody’s Discussing

  1. Bus factor goes to 1. When you freeze hiring, you freeze redundancy. Every domain expert becomes irreplaceable. That’s not resilience, it’s fragility wearing an efficiency costume.

  2. Institutional knowledge stops accumulating. Junior hires are how organizations build knowledge depth. If you’re only keeping seniors and augmenting with AI, you’re running down a clock. Who trains the next generation?

  3. The talent pipeline atrophies. We’re part of an industry that’s telling new grads “we’d rather try AI than hire you.” That message has a 5-10 year echo. The engineers who would’ve been your next staff engineers in 2031 are pivoting to other careers right now.

  4. Survivor overload is real. The people who remain absorb the work of departed colleagues who were never backfilled. AI doesn’t actually cover 100% of a departed role’s work; realistically it covers 60-70%. The remaining 30-40% lands on humans who were already at capacity.

The Question I Can’t Shake

Shopify’s headcount went down 34%. Their stock went up. Wall Street called it efficient. But is anyone measuring the institutional resilience they burned through to get those numbers?

Fiverr cut 30% and called it “AI-first.” How many of those people held relationships with freelancers that no LLM can replicate?

I’m not anti-AI. We use AI aggressively at my company. But there’s a difference between “AI should be part of every workflow” and “prove a human is needed before we’ll invest in one.” The first is good engineering practice. The second is a hiring freeze wearing a technology hat.

Has anyone else implemented a version of this policy? What happened to your team’s resilience, retention, and institutional knowledge 6+ months in?

I’d especially love to hear from folks at companies that tried it and quietly walked it back.

Keisha, this hits close to home. I lead a 40+ person engineering org at a Fortune 500 financial services company, and we adopted a softer version of this policy in Q3 2025—not a hard freeze, but every new req now requires a “human necessity justification” that gets reviewed by a committee that includes our AI strategy lead.

Here’s what six months taught us:

The Compliance Angle Nobody Mentions

In financial services, regulatory compliance isn’t something you can hand to an AI and hope for the best. We had a case where our AI-assisted code review flagged a transaction processing change as “low risk.” A human reviewer—one we almost didn’t hire because of the new policy—caught that the change would have violated SOX controls in a way that no current LLM understands. The AI was looking at code patterns. The human was thinking about the auditor who’d review this in April.

We calculated the potential fine exposure: $2-4M. The fully-loaded cost of that reviewer’s role: $280K/year.
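If you want to pressure-test that math yourself, here’s the back-of-the-envelope version. The dollar figures are the ones above; the one-incident-per-year framing and the midpoint fine are my own simplifying assumptions:

```python
# Back-of-the-envelope: how likely does a compliance near-miss need to be
# per year before the reviewer's salary pays for itself?
# Assumptions (mine, for illustration): fine exposure at the midpoint of
# the $2-4M estimate, and salary as the only cost of the role.
reviewer_cost = 280_000     # fully-loaded annual cost of the reviewer
fine_exposure = 3_000_000   # midpoint of the $2-4M estimate

break_even_probability = reviewer_cost / fine_exposure
print(f"Break-even annual incident probability: {break_even_probability:.1%}")
# -> Break-even annual incident probability: 9.3%
# If you think a violation slips through more often than ~1 year in 10,
# the human reviewer is cheaper than the expected fine alone.
```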

The “Prove It” Burden Falls Unevenly

What I’ve noticed is that the “prove AI can’t do it” requirement creates a massive paperwork burden on the teams that need help the most. My infrastructure team—already understaffed—had to spend 3 weeks building a case for why they needed a database reliability engineer. They had to prototype with AI tools, document the gaps, write a formal justification, and present it to leadership.

Meanwhile, the team that was fully staffed and doing fine had no such burden. It’s a regressive policy: the teams with the least bandwidth get taxed the most to prove they need bandwidth.

What Actually Worked

The one genuinely positive outcome: we consolidated 4 separate internal tools teams into 2, using AI-assisted development to maintain the same output. That was a real efficiency gain. The people whose roles changed were redeployed to higher-value work, not laid off.

But that’s a very different thing from “structurally refusing to grow.” Consolidation is a one-time optimization. A permanent hiring gate is a compounding constraint.

Your bus factor point is the one that keeps me up at night. In regulated industries, having one person who understands a critical system isn’t just a risk—it’s an audit finding waiting to happen.

I’m going to push back slightly on the framing here, because I think the real problem isn’t the Shopify policy itself—it’s how everyone else cargo-culted it without Shopify’s context.

Shopify vs. Everyone Else

Shopify had been investing in AI infrastructure since 2020. By the time Tobi sent that memo, they had mature internal tooling, established AI workflows, and a culture where AI usage was already widespread. The memo codified existing behavior.

Most companies that copied the policy in 2025-2026 don’t have any of that foundation. They adopted the conclusion without doing the prerequisite work. That’s not an AI strategy—it’s a cost-cutting strategy wearing an AI costume. (Sound familiar, Keisha? I think you nailed it with “hiring freeze wearing a technology hat.”)

The Scaling Problem Is Real

I’m CTO at a mid-stage SaaS company scaling from 50 to 120 engineers. Or trying to. Here’s what my board doesn’t understand: AI doesn’t scale organizational complexity. You can use AI to write code faster, but you can’t use AI to:

  • Navigate a political conversation with your enterprise customer’s CISO about a security incident
  • Decide whether to prioritize the feature that retains your biggest account vs. the one that opens a new market segment
  • Mentor a promising engineer through their first architecture decision that affects 200K users
  • Build trust with a team that’s burned out and questioning whether leadership sees them as replaceable

These aren’t edge cases. This is 60% of what growing a technology organization actually means.

The Metric I Watch

I’ve started tracking what I call “organizational load per engineer”—the number of critical systems, customer relationships, and domain contexts each engineer is responsible for. Before the AI-first policy conversations, our average was 2.3 critical domains per senior engineer. It’s now 3.8. That’s not efficiency. That’s a fragility score.

When it crosses 4.0, I’m going back to our board with a very different conversation: “We didn’t save money. We shifted cost from payroll to risk.”
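For anyone who wants to track something similar, here’s a minimal sketch of the bookkeeping. The data and field names are illustrative, not our actual tooling:

```python
# Minimal sketch of the "organizational load per engineer" metric:
# average number of critical systems, customer relationships, and domain
# contexts each senior engineer solely owns. The data below is made up.
from statistics import mean

critical_domains = {
    "eng_a": ["payments", "billing-db", "acme relationship"],
    "eng_b": ["auth", "sso", "compliance reporting", "fraud rules"],
    "eng_c": ["search", "ingest pipeline", "bigco relationship", "ml infra"],
}

avg_load = mean(len(domains) for domains in critical_domains.values())
print(f"Organizational load per senior engineer: {avg_load:.1f}")

FRAGILITY_THRESHOLD = 4.0  # the line where I go back to the board
if avg_load > FRAGILITY_THRESHOLD:
    print("We didn't save money. We shifted cost from payroll to risk.")
```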

The companies that will win in 2027 are the ones hiring strategically right now while their competitors are frozen. Talent markets have memory.

Can I give the view from the ground floor? Because the leadership conversation here is important, but I think it’s missing something.

What It Feels Like When the Org Won’t Grow

I lead design systems for 3 product teams. When the “AI-first” hiring discussions started at my company, the first thing that happened wasn’t a policy change—it was a vibe shift. People stopped talking about what we’d build next quarter and started talking about whether their role was “AI-proof.”

My most creative designer—someone who’d been pushing us toward genuinely innovative accessibility patterns—started spending her evenings learning prompt engineering instead of doing the deep craft work that made her irreplaceable in the first place. The irony is thick enough to cut.

The Work AI Can’t See

Here’s what drives me up the wall about these “prove AI can’t do it” policies: they assume all work is visible and measurable. The most valuable things I do—building trust between my design team and engineering, noticing when someone’s struggling before they burn out, having the hallway conversation that prevents a bad product decision—none of that shows up in a Jira ticket.

Try writing a “human necessity justification” for organizational empathy. I dare you.

The Startup Angle

As someone who’s been through a startup failure, I’ll add this: the companies adopting these policies are optimizing for the current quarter, not the next crisis. When my startup hit its hardest moment, what saved us (temporarily) wasn’t our tools or our processes—it was 3 people who cared enough to work through a weekend because they believed in what we were building.

You can’t AI your way to that kind of commitment. It comes from being part of a team that invests in people, not one that treats humans as the option of last resort.

@vp_eng_keisha your “fragility wearing an efficiency costume” line is going to live in my head rent-free.

Reading this thread as a product leader, and I want to add the go-to-market dimension that I think is getting overlooked.

The Roadmap Compression Problem

When you structurally can’t grow your team, you structurally can’t expand your surface area. At my fintech startup, we’re trying to launch an enterprise product line while maintaining our existing platform. The “prove AI can’t do it” conversation delayed a critical PM hire by 3 months.

During those 3 months, two competitors launched in our target segment. We had the better product thesis. They had the people to execute.

AI helped us ship features faster—I’ll give it that. But AI didn’t attend the 14 customer discovery calls that PM would have run. AI didn’t build the relationship with the design partner who eventually signed with our competitor. Speed of code is not speed of market.

The Investor Narrative Problem

Here’s what’s insidious about this: VCs and board members love the Shopify narrative because it maps cleanly to a metric they already worship—revenue per employee. “Look, we’re doing more with less” is the easiest board slide in the world.

But revenue per employee is a lagging indicator optimized for today’s business, not a leading indicator of tomorrow’s growth capacity. I’ve watched companies optimize this metric right into a corner where they can’t pursue new market opportunities because they don’t have the people to explore them.

A Framework That’s Working Better For Us

Instead of “prove AI can’t do it,” we’ve moved to what I call a role impact matrix:

Work Category          AI Augmented   AI Replaced   Human Required
---------------------  -------------  ------------  ------------------
Repetitive execution   Yes            Often         Rarely
Pattern recognition    Yes            Sometimes     For novel patterns
Relationship building  Marginally     No            Always
Strategic judgment     As input       No            Always
Creative synthesis     As tool        No            For originality

The conversation shifts from “do we need a human?” to “what kind of work are we hiring for?” It’s a subtle but critical reframe. We still use AI aggressively—we just don’t use it as a gate against hiring.
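If it helps, here’s roughly how the matrix could be encoded in a req template. This is a sketch: the category names come from the table above, and the helper function is illustrative rather than our actual process:

```python
# The role impact matrix as a lookup: (ai_augmented, ai_replaced, human_required).
# The reframe in code form: a req is evaluated by work category, not by a
# blanket "prove AI can't do it" gate.
ROLE_IMPACT_MATRIX = {
    "repetitive execution":  ("yes",        "often",     "rarely"),
    "pattern recognition":   ("yes",        "sometimes", "for novel patterns"),
    "relationship building": ("marginally", "no",        "always"),
    "strategic judgment":    ("as input",   "no",        "always"),
    "creative synthesis":    ("as tool",    "no",        "for originality"),
}

def classify_req(work_categories: list[str]) -> str:
    """Ask 'what kind of work are we hiring for?', not 'do we need a human?'"""
    human_required = [
        category for category in work_categories
        if ROLE_IMPACT_MATRIX[category][2] != "rarely"
    ]
    if human_required:
        return f"Open the req: human-required work in {human_required}"
    return "Invest in AI tooling first, then reassess"

# Example: the PM role we delayed was mostly relationship and judgment work.
print(classify_req(["relationship building", "strategic judgment"]))
```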

@cto_michelle your “organizational load per engineer” metric is something I’m going to start tracking from the product side. What’s the equivalent? “Strategic surface area per PM” maybe?