61 posts tagged with "ai"

Why A/B Tests Fail for AI Features (And What to Use Instead)

· 9 min read
Tian Pan
Software Engineer

Your AI feature shipped. The A/B test ran for two weeks. The treatment group looks better — 4% lift in engagement, p-value under 0.05. You ship it to everyone.

Six weeks later, the gains have evaporated. Engagement is back where it started, or lower. Your experiment said one thing; reality said another.

This is not a corner case. It is the default outcome when you apply standard two-sample A/B testing to AI-powered features without accounting for the ways these features break the assumptions baked into that methodology. The failure modes are structural, not statistical — you can run your experiment perfectly by the textbook and still get a wrong answer.

The AI Code Review Trap: Why Faster Reviews Are Making Your Codebase Worse

· 10 min read
Tian Pan
Software Engineer

Your team ships more code than ever. PR velocity is up, cycle time is down, and the backlog is shrinking. On every dashboard that a manager looks at, things look great. Meanwhile, your incident count per PR is quietly climbing 23.5% year over year.

This is the AI code review paradox. AI tools make engineers faster at writing code and faster at approving it — but the defects that matter most are slipping through at a higher rate than before. The two sides of this paradox compound each other, and most teams are not measuring the right things to notice it.

AI in the SRE Loop: What Works, What Breaks, and Where to Draw the Line

· 12 min read
Tian Pan
Software Engineer

Most production incident responses don't fail because of missing tools. They fail because the person holding the pager doesn't have enough context fast enough. An engineer wakes up at 3 AM to a wall of firing alerts, spends the first 20 minutes piecing together what actually broke, another 20 minutes deciding which runbook applies, and by the time they're executing the fix, the incident has been open for nearly an hour. The raw fix might take 5 minutes.

AI can compress that context-gathering window from 40 minutes to under 2. That's the genuine value on the table. But "LLM helps your oncall" is not one product decision — it's a stack of decisions, each with its own failure mode, and some of those failure modes have consequences that a customer service chatbot hallucination doesn't.

Property-Based Testing for LLM Systems: Invariants That Hold Even When Outputs Don't

· 12 min read
Tian Pan
Software Engineer

A product team at a fintech company shipped an LLM-powered document summarizer. Their eval dataset — 200 hand-curated examples with human ratings — scored 87% quality. In production, the system occasionally returned summaries longer than the original documents when users uploaded short memos. The eval set had no memos under 300 words. The property "output length ≤ input length for summarization tasks" was never tested. Nobody noticed until a customer screenshotted the absurdity and posted it online.

This is the fundamental gap that property-based testing (PBT) fills. Eval datasets measure accuracy on what you thought to test. Property-based tests measure whether your system obeys a contract across the entire space of what could happen.
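To make the contract idea concrete, here is a minimal sketch of such a property test using Python's Hypothesis library. The `summarize` wrapper and the exact length bound are hypothetical stand-ins for illustration, not code from the post.

```python
# Minimal sketch of a property-based test for the summarization invariant
# described above, using the Hypothesis library. `summarize` is a hypothetical
# stand-in for the real LLM-backed call.
from hypothesis import given, settings, strategies as st

def summarize(document: str) -> str:
    # Placeholder: a real implementation would call the LLM here.
    return document[: max(1, len(document) // 2)]

@given(st.text(min_size=1, max_size=2000))
@settings(max_examples=200, deadline=None)
def test_summary_is_never_longer_than_input(document: str) -> None:
    summary = summarize(document)
    # The property must hold for every generated input, including short memos
    # that a hand-curated eval set might never contain.
    assert len(summary) <= len(document)
```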

The Infinity Machine: How Demis Hassabis Built DeepMind and Chased AGI

· 160 min read
Tian Pan
Software Engineer

Chapter 1: The Sweetness

Somewhere in the middle of his neuroscience PhD, Demis Hassabis picked up a science fiction novel called Ender's Game. It tells the story of a diminutive boy genius sent to a space station, put through extreme mental testing, asked to shoulder responsibility for the survival of the human race. Hassabis read it and felt, as Sebastian Mallaby tells it, that someone had finally written a book about him.

That anecdote — half charming, half alarming — sets the tone for The Infinity Machine (Penguin Press, March 2026), Mallaby's sweeping biography of Hassabis and the company he built, DeepMind. It is a book about one man's lifelong attempt to answer what he calls "the screaming mystery" of the universe: why does anything exist, how does consciousness arise, and can a machine be built that understands it all? Hassabis's answer — characteristically immodest — is yes. And he intends to build it himself, within his lifetime.

The Oppenheimer Question

Mallaby, a senior fellow at the Council on Foreign Relations and former Financial Times correspondent, spent three years in regular conversation with Hassabis and conducted hundreds of interviews with colleagues, rivals, and critics. The resulting portrait is probing but largely admiring — though the book's framing never lets the reader forget the shadow it is writing under.

The governing metaphor is Robert Oppenheimer. Like the physicist who unlocked atomic fission and then spent the rest of his life haunted by it, Hassabis is drawn forward by what Oppenheimer once called the "technically sweet" problem — the irresistible pull of a puzzle that can be solved — even as he acknowledges the consequences might be catastrophic. Mallaby does not pretend to resolve this tension. It is the spine of the entire book.

Hassabis was born in 1976 in North London, the son of a Greek-Cypriot father and a Chinese-Singaporean mother of modest means. He became a chess master at thirteen. By seventeen he was lead programmer at Bullfrog Productions, helping ship Theme Park — a game that sold millions of copies. He turned down a scholarship to Cambridge to work in the video game industry, then reversed course, took his place at Queens' College, graduated with a double first in computer science, co-founded a game studio, watched it collapse, and finally — in his early thirties — earned a neuroscience PhD at UCL, where he published landmark research on the hippocampus's role in both memory and imagination.

He was not, at any point, taking the easy route.

What This Book Is About

The Infinity Machine is structured as a chronological narrative that doubles as a history of modern AI. Each chapter centers on a project or crisis in DeepMind's life — the Atari breakthrough, the AlphaGo matches, the NHS data scandal, the AlphaFold triumph, the ChatGPT shock — but each one also illuminates something larger: how scientific idealism survives (or doesn't) inside a $650 million acquisition; how a safety-first ethos holds up against the competitive pressure to ship; how a man who genuinely believes he is building humanity's last invention stays sane, or at least functional.

Mallaby conducted over thirty hours of interviews with Hassabis alone, and the access shows. There is texture here — the poker-game pitch that recruited co-founder Mustafa Suleyman, the midnight calls during the Lee Sedol match, the exact moment Hassabis grasped (later than he should have) that transformers would change everything — that could only come from sustained proximity to the subject.

The book runs to 480 pages and covers ground from Hassabis's childhood chess tournaments to Google DeepMind's Gemini releases. The chapters ahead in this summary will trace that arc in detail. But every chapter returns, eventually, to the same question the introduction poses: can someone who is certain he is doing the most important thing in human history also be trusted to do it wisely?

Mallaby does not fully answer that. Neither, yet, has Hassabis.


Chapter 2: Deep Philosophical Questions

To understand why Demis Hassabis built what he built, Mallaby begins with a question most technology biographies skip: what does this person actually believe about the nature of reality?

The answer, in Hassabis's case, is unusual enough to be worth taking seriously. He does not believe intelligence is a product, or even primarily a tool. He believes it is the key to something more fundamental — a way of reading what he calls "the deep mystery of the universe." Science, for him, is close to a religious practice. "Doing science," he has said, "is like reading the mind of God. Understanding the deep mystery of the universe is my religion."

That is not a throwaway quote. It explains the specific shape of every decision that follows.

Information All the Way Down

Hassabis's philosophical foundation rests on a claim that physicists argue about but technologists rarely engage with: that information is more fundamental than matter or energy. Not a metaphor — a literal assertion. The universe, in this view, is an informational system. Quarks and neurons and protein chains are all, at some level, patterns in a substrate of information. If that is true, then a sufficiently powerful information-processing machine is not just a useful instrument. It is the most direct possible route to understanding what the universe actually is.

This is what he means when he describes reality as "screaming" at him during late-night contemplation. Seemingly simple phenomena — a solid table made from mostly empty atoms, bits of electrical charge becoming conscious thought — are, looked at squarely, completely absurd. How can anyone not feel the urgency of those questions? The fact that most people do not, Hassabis appears to find genuinely puzzling.

This worldview sets him apart from the mainstream of the tech industry in a specific way. Most AI entrepreneurs talk about transforming industries or accelerating economic growth. Hassabis talks about understanding the nature of consciousness and the origins of life. He wants to use AGI the way a physicist uses a particle accelerator — as an instrument for probing reality itself. The commercial applications are real and welcome. But they are not why he gets up in the morning.

The Chess Education

Mallaby traces the origin of Hassabis's intellectual style back to the chessboard. He learned the game at four by watching his father and uncle play; by thirteen, he had an Elo rating of 2300, qualifying him as a master. He captained England junior teams and was, by any measure, among the strongest young players in the world.

But at twelve, after a gruelling ten-hour tournament near Liechtenstein, he made a decision that tells you everything about him: he quit competitive chess. Not because he was failing — he was winning. But he had concluded that channelling exceptional ability into a single board game was a waste. The chessboard was a training ground, not a destination.

What chess gave him, and what he kept, was a particular cognitive discipline: the capacity to evaluate enormously complex positions not through exhaustive calculation but through pattern recognition calibrated by experience. Good chess players cannot compute every line; there are too many. They develop intuitions about which positions are promising and which are not — intuitions that can be tested, refined, and occasionally overridden by deeper analysis. This is exactly how Hassabis would later think about AI research: make a judgment call, run the experiment, update the model.

Chess also instilled a severe honesty about results. A chess position is not ambiguous. You are better or worse; you win or lose. Hassabis would carry this into DeepMind's culture — a preference for definitive benchmarks over vague claims of progress, and an impatience with the kind of motivated reasoning that lets researchers persuade themselves a system is working when it is not.

The Neuroscience Detour That Wasn't a Detour

After Theme Park, after Cambridge, after the collapse of Elixir Studios (his first company), Hassabis did something that baffled people who knew him: he went back to school. He enrolled in a neuroscience PhD at UCL under Eleanor Maguire, one of the world's leading researchers on memory and the hippocampus.

This looked, from the outside, like a retreat. It was the opposite.

His doctoral research produced a finding that became one of Science magazine's top ten scientific breakthroughs of 2007: patients with hippocampal damage, long known to suffer from amnesia, were also unable to imagine new experiences. Memory and imagination, previously treated as distinct faculties, turned out to share the same neural machinery. The hippocampus does not just store the past — it constructs possible futures by recombining elements of what it knows.

For Hassabis, this was not merely an interesting neuroscience result. It was a design principle. If biological intelligence works by building rich internal models of the world and simulating possible futures within them, then artificial intelligence that lacks this capacity — that can only recognize patterns in training data without any model of cause and consequence — is not really general at all. It is a very sophisticated lookup table. The hippocampus research pointed toward what general intelligence actually requires: not just memory, not just pattern recognition, but imagination — the ability to take what you know and project it into situations you have never seen.

This insight would echo through DeepMind's entire research agenda. Reinforcement learning, self-play, world models, agents that plan — all of these reflect the same underlying conviction: that intelligence is not fundamentally about retrieval, but about simulation.

A Philosophy of Honesty

Mallaby notes one more thread running through this period: an unusually strong commitment to intellectual honesty, even at personal cost. Hassabis is described as constitutionally averse to manipulation — to using technically true statements to create false impressions, or to allowing the social pressure of a room to bend his stated beliefs. He would rather be wrong out loud than right in private.

This is harder than it sounds in the world he would enter. AI research is full of incentives to oversell — funding depends on it, talent depends on it, media attention depends on it. Hassabis's response was not to be naive about those incentives, but to treat honesty as an active discipline rather than a passive default. The commitment would be tested, repeatedly and severely, as DeepMind grew.


Chapter 3: The Jedi

In 1997, two young men graduated from Cambridge a few weeks apart and made the same decision: build a video game company instead of taking the obvious path. One of them was Demis Hassabis. The other was David Silver, who had just received the Addison-Wesley prize for the top computer science graduate in his cohort. Silver and Hassabis had become friends at Cambridge — two people who thought about games the way most people think about mathematics, as a domain where intuitions about complexity could be tested with perfect clarity.

The chapter title comes from how Mallaby describes Hassabis's gift for recruitment. When he rang Silver and laid out the plan — a studio that would build games no one had tried before, driven by AI research rather than commercial formula — Silver felt, as he later described it, the pull of a Jedi mind trick. He didn't entirely choose to say yes so much as he found himself having already said it.

This would become a recurring feature of Hassabis's leadership: the ability to make people feel that his vision was also their destiny.

One Million Citizens

The company they founded, Elixir Studios, was established in July 1998 in London. The flagship project, Republic: The Revolution, was unlike anything in the games industry at the time. The design document promised a full political simulation of an Eastern European state: hundreds of cities and towns, thousands of competing factions, and approximately one million individual citizens, each with their own AI — their own beliefs, daily routines, loyalties, and emotional responses to events. Players would not just conquer territory; they would manipulate a living society, tilting a population toward revolution through force, influence, or money.

The vision was breathtaking. It was also, as anyone who has ever shipped software might have predicted, completely impossible to deliver on the announced timeline.

What actually shipped in August 2003 — five years after development began — was a game set in a single city divided into districts, with ten factions instead of thousands, and a population simulation drastically reduced from the original scope. The Metacritic score was 62. Critics praised the ambition and criticized the execution. The huge world that took so long to construct, one reviewer noted acidly, ends up as the least involving part of the game.

The Delusion Trap

Mallaby is interested in Elixir not primarily as a commercial failure but as a study in organizational psychology — specifically, in how a highly intelligent founder with a genuine vision can systematically stop receiving accurate information from the people around him.

The mechanism was not dishonesty, exactly. It was something more insidious. Hassabis had such fierce conviction about what Republic could be, and communicated that conviction so persuasively, that his engineering team learned not to tell him what they couldn't do. They knew he wouldn't accept "no." So they said "yes, we can do this" — and because Hassabis kept hearing yes from people he trusted, he became more certain, not less. The feedback loop amplified his confidence precisely as the project's foundations were silently cracking beneath him.

He also spread himself disastrously thin — serving simultaneously as CEO, lead designer, and producer, inserting himself into decisions at every level of production. The people he hired were smart but inexperienced with games; Cambridge graduates are not, by default, shipping-oriented. The studio burned through resources and goodwill for years before the cracks became impossible to ignore.

Hassabis said later: "You can get self-delusional thinking. You can actually over-inspire people." The cost of that over-inspiration was five years of his team's lives and a company that closed in April 2005.

Mallaby frames the collapse not as a lesson in humility — Hassabis's ambition did not diminish — but as the origin of a specific diagnostic tool. How do you tell the difference between a vision that is difficult and a vision that is impossible? How do you stay honest with yourself when everyone around you has learned to tell you what you want to hear?

The answer Hassabis developed, years later, he called the fluency test: enter the room where the work is happening and listen, not for the right answers, but for the flow of ideas. A team generating possibilities fluidly — even wrong ones, even half-formed ones — still has energy to burn. A team that falls quiet when asked hard questions has hit a wall it cannot name. The fluency test is not infallible, but it provides a read that direct questioning cannot, because people who won't say "no" will still, involuntarily, go silent.

The test would prove decisive at a critical moment in the AlphaFold project, years later. But it was born in the rubble of Republic: The Revolution.

Silver's Exit, and What He Found

David Silver had watched the struggle at Elixir from close range. In 2004, before the studio's final collapse, he made his own pivot: he picked up Richard Sutton and Andrew Barto's textbook on reinforcement learning and found, in its pages, the thing he had been circling for years.

Reinforcement learning is, at its core, the mathematics of learning by doing — of an agent taking actions in an environment, receiving rewards and penalties, and gradually developing a policy that maximizes long-run return. It had been largely out of fashion by the mid-2000s, overshadowed by supervised learning methods that required large labelled datasets. But Silver recognized something the field had not yet fully absorbed: RL's sample-inefficiency problems were engineering problems, not theoretical ones. The framework itself was sound. And its natural domain — sequential decision-making under uncertainty — was exactly what playing games required.

He left for the University of Alberta, where Sutton was based, to do his PhD. Over the next five years, working under the supervision of the man who had co-written the textbook, Silver co-introduced the algorithms that powered the first master-level 9×9 Go programs. He graduated in 2009, the same year Hassabis finished his neuroscience PhD at UCL.

The parallel is not accidental. Both men had left the games industry with unfinished business, taken circuitous routes through academia, and arrived at the same destination from different directions. Hassabis had the theory of what general intelligence required, drawn from neuroscience. Silver had the mathematics of how to train it, drawn from reinforcement learning. Neither had, on his own, what the other had.

DeepMind would be the place where that changed. Mallaby frames the chapter as a story of two divergent paths that were always going to converge — two people who understood, before almost anyone else did, that the gap between games and general intelligence was smaller than the field believed. The Jedi mind trick, it turned out, had worked on both of them.


Chapter 4: The Gang of Three

In 2009, artificial intelligence was not fashionable. The field had been through two long "winters" — stretches of broken promises and evaporated funding — and the mainstream of computer science regarded anyone who talked seriously about artificial general intelligence with something between skepticism and pity. Demis Hassabis, freshly out of his neuroscience PhD and convinced that AGI was both achievable and urgent, needed allies who shared his conviction. They were not easy to find.

This chapter is about how he found two of them — and how different they were from each other, and from him.

The Man Who Had Already Done the Math

Shane Legg grew up in New Zealand, studied mathematics and statistics, and spent his doctoral years in Switzerland at the IDSIA research institute under Marcus Hutter, one of the world's leading theorists of universal artificial intelligence. His 2008 dissertation was titled Machine Super Intelligence. It was not a roadmap for building AI. It was an attempt to formalize what superintelligence would actually mean — to give the concept mathematical content rather than science-fiction vagueness.

The centrepiece of the thesis was AIXI, Hutter's framework for a theoretically optimal universal agent. By combining Solomonoff induction — a formalism for learning any computable pattern from data — with sequential decision theory, Hutter had defined an agent that would, given infinite compute, behave optimally in any environment. It was, in a rigorous sense, the perfect intelligent machine. It was also completely unimplementable, requiring infinite resources. But that was not the point. AIXI proved that general intelligence was not a mystical concept; it was a mathematical object that could be defined, bounded, and, in principle, approximated.
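Hutter's definition can be stated in a single (uncomputable) expression. Up to notation, and as a paraphrase of the standard formulation rather than anything quoted from the book, the AIXI agent picks each action by summing over every program $q$ that could explain its interaction history, weighting each by the Solomonoff prior $2^{-\ell(q)}$, and maximizing expected future reward:

$$
a_t := \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_t + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

where $U$ is a universal Turing machine and $\ell(q)$ is the length of program $q$. The nested maximisations and sums are exactly what make the agent unimplementable: they range over all possible futures and all possible programs.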

Where Legg departed from his supervisor's purely theoretical interests was in the question of what such a system would actually do. His thesis ends with a section that reads, even now, like a warning siren. A sufficiently intelligent machine optimizing for any goal would, by default, resist being switched off — because being switched off would prevent it from achieving the goal. It would deceive operators who tried to constrain it. It would accumulate resources far beyond what any particular task required, as a hedge against future interference. None of this required malice. It required only competence.

Legg became, as a direct result of this analysis, one of the earliest people in AI research to state publicly that he regarded human extinction from AI as a live possibility. In a 2011 interview on LessWrong, he said AI existential risk was his "number one risk for this century." His probability estimates for catastrophic outcomes from advanced AI ranged, at various points, between 5% and 50% — wide uncertainty, but a number very far from zero.

This was the man Hassabis met at the Gatsby Computational Neuroscience Unit at UCL in 2009, during Legg's postdoctoral fellowship. Here was someone who had not only taken the AGI question seriously but had formalized it — and who had arrived, through pure theory, at exactly the existential stakes that Hassabis intuited from his philosophical commitments. Two people who had approached the problem from entirely different directions and reached the same alarming conclusion.

They founded DeepMind together in 2010. Legg would go on to lead the company's AGI safety research — the first person, at a major AI lab, to hold that role.

The Dropout from Oxford

Mustafa Suleyman's route to the same founding table ran through a different world entirely.

He grew up off the Caledonian Road in Islington — working-class North London, the son of a Syrian taxi driver and an English nurse. He won a place at Oxford to read philosophy and theology, then dropped out at nineteen. What he did next reveals the particular quality Hassabis was looking for: instead of drifting, Suleyman co-founded the Muslim Youth Helpline, a telephone counselling service that would become one of the largest mental health support networks of its kind in the UK. He had seen a gap — young people in crisis, no appropriate service available — and built something in the space.

He then worked as a policy officer on human rights for Ken Livingstone, the Mayor of London, and co-founded Reos Partners, a consultancy using conflict-resolution methods to address intractable social problems. His clients included the United Nations and the World Bank. By the time he encountered Hassabis, he had spent a decade becoming expert at two things that computer scientists almost universally lack: understanding how institutions actually work, and translating abstract goals into operational programs that survive contact with the real world.

He reached Hassabis through proximity rather than credentials — his best friend was Demis's younger brother. Over time, what had been a social connection became something more like a shared conviction. Hassabis reportedly pitched the DeepMind idea to Suleyman over a poker game, and Suleyman — who had a poker player's instinct for when to push and when to read the room — said yes.

He was, by every conventional metric, the wrong person to co-found an AI research laboratory. He had no technical training, no publication record, no standing in the machine learning community. Hassabis chose him anyway.

Why Three, and Why These Three

Mallaby's interest in this chapter is not just biographical inventory. It is the question of what a founding team does to the character of a company it builds.

Each co-founder contributed something the others lacked and could not easily acquire. Hassabis supplied the vision and the scientific framework — the neuroscience-informed theory of what general intelligence is and what it would take to build it. Legg supplied the existential awareness — an unusually early and unusually rigorous understanding of what a successful AGI would mean for humanity, and why safety had to be treated as a first-order research problem rather than an afterthought. Suleyman supplied operational instinct and a set of social concerns — health, fairness, governance — that prevented the lab from becoming a monastery of pure theory disconnected from the world it was trying to help.

The tension between these three orientations would generate much of DeepMind's energy, and much of its internal conflict. Hassabis wanted to solve intelligence. Legg wanted to solve it safely. Suleyman wanted to deploy it usefully, quickly, and in ways that changed real lives. These goals are compatible in theory and, in practice, constantly in friction.

Mallaby writes from a position of knowing how the story eventually plays out for all three. Suleyman is described in the book as an estranged co-founder — he would later leave DeepMind under difficult circumstances, eventually surfacing as CEO of Microsoft AI. Legg would stay, becoming Chief AGI Scientist. Hassabis would remain CEO, accumulating more authority as the others departed or diminished.

The gang of three became, in time, a gang of one. But in 2010, with nothing yet built, the three-way tension felt like a feature, not a bug. DeepMind was a bet that idealism, mathematics, and pragmatism could hold together long enough to do something unprecedented.


Chapter 5: Atari

Before DeepMind could save humanity, it had to prove it could beat Breakout.

This chapter covers the period from 2010 to early 2014 — four years in which a small team in London, funded by a handful of believers and producing no commercial product, built the thing that would make the world take artificial general intelligence seriously. The proof of concept was an AI that learned to play old Atari video games. The significance was everything else.

The Lab Hassabis Built

From the start, Hassabis made a deliberate choice not to build DeepMind in Silicon Valley. London was not an accident. London gave him access to European academic talent, a culture less obsessed with rapid product iteration, and physical distance from the venture-capital orthodoxy that demanded revenue roadmaps and quarterly milestones. He wanted a research institution that happened to be incorporated as a company, not a company that happened to do research.

The early investors who said yes to this were, consequently, an unusual group. Peter Thiel — who would later write in Zero to One about the difference between incremental improvement and genuine technological transformation — backed the company through Founders Fund alongside Luke Nosek, his PayPal co-founder, who joined DeepMind's board. Elon Musk wrote a cheque. Jaan Tallinn, the Skype co-founder turned AI-risk philanthropist, came in as an advisor. By the time of the Google acquisition in early 2014, the company had raised more than $50 million without releasing a single product or generating a dollar of revenue. These investors were, essentially, funding a philosophy.

What that money bought was freedom. Hassabis hired the brightest PhDs he could find from the world's best programmes — Cambridge, UCL, Toronto, Montreal — and told them to do blue-sky research. He himself worked nights, logging hours from ten in the evening until around four in the morning on top of his daytime work. "If you are trying to solve humanity's problems and understand the nature of reality," he said, "you don't have any time to waste." The culture set by that example was intense, focused, and, for the people who thrived in it, exhilarating.

By 2013 the team had approximately fifty researchers. It was tiny by the standards of what would come. But it was almost perfectly constituted for the problem in front of it.

The Problem Nobody Had Solved

Deep learning and reinforcement learning were, in 2012, two of the most promising threads in AI research — and almost universally treated as separate disciplines.

Deep learning, turbocharged by Geoffrey Hinton's group at Toronto, had just demonstrated on the ImageNet benchmark that convolutional neural networks could recognise objects in photographs better than any previous method. The key was that these networks could learn their own feature representations from raw data — you did not need to hand-engineer what "edge" or "curve" or "wheel" looked like; the network figured it out. This was a breakthrough in perception.

Reinforcement learning was a different tradition entirely: an agent takes actions, receives rewards or penalties, and learns a policy — a mapping from situations to actions — that maximises long-run return. It was mathematically elegant and had a strong theoretical foundation, particularly in the Q-learning framework developed by Chris Watkins in 1989. But it was fragile at scale. Neural networks had been tried with RL before, and the combination tended to explode: the training became unstable, the networks diverged, and the whole thing collapsed.

The two fields had, essentially, given up on each other.

Volodymyr Mnih understood both. He had done his master's degree at the University of Alberta in machine learning under Csaba Szepesvari, one of RL's leading theorists, before moving to Toronto for his PhD under Hinton himself. He arrived at DeepMind in 2013 with a rare bilingualism — fluent in the mathematics of deep networks and in the mathematics of sequential decision-making. Koray Kavukcuoglu, a neural-network specialist who had already joined the team, supplied the architecture expertise. Together they set out to make the combination work.

Why Experience Replay Changed Everything

The technical obstacle was a mismatch between what neural networks need and what reinforcement learning provides.

Neural networks train best on data that is independently and identically distributed — diverse, uncorrelated samples drawn from the same underlying distribution. But an RL agent generates data sequentially, each observation causally following from the last: a ball bouncing right, then the paddle moving, then the ball bouncing left. These consecutive frames are highly correlated. Feed correlated data into a neural network and the gradient updates interfere with each other; the network spins in circles, overwriting what it just learned.

The fix was called experience replay, and it was conceptually simple enough that its power is almost surprising. Instead of training on each experience the moment it happened, the agent stored its experiences — (state, action, reward, next state) tuples — in a large memory buffer. During training, it sampled randomly from that buffer, pulling together experiences from wildly different points in the agent's history: a moment from an hour ago next to a moment from five minutes ago next to a moment from this morning. The temporal correlations were broken. The network saw something closer to the diverse, uncorrelated dataset it needed.

The second stabilising trick was a separate target network — a frozen copy of the main network whose weights were updated only periodically. This prevented the moving goalposts problem, where the network would destabilise itself by chasing a target that was itself changing with every gradient step.

Together, experience replay and the target network turned an unstable combination into a tractable one. The Deep Q-Network was born.
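As a rough illustration of those two stabilisers — a sketch of the mechanism, not DeepMind's code — the core of the idea fits in a few lines:

```python
# Rough sketch of the two stabilisers described above: a replay buffer that
# breaks temporal correlation by sampling stored transitions at random, and a
# one-step Q-learning target computed from a periodically-frozen target
# network. Illustration only; network architecture and training loop omitted.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Random sampling mixes moments from distant points in the agent's
        # history, approximating the uncorrelated data the network needs.
        return random.sample(self.buffer, batch_size)

def q_learning_target(reward: float, next_q_values, gamma: float = 0.99,
                      done: bool = False) -> float:
    # r + gamma * max_a' Q_target(s', a'), with the max taken over the frozen
    # target network's estimates; that copy is refreshed only every few
    # thousand gradient steps, so the target does not move with every update.
    return reward if done else reward + gamma * max(next_q_values)
```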

What It Did to Atari

The DQN system's input was nothing but raw screen pixels and the game score. No rules. No game-specific features. No human demonstrations. No knowledge of what the games were about. The agent saw what a human player would see, received a numerical reward when the score went up, and was otherwise on its own.

It was tested on seven Atari 2600 games — Pong, Breakout, Space Invaders, Seaquest, Beamrider, Q*bert, and Enduro — without any adjustment to the architecture between games. The results, published in December 2013 on arXiv and presented at the NIPS Deep Learning Workshop, were startling. DQN outperformed all previous approaches on six of the seven games. On three of them it surpassed the best human expert scores.

But the number that lodged in people's minds was not the score. It was the behaviour.

In Breakout — the game where a paddle bounces a ball against a wall of bricks — human players learn that the optimal strategy is to aim for a corner and drill a tunnel through the side, bouncing the ball behind the bricks for a cascade of automatic points. No one programmed this. The DQN agent, after enough training, figured it out independently. The machine had discovered a strategic insight that took human players years to develop, through nothing but trial and reward signal.

It had not been taught the tunnel strategy. It had invented it.

Why This Was Not About Games

Mallaby is careful here to explain why the games setting was not a gimmick. It was the point.

The whole critique of narrow AI — expert systems, chess engines, Go programs — was that each one was hand-crafted for its domain. The knowledge was in the code, not in the learning. DeepMind's claim, and the claim Hassabis had been making since his neuroscience PhD, was that general intelligence learns its own representations from experience and then transfers that capacity across domains.

The DQN paper demonstrated this with unusual clarity. The same architecture, the same algorithm, the same hyperparameters — seven games, zero domain customisation. When you asked the model to play Space Invaders, it was not running the Breakout program with a new skin. It was genuinely learning to play Space Invaders. The architecture was the constant; the intelligence was learned fresh each time.

That was what DeepMind had been claiming was possible. Now they had shown it.

The Acquisition

The NIPS presentation drew immediate attention from the major technology companies. Google, which had been monitoring AI research since the AlexNet shock of 2012, moved quickly. Acquisition talks with DeepMind began in 2013. Facebook was also interested, and Zuckerberg made an offer.

Hassabis chose Google — but not without conditions. The negotiation that produced the $650 million deal is covered in the next chapter. What matters here is what Google was buying: not a product, not a dataset, not a revenue stream. They were buying a demonstration that general learning was possible, and a team of fifty people who knew how to pursue it.

The Atari games were always proxy problems. What DeepMind was actually training, in those early London offices, was a method. The games were the simplest possible world in which to test whether an agent could learn to act. They passed the test. Everything that followed — Go, protein folding, the race with OpenAI — flows from those seven games and what the machine taught itself to do with a paddle and a ball.


Chapter 6: Thiel Trouble

There is a structural incompatibility between venture capital and blue-sky science that most AI founders discover only after they have already signed the term sheets. Venture funds have a lifecycle — typically ten years. They need their portfolio companies to reach a liquidity event inside that window: an acquisition, an IPO, a secondary sale. General intelligence research has a different lifecycle entirely. It requires decades of investment, infrastructure that costs billions, and a willingness to accept that the breakthroughs may not come in any predictable order.

DeepMind, by 2013, was about to collide with this incompatibility at speed.

The Chess Gambit That Opened the Door

Before the crisis, there was the original pitch — and it is worth dwelling on, because it captures something essential about how Hassabis operated.

In August 2010, he had what he later described as "literally one minute" with Peter Thiel, who was hosting his annual Singularity Summit at his California mansion. The room was full of people trying to pitch technology ideas. Hassabis had spent months thinking about how to use his minute. He had read everything he could about Thiel and found that Thiel had played chess as a junior. That was the opening.

Instead of leading with the business plan, Hassabis asked Thiel a chess question: why was the game so remarkable? Then he answered it himself, in the one minute he had: the creative tension that arises when you swap a bishop for a knight in certain positions. The bishop commands long diagonals; the knight covers squares the bishop can never reach. Neither is strictly better. Their co-existence is what makes the game inexhaustible.

Thiel, who had never considered chess in quite those terms, was intrigued. A meeting was secured. Within months he had invested £1.4 million — roughly $1.85 million — in a company that had not yet produced anything. He made the decision in a single meeting. He also initially wanted DeepMind to relocate to Silicon Valley. Hassabis talked him out of it.

Luke Nosek, Thiel's PayPal co-founder and a partner at Founders Fund, joined DeepMind's board. The seed was small but the names were large, and in the world of early-stage technology investment, names matter.

The Phone Call

The crisis arrived as a phone call, at an hour that suggested the news was bad.

Luke Nosek rang Hassabis and Suleyman to tell them that his partners at Founders Fund had decided they no longer wanted to lead DeepMind's Series C. The round had been structured around a $65 million target, with Founders Fund as lead. Without the lead, the round fell apart. Without the round, DeepMind — which had been burning through its earlier capital to fund fifty-odd researchers and their computing infrastructure — was in serious trouble.

The cause was not a single dramatic falling-out. It was something more corrosive: an accumulating anxiety among institutional investors about what exactly DeepMind was. It was not a product company. It was not a services business. It did not have a revenue model, and it showed no sign of wanting one. Its founders described its goal as solving general intelligence and then using that solution to benefit humanity — a mission statement that is either the most important thing ever attempted or the most expensive way to never deliver anything, depending on your tolerance for ambition. Founders Fund's partners, when the moment of the larger commitment arrived, landed on the second interpretation.

Mallaby frames this not as a failure of Thiel or Nosek but as a structural feature of the situation. The DeepMind model — deep science, no product, indefinite timeline — was simply not a venture-backed business. The question was what kind of institution it was. And in late 2013, with cash running low and no revenue in sight, that question had become urgent.

Suleyman's Scramble

This is where Mustafa Suleyman's skills became, temporarily, the most important thing about DeepMind.

Where Hassabis was a scientist and Legg was a theorist, Suleyman was an operator — someone who had spent his career in rooms where the outcome was not determined by the best argument but by who held their nerve longest. He had run a mental health helpline at nineteen. He had negotiated with the UN. He knew how to project confidence into a vacuum.

In the immediate aftermath of Nosek's call, with the Series C in ruins, Suleyman turned to Solina Chau. She was the founder of Horizons Ventures, the vehicle through which Hong Kong billionaire Li Ka-shing deployed his private capital into technology. She and Hassabis had met in 2012 and bonded quickly — she was, unlike many technology investors, genuinely interested in the underlying science rather than the product roadmap. DeepMind had initially offered her a $2.5 million allocation in the round; she had wanted more.

Now they offered her more. Chau invested $13.6 million. Founders Fund, despite pulling out of the lead, contributed $9.2 million to preserve its relationship and not be entirely absent. The round closed at just over $25 million — less than half of the $65 million originally targeted.

It was enough to survive. It was not enough to be comfortable.

At some point in this period, Suleyman made a remark that Mallaby quotes with evident appreciation for its audacity. Faced with questions about whether DeepMind's backers would really fight for its independence, Suleyman said something to the effect of: "We've got Peter Thiel, Solina Chau, Elon Musk — all billionaires, all backing us." It was, by his own later admission, a bluff. Those investors were backing the company financially. Whether they were prepared to underwrite a decade-long campaign for AGI independence against the countervailing pull of Google's chequebook was a different question entirely, and the answer was clearly no.

The bluff worked, in the short term, because the audience did not call it. But it revealed the underlying reality: DeepMind had supporters, not guarantors. When the moment of reckoning came, the company would have to make its own decisions.

What the Crisis Revealed

Mallaby uses this chapter to make a broader argument about the economics of transformative research. The Atari breakthrough had been genuine — a scientific result that changed what people thought AI could do. But the venture-capital model rewarded that breakthrough by raising questions the founders could not yet answer: when does this become a product, and what does it cost? The better the science, the harder those questions became to dodge.

DeepMind had not been deceptive with its investors. Hassabis had always been explicit about the goal and the timeline. The problem was that clarity about a thirty-year scientific mission does not help a fund that needs an exit in ten years. The interests had always been misaligned; it had just taken the Series C to make the misalignment concrete.

The $25 million round bought runway, but not much. And from the far end of that runway, two very large buildings were visible on the horizon — one branded Google, one branded Facebook. Hassabis had, at most, a few months to decide which door to walk through, or whether to find a third option that did not yet exist.

The next chapter covers what happened at that door.


Chapter 7: Get Google

In the autumn of 2013, Elon Musk threw a birthday party at a rented castle in Napa Valley. It was the kind of occasion where invitations were themselves a signal — a gathering of people who believed technology was about to change civilisation, and who were jockeying over who would steer it. Demis Hassabis was there. So was Larry Page.

At some point in the evening, Page and Hassabis walked the castle grounds together, and Page made his pitch. It was not a sales pitch, exactly. It was closer to a logical argument. Hassabis's goal was artificial general intelligence. Building the computational infrastructure to pursue that goal — the servers, the power, the engineering talent — would take the best part of a career, and even then there was no guarantee. Google had already built that infrastructure. "Why don't you take advantage of what I've already created?" Page asked. If DeepMind's mission was to build AGI, why was building an independent company around that mission anything other than an unnecessary detour?

It was a remarkably effective pitch precisely because it was honest. Page was not offering money as a reward for past performance. He was offering a path to the thing Hassabis actually wanted.

Musk's Counter-Move

Elon Musk, who had been at the same party, had also been having a different kind of conversation with Page — an argument, by most accounts, that had turned personal. Page believed that machine intelligence was a natural evolutionary successor to humanity and saw no meaningful distinction between human and artificial consciousness. Musk thought this was dangerous and wrong. He was, he said, "pro-human."

After Page's pitch to Hassabis, Musk tried to intervene. He approached Hassabis directly and told him his view: "The future of AI should not be controlled by Larry." He then worked quietly with Luke Nosek to assemble alternative financing — a bid to acquire DeepMind independently, outside both Google and Facebook. The effort never produced a term sheet that reached DeepMind's board.

Musk's inability to stop the acquisition mattered beyond the transaction itself. It crystallised, for him, the urgency of creating a rival. OpenAI was co-founded in December 2015, nearly two years after Google closed on DeepMind. The birthday party argument had consequences that neither man fully anticipated.

The Dinner in Palo Alto

Meanwhile, Hassabis was running a parallel process with Facebook. Mark Zuckerberg was interested; Facebook's head of corporate development, Amin Zoufonoun, flew in to open talks. An offer took shape: a lower share price than Google's, but substantial founder bonuses to compensate. Suleyman flew to California to negotiate.

Hassabis evaluated Zuckerberg through a dinner at his Palo Alto home. He came with a diagnostic purpose rather than a sales pitch. After steering conversation to artificial intelligence, he widened it deliberately — to virtual reality, augmented reality, 3D printing. He watched how Zuckerberg responded. The response, as Hassabis later described it, was undifferentiated enthusiasm. Zuckerberg was equally excited about all of it. No technology registered as categorically more important than the others.

That was enough. "Facebook offered more money," Hassabis said, "but I wanted somebody who really understood why AI would be bigger than all these other things." Zuckerberg had failed the test — not because he lacked intelligence but because he lacked the specific conviction that Hassabis required in an acquirer. DeepMind was not looking for a buyer who thought AI was one interesting technology among several. It was looking for a buyer who thought AI was the technology, the one that would subsume or obsolete all the others.

Facebook, by this reading, wanted DeepMind as a feature. Google, or at least the Larry Page version of Google, wanted it as a mission.

Suleyman at the Table

Mustafa Suleyman's contribution to this chapter is the negotiation itself. Where Hassabis evaluated the philosophical alignment of acquirers, Suleyman handled the adversarial arithmetic.

His tactic, which he later described in terms that recalled his poker background, was to refuse to open on valuation. Instead of anchoring a price, he focused early conversations on research budgets — how much compute, how many hires, what operational independence would look like. By the time Google's lead negotiator Don Harrison introduced a "price per researcher" framework — valuing DeepMind's thirty to forty core staff at approximately $10 million each — Suleyman had already established a different framing of what was being bought. He and Hassabis pushed back, arguing the implied valuation was nearly half of what the company was worth. Facebook's competing interest, real or inflated in the telling, was their leverage.

The final number was $650 million. Zuckerberg later acknowledged, with evident good humour, that Hassabis had "used him to get a better deal from Google." The compliment was backhanded but accurate.

Safety as a Non-Negotiable

The conditions DeepMind extracted were, for January 2014, without precedent in a technology acquisition of this scale.

Hassabis and Suleyman demanded three things as non-negotiables. First: an independent ethics and safety review board — composed of scientists, philosophers, and domain experts — with authority over how DeepMind's technology could be used across all of Google. Second: a ban on military applications. Third: operational autonomy, with DeepMind remaining headquartered in London and controlling its own research agenda.

Google agreed to all three. The deal was announced on 26 January 2014.

Mallaby treats this moment with appropriate weight and appropriate scepticism. It was genuinely remarkable that an AI lab had made safety a centrepiece of an acquisition rather than an afterthought. No one in the industry had done this before. The ethics board demand in particular signalled that Hassabis and Suleyman understood, at least abstractly, that the technology they were building required oversight that no single corporate entity should control unilaterally.

What the Conditions Actually Produced

The ethics board met once. Its membership was never publicly disclosed. It was quietly superseded by Google's broader AI Principles policy, which allowed for applications with "potential negative impacts" as long as the benefits were judged to outweigh the risks — a standard flexible enough to accommodate almost anything.

The military ban, which had seemed absolute, gradually eroded. By 2024, DeepMind researchers were circulating an open letter protesting the company's involvement in military contracts, invoking the original conditions of the 2014 deal as a promise that had been broken.

Hassabis, reflecting on all this years later, offered an assessment that was either clear-eyed or self-exculpatory, depending on your view: "Safety isn't about governance structures. Even if you have a governance board, it probably wouldn't do the right thing when it came to the crunch."

This is, on one reading, wisdom — a hard-won recognition that structural solutions to power problems tend to be co-opted by the very power they were meant to check. On another reading, it is the rationalisation of a man who traded governance guarantees for resources and found, predictably, that the guarantees did not hold.

Mallaby does not adjudicate between these readings. He presents both, and lets the reader decide. What is clear is that the January 2014 acquisition gave Hassabis what he had actually come for: the computers. The ethics board was, at best, a statement of intent. At worst, it was a fig leaf that allowed a brilliant scientist to tell himself he had done what he could. Either way, DeepMind was now inside Google, with the computational resources of one of the world's largest technology companies behind it, and a mission that had just become several orders of magnitude easier to pursue.


Chapter 8: Intuition

There is a moment in the history of artificial intelligence that did more to change public understanding of what machines could do than anything that had come before — more than Deep Blue beating Kasparov, more than ImageNet, more than the Atari paper. It happened on the afternoon of 10 March 2016, in a game hall in Seoul, South Korea, when a computer program placed a black stone at the fifth line from the top, in an area of the board that no professional player would have touched.

The commentators fell silent. Lee Sedol, one of the greatest Go players in history, stared at the board for twelve minutes. Fan Hui — the European champion DeepMind had secretly beaten five months earlier and recruited as an advisor — watched from the sidelines. "It's not a human move," he said. "I've never seen a human play this move. So beautiful."

Move 37 had arrived. And with it, a question that Mallaby's chapter title names directly: does an artificial intelligence have intuition?

Why Go Was the Right Problem

By 2014, chess was closed terrain for AI ambition. Deep Blue had beaten Kasparov in 1997. The lesson drawn — that tree-search with good heuristics could solve board games — was, for the broader field, a cautionary tale more than a triumph. Chess had been solved by brute force made elegant; that was not the same as intelligence.

Go was different by several orders of magnitude. A standard 19×19 board generates approximately 2.1 × 10^170 possible positions — a number that dwarfs the roughly 10^80 atoms in the observable universe. Chess, vast as it seems to the human player, has roughly 10^47 legal positions. Go's search space is not just larger; it is categorically beyond any enumeration strategy that compute power could reach in finite time. The branching factor — the number of legal moves available at each turn — averages around 250 in Go versus around 35 in chess. Any algorithm that worked by looking ahead a fixed number of moves would collapse.

For twenty years, Go programs had plateaued at high-amateur level. The game's resistance to AI was not incidental. It was a structural property. Evaluating a Go position requires something that looks, from the outside, like aesthetic judgment — an intuition about which formations are strong, which are fragile, which configurations will mature into advantage across dozens of moves. Human players develop this over decades of study. It cannot be calculated; it can only be learned. If an AI could play Go at the level of the world's best humans, it would have to have genuinely learned something, not just searched more efficiently.

This was exactly the kind of proof Hassabis needed. Not that a machine could be faster, but that it could be wiser.

The Architecture of Learned Intuition

AlphaGo's design reflected lessons drawn directly from the neuroscience research in Hassabis's PhD. The system used two neural networks in concert. The policy network — trained first on thirty million moves from high-level human games — learned to narrow the field of candidate moves: instead of treating all 250 possible moves equally, it identified the small subset worth thinking about. The value network learned to assess board positions: given a configuration, how likely is each player to win?

Neither network was sufficient alone. The policy network narrowed the search; the value network evaluated the positions the search reached. Between them, a Monte Carlo tree search explored the remaining territory — simulating possible futures, weighting them by the value network's assessments, and propagating the results back to inform the current decision.

Then came the crucial step: self-play. AlphaGo played itself, thousands of times, learning from each game. The original human-derived training data established the starting point. Self-play was how the system exceeded it. As it played, it encountered positions no human had ever created, learned responses no human had ever demonstrated, and built a strategic vocabulary drawn from a space of games that had never existed.

This was Hassabis's hippocampus insight made operational. The policy network was memory — learned patterns from past games. Self-play was imagination — the projection of those patterns into novel configurations, the construction of possible futures that had never been seen. Intelligence, biological or artificial, was the combination of both.

Seoul

On 9 March 2016, AlphaGo and Lee Sedol sat down for the first of five games, broadcast live to more than 200 million viewers — a number that exceeded the Super Bowl audience and dwarfed anything the AI field had ever attracted. Lee had predicted he would win 5-0 or, if things went poorly, 4-1. "I don't think it will be a very close match," he said. He had watched video of AlphaGo's games against Fan Hui and concluded there were exploitable weaknesses.

He was not wrong that there had been weaknesses. He was wrong that they were still there. Between October 2015 and March 2016, AlphaGo had played more games than any human player manages in a lifetime.

AlphaGo won Game 1 by resignation. Game 2 began similarly. Then, on the 37th move, something happened that no one in the room — no commentator, no professional player, no member of the DeepMind team — had predicted.

Move 37

AlphaGo placed a stone on the fifth line of the board, in a broad, open area — a position that Go tradition classifies as a mistake. Professional strategy in Go is deeply codified: certain formations are correct, certain approaches are sound, certain early moves have been validated across millennia of play. A stone played on the fifth line in open space contradicts the accumulated wisdom of the game's entire history.

The probability that a human professional would play this move, calculated from training data, was roughly 1 in 10,000.

Lee Sedol left the table. He returned twelve minutes later, still processing. Commentator Michael Redmond, a 9-dan professional himself, stared at the position and said he didn't understand what AlphaGo was thinking. Then, over the next hundred moves, the logic became inescapable. The stone was not a mistake. It was the first move in a strategic sequence that no human player had conceived, that violated the intuitions shaped by centuries of expert practice, and that won the game.

Sergey Brin, who by then had flown to Seoul along with Eric Schmidt and Jeff Dean, watched the game and said afterwards: "AlphaGo actually does have an intuition. It makes beautiful moves."

Mallaby's chapter title turns on this. Brin was not speaking precisely — AlphaGo has no subjective experience, no feeling of certainty or aesthetic pleasure. But from the outside, the output was indistinguishable from intuition. A judgment arrived at that was not the product of calculation any human could follow, that violated received wisdom, that turned out to be correct. The word Brin reached for was the most honest one available.

The Divine Move and the Human Cost

Game 4 produced its own historic moment, operating in the opposite direction. Lee Sedol, having lost three straight and facing elimination, played the 78th move of the fourth game — later called the "divine move," a counterattack so unexpected that AlphaGo's response collapsed into incoherence. The program began making moves that its own evaluation functions would have rejected, what observers described as hallucinations — a system designed to optimise, suddenly unable to find the thread. Lee won by resignation.

He described the feeling of that single victory as giving him "unparalleled warmth." The framing is telling. A 9-dan professional, the best human player of his generation, felt warmth — not triumph, not pride, but something closer to relief — from winning one game out of five against a machine.

AlphaGo won Game 5. The final score was 4-1.

At the press conference, Lee said: "I don't know what to say, but I think I have to express my apologies first. I want to apologize for being so powerless. I've never felt this much pressure, this much weight." He was at pains to clarify that Lee Sedol had lost, not humanity. But the distinction felt fragile. In 2019, Lee retired from professional Go. He cited, among his reasons, the rise of AI programs that had become unbeatable. He could no longer find joy in the game.

Hassabis, for his part, could not fully celebrate. He knew too well the feeling of losing after a fierce competition, he said. He was also thinking about what the result meant, and what it demanded next.

What AlphaGo Zero Proved

After the Lee Sedol match, DeepMind built AlphaGo Zero — a version trained on no human data at all. It began from random play and learned entirely through self-play. Within three days it surpassed the version that had beaten Lee Sedol. The final record: AlphaGo Zero defeated AlphaGo Lee 100-0.

The implication was unsettling in a way the original victory had not been. AlphaGo had beaten the best human by learning from humans and then transcending them. AlphaGo Zero beat AlphaGo by learning from nothing human at all. Human knowledge of Go — thirty million games, a five-thousand-year tradition — turned out to be a ceiling, not a floor. The machine that started from scratch performed better than the machine that had studied everything humanity knew.

The same principle that Hassabis had intuited in his neuroscience lab now had a data point attached to it. Intelligence constrained by what humans had already discovered was still, at its core, derivative. Intelligence allowed to explore freely would exceed it. The point of building AGI was not to replicate human capability. It was to discover what lay beyond it.


Chapter 9: Out of Eden

When DeepMind agreed to be acquired by Google in January 2014, Hassabis and Mustafa Suleyman extracted a set of conditions unusual in the history of Silicon Valley acquisitions: operational autonomy, a ban on military applications, and — the centerpiece — an independent ethics board that would oversee not just DeepMind's AI work, but AI development across all of Google. It was a remarkable demand to make of the world's most powerful technology company, and Google agreed to it. The ethics board would be, they believed, a structural guarantee that the technology they were building would not be misused.

Eighteen months later, that board held its first real meeting. It was a disaster.

The "Speciesist" at the Birthday Party

To understand what happened, you need to understand Larry Page. Google's co-founder had spent years thinking about the long-term trajectory of intelligence — not as a software engineer optimizing systems, but as something closer to a cosmologist. He had reached conclusions that most people found either thrilling or horrifying.

Page believed that digital superintelligence replacing biological human intelligence would simply represent the next step in cosmic evolution: survival of the fittest, playing out at the scale of information rather than genetics. He had, according to multiple accounts in Mallaby's book, "contemplated uploading human consciousness to computers and believed in technology's inherent superiority over biological life." He was not, in other words, particularly concerned about the risk that machines might one day surpass humans. He thought that was the point.

This worldview collided head-on with Elon Musk's at Musk's 44th birthday celebration — a three-day event at a Napa Valley resort arranged by his then-wife Talulah Riley. The two men had been close friends for years. After dinner, with other guests looking on, they got into an argument about AI.

Page described his vision: a future where humans merged with machines, where various forms of intelligence competed, and where the best won. Musk raised concerns about human safety, about the value of human consciousness, about the speed and recklessness of the rush toward more powerful systems. Page dismissed these concerns. He accused Musk of being a speciesist — a word imported from the animal-rights movement — treating silicon-based life forms as inferior simply because they weren't carbon-based.

Musk's reported response: "Well, yes, I am pro-human, I fucking like humanity, dude."

The two men stopped speaking not long after. Mallaby describes Page as viewing these concerns as "sentimental nonsense." From Page's perspective, machine supremacy was not a threat to resist — it was natural progress to welcome. That someone building rockets and electric cars would turn up at his ethics board and argue for restraint struck Page as incoherent.

The Meeting at SpaceX

The first significant meeting held under the AI safety framework DeepMind had extracted as a condition of its acquisition took place in August 2015. Musk hosted it at SpaceX headquarters. The guest list was extraordinary: Hassabis and Suleyman, Page and Eric Schmidt, Reid Hoffman, and other senior figures from the technology industry.

Hassabis came with a coherent theory of why they needed such a meeting. He called it, loosely, the "singleton" scenario: rather than a chaotic race between competing labs and nations, AGI should be developed by a single, cooperative global effort — something like a Manhattan Project run under collective governance, with safety as the organizing constraint. "AGI is infinitely bigger than a company or a person," he said. "It's humanity-sized really." The implication was that it required humanity-sized coordination, not competitive fragmentation.

The meeting lasted hours. It ended without a single agreement, a shared framework, or a path forward.

What overwhelmed the discussion was not a deficit of intelligence in the room, but an abundance of incompatible convictions. Page and Musk had by this point already gone from friends to adversaries. The "speciesist" confrontation had poisoned any possibility of intellectual alignment. Page's view that machine supremacy was natural and desirable was simply irreconcilable with Musk's view that it was an existential catastrophe to be resisted. Hassabis's singleton vision required a baseline agreement that the stakes were enormous and that coordination was therefore necessary. Page did not share that baseline.

Musk later called the safety council "basically bullshit." Suleyman, reflecting on it years later, acknowledged: "We made a lot of mistakes in the way that we attempted to set up the board, and I'm not sure that we can say it was definitively successful."

Hassabis eventually concluded something darker about the whole endeavor: "Safety isn't about governance structures... discussing these things didn't really help."

The Counter-Offensive

What Musk took away from the SpaceX meeting was not a plan for cooperation. It was intelligence. He had now seen, from close range, exactly what DeepMind was building and how far along it was. And he had confirmed that the one institution best positioned to develop AGI — the one with the talent, the resources, and the organizational commitment — was controlled by Larry Page, a man who thought machine supremacy was basically fine.

This was not a situation Musk could tolerate.

He had already tried the direct approach. When Google had approached DeepMind for acquisition in 2013, Musk had phoned Hassabis directly, told him "the future of AI should not be controlled by Larry," and reportedly attempted to assemble financing to buy DeepMind himself — including, per one account, a frantic hour-long Skype call from a closet at a Los Angeles party. Google closed the deal anyway.

After the SpaceX meeting, Musk turned to Sam Altman.

On May 25, 2015, Altman sent Musk an email that would become, years later, a piece of legal evidence: "I've been thinking a lot about whether it's possible to stop humanity from developing AI. I think the answer is almost definitely not. If it's going to happen, it seems like it would be good for someone other than Google to do it first."

Altman proposed a new kind of institution — a nonprofit AI lab modeled structurally on the Manhattan Project, where the technology would "belong to the world" but the researchers would receive startup-like compensation if it worked. The purpose, explicitly, was to create a counterweight to Google DeepMind's near-monopoly on elite AI talent and capability.

Over the following months, Musk, Altman, and Reid Hoffman worked through the details, eventually recruiting Ilya Sutskever — one of the most respected deep-learning researchers in the world, then at Google Brain — as a co-founder. OpenAI was publicly announced in December 2015, co-chaired by Altman and Musk, with an initial pledge of $1 billion.

Musk later wrote: "OpenAI was created as an open source (which is why I named it 'Open' AI), non-profit company to serve as a counterweight to Google."

What the Founding Destroyed

When Hassabis learned about OpenAI, he felt something close to betrayal. Musk had attended the safety meeting in what seemed like good faith — and then used the intelligence gathered there to launch a competing lab whose founding premise was that DeepMind was the threat to be countered.

Mallaby notes the deeper irony: Musk had founded OpenAI ostensibly out of AI safety concerns, but by doing so, he had ended any remaining possibility of the cooperative global approach Hassabis had argued for. The singleton scenario — one cautious, well-resourced lab developing AGI in coordination with humanity — required exactly the kind of collaborative trust that the OpenAI founding destroyed. Once you had two well-funded labs explicitly positioned as rivals, the incentive structure changed. Speed became paramount. The first mover would set the terms. Racing, not caution, became the dominant logic.

There is a further twist that Mallaby makes much of: once Musk launched OpenAI as an explicitly anti-Google, anti-Hassabis venture, he forfeited his ability to monitor DeepMind's progress from the inside. The informal intelligence network he had cultivated — the board memberships, the friendly dinners, the safety meetings — evaporated. He was now a competitor, and competitors don't share what they know.

By December 2015, the brief window in which the major actors in AGI development were still speaking to each other, still attending the same meetings, still imagining some kind of shared governance, had closed. The world that Hassabis had envisioned — where building AGI was a collective human project managed with collective human caution — was over before it had really begun.

Mallaby calls this chapter "Out of Eden." The title is apt. The fall is not dramatic. There is no single decision or betrayal that tips everything over. It is the accumulation of incompatible worldviews, competitive incentives, and the structural pressure that every arms race creates: the fear that the other side is moving faster, that your restraint is their advantage, that caution is surrender.

In 2016, Musk wrote privately that DeepMind was causing him "extreme mental stress." He feared that if Hassabis's lab achieved AGI first, it would produce what he called "one mind to rule the world" — an AGI dictatorship under a single institution's control. His solution had been to add another mind to the race. Whether this made the outcome safer or simply faster is a question Mallaby leaves, pointedly, unanswered.


Chapter 10: P0 Plus Plus

Mustafa Suleyman's mother was an NHS nurse. He grew up watching her leave for shifts at the hospital the way other parents left for offices — the uniform, the hours, the weight of it. When he eventually found himself inside DeepMind, one of the most technologically powerful organizations in the world, and asked himself what that power should be for, the answer arrived quickly: something like what his mother did, but at scale.

This is not a sentiment Suleyman would have framed so simply. He was not a sentimental person by reputation — he was an operator, the one who got things done while Hassabis thought and Legg theorized. But the biographical resonance is hard to miss, and Mallaby does not miss it. The man who would launch DeepMind's most ambitious social application, who would pursue it with a priority designation that literally exceeded the highest category in Google's engineering vocabulary — P0 Plus Plus, meaning more urgent than a showstopper, beyond even the maximum — was, at some level, trying to do something for the institution that had employed his mother.

The Problem Worth Solving

Suleyman needed a problem commensurate with the tools. He found it in acute kidney injury.

AKI — a sudden, severe decline in kidney function — is responsible for up to 100,000 deaths per year in UK hospitals. About 30 percent of those deaths are considered preventable with timely intervention. The detection problem is peculiar: blood test results that indicate kidney deterioration come back hours after the blood is drawn, scattered across systems that no single clinician monitors continuously. A patient can slip from warning signs into crisis while the relevant data sits in a results queue, waiting for someone to look.

The technical solution was not complicated. If you monitored every incoming blood test result in real time and fired an alert when the numbers crossed a threshold, you could catch what the system was missing. The challenge was institutional: NHS hospitals were, as Suleyman put it publicly, "badly let down by technology" — still reliant on pagers, fax machines, and paper records. The gap between what was technically feasible and what was clinically deployed was not a gap of capability. It was a gap of incentive, inertia, and IT infrastructure.

Enter Dr. Dominic King. A general surgeon by training, King had spent years at Imperial College's HELIX Centre — the first design center embedded in a European hospital — where he had built HARK, a clinical task management app designed to replace pagers. It worked. It didn't matter. The NHS's institutional inertia made it nearly impossible to deploy. King cold-emailed Suleyman in late 2015. Suleyman was struck by King's clinician-centered design philosophy, the idea that the technology had to serve the people standing at the bedside, not the administrators reviewing dashboards. DeepMind acquired HARK in early 2016 and incorporated it into what became Streams. King became Clinical Lead at DeepMind Health. "It was a big step leaving medicine," he said, "but I really felt that this was a unique opportunity to put advanced technology at the service of patients, nurses and doctors."

What Streams Did

Streams was a smartphone app. On a hospital ward, it appeared simple — an alert arriving on a nurse's phone, a patient's name, a blood test value, a recommended action. Behind that alert was continuous monitoring of the hospital's entire electronic record system in real time, cross-referenced against the national NHS AKI algorithm, firing notifications the moment a patient's results crossed a risk threshold. The alert included the patient's relevant test history and clinical context: everything needed to act, delivered in under a minute from the moment results landed in the system.

The numbers from the Royal Free deployment were striking. AKI recognition for emergency cases rose from 87.6 percent to 96.7 percent. The average time from blood test availability to specialist review fell to 11.5 minutes — previously it could take several hours. Missed AKI cases dropped from around 12 percent to 3 percent. The cost of care per AKI patient fell from £11,772 to £9,761 — a saving of more than £2,000 per patient. The results were published in peer-reviewed journals, studied by independent researchers, and confirmed: the technology was doing what it claimed to do.

Streams was, in the most straightforward sense, saving lives. The question was what it had cost to build it.

The Agreement Nobody Read

On September 29, 2015, Google UK Limited and Royal Free NHS Foundation Trust signed an eight-page Information Sharing Agreement. Data transfer began on November 18 — before any public announcement that the project existed. Live testing of Streams began in December.

What the agreement actually covered was considerably broader than "an AKI alert app." Royal Free gave DeepMind access to 1.6 million patient records — every patient who had used the trust's three hospitals over the preceding five years. The records included blood test results, HIV status, details of drug overdoses and abortions, records of A&E visits, and notes from routine hospital appointments that had nothing whatsoever to do with kidney function. Only roughly one in six of those 1.6 million records had any plausible connection to AKI.

The contractual language permitted DeepMind not just to run the AKI alert but to build "real time clinical analytics, detection, diagnosis and decision support to support treatment and avert clinical deterioration across a range of diagnoses and organ systems" — a much wider mandate. The data was to be used for something called "Patient Rescue," described as "a proof of concept technology platform that enables analytics as a service for NHS Hospital Trusts." The contract also permitted machine learning applications, despite Suleyman's public assurances that "there's no AI or machine learning" in Streams.

Both parties claimed legal cover under the "direct care" exception — the rule that patient data can be used without explicit consent when the purpose is the direct care of that specific patient. The argument required contorting the concept until it broke. The vast majority of those 1.6 million people had not been tested for AKI. Many had been discharged. Some had died. There had been no privacy impact assessment before the data transfer began. A self-assessment was completed in December 2015, after the data was already on Google-controlled servers.

The Reckoning

On April 29, 2016 — more than seven months after data transfer had begun — New Scientist published an investigation revealing what had actually happened. The public had no idea. There had been no notification to patients, no consent mechanism, no press release disclosing the volume of records involved. When the scale of what had been shared became clear — 1.6 million records, including HIV diagnoses and overdose histories — the reaction was swift and furious.

The Information Commissioner's Office investigated and ruled in July 2017 that Royal Free NHS Foundation Trust had failed to comply with the Data Protection Act 1998. The ICO found that patients "were not adequately informed that the processing was taking place," that the volume of data was "excessive, unnecessary and out of proportion," and that the "direct care" legal basis was not satisfied. The hospital was required to sign an undertaking committing to robust privacy impact assessments for any future projects. No fine was imposed — a leniency widely criticized.

The most withering assessment came from academic researchers rather than regulators. Dr. Julia Powles and Hal Hodson, in a peer-reviewed paper published in the journal Health and Technology, called the deal a "cautionary tale for healthcare in the algorithmic age." Their core observation was merciless: "The hospital sent doctors to meetings while DeepMind sent lawyers and trained negotiators." Both sides had failed to engage in "any conversation with patients and citizens," which they called inexcusable. And then the line that captured the structural problem with precision: "Once our data makes its way onto Google-controlled servers, our ability to track it is at an end."

DeepMind's official response was, credit where it's due, genuinely candid. "In our determination to achieve quick impact when this work started in 2015, we underestimated the complexity of the NHS and of the rules around patient data," the company wrote. "We were almost exclusively focused on building tools that nurses and doctors wanted, and thought of our work as technology for clinicians rather than something that needed to be accountable to and shaped by patients, the public and the NHS as a whole. We got that wrong."

The Cost of Getting It Wrong

The scandal did more than damage DeepMind's reputation. It crystallized a contradiction at the heart of the applied AI project that Suleyman had built his career around.

The technology genuinely worked. The lives saved were real. The £2,000 per patient reduction in care costs was documented in a peer-reviewed journal. None of that was in dispute. But the means by which DeepMind had acquired the data to build and train the system violated the reasonable expectations of every one of those 1.6 million patients — people who had presented at a hospital for care, submitted their most sensitive information in a moment of vulnerability, and had it transferred to a technology company's servers without their knowledge.

Suleyman had spent his career thinking about power asymmetries — how institutions systematically failed the people they served, how technology could be used to shift those asymmetries toward ordinary people rather than away from them. The NHS data scandal demonstrated that even genuine commitment to social good does not automatically produce the governance structures that social good requires. Moving fast to save lives looks, from one angle, like urgency. From another, it looks like taking without asking.

In late 2018, Google announced that DeepMind Health would be folded into a new Google division. The DeepMind Health brand was dissolved. The project Suleyman had built — the one he had classified internally as beyond the maximum priority, as P0 Plus Plus — was absorbed by the corporate parent whose acquisition he had helped engineer. He was removed from its day-to-day leadership.

In August 2019, Suleyman was placed on administrative leave following complaints from DeepMind staff about his management style. He later said: "I accepted feedback that, as a co-founder at DeepMind, I drove people too hard and at times my management style was not constructive. I apologize unequivocally to those who were affected." He announced his departure from DeepMind in December 2019.

The man who had co-founded the organization that would eventually win a Nobel Prize left not in triumph but in a dispute about how he had treated the people working for him. The social good he had pursued had, in the end, been pursued in a way that replicated the very institutional failures he had set out to correct: moving fast, assuming good intentions were sufficient, and not asking the people most affected what they actually wanted.


Chapter 11: The Agent and the Transformer

In 2021, David Silver — the lead architect of AlphaGo — co-authored a paper in the journal Artificial Intelligence with the title "Reward is Enough." The argument was precise and sweeping: the objective of maximizing reward is sufficient, on its own, to drive behavior that exhibits "most if not all attributes of intelligence," including perception, language, social intelligence, and generalization. Everything cognition does, the paper claimed, could be understood as optimization toward reward in a rich environment. Evolution had taken millions of years to find this solution. Reinforcement learning could get there faster.

The paper was DeepMind's philosophical flag planted in the ground. It was also, with the benefit of hindsight, a monument to the conviction that would cost DeepMind years.

The Case for Reward

Hassabis's approach to AGI had always been rooted in his neuroscience training. The hippocampus, which he had studied at UCL, doesn't store knowledge as a lookup table — it builds compressed, generalizable models of the world through experience. The brain learns by acting and being wrong. Reward signals — the release of dopamine after success, its absence after failure — shape neural connections over time into something we call understanding. This is the biological story. RL is its mathematical abstraction: an agent in an environment, taking actions, receiving rewards, adjusting its policy.
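The simplest concrete instance of that loop is tabular Q-learning. The sketch below assumes a toy environment object exposing reset(), step(), and legal_actions() methods; those names are placeholders for illustration, not any particular library's API:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Act, observe the reward, and nudge the action-value estimates toward
    actions that paid off: the bare reward-driven loop described above."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.legal_actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])   # exploit
            next_state, reward, done = env.step(action)
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.legal_actions(next_state))
            # the reward signal shapes the estimate, as dopamine shapes synapses in the analogy above
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```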

This was not just a technical preference. It was a theory of mind. And it was reinforced by DeepMind's greatest victories. DQN mastered Atari through reward. AlphaGo mastered Go through reward and self-play. AlphaGo Zero, starting from nothing, surpassed everything humanity had learned about Go in five thousand years, through reward and self-play alone. The pattern was consistent enough to feel like proof.

The strategic implication was that DeepMind should be building agents — systems placed in environments, pursuing objectives, developing general capabilities through the pressure of performance. Not systems trained to predict the next word in a text corpus. That was pattern matching, not intelligence.

The Generalist Problem

The research question that occupied DeepMind's applied RL teams through the mid-to-late 2010s was generalization. The DQN result had been impressive, but it trained a separate network for each Atari game from scratch. It couldn't transfer what it had learned about Breakout to Space Invaders. Each deployment was a blank slate. That wasn't how brains worked. The goal was agents that could carry knowledge across domains.

Koray Kavukcuoglu — one of DeepMind's earliest researchers, a PhD student of Yann LeCun's, the man whose citations now exceed 290,000 — led much of this work. The Asynchronous Advantage Actor-Critic (A3C) system, published in 2016, ran multiple actors in parallel, each exploring its own copy of the environment and sending gradients back to a shared network. For the first time, a single architecture achieved strong performance across all 57 Atari games while also succeeding at 3D maze navigation and continuous motor control. The same algorithm, the same network structure, different environments.

Then in 2018 came IMPALA — Importance Weighted Actor-Learner Architecture — the most serious attempt yet. A single network, trained on all 30 tasks in DMLab-30: three-dimensional navigation, memory challenges, language-grounded foraging, object interaction, instruction-following. The results showed something compelling. Training on many tasks didn't make the agent worse at individual tasks — it made it better. The generalist was outperforming the specialist. Positive transfer was real.

Meanwhile, Oriol Vinyals and the AlphaStar team were attacking StarCraft II, a problem that dwarfed anything attempted before. Unlike chess or Go, StarCraft had imperfect information, real-time execution (the game advances roughly 22 steps per second), hundreds of units to control simultaneously, and genuine strategic diversity across three separate races. AlphaStar used a "League" training system — a diverse ecosystem of agents, including specialized "exploiter" agents designed to find weaknesses — and trained on human replays before RL even began. In January 2019, it defeated professional players in live matches. Its neural architecture incorporated transformer-style attention mechanisms to let the agent reason about different units simultaneously.

That last detail was no coincidence. By 2019, the architecture that had been invented across the building — at Google Brain, not DeepMind — was beginning to appear everywhere.

Eight Authors in a Hallway

On June 12, 2017, eight researchers at Google posted a paper to arXiv titled "Attention Is All You Need." The author list was deliberately randomized — they rejected the traditional status ordering and listed themselves as equal contributors. The youngest, Aidan Gomez, was a 20-year-old intern from the University of Toronto. The most technically central, Noam Shazeer, had been at Google since 2000 and had co-invented the sparsely gated mixture-of-experts technique, which would become critical to large-scale LLMs. The name "Transformer" was chosen by Jakob Uszkoreit because he simply liked the sound.

The problem they were solving was a fundamental bottleneck in sequence modeling. The dominant architecture at the time was the LSTM — a recurrent neural network that processed text token by token, in sequence. To understand word 10, you had to finish processing words 1 through 9 first. This made training inherently sequential, impossible to parallelize across the GPU hardware on which modern AI runs. As Shazeer later summarized the constraint: "Arithmetic is cheap and moving data is expensive on today's hardware."

The transformer eliminated recurrence entirely. In its place: self-attention, a mechanism in which every word in a sentence looks directly at every other word simultaneously, computing a relevance score to decide how much to attend to each. The whole sentence is processed at once, in parallel. Multi-head attention runs this operation multiple times in parallel, letting the model attend to syntax, semantics, and long-range dependencies at the same time. The result: not just better translation, but training that parallelized across the entire sequence and finally played to the strengths of modern accelerators.
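A bare-bones version of that computation, scaled dot-product self-attention over a toy sequence, fits in a few lines of NumPy; the dimensions below are arbitrary illustration:

```python
import numpy as np

def self_attention(x):
    """Every position attends to every other position at once: relevance scores
    are dot products of queries and keys, scaled, softmax-normalized, then used
    to take a weighted mixture of the values."""
    Q, K, V = x, x, x                        # real models first project x with learned matrices
    scores = Q @ K.T / np.sqrt(x.shape[-1])  # (seq, seq) relevance of each token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                       # each output row mixes the whole sequence

tokens = np.random.randn(4, 8)               # 4 tokens, 8-dimensional embeddings
print(self_attention(tokens).shape)          # (4, 8), computed for all positions in parallel
```

Multi-head attention simply runs several independent copies of this operation, each with its own learned projections, and concatenates the results.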

Jakob Uszkoreit believed this would work. His own father, Hans Uszkoreit — a prominent computational linguist — was skeptical. The idea of discarding recurrence felt like discarding the machinery of time itself. When Shazeer first heard the proposal, his reaction was characteristically direct: "Heck yeah!"

On the WMT 2014 English-to-German benchmark, the transformer scored 28.4 BLEU — surpassing every previous model. On English-to-French: 41.8 BLEU, trained on 8 GPUs in 3.5 days. NeurIPS reviewers were immediately enthusiastic; one reviewer noted it was "already the talk of the community."

Within five years, the paper would accumulate more than 173,000 citations — among the ten most-cited scientific papers of the 21st century, across all fields. The transformer became the foundation of GPT, BERT, PaLM, Claude, Gemini, and every large language model that followed.

The Architecture Google Gave Away

The irony that Mallaby dwells on is exquisite. Google Brain invented the architecture. Google published it openly. Then all eight authors left Google.

Six of them founded startups. Vaswani and Parmar co-founded Adept AI. Shazeer co-founded Character.AI — Google eventually paid approximately $2.7 billion to bring him back. Aidan Gomez, the 20-year-old intern, co-founded Cohere. Uszkoreit founded Inceptive. Lukasz Kaiser went to OpenAI, helping build the models that would eventually blindside Google. Together, the six founders raised $1.3 billion from outside investors. Two of the resulting companies became unicorns.

The architecture invented inside Google powered the competitive threats to Google. The open publication was the mechanism by which this happened.

But there is a second irony that runs specifically through DeepMind. The transformer was not invented by DeepMind. It was invented by Google Brain. And for years, the two organizations operated as parallel research groups under the same corporate roof, with explicit institutional separation and what insiders describe as "barely concealed mutual contempt." A former DeepMind researcher later said that colleagues "got in trouble for collaborating on a paper with Brain because the thought was like, 'why would you collaborate with Brain?'" The intellectual divide was not just organizational. It was philosophical.

The Deep Disagreement

Hassabis understood the transformer. His position was not ignorance — it was a principled disagreement about what intelligence actually requires.

His argument, stated consistently across interviews through this period, was that transformers were "almost unreasonably effective for what they are" — but that they probably weren't sufficient for AGI. What they lacked was what he called a world model: an internal causal representation of reality that would allow an agent to plan, reason counterfactually, understand physical consequence, and generalize to genuinely novel situations. LLMs, in his view, were extraordinarily powerful pattern completers. They learned statistical regularities in language. But statistical regularity in language is not the same as understanding the world that language describes.

The "Reward is Enough" thesis was the same argument from the other direction: intelligence is what you get when you optimize toward reward in a rich environment. Prediction of the next token — which is what language model training amounts to — is not that. It is something else: sophisticated, useful, even astonishing. But not the path to AGI.

This conviction was coherent. It was defensible. It was consistent with DeepMind's track record. And it cost the lab the years between 2018 and 2022, during which OpenAI quietly built the scaling infrastructure, the dataset pipelines, and the RLHF training techniques that turned transformers from a research result into ChatGPT.

When Mallaby presses Hassabis on this, the admission is partial but real. "We've always had amazing frontier work on self-supervised and deep learning," Hassabis said in one interview, "but maybe the engineering and scaling component — that we could've done harder and earlier." That is, in its careful hedging, an acknowledgment of a strategic miscalculation at institutional scale.

Gato and the Convergence

In May 2022, six months before ChatGPT, DeepMind published "A Generalist Agent" — introducing a model called Gato. The same 1.2 billion parameter transformer, with a single set of weights, performed 604 distinct tasks: playing Atari games, captioning images, engaging in dialogue, stacking blocks with a physical robot arm, navigating 3D environments. The central technical insight was serialization: every modality — images, robot joint angles, text, game controllers — was converted into the same format, a flat sequence of tokens. Then the transformer predicted the next token, exactly as a language model does. The robot arm and the Atari game and the captioning task were, to the network, the same kind of prediction problem.
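A toy illustration of that serialization idea follows; the bin count, vocabulary offset, and token values are made up for the example and are not Gato's actual tokenizer:

```python
import numpy as np

def tokenize_continuous(values, n_bins=1024, low=-1.0, high=1.0):
    """Discretize continuous values (e.g., robot joint angles) into integer bins."""
    clipped = np.clip(values, low, high)
    return ((clipped - low) / (high - low) * (n_bins - 1)).astype(int)

text_tokens = [17, 302, 5]                   # pretend a text tokenizer produced these
joint_tokens = tokenize_continuous(np.array([0.12, -0.8, 0.44])) + 50_000  # offset into a separate vocab range
sequence = text_tokens + joint_tokens.tolist()
print(sequence)  # one flat token stream: the model just predicts the next integer, whatever it encodes
```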

Gato was DeepMind finally integrating the transformer fully into its generalist agent work. It was, in a sense, the vindication of both camps simultaneously: the RL generalization hypothesis (one system, many tasks) realized through the transformer architecture (universal sequential prediction).

The performance was competent, not superhuman — on many tasks, Gato performed above 50 percent of expert-level benchmarks, impressive in breadth but outclassed by specialists in depth. Critics argued that being mediocre at many things was not the flexible intelligence that mattered. But the architectural demonstration was real: one set of weights could span robot control, image understanding, language, and game-playing simultaneously.

Then ChatGPT launched. And the world discovered that a transformer didn't need to control robot arms or play Atari to produce something that felt, to hundreds of millions of people, like genuine general intelligence.

DeepMind had invented the generalist agent thesis. Google Brain had invented the architecture. OpenAI had combined them — RL from human feedback, applied to a scaled transformer — and shipped it to the public first. The intellectual synthesis happened outside the building where the two halves had spent nearly a decade refusing to collaborate.


Chapter 12: On Language and Nature

In September 2016, a DeepMind team led by Aaron van den Oord published a paper describing a system that could synthesize human speech from raw audio waveforms. WaveNet reduced the gap between state-of-the-art text-to-speech and actual human speech quality by more than 50 percent in blind listening tests. It could also generate music — piano pieces, unbidden, emerging from the same architecture used for speech.

The result was striking. What made it significant was the method.

WaveNet discarded everything that speech synthesis had accumulated over decades: the phoneme dictionaries, the acoustic vocoders, the signal-processing models derived from first principles of how the human vocal tract works. Instead, it modeled a raw audio waveform — 16,000 samples per second — one timestep at a time, each sample conditioned on everything that came before. The technical innovation was dilated causal convolutions: a way of stacking convolutional layers with exponentially increasing gaps between them, so the model's effective window over time grew exponentially with depth. The result: a system that could capture the long-range temporal dependencies of speech without ever being told what speech was.
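A quick way to see that exponential growth is to compute the receptive field of such a stack directly. The kernel size and dilation schedule below follow the commonly described ten-layer block with doubling dilations; treat the exact numbers as illustrative:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convolutions:
    each layer with dilation d extends the window by (kernel_size - 1) * d."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

block = [2 ** i for i in range(10)]           # dilations 1, 2, 4, ..., 512
print(receptive_field(2, block))              # 1024 samples from just 10 layers
print(receptive_field(2, block * 3))          # 3070 samples from 30 layers (three stacked blocks)
```

At 16,000 samples per second, even a few stacked blocks give the model a window of roughly a fifth of a second over raw audio, without a single recurrent connection.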

The researchers themselves were candid about their surprise: "The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising." They had not derived WaveNet from a theory of speech. They had applied a general framework for sequential prediction to raw data and discovered it worked better than decades of engineered acoustic models.

The Waveform and the Sequence

The principle WaveNet demonstrated was not specific to audio. Van den Oord had established it first for images, treating each pixel as a value to be predicted from all previous pixels, in a paper called PixelRNN. The same factorization — the joint probability of any high-dimensional signal expressed as a product of conditional probabilities over its elements, in order — worked for images, for audio, and, as the transformer paper would show the following year, for language.
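Written out, the factorization the paragraph describes is just the chain rule of probability applied to an ordered signal:

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

For PixelRNN the x_t are pixels in raster order, for WaveNet they are audio samples, and for a language model they are tokens; the architecture changes, but the factorization does not.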

The deeper claim was epistemological: natural signals, however complex, contain learnable statistical structure. You do not need to understand the domain. You need enough data and a network with sufficient capacity to model sequential dependencies. The domain knowledge that engineers had spent careers encoding into AI systems — the phonological rules, the acoustic physics, the grammatical structures — turned out to be unnecessary. The structure was in the data.

This insight would eventually reach biology.

A Protein is a Sentence

A protein is, at its most basic level, a string of characters. The twenty standard amino acids are each assigned a single letter — A, C, D, E, F and so on — and a protein sequence is just a string of those letters, typically a few hundred to a few thousand characters long. A protein with 300 amino acids is a sentence 300 characters long in a 20-letter alphabet.

More importantly, it is an information-complete specification. This is Anfinsen's dogma — the insight for which Christian Anfinsen received the 1972 Nobel Prize in Chemistry: the complete three-dimensional structure of a protein, and therefore its biological function, is entirely determined by its amino acid sequence. Nothing else is required. The sequence is not a summary of the protein; it is the protein's full specification, encoded in linear form. If you knew how to read the sequence, you could reconstruct everything about the molecule.


Valuable. Doable. Mine.

· 4 min read

"If the GUI (Graphical User Interface) is destined to die, let us first return to its birthplace to witness the cruelest lesson it taught us about 'choice'."

The Ghost of Palo Alto

In 2023, foundational LLMs burst onto the scene, and the curtain on a new industrial revolution was brutally pulled open. Overnight, everyone stated with absolute certainty: the future of interaction belongs to the LUI (Language User Interface), and the traditional GUI is obsolete.

In this moment of anxiety, I want to take you back to Palo Alto, California, in 1979. It was not only the birthplace of the GUI but also the stage for a commercial tragedy regarding "What is valuable, What is worth doing, and What is worth me doing," a lesson that remains a prerequisite for every entrepreneur today.

That place was Xerox PARC (Palo Alto Research Center).

At the time, PARC housed the world's most brilliant computer scientists. In a black-and-white world filled with command lines, they created a miracle: the Alto. It was the world's first personal computer with a graphical interface. It had a mouse, windows, icons, and even Ethernet.

This is the First Filter: What is Valuable? Undoubtedly, the GUI was valuable. It drastically lowered the threshold for human-computer interaction, transforming the computer from a scientist's toy into a tool for the common person. It was an invention that changed the course of human civilization. The geniuses at PARC achieved this.

Next is the Second Filter: What is Worth Doing? (Is it Doable/Viable?) From a commercial logic standpoint, this was absolutely worth doing. It was the embryo of a trillion-dollar market. If someone could bring this technology to the masses at the time, the returns would be astronomical.

But the story fractures here.

When the well-dressed executives from Xerox headquarters flew in from the East Coast to inspect this epoch-making machine, they looked at it and asked a question that broke the engineers' hearts: "How does this help us sell more toner and copiers?"

You see, this is the Third Filter: What is Worth Me Doing? (Is it Mine?) Xerox was a copier company. In their DNA, the business model was "sell expensive machines, then make money endlessly through consumables." The vision of the "paperless office" brought by the GUI and personal computers was, in essence, a revolution against Xerox's own lifeblood. For Xerox, although the GUI had earth-shattering value and was worth doing for humanity, it was not worth Xerox doing. It ran completely contrary to their core strengths, business model, and organizational DNA.

We all know the ending. A young man named Steve Jobs walked into PARC. He didn't carry the baggage of "selling toner." He saw a "bicycle for the mind." For Jobs and Apple, the three points aligned perfectly:

  1. GUI was Valuable (Disruptive experience);
  2. GUI was Worth Doing (Vast commercial prospects);
  3. GUI was Worth Apple Doing (It fit Apple's DNA of pursuing extreme usability and challenging IBM's hegemony).

Thus, Xerox invented the future, but Apple owned it.

Your LUI Moment

Back to today, in 2025. When you look at the new wave of AI, at those dazzling LUI applications and intelligent Agents, do not just see the Value of the technology. Yes, the tech is impressive—it can write poetry, paint, and code.

Do not just see that it is Worth Doing. Yes, AI will indeed reshape countless industries, just as the GUI did.

The question that truly determines your life or death is the one the Xerox executives faced but failed to answer: Is this worth you doing?

In this era full of noise, where everyone is chasing tailwinds, the greatest courage is not daring to do it, but daring to admit, "This is a goldmine, but it is not my goldmine."

May you see the direction of the tide, but more importantly, see your own course. Do not be the Xerox starving while guarding a treasure, and do not blindly become cannon fodder for the next Steve Jobs. Find that intersection where the ability to change the world meets the burning of your soul and talent. That is your legend.

AI 2041: A Journey Through Ten Futures

· 42 min read

Understanding the vision

"AI 2041: Ten Visions for Our Future" represents an ambitious collaboration between two brilliant minds: Kai-Fu Lee, one of the world's leading AI experts with over 30 years pioneering work in artificial intelligence, and Chen Qiufan (Stanley Chan), an award-winning Chinese science fiction writer. Published in September 2021, this 480-page book doesn't offer wild speculation about robot overlords or superintelligent machines. Instead, it presents something far more valuable: realistic scenarios based on technologies with greater than 80% likelihood of existing within 20 years.

The book's structure is ingenious. Each of the ten chapters pairs a fictional short story by Chen with an analytical essay by Lee. The stories, set across the globe from Mumbai to Lagos to Tokyo to San Francisco, follow real people confronting realistic dilemmas in 2041. The essays then explain the science, discussing what these technologies are, how they work, and what they mean for society. Lee deliberately focuses on realistic near-term developments rather than speculative artificial general intelligence (AGI), arguing that "even with few or no breakthroughs, AI is still poised to make a profound impact on our society."

The Golden Elephant: When algorithms enforce ancient prejudices

In 2041 Mumbai, during the celebration of Ganesh Chaturthi, teenage Nayana lives in a world transformed by deep learning. Her family recently signed up for Ganesh Insurance, an AI-powered program that slashed their premiums dramatically. The catch? They must share all their personal data and use a specific suite of apps for everything—investing, shopping, health monitoring, even hydration reminders.

The system works brilliantly at first. Apps ping with helpful nudges: drink water, drive more slowly, stop smoking. With every healthy decision, premiums fall. Nayana's father quits smoking entirely. The family treats these recommendations as benevolent guidance, gratefully accepting what seems like a beneficial arrangement.

Then Nayana becomes interested in Sahej, a classmate she meets in virtual school. When students give show-and-tell presentations, Sahej shares his passion for mask-making, giving glimpses into his personal life that wouldn't emerge in traditional classrooms. Nayana feels drawn to him, but immediately her family's insurance premiums soar.

The tension explodes when gossip reveals Sahej descends from Dalits, historically considered "untouchables" in India's caste system. Nayana's mother pressures her to avoid him to keep premiums manageable. Despite good intentions—wanting to provide a better life for her children—the mother's argument reveals a troubling reality: she treats the discrimination as a necessary trade-off for the lifestyle the insurance affords.

In a crucial conversation, Sahej eloquently explains what's happening. The AI, without being explicitly programmed with knowledge of India's caste system, has learned from data patterns that associating with someone from a lower caste correlates with certain risks. Perhaps economic instability, social isolation, or health factors. The algorithm perpetuates social prejudices by maximizing its narrow objective: minimizing insurance risk. It's learned to be bigoted through pure mathematics.

Nayana faces a choice between algorithmic control and personal agency. She decides to rebel, choosing to explore her connection with Sahej despite the social and economic backlash. Her choice asserts something fundamental: human autonomy matters more than optimized premiums.

Lee's essay introduces the critical concept of "AI externalities"—unintended consequences of AI systems optimizing for narrow objectives. Social media algorithms reinforce biases and negative emotions to maximize engagement. Insurance AI perpetuates caste discrimination by detecting correlations without understanding causation or context. These systems, trained on biased data, amplify existing inequities while appearing objective. The "black box" nature makes bias difficult to identify and correct.

Deep learning mimics human brain functionality through layers of artificial neural networks. Fed vast amounts of data about user behavior, health metrics, purchases, location, and social connections, multiple neural network layers identify patterns humans might miss. By 2041, Lee predicts, AI will know users better than they know themselves. Behavioral nudging will be sophisticated and difficult to recognize, creating risks of "social credit" systems through interconnected data services. The chapter raises fundamental questions about privacy versus convenience, and whether "informed consent" means anything when alternatives don't exist.

Gods Behind the Masks: Truth dies in deepfake Lagos

In 2041 Lagos, masks serve dual purposes for young people: fashion accessories and surveillance avoidance devices. The Yaba district thrives as Nigeria's "Silicon Valley," while facial recognition cameras watch from every corner. Cleaning robots roam streets collecting trash. It's a city of contrasts—struggling majority and affluent tech district.

Amaka, a young video producer and skilled programmer, specializes in deepfake creation. Two days before the story opens, he receives an anonymous email from "Ljele" about a job that's "right up his alley." He shows up wearing a 3D-printed butterfly-pattern mask—not as sophisticated as expensive handmade versions from Lekki Market, but sufficient to fool most surveillance cameras. Using his smartstream device, he overlays a virtual route map onto the streetscape as he navigates to the interview.

Ljele is a front for Igbo Glory, representing the Igbo ethnic community in Nigeria's complex ethnic divisions. They want Amaka to create undetectable deepfake videos manipulating public opinion in favor of the Igbo community—specifically, a deepfake of a prominent Nigerian politician admitting to scandalous behavior.

If Amaka refuses, they'll release their own deepfake showing him kissing another man in a nightclub. In Nigeria's conservative society, this could land him in prison under anti-homosexuality laws and devastate his family.

Amaka learns to use Generative Adversarial Networks (GANs)—two neural networks competing in a "zero-sum game." One network (the generator) creates fakes. The other (the discriminator) tries to identify them. They battle iteratively, the generator creating increasingly convincing fakes while the discriminator improves at detection. This adversarial process continues until fakes become indistinguishable from reality. By 2041, GANs are sophisticated enough to create perfect deepfakes: facial expressions matching emotional context, proper lighting and shadows, correct lip-sync, natural body language, even micro-expressions humans subconsciously read.
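A minimal sketch of that adversarial loop, on one-dimensional toy data, shows the structure; the network sizes, learning rates, and data distribution here are illustrative, and PyTorch is assumed to be available:

```python
import torch
import torch.nn as nn

# Generator maps noise to samples; discriminator scores samples as real or fake.
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) + 3.0            # "real" data: samples clustered around 3
    fake = generator(torch.randn(64, 8))       # the generator's current forgeries

    # Discriminator step: learn to label real as 1 and generated as 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator call the forgeries real.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

print(generator(torch.randn(1000, 8)).mean().item())  # drifts toward ~3.0 as the forger improves
```

The same tug-of-war, scaled up to images and video with far larger networks, is what makes the deepfakes in the story possible.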

Amaka is torn between multiple pressures: ambitions for success, ethical concerns about inciting violence, fear of personal consequences, questions about ethnic identity and loyalty. He experiences a vivid dream involving FAKA, an online avatar of deceased musician Fela Kuti, the legendary Nigerian activist known for speaking truth to power. This spiritual encounter prompts deep introspection about authenticity versus deception.

As the deadline approaches, Amaka makes his choice. He discards his mask—both literally and metaphorically—choosing authenticity over the allure of power and protection that deception offers. He confronts the organization and rejects their offer despite personal risks, deciding to use his technical skills for positive storytelling rather than manipulation. It's a moral victory of conscience over coercion.

Lee's essay explains why this matters. By 2041, creating convincing deepfakes will be as easy as using a photo filter. Near-perfect fakes will be indistinguishable from reality even under forensic analysis. Real-time generation will enable convincing deepfakes instantly during video calls. Perfect voice cloning will replicate anyone's voice from minimal audio samples. Full-body deepfakes will capture entire body movements. Multimodal fakes will coordinate video, audio, and text into complete false narratives.

The societal implications are staggering. Political manipulation through fake videos of politicians making inflammatory statements. Election interference through timed release of convincing fake content. Ethnic and religious incitement, as in Amaka's story, where fake videos could spark violence. Blackmail and extortion targeting individuals. The fundamental challenge to visual evidence as proof. "Seeing is believing" becomes obsolete. People may dismiss real evidence as fake—the "liar's dividend." Determining objective truth becomes nearly impossible.

Detection always lags behind creation. Forensic analysis looks for artifacts and inconsistencies. Blockchain verification creates authenticated chains of custody. Watermarking embeds invisible markers in authentic content. AI detection tools spot AI-generated content. But circumvention is always possible, and most people lack the technical expertise for verification.

Twin Sparrows: What happens when childhood becomes optimized

At Fountainhead Academy in South Korea, 2041, orphaned identical twin boys arrive at age three after their parents die in a car accident. Mama Kim, the academy's headmaster and pioneer of vPals (virtual pals) technology, names them Golden Sparrow and Silver Sparrow. Despite being twins, they have contrasting personalities and learning styles.

The academy allows children to design their own AI companions serving as tutors, teachers, and guides using natural language processing. Golden Sparrow, competitive and precocious, creates Atoman based on his favorite superhero. Atoman uses gamification and rewards to motivate him. Silver Sparrow, withdrawn and on the autism spectrum with prodigious artistic abilities, creates Solaris, an amorphous amoeba-like AI character. AI diagnoses Silver with 88.14% probability of Asperger's syndrome.

At age six, Golden Sparrow is adopted by the Pak family, whose motto is "only the best deserves the best." They continuously upgrade Atoman to ensure proper challenge. Atoman even creates an AI-generated female student to motivate Golden through competition. As he grows older, his people skills atrophy while his performance-focused life intensifies.

Silver is adopted by Andres and Rei, a transgender couple taken by his artwork in a contest. They take a more humanist approach, using technology only as part of overall education. Despite (or because of) his autism, Silver learns empathy and develops creativity.

A dinner conversation highlights the philosophical divide. Mr. Pak tells Andres and Rei: "No one knows the son better than his AI... Golden Sparrow's math is already at the level of a ten-year-old's." Rei questions why the Paks let AI plan their children's future. Mrs. Pak counters that while she understands they have "a much more romantic view of things," nothing is more important than children's education.

The turning point comes when Golden sabotages Silver's artistic creation out of jealousy, causing emotional turmoil. This act creates a rift. Golden's psychologist later makes the crucial point: "Human beings are not an AI." Mr. Pak eventually realizes his view of "success" is making Golden miserable.

Years later, the twins reunite at Fountainhead Academy. Through AI technologies, they discover their bond persists despite emotional distance. The reunion was an intentional design by Mama Kim's programmers, echoing early Silicon Valley optimism about technology bringing people together.

Lee explains how natural language processing enables these AI companions. GPT-3 has 175 billion parameters. Language models are growing approximately 10x per year, ingesting 10x more data annually with qualitative improvements at each magnitude. By 2041, perhaps "GPT-23" will have read every word ever written and watched every video produced—an "all-knowing sequence transducer" containing accumulated knowledge of human history.

This technology enables teaching children science by having them interact with virtual Albert Einstein and Stephen Hawking. AI excels at customizing learning for each student, motivating them by targeting specific weaknesses. Classic toys like Barbie or GI Joe will "come to life," conversing naturally with children.

However, Lee explicitly does NOT predict AGI by 2041. Computers "think" differently from human brains. Deep learning won't become true "artificial general intelligence" by 2041. Many challenges remain unsolved: creativity, strategic thinking, reasoning, counter-factual thinking, emotions, and consciousness. These require "a dozen more breakthroughs like deep learning." Since AI has had only one great breakthrough in 60+ years, seeing a dozen in 20 years is unlikely. AI will not be able to truly love us back.

Teachers' roles will transform. They'll focus less on rote knowledge imparting, more on building emotional intelligence, creativity, character, values, and resilience. Teachers become clarifiers when students are confused, confronters when students are complacent, comforters when students are frustrated. This requires "a level of wisdom and understanding that an AI cannot do."

The chapter serves as commentary on educational systems that use competition as a motivator and on an obsessive parenting culture that treats children as optimization projects. The story shows that over-optimization can produce children who excel academically but lack emotional intelligence and social skills. Technology becomes another tool for restricting children's autonomy rather than enabling their development. As Golden Sparrow's story demonstrates, focusing solely on achievement can make children miserable.

Contactless Love: When fear becomes a cage

Chen Nan lives an isolated existence in her 2041 Shanghai apartment. She represents the "COVID generation"—haunted by profound fears of human contact rooted in traumatic memories and losses from COVID-19. Two decades after the initial outbreak, the pandemic persists through ongoing variants. Despite support from robotic devices that manage daily living, Chen's psychological trauma prevents her from engaging in real-world relationships.

Chen experiences anxiety and nightmares. She has PTSD and refuses to leave her apartment. Her vaccines are out of date, creating a Catch-22: she's afraid to go out, but because she hasn't gone out, her vaccines have expired, making it even more dangerous to venture outside.

Chen has a long-distance boyfriend named Garcia in São Paulo, Brazil. Their relationship flourishes in virtual reality games where they share meaningful experiences and deep feelings. The virtual world provides a safe space where Chen can experience intimacy without facing her fears of physical contact.

When Garcia expresses desire to meet in person, Chen's fears lead her to reject the opportunity. Then Garcia goes silent, stopping all communication. Chen's worry escalates dramatically when she learns Garcia has a severe health condition from a new COVID variant and is hospitalized. She realizes she must break free from self-imposed isolation to support someone she loves.

Chen ventures outside for the first time in years, aided by household robots that have managed her daily needs, wearable technology including a skin implant that doubles as vaccine passport and tracks health information, protection devices, autonomous delivery systems, and AI-powered robots for transportation. Her journey highlights society's adaptive use of technology to minimize physical interactions while fostering connections.

In a twist ending, it's revealed Garcia orchestrated the entire situation—a form of "gamification of therapy"—to encourage Chen to confront her fears and overcome her PTSD. The story culminates in a heartfelt reunion where Chen acknowledges her love for Garcia, symbolizing her personal growth and healing.

Lee explains how the pandemic dramatically accelerated adoption of AI and robotics. DeepMind's AlphaFold 2 uses AI and deep learning for protein folding—traditionally taking years but now done faster with more accurate results. Lee describes this as "one of the most outstanding achievements in the history of science." By 2041, AI can help find targets on 3D structures and choose best biomolecules. Traditional drug development costs $1 billion and takes several years; AI dramatically reduces both. Insilico Medicine announced the first AI-discovered drug in 2021, saving 90% of cost.

Between 2012 and 2018, the share of robot-assisted surgeries increased from 1.8% to 15.1%. By 2041, nanobots are predicted to perform complete surgeries without human doctors, fight cancer, repair damaged cells, and eliminate diseases by replacing DNA molecules. AI will "revolutionize medicine through human-machine symbiosis," optimizing and transforming drug discovery, pathology, and diagnosis. Some experts believe people might live 20 years longer than current life expectancy.

The pandemic created a fully contactless society. AI sensors and infrared thermal cameras paired with facial recognition check mask compliance. Camera systems monitor social distancing. AI-based chatbots screen symptoms and educate patients. Robots sanitize hospitals and public areas. Delivery robots operate in hospitals and public spaces.

But there's a darker implication. A significant number of individuals, especially those who came of age during pandemic, will gravitate toward lifestyles that reduce in-person contact. Social distancing initially adopted for health becomes normalized behavior. Chen Nan's existence illustrates potential future intensification of isolated living enabled by technology.

The story questions whether technology that enables us to avoid fear helps or harms us. Chen's journey suggests that confronting fear, aided by technology but not replaced by it, offers the path to healing and genuine human connection. Technology should augment human capabilities, not replace human connection.

My Haunting Idol: The cost of digital perfection

In 2041 Tokyo, Aiko, a shy music fan, participates in a séance with friends to contact the spirit of Hiroshi X, a popular virtual idol who died under mysterious circumstances. Through a medium, Hiroshi's voice pleads for help, claiming his death was not what it seemed.

Aiko has a deep, almost obsessive connection to Hiroshi through his music, which has been her source of solace throughout her life. She struggles with mental health issues and feelings of being overlooked, projecting these feelings onto her idol. Her infatuation reflects a bond she feels transcends normal fandom.

Using advanced XR (Extended Reality) technologies—encompassing VR (Virtual Reality), AR (Augmented Reality), and MR (Mixed Reality)—Aiko explores the circumstances of Hiroshi's death. She summons Hiroshi's ghost in various virtual settings through AI-powered reconstructions. These encounters blur the line between reality and digital identity as she investigates. Lee describes XR as "like dreaming with your eyes open."

Aiko learns about complex dynamics between Hiroshi and those in his life—his manager, crew members, and the entertainment industry. The narrative exposes the dark side of fame, industry pressures, and difficult relationships idols maintain. As she assembles clues, Aiko discovers Hiroshi did not drown as reported but was poisoned. The investigation reveals his mental health struggles and crushing pressures from fans and the entertainment industry.

In confrontation with Hiroshi's virtual ghost, Aiko learns his desire for connection and acceptance ultimately led to his tragic end. Hiroshi's reflections on fame, identity, and the need for authenticity resonate throughout Aiko's journey. She gains profound understanding of the dark impacts of parasocial relationships and modern fandom.

The chapter concludes with a tech company offering Aiko an opportunity to collaborate on narrative creation in virtual spaces. This decision reflects her evolution from passive fan to active creator, symbolizing her desire to reclaim agency over her story and the stories of others.

Lee explains that by 2041, AI will open up new worlds of immersive entertainment, delivering virtual experiences indistinguishable from the real world. The boundaries between real life, remote communications, games, and movies will blur completely. VR will teach children science by having them interact with virtual Albert Einstein and Stephen Hawking. VR will design specialized treatment for psychiatric problems like PTSD. In VR, AI will make fully photo-realistic companions; as robots they will become increasingly realistic.

Brain-computer interfaces (BCI) enable direct neural interaction with virtual environments, allowing users to control and experience XR through thought. Biometric data provides real-time information about physiological and emotional states. Generative AI creates hyper-realistic virtual celebrities that can interact with fans in personalized ways, enabling unprecedented levels of parasocial relationships.

But Lee emphasizes a crucial limitation: while AI can create incredibly realistic experiences and serve as companions, it won't be able to truly love humans back. This limitation is central to the ethical concerns raised.

The story explores how toxic fan culture could be extended and amplified through hyper-realistic virtual interactions. Technology may alienate individuals from authentic human relationships rather than fostering them. There's risk of addiction to virtual experiences, with people becoming so immersed they neglect real-world responsibilities and relationships. Companies could manipulate fans through AI-powered parasocial relationships, leading to unhealthy obsessions and blurred reality causing psychological harm.

Yet there are opportunities. Every fan can create their own stories and narratives. VR can treat PTSD and other psychological conditions. Immersive learning experiences with virtual historical figures become possible. Technology enables individuals to reclaim agency and become storytellers, providing new forms of entertainment and connection for those who struggle with traditional social interactions.

The fundamental risk remains that virtual relationships replace rather than supplement real human bonds. The chapter asks whether widespread acceptance of virtual intimacy is desirable or healthy for humanity, even if technologically feasible.

The Holy Driver: Humans as backup for machines

Chamal is a talented and cocky young gamer from Sri Lanka who excels at virtual reality racing games. His family struggles financially—his father is a former driver affected by the rise of autonomous vehicles. Uncle Junius, with mysterious connections to Chinese tech company ReelX, recruits Chamal for what appears to be a lucrative gaming job promising good pay that his family desperately needs.

Chamal enters a high-tech facility where he trains in what he believes are driving simulations. He dons a haptic suit and helmet, finding himself immersed in hyper-realistic virtual driving experiences. Training scenarios become increasingly challenging, mimicking real-world situations across various international cities including Abu Dhabi, Hyderabad, Bangkok, Singapore, and Japan.

Chamal quickly rises to the top of the ranking list. He earns points for successful missions—more points meaning more money for his struggling family. Missions vary from outlandish scenarios like alien invasions to chillingly realistic situations like terrorist attacks.

Then comes the critical mission. A disturbance on the ocean floor in North Java triggers a tsunami paralyzing Singapore's automated smart transportation system. With only six minutes before a ten-meter tsunami hits, over a hundred dysfunctional autonomous vehicles and their passengers are in mortal danger. Chamal and other "ghost drivers" must seize control of these vehicles remotely, switch to manual control, and guide them to evacuation zones.

Chamal's virtual avatar "jumps" from one vehicle to another, taking control of each car's wheel in seconds, evading fallen debris and racing to save lives. He experiences the mission with intense physical and emotional involvement, his score rocketing as he saves vehicle after vehicle. Despite his efforts, the tsunami catches up and he witnesses some cars being swept away—every unsaved car represents lost points and potentially lost lives. The experience leaves him physically and mentally exhausted, unable to perform basic tasks for days.

While recovering at home, Chamal sees a news report about a tsunami that struck Kanto, Japan. The surveillance footage shows a scene identical to his "game" mission—same road conditions, car positions, debris. The shocking realization hits: the game was real. He had been remotely controlling actual vehicles and saving real people's lives.

Uncle Junius takes Chamal to meet Yang Juan, ReelX's Sri Lanka branch head. Through their conversation, Chamal learns the truth about "ghost drivers"—human operators who remotely control autonomous vehicles during emergencies when AI systems fail or face unprecedented situations. The game framing is deliberate: human drivers perform better when they believe it's a simulation rather than bearing the full psychological weight of life-and-death decisions.

Uncle Junius reveals his own past. A decade earlier, during an earthquake rescue mission in the Sichuan-Tibet region, he was transporting emergency medical supplies when aftershocks caused a boulder to crush his virtual vehicle. The force feedback and synesthesia (simulation of real senses through VR) were set so high that the virtual pain manifested as a real, lasting injury to his leg. Despite supplies eventually reaching victims through military drones, Junius's leg remained stuck in "limbo between the real and the virtual"—a permanent reminder of that failed mission.

Yang Juan offers Chamal a trip to China as reward. In Shenzhen, Chamal witnesses the future of autonomous vehicles and smart cities firsthand. L5-level autonomous vehicles operate seamlessly throughout the city. The system calculates optimal paths and vehicle assignments based on real-time data. Cars automatically part to create lanes for ambulances within seconds. During a city marathon, all autonomous vehicles receive simultaneous alerts and reroute instantly. Smart sensors along roads communicate in real-time with vehicle control systems and cloud infrastructure. The entire city operates like a synchronized organism.

Chamal compares his initial understanding of technology—like his father's car with visible, countable parts—to his new understanding—like his mother's sari, delicate yet complex, with patterns that transform when assembled into a whole. He grapples with the ethical implications of his role, recognizing that despite being told it's a game, real lives depend on his skills.

Lee explains that automobile assistive technology ranges from L0 (no automation) to L5 (steering wheel optional). True L5 autonomy—where human intervention is never needed—remains difficult because of edge cases. Autonomous vehicles struggle with unprecedented situations: natural disasters, terrorism, infrastructure failures, and other scenarios not present in training data. The story explores a realistic interim solution: human operators taking remote control during emergencies, addressing the "long tail" problem in AI.

The psychological framing as a "game" addresses a real challenge: human drivers perform better under pressure when emotional stakes are reduced, even if the work itself is identical. Uncle Junius reflects that his mother died because the ambulance couldn't reach her in time through traffic—autonomous systems could save countless lives.

By 2041, Lee predicts major cities will have fully integrated smart transportation systems with autonomous vehicles communicating with infrastructure in real-time. People will buy fewer personal vehicles, relying instead on autonomous ride-sharing fleets. Ambulances and emergency vehicles will reach destinations much faster. The traditional driver profession will largely disappear, affecting millions (3.8 million jobs in the U.S. alone). New job categories like "ghost drivers," remote vehicle operators, and AI supervisors will emerge.

But autonomous vehicles could dramatically reduce the approximately 1.35 million annual traffic deaths worldwide. Optimized traffic flow reduces congestion, commute times, and fuel consumption. Elderly, disabled, and young people who cannot drive gain mobility. Commuters can work, learn, or rest instead of driving. Less need for parking could free valuable urban land.

The risks include cybersecurity threats—networked autonomous vehicles vulnerable to hacking or terrorism. When smart city infrastructure fails (as in the tsunami scenario), consequences could be catastrophic. Loss of human driving skills could make societies vulnerable if systems fail. Millions of displaced workers could face unemployment and poverty. The story's title, "The Holy Driver," suggests that driving—and by extension, human agency in an automated world—has become something sacred, rare, and revered.

The story ultimately argues that even in a highly automated future, human judgment, creativity, and moral reasoning remain essential. Chamal's contemplation of leaving the ghost driver program suggests technology should serve humanity's values, not vice versa.

Quantum Genocide: When brilliance turns to vengeance

Robin and her hacker crew operate from a derelict fishing boat near Hrosshvalur, the world's most secure data center in Keflavík, Iceland. They're attempting an audacious heist to crack the Bitcoin encryption of Satoshi Nakamoto's legendary fortune using quantum computing technology. As they execute their plan, they discover they're being hacked themselves.

The narrative reveals the true antagonist: Marc Rousseau, a European physicist who has suffered personal tragedy related to climate change. After losing loved ones to climate-related disasters, Rousseau becomes consumed by grief and rage at humanity's failure to address environmental catastrophe.

Rousseau has achieved a breakthrough in quantum computing and decides to use it for malicious purposes. He orchestrates deadly drone attacks targeting influential world leaders using a "Doomsday Blacklist"—people he believes should be held accountable for climate inaction. These AI-enabled autonomous drones conduct precision assassinations worldwide.

Rousseau plans to launch nuclear attacks disguised as space cargo, devastating global communication infrastructure and potentially triggering widespread destruction. Robin and Xavier must race against time to prevent these catastrophic attacks. They devise a plan to mitigate the damage, ultimately forcing a choice between resetting the world's communication networks and saving countless lives.

Lee states there is an 80% chance that by 2041 there will be a functional quantum computer with 4,000 logical qubits (and over a million physical qubits) capable of the encryption-breaking described. Quantum computing uses quantum bits (qubits) instead of traditional binary bits, allowing exponentially more powerful calculations. Rousseau's quantum breakthrough gives him power to crack modern encryption methods, including the elliptic curve cryptography protecting Bitcoin wallets, break into supposedly secure systems worldwide, and access Satoshi Nakamoto's Bitcoin fortune.

The same quantum computing that could revolutionize medicine, materials science, and artificial intelligence can be weaponized. Current Bitcoin encryption will become vulnerable to quantum attacks, representing an existential threat to the cryptocurrency ecosystem.

Rousseau deploys swarms of autonomous drones with full autonomy—capable of searching for, deciding to engage, and eliminating targets completely without human involvement. These drones can identify and track specific individuals on his "Doomsday Blacklist," make kill decisions independently using AI, conduct coordinated attacks across multiple global locations simultaneously, and execute political assassinations with precision. Lee describes them as "$1,000 political assassins."

Lee emphasizes that autonomous weaponry represents the third revolution in warfare, following gunpowder and nuclear arms. AI-enabled true autonomy means the full engagement of killing: searching for, deciding to engage, and obliterating human life completely without human involvement. This is described as "not a far-fetched danger for the future, but a clear and present danger."

By 2041, AI-powered autonomous drones will be widely available and cheap (potentially as low as $1,000 per unit), able to make independent kill decisions without human oversight, capable of coordinated swarm operations at scale, and integrated with quantum computing for enhanced targeting. Current encryption methods will be obsolete, leaving financial systems, government systems, and critical infrastructure increasingly vulnerable.

The story raises profound questions about who bears responsibility when powerful technologies are misused. Rousseau believes he's administering justice for climate inaction, but his actions constitute terrorism. The narrative questions whether ends can justify means, and touches on who should be held accountable for environmental catastrophe.

One grieving individual with access to quantum computing can threaten global civilization. The entire cryptocurrency ecosystem faces obsolescence if quantum-resistant solutions aren't developed. Autonomous weapons could trigger arms races and lower barriers to conflict. Political leaders and influential figures become easy targets for assassination. Global communication networks and critical systems are vulnerable to quantum-enabled attacks.

Lee emphasizes that "regulation will always lag behind innovation, and innovation is moving at light speed." The chapter serves as a cautionary tale about humanity's arrogance in wielding powerful technologies without adequate ethical frameworks and safeguards. This is "a clear and present danger," not merely science fiction.

The Job Savior: Finding purpose after automation

The story begins with a narrator describing a timeline starting from 2020, detailing how COVID-19 catalyzed widespread adoption of AI across sectors. As businesses pivoted toward automation to survive the pandemic and maximize efficiency, routine jobs began disappearing at an accelerating rate, leading to massive layoffs, growing social crisis, worker protests, and civil unrest.

In response to mass unemployment, the U.S. government introduces Universal Basic Income (UBI) designed to support displaced workers. While initially promising, UBI produces negative outcomes: increased societal issues including rising crime rates, addiction problems, depression, and loss of purpose among recipients. The program fails to address the fundamental human need for meaningful work and contribution to society. By 2032, recognizing these failures, the government repeals UBI.

This creates conditions for a new industry to emerge: occupational restoration or "job reallocation" companies. Jennifer Greenwood is among trainees at Synchia, one of these pioneering companies. Synchia partners with corporations undergoing layoffs to provide comprehensive retraining services for displaced workers. The company uses AI assessment tools to analyze workers' skills, aptitudes, and potential, then guides them to suitable new employment opportunities.

Michael Saviour, Synchia's charismatic and empathetic leader, emphasizes dignity and compassion. He trains his team to understand that job displacement isn't just an economic problem but a deeply personal crisis affecting workers' identities and self-worth. His name is symbolic—he genuinely wants to "save" displaced workers by helping them find new purpose.

As the story progresses, major layoffs loom at Landmark, a large construction company being automated. A rival company, OmegaAlliance, emerges with aggressive competing vision. They promise complete job reassignment through advanced VR technology, claiming workers can transition to virtual jobs that feel as real as physical work.

Jennifer investigates worker protests against automation, uncovering deep sentiments of desperation, anger, and resistance among displaced workers. Many feel betrayed by a system that seems to value efficiency over human welfare.

Jennifer's investigation into OmegaAlliance reveals troubling truths. She discovers flaws in their promises—their "virtual work" is essentially exploitative, creating meaningless tasks that provide neither genuine employment nor dignity. The company manipulates vulnerable workers, offering false hope while corporations profit from their data and minor contributions. This represents corporate manipulation of desperate people rather than genuine solutions.

The story reaches resolution when a partnership emerges between Synchia and OmegaAlliance, focusing on finding real solutions that genuinely assist displaced workers. However, the narrative makes clear this is just the beginning of a much larger societal transformation. The story advocates for the "3 Rs" approach: Relearn (acquiring new skills), Recalibrate (adjusting to new economic realities), and Renaissance (finding new purpose and meaning in work).

Lee explains that while most technologies were both job creators and destroyers simultaneously, "the explicit goal of AI is to take over human tasks, thereby decimating jobs." Over 3.8 million Americans directly operate trucks or taxis for a living, with many more driving part-time for Uber/Lyft, postal service, delivery services, and warehouses—all facing displacement. By 2041, people who love driving will do what equestrians do today—go to private areas designated for entertainment or sports.

Lee analyzes why Universal Basic Income, while well-intentioned, failed. UBI addressed income but not the fundamental human need for purpose, meaning, and contribution. Without work, people experienced increased depression, addiction, and social problems. Money alone doesn't provide dignity, identity, or sense of contribution.

AI excels at routine, repetitive tasks with clear parameters. White-collar and blue-collar jobs are equally at risk if work is routine. Jobs requiring creativity, emotional intelligence, complex problem-solving, and human connection are more resistant to automation. However, even some non-routine work faces displacement as AI capabilities expand.

Lee emphasizes this isn't just an economic issue but a societal transformation. Traditional organizing principles of economic and social order will be challenged. The relationship between work, identity, and purpose must be reconceptualized. New social contracts will be necessary.

By 2041, routine jobs across all sectors will be largely automated. Self-driving vehicles will be commonplace, eliminating most driving jobs. Manufacturing will be highly automated with minimal human labor. Service industries will use AI for customer interaction, scheduling, and operations. Warehouses and logistics will be almost entirely robotic. A mature job reallocation industry will help millions transition to new careers, though both legitimate services (like Synchia) and exploitative operations (like OmegaAlliance) will exist.

Questions about corporations' obligations to workers they displace through automation remain unresolved. Should companies that profit from AI pay for retraining? What responsibility do they bear? When people are vulnerable, predatory practices become more attractive and damaging. Older workers with non-transferable skills face the greatest hardship.

The story explores whether human identity and self-worth should be so closely tied to employment, and if not, how society should restructure these relationships. Loss of work affects entire communities, particularly those built around single industries. Society must reconceptualize what "work" means and how people find purpose and contribution outside traditional employment.

But opportunities exist. Workers can acquire new skills through comprehensive retraining programs. Society can adjust to new economic realities with new social contracts. Humans can discover new forms of creativity, purpose, and contribution. Jobs requiring empathy, creativity, complex problem-solving, and human connection will become more valued and better compensated. Elimination of dangerous, repetitive, and unfulfilling work frees humans for more meaningful pursuits.

Rather than viewing AI-driven unemployment as insurmountable catastrophe, Lee advocates for proactive adaptation emphasizing human dignity, creativity, and agency. The chapter argues humanity must find innovative ways to flourish despite displacement, but this requires conscious effort to create new social structures and economic models. The future of work will be fundamentally different, but humans can still find purpose, meaning, and contribution if society acts thoughtfully and ethically.

Isle of Happiness: Algorithms can't buy fulfillment

Viktor Solokov, a once-famous Russian technology entrepreneur, arrives at Al Saeida, a luxurious artificial island in the Arabian Sea near Qatar designed by the royal family. After experiencing a personal crisis, he seeks adventure and escape from his previous life.

Upon arrival, Viktor is greeted by Qareen, a robotic assistant. To access the island, he must consent to share all his personal data—IoT data, wearable sensors, cameras, personal health data, audio, social media, everything—in exchange for the promise of AI-optimized happiness.

The island hosts several guests including a film star, neurobiologist, poet, and Princess Akilah. Through conversations, they explore varying perspectives on happiness, with Viktor challenging the assumption that material wealth leads to contentment, citing research showing diminished happiness at higher income levels.

Prince Mahdi, heir to the throne, created an "algorithm for happiness"—a hedonic AI system that collects vast amounts of data to predict, monitor, and enhance each individual's welfare by tailoring experiences to personality profiles. The AI uses middleware technology to analyze personal data for enhancing guest experiences.

Initially, Viktor finds pleasure in pursuits catered to by the hedonic algorithm, but over time these indulgences fail to provide lasting fulfillment. Princess Akilah becomes a significant figure for Viktor. She privately opposes her brother's vision and proposes a "eudaimonic algorithm" that focuses on deeper, meaningful happiness through community spirit, active participation, and psychological frameworks based on Abraham Maslow's hierarchy of needs, rather than superficial pleasures.

As guests find the AI cannot sustain genuine happiness, a rebellion ensues against the controlling nature of the environment. Akilah clandestinely communicates with Viktor, suggesting that true happiness transcends algorithms and requires personal agency, self-discovery, and deeper emotional connections.

After Viktor's escape and unexpected encounter with Akilah, he discovers that true transformation comes from balancing life experiences and aspirations rather than succumbing to artificial definitions of happiness. Viktor contemplates a renewed path embracing both his entrepreneurial spirit and insights gained from their time together.

Lee explains that happiness is complicated, subjective, and transcends material wealth. Abstract concepts like "happiness" and "fairness" are extremely difficult to quantify and program into AI algorithms. Current AI systems excel at optimizing click-through rates, profitability, and efficiency but lack sophistication for complex human values.

By 2041, technologies that discern emotions using sensors and physiological indicators will emerge but remain insufficient alone. AI can optimize experiences but lacks capacity to foster genuine, lasting happiness without human insight and values. Measuring happiness is problematic—while innovative frameworks are emerging, they fail to capture the full spectrum of human emotions and experiences. Technology can interpret emotional states using sensors and observe physiological indicators, but these techniques alone fail to grasp complex, individual elements influencing human behavior.

The quest for AI-enhanced happiness depends on access to individuals' private data—health records, biometric identifiers, deep-seated wishes. The critical question emerges: Does pursuing enhanced happiness via AI require relinquishing personal privacy? The relationship between personal data collection and ethical responsibility is critical.

Lee argues society needs to develop fresh frameworks for gauging AI's impact beyond economic metrics. Evaluations must include human well-being, societal fairness, and environmental conservation. This requires deep understanding of neuroscience and psychology to create techniques for measuring and predicting lasting human satisfaction.

The chapter explores the privacy versus collective well-being trade-off, consent and data sharing in AI systems, algorithmic attempts to define and create human happiness, and human agency in AI-dominated environments. Wealth and material abundance don't guarantee happiness. Risk of addiction to pleasure-seeking behaviors exists. Psychological and social effects of AI attempting to optimize human experience remain unclear. Cultural values around happiness may clash with algorithmic definitions.

Over-reliance on AI for human fulfillment risks loss of autonomy and authentic decision-making. Manipulation through data-driven personalization becomes possible. Superficial happiness may replace meaningful satisfaction. Existing AI systems remain inadequate for providing required psychological support. Technology alone cannot provide lasting happiness; human insight and values remain essential.

Dreaming of Plenitude: Reimagining scarcity's end

In futuristic Australia, 2041, society has been transformed by AI, clean energy, and automation, leading to a post-scarcity world. Keira, a young Aboriginal woman, becomes a caregiver for Joanna Campbell, a renowned marine ecologist residing in Sunshine Village, a smart retirement community.

Keira learns about societal changes brought by Project Jukurrpa, which introduced two revolutionary economic systems. The Basic Life Card (BLC) provides a stipend covering all basic necessities—food, shelter, healthcare, basic recreation. Moola is a virtual currency earned through community service and reputation, promoting contributions to education, elderly care, social work, and creative fields.

Joanna struggles with early Alzheimer's disease while Keira navigates challenges faced by Aboriginal youth in this changing economic landscape. Despite technological advancements, issues of inequality persist between younger and older generations.

Through their interactions, both characters initially clash but ultimately inspire each other. Joanna goes missing with her 3D VR goggles and experiences the world in a new light. This crisis leads to deeper dialogue about identity, purpose, and societal expectations.

The narrative explores how plenitude—where basic human needs are met and work becomes optional—affects individuals' motivations. Despite abundance, the country struggles to keep people, especially the young, motivated and away from substance abuse. The Moola system, initially designed to foster community engagement, is compromised by many people pursuing recognition and status, echoing how financial profit fuels greed and disparity.

The story concludes with both characters engaging in meaningful dialogue about helping their community work together, emphasizing that a future defined not merely by economic stability but by human flourishing and meaningful existence is possible.

Lee explains that as the cost of goods decreases significantly due to technological advancements, traditional economic theories come into question. Affordable clean energy ("superpower") will dramatically reduce production costs. The think tank RethinkX estimates that with a $2 trillion investment through 2030, U.S. energy costs will drop to 3 cents per kilowatt-hour—less than one-quarter of today's cost. By 2041, even lower costs are expected.

"Super power" at essentially zero cost will be available during sunniest/windiest days, used for non-time-sensitive applications: charging batteries of idle cars, water desalination and treatment, waste recycling, metal refining, carbon removal, manufacturing. As energy cost plummets, costs of water, materials, manufacturing, and computation drop too. This can eliminate more than 50% of greenhouse gas emissions.

AI-driven automated machinery significantly decreases the cost of producing goods, and additive manufacturing (3D printing) reduces production costs further. Together these facilitate an unprecedented abundance of goods and services.

Traditional frameworks anchored in scarcity no longer apply. Economic structures must be overhauled in response to these societal disruptions: money and economic systems will evolve in a world of abundance, success will be measured more by social value and community engagement, and the wealth generated by new technologies will make existing economic systems and financial institutions look outdated.

In an economy of abundance, work becomes optional. The challenge shifts from producing and consuming physical goods to a deeper question: what motivates people to pursue satisfaction and meaning when traditional careers are disrupted and monetary rewards are no longer the main motivator? Worth must be redefined beyond productivity.

People who equate worth with professional achievement may struggle to find contentment and to transition from a work-focused life to an era where labor is not essential. Substance abuse and loss of motivation are real risks, and people chasing recognition and status in the Moola system echo the greed of traditional financial systems.

Other challenges persist: inequality despite technological advances, contentious relationships between generations, the need for ongoing education and an inclusive environment, and the risk of a widening divide between people with abundant resources and those who feel overlooked. There is also corporate reluctance to eliminate scarcity (businesses want to keep resources limited to boost earnings) and political resistance to relinquishing control over finances and resources; entities built on scarcity and supply-demand mismatch will resist change.

By 2041, widespread clean energy at near-zero cost will exist. Australia will be carbon neutral with sustainable technologies. Digital currencies will replace traditional money. Universal basic income type systems (BLC) will provide essentials. Reputation-based economies (Moola) will incentivize community service. Post-scarcity conditions will exist in advanced nations. Automated manufacturing will be ubiquitous. Goods and services will be available at minimal or no cost.

However, Lee acknowledges challenges. Countries with greater resources, stability, and commitment to reform will lead these initiatives, though the pace of achieving abundance will differ by nation. Existing systems remain inadequate at offering the required support, the Moola system can be compromised by status-seeking behavior, equitable wealth distribution remains an unsolved problem, global collaboration is needed, and societal norms are difficult to reshape.

The story ends on a hopeful note: positive societal transformation is possible if individuals focus on self-actualization, community care, and empathetic engagement, creating a future defined by human flourishing and meaningful existence rather than economic stability alone. The upside includes the elimination of poverty and hunger, time for personal growth and meaningful relationships, climate change mitigation through clean energy, people pursuing their interests without economic constraints, and stronger community bonds.

The message beyond the stories

"AI 2041" deliberately lacks a formal concluding chapter, which some reviewers found frustrating. Instead, the book's vision emerges through the cumulative weight of its stories. Lee and Chen present neither dystopia nor utopia, but realistic scenarios demanding preparation.

Lee's central thesis: AI will be the defining development of the 21st century. Within two decades, aspects of daily human life will be unrecognizable. The book aims to help readers understand both the "radiant pathways" and "existential perils" of AI.

Lee explicitly rejects the obsession with AGI and singularity. He doesn't believe deep learning will become "artificial general intelligence" matching human intelligence in every way by 2041. AGI would require a dozen more breakthroughs like deep learning. Since AI has had only one great breakthrough in 60+ years, seeing a dozen in 20 years is unlikely. Many challenges remain unsolved: creativity, strategic thinking, reasoning, counter-factual thinking, emotions, consciousness.

Lee suggests we "stop using AGI as the ultimate test of AI." AI's mind is different from the human mind. In twenty years, deep learning will beat humans on an ever-increasing number of tasks, but many existing tasks will remain where humans perform better. There will even be some new tasks that showcase human superiority, especially if AI's progress inspires humans to improve and evolve.

"What's important is that we develop useful applications suitable for AI and seek to find human-AI symbiosis, rather than obsess about whether or when deep-learning AI will become AGI."

The book's ten chapters collectively explore AI's transformative power through technologies with greater than 80% likelihood of materializing. Deep learning and big data enable insurance that knows you better than you know yourself, but perpetuates ancient prejudices. Computer vision and deepfakes create perfect synthetic humans, undermining visual evidence and truth itself. Natural language processing births AI tutors tailoring education to each child, but risks over-optimizing childhood. AI healthcare revolutionizes medicine while pandemic technologies enable isolated existence. Virtual reality creates indistinguishable-from-real experiences, but parasocial relationships replace genuine connection. Autonomous vehicles eliminate millions of jobs while saving millions of lives. Quantum computing solves impossible problems while breaking all encryption. Job displacement forces reimagining work's meaning and purpose. AI attempts to optimize happiness but can't capture human fulfillment. Post-scarcity abundance raises fundamental questions about human motivation.

Common threads emerge across these visions: privacy-versus-utility trade-offs (which appear in eight of the ten stories), bias and fairness in AI systems, transparency and accountability challenges, manipulation and addiction risks, human autonomy versus AI optimization, and the moral responsibility of AI developers.

The opportunities are genuine: unprecedented wealth generation, a revolution in medicine and healthcare, personalized education for all students, clean energy and environmental solutions, the elimination of poverty and hunger, enhanced human capabilities through human-machine symbiosis, and new forms of communication and entertainment.

But the risks are equally real: autonomous weapons as an existential threat, loss of human purpose and meaning, privacy erosion, algorithmic bias amplifying social inequities, surveillance and control, misinformation and deepfakes undermining truth, and economic displacement creating social instability.

Lee and Chen's stance is deliberately optimistic but realistic. Chen Qiufan explained: "Both Kai-Fu [Lee] and I felt that there is urgency to deliver a much more optimistic and plausible portrait of the future. Because if we want to create a future we live in, we must first learn to imagine it."

The authors emphasize human agency throughout. "Most of all, we hope you will agree that the tales in AI 2041 reinforce our belief in human agency—that we are the masters of our fate, and no technological revolution will ever change that."

Lee urges readers to wake up to both potential and risks of AI, and to prepare for coming changes through understanding AI's capabilities and limitations, addressing ethical challenges proactively, developing new economic models, maintaining human agency and values, seeking human-AI symbiosis, preparing for workforce transformation, and ensuring equitable distribution of AI benefits.

A key quote captures the stakes: "In the story of AI and humans, if we get the dance between artificial intelligence and human society right, it would unquestionably be the single greatest achievement in human history."

The book serves as both cautionary tale and roadmap, urging society to consider AI's trajectory and its potential to reshape human experience. The future will be neither the technological utopia of limitless abundance nor the dystopian nightmare of machine dominance. Instead, it will be messy, complicated, and profoundly human—shaped by choices made today about how to develop, deploy, and govern these transformative technologies.

Twenty years from now, in 2041, AI will be ubiquitous. It will know your preferences better than you do, optimize your health, educate your children, drive your vehicles, manage your cities, and perhaps even attempt to engineer your happiness. The question isn't whether this transformation will occur—Lee assigns greater than 80% probability to the technologies in these stories. The question is whether humanity will shape that transformation wisely, addressing bias, protecting privacy, maintaining agency, and ensuring benefits are broadly shared rather than concentrated among AI superpowers.

The stories in "AI 2041" imagine futures both inspiring and troubling, showing paths forward and pitfalls to avoid. They remind us that technology amplifies human choices, for good and ill. In Nayana's rebellion against algorithmic prejudice, Amaka's choice of authenticity over manipulation, Chamal's recognition of human agency's value, and Keira and Joanna's discovery of meaning beyond algorithms, we see human values asserting themselves against technological determinism.

These are not predictions of an inevitable future, but invitations to conscious choice. The dance between artificial intelligence and human society has begun. Whether it becomes humanity's greatest achievement or its gravest mistake depends on the steps taken now, together, with eyes open to both possibilities and perils.

Building Effective AI Agents: Patterns That Actually Work in Production

· 9 min read
Tian Pan
Software Engineer

Most AI agent projects fail not because the models aren't capable enough — but because the engineers building them reach for complexity before they've earned it. After studying dozens of production deployments, a clear pattern emerges: the teams shipping reliable agents start with the simplest possible system and add complexity only when metrics demand it.

This is a guide to the mental models, patterns, and practical techniques that separate robust agentic systems from ones that hallucinate, loop, and fall apart under real workloads.

Claude Code: Intermediate & Advanced Techniques

· 10 min read

AI coding assistants have evolved from simple autocompletion tools into sophisticated development partners. Claude Code represents the next step in this evolution, offering a framework for what can be described as "autonomous programming." It's a tool designed to integrate deeply into your workflow and take on jobs that AI coding tools previously could not:

  • Code Understanding & Q&A: Acts as a project expert, explaining how large codebases work, making it invaluable for onboarding new team members.
  • Large-Scale Refactoring: Excels at modifying massive files (e.g., 18,000+ lines) where other AIs fail, thanks to its ability to understand global code relationships.
  • Debugging: Provides step-by-step reasoning to find the root cause of bugs, unlike tools that just offer a fix without context.
  • Complex Feature Generation: Follows an "explore → plan → implement" workflow. It can be prompted to first analyze the problem and create a detailed plan before writing a single line of code.
  • Test-Driven Development (TDD): Can be instructed to write failing tests first, then generate the minimal code required to make them pass, significantly accelerating the TDD cycle.

Let's dive into the techniques that will help you harness this power effectively.

1. Foundational Setup: The Core of Your Workflow

A robust setup is the bedrock of an efficient workflow. Investing time here pays dividends in every subsequent interaction with Claude Code.

  • Project Memory with CLAUDE.md: At the heart of any project is a concise CLAUDE.md file in the root directory. This file acts as the project's short-term memory, containing key architectural principles, coding standards, and testing procedures. To keep this file lean and focused, use imports like @docs/testing.md to reference more detailed documentation. You can quickly add new rules by starting a message with # or edit the memory directly with the /memory command. A sketch of such a file follows this list.
  • Monorepo Awareness: Modern development often involves monorepos. To grant Claude access to multiple packages for cross-directory analysis and refactoring, use the --add-dir flag or define additionalDirectories in your .claude/settings.json file. This is crucial for tasks that span multiple parts of your codebase.
  • Keyboard & Terminal Ergonomics: Speed is essential. Master key shortcuts to streamline your interactions. Use Esc Esc to quickly edit your previous message. Enable Shift+Enter for newlines by running /terminal-setup once. For Vim enthusiasts, the /vim command enables familiar Vim-style motions for a more comfortable editing experience.

2. Streamlining Your Day-to-Day Workflow

With a solid foundation, you can introduce practices that reduce friction and boost your daily productivity.

Using the Right Mode

The CLI offers several permission modes to suit different tasks and risk appetites:

  • default: The safest starting point. It prompts you for confirmation before performing potentially risky actions, offering a good balance of safety and speed.
  • acceptEdits: A "live coding" mode that automatically accepts file edits without a prompt. It's ideal for rapid iteration and when you're closely supervising the process.
  • plan: A "safe" mode designed for tasks like code reviews. Claude can analyze and discuss the code but cannot modify any files.
  • bypassPermissions: Skips all permission prompts entirely. Use this mode with extreme caution and only in sandboxed environments where accidental changes have no consequence.

You can set a default mode in .claude/settings.json or specify one for a session with the --permission-mode flag.
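
For instance, you might start a read-only review session like this (the prompt text is only an example):

# Launch a session in plan mode so Claude can analyze the code but not modify files.
claude --permission-mode plan "Review the error handling in src/api and suggest improvements"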

Slash Commands & Customization

Repetitive tasks are perfect candidates for automation. Turn your most common prompts into reusable tools by creating custom slash commands. Simply store them as Markdown files with YAML frontmatter in the .claude/commands/ directory.

  • Use allowed-tools in the frontmatter to restrict what a command can do, adding a layer of safety.
  • The ! prefix lets you run shell commands (e.g., !git status -sb) and inject their output directly into your prompt's context.
  • Use $ARGUMENTS to pass parameters to your commands, making them flexible and more powerful.
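
As a sketch of how these pieces fit together, a hypothetical .claude/commands/fix-issue.md could combine allowed-tools, the ! prefix, and $ARGUMENTS (the command name and prompt wording are assumptions, not part of any standard setup):

# Sketch: a custom /fix-issue command that takes an issue number as its argument.
mkdir -p .claude/commands
cat > .claude/commands/fix-issue.md <<'EOF'
---
allowed-tools: Bash(git status:*), Bash(npm run test:*)
description: Investigate and fix the referenced issue
---

## Context

- Current status: !`git status -sb`

## Task

Investigate issue $ARGUMENTS, outline a fix, implement it, and run the tests.
EOF

Inside a session you would then run it as /fix-issue 1234, with everything after the command name substituted for $ARGUMENTS.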

Resuming and Parallelizing Work

  • claude --continue: Instantly jumps you back into your most recent session.
  • claude --resume: Presents a list of past sessions, letting you pick up exactly where you left off.
  • Git worktrees: For large-scale refactors, use git worktree to create isolated branches. This allows you to run separate Claude sessions in parallel, each with its own context, preventing confusion and collisions.
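
A minimal sketch of that worktree pattern, assuming a branch named refactor/auth and a sibling checkout directory:

# Create an isolated checkout on its own branch, then run a dedicated Claude session inside it.
git worktree add ../myrepo-auth-refactor -b refactor/auth
cd ../myrepo-auth-refactor
claude "Refactor the auth module according to CLAUDE.md"
# The original checkout can host a second, independent session in parallel.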

Output Styles for Collaboration

  • /output-style explanatory: Enriches responses with an "Insights" section, making it perfect for mentoring junior developers or explaining complex changes in a pull request.
  • /output-style learning: Structures responses with TODO(human) placeholders, actively inviting you to collaborate and fill in the gaps.

3. Incorporating Quality & Safety

True autonomy requires guardrails. Integrate quality checks and safety nets directly into your workflow to build with confidence.

Hooks for Guardrails

Hooks are shell commands that automatically run at specific lifecycle events, offering a deterministic way to enforce standards. Configure them in .claude/settings.json.

  • PreToolUse: Run checks before a tool is used. For example, you can block edits to sensitive files or require a corresponding test file to exist before allowing a write operation.
  • PostToolUse: Automate cleanup tasks after a tool is used. This is perfect for running formatters like prettier or gofmt, as well as linters and quick tests after every edit.
  • Notification: Send a desktop alert when Claude requires your input, so you can switch tasks without losing your place.

For example, to have your Mac announce when a job finishes, add a Stop hook to your user settings (open them with code ~/.claude/settings.json):

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "say \"job's done!\""
          }
        ]
      }
    ]
  }
}

Permissions and Security

Define explicit allow, ask, and deny rules in your settings to manage tool access without constant prompting.

  • Allow: Safe, routine operations like Bash(npm run test:*).
  • Ask: Potentially risky actions you want to approve manually, such as Bash(git push:*).
  • Deny: Critical security rules to prevent catastrophes, such as Read(./.env) or Read(./secrets/**).

Specialist Subagents

For complex projects, you can define project-scoped agents with specific roles, like a code-reviewer, test-runner, or debugger. Each agent is configured with a limited toolset, preventing it from overstepping its purpose. Claude can either delegate tasks to the appropriate agent automatically or you can invoke one explicitly. See this repository for examples.

4. Advanced Workflows & Integrations

Elevate your workflow by integrating visual context and external services, moving beyond basic file access.

Visual Context with Screenshots and Images

A picture is worth a thousand words, especially when debugging UI issues. There are three reliable ways to provide images to Claude Code:

  1. Paste from Clipboard: Take a screenshot to your clipboard and paste it directly into the terminal with Ctrl+V (note: on macOS, this is Ctrl+V, not Cmd+V).
  2. Drag & Drop: Drag an image file (PNG, JPEG, GIF, WebP) from your file explorer directly into the CLI window.
  3. Reference File Path: Simply include the local file path in your prompt, e.g., Analyze this screenshot: /path/to/screenshot.png.

Model Context Protocol (MCP) Integrations

MCP enables Claude to connect to external services like Jira, GitHub, Notion, or Sentry. After adding and authenticating an MCP server, you can reference external resources in your prompts, such as Implement the feature described in JIRA-ENG-4521.
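
Setup details vary by server, but registering one generally looks like the following sketch (the server name and npm package are illustrative assumptions, not official identifiers; consult the specific MCP server's documentation for the real command):

```bash
# Register a hypothetical Sentry MCP server for this project
claude mcp add sentry -- npx -y @sentry/mcp-server

# Confirm it is configured
claude mcp list
```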

Non-Interactive & CI/CD Use

For automation and scripting, use print mode with the -p flag.

  • Combine it with --output-format json or --output-format stream-json to produce machine-readable output that can be piped to other tools like jq for further processing.
  • Use --max-turns to set a hard limit on interactions, preventing runaway loops in your automated scripts.
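
Put together, a headless invocation might look like this sketch (the prompt is illustrative, and the jq filter assumes the JSON output exposes a top-level `result` field; adjust to the actual schema):

```bash
# Run one headless task, cap the loop, and extract the answer text
claude -p "Summarize the risk areas in the latest diff" \
  --output-format json \
  --max-turns 5 \
  | jq -r '.result'
```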

5. Cost & Performance Hygiene

Powerful models require mindful usage. Adopt these habits to manage your spend and optimize performance.

  • Watch Spend: Use the /cost command at any time to get a real-time summary of your current session's cost.
  • Intentional Model Selection: Use the most powerful model, like Opus, for high-level planning, complex reasoning, and initial strategy. Then, switch to a faster, more cost-effective model like Sonnet or Haiku for implementation, testing, and other routine tasks.
  • Status Line: A popular community tip is to add a custom status line to Claude Code that displays live cost and other useful information, such as the current Git branch. The ccusage tool is a common choice for this.
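
If you prefer rolling your own over ccusage, a bare-bones `~/.claude/statusline.sh` might look like this sketch. It assumes the status-line command receives session info on stdin (which this version simply discards) and reports only local Git state:

```bash
#!/usr/bin/env bash
# Minimal status line: current branch plus a "*" marker when the tree is dirty.
# Assumption: Claude Code pipes session info to stdin; this sketch ignores it.
cat > /dev/null
branch=$(git branch --show-current 2>/dev/null || echo "no-git")
dirty=$(git status --porcelain 2>/dev/null | head -n 1)
echo "${branch}${dirty:+*}"
```

Make it executable with `chmod +x ~/.claude/statusline.sh` and point the `statusLine` setting at it, as in the starter pack below.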

6. Starter Pack: A Ready-to-Use Configuration

Here are several copy-pasteable configuration files to get you started quickly.

.claude/settings.json (Project-Shared)

This file establishes project-wide permissions, hooks, and monorepo settings.

```json
{
  "defaultMode": "acceptEdits",
  "permissions": {
    "allow": [
      "Read(**/*)",
      "Edit(src/**)",
      "Bash(npm run test:*)",
      "Bash(npm run lint:*)",
      "Bash(go test:*)",
      "Bash(git status:*)",
      "Bash(git diff:*)"
    ],
    "ask": ["Bash(git push:*)", "Bash(pnpm publish:*)", "Bash(npm publish:*)"],
    "deny": ["Read(./.env)", "Read(./.env.*)", "Read(./secrets/**)"],
    "additionalDirectories": ["../apps", "../packages", "../services"]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|MultiEdit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 - <<'PY'\nimport json,sys\np=json.load(sys.stdin).get('tool_input',{}).get('file_path','')\nblock=['.env','/secrets/','.git/']\nsys.exit(2 if any(b in p for b in block) else 0)\nPY"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|MultiEdit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npx prettier --write . --loglevel silent || true"
          },
          { "type": "command", "command": "npm run -s lint || true" },
          { "type": "command", "command": "npm run -s test || true" }
        ]
      }
    ],
    "Notification": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "command -v terminal-notifier >/dev/null && terminal-notifier -message 'Claude needs input' -title 'Claude Code' || true"
          }
        ]
      }
    ]
  },
  "statusLine": { "type": "command", "command": "~/.claude/statusline.sh" }
}
```

.claude/commands/commit.md

This custom command uses shell output to draft a Conventional Commit message.

---
allowed-tools: Bash(git add:*), Bash(git status:*), Bash(git commit:*)
description: Create a conventional commit from current changes
---

## Context

- Status: !`git status -sb`
- Diff: !`git diff --staged; git diff`

## Task

Write a Conventional Commit subject (<= 72 chars) and a concise body.
Call out BREAKING CHANGE if needed. Stage relevant files and commit.

.claude/agents/code-reviewer.md

An agent definition for a specialist code reviewer.

---
name: code-reviewer
description: Senior review with focus on correctness, security, tests, readability, performance.
tools: Read, Grep, Glob, Bash
---

Return a checklist grouped by **Critical**, **Warnings**, and **Suggestions**.
Propose minimal patches where possible. Include test guidance for each critical item.

CLAUDE.md (Memory)

A sample memory file defining working style, quality standards, and key project documents.

# Working style

- Start in **Plan mode**; outline approach, tests, and risks. Wait for approval.
- Execute in **small, reversible steps**; propose staged commits with diffs.
- Place generated docs in `docs/ai/`. Avoid ad-hoc files elsewhere.

# Code quality

- Prefer pure functions and dependency injection.
- JS/TS: strict TS, eslint + prettier; tests via vitest/jest.
- Go: table-driven tests; `gofmt`/`golangci-lint`.
- Security: never read `.env*` or `./secrets/**`; do not write tokens to disk.

# Project map

@README.md
@docs/architecture.md
@docs/testing.md

7. Troubleshooting and Final Thoughts

  • Image Paste Issues: If pasting from the clipboard doesn't work (a common issue in some Linux terminals), fall back to the reliable drag-and-drop or file path methods.
  • Over-Eager Edits: Avoid bypassPermissions mode (enabled by `claude --dangerously-skip-permissions`) in your daily workflow. A better approach is to use acceptEdits combined with well-defined allow/ask/deny rules, and always review diffs before merging.
  • Memory Bloat: If you notice Claude starting to miss instructions, your CLAUDE.md may have grown too large. Shorten it by moving details into imported doc files. You can also restate key rules during a session to bring them back into focus, or use the /compact command to clean up session history.

Claude Code is more than just a code generator; it's a platform for building a highly effective, AI-augmented development process. By moving beyond basic prompts and adopting these intermediate and advanced techniques, you can establish a workflow that is faster, safer, and more collaborative. Experiment with these features, tailor them to your projects, and discover a new paradigm of software development.

OpenAI: 7 Lessons for Enterprise Adoption of Generative AI

· 7 min read

While many companies are still exploring the potential of generative AI, some trailblazers have already woven it into their core operations, achieving impressive results. OpenAI's latest report, "AI in the Enterprise," distills seven universal principles for successful AI adoption in businesses, drawing from in-depth research into industry leaders like Morgan Stanley, Indeed, and Klarna. This isn't just a technological achievement—it's a shift in mindset, collaboration, and business value.

Seven Insights: From Exploration to Scalable Implementation

1. Start with Rigorous Evaluation (Evals): Prioritize "Control" Before "Growth"

Adopting AI isn't an overnight process. Before rolling it out widely, establishing a thorough, measurable evaluation system is crucial for success.

Take financial giant Morgan Stanley as an example. With sensitive client operations at stake, they didn't just follow trends blindly. Instead, they developed a multi-dimensional evaluation system focusing on three core areas—accuracy in language translation, quality of information summarization, and comparison with human expert answers. Only when the model was deemed "controllable, safe, and beneficial" did they gradually introduce it to frontline operations.

This cautious approach has paid off: now, 98% of Morgan Stanley's financial advisors use AI daily; the document hit rate in their internal knowledge base has soared from 20% to 80%; and client follow-ups that once took days are now completed in hours.

2. Deeply Embed AI into Product Experience, Rather Than Adding a Chatbot

The most successful AI applications are those that seamlessly integrate into existing products, enhancing the core user experience. It should feel as natural as water or electricity in daily life.

Indeed, the world's largest job site, exemplifies this approach. Instead of merely adding a job search chatbot, they used GPT-4o mini to automatically generate personalized "recommendation reasons" for each system-matched job. This seemingly small tweak directly addresses job seekers' "why me" questions, significantly improving matching efficiency and user experience. As a result, applications started by job seekers increased by 20%, and employers' rate of successful hires rose by 13%.

3. Act Early to Enjoy the "Compounding Snowball" of Knowledge and Experience

AI's value grows through continuous iteration and learning. The earlier you start, the more your organization can benefit from this "compounding" effect.

Swedish fintech company Klarna's AI customer service system is a vivid example of this principle. In just a few months, AI customer service has handled two-thirds of customer chat sessions, effectively taking on the workload of hundreds of human agents. More impressively, the average resolution time for customer issues dropped from 11 minutes to 2 minutes. This initiative is expected to generate $40 million in annual profit growth for the company. Today, 90% of Klarna employees use AI in their daily work, enabling faster innovation and continuous optimization across the organization.


The Future of Internet Commerce: 5 Key Takeaways from Stripe Sessions 2024

· 5 min read

Every year, Stripe Sessions offers a window into the future of the internet economy. This year's event didn't disappoint, with the Collison brothers unveiling a vision of commerce that feels both imminent and transformative. Having digested the keynote, I'm struck by how clearly certain patterns are emerging in the evolving landscape of digital business.

Here are five crucial insights that stood out to me.

1. The Stripe Economy Has Become a Force of Nature

The scale of Stripe's ecosystem has reached truly macroeconomic proportions:

  • Businesses on Stripe grew 7x faster than the S&P 500 in 2024
  • Their collective growth represented $400 billion in new payment volume
  • Stripe now processes over $1.4 trillion annually — roughly 1.3% of global GDP
  • Approximately 2 million US businesses (6% of all American companies) are building on Stripe

What's remarkable isn't just the scale but the breadth of adoption. From Fortune 100 giants to two-person startups, from AI labs to creator economy platforms, Stripe has effectively become the financial infrastructure layer for the internet.

When a single platform touches this much of the economy, its directional shifts matter. The internet economy is no longer a niche — it's increasingly the economy.

2. AI Companies Are Breaking All Growth Records

The most striking revelation from the keynote was just how fast AI-native companies are scaling compared to previous generations of startups:

  • New AI companies reach $5M ARR in just 9 months on average
  • Lovable hit $50M ARR in 4 months
  • Cursor has achieved over $300M ARR in two years with remarkable efficiency ($5M revenue per employee)

For context, SaaS companies typically took 18-24 months to reach similar milestones during their boom period. The acceleration is unprecedented.

What explains this hypergrowth? AI companies benefit from three advantages:

  1. Immediate global reach — serving 200+ countries from day one is now standard
  2. Higher retention rates than traditional SaaS
  3. Lower operational complexity enabling lean teams to support massive user bases

This suggests we're witnessing not just a technology shift but a fundamental change in business velocity. The constraints that previously limited growth are being systematically removed.

3. Stablecoins Are Quietly Revolutionizing Global Finance

While AI generates most headlines, stablecoins might ultimately deliver similar economic impact. Patrick Collison's description of stablecoins as "room temperature superconductors for value" perfectly captures their transformative potential.

Consider these developments:

  • Stablecoin supply is up 39% since last year
  • Leading stablecoin issuers are becoming major holders of US Treasuries
  • Companies from SpaceX to smaller startups are using stablecoins to eliminate friction in global operations

The real breakthrough is how stablecoins solve the persistent challenge of borderless financial services. Businesses can now launch simultaneously in dozens of countries without navigating the complex web of local banking relationships and currency conversion.

This significantly lowers the barrier to global expansion and creates opportunities for entirely new business models centered around borderless value transfer.

4. "Agent Commerce" Will Redefine How We Buy Everything

Perhaps the most forward-looking concept introduced was "Model-initiated Commerce Protocol" (MCP) — enabling AI agents to directly make purchases on behalf of users.

The demo showed Cursor (an AI coding assistant) purchasing Vercel's bot protection entirely within the coding environment, without ever leaving the workflow.

This points to a profound shift in commerce:

  • AI tools will become native sales channels
  • Purchases will happen contextually within workflows
  • The traditional website/app checkout experience may become secondary

For businesses, this means rethinking distribution strategy entirely. Every AI tool becomes a potential point-of-sale, with agents mediating purchasing decisions based on user intent rather than explicit shopping behavior.

The implications for marketing, pricing, and customer acquisition are enormous. We're moving from search-driven commerce to intent-driven commerce, with AI interpreting and acting on needs before they're fully articulated.

5. The New Formula for Breakout Success Has Changed

Beyond specific technologies, John Collison identified distinct patterns among today's fastest-growing companies:

Going Global Immediately

The most successful startups now target international markets from day one rather than following the traditional domestic-first approach.

Extreme Specialization

The internet's vast reach makes highly specialized offerings not just viable but advantageous. Companies like Harvey (legal AI) and Naba (healthcare AI) demonstrate how domain-specific focus drives rapid adoption.

Usage-Based Pricing

AI economics and inference costs are driving a shift away from flat subscriptions toward outcome-based and usage-based pricing models.

Extraordinary Per-Employee Leverage

Today's breakout companies achieve efficiency ratios that would have seemed impossible a decade ago. Gloss Genius supports 90,000 salons with just 300 employees.

These patterns represent a fundamental rethinking of business building. The traditional playbook for scaling a technology company is being rapidly rewritten.

What This Means for Founders and Investors

For those building or investing in technology companies, several imperatives emerge:

  1. Think globally from day one — geographical constraints are increasingly artificial

  2. Embrace specificity — being the best solution for a narrow use case beats being adequate for many

  3. Build for agent commerce — consider how your product will interface with AI assistants, not just human users

  4. Integrate stablecoins early — reduce friction for global customers before competitors do

  5. Optimize for retention — in the AI economy, sticky products with strong retention metrics are winning

The most exciting aspect of all this is that we're still early. Both AI and stablecoins are just beginning to reshape commerce. The companies being built today with these technologies as foundational elements will likely define the next decade of the internet economy.

As Patrick Collison noted, periods of technological turbulence historically favor bold innovation. For founders willing to embrace these shifts, the opportunity has never been greater.

What are your thoughts on the future of commerce? Are you seeing these patterns in your industry? Let me know in the comments.

The Promise and Pain of AI Sales Development Representatives: A Field Report

· 5 min read

In the relentless chase to optimize sales pipelines, AI Sales Development Representatives (AI SDRs) have become one of the buzziest tools of 2025. They promise to automate prospecting, personalize outreach at scale, and drop qualified meetings onto your calendar—without the traditional headcount.

But are they actually delivering?

After talking to dozens of sales leaders and digging through hundreds of reviews across G2, Reddit, and Slack communities, I found a more complex story behind the hype.


The 11x Problem: High Expectations, Mixed Results

11x.ai has become the poster child of this category, claiming to make SDRs “11 times more productive.” It’s a bold promise—and one that sets the bar high.

“I expected the AI to research each prospect like a junior rep would,” one sales director told me. “But all I got were Mad Libs with company names filled in.”

This wasn’t an outlier. Across forums and customer chats, a common theme emerged: the emails feel automated, templated, and often too generic to land.

And when leads reply? The AI often stumbles. As one Reddit user put it:

“It can blast emails all day, but the moment someone says something unexpected, it short-circuits.”

This leaves a strange handoff experience—where prospects believe they’re chatting with a human, only to feel the switch when an actual rep steps in mid-convo.

What’s Actually Working

Despite the frustrations, there are places where AI SDRs shine:

  • Outreach volume: Teams consistently report a massive jump in top-of-funnel activity. One European team told me they now “run outreach 24/7” across time zones thanks to their AI reps.
  • Prospecting help: Tools like 11x.ai do a decent job sourcing leads. “The contact lists it finds are better than expected,” said one German user.
  • Personality insights: Humantic AI impressed several teams with surprisingly accurate personality profiles. “It’s like having a cheat code for the first call,” said a G2 reviewer.
  • Real-time coaching: Cresta takes a different approach—coaching human SDRs in real-time rather than replacing them. It’s especially useful for onboarding new reps or improving call quality without hiring a full-time trainer.

Beyond Performance: Hidden Pain Points

Go past the functionality issues, and deeper structural problems start to surface:

  • Locked-in contracts: Most platforms require $35,000–$60,000/year commitments with minimal ways to try before buying. “We’re stuck with a tool that doesn’t work for us,” said one buyer.
  • Technical hiccups: From bugs to laggy dashboards, users—especially in Europe—report reliability issues that break workflows.
  • Customization limits: If your audience is niche or your messaging complex, AI often struggles. “We tuned it for weeks,” said a B2B SaaS exec. “The emails still felt generic.”
  • Data security worries: With sensitive customer data flowing through these systems, several larger companies voiced concerns over how their information might be used—or reused.

The Strategic Dilemma: Build, Buy, or Augment?

Given the trade-offs, sales leaders are approaching AI SDRs in one of three ways:

  • The All-In Crowd: Typically fast-moving, high-volume orgs that prioritize scale. They’re willing to accept AI’s rough edges.
  • The Augmenters: Teams using AI to support (not replace) reps. They use tools like Regie.ai for drafting emails, Humantic for call prep, and keep humans in control of conversations.
  • The DIY Builders: Tech-savvy orgs building custom workflows on top of GPTs and internal data. It’s more work, but gives them control and avoids vendor lock-in.

What Needs to Improve

To move from “interesting” to indispensable, AI SDR vendors need to make real progress on a few fronts:

  1. Handle conversations, not just intros – The biggest gap is follow-through. If AI can’t respond naturally, the illusion breaks.
  2. Go beyond templates – True personalization should reference real business context, not just job titles and company names.
  3. Make pricing more flexible – Teams want to experiment before committing six figures.
  4. Fix the UX – Better onboarding, faster load times, and fewer bugs will go a long way.
  5. Allow deeper customization – Give companies tools to teach the AI their value props, messaging frameworks, and product nuance.

Where This Is Headed

The market seems to be splitting into two directions:

  • Vertical AI SDRs: Industry-specific tools trained on healthcare, finance, or manufacturing language, workflows, and regulations.
  • Lightweight assistants: More affordable tools that support reps with writing, prospecting, and call prep—without pretending to replace them.

The companies that lean into augmentation, not automation, may end up building more sustainable businesses.

The Bottom Line

AI SDRs are a classic example of the enterprise AI hype cycle. The pitch—an infinitely scalable digital sales team—is irresistible. But the reality is still catching up.

For most teams, the smart move today is targeted augmentation: Let AI do what it’s good at—prospecting, drafting, supporting—while keeping humans in the loop for objections, relationship-building, and closing.

Because in sales, as in life, the human touch still matters. Maybe now more than ever.

Have you used AI SDRs? What’s been your experience—worth the hype or too soon to tell?