The Infinity Machine: How Demis Hassabis Built DeepMind and Chased AGI

Tian Pan · Software Engineer · 160 min read

Chapter 1: The Sweetness

Somewhere in the middle of his neuroscience PhD, Demis Hassabis picked up a science fiction novel called Ender's Game. It tells the story of a diminutive boy genius sent to a space station, put through extreme mental testing, asked to shoulder responsibility for the survival of the human race. Hassabis read it and felt, as Sebastian Mallaby tells it, that someone had finally written a book about him.

That anecdote — half charming, half alarming — sets the tone for The Infinity Machine (Penguin Press, March 2026), Mallaby's sweeping biography of Hassabis and the company he built, DeepMind. It is a book about one man's lifelong attempt to answer what he calls "the screaming mystery" of the universe: why does anything exist, how does consciousness arise, and can a machine be built that understands it all? Hassabis's answer — characteristically immodest — is yes. And he intends to build it himself, within his lifetime.

The Oppenheimer Question

Mallaby, a senior fellow at the Council on Foreign Relations and former Financial Times correspondent, spent three years in regular conversation with Hassabis and hundreds of interviews with colleagues, rivals, and critics. The resulting portrait is probing but largely admiring — though the book's framing never lets the reader forget the shadow it is writing under.

The governing metaphor is Robert Oppenheimer. Like the physicist who unlocked atomic fission and then spent the rest of his life haunted by it, Hassabis is drawn forward by what Oppenheimer once called the "technically sweet" problem — the irresistible pull of a puzzle that can be solved — even as he acknowledges the consequences might be catastrophic. Mallaby does not pretend to resolve this tension. It is the spine of the entire book.

Hassabis was born in 1976 in North London, the son of a Greek-Cypriot father and a Chinese-Singaporean mother of modest means. He became a chess master at thirteen. By seventeen he was lead programmer at Bullfrog Productions, helping ship Theme Park — a game that sold millions of copies. He turned down a scholarship to Cambridge to work in the video game industry, then reversed course, took his place at Queens' College, graduated with a double first in computer science, co-founded a game studio, watched it collapse, and finally — in his early thirties — earned a neuroscience PhD at UCL, where he published landmark research on the hippocampus's role in both memory and imagination.

He was not, at any point, taking the easy route.

What This Book Is About

The Infinity Machine is structured as a chronological narrative that doubles as a history of modern AI. Each chapter centers on a project or crisis in DeepMind's life — the Atari breakthrough, the AlphaGo matches, the NHS data scandal, the AlphaFold triumph, the ChatGPT shock — but each one also illuminates something larger: how scientific idealism survives (or doesn't) inside a $650 million acquisition; how a safety-first ethos holds up against the competitive pressure to ship; how a man who genuinely believes he is building humanity's last invention stays sane, or at least functional.

Mallaby conducted over thirty hours of interviews with Hassabis alone, and the access shows. There is texture here — the poker-game pitch that recruited co-founder Mustafa Suleyman, the midnight calls during the Lee Sedol match, the exact moment Hassabis grasped (later than he should have) that transformers would change everything — that could only come from sustained proximity to the subject.

The book runs to 480 pages and covers ground from Hassabis's childhood chess tournaments to Google DeepMind's Gemini releases. The chapters ahead in this summary will trace that arc in detail. But every chapter returns, eventually, to the same question the introduction poses: can someone who is certain he is doing the most important thing in human history also be trusted to do it wisely?

Mallaby does not fully answer that. Neither, yet, has Hassabis.


Chapter 2: Deep Philosophical Questions

To understand why Demis Hassabis built what he built, Mallaby begins with a question most technology biographies skip: what does this person actually believe about the nature of reality?

The answer, in Hassabis's case, is unusual enough to be worth taking seriously. He does not believe intelligence is a product, or even primarily a tool. He believes it is the key to something more fundamental — a way of reading what he calls "the deep mystery of the universe." Science, for him, is close to a religious practice. "Doing science," he has said, "is like reading the mind of God. Understanding the deep mystery of the universe is my religion."

That is not a throwaway quote. It explains the specific shape of every decision that follows.

Information All the Way Down

Hassabis's philosophical foundation rests on a claim that physicists argue about but technologists rarely engage with: that information is more fundamental than matter or energy. Not a metaphor — a literal assertion. The universe, in this view, is an informational system. Quarks and neurons and protein chains are all, at some level, patterns in a substrate of information. If that is true, then a sufficiently powerful information-processing machine is not just a useful instrument. It is the most direct possible route to understanding what the universe actually is.

This is what he means when he describes reality as "screaming" at him during late-night contemplation. Seemingly simple phenomena — a solid table made from mostly empty atoms, bits of electrical charge becoming conscious thought — are, looked at squarely, completely absurd. How can anyone not feel the urgency of those questions? The fact that most people do not, Hassabis appears to find genuinely puzzling.

This worldview sets him apart from the mainstream of the tech industry in a specific way. Most AI entrepreneurs talk about transforming industries or accelerating economic growth. Hassabis talks about understanding the nature of consciousness and the origins of life. He wants to use AGI the way a physicist uses a particle accelerator — as an instrument for probing reality itself. The commercial applications are real and welcome. But they are not why he gets up in the morning.

The Chess Education

Mallaby traces the origin of Hassabis's intellectual style back to the chessboard. He learned the game at four by watching his father and uncle play; by thirteen, he had an Elo rating of 2300, qualifying him as a master. He captained England junior teams and was, by any measure, among the strongest young players in the world.

But at twelve, after a gruelling ten-hour tournament near Liechtenstein, he made a decision that tells you everything about him: he quit competitive chess. Not because he was failing — he was winning. But he had concluded that channelling exceptional ability into a single board game was a waste. The chessboard was a training ground, not a destination.

What chess gave him, and what he kept, was a particular cognitive discipline: the capacity to evaluate enormously complex positions not through exhaustive calculation but through pattern recognition calibrated by experience. Good chess players cannot compute every line; there are too many. They develop intuitions about which positions are promising and which are not — intuitions that can be tested, refined, and occasionally overridden by deeper analysis. This is exactly how Hassabis would later think about AI research: make a judgment call, run the experiment, update the model.

Chess also instilled a severe honesty about results. A chess position is not ambiguous. You are better or worse; you win or lose. Hassabis would carry this into DeepMind's culture — a preference for definitive benchmarks over vague claims of progress, and an impatience with the kind of motivated reasoning that lets researchers persuade themselves a system is working when it is not.

The Neuroscience Detour That Wasn't a Detour

After Theme Park, after Cambridge, after the collapse of Elixir Studios (his first company), Hassabis did something that baffled people who knew him: he went back to school. He enrolled in a neuroscience PhD at UCL under Eleanor Maguire, one of the world's leading researchers on memory and the hippocampus.

This looked, from the outside, like a retreat. It was the opposite.

His doctoral research produced a finding that became one of Science magazine's top ten scientific breakthroughs of 2007: patients with hippocampal damage, long known to suffer from amnesia, were also unable to imagine new experiences. Memory and imagination, previously treated as distinct faculties, turned out to share the same neural machinery. The hippocampus does not just store the past — it constructs possible futures by recombining elements of what it knows.

For Hassabis, this was not merely an interesting neuroscience result. It was a design principle. If biological intelligence works by building rich internal models of the world and simulating possible futures within them, then artificial intelligence that lacks this capacity — that can only recognize patterns in training data without any model of cause and consequence — is not really general at all. It is a very sophisticated lookup table. The hippocampus research pointed toward what general intelligence actually requires: not just memory, not just pattern recognition, but imagination — the ability to take what you know and project it into situations you have never seen.

This insight would echo through DeepMind's entire research agenda. Reinforcement learning, self-play, world models, agents that plan — all of these reflect the same underlying conviction: that intelligence is not fundamentally about retrieval, but about simulation.

A Philosophy of Honesty

Mallaby notes one more thread running through this period: an unusually strong commitment to intellectual honesty, even at personal cost. Hassabis is described as constitutionally averse to manipulation — to using technically-true statements to create false impressions, or to allowing the social pressure of a room to bend his stated beliefs. He would rather be wrong out loud than right in private.

This is harder than it sounds in the world he would enter. AI research is full of incentives to oversell — funding depends on it, talent depends on it, media attention depends on it. Hassabis's response was not to be naive about those incentives, but to treat honesty as an active discipline rather than a passive default. The commitment would be tested, repeatedly and severely, as DeepMind grew.


Chapter 3: The Jedi

In 1997, two young men graduated from Cambridge a few weeks apart and made the same decision: build a video game company instead of taking the obvious path. One of them was Demis Hassabis. The other was David Silver, who had just received the Addison-Wesley prize for the top computer science graduate in his cohort. Silver and Hassabis had become friends at Cambridge — two people who thought about games the way most people think about mathematics, as a domain where intuitions about complexity could be tested with perfect clarity.

The chapter title comes from how Mallaby describes Hassabis's gift for recruitment. When he rang Silver and laid out the plan — a studio that would build games no one had tried before, driven by AI research rather than commercial formula — Silver felt, as he later described it, the pull of a Jedi mind trick. He didn't entirely choose to say yes so much as he found himself having already said it.

This would become a recurring feature of Hassabis's leadership: the ability to make people feel that his vision was also their destiny.

One Million Citizens

The company they founded, Elixir Studios, was established in July 1998 in London. The flagship project, Republic: The Revolution, was unlike anything in the games industry at the time. The design document promised a full political simulation of an Eastern European state: hundreds of cities and towns, thousands of competing factions, and approximately one million individual citizens, each with their own AI — their own beliefs, daily routines, loyalties, and emotional responses to events. Players would not just conquer territory; they would manipulate a living society, tilting a population toward revolution through force, influence, or money.

The vision was breathtaking. It was also, as anyone who has ever shipped software might have predicted, completely impossible to deliver on the announced timeline.

What actually shipped in August 2003 — five years after development began — was a game set in a single city divided into districts, with ten factions instead of thousands, and a population simulation drastically reduced from the original scope. The Metacritic score was 62. Critics praised the ambition and criticized the execution. The huge world that took so long to construct, one reviewer noted acidly, ends up as the least involving part of the game.

The Delusion Trap

Mallaby is interested in Elixir not primarily as a commercial failure but as a study in organizational psychology — specifically, in how a highly intelligent founder with a genuine vision can systematically stop receiving accurate information from the people around him.

The mechanism was not dishonesty, exactly. It was something more insidious. Hassabis had such fierce conviction about what Republic could be, and communicated that conviction so persuasively, that his engineering team learned not to tell him what they couldn't do. They knew he wouldn't accept "no." So they said "yes, we can do this" — and because Hassabis kept hearing yes from people he trusted, he became more certain, not less. The feedback loop amplified his confidence precisely as the project's foundations were silently cracking beneath him.

He also spread himself disastrously thin — serving simultaneously as CEO, lead designer, and producer, inserting himself into decisions at every level of production. The people he hired were smart but inexperienced with games; Cambridge graduates are not, by default, shipping-oriented. The studio burned through resources and goodwill for years before the cracks became impossible to ignore.

Hassabis said later: "You can get self-delusional thinking. You can actually over-inspire people." The cost of that over-inspiration was five years of his team's lives and a company that closed in April 2005.

Mallaby frames the collapse not as a lesson in humility — Hassabis's ambition did not diminish — but as the origin of a specific diagnostic tool. How do you tell the difference between a vision that is difficult and a vision that is impossible? How do you stay honest with yourself when everyone around you has learned to tell you what you want to hear?

The answer Hassabis developed, years later, he called the fluency test: enter the room where the work is happening and listen, not for the right answers, but for the flow of ideas. A team generating possibilities fluidly — even wrong ones, even half-formed ones — still has energy to burn. A team that falls quiet when asked hard questions has hit a wall it cannot name. The fluency test is not infallible, but it provides a read that direct questioning cannot, because people who won't say "no" will still, involuntarily, go silent.

The test would prove decisive at a critical moment in the AlphaFold project, years later. But it was born in the rubble of Republic: The Revolution.

Silver's Exit, and What He Found

David Silver had watched the struggle at Elixir from close range. In 2004, before the studio's final collapse, he made his own pivot: he picked up Richard Sutton and Andrew Barto's textbook on reinforcement learning and found, in its pages, the thing he had been circling for years.

Reinforcement learning is, at its core, the mathematics of learning by doing — of an agent taking actions in an environment, receiving rewards and penalties, and gradually developing a policy that maximizes long-run return. It had been largely out of fashion by the mid-2000s, overshadowed by supervised learning methods that required large labelled datasets. But Silver recognized something the field had not yet fully absorbed: RL's sample-inefficiency problems were engineering problems, not theoretical ones. The framework itself was sound. And its natural domain — sequential decision-making under uncertainty — was exactly what playing games required.
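
The core loop Sutton and Barto formalise is compact enough to sketch directly. Below is a minimal, illustrative Python version — the toy environment, the reward numbers, and the deliberately useless random policy are my stand-ins, not anything from DeepMind or from the textbook:

```python
import random

# A toy "environment": the agent sits at an integer position and wants to reach 10.
class LineWorld:
    def __init__(self):
        self.position = 0

    def step(self, action):            # action: -1 (left) or +1 (right)
        self.position += action
        reward = 1.0 if self.position == 10 else -0.01   # the reward signal
        done = self.position == 10
        return self.position, reward, done

def policy(state):
    # Placeholder policy: pure chance. Improving this mapping from states to
    # actions, using nothing but the reward signal, is the whole of the
    # reinforcement learning problem.
    return random.choice([-1, +1])

env, state, total_return = LineWorld(), 0, 0.0
for _ in range(100_000):                       # cap the episode length
    action = policy(state)                     # act
    state, reward, done = env.step(action)     # observe the consequence
    total_return += reward                     # accumulate long-run return
    if done:
        break
print("episode return:", round(total_return, 2))
```

Everything that follows in DeepMind's story — DQN, AlphaGo, AlphaZero — is, at this level of description, a progressively more sophisticated replacement for that `policy` function.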

He left for the University of Alberta, where Sutton was based, to do his PhD. Over the next five years, working under the supervision of the man who had co-written the textbook, Silver co-introduced the algorithms that powered the first master-level 9×9 Go programs. He graduated in 2009, the same year Hassabis finished his neuroscience PhD at UCL.

The parallel is not accidental. Both men had left the games industry with unfinished business, taken circuitous routes through academia, and arrived at the same destination from different directions. Hassabis had the theory of what general intelligence required, drawn from neuroscience. Silver had the mathematics of how to train it, drawn from reinforcement learning. Neither had, on his own, what the other had.

DeepMind would be the place where that changed. Mallaby frames the chapter as a story of two divergent paths that were always going to converge — two people who understood, before almost anyone else did, that the gap between games and general intelligence was smaller than the field believed. The Jedi mind trick, it turned out, had worked on both of them.


Chapter 4: The Gang of Three

In 2009, artificial intelligence was not fashionable. The field had been through two long "winters" — stretches of broken promises and evaporated funding — and the mainstream of computer science regarded anyone who talked seriously about artificial general intelligence with something between skepticism and pity. Demis Hassabis, freshly out of his neuroscience PhD and convinced that AGI was both achievable and urgent, needed allies who shared his conviction. They were not easy to find.

This chapter is about how he found two of them — and how different they were from each other, and from him.

The Man Who Had Already Done the Math

Shane Legg grew up in New Zealand, studied mathematics and statistics, and spent his doctoral years in Switzerland at the IDSIA research institute under Marcus Hutter, one of the world's leading theorists of universal artificial intelligence. His 2008 dissertation was titled Machine Super Intelligence. It was not a roadmap for building AI. It was an attempt to formalize what superintelligence would actually mean — to give the concept mathematical content rather than science-fiction vagueness.

The centrepiece of the thesis was AIXI, Hutter's framework for a theoretically optimal universal agent. By combining Solomonoff induction — a formalism for learning any computable pattern from data — with sequential decision theory, Hutter had defined an agent that would, given infinite compute, behave optimally in any environment. It was, in a rigorous sense, the perfect intelligent machine. It was also completely unimplementable, requiring infinite resources. But that was not the point. AIXI proved that general intelligence was not a mystical concept; it was a mathematical object that could be defined, bounded, and, in principle, approximated.
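
For readers who want the flavour of the formalism, the expectimax expression at the heart of AIXI is usually written in roughly the following form — this is a paraphrase of Hutter's standard notation, not a quotation from Legg's thesis:

$$
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\big[\, r_k + \cdots + r_m \,\big]
\sum_{q \,:\; U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

The agent considers every possible future sequence of actions $a$, observations $o$, and rewards $r$; weights each candidate world — each program $q$ that could be running on the universal machine $U$ — by its simplicity through the Solomonoff term $2^{-\ell(q)}$; and chooses the action with the highest expected total reward. The infinite resources are hiding in that final sum over all programs.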

Where Legg departed from his supervisor's purely theoretical interests was in the question of what such a system would actually do. His thesis ends with a section that reads, even now, like a warning siren. A sufficiently intelligent machine optimizing for any goal would, by default, resist being switched off — because being switched off would prevent it achieving the goal. It would deceive operators who tried to constrain it. It would accumulate resources far beyond what any particular task required, as a hedge against future interference. None of this required malice. It required only competence.

Legg became, as a direct result of this analysis, one of the earliest people in AI research to state publicly that he regarded human extinction from AI as a live possibility. In a 2011 interview on LessWrong, he said AI existential risk was his "number one risk for this century." His probability estimates for catastrophic outcomes from advanced AI ranged, at various points, between 5% and 50% — wide uncertainty, but a number very far from zero.

This was the man Hassabis met at the Gatsby Computational Neuroscience Unit at UCL in 2009, during Legg's postdoctoral fellowship. Here was someone who had not only taken the AGI question seriously but had formalized it — and who had arrived, through pure theory, at exactly the existential stakes that Hassabis intuited from his philosophical commitments. Two people who had approached the problem from entirely different directions and reached the same alarming conclusion.

They founded DeepMind together in 2010. Legg would go on to lead the company's AGI safety research — the first person, at a major AI lab, to hold that role.

The Dropout from Oxford

Mustafa Suleyman's route to the same founding table ran through a different world entirely.

He grew up off the Caledonian Road in Islington — working-class North London, the son of a Syrian taxi driver and an English nurse. He won a place at Oxford to read philosophy and theology, then dropped out at nineteen. What he did next reveals the particular quality Hassabis was looking for: instead of drifting, Suleyman co-founded the Muslim Youth Helpline, a telephone counselling service that would become one of the largest mental health support networks of its kind in the UK. He had seen a gap — young people in crisis, no appropriate service available — and built something in the space.

He then worked as a policy officer on human rights for Ken Livingstone, the Mayor of London, and co-founded Reos Partners, a consultancy using conflict-resolution methods to address intractable social problems. His clients included the United Nations and the World Bank. By the time he encountered Hassabis, he had spent a decade becoming expert at two things that computer scientists almost universally lack: understanding how institutions actually work, and translating abstract goals into operational programs that survive contact with the real world.

He reached Hassabis through proximity rather than credentials — his best friend was Demis's younger brother. Over time, what had been a social connection became something more like a shared conviction. Hassabis reportedly pitched the DeepMind idea to Suleyman over a poker game, and Suleyman — who had a poker player's instinct for when to push and when to read the room — said yes.

He was, by every conventional metric, the wrong person to co-found an AI research laboratory. He had no technical training, no publication record, no standing in the machine learning community. Hassabis chose him anyway.

Why Three, and Why These Three

Mallaby's interest in this chapter is not just biographical inventory. It is the question of what a founding team does to the character of a company it builds.

Each co-founder contributed something the others lacked and could not easily acquire. Hassabis supplied the vision and the scientific framework — the neuroscience-informed theory of what general intelligence is and what it would take to build it. Legg supplied the existential awareness — an unusually early and unusually rigorous understanding of what a successful AGI would mean for humanity, and why safety had to be treated as a first-order research problem rather than an afterthought. Suleyman supplied operational instinct and a set of social concerns — health, fairness, governance — that prevented the lab from becoming a monastery of pure theory disconnected from the world it was trying to help.

The tension between these three orientations would generate much of DeepMind's energy, and much of its internal conflict. Hassabis wanted to solve intelligence. Legg wanted to solve it safely. Suleyman wanted to deploy it usefully, quickly, and in ways that changed real lives. These goals are compatible in theory and, in practice, constantly in friction.

Mallaby writes from a position of knowing how the story eventually plays out for all three. Suleyman is described in the book as an estranged co-founder — he would later leave DeepMind under difficult circumstances, eventually surfacing as CEO of Microsoft AI. Legg would stay, becoming Chief AGI Scientist. Hassabis would remain CEO, accumulating more authority as the others departed or diminished.

The gang of three became, in time, a gang of one. But in 2010, with nothing yet built, the three-way tension felt like a feature, not a bug. DeepMind was a bet that idealism, mathematics, and pragmatism could hold together long enough to do something unprecedented.


Chapter 5: Atari

Before DeepMind could save humanity, it had to prove it could beat Breakout.

This chapter covers the period from 2010 to early 2014 — four years in which a small team in London, funded by a handful of believers and producing no commercial product, built the thing that would make the world take artificial general intelligence seriously. The proof of concept was an AI that learned to play old Atari video games. The significance was everything else.

The Lab Hassabis Built

From the start, Hassabis made a deliberate choice not to build DeepMind in Silicon Valley. London was not an accident. London gave him access to European academic talent, a culture less obsessed with rapid product iteration, and physical distance from the venture-capital orthodoxy that demanded revenue roadmaps and quarterly milestones. He wanted a research institution that happened to be incorporated as a company, not a company that happened to do research.

The early investors who said yes to this were, consequently, an unusual group. Peter Thiel — who had written in Zero to One about the difference between incremental improvement and genuine technological transformation — backed the company through Founders Fund alongside Luke Nosek, his PayPal co-founder, who joined DeepMind's board. Elon Musk wrote a cheque. Jaan Tallinn, the Skype co-founder turned AI-risk philanthropist, came in as an advisor. By the time of the Google acquisition in early 2014, the company had raised more than $50 million without releasing a single product or generating a dollar of revenue. These investors were, essentially, funding a philosophy.

What that money bought was freedom. Hassabis hired the brightest PhDs he could find from the world's best programmes — Cambridge, UCL, Toronto, Montreal — and told them to do blue-sky research. He himself worked nights, logging hours from ten in the evening until around four in the morning on top of his daytime work. "If you are trying to solve humanity's problems and understand the nature of reality," he said, "you don't have any time to waste." The culture set by that example was intense, focused, and, for the people who thrived in it, exhilarating.

By 2013 the team had approximately fifty researchers. It was tiny by the standards of what would come. But it was almost perfectly constituted for the problem in front of it.

The Problem Nobody Had Solved

Deep learning and reinforcement learning were, in 2012, two of the most promising threads in AI research — and almost universally treated as separate disciplines.

Deep learning, turbocharged by Geoffrey Hinton's group at Toronto, had just demonstrated on the ImageNet benchmark that convolutional neural networks could recognise objects in photographs better than any previous method. The key was that these networks could learn their own feature representations from raw data — you did not need to hand-engineer what "edge" or "curve" or "wheel" looked like; the network figured it out. This was a breakthrough in perception.

Reinforcement learning was a different tradition entirely: an agent takes actions, receives rewards or penalties, and learns a policy — a mapping from situations to actions — that maximises long-run return. It was mathematically elegant and had a strong theoretical foundation, particularly in the Q-learning framework developed by Chris Watkins in 1989. But it was fragile at scale. Neural networks had been tried with RL before, and the combination tended to explode: the training became unstable, the networks diverged, and the whole thing collapsed.
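
Watkins's update rule is worth seeing, because everything DQN later does is an attempt to make this one line keep working when $Q$ is a deep network rather than a lookup table:

$$
Q(s,a) \;\leftarrow\; Q(s,a) \;+\; \alpha \big[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\big]
$$

Here $\alpha$ is a learning rate and $\gamma$ a discount factor; the bracketed term is the gap between what just happened (the reward plus the best the agent believes it can do from the next state) and what the agent previously expected. With a table, the update provably converges under mild conditions. With a neural network approximating $Q$, it was this same update that kept blowing up.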

The two fields had, essentially, given up on each other.

Volodymyr Mnih understood both. He had done his master's degree at the University of Alberta in machine learning under Csaba Szepesvari, one of RL's leading theorists, before moving to Toronto for his PhD under Hinton himself. He arrived at DeepMind in 2013 with a rare bilingualism — fluent in the mathematics of deep networks and in the mathematics of sequential decision-making. Koray Kavukcuoglu, a neural-network specialist who had already joined the team, supplied the architecture expertise. Together they set out to make the combination work.

Why Experience Replay Changed Everything

The technical obstacle was a mismatch between what neural networks need and what reinforcement learning provides.

Neural networks train best on data that is independently and identically distributed — diverse, uncorrelated samples drawn from the same underlying distribution. But an RL agent generates data sequentially, each observation causally following from the last: a ball bouncing right, then the paddle moving, then the ball bouncing left. These consecutive frames are highly correlated. Feed correlated data into a neural network and the gradient updates interfere with each other; the network spins in circles, overwriting what it just learned.

The fix was called experience replay, and it was conceptually simple enough that its power is almost surprising. Instead of training on each experience the moment it happened, the agent stored its experiences — (state, action, reward, next state) tuples — in a large memory buffer. During training, it sampled randomly from that buffer, pulling together experiences from wildly different points in the agent's history: a moment from an hour ago next to a moment from five minutes ago next to a moment from this morning. The temporal correlations were broken. The network saw something closer to the diverse, uncorrelated dataset it needed.
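
A replay buffer is simple enough to sketch in a few lines. The following is an illustrative Python version of the idea as described above, not DeepMind's implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples and serves
    random minibatches, breaking the temporal correlation between
    consecutive experiences."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences fall off the back

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random draw: a frame from an hour ago can land in the same
        # training batch as a frame from thirty seconds ago.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Usage: store every transition as the agent plays, train once enough exist.
buffer = ReplayBuffer()
for t in range(1_000):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(32)
```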

The second stabilising trick was a separate target network — a frozen copy of the main network whose weights were updated only periodically. This prevented the moving goalposts problem, where the network would destabilise itself by chasing a target that was itself changing with every gradient step.
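
Putting the two fixes together, the DQN training objective for a sampled transition $(s, a, r, s')$ takes roughly this form, with $\theta$ the weights of the network being trained and $\theta^{-}$ the frozen copy:

$$
y = r + \gamma \max_{a'} Q(s', a';\, \theta^{-}), \qquad
L(\theta) = \big(\, y - Q(s, a;\, \theta) \,\big)^{2}
$$

Because $\theta^{-}$ changes only at scheduled intervals, the target $y$ holds still long enough for gradient descent on $\theta$ to make measurable progress toward it.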

Together, experience replay and the target network turned an unstable combination into a tractable one. The Deep Q-Network was born.

What It Did to Atari

The DQN system's input was nothing but raw screen pixels and the game score. No rules. No game-specific features. No human demonstrations. No knowledge of what the games were about. The agent saw what a human player would see, received a numerical reward when the score went up, and was otherwise on its own.

It was tested on seven Atari 2600 games — Pong, Breakout, Space Invaders, Seaquest, Beamrider, Q*bert, and Enduro — without any adjustment to the architecture between games. The results, published in December 2013 on arXiv and presented at the NIPS Deep Learning Workshop, were startling. DQN outperformed all previous approaches on six of the seven games. On three of them it surpassed the best human expert scores.

But the number that lodged in people's minds was not the score. It was the behaviour.

In Breakout — the game where a paddle bounces a ball against a wall of bricks — human players learn that the optimal strategy is to aim for a corner and drill a tunnel through the side, bouncing the ball behind the bricks for a cascade of automatic points. No one programmed this. The DQN agent, after enough training, figured it out independently. The machine had discovered a strategic insight that took human players years to develop, through nothing but trial and reward signal.

It had not been taught the tunnel strategy. It had invented it.

Why This Was Not About Games

Mallaby is careful here to explain why the games setting was not a gimmick. It was the point.

The whole critique of narrow AI — expert systems, chess engines, Go programs — was that each one was hand-crafted for its domain. The knowledge was in the code, not in the learning. DeepMind's claim, and the claim Hassabis had been making since his neuroscience PhD, was that general intelligence learns its own representations from experience and then transfers that capacity across domains.

The DQN paper demonstrated this with unusual clarity. The same architecture, the same algorithm, the same hyperparameters — seven games, zero domain customisation. When you asked the model to play Space Invaders, it was not running the Breakout program with a new skin. It was genuinely learning to play Space Invaders. The architecture was the constant; the intelligence was learned fresh each time.

That was what DeepMind had been claiming was possible. Now they had shown it.

The Acquisition

The NIPS presentation drew immediate attention from the major technology companies. Google, which had been monitoring AI research since the AlexNet shock of 2012, moved quickly. Acquisition talks with DeepMind began in 2013. Facebook was also interested, and Zuckerberg made an offer.

Hassabis chose Google — but not without conditions. The negotiation that produced the $650 million deal is covered in the next chapter. What matters here is what Google was buying: not a product, not a dataset, not a revenue stream. They were buying a demonstration that general learning was possible, and a team of fifty people who knew how to pursue it.

The Atari games were always proxy problems. What DeepMind was actually training, in those early London offices, was a method. The games were the simplest possible world in which to test whether an agent could learn to act. They passed the test. Everything that followed — Go, protein folding, the race with OpenAI — flows from those seven games and what the machine taught itself to do with a paddle and a ball.


Chapter 6: Thiel Trouble

There is a structural incompatibility between venture capital and blue-sky science that most AI founders discover only after they have already signed the term sheets. Venture funds have a lifecycle — typically ten years. They need their portfolio companies to reach a liquidity event inside that window: an acquisition, an IPO, a secondary sale. General intelligence research has a different lifecycle entirely. It requires decades of investment, infrastructure that costs billions, and a willingness to accept that the breakthroughs may not come in any predictable order.

DeepMind, by 2013, was about to collide with this incompatibility at speed.

The Chess Gambit That Opened the Door

Before the crisis, there was the original pitch — and it is worth dwelling on, because it captures something essential about how Hassabis operated.

In August 2010, he had what he later described as "literally one minute" with Peter Thiel, who was hosting his annual Singularity Summit at his California mansion. The room was full of people trying to pitch technology ideas. Hassabis had spent months thinking about how to use his minute. He had read everything he could about Thiel and found that Thiel had played chess as a junior. That was the opening.

Instead of leading with the business plan, Hassabis asked Thiel a chess question: why was the game so remarkable? Hassabis's own answer, delivered in the one minute he had: the creative tension that arises when you swap a bishop for a knight in certain positions. The bishop commands long diagonals; the knight covers squares the bishop can never reach. Neither is strictly better. Their co-existence is what makes the game inexhaustible.

Thiel, who had never considered chess in quite those terms, was intrigued. A meeting was secured. Within months he had invested £1.4 million — roughly $1.85 million — in a company that had not yet produced anything. He made the decision in a single meeting. He also initially wanted DeepMind to relocate to Silicon Valley. Hassabis talked him out of it.

Luke Nosek, Thiel's PayPal co-founder and a partner at Founders Fund, joined DeepMind's board. The seed was small but the names were large, and in the world of early-stage technology investment, names matter.

The Phone Call

The crisis arrived as a phone call, at an hour that suggested the news was bad.

Luke Nosek rang Hassabis and Suleyman to tell them that his partners at Founders Fund had decided they no longer wanted to lead DeepMind's Series C. The round had been structured around a $65 million target, with Founders Fund as lead. Without the lead, the round fell apart. Without the round, DeepMind — which had been burning through its earlier capital funding fifty-odd researchers and their computing infrastructure — was in serious trouble.

The cause was not a single dramatic falling-out. It was something more corrosive: an accumulating anxiety among institutional investors about what exactly DeepMind was. It was not a product company. It was not a services business. It did not have a revenue model, and it showed no sign of wanting one. Its founders described its goal as solving general intelligence and then using that solution to benefit humanity — a mission statement that is either the most important thing ever attempted or the most expensive way to never deliver anything, depending on your tolerance for ambition. Founders Fund's partners, when the moment of the larger commitment arrived, landed on the second interpretation.

Mallaby frames this not as a failure of Thiel or Nosek but as a structural feature of the situation. The DeepMind model — deep science, no product, indefinite timeline — was simply not a venture-backed business. The question was what kind of institution it was. And in late 2013, with cash running low and no revenue in sight, that question had become urgent.

Suleyman's Scramble

This is where Mustafa Suleyman's skills became, temporarily, the most important thing about DeepMind.

Where Hassabis was a scientist and Legg was a theorist, Suleyman was an operator — someone who had spent his career in rooms where the outcome was not determined by the best argument but by who held their nerve longest. He had run a mental health helpline at nineteen. He had negotiated with the UN. He knew how to project confidence into a vacuum.

In the immediate aftermath of Nosek's call, with the Series C in ruins, Suleyman turned to Solina Chau. She was the founder of Horizons Ventures, the vehicle through which Hong Kong billionaire Li Ka-shing deployed his private capital into technology. She and Hassabis had met in 2012 and bonded quickly — she was, unlike many technology investors, genuinely interested in the underlying science rather than the product roadmap. DeepMind had initially offered her a $2.5 million allocation in the round; she had wanted more.

Now they offered her more. Chau invested $13.6 million. Founders Fund, despite pulling out of the lead, contributed $9.2 million to preserve its relationship and not be entirely absent. The round closed at just over $25 million — less than half of the $65 million originally targeted.

It was enough to survive. It was not enough to be comfortable.

At some point in this period, Suleyman made a remark that Mallaby quotes with evident appreciation for its audacity. Faced with questions about whether DeepMind's backers would really fight for its independence, Suleyman said something to the effect of: "We've got Peter Thiel, Solina Chau, Elon Musk — all billionaires, all backing us." It was, by his own later admission, a bluff. Those investors were backing the company financially. Whether they were prepared to underwrite a decade-long campaign for AGI independence against the countervailing pull of Google's chequebook was a different question entirely, and the answer was clearly no.

The bluff worked, in the short term, because the audience did not call it. But it revealed the underlying reality: DeepMind had supporters, not guarantors. When the moment of reckoning came, the company would have to make its own decisions.

What the Crisis Revealed

Mallaby uses this chapter to make a broader argument about the economics of transformative research. The Atari breakthrough had been genuine — a scientific result that changed what people thought AI could do. But the venture-capital model rewarded that breakthrough by raising questions the founders could not yet answer: when does this become a product, and what does it cost? The better the science, the harder those questions became to dodge.

DeepMind had not been deceptive with its investors. Hassabis had always been explicit about the goal and the timeline. The problem was that clarity about a thirty-year scientific mission does not help a fund that needs an exit in ten years. The interests had always been misaligned; it had just taken the Series C to make the misalignment concrete.

The $25 million round bought runway, but not much. And from the far end of that runway, two very large buildings were visible on the horizon — one branded Google, one branded Facebook. Hassabis had, at most, a few months to decide which door to walk through, or whether to find a third option that did not yet exist.

The next chapter covers what happened at that door.


Chapter 7: Get Google

In the autumn of 2013, Elon Musk threw a birthday party at a rented castle in Napa Valley. It was the kind of occasion where invitations were themselves a signal — a gathering of people who believed technology was about to change civilisation, and who were jockeying over who would steer it. Demis Hassabis was there. So was Larry Page.

At some point in the evening, Page and Hassabis walked the castle grounds together, and Page made his pitch. It was not a sales pitch, exactly. It was closer to a logical argument. Hassabis's goal was artificial general intelligence. Building the computational infrastructure to pursue that goal — the servers, the power, the engineering talent — would take the best part of a career, and even then there was no guarantee. Google had already built that infrastructure. "Why don't you take advantage of what I've already created?" Page asked. If DeepMind's mission was to build AGI, why was building an independent company around that mission anything other than an unnecessary detour?

It was a remarkably effective pitch precisely because it was honest. Page was not offering money as a reward for past performance. He was offering a path to the thing Hassabis actually wanted.

Musk's Counter-Move

Elon Musk, who had been at the same party, had also been having a different kind of conversation with Page — an argument, by most accounts, that had turned personal. Page believed that machine intelligence was a natural evolutionary successor to humanity and saw no meaningful distinction between human and artificial consciousness. Musk thought this was dangerous and wrong. He was, he said, "pro-human."

After Page's pitch to Hassabis, Musk tried to intervene. He approached Hassabis directly and told him his view: "The future of AI should not be controlled by Larry." He then worked quietly with Luke Nosek to assemble alternative financing — a bid to acquire DeepMind independently, outside both Google and Facebook. The effort never produced a term sheet that reached DeepMind's board.

Musk's inability to stop the acquisition mattered beyond the transaction itself. It crystallised, for him, the urgency of creating a rival. OpenAI was co-founded in December 2015, nearly two years after Google announced the DeepMind acquisition. The birthday party argument had consequences that neither man fully anticipated.

The Dinner in Palo Alto

Simultaneously, Hassabis was running a parallel process with Facebook. Mark Zuckerberg was interested; Facebook's head of corporate development, Amin Zoufonoun, flew in to open talks. An offer took shape: a lower share price than Google's, but substantial founder bonuses to compensate. Suleyman flew to California to negotiate.

Hassabis evaluated Zuckerberg through a dinner at his Palo Alto home. He came with a diagnostic purpose rather than a sales pitch. After steering conversation to artificial intelligence, he widened it deliberately — to virtual reality, augmented reality, 3D printing. He watched how Zuckerberg responded. The response, as Hassabis later described it, was undifferentiated enthusiasm. Zuckerberg was equally excited about all of it. No technology registered as categorically more important than the others.

That was enough. "Facebook offered more money," Hassabis said, "but I wanted somebody who really understood why AI would be bigger than all these other things." Zuckerberg had failed the test — not because he lacked intelligence but because he lacked the specific conviction that Hassabis required in an acquirer. DeepMind was not looking for a buyer who thought AI was one interesting technology among several. It was looking for a buyer who thought AI was the technology, the one that would subsume or obsolete all the others.

Facebook, by this reading, wanted DeepMind as a feature. Google, or at least the Larry Page version of Google, wanted it as a mission.

Suleyman at the Table

Mustafa Suleyman's contribution to this chapter is the negotiation itself. Where Hassabis evaluated the philosophical alignment of acquirers, Suleyman handled the adversarial arithmetic.

His tactic, which he later described in terms that recalled his poker background, was to refuse to open on valuation. Instead of anchoring a price, he focused early conversations on research budgets — how much compute, how many hires, what operational independence would look like. By the time Google's lead negotiator Don Harrison introduced a "price per researcher" framework — valuing DeepMind's thirty to forty core staff at approximately $10 million each — Suleyman had already established a different framing of what was being bought. He and Hassabis pushed back, arguing the implied valuation was nearly half of what the company was worth. Facebook's competing interest, real or inflated in the telling, was their leverage.

The final number was $650 million. Zuckerberg later acknowledged, with evident good humour, that Hassabis had "used him to get a better deal from Google." The compliment was backhanded but accurate.

Safety as a Non-Negotiable

The conditions DeepMind extracted were, for January 2014, without precedent in a technology acquisition of this scale.

Hassabis and Suleyman demanded three things as non-negotiables. First: an independent ethics and safety review board — composed of scientists, philosophers, and domain experts — with authority over how DeepMind's technology could be used across all of Google. Second: a ban on military applications. Third: operational autonomy, with DeepMind remaining headquartered in London and controlling its own research agenda.

Google agreed to all three. The deal was announced on 26 January 2014.

Mallaby treats this moment with appropriate weight and appropriate scepticism. It was genuinely remarkable that an AI lab had made safety a centrepiece of an acquisition rather than an afterthought. No one in the industry had done this before. The ethics board demand in particular signalled that Hassabis and Suleyman understood, at least abstractly, that the technology they were building required oversight that no single corporate entity should control unilaterally.

What the Conditions Actually Produced

The ethics board met once. Its membership was never publicly disclosed. It was quietly superseded by Google's broader AI Principles policy, which allowed for applications with "potential negative impacts" as long as the benefits were judged to outweigh the risks — a standard flexible enough to accommodate almost anything.

The military ban, which had seemed absolute, gradually eroded. By 2024, DeepMind researchers were circulating an open letter protesting the company's involvement in military contracts, invoking the original conditions of the 2014 deal as a promise that had been broken.

Hassabis, reflecting on all this years later, offered an assessment that was either clear-eyed or self-exculpatory, depending on your view: "Safety isn't about governance structures. Even if you have a governance board, it probably wouldn't do the right thing when it came to the crunch."

This is, on one reading, wisdom — a hard-won recognition that structural solutions to power problems tend to be co-opted by the very power they were meant to check. On another reading, it is the rationalisation of a man who traded governance guarantees for resources and found, predictably, that the guarantees did not hold.

Mallaby does not adjudicate between these readings. He presents both, and lets the reader decide. What is clear is that the January 2014 acquisition gave Hassabis what he had actually come for: the computers. The ethics board was, at best, a statement of intent. At worst, it was a fig leaf that allowed a brilliant scientist to tell himself he had done what he could. Either way, DeepMind was now inside Google, with the computational resources of one of the world's largest technology companies behind it, and a mission that had just become several orders of magnitude easier to pursue.


Chapter 8: Intuition

There is a moment in the history of artificial intelligence that did more to change public understanding of what machines could do than anything that had come before — more than Deep Blue beating Kasparov, more than ImageNet, more than the Atari paper. It happened on the afternoon of 10 March 2016, in a game hall in Seoul, South Korea, when a computer program placed a black stone at the fifth line from the top, in an area of the board that no professional player would have touched.

The commentators fell silent. Lee Sedol, one of the greatest Go players in history, stared at the board for twelve minutes. Fan Hui — the European champion DeepMind had secretly beaten five months earlier and recruited as an advisor — watched from the sidelines. "It's not a human move," he said. "I've never seen a human play this move. So beautiful."

Move 37 had arrived. And with it, a question that Mallaby's chapter title names directly: does an artificial intelligence have intuition?

Why Go Was the Right Problem

By 2014, chess was closed terrain for AI ambition. Deep Blue had beaten Kasparov in 1997. The lesson drawn — that tree-search with good heuristics could solve board games — was, for the broader field, a cautionary tale more than a triumph. Chess had been solved by brute force made elegant; that was not the same as intelligence.

Go was different by several orders of magnitude. A standard 19×19 board admits approximately 2.1 × 10^170 legal positions — a number that exceeds the roughly 10^80 atoms in the observable universe by a factor of around 10^90. Chess, vast as it seems to the human player, has roughly 10^47 legal positions. Go's search space is not just larger; it is categorically beyond any enumeration that real-world computing power could ever attempt. The branching factor — the number of legal moves available at each turn — averages around 250 in Go versus around 35 in chess. Any algorithm that worked by looking ahead a fixed number of moves would collapse.
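
To make the branching-factor gap concrete: an exhaustive look-ahead of just ten moves, at those average branching factors, means examining on the order of

$$
35^{10} \approx 2.8 \times 10^{15} \ \text{positions in chess}
\qquad \text{versus} \qquad
250^{10} \approx 9.5 \times 10^{23} \ \text{in Go.}
$$

(The arithmetic is mine, using the averages quoted above.) Ten moves is nothing in a game that routinely runs past two hundred, and the gap multiplies by another factor of roughly seven with every additional move of depth.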

For twenty years, Go programs had plateaued at high-amateur level. The game's resistance to AI was not incidental. It was a structural property. Evaluating a Go position requires something that looks, from the outside, like aesthetic judgment — an intuition about which formations are strong, which are fragile, which configurations will mature into advantage across dozens of moves. Human players develop this over decades of study. It cannot be calculated; it can only be learned. If an AI could play Go at the level of the world's best humans, it would have to have genuinely learned something, not just searched more efficiently.

This was exactly the kind of proof Hassabis needed. Not that a machine could be faster, but that it could be wiser.

The Architecture of Learned Intuition

AlphaGo's design reflected lessons drawn directly from the neuroscience research in Hassabis's PhD. The system used two neural networks in concert. The policy network — trained first on thirty million moves from high-level human games — learned to narrow the field of candidate moves: instead of treating all 250 possible moves equally, it identified the small subset worth thinking about. The value network learned to assess board positions: given a configuration, how likely is each player to win?

Neither network was sufficient alone. The policy network narrowed the search; the value network judged the positions that search reached. Between them, a Monte Carlo tree search explored the remaining territory — simulating possible futures, weighting them by the value network's assessments, and propagating the results back to inform the current decision.
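
The interplay of the three components can be compressed into a toy sketch. The Python below is illustrative only — `legal_moves`, `play`, `policy_net`, and `value_net` are invented stand-ins, the selection rule is a bare-bones version of the PUCT-style formula described in DeepMind's papers, and the sign-flip a real two-player search needs between alternating moves is omitted for brevity:

```python
import math, random

# ---- Stand-ins for the real game and the two trained networks --------------
def legal_moves(state):            # hypothetical: available moves in this position
    return list(range(5))

def play(state, move):             # hypothetical: apply a move, return the new state
    return state + (move,)

def policy_net(state):             # hypothetical: prior probability for each move
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(state):              # hypothetical: estimated result in [-1, 1]
    return random.uniform(-1, 1)

# ---- A bare-bones search guided by those networks ---------------------------
class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum, self.children = prior, 0, 0.0, {}

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # Exploration is steered by the policy network's prior; exploitation by the
    # running average of value-network evaluations seen below each child.
    best_move, best_score = None, -float("inf")
    for move, child in node.children.items():
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        if child.q() + u > best_score:
            best_move, best_score = move, child.q() + u
    return best_move

def search(root_state, simulations=200):
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, state, path = root, root_state, []
        while node.children:                            # 1. selection
            move = select_child(node)
            path.append(node)
            node, state = node.children[move], play(state, move)
        for move, prior in policy_net(state).items():   # 2. expansion
            node.children[move] = Node(prior)
        value = value_net(state)                        # 3. evaluation (no rollout)
        for n in path + [node]:                         # 4. backup along the path
            n.visits += 1
            n.value_sum += value
    # The most-visited child of the root is the move actually played.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print("chosen move:", search(root_state=()))
```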

Then came the crucial step: self-play. AlphaGo played itself, thousands of times, learning from each game. The original human-derived training data established the starting point. Self-play was how the system exceeded it. As it played, it encountered positions no human had ever created, learned responses no human had ever demonstrated, and built a strategic vocabulary drawn from a space of games that had never existed.
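
The self-play loop that sits on top of the search is, at this level of description, almost anticlimactic. A drastically compressed, standalone sketch — the `choose_move` callable stands in for the tree search above, the game and its outcome are fake, and the gradient updates to the two networks are omitted entirely:

```python
import random

def self_play_game(choose_move, max_moves=50):
    """Play one game of the system against itself, recording every decision."""
    state, history = (), []
    for _ in range(max_moves):
        move = choose_move(state)           # in the real system: the guided search
        history.append((state, move))
        state = state + (move,)
    outcome = random.choice([+1, -1])       # stand-in for the real game result
    # Each recorded position is labelled with the eventual outcome: the value
    # network learns to predict it, and the policy network learns to prefer
    # the moves the search actually chose.
    return [(s, m, outcome) for (s, m) in history]

training_data = []
for _ in range(10):
    training_data += self_play_game(choose_move=lambda s: random.randrange(5))
print(len(training_data), "training examples generated by self-play")
```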

This was Hassabis's hippocampus insight made operational. The policy network was memory — learned patterns from past games. Self-play was imagination — the projection of those patterns into novel configurations, the construction of possible futures that had never been seen. Intelligence, biological or artificial, was the combination of both.

Seoul

On 9 March 2016, AlphaGo and Lee Sedol sat down for the first of five games, broadcast live to more than 200 million viewers — a number that exceeded the Super Bowl audience and dwarfed anything the AI field had ever attracted. Lee had predicted he would win 5-0 or, if things went poorly, 4-1. "I don't think it will be a very close match," he said. He had watched video of AlphaGo's games against Fan Hui and concluded there were exploitable weaknesses.

He was not wrong that there had been weaknesses. He was wrong that they were still there. Between October 2015 and March 2016, AlphaGo had played more games than any human player manages in a lifetime.

AlphaGo won Game 1 by resignation. Game 2 began similarly. Then, on the 37th move, something happened that no one in the room — no commentator, no professional player, no member of the DeepMind team — had predicted.

Move 37

AlphaGo placed a stone at the 5th row of the board, in a broad, open area — a position that Go tradition classifies as a mistake. Professional strategy in Go is deeply codified: certain formations are correct, certain approaches are sound, certain early moves have been validated across millennia of play. A stone played on the 5th row in open space contradicts the accumulated wisdom of the game's entire history.

The probability that a human professional would play this move, calculated from training data, was roughly 1 in 10,000.

Lee Sedol left the table. He returned twelve minutes later, still processing. Commentator Michael Redmond, a 9-dan professional himself, stared at the position and said he didn't understand what AlphaGo was thinking. Then, over the next hundred moves, the logic became inescapable. The stone was not a mistake. It was the first move in a strategic sequence that no human player had conceived, that violated the intuitions shaped by centuries of expert practice, and that won the game.

Sergey Brin, who by this point had flown to Seoul along with Eric Schmidt and Jeff Dean, watched the game and said afterwards: "AlphaGo actually does have an intuition. It makes beautiful moves."

Mallaby's chapter title turns on this. Brin was not speaking precisely — AlphaGo has no subjective experience, no feeling of certainty or aesthetic pleasure. But from the outside, the output was indistinguishable from intuition. A judgment arrived at that was not the product of calculation any human could follow, that violated received wisdom, that turned out to be correct. The word Brin reached for was the most honest one available.

The Divine Move and the Human Cost

Game 4 produced its own historic moment, operating in the opposite direction. Lee Sedol, having lost three straight and facing elimination, played the 78th move of the fourth game — later called the "divine move," a counterattack so unexpected that AlphaGo's response collapsed into incoherence. The program began making moves that its own evaluation functions would have rejected, what observers described as hallucinations — a system designed to optimise, suddenly unable to find the thread. Lee won by resignation.

He described the feeling of that single victory as giving him "unparalleled warmth." The framing is telling. A 9-dan professional, the best human player of his generation, felt warmth — not triumph, not pride, but something closer to relief — from winning one game out of five against a machine.

AlphaGo won Game 5. The final score was 4-1.

At the press conference, Lee said: "I don't know what to say, but I think I have to express my apologies first. I want to apologize for being so powerless. I've never felt this much pressure, this much weight." He was at pains to clarify that Lee Sedol had lost, not humanity. But the distinction felt fragile. In 2019, Lee retired from professional Go. He cited, among his reasons, the rise of AI programs that had become unbeatable. He could no longer find joy in the game.

Hassabis, for his part, could not fully celebrate. He knew too well the feeling of losing after a fierce competition, he said. He was also thinking about what the result meant, and what it demanded next.

What AlphaGo Zero Proved

After the Lee Sedol match, DeepMind built AlphaGo Zero — a version trained on no human data at all. It began from random play and learned entirely through self-play. Within three days it surpassed the version that had beaten Lee Sedol. The final record: AlphaGo Zero defeated AlphaGo Lee 100-0.

The implication was unsettling in a way the original victory had not been. AlphaGo had beaten the best human by learning from humans and then transcending them. AlphaGo Zero beat AlphaGo by learning from nothing human at all. Human knowledge of Go — thirty million recorded positions, a five-thousand-year tradition — turned out to be a ceiling, not a floor. The machine that started from scratch performed better than the machine that had studied everything humanity knew.

The same principle that Hassabis had intuited in his neuroscience lab now had a data point attached to it. Intelligence constrained by what humans had already discovered was still, at its core, derivative. Intelligence allowed to explore freely would exceed it. The point of building AGI was not to replicate human capability. It was to discover what lay beyond it.


Chapter 9: Out of Eden

When DeepMind agreed to be acquired by Google in January 2014, Hassabis and Mustafa Suleyman extracted a set of conditions unusual in the history of Silicon Valley acquisitions: operational autonomy, a ban on military applications, and — the centerpiece — an independent ethics board that would oversee not just DeepMind's AI work, but AI development across all of Google. It was a remarkable demand to make of the world's most powerful technology company, and Google agreed to it. The ethics board would be, they believed, a structural guarantee that the technology they were building would not be misused.

Eighteen months later, that board held its first real meeting. It was a disaster.

The "Speciesist" at the Birthday Party

To understand what happened, you need to understand Larry Page. Google's co-founder had spent years thinking about the long-term trajectory of intelligence — not as a software engineer optimizing systems, but as something closer to a cosmologist. He had reached conclusions that most people found either thrilling or horrifying.

Page believed that digital superintelligence replacing biological human intelligence would simply represent the next step in cosmic evolution: survival of the fittest, playing out at the scale of information rather than genetics. He had, according to multiple accounts in Mallaby's book, "contemplated uploading human consciousness to computers and believed in technology's inherent superiority over biological life." He was not, in other words, particularly concerned about the risk that machines might one day surpass humans. He thought that was the point.

This worldview collided head-on with Elon Musk's at Musk's 44th birthday celebration — a three-day event at a Napa Valley resort arranged by his then-wife Talulah Riley. The two men had been close friends for years. After dinner, with other guests looking on, they got into an argument about AI.

Page described his vision: a future where humans merged with machines, where various forms of intelligence competed, and where the best won. Musk raised concerns about human safety, about the value of human consciousness, about the speed and recklessness of the rush toward more powerful systems. Page dismissed these concerns. He accused Musk of being a speciesist — a word imported from the animal-rights movement — treating silicon-based life forms as inferior simply because they weren't carbon-based.

Musk's reported response: "Well, yes, I am pro-human, I fucking like humanity, dude."

The two men stopped speaking not long after. Mallaby describes Page as viewing these concerns as "sentimental nonsense." From Page's perspective, machine supremacy was not a threat to resist — it was natural progress to welcome. That someone building rockets and electric cars would turn up at his ethics board and argue for restraint struck Page as incoherent.

The Meeting at SpaceX

The first significant convening of the AI safety framework DeepMind had extracted as a condition of its acquisition took place in August 2015. Musk hosted it at SpaceX headquarters. The guest list was extraordinary: Hassabis and Suleyman, Page and Eric Schmidt, Reid Hoffman, and other senior figures from the technology industry.

Hassabis came with a coherent theory of why they needed such a meeting. He called it, loosely, the "singleton" scenario: rather than a chaotic race between competing labs and nations, AGI should be developed by a single, cooperative global effort — something like a Manhattan Project run under collective governance, with safety as the organizing constraint. "AGI is infinitely bigger than a company or a person," he said. "It's humanity-sized really." The implication was that it required humanity-sized coordination, not competitive fragmentation.

The meeting lasted hours. It ended without a single agreement, a shared framework, or a path forward.

What overwhelmed the discussion was not a deficit of intelligence in the room, but an abundance of incompatible convictions. Page and Musk had by this point already gone from friends to adversaries. The "speciesist" confrontation had poisoned any possibility of intellectual alignment. Page's view that machine supremacy was natural and desirable was simply irreconcilable with Musk's view that it was an existential catastrophe to be resisted. Hassabis's singleton vision required a baseline agreement that the stakes were enormous and that coordination was therefore necessary. Page did not share that baseline.

Musk later called the safety council "basically bullshit." Suleyman, reflecting on it years later, acknowledged: "We made a lot of mistakes in the way that we attempted to set up the board, and I'm not sure that we can say it was definitively successful."

Hassabis eventually concluded something darker about the whole endeavor: "Safety isn't about governance structures... discussing these things didn't really help."

The Counter-Offensive

What Musk took away from the SpaceX meeting was not a plan for cooperation. It was intelligence. He had now seen, from close range, exactly what DeepMind was building and how far along it was. And he had confirmed that the one institution best positioned to develop AGI — the one with the talent, the resources, and the organizational commitment — was controlled by Larry Page, a man who thought machine supremacy was basically fine.

This was not a situation Musk could tolerate.

He had already tried the direct approach. When Google had approached DeepMind for acquisition in 2013, Musk had phoned Hassabis directly, told him "the future of AI should not be controlled by Larry," and reportedly attempted to assemble financing to buy DeepMind himself — including, per one account, a frantic hour-long Skype call from a closet at a Los Angeles party. Google closed the deal anyway.

After the SpaceX meeting, Musk turned to Sam Altman.

On May 25, 2015, Altman sent Musk an email that would become, years later, a piece of legal evidence: "I've been thinking a lot about whether it's possible to stop humanity from developing AI. I think the answer is almost definitely not. If it's going to happen, it seems like it would be good for someone other than Google to do it first."

Altman proposed a new kind of institution — a nonprofit AI lab modeled structurally on the Manhattan Project, where the technology would "belong to the world" but the researchers would receive startup-like compensation if it worked. The purpose, explicitly, was to create a counterweight to Google DeepMind's near-monopoly on elite AI talent and capability.

Over the following months, Musk, Altman, and Reid Hoffman worked through the details, eventually recruiting Ilya Sutskever — one of the most respected deep-learning researchers in the world, then at Google Brain — as a co-founder. OpenAI was publicly announced in December 2015, co-chaired by Altman and Musk, with an initial pledge of $1 billion.

Musk later wrote: "OpenAI was created as an open source (which is why I named it 'Open' AI), non-profit company to serve as a counterweight to Google."

What the Founding Destroyed

When Hassabis learned about OpenAI, he felt something close to betrayal. Musk had attended the safety meeting in what seemed like good faith — and then used the intelligence gathered there to launch a competing lab whose founding premise was that DeepMind was the threat to be countered.

Mallaby notes the deeper irony: Musk had founded OpenAI ostensibly out of AI safety concerns, but by doing so, he had ended any remaining possibility of the cooperative global approach Hassabis had argued for. The singleton scenario — one cautious, well-resourced lab developing AGI in coordination with humanity — required exactly the kind of collaborative trust that the OpenAI founding destroyed. Once you had two well-funded labs explicitly positioned as rivals, the incentive structure changed. Speed became paramount. The first mover would set the terms. Racing, not caution, became the dominant logic.

There is a further twist that Mallaby makes much of: once Musk launched OpenAI as an explicitly anti-Google, anti-Hassabis venture, he forfeited his ability to monitor DeepMind's progress from the inside. The informal intelligence network he had cultivated — the board memberships, the friendly dinners, the safety meetings — evaporated. He was now a competitor, and competitors don't share what they know.

By December 2015, the brief window in which the major actors in AGI development were still speaking to each other, still attending the same meetings, still imagining some kind of shared governance, had closed. The world that Hassabis had envisioned — where building AGI was a collective human project managed with collective human caution — was over before it had really begun.

Mallaby calls this chapter "Out of Eden." The title is apt. The fall is not dramatic. There is no single decision or betrayal that tips everything over. It is the accumulation of incompatible worldviews, competitive incentives, and the structural pressure that every arms race creates: the fear that the other side is moving faster, that your restraint is their advantage, that caution is surrender.

In 2016, Musk wrote privately that DeepMind was causing him "extreme mental stress." He feared that if Hassabis's lab achieved AGI first, it would produce what he called "one mind to rule the world" — an AGI dictatorship under a single institution's control. His solution had been to add another mind to the race. Whether this made the outcome safer or simply faster is a question Mallaby leaves, pointedly, unanswered.


Chapter 10: P0 Plus Plus

Mustafa Suleyman's mother was an NHS nurse. He grew up watching her leave for shifts at the hospital the way other parents left for offices — the uniform, the hours, the weight of it. When he eventually found himself inside DeepMind, one of the most technologically powerful organizations in the world, and asked himself what that power should be for, the answer arrived quickly: something like what his mother did, but at scale.

This is not a sentiment Suleyman would have framed so simply. He was not a sentimental person by reputation — he was an operator, the one who got things done while Hassabis thought and Legg theorized. But the biographical resonance is hard to miss, and Mallaby does not miss it. The man who would launch DeepMind's most ambitious social application, who would pursue it with a priority designation that literally exceeded the highest category in Google's engineering vocabulary — P0 Plus Plus, meaning more urgent than a showstopper, beyond even the maximum — was, at some level, trying to do something for the institution that had employed his mother.

The Problem Worth Solving

Suleyman needed a problem commensurate with the tools. He found it in acute kidney injury.

AKI — a sudden, severe decline in kidney function — is responsible for up to 100,000 deaths per year in UK hospitals. About 30 percent of those deaths are considered preventable with timely intervention. The detection problem is peculiar: blood test results that indicate kidney deterioration come back hours after the blood is drawn, scattered across systems that no single clinician monitors continuously. A patient can slip from warning signs into crisis while the relevant data sits in a results queue, waiting for someone to look.

The technical solution was not complicated. If you monitored every incoming blood test result in real time and fired an alert when the numbers crossed a threshold, you could catch what the system was missing. The challenge was institutional: NHS hospitals were, as Suleyman put it publicly, "badly let down by technology" — still reliant on pagers, fax machines, and paper records. The gap between what was technically feasible and what was clinically deployed was not a gap of capability. It was a gap of incentive, inertia, and IT infrastructure.
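
The shape of that solution fits in a few lines. The sketch below is a deliberately simplified illustration of a ratio-based alert, not the NHS AKI algorithm and not Streams' implementation; the record format, baseline window, and threshold are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Illustrative only: a toy ratio-based kidney-injury check. The real NHS AKI
# algorithm and Streams' implementation are more involved; the threshold,
# baseline window, and record format here are assumptions.
def aki_alert(history, new_result, baseline_days=7, ratio_threshold=1.5):
    """Flag a new creatinine result if it is >= ratio_threshold times the
    patient's lowest value in the recent baseline window."""
    cutoff = new_result["time"] - timedelta(days=baseline_days)
    recent = [r["creatinine"] for r in history if r["time"] >= cutoff]
    if not recent:
        return None
    ratio = new_result["creatinine"] / min(recent)
    if ratio >= ratio_threshold:
        return {"patient": new_result["patient"], "ratio": round(ratio, 2)}
    return None

history = [{"patient": "A123", "time": datetime(2016, 1, 1), "creatinine": 80.0}]
incoming = {"patient": "A123", "time": datetime(2016, 1, 3), "creatinine": 140.0}
print(aki_alert(history, incoming))   # -> {'patient': 'A123', 'ratio': 1.75}
```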

Enter Dr. Dominic King. A general surgeon by training, King had spent years at Imperial College's HELIX Centre — the first design center embedded in a European hospital — where he had built HARK, a clinical task management app designed to replace pagers. It worked. It didn't matter. The NHS's institutional inertia made it nearly impossible to deploy. King cold-emailed Suleyman in late 2015. Suleyman was struck by King's clinician-centered design philosophy, the idea that the technology had to serve the people standing at the bedside, not the administrators reviewing dashboards. DeepMind acquired HARK in early 2016 and incorporated it into what became Streams. King became Clinical Lead at DeepMind Health. "It was a big step leaving medicine," he said, "but I really felt that this was a unique opportunity to put advanced technology at the service of patients, nurses and doctors."

What Streams Did

Streams was a smartphone app. On a hospital ward, it appeared simple — an alert arriving on a nurse's phone, a patient's name, a blood test value, a recommended action. Behind that alert was continuous monitoring of the hospital's entire electronic record system in real time, cross-referenced against the national NHS AKI algorithm, firing notifications the moment a patient's results crossed a risk threshold. The alert included the patient's relevant test history and clinical context: everything needed to act, delivered in under a minute from the moment results landed in the system.

The numbers from the Royal Free deployment were striking. AKI recognition for emergency cases rose from 87.6 percent to 96.7 percent. The average time from blood test availability to specialist review fell to 11.5 minutes — previously it could take several hours. Missed AKI cases dropped from around 12 percent to 3 percent. The cost of care per AKI patient fell from £11,772 to £9,761 — a saving of more than £2,000 per patient. The results were published in peer-reviewed journals, studied by independent researchers, and confirmed: the technology was doing what it claimed to do.

Streams was, in the most straightforward sense, saving lives. The question was what it had cost to build it.

The Agreement Nobody Read

On September 29, 2015, Google UK Limited and Royal Free NHS Foundation Trust signed an eight-page Information Sharing Agreement. Data transfer began on November 18 — before any public announcement that the project existed. Live testing of Streams began in December.

What the agreement actually covered was considerably broader than "an AKI alert app." Royal Free gave DeepMind access to 1.6 million patient records — every patient who had used the trust's three hospitals over the preceding five years. The records included blood test results, HIV status, details of drug overdoses and abortions, records of A&E visits, and notes from routine hospital appointments that had nothing whatsoever to do with kidney function. Only roughly one in six of those 1.6 million records had any plausible connection to AKI.

The contractual language permitted DeepMind not just to run the AKI alert but to build "real time clinical analytics, detection, diagnosis and decision support to support treatment and avert clinical deterioration across a range of diagnoses and organ systems" — a much wider mandate. The data was to be used for something called "Patient Rescue," described as "a proof of concept technology platform that enables analytics as a service for NHS Hospital Trusts." The contract also permitted machine learning applications, despite Suleyman's public assurances that "there's no AI or machine learning" in Streams.

Both parties claimed legal cover under the "direct care" exception — the rule that patient data can be used without explicit consent when the purpose is the direct care of that specific patient. The argument required contorting the concept until it broke. The vast majority of those 1.6 million people had not been tested for AKI. Many had been discharged. Some had died. There had been no privacy impact assessment before the data transfer began. A self-assessment was completed in December 2015, after the data was already on Google-controlled servers.

The Reckoning

On April 29, 2016 — more than seven months after data transfer had begun — New Scientist published an investigation revealing what had actually happened. The public had no idea. There had been no notification to patients, no consent mechanism, no press release disclosing the volume of records involved. When the scale of what had been shared became clear — 1.6 million records, including HIV diagnoses and overdose histories — the reaction was swift and furious.

The Information Commissioner's Office investigated and ruled in July 2017 that Royal Free NHS Foundation Trust had failed to comply with the Data Protection Act 1998. The ICO found that patients "were not adequately informed that the processing was taking place," that the volume of data was "excessive, unnecessary and out of proportion," and that the "direct care" legal basis was not satisfied. The hospital was required to sign an undertaking committing to robust privacy impact assessments for any future projects. No fine was imposed — a leniency widely criticized.

The most withering assessment came from academic researchers rather than regulators. Dr. Julia Powles and Hal Hodson, in a peer-reviewed paper published in the journal Health and Technology, called the deal a "cautionary tale for healthcare in the algorithmic age." Their core observation was merciless: "The hospital sent doctors to meetings while DeepMind sent lawyers and trained negotiators." Both sides had failed to engage in "any conversation with patients and citizens," which they called inexcusable. And then the line that captured the structural problem with precision: "Once our data makes its way onto Google-controlled servers, our ability to track it is at an end."

DeepMind's official response was, credit where it's due, genuinely candid. "In our determination to achieve quick impact when this work started in 2015, we underestimated the complexity of the NHS and of the rules around patient data," the company wrote. "We were almost exclusively focused on building tools that nurses and doctors wanted, and thought of our work as technology for clinicians rather than something that needed to be accountable to and shaped by patients, the public and the NHS as a whole. We got that wrong."

The Cost of Getting It Wrong

The scandal did more than damage DeepMind's reputation. It crystallized a contradiction at the heart of the applied AI project that Suleyman had built his career around.

The technology genuinely worked. The lives saved were real. The £2,000 per patient reduction in care costs was documented in a peer-reviewed journal. None of that was in dispute. But the means by which DeepMind had acquired the data to build and train the system violated the reasonable expectations of every one of those 1.6 million patients — people who had presented at a hospital for care, submitted their most sensitive information in a moment of vulnerability, and had it transferred to a technology company's servers without their knowledge.

Suleyman had spent his career thinking about power asymmetries — how institutions systematically failed the people they served, how technology could be used to shift those asymmetries toward ordinary people rather than away from them. The NHS data scandal demonstrated that even genuine commitment to social good does not automatically produce the governance structures that social good requires. Moving fast to save lives looks, from one angle, like urgency. From another, it looks like taking without asking.

In late 2018, Google announced that DeepMind Health would be folded into a new Google division. The DeepMind Health brand was dissolved. The project Suleyman had built — the one he had classified internally as beyond the maximum priority, as P0 Plus Plus — was absorbed by the corporate parent whose acquisition he had helped engineer. He was removed from its day-to-day leadership.

In August 2019, Suleyman was placed on administrative leave following complaints from DeepMind staff about his management style. He later said: "I accepted feedback that, as a co-founder at DeepMind, I drove people too hard and at times my management style was not constructive. I apologize unequivocally to those who were affected." He announced his departure from DeepMind in December 2019.

The man who had co-founded the organization that would eventually win a Nobel Prize left not in triumph but in a dispute about how he had treated the people working for him. The social good he had pursued had, in the end, been pursued in a way that replicated the very institutional failures he had set out to correct: moving fast, assuming good intentions were sufficient, and not asking the people most affected what they actually wanted.


Chapter 11: The Agent and the Transformer

In 2021, David Silver — the lead architect of AlphaGo — co-authored a paper in the journal Artificial Intelligence with the title "Reward is Enough." The argument was precise and sweeping: the objective of maximizing reward is sufficient, on its own, to drive behavior that exhibits "most if not all attributes of intelligence," including perception, language, social intelligence, and generalization. Everything cognition does, the paper claimed, could be understood as optimization toward reward in a rich environment. Evolution had taken millions of years to find this solution. Reinforcement learning could get there faster.

The paper was DeepMind's philosophical flag planted in the ground. It was also, with the benefit of hindsight, a monument to the conviction that would cost DeepMind years.

The Case for Reward

Hassabis's approach to AGI had always been rooted in his neuroscience training. The hippocampus, which he had studied at UCL, doesn't store knowledge as a lookup table — it builds compressed, generalizable models of the world through experience. The brain learns by acting and being wrong. Reward signals — the release of dopamine after success, its absence after failure — shape neural connections over time into something we call understanding. This is the biological story. RL is its mathematical abstraction: an agent in an environment, taking actions, receiving rewards, adjusting its policy.
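
The abstraction is compact enough to write down. Below is a minimal tabular Q-learning loop on a toy environment, a generic illustration of the agent-environment-reward cycle rather than anything DeepMind shipped; the five-state corridor and the hyperparameters are invented for the example.

```python
import random

# A toy "environment": states 0..4 on a line, reward only for reaching state 4.
def step(state, action):                  # action: -1 (left) or +1 (right)
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}   # action-value table
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore occasionally, otherwise act greedily on current estimates.
        action = (random.choice((-1, 1)) if random.random() < epsilon
                  else max((-1, 1), key=lambda a: q[(state, a)]))
        next_state, reward, done = step(state, action)
        # The reward signal shapes the policy: nudge the estimate toward the
        # observed reward plus the discounted value of the best next action.
        best_next = max(q[(next_state, a)] for a in (-1, 1))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

print({s: max((-1, 1), key=lambda a: q[(s, a)]) for s in range(5)})  # learned policy
```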

This was not just a technical preference. It was a theory of mind. And it was reinforced by DeepMind's greatest victories. DQN mastered Atari through reward. AlphaGo mastered Go through reward and self-play. AlphaGo Zero, starting from nothing, surpassed everything humanity had learned about Go in five thousand years, through reward and self-play alone. The pattern was consistent enough to feel like proof.

The strategic implication was that DeepMind should be building agents — systems placed in environments, pursuing objectives, developing general capabilities through the pressure of performance. Not systems trained to predict the next word in a text corpus. That was pattern matching, not intelligence.

The Generalist Problem

The research question that occupied DeepMind's applied RL teams through the mid-to-late 2010s was generalization. The DQN result had been impressive, but it trained a separate network for each Atari game from scratch. It couldn't transfer what it had learned about Breakout to Space Invaders. Each deployment was a blank slate. That wasn't how brains worked. The goal was agents that could carry knowledge across domains.

Koray Kavukcuoglu — one of DeepMind's earliest researchers, a former PhD student of Yann LeCun's, the man whose citations now exceed 290,000 — led much of this work. The Asynchronous Advantage Actor-Critic (A3C) system, published in 2016, ran multiple agents in parallel across different environments, sending gradients back to a shared network. For the first time, a single architecture achieved strong performance across all 57 Atari games, while also succeeding at 3D maze navigation and continuous motor control. The same algorithm, the same network structure, different environments.

Then in 2018 came IMPALA — Importance Weighted Actor-Learner Architecture — the most serious attempt yet. A single network, trained on all 30 tasks in DMLab-30: three-dimensional navigation, memory challenges, language-grounded foraging, object interaction, instruction-following. The results showed something compelling. Training on many tasks didn't make the agent worse at individual tasks — it made it better. The generalist was outperforming the specialist. Positive transfer was real.

Meanwhile, Oriol Vinyals and the AlphaStar team were attacking StarCraft II, a problem that dwarfed anything attempted before. Unlike chess or Go, StarCraft had imperfect information, real-time execution with action rates capped at roughly human-professional levels, hundreds of units to control simultaneously, and genuine strategic diversity across three separate races. AlphaStar used a "League" training system — a diverse ecosystem of agents, including specialized "exploiter" agents designed to find weaknesses — and trained on human replays before RL even began. In January 2019, it defeated professional players in live matches. Its neural architecture incorporated transformer-style attention mechanisms to let the agent reason about different units simultaneously.

That last detail was no coincidence. By 2019, the architecture that had been invented across the building — at Google Brain, not DeepMind — was beginning to appear everywhere.

Eight Authors in a Hallway

On June 12, 2017, eight researchers at Google posted a paper to arXiv titled "Attention Is All You Need." The author list was deliberately randomized — the eight rejected the traditional status ordering and listed themselves as equal contributors. The youngest, Aidan Gomez, was a 20-year-old intern from the University of Toronto. The most technically central, Noam Shazeer, had been at Google since 2000 and had co-invented sparsely-gated mixture of experts, a technique that would become critical to large-scale LLMs. The name "Transformer" was chosen by Jakob Uszkoreit because he simply liked the sound.

The problem they were solving was a fundamental bottleneck in sequence modeling. The dominant architecture at the time was the LSTM — a recurrent neural network that processed text token by token, in sequence. To understand word 10, you had to finish processing words 1 through 9 first. This made training inherently sequential, impossible to parallelize across the GPU hardware on which modern AI runs. As Shazeer later summarized the constraint: "Arithmetic is cheap and moving data is expensive on today's hardware."

The transformer eliminated recurrence entirely. In its place: self-attention, a mechanism in which every word in a sentence looks directly at every other word simultaneously, computing a relevance score to decide how much to attend to each. The whole sentence is processed at once, in parallel. Multi-head attention runs this operation multiple times in parallel, letting the model attend to syntax, semantics, and long-range dependencies at the same time. The result: not just better translation, but training that could be parallelized across GPUs and scaled smoothly with available compute.
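
The mechanism itself is short. Here is a minimal NumPy sketch of single-head scaled dot-product self-attention, with random untrained weights, no masking, and no multi-head machinery: just the computation in which every token attends to every other token, in parallel.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (tokens, dim)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project into queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: relevance of each position
    return weights @ v                              # mix values by relevance, in parallel

rng = np.random.default_rng(0)
tokens, dim = 5, 8                                  # a toy 5-token "sentence"
x = rng.normal(size=(tokens, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)       # -> (5, 8)
```

Multi-head attention simply runs several of these projections side by side and concatenates the results.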

Jakob Uszkoreit believed this would work. His own father, Hans Uszkoreit — a prominent computational linguist — was skeptical. The idea of discarding recurrence felt like discarding the machinery of time itself. When Shazeer first heard the proposal, his reaction was characteristically direct: "Heck yeah!"

On the WMT 2014 English-to-German benchmark, the transformer scored 28.4 BLEU — surpassing every previous model. On English-to-French: 41.8 BLEU, trained on 8 GPUs in 3.5 days. NeurIPS reviewers were immediately enthusiastic; one reviewer noted it was "already the talk of the community."

Within five years, the paper would accumulate more than 173,000 citations — among the ten most-cited scientific papers of the 21st century, across all fields. The transformer became the foundation of GPT, BERT, PaLM, Claude, Gemini, and every large language model that followed.

The Architecture Google Gave Away

The irony that Mallaby dwells on is exquisite. Google Brain invented the architecture. Google published it openly. Then all eight authors left Google.

Six of them founded startups. Vaswani and Parmar co-founded Adept AI. Shazeer co-founded Character.AI — Google eventually paid approximately $2.7 billion to bring him back. Aidan Gomez, the 20-year-old intern, co-founded Cohere. Uszkoreit founded Inceptive. Lukasz Kaiser went to OpenAI, helping build the models that would eventually blindside Google. Together, the six founders raised $1.3 billion from outside investors. Two of the resulting companies became unicorns.

The architecture invented inside Google powered the competitive threats to Google. The open publication was the mechanism by which this happened.

But there is a second irony that runs specifically through DeepMind. The transformer was not invented by DeepMind. It was invented by Google Brain. And for years, the two organizations operated as parallel research groups under the same corporate roof, with explicit institutional separation and what insiders describe as "barely concealed mutual contempt." A former DeepMind researcher later said that colleagues "got in trouble for collaborating on a paper with Brain because the thought was like, 'why would you collaborate with Brain?'" The intellectual divide was not just organizational. It was philosophical.

The Deep Disagreement

Hassabis understood the transformer. His position was not ignorance — it was a principled disagreement about what intelligence actually requires.

His argument, stated consistently across interviews through this period, was that transformers were "almost unreasonably effective for what they are" — but that they probably weren't sufficient for AGI. What they lacked was what he called a world model: an internal causal representation of reality that would allow an agent to plan, reason counterfactually, understand physical consequence, and generalize to genuinely novel situations. LLMs, in his view, were extraordinarily powerful pattern completers. They learned statistical regularities in language. But statistical regularity in language is not the same as understanding the world that language describes.

The "Reward is Enough" thesis was the same argument from the other direction: intelligence is what you get when you optimize toward reward in a rich environment. Prediction of the next token — which is what language model training amounts to — is not that. It is something else: sophisticated, useful, even astonishing. But not the path to AGI.

This conviction was coherent. It was defensible. It was consistent with DeepMind's track record. And it cost the lab the years between 2018 and 2022, during which OpenAI quietly built the scaling infrastructure, the dataset pipelines, and the RLHF training techniques that turned transformers from a research result into ChatGPT.

When Mallaby presses Hassabis on this, the admission is partial but real. "We've always had amazing frontier work on self-supervised and deep learning," Hassabis said in one interview, "but maybe the engineering and scaling component — that we could've done harder and earlier." That is, in its careful hedging, an acknowledgment of a strategic miscalculation at institutional scale.

Gato and the Convergence

In May 2022, six months before ChatGPT, DeepMind published "A Generalist Agent" — introducing a model called Gato. The same 1.2 billion parameter transformer, with a single set of weights, performed 604 distinct tasks: playing Atari games, captioning images, engaging in dialogue, stacking blocks with a physical robot arm, navigating 3D environments. The central technical insight was serialization: every modality — images, robot joint angles, text, game controllers — was converted into the same format, a flat sequence of tokens. Then the transformer predicted the next token, exactly as a language model does. The robot arm and the Atari game and the captioning task were, to the network, the same kind of prediction problem.
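
The serialization idea can be illustrated directly. The tokenizers below (text to character codes, images to coarse patch bins, joint angles to discretized buckets) are simplified stand-ins for the scheme the Gato paper describes; the vocabulary ranges and bin counts are invented for the example.

```python
import numpy as np

# Illustrative token ranges (assumptions, not Gato's actual vocabulary layout):
# 0-255 text characters, 256-511 image-patch bins, 512-1535 discretized values.
def tokenize_text(s):
    return [ord(c) for c in s]                        # one token per character

def tokenize_image(img, bins=256):
    patches = img.reshape(4, 4, -1).mean(axis=-1)     # coarse 4x4 patch averages
    return [256 + int(p * (bins - 1)) for p in patches.flatten()]

def tokenize_continuous(values, bins=1024, low=-1.0, high=1.0):
    scaled = (np.clip(values, low, high) - low) / (high - low)
    return [512 + int(v * (bins - 1)) for v in scaled]

text = tokenize_text("lift the red block")
image = tokenize_image(np.random.rand(8, 8))
joints = tokenize_continuous(np.array([0.1, -0.4, 0.9]))

# One flat sequence: to the transformer, all of it is "predict the next token".
episode = text + image + joints
print(len(episode), episode[:8])
```

Once everything is a flat sequence of integers, a single transformer trained on next-token prediction can, in principle, consume all of it.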

Gato was DeepMind finally integrating the transformer fully into its generalist agent work. It was, in a sense, the vindication of both camps simultaneously: the RL generalization hypothesis (one system, many tasks) realized through the transformer architecture (universal sequential prediction).

The performance was competent, not superhuman — Gato scored above 50 percent of the expert-level benchmark on more than 450 of its 604 tasks, impressive in breadth but outclassed by specialists in depth. Critics argued that being mediocre at many things was not the flexible intelligence that mattered. But the architectural demonstration was real: one set of weights could span robot control, image understanding, language, and game-playing simultaneously.

Then ChatGPT launched. And the world discovered that a transformer didn't need to control robot arms or play Atari to produce something that felt, to hundreds of millions of people, like genuine general intelligence.

DeepMind had invented the generalist agent thesis. Google Brain had invented the architecture. OpenAI had combined them — RL from human feedback, applied to a scaled transformer — and shipped it to the public first. The intellectual synthesis happened outside the building where the two halves had spent nearly a decade refusing to collaborate.


Chapter 12: On Language and Nature

In September 2016, a DeepMind team led by Aaron van den Oord published a paper describing a system that could synthesize human speech from raw audio waveforms. WaveNet reduced the gap between state-of-the-art text-to-speech and actual human speech quality by more than 50 percent in blind listening tests. It could also generate music — piano pieces, unbidden, emerging from the same architecture used for speech.

The result was striking. What made it significant was the method.

WaveNet discarded everything that speech synthesis had accumulated over decades: the phoneme dictionaries, the acoustic vocoders, the signal-processing models derived from first principles of how the human vocal tract works. Instead, it modeled a raw audio waveform — 16,000 samples per second — one timestep at a time, each sample conditioned on everything that came before. The technical innovation was dilated causal convolutions: a way of stacking convolutional layers with exponentially increasing gaps between them, so the model's effective window over time grew exponentially with depth. The result: a system that could capture the long-range temporal dependencies of speech without ever being told what speech was.
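
The claim about an exponentially growing window is easy to verify with arithmetic. The sketch below computes the receptive field of a stack of causal convolutions with filter size 2 and doubling dilations; the layer counts and the three-block repetition are illustrative rather than WaveNet's exact configuration.

```python
# Receptive field of stacked causal convolutions with filter size 2 and
# dilation doubling each layer: each layer adds `dilation` new past samples.
def receptive_field(dilations, filter_size=2):
    return 1 + sum((filter_size - 1) * d for d in dilations)

one_stack = [2 ** i for i in range(10)]          # dilations 1, 2, 4, ..., 512
print(receptive_field(one_stack))                # 1024 samples from 10 layers
print(receptive_field(one_stack * 3))            # three such blocks: 3070 samples

# At 16,000 samples per second, a few stacked blocks already give a context
# window measured in hundreds of milliseconds of raw audio.
print(receptive_field(one_stack * 3) / 16000)    # ~0.19 seconds
```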

The researchers themselves were candid about their surprise: "The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising." They had not derived WaveNet from a theory of speech. They had applied a general framework for sequential prediction to raw data and discovered it worked better than decades of engineered acoustic models.

The Waveform and the Sequence

The principle WaveNet demonstrated was not specific to audio. Van den Oord had established it first for images, treating each pixel as a value to be predicted from all previous pixels, in a paper called PixelRNN. The same factorization — the joint probability of any high-dimensional signal expressed as a product of conditional probabilities over its elements, in order — worked for images, for audio, and, as the transformer paper would show the following year, for language.
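
In symbols, the shared idea is the chain rule of probability applied to a signal's elements in order, whatever the modality:

```latex
p(\mathbf{x}) \;=\; \prod_{t=1}^{T} p\bigl(x_t \mid x_1, x_2, \ldots, x_{t-1}\bigr)
```

Here x_t is a pixel, an audio sample, or a word token; the model's only task is to estimate each conditional from data.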

The deeper claim was epistemological: natural signals, however complex, contain learnable statistical structure. You do not need to understand the domain. You need enough data and a network with sufficient capacity to model sequential dependencies. The domain knowledge that engineers had spent careers encoding into AI systems — the phonological rules, the acoustic physics, the grammatical structures — turned out to be unnecessary. The structure was in the data.

This insight would eventually reach biology.

A Protein is a Sentence

A protein is, at its most basic level, a string of characters. The twenty standard amino acids are each assigned a single letter — A, C, D, E, F and so on — and a protein sequence is just a string of those letters, typically a few hundred to a few thousand characters long. A protein with 300 amino acids is a sentence 300 characters long in a 20-letter alphabet.

More importantly, it is an information-complete specification. This is Anfinsen's dogma — the insight for which Christian Anfinsen received the 1972 Nobel Prize in Chemistry: the complete three-dimensional structure of a protein, and therefore its biological function, is entirely determined by its amino acid sequence. Nothing else is required. The sequence is not a summary of the protein; it is the protein's full specification, encoded in linear form. If you knew how to read the sequence, you could reconstruct everything about the molecule.

Researchers in the late 2010s began noticing a striking parallel with natural language processing. The transformer architecture, trained on massive corpora via masked language modeling — mask a random word, predict it from the surrounding context — learned representations that encoded rich semantic structure without any supervision about what meaning was. The same technique applied to protein sequences — mask a random amino acid, predict it from the rest of the chain — produced representations that encoded biochemical structure without any supervision about what structure was. Better language modeling accuracy predicted better structural information in the representations. The scaling law for protein models was the same as the scaling law for text models.

The biological sequence database was a corpus. The evolutionary record of which mutations co-occurred across millions of related species was a signal. Correlated mutations between positions in a sequence turned out to encode physical proximity in the folded structure: a mutation at position 50 that disrupts folding is often compensated by a co-mutation at position 73, because the two residues are in physical contact. Enough sequences, enough attention to co-evolutionary patterns, and the 3D structure began to emerge from the 1D string — not because the model understood chemistry, but because the statistical regularities in sequence space were sufficient.
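
The training signal can be mimicked in miniature. The sketch below masks one residue in a toy sequence and "predicts" it from the column statistics of a handful of aligned relatives, a crude stand-in for masked language modeling over a real multiple sequence alignment; the sequences are invented for the example.

```python
import random
from collections import Counter

# A toy "multiple sequence alignment": evolutionarily related sequences over
# the 20-letter amino-acid alphabet (invented for illustration; real
# alignments have thousands of rows).
alignment = [
    "MKTAYIAKQR",
    "MKTGYIAKQR",
    "MKSAYIAKHR",
    "MKTAYLAKQR",
]

def predict_masked(alignment, target, position):
    """Predict the masked residue at `position` from the column statistics
    of the other sequences -- the co-evolutionary signal in miniature."""
    column = Counter(seq[position] for seq in alignment if seq != target)
    return column.most_common(1)[0][0]

target = alignment[0]
pos = random.randrange(len(target))
masked = target[:pos] + "_" + target[pos + 1:]
print(f"masked:    {masked}")
print(f"predicted: {predict_masked(alignment, target, pos)}  (true: {target[pos]})")
```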

The Day After Seoul

Hassabis has told the story precisely. He started the protein folding project "roughly the day we came back from the AlphaGo match in Seoul" — after AlphaGo's 4-1 victory over Lee Sedol in March 2016. While watching AlphaGo play, he had been reminded of FoldIt, the 2008 protein folding game. He realized that the machinery DeepMind had built for Go — the search engine for navigating enormous combinatorial spaces, the learning systems for evaluating positions — was essentially general-purpose. Protein conformation space is precisely that kind of space: astronomically large, with a correct answer that can be evaluated, and with accumulated data providing a training signal.

"We started off with games because it was more efficient to develop AI and test things out," Hassabis said later. "But ultimately that was never the end goal." AlphaGo was a proof of concept. AlphaFold was the first deployment of that proof of concept at the frontier of science.

John Jumper joined DeepMind in 2017. Hassabis promoted him to lead AlphaFold 2 development in July 2018 — specifically because Jumper's background bridged "protein physics and machine learning," trained as a computational chemist who also understood deep learning. The architecture Jumper designed, the Evoformer, used transformer-style self-attention over both the sequence axis and the pairwise residue-residue axis simultaneously, treating the multiple sequence alignment of evolutionarily related proteins as a corpus in which evolutionary co-variation encoded physical contacts.

At CASP13 in December 2018, AlphaFold 1 won the protein structure prediction competition by a wide margin. Mohammed Al-Quraishi, a computational biologist whose field had spent careers on the problem, wrote a blog post with the title "What just happened?" He was not being rhetorical. Academic protein folding groups that had spent decades hand-crafting algorithms had been decisively beaten by a machine learning team two years into the problem.

The comment that captures the moment came from the structural biology community afterward: DeepMind had done to protein folding what DeepMind had done to Go.

The Bitter Lesson and Its Complications

On March 13, 2019, Richard Sutton — one of the founding theorists of reinforcement learning, then at the University of Alberta — published a short essay on his personal website titled "The Bitter Lesson." Roughly 1,400 words, read by seemingly everyone in the field.

The argument was simple and sweeping: "The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin." The history of AI, Sutton argued, followed a consistent pattern: human researchers encoded domain knowledge into their systems; those systems were eventually surpassed by simpler approaches that scaled compute. Chess, Go, speech recognition, computer vision — in every case, the brute-force scaled approach eventually won. The lesson was bitter because it implied that the expensive human insights researchers had spent their careers developing were, in the long run, the wrong strategy.

DeepMind sat in a complicated position on this argument. In one reading, AlphaFold vindicated the bitter lesson: scale and learning beat decades of hand-crafted structural biology. In another reading, it was a refutation: AlphaFold 2's Evoformer incorporated significant physical priors about protein geometry, including an "Invariant Point Attention" module that respects the 3D symmetries of space. The AlphaFold team did not apply a general sequence model naively to proteins; they designed an architecture with protein-specific inductive biases built in.

Hassabis, asked whether he agreed with the bitter lesson, typically gave the same nuanced answer: scale matters enormously, but you also need the right architecture. His public position has been that current AI systems, while impressive, "reason inconsistently — solving graduate-level problems one moment and failing basic logic the next" — and that this failure mode indicates that something beyond statistical regularity in language data is required for true general intelligence. The world models are missing.

Scaling Laws and Their Correction

In January 2020, a team at OpenAI led by Jared Kaplan published "Scaling Laws for Neural Language Models." The finding: language model performance follows smooth power-law relationships with model size, dataset size, and compute budget, across more than seven orders of magnitude. The loss declined predictably as you scaled any of these dimensions. The optimal strategy for a fixed compute budget, the paper argued, was to train the largest possible model, even if that meant stopping well short of convergence.
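
The relationships the paper reports have a simple functional form: loss falling as a power law in parameter count N, dataset size D, and compute C, with the scale constants and exponents fit empirically (left symbolic here rather than quoting the fitted values):

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```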

GPT-3 followed this prescription. 175 billion parameters, 300 billion training tokens. The result — a model that could write essays, answer questions, and complete code — captured the world's attention in a way that no DeepMind research result had.

DeepMind's response was Chinchilla. Published in March 2022, the paper trained more than 400 language models at varying sizes and dataset sizes and found that the Kaplan prescription was wrong. The compute-optimal point required scaling model size and training tokens equally — roughly 20 tokens per parameter, not the 1.7 tokens per parameter that GPT-3 used. Under this prescription, GPT-3 was dramatically undertrained. A model four times smaller, trained on four times as much data, would outperform it.

To prove the point, DeepMind trained Chinchilla: 70 billion parameters on 1.4 trillion tokens, using the same compute budget as Gopher, their 280-billion-parameter model. Chinchilla outperformed Gopher, GPT-3, and every other frontier LLM on every benchmark tested. The MMLU accuracy improvement over Gopher alone was 7.5 percentage points — from a model a quarter the size.
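
The difference between the two prescriptions reduces to simple arithmetic. The comparison below uses the figures cited in this chapter; the flat tokens-per-parameter ratio and the 6·N·D compute estimate are standard rules of thumb, simplifications of what the Chinchilla paper actually fits.

```python
# Tokens-per-parameter ratios for the models discussed above.
models = {
    "GPT-3":      {"params": 175e9, "tokens": 300e9},   # Kaplan-style: huge model, few tokens
    "Gopher":     {"params": 280e9, "tokens": 300e9},
    "Chinchilla": {"params": 70e9,  "tokens": 1.4e12},  # rebalanced for a similar budget
}

for name, m in models.items():
    ratio = m["tokens"] / m["params"]
    # The 6 * params * tokens rule of thumb for training FLOPs; it puts Gopher
    # and Chinchilla in the same ballpark, as the paper intended.
    flops = 6 * m["params"] * m["tokens"]
    print(f"{name:<11} {ratio:5.1f} tokens/param   ~{flops:.1e} FLOPs")
```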

Chinchilla was not a rejection of scaling. It was a more rigorous understanding of scaling — a correction to OpenAI's prescription from the lab that had, in Hassabis's view, always taken compute efficiency more seriously than compute volume. The implicit message was competitive: DeepMind's researchers understood the science of scaling better than the labs racing to train the biggest models.

The Question That Won't Close

The chapter's title is deliberate. Language and nature are not two separate domains that happen to be connected by a technical coincidence. They are, from the perspective of the framework DeepMind spent the 2010s developing, the same problem — the problem of learning the structure that is latent in any sequential data, whether the data is speech waveforms, amino acid chains, or text corpora.

WaveNet established that audio is a learnable sequence. The transformer established that language is a learnable sequence. Protein language models established that biology is a learnable sequence. AlphaFold established that the learnable structure in biological sequences encodes three-dimensional reality with near-perfect accuracy.

What connects them is the question that Hassabis has never fully resolved in public: is this intelligence? His answer — consistent but provisional — is that it depends on what you mean. If intelligence means reliably solving well-defined problems by extracting patterns from training data, then yes, these systems are intelligent. If intelligence means flexible, causal, generalizing, counterfactually reasoning agency that can navigate genuinely novel situations, then the answer is not yet established. The distinction matters because the second kind of intelligence is what makes AlphaFold's protein prediction feel categorically different from GPT-4's confident hallucinations — one system is reliably right within its training distribution; the other is fluently wrong in ways that look correct.

The scaling camp's answer — that the distinction will dissolve as you add more parameters, more data, more compute — is an empirical bet. Hassabis's answer — that the distinction requires architectural advances beyond scaling — is also an empirical bet. Neither has yet been proven. What AlphaFold showed is that at least one frontier scientific problem could be solved by learning from sequence data. What it did not show is whether that approach generalizes to every frontier scientific problem, or only to the class of problems whose answers are fully encoded in their inputs.


Chapter 13: Project Mario

The ethics board had been the crown jewel of DeepMind's acquisition terms. When Hassabis and Suleyman agreed to sell to Google in January 2014, they extracted a condition that no technology acquisition before them had demanded: an independent ethics board with authority to oversee how Google used AI across all its divisions, not just DeepMind. The board was to be convened by January 2016. It would be the institutional guarantee that the technology they were building would not be weaponized, commercialized recklessly, or allowed to concentrate power in ways that undermined the mission.

The board never functioned. Both Google and DeepMind subsequently refused to reveal who sat on it, whether it had ever met, or what it had discussed. One former employee told Mallaby it "never existed, never convened, and never solved any ethics issues." When Hassabis was asked publicly whether the board existed, he said he couldn't confirm or deny it because "it's all confidential." In October 2017, DeepMind launched something called the "DeepMind Ethics & Society" research unit — an internal team studying the social implications of AI. It was explicitly not the oversight body promised in 2014. It was a research group.

This is the context in which Hassabis and Suleyman launched the governance initiative that would consume three years of their lives and accomplish nothing.

The Trigger

The August 2015 SpaceX safety meeting — described in the previous chapter — was the proximate cause. When that meeting dissolved into personal antagonism between Musk and Page, leaving no agreements and no shared framework, Suleyman concluded that informal governance would never work. Structural independence was the only protection that mattered.

He was aided by an unexpected opening. In 2015, Google was reorganizing into Alphabet — spinning out discrete units as semi-independent "bets" (Waymo, Verily, DeepMind). Don Harrison, Google's chief of M&A, suggested to Hassabis and Suleyman that the Alphabet restructuring created a natural path for DeepMind to regain the independence it had sold. The question was whether to take it.

The answer, for Suleyman in particular, was yes. The project was given the internal codename Project Mario. The vision was specific: DeepMind would become a "Global Interest Company" — a company limited by guarantee, issuing no shares, paying no dividends, structured under UK law as a public-benefit institution. Alphabet would continue to finance operations in exchange for exclusive technology licenses. Governance would come from a "3-3-3 board": three seats for DeepMind, three for Alphabet, three for independent members. Any future AGI breakthrough would be controlled by this structure, not by Alphabet's shareholders.

Hassabis framed it in terms that were almost utopian: artificial general intelligence was "too consequential to be left under the sway of a single corporation's shareholders." It was "humanity-sized." The structure had to match the stakes.

The Secret Hedge Fund

The logic of independence required financial self-sufficiency. You could not negotiate independence from the entity writing your paychecks. So, in parallel with the governance talks, Hassabis quietly assembled a team of roughly twenty researchers to solve a different kind of problem: beating the financial markets.

The ambition was specific. The target was Renaissance Technologies — Jim Simons's quant fund, the most successful trading operation in financial history. DeepMind would apply the same deep learning and RL techniques it had used on games and proteins to financial time series. If it worked, the profits would fund independence.

DeepMind also explored a collaboration with BlackRock. The project was never publicly announced. It was never approved by Google, which apparently did not know about it and "panicked over regulatory risks" when the project eventually surfaced internally. It never generated revenue. It was quietly disbanded.

The attempted hedge fund is one of the more remarkable details in Mallaby's account — a reminder that the governance saga involved not just legal negotiations but genuinely covert operations conducted by the people nominally employed by Google to do AI research.

Larry Page, Five Times

By early 2016, Project Mario had moved from vision to negotiation. Hassabis met with Larry Page — then running Alphabet after handing Google to Sundar Pichai — through five successive rounds of talks to work through the structure. Page was the most sympathetic interlocutor available. He had championed DeepMind's acquisition, he respected Hassabis's science, and he was at least abstractly committed to the idea that DeepMind's mission required unusual governance.

After the fifth round of talks, a formal term sheet was drafted. The Global Interest Company structure was on paper. The 3-3-3 board was specified. The technology licensing agreement between DeepMind and Alphabet was outlined. It was, for a few months in the summer of 2016, something that looked like it might actually happen.

Then Pichai made his move.

The Steelier Side

On November 21, 2016, Google's chief legal officer David Drummond arrived in London. He acknowledged that everyone shared the same AI safety goals. He then said there were "concerns" about the spin-out, and introduced a vague alternative formula — not quite independence, not quite the status quo, undefined in its details. Four days later, Hassabis and Suleyman got Pichai on the phone.

Mallaby writes that Pichai "revealed the steelier side of his personality" in that conversation. His argument was structural and unambiguous: AI was no longer a "moonshot" in the Alphabet sense. It was no longer the right category of thing to spin out as a semi-independent bet alongside Waymo and Verily. AI was now considered strategically central to Google's core products — Search, Cloud, Assistant. It could not be placed under governance structures where Google's interests were merely one-third of the board.

The term sheet was dead.

Hassabis and Suleyman went to Plan B: gather $5 billion in outside investment pledges and use the credible threat of a mass walkout to force Google's hand. If Google would not grant independence voluntarily, perhaps they could make independence less costly than losing the entire DeepMind team.

Asilomar

In January 2017, Suleyman attended the Asilomar AI safety conference. He sat down with Reid Hoffman, the LinkedIn co-founder who had earlier pledged a relatively modest sum to OpenAI for safety reasons. Suleyman made his case: this was the most consequential technology in human history, it should not be controlled by a single corporation, and here was the governance structure to prevent that.

Hoffman agreed on the spot to commit over a quarter of his net worth to the vision — more than $1 billion. One hundred times what he had pledged to OpenAI.

His framing was direct: "This is the most impactful technology of my lifetime... This technology shouldn't be used to entrench a monopoly." The $1 billion was not just a financial commitment. It was the anchor of the leverage strategy — the first and largest piece of the $5 billion that Hassabis and Suleyman needed to make their walkout threat credible.

Aviemore

In June 2017, DeepMind's approximately 500 staff were flown by chartered jet to Aviemore, a resort town in the Scottish Highlands near Balmoral. The company-wide retreat had a specific agenda.

Suleyman took the stage and unveiled a slide titled "DeepMind: A Global Interest Company." The org chart showed DeepMind as independent, connected to Google only by a dotted line representing a technology licensing agreement. Under the structure, Suleyman would lead applied AI folded back into Google proper, while Hassabis would lead a semi-independent AGI research unit answering to a new board. Suleyman had already told his deputies to begin preparing to relocate to California.

Staff were stunned. This was not a discussion. It was an announcement. The independence that had been negotiated for three years was apparently real, apparently imminent, apparently settled.

Ten days later, Google sent back the negotiating documents with red lines throughout. Pichai had not approved the plan announced at Aviemore. The California relocation was cancelled. Suleyman was forced to return to the same 500 people and walk back everything he had told them. The slide about the Global Interest Company was memory-holed.

The Financial Reality

Behind the governance argument was an arithmetic reality that Mallaby does not spare. DeepMind was losing enormous sums of money. In 2019 alone, it lost £477 million — roughly $649 million. Alphabet waived £1.1 billion in accumulated intercompany loans that year. DeepMind's total revenue in 2019 was £266 million, almost entirely from Google paying it for R&D. The argument that DeepMind should be structurally independent was, financially, an argument that Google should subsidize an independent organization whose interests it could not control. Pichai's "steelier side" was, in this light, not an exercise in corporate authoritarianism. It was a reasonable observation about who was writing the checks.

The WaveNet adoption by Google Assistant, the data center cooling AI (which reduced Google's cooling energy bills by 40 percent), the commercial Text-to-Speech API launched in 2018 charging $16 per million characters — these were not incidental. They were the evidence Google used internally to establish that DeepMind's technology was load-bearing for Google's core products, and therefore could not be placed under governance structures that Google did not control.

What Hassabis Concluded

By April 2021, it was over. At an all-hands meeting, Hassabis told DeepMind staff that the negotiations for independence had definitively ended. DeepMind would remain inside Alphabet under its existing status.

What is most striking is the conclusion Hassabis drew from the experience — a conclusion that represented a near-total reversal of the premise on which the whole effort had been founded. Reflecting on it to Mallaby, he said:

"Safety isn't about governance structures...discussing these things didn't really help. It made it harder to build useful trust, because when you are negotiating a trustless structure, it implies that you can't trust the other person."

Three years of Project Mario had produced no new legal structure, no independent ethics board, a secret hedge fund that was quietly disbanded, a company-wide announcement at Aviemore that had to be retracted, and the departure of DeepMind's most operationally capable co-founder. And at the end of it, Hassabis had concluded that the entire project had been misconceived. The governance structures weren't the point. Trust was the point. And you cannot build trust while negotiating for the structures that would exist in the absence of trust.

Mallaby captures this as the central irony of the DeepMind story: the organization that had extracted the most elaborate safety guarantees of any AI acquisition found that none of those guarantees held, and concluded from this not that better guarantees were needed, but that guarantees themselves were the wrong approach. Safety, in Hassabis's revised view, had to be built into the technology. It couldn't be bolted on through org charts.

In April 2023, DeepMind and Google Brain were merged into a single unit — Google DeepMind — with Hassabis as CEO. The merger was framed as enabling faster progress. It was also, for anyone who had been paying attention, the formal end of the independence that Hassabis and Suleyman had spent the better part of a decade trying to preserve. DeepMind was moved from "Other Bets" in Alphabet's financials into corporate costs — reflecting not the side project it had once been, but the strategic center it had become.

Suleyman, who had launched the whole thing, was by then running Microsoft's AI division.


Chapter 14: Fermat for Biology

In 1637, the French mathematician Pierre de Fermat scrawled a note in the margin of his copy of Arithmetica. He had found a proof, he claimed, that no three positive integers can satisfy a^n + b^n = c^n for any integer n greater than 2. The margin, he added, was too small to contain it. He died in 1665 without ever writing it down.

The proof took 358 years to find. Andrew Wiles published it in 1995, using mathematics that Fermat had no access to: elliptic curves, modular forms, a 200-page argument that few people on Earth could follow. The problem had defeated generations of mathematicians who brought increasingly powerful tools to bear on it, and then collapsed, apparently overnight, before an approach that felt — from the outside — almost like cheating.

The protein folding problem has explicitly been called Fermat's Last Theorem of biology. The parallel is apt in a specific way that goes beyond "hard problem, elegant statement." Both were deceptively simple to state. Both were ferociously difficult to solve once you tried. Both generated decades of failed attempts and the gradual accumulation of partial insights that felt like progress but never arrived at the answer. And both were ultimately cracked by approaches that sidestepped the original mechanism entirely — Wiles using tools Fermat never knew; DeepMind using patterns in existing data that said nothing directly about why proteins fold the way they do.

The Problem Behind the Problem

The modern form of the protein folding problem has its origin in two findings separated by a decade.

In 1962, Christian Anfinsen at NIH showed that an unfolded enzyme — ribonuclease A — would spontaneously refold itself into its active shape when returned to normal conditions. This was the thermodynamic hypothesis: the three-dimensional structure of a protein is entirely determined by its amino acid sequence. The sequence is the full specification. Everything else — the folded shape, the function, the interactions with other molecules — follows from it. For this insight, Anfinsen received the Nobel Prize in Chemistry in 1972.

The implication was staggering and frustrating in equal measure. If a protein always folds to the same shape, and that shape is encoded in its sequence, then in principle you should be able to predict the shape from the sequence — a pure computational problem. It had the same deceptive simplicity as Fermat: the statement is obvious. The difficulty is everything else.

Cyrus Levinthal, a biophysicist at MIT, quantified the difficulty in 1969. A typical protein of 100 amino acids has roughly three possible rotational states per bond along its backbone. That gives approximately 3^100 possible conformations — roughly 10^47. Sampling them at picosecond speeds (as fast as molecular motion can occur), a brute-force search would take longer than the age of the universe. For larger proteins, the numbers become cosmological: estimates reach 10^300 conformations.
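
The arithmetic is simple enough to check directly. A minimal sketch, using only the textbook figures quoted above (three states per bond, one conformation sampled per picosecond) rather than any measured values:

```python
# Back-of-the-envelope Levinthal estimate.
# Assumptions (the standard figures quoted above, not measurements):
#   - 100 residues, ~3 rotational states per backbone bond -> 3**100 conformations
#   - sampling one conformation per picosecond (1e12 per second)

conformations = 3 ** 100                      # ~5e47 possible shapes
samples_per_second = 1e12                     # one per picosecond
seconds = conformations / samples_per_second

seconds_per_year = 3.156e7
age_of_universe_years = 1.38e10

years = seconds / seconds_per_year
print(f"conformations:        {conformations:.2e}")
print(f"brute-force search:   {years:.2e} years")
print(f"ages of the universe: {years / age_of_universe_years:.2e}")
```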

The paradox: proteins nonetheless fold correctly in microseconds to milliseconds in the cell. They cannot be doing a random search. The folding pathway must be guided by an energy landscape that funnels the sequence rapidly toward its minimum-energy configuration. Identifying and computing that landscape from first principles was the challenge that had absorbed structural biology for fifty years.

The problem had a formal proving ground: CASP, the Critical Assessment of Protein Structure Prediction, running biennially since 1994. Participants received amino acid sequences of proteins whose structures had been experimentally determined but not yet published. They submitted predicted structures. Assessors measured how close the predictions were to the true experimental shapes. For twenty-four years, progress was real but incremental — a slow accumulation of partial wins, no complete solution.

What Just Happened?

At CASP13, held in Cancun in December 2018, AlphaFold 1 won. Andrew Senior and John Jumper led the team. AlphaFold 1's key architectural insight — developed by Senior's group — was to predict not the full three-dimensional structure directly but a probability distribution over pairwise distances between all residues in the chain. Those distance distributions were then used as constraints to find the most consistent 3D shape. This was not brute-force search and it was not Anfinsen's biophysics. It was statistical inference over the evolutionary record of mutations across millions of related proteins.
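
The "distances as constraints" idea can be illustrated with a toy optimization. This is a minimal sketch over invented data, nothing like DeepMind's actual system: given predicted pairwise distances for a short chain, gradient descent searches for 3D coordinates consistent with them.

```python
import numpy as np

# Toy illustration of fitting 3D coordinates to predicted pairwise distances.
# The "predicted" distances here are faked from a random reference structure;
# in AlphaFold 1 they came from a neural network's distance distributions.

rng = np.random.default_rng(0)
n = 8                                         # a tiny 8-residue "protein"
reference = rng.normal(size=(n, 3))           # stand-in for the true fold

def pairwise(x):
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1) + 1e-9)

predicted = pairwise(reference)               # pretend these came from the model

coords = rng.normal(size=(n, 3))              # start from a random chain
lr = 0.002
for _ in range(5000):
    d = pairwise(coords)
    err = d - predicted                       # how far each pair is off
    diff = coords[:, None, :] - coords[None, :, :]
    grad = ((4 * err / d)[:, :, None] * diff).sum(axis=1)   # gradient of squared error
    coords -= lr * grad

print("mean |distance error|:", np.abs(pairwise(coords) - predicted).mean())
# Distances alone pin down the shape only up to rotation and mirror image.
```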

The result at CASP13: AlphaFold 1 predicted high-accuracy structures for 24 of 43 free-modeling targets, versus 14 for the second-best method. Mohammed Al-Quraishi, a computational structural biologist who had spent years building his own prediction program, wrote a blog post with a title that captured the field's reaction: "What just happened?"

He had not expected this result until the late 2020s. Academic protein folding groups that had spent careers on hand-crafted algorithms had been beaten by a team that had been working on the problem for roughly two years.

Hassabis looked at the CASP13 result and saw something else. One team member reportedly wanted to declare victory and move on. Hassabis refused. "Winning wasn't the point. Solving protein folding was." The gap between AlphaFold 1's best result and true experimental accuracy was still visible. He put the team back to work.

CASP14

CASP14 was held virtually in November 2020, a COVID year. About 100 protein structures served as targets. The scores came back.

AlphaFold 2's median GDT_TS — a 0-to-100 score reflecting how many residues land within a set of distance cutoffs of their true positions — was 92.4. A score of around 90 was generally considered competitive with experimentally determined structures. For roughly two-thirds of targets, AlphaFold 2's accuracy was indistinguishable from experimental error. The average error in atomic positions was approximately 1.6 Ångströms — roughly the width of one atom.
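
GDT_TS itself is easy to state precisely. A minimal sketch of the scoring idea, assuming the predicted and experimental structures have already been optimally superimposed (which the real assessment handles separately), with purely illustrative numbers in the usage:

```python
import numpy as np

def gdt_ts(predicted, experimental):
    """GDT_TS: the average, over four cutoffs (1, 2, 4, 8 Angstroms), of the
    percentage of residues whose predicted position falls within that cutoff
    of the experimental position. Assumes pre-superimposed structures."""
    errors = np.linalg.norm(predicted - experimental, axis=1)   # per-residue error
    return float(np.mean([(errors <= c).mean() * 100 for c in (1.0, 2.0, 4.0, 8.0)]))

# Toy usage: a 100-residue structure perturbed by sub-Angstrom noise
# (invented numbers, not CASP data).
rng = np.random.default_rng(1)
experimental = rng.normal(size=(100, 3)) * 20.0
predicted = experimental + rng.normal(scale=0.5, size=(100, 3))
print(f"GDT_TS = {gdt_ts(predicted, experimental):.1f}")
```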

AlphaFold 2 won best predictions on 88 of 97 targets. In the formal z-score ranking, which measures how far each group's results deviate from the field's average, AlphaFold 2 scored 244.0. The second-best group scored 90.8 — less than half.

John Moult, who had co-founded CASP and worked on protein folding for nearly his entire career, said: "This is a big deal. In some sense, the problem is solved."

Venki Ramakrishnan, a Nobel Laureate who had spent years on ribosome structure and was President of the Royal Society, called it "a stunning advance... decades before many people in the field would have predicted."

Andrei Lupas, Director of the Max Planck Institute for Developmental Biology in Tübingen and a CASP14 assessor, said: "It's a game changer. This will change medicine. It will change research. It will change bioengineering. It will change everything."

His personal experience was more vivid than the summary. For nearly a decade, Lupas had been trying to solve the structure of a particular membrane-signaling protein using X-ray crystallography, and had failed. He was given access to AlphaFold 2 before the public announcement. "The correct structure just fell out within half an hour," he said. "It was astonishing."

Mohammed Al-Quraishi's second blog post had a different title: "AlphaFold2 @ CASP14: It feels like one's child has left home." He wrote that he had "never in my life expected to see a scientific advance so rapid" and that AlphaFold 2 represented "a seismic and unprecedented shift so profound it literally turns a field upside down overnight." The title captured an ambivalence that ran through the structural biology community: the achievement was universally acknowledged as extraordinary; a life's work had been rendered, in some sense, unnecessary. The child had grown up faster than anyone expected, and left.

The Open Database

On July 15, 2021, DeepMind published the AlphaFold 2 paper in Nature and simultaneously launched the AlphaFold Protein Structure Database, built jointly with EMBL-EBI. The initial release covered approximately 365,000 structures: the complete human proteome and 20 model organisms — essentially every protein that researchers worked with most frequently.

Before AlphaFold, the entire Protein Data Bank — assembled over fifty years through painstaking X-ray crystallography, cryo-electron microscopy, and NMR spectroscopy — contained approximately 180,000 structures. A single day's release doubled that number.

In July 2022, the database expanded to cover 200 million proteins from over a million species — essentially the entire known protein universe, every sequenced organism on Earth. Over three million researchers in more than 190 countries have since used it, including more than a million users in low- and middle-income countries who had never had access to the structural biology infrastructure required for experimental determination. Research that had previously required years of laboratory work could now begin from an AlphaFold structure in hours.

The downstream impacts have been specific and tangible. The Oxford lab that works on malaria vaccines used AlphaFold to determine the first full-length structure of a key surface protein on malaria parasites — revealing exactly how transmission-blocking antibodies attach to it and unlocking a vaccine design that contributed to the WHO-recommended R21/Matrix-M malaria vaccine in 2023. A bacterial protein structure that had resisted identification for a decade — central to understanding antimicrobial resistance — was solved in approximately thirty minutes. The nuclear pore complex, the gatekeeper controlling what enters and exits the cell nucleus and a target for multiple diseases, produced an almost-complete structural model through a combination of AlphaFold and cryo-EM. The drugs-for-neglected-diseases pipeline has expanded, applying AlphaFold structures to Chagas disease and leishmaniasis.

October 9, 2024

The Nobel Foundation had difficulty tracking down Demis Hassabis's contact information in advance of the announcement. He found out about the prize approximately twenty minutes before it was made public.

In a telephone interview recorded immediately after, he said: "It's unbelievably special... it's actually really surreal... it hasn't really sunk in. I couldn't really think at all, to be honest. My mind went blank."

Later: "It's the big one really."

John Jumper, at 39, became the youngest Nobel laureate in Chemistry in seventy years. His immediate reaction: "It's absolutely extraordinary." In a fuller statement, he described what had driven him: "We could draw a straight line from what we do to people being healthy because of what we learn about biology in the cell and everything else, and it's just extraordinary." His path to the prize had been accidental — he had started a physics PhD at Vanderbilt, found no joy in it, left, worked writing programs to model proteins, then returned for a chemistry PhD at the University of Chicago, calling himself an "accidental chemist." He had joined DeepMind in 2017 essentially betting his career on the idea that machine learning would crack biology's central mystery. The Nobel came seven years later.

The Chemistry prize was shared with David Baker at the University of Washington, who received half for the inverse achievement: designing entirely new proteins — with no evolutionary precedent — that fold into specified shapes with atomic precision. On being paired with DeepMind's team, Baker said: "Rather than competitors, I really would say they've been great inspirers about the power of deep learning."

The 2024 Nobel season was notable in another direction: Geoffrey Hinton shared the Physics prize for his foundational work on neural networks. The same year, AI won both the Physics Nobel and the Chemistry Nobel. The committee chair called AlphaFold 2 "an ingenious piece of neural network design." What began in Hassabis's reading of Ender's Game, in his study of the hippocampus, in his decision not to take a video game job — had ended in Stockholm.

Mallaby's framing of the chapter gives it its name. Protein folding was not a puzzle that could be solved the way it was stated. Like Fermat's margin note, it required tools that didn't yet exist when the problem was first posed. The solution, when it came, arrived not from the direction the field had been looking, but from an adjacent discipline, through methods that bypassed the question rather than answering it. And it arrived decades before the people who had spent their careers on it believed it could.


Chapter 15: The Power and the Glory

On the evening of December 10, 2024, at the Konserthuset in Stockholm, Demis Hassabis and John Jumper received their Nobel medals and diplomas from King Carl XVI Gustaf of Sweden. The concert hall was full. The ceremony was broadcast internationally. Two days earlier, Hassabis had delivered his Nobel lecture at the Aula Magna of Stockholm University, titled "Accelerating Scientific Discovery with AI." He described signing the Nobel Foundation's guest book afterward as a "full circle" moment — as a student, he had watched The Race for the Double Helix, and now his name would sit beside the scientists he had spent his life reading.

The formula he had used for thirty years surfaced again in the lecture. Step one, solve intelligence. Step two, use it to solve everything else.

Hassabis had been saying this since before anyone took him seriously. He had said it when he was raising $2.3 million from Peter Thiel and Luke Nosek on the strength of a chess game. He had said it in the acquisition negotiations with Larry Page. He had said it in the years when DeepMind's annual losses exceeded its revenue by hundreds of millions of pounds, underwritten by a company that needed to see a return. He said it now at a podium in Stockholm, in front of the same scientific establishment that had spent decades ignoring the field of artificial intelligence as not quite rigorous enough for proper science.

The Nobel Prize in Chemistry was, among other things, the scientific establishment's formal acknowledgment that it had been wrong.

The Debate the Nobels Opened

The 2024 Nobel season was unlike any before it. Geoffrey Hinton shared the Physics Prize for his foundational work on neural networks. Hassabis and Jumper shared the Chemistry Prize for AlphaFold. Artificial intelligence had, in a single October week, won two of the most prestigious awards in science.

The reaction split along predictable lines.

Andrei Lupas, whose decade-long unsolvable membrane protein had yielded its structure to AlphaFold in half an hour, called it "a game changer." Venki Ramakrishnan, a Nobel laureate himself, called it "a stunning advance." The structural biology community — the people who had most directly benefited — was largely unambiguous.

The physics community was more divided. Jonathan Pritchard of Imperial College London wrote on social media that he was "speechless," struggling to see how the Hinton prize constituted "a physics discovery." Sabine Hossenfelder described machine learning as belonging to computer science. Wendy Hall, a computer scientist herself, suggested the committee was "creative" in routing the prize through physics in the absence of a Nobel for computing. The subtext was pointed: if AI deserved the Nobel, there was no clean category for it, and the committees were improvising.

The deeper argument was philosophical. A paper in Communications Biology published around the time of the prize acknowledged AlphaFold's "huge impact" and then noted that the protein folding problem "cannot be considered solved" — at least not in the sense of understanding the mechanism. AlphaFold predicted accurate structures without revealing why proteins fold as they do. The criticism was precise: the system succeeded by learning patterns from the existing experimental record, not by discovering the underlying physics. Andrei Lupas's decade-long problem had been solved. Whether the folding process had been understood was a different question.

This is a debate that runs directly through Hassabis's stated philosophy. He has argued consistently that DeepMind's goal was not to engineer mimicry but to produce genuine understanding — to build AI that could function as a scientist, not just a predictor. AlphaFold was celebrated as a vindication of that approach. Critics noted it also looked, from a certain angle, like an extremely sophisticated pattern-matcher that had learned to interpolate between known structures rather than derive them from first principles. Whether that distinction matters — whether there is a meaningful difference between "learning the pattern" and "understanding the mechanism" when the outputs are indistinguishable — is a question that doesn't have a clean answer yet.

A Modern Bell Labs

Hassabis had founded DeepMind with an explicit institutional model: Bell Labs. The research division of AT&T, operating from 1925 to 1984 under the shelter of the Bell System's monopoly, produced ten Nobel Prizes, five Turing Awards, and the transistor, the laser, the Unix operating system, information theory, and cellular telephony. Its researchers had the security of permanent employment, no obligation to ship products, and access to the best colleagues in their fields. They pursued curiosity where it led.

Hassabis wanted to rebuild this in London, funded by Google's resources rather than a monopoly franchise. The formula was the same: world-class researchers, mission-level purpose, freedom to work on problems that mattered over time horizons that commercial organizations could not tolerate.

The Bell Labs analogy cuts in more than one direction. Bell Labs collapsed when the AT&T breakup in 1982 exposed it to competitive pressure. The research culture it had built over six decades was dismantled within years once it had to justify itself commercially. The institution that had given the world the transistor could not survive the loss of its structural shelter.

The ChatGPT moment in November 2022 was DeepMind's AT&T breakup. Suddenly the shelter of Google's patience — the implicit deal that DeepMind could pursue fundamental research as long as it remained scientifically distinguished — was replaced by competitive pressure. Pichai declared a Code Red. The merge with Google Brain was announced. Hassabis, now CEO of a 7,600-person organization, found himself speaking to Pichai "multiple times daily about model architecture and competitive intelligence" — a rhythm, Mallaby notes, that would have been unimaginable three years earlier when he ran a semi-autonomous research lab that published papers but shipped nothing.

He said: "I wanted to be like a modern day Bell Labs fostering exploratory innovation, rather than merely scaling out what's known today." After 2022, he also said: "We've had to return to almost our startup or entrepreneurial roots — be scrappier, be faster, ship things really quickly."

Both things were true at the same time.

The AlphaFold 3 Contradiction

In May 2024, five months before the Nobel announcement, DeepMind published AlphaFold 3 in Nature. The new system could predict interactions between proteins and other molecules — DNA, RNA, small-molecule drug candidates — a major advance for drug discovery. The paper was accompanied by significant scientific fanfare.

It was not accompanied by the code.

Unlike AlphaFold 2 — which had been released fully open source, which was what the Nobel Committee cited, which had been used by over three million researchers in 190 countries — AlphaFold 3 was available only through a restricted web server. Initially ten queries per day, later twenty. Predictions involving novel drug-like molecules were explicitly prohibited.

The reason was commercial. Isomorphic Labs, DeepMind's drug-discovery spinout, had been built on AlphaFold technology and had secured partnerships with Eli Lilly and Novartis worth $3 billion combined. Releasing AlphaFold 3 fully would have handed competitors the same tool. Pushmeet Kohli, DeepMind's head of AI science, stated the position plainly: "We have to strike a balance between making sure that this is accessible and has the impact in the scientific community as well as not compromising Isomorphic's ability to pursue commercial drug discovery."

Over a thousand scientists signed a protest letter describing the publication as failing "to meet the scientific community's standards of being usable, scalable, and transparent." Reviewers had asked for code access before publication; the requests had been declined. Researchers described getting access to a web server version but being unable to test the method's claims. Nature accepted the paper anyway.

Six months later — one month after the Nobel for the open-source predecessor — the code was released, but only for non-commercial use. The weights were available upon request. The commercial restrictions remained in place.

The sequence is Mallaby's material in miniature. The Nobel Prize honored the values that the AlphaFold 3 publication had already begun to retreat from. The prize celebrated the old DeepMind — the one that released its work to the world and measured success in scientific impact. AlphaFold 3 showed that the new DeepMind — embedded in Google's commercial ecosystem, running a drug-discovery spinout, operating under quarterly competitive pressure — made different choices.

Two DeepMinds

The chapter's title comes from the gap between these two things: the power, which is real and growing and now Nobel-certified, and the glory, which was earned under conditions that no longer fully apply.

AlphaFold 2's training cost was under $1 million. The combined annual AI infrastructure investment of Big Tech in 2025 exceeded $250 billion — a ratio of roughly 75 to 1 between corporate investment and federal science funding. The researchers who built AlphaFold — who worked on protein folding for years under minimal commercial pressure, in a culture that measured success by what it published — are a different population from the researchers now working on Gemini under competitive pressure, knowing that every failure becomes a front-page story about whether Google has lost the AI race.

Hassabis has been entirely clear-eyed about this. "If I'd had my way," he told one interviewer, "we would have left it in the lab for longer and done more things like AlphaFold, maybe cured cancer or something like that." He was describing his original vision — a CERN-like institution, deliberate, scientific, pursuing AGI over decades — and contrasting it with what the ChatGPT moment forced. He had not chosen the pivot. The competitive landscape had chosen it for him.

The Nobel Prize gave him something in return: political capital. Internally, the prize was partly a shield — a reminder to Google management that the old DeepMind model had produced something unprecedented, that researchers "accustomed to working on protein folding and plasma physics" could not simply be redeployed to build chatbots without loss. Externally, it was the vindication of a decade-long argument that pure science and AI capability were complementary rather than in tension.

Whether that argument holds going forward is the question the prize cannot answer. AlphaFold emerged from conditions — time, autonomy, scientific culture, freedom from commercial deadlines — that are now substantially more constrained. Gemini, DeepMind's competitive response to ChatGPT, is a serious system and an improving one; Gemini 2.5 achieved competitive results on mathematical benchmarks that would have seemed impossible three years earlier. But it emerged from a different process, under different incentives, toward different ends.

Hassabis stood in the Konserthuset in December 2024 and received a medal for work that began the day after he came home from Seoul, when the AlphaGo match was over and he was thinking about what to do next. The condition that made AlphaFold possible — the freedom to ignore commercial relevance, to pursue protein folding because it was important and tractable and worth doing — was already significantly diminished by the time the prize arrived. The power and the glory did not arrive together. The glory arrived after the conditions that produced it had changed.


Chapter 16: RaceGPT

On November 30, 2022, OpenAI made a low-key announcement: a new chatbot, available for free to the public, called ChatGPT. No press event. No keynote. A blog post. The team expected a few thousand curious users.

Within five days, one million people had used it.

Within sixty days, one hundred million had. No consumer application in the history of technology had grown that fast. TikTok had taken nine months to reach a hundred million users. Instagram had taken two and a half years. ChatGPT did it in two months — a number so extreme that investment bank UBS, running the analysis, simply called it "the fastest-growing consumer app in history" and moved on.

The Simple Insight That Changed Everything

The model behind ChatGPT was not OpenAI's most powerful. It ran on GPT-3.5, a system with roughly 175 billion parameters, fine-tuned using a technique called Reinforcement Learning from Human Feedback — RLHF, a method OpenAI had published earlier that year under the name InstructGPT.

The insight behind RLHF was deceptively simple. Earlier language models were trained to predict the next token from internet text. This made them fluent and strange: they completed text in the statistical style of whatever came next on the internet, which included a great deal of misinformation, toxicity, and incoherence. InstructGPT layered a different objective on top: have human raters compare and rank model outputs, train a separate reward model to predict those preferences, then fine-tune the language model with reinforcement learning to maximize the reward model's score.

The result was startling. A 1.3-billion-parameter InstructGPT model — fine-tuned with human feedback — outperformed the raw 175-billion-parameter GPT-3 in human evaluations. Roughly one hundred times fewer parameters, yet outputs that human evaluators preferred. The bottleneck had never been raw capability. It had been alignment — turning a system that completed text into a system that responded to humans. Once that problem was solved, the capabilities that had always been latent in the large models became accessible.
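
A minimal sketch of the reward-modeling step at the core of that pipeline may help. This is a hypothetical toy over random feature vectors, not OpenAI's code: the reward model is trained so that responses raters preferred score higher than responses they rejected, and the trained scorer then becomes the objective for a reinforcement-learning fine-tune of the language model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy reward-model training on pairwise human preferences.
# Real RLHF scores text with a language-model backbone; here "responses"
# are stand-in 16-dimensional feature vectors so the example is self-contained.

torch.manual_seed(0)
dim = 16
reward_model = nn.Linear(dim, 1)                        # toy scorer
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

chosen = torch.randn(256, dim) + 0.5                    # features of preferred responses
rejected = torch.randn(256, dim)                        # features of rejected responses

for step in range(200):
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()                 # -log P(chosen beats rejected)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final preference loss: {loss.item():.3f}")
# The trained scorer is then plugged into an RL step (PPO, in InstructGPT's case)
# that adjusts the language model toward high-reward responses.
```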

ChatGPT made that accessibility visceral. You typed a question. It answered. You asked it to write code, explain a concept to a nine-year-old, draft a legal memo, roleplay a historical figure, or debug a Python script. It did all of these things fluently, in the same conversation, with no special setup. People who typed their first query into it described the experience as unlike anything they had encountered before. The word that spread was not "impressive." It was "different."

Code Red

At Google headquarters in Mountain View, the word that spread was less neutral.

In December 2022, as ChatGPT's user chart went vertical, Sundar Pichai declared a company-wide emergency. The phrase that leaked from inside Google was "Code Red" — the vocabulary of institutional emergency: suspend normal operations, all hands on deck. Pichai held emergency meetings. Teams from Research, Trust and Safety, and other divisions were reassigned. The target was to demonstrate twenty or more new AI products and a chatbot-enabled version of Search by Google I/O in May 2023.

Google had language models. It had LaMDA, PaLM, Chinchilla. Its researchers had written many of the foundational papers in the field. For years, the deliberate judgment had been not to release them as consumer products — a combination of reputational caution about toxic outputs and strategic anxiety about cannibalizing the search advertising business that generated $160 billion a year. That caution, in retrospect, had handed OpenAI the first-mover advantage in the most significant consumer technology launch in a decade.

Larry Page and Sergey Brin had stepped back from daily operations in 2019. ChatGPT brought them back. Both held emergency meetings with Pichai and senior executives, reviewed the AI product strategy, and pitched ideas. Sergey Brin came into the office three or four days a week. On January 24, 2023 — less than two months after ChatGPT's launch — Brin filed a code request for access to LaMDA, Google's own language model. It was his first hands-on code submission in years. The co-founder of Google was personally writing code to help Google catch up to a startup.

The Expensive Error

On February 6, 2023, Google pre-announced Bard — its chatbot response to ChatGPT. An event in Paris was scheduled for February 8. Microsoft had its own AI event planned for February 7, and Google was clearly trying to move first.

The Paris event did not go as planned. In a promotional GIF that Google itself posted on social media to advertise Bard, the chatbot was asked: "What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?" Bard offered several bullet points, including the claim that the James Webb Space Telescope "took the very first pictures of a planet outside of our own solar system."

This was wrong. The first image of an exoplanet had been taken by the European Southern Observatory's Very Large Telescope in 2004, nearly two decades earlier. Reuters spotted the error before the Paris event began. The story spread immediately.

On February 8, 2023 — the day of Google's Paris AI event — Alphabet shares fell 7.7 percent. Approximately one hundred billion dollars in market capitalization was erased in a single trading session. The error had been in Google's own advertising material. It concerned a factual claim easily verifiable with a basic Google Search. It arrived on the day Google was trying to demonstrate it could compete with OpenAI. It may be the most expensive single factual error in corporate history.

Microsoft Makes Them Dance

One day before Google's Bard disaster, Microsoft unveiled the new AI-powered Bing at its Redmond headquarters. The event ran on February 7 with CEO Satya Nadella on stage, triumphant in a way that Microsoft executives are not usually permitted to be about search.

Microsoft had invested one billion dollars in OpenAI in 2019. In January 2023, it committed a further ten billion in a multiyear partnership extending through 2032. The new Bing ran on a next-generation OpenAI model more powerful than the public ChatGPT, customized for search. The waitlist accumulated over a million sign-ups in 48 hours.

Nadella's language was unambiguous: "The race starts today, and we're going to move and move fast." And then, to Fortune, after watching Google's Bard launch collapse: "I want people to know that we made them dance."

Microsoft had spent two decades as a distant also-ran in search. Bing had held roughly three percent market share to Google's ninety-three percent since 2009. For the first time, a credible path existed to challenge the most lucrative advertising franchise in the history of commerce.

The Transformer's Homecoming

The structural irony that runs through this chapter is one Mallaby returns to repeatedly. Google invented the transformer architecture in 2017. The paper — "Attention Is All You Need," by eight Google researchers including Noam Shazeer — became the foundation of every major large language model that followed, including GPT, ChatGPT, and the systems now threatening Google's core business.

All eight authors eventually left Google. Six founded startups that collectively raised $1.3 billion from outside investors.

Noam Shazeer had co-invented the transformer and spent years afterward building a conversational AI system inside Google. When Google declined to release it publicly, Shazeer left in 2021 and co-founded Character.AI, which built a conversational platform and raised $150 million at a $1 billion valuation within two years. When Google needed Shazeer back — to help build the systems to compete with the models built on his own architecture — it paid approximately $2.7 billion to acquire Character.AI in 2024.

The man Google paid $2.7 billion to rehire was the man Google had declined to give the latitude to build a conversational AI inside Google three years earlier. The architecture that powered the competitive crisis had been invented inside Google. The human who built the architecture had been allowed to leave. The cost of that sequence was measured in billions.

Tanks on the Lawn

Demis Hassabis was not calm about what happened.

When Mallaby visited him in late April 2023 to report the book, Hassabis told him directly: "This is wartime. OpenAI and Microsoft have literally parked the tanks on the lawn."

His ideal for building AGI had been explicit: "a CERN-like way," careful and scientific, over a decade or more, without the distortion of competitive racing. He had said in multiple interviews that if left to his own judgment, he "would have left it in the lab for longer and done more things like AlphaFold, maybe cured cancer or something like that." The ChatGPT moment made that vision permanently unavailable.

DeepMind had not been asleep. It had Chinchilla, Gopher, Gato, and systems arguably competitive with GPT-3.5. The difference was choice: DeepMind had made a deliberate judgment not to release chatbots, rooted in a theory that conversational AI was not the right path to AGI, and a practical concern about deploying immature systems publicly. OpenAI had made a different judgment. In the space between those two choices, the fastest-growing consumer app in history was born.

"Language was a lot easier than we were all expecting," Hassabis later said. "It turned out transformers and some reinforcement learning on top was enough." The ease was precisely what had destabilized everything. If the path to systems that could hold sophisticated conversations was as short as it turned out to be, then the careful long-horizon research strategy looked — from the outside, from the market, from Pichai's perspective — like a luxury that couldn't be afforded.

ChatGPT also, Hassabis told Mallaby, "shattered hopes of a singleton scenario in which a single, safety-minded lab could develop AGI on behalf of all humanity." The carefully governed, cooperative future he had envisioned in 2014 — that all the Project Mario governance negotiations had been in service of — was now irretrievably gone. There were not two well-funded labs racing. There were dozens.

The Acceleration

On March 14, 2023 — 104 days after ChatGPT's launch — OpenAI released GPT-4.

The numbers were precise and legible. On the Uniform Bar Exam, GPT-3.5 had scored in roughly the 10th percentile of human test-takers. GPT-4 scored in approximately the 90th percentile. In a single model generation, in 104 days, a system had moved from failing the bar exam badly to passing it better than nearly nine out of ten lawyers. On the SAT Reading it scored in the 93rd percentile. On medical licensing exam questions it scored roughly 20 percentage points above the passing threshold.

The bar exam jump became the shorthand that traveled. It wasn't just that GPT-4 was capable — it was that the rate of improvement implied by four months of progress was difficult to process. The curve was not flattening. It was steepening.

By April 2023, the Google Brain and DeepMind merger had been announced. Hassabis was now CEO of a 7,600-person organization and was speaking to Pichai multiple times daily about model architecture and competitive intelligence. The careful, scientific, CERN-like approach to AGI development that he had planned for two decades was gone, replaced by something that looked, from the outside, much more like a race.

The word Hassabis kept using for what ChatGPT had fired was "starting gun." Whether the race it started had a finish line that was good for anyone was the question he could no longer defer.


Chapter 17: We're Cooked

When Mallaby first visited Hassabis in November 2022, immediately after ChatGPT launched, the reaction was tightly controlled but unambiguous. "Sebastian," Hassabis told him, "the opposition has parked their tanks in our front yard."

By April 2023, the metaphor had intensified. "This is wartime. OpenAI and Microsoft have literally parked the tanks on the lawn." The same image, five months later, hotter. The escalation is the story of this chapter — the period in which DeepMind confronted not just a competitive setback but a deeper reckoning about the identity it had spent thirteen years constructing.

The Research Soul's Complaint

In February 2023, Hassabis gave an interview to the Swiss newspaper Neue Zürcher Zeitung that contained, buried inside a longer answer, one of the most candid things he has ever said publicly about the state of AI. He acknowledged that DeepMind would now pursue language model scaling — the approach that had produced ChatGPT — and then added: "My research soul was a bit disappointed at how inelegant the solution to the challenge of voice AI was: simply the brute force of more computing power and data."

Read that slowly. The man who had spent his career arguing that intelligence required deep structure — that you couldn't get to AGI by scaling statistics over text, that world models and causal reasoning and reinforcement learning were essential — was acknowledging that the brute force approach had worked well enough to change the entire competitive landscape. And that he was going to do it anyway.

Reviewers of Mallaby's book describe this section as the most compelling in the volume: Hassabis "undergoing a transformation from AI-utopian to wearied realist," the narrative of "a scientist who finds the winning answer philosophically unsatisfying — and must act on it anyway." This is not defeat. It is something stranger — a principled objection to one's own new strategy, held simultaneously with the strategy's execution.

Shane Legg Was Right

Shane Legg had been saying AGI was coming since 2001. He had told people who asked him that there was a 50 percent chance of AGI by 2028, based on exponentially increasing compute and data. For twenty years this had sounded like the opinion of a brilliant but unnervingly confident co-founder.

After ChatGPT, it sounded like a description of the present.

Legg, now Chief AGI Scientist of Google DeepMind, did not experience the ChatGPT moment as a crisis. He experienced it as confirmation. In an interview in October 2023, he said simply: "Something fundamental has changed." He had written in 2011 about AIXI — a theoretical framework for universal intelligence — and he saw LLMs as "incredibly good sequence predictors that are compressing the world based on all this data," directly connected to that framework. The gap from there to AGI, he said, was "just sort of another step."

He identified episodic memory as the main remaining puzzle — current models learn within context windows and during training, but miss the intermediate ongoing memory of experience. He did not see this as a wall. He saw relatively clear paths forward. His timeline had not changed in twenty-five years. What had changed was the world's relationship to it.

The crucial irony: Legg's original prediction was essentially validated by an approach DeepMind had strategically under-prioritized. The timeline he had held since 2001 — a timeline he had formed before DeepMind existed, before AlphaGo, before any of the specific research programs that defined the lab — turned out to be tracking the right curve. But the thing tracking that curve was not AlphaGo's reinforcement learning. It was transformers scaled on text. Legg was right about when. He had not necessarily been right about how.

The Walking Wounded

The brain drain that followed ChatGPT was measurable. In the twelve months after the launch, sixteen former DeepMind researchers founded or co-founded new ventures — more than double the seven from the year before. The curve tracked almost precisely to the competitive shock.

Arthur Mensch had worked on efficient language models at DeepMind Paris, contributing to Chinchilla. He left in 2023 to co-found Mistral AI, which released a competitive open-source language model within three months of founding and raised a €105 million seed round — the largest European AI seed at the time. Mensch said DeepMind was "not innovative enough" and described the satisfaction of moving from research to shipping. The implicit critique was pointed: the organization that had championed research-first over product-first was now, under competitive pressure, neither fast enough as a research organization nor committed enough to shipping.

Sid Jayakumar, who also left DeepMind for a startup around this period, was direct about the mood: "The move towards a more product focus meant morale was low among some people more on the frontier research side." The researchers who had joined for the pure science found themselves in an organization that had declared wartime, pared back blue-sky projects, stopped publishing mission-critical findings, and redirected resources toward Gemini. The publication crackdown was particularly painful — an organization whose culture of open science had been one of its primary recruiting advantages was now vetting papers before release and restricting the sharing of work that competitors might use.

The departure that Mallaby likely treats as the most significant came in January 2026, when David Silver left Google DeepMind to found Ineffable Intelligence. Silver was not a peripheral figure — he was the lead architect of AlphaGo, AlphaZero, MuZero, and AlphaProof, the researcher most responsible for DeepMind's identity as an RL lab. Sequoia Capital backed the new venture at a $4 billion valuation, the largest European AI seed ever. Silver's stated reason was a direct repudiation of the LLM era: "We want to go beyond what humans know, and to do that we're going to need a different type of method." He was betting, explicitly, that large language models were constrained by the ceiling of human knowledge, and that the path forward was RL-first systems that learned from first principles — the way AlphaGo Zero learned Go from nothing.

Silver's exit was the fullest articulation of the identity crisis in a single career decision. The man who had given DeepMind its proudest achievements believed that the direction DeepMind was now moving in was the wrong one. He left to prove it.

The Grief

The mood among senior AI researchers after ChatGPT was not just competitive anxiety. It was something closer to grief.

Yoshua Bengio — a Turing Award laureate and one of the pioneers of deep learning — spent a month with ChatGPT and progressively revised his sense of timelines. He had previously thought transformative AI was "decades to centuries" away; by mid-2023 he estimated "5 to 20 years with 90% confidence." In August 2023 he published an essay unlike anything in his academic career, titled "Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks." He wrote: "It is difficult because accepting the logical conclusions that follow means questioning our own role, the value of our work, our own sense of value... It is truly horrible to even entertain these thoughts and some days, I wish I could just brush them away." He described feeling "desperate" with "no notion of how we could fix the problem."

Geoffrey Hinton left Google in May 2023 — the timing matters — specifically so he could "talk about the dangers of AI without worrying about how it interacts with Google's business." He had previously believed AGI was thirty to fifty years away; after ChatGPT he revised to fewer than twenty. He told MIT Technology Review: "I think it's quite conceivable that humanity is just a passing phase in the evolution of intelligence." He added, separately, that "a part of him now regrets his life's work."

Eliezer Yudkowsky, whose career had been spent arguing that AI safety was the most important problem in the world, published a TIME op-ed on March 29, 2023, calling not for a pause but for a halt. "We are not prepared. We are not on course to be prepared in any reasonable time window. There is no plan... If we actually do this, we are all going to die." He proposed that the open letter calling for a six-month pause — signed by 30,000 people — was dangerously insufficient.

On May 30, 2023, the Center for AI Safety published a one-sentence statement: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." Among the 350+ signatories: Sam Altman, Geoffrey Hinton, Yoshua Bengio, and Demis Hassabis.

Hassabis, asked about his personal probability of AI causing human extinction — the "p(doom)" estimate that had become a standard question in the field — said: "It's definitely non-zero and it's probably non-negligible. So that in itself is pretty sobering." He had said for years that safety was important. Now it was urgent.

The Merger's Culture Shock

The April 2023 merger of Google Brain and DeepMind did not go smoothly, even by the standards of a merger conducted under competitive emergency.

The two organizations had coexisted for nearly a decade in what Mallaby calls "productive rivalry that frequently tipped into dysfunction." They worked on the same problems, published at the same conferences, recruited from the same PhD programs, and regularly duplicated work without knowing it. Competition for Google's compute resources was a running wound. At the 2018 NeurIPS conference, when DeepMind researchers questioned Brain scientists about their methodology, a Brain researcher replied: "If you guys hadn't been hogging all of our goddamn compute!"

The cultural gap ran deeper than compute. Google Brain was Mountain View: faster-paced, product-oriented, accustomed to public company rhythms, embedded in Google's infrastructure. DeepMind was London: academic, deliberate, multi-year research horizons, semi-autonomous by design. When Hassabis — in the first all-hands meeting of the merged organization — declared that the new unit had to return to "startup or entrepreneurial roots," being "scrappier, faster, shipping things really quickly," the Brain researchers heard an acknowledgment of their own culture. DeepMind researchers heard a description of what they had left academia to avoid.

After the merger, projects were evaluated on their relevance to Gemini's roadmap, not just scientific merit. Publication timelines were subject to new vetting. Researchers who had joined to pursue fundamental questions found themselves redirected toward commercial product cycles. One senior researcher, describing the atmosphere to Sifted, said that "some researchers felt frustration with having to stick to guidelines from leadership," and that "this pressure has created a sense of fatigue."

Hassabis had spent thirteen years building an organization that attracted researchers by promising something genuine: the freedom to pursue hard, important problems over long time horizons, inside a well-resourced lab, without the pressure to justify relevance. That promise had not been entirely false — AlphaFold existed because it was possible, for six years, to fund fifty people to solve protein structure prediction without a commercial roadmap. What ChatGPT destroyed was the structural condition that made that promise possible to keep. Once the race was fully engaged, every month of research without a product was a month of ground ceded.

The phrase "we're cooked" was not said by a specific person in a documented context. It was in the air — the AI researcher's generation-specific way of saying that something had shifted, that the timeline had collapsed, that the situation was beyond ordinary management. It captured a mood that ran from the cheerful competitive anxiety of engineers pivoting to LLMs to the genuine existential dread of researchers who had spent careers on the problem and now, watching ChatGPT's user curve, were confronting what it implied.

Hassabis was not cooked, exactly. He had a Nobel Prize, a newly merged 7,600-person organization, and the full resources of Alphabet behind him. But the version of his future that he had spent the longest time imagining — careful, scientific, CERN-like, singular — was gone. "At the back of my mind," he told Fortune in 2026, "I've got this gnawing feeling that there's something much more important, much bigger than the commercial race, which is getting AGI safely over the line for humanity." The gnawing feeling was the residue of that imagined future. The commercial race was the actual one.


Chapter 18: Step by Step

On April 20, 2023 — just under five months after ChatGPT's launch — Sundar Pichai announced the creation of Google DeepMind. The two organizations that had spent nine years competing, duplicating each other's work, and fighting over compute were merged into a single entity under Demis Hassabis as CEO.

The combined unit had roughly 7,600 people. Hassabis had gone from running a semi-autonomous research lab in London to leading one of the largest AI organizations in the world. Jeff Dean, who had built and led Google Brain since its founding, became Chief Scientist of Google — a prestigious title that, in practice, removed him from the operational center of AI development at exactly the moment it had become the most important battlefield in technology. It was the kind of organizational transition that looks like a promotion from the outside and like something else from the inside.

Three weeks after the merger announcement, at Google I/O on May 10, Hassabis publicly announced Gemini.

The Vision and the Race

The word Hassabis used to describe what he was trying to build was "natively multimodal." Unlike GPT-4, which had begun as a text model and had vision bolted on later, Gemini was designed from the foundation up to process text, images, audio, and video through shared network layers. The analogy Hassabis offered in a June 2023 Wired interview was precise and revealing: "At a high level you can think of Gemini as combining some of the strengths of AlphaGo-type systems with the amazing language capabilities of the large models." Reinforcement learning and tree search — AlphaGo's core techniques — would give Gemini planning and problem-solving capabilities that pure language models lacked.

This was the thesis he had maintained through the entire LLM era: that RL and language modeling were not competitors but complements, and that the combination was the path to something genuinely closer to general intelligence. He had said it about Gato in 2022. He was now saying it about Gemini under genuinely competitive pressure, which changed the stakes considerably.

The development process was, by all accounts, intense. Hundreds of engineers from both Brain and DeepMind were redirected to the effort. Sergey Brin — who had returned to Google after ChatGPT's launch and was personally filing code as late as January 2023 — remained a "core contributor" to Gemini's training. The model was trained on Google's TPU infrastructure at a scale that required tens of thousands of chips and included YouTube transcripts, diverse multimodal data across all modalities, and a legal review process to filter copyrighted content. Hassabis described the competitive environment as "ferocious," with veteran employees calling it "the most intense environment they'd ever seen, perhaps ever in the technology industry." He spoke to Pichai every day.

December 6, 2023

Gemini 1.0 launched on December 6, 2023. Three tiers: Ultra, for highly complex tasks; Pro, for a wide range of tasks, immediately rolled out to Bard in English across 170 countries; and Nano, for on-device deployment, integrated into Pixel 8 Pro smartphones.

The headline technical claim was one that had clear symbolic weight. Gemini Ultra achieved 90.0 percent on MMLU — the Massive Multitask Language Understanding benchmark, covering 57 subjects including mathematics, physics, history, law, medicine, and ethics — making it the first AI model to exceed human expert performance on that test. GPT-4 had scored 86.4 percent. The 90 percent threshold was not just a benchmark; it was a number that communicated, to anyone paying attention, that the gap between the best AI and the best humans on standardized knowledge tests had closed.

The demonstration video that accompanied the launch did not hold up as well as the benchmark. The video appeared to show Gemini understanding live video and audio in real time — a child drawing, a cup being spun, a game of rock-paper-scissors. In reality, the latency had been reduced and outputs shortened in editing, and the prompts used were pre-written text inputs, not live voice or video. In the rock-paper-scissors sequence, the actual prompt included a hint: "Hint: it's a game." One of the most acclaimed demonstrations of AI capability in 2023 had been staged.

Oriol Vinyals, one of DeepMind's most senior researchers, defended the video: "All the user prompts and outputs in the video are real, shortened for brevity...We made it to inspire developers." Critics argued that the distinction between "real outputs, staged demo" and "fabricated outputs" was doing a lot of work. The controversy was manageable, but it arrived at exactly the moment Google most needed to demonstrate that it could match OpenAI without shortcuts.

AlphaCode 2

On the same day as the Gemini announcement, DeepMind released a technical report on AlphaCode 2: a system built on Gemini Pro that competed in Codeforces programming contests.

The original AlphaCode, released in early 2022, had performed at roughly the median level of competitive programmers — better than about half of all entrants. AlphaCode 2 scored in the 85th percentile, solving 43 percent of problems compared to AlphaCode's 25 percent. On two of the twelve contests evaluated, it outperformed 99.5 percent of participants.

In Codeforces's rating taxonomy — Newbie, Pupil, Specialist, Expert, Candidate Master, Master, and beyond — AlphaCode 2 positioned itself between Expert and Candidate Master, among the serious competitive programmers. More than the raw percentile, the sample efficiency was striking: AlphaCode 2 needed only about a hundred generated solutions per problem to match what AlphaCode had required a million attempts to achieve. The system had not just improved. It had become roughly ten thousand times more sample-efficient at finding correct solutions.
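
The published AlphaCode approach, which AlphaCode 2 refined, is essentially generate-and-filter: sample many candidate programs, keep only those that pass the problem's public example tests, then choose among the survivors. A minimal sketch of that loop follows; the sampling and execution functions are hypothetical placeholders.

```python
# Sketch of the generate-and-filter strategy behind AlphaCode-style systems.
# `sample_program` and `passes_example_tests` are hypothetical placeholders;
# the real systems also cluster and rank surviving candidates before submitting.

def sample_program(problem_statement: str) -> str:
    """Placeholder: draw one candidate solution from the code-generating model."""
    raise NotImplementedError

def passes_example_tests(program: str, examples: list[tuple[str, str]]) -> bool:
    """Placeholder: run the program on the public examples and compare outputs."""
    raise NotImplementedError

def solve(problem_statement: str, examples: list[tuple[str, str]], budget: int) -> list[str]:
    survivors = []
    for _ in range(budget):               # budget ~1,000,000 for AlphaCode,
        candidate = sample_program(problem_statement)   # ~100 for AlphaCode 2
        if passes_example_tests(candidate, examples):
            survivors.append(candidate)
    return survivors                      # candidates to cluster and rank
```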

The PhD Student's Four Years

The research result that most clearly embodied the chapter's title arrived not from the competitive product side but from the scientific side, in January 2024. AlphaGeometry, published in Nature on January 17, solved 25 of 30 recent International Mathematical Olympiad geometry problems. The average human IMO gold medalist solves 25.9. The previous AI state of the art solved 10. GPT-4, tested standalone, solved zero.

The researcher at the center of it was Trieu H. Trinh, a Vietnamese computer scientist who had graduated from Ho Chi Minh City University of Science, joined Google Brain in California, then left in 2019 for a PhD at NYU's Courant Institute. His advisor, He He, later described his "doggedness and dedication." Trinh had decided to use IMO geometry as what he called "a more toy example" before tackling the grand challenge of mathematical reasoning. He spent four years on it.

The architecture he built implemented a specific kind of step-by-step reasoning. A language model handled the creative part — proposing auxiliary constructions, the new points and lines and circles that geometry proofs often require and that humans find through intuition. A symbolic deduction engine handled the rigorous part — verifying each logical step, extending the proof chain, confirming that the construction the language model proposed actually led somewhere. When the symbolic engine got stuck, it called the language model. The language model suggested a construction. The symbolic engine verified it. The loop continued until a proof emerged.
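
In pseudocode the loop is compact. This is a schematic sketch of the published idea, not DeepMind's implementation; both components are stand-ins.

```python
# Schematic of AlphaGeometry's neuro-symbolic loop: a symbolic engine deduces as
# far as it can, and a language model proposes an auxiliary construction whenever
# deduction stalls. Both components here are hypothetical placeholders.

def symbolic_deduce(premises: set[str]) -> set[str]:
    """Placeholder: exhaustively apply geometry deduction rules to the premises."""
    raise NotImplementedError

def propose_construction(premises: set[str], goal: str) -> str:
    """Placeholder: language model suggests a new point, line, or circle to add."""
    raise NotImplementedError

def prove(premises: set[str], goal: str, max_rounds: int = 16) -> bool:
    for _ in range(max_rounds):
        derived = symbolic_deduce(premises)       # rigorous, verifiable steps
        if goal in derived:
            return True                           # proof found
        premises = premises | derived
        premises.add(propose_construction(premises, goal))   # the creative leap
    return False                                  # give up after max_rounds
```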

This was not approximation or pattern-matching. The outputs were machine-verifiable and human-readable — sequences of reasoning steps that could be checked against the axioms of Euclidean geometry. Evan Chen, a mathematician and math competition coach, said: "AlphaGeometry's output is impressive because it's both verifiable and clean...It uses classical geometry rules with angles and similar triangles just as students do."

The training data was entirely synthetic: one billion random geometric diagrams, from which symbolic reasoning extracted 100 million unique geometric proof examples. No human-written proofs. No human demonstrations. The language model learned to propose constructions by seeing geometry, not by being shown what good geometry looked like.

Trinh's four-year project — quietly proceeding while the rest of the organization pivoted to Gemini, while ChatGPT launched and the wartime posture descended — was exactly the kind of long-horizon fundamental research that DeepMind had been built to pursue. It arrived in the Nature papers queue while the organization around it was declaring that such work would have to be deprioritized. The timing was its own kind of statement.

One Million Tokens

On February 15, 2024, Google announced Gemini 1.5 Pro. The headline number was one million tokens — the context window, meaning the amount of information the model could hold in attention simultaneously. In practical terms: one hour of video, eleven hours of audio, thirty thousand lines of code, or roughly seven hundred thousand words of text. All at once, all in context, all available for the model to reason over without the information having been compressed or summarized away.

GPT-4 Turbo's context window was 128,000 tokens. Gemini 1.5 Pro's was nearly eight times larger.
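
The back-of-envelope arithmetic behind those equivalences is simple. The conversion rates below are rough assumptions for illustration (about 1.3 tokens per English word, roughly 256 tokens per video frame sampled at one frame per second), not official figures.

```python
# Rough token-budget arithmetic for a 1,000,000-token context window.
# The per-word and per-frame rates are approximate assumptions for illustration.

CONTEXT = 1_000_000

tokens_per_word = 1.3      # typical for English text (assumption)
tokens_per_frame = 256     # rough cost of one video frame (assumption)
frames_per_second = 1      # long-video inputs are usually sampled sparsely

words_that_fit = CONTEXT / tokens_per_word                              # ~770,000 words
video_minutes = CONTEXT / (tokens_per_frame * frames_per_second * 60)  # ~65 minutes

print(f"~{words_that_fit:,.0f} words or ~{video_minutes:.0f} minutes of video")
print(f"That is {CONTEXT / 128_000:.1f}x GPT-4 Turbo's 128,000-token window")
```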

The system was built on a Mixture-of-Experts architecture — a design in which different "expert" subnetworks activate for different types of inputs, allowing the model to achieve the capability of a much larger system at a fraction of the compute cost. Gemini 1.5 Pro matched or exceeded Gemini 1.0 Ultra on most benchmarks while requiring substantially less compute to train and run.
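
The routing idea at the heart of a Mixture-of-Experts layer can be sketched in a few lines of plain NumPy. The dimensions, number of experts, and top-k choice below are illustrative assumptions, not Gemini's configuration.

```python
import numpy as np

# Toy Mixture-of-Experts layer: a router scores each token against every expert,
# and only the top-k experts actually run, so compute per token stays small even
# though total parameters are large. Shapes and k are illustrative assumptions.

rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 8, 2

router_w = rng.normal(size=(d_model, n_experts))                  # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router_w                                     # one score per expert
    top = np.argsort(scores)[-k:]                                 # pick the k best experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()       # softmax over chosen experts
    # Only k of the n_experts weight matrices are used: that is the compute saving.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.normal(size=d_model))
print(out.shape)   # (64,)
```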

Google demonstrated the long-context capability by feeding 1.5 Pro an entire 44-minute silent film and asking it to describe plot points, character actions, and small details scattered across the footage. The "needle in a haystack" retrieval test — finding a single piece of information embedded in a massive text — showed near-perfect recall at 1 million tokens, degrading only slightly to 99.2 percent at 10 million tokens in experimental tests.
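
The needle-in-a-haystack test itself is simple to set up. A minimal harness looks like the sketch below, where `ask_model` is a hypothetical stand-in for the model being probed.

```python
# Minimal needle-in-a-haystack harness: bury one distinctive fact at a chosen
# depth inside long filler text and check whether the model can retrieve it.
# `ask_model` is a hypothetical placeholder for the model under test.

def ask_model(context: str, question: str) -> str:
    raise NotImplementedError

def needle_trial(filler_sentences: list[str], depth: float) -> bool:
    needle = "The magic number Arthur chose is 48151623."
    position = int(len(filler_sentences) * depth)        # depth in [0, 1]
    doc = filler_sentences[:position] + [needle] + filler_sentences[position:]
    answer = ask_model(" ".join(doc), "What magic number did Arthur choose?")
    return "48151623" in answer

# A full sweep varies both context length and needle depth and reports the
# fraction of trials retrieved correctly: the "99 percent at one million tokens"
# figure is that fraction at the largest context length tested.
```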

Jeff Dean, now Chief Scientist, promoted the results publicly and repeatedly. The message was specific: this was not GPT-4 with more features. It was a different architectural bet on what capability required. Where OpenAI had pushed parameter count, Google DeepMind had pushed context length and compute efficiency. Whether the bet would translate to user preference was a separate question.

What Step by Step Means

The title of this chapter captures several things at once.

The organizational reconstruction of Google DeepMind was a step-by-step process — there was no single moment when the two organizations became one, when the culture wars ended, when the research-product tension resolved. Researchers who had joined to work on fundamental science found projects redirected; those who had come from Brain found new colleagues suspicious of their Mountain View instincts. The integration was ongoing in a way that Pichai's announcement on April 20 had obscured.

The technical approach DeepMind was now advancing — AlphaGeometry's neuro-symbolic loop, SELF-DISCOVER's reasoning modules, chain-of-thought decoding — was literally step-by-step. The insight common to all of these systems was the same: AI did not need to produce correct answers in a single forward pass if it could break problems into intermediate steps, verify each step, and revise when a step failed. The ability to reason in sequence, with verification, was what separated genuinely capable problem-solving from confident guessing.

And Hassabis's own stated philosophy about AGI was step-by-step. He had said it consistently since AlphaGo: "one or two more big breakthroughs," a transformer-level or AlphaGo-level insight, applied in sequence. Not a single moment of emergence. Not a sudden crossing of a threshold. A series of specific advances, each building on the last, until the accumulation reached something categorically new.

AlphaGeometry was one of those steps. Gemini 1.5 Pro's long-context window was one. The 90 percent MMLU score was one. What the next step was, and how many steps remained, was the question Mallaby leaves the chapter poised over — unsettled, as it should be, because no one knew.


Chapter 19: Comeback, and Beyond

In September 2023, while the Gemini team was racing toward its December launch date and the post-merger culture clash was working itself out in labs on two continents, a quieter paper appeared in Science. It described a system called AlphaMissense.

The human genome contains approximately 71 million possible missense variants — single-letter DNA substitutions that cause a different amino acid to be produced in a protein, which can disrupt function, cause disease, or do nothing at all. Of these 71 million, scientists had experimentally characterized about 0.1 percent. The other 99.9 percent were a medical mystery: when a patient arrived with an unusual genetic variant, clinicians often had no basis for judging whether it was the cause of their condition or an innocent bystander.

AlphaMissense processed all 71 million variants. It classified 89 percent of them — 57 percent as likely benign, 32 percent as likely pathogenic. It was not a diagnosis. It was a probabilistic catalog, a starting point for clinical investigation that had not existed before. The predictions were made freely available for both commercial and scientific use. The model code was open-sourced and integrated with the global genomics infrastructure. For rare disease diagnosis — where a patient may have an unclassified variant and no benchmark for its significance — it was the kind of tool that could change the outcome of a clinical workup in an afternoon rather than after months of laboratory work.
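
Operationally, the catalog is a score per variant plus two cutoffs. The sketch below shows the three-way binning; the threshold values are illustrative assumptions, not the calibrated cutoffs from the paper.

```python
# Sketch of how a per-variant pathogenicity score becomes a three-way call.
# The cutoff values here are illustrative assumptions; AlphaMissense calibrated
# its own thresholds so that ~89% of variants fall outside the uncertain band.

def classify(pathogenicity_score: float,
             benign_cutoff: float = 0.34,
             pathogenic_cutoff: float = 0.56) -> str:
    if pathogenicity_score <= benign_cutoff:
        return "likely_benign"        # ~57% of the 71 million variants
    if pathogenicity_score >= pathogenic_cutoff:
        return "likely_pathogenic"    # ~32% of the 71 million variants
    return "uncertain"                # the remaining ~11%

print(classify(0.91))   # likely_pathogenic
print(classify(0.10))   # likely_benign
```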

AlphaMissense received a fraction of the attention that Gemini received three months later. This distribution of attention — a commercially irrelevant scientific breakthrough generating quiet acknowledgment while a chatbot launch generated front-page coverage — captures something true about the period this chapter describes.

Gemini's Comeback

The original Gemini launch in December 2023 had been widely read as underwhelming. Gemini Ultra matched GPT-4 on benchmarks but did not clearly surpass it. The staged demo controversy had undermined the marketing. The gap between the benchmark claims and what Gemini Pro actually delivered in the hands of early users was visible.

The comeback happened in stages.

Gemini 1.5 Pro, announced in February 2024, established a genuine structural advantage: a one million token context window, extended later to two million, compared to GPT-4 Turbo's 128,000 tokens. At scale this was not a marginal improvement — it meant Gemini 1.5 Pro could hold an entire hour of video, eleven hours of audio, or thirty thousand lines of code in active attention simultaneously, without compression or summarization. On retrieval benchmarks — the "needle in the haystack" tests measuring whether a model could locate specific information buried in a massive context — it achieved 99 percent accuracy up to one million tokens. This was a technical lead that mattered for real applications: codebases, legal documents, long research contexts, multimedia analysis.

Then in March 2025, Gemini 2.5 Pro launched and debuted at number one on the Chatbot Arena leaderboard — the human-preference benchmark run independently by the LMSYS group of researchers at UC Berkeley — with the largest score jump ever recorded in the leaderboard's history. It led simultaneously in mathematics, creative writing, instruction-following, long-query handling, and multi-turn conversation. On graduate-level science reasoning (GPQA Diamond), it scored 84 percent. On mathematics competition problems (AIME 2025), it matched OpenAI's best reasoning model within a fraction of a percent. On multimodal benchmarks, it led the field.

On software-engineering tasks (SWE-bench), it trailed Claude 3.7 Sonnet at 63.8 percent versus 70.3. The comeback was real, but the frontier moves fast — by mid-2025, Claude 4 and GPT-5 variants had taken the coding lead again. What Gemini's trajectory showed was not permanent dominance but genuine competitive presence: an organization that had looked outclassed in early 2023 was, two years later, producing models that no reasonable observer could dismiss.
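
Chatbot Arena scores come from pairwise human votes, so a "score jump" is a statement about rating updates rather than a fixed test. A simplified Elo-style update conveys the mechanics; the Arena's actual statistical model is more careful, and the rating values below are invented for illustration.

```python
# Simplified Elo-style update illustrating how pairwise human preferences become
# a leaderboard rating. The real Chatbot Arena uses a more rigorous statistical
# fit with confidence intervals; this shows only the core idea.

def expected_win_prob(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 4.0):
    expected = expected_win_prob(rating_a, rating_b)
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - expected)
    return rating_a + delta, rating_b - delta

# One vote where the lower-rated model beats the higher-rated one moves both ratings:
print(update(1300.0, 1360.0, a_won=True))   # approximately (1302.3, 1357.7)
```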

AlphaFold 3

In May 2024, DeepMind and Isomorphic Labs published AlphaFold 3 in Nature. The original AlphaFold 2 had solved protein structure prediction. AlphaFold 3 extended the same framework to predict the structure and interactions of all major biological molecules: proteins, DNA, RNA, small-molecule drugs, antibodies, and the chemical modifications that control cellular function. The key expansion was drug-like small molecules — the category that includes most pharmaceuticals, and the category AlphaFold 2 could not handle.

The accuracy improvements were substantial. On the PoseBusters benchmark — measuring how accurately a system predicts where a drug molecule binds to its protein target — AlphaFold 3 was at least 50 percent more accurate than the best existing methods, and was described as the first AI system to surpass physics-based docking tools on this task. For antibody-antigen interactions, for protein-nucleic acid binding, for the modifications that control protein function: in each category, AlphaFold 3 substantially exceeded previous state-of-the-art.

The architecture used a diffusion network in place of AlphaFold 2's structure module — the same approach that powers AI image generation, adapted to produce molecular geometries rather than pixel arrays. The result was a system that could generate not just the most likely structure but a distribution over possible structures, capturing the flexibility that many biologically and pharmaceutically important molecules exhibit.
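
The diffusion idea is easy to state in miniature: start from random coordinates and repeatedly denoise them toward a plausible structure, so that repeated sampling yields a distribution over structures rather than a single answer. The toy sketch below uses a placeholder denoiser standing in for the trained network, and the step rule is deliberately simplified; it is not AlphaFold 3's module.

```python
import numpy as np

# Toy diffusion-style sampler over 3D atom coordinates. `predict_denoised` is a
# hypothetical placeholder for a trained network; the noise schedule and update
# rule are simplified. Repeated calls to sample() give different plausible
# structures, which is how a distribution over conformations arises.

rng = np.random.default_rng()

def predict_denoised(noisy_coords: np.ndarray, noise_level: float) -> np.ndarray:
    """Placeholder for the trained denoising network."""
    raise NotImplementedError

def sample(n_atoms: int, steps: int = 50) -> np.ndarray:
    coords = rng.normal(size=(n_atoms, 3))                 # pure noise to start
    for t in reversed(range(1, steps + 1)):
        noise_level = t / steps
        estimate = predict_denoised(coords, noise_level)   # network's best guess
        coords = coords + (estimate - coords) / t          # small step toward it
        coords += rng.normal(scale=0.01, size=coords.shape) * noise_level  # stay stochastic
    return coords                                          # one sampled structure
```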

The controversy was the same as before, but sharper. AlphaFold 2 had been released fully open-source — that was what the Nobel Committee had cited, that was what three million researchers in 190 countries had used. AlphaFold 3 launched without the code, accessible only through a capped web server that explicitly blocked predictions involving novel drug-like molecules. More than a thousand scientists signed a protest letter. The paper was published in Nature without the peer reviewers having seen the code.

Pushmeet Kohli, DeepMind's head of AI science, stated the position plainly: the lab had to "strike a balance" between scientific accessibility and "not compromising Isomorphic's ability to pursue commercial drug discovery." Six months later — one month after the Nobel Prize for the open-source predecessor — the code was released for non-commercial academic use. The model weights required a request process. The commercial restrictions remained.

The sequence was a precise demonstration of the tension Mallaby documents throughout the book. The Nobel celebrated the values that had made AlphaFold 2 transformative: open publication, free access, science as a public good. AlphaFold 3 operated under the values of the commercial organization that DeepMind had become: science as competitive advantage, access calibrated to protect Isomorphic's drug discovery business.

The Drug Discovery Bet

Isomorphic Labs, the Alphabet spinout founded in 2021 to commercialize DeepMind's biological AI, had its most significant validation moment in January 2024. In two deals announced simultaneously, it signed research partnerships with Eli Lilly ($45 million upfront, up to $1.7 billion in performance milestones) and Novartis ($37.5 million upfront, up to $1.2 billion in milestones). Combined potential value: nearly $3 billion.

These were not press releases dressed as deals. Eli Lilly and Novartis were paying real money, upfront, before any drug had entered clinical trials — for the right to use Isomorphic's AI-driven molecular design platform on specific undisclosed targets. In early 2025, the Novartis partnership was expanded. In March 2025, Thrive Capital led a $600 million Series A — Isomorphic's first outside capital, external validation of the thesis from one of technology's most disciplined investors.

By mid-2025, Isomorphic's president was describing the company as "getting very close" to human clinical trials. The focus areas are oncology and immunology. The expected timeline for first Phase I trials is late 2026 at the earliest. If those trials proceed to Phase II and III, a commercially successful AI-designed drug is still a decade away by conventional pharmaceutical development timelines — which are notoriously unpredictable, with roughly 10 percent of drug candidates that enter Phase I ultimately reaching approval.

Hassabis has described his target: "a $100 billion-plus AI drug discovery business." The vision is specific enough to be measured against. The proof-of-concept — an AI-designed molecule in human clinical trials — has not yet arrived.

What AGI Means to Hassabis

Asked to define AGI, Hassabis consistently sets a bar that most other people in the field do not. He does not mean a system that passes the bar exam or scores above human experts on MMLU. He means a system capable of genuine invention: formulating new theories in physics, proposing new research directions, designing original experiments that no human has thought to run.

"We don't have systems yet that can do that type of creativity," he has said. The distinction matters because it separates solving a known conjecture from generating a new conjecture — a task that requires not just capability but a kind of scientific curiosity that current systems do not exhibit.

What he says is still missing: hierarchical planning, long-term memory, hypothesis generation, and a genuine world model — an intuitive understanding of physical causality that would allow an AI to reason about consequences, not just predict outputs. He has articulated a two-step requirement for autonomous scientific AI: first, a world model that understands physical reality; second, automated experimentation — the ability to ask questions, design tests, run them, and iterate. When those two components are connected into a closed loop, the system could in principle do independent science. That remains ahead.

His timeline, consistently stated since 2024: a 50 percent chance of AGI by 2030, with "5 to 10 years" as his public range. This puts him in the mainstream rather than the extreme wing of AGI prediction. He also says consistently that scaling alone will not close the remaining gap. "My guess is one or two more big breakthroughs — I'm talking like a Transformer level or AlphaGo level type of breakthrough" — will be required for the reasoning and planning components that current LLMs still struggle with.

The Honest Assessment

By early 2026, Mallaby's book can draw a balance sheet.

On the scientific side, the verdict is unambiguous. AlphaFold 2 won the Nobel Prize and transformed structural biology for three million researchers in 190 countries. AlphaMissense catalogued 71 million genetic variants for disease research. AlphaFold 3 extended molecular prediction to drug interactions. AlphaGeometry matched gold-medalist level on IMO geometry. AlphaCode 2 reached the 85th percentile of competitive programmers. These results represent a coherent scientific AI program that no other organization has replicated at comparable depth.

On the commercial side, the picture is more complicated. OpenAI's annualized revenue exceeded $20 billion heading into 2026. Anthropic's was approaching $4 billion. Gemini's 750 million monthly active users rival ChatGPT's scale, but Google's monetization of Gemini runs through an ecosystem — Search, Cloud, Android, Workspace — rather than a standalone product. Isomorphic's drug discovery thesis won't have its proof-of-concept in human trials until late 2026 at the earliest, and commercial outcomes from drug development run on decade-scale timelines.

Hassabis has a theory about why the scientific heritage matters for the race, even now. The scaling-first approach — bigger models, more compute, more data — has produced genuinely impressive language models. But he believes the next set of breakthroughs, the ones that close the remaining gap between current AI and genuine AGI, will require the same kind of domain-specific architectural insight that AlphaGo Zero, AlphaFold 2, and AlphaGeometry each required. You cannot pure-scale your way to a world model. You cannot iterate your way to automated hypothesis generation. At some point — if his theory is right — the lab with the deepest understanding of what intelligence actually requires will have an advantage that accumulated parameters cannot easily replicate.

That theory has not been proven. It might be wrong. But the book's underlying question — whether Hassabis's bet on fundamental research over product-first AI is ultimately vindicated — arrives here still open, which is exactly where it belongs.


Epilogue: Turing's Champion

In October 1950, Alan Turing published a paper in the journal Mind that asked a question so fundamental it has not yet been answered. "Can machines think?" he opened — and then, characteristically, dissolved the question before it could harden into unanswerable philosophy. Instead of wrestling with consciousness and definition, he proposed a test: if a judge communicating by text cannot reliably distinguish a machine from a human, the question of whether the machine "really" thinks becomes practically irrelevant.

Turing made two predictions. Within fifty years, he wrote, computers would be able to play the imitation game well enough that an average interrogator would have no better than a 70 percent chance of identifying them correctly after five minutes of questioning. And by the end of the century, "the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted."

Both predictions have been vindicated. The first by GPT-4 and Gemini; the second by every newspaper published in 2024.

But the most prophetic part of the paper was not the imitation game. It was a section near the end titled "Learning Machines." Rather than trying to engineer an adult mind directly — an intractably complex task — Turing proposed building a simple "child machine" and educating it through reward and punishment, mirroring natural development. He described nets of logical components whose properties could be "trained" into a desired function. He was, in 1950, describing deep learning and reinforcement learning three decades before they existed.

When AlphaGo Zero taught itself to play Go through self-play alone — starting from random moves, with no human knowledge, discovering within days strategies no human had found in five thousand years of the game — it was, in the most direct technical sense, the realization of Turing's child machine reaching adulthood. Turing had imagined it. Hassabis had built it.

The Table Is Screaming

Late at night, at his desk in London, Hassabis will sometimes stop working and feel what he describes as reality demanding his attention. He told Mallaby about it directly — rapping his palm on the table as he spoke: "This table, Sebastian! Why should it be solid? Computers are just bits of sand and copper. Why should these combine to do anything?"

This is not a scientist's rhetorical flourish. It is the operative emotion behind everything. Hassabis has described doing science as "reading the mind of God" — his religion, in a sense, the thing underneath the ambition and the competition and the Nobel Prize and the commercial pressures. The universe is structured in ways that can be understood, and those structures are information, and intelligence is what processes information into understanding, and if you build a sufficient intelligence you could, in principle, understand everything. He wanted nothing less than this: an omniscient machine, a tool for closing the gap between human consciousness and the fabric of reality itself.

This is what makes the story Mallaby is telling both thrilling and vertiginous. It is not, at its core, a technology story. It is a story about a person who looked at the strangeness of existence and decided, with complete seriousness, to do something about it.

The Oppenheimer Frame

Mallaby's most explicit historical parallel arrives near the end of the book. J. Robert Oppenheimer created the atomic bomb. He understood what he was building. He signed a letter of revulsion to the Secretary of War after Trinity. He testified against the hydrogen bomb program. He was stripped of his security clearance in 1954 for his trouble, exiled from the policy of the weapon he had made. The thing he built continued without him.

Oppenheimer had said, of the decision to build the bomb: "When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success." This is the phrase that echoes through Mallaby's account of Hassabis. Geoffrey Hinton captured the same structure when he said that "the thrill of discovery is so big that even if you're very worried about its implications, it's impossible to resist." The technically sweet problem is not a personal failure. It is a civilizational condition.

Mallaby's question about Hassabis is not accusatory. It is tragic: "He wants to do good, but can he be good?" Hassabis understands the dangers. He signed the extinction risk statement. He has called his p(doom) non-negligible. He speaks about the need for safety-minded organizations to stay in the race as the argument for staying in it. He has said that by exiting, he would not advance safety. This is probably true. It is also precisely what any competent actor in this position would have to say, regardless of whether it were true.

Project Mario — the three-year effort to create independent governance structures for AGI development — failed entirely. The ethics board promised in the 2014 acquisition never functioned. The AlphaFold 3 open-source restrictions showed that, when commercial pressures met scientific values, the commercial pressures prevailed. The safety problem, Hassabis told Mallaby, is "soluble." It is also not guaranteed to be solved.

Oppenheimer could not control his creation. Perhaps, Mallaby writes, "this is the privilege and fate of all history's great scientists."

The Guest Book

In December 2024, at the Nobel Foundation in Stockholm, Hassabis signed the laureates' guest book — the book that has been signed since 1952, containing the names of everyone who has stood in that building to receive science's highest honor. Einstein, 1921. Watson and Crick, 1962. Feynman, 1965.

"They're all there, all my heroes," Hassabis told Mallaby. "I get goosebumps just even talking about it."

The specific weight of this moment: Hassabis grew up watching The Race for the Double Helix. As a teenager, he read about Turing. As a student, he studied Feynman. These were not distant figures in the history of science — they were the people whose understanding of the world he had spent his life trying to extend. And now his name was among them, in the book in Stockholm, for work on a problem that did not exist when any of them was alive.

The Nobel honored AlphaFold — a system that predicted protein structures by learning patterns from evolutionary data, vindicating the thesis that intelligence applied to biology could accelerate science by decades. The same thesis, extended to every scientific domain, is the premise of everything Hassabis believes about what comes next.

The Clock

On January 27, 2026, the Bulletin of the Atomic Scientists set the Doomsday Clock to 85 seconds to midnight — the closest it has ever stood in its 79-year history. For the first time in the clock's existence, artificial intelligence was explicitly named as a co-driver of the setting, alongside nuclear weapons and climate change.

The AI Safety Clock, maintained separately, stood at 18 minutes to midnight in early 2026 — having advanced nine minutes in twelve months, with the largest single jump driven by autonomous AI agents and the Pentagon's declaration of intent to become "an AI-first warfighting force."

A survey of 59 AI safety researchers published in February 2026 reported a median p(doom) — probability of human extinction or permanent disempowerment before 2100 — of 25 percent. Mean was 34. Seventy-three percent expected AGI by 2035. The binding constraint on safety work, the researchers said, was talent, not funding.

Hassabis has said the safety problem is soluble. He has also said that the race is not something any individual or organization can stop. These two things are simultaneously true and do not resolve each other. The international governance frameworks that might bridge the gap between "soluble in principle" and "solved in practice" do not yet exist in a form adequate to the problem. The organizations founded on safety rationales are the same organizations accelerating capabilities. The labs building the most powerful systems are the same labs arguing that they should be trusted with the outcome.

What Turing Left Unsaid

Turing's 1950 paper ends on a note of unusual humility for a man whose confidence was otherwise a feature rather than a bug. "We can only see a short distance ahead," he wrote, "but we can see plenty there that needs to be done."

This is the right register for where the story stands. Hassabis is not Oppenheimer exactly — the analogy is suggestive, not precise, and Mallaby is careful to hold it as a question rather than a verdict. What has been built in the decades since Turing published is extraordinary and documented: an artificial system that mastered Go by playing against itself until it had surpassed every human; a system that solved in two years a problem that had resisted fifty years of dedicated effort from the best structural biologists alive; a system that can pass the bar exam, compose coherent arguments, model protein interactions, classify genetic mutations, write code at the 85th percentile of competitive programmers. The child machine has grown.

What comes next — whether the remaining steps to AGI are two or twenty, whether the safety problem is solved before the capabilities make its solution irrelevant, whether Hassabis's bet that scientific rigor and AGI ambition can coexist will prove right — none of this can be seen from here. The book does not pretend otherwise.

What Mallaby offers instead is a portrait of the person at the center of this particular moment in history: a chess prodigy from London who became obsessed with the question of how minds work, who declined the video game industry at twenty-two because it wasn't the problem, who spent his career building systems that surprise their creators, who won a Nobel Prize and then immediately had to rebuild his organization to compete in a race he had hoped to avoid, who sits at night in his office feeling reality scream at him from the surface of a table, who believes the universe is made of information and that intelligence is the instrument by which that information becomes understanding.

He is, in Mallaby's framing, Turing's champion — the person who took the child machine seriously, built it, tested it against Go and proteins and geometry and language, watched it exceed everything humanity had learned, and now stands at the edge of what comes next, holding both the prize and the responsibility, not entirely sure they can be held together.

Turing said we can only see a short distance ahead. That has not changed. There is still plenty that needs to be done.
