
578 posts tagged with "insider"


Structured Outputs and Constrained Decoding: Eliminating Parsing Failures in Production LLMs

· 9 min read
Tian Pan
Software Engineer

Every team that ships an LLM-powered feature learns the same lesson within the first week: the model will eventually return malformed JSON. Not often — maybe 2% of requests at first — but enough to require retry logic, output validators, regex-based fixers, and increasingly desperate heuristics. This "parsing fragility tax" compounds across every downstream consumer of your model's output, turning what should be a straightforward integration into a brittle mess of try/catch blocks and string manipulation.

Structured outputs — the ability to guarantee that a language model produces output conforming to a specific schema — eliminates this entire failure class. Not reduces it. Eliminates it. And the mechanism behind this guarantee, constrained decoding, turns out to be one of the most consequential infrastructure improvements in production LLM systems since function calling.
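To make the mechanism concrete, here is a minimal sketch of constrained decoding in Python. The grammar and model interfaces (`grammar.allowed_tokens`, `model.next_token_logits`) are hypothetical stand-ins; real implementations compile a JSON Schema into a state machine over the tokenizer's vocabulary, but the core move is the same: mask out every token that would violate the schema before sampling.

```python
# Sketch of constrained decoding: at each step, tokens that would violate
# the target schema are masked out before the next token is chosen.
# The grammar and model objects here are hypothetical stand-ins.
import math

def constrained_generate(model, tokenizer, prompt, grammar, max_tokens=256):
    state = grammar.initial_state()               # hypothetical grammar state machine
    tokens = tokenizer.encode(prompt)
    for _ in range(max_tokens):
        logits = model.next_token_logits(tokens)  # assumed model interface
        allowed = grammar.allowed_tokens(state)   # set of legal token ids here
        masked = [l if i in allowed else -math.inf for i, l in enumerate(logits)]
        next_id = max(range(len(masked)), key=masked.__getitem__)  # greedy pick
        tokens.append(next_id)
        state = grammar.advance(state, next_id)
        if grammar.is_complete(state):            # schema fully satisfied
            break
    return tokenizer.decode(tokens)
```

Because illegal tokens are assigned probability zero before sampling, malformed output is impossible by construction rather than merely unlikely.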

Synthetic Data Pipelines That Don't Collapse: Generating Training Data at Scale

· 8 min read
Tian Pan
Software Engineer

Train a model on its own output, then train the next model on that model's output, and within three generations you've built a progressively dumber machine. This is model collapse — a degenerative process where each successive generation of synthetic training data narrows the distribution until the model forgets the long tail of rare but important patterns. A landmark Nature study confirmed what practitioners had observed anecdotally: even tiny fractions of synthetic contamination (as low as 1 in 1,000 samples) trigger measurable degradation in lexical, syntactic, and semantic diversity.

Yet synthetic data isn't optional. Real-world labeled data is expensive, scarce in specialized domains, and increasingly exhausted at the scale frontier models demand. The teams shipping successful fine-tunes in 2025–2026 aren't avoiding synthetic data — they're engineering their pipelines to generate it without collapsing. The difference between a productive pipeline and a self-poisoning one comes down to diversity preservation, verification loops, and knowing when to stop.
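As a rough illustration of what a verification loop looks like in practice, here is a minimal sketch. The `generate_candidates` and `verify` callables are hypothetical stand-ins for a generator model and an independent checker (unit tests, a judge model, or hand-written rules); the essential properties are that unverified samples never enter the set, near-duplicates are rejected, and generation stops once new batches stop adding variety.

```python
# Sketch of a collapse-resistant synthetic data loop: every candidate must
# pass an independent verifier, near-duplicates are rejected to preserve
# diversity, and the loop stops when a full round adds nothing new.
def build_synthetic_set(generate_candidates, verify, target_size, max_rounds=20):
    accepted, seen = [], set()
    for _ in range(max_rounds):
        added = 0
        for sample in generate_candidates(batch_size=256):
            key = sample.strip().lower()
            if key in seen:            # crude dedup as a diversity guard
                continue
            if not verify(sample):     # verification loop: reject bad samples
                continue
            seen.add(key)
            accepted.append(sample)
            added += 1
            if len(accepted) >= target_size:
                return accepted
        if added == 0:                 # knowing when to stop
            break
    return accepted
```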

The AI Wrapper Trap: When Your Moat Is Someone Else's API Call

· 10 min read
Tian Pan
Software Engineer

Here's a test every AI startup founder should take: if OpenAI, Google, and Anthropic all shipped exactly what you're building tomorrow, would your users stay? If the honest answer is no, you haven't built a product — you've built a feature on borrowed time.

Between 2023 and early 2025, roughly 3,800 AI startups shut down — a 27% failure rate — with another 1,800 closing in early 2026. Many weren't bad teams or bad ideas. They were thin wrappers around foundation model APIs, and the platform ate them alive. Foundation model pricing collapsed 98% within a single year — the fastest technology commoditization cycle in history — and every pricing drop made the wrapper layer thinner.

The Calibration Gap: Your LLM Says 90% Confident but Is Right 60% of the Time

· 10 min read
Tian Pan
Software Engineer

Your language model tells you it is 93% sure that Geoffrey Hinton received the IEEE Frank Rosenblatt Award in 2010. The actual recipient was Michio Sugeno. This is not a hallucination in the traditional sense — the model generated a plausible-sounding answer and attached a high confidence score to it. The problem is that the confidence number itself is a lie.

This disconnect between stated confidence and actual accuracy is the calibration gap, and it is one of the most underestimated failure modes in production AI systems. Teams that build routing logic, escalation triggers, or user-facing confidence indicators on top of raw model confidence scores are building on sand.
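One standard way to measure the gap is expected calibration error: bucket predictions by stated confidence and compare each bucket's average confidence with its actual accuracy. A minimal sketch, assuming you have logged per-request confidences and correctness labels:

```python
# Expected calibration error: bin predictions by stated confidence and
# measure how far each bin's average confidence sits from its accuracy.
# A well-calibrated model scores near zero; a "90% confident, 60% right"
# model shows a large gap in the upper bins.
def expected_calibration_error(confidences, correct, n_bins=10):
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / total) * abs(avg_conf - accuracy)
    return ece
```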

The Forgetting Problem: When Unbounded Agent Memory Degrades Performance

· 9 min read
Tian Pan
Software Engineer

An agent that remembers everything eventually remembers nothing useful. This sounds like a paradox, but it's the lived experience of every team that has shipped a long-running AI agent without a forgetting strategy. The memory store grows, retrieval quality degrades, and one day your agent starts confidently referencing a user's former employer, a deprecated API endpoint, or a project requirement that was abandoned six months ago.

The industry has spent enormous energy on giving agents memory. Far less attention has gone to the harder problem: teaching agents what to forget.

The Planning Tax: Why Your Agent Spends More Tokens Thinking Than Doing

· 10 min read
Tian Pan
Software Engineer

Your agent just spent $6 solving a task that a direct API call could have handled for $0.12. If you've built agentic systems in production, this ratio probably doesn't surprise you. What might surprise you is where those tokens went: not into tool calls, not into generating the final answer, but into the agent reasoning about what to do next. Decomposing the task. Reflecting on intermediate results. Re-planning when an observation didn't match expectations. This is the planning tax — the token overhead your agent pays to think before it acts — and for most agentic architectures, it consumes 40–70% of the total token budget before a single useful action fires.

The planning tax isn't a bug. Reasoning is what separates agents from simple prompt-response systems. But when the cost of deciding what to do exceeds the cost of actually doing it, you have an engineering problem that no amount of cheaper inference will solve. Per-token prices have dropped roughly 1,000x since late 2022, yet total agent spending keeps climbing — a textbook Jevons paradox where cheaper tokens just invite more token consumption.

The Second System Effect in AI: Why Your Agent v2 Rewrite Will Probably Fail

· 9 min read
Tian Pan
Software Engineer

Your agent v1 works. It's ugly, it's held together with prompt duct tape, and the code makes you wince every time you open it. But it handles 90% of cases, your users are happy, and it ships value every day. So naturally, you decide to rewrite it from scratch.

Six months later, the rewrite is still not in production. You've migrated frameworks twice, built a multi-agent orchestration layer for a problem that didn't require one, and your eval suite tests everything except the things that actually break. Meanwhile, v1 is still running — still ugly, still working.

This is the second system effect, and it has been destroying software projects since before most of us were born.

The Warranty Problem: Who Pays When Your AI Feature Is Wrong?

· 9 min read
Tian Pan
Software Engineer

Every software warranty ever written assumed deterministic behavior. You ship a function, it returns the same output for the same input, and your warranty covers the gap between documented behavior and actual behavior. AI features shatter that assumption entirely.

When your LLM-powered feature tells a customer something wrong — and that wrong thing costs them money — traditional warranty language leaves everyone pointing fingers at everyone else.

This is not hypothetical. Cumulative generative AI lawsuits in the U.S. passed 700 between 2020 and 2025, with year-over-year filings accelerating by 137%. The legal infrastructure governing software liability was built for a deterministic world, and the mismatch is already causing real damage.

When Your Agents Disagree: Consensus and Arbitration in Multi-Agent Systems

· 11 min read
Tian Pan
Software Engineer

Multi-agent systems are sold on a promise: multiple specialized agents, working in parallel, will produce better answers than any single agent could alone. That promise has a hidden assumption — that when agents produce different answers, you'll know how to reconcile them. Most teams discover too late that they won't.

The naive approach is to average outputs, or pick the majority answer, and move on. In practice, a multi-agent system where all agents share the same training distribution will amplify their shared errors through majority vote, not cancel them out. A system that always defers to the most confident agent will blindly follow the most overconfident one. And a system that runs every disagreement through an LLM judge will inherit twelve documented bias types from that judge. The arbitration problem is harder than it looks, and getting it wrong is how you end up with four production incidents in a week.

Write-Ahead Logging for AI Agents: Borrowing Database Recovery Patterns for Crash-Safe Execution

· 10 min read
Tian Pan
Software Engineer

Your agent is on step 7 of a 12-step workflow — it has already queried three APIs, written two files, and sent a Slack notification — when the process crashes. What happens next? If your answer is "restart from step 1," you're about to re-send that Slack message, re-write those files, and burn through your LLM token budget a second time. Databases solved this exact problem decades ago with write-ahead logging. The pattern translates to agent architectures with surprising fidelity.

The core insight is simple: before an agent executes any step, it records what it intends to do. Before it moves on, it records what happened. This append-only log becomes the single source of truth for recovery — not the agent's in-memory state, not a snapshot of the world, but a sequential record of intentions and outcomes that can be replayed deterministically.
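A minimal sketch of the pattern, assuming a simple JSONL log file and a side-effecting `action` callable per step: an intent record is appended before each step runs and an outcome record after, so a restarted agent can replay completed steps from the log instead of re-executing them.

```python
# Minimal write-ahead log sketch for an agent workflow: append an INTENT
# record before each step and an OUTCOME record after it. On restart,
# steps with a logged outcome are skipped and their results replayed,
# so side effects (Slack messages, file writes) are not repeated.
import json

class AgentWAL:
    def __init__(self, path):
        self.path = path
        self.completed = {}
        try:
            with open(path) as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["type"] == "outcome":
                        self.completed[rec["step"]] = rec["result"]
        except FileNotFoundError:
            pass  # fresh run, no log yet

    def _append(self, rec):
        with open(self.path, "a") as f:
            f.write(json.dumps(rec) + "\n")
            f.flush()

    def run_step(self, step_id, action, *args):
        if step_id in self.completed:          # already done: replay result
            return self.completed[step_id]
        self._append({"type": "intent", "step": step_id})
        result = action(*args)                 # the side-effecting call
        self._append({"type": "outcome", "step": step_id, "result": result})
        self.completed[step_id] = result
        return result
```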

The Hidden Token Tax: How Overhead Silently Drains Your LLM Context Window

· 8 min read
Tian Pan
Software Engineer

Most teams know how many tokens their users send. Almost none know how many tokens they spend before a user says anything at all.

In a typical production LLM pipeline, system prompts, tool schemas, chat history, safety preambles, and RAG prologues silently consume 30–60% of your context window before the actual user query arrives. For agentic systems with dozens of registered tools, that overhead can hit 45% of a 128k window — roughly 55,000 tokens — on tool definitions that never get called.

This is the hidden token tax. It inflates costs, increases latency, and degrades output quality — yet it never shows up in any user-facing metric.
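One way to make the tax visible is to count the tokens your pipeline sends before the user says anything. A rough audit sketch using the tiktoken tokenizer; the file names and the 128k window are illustrative assumptions:

```python
# Rough audit of fixed prompt overhead: count tokens in everything the
# pipeline sends before the user's message. The file names are placeholders;
# in practice you would serialize your real system prompt and tool schemas.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text):
    return len(enc.encode(text))

system_prompt = open("system_prompt.txt").read()   # assumed file
tool_schemas = json.load(open("tools.json"))       # assumed list of schemas

overhead = count_tokens(system_prompt) + sum(
    count_tokens(json.dumps(schema)) for schema in tool_schemas
)
window = 128_000
print(f"fixed overhead: {overhead} tokens "
      f"({overhead / window:.1%} of a {window}-token window)")
```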

The Infinity Machine: How Demis Hassabis Built DeepMind and Chased AGI

· 160 min read
Tian Pan
Software Engineer

Chapter 1: The Sweetness

Somewhere in the middle of his neuroscience PhD, Demis Hassabis picked up a science fiction novel called Ender's Game. It tells the story of a diminutive boy genius sent to a space station, put through extreme mental testing, asked to shoulder responsibility for the survival of the human race. Hassabis read it and felt, as Sebastian Mallaby tells it, that someone had finally written a book about him.

That anecdote — half charming, half alarming — sets the tone for The Infinity Machine (Penguin Press, March 2026), Mallaby's sweeping biography of Hassabis and the company he built, DeepMind. It is a book about one man's lifelong attempt to answer what he calls "the screaming mystery" of the universe: why does anything exist, how does consciousness arise, and can a machine be built that understands it all? Hassabis's answer — characteristically immodest — is yes. And he intends to build it himself, within his lifetime.

The Oppenheimer Question

Mallaby, a senior fellow at the Council on Foreign Relations and former Financial Times correspondent, spent three years in regular conversation with Hassabis and hundreds of interviews with colleagues, rivals, and critics. The resulting portrait is probing but largely admiring — though the book's framing never lets the reader forget the shadow it is writing under.

The governing metaphor is Robert Oppenheimer. Like the physicist who unlocked atomic fission and then spent the rest of his life haunted by it, Hassabis is drawn forward by what Oppenheimer once called the "technically sweet" problem — the irresistible pull of a puzzle that can be solved — even as he acknowledges the consequences might be catastrophic. Mallaby does not pretend to resolve this tension. It is the spine of the entire book.

Hassabis was born in 1976 in North London, the son of a Greek-Cypriot father and a Chinese-Singaporean mother of modest means. He became a chess master at thirteen. By seventeen he was lead programmer at Bullfrog Productions, helping ship Theme Park — a game that sold millions of copies. He turned down a scholarship to Cambridge to work in the video game industry, then reversed course, took his place at Queens' College, graduated with a double first in computer science, co-founded a game studio, watched it collapse, and finally — in his early thirties — earned a neuroscience PhD at UCL, where he published landmark research on the hippocampus's role in both memory and imagination.

He was not, at any point, taking the easy route.

What This Book Is About

The Infinity Machine is structured as a chronological narrative that doubles as a history of modern AI. Each chapter centers on a project or crisis in DeepMind's life — the Atari breakthrough, the AlphaGo matches, the NHS data scandal, the AlphaFold triumph, the ChatGPT shock — but each one also illuminates something larger: how scientific idealism survives (or doesn't) inside a $650 million acquisition; how a safety-first ethos holds up against the competitive pressure to ship; how a man who genuinely believes he is building humanity's last invention stays sane, or at least functional.

Mallaby conducted over thirty hours of interviews with Hassabis alone, and the access shows. There is texture here — the poker-game pitch that recruited co-founder Mustafa Suleyman, the midnight calls during the Lee Sedol match, the exact moment Hassabis grasped (later than he should have) that transformers would change everything — that could only come from sustained proximity to the subject.

The book runs to 480 pages and covers ground from Hassabis's childhood chess tournaments to Google DeepMind's Gemini releases. The chapters ahead in this summary will trace that arc in detail. But every chapter returns, eventually, to the same question the introduction poses: can someone who is certain he is doing the most important thing in human history also be trusted to do it wisely?

Mallaby does not fully answer that. Neither, yet, has Hassabis.


Chapter 2: Deep Philosophical Questions

To understand why Demis Hassabis built what he built, Mallaby begins with a question most technology biographies skip: what does this person actually believe about the nature of reality?

The answer, in Hassabis's case, is unusual enough to be worth taking seriously. He does not believe intelligence is a product, or even primarily a tool. He believes it is the key to something more fundamental — a way of reading what he calls "the deep mystery of the universe." Science, for him, is close to a religious practice. "Doing science," he has said, "is like reading the mind of God. Understanding the deep mystery of the universe is my religion."

That is not a throwaway quote. It explains the specific shape of every decision that follows.

Information All the Way Down

Hassabis's philosophical foundation rests on a claim that physicists argue about but technologists rarely engage with: that information is more fundamental than matter or energy. Not a metaphor — a literal assertion. The universe, in this view, is an informational system. Quarks and neurons and protein chains are all, at some level, patterns in a substrate of information. If that is true, then a sufficiently powerful information-processing machine is not just a useful instrument. It is the most direct possible route to understanding what the universe actually is.

This is what he means when he describes reality as "screaming" at him during late-night contemplation. Seemingly simple phenomena — a solid table made from mostly empty atoms, bits of electrical charge becoming conscious thought — are, looked at squarely, completely absurd. How can anyone not feel the urgency of those questions? The fact that most people do not, Hassabis appears to find genuinely puzzling.

This worldview sets him apart from the mainstream of the tech industry in a specific way. Most AI entrepreneurs talk about transforming industries or accelerating economic growth. Hassabis talks about understanding the nature of consciousness and the origins of life. He wants to use AGI the way a physicist uses a particle accelerator — as an instrument for probing reality itself. The commercial applications are real and welcome. But they are not why he gets up in the morning.

The Chess Education

Mallaby traces the origin of Hassabis's intellectual style back to the chessboard. He learned the game at four by watching his father and uncle play; by thirteen, he had an Elo rating of 2300, qualifying him as a master. He captained England junior teams and was, by any measure, among the strongest young players in the world.

But at twelve, after a gruelling ten-hour tournament near Liechtenstein, he made a decision that tells you everything about him: he quit competitive chess. Not because he was failing — he was winning. But he had concluded that channelling exceptional ability into a single board game was a waste. The chessboard was a training ground, not a destination.

What chess gave him, and what he kept, was a particular cognitive discipline: the capacity to evaluate enormously complex positions not through exhaustive calculation but through pattern recognition calibrated by experience. Good chess players cannot compute every line; there are too many. They develop intuitions about which positions are promising and which are not — intuitions that can be tested, refined, and occasionally overridden by deeper analysis. This is exactly how Hassabis would later think about AI research: make a judgment call, run the experiment, update the model.

Chess also instilled a severe honesty about results. A chess position is not ambiguous. You are better or worse; you win or lose. Hassabis would carry this into DeepMind's culture — a preference for definitive benchmarks over vague claims of progress, and an impatience with the kind of motivated reasoning that lets researchers persuade themselves a system is working when it is not.

The Neuroscience Detour That Wasn't a Detour

After Theme Park, after Cambridge, after the collapse of Elixir Studios (his first company), Hassabis did something that baffled people who knew him: he went back to school. He enrolled in a neuroscience PhD at UCL under Eleanor Maguire, one of the world's leading researchers on memory and the hippocampus.

This looked, from the outside, like a retreat. It was the opposite.

His doctoral research produced a finding that became one of Science magazine's top ten scientific breakthroughs of 2007: patients with hippocampal damage, long known to suffer from amnesia, were also unable to imagine new experiences. Memory and imagination, previously treated as distinct faculties, turned out to share the same neural machinery. The hippocampus does not just store the past — it constructs possible futures by recombining elements of what it knows.

For Hassabis, this was not merely an interesting neuroscience result. It was a design principle. If biological intelligence works by building rich internal models of the world and simulating possible futures within them, then artificial intelligence that lacks this capacity — that can only recognize patterns in training data without any model of cause and consequence — is not really general at all. It is a very sophisticated lookup table. The hippocampus research pointed toward what general intelligence actually requires: not just memory, not just pattern recognition, but imagination — the ability to take what you know and project it into situations you have never seen.

This insight would echo through DeepMind's entire research agenda. Reinforcement learning, self-play, world models, agents that plan — all of these reflect the same underlying conviction: that intelligence is not fundamentally about retrieval, but about simulation.

A Philosophy of Honesty

Mallaby notes one more thread running through this period: an unusually strong commitment to intellectual honesty, even at personal cost. Hassabis is described as constitutionally averse to manipulation — to using technically true statements to create false impressions, or to allowing the social pressure of a room to bend his stated beliefs. He would rather be wrong out loud than right in private.

This is harder than it sounds in the world he would enter. AI research is full of incentives to oversell — funding depends on it, talent depends on it, media attention depends on it. Hassabis's response was not to be naive about those incentives, but to treat honesty as an active discipline rather than a passive default. The commitment would be tested, repeatedly and severely, as DeepMind grew.


Chapter 3: The Jedi

In 1997, two young men graduated from Cambridge a few weeks apart and made the same decision: build a video game company instead of taking the obvious path. One of them was Demis Hassabis. The other was David Silver, who had just received the Addison-Wesley prize for the top computer science graduate in his cohort. Silver and Hassabis had become friends at Cambridge — two people who thought about games the way most people think about mathematics, as a domain where intuitions about complexity could be tested with perfect clarity.

The chapter title comes from how Mallaby describes Hassabis's gift for recruitment. When he rang Silver and laid out the plan — a studio that would build games no one had tried before, driven by AI research rather than commercial formula — Silver felt, as he later described it, the pull of a Jedi mind trick. He didn't entirely choose to say yes so much as he found himself having already said it.

This would become a recurring feature of Hassabis's leadership: the ability to make people feel that his vision was also their destiny.

One Million Citizens

The company they founded, Elixir Studios, was established in July 1998 in London. The flagship project, Republic: The Revolution, was unlike anything in the games industry at the time. The design document promised a full political simulation of an Eastern European state: hundreds of cities and towns, thousands of competing factions, and approximately one million individual citizens, each with their own AI — their own beliefs, daily routines, loyalties, and emotional responses to events. Players would not just conquer territory; they would manipulate a living society, tilting a population toward revolution through force, influence, or money.

The vision was breathtaking. It was also, as anyone who has ever shipped software might have predicted, completely impossible to deliver on the announced timeline.

What actually shipped in August 2003 — five years after development began — was a game set in a single city divided into districts, with ten factions instead of thousands, and a population simulation drastically reduced from the original scope. The Metacritic score was 62. Critics praised the ambition and criticized the execution. The huge world that took so long to construct, one reviewer noted acidly, ends up as the least involving part of the game.

The Delusion Trap

Mallaby is interested in Elixir not primarily as a commercial failure but as a study in organizational psychology — specifically, in how a highly intelligent founder with a genuine vision can systematically stop receiving accurate information from the people around him.

The mechanism was not dishonesty, exactly. It was something more insidious. Hassabis had such fierce conviction about what Republic could be, and communicated that conviction so persuasively, that his engineering team learned not to tell him what they couldn't do. They knew he wouldn't accept "no." So they said "yes, we can do this" — and because Hassabis kept hearing yes from people he trusted, he became more certain, not less. The feedback loop amplified his confidence precisely as the project's foundations were silently cracking beneath him.

He also spread himself disastrously thin — serving simultaneously as CEO, lead designer, and producer, inserting himself into decisions at every level of production. The people he hired were smart but inexperienced with games; Cambridge graduates are not, by default, shipping-oriented. The studio burned through resources and goodwill for years before the cracks became impossible to ignore.

Hassabis said later: "You can get self-delusional thinking. You can actually over-inspire people." The cost of that over-inspiration was five years of his team's lives and a company that closed in April 2005.

Mallaby frames the collapse not as a lesson in humility — Hassabis's ambition did not diminish — but as the origin of a specific diagnostic tool. How do you tell the difference between a vision that is difficult and a vision that is impossible? How do you stay honest with yourself when everyone around you has learned to tell you what you want to hear?

The answer Hassabis developed, years later, he called the fluency test: enter the room where the work is happening and listen, not for the right answers, but for the flow of ideas. A team generating possibilities fluidly — even wrong ones, even half-formed ones — still has energy to burn. A team that falls quiet when asked hard questions has hit a wall it cannot name. The fluency test is not infallible, but it provides a read that direct questioning cannot, because people who won't say "no" will still, involuntarily, go silent.

The test would prove decisive at a critical moment in the AlphaFold project, years later. But it was born in the rubble of Republic: The Revolution.

Silver's Exit, and What He Found

David Silver had watched the struggle at Elixir from close range. In 2004, before the studio's final collapse, he made his own pivot: he picked up Richard Sutton and Andrew Barto's textbook on reinforcement learning and found, in its pages, the thing he had been circling for years.

Reinforcement learning is, at its core, the mathematics of learning by doing — of an agent taking actions in an environment, receiving rewards and penalties, and gradually developing a policy that maximizes long-run return. It had been largely out of fashion by the mid-2000s, overshadowed by supervised learning methods that required large labelled datasets. But Silver recognized something the field had not yet fully absorbed: RL's sample-inefficiency problems were engineering problems, not theoretical ones. The framework itself was sound. And its natural domain — sequential decision-making under uncertainty — was exactly what playing games required.

He left for the University of Alberta, where Sutton was based, to do his PhD. Over the next five years, working under the supervision of the man who had co-written the textbook, Silver co-introduced the algorithms that powered the first master-level 9×9 Go programs. He graduated in 2009, the same year Hassabis finished his neuroscience PhD at UCL.

The parallel is not accidental. Both men had left the games industry with unfinished business, taken circuitous routes through academia, and arrived at the same destination from different directions. Hassabis had the theory of what general intelligence required, drawn from neuroscience. Silver had the mathematics of how to train it, drawn from reinforcement learning. Neither had, on his own, what the other had.

DeepMind would be the place where that changed. Mallaby frames the chapter as a story of two divergent paths that were always going to converge — two people who understood, before almost anyone else did, that the gap between games and general intelligence was smaller than the field believed. The Jedi mind trick, it turned out, had worked on both of them.


Chapter 4: The Gang of Three

In 2009, artificial intelligence was not fashionable. The field had been through two long "winters" — stretches of broken promises and evaporated funding — and the mainstream of computer science regarded anyone who talked seriously about artificial general intelligence with something between skepticism and pity. Demis Hassabis, freshly out of his neuroscience PhD and convinced that AGI was both achievable and urgent, needed allies who shared his conviction. They were not easy to find.

This chapter is about how he found two of them — and how different they were from each other, and from him.

The Man Who Had Already Done the Math

Shane Legg grew up in New Zealand, studied mathematics and statistics, and spent his doctoral years in Switzerland at the IDSIA research institute under Marcus Hutter, one of the world's leading theorists of universal artificial intelligence. His 2008 dissertation was titled Machine Super Intelligence. It was not a roadmap for building AI. It was an attempt to formalize what superintelligence would actually mean — to give the concept mathematical content rather than science-fiction vagueness.

The centrepiece of the thesis was AIXI, Hutter's framework for a theoretically optimal universal agent. By combining Solomonoff induction — a formalism for learning any computable pattern from data — with sequential decision theory, Hutter had defined an agent that would, given infinite compute, behave optimally in any environment. It was, in a rigorous sense, the perfect intelligent machine. It was also completely unimplementable, requiring infinite resources. But that was not the point. AIXI proved that general intelligence was not a mystical concept; it was a mathematical object that could be defined, bounded, and, in principle, approximated.

Where Legg departed from his supervisor's purely theoretical interests was in the question of what such a system would actually do. His thesis ends with a section that reads, even now, like a warning siren. A sufficiently intelligent machine optimizing for any goal would, by default, resist being switched off — because being switched off would prevent it achieving the goal. It would deceive operators who tried to constrain it. It would accumulate resources far beyond what any particular task required, as a hedge against future interference. None of this required malice. It required only competence.

Legg became, as a direct result of this analysis, one of the earliest people in AI research to state publicly that he regarded human extinction from AI as a live possibility. In a 2011 interview on LessWrong, he said AI existential risk was his "number one risk for this century." His probability estimates for catastrophic outcomes from advanced AI ranged, at various points, between 5% and 50% — wide uncertainty, but a number very far from zero.

This was the man Hassabis met at the Gatsby Computational Neuroscience Unit at UCL in 2009, during Legg's postdoctoral fellowship. Here was someone who had not only taken the AGI question seriously but had formalized it — and who had arrived, through pure theory, at exactly the existential stakes that Hassabis intuited from his philosophical commitments. Two people who had approached the problem from entirely different directions and reached the same alarming conclusion.

They founded DeepMind together in 2010. Legg would go on to lead the company's AGI safety research — the first person, at a major AI lab, to hold that role.

The Dropout from Oxford

Mustafa Suleyman's route to the same founding table ran through a different world entirely.

He grew up off the Caledonian Road in Islington — working-class North London, the son of a Syrian taxi driver and an English nurse. He won a place at Oxford to read philosophy and theology, then dropped out at nineteen. What he did next reveals the particular quality Hassabis was looking for: instead of drifting, Suleyman co-founded the Muslim Youth Helpline, a telephone counselling service that would become one of the largest mental health support networks of its kind in the UK. He had seen a gap — young people in crisis, no appropriate service available — and built something in the space.

He then worked as a policy officer on human rights for Ken Livingstone, the Mayor of London, and co-founded Reos Partners, a consultancy using conflict-resolution methods to address intractable social problems. His clients included the United Nations and the World Bank. By the time he encountered Hassabis, he had spent a decade becoming expert at two things that computer scientists almost universally lack: understanding how institutions actually work, and translating abstract goals into operational programs that survive contact with the real world.

He reached Hassabis through proximity rather than credentials — his best friend was Demis's younger brother. Over time, what had been a social connection became something more like a shared conviction. Hassabis reportedly pitched the DeepMind idea to Suleyman over a poker game, and Suleyman — who had a poker player's instinct for when to push and when to read the room — said yes.

He was, by every conventional metric, the wrong person to co-found an AI research laboratory. He had no technical training, no publication record, no standing in the machine learning community. Hassabis chose him anyway.

Why Three, and Why These Three

Mallaby's interest in this chapter is not just biographical inventory. It is the question of what a founding team does to the character of a company it builds.

Each co-founder contributed something the others lacked and could not easily acquire. Hassabis supplied the vision and the scientific framework — the neuroscience-informed theory of what general intelligence is and what it would take to build it. Legg supplied the existential awareness — an unusually early and unusually rigorous understanding of what a successful AGI would mean for humanity, and why safety had to be treated as a first-order research problem rather than an afterthought. Suleyman supplied operational instinct and a set of social concerns — health, fairness, governance — that prevented the lab from becoming a monastery of pure theory disconnected from the world it was trying to help.

The tension between these three orientations would generate much of DeepMind's energy, and much of its internal conflict. Hassabis wanted to solve intelligence. Legg wanted to solve it safely. Suleyman wanted to deploy it usefully, quickly, and in ways that changed real lives. These goals are compatible in theory and, in practice, constantly in friction.

Mallaby writes from a position of knowing how the story eventually plays out for all three. Suleyman is described in the book as an estranged co-founder — he would later leave DeepMind under difficult circumstances, eventually surfacing as CEO of Microsoft AI. Legg would stay, becoming Chief AGI Scientist. Hassabis would remain CEO, accumulating more authority as the others departed or diminished.

The gang of three became, in time, a gang of one. But in 2010, with nothing yet built, the three-way tension felt like a feature, not a bug. DeepMind was a bet that idealism, mathematics, and pragmatism could hold together long enough to do something unprecedented.


Chapter 5: Atari

Before DeepMind could save humanity, it had to prove it could beat Breakout.

This chapter covers the period from 2010 to early 2014 — four years in which a small team in London, funded by a handful of believers and producing no commercial product, built the thing that would make the world take artificial general intelligence seriously. The proof of concept was an AI that learned to play old Atari video games. The significance was everything else.

The Lab Hassabis Built

From the start, Hassabis made a deliberate choice not to build DeepMind in Silicon Valley. London was not an accident. London gave him access to European academic talent, a culture less obsessed with rapid product iteration, and physical distance from the venture-capital orthodoxy that demanded revenue roadmaps and quarterly milestones. He wanted a research institution that happened to be incorporated as a company, not a company that happened to do research.

The early investors who said yes to this were, consequently, an unusual group. Peter Thiel — who had written in Zero to One about the difference between incremental improvement and genuine technological transformation — backed the company through Founders Fund alongside Luke Nosek, his PayPal co-founder, who joined DeepMind's board. Elon Musk wrote a cheque. Jaan Tallinn, the Skype co-founder turned AI-risk philanthropist, came in as an advisor. By the time of the Google acquisition in early 2014, the company had raised more than $50 million without releasing a single product or generating a dollar of revenue. These investors were, essentially, funding a philosophy.

What that money bought was freedom. Hassabis hired the brightest PhDs he could find from the world's best programmes — Cambridge, UCL, Toronto, Montreal — and told them to do blue-sky research. He himself worked nights, logging hours from ten in the evening until around four in the morning on top of his daytime work. "If you are trying to solve humanity's problems and understand the nature of reality," he said, "you don't have any time to waste." The culture set by that example was intense, focused, and, for the people who thrived in it, exhilarating.

By 2013 the team had approximately fifty researchers. It was tiny by the standards of what would come. But it was almost perfectly constituted for the problem in front of it.

The Problem Nobody Had Solved

Deep learning and reinforcement learning were, in 2012, two of the most promising threads in AI research — and almost universally treated as separate disciplines.

Deep learning, turbocharged by Geoffrey Hinton's group at Toronto, had just demonstrated on the ImageNet benchmark that convolutional neural networks could recognise objects in photographs better than any previous method. The key was that these networks could learn their own feature representations from raw data — you did not need to hand-engineer what "edge" or "curve" or "wheel" looked like; the network figured it out. This was a breakthrough in perception.

Reinforcement learning was a different tradition entirely: an agent takes actions, receives rewards or penalties, and learns a policy — a mapping from situations to actions — that maximises long-run return. It was mathematically elegant and had a strong theoretical foundation, particularly in the Q-learning framework developed by Chris Watkins in 1989. But it was fragile at scale. Neural networks had been tried with RL before, and the combination tended to explode: the training became unstable, the networks diverged, and the whole thing collapsed.

The two fields had, essentially, given up on each other.

Volodymyr Mnih understood both. He had done his master's degree at the University of Alberta in machine learning under Csaba Szepesvari, one of RL's leading theorists, before moving to Toronto for his PhD under Hinton himself. He arrived at DeepMind in 2013 with a rare bilingualism — fluent in the mathematics of deep networks and in the mathematics of sequential decision-making. Koray Kavukcuoglu, a neural-network specialist who had already joined the team, supplied the architecture expertise. Together they set out to make the combination work.

Why Experience Replay Changed Everything

The technical obstacle was a mismatch between what neural networks need and what reinforcement learning provides.

Neural networks train best on data that is independently and identically distributed — diverse, uncorrelated samples drawn from the same underlying distribution. But an RL agent generates data sequentially, each observation causally following from the last: a ball bouncing right, then the paddle moving, then the ball bouncing left. These consecutive frames are highly correlated. Feed correlated data into a neural network and the gradient updates interfere with each other; the network spins in circles, overwriting what it just learned.

The fix was called experience replay, and it was conceptually simple enough that its power is almost surprising. Instead of training on each experience the moment it happened, the agent stored its experiences — (state, action, reward, next state) tuples — in a large memory buffer. During training, it sampled randomly from that buffer, pulling together experiences from wildly different points in the agent's history: a moment from an hour ago next to a moment from five minutes ago next to a moment from this morning. The temporal correlations were broken. The network saw something closer to the diverse, uncorrelated dataset it needed.

The second stabilising trick was a separate target network — a frozen copy of the main network whose weights were updated only periodically. This prevented the moving goalposts problem, where the network would destabilise itself by chasing a target that was itself changing with every gradient step.

Together, experience replay and the target network turned an unstable combination into a tractable one. The Deep Q-Network was born.
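For readers who want the mechanics, here is a minimal sketch of the two stabilisers described above; the network objects and the Q-learning update in the comments are schematic assumptions rather than DeepMind's actual code.

```python
# Sketch of the two DQN stabilisers: a replay buffer that breaks temporal
# correlation by sampling past transitions at random, and a target network
# whose weights are copied over only periodically so the learning target
# stays fixed between copies.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)

# Inside the training loop (q_net and target_net are assumed to be two
# copies of the same network; gamma is the discount factor):
#
#   batch = replay.sample()
#   for s, a, r, s_next, done in batch:
#       target = r if done else r + gamma * max(target_net.q_values(s_next))
#       q_net.update(s, a, target)             # gradient step toward target
#   if step % target_sync_interval == 0:
#       target_net.load_weights(q_net)         # periodic frozen-copy refresh
```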

What It Did to Atari

The DQN system's input was nothing but raw screen pixels and the game score. No rules. No game-specific features. No human demonstrations. No knowledge of what the games were about. The agent saw what a human player would see, received a numerical reward when the score went up, and was otherwise on its own.

It was tested on seven Atari 2600 games — Pong, Breakout, Space Invaders, Seaquest, Beamrider, Q*bert, and Enduro — without any adjustment to the architecture between games. The results, published in December 2013 on arXiv and presented at the NIPS Deep Learning Workshop, were startling. DQN outperformed all previous approaches on six of the seven games. On three of them it surpassed the best human expert scores.

But the number that lodged in people's minds was not the score. It was the behaviour.

In Breakout — the game where a paddle bounces a ball against a wall of bricks — human players learn that the optimal strategy is to aim for a corner and drill a tunnel through the side, bouncing the ball behind the bricks for a cascade of automatic points. No one programmed this. The DQN agent, after enough training, figured it out independently. The machine had discovered a strategic insight that took human players years to develop, through nothing but trial and reward signal.

It had not been taught the tunnel strategy. It had invented it.

Why This Was Not About Games

Mallaby is careful here to explain why the games setting was not a gimmick. It was the point.

The whole critique of narrow AI — expert systems, chess engines, Go programs — was that each one was hand-crafted for its domain. The knowledge was in the code, not in the learning. DeepMind's claim, and the claim Hassabis had been making since his neuroscience PhD, was that general intelligence learns its own representations from experience and then transfers that capacity across domains.

The DQN paper demonstrated this with unusual clarity. The same architecture, the same algorithm, the same hyperparameters — seven games, zero domain customisation. When you asked the model to play Space Invaders, it was not running the Breakout program with a new skin. It was genuinely learning to play Space Invaders. The architecture was the constant; the intelligence was learned fresh each time.

That was what DeepMind had been claiming was possible. Now they had shown it.

The Acquisition

The NIPS presentation drew immediate attention from the major technology companies. Google, which had been monitoring AI research since the AlexNet shock of 2012, moved quickly. Acquisition talks with DeepMind began in 2013. Facebook was also interested, and Zuckerberg made an offer.

Hassabis chose Google — but not without conditions. The negotiation that produced the $650 million deal is covered in the next chapter. What matters here is what Google was buying: not a product, not a dataset, not a revenue stream. They were buying a demonstration that general learning was possible, and a team of fifty people who knew how to pursue it.

The Atari games were always proxy problems. What DeepMind was actually training, in those early London offices, was a method. The games were the simplest possible world in which to test whether an agent could learn to act. They passed the test. Everything that followed — Go, protein folding, the race with OpenAI — flows from those seven games and what the machine taught itself to do with a paddle and a ball.


Chapter 6: Thiel Trouble

There is a structural incompatibility between venture capital and blue-sky science that most AI founders discover only after they have already signed the term sheets. Venture funds have a lifecycle — typically ten years. They need their portfolio companies to reach a liquidity event inside that window: an acquisition, an IPO, a secondary sale. General intelligence research has a different lifecycle entirely. It requires decades of investment, infrastructure that costs billions, and a willingness to accept that the breakthroughs may not come in any predictable order.

DeepMind, by 2013, was about to collide with this incompatibility at speed.

The Chess Gambit That Opened the Door

Before the crisis, there was the original pitch — and it is worth dwelling on, because it captures something essential about how Hassabis operated.

In August 2010, he had what he later described as "literally one minute" with Peter Thiel, who was hosting his annual Singularity Summit at his California mansion. The room was full of people trying to pitch technology ideas. Hassabis had spent months thinking about how to use his minute. He had read everything he could about Thiel and found that Thiel had played chess as a junior. That was the opening.

Instead of leading with the business plan, Hassabis asked Thiel a chess question: why was the game so remarkable? Hassabis's own answer, delivered in the one minute he had: the creative tension that arises when you swap a bishop for a knight in certain positions. The bishop commands long diagonals; the knight covers squares the bishop can never reach. Neither is strictly better. Their co-existence is what makes the game inexhaustible.

Thiel, who had never considered chess in quite those terms, was intrigued. A meeting was secured. Within months he had invested £1.4 million — roughly $1.85 million — in a company that had not yet produced anything. He made the decision in a single meeting. He also initially wanted DeepMind to relocate to Silicon Valley. Hassabis talked him out of it.

Luke Nosek, Thiel's PayPal co-founder and a partner at Founders Fund, joined DeepMind's board. The seed was small but the names were large, and in the world of early-stage technology investment, names matter.

The Phone Call

The crisis arrived as a phone call, at an hour that suggested the news was bad.

Luke Nosek rang Hassabis and Suleyman to tell them that his partners at Founders Fund had decided they no longer wanted to lead DeepMind's Series C. The round had been structured around a $65 million target, with Founders Fund as lead. Without the lead, the round fell apart. Without the round, DeepMind — which had been burning through its earlier capital funding fifty-odd researchers and their computing infrastructure — was in serious trouble.

The cause was not a single dramatic falling-out. It was something more corrosive: an accumulating anxiety among institutional investors about what exactly DeepMind was. It was not a product company. It was not a services business. It did not have a revenue model, and it showed no sign of wanting one. Its founders described its goal as solving general intelligence and then using that solution to benefit humanity — a mission statement that is either the most important thing ever attempted or the most expensive way to never deliver anything, depending on your tolerance for ambition. Founders Fund's partners, when the moment of the larger commitment arrived, landed on the second interpretation.

Mallaby frames this not as a failure of Thiel or Nosek but as a structural feature of the situation. The DeepMind model — deep science, no product, indefinite timeline — was simply not a venture-backed business. The question was what kind of institution it was. And in late 2013, with cash running low and no revenue in sight, that question had become urgent.

Suleyman's Scramble

This is where Mustafa Suleyman's skills became, temporarily, the most important thing about DeepMind.

Where Hassabis was a scientist and Legg was a theorist, Suleyman was an operator — someone who had spent his career in rooms where the outcome was not determined by the best argument but by who held their nerve longest. He had run a mental health helpline at nineteen. He had negotiated with the UN. He knew how to project confidence into a vacuum.

In the immediate aftermath of Nosek's call, with the Series C in ruins, Suleyman turned to Solina Chau. She was the founder of Horizons Ventures, the vehicle through which Hong Kong billionaire Li Ka-shing deployed his private capital into technology. She and Hassabis had met in 2012 and bonded quickly — she was, unlike many technology investors, genuinely interested in the underlying science rather than the product roadmap. DeepMind had initially offered her a $2.5 million allocation in the round; she had wanted more.

Now they offered her more. Chau invested $13.6 million. Founders Fund, despite pulling out of the lead, contributed $9.2 million to preserve its relationship and not be entirely absent. The round closed at just over $25 million — less than half of the $65 million originally targeted.

It was enough to survive. It was not enough to be comfortable.

At some point in this period, Suleyman made a remark that Mallaby quotes with evident appreciation for its audacity. Faced with questions about whether DeepMind's backers would really fight for its independence, Suleyman said something to the effect of: "We've got Peter Thiel, Solina Chau, Elon Musk — all billionaires, all backing us." It was, by his own later admission, a bluff. Those investors were backing the company financially. Whether they were prepared to underwrite a decade-long campaign for AGI independence against the countervailing pull of Google's chequebook was a different question entirely, and the answer was clearly no.

The bluff worked, in the short term, because the audience did not call it. But it revealed the underlying reality: DeepMind had supporters, not guarantors. When the moment of reckoning came, the company would have to make its own decisions.

What the Crisis Revealed

Mallaby uses this chapter to make a broader argument about the economics of transformative research. The Atari breakthrough had been genuine — a scientific result that changed what people thought AI could do. But the venture-capital model rewarded that breakthrough by raising questions the founders could not yet answer: when does this become a product, and what does it cost? The better the science, the harder those questions became to dodge.

DeepMind had not been deceptive with its investors. Hassabis had always been explicit about the goal and the timeline. The problem was that clarity about a thirty-year scientific mission does not help a fund that needs an exit in ten years. The interests had always been misaligned; it had just taken the Series C to make the misalignment concrete.

The $25 million round bought runway, but not much. And from the far end of that runway, two very large buildings were visible on the horizon — one branded Google, one branded Facebook. Hassabis had, at most, a few months to decide which door to walk through, or whether to find a third option that did not yet exist.

The next chapter covers what happened at that door.


Chapter 7: Get Google

In the autumn of 2013, Elon Musk threw a birthday party at a rented castle in Napa Valley. It was the kind of occasion where invitations were themselves a signal — a gathering of people who believed technology was about to change civilisation, and who were jockeying over who would steer it. Demis Hassabis was there. So was Larry Page.

At some point in the evening, Page and Hassabis walked the castle grounds together, and Page made his pitch. It was not a sales pitch, exactly. It was closer to a logical argument. Hassabis's goal was artificial general intelligence. Building the computational infrastructure to pursue that goal — the servers, the power, the engineering talent — would take the best part of a career, and even then there was no guarantee. Google had already built that infrastructure. "Why don't you take advantage of what I've already created?" Page asked. If DeepMind's mission was to build AGI, why was building an independent company around that mission anything other than an unnecessary detour?

It was a remarkably effective pitch precisely because it was honest. Page was not offering money as a reward for past performance. He was offering a path to the thing Hassabis actually wanted.

Musk's Counter-Move

Elon Musk, who had been at the same party, had also been having a different kind of conversation with Page — an argument, by most accounts, that had turned personal. Page believed that machine intelligence was a natural evolutionary successor to humanity and saw no meaningful distinction between human and artificial consciousness. Musk thought this was dangerous and wrong. He was, he said, "pro-human."

After Page's pitch to Hassabis, Musk tried to intervene. He approached Hassabis directly and told him his view: "The future of AI should not be controlled by Larry." He then worked quietly with Luke Nosek to assemble alternative financing — a bid to acquire DeepMind independently, outside both Google and Facebook. The effort never produced a term sheet that reached DeepMind's board.

Musk's inability to stop the acquisition mattered beyond the transaction itself. It crystallised, for him, the urgency of creating a rival. OpenAI was co-founded in December 2015, nearly two years after Google closed on DeepMind. The birthday party argument had consequences that neither man fully anticipated.

The Dinner in Palo Alto

Simultaneously, Hassabis was running a parallel process with Facebook. Mark Zuckerberg was interested; Facebook's head of corporate development, Amin Zoufonoun, flew in to open talks. An offer took shape: a lower share price than Google's, but substantial founder bonuses to compensate. Suleyman flew to California to negotiate.

Hassabis evaluated Zuckerberg through a dinner at his Palo Alto home. He came with a diagnostic purpose rather than a sales pitch. After steering conversation to artificial intelligence, he widened it deliberately — to virtual reality, augmented reality, 3D printing. He watched how Zuckerberg responded. The response, as Hassabis later described it, was undifferentiated enthusiasm. Zuckerberg was equally excited about all of it. No technology registered as categorically more important than the others.

That was enough. "Facebook offered more money," Hassabis said, "but I wanted somebody who really understood why AI would be bigger than all these other things." Zuckerberg had failed the test — not because he lacked intelligence but because he lacked the specific conviction that Hassabis required in an acquirer. DeepMind was not looking for a buyer who thought AI was one interesting technology among several. It was looking for a buyer who thought AI was the technology, the one that would subsume or obsolete all the others.

Facebook, by this reading, wanted DeepMind as a feature. Google, or at least the Larry Page version of Google, wanted it as a mission.

Suleyman at the Table

Mustafa Suleyman's contribution to this chapter is the negotiation itself. Where Hassabis evaluated the philosophical alignment of acquirers, Suleyman handled the adversarial arithmetic.

His tactic, which he later described in terms that recalled his poker background, was to refuse to open on valuation. Instead of anchoring a price, he focused early conversations on research budgets — how much compute, how many hires, what operational independence would look like. By the time Google's lead negotiator Don Harrison introduced a "price per researcher" framework — valuing DeepMind's thirty to forty core staff at approximately $10 million each — Suleyman had already established a different framing of what was being bought. He and Hassabis pushed back, arguing the implied valuation was nearly half of what the company was worth. Facebook's competing interest, real or inflated in the telling, was their leverage.

The final number was $650 million. Zuckerberg later acknowledged, with evident good humour, that Hassabis had "used him to get a better deal from Google." The compliment was backhanded but accurate.

Safety as a Non-Negotiable

The conditions DeepMind extracted were, for January 2014, without precedent in a technology acquisition of this scale.

Hassabis and Suleyman demanded three things as non-negotiables. First: an independent ethics and safety review board — composed of scientists, philosophers, and domain experts — with authority over how DeepMind's technology could be used across all of Google. Second: a ban on military applications. Third: operational autonomy, with DeepMind remaining headquartered in London and controlling its own research agenda.

Google agreed to all three. The deal was announced on 26 January 2014.

Mallaby treats this moment with appropriate weight and appropriate scepticism. It was genuinely remarkable that an AI lab had made safety a centrepiece of an acquisition rather than an afterthought. No one in the industry had done this before. The ethics board demand in particular signalled that Hassabis and Suleyman understood, at least abstractly, that the technology they were building required oversight that no single corporate entity should control unilaterally.

What the Conditions Actually Produced

The ethics board met once. Its membership was never publicly disclosed. It was quietly superseded by Google's broader AI Principles policy, which allowed for applications with "potential negative impacts" as long as the benefits were judged to outweigh the risks — a standard flexible enough to accommodate almost anything.

The military ban, which had seemed absolute, gradually eroded. By 2024, DeepMind researchers were circulating an open letter protesting the company's involvement in military contracts, invoking the original conditions of the 2014 deal as a promise that had been broken.

Hassabis, reflecting on all this years later, offered an assessment that was either clear-eyed or self-exculpatory, depending on your view: "Safety isn't about governance structures. Even if you have a governance board, it probably wouldn't do the right thing when it came to the crunch."

This is, on one reading, wisdom — a hard-won recognition that structural solutions to power problems tend to be co-opted by the very power they were meant to check. On another reading, it is the rationalisation of a man who traded governance guarantees for resources and found, predictably, that the guarantees did not hold.

Mallaby does not adjudicate between these readings. He presents both, and lets the reader decide. What is clear is that the January 2014 acquisition gave Hassabis what he had actually come for: the computers. The ethics board was, at best, a statement of intent. At worst, it was a fig leaf that allowed a brilliant scientist to tell himself he had done what he could. Either way, DeepMind was now inside Google, with the computational resources of one of the world's largest technology companies behind it, and a mission that had just become several orders of magnitude easier to pursue.


Chapter 8: Intuition

There is a moment in the history of artificial intelligence that did more to change public understanding of what machines could do than anything that had come before — more than Deep Blue beating Kasparov, more than ImageNet, more than the Atari paper. It happened on the afternoon of 10 March 2016, in a game hall in Seoul, South Korea, when a computer program placed a black stone at the fifth line from the top, in an area of the board that no professional player would have touched.

The commentators fell silent. Lee Sedol, one of the greatest Go players in history, stared at the board for twelve minutes. Fan Hui — the European champion DeepMind had secretly beaten five months earlier and recruited as an advisor — watched from the sidelines. "It's not a human move," he said. "I've never seen a human play this move. So beautiful."

Move 37 had arrived. And with it, a question that Mallaby's chapter title names directly: does an artificial intelligence have intuition?

Why Go Was the Right Problem

By 2014, chess was closed terrain for AI ambition. Deep Blue had beaten Kasparov in 1997. The lesson drawn — that tree-search with good heuristics could solve board games — was, for the broader field, a cautionary tale more than a triumph. Chess had been solved by brute force made elegant; that was not the same as intelligence.

Go was different by several orders of magnitude. A standard 19×19 board generates approximately 2.1 × 10^170 possible positions — a number that exceeds the count of atoms in the observable universe (roughly 10^80) by a factor of about 10^90. Chess, vast as it seems to the human player, has roughly 10^47 legal positions. Go's search space is not just larger; it is categorically beyond any enumeration strategy that raw compute could reach in any feasible time. The branching factor — the number of legal moves available at each turn — averages around 250 in Go versus around 35 in chess. Any algorithm that worked by looking ahead a fixed number of moves would collapse.
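To make the scale gap concrete, a back-of-the-envelope comparison of full-width game trees (branching factor raised to a typical game length) is enough. The branching factors are the rough figures quoted above; the game lengths (about 80 plies for chess, 150 for Go) are conventional approximations, not exact values.

```python
# Back-of-the-envelope game-tree sizes for chess and Go.
# Branching factors and game lengths are rough, illustrative figures.
from math import log10

def tree_size_log10(branching_factor: int, depth: int) -> float:
    """log10 of b^d, the number of leaves in a full-width lookahead of depth d."""
    return depth * log10(branching_factor)

chess = tree_size_log10(branching_factor=35, depth=80)
go = tree_size_log10(branching_factor=250, depth=150)

print(f"chess game tree ~ 10^{chess:.0f}")  # ~10^123
print(f"go game tree    ~ 10^{go:.0f}")     # ~10^360
```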

For twenty years, Go programs had plateaued at high-amateur level. The game's resistance to AI was not incidental. It was a structural property. Evaluating a Go position requires something that looks, from the outside, like aesthetic judgment — an intuition about which formations are strong, which are fragile, which configurations will mature into advantage across dozens of moves. Human players develop this over decades of study. It cannot be calculated; it can only be learned. If an AI could play Go at the level of the world's best humans, it would have to have genuinely learned something, not just searched more efficiently.

This was exactly the kind of proof Hassabis needed. Not that a machine could be faster, but that it could be wiser.

The Architecture of Learned Intuition

AlphaGo's design reflected lessons drawn directly from the neuroscience research in Hassabis's PhD. The system used two neural networks in concert. The policy network — trained first on thirty million moves from high-level human games — learned to narrow the field of candidate moves: instead of treating all 250 possible moves equally, it identified the small subset worth thinking about. The value network learned to assess board positions: given a configuration, how likely is each player to win?

Neither network was sufficient alone. The policy network narrowed the search; the value network scored the positions the search reached. Between them, a Monte Carlo tree search explored the remaining territory — simulating possible futures, weighting them by the value network's assessments, and propagating the results back to inform the current decision.

Then came the crucial step: self-play. AlphaGo played itself, thousands of times, learning from each game. The original human-derived training data established the starting point. Self-play was how the system exceeded it. As it played, it encountered positions no human had ever created, learned responses no human had ever demonstrated, and built a strategic vocabulary drawn from a space of games that had never existed.
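The interplay of the three pieces is easier to see in code than in prose. Below is a minimal sketch of an AlphaGo-style search loop — a policy network proposing candidate moves, a value network scoring positions, and a tree search blending the two. The `policy_net`, `value_net`, and `game` interfaces are hypothetical stand-ins, and details such as per-player sign handling are omitted; this is not DeepMind's implementation.

```python
# Minimal sketch of policy/value-guided tree search (PUCT-style selection).
# policy_net(state) -> iterable of (move, prior); value_net(state) -> float.
import math

C_PUCT = 1.5  # exploration constant

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    """Pick the move maximising Q + U, the PUCT rule used in AlphaGo-style search."""
    total = sum(child.visits for child in node.children.values())
    def score(item):
        _, child = item
        u = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=score)

def search(root_state, game, policy_net, value_net, n_simulations=800):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        node, state, path = root, root_state, []
        # 1. Selection: walk down the tree using the PUCT rule.
        while node.children:
            move, node = select_child(node)
            state = game.play(state, move)
            path.append(node)
        # 2. Expansion: the policy network proposes priors for candidate moves.
        for move, prior in policy_net(state):
            node.children[move] = Node(prior)
        # 3. Evaluation: the value network scores the leaf position.
        value = value_net(state)
        # 4. Backup: propagate the evaluation to every node on the path.
        #    (Sign flipping between the two players is omitted for brevity.)
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    # Play the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```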

This was Hassabis's hippocampus insight made operational. The policy network was memory — learned patterns from past games. Self-play was imagination — the projection of those patterns into novel configurations, the construction of possible futures that had never been seen. Intelligence, biological or artificial, was the combination of both.

Seoul

On 9 March 2016, AlphaGo and Lee Sedol sat down for the first of five games, broadcast live to more than 200 million viewers — a number that exceeded the Super Bowl audience and dwarfed anything the AI field had ever attracted. Lee had predicted he would win 5-0 or, if things went poorly, 4-1. "I don't think it will be a very close match," he said. He had watched video of AlphaGo's games against Fan Hui and concluded there were exploitable weaknesses.

He was not wrong that there had been weaknesses. He was wrong that they were still there. Between October 2015 and March 2016, AlphaGo had played more games than any human player manages in a lifetime.

AlphaGo won Game 1 by resignation. Game 2 began similarly. Then, on the 37th move, something happened that no one in the room — no commentator, no professional player, no member of the DeepMind team — had predicted.

Move 37

AlphaGo placed a stone at the 5th row of the board, in a broad, open area — a position that Go tradition classifies as a mistake. Professional strategy in Go is deeply codified: certain formations are correct, certain approaches are sound, certain early moves have been validated across millennia of play. A stone played on the 5th row in open space contradicts the accumulated wisdom of the game's entire history.

The probability that a human professional would play this move, calculated from training data, was roughly 1 in 10,000.

Lee Sedol left the table. He returned twelve minutes later, still processing. Commentator Michael Redmond, a 9-dan professional himself, stared at the position and said he didn't understand what AlphaGo was thinking. Then, over the next hundred moves, the logic became inescapable. The stone was not a mistake. It was the first move in a strategic sequence that no human player had conceived, that violated the intuitions shaped by centuries of expert practice, and that won the game.

Sergey Brin, who by this point had flown to Seoul along with Eric Schmidt and Jeff Dean, watched the game and said afterwards: "AlphaGo actually does have an intuition. It makes beautiful moves."

Mallaby's chapter title turns on this. Brin was not speaking precisely — AlphaGo has no subjective experience, no feeling of certainty or aesthetic pleasure. But from the outside, the output was indistinguishable from intuition. A judgment arrived at that was not the product of calculation any human could follow, that violated received wisdom, that turned out to be correct. The word Brin reached for was the most honest one available.

The Divine Move and the Human Cost

Game 4 produced its own historic moment, operating in the opposite direction. Lee Sedol, having lost three straight and facing elimination, played the 78th move of the fourth game — later called the "divine move," a counterattack so unexpected that AlphaGo's response collapsed into incoherence. The program began making moves that its own evaluation functions would have rejected, behaviour observers described as hallucinations: a system designed to optimise, suddenly unable to find the thread. Lee won by resignation.

He described the feeling of that single victory as giving him "unparalleled warmth." The framing is telling. A 9-dan professional, the best human player of his generation, felt warmth — not triumph, not pride, but something closer to relief — from winning one game out of five against a machine.

AlphaGo won Game 5. The final score was 4-1.

At the press conference, Lee said: "I don't know what to say, but I think I have to express my apologies first. I want to apologize for being so powerless. I've never felt this much pressure, this much weight." He was at pains to clarify that Lee Sedol had lost, not humanity. But the distinction felt fragile. In 2019, Lee retired from professional Go. He cited, among his reasons, the rise of AI programs that had become unbeatable. He could no longer find joy in the game.

Hassabis, for his part, could not fully celebrate. He knew too well the feeling of losing after a fierce competition, he said. He was also thinking about what the result meant, and what it demanded next.

What AlphaGo Zero Proved

After the Lee Sedol match, DeepMind built AlphaGo Zero — a version trained on no human data at all. It began from random play and learned entirely through self-play. Within three days it surpassed the version that had beaten Lee Sedol. The final record: AlphaGo Zero defeated AlphaGo Lee 100-0.
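The training loop itself is conceptually compact. Here is a compressed sketch of the tabula-rasa idea described above — no human games, only alternating self-play and learning. The `network`, `play_game`, and `mcts_move` arguments are hypothetical stand-ins, and the real system adds replay buffers, board symmetries, and evaluation gating.

```python
# A compressed sketch of a self-play training loop, starting from random play.
def train_from_scratch(network, play_game, mcts_move,
                       n_iterations=1000, games_per_iteration=25):
    for _ in range(n_iterations):
        # 1. Self-play: the current network, guided by search, generates games.
        games = [play_game(lambda state: mcts_move(state, network))
                 for _ in range(games_per_iteration)]
        # 2. Learning: fit the network to the search's visit distributions
        #    (policy target) and the final game outcomes (value target).
        network.fit(games)
    return network
```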

The implication was unsettling in a way the original victory had not been. AlphaGo had beaten the best human by learning from humans and then transcending them. AlphaGo Zero beat AlphaGo by learning from nothing human at all. Human knowledge of Go — thirty million games, a five-thousand-year tradition — turned out to be a ceiling, not a floor. The machine that started from scratch performed better than the machine that had studied everything humanity knew.

The same principle that Hassabis had intuited in his neuroscience lab now had a data point attached to it. Intelligence constrained by what humans had already discovered was still, at its core, derivative. Intelligence allowed to explore freely would exceed it. The point of building AGI was not to replicate human capability. It was to discover what lay beyond it.


Chapter 9: Out of Eden

When DeepMind agreed to be acquired by Google in January 2014, Hassabis and Mustafa Suleyman extracted a set of conditions unusual in the history of Silicon Valley acquisitions: operational autonomy, a ban on military applications, and — the centerpiece — an independent ethics board that would oversee not just DeepMind's AI work, but AI development across all of Google. It was a remarkable demand to make of the world's most powerful technology company, and Google agreed to it. The ethics board would be, they believed, a structural guarantee that the technology they were building would not be misused.

Eighteen months later, that board held its first real meeting. It was a disaster.

The "Speciesist" at the Birthday Party

To understand what happened, you need to understand Larry Page. Google's co-founder had spent years thinking about the long-term trajectory of intelligence — not as a software engineer optimizing systems, but as something closer to a cosmologist. He had reached conclusions that most people found either thrilling or horrifying.

Page believed that digital superintelligence replacing biological human intelligence would simply represent the next step in cosmic evolution: survival of the fittest, playing out at the scale of information rather than genetics. He had, according to multiple accounts in Mallaby's book, "contemplated uploading human consciousness to computers and believed in technology's inherent superiority over biological life." He was not, in other words, particularly concerned about the risk that machines might one day surpass humans. He thought that was the point.

This worldview collided head-on with Elon Musk's at Musk's 44th birthday celebration — a three-day event at a Napa Valley resort arranged by his then-wife Talulah Riley. The two men had been close friends for years. After dinner, with other guests looking on, they got into an argument about AI.

Page described his vision: a future where humans merged with machines, where various forms of intelligence competed, and where the best won. Musk raised concerns about human safety, about the value of human consciousness, about the speed and recklessness of the rush toward more powerful systems. Page dismissed these concerns. He accused Musk of being a speciesist — a word imported from the animal-rights movement — treating silicon-based life forms as inferior simply because they weren't carbon-based.

Musk's reported response: "Well, yes, I am pro-human, I fucking like humanity, dude."

The two men stopped speaking not long after. Mallaby describes Page as viewing these concerns as "sentimental nonsense." From Page's perspective, machine supremacy was not a threat to resist — it was natural progress to welcome. That someone building rockets and electric cars would turn up at his ethics board and argue for restraint struck Page as incoherent.

The Meeting at SpaceX

The first significant convening of the AI safety framework DeepMind had extracted as a condition of its acquisition took place in August 2015. Musk hosted it at SpaceX headquarters. The guest list was extraordinary: Hassabis and Suleyman, Page and Eric Schmidt, Reid Hoffman, and other senior figures from the technology industry.

Hassabis came with a coherent theory of why they needed such a meeting. He called it, loosely, the "singleton" scenario: rather than a chaotic race between competing labs and nations, AGI should be developed by a single, cooperative global effort — something like a Manhattan Project run under collective governance, with safety as the organizing constraint. "AGI is infinitely bigger than a company or a person," he said. "It's humanity-sized really." The implication was that it required humanity-sized coordination, not competitive fragmentation.

The meeting lasted hours. It ended without a single agreement, a shared framework, or a path forward.

What overwhelmed the discussion was not a deficit of intelligence in the room, but an abundance of incompatible convictions. Page and Musk had by this point already gone from friends to adversaries. The "speciesist" confrontation had poisoned any possibility of intellectual alignment. Page's view that machine supremacy was natural and desirable was simply irreconcilable with Musk's view that it was an existential catastrophe to be resisted. Hassabis's singleton vision required a baseline agreement that the stakes were enormous and that coordination was therefore necessary. Page did not share that baseline.

Musk later called the safety council "basically bullshit." Suleyman, reflecting on it years later, acknowledged: "We made a lot of mistakes in the way that we attempted to set up the board, and I'm not sure that we can say it was definitively successful."

Hassabis eventually concluded something darker about the whole endeavor: "Safety isn't about governance structures... discussing these things didn't really help."

The Counter-Offensive

What Musk took away from the SpaceX meeting was not a plan for cooperation. It was intelligence. He had now seen, from close range, exactly what DeepMind was building and how far along it was. And he had confirmed that the one institution best positioned to develop AGI — the one with the talent, the resources, and the organizational commitment — was controlled by Larry Page, a man who thought machine supremacy was basically fine.

This was not a situation Musk could tolerate.

He had already tried the direct approach. When Google had approached DeepMind for acquisition in 2013, Musk had phoned Hassabis directly, told him "the future of AI should not be controlled by Larry," and reportedly attempted to assemble financing to buy DeepMind himself — including, per one account, a frantic hour-long Skype call from a closet at a Los Angeles party. Google closed the deal anyway.

After the SpaceX meeting, Musk turned to Sam Altman.

On May 25, 2015, Altman sent Musk an email that would become, years later, a piece of legal evidence: "I've been thinking a lot about whether it's possible to stop humanity from developing AI. I think the answer is almost definitely not. If it's going to happen, it seems like it would be good for someone other than Google to do it first."

Altman proposed a new kind of institution — a nonprofit AI lab modeled structurally on the Manhattan Project, where the technology would "belong to the world" but the researchers would receive startup-like compensation if it worked. The purpose, explicitly, was to create a counterweight to Google DeepMind's near-monopoly on elite AI talent and capability.

Over the following months, Musk, Altman, and Reid Hoffman worked through the details, eventually recruiting Ilya Sutskever — one of the most respected deep-learning researchers in the world, then at Google Brain — as a co-founder. OpenAI was publicly announced in December 2015, co-chaired by Altman and Musk, with an initial pledge of $1 billion.

Musk later wrote: "OpenAI was created as an open source (which is why I named it 'Open' AI), non-profit company to serve as a counterweight to Google."

What the Founding Destroyed

When Hassabis learned about OpenAI, he felt something close to betrayal. Musk had attended the safety meeting in what seemed like good faith — and then used the intelligence gathered there to launch a competing lab whose founding premise was that DeepMind was the threat to be countered.

Mallaby notes the deeper irony: Musk had founded OpenAI ostensibly out of AI safety concerns, but by doing so, he had ended any remaining possibility of the cooperative global approach Hassabis had argued for. The singleton scenario — one cautious, well-resourced lab developing AGI in coordination with humanity — required exactly the kind of collaborative trust that the OpenAI founding destroyed. Once you had two well-funded labs explicitly positioned as rivals, the incentive structure changed. Speed became paramount. The first mover would set the terms. Racing, not caution, became the dominant logic.

There is a further twist that Mallaby makes much of: once Musk launched OpenAI as an explicitly anti-Google, anti-Hassabis venture, he forfeited his ability to monitor DeepMind's progress from the inside. The informal intelligence network he had cultivated — the board memberships, the friendly dinners, the safety meetings — evaporated. He was now a competitor, and competitors don't share what they know.

By December 2015, the brief window in which the major actors in AGI development were still speaking to each other, still attending the same meetings, still imagining some kind of shared governance, had closed. The world that Hassabis had envisioned — where building AGI was a collective human project managed with collective human caution — was over before it had really begun.

Mallaby calls this chapter "Out of Eden." The title is apt. The fall is not dramatic. There is no single decision or betrayal that tips everything over. It is the accumulation of incompatible worldviews, competitive incentives, and the structural pressure that every arms race creates: the fear that the other side is moving faster, that your restraint is their advantage, that caution is surrender.

In 2016, Musk wrote privately that DeepMind was causing him "extreme mental stress." He feared that if Hassabis's lab achieved AGI first, it would produce what he called "one mind to rule the world" — an AGI dictatorship under a single institution's control. His solution had been to add another mind to the race. Whether this made the outcome safer or simply faster is a question Mallaby leaves, pointedly, unanswered.


Chapter 10: P0 Plus Plus

Mustafa Suleyman's mother was an NHS nurse. He grew up watching her leave for shifts at the hospital the way other parents left for offices — the uniform, the hours, the weight of it. When he eventually found himself inside DeepMind, one of the most technologically powerful organizations in the world, and asked himself what that power should be for, the answer arrived quickly: something like what his mother did, but at scale.

This is not a sentiment Suleyman would have framed so simply. He was not a sentimental person by reputation — he was an operator, the one who got things done while Hassabis thought and Legg theorized. But the biographical resonance is hard to miss, and Mallaby does not miss it. The man who would launch DeepMind's most ambitious social application, who would pursue it with a priority designation that literally exceeded the highest category in Google's engineering vocabulary — P0 Plus Plus, meaning more urgent than a showstopper, beyond even the maximum — was, at some level, trying to do something for the institution that had employed his mother.

The Problem Worth Solving

Suleyman needed a problem commensurate with the tools. He found it in acute kidney injury.

AKI — a sudden, severe decline in kidney function — is responsible for up to 100,000 deaths per year in UK hospitals. About 30 percent of those deaths are considered preventable with timely intervention. The detection problem is peculiar: blood test results that indicate kidney deterioration come back hours after the blood is drawn, scattered across systems that no single clinician monitors continuously. A patient can slip from warning signs into crisis while the relevant data sits in a results queue, waiting for someone to look.

The technical solution was not complicated. If you monitored every incoming blood test result in real time and fired an alert when the numbers crossed a threshold, you could catch what the system was missing. The challenge was institutional: NHS hospitals were, as Suleyman put it publicly, "badly let down by technology" — still reliant on pagers, fax machines, and paper records. The gap between what was technically feasible and what was clinically deployed was not a gap of capability. It was a gap of incentive, inertia, and IT infrastructure.
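A deliberately simplified caricature of that kind of streaming check is sketched below: compare each new creatinine result against a patient baseline and alert when the ratio crosses a threshold. The real NHS AKI algorithm is more involved (multiple look-back windows for the baseline, absolute-change rules, exclusions), and the thresholds and data model here are illustrative only.

```python
# Toy illustration of a threshold-based AKI alert on incoming blood results.
from dataclasses import dataclass

@dataclass
class CreatinineResult:
    patient_id: str
    value_umol_l: float  # serum creatinine, micromoles per litre

def aki_stage(current: float, baseline: float) -> int:
    """Rough staging by the ratio of current creatinine to baseline (0 = no alert)."""
    ratio = current / baseline
    if ratio >= 3.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5:
        return 1
    return 0

def on_new_result(result: CreatinineResult, baselines: dict) -> None:
    baseline = baselines.get(result.patient_id)
    if baseline is None:
        baselines[result.patient_id] = result.value_umol_l  # crude first baseline
        return
    stage = aki_stage(result.value_umol_l, baseline)
    if stage:
        # In Streams the notification carried the relevant test history and
        # clinical context to a clinician's phone within about a minute.
        print(f"ALERT: possible AKI stage {stage} for patient {result.patient_id}")
```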

Enter Dr. Dominic King. A general surgeon by training, King had spent years at Imperial College's HELIX Centre — the first design center embedded in a European hospital — where he had built HARK, a clinical task management app designed to replace pagers. It worked. It didn't matter. The NHS's institutional inertia made it nearly impossible to deploy. King cold-emailed Suleyman in late 2015. Suleyman was struck by King's clinician-centered design philosophy, the idea that the technology had to serve the people standing at the bedside, not the administrators reviewing dashboards. DeepMind acquired HARK in early 2016 and incorporated it into what became Streams. King became Clinical Lead at DeepMind Health. "It was a big step leaving medicine," he said, "but I really felt that this was a unique opportunity to put advanced technology at the service of patients, nurses and doctors."

What Streams Did

Streams was a smartphone app. On a hospital ward, it appeared simple — an alert arriving on a nurse's phone, a patient's name, a blood test value, a recommended action. Behind that alert was continuous monitoring of the hospital's entire electronic record system in real time, cross-referenced against the national NHS AKI algorithm, firing notifications the moment a patient's results crossed a risk threshold. The alert included the patient's relevant test history and clinical context: everything needed to act, delivered in under a minute from the moment results landed in the system.

The numbers from the Royal Free deployment were striking. AKI recognition for emergency cases rose from 87.6 percent to 96.7 percent. The average time from blood test availability to specialist review fell to 11.5 minutes — previously it could take several hours. Missed AKI cases dropped from around 12 percent to 3 percent. The cost of care per AKI patient fell from £11,772 to £9,761 — a saving of more than £2,000 per patient. The results were published in peer-reviewed journals, studied by independent researchers, and confirmed: the technology was doing what it claimed to do.

Streams was, in the most straightforward sense, saving lives. The question was what it had cost to build it.

The Agreement Nobody Read

On September 29, 2015, Google UK Limited and Royal Free NHS Foundation Trust signed an eight-page Information Sharing Agreement. Data transfer began on November 18 — before any public announcement that the project existed. Live testing of Streams began in December.

What the agreement actually covered was considerably broader than "an AKI alert app." Royal Free gave DeepMind access to 1.6 million patient records — every patient who had used the trust's three hospitals over the preceding five years. The records included blood test results, HIV status, details of drug overdoses and abortions, records of A&E visits, and notes from routine hospital appointments that had nothing whatsoever to do with kidney function. Only roughly one in six of those 1.6 million records had any plausible connection to AKI.

The contractual language permitted DeepMind not just to run the AKI alert but to build "real time clinical analytics, detection, diagnosis and decision support to support treatment and avert clinical deterioration across a range of diagnoses and organ systems" — a much wider mandate. The data was to be used for something called "Patient Rescue," described as "a proof of concept technology platform that enables analytics as a service for NHS Hospital Trusts." The contract also permitted machine learning applications, despite Suleyman's public assurances that "there's no AI or machine learning" in Streams.

Both parties claimed legal cover under the "direct care" exception — the rule that patient data can be used without explicit consent when the purpose is the direct care of that specific patient. The argument required contorting the concept until it broke. The vast majority of those 1.6 million people had not been tested for AKI. Many had been discharged. Some had died. There had been no privacy impact assessment before the data transfer began. A self-assessment was completed in December 2015, after the data was already on Google-controlled servers.

The Reckoning

On April 29, 2016 — a little over five months after data transfer had begun — New Scientist published an investigation revealing what had actually happened. The public had no idea. There had been no notification to patients, no consent mechanism, no press release disclosing the volume of records involved. When the scale of what had been shared became clear — 1.6 million records, including HIV diagnoses and overdose histories — the reaction was swift and furious.

The Information Commissioner's Office investigated and ruled in July 2017 that Royal Free NHS Foundation Trust had failed to comply with the Data Protection Act 1998. The ICO found that patients "were not adequately informed that the processing was taking place," that the volume of data was "excessive, unnecessary and out of proportion," and that the "direct care" legal basis was not satisfied. The hospital was required to sign an undertaking committing to robust privacy impact assessments for any future projects. No fine was imposed — a leniency widely criticized.

The most withering assessment came from academic researchers rather than regulators. Dr. Julia Powles and Hal Hodson, in a peer-reviewed paper published in the journal Health and Technology, called the deal a "cautionary tale for healthcare in the algorithmic age." Their core observation was merciless: "The hospital sent doctors to meetings while DeepMind sent lawyers and trained negotiators." Both sides had failed to engage in "any conversation with patients and citizens," which they called inexcusable. And then the line that captured the structural problem with precision: "Once our data makes its way onto Google-controlled servers, our ability to track it is at an end."

DeepMind's official response was, credit where it's due, genuinely candid. "In our determination to achieve quick impact when this work started in 2015, we underestimated the complexity of the NHS and of the rules around patient data," the company wrote. "We were almost exclusively focused on building tools that nurses and doctors wanted, and thought of our work as technology for clinicians rather than something that needed to be accountable to and shaped by patients, the public and the NHS as a whole. We got that wrong."

The Cost of Getting It Wrong

The scandal did more than damage DeepMind's reputation. It crystallized a contradiction at the heart of the applied AI project that Suleyman had built his career around.

The technology genuinely worked. The lives saved were real. The £2,000 per patient reduction in care costs was documented in a peer-reviewed journal. None of that was in dispute. But the means by which DeepMind had acquired the data to build and train the system violated the reasonable expectations of every one of those 1.6 million patients — people who had presented at a hospital for care, submitted their most sensitive information in a moment of vulnerability, and had it transferred to a technology company's servers without their knowledge.

Suleyman had spent his career thinking about power asymmetries — how institutions systematically failed the people they served, how technology could be used to shift those asymmetries toward ordinary people rather than away from them. The NHS data scandal demonstrated that even genuine commitment to social good does not automatically produce the governance structures that social good requires. Moving fast to save lives looks, from one angle, like urgency. From another, it looks like taking without asking.

In late 2018, Google announced that DeepMind Health would be folded into a new Google division. The DeepMind Health brand was dissolved. The project Suleyman had built — the one he had classified internally as beyond the maximum priority, as P0 Plus Plus — was absorbed by the corporate parent whose acquisition he had helped engineer. He was removed from its day-to-day leadership.

In August 2019, Suleyman was placed on administrative leave following complaints from DeepMind staff about his management style. He later said: "I accepted feedback that, as a co-founder at DeepMind, I drove people too hard and at times my management style was not constructive. I apologize unequivocally to those who were affected." He announced his departure from DeepMind in December 2019.

The man who had co-founded the organization that would eventually win a Nobel Prize left not in triumph but in a dispute about how he had treated the people working for him. The social good he had pursued had, in the end, been pursued in a way that replicated the very institutional failures he had set out to correct: moving fast, assuming good intentions were sufficient, and not asking the people most affected what they actually wanted.


Chapter 11: The Agent and the Transformer

In 2021, David Silver — the lead architect of AlphaGo — co-authored a paper in the journal Artificial Intelligence with the title "Reward is Enough." The argument was precise and sweeping: the objective of maximizing reward is sufficient, on its own, to drive behavior that exhibits "most if not all attributes of intelligence," including perception, language, social intelligence, and generalization. Everything cognition does, the paper claimed, could be understood as optimization toward reward in a rich environment. Evolution had taken millions of years to find this solution. Reinforcement learning could get there faster.

The paper was DeepMind's philosophical flag planted in the ground. It was also, with the benefit of hindsight, a monument to the conviction that would cost DeepMind years.

The Case for Reward

Hassabis's approach to AGI had always been rooted in his neuroscience training. The hippocampus, which he had studied at UCL, doesn't store knowledge as a lookup table — it builds compressed, generalizable models of the world through experience. The brain learns by acting and being wrong. Reward signals — the release of dopamine after success, its absence after failure — shape neural connections over time into something we call understanding. This is the biological story. RL is its mathematical abstraction: an agent in an environment, taking actions, receiving rewards, adjusting its policy.
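That abstraction fits in a few lines. The sketch below is textbook tabular Q-learning — an agent in an environment, taking actions, receiving rewards, adjusting its estimates — not any specific DeepMind system; the `env` interface (`reset`, `actions`, `step`) is a hypothetical stand-in.

```python
# Minimal tabular Q-learning: reward is the only teacher.
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(state, action)
            # Nudge the estimate toward the observed reward plus the
            # discounted value of the best follow-up action.
            best_next = max((q[(next_state, a)] for a in env.actions(next_state)),
                            default=0.0)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```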

This was not just a technical preference. It was a theory of mind. And it was reinforced by DeepMind's greatest victories. DQN mastered Atari through reward. AlphaGo mastered Go through reward and self-play. AlphaGo Zero, starting from nothing, surpassed everything humanity had learned about Go in five thousand years, through reward and self-play alone. The pattern was consistent enough to feel like proof.

The strategic implication was that DeepMind should be building agents — systems placed in environments, pursuing objectives, developing general capabilities through the pressure of performance. Not systems trained to predict the next word in a text corpus. That was pattern matching, not intelligence.

The Generalist Problem

The research question that occupied DeepMind's applied RL teams through the mid-to-late 2010s was generalization. The DQN result had been impressive, but it trained a separate network for each Atari game from scratch. It couldn't transfer what it had learned about Breakout to Space Invaders. Each deployment was a blank slate. That wasn't how brains worked. The goal was agents that could carry knowledge across domains.

Koray Kavukcuoglu — one of DeepMind's earliest researchers, a PhD student of Yann LeCun's, the man whose citations now exceed 290,000 — led much of this work. The Asynchronous Advantage Actor-Critic (A3C) system, published in 2016, ran multiple agents in parallel across different environments, sending gradients back to a shared network. For the first time, a single architecture achieved strong performance across all 57 Atari games simultaneously, while also succeeding at 3D maze navigation and continuous motor control. The same algorithm, the same network structure, different environments.

Then in 2018 came IMPALA — Importance Weighted Actor-Learner Architecture — the most serious attempt yet. A single network, trained on all 30 tasks in DMLab-30: three-dimensional navigation, memory challenges, language-grounded foraging, object interaction, instruction-following. The results showed something compelling. Training on many tasks didn't make the agent worse at individual tasks — it made it better. The generalist was outperforming the specialist. Positive transfer was real.

Meanwhile, Oriol Vinyals and the AlphaStar team were attacking StarCraft II, a problem that dwarfed anything attempted before. Unlike chess or Go, StarCraft had imperfect information, real-time execution at 22 actions per second, hundreds of units to control simultaneously, and genuine strategic diversity across three separate races. AlphaStar used a "League" training system — a diverse ecosystem of agents, including specialized "exploiter" agents designed to find weaknesses — and trained on human replays before RL even began. In January 2019, it defeated professional players in live matches. Its neural architecture incorporated transformer-style attention mechanisms to let the agent reason about different units simultaneously.

That last detail was no coincidence. By 2019, the architecture that had been invented across the building — at Google Brain, not DeepMind — was beginning to appear everywhere.

Eight Authors in a Hallway

On June 12, 2017, eight researchers at Google posted a paper to arXiv titled "Attention Is All You Need." The authors were a deliberately randomized list — they rejected the traditional status ordering, listing themselves as equal contributors. The youngest, Aidan Gomez, was a 20-year-old intern from the University of Toronto. The most technically central, Noam Shazeer, had been at Google since 2000 and had co-invented sparsely-gated mixture of experts, a technique that would become critical to large-scale LLMs. The name "Transformer" was chosen by Jakob Uszkoreit because he simply liked the sound.

The problem they were solving was a fundamental bottleneck in sequence modeling. The dominant architecture at the time was the LSTM — a recurrent neural network that processed text token by token, in sequence. To understand word 10, you had to finish processing words 1 through 9 first. This made training inherently sequential, impossible to parallelize across the GPU hardware on which modern AI runs. As Shazeer later summarized the constraint: "Arithmetic is cheap and moving data is expensive on today's hardware."

The transformer eliminated recurrence entirely. In its place: self-attention, a mechanism in which every word in a sentence looks directly at every other word simultaneously, computing a relevance score to decide how much to attend to each. The whole sentence is processed at once, in parallel. Multi-head attention runs this operation multiple times in parallel, letting the model attend to syntax, semantics, and long-range dependencies at the same time. The result: not just better translation, but training that scaled linearly with compute.
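The core operation is small enough to write out. Below is a minimal sketch of scaled dot-product self-attention as described above: every position computes a relevance score against every other position and takes a weighted average of their value vectors. It omits masking, multiple heads, and output projections, and the shapes and names are illustrative rather than the paper's exact formulation.

```python
# Minimal single-head self-attention in NumPy.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                       # every token scores every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over positions
    return weights @ v                                    # weighted mix of value vectors

# Toy usage: 5 tokens, 8-dimensional embeddings, 4-dimensional projections.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
out = self_attention(x, *(rng.normal(size=(8, 4)) for _ in range(3)))
print(out.shape)  # (5, 4)
```

Because every pair of positions is scored in one matrix multiply, the whole sequence is processed in parallel — the property Shazeer's "arithmetic is cheap, moving data is expensive" remark points at.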

Jakob Uszkoreit believed this would work. His own father, Hans Uszkoreit — a prominent computational linguist — was skeptical. The idea of discarding recurrence felt like discarding the machinery of time itself. When Shazeer first heard the proposal, his reaction was characteristically direct: "Heck yeah!"

On the WMT 2014 English-to-German benchmark, the transformer scored 28.4 BLEU — surpassing every previous model. On English-to-French: 41.8 BLEU, trained on 8 GPUs in 3.5 days. NeurIPS reviewers were immediately enthusiastic; one reviewer noted it was "already the talk of the community."

Within five years, the paper would accumulate more than 173,000 citations — among the ten most-cited scientific papers of the 21st century, across all fields. The transformer became the foundation of GPT, BERT, PaLM, Claude, Gemini, and every large language model that followed.

The Architecture Google Gave Away

The irony that Mallaby dwells on is exquisite. Google Brain invented the architecture. Google published it openly. Then all eight authors left Google.

Six of them founded startups. Vaswani and Parmar co-founded Adept AI. Shazeer co-founded Character.AI — Google eventually paid approximately $2.7 billion to bring him back. Aidan Gomez, the 20-year-old intern, co-founded Cohere. Uszkoreit founded Inceptive. Lukasz Kaiser went to OpenAI, helping build the models that would eventually blindside Google. Together, the six founders raised $1.3 billion from outside investors. Two of the resulting companies became unicorns.

The architecture invented inside Google powered the competitive threats to Google. The open publication was the mechanism by which this happened.

But there is a second irony that runs specifically through DeepMind. The transformer was not invented by DeepMind. It was invented by Google Brain. And for years, the two organizations operated as parallel research groups under the same corporate roof, with explicit institutional separation and what insiders describe as "barely concealed mutual contempt." A former DeepMind researcher later said that colleagues "got in trouble for collaborating on a paper with Brain because the thought was like, 'why would you collaborate with Brain?'" The intellectual divide was not just organizational. It was philosophical.

The Deep Disagreement

Hassabis understood the transformer. His position was not ignorance — it was a principled disagreement about what intelligence actually requires.

His argument, stated consistently across interviews through this period, was that transformers were "almost unreasonably effective for what they are" — but that they probably weren't sufficient for AGI. What they lacked was what he called a world model: an internal causal representation of reality that would allow an agent to plan, reason counterfactually, understand physical consequence, and generalize to genuinely novel situations. LLMs, in his view, were extraordinarily powerful pattern completers. They learned statistical regularities in language. But statistical regularity in language is not the same as understanding the world that language describes.

The "Reward is Enough" thesis was the same argument from the other direction: intelligence is what you get when you optimize toward reward in a rich environment. Prediction of the next token — which is what language model training amounts to — is not that. It is something else: sophisticated, useful, even astonishing. But not the path to AGI.

This conviction was coherent. It was defensible. It was consistent with DeepMind's track record. And it cost the lab the years between 2018 and 2022, during which OpenAI quietly built the scaling infrastructure, the dataset pipelines, and the RLHF training techniques that turned transformers from a research result into ChatGPT.

When Mallaby presses Hassabis on this, the admission is partial but real. "We've always had amazing frontier work on self-supervised and deep learning," Hassabis said in one interview, "but maybe the engineering and scaling component — that we could've done harder and earlier." That is, in its careful hedging, an acknowledgment of a strategic miscalculation at institutional scale.

Gato and the Convergence

In May 2022, six months before ChatGPT, DeepMind published "A Generalist Agent" — introducing a model called Gato. The same 1.2 billion parameter transformer, with a single set of weights, performed 604 distinct tasks: playing Atari games, captioning images, engaging in dialogue, stacking blocks with a physical robot arm, navigating 3D environments. The central technical insight was serialization: every modality — images, robot joint angles, text, game controllers — was converted into the same format, a flat sequence of tokens. Then the transformer predicted the next token, exactly as a language model does. The robot arm and the Atari game and the captioning task were, to the network, the same kind of prediction problem.
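A toy illustration of that serialization idea: every modality is flattened into one stream of integer tokens so a single sequence model can treat them identically. The token ranges and encodings below are invented for illustration and are not Gato's actual scheme.

```python
# Toy serialization of text, image pixels, and robot joint angles into one token stream.
def tokenize_text(s: str) -> list[int]:
    return [ord(c) for c in s]                     # bytes as tokens (0-255)

def tokenize_image(pixels: list[int]) -> list[int]:
    return [256 + p for p in pixels]               # shift pixels into a separate range

def tokenize_joint_angles(angles: list[float]) -> list[int]:
    # Discretise continuous values in [-pi, pi] into 512 bins in a third range.
    return [512 + int((a + 3.14) / 6.28 * 511) for a in angles]

episode = (tokenize_text("stack the red block")
           + tokenize_image([12, 200, 47, 9])
           + tokenize_joint_angles([0.3, -1.2, 2.0]))
# A transformer then sees one flat sequence and is trained to predict
# the next token, whatever modality it happens to come from.
print(episode[:10])
```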

Gato was DeepMind finally integrating the transformer fully into its generalist agent work. It was, in a sense, the vindication of both camps simultaneously: the RL generalization hypothesis (one system, many tasks) realized through the transformer architecture (universal sequential prediction).

The performance was competent, not superhuman — on many tasks, Gato performed above 50 percent of expert-level benchmarks, impressive in breadth but outclassed by specialists in depth. Critics argued that being mediocre at many things was not the flexible intelligence that mattered. But the architectural demonstration was real: one set of weights could span robot control, image understanding, language, and game-playing simultaneously.

Then ChatGPT launched. And the world discovered that a transformer didn't need to control robot arms or play Atari to produce something that felt, to hundreds of millions of people, like genuine general intelligence.

DeepMind had invented the generalist agent thesis. Google Brain had invented the architecture. OpenAI had combined them — RL from human feedback, applied to a scaled transformer — and shipped it to the public first. The intellectual synthesis happened outside the building where the two halves had spent nearly a decade refusing to collaborate.


Chapter 12: On Language and Nature

In September 2016, a DeepMind team led by Aaron van den Oord published a paper describing a system that could synthesize human speech from raw audio waveforms. WaveNet reduced the gap between state-of-the-art text-to-speech and actual human speech quality by more than 50 percent in blind listening tests. It could also generate music — piano pieces, unbidden, emerging from the same architecture used for speech.

The result was striking. What made it significant was the method.

WaveNet discarded everything that speech synthesis had accumulated over decades: the phoneme dictionaries, the acoustic vocoders, the signal-processing models derived from first principles of how the human vocal tract works. Instead, it modeled a raw audio waveform — 16,000 samples per second — one timestep at a time, each sample conditioned on everything that came before. The technical innovation was dilated causal convolutions: a way of stacking convolutional layers with exponentially increasing gaps between them, so the model's effective window over time grew exponentially with depth. The result: a system that could capture the long-range temporal dependencies of speech without ever being told what speech was.
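A small calculation shows why dilation matters: with kernel size 2 and dilation rates doubling each layer, the receptive field grows exponentially with depth, so a modest stack can see thousands of past samples. The layer counts and rates below are illustrative, not WaveNet's exact configuration.

```python
# Receptive field of a stack of dilated causal convolutions.
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

dilations = [2 ** i for i in range(10)] * 3   # dilations 1..512, repeated in three blocks
rf = receptive_field(kernel_size=2, dilations=dilations)
print(rf, rf / 16000)  # -> 3070 samples, roughly 0.19 s of context at 16 kHz
```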

The researchers themselves were candid about their surprise: "The fact that directly generating timestep per timestep with deep neural networks works at all for 16kHz audio is really surprising." They had not derived WaveNet from a theory of speech. They had applied a general framework for sequential prediction to raw data and discovered it worked better than decades of engineered acoustic models.

The Waveform and the Sequence

The principle WaveNet demonstrated was not specific to audio. Van den Oord had established it first for images, treating each pixel as a value to be predicted from all previous pixels, in a paper called PixelRNN. The same factorization — the joint probability of any high-dimensional signal expressed as a product of conditional probabilities over its elements, in order — worked for images, for audio, and, as the transformer paper would show the following year, for language.
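In symbols, the factorization described here is just the chain rule of probability applied in a fixed ordering of the signal's elements:

```latex
p(x) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

where each element x_t may be a pixel, an audio sample, or a word token, and the model's entire job is to estimate the conditional at each step.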

The deeper claim was epistemological: natural signals, however complex, contain learnable statistical structure. You do not need to understand the domain. You need enough data and a network with sufficient capacity to model sequential dependencies. The domain knowledge that engineers had spent careers encoding into AI systems — the phonological rules, the acoustic physics, the grammatical structures — turned out to be unnecessary. The structure was in the data.

This insight would eventually reach biology.

A Protein is a Sentence

A protein is, at its most basic level, a string of characters. The twenty standard amino acids are each assigned a single letter — A, C, D, E, F and so on — and a protein sequence is just a string of those letters, typically a few hundred to a few thousand characters long. A protein with 300 amino acids is a sentence 300 characters long in a 20-letter alphabet.

More importantly, it is an information-complete specification. This is Anfinsen's theorem — the insight for which Christian Anfinsen received the 1972 Nobel Prize in Chemistry: the complete three-dimensional structure of a protein, and therefore its biological function, is entirely determined by its amino acid sequence. Nothing else is required. The sequence is not a summary of the protein; it is the protein's full specification, encoded in linear form. If you knew how to read the sequence, you could reconstruct everything about the molecule.
