On Intelligence, Chapter by Chapter: A 2004 Book That Predicted Half of Modern AI
A 2004 book about brains argued that intelligence is, fundamentally, prediction. Twenty-two years later, the dominant paradigm in AI is literally trained to predict the next token. That book deserves another reading.
On Intelligence by Jeff Hawkins (with Sandra Blakeslee) is one of those rare technical books whose central claim has aged well in the most awkward way possible. The framework was right about what the brain does. It was almost certainly wrong about how you should engineer a machine to do it. And it is still the cleanest mental model I know for explaining why your LLM hallucinates with such confidence.
What follows is a chapter-by-chapter summary written for an engineer who is shipping AI features in 2026, not for a neuroscience seminar. I'll resist the temptation to relitigate every claim and just give you the spine, with a working engineer's annotation where the chapter has something to say about what you're building next week.
Prologue: A Palm Pilot Founder Goes Looking for the Brain
The prologue is autobiographical and load-bearing. Hawkins opens by declaring that two passions have animated his life: mobile computing and brains.
The first made him rich and famous. He founded Palm Computing in January 1992 and Handspring with Donna Dubinsky and Ed Colligan in 1998. He was the architect of the PalmPilot, the Visor, and the Treo — the device that arguably gave the modern smartphone its shape. The second passion was the actual obsession, and the book exists because he never managed to let it go.
The origin story Hawkins tells is unusually specific. He graduated from Cornell with a B.S. in electrical engineering in June 1979. As a teenager he had drawn up four "fundamental questions" he wanted to answer with his life, and the last and biggest of them was how intelligence works. He went to his local library looking for the canonical book on the brain and discovered, to his bewilderment, that no such book existed — because no one had any idea how the brain actually worked.
Three months after graduation, the September 1979 issue of Scientific American arrived: a special issue on the brain. Francis Crick, the co-discoverer of DNA's double helix, contributed an essay titled "Thinking About the Brain." Crick's complaint cut through everything else in that issue. Neuroscience had accumulated mountains of data about neurons and chemistry, but it lacked, in his phrase, "a broad framework of ideas" that would tell anyone what those data meant. Hawkins read this and decided, with the kind of certainty only a twenty-two-year-old can muster, that finding that framework would be his life's work.
The execution took the long way around. He went to Intel as a software engineer in Portland, then wrote a proposal arguing that Intel itself should fund brain research because understanding the cortex would eventually transform microprocessor design. Intel declined. He transferred to Intel's Boston office and submitted the same proposition to MIT's AI Lab — could he come study "real brains" alongside them? MIT also declined. The AI program at the time was explicitly oriented around surpassing biological intelligence, not understanding it, and Hawkins' bottom-up framing was the wrong shape for the people running the lab.
He eventually enrolled in Berkeley's biophysics program in 1986, planning to develop a theory of the neocortex as a PhD thesis. After roughly two years of coursework his proposal was rejected: no faculty member worked on neocortical theory at the level of generality he wanted, and no one was prepared to advise a project that crossed that many sub-disciplines. He left without the degree, returned to industry, and rejoined GRiD Systems as VP of research, where he designed the GRiDPad in 1989 — one of the earliest tablet computers and the direct ancestor of the device he would build at Palm. From there the path runs through Palm and Handspring to a personal fortune large enough to fund whatever institution he wanted.
So in 2002 he built that institution himself: the Redwood Neuroscience Institute in Menlo Park. It was a nonprofit dedicated to a single question — what does the neocortex actually do — that no university department was structured to ask. On Intelligence was published two years later as the manifesto for that institute. Numenta, the for-profit follow-on with Dubinsky and Dileep George, was founded in March 2005 to turn the theory into software.
The prologue's emotional core, the part you should not miss, is that this book is what happens when someone who has been told "no" by Intel, by MIT, and by a Berkeley faculty committee gets rich enough to fund the answer himself. The editorial choices in the chapters that follow all snap into focus once you understand this. The impatience with classical AI. The conviction that one algorithm runs the whole cortex. The willingness to publish a falsifiable theory before the experimental evidence is complete. The book is, in part, the case Hawkins has been trying to make in writing since he was twenty-two.
Chapter 1: Artificial Intelligence
The first half of the chapter is the second movement of Hawkins' autobiography, and it earns its place in the book by being startlingly specific. He worked at Intel teaching customers how to design microprocessor systems, and at some point he sat down and wrote a literal letter to chairman Gordon Moore proposing that Intel fund brain research. The book paraphrases it in a single irreplaceable paragraph: "Dear Dr. Moore, I propose that we start a research group devoted to understanding how the brain works. It can start with one person — me — and go from there. I am confident we can figure this out. It will be a big business one day." Moore didn't say yes, but he handed the letter to Ted Hoff, Intel's chief scientist and the man who had designed the first microprocessor. Hawkins flew to California to make the pitch in person and only discovered afterward that Hoff also had a serious history in early neural-network theory — exactly the wrong audience for an argument that neuroscience was about to become tractable. Hoff told him the brain wouldn't be understood in the foreseeable future and that it therefore didn't make sense for Intel. Twenty-five years on, Hawkins concedes in the book that Hoff was correct: "Timing is everything in business." From there came the same MIT rejection in 1981 that the prologue references, then a holding pattern at GRiD Systems before Berkeley.
The second half of the chapter is the part to read carefully. It is a compressed intellectual history of classical AI that has aged remarkably well.
Hawkins traces the field's central dogma — "the brain is just another kind of computer" — back to Alan Turing's proof of universal computation. Once Turing demonstrated that all digital computers are logically equivalent regardless of substrate, the leap to "and so are brains, if you describe them at the right level of abstraction" felt small enough to be obvious.
The worldview was reinforced by two pillars. Warren McCulloch and Walter Pitts' 1943 paper showed that biological neurons could in principle implement logic gates — AND, OR, NOT. And the dominant strain of mid-twentieth-century psychology, behaviorism, held that the inside of the head was an impenetrable black box. The only respectable thing to study was the input-output behavior of the whole organism. When practical digital computers arrived after the war, the field was so confident in its premises that translation was framed as code-breaking, vision as a geometry problem, and general intelligence as a matter of a few years' work.
The chapter then walks through the wreckage, naming names.
Eliza mimicked a psychoanalyst by rephrasing user input. "My boyfriend and I don't talk anymore" elicited "Tell me more about your boyfriend" or "Why do you think your boyfriend and you don't talk anymore?" Eliza came closest to fooling humans on the Turing Test, and Hawkins flags the irony: the AI program best at Turing's challenge was a deliberately trivial parlor trick designed as a joke.
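To feel how little machinery this takes, here is a minimal Eliza-style rephraser of my own construction (an illustrative toy, not Weizenbaum's actual rule set): ordered regex rules, canned templates, first match wins.

```python
import re

# Toy Eliza: ordered (pattern, template) rules. The first pattern that
# matches the lowercased input fills its captured fragment into a
# canned reflective template. No understanding anywhere.
RULES = [
    (r"\bmy (\w+)", "Tell me more about your {0}."),
    (r"\bi feel (.+)", "Why do you feel {0}?"),
    (r".*", "Please go on."),  # catch-all keeps the conversation moving
]

def eliza(utterance):
    text = utterance.lower().rstrip(".!?")
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            return template.format(*match.groups())

print(eliza("My boyfriend and I don't talk anymore"))
# Tell me more about your boyfriend.
```

The chapter's irony survives contact with the code: the program that came closest to passing Turing's test fits comfortably in twenty lines.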
Blocks World could answer "Is there a green pyramid on top of the big red cube?" with confidence, but generalized to nothing outside its simulated room. Theorem-proving programs could only rediscover theorems that were already known. Expert systems — the great hope of the 1980s — turned out to be brittle fact databases with a thin ceiling outside their narrow domains.
Deep Blue beat Garry Kasparov in 1997. Hawkins puts it bluntly: "Deep Blue didn't win by being smarter than a human; it won by being millions of times faster than a human." Deep Blue had no intuition about which board positions were dangerous, no history of the game, no model of its opponent. "It played chess yet didn't understand chess, in the same way that a calculator performs arithmetic but doesn't understand mathematics."
The throwaway line that lands hardest in the chapter: "Even today, no computer can understand language as well as a three-year-old or see as well as a mouse." In 2026 the first half of that sentence has aged into being arguably wrong, depending on how you score "understand." The second half is still defensible.
The chapter then makes a long stop at John Searle's Chinese Room, the thought experiment Searle (then "an influential philosophy professor at the University of California at Berkeley") published in 1980. A monolingual English speaker sits in a room with a thick rulebook in English. Slips of paper in Chinese come through a slot; he follows the book's instructions to manipulate the characters and slides answers back out. To a Chinese speaker outside the room, the answers look intelligent, even insightful. But the person inside understands no Chinese, the rulebook is just paper, and the scratchpad is just a scratchpad. So where exactly did the understanding happen? Searle's mapping is precise: the person is the CPU, the rulebook is the software, the scratchpad is memory. Therefore no computer, no matter how cleverly arranged, can be said to understand anything — it can only produce the right outputs. Hawkins is unusually direct about whose side he is on: "I think Searle had it right. When I thought through the Chinese Room argument and when I thought about how computers worked, I didn't see understanding happening anywhere."
He then uses the rest of the chapter to push past Searle's stopping point. Searle proved a negative; Hawkins wants a positive definition of what understanding is. He proposes one as the book's working thesis: "Understanding cannot be measured by external behavior; it is instead an internal metric of how the brain remembers things and uses its memories to make predictions." Behavior is optional. You can be intelligent lying in the dark, thinking, with no observable output whatsoever. The chapter's clincher concedes the only ground worth conceding to AI defenders — that a computer could in principle simulate the entire brain, neuron by neuron, and the simulation would be as intelligent as the original — and then immediately points out that AI researchers don't simulate brains, and that you can't simulate a brain without first understanding what it does. Which is the project of the rest of the book.
That last sentence is the door into everything that follows, and it is the one to hold next to your LLM in 2026. The behavioral revolution of the past five years has built systems that pass the Turing Test in most of its specific forms, so Hawkins would, by his own framing, have to call them intelligent — or refine what he meant. When a frontier model produces a correct answer to a hard reasoning problem, is it remembering and predicting in the sense he meant, or is it Searle's room with prettier wallpaper?
My working answer, and I think the field has only half-converged on this: next-token prediction at scale is closer to "remembering and predicting" than Hawkins would have credited in 2004. But it is not the kind of internal-model-driven prediction he was specifying. The model has no top-down predictive trajectory tied to a stable world representation that it can compare against and update. Hallucinations and out-of-distribution brittleness both fall out of that gap. Hawkins identified the gap in 2004; the field is still working out what to do about it.
Chapter 2: Neural Networks
The chapter opens with Hawkins arriving at Berkeley in January 1986 and assigning himself a self-directed reading list — a complete history of every existing theory of intelligence and brain function. He works through hundreds of papers from anatomists, physiologists, linguists, philosophers, computer scientists, and psychologists, and discovers that each field has its own private dialect for the same animal. Linguists talk about syntax and semantics. Vision scientists talk about 2D, 2.5D, and 3D sketches. Computer scientists talk about schemas and frames. Anatomists describe the structure of the brain in great detail but refuse to commit to any large-scale theory. Nobody is talking about the same thing, and nobody is talking about the structure of the brain itself when they reason about intelligence.
While Hawkins is mid-way through this reading project, neural networks burst back onto the scene as the credible alternative to symbolic AI. Hawkins is sober about the politics: neural networks had been around since the late 1960s, but "AI, the 800-pound gorilla in those days, actively squelched neural network research," and connectionists had been essentially blacklisted from funding for years. By the mid-1980s, AI's continuing failure had created an opening. The new generation of researchers — who preferred the name connectionists — argued from a syllogism so simple it felt unanswerable: brains are made of neurons, therefore the brain is a neural network. Hawkins, already obsessed with extracting a theory of intelligence from biology, should have been the natural audience. He wasn't. Within months he had filed neural networks under "interesting but wrong," and most of the chapter is the explanation of why.
His objection took the form of three criteria, all of which Hawkins insists any genuine theory of the brain must satisfy and none of which the popular networks of the era met.
First, time. Real brains process rapidly changing streams; there is nothing static about the flow of information into and out of the cortex. The standard three-layer network of the 1980s — input, hidden, output, weight updates by backpropagation — was trained on static input-output pairs with no internal state representing what just happened. "There was no history or record in the network of what happened even a short time earlier."
Second, feedback. Hawkins reports an anatomical fact that hits harder than its sentence length suggests: in the circuit between the neocortex and the thalamus, the connections going backward (from cortex toward the sensory periphery) outnumber the connections going forward by almost a factor of ten. For every fiber carrying sensory information up into the cortex, there are ten carrying signals back down. Feedback "dominates most connections throughout the neocortex as well." Three-layer feedforward networks have no such structure. Backpropagation, despite the name, only flows during training and only carries an error signal; once a model is deployed, information moves one way. That is emphatically not the kind of feedback Hawkins means.
Third, physical architecture. The neocortex is organized as a repeating hierarchy of regions, each region with the same six-layer column structure. Any model whose architecture is "trivial compared to the complicated and hierarchical structure of the brain" cannot, in Hawkins' view, work like a brain — no matter how well it performs on benchmark tasks.
Most neural networks of the era failed all three criteria, yet the field settled on them anyway because they were good enough to produce commercial results. By the late 1980s, neural networks were predicting stock movements, evaluating loan applications, verifying signatures. The field got rich and forgot the original mission. The pop-press exemplar Hawkins names is NetTalk, a network that learned to map English text to phoneme sequences and "started sounding like a computer voice reading the words." It was heralded on national news as a machine learning to read. Hawkins is sharp about this: "It didn't read, it didn't understand, and was of little practical value. It just matched letter combinations to predefined sound patterns."
The chapter's best analogy for what is wrong with all of this is the transistor amplifier. Imagine, Hawkins says, that you are trying to understand how a computer works, but instead of studying the computer you study transistors in isolation. After years of effort you discover that three transistors connected in a particular way form an amplifier. Overnight a whole industry springs up making transistor radios and televisions. Many people get rich. None of it teaches you anything about how the computer actually works. "A real brain and a three-layer neural network are built with neurons, but have almost nothing else in common."
Then comes the chapter's most enjoyable autobiographical detour, which is also one of the most consequential paragraphs in the book.
In the summer of 1987, Hawkins attends a neural network conference and watches a presentation by a company called Nestor. Nestor had built a neural-network-based handwriting recognizer for a tablet computer and was offering to license it for one million dollars. Hawkins, who has spent his industry years quietly obsessed with pen-based computing, is suddenly very attentive.
He goes home that night, thinks the problem through, and in two days designs a handwriting recognizer that is "fast, small, and flexible." His version uses no neural network and doesn't work like a brain at all. It is the engineering solution to a problem Nestor had dressed up as a research breakthrough. That recognizer eventually becomes the basis for Graffiti, the text-entry system shipped in the first series of Palm products almost a decade later. Hawkins adds the perfect punchline: "I think Nestor went out of business."
He also notes that the same conference "sparked my interest in designing computers with a stylus interface (eventually leading to the PalmPilot ten years later)." A single neural-network demo indirectly seeded the company that would eventually fund the Redwood Neuroscience Institute and the writing of this book. The book is too gracious to draw that arrow, but the timing does it for him.
After the Nestor detour, the chapter returns to its main thesis: the deepest problem with neural networks is the one they share with classical AI — both treat intelligence as a property of behavior. The output is the metric. "As inspired by Alan Turing, intelligence equals behavior." Hawkins disagrees in exactly the same terms he used against Searle's targets in Chapter 1: "Behavior is a manifestation of intelligence, but not the central characteristic or primary definition of being intelligent. A moment's reflection proves this: You can be intelligent just lying in the dark, thinking and understanding."
Before closing, Hawkins flags one strand of connectionist work that he thinks did get something right: auto-associative memories. These were networks built by a small splinter group of neural network theorists who connected their neurons differently — feeding each neuron's output back into the network's input, "sort of like calling yourself on the phone." Two properties of this design map directly onto how brains actually behave. First, an auto-associative memory can retrieve a stored pattern from a partial or noisy version of it. Hawkins reaches for the most memorable analogies in the chapter: "It would be like going to the grocer with half eaten brown bananas and getting whole green bananas in return. Or going to the bank with a ripped and unreadable bill and the banker says, 'I think this is a messed-up $100 bill. Give me that one, and I will give you this new, crisp $100 bill.'" Second, with a time delay added to the feedback loop, an auto-associative memory can store sequences of patterns — feed it the first few notes of "Twinkle Twinkle Little Star" and it returns the whole song. He flags this as the rare line of connectionist research that took feedback and time seriously, and tells us up front that the brain almost certainly uses something like it.
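The mechanics are easy to see in a minimal Hopfield-style sketch (my illustration of the idea Hawkins describes, not code from the book): Hebbian outer-product weights store the patterns, and repeated thresholded updates pull a corrupted probe back to the nearest stored pattern. Brown bananas in, green bananas out.

```python
import numpy as np

def hebbian_weights(patterns):
    # Outer-product (Hebbian) learning; no self-connections.
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W

def recall(W, probe, steps=5):
    # Synchronous +/-1 updates until the state stops changing.
    s = probe.copy()
    for _ in range(steps):
        nxt = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

# Two orthogonal 16-unit patterns stored in one weight matrix.
p1 = np.array([1] * 8 + [-1] * 8)
p2 = np.array([1, -1] * 8)
W = hebbian_weights(np.stack([p1, p2]))

# Flip two bits of p1; the network completes the pattern anyway.
noisy = p1.copy()
noisy[[0, 5]] *= -1
print(np.array_equal(recall(W, noisy), p1))  # True
```

Add a one-step delay on the feedback path and the same machinery stores sequences of patterns, which is precisely the variant Hawkins singles out as brain-like.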
The chapter then closes with a philosophical aside that I find more interesting on the second read than I did on the first.
Hawkins identifies himself as a functionalist: someone who believes mind is a property of organization, not substrate. AI researchers, connectionists, and he himself all agree on this. A salt shaker standing in for a lost knight is still a real knight piece, because the game is determined by function and not material. Every few years your body swaps out most of its atoms and you remain yourself. If a mad scientist replaced each of your neurons with a functionally equivalent micromachine, you should come out of the procedure feeling no less your own true self than you did going in.
The disagreement among functionalists is downstream. Do you have to copy how the brain does it, or can you invent your own engineering path? AI proponents take the second view — "Why should we engineers be bound by the solutions evolution happened to stumble upon?" — and lean on the Rube Goldberg / kludge metaphor. The brain is a several-hundred-million-year-old mess full of evolutionary legacy code, so throw it out and start fresh. The standard supporting examples are wings-versus-propellers (we did not build flight by flapping) and wheels-versus-legs (we did not build ground vehicles by mimicking cheetahs).
Hawkins concedes the analogies but draws a different line. Connectionists, he writes, "have mainly been just too timid." They intuited that the brain wasn't a computer, started building networks loosely inspired by neurons, and then stopped instead of pushing all the way through to architecture, hierarchy, and feedback. The chapter ends with its most quotable sentence: "We have to extract intelligence from within the brain. No other road will get us there."
The 2026 audit on this chapter is sharper than the audit on Chapter 1. Two of Hawkins' three criteria have been met by the field, just not in the way he expected. Time got handled by RNNs, then LSTMs, then transformers reading entire sequences in parallel. Hierarchy got handled by depth — a 96-layer transformer is hierarchical, even if the hierarchy is uniform and unstructured compared to the cortex. Feedback in Hawkins' sense — a continuous top-down predictive signal propagating from high abstractions down to sensory expectations at inference time, not just an error gradient during training — is still mostly missing. Modern attention is dynamic and bidirectional, but it does not build the kind of persistent, top-down-conditioned world model the chapter describes. The 10:1 backward-to-forward connection ratio in the thalamocortical circuit is not a number any current architecture comes close to. Auto-associative pattern completion, on the other hand, is now ubiquitous — vector search and retrieval-augmented generation are basically industrial auto-associative memories, and Hopfield-style networks had a small renaissance in 2020-era research. Hawkins called the importance of that line of work two decades early, and Nestor presumably still wishes it had charged less.
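The "industrial auto-associative memory" point is concrete enough to sketch in a few lines (my toy, with hand-made three-dimensional "embeddings" standing in for a real encoder): a degraded cue retrieves the closest stored memory by cosine similarity.

```python
import numpy as np

def retrieve(store, query):
    # Cosine-similarity nearest neighbor: normalize, dot, argmax.
    store_n = store / np.linalg.norm(store, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return int(np.argmax(store_n @ q))

# Three stored "memories" as embedding vectors.
memories = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.7, 0.7, 0.0]])

# A noisy, partial cue still recalls the closest stored memory.
cue = np.array([0.9, 0.1, 0.0])
print(retrieve(memories, cue))  # 0
```

Swap the toy vectors for learned embeddings and an approximate-nearest-neighbor index and this is the recall half of every retrieval-augmented generation stack.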
Chapter 3: The Human Brain
This is the chapter where the book stops critiquing other people's theories and starts laying out its own substrate. It is dense with anatomy, and the anatomy matters because the rest of the book uses it as load-bearing structure. There are three movements: a physical tour of the neocortex, the introduction of Vernon Mountcastle's single-algorithm conjecture, and a long demonstration that the cortex doesn't really care what kind of sensor is feeding it — patterns are patterns.
Hawkins opens by confessing to "neocortical chauvinism" and defending it. Critics will say you can't explain intelligence by looking only at the neocortex — the thalamus, hippocampus, brain stem, basal ganglia, and amygdala all matter. He concedes the point for human intelligence but draws a line: "I am not interested in building humans. I want to understand intelligence and build intelligent machines. An intelligent machine need not have sexual urges, hunger, a pulse, muscles, emotions, or a humanlike body." If the goal is intelligence and not biological completeness, the right place to dig is the neocortex.
The physical description that follows is one of the best in popular neuroscience.
The neocortex is a sheet of tissue about two millimeters thick. Hawkins asks you to stack six business cards or six playing cards to feel the actual thickness, with each card standing in for one of the six cortical layers. Stretched flat, the human cortical sheet is roughly the size of a large dinner napkin. A rat's cortical sheet is the size of a postage stamp; a monkey's is the size of a business-letter envelope. All three have the same six-layer structure.
Humans are smarter not because our layers are thicker or contain a special class of "smart" cells, but because our sheet is much larger. To fit it inside a skull, evolution folded it up "like a sheet of paper crumpled into a brandy snifter." To deliver the resulting big-headed children, human females also evolved a wider pelvis. Some paleoanthropologists believe this coevolved with bipedalism. The book is generous with this kind of detail.
A one-millimeter square of the cortical sheet — about the size of the letter "o" — contains roughly one hundred thousand neurons. The whole neocortex contains around thirty billion. The typical pyramidal cell (eight out of every ten cortical neurons) carries thousands of synapses. The book's working number for total cortical synapses is about thirty trillion, which Hawkins calls "apparently sufficient to store all the things you can learn in a lifetime."
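The quoted figures hang together on a back-of-envelope check (the average of about a thousand synapses per neuron is my inference from the two totals, not a number stated in this passage):

```python
# Sanity-check the chapter's anatomy numbers against each other.
neurons_per_mm2 = 100_000   # per 1 mm x 1 mm patch of the sheet
total_neurons = 30e9        # "around thirty billion"
total_synapses = 30e12      # "about thirty trillion"

sheet_area_cm2 = total_neurons / neurons_per_mm2 / 100  # mm^2 -> cm^2
synapses_per_neuron = total_synapses / total_neurons

print(sheet_area_cm2)       # 3000.0 cm^2, roughly a large dinner napkin
print(synapses_per_neuron)  # 1000.0 on average, consistent with
                            # "thousands" per pyramidal cell
```

The numbers are mutually consistent: thirty billion neurons at a hundred thousand per square millimeter really does give a napkin-sized sheet.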
He quotes Crick's later book The Astonishing Hypothesis for the philosophical punchline. The mind is the creation of the cells in the brain. No magic, no special sauce, only neurons and a dance of information. Crick called it a hypothesis to be politically correct; Hawkins points out that it is now a fact.
The chapter's first technical move is to establish that the apparently uniform cortical sheet has specialized functional areas. People knew this long before functional imaging, because strokes localized to specific regions produced specifically weird deficits.
Damage to the right parietal lobe causes the patient to lose all perception of the left side of space — and even the concept of the left side. Damage to Broca's area in the left frontal lobe destroys grammar while leaving vocabulary intact. Damage to the fusiform gyrus destroys the ability to recognize faces, so the patient can't recognize his own mother, his own children, or his own face in a photograph.
Daniel Felleman and David van Essen mapped the macaque cortex into dozens of such regions and showed they were arranged in a complex branching hierarchy. The visual hierarchy begins at V1, processing the lowest-level features — edge orientations, color contrast, motion, binocular disparity for stereo vision — and feeds into V2, V4, IT (where high-level visual memories of faces, animals, tools, and body parts live), and MT (specialized for motion). Auditory cortex starts at A1 and ascends similarly; somatosensory starts at S1. Association areas receive convergent input from multiple senses — they are why you know that the sight of a fly crawling up your arm and the tickle you feel on your skin share the same cause. The motor system has its own mirror-image hierarchy, descending from high motor planning areas down to M1, which sends connections to the spinal cord and drives muscles directly.
The non-obvious point Hawkins hammers on is that information flows in both directions through this hierarchy, and the downward flow is bigger than the upward one. The classic mental picture — sensory information goes up, motor commands come down, neat assembly line — is wrong in a way that matters. He gives an example designed to startle: "As you read aloud, higher regions of your cortex send more signals 'down' to your primary visual cortex than your eye receives from the printed page." Top-down expectations outvolume sensory input even at the level of primary visual cortex. If that doesn't sound like a description of an LLM's prior dominating its context window, it should.
The chapter's center of gravity is Vernon Mountcastle's 1978 paper, "An Organizing Principle for Cerebral Function," written when Mountcastle was a neuroscientist at Johns Hopkins. The paper's claim is short and audacious.
The neocortex looks structurally identical wherever you look. Visual cortex resembles auditory cortex resembles motor cortex resembles Broca's language area resembles practically every other region. The differences anatomists had spent decades cataloging — slight variations in layer thickness, cell density, connection patterns — are real but subtle, "often so subtle that trained anatomists can't agree on them."
Mountcastle's leap: since the regions all look the same, maybe they are all doing the same thing. The differences between vision and hearing and motor control are not differences in algorithm. They are differences in what each region happens to be wired up to. One cortical algorithm. Different inputs.
Hawkins is unusually emotional about this idea. "When I first read Mountcastle's paper I nearly fell out of my chair. Here was the Rosetta stone of neuroscience — a single paper and a single idea that united all the diverse and wondrous capabilities of the human mind." He calls Mountcastle's conjecture "the most important discovery in neuroscience" and finds it astonishing that most scientists and engineers either refuse to believe it, ignore it, or aren't aware of it. The analogy he reaches for is Einstein's special relativity falling out of a single counterintuitive observation about the speed of light, or Darwin asking what it means that species are so similar rather than fixating on their differences.
The chapter spends its remaining pages defending Mountcastle with two lines of evidence: plasticity and pattern fungibility. On plasticity, the experiments are still striking to read two decades later. Newborn ferret brains can be surgically rewired so that the eye's signals are sent to the auditory cortex instead of the visual cortex; the auditory cortex develops functional visual pathways and the ferret sees with brain tissue that normally hears. Pieces of rat visual cortex transplanted to somatosensory regions at birth develop the sense of touch. Adults born deaf process visual information in tissue that would normally have become auditory. Adults born blind read braille with their visual cortex — the rearmost part of the cortex, which never received any visual input, repurposes itself for touch. "Apparently no area of cortex is content to represent nothing." Genes lay down the rough architecture and decide what is wired to what, but within that scaffolding the function of each region is determined by the kind of input it happens to receive during development.
On pattern fungibility, Hawkins makes a point that lands hardest if you've never thought about it. The brain is in a dark, silent box; it has no senses of its own; a surgeon can poke it and you feel nothing. Everything you experience arrives as action potentials — identical electrical spikes — on bundles of axons. The optic nerve carries about a million fibers, the auditory nerve about thirty thousand, the spinal cord another million for touch. The spikes coming down each bundle are physically indistinguishable. The reason you experience sight as different from hearing is what's wired to what inside the cortex, not anything intrinsic to the signal itself. "An action potential is an action potential. All your brain knows is patterns." Vision in particular, Hawkins notes, is "more like a song than a painting" — the eye performs roughly three saccades per second, completely changing the retinal image every time, and the stable visual world you perceive is constructed entirely from those time-varying patterns. Hearing was deciphered as a frequency-decomposing organ by the Hungarian physicist Georg von Békésy half a century earlier — each segment of the spiral cochlea (embedded in the temporal bone, the hardest bone in the body) vibrates at a different frequency band. Touch is the same: ask a friend to drop a small object — a ring, an eraser — into your stationary palm with your eyes closed and you'll struggle to identify it; let your fingers move and you'll know immediately. "Touch, too, is like a song."
The supporting demonstrations are the most fun part of the chapter. Paul Bach-y-Rita at the University of Wisconsin built a sensory-substitution device that maps a head-mounted camera's pixels onto a grid of pressure points on the human tongue. Blind subjects wearing it learn to "see" through their tongue. Erik Weihenmayer — the world-class athlete who went blind at age thirteen and in 2001 became the first blind person ever to summit Mount Everest — tried the tongue device in 2003 and saw images for the first time since childhood: a ball rolling toward him on the floor, the soft drink he reached for on a table, a game of Rock Paper Scissors. He walked down a hallway, saw the doorway at the end, examined the door and its frame, and noticed the sign on it. "Images initially experienced as sensations on the tongue were soon experienced as images in space." Helen Keller, with no sight and no hearing at all, learned language and became a more skillful writer than most people with the full sensory inventory. Rubber hand illusions show that if you watch a fake hand being stroked at the same rhythm as your real one (hidden behind a screen), the brain incorporates the fake hand into your body map within minutes. Give a person a rake and have them use it to reach objects, and the rake itself gets pulled into the body map.
The 2026 engineer's note on this chapter is the warmest in the book. Mountcastle's one-algorithm-on-many-inputs conjecture is the most successful neuroscience-derived idea in the history of AI engineering, and it has been vindicated in a way Hawkins probably wouldn't have predicted. The single architecture of the modern transformer — same operator stack, same training procedure, trained on text or images or audio or video or robotic control or protein sequences or chess moves — is Mountcastle's claim ported to silicon. We discovered empirically that one stack of layers can become "visual cortex" or "language cortex" or "audio cortex" depending on what you feed it. Pattern fungibility shows up in modern systems as the surprising universality of tokenization: once a modality is encoded into a token sequence, the model basically doesn't care where the tokens came from. The plasticity experiments map onto multi-modal pretraining and transfer learning so directly that you could rewrite Hawkins' chapter with 2026 vocabulary and lose almost nothing. The piece the field has not yet imported from this chapter is the downward-dominant flow — the structural fact that in primary visual cortex, feedback connections carrying top-down predictions outnumber the feedforward connections carrying sensory input. That asymmetry is built into the cortex and is not built into any current production AI system. The experiment of taking it seriously has not really been run.
Chapter 4: Memory
This is the reframe most readers remember from the book, and it is where the foundations the next four chapters will build on get laid down. The thesis is short: the neocortex is not a computer. It is a memory system. It does not compute the answer to a problem; it retrieves the answer from stored patterns. The chapter spends most of its length unpacking what kind of memory system, with four named properties — sequence storage, auto-associative recall, invariant representations, and hierarchy — and the first three get the full attention here. The fourth waits until Chapter 6.
The setup is the hundred-step rule, one of the cleanest thought experiments in the book.
Real neurons are slow. A typical neuron fires and resets in about five milliseconds, so a single neuron can do roughly two hundred operations per second. A modern CPU can do a billion. The basic operation of silicon is about five million times faster than the basic operation of brain tissue. The brain's defenders usually counter that neurons run in parallel — billions of them, all firing at once — and that this parallelism more than makes up for the speed deficit. Hawkins thinks the parallelism argument is wrong, and the proof is one of his favorite passages in the book.
Consider a task humans do effortlessly in half a second: glance at a photo and tell me whether it shows a cat, a bear, a warthog, or a turnip. In half a second, information entering your visual system can travel through at most a hundred neurons in series. So whatever the brain is doing to solve the problem, it's doing in one hundred steps or fewer. A hundred computer instructions, by contrast, are barely enough to move a single character on a screen.
Parallelism doesn't save the computer. If crossing a desert takes a million sequential footsteps, hiring a thousand extra walkers doesn't shorten the trip; the steps still have to happen one after the other, and no number of added workers changes that. "A computer, no matter how many processors it might have and no matter how fast it runs, cannot 'compute' the answer to difficult problems in one hundred steps." If the brain were computing, this would be physically impossible. The only way out is that the brain isn't computing. It's looking the answer up. "The entire cortex is a memory system. It isn't a computer at all."
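The arithmetic of the hundred-step rule fits in a few lines. A minimal sketch — the numbers come from the text, and the toy lookup table is an invented illustration of "looking the answer up":

```python
# Back-of-the-envelope version of the hundred-step constraint.
NEURON_OPS_PER_SEC = 200          # ~5 ms for a neuron to fire and reset
RECOGNITION_TIME = 0.5            # half a second to glance at a photo and name it

max_serial_steps = int(NEURON_OPS_PER_SEC * RECOGNITION_TIME)
print(max_serial_steps)           # 100 — the depth budget for any serial chain of neurons

# A memory system answers inside that budget no matter how much it stores:
memory = {"stripes+whiskers": "cat", "tusks+bristles": "warthog"}
print(memory["stripes+whiskers"])  # one retrieval step, not millions of instructions
```

The point is not the dictionary, of course; it's that retrieval depth is decoupled from problem difficulty in a way that computation depth is not.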
The example Hawkins reaches for to make the contrast vivid is catching a thrown ball. To a human it feels trivially easy; to a robot engineer it has been a recurring nightmare.
The engineer's approach is to solve the projectile-motion equations for the ball's trajectory. Then solve a more painful set of equations for the joint angles required to put the hand in the right place at the right time. Then repeat both calculations many times as new sensor data arrives. Millions of steps.
The brain doesn't do this. The sight of the ball automatically recalls a stored sequence of muscle commands, and the retrieved memory is adjusted on the fly to fit the specific path of this ball and the position of this body. "The memory of how to catch a ball was not programmed into your brain; it was learned over years of repetitive practice, and it is stored, not calculated, in your neurons."
The waterbed analogy is his favorite illustration of how the adjustment step works. Sit on a waterbed and the pillows and other people on it spontaneously shift to accommodate you, with no central controller computing the adjustment. The physics of the water and the mattress skin do it. The cortex does something analogously distributed with information.
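The contrast between the two approaches can be sketched in a few lines of Python. The physics function is standard projectile kinematics; the stored catches and their values are invented for illustration, and the on-the-fly adjustment Hawkins describes is elided:

```python
import math

# The "engineer's" route: solve the projectile equations for where the ball lands.
def computed_landing_x(v0, angle_deg, g=9.81):
    """Kinematics: range = v0^2 * sin(2*theta) / g (flat ground, no drag)."""
    theta = math.radians(angle_deg)
    return v0 ** 2 * math.sin(2 * theta) / g

# The "cortical" route: no equations. Retrieve the stored catch most similar
# to the current throw and reuse its remembered hand position.
stored_catches = {
    # (launch speed, launch angle) -> remembered hand position (metres)
    (10.0, 30.0): 8.8,
    (10.0, 45.0): 10.2,
    (12.0, 45.0): 14.7,
}

def recalled_landing_x(v0, angle_deg):
    """Nearest-neighbour recall from motor memory."""
    nearest = min(stored_catches,
                  key=lambda k: (k[0] - v0) ** 2 + (k[1] - angle_deg) ** 2)
    return stored_catches[nearest]

print(computed_landing_x(10.0, 45.0))   # ~10.19 — exact, but expensive at neuron speeds
print(recalled_landing_x(10.1, 44.0))   # 10.2 — one retrieval from stored experience
```

The retrieval answer is approximate, which is precisely Hawkins' point: the brain stores experiences and adjusts them, rather than recomputing reality from first principles.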
Property one: sequences of patterns
The first non-computer-like property of cortical memory is that it stores patterns as sequences. Stories can only be told in order; you cannot recount everything at once, no matter how fast you talk. Memory works the same way. Try saying the alphabet backward. You cannot, not because the letters aren't all in there, but because you never stored them in reverse. Phone numbers, months of the year, days of the week — all encoded as sequences. Try humming "Somewhere Over the Rainbow" all at once. You can't. A song exists only as it unfolds in time, and the only way to recall it is to play it forward through the same temporal pattern you originally heard.
The sequence property goes all the way down to the cheapest sensory memories. Hawkins makes an unusually visceral point about touch: "If I buried your hand in a bucket of gravel while you slept, when you woke up you wouldn't know what you were touching until you moved your fingers." Tactile texture is stored as a sequence of pressure-and-vibration patterns across the skin; the static pattern alone is not enough. He confesses that, after observing himself for a few days, he discovered he dries himself off with a towel after the shower in nearly the same sequence of rubs, pats, and body positions every time — and after a "pleasant experiment" he found that his wife does the same. Try doing it in a different order and you can will yourself to, but the moment your attention drifts, you snap back to the accustomed sequence. Memories are stored in the synaptic connections between neurons. Only a tiny fraction of those synapses are active at any one moment; thought is the trajectory through that vast stored space, moving from one set of active neurons to the next along learned sequence transitions. "Truly random thoughts don't exist. Memory recall almost always follows a pathway of association."
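A toy sketch of forward-only sequence storage makes the alphabet point concrete. The transition table is an illustration, not a claim about neural mechanism:

```python
# Store a sequence as nothing but forward transitions: "what follows what".
def learn_sequence(items):
    return {a: b for a, b in zip(items, items[1:])}

transitions = learn_sequence("ABCDEFG")

def replay(start, transitions):
    """Recall by walking the learned transitions forward from a starting point."""
    out = [start]
    while out[-1] in transitions:
        out.append(transitions[out[-1]])
    return "".join(out)

print(replay("A", transitions))   # ABCDEFG — forward recall is trivial
print(transitions["C"])           # 'D' — we stored what follows C; what precedes
                                  # it was never stored, so backward recall fails
```

You could of course build a reverse table, but that is the point: it would be a second, separately learned sequence, which is why saying the alphabet backward feels like learning a new skill rather than reading an existing memory in the other direction.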
Property two: auto-associative recall
The second property is the one introduced in Chapter 2 and now fully argued. Auto-associative memory means a partial or distorted input retrieves the full stored pattern. You see your child's shoes sticking out from behind the curtains and your brain envisions the entire child. You see someone partially behind a bush at a bus stop and your brain fills in the whole person — so completely that you may not realize you're inferring. In conversation, your brain hears about half the words a noisy environment lets through and reconstructs the rest from context; we routinely don't hear what is actually said and instead hear what we expected to hear. "Some people complete others' sentences aloud, but in our minds all of us are doing this constantly. And not just the ends of sentences, but the middles and beginnings as well."
The most literary example in the chapter is Marcel Proust's Remembrance of Things Past — the novel opens with the memory of how a madeleine cookie smelled, and Proust is off and running for a thousand-plus pages. A single sensory detail retrieves an entire memory sequence. Auto-associative recall is what Hawkins thinks the cortex is doing constantly: "Each functional region is essentially waiting vigilantly for familiar patterns or pattern fragments to come in." When a fragment arrives, the full pattern lights up — and the next pattern in the learned sequence is now primed. "Inputs to the brain auto-associatively link to themselves, filling in the present, and auto-associatively link to what normally follows next. We call this chain of memories thought." Its path is not deterministic, and we are not fully in control of it either.
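A few lines of Python capture the flavor of auto-associative recall: stored patterns retrieved whole from a fragment. The word-overlap scoring here is a crude stand-in for whatever matching the cortex actually performs:

```python
# Toy auto-associative memory: any sufficiently distinctive fragment
# retrieves the complete stored pattern it belongs to.
stored = [
    "the quick brown fox jumps over the lazy dog",
    "please pass me the salt",
    "somewhere over the rainbow",
]

def recall(fragment, memory=stored):
    """Return the stored pattern sharing the most words with the fragment."""
    frag_words = set(fragment.split())
    return max(memory, key=lambda p: len(frag_words & set(p.split())))

print(recall("quick fox"))   # the whole first sentence comes back
print(recall("pass the"))    # a two-word fragment retrieves the full pattern
```

Note the asymmetry with conventional storage: there is no address and no exact key. The input is itself a noisy, partial copy of the thing being retrieved, which is what "auto-associative" means.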
Property three: invariant representations
The third property is the heaviest, and the one that does the most work in the rest of the book. Computer memory stores information at perfect fidelity — a byte is the byte you wrote, and a discrepancy of even one bit can crash a program. Cortical memory doesn't work that way. The brain doesn't remember what it sees or hears exactly. It remembers the important relationships in the world, independent of the details. The classic example: take a face made of black-and-white dots, store it in an artificial auto-associative memory, then shift every dot five pixels to the left. The artificial memory completely fails to recognize the new pattern. You and I wouldn't even notice the change. "Artificial auto-associative memories fail to recognize patterns if they are moved, rotated, rescaled, or transformed in any of a thousand other ways, whereas our brains handle these variations with ease."
The book is studded with concrete demonstrations. You can hold this book for a hundred years and the pattern of light falling on your retina will never once be exactly the same. Yet not for a moment do you doubt you're holding "the same book." You recognize your friend's face two feet away, twenty feet away, in profile, smiling, yawning, in shade, under "strangely angled disco lights." For each view, the pattern of light hitting your retina is unique.
Hawkins points to direct neural evidence. If you record from cells in V1, the pattern of activity changes with every saccade and every shift in position. But if you record from cells in the face recognition area several steps higher in the cortical hierarchy, you find stability. The same cells stay active as long as the face is anywhere in the field of view, regardless of size, position, orientation, scale, or expression. "This stability of cell firing is an invariant representation." The chapter is candid that the engineering problem of building such representations "remains one of the biggest mysteries in all of science." That line was true when the book was published and stayed true for almost another decade.
This problem has an ancient pedigree. Plato wondered, twenty-three centuries ago, how we can have a concept of a perfect circle when every real-world circle is imperfect, or a stable concept of "dog" when every dog we see is different. His answer was the Theory of Forms — fixed eternal ideas in a transcendent plane that our souls knew before birth. Hawkins is appropriately gentle: "It's all quite loopy from a modern perspective. But if you strip away the high-flown metaphysics, you can see that he was really talking about invariance. His system of explanation was wildly off the mark, but his intuition that this was one of the most important questions we can ask about our own nature was a bull's-eye."
The chapter then sweeps the invariance idea through every modality. Touch: reach into your car's glove compartment for your sunglasses; any part of any finger can find them, anywhere on the frame, no matter how the glasses sit. "Just a second of moving any part of your hand over any portion of the glasses is sufficient for your brain to identify them." Sensorimotor: put the key in your car's ignition — the seat, body, arm, and hand are in slightly different positions every time, but to you it feels identical, because the brain stores the invariant. A robot would need to be in exactly the same posture every time and would have to be reprogrammed for a different car. Motor: your signature is an invariant program that produces a recognizable mark whether you sign daintily with a fine pen, flamboyantly like John Hancock, in the air with your elbow, or "clumsily with a pencil held between your toes." Music: you can recognize "Somewhere Over the Rainbow" in any key. You learned it from Judy Garland singing it in The Wizard of Oz in A flat, but a piano playing it in D will land as the same song — even though every single note is different. Your memory stored intervals, not absolute pitches: "an octave up, followed by a halftone down, followed by a major third down." Your friend's face is the same kind of representation: relative dimensions, relative colors, relative proportions — spatial intervals rather than pitch intervals.
The final move of the chapter is to argue that storing invariant memories is what lets the cortex make specific predictions about a constantly changing world.
Hawkins reaches for a parable that does most of the work. It is 1890 and you are in a frontier town in the American West. Your sweetheart is taking the train from the East to join you. There are no published timetables; the trains seem to come and go at random. For weeks you keep your own log and conclude they're unpredictable. Then you notice a structure: a westbound train always arrives four hours after an eastbound one departs. The four-hour gap is invariant; the absolute times are not. On the day of her arrival, you watch for an eastbound train to leave. Four hours later, you walk to the station and meet her.
The brain does this constantly. To predict the next note in a familiar song, the cortex combines its invariant memory of intervals with the specific last note it just heard. Interval "major third up" plus last note "C" produces predicted note "E." To predict the next moment of your friend's face under unfamiliar lighting, the cortex combines the invariant face structure with the orientation and lighting it's currently sensing, and fills in the details before the eyes even arrive at them. "Memory storage, memory recall, and memory recognition occur at the level of invariant forms."
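The song example can be made literal in code. Notes are MIDI pitch numbers; the melody fragment and the indexing scheme are illustrative simplifications of the invariant-plus-specifics recipe:

```python
# The invariant form of a melody: intervals between successive pitches.
def to_intervals(notes):
    return [b - a for a, b in zip(notes, notes[1:])]

melody_in_C = [60, 72, 71, 67]   # C, C an octave up, B, G
melody_in_D = [62, 74, 73, 69]   # the same tune transposed up a whole step

# Transposition changes every note but leaves the invariant untouched:
assert to_intervals(melody_in_C) == to_intervals(melody_in_D)

def predict_next(heard, learned_intervals):
    """Invariant memory + the specific last note -> a specific prediction."""
    step = learned_intervals[len(heard) - 1]
    return heard[-1] + step

learned = to_intervals(melody_in_C)      # [12, -1, -4]: octave up, semitone down, major third down
print(predict_next([62, 74], learned))   # 73 — the next note, predicted before it sounds
```

This is the train-parable mechanism in miniature: the stored gap (the interval) is invariant; the absolute value of the prediction is computed fresh each time from what was just sensed.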
The chapter ends with a one-sentence handoff to the next: "The three properties of cortical memory discussed in this chapter (storing sequences, auto-associative recall, and invariant representations) are necessary ingredients to predict the future based on memories of the past. In the next chapter I propose that making predictions is the essence of intelligence."
The 2026 engineer's audit is sharper here than in any prior chapter, because each of Hawkins' three properties has a direct counterpart in the architectures we ship today.
Sequence storage is the central design choice of every successful generative model since 2017. Attention is a mechanism for retrieving relevant past tokens; next-token prediction is sequence completion in exactly the sense Hawkins described. Auto-associative recall is what retrieval-augmented generation, vector search, and key-value attention all are at heart: produce a query, retrieve the matching pattern. Invariant representations are what learned embeddings are. The whole point of mapping a face or a sentence into a fixed-dimensional vector is that the vector should remain stable across pose, lighting, paraphrase, or font; surface variation gets projected out rather than stored.
The hundred-step rule is the part nobody talks about, but it is arguably the most prescient prediction in the chapter. A transformer answering "is this a cat or a warthog" in fifty milliseconds is not "computing" in the symbolic-AI sense. It is doing a single forward pass through a fixed-depth pattern-matcher, which is much closer to "retrieving an answer from memory" than the field is usually willing to admit. Hawkins' framing of cortical memory was, to a startling degree, a specification of what self-supervised representation learning would later actually produce. He just couldn't have known how literal the resemblance would be.
Chapter 5: A New Framework of Intelligence
This is the chapter that pays out the book's title, and it is the most emotionally vivid in the book. It opens with Hawkins sitting in his office in April 1986, halfway through his Berkeley reading project, struggling with a single question he could not let go of: what does the brain do if it isn't generating behavior? Information enters the brain, a great deal of work happens inside, and yet most of the time nothing observable comes out. You can sit silently and understand what you are reading, with no overt action to prove that you understood — exactly the move Searle's Chinese Room is constructed to deny. So what, precisely, is happening inside?
The aha moment arrives when Hawkins asks himself what would happen if a new object appeared in the room — say, a blue coffee cup he had never seen there before. The answer is immediate: it would catch his attention. He wouldn't have to consciously check whether it belonged; the unbelonging would surface on its own. Underneath that trivial-seeming answer was the realization that does the heavy lifting for the rest of the book. For the unbelonging to surface, something inside his brain had to already be making thousands of small predictions about what should be there — the computer in the middle of the desk, the lamp in the right-hand corner, the dictionary where he left it, sunlight at the correct angle for the time of day, the windows being rectangular and the walls vertical. Those predictions were all being met silently. The blue cup would have violated one of them, and the violation would have produced the experience he called "attention." "What we perceive," he writes, "is a combination of what we sense and of our brains' memory-derived predictions."
The thought experiment Hawkins constructs minutes later, in the same office, is the famous altered-door experiment, and it remains the cleanest one-paragraph argument for why the brain is fundamentally a prediction engine. "Suppose while you are out, I sneak over to your home and change something about your door." The list of possible changes runs for a full paragraph: move the knob over by an inch, swap a round knob for a thumb latch, change brass for chrome, switch solid oak for hollow (or vice versa), make the hinges squeaky and stiff or frictionless, widen or narrow the frame, change the color, add a knocker where the peephole used to be, add a window. "I can imagine a thousand changes that could be made to your door, unbeknownst to you." When you come home, you will detect that something is wrong almost instantly, in a few seconds at most. How? The AI engineer's database approach — store every property of every door and compare on entry — is implausible on its face: the list of attributes is endless, you'd need a similar list for every object you ever encounter, and neurons are too slow to execute database queries that way. "It would take you twenty minutes instead of two seconds to notice the change as you go through the door."
"There is only one way to interpret your reaction to the altered door: your brain makes low-level sensory predictions about what it expects to see, hear, and feel at every given moment, and it does so in parallel." Visual areas predict edges, shapes, locations, motions. Auditory areas predict tones, the direction of the source, the pattern of sound the door makes. Somatosensory areas predict touch, texture, contour, temperature. "Prediction" means that the neurons that will sense the door become active in advance of actually sensing it. When the prediction is met, you walk through the door without consciously knowing the predictions were ever made. When any one of them is violated — the knob is too low, the door is too light, the latch isn't where it was — the error rises to attention. "Correct predictions result in understanding... Incorrect predictions result in confusion and prompt you to pay attention."
Then comes the thesis sentence the whole book has been pointing at: "Prediction is not just one of the things your brain does. It is the primary function of the neocortex, and the foundation of intelligence. The cortex is an organ of prediction."
Hawkins is honest about the lineage. He didn't invent the idea that prediction matters; he places himself in a tradition.
The roster includes D. M. MacKay (1956), who argued intelligent machines should have an "internal response mechanism" designed to "match what is received." Rodolfo Llinás at NYU Medical School, whose 2001 book I of the Vortex called the capacity to predict the future "the ultimate and most common of all global brain functions." David Mumford at Brown, Rajesh Rao at the University of Washington, and Stephen Grossberg at Boston University. Plus the entire subfield of Bayesian networks, named after Thomas Bayes, the eighteenth-century English minister and statistics pioneer.
What was missing, Hawkins argues, was the gluing of these scattered pieces into a single framework with the cortex's actual anatomy at the center. That is the work the book is doing.
The middle of the chapter is an extended demonstration that prediction is what almost everything you do feels like from the inside. Hawkins gives a tour of his own morning making pancakes: he reaches under the counter for the cabinet door without looking, and his brain knows what the knob will feel like, where, and when. He twists the milk container expecting it to turn and come free. He turns the griddle knob expecting it to push in slightly, then turn with a certain resistance, then produce the gentle fwoomp of the gas flame about a second later. "Every minute in the kitchen I made dozens or hundreds of motions, and each one involved many predictions. I know this because if any of those common motions had a different result from the expected one, I would have noticed it." The cleanest version of the argument is the missed step on a stairway: the instant your foot continues past the spot where your brain expected it to land, you know something is wrong before any sensor has told you anything. The foot has felt nothing yet; the prediction simply wasn't met. "A computer-driven robot would blissfully fall over, not realizing that anything was amiss."
Prediction is happening at every level, in every modality, and often probabilistically. When you listen to a familiar album you hear the start of the next song in your head a few seconds before it begins — and putting that album on random shuffle produces a "pleasant sensation of mild uncertainty" because the next-song prediction is now reliably wrong. "How now brown..." activates the neurons representing "cow" before the speaker finishes (in English; a reader who doesn't speak the language gets no such activation). "Please pass me the..." sets up parallel expectations of "salt," "pepper," or "mustard," and would be jolted into surprise by "sidewalk." Even a piece of music you have never heard before triggers predictions of regular beats, repeated rhythms, phrase completions, and tonic-pitch endings — and you instantly notice violations.
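The "please pass me the..." example is, structurally, a tiny language model. A sketch with an invented corpus, where all the primed continuations exist at once with weights:

```python
from collections import Counter

# A minimal bigram model: count what follows each word in experience.
corpus = (
    "please pass me the salt . please pass me the pepper . "
    "please pass me the mustard . please pass me the salt ."
).split()

follows = {}
for a, b in zip(corpus, corpus[1:]):
    follows.setdefault(a, Counter())[b] += 1

def expectations(word):
    """All primed continuations at once, weighted by past experience."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(expectations("the"))   # {'salt': 0.5, 'pepper': 0.25, 'mustard': 0.25}

def surprised_by(word, actual):
    """A continuation with zero prior probability is the jolt Hawkins describes."""
    return expectations(word).get(actual, 0.0) == 0.0

print(surprised_by("the", "sidewalk"))   # True — 'sidewalk' violates every expectation
```

The parallel-expectations point is the important one: the model doesn't commit to "salt"; it holds the whole weighted distribution, and surprise is just an observation falling outside it.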
Several of the chapter's most memorable examples lean on prediction failures rather than prediction successes.
When New York City stopped running its elevated trains, residents in the surrounding apartments called the police in the middle of the night because something had woken them up — the absence of a train predicted to pass at that hour. A distant jackhammer is one of those things you only notice has been going when it stops.
Each eye has a small blind spot where the optic nerve exits the retina, and you don't perceive a hole because the visual system fills it in. Stare at a Turkish carpet or the wood grain of a cherry tabletop with one eye closed, and the texture looks seamless, even though entire knots are constantly winking out of retinal view. Filling-in happens everywhere, not just in the blind spot. A photograph of a driftwood log on rocks has, at high magnification, an indistinguishable boundary between log and rock. At normal viewing distance the edge looks crisp, because the cortex is filling in what it expects to be there. "What you 'perceive' is not what V1 sees."
The most unsettling demonstration is the two-noses experiment. Your eyes fixate on one eye of a face, then jump to the other eye, occasionally to the nose, mouth, or ears, three times a second. Each fixation should be a shock — every saccade fires completely different cells in V1 — and yet you experience a stable face. Now imagine you met someone with an extra nose where their second eye should be. Your eyes fixate on the first eye, then saccade to where the second eye should be, and instead find a nose. Your attention is immediately aroused. For that to happen, your brain must have made a prediction of what the next saccade would land on. The same logic explains why you don't notice your own seat unless it starts to drift backward, why a book page in your hand doesn't surprise you until it bends wrong, why the world appears stable: the brain is continuously verifying its model against reality, not rebuilding it.
The chapter's pivotal move is to argue that the same machinery scales all the way up. Per Mountcastle, if every cortical region runs the same algorithm, then the prediction work happening at sensory cortices is also happening at the highest abstract levels.
Hawkins gives an example. To predict that his wife will remind him to take out the recycling that night, his cortex combines an invariant memory of her with current observation. The invariant memory: she has said this in the past, today is Friday, the recycling bin must go out on Friday nights, he didn't do it on time last week. The current observation: her face has a certain look. From those, his cortex produces a specific anticipation of what she is about to say. He may not know the exact words, but he knows the gist.
"Higher intelligence is not a different kind of process from perceptual intelligence. It rests fundamentally on the same neocortical memory and prediction algorithm." IQ tests, he notes in passing, are literally prediction tests: complete this number sequence, complete this analogy, given these three views of an object pick the next one. Science itself is hypothesis-and-testing — prediction in slow motion. Product design predicts what consumers will want. "Intelligence is measured by the capacity to remember and predict patterns in the world."
The chapter closes with the evolutionary story that justifies the whole framework. Reptiles had sophisticated behaviors long before mammals had a neocortex: "An alligator has sophisticated senses just like you and me. It has well-developed eyes, ears, nose, mouth, and skin. It carries out complex behaviors including the ability to swim, run, hide, hunt, ambush, sun, nest, and mate." What changed when mammals evolved was the addition of a memory layer — the "neocortex," literally "new bark" or "new rind" in Latin — that tapped the sensory stream and stored patterns of what had happened before. The human cortex itself is geologically recent; it expanded dramatically only a couple of million years ago, and "we got smart by adding many more elements of a common cortical algorithm" rather than inventing a different one. The famous talking-rat thought experiment carries the argument: a rat with a small neocortex navigating a familiar maze "sees" the future not by hang-gliding to the cheese but by recognizing the corner, retrieving the stored sequence of what happened last time, and envisioning the cheese at the end of the hallway. "If I turn right here, I know what will happen next. There's a piece of cheese down at the end of this hallway. I see it in my imagination." That is what the cortex bought reptilian behavior: a window into the very near future.
The cortex then evolved in two directions. First, it got larger and stored richer memories. Second — and especially in humans — it began to take over motor control from the old brain. The motor cortex of a rat is small enough that damaging it has limited effect; damaging the motor cortex of a human causes paralysis. The anterior (forehead-ward) half of the human cortex grew disproportionately, taking on most of high-level planning, thought, and motor command, separated from the posterior sensory half by the large fissure called the central sulcus. And here Hawkins offers one of the most arresting sentences in the book. When you move your arm in front of your face, you might think the brain first moves the arm and then predicts what it will see. He thinks this is exactly backwards. "I believe the cortex predicts seeing the arm, and this prediction is what causes the motor commands to make the prediction come true. You think first, which causes you to act to make your thoughts come true." Behavior, in this framing, is downstream of prediction — the prediction is the goal, and the motor command is whatever satisfies it. Dolphins, with their large but only three-layered cortex, get a brief mention as the limiting case: probably they have rich autobiographical memory of the ocean, probably they recognize individual dolphins, but their cortex hasn't taken over their behavior the way ours has.
The chapter ends by returning to Searle. If the Chinese Room contained a memory system that predicted what character would arrive next, and what would happen next in the story, then we could say with confidence that the room understood Chinese. "We can now see where Alan Turing went wrong. Prediction, not behavior, is the proof of intelligence."
The 2026 engineer's note has already half-written itself by the time you finish the chapter. Every successful generative model since 2017 is trained on exactly the objective Hawkins put at the center of his framework. Next-token prediction is the loss function the field converged on, in part because nothing else worked as well at scale. The agreement is so literal it is almost embarrassing.
But the differences remain real. Hawkins' picture is of continuous prediction at every cortical level, with top-down predictions racing downward to meet bottom-up signals at every moment of conscious experience. Modern transformers do nothing of the kind. They predict the next token at the output, with no analogous predictive flow at intermediate layers. The altered-door experiment describes a system whose intermediate representations are themselves doing prediction in parallel, which is not what a transformer is.
The frontier models that are noticeably better at long-horizon reasoning and tool use in 2026 do something closer to the altered-door picture. They generate explicit predictions ("I expect to see X after the next action"), compare them against observed results, and update. But they do it at the level of text in a scratchpad, not at the level of activations inside the network. Hawkins' framework would predict, I think, that the next architecture-level breakthrough will come from pushing this comparison loop down into the layers themselves. The objective was right, twenty-two years ago. The location is still under construction.
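That scratchpad-level loop can be sketched in a dozen lines. Everything here — the flawed world model, the `act` function, the integer actions — is a hypothetical stand-in for illustration, not a real agent API:

```python
# Predict-observe-compare: the altered-door check as an agent loop.
def agent_step(state, action, world_model, act):
    predicted = world_model(state, action)   # "I expect to see X after this action"
    observed = act(state, action)            # actually take the action
    error = predicted != observed            # the comparison that arouses attention
    return observed, error

# Toy world: actions add numbers. The model holds a wrong belief about action 3.
def world_model(state, action):
    return state + action if action != 3 else state   # flawed expectation

def act(state, action):
    return state + action                              # what reality does

state = 0
for a in [1, 2, 3]:
    state, error = agent_step(state, a, world_model, act)
    if error:
        print(f"prediction violated at action {a}")    # attention is aroused here
```

In Hawkins' cortex this comparison happens at every level of the hierarchy, in activations; in today's agents it happens once, at the top, in text. The sketch shows how little machinery the loop itself requires — the open question is where in the stack to put it.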
Chapter 6: How the Cortex Works
This is by a wide margin the longest chapter in the book and the most technically exposed. It is also where Hawkins moves from a framework — "the cortex is a memory-prediction engine" — to a falsifiable hypothesis: "here is how a real piece of cortex could implement that engine."
He opens with a jigsaw-puzzle analogy. Neuroscience has been forced to assemble the brain bottom-up because no top-down framework existed. The puzzle has thousands of pieces. Many are interpretable two ways. Many will turn out not to belong to this puzzle at all. Every month new pieces arrive in the mail and replace older ones. And worse, "you have no idea what the end result will look like." Hawkins' claim is that the memory-prediction model can play the role of the picture on the box. Once you know what to look for, the existing data starts to organize itself.
He flags Gabriel Kreiman and Christof Koch at Caltech, working with the neurosurgeon Itzhak Fried at UCLA. They had found cells that fire whenever a patient sees a picture of Bill Clinton — exactly the kind of high-level invariant representation his framework predicts. One of his stated goals in the chapter is to explain how a Bill Clinton cell could possibly come into being.
A nested world, a hierarchical cortex
Hawkins first restructures the standard textbook picture of visual cortex. The classical story is a four-step pipeline: V1 detects edges, V2 combines them into shapes, V4 builds objects, and IT achieves invariance at the top. Hawkins lists three problems. First, if Mountcastle's single-algorithm claim is true, why should invariance only appear in IT? Second, two non-adjacent patches of V1 are not directly connected, yet both participate in recognizing a single face — they must be doing the same thing in parallel. Third, higher regions get converging input from multiple lower regions, but V1 and V2 do not, which contradicts the uniform-algorithm hypothesis. His proposed rewrite: V1, V2, and V4 should not be thought of as single regions at all. Each is a large collection of small subregions, each doing the same job as IT on a smaller slice of the visual world. "Every region forms invariant representations." Invariance isn't a property that magically appears at the top; it is the basic operation of cortex.
The chapter's most useful big-picture argument is that the cortex is built as a hierarchy because the world itself is hierarchical. Music is notes nested in intervals nested in phrases nested in melodies nested in albums. Written language is letters nested in syllables nested in words nested in clauses nested in sentences. Your neighborhood has streets containing houses containing rooms containing walls and doors and windows, each made of smaller parts. "All objects in your world are composed of subobjects that occur consistently together; that is the very definition of an object." The brain is built to discover and store this nested structure, so its memory is mapped onto a matching hierarchy: large-scale relationships stored at the top, small-scale details stored toward the bottom. That is why, when your eyes are fixated on a window latch, you nevertheless know you are at home, in your living room, looking at a window — the higher regions are maintaining the larger context while the lower ones handle the fast-changing details.
Sequences of sequences
A cortical region's job is to learn the sequences of patterns it sees from below, give each learned sequence a name, and pass that name up to the next region. The name is a group of cells whose collective firing stays constant for as long as the sequence is playing. The higher region sees only this stable name — one constant signal during the entire song — while the region below sees the fast-changing notes. As you ascend the hierarchy, sequences collapse into names, names collapse into sequences of names, and stability accumulates. That is how invariant representations are formed: not in one giant leap from V1 to IT, but step by step, sequence by sequence, with each level pasting over the fluctuations of the level below.
Going down the hierarchy, the same machine runs in reverse. A stable invariant pattern at the top gets unfolded into a sequence at the next level down.
Hawkins reaches for an example he must have rehearsed: the Gettysburg Address, memorized in seventh grade and still recoverable decades later. The top of the language hierarchy stores a single pattern for the speech. The next region down unfolds that pattern into a sequence of phrases. The next region down unfolds each phrase into a sequence of words. At the lowest motor layer, each phoneme is unfolded into a sequence of muscle commands. The same top pattern can fork at the motor cortex. If you decide to type the speech instead of speaking it, the words unfold into letters and finger commands rather than phonemes and mouth commands.
"Notice you don't have to memorize the speech twice, once for speaking and once for writing." This sharing-and-reuse across levels is, Hawkins argues, why the brain is so efficient. A hierarchy of nested sequences lets the same low-level objects — words, phonemes, letters — participate in arbitrarily many high-level sequences.
The supporting analogy is a military command chain. An army general says "Move the troops to Florida for the winter," and the command unfolds through the hierarchy into more and more specific operations: leave-preparations, transportation, arrival prep, then thousands of privates taking tens of thousands of individual actions. Reports back up the chain get summarized at each level; the general gets "Move to Florida going okay." But if something goes wrong that subordinates cannot handle, the issue rises up the chain until someone knows what to do. That, Hawkins says, is exactly how the cortex behaves with unexpected inputs.
The classification of patterns into a discrete set of "buckets" is the other essential operation of a cortical region. Hawkins makes it concrete with the rrgpog example. Imagine sorting colored pieces of paper into ten buckets — green, yellow, red, orange, purple, and so on. Some papers are easy, some are ambiguous (halfway between red and orange). You have to commit. Now imagine you also notice that the sequence "red-red-green-purple-orange-green" — call it "rrgpog" — keeps appearing. Once you know that sequence, an ambiguous paper that comes after "red-red-green-purple" can be confidently called "orange" because the sequence predicts it. Classification and sequence-learning are complementary: classification gives you the alphabet, sequences give you the words, and once you have the words you can use them to disambiguate noisy letters. He notes in passing that this is exactly what happens when you read handwritten or noisy text: individual letters out of context are often unintelligible, but inside a known sentence they snap into focus.
Inside a cortical region
The structural tour begins with a single coin-sized cortical region. It is six layers thick (the same six business cards from Chapter 3) and made of thousands of columns running vertically through the layers. Each layer has its own cell density and types: layer 1 is almost all axons with very few cells; layers 2 and 3 are packed with pyramidal neurons; layer 4 has star-shaped cells; layer 5 contains regular pyramidals plus a class of extra-large pyramid cells; layer 6 has its own unique types.

Microcolumns — about a hundred neurons each, all descended from a single precursor cell that migrated outward from the inner brain cavity during embryonic development — are tightly connected vertically. The human cortex has "several hundred million microcolumns." Hawkins offers a memorable visualization: imagine a single microcolumn the width of a human hair; cut thousands of those hairs to the height of a lowercase i without the dot; line them up side by side as a dense brush; lay a sheet of long parallel hairs across the top, representing the layer-1 axons.

Mountcastle, in 1979, proposed that the cortical column is the cortex's basic unit of computation. Hawkins refines: a column is the basic unit of prediction. For a column to predict when it should be active, it needs to know what is happening everywhere else — hence the striking fact that more than 90 percent of the synapses on cells inside a column come from cells outside the column.
Information flow through this structure has three circuits.
Upward. Feedforward inputs from lower regions converge on layer 4, which fires the rest of the column upward through layers 2 and 3. Those layers then send axons out to layer 4 of the next region above.
Downward. Layer 6 cells project to layer 1 of regions below, where the axons spread laterally for long distances. Cells in layers 2, 3, and 5 of the lower region have dendrites in layer 1 and get excited by these spreading top-down signals.
Delayed thalamic feedback. Large layer 5 cells project to the nonspecific thalamus, which projects back to layer 1 of many regions. This provides the time-delayed signal that an auto-associative memory needs to learn sequences.
Hawkins summarizes the architecture in one line: half of the input to layer 1 comes from layer 5 cells in neighboring columns and regions — what was happening moments before — and the other half comes from layer 6 cells in hierarchically higher regions — the name of the sequence currently being heard. Layer 1, in this telling, carries both where we are in the song and what song this is. That is exactly what a column needs to know in order to predict when it should fire.
Hawkins lays out four mechanistic problems and proposes a candidate mechanism for each.
(1) Classification. Layer 4 cells in each column vote on which "bucket" the input belongs to. Inhibitory cells let one column win in a neighborhood.
(2) Sequence learning. When a column is activated by layer 4, the cells in layers 2, 3, and 5 fire. Their layer-1 synapses then get strengthened on whatever signals are currently active in layer 1. After enough repetitions, those layer-1 synapses become strong enough to fire the column without needing layer 4 input from below. The column starts firing in anticipation of the input that's about to arrive. That, in this framework, is prediction at the neural level: a column firing before it has been driven from below.
(3) Forming a constant name. Hawkins proposes specific roles for two cell classes. Layer 2 cells stay on during a learned sequence — the "name cells." Layer 3b cells fire when the column becomes active unexpectedly — the "unexpected input" cells. Layer 3a inhibits 3b when the prediction is met. The combination yields the constant pattern needed by the next region up.
(4) Specific predictions from invariant memories. The cortex combines top-down invariant predictions with bottom-up partial input to find an intersection. Hawkins illustrates with the musical-interval D-A example. A higher region predicts "expect an interval of a fifth." That activates layer 2 cells in all columns representing fifths — C-G, D-A, E-B, and so on. Bottom-up input then says "the last note was a D," which gives layer 4 partial input to all columns containing a D — D-E, D-A, D-B, and so on. The single column where both signals coincide is D-A. A layer 6 cell in that column fires, producing the specific prediction of the next note, A. The mental picture Hawkins reaches for is two pieces of paper with holes punched in them: lay one over the other and the holes that line up are the active columns. Stephen Grossberg calls this "folded feedback"; Hawkins prefers "imagining."
Perception and behavior are one thing
The chapter argues that motor control and sensory prediction are not separate systems but the same machinery viewed from two angles. Layer 5 cells in supposedly sensory areas like V2 and V4 project to the parts of the brain that move the eyes — meaning the visual cortex helps determine where the eyes look next. Motor commands are themselves invariant representations at the top of a motor hierarchy that unfolds into specific muscle commands lower down, exactly mirroring sensory unfolding. When IT recognizes "nose" and the next item in the face-sequence should be "eye," merely switching to the invariant representation for "eye" cascades downward into a specific saccade — and the saccade comes out the right size, in the right direction, for the face's current position. "Thinking of going to the next pattern in a sequence causes a cascading prediction of what you should experience next. As the cascading prediction unfolds, it generates the motor commands necessary to fulfill the prediction. Thinking, predicting, and doing are all part of the same unfolding of sequences moving down the cortical hierarchy." This is the chapter's most explicit articulation of the line from Chapter 5: "Goal-oriented behavior is the holy grail of robotics. It is built into the fabric of the cortex."
The flip side of this picture is what failure feels like. When you walk through a familiar room, most of the predictions get satisfied at low levels and errors never rise far. When you step off an airplane into a foreign country, errors rise high — cars drive on the wrong side, money is strange, finding a bathroom takes all your cortical powers. Hawkins' practical advice, which has stuck with me since first reading: "Don't try to rehearse a speech while walking in a foreign land." The cortex isn't free to be elsewhere.

The other vivid demonstration is the Dalmatian-in-the-dots picture. At first you see noise. Your eyes scan, errors race up the hierarchy, the top tries hypotheses that race back down and conflict with the bottom. Confusion is the state of no consistent prediction. Then suddenly, in less than a second, the right high-level prediction propagates all the way down — chunk, chunk, chunk, chunk — and the Dalmatian pops out. That is the "aha" moment. It isn't a new piece of perception; it is a new prediction that finally won.
The hippocampus on top, and an alternate route up
The chapter's most surprising late twist is the place of the hippocampus. The classical view treats it as a separate memory structure that stores new memories and slowly transfers them to cortex. Hawkins admits he never made sense of it for years.
Then, in late 2002, his colleague Bruno Olshausen at the Redwood Neuroscience Institute pointed out that the hippocampus is structurally the top region of the neocortex. It sits at the peak of the cortical pyramid, not next to it. The reframe falls into place.
Each cortical region tries to interpret its input as part of a sequence it knows. If it can, it doesn't pass details up; it relays a constant name. If it can't, the unexpected input propagates further up. Patterns that are truly novel keep escalating, region after region, until they reach the top — the hippocampus. The hippocampus stores the genuinely new.
Most days you encounter many genuinely new things: a story in the newspaper, a person you met for the first time, a car accident on the commute home. Those go into the hippocampus, where they will either be transferred down into cortex (through repeated exposure or repeated thought) or eventually lost. The framework explains an experience Hawkins admits to. As he gets older, he has more trouble remembering new things, because fewer things are genuinely new to him. His children remember every play they see; he doesn't, because each new play already fits into his cortex's memory of past plays. "The more you know, the less you remember."
He closes with an alternate path up the hierarchy through the thalamus, identified by Murray Sherman at SUNY Stony Brook and Ray Guillery at the University of Wisconsin School of Medicine. This path can be selectively turned on, and Hawkins speculates it is the mechanism of attention to detail. The example: read the word imagination at a glance and you see the word; focus on the letter i in the middle and you see the letter; focus on the dot above the i and you see the dot — all from the same V1 input. The same mechanism is why a strange mark on the nose of an otherwise familiar face captures your attention; it's why you can't help staring at a deformity; and it's why you sometimes don't notice a typo as you read.
The chapter also makes a quiet defense of feedback that's worth flagging. For decades, neuroscientists treated cortical feedback as "modulatory" — a slow background signal, not the load-bearing structure of cognition. Hawkins argues this was wrong on three counts. There was no reason to focus on feedback if you didn't accept prediction's role, the feedback signal is spread over large areas of layer 1, and synapses far from a neuron's cell body were assumed to have weak effects. But recent work showed that synapses on distant thin dendrites can act as coincidence detectors — two synapses firing within a small time window on the same thin dendrite produce a large effect at the cell body, even though each one alone is too far away to matter. The framework needs precise top-down feedback, and the new neuron models allow precisely that. "In hindsight, it seems almost foolish to say that most of the thousands of synapses on a neuron play only a modulatory role."
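The coincidence-detector claim has a simple logical shape, sketched below with an invented time window and invented numbers — purely an illustration of the argument, not a neuron model:

```python
# One distant synapse alone is sub-threshold; two spikes within a short
# window on the same thin dendrite produce a large effect at the cell body.

COINCIDENCE_WINDOW_MS = 5.0

def dendrite_effect(spike_times_ms):
    """Return 'large' if any two spikes coincide within the window."""
    times = sorted(spike_times_ms)
    for earlier, later in zip(times, times[1:]):
        if later - earlier <= COINCIDENCE_WINDOW_MS:
            return "large"  # coincidence: a big effect at the soma
    return "weak"           # isolated spikes stay modulatory at best

paired = dendrite_effect([10.0, 12.0])  # "large": 2 ms apart
lone = dendrite_effect([10.0, 40.0])    # "weak": no coincidence
```

If distant synapses can act this way, the broad layer-1 feedback stops being background modulation and becomes a precise, timed signal — exactly what the framework needs.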
2026 engineer's audit
This chapter is where Hawkins is most exposed, and where his framework has held up most unevenly.
