A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains
DNA had officially become life’s blueprint, ribosomes its factory, and proteins its product.
Respiratory microbes differed in one crucial way from their photosynthetic cousins: they needed to hunt. And hunting required a whole new degree of smarts.
What separates you from an earthworm is not the unit of intelligence itself—neurons—but how these units are wired together.
Gastrulation, neurons, and muscles are the three inseparable features that bind all animals together and separate animals from all other kingdoms of life.
There is another observation about bilaterians, perhaps the more important one: They are the only animals that have brains. This is not a coincidence. The first brain and the bilaterian body share the same initial evolutionary purpose: They enable animals to navigate by steering. Steering was breakthrough #1.
Dopamine is not a signal for pleasure itself; it is a signal for the anticipation of future pleasure.
Berridge proved that dopamine is less about liking things and more about wanting things.
serotonin is the satiation, things-are-okay-now, satisfaction chemical, designed to turn off valence responses.
The second breakthrough was reinforcement learning: the ability to learn arbitrary sequences of actions through trial and error.
The signal on which the actor learns is not rewards, per se, but the temporal difference in the predicted reward from one moment in time to the next. Hence Sutton’s name for his method: temporal difference learning.
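The update rule this highlight describes can be sketched in a few lines. Below is a minimal TD(0) value-learning loop on a made-up five-state corridor task with a reward only at the end; every name and parameter here (`V`, `alpha`, `gamma`, the task itself) is illustrative, not from the book.

```python
N_STATES = 5
alpha, gamma = 0.1, 0.9          # learning rate, discount factor
V = [0.0] * N_STATES             # predicted future reward for each state

for episode in range(2000):
    s = 0
    while s < N_STATES - 1:
        s_next = s + 1
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        # TD error: the change in predicted reward from one moment to the next
        td_error = reward + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
        s = s_next

print([round(v, 2) for v in V])  # → [0.73, 0.81, 0.9, 1.0, 0.0]
```

Note that the learning signal is never the reward itself: it is the moment-to-moment difference between successive predictions, which is exactly why the method is called temporal difference learning.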
And then on the tenth move you pull off some clever maneuver that turns the tide of the game; suddenly you realize you are in a far better position than your opponent. It is that moment where a temporal difference learning signal reinforces your action.
But over time, with enough games, each refines the other until they converge to produce an AI system capable of making remarkably intelligent decisions. At least, that’s what happened in Sutton’s simulations. It wasn’t clear whether this would work in practice.
At the same time that Sutton was working on TD learning, a young physicist by the name of Gerald Tesauro was working on getting AI systems to play backgammon. Tesauro was at IBM Research, the same group that would later build Deep Blue (the program that famously beat Garry Kasparov in chess) and Watson (the program that famously beat Ken Jennings in Jeopardy!). But before Deep Blue or Watson, there was Neurogammon. Neurogammon was a backgammon-playing AI system that was trained on transcripts of hundreds of expertly played backgammon games. It learned not through trial and error but by ...more
Dopamine is not a signal for reward but for reinforcement. As Sutton found, reinforcement and reward must be decoupled for reinforcement learning to work. To solve the temporal credit assignment problem, brains must reinforce behaviors based on changes in predicted future rewards, not actual rewards.
In 1997 Dayan and Montague published a landmark paper, coauthored with Schultz, titled “A Neural Substrate of Prediction and Reward.” To this day, this discovery represents one of the most famous and beautiful partnerships between AI and neuroscience. A strategy inspired by how Sutton thought the brain might work turned out to successfully overcome practical challenges in AI, and this in turn helped us interpret mysterious data about the brain. Neuroscience informing AI, and AI informing neuroscience.
dopamine was transformed from a good-things-are-nearby signal to a there-is-a-35-percent-chance-of-something-awesome-happening-in-exactly-ten-seconds signal. Repurposed from a fuzzy average of recently detected food to an ever-fluctuating, precisely measured, and meticulously computed predicted-future-reward signal.
Both disappointment and relief are emergent properties of a brain designed to learn by predicting future rewards. Indeed, without an accurate prediction of a future reward, there can be no disappointment when it does not occur. And without an accurate prediction of future pain, there can be no relief when it does not occur.
Indeed, you can train vertebrates, even fish, to perform arbitrary actions not only with rewards and punishments but also with the omission of expected rewards or punishments.
A nematode, on the other hand, cannot learn to perform arbitrary behaviors through the omission of rewards. Even crabs and honeybees, who independently evolved many intellectual faculties, are not able to learn from the omission of things.*
Bacteria, animals, and plants all have circadian clocks to track the cycle of the day. But vertebrates are unique in the precision with which they can measure time. A vertebrate can remember that one event occurs precisely five seconds after another event. In contrast, simple bilaterians like slugs and flatworms are entirely unable to learn the precise time intervals between events.
TD learning, disappointment, relief, and the perception of time are all related. The precise perception of time is a necessary ingredient to learn from omission,
The Basal Ganglia
My favorite part of the brain is a structure called the basal ganglia. For most brain structures, the more one learns about them, the less one understands them—simplified
The basal ganglia is wedged between the cortex and the thalamus
The input to the basal ganglia comes from the cortex, thalamus, and midbrain, enabling the basal ganglia to monitor an animal’s actions and external environment.
The basal ganglia is thereby in a perpetual state of gating and ungating specific actions, operating as a global puppeteer of an animal’s behavior. The functioning of the basal ganglia is essential to our lives.
This symptom of Parkinson’s disease primarily emerges due to disruption of the basal ganglia, leaving it in a perpetual state of gating all actions, thereby depriving patients of the ability to initiate even the simplest of movements.
the basal ganglia also receives input from a cluster of dopamine neurons. Whenever these dopamine neurons get excited, the basal ganglia is rapidly flooded with dopamine; whenever these dopamine neurons are inhibited, the basal ganglia is rapidly starved of dopamine.
The basal ganglia learns to repeat actions that maximize dopamine release. Through the basal ganglia, actions that lead to dopamine release become more likely to occur (the basal ganglia ungates those actions), and actions that lead to dopamine inhibition become less likely to occur (the basal ganglia gates those actions).
The basal ganglia is, in part, Sutton’s “actor”—a system designed to repeat behaviors that lead to reinforcement and inhibit behaviors that lead to punishment. Remarkably, the circuitry of the basal ganglia is practically identical between a human brain and a lamprey fish brain, two species whose shared ancestors were the first vertebrates over 500 million years ago.
Reinforcement learning emerged not from the basal ganglia acting alone, but from an ancient interplay between the basal ganglia and another uniquely vertebrate structure called the hypothalamus, which is a small structure at the base of the forebrain.
so, in some ways, the basal ganglia is a student, always trying to satisfy its vague but stern hypothalamic judge.
The hypothalamus is the decider of actual rewards;
One leading theory of basal ganglia function is that these parallel circuits are literally Sutton’s actor-critic system for implementing temporal difference learning. One circuit is the “actor,” learning to repeat the behaviors that trigger dopamine release; the other circuit is the “critic,” learning to predict future rewards and trigger its own dopamine activation.
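The two-circuit scheme in this highlight can be sketched with a toy two-armed bandit: a "critic" learns the predicted reward, and its TD error (the dopamine-like signal) both trains the critic and reinforces the "actor." Everything here, the task, the learning rates, the update rule, is an illustrative simplification, not the book's or the brain's actual implementation.

```python
import math
import random

random.seed(0)
alpha_v, alpha_p = 0.1, 0.1
V = 0.0                      # critic: predicted reward in the single state
prefs = [0.0, 0.0]           # actor: preference for each of two actions

def softmax(p):
    e = [math.exp(x) for x in p]
    return [x / sum(e) for x in e]

for trial in range(5000):
    probs = softmax(prefs)
    a = 0 if random.random() < probs[0] else 1
    # Action 1 pays off far more often than action 0.
    reward = 1.0 if random.random() < (0.8 if a == 1 else 0.2) else 0.0
    td_error = reward - V              # one-step episode, so no next-state term
    V += alpha_v * td_error            # critic learns to predict the reward
    prefs[a] += alpha_p * td_error     # actor reinforced by the critic's error

print(round(V, 2), round(softmax(prefs)[1], 2))
```

A single shared error signal trains both circuits, which is why a single broadcast chemical like dopamine is such a natural fit for the role.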
the basal ganglia student initially learns solely from the hypothalamic judge, but over time learns to judge itself, knowing when it makes a mistake before the hypothalamus gives any feedback.
TD learning, the wiring of vertebrate basal ganglia, the properties of dopamine responses, the ability to learn precise time intervals, and the ability to learn from omissions are all interwoven into the same mechanisms for making trial-and-error learning work.
But this time was different—our vertebrate ancestor would remember the smell of that dangerous arthropod; she would remember the sight of its eyes peeking through the sand. She wouldn’t make the same mistake again. Sometime around five hundred million years ago, our ancestor evolved pattern recognition.
Early vertebrates could recognize things using brain structures that decoded patterns of neurons. This dramatically expanded the scope of what animals could perceive. Within the small mosaic of only fifty types of olfactory neurons lived a universe of different patterns that could be recognized. Fifty cells can represent over one hundred trillion patterns.*
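The footnoted arithmetic checks out: with fifty neuron types, each either active or silent, the number of distinct on/off patterns is 2^50.

```python
# Fifty binary neurons: each pattern is one of 2^50 on/off combinations.
patterns = 2 ** 50
print(f"{patterns:,}")  # → 1,125,899,906,842,624 — over one hundred trillion
```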
The above type of learning, in which a network is trained by providing examples alongside the correct answer, is called supervised learning (a human has supervised the learning process by providing the network with the correct answers). Many supervised learning methods are more complex than this, but the principle is the same: the correct answers are provided, and networks are tweaked using backpropagation to update weights until the categorization of input patterns is sufficiently accurate. This design has proven to work so generally that it is now applied to image recognition, natural ...more
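The principle described here can be shown at its smallest scale: labeled examples, a prediction, and weights nudged to shrink the error against the given answer. This sketch uses a single logistic neuron (for which backpropagation reduces to the one-layer gradient rule); the data and every parameter are made up for illustration.

```python
import math

w, b, lr = [0.0, 0.0], 0.0, 0.5

# Labeled examples: (inputs, correct answer). Class 1 iff x0 + x1 > 1.
data = [([0.0, 0.0], 0), ([1.0, 0.0], 0), ([0.0, 1.0], 0),
        ([1.0, 1.0], 1), ([0.9, 0.8], 1), ([0.2, 0.1], 0)]

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))        # sigmoid activation

for epoch in range(2000):
    for x, y in data:
        err = predict(x) - y                 # error against the provided label
        w[0] -= lr * err * x[0]              # nudge each weight to shrink it
        w[1] -= lr * err * x[1]
        b -= lr * err

print([round(predict(x)) for x, _ in data])  # → [0, 0, 0, 1, 1, 0]
```

The "supervision" is the second element of each training pair: without those provided answers, there is no error to propagate.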
the brain does not do supervised learning—you are not given labeled data when you learn that one smell is an egg and another is a strawberry. Even before children learn the words egg and strawberry, they can clearly recognize that they are different. Second, backpropagation is biologically implausible. Backpropagation works by magically nudging millions of synapses simultaneously and in exactly the right amount to move the output of the network in the right direction. There is no conceivable way the brain could do this. So then how does the brain recognize patterns?
The next time a pattern shows up, even if it is incomplete, the full pattern can be reactivated in the cortex. This trick is called auto-association; neurons in the cortex automatically learn associations with themselves. This offers a solution to the generalization problem—the cortex can recognize a pattern that is similar but not the same.
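One classic formalization of auto-association (a Hopfield-style network, which the highlight does not name but which captures the same trick) can complete a corrupted pattern. Neurons strengthen connections to whichever neurons fire with them; a partial cue then reactivates the whole. The stored patterns below are arbitrary ±1 vectors chosen for illustration.

```python
N = 16
memories = [
    [1, 1, 1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1],
    [-1, 1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1, 1, 1, -1, -1],
]

# Hebbian storage: w[i][j] accumulates the co-activity of neurons i and j.
W = [[0.0] * N for _ in range(N)]
for m in memories:
    for i in range(N):
        for j in range(N):
            if i != j:
                W[i][j] += m[i] * m[j]

def recall(cue, steps=5):
    s = list(cue)
    for _ in range(steps):
        s = [1 if sum(W[i][j] * s[j] for j in range(N)) >= 0 else -1
             for i in range(N)]
    return s

cue = list(memories[0])
for i in (0, 5, 9, 14):            # corrupt four of the sixteen "neurons"
    cue[i] *= -1
print(recall(cue) == memories[0])  # → True
```

The incomplete cue is enough: the network settles back into the full stored pattern, which is the generalization trick the passage describes.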
Auto-association reveals an important way in which vertebrate memory differs from computer memory. Auto-association suggests that vertebrate brains use content-addressable memory—memories are recalled by providing subsets of the original experience, which reactivate the original pattern.
However, computers use register-addressable memory—memories that can be recalled only if you have the unique memory address for them. If you lose the address, you lose the memory.
Register-addressable memory enables computers to segregate where information is stored, ensuring that new information does not overwrite old information. In contrast, auto-associative information is stored in a shared population of neurons, which exposes it to the risk of accidentally overwriting old memories. Indeed, as we will see, this is an essential challenge with pattern recognition using networks of neurons.
when you train a neural network to recognize a new pattern or perform a new task, you risk interfering with the network’s previously learned patterns. How do modern AI systems overcome this problem? Well, they don’t yet. Programmers merely avoid the problem by freezing their AI systems after they are trained. We don’t let AI systems learn things sequentially; they learn things all at once and then stop learning.
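The interference described here is easy to reproduce at the smallest possible scale: one neuron with shared weights, trained on an old pattern and then only on a new, overlapping one. The patterns and parameters are made up; the point is only that retraining drags the shared weights and degrades the old answer.

```python
import math

def predict(w, x):
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(w, examples, epochs=500, lr=0.5):
    for _ in range(epochs):
        for x, y in examples:
            err = predict(w, x) - y
            for i in range(len(w)):
                w[i] -= lr * err * x[i]

w = [0.0, 0.0, 0.0]                # one neuron's shared weights
A = ([1.0, 1.0, 0.0], 1)           # old pattern: should map to 1
B = ([1.0, 1.0, 1.0], 0)           # new, overlapping pattern: should map to 0

train(w, [A])                      # learn A alone
after_a = predict(w, A[0])         # A is learned (output near 1)
train(w, [B])                      # now learn only B, with A absent
after_b = predict(w, A[0])         # A's answer has been dragged toward 0
print(round(after_a, 2), round(after_b, 2))
```

Note that a joint solution exists (a strongly negative third weight would satisfy both patterns at once); it is the sequential training, with A absent from the second round, that causes the forgetting, which is why frozen weights or all-at-once retraining are the current workarounds.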
As of this book going to print, even ChatGPT, the famous chatbot released by OpenAI, does not continually learn from the millions of people who speak to it. It too stopped learning the moment it was released into the world. These systems are not allowed to learn new things because of the risk that they will forget old things (or learn the wrong things). So modern AI systems are frozen in time, their parameters locked down; they are allowed to be updated only when retrained from scratch with humans meticulously monitoring their performance on all the relevant tasks.
even early bilaterians learned continually; the connections between neurons were strengthened and weakened with each new experience. But these early bilaterians never faced the problem of catastrophic forgetting because they never learned patterns in the first place. If things are recognized in the world using only individual sensory neurons, then the connection between these sensory neurons and motor neurons can be strengthened and weakened without interfering with each other. It is only when knowledge is represented in a pattern of neurons, like in artificial neural networks or in the cortex ...more
even fish avoid catastrophic forgetting fantastically well. Train a fish to escape from a net through a small escape hatch, leave the fish alone for an entire year, and then test it again. During this long stretch of time, its brain will have received a constant stream of patterns, learning continually to recognize new smells, sights, and sounds. And yet, when you place the fish back in the same net an entire year later, it will remember how to get out with almost the same speed and accuracy as it did the year before.
One theory is that the cortex’s ability to perform pattern separation shields it from the problem of catastrophic forgetting; by separating incoming patterns in the cortex, patterns are inherently unlikely to interfere with each other. Another theory is that learning in the cortex selectively occurs only during moments of surprise; only when the cortex sees a pattern that passes some threshold of novelty are the weights of synapses allowed to change.
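The second theory, learning gated by surprise, can be sketched as a memory that changes only when an incoming pattern fails to match anything stored. The class, threshold, and similarity measure below are all invented for illustration, not a model from the book.

```python
def match(a, b):
    """Fraction of positions where two binary patterns agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

class NoveltyGatedMemory:
    """Allow 'synaptic change' only for sufficiently novel patterns."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.stored = []

    def observe(self, pattern):
        if any(match(pattern, s) >= self.threshold for s in self.stored):
            return False             # familiar: no learning, no interference
        self.stored.append(list(pattern))
        return True                  # novel: learning permitted

mem = NoveltyGatedMemory()
p = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
noisy = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # one bit off: still "familiar"
novel = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]   # nothing like p
print(mem.observe(p), mem.observe(noisy), mem.observe(novel))  # → True False True
```

Because the noisy repeat never triggers learning, it cannot overwrite what is already stored, which is the shielding effect both theories are trying to explain.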
There is some evidence that the wiring between the cortex and the thalamus—both structures that emerged alongside each other in early vertebrates—are always measuring the level of novelty between incoming sensory data through the thalamus and the patterns represented in the cortex. If there is a match, then no learning is allowed, hence noisy inputs don’t interfere with existing learned patterns. However, if there is a mismatch—if an incoming pattern is sufficiently new—then this triggers a process of neuromodulator release, which triggers changes in synaptic connections in the cortex, ...more
The minuscule half-millimeter-thick membrane in the back of the eye—the retina—contains over one hundred million neurons of five different types. Each region of the retina receives input from a different location of the visual field, and each type of neuron is sensitive to different colors and contrasts. As you view each object, a unique pattern of neurons activates a symphony of spikes.