Kindle Notes & Highlights
The third and final part of each pattern is the set of higher-level patterns that it in turn is part of. For the letter “A,” this is all of the words that include “A.” These are, again, like Web links. Each recognized pattern at one level triggers the next level that part of that higher-level pattern is present. In the neocortex, these links are represented by physical dendrites that flow into neurons in each cortical pattern recognizer. Keep in mind that each neuron can receive inputs from multiple dendrites yet produces a single output on an axon. That axon, however, can then in turn
To take some simple examples, the simple patterns on the next page are a small subset of the patterns used to make up printed letters. Note that every level constitutes a pattern. In this case, the shapes are patterns, the letters are patterns, and the words are also patterns. Each of these patterns has a set of inputs, a process of pattern recognition (based on the inputs that take place in the module), and an output (which feeds to the next higher level of pattern recognizer).
“P”: Patterns that are part of the higher-level pattern “P.”
“L”: Patterns that are part of the higher-level pattern “L.”
“E”: Patterns that are part of the higher-level pattern “E.”
These letter patterns feed up to an even higher-level pattern in a category called words. (The word “words” is our language category for this concept, but the neocorte...
This highlight has been truncated due to consecutive passage length restrictions.
In a different part of the cortex is a comparable hierarchy of pattern recognizers processing actual images of objects (as opposed to printed letters). If you are looking at an actual apple, low-level recognizers will detect curved edges and surface color patterns leading up to a pattern recognizer firing its axon and saying in effect, “Hey guys, I just saw an actual apple.” Yet other pattern recognizers will detect combinations of frequencies of sound leading up to a pattern recognizer in the auditory cortex that might fire its axon indicating, “I just heard the spoken word ‘apple.’”
Keep in mind the redundancy factor—we don’t just have a single pattern recognizer for “apple” in each of its forms (written, spoken, visual). There are likely to be hundreds of such recognizers firing, if not more. The redundancy not only increases the likelihood that you will successfully recognize each instance of an apple but also deals with the variations in real-world apples. For apple objects, there will be pattern recognizers that deal with the many varied forms of apples: different views, colors, shadings, shapes, and varieties.
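A back-of-the-envelope sketch of why redundancy raises the chance of recognition. It assumes, purely for illustration, that each redundant recognizer fires independently with the same probability; the passage gives no such numbers, and `prob_at_least_one_fires` is a hypothetical helper.

```python
def prob_at_least_one_fires(p_single: float, n_recognizers: int) -> float:
    """Chance that at least one of n independent recognizers fires.

    Assumes independence and a shared per-recognizer probability,
    which is a simplification made for this illustration only.
    """
    return 1.0 - (1.0 - p_single) ** n_recognizers


# Even an unreliable recognizer becomes dependable when there are many copies.
for n in (1, 10, 100):
    print(n, round(prob_at_least_one_fires(0.05, n), 3))
```

Under these assumptions the chance of at least one recognition climbs quickly as the number of redundant recognizers grows.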
Also keep in mind that the hierarchy shown above is a hierarchy of concepts. These recognizers are not physically placed above each other; because of the thin construction of the neocortex, it is physically only one pattern recognizer high. The conceptual hierarchy is...
An important attribute of the PRTM is how the recognitions are made inside each pattern recognition module. Stored in the module is a weight for each input dendrite indicating how important that input is to the recognition. The pattern recognizer has a threshold for firing (which indicates that this pattern recognizer has successfully recognized the pattern it is responsible for). Not every input pattern has to be present for a recognizer to fire. The recognizer may still fire if an input with a low weight is missing, but it is less likely to fire if a hi...
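The weight-and-threshold idea can be sketched as follows. This is a minimal, deterministic illustration rather than the book's actual model: the input names, weights, and threshold are invented, and the passage describes firing as more or less likely rather than strictly on or off.

```python
def fires(weights, active, threshold):
    """Return True if the summed weights of the active inputs reach the threshold."""
    total = sum(w for name, w in weights.items() if name in active)
    return total >= threshold


# Hypothetical inputs for an "A" recognizer; all values are made up.
A_WEIGHTS = {"left_stroke": 0.4, "right_stroke": 0.4, "crossbar": 0.15, "serif": 0.05}
A_THRESHOLD = 0.8

# A low-weight input (the serif) is missing: the recognizer still fires.
print(fires(A_WEIGHTS, {"left_stroke", "right_stroke", "crossbar"}, A_THRESHOLD))

# A high-weight input (the left stroke) is missing: it does not fire.
print(fires(A_WEIGHTS, {"right_stroke", "crossbar", "serif"}, A_THRESHOLD))
```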
Successful recognition by a module of its pattern goes beyond just counting the input signals that are activated (even a count weighted by the importance parameter). The size (of each input) matters. There is another parameter (for each input) indicating the expected size of the input, and yet another indicating how variable that size is. To appreciate how this works, suppose we have a pattern recognizer that is responsible for recognizing the spoken word “steep.” This spoken word has four sounds: [s], [t], [E], and [p]. The [t] phoneme is what is known as a “dental consonant,” meaning that it
In our work in speech recognition, we found that it is necessary to encode this type of information in order to recognize speech patterns. For example, the words “step” and “steep” are very similar. Although the [e] phoneme in “step” and the [E] in “steep” are somewhat different vowel sounds (in that they have different resonant frequencies), it is not reliable to distinguish these two words based on these often confusable vowel sounds. It is much m...
We can encode this type of information with two numbers for each input: the expected size and the degree of variability of that size. In our “steep” example, [t] and [p] would both have a very short expected duration as well as a small expected variability (that is, we do not expect to hear long t’s and p’s). The [s] sound would have a short expected duration but a larger variability because it is pos...
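One way to picture the two numbers per input is as the center and spread of a bell curve. The sketch below assumes a Gaussian-shaped score, which the passage does not specify, and the durations in milliseconds are invented for illustration.

```python
import math


def size_score(observed, expected, variability):
    """Unnormalized Gaussian score: 1.0 at the expected size, falling toward 0 with deviation.

    The Gaussian shape is an assumption of this sketch.
    """
    z = (observed - expected) / variability
    return math.exp(-0.5 * z * z)


# Hypothetical (expected duration, variability) pairs in milliseconds.
T_PARAMS = (30, 10)  # [t]: short, small variability
S_PARAMS = (80, 40)  # [s]: short, larger variability

# The same 30 ms overshoot is penalized far more for [t] than for [s].
print(size_score(60, *T_PARAMS))
print(size_score(110, *S_PARAMS))
```

A small variability makes the recognizer strict about that input's size, while a large variability makes it tolerant.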
In our speech examples, the “size” parameter refers to duration, but time is only one possible dimension. In our work in character recognition, we found that comparable spatial information was important in order to recognize printed letters (for example the dot over the letter “i” is expected to be much smaller than the portion under the dot). At much higher levels of abstraction, the neocortex will deal with patterns with all sorts of continuums, such as levels of attractiveness, irony, happiness, frustration, and myriad others. We can draw similarities acros...
In a biological brain, the source of these parameters comes from the brain’s own experience. We are not born with an innate knowledge of phonemes; indeed different languages have very different sets of them. This implies that multiple examples of a pattern are encoded in the learned parameters of each pattern recognizer (as it requires multiple instances of a pattern to ascertain the expected distribution of magnitudes of the inputs to the pattern). In some AI systems, these types of parameters are hand-coded by experts (for example, linguists who can tell us the expected durations of
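Learning the parameters from experience can be sketched as estimating a mean and a spread from several observed instances. The sample standard deviation used here is one plausible choice of "variability", not something the passage prescribes, and the durations are made up.

```python
from statistics import mean, stdev


def learn_size_parameters(observed_sizes):
    """Estimate (expected size, variability) from several observed instances."""
    if len(observed_sizes) < 2:
        # A single example cannot tell us how much the size varies.
        raise ValueError("need at least two examples to estimate variability")
    return mean(observed_sizes), stdev(observed_sizes)


# Hypothetical durations (ms) of one vowel heard on five occasions.
durations_ms = [140, 180, 120, 200, 160]
expected, variability = learn_size_parameters(durations_ms)
print(expected, variability)
```

The guard against a single example mirrors the point that multiple instances are needed to ascertain the distribution.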
Each particular input to the module is active if the corresponding lower-level pattern recognizer is firing (meaning that that lower-level pattern was recognized).
In the 1980s and 1990s, I and others pioneered a mathematical method called hierarchical hidden Markov models for learning these parameters and then using them to recognize hierarchical patterns. We used this technique in the recognition of human speech as well as the understanding of natural language.
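For readers unfamiliar with the technique, here is a sketch of the forward algorithm for a plain, single-level hidden Markov model. It is deliberately not hierarchical and is not the author's system; the states, observation symbols, and probabilities are all invented for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of a non-empty observation sequence under a flat HMM."""
    # Initialize with the start distribution and the first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Propagate forward one observation at a time.
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Toy model: hidden sound classes emit coarse duration labels.
STATES = ("stop", "vowel")
START_P = {"stop": 0.5, "vowel": 0.5}
TRANS_P = {"stop": {"stop": 0.3, "vowel": 0.7}, "vowel": {"stop": 0.6, "vowel": 0.4}}
EMIT_P = {"stop": {"short": 0.9, "long": 0.1}, "vowel": {"short": 0.2, "long": 0.8}}

print(forward_likelihood(["short", "long", "short"], STATES, START_P, TRANS_P, EMIT_P))
```

In a hierarchical variant, each state could itself be a model of lower-level patterns; that extension is not shown here.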
If we go up several dozen more levels, we get to higher-level concepts like irony and envy. Even though every pattern recognizer is working simultaneously, it does take time for recognitions to move upward in this conceptual hierarchy. Traversing each level takes between a few hundredths to a few tenths of a second to process. Experiments have shown that a moderately high-level pattern such as a face takes at least a tenth of a second. It can take as long as an entire second if there are significant distortions.
A very important point to note here is that information flows down the conceptual hierarchy as well as up. If anything, this downward flow is even more significant. If, for example, we are reading from left to right and have already seen and recognized the letters “A,” “P,” “P,” and “L,” the “APPLE” recognizer will predict that it is likely to see an “E” in the next position. It will send a signal down to the “E” recognizer saying, in effect, “Please be aware that there is a high likelihood that you will see your ‘E’ pattern very soon, so be on the lookout for it.”
The “E” recognizer then adjusts its threshold such that it is more likely to recognize an “E.” So if an image appears next that is vaguely like an “E,” but is perhaps smudged such that it would not have been recognized as an “E” under “normal” circumstances, the “E” recognizer may nonetheless indicate that it has indeed seen an “E,” since it was expected.
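The smudged "E" case can be sketched as a threshold that drops when a higher-level recognizer signals that the pattern is expected. The evidence value, the threshold, and the size of the adjustment are hypothetical.

```python
def recognizes(evidence, threshold, expected=False, expectation_discount=0.2):
    """Fire if the evidence reaches the threshold; an 'expected' signal lowers it."""
    effective_threshold = threshold - expectation_discount if expected else threshold
    return evidence >= effective_threshold


SMUDGED_E = 0.65    # hypothetical match strength of a smudged "E"
E_THRESHOLD = 0.8   # hypothetical threshold under normal circumstances

print(recognizes(SMUDGED_E, E_THRESHOLD))                 # not expected
print(recognizes(SMUDGED_E, E_THRESHOLD, expected=True))  # "APPLE" predicts an "E"
```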
The neocortex is, therefore, predicting what it expects to encounter. Envisaging the future is one of the primary reasons we have a neocortex. At the highest conceptual level, we are continually making predictions—who is going to walk through the door next, what someone is likely to say next, what we expect to see when we turn the corner, the likely results of our own actions, and so on. These predictions are constantly occurring at every level of the neocortex hierarc...
In addition to positive signals, there are also negative or inhibitory signals which indicate that a certain pattern is less likely to exist. These can come from lower conceptual levels (for example, the recognition of a mustache will inhibit the likelihood that a person I see in the checkout line is my wife), or from a higher level (for example, I know that my wife is on a trip, so the person in the checkout line can’t be she). When a pattern recognizer receives an inhibitory signal, it raises the recognit...
So it would seem that the input to a neocortex pattern processor must comprise two- if not three-dimensional patterns. However, we can see in the structure of the neocortex that the pattern inputs are only one-dimensional lists. All of our work in the field of creating artificial pattern recognition systems (such as speech recognition and visual recognition systems) demonstrates that we can (and did) represent two- and three-dimensional phenomena with such one-dimensional lists.
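A simple illustration of representing a two-dimensional pattern as a one-dimensional list is row-major flattening. This is only one possible encoding, chosen for clarity; the passage does not say which encoding the neocortex or the author's systems use.

```python
def flatten(grid):
    """Row-major flattening of a 2D grid into a 1D list."""
    return [cell for row in grid for cell in row]


def unflatten(flat, width):
    """Rebuild the 2D grid from the 1D list, given the row width."""
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# A tiny bitmap of the letter "L".
LETTER_L = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

flat = flatten(LETTER_L)
print(flat)
print(unflatten(flat, 3) == LETTER_L)
```

No information is lost as long as the width is known, which is why a list can stand in for the grid.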
We should factor in at this point the insight that the patterns we have learned to recognize (for example, a specific dog or the general idea of a “dog,” a musical note or a piece of music) are exactly the same mechanism that is the basis for our memories. Our memories are in fact patterns organized as lists (where each item in each list is another pattern in the cortical hierarchy) that we have learned and then recognize when presented with the appropriate stimulus. In fact, memories exist in the neocortex in order to be recognized.
Even if we do have some level of awareness of the memories (that is, the patterns) that triggered the old memory, memories (patterns) do not have language or image labels. This is the reason why old memories may seem to suddenly jump into our awareness. Having been buried and not activated for perhaps years, they need a trigger in the same way that a Web page needs a Web link to be activated. And just as a Web page can become “orphaned” because no other page links to it, the same thing can happen to our memories.
Our thoughts are largely activated in one of two modes, undirected and directed, both of which use these same cortical links. In the undirected mode, we let the links play themselves out without attempting to move them in any particular direction. Some forms of meditation (such as Transcendental Meditation, which I practice) are based on letting the mind do exactly this. Dreams have this quality as well.
In directed thinking we attempt to step through a more orderly process of recalling a memory (a story, for...
Because these patterns are not labeled with words or sounds or pictures or videos, when you try to recall a significant event, you will essentially be reconstructing the images in your mind, because the actual images do not exist.
If we were to “read” the mind of someone and peer at exactly what is going on in her neocortex, it would be very difficult to interpret her memories, whether we were to take a look at patterns that are simply stored in the neocortex waiting to be triggered or those that have been triggered and are currently being experienced as active thoughts. What we would “see” is the simultaneous activation of millions of pattern recognizers. A hundredth of a second later, we would see a different set of a comparable number of activated pattern recognizers. Each such pattern would be a list of other
Language is itself highly hierarchical and evolved to take advantage of the hierarchical nature of the neocortex, which in turn reflects the hierarchical nature of reality. The innate ability of humans to learn the hierarchical structures in language that Noam Chomsky wrote about reflects the structure of the neocortex.
Chomsky cites the attribute of “recursion” as accounting for the unique language faculty of the human species. Recursion, according to Chomsky, is the ability to put together small parts into a larger chunk, and then use that chunk as a part in yet another structure, and to continue this process iteratively. In this way we are able to build the elaborate structures of sentences and paragraphs from a limited set of words.
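Recursion in this sense can be sketched with a nested data structure in which a chunk may contain words or other chunks. The labels ("NP", "VP", "S") and the helper names are illustrative choices, not terms from the passage.

```python
def chunk(label, *parts):
    """Combine parts (words or earlier chunks) into a larger labeled chunk."""
    return (label, list(parts))


def words(node):
    """Recursively flatten a chunk back into its sequence of words."""
    if isinstance(node, str):
        return [node]
    _, parts = node
    return [w for part in parts for w in words(part)]


def depth(node):
    """Number of chunk levels above the deepest word."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(part) for part in node[1])


noun_phrase = chunk("NP", "the", "apple")
verb_phrase = chunk("VP", "ate", noun_phrase)            # a chunk used as a part
sentence = chunk("S", chunk("NP", "she"), verb_phrase)   # and again, one level up

print(words(sentence))
print(depth(sentence))
```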
Every important pattern—at every level—is repeated many times. Some of these recurrences represent simple repetitions, whereas many represent different perspectives and vantage points. This is a principal reason why we can recognize a familiar face from various orientations and under a range of lighting conditions. Each level up the hierarchy has substantial redundancy, allowing sufficient variability that is consistent with that concept.
The apparent lushness of human experience is a result of the fact that all of the hundreds of millions of pattern recognizers in our neocortex are considering their inputs simultaneously.
These early inputs are processed by cortical regions that are devoted to relevant types of sensory input (although there is enormous plasticity in the assignment of these regions, reflecting the basic uniformity of function in the neocortex).
When we hear something that perhaps sounds like our spouse’s voice, and then see something that is perhaps indicative of her presence, we don’t engage in an elaborate process of logical deduction; rather, we instantly perceive that our spouse is present from the combination of these sensory recognitions. We integrate all of the germane sensory and perceptual cues—perhaps even the smell of her perfume or his cologne—as one multilevel perception.
In the previous chapter I noted that we can often recognize a pattern even though we don’t recognize it well enough to be able to describe it.
However, if I don’t think about her for a given period of time, then these pattern recognizers will become reassigned to other patterns. That is why memories grow dimmer with time: The amount of redundancy becomes reduced until certain memories become extinct.
Autoassociation and Invariance
In the previous chapter I discussed how we can recognize a pattern even if the entire pattern is not present, and also if it is distorted. The first capability is called autoassociation: the ability to associate a pattern with a part of itself. The structure of each pattern recognizer inherently supports this capability.
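Autoassociation can be illustrated with a toy lookup that retrieves stored patterns from a partial input. This is a deliberately simplified stand-in: it uses exact matching on the known positions rather than the weighted recognizers the text describes, and the stored words are arbitrary examples.

```python
STORED_PATTERNS = {
    "APPLE": list("APPLE"),
    "AMPLE": list("AMPLE"),
    "ANGLE": list("ANGLE"),
}


def complete(partial, stored=STORED_PATTERNS):
    """Return names of stored patterns consistent with a partial input.

    None marks a missing element of the pattern.
    """
    return [
        name
        for name, items in stored.items()
        if len(items) == len(partial)
        and all(p is None or p == item for p, item in zip(partial, items))
    ]


print(complete(["A", "P", "P", None, "E"]))    # one letter missing
print(complete(["A", None, None, "L", "E"]))   # ambiguous: several candidates
```

The more of the pattern that is present, the fewer stored patterns remain consistent with it.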
As each input from a lower-level pattern recognizer flows up to a higher-level one, the connection can have a “weight,” indicating how important that particular element in the pattern is. Thus the more significant elements of a pattern are more heavily weighted in considering whether that pattern should trigger as “recognized.” Lincoln’s beard, Elvis’s sideburns, and Einstein’s famous tongue gesture ar...
As I pointed out, the computation of the overall probability (that the pattern is present) is more complicated than a simple weighted sum in that the size parameters also need to be considered.
If the pattern recognizer has received a signal from a higher-level recognizer that its pattern is “expected,” then the threshold is effectively lowered (that is, made easier to achieve). Alternatively, such a signal may simply add to the total of the weighted inputs, thereby compensating for a missing element.
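The second alternative, in which the expectation signal adds to the weighted total instead of changing the threshold, can be sketched like this. The stroke names, weights, bonus, and threshold are invented for illustration.

```python
def weighted_total(weights, active, expectation_bonus=0.0):
    """Sum the weights of active inputs, plus any bonus from an 'expected' signal."""
    return sum(w for name, w in weights.items() if name in active) + expectation_bonus


# Hypothetical inputs for an "E" recognizer.
E_WEIGHTS = {"vertical": 0.4, "top_bar": 0.2, "middle_bar": 0.2, "bottom_bar": 0.2}
E_FIRE_AT = 0.9

seen = {"vertical", "top_bar", "middle_bar"}  # the bottom bar is missing

print(weighted_total(E_WEIGHTS, seen) >= E_FIRE_AT)                          # falls short
print(weighted_total(E_WEIGHTS, seen, expectation_bonus=0.15) >= E_FIRE_AT)  # compensated
```

With a fixed bonus, adding it to the total gives the same decision as subtracting it from the threshold, so the two descriptions amount to the same rule in this sketch.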
The ability to recognize patterns even when aspects of them are transformed is called feature invariance, and is dealt with in four ways. First, there are global transformations that are accomplished before the neocortex receives sensory data. We will discuss the voyage of sensory data from the eyes, ears, and skin in the section “The Sensory Pathway” on page 94.
The second method takes advantage of the redundancy in our cortical pattern memory. Especially for important items, we have learned many different perspectives and vantage points for each pattern. Thus many variations are separately stored and processed.
The third and most powerful method is the ability to combine two lists. One list can have a set of transformations that we have learned may apply to a certain category of pattern; the cortex will apply this same list of possible changes to another pattern. That ...
For example, we have learned that certain phonemes (the basic sounds of language) may be missing in spoken speech (for example, “goin’”). If we then learn a new spoken word (for example, “driving”), we will be able to recognize that word if one of its phonemes is missing even if we have never experienced that word in that form before, beca...
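Combining two lists can be sketched as applying a learned list of transformations to a newly learned pattern. The transformations and the letter-like stand-ins for phonemes below are simplifications made for illustration; they are not phonetically accurate.

```python
def identity(pattern):
    """The pattern exactly as learned."""
    return list(pattern)


def drop_last(pattern):
    """The pattern with its final element missing (as in "goin'")."""
    return list(pattern[:-1])


# List 1: transformations learned from earlier words.
LEARNED_TRANSFORMATIONS = [identity, drop_last]

# List 2: a newly learned word, written with rough letter-like tokens.
DRIVING = ["d", "r", "i", "v", "i", "n", "g"]


def matches(heard, pattern, transformations=LEARNED_TRANSFORMATIONS):
    """True if what was heard equals the pattern under any learned transformation."""
    return list(heard) in [t(pattern) for t in transformations]


print(matches(["d", "r", "i", "v", "i", "n"], DRIVING))  # never heard in this form before
print(matches(["d", "r", "i", "v"], DRIVING))            # too much is missing
```

The new word never has to be stored in its shortened form; the transformation list supplies it.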
Certain artistic modifications emphasize the very features that are recognized by our pattern recognition–based neocortex. As mentioned, that is precisely the basis of caricature.
The fourth method derives from the size parameters that allow a single module to encode multiple instances of a pattern. For example, we have heard the word “steep” many times. A particular pattern recognition module that is recognizing this spoken word can encode these multiple examples by indicating that the duration of [E] has a high expected variability. If all the modules for words including [E] share a similar phenomenon, that variability could be encoded in the models for [E] itself.
Are we not ourselves creating our successors in the supremacy of the earth? Daily adding to the beauty and delicacy of their organization, daily giving them greater skill and supplying more and more of that self-regulating self-acting power which will be better than any intellect? —Samuel Butler, 1871
The principal activities of brains are making changes in themselves. —Marvin Minsky, The Society of Mind
Learning and recognition take place simultaneously. We start learning immediately, and as soon as we’ve learned a pattern, we immediately start recognizing it.

