AI & I: An Intellectual History of Artificial Intelligence
Kindle Notes & Highlights
Read between November 28, 2024 and January 6, 2025
33%
This is also an exponential, 2^n. Since there are sixty-four squares on a chessboard, the last square would require 2^64 = 1.8446744 × 10^19 grains of wheat, which is about the number of atoms in the universe.
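A quick check of that arithmetic in Python (a sketch of my own, not from the book; note that doubling from a single grain across all sixty-four squares gives essentially the same total, 2^64 − 1):

```python
# Checking the chessboard arithmetic quoted above.
grains_last_square = 2 ** 64                          # the figure quoted above
grains_all_squares = sum(2 ** i for i in range(64))   # 1 + 2 + 4 + ... over 64 squares

print(f"{grains_last_square:.7e}")            # 1.8446744e+19
print(grains_all_squares == 2 ** 64 - 1)      # True: a geometric series
```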
41%
The purpose of a language model is to remove from consideration word sequences that might correspond to the sounds but make no sense. The classic problem is homophones—a group of words with different spellings and different meanings, but that sound exactly the same.
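A toy illustration of what such a model does (my own sketch; the bigram probabilities below are invented purely for illustration and are not from the book): given two spellings of the same sounds, the model prefers the word sequence that makes sense.

```python
# Toy bigram "language model" choosing between homophone transcriptions.
bigram_prob = {
    ("i", "read"): 0.05,
    ("read", "their"): 0.010, ("their", "book"): 0.020,
    ("read", "there"): 0.002, ("there", "book"): 0.00001,
}

def score(words, unseen=1e-6):
    """Multiply bigram probabilities; unseen word pairs get a tiny default."""
    p = 1.0
    for pair in zip(words, words[1:]):
        p *= bigram_prob.get(pair, unseen)
    return p

print(score(["i", "read", "their", "book"]))   # much higher
print(score(["i", "read", "there", "book"]))   # much lower
```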
42%
You may remember that in our discussion of learning for computer vision, we said it was important to divide our research data, such as sets of labeled images, into two or three subsets, one each for training, validation, and testing.
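A minimal sketch of that split (the 80/10/10 fractions are my own choice, not the book's):

```python
import random

def split_dataset(examples, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle labeled examples and divide them into train/validation/test sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```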
45%
A common joke at the time was that the saying “The spirit is willing, but the flesh is weak” (but expressed in Russian) was translated into English as “The vodka is good, but the meat is rotten.”
46%
The law in Canada requires that its parliament publish its proceedings in both of the country's official languages, French and English. Also, very early on, the government made the proceedings available online.
47%
Language models have the nice feature of being practically self-evaluating. We have introduced language models as a way of improving machine translation. But they are a small part of MT, and evaluating MT is difficult. (The best way is to have the competing systems translate the same set of sentences and simply ask a person which did the better job.)
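The standard way to make that self-evaluation concrete is to measure how much probability the model assigns to held-out text (perplexity). A rough sketch for a bigram-style model, where `model_prob` is a stand-in for whatever conditional probability the LM supplies:

```python
import math

def perplexity(model_prob, heldout_words):
    """Per-word perplexity on held-out text: lower means the model finds
    the text less surprising.  model_prob(prev, word) returns the model's
    probability of `word` given the previous word."""
    log_p = 0.0
    for prev, word in zip(heldout_words, heldout_words[1:]):
        log_p += math.log(model_prob(prev, word))
    return math.exp(-log_p / (len(heldout_words) - 1))
```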
48%
While I found grammar useless, I have spent much of my academic career trying to teach it to computers. The real purpose of grammar is not to improve your writing but, rather, to help construct the meaning of sentences from the meanings of their parts.
50%
“All grammars leak,” by which he meant that it was not possible to write a complete grammar for a natural language.
50%
Perhaps even worse, as you include more rules in the grammar, sentences that previously were unambiguous suddenly acquire extra structures that make little or no sense.
50%
The solution came from an unexpected direction. In 1994, a group at the University of Pennsylvania led by Mitchell Marcus published the Penn Treebank, a collection of one million words of text from the Wall Street Journal together with the structure of the sentences in the form of syntactic trees [61]. From the treebank, it is possible to read off all of the context-free rules necessary to assign the correct tree to every sentence therein. This pretty much solved the grammar leakage problem.
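"Reading off" the rules is mechanical: every internal node of a parse tree contributes one context-free rule. A small sketch, using a nested-tuple tree of my own invention rather than the Treebank's actual file format:

```python
def rules_from_tree(tree, rules=None):
    """Collect context-free rules (parent -> children) from a parse tree
    given as (label, child, child, ...); leaves are plain word strings."""
    if rules is None:
        rules = []
    label, *children = tree
    kids = [c if isinstance(c, str) else c[0] for c in children]
    rules.append((label, tuple(kids)))
    for c in children:
        if not isinstance(c, str):
            rules_from_tree(c, rules)
    return rules

tree = ("S", ("NP", ("DT", "the"), ("NN", "market")), ("VP", ("VBD", "fell")))
print(rules_from_tree(tree))
# [('S', ('NP', 'VP')), ('NP', ('DT', 'NN')), ('DT', ('the',)),
#  ('NN', ('market',)), ('VP', ('VBD',)), ('VBD', ('fell',))]
```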
55%
Vacuum tubes and transistors can act as switches, and it was a big deal when transistors replaced vacuum tubes in both radios and computers.
55%
It may be possible to buy an individual transistor these days, but you will not spot any by looking inside your laptop or cell phone; they are there, but invisible, because they have been miniaturized so that millions can fit on the chips.
55%
In the early days of the field, AI researchers used much the same computer hardware as everyone else.
55%
“Matrix” is just a fancy word for a two-dimensional “array” or table.
56%
So, any machine designed for computer gaming has a graphics processing unit, or GPU, besides its standard CPUs. You can think of a GPU as 1,000 slow CPUs—say, one-tenth the speed—but because there are so many of them, when they are set to a task for which they are designed—matrix operations are the most common—they are 100 times faster than CPUs.
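A small numpy sketch of the kind of work being described (the matrix sizes are arbitrary examples of my own):

```python
import numpy as np

# A matrix is just a 2-D array of numbers; the operation GPUs are built
# around is multiplying them.
a = np.random.rand(1024, 1024)
b = np.random.rand(1024, 1024)
c = a @ b   # each output cell is a dot product of a row of `a` and a column of `b`

# A CPU works through these dot products a few at a time; a GPU's thousands
# of slower cores compute many of them simultaneously, which is where the
# roughly 100x speedup described above comes from.
```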
56%
Since then, AI has become a booming business, with a multicompany race to build the next generation of still faster AI hardware. The basic idea is to make the hardware increasingly specialized to AI tasks. One we have seen is convolution. Another is a recent scheme called transformers, to be discussed in chapter 10. Since these processors are specialized for AI and not graphics, the new term for this kind of processing unit is AI accelerator. Accelerators typically have the equivalent of several GPU processors.
56%
After all, visual information comes as light intensities, which are just numbers, and numbers are at the center of how perceptrons, neural networks, deep learning, and so on all work.
57%
There is a famous saying in linguistics, “You shall know a word by the company it keeps,” so similar words have similar neighbors [108].
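One crude way to cash out "the company it keeps" (a sketch of my own, far simpler than real word embeddings): represent each word by counts of the words seen near it, and compare those count vectors. Words used in similar contexts end up with similar vectors.

```python
from collections import Counter
import math

def neighbor_vectors(sentences, window=2):
    """Represent each word by a Counter of the words appearing within
    `window` positions of it across all sentences."""
    vecs = {}
    for words in sentences:
        for i, w in enumerate(words):
            ctx = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(u, v):
    """Cosine similarity of two count vectors (Counters)."""
    dot = sum(u[w] * v[w] for w in set(u) & set(v))
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0
```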
58%
In 2016, Google started replacing its statistical MT system, which was based on the technology described in chapter 7, with neural network methods. (I remember chatter on the web because people noticed their translations improving from one week to the next.)
59%
As noted earlier, Go is a two-player game of perfect information. It is also a board game, though crucially the board has more positions: 19 × 19 = 361, compared to 8 × 8 = 64 for chess.
59%
Go has been played in China for about 2,500 years—it makes Western civilization look very young indeed.
59%
That the key to DeepMind’s success was its use of deep learning [97] had a major impact on the field, your author most definitely included.
60%
Instead AI researchers, not just at MIT but also at Stanford and CMU, pursued planning or problem solving.
60%
Contrast this with reinforcement learning. In RL, the program is given a reward function, not a goal.
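The contrast is easy to see in code (a schematic of my own, not from the book): a planner is handed a target state to reach, while an RL agent is only handed a function that scores whatever happens.

```python
# Planning: the program is told exactly what state to reach.
goal_state = {"block_A": "on_block_B", "block_B": "on_table"}

# Reinforcement learning: the program is only told how good each outcome is,
# and must discover for itself which actions lead to high reward.
def reward(state):
    return 1.0 if state.get("game") == "won" else 0.0
```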
63%
In 2017, in an attempt to improve machine translation, a group at Google published a new NN model for MT. They called it the transformer model, and the paper was titled “Attention Is All You Need” [104]; see figure 10.1. While transformers were initially created to improve machine translation, they were quickly adapted to language modeling, and that is where they have had the largest impact. Thus, we present the simplified version that was adopted early on by the group at OpenAI for LMs.
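The mechanism behind the paper's title, in miniature (my own numpy sketch of scaled dot-product attention, the core operation of the transformer, leaving out all of its other machinery):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each position builds its output as a
    weighted mix of the value vectors V, with weights set by how well its
    query vector matches each key vector."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V

Q = K = V = np.random.rand(5, 8)   # e.g. 5 word positions, 8-dimensional vectors
print(attention(Q, K, V).shape)    # (5, 8)
```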
66%
GPT stands for generative pretrained transformer, a kind of language model.
66%
When you use a language model to generate text, you first compute the probability of all possible next word pieces.
66%
Another possibility is to choose a word piece according to the distribution proposed by the LM.
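A sketch of those two generation strategies (the tiny distribution below is my own toy example; a real model assigns probabilities over tens of thousands of word pieces):

```python
import random

# Toy distribution over possible next word pieces, as an LM might propose.
next_piece_probs = {"the": 0.40, "a": 0.25, "his": 0.20, "zebra": 0.15}

# Strategy 1: always take the single most probable piece.
greedy = max(next_piece_probs, key=next_piece_probs.get)

# Strategy 2: sample a piece according to the distribution itself,
# so less likely pieces are occasionally chosen too.
pieces, probs = zip(*next_piece_probs.items())
sampled = random.choices(pieces, weights=probs, k=1)[0]

print(greedy, sampled)
```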
66%
In the paper announcing GPT-2, the OpenAI group makes the case that language models are by their very nature multitask programs, so there is no need to design separate programs, one per task.
67%
Indeed, two subsequent models, one from OpenAI, GPT-3 with 175 billion parameters in 2020 [10], and one from Google, PaLM with 540 billion (in 2022) [16], have racked up the best scores yet.
68%
It’s unclear how we’d distinguish “real understanding” from “fake understanding.” Until such time as we can make such a distinction, we should probably just retire the idea of “fake understanding.”
68%
The idea is that since my laptop does not have an inner life, neither does LaMDA. The philosopher David Chalmers has been a proponent of this idea [13].
72%
Unlike in thriller movies with science backgrounds, it is very unusual for any small group of scientists to be so far ahead of the rest of the field that others cannot replicate their work if they really care to, and the protein structure prediction problem is important, so the others really cared to! Knowing that something can be done is half the battle, and other groups' results started improving rather quickly.
72%
The start of 2023 saw more publicity and controversy about AI than all the previous sixty-five years combined.
72%
Suddenly, it became apparent that this was no longer your grandmother’s AI.
72%
Explaining how ChatGPT differs from GPT-3 is complicated. ChatGPT starts with GPT-3 as its base and, thus, can be correctly described as an LLM. However, it moves beyond that description in that it also uses reinforcement learning (see section 9.3).
72%
GPT-4 followed a few months later. Besides being much more accurate than ChatGPT, it has one significant addition—it can accept images as part of the prompt.
73%
One final note: besides writing, another occupation that requires putting down one symbol after another is programming, and both ChatGPT and GPT-4 have been trained on not just language texts but also computer programs. GPT-4, in particular, is pretty good. I do not think programmers are in danger of mass layoffs, but it will change the nature of the job. I know it is already changing college-level courses on the topic.
74%
Most AI researchers disagree with me, but in my estimation, AI has little to show for its first fifty years.
74%
Furthermore, large language models are impressive, but their abilities have large gaps. Common sense reasoning is a big one, so there are still some other ingredients required.
74%
To add fuel to the fire, it asks, if this is not agreed to voluntarily, that "governments should step in." This is scary stuff—not AI, but the idea that in the twenty-first century, there are calls to put engineering and scientific researchers in jail.
74%
AI agents are going to be built, not evolved, so unless their builders go to a lot of trouble, AIs will have no such compulsions.
75%
As for me, I tend to side with the cognitive scientist Steven Pinker, who said about the AI apocalypse in general: It depends on the hypothesis that humans are so gifted that they can design an omniscient and omnipotent AI, yet so moronic that they would give it control of the universe without testing how it works [79]. To which I would only add, "or including an off switch."
75%
The physical aspects of robotics are hard—in my estimation, much harder than the rest of AI, and to a large degree separate. That is why I did not include robotics in this history.
76%
Large language models in particular are telling us something profound about the nature of understanding in both computers and people, even if we don’t understand yet what it is telling us.