Kindle Notes & Highlights
by Cade Metz
Read between August 6 and August 14, 2023
As Microsoft vice president Peter Lee told Bloomberg Businessweek, the cost of acquiring an AI researcher was akin to the cost of acquiring an NFL quarterback.
After Facebook unveiled its research lab and Google acquired DeepMind, it was announced that Andrew Ng would be running labs in both Silicon Valley and Beijing—for Baidu.
The trick, Alan Eustace believed, was to surround yourself with people who could apply new kinds of expertise to problems that seemed unsolvable with the old techniques. “Most people look at particular problems from a particular point of view and a particular perspective and a particular history,” he says. “They don’t look at the intersections of expertise that will change the picture.”
“People ask me a lot: ‘Are you a daredevil?’ But I am the opposite of a daredevil,” he says. “I hire the best people I can find, and we all work together to basically eliminate every possible risk and test for every risk and try to get to the point where what is seemingly very dangerous is actually very safe.”
A network of forty thousand GPUs would cost the company about $130 million, and though Google regularly invested such enormous sums of money in its data center hardware, it had never invested in hardware like this. So Dean and Giannandrea took their request to Alan Eustace, who was about to make his leap from the stratosphere. Eustace understood. He took the request to Larry Page, and just before he broke Baumgartner’s skydiving record in a scuba suit, he secured $130 million in graphics chips. Less than a month after the chips were installed, all forty thousand of them were running around the
...more
With help from those forty thousand GPU chips and soon many more—a data center overhaul the company called Project Mack Truck—deep learning moved into everything from the Google Photos app, where it instantly found objects in a sea of images, to Gmail, where it helped predict the word you were about to type. It also greased the wheels inside AdWords, the online ad system that generated a vast majority of the company’s $56 billion in annual revenue. By analyzing data showing which ads people had clicked on in the past, deep learning could help predict what they would click on in the future.
...more
In London, Demis Hassabis soon revealed that DeepMind had built a system that reduced power consumption across Google’s network of data centers, drawing on the same techniques the lab used to crack Breakout. This system decided when to turn on cooling fans inside individual computer servers and when to turn them off, when to open the data center windows for additional cooling and when to close them, when to use chillers and cooling towers and when the servers could get by without them. The Google data centers were so large and the DeepMind technology was so effective, Hassabis said, it
...more
The press needed heroes for its AI narrative. It chose Hinton, LeCun, Bengio, and sometimes Ng, thanks largely to the promotional efforts of Google and Facebook.
The narrative did not extend to Jürgen Schmidhuber, the German researcher based on Lake Lugano who carried the torch for neural networks in Europe during the 1990s and 2000s. Some took issue with Schmidhuber’s exclusion, including Schmidhuber.
Sitting at his desk inside Chauffeur, Krizhevsky was at the heart of this AI boom, but he didn’t see his role as all that important, and he didn’t see any of it as artificial intelligence. It was deep learning, and deep learning was just mathematics, pattern recognition, or, as he called it, “nonlinear regression.” These techniques had been around for decades. It was merely that people like him had come along at the right time, when there was enough data and enough processing power to make it all work. The technologies he built were in no way intelligent. They worked only in very particular
...more
Google Brain had already explored a technology called “word embeddings.” This involved using a neural network to build a mathematical map of the English language by analyzing a vast collection of text—news articles, Wikipedia articles, self-published books—to show the relationship between every word in the language and every other word. This wasn’t a map you could ever hope to visualize. It didn’t have two dimensions, like a road map, or three dimensions, like a video game. It had thousands of dimensions, like nothing you’ve ever seen, or could ever see. On this map, the word “Harvard” was
...more
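The many-dimensional word map described above can be made concrete with a toy sketch. The vectors below are invented purely for illustration (real embedding models learn hundreds or thousands of dimensions from text); cosine similarity is one standard way to measure how close two words sit on such a map.

```python
import math

# Toy word "embeddings": invented 4-dimensional vectors.
# Real models learn far larger vectors from vast text corpora.
embeddings = {
    "harvard": [0.9, 0.8, 0.1, 0.0],
    "yale":    [0.8, 0.9, 0.2, 0.1],
    "banana":  [0.0, 0.1, 0.9, 0.8],
}

def cosine_similarity(a, b):
    """Closeness on the map: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Words with related meanings sit near each other on the map;
# unrelated words sit far apart.
print(cosine_similarity(embeddings["harvard"], embeddings["yale"]))    # high
print(cosine_similarity(embeddings["harvard"], embeddings["banana"]))  # low
```

With these made-up vectors, “harvard” lands much closer to “yale” than to “banana,” which is the relationship a trained embedding map captures for the whole language at once.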
In December 2014, back at the NIPS conference, this time in Montreal, Sutskever presented a paper describing their work to a roomful of researchers from across the globe. The strength of the system, he told his audience, was its simplicity. “We use minimum innovation for maximum results,” he said, as applause rippled across the crowd, catching even him by surprise.
“The real conclusion is that if you have a very large dataset and a very large neural network,” he told his audience, “then success is guaranteed.”
Over the next eighteen months, Google Brain took this prototype and transformed it into a commercial system used by millions of people, an echo of what the lab had done with Navdeep Jaitly’s speech prototype three years earlier. But here, the lab changed the equation in a way that would send another ripple across the field and, in the end, amplify the ambitions of Ilya Sutskever and many others.
The result was the tensor processing unit, or TPU. It was designed to process the tensors—mathematical objects—that underpinned a neural network. The trick was that its calculations were less precise than typical processors. The number of calculations made by a neural network was so vast that each calculation didn’t have to be exact. It dealt in integers rather than floating point numbers. Rather than multiply 13.646 by 45.828, the TPU lopped off the decimal points and just multiplied 13 and 45. That meant it could perform trillions of extra calculations each second—exactly what Dean and his
...more
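The precision trade-off in the passage above can be sketched in a few lines. This is only an illustration of the book’s “lop off the decimal points” example; real TPUs use scaled low-bit integer arithmetic rather than simple truncation.

```python
def low_precision_multiply(a: float, b: float) -> int:
    """Truncate to integers before multiplying: a cruder answer,
    but far cheaper arithmetic in hardware."""
    return int(a) * int(b)

exact = 13.646 * 45.828                            # full floating-point product
approx = low_precision_multiply(13.646, 45.828)    # 13 * 45 = 585
print(exact, approx)
```

The integer answer is off by a few percent, and for a neural network summing billions of such products, that looseness is a price worth paying for speed.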
In the end, the Google engineers beat Dean’s deadline by three months, and the difference was the TPU. A sentence that needed ten seconds to translate on ordinary hardware back in February could translate in milliseconds with help from the new Google chip. They released the first incarnation of the service just after Labor Day, well before Baidu. “I was amazed that it worked that well. I think everybody was,” Hinton says. “Nobody expected it to work so well, so soon.”
The same tweet urged his followers to read Superintelligence: Paths, Dangers, Strategies, a recently published tome from an Oxford University philosopher named Nick Bostrom. Like Shane Legg, a cofounder of DeepMind, Bostrom believed that superintelligence could secure the future of humanity—or destroy it. “This is quite possibly the most important and most daunting challenge humanity has ever faced,” he wrote. “And—whether we succeed or fail—it is probably the last challenge we will ever face.” His concern was that scientists would design a system to perfect a particular part of our lives
...more
DAYS after the dinner in Palo Alto, Elon Musk phoned Yann LeCun. He said he was building a self-driving car at Tesla, and he asked LeCun whom he should hire to run the project. That week, he contacted several other Facebook researchers, asking each the same question—a gambit that eventually raised the ire of Mark Zuckerberg.
Back in 2008, Shane Legg described this attitude in his thesis, arguing that although the risks were great, so were the potential rewards. “If there is ever to be something approaching absolute power, a super intelligent machine would come close. By definition, it would be capable of achieving a vast range of goals in a wide range of environments,” he wrote. “If we carefully prepare for this possibility in advance, not only might we avert disaster, we might bring about an age of prosperity unlike anything seen before.”
The Future of Life Institute was less than a year old in the fall of 2014 when it invited this growing community to a private summit in Puerto Rico. Led by an MIT cosmologist and physicist named Max Tegmark, it aimed to create a meeting of the minds along the lines of the Asilomar conference, a seminal 1975 gathering where the world’s leading geneticists discussed whether their work—gene editing—would end up destroying humanity.
For Musk, the threat of superintelligence was only one thing among many. His main concern, it seemed, was maximum attention. “He is a super-busy man, and he doesn’t have time to dig into the nuances of the issues, but he understands the basic outlines of the problem,” Tallinn says. “He also genuinely enjoys the press attention, which translates to his very slogan-y tweets, et cetera. There is a symbiosis between Elon and the press that annoys many AI researchers, and that is the price the community has to pay.”
Altman had brought them together in the hopes of building a new AI lab that could serve as a counterweight to the labs that were rapidly expanding inside the big Internet companies, but no one knew if that was possible. Brockman certainly wanted to build one after leaving Stripe, one of Y Combinator’s most successful companies. He had never actually worked in AI and had only recently bought his first GPU machine and trained his first neural network, but as he told Altman a few weeks earlier, he was intent on joining the new movement. So, too, was Musk, after watching the rise of deep learning
...more
This highlight has been truncated due to consecutive passage length restrictions.
When he drove home with Altman that night, Brockman vowed to build the new lab they all seemed to want.
Musk and Altman and Brockman had no choice but to delay their announcement while they waited for Sutskever to decide. He phoned his parents back in Toronto, and as he continued to weigh the pros and cons, Brockman sent text after text urging him to choose OpenAI. This went on for days. Finally, on Friday, the last day of the conference, Brockman and the others decided they needed to announce the lab with or without him. The announcement was set for three p.m., and that time came and went, without an announcement and without a decision from Sutskever. Then he texted Brockman to say he was in.
Brin said that when he and Page were building Google at Stanford, he played so much Go that Page worried their company would never happen. Hassabis said that if he and his team wanted to, they could build a system capable of beating the world champion. “I thought that was impossible,” Brin said. In that moment, Hassabis resolved to do it.
The next morning, David Silver slipped into the control room, just so he could revisit the decisions AlphaGo made in choosing Move 37. In the midst of each game, drawing on its training with tens of millions of human moves, AlphaGo calculated the probability that a human would make a particular play. With Move 37, the probability was one in ten thousand. AlphaGo knew this wasn’t a move a professional Go player would ever make. Yet it made the move anyway, drawing on the millions of games it had played with itself—games in which no human was involved. It had come to realize that although no
...more
Two days later, as he walked through the lobby of the Four Seasons, Hassabis explained the machine’s collapse. AlphaGo had assumed that no human would ever make Move 78. It calculated the odds at one in ten thousand—a very familiar number. Like AlphaGo before him, Lee Sedol had reached a new level, and he said as much during a private meeting with Hassabis on the last day of the match.
Feeding thousands of retinal scans from the Aravind Eye Hospital into a neural network, they taught it to recognize signs of diabetic blindness. Such was their success that Jeff Dean pulled them into the Google Brain lab, around the same time that DeepMind was tackling Go. The joke among Peng and the rest of her medically minded team was that they were a cancer that metastasized into the Brain. It wasn’t a very good joke. But it was not a bad analogy.
“You can think of AI as a large math problem where it sees patterns that humans can’t see,” says Eric Schmidt, the former Google chief executive and chairman. “With a lot of science and biology, there are patterns that exist that humans can’t see, and when pointed out, they will allow us to develop better drugs, better solutions.”
Still, Hinton believed that as Google continued its work with diabetic retinopathy and others explored systems for reading X-rays, MRIs, and other medical scans, deep learning would fundamentally change the industry. “I think that if you work as a radiologist you are like Wile E. Coyote in the cartoon,” he said during a lecture at a Toronto hospital. “You’re already over the edge of the cliff, but you haven’t yet looked down. There’s no ground underneath.” He argued that neural networks would eclipse the skills of trained doctors because they would continue to improve as researchers fed them
...more
In his history of artificial intelligence, The Master Algorithm, University of Washington professor Pedro Domingos called them “tribes.” Each tribe nurtured its own philosophy—and often looked down on the philosophies of others. The connectionists, who believed in deep learning, were one tribe. The symbolists, who believed in the symbolic methods championed by the likes of Marvin Minsky, were another. Other tribes believed in ideas ranging from statistical analysis to “evolutionary algorithms” that mimicked natural selection.
If Qi Lu was a prime example of the cosmopolitan nature of the AI community, his background made him one of the community’s most unlikely participants. Brought up by his grandfather in a poverty-stricken countryside at the height of Mao Zedong’s Cultural Revolution, he ate meat just once a year, when his family celebrated the spring festival, and attended a school where a single teacher taught four hundred students. Yet he overcame all the natural disadvantages he faced to take a degree in computing science at Shanghai’s Fudan University and to attract the attention, in the late ’80s, of the
...more
As he recovered from the first surgery on his hip, Lu urged the Microsoft brain trust to embrace the idea of a driverless car. Myriad tech companies and carmakers had a long head start with their autonomous vehicles, and Lu wasn’t exactly sure how Microsoft would enter this increasingly crowded market. But that wasn’t the issue. His argument wasn’t that Microsoft should sell a driverless car. It was that Microsoft should build one. This would give the company the skills and the technologies and the insight it needed to succeed in so many other areas. Google had come to dominate so many
...more
He offered a radically different solution. What they should do, he explained, was build a neural network that learned from another neural network. The first neural network would create an image and try to fool the second into thinking it was real. The second would pinpoint where the first went wrong. The first would try again. And so on. If these dueling neural networks dueled long enough, he said, they could build an image that looked like the real thing. Goodfellow’s colleagues were unimpressed. His idea, they said, was even worse than theirs. And if he hadn’t been slightly drunk, Goodfellow
...more
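The dueling-networks loop Goodfellow describes can be caricatured in one dimension. The sketch below is not a faithful GAN (the two “networks” are single numbers and the update rules are invented for readability), but it shows the structure: a generator adjusts its output to fool a discriminator, while the discriminator adjusts its decision boundary to keep telling real from fake.

```python
import random

random.seed(0)

def real_sample() -> float:
    """The 'real' data: numbers clustered near 5.0."""
    return 5.0 + random.uniform(-0.5, 0.5)

offset = 0.0      # the generator's lone parameter
threshold = 2.5   # the discriminator's lone parameter (a decision boundary)

for _ in range(1000):
    fake = offset + random.uniform(-0.5, 0.5)  # generator's attempt
    real = real_sample()
    # Discriminator step: drift the boundary toward the midpoint
    # between the real and fake samples it just saw.
    threshold += 0.01 * ((real + fake) / 2 - threshold)
    # Generator step: if the discriminator would flag the fake as fake
    # (it falls below the boundary), nudge the generator toward
    # "real" territory.
    if fake < threshold:
        offset += 0.01

print(round(offset, 2))  # the generator has drifted toward the real data
```

After enough rounds of this back-and-forth, the generator’s output has migrated from 0 toward the real data near 5.0, which is the essence of the adversarial idea: each side’s improvement forces the other to improve.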
In the paper he published on the idea, he called them “generative adversarial networks,” or GANs. Across the worldwide community of AI researchers, he became “the GANfather.”
Goodfellow’s work sparked a long line of projects that refined and expanded and challenged his big idea. Researchers at the University of Wyoming built a system that generated tiny but perfect images of insects, churches, volcanos, restaurants, canyons, and banquet halls. A team at Nvidia built a neural network that could ingest a photo of a summer day and turn it into the dead of winter. A group at the University of California–Berkeley designed a system that converted horses into zebras and Monets into van Goghs. They were among the most eye-catching and intriguing projects across both
...more
Conceived and designed by Jeff Dean and his team, TensorFlow was the successor to DistBelief, the sweeping software system that trained deep neural networks across Google’s global network of data centers. But that was not all. After deploying the software in its own data centers, Google had open-sourced this creation, freely sharing the code with the world at large. This was a way of exerting its power across the tech landscape. If other companies, universities, government agencies, and individuals used Google’s software as they, too, pushed into deep learning, their efforts would feed the
...more
Two months later, the Chinese State Council unveiled its plan to become the world leader in artificial intelligence by 2030, aiming to surpass all rivals, including the United States, as it built a domestic industry worth more than $150 billion. China was treating artificial intelligence like its own Apollo program. The government was preparing to invest in moonshot projects across industry, academia, and the military. As two university professors who were working on the plan told the New York Times, AlphaGo versus Lee Sedol was China’s Sputnik moment.
JUST after the match in Wuzhen, Qi Lu joined Baidu. There, he did what he’d wanted to do at Microsoft: build a self-driving car. The company launched its project years after Google, but Lu was sure it would put cars on the road far faster than its American rival. This wasn’t because Baidu had better engineers or better technology. It was because Baidu was building its car in China. In China, government was closer to industry. As Baidu’s chief operating officer, he was working with five Chinese municipalities to remake their cities so that they could accommodate the company’s self-driving cars.
China’s other advantage, he said, was data. In each socioeconomic era, he liked to say, there was one primary means of production. In the agricultural era, it was about the land. “It doesn’t matter how many people you have. It doesn’t matter how brilliant you are. You cannot produce more if you do not have more land.” In the industrial era, it was about labor and equipment. In the new era, it was about data.
Google did not announce the project, and it asked that the DoD refrain from announcing the project, too. Even company employees would have to learn about Project Maven on their own.
In November, a team of nine Google engineers was assigned to build the software for this system, but they never did. Soon realizing what it was for, they refused to be involved in any way. After the New Year, as word of the project began to spread across the company, others voiced concerns that Google was helping the Pentagon make drone strikes. In February, the nine engineers told their story in a post sent across the company on its internal social network, Google+. Like-minded employees supported the stance and hailed these engineers as the Gang of Nine.
At Microsoft and Amazon, employees protested against military and surveillance contracts. But these protests weren’t nearly as effective. And even at Google, the groundswell eventually vanished. The company parted ways with most of those who had stood up against Maven, including Meredith Whittaker and Jack Poulson. Fei-Fei Li returned to Stanford. Though Google had dropped the contract, it was still pushing in the same direction. A year later, Kent Walker took the stage at an event in Washington alongside General Shanahan and said that the Maven contract was not indicative of the company’s
...more
That afternoon, the group overseeing the company’s AI research walked in alongside Mike Schroepfer, the chief technology officer, and Yann LeCun gave the presentation, detailing their work with image recognition, translation, and natural language understanding. Zuckerberg didn’t say much as he listened. Neither did Schroepfer. Then, when the presentation ended, the group walked out of the room, and Schroepfer lit into LeCun, telling him that nothing he’d said meant anything. “We just need something that shows we’re doing better than other companies,” he told LeCun. “I don’t care how you do it.
...more
Separating the real from the fake was a matter of opinion. If humans couldn’t agree on what was and what was not fake news, how could they train machines to recognize it? News was, inherently, a tension between objective observation and subjective judgment. “In many cases,” Pomerleau said, “there is no right answer.”
GARY Marcus came from a long line of thinkers who believe in the importance of nature, not just nurture. They’re called nativists, and they argue that a significant portion of all human knowledge is wired into the brain, not learned from experience. This is an argument that has spanned centuries of philosophy and psychology, running from Plato to Immanuel Kant to Noam Chomsky to Steven Pinker. The nativists stand in opposition to the empiricists, who believe that human knowledge comes mostly from learning. Gary Marcus studied under Pinker, the psychologist, linguist, and popular science
...more
LeCun was bemused by it all. As he told the audience at NYU, he agreed that deep learning alone could not achieve true intelligence, and he had never said it would. He agreed that AI would require innate machinery. After all, a neural network was innate machinery. Something had to do the learning. He was measured and even polite during the debate. But his tone changed online. When Marcus published his first paper questioning the future of deep learning, LeCun responded with a tweet: “The number of valuable recommendations ever made by Gary Marcus is exactly zero.”
BERT was what researchers call a “universal language model.” Several other labs, including the Allen Institute and OpenAI, had been working on similar systems. Universal language models are giant neural networks that learn the vagaries of language by analyzing millions of sentences written by humans. The system built by OpenAI analyzed thousands of self-published books, including romance, science fiction, and mysteries. BERT analyzed the same vast library of books as well as every article on Wikipedia, spending days poring over all this text with help from hundreds of GPU chips.
The story also quoted Gary Marcus saying that the public should be skeptical that these technologies will continue to improve so rapidly because researchers tend to focus on the tasks they can make progress on and avoid the ones they can’t. “These systems are still a really long way from truly understanding running prose,” he said. When Geoff Hinton read this, he was amused. The quote from Gary Marcus would prove useful, he said, because it could slot into any story written about AI and natural language for years to come. “It has no technical content so it will never go out of date,” Hinton
...more
Led by Wojciech Zaremba, the Polish researcher stolen from under Google and Facebook as OpenAI was founded, they’d spent more than two years working toward this eye-catching feat. In the past, many others had built robots that could solve a Rubik’s Cube. Some devices could solve it in less than a second. But this was a new trick. This was a robotic hand that moved like a human hand, not specialized hardware built solely for solving Rubik’s Cubes. Typically, engineers programmed behavior into robots with painstaking precision, spending months defining elaborate rules for each tiny movement.
...more

