Here we were, celebrating fractional reductions in classification error rates—about as shallow a perceptual achievement as could be imagined—while our own brains filled every moment with such fluent awareness of our world that its vibrancy had become all but invisible to us. Back in the 1970s, the researcher and mathematician Anatol Holt summed up this myopia by saying that AI was a technology that could make a perfect chess move while the room is on fire.
We see the world holistically, not just identifying but understanding its contents—their relationships, their meanings, their pasts, and their futures. The gist.
The fundamental unit of an image is the “pixel”—a now common term that began as a contraction of “picture element”—an almost imperceptible dot capturing the color at a single tiny point within a scene.
In contrast, the fundamental unit of a language like English, at least the way it’s spoken and written in everyday use, is the word. Unlike a pixel, words typically convey distinct meaning, even in isolation. And the full range of words is, although very large, finite.
An early jewel of this period was the recurrent neural network, or RNN. A family of algorithms tailor-made for the linear sequences of words, RNNs were able to quickly infer basic properties of text, in much the same way that convolutional neural networks like AlexNet processed images. Like CNNs, RNNs had existed for decades, but it wasn’t until now that their true power was realized.
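To make the parallel concrete, here is a minimal sketch of the recurrence in plain Python with NumPy, using a toy vocabulary and randomly initialized weights rather than anything like a trained model of the era: each word is folded into a running hidden state that summarizes the sentence so far.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and one-hot word vectors (illustrative only).
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

hidden_size, vocab_size = 8, len(vocab)
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

def encode(sentence):
    """Run the recurrence over the words; the final hidden state summarizes the sequence."""
    h = np.zeros(hidden_size)
    for word in sentence.split():
        x = np.zeros(vocab_size)
        x[word_to_id[word]] = 1.0
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h

print(encode("the cat sat on the mat"))
```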
The growing family of neural networks gave vision, language, speech, and other forms of perception a shared algorithmic framework, inspiring labs like ours to blur the boundaries that separated them in the quest to achieve more integrated, human-like capabilities.
“Imagine pairing a CNN with an RNN,” he said as he took a seat on the couch. “One to encode visual information and pair it with words, and the other to generate language. We’ll train our model on pairs of images and human-written descriptions.”
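As a rough sketch of what he was proposing, written here in present-day PyTorch rather than the tools of the time, with the class name, layer sizes, and toy inputs all invented for illustration: the CNN compresses the image into a feature vector that seeds the RNN, which then scores the caption one word at a time.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Hypothetical encoder-decoder: a CNN summarizes the image,
    an RNN generates the caption word by word."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Tiny CNN encoder (a stand-in for a full network like AlexNet).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The image feature seeds the RNN's initial hidden state.
        h0 = self.encoder(images).unsqueeze(0)        # (1, batch, hidden)
        out, _ = self.rnn(self.embed(captions), h0)   # (batch, seq, hidden)
        return self.to_vocab(out)                     # word scores per step

model = CaptionModel(vocab_size=1000)
images = torch.randn(4, 3, 64, 64)
captions = torch.randint(0, 1000, (4, 12))
logits = model(images, captions)   # (4, 12, 1000)
```

Training such a model would then amount to minimizing the mismatch between each step's word scores and the next word of the human-written description.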
Her, the Spike Jonze movie about a man who falls in love with his AI companion, was still fresh in the minds of most in attendance.
It was one of the last Fridays before the winter break, and I was attending my new favorite event: a twice-monthly closed-door gathering for SAIL students and faculty called “AI Salon,” which provided a venue for topical conversations about our field.
Today we were discussing Superintelligence, a provocative tome by Oxford philosopher Nick Bostrom exploring the future of AI.
The almost magical combination AlexNet demonstrated—large-scale data sets, high-speed GPUs, and deeply layered neural networks—was a blueprint destined for mass adoption in fields far beyond ours.
This wasn’t simply the age of machine learning, but, in a nod to the ever more lavishly layered networks being built in labs across the world, deep learning. It was the birth of an entirely new paradigm, much as the early years of the twentieth century were for physics.
Towering, logo-emblazoned booths bathed in colored lighting were commonplace within a matter of years, and companies like Google and Microsoft held extravagant parties for grad students considering career options.
The research world is famously open, sometimes to a fault; apart from the bragging rights of discovering something first, our work isn’t generally treated like intellectual property, let alone something confidential, like a trade secret.
She had the intellect to keep up, but science for its own sake was never her style. She thought in terms of stories and characters. Passions and conflict.
And in a way, all the technology built for them over the last few decades has made things worse, because now they’re overwhelmed with information, too.
By focusing our attention on the behavior of caregivers, rather than patients, we could avoid some of the trickier complexities of medical research when people undergoing treatment are involved.
Within what felt like an hour or two of our conversation, he was texting me with updates that read like achievements unto themselves: calling in favors, arranging meetings with decision makers, and securing hospital access.
“What I’m wondering, though, is whether I’ll still get to publish in the usual venues.
Given how much an academic career depends on publication, especially in the early years, it was a good question.
I mean, computer science papers normally take like a few months.”
“Sure, it’d be great to automate more of my day. Whatever. I get it,” he continued. “But I’m a little tired of tech executives talking about putting people like me out of a job. You and Arnie are the only ones who actually seem to want to help me, rather than replace me.”
I appreciate the role technology is playing in keeping us all alive these days, but it’s not an exaggeration to say that the real reason my mother and I have made it through it all is people.
I’d have inundated them with a computer scientist’s reading list, conditioning them to think in terms of data, neural networks, and the latest architectural advances.
And while it was clear that ambient intelligence would remain a niche among researchers for some time—demand for AI expertise was simply too strong, and competing opportunities too lavish—the caliber of our recruits suggested we were onto something.
Arnie saved his most impressive feat for last: persuading real organizations to let us demonstrate our technology on their premises.
Patients and clinicians were rarely recorded to begin with, for obvious reasons ranging from legal liability to basic privacy, and clear depictions of the events we wished to detect—many of which, like falls, were outliers in their own right—were even rarer.
It was 2015, and the privacy implications of AI were still coming into focus for most of us; it’d only been a few years, after all, since the accuracy of image classification even approached a useful threshold to begin with. Now, in what felt like the blink of an eye, researchers like us were flirting with capabilities of such power that the technical challenges were giving way to ethical ones.
The IRB, or Institutional Review Board, is the governing body that oversees clinical research like ours. Navigating their expectations to ensure a study is approved requires finesse and a kind of diplomatic savvy, not to mention deep clinical experience.
Instead, we had to pursue another emerging trend: “edge computing,” in which all necessary computational resources are packed into the device.
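As a toy illustration of what that means in practice, using present-day PyTorch export tooling and a made-up two-layer classifier standing in for a real trained network: the model is frozen into a single self-contained artifact that can run on the device itself, so no footage has to leave the room.

```python
import torch
import torch.nn as nn

# Hypothetical on-device classifier; a real deployment would start from a trained network.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2)).eval()

# Trace the model into a frozen, self-contained artifact that an
# on-device runtime can load without a full Python environment.
example = torch.randn(1, 512)
scripted = torch.jit.trace(model, example)
scripted.save("edge_model.pt")
```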
He turned to wave at another man across the room, decked out in what looked like the exact same jeans and fleece pullover.
And although I hesitate to liken it too explicitly to a living organism (our field’s history is replete with attempts at anthropomorphization that are more misleading than insightful), it had undeniably evolved into something new.
It’d simply become a brute fact that university labs, once the alpha and omega of AI research, were not the only institutions advancing the frontier. We shared a crowded landscape with tech giants like Google, Microsoft, and Facebook, start-ups all over the world, a voracious network of venture capitalists, and even software developers in the open-source community, whether they were sharing code on platforms like GitHub or discussing the latest developments on forums like Reddit.
The simple act of adding more layers, however, wasn’t a panacea—deeper networks demonstrated higher and higher accuracy scores at first, but soon reached a point of diminishing returns. As our ambitions pushed us to build bigger and bigger, we inadvertently turned neural networks into labyrinths, their excessive layering corrupting the signal along the journey from one end of the network to the other, halting the training process in its tracks and rendering the system useless.
Fittingly, that innovation came later in 2015, when the Deep Residual Network, a submission led by a young Microsoft researcher named Kaiming He, changed the game yet again. Nicknamed “ResNet” for short, it was enormous—a staggering 152 layers—but employed an architectural twist whereby some of those layers could be bypassed during the training phase, allowing different images to direct their influence toward smaller subregions of the network. Although the fully trained system would eventually put all its depth to use, no single training example was obliged to span its entirety. The result…
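The twist is usually described as a skip connection: each block adds its input back to its own output, giving the signal (and the gradient) a shortcut past the stacked layers so that very deep networks remain trainable. A minimal sketch in present-day PyTorch, with layer counts and sizes chosen purely for illustration:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block: the input is added back to the output,
    giving the gradient a shortcut past the stacked layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: the layers learn a residual

# Stacking many such blocks keeps very deep networks trainable.
net = nn.Sequential(*[ResidualBlock(64) for _ in range(10)])
y = net(torch.randn(1, 64, 32, 32))
```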
The year before, Google had gone on an AI start-up acquisition spree, with DeepMind the most expensive of its purchases by a wide margin at more than half a billion dollars.
For all the fancy talk about modeling a combinatorially incomprehensible game of strategy, something as simple as preparing a bottle of baby formula and setting it down in the warmer was still a roboticist’s holy grail—and far from a solved problem outside of tightly controlled laboratory conditions.
If waiting months for the peer review process to run its course was asking too much, was it any surprise that textbooks written years ago, if not entire generations ago, were falling by the wayside?
But despite a faculty offer from Princeton straight out of the gate—a career fast track any one of our peers would have killed for—he was choosing to leave academia altogether to join a private research lab that no one had ever heard of. OpenAI was the brainchild of Silicon Valley tycoons Sam Altman and Elon Musk and LinkedIn cofounder Reid Hoffman, built with an astonishing initial investment of a billion dollars.
The sentiment, delivered without even a hint of mirth, was icy in its clarity: the future of AI would be written by those with corporate resources.
It was the inevitable outcome of what journalist and commentator Jack Clark called AI’s “Sea of Dudes” problem: that the tech industry’s lack of representation was leading to unintentionally biased algorithms that performed poorly on nonwhite, nonmale users.
Ten years before, the explosion of content organized by the Googles and Wikipedias of the world seemed to offer a window into human life as it truly is, as opposed to the provincial glimpses found in legacy media like TV networks and newspapers. And in a way, of course, they did.
The two-week AI crash course that followed, although intense, demonstrated to everyone in attendance that it takes surprisingly little to convince the historically excluded that they belong, too.
In just a few years, the initiative was officially branded AI4ALL, and it even attracted some capital, with a transformative round of funding coming from Melinda French Gates’s Pivotal Ventures and Nvidia founder Jensen Huang.
Olga, who’d been offered a professorship at Princeton and took it, set about expanding her new lab’s research agenda from the mechanics of machine perception to the larger issue of fairness in computing, including a special emphasis on “debiasing”: a formal, mathematically rigorous attempt to quantify and neutralize the bias lurking in our data.
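To give a flavor of what "quantify and neutralize" can mean in the simplest possible case (a toy illustration with made-up group labels, not her lab's actual method), one can measure how far each group's share of a dataset deviates from parity and reweight examples accordingly:

```python
from collections import Counter

# Hypothetical labeled examples: (image_id, annotated group) pairs.
annotations = [("img1", "group_a"), ("img2", "group_a"),
               ("img3", "group_b"), ("img4", "group_a")]

counts = Counter(group for _, group in annotations)
total = sum(counts.values())

# One simple way to quantify dataset bias: how far each group's share
# of the data deviates from an equal share.
equal_share = 1 / len(counts)
skew = {g: counts[g] / total - equal_share for g in counts}
print(skew)   # {'group_a': 0.25, 'group_b': -0.25}

# A crude "debiasing" step: reweight examples so each group counts equally.
weights = {g: equal_share / (counts[g] / total) for g in counts}
```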
“Phenomenon” was too passive. “Disruption” too brash. “Revolution” too self-congratulatory. Modern AI was revealing itself to be a puzzle, and one whose pieces bore sharp edges.
A wide range of parameters define how such models behave, governing trade-offs between speed and accuracy, memory and efficiency, and other concerns.
Training even a single model was still cost-prohibitive for all but the best-funded labs and companies—and neural architecture search entailed training thousands.
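A back-of-the-envelope sketch of why: even a small, invented search space over a handful of architectural and training choices multiplies out to over a hundred full training runs, and real neural architecture search spaces are orders of magnitude larger.

```python
import itertools, random

# Illustrative search space; real architecture searches explore far more choices.
search_space = {
    "depth": [18, 34, 50, 101],
    "width": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [64, 128, 256],
}

configs = list(itertools.product(*search_space.values()))
print(f"{len(configs)} candidate configurations")   # 4 * 3 * 3 * 3 = 108

# Even this modest grid means training over a hundred models from scratch;
# sampling a subset at random is one cheaper, if cruder, alternative.
sampled = random.sample(configs, 10)
```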
More colleagues than I could count had made the transition themselves, and even my students were taking breaks from their degrees for high-paid sojourns at tech firms all over the world—and not always coming back.
I could no longer pretend a job in the private sector was some cynical bribe to abandon the lab. These days, it was an invitation to run an even bigger one.

