The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI
This involved some circular reasoning, of course—if algorithms were capable of recognizing objects accurately enough to help us label them, then we wouldn’t need ImageNet in the first place.
It made a kind of perverse, if debatable, sense, but we never got the balance right.
Our goal was to embed unalloyed human perception in every image, in the hopes that a computer vision model trained on the complete set would be imbued with some similar spark of acumen.
But the funds simply weren’t there. It was infuriating to think that after so much emotional investment, this would all come down to a question of money.
Vision is blurrier at seventy miles an hour, but no less rich in content.
The conference felt like the perfect excuse for an escape, and I looked forward to twelve hundred miles of blissfully monotonous driving that I could spend thinking about anything—anything—other than our work. I rented a van and filled it with a few students from the lab.
“So Fei-Fei, now that you’ve got a lab of your own, what are you working on these days?” It was a question I was dreading, but it came from Jitendra—Pietro’s advisor and my “academic grandfather”—the person I was most hoping to run into.
“Honestly, Jitendra, it’s a bit of a sore subject.”
A frightening idea was beginning to sink in: that I’d taken a bigger risk than I realized, and it was too late to turn back.
He’d entered the world of computer vision talented but naive, and he’d trusted me to guide him. Now, I could sense his frustration growing—justifiably—and I knew he was worried about his own path to a PhD.
I was running late for a faculty meeting when Min, a master’s student, popped up in front of me.
“I was hanging out with Jia yesterday,” he continued, “and he told me about your trouble with this labeling project. I think I have an idea you two haven’t tried yet—like, one that can really speed things up.”
It was a clever name, taken from the original Mechanical Turk, an eighteenth-century chess-playing automaton that toured the world for years as both a marvel of engineering and a formidable opponent, even for experienced players. The device was actually a hoax; concealed in its base was a human chess master, who controlled the machine to the delight and bewilderment of its audiences.
ImageNet owed the very possibility of its existence to so many converging technological threads: the internet, digital cameras, and search engines. Now crowdsourcing—delivered by a platform that had barely existed a year earlier—was providing the capstone. If I ever needed a reminder that the default position of any scientist should be one of absolute humility—an understanding that no one’s intellect is half as powerful as serendipity—this was it.
At the peak of ImageNet’s development, we were among the largest employers on the AMT platform, and our monthly bills for the service reflected it.
54%
Flag icon
Although it wasn’t a world I’d aspired to join myself, I was impressed by the reach of Stanford’s influence on it, with companies like Hewlett-Packard, Cisco Systems, Sun Microsystems, Google, and so many others tracing their roots to the school.
Princeton felt like home, but I couldn’t deny that Stanford seemed like an even more hospitable backdrop for my research.
And so, in 2009, I made the decision to once again head west, with Jia and most of my students transferring along with me.
In spite of the many challenges we’d faced along the way, we’d actually done it: fifteen million images spread across twenty-two thousand distinct categories, culled from nearly a billion candidates in total, and annotated by a global team of more than forty-eight thousand contributors hailing from 167 countries. It boasted the scale and diversity we’d spent years dreaming of, all while maintaining a consistent level of precision: each individual image was not just manually labeled, but organized within a hierarchy and verified in triplicate.
But beyond the numbers lay the accomplishment that moved me most: the realization of a true ontology of the world, as conceptual as it was visual, curated from the ground up by humans for the sole purpose of teaching machines.
We fielded the usual questions and enjoyed a handful of pleasant conversations but left with little to show for our presence. It was soon clear that whatever was in store for ImageNet—whether it would be embraced as a resource of uncommon richness or written off as folly—it wasn’t going to get a boost at CVPR. On the bright side, people seemed to like the pens.
ImageNet was more than a data set, or even a hierarchy of visual categories. It was a hypothesis—a bet—inspired by our own biological origins, that the first step toward unlocking true machine intelligence would be immersion in the fullness of the visual world.
ImageNet’s slide toward obscurity was beginning to feel so inevitable that I’d resorted to an impromptu university tour to counteract it, delivering live presentations wherever I could to lecture halls filled with skeptical grad students and postdocs.
If image data sets can be thought of as the language of computer vision research—a collection of concepts that an algorithm and its developers can explore—ImageNet was a sudden, explosive growth in our vocabulary.
It massively broadened the range of possibilities our algorithms might face, presenting challenges that smaller data sets didn’t.
The PASCAL Visual Object Classes data set, generally known as PASCAL VOC, was a collection of about ten thousand images organized into twenty categories.
The collective power of collaboration, energized by the pressure of competition.
To ensure we didn’t declare a well-performing algorithm incorrect, each entry would be allowed to provide a rank-ordered list of five labels in total—making room for “strawberry” and “apple,” in this case—an evaluation metric we came to call the “top-5 error rate.” It encouraged submissions to intelligently hedge their bets, and ensured we were seeing the broadest, fairest picture of their capabilities.
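The top-5 metric described above is straightforward to compute. A minimal sketch in Python, with illustrative function and variable names (not taken from the actual challenge code):

```python
def top5_error(predictions, true_labels):
    """Fraction of examples whose true label does not appear among the
    model's five highest-ranked guesses (the "top-5 error rate")."""
    misses = sum(
        1 for ranked, truth in zip(predictions, true_labels)
        if truth not in ranked[:5]
    )
    return misses / len(true_labels)

# Hypothetical example: two images, each with a rank-ordered guess list.
preds = [
    ["strawberry", "apple", "cherry", "tomato", "raspberry"],  # truth is in the top 5
    ["dog", "wolf", "fox", "coyote", "dingo"],                 # truth is missing
]
truths = ["apple", "cat"]
print(top5_error(preds, truths))  # 0.5
```

Because an entry is scored correct if the truth appears anywhere in its five guesses, a model that plausibly ranks "strawberry" first and "apple" second is not penalized for the ambiguity.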
To be sure we were providing novel tests to the algorithms, we recapitulated much of ImageNet’s development process by downloading and labeling hundreds of thousands of new images, complete with yet another round of crowdsourced labeling.
Along the way, Jia’s efforts were supported by a growing team that included newcomers like Olga Russakovsky, a smart, energetic grad student looking for something interesting to throw her weight behind.
She was already a solid choice on intellectual grounds, but possessed a social adroitness that was rare in our department as well. I could tell she had the intellect to contribute to the project behind the scenes, but I began to wonder if, someday, she might tap into her natural savvy to represent it publicly as well.
Support vector machines, random forests, boosting, even the Bayesian network Pietro and I employed in our one-shot learning paper would buckle under its weight, forcing us to invent something truly new. “I don’t think ImageNet will make today’s algorithms better,” I said. “I think it will make them obsolete.”
Recognizing our lack of experience, not to mention ImageNet’s still-flagging name recognition, we reached out to Mark Everingham, a founding organizer of PASCAL VOC.
It was a fitting continuation of the biological influence that drove the entire project. ImageNet was based on the idea that algorithms need to confront the full complexity and unpredictability of their environments—the nature of the real world. A contest would imbue that environment with true competitive pressures.
The winning entrant, from a joint team composed of researchers at NEC Labs, Rutgers, and the University of Illinois, was an example of a support vector machine, or SVM—one of the algorithms I’d assumed ImageNet would overpower.
We’d dedicated years of our lives to a data set that was orders of magnitude beyond anything that had ever existed, orchestrated an international competition to explore its capabilities, and, for all that, accomplished little more than simply reifying the status quo.
Both facts freighted my offer with a seriousness I hadn’t appreciated until the words came out of my mouth. Silence. Then a sharp intake of breath. Faint, scratchy, and trembling. It couldn’t be what I thought it was. Is he … crying?
By chance, a contact I’d made through a fellowship program connected me to the neurobiology department of a nearby university hospital. The next day, he was transferred to one of the most advanced care units in the state.
Bob never realized his dream of being published in the sci-fi world, but he continued to write so prodigiously that he developed a habit of emailing me his personal journal entries at the end of each month.
By August 2012, ImageNet had finally been dethroned as the topic keeping me awake at night. I’d given birth, and a new reality of nursing, diapers, and perpetually interrupted sleep had taken over my life.
A twenty-first-century student using the word “ancient” to describe work from a couple of decades earlier was a testament to just how young our field was.
Our world evolved fast, and by the 2010s, most of us saw the neural network—that biologically inspired array of interconnected decision-making units arranged in a hierarchy—as a dusty artifact, encased in glass and protected by velvet ropes.
Dropping everything to attend ECCV had thrown my home life into chaos, but Jia’s news didn’t leave much choice. And I had to admit that there was pretty significant upside to living with one’s parents when an infant needs last-minute babysitting.
The winner was dubbed AlexNet, in homage to both the technique and the project’s lead author, University of Toronto researcher Alex Krizhevsky.
AlexNet was an example of a convolutional neural network, or CNN. The name is derived from the graphical process of convolution, in which a series of filters are swept across an image in search of features corresponding to things the network recognizes. It’s a uniquely organic design, drawing inspiration from Hubel and Wiesel’s observation that mammalian vision occurs across numerous stages. As in nature, each layer of a CNN integrates further details into higher and higher levels of awareness until, finally, a real-world object comes fully into view.
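The filter-sweeping step described above can be illustrated in a few lines of plain Python. This is a toy sketch, not AlexNet's actual implementation: a single hand-made 2x2 edge filter swept across a tiny synthetic image, with no padding and a stride of one.

```python
def convolve2d(image, kernel):
    """Slide a small filter across an image (no padding, stride 1),
    producing a map of how strongly each patch matches the filter."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Dot product of the filter with the image patch at (i, j).
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            ))
        out.append(row)
    return out

# A vertical-edge filter responds where brightness jumps from left to right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_filter = [[-1, 1],
               [-1, 1]]
print(convolve2d(image, edge_filter))  # [[0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks the edge between the dark and bright halves. In a real CNN, the filter weights are not hand-designed like this one but learned from data, and the outputs of many such filters feed the next layer.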
Rather than arbitrarily deciding in advance which features the network should look for, the authors allowed each of its hundreds of thousands of neurons to learn their own sensitivities gradually, exclusively from the training data, without manual intervention. Like a biological intelligence, AlexNet was a natural product of its environment. Next, signals from those thousands of receptive fields travel deep into the network, merging and clustering into larger, clearer hints.
Finally, the few remaining signals that survive the trip through each layer, filtered and consolidated into a detailed picture of the object in question, collide with the final stage of the network: recognition.
Yann LeCun had remained astonishingly loyal to convolutional neural networks in the years since his success applying them to handwritten ZIP codes at Bell Labs.
The project was helmed by the eponymous Alex Krizhevsky and his collaborator, Ilya Sutskever, both of whom were smart but young researchers still building their reputations.
The same Hinton who’d made his name as an early machine learning pioneer with the development of backpropagation in the mid-1980s, the breakthrough method that made it possible to reliably train large neural networks for the first time.