Kindle Notes & Highlights
Read between September 15, 2024 and July 29, 2025
Looking at how machine learning models are being developed, we see the impact of what I first heard Kate Crawford term the “politics of classification.” Having the ability to define classification systems is in itself a power. The choices that are made are influenced by cultural, political, and economic factors, and while these classifications don’t have to be based on definite distinctions, they still have an impact on individual lives and societal attitudes. Despite the power shadows inherent in classification systems, often those systems go unchallenged. Instead, they are used as shorthand.
...more
I found it interesting to see African countries that I’d been socialized to think of as being behind in the world leading on gender representation. Other explorations I had for potential datasets included looking at Olympic teams. Like the UN, the Olympics brought together a world of nations, yet I also had to attend to the fact that elite athletes were not exactly representative of the typical population in ability, physical form, or age. In some ways an Olympic dataset would bring the same concerns as a celebrity dataset.
I reached out to the Boston University Technology Law Clinic for help. They looked at the copyright laws for all the countries I targeted and assessed that as long as I did not redistribute the dataset for profit and kept its use for research purposes, I should be in the clear and within the realm of fair use. Nevertheless, lawful use did not overcome the basic fact that I would be using images of people’s faces without their consent to do this research, unless I could somehow obtain the consent of the 1,270 individuals who would be included in the dataset. Despite my struggle with this question,
...more
When we look at proclamations about the performance of AI systems, we must have insight into the types of tests that were used to reach the conclusion, the data collection methods, the labels used, and who was involved in decision-making. Yet as I went through each face in my Pilot Parliaments Benchmark to determine gender and skin type, the subjectivity of the exercise became more apparent. We cannot ignore this subjectivity as more AI systems are introduced into our lives, and we need to push for high standards for AI deployers to support the claims they make about what their systems can do.
...more
Shortly after Magic Avatars gained popularity, some women started noticing that the avatars generated for them included hypersexualized images. Some women received avatars with their likeness depicted on scantily clad bodies or completely topless. Melissa Heikkilä wrote for the MIT Technology Review, “My avatars were cartoonishly pornified, while my male colleagues got to be astronauts, explorers, and inventors.”
Storing and distributing images and videos of child pornography is a criminal offense in the United States. Who is culpable if an AI system produces sexualized images of an adult when they upload childhood photos? What stops a nefarious actor from intentionally uploading the image of a child to get sexualized images? At the time of writing there are no laws that ban the generation of sexualized images of children from AI systems. Governments and companies have a responsibility to do more.
Images uploaded to the internet for one purpose are often repurposed without explicit consent. Artists who already struggle to make a living based on their creative practice have expressed alarm at the use of their work to fuel generative AI systems.
By the time someone finds out their images have been used, the company has already trained the AI system, so deleting the image doesn’t delete its contribution to training the AI system to do a specific task. Even if the image is deleted, the many copies of the dataset already made still contain the images. This is why when Meta (then Facebook) announced the deletion of nearly 1 billion faceprints, there was pushback alongside the celebration. The celebration was around the fact that a major tech company deleting the faceprints was an acknowledgment of the risks associated with face-based
...more
We need deep data deletion, which consists of fully deleting data derived from user uploads and the AI models trained with this data. Commercial AI products should be built only on explicitly consented data sources. We should mandate deep data deletions to prevent the development of AI harms that stem from the collection of unconsented data. Clearview AI, for example, which scraped billions of photos of people’s faces uploaded to public social media platforms, was fined 20 million euros by the Italian Supervisory Authority and ordered to delete the biometric data of “persons in the Italian
...more
many online systems are linked to tech giants like Google and Amazon. For the Lensa app, the face data is processed in part using Amazon Web Services (AWS). AWS powers a significant part of the internet and is part of critical internet infrastructure. Companies like Netflix and Zoom are reliant on AWS.[15] Thus you have to think not just about what one company like Prisma Labs can do with your data, but also what interlinked companies that run the computing systems to process your data can do. There is a daisy chain of companies involved with internet and app-based AI services.
The responsibility of preventing harms from AI lies not with individual users but with the companies that create these systems, the organizations that adopt them, and the elected officials tasked with the public interest. What we can do as individuals is share our stories, document harms, and demand that our dignity be a priority, not an afterthought. We can think twice before participating in AI trends like stylized profile images, and we can support organizations that put pressure on companies and policymakers to prevent AI harms.
The classification systems I or other machine learning practitioners select, modify, inherit, or expand to label a dataset are a reflection of subjective goals, observations, and understandings of the world. These systems of labeling circumscribe the world of possibilities and experience for a machine learning model, which is also limited by the data available. For example, if you decide to use binary gender labels—male and female—and use them on a dataset that includes only the faces of middle-aged white actors, the system is precluded from learning about intersex, trans, or nonbinary
...more
I stopped saying I was labeling gender and started saying I was labeling perceived gender, which from my experience of doing the labeling was a more apt description. Perceived gender introduced the notion that someone was doing the perceiving, and this perception might not be the so-called truth of the matter.
In addition to applying binary gender labels, I also hand-labeled each face with what I considered to be the appropriate Fitzpatrick skin type on a six-point scale. Because there was already a precedent for applying race and ethnic labels, I felt if I used the familiar labels, I would be grappling with widely discussed topics about racial and ethnic discrimination. Instead, I was applying skin type labels from the Fitzpatrick scale, and I hesitated, because from personal experience I knew how skin color was often used for discrimination. While many talk about racism (discrimination and
...more
The subject of colorism can be taboo, as some see it as divisive in the push for racial justice. The cruelty of colorism is that it recapitulates social rejection and exclusion based on race into a hierarchy based on skin color. Just as a white scholar might shy away from talking about racism and the ways in which she benefits from systemic racism, not many Black scholars who have investigated race and technology have focused on colorism. I wondered if that lack was because some of the leading Black voices, on the privileged end of colorism, did not see it as a topic worthy of discussion, were
...more
The Fitzpatrick scale was based not on color alone, but on how skin responds to UV radiation from sunlight. This factor meant that skin type, while linked to skin color, had more elements at play, namely different kinds of melanin cells. Melanin comes in three flavors: eumelanin, pheomelanin, and neuromelanin. Eumelanin and pheomelanin affect skin color and hair color, while neuromelanin affects the color of the brain. Descriptions of skin type in the Fitzpatrick scale include these: “Type I skin—burns easily, light colored eyes, green/blue.” “Type VI, skin never burns.”
I noticed that as more people labeled the dataset, the skin itself was not the deciding factor. Instead, facial features, national origin, and perceived ethnicity also played into whether someone on the borderline of classification would be placed in a lighter or a darker category. If the person appeared South Asian, they were labeled as Type IV; if they had lighter perceived skin but were in an African parliament and not phenotypically white, they were placed in a darker category. Like gender, the cases that defied clear-cut classification were those that exposed our assumptions most
...more
We cannot assume that just because something is data driven or processed by an algorithm it is immune to bias. Labels and categories we may take for granted need to be interrogated. The more we know about the histories of racial categorization, the more we learn about how a variety of cultures approach gender, the deeper we dive into the development of scientific scales like the Fitzpatrick scale, the easier it is to see the human touch that shapes AI systems. Instead of erasing our fingerprints from the creation of algorithmic systems, exposing them more clearly gives us a better
...more
In the land of these giants, I had to navigate carefully. Funding for AI research at leading universities was often sponsored by these companies. It was not lost on me that the Stata Center at MIT, which housed the Computer Science Artificial Intelligence Lab, had a Gates wing named after the founder of Microsoft. Some companies, like Google, IBM, and Microsoft, provided fellowships that would pay for or significantly subsidize the cost of completing graduate programs in computer science. After finishing a computer science degree, top students found competitive and compelling job offers from
...more
Overall, all companies performed better on male faces than on female faces. Microsoft had the smallest accuracy gap, with an 8.1 percent difference. Face++ had the largest gap, with a difference of 20.6 percent.
Though the dataset was labeled by six skin types, to account for off-by-one errors, I grouped skin types I–III, which covered mainly the parliament members from Iceland, Finland, and Sweden, as the lighter-skinned group. I categorized skin types IV–VI as the darker-skinned group, which covered many of the parliament members in Senegal, Rwanda, and South Africa. Taking this step of phenotypic analysis at the time was a significant contribution to the field of computer vision. Phenotypic analysis demonstrated not just additional ways of understanding gender classifiers like the ones I was
...more
The results of the phenotypic analysis showed that overall, Microsoft, IBM, and Face++ did better on lighter-skinned faces than darker-skinned faces when it came to guessing the gender of a face. If we were to stop at this level of single-axis analysis—that is, looking at gender in isolation and skin type in isolation—the assumption would be made that when it came to gender, regardless of skin type, systems performed better on men than women. Conversely, without digging deeper we might also assume that when it came to skin type, performance would be better on lighter skin than darker skin
...more
What accuracy doesn’t reveal is the question of failure. When a system fails, how are the errors distributed? We should not assume equal distribution of errors. Beyond just looking at the accuracies of a system, we can also learn more about performance by looking at the kinds of errors that are made. For error analysis, I took it one step further and looked at the results by each skin type, revealing even worse performance. When I looked at women with the darkest skin on the Fitzpatrick skin type scale, I found that for female-labeled faces of type VI, error rates were up to 46.8 percent. To
...more
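A minimal sketch of the single-axis versus intersectional breakdown described in these passages, assuming a small pandas DataFrame with hypothetical columns (perceived_gender, predicted_gender, fitzpatrick_type) and made-up example rows. This is an illustration of the idea, not the actual Gender Shades data or pipeline.

    # Hypothetical illustration of single-axis vs. intersectional error analysis.
    # Column names and rows are invented for the example.
    import pandas as pd

    df = pd.DataFrame({
        "perceived_gender": ["female", "male", "female", "male", "female", "male"],
        "predicted_gender": ["male",   "male", "female", "male", "male",   "male"],
        "fitzpatrick_type": [6, 2, 3, 5, 6, 1],
    })

    # Group Fitzpatrick types I-III as "lighter" and IV-VI as "darker",
    # as the text describes, to absorb off-by-one labeling disagreements.
    df["skin_group"] = df["fitzpatrick_type"].map(lambda t: "lighter" if t <= 3 else "darker")
    df["error"] = df["perceived_gender"] != df["predicted_gender"]

    # Single-axis analysis: error rate by gender alone, then by skin group alone.
    print(df.groupby("perceived_gender")["error"].mean())
    print(df.groupby("skin_group")["error"].mean())

    # Intersectional analysis: error rate for each gender x skin-group subgroup,
    # where the largest disparities (e.g., darker-skinned women) become visible.
    print(df.groupby(["perceived_gender", "skin_group"])["error"].mean())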
I paid a visit to Hal Abelson. “Your calculations do show differences in accuracy, but what’s the harm? Why should people care?” Hal’s questions lingered with me. “Is being misclassified harmful?” “Does it matter if a parliament member is misgendered?” Maybe the example I had chosen was not compelling enough, or maybe having a committee of all men made them less sensitive to the implications of misgendering women? To me this was akin to announcing, “She was hit by a brick; look at the evidence” and then being asked, “How do we know being hit by a brick is harmful?” I could not assume everyone
...more
I was frustrated because results that I thought were substantial now felt diminished. Maybe I was making too big a deal of what I had uncovered. Or maybe this was part of the academic rite of passage that required you to defend the importance of your work to a skeptical audience of senior academics. I lobbed back questions of my own to Hal and the rest of the committee. “Should I show the faces of those who are mislabeled?” None of the committee saw an issue, but showing the mislabeled faces directly didn’t sit well with me. It felt like I would be framing an insult that had been applied to
...more
Over the summer, Hal was teaching me that I could not assume everybody would interpret gender classification or misclassification as being harmful. He was pushing me to provide more context and articulate why the work mattered. He was also preparing me for the skepticism that others in the computer science community would have. At the same time, I was not doing the work for the computer science community alone, but for those who could find themselves on the wrong side of a label. Hal guided me to emphasize that the techniques used for gender classification were being used in other areas of
...more
Now with an accepted paper, I began sharing the results with all the companies. I emailed IBM’s Ruchir as well as Microsoft and Face++ representatives, since their companies were all implicated by the paper. I heard back from Ruchir almost immediately. He wanted to know who the competitors were. I had sent the paper results with all the findings, but instead of including the company names I used companies A, B, and C. They would have to wait until the conference to know for sure who their competitors were.
I presented the “Gender Shades” work to Ruchir Puri, IBM chief scientist; Francesca Rossi, head of AI and Ethics; Anna Sekaran, head of communications; and members of the computer vision team, who sat nervously as I advanced through the slides showing performance results. “Who are the other companies?” they queried. “You are going to have to wait and see…” After the formal presentation, I enjoyed a lunch with the team. Again I was asked, “Who are the other companies?” The meal was good, but it wasn’t that good. “You will have to wait and see.” Later that evening I went out for a truly
...more
After the NYC meeting, I had a few meetings in the IBM Boston office. The team let me know that they had developed a new model and wanted me to share the results at the conference. I insisted that I would have to test their new model at my office, so they took the trip to the Media Lab and met me right around the corner from the Foodcam. Their dress clothes gave corporate vibes as they walked past the ping-pong table in the atrium. Sitting next to my LEGOs, surrounded by grad school artifacts and the white mask, the IBM team members shared their model with me. The numbers did in fact look
...more
the assumptions I’d made in Cordova did not quite hold up in Kombolcha. At the time, the Google Android tablets I was programming did not have Amharic keyboards, yet most of the people who were being surveyed and the health workers needed to use Amharic. Our project team developed a custom Amharic keyboard for the Android tablets that had to be loaded onto every new device. Defaults are not neutral. Years later I met some Android team members and asked why keyboards were not available in Amharic then. The answer was business economics. To Google, Ethiopia was not a priority market. In the age
...more
My problem with effective altruism is that the approach entrenches the status quo. Supporting exclusive charities sidesteps addressing the issues that led to the rise of charities in the first place and does not require changing existing power relations or company practices.
Longtermists follow a tradition of showing concern for future descendants. Many ancient cultures have emphasized the importance of taking care of planet Earth in order that future generations can breathe clean air and drink from the bounty of nature. However, safeguarding future generations means taking care of present and addressable dangers.
The term “x-risk” is used as a shorthand for the hypothetical existential risk posed by AI. While my research supports why AI systems should not be integrated into weapons systems because of the lethal dangers, this isn’t because I believe AI systems by themselves pose an existential risk as superintelligent agents. AI systems falsely classifying individuals as criminal suspects, robots being used for policing, and self-driving cars with faulty pedestrian tracking systems can already put your life in danger. Sadly, we do not need AI systems to have superintelligence for them to have fatal
...more
Though it is tempting to view physical violence as the ultimate harm, doing so makes it easy to forget pernicious ways our societies perpetuate structural violence. Johan Galtung coined this term to describe how institutions and social structures prevent people from meeting their fundamental needs and thus cause harm. Denial of access to healthcare, housing, and employment through the use of AI perpetuates individual harms and generational scars. AI systems can kill us slowly.
people who would never be invited to a dinner like this, I doubted I would be receiving any future private dinner invitations. I thought about the excoded—people being harmed now and those who are at risk of harm by AI systems. When I think of x-risk, I also think of the risk and reality of being excoded. You can be excoded when a hospital uses AI for triage and leaves you without care, or uses a clinical algorithm that precludes you from receiving a life-saving organ transplant.[6] You can be excoded when you are denied a loan based on algorithmic decision-making.[7] You can be excoded when
...more