Kindle Notes & Highlights
by Cathy O'Neil
Read between June 11 - July 8, 2022
Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.
Even if other tools supplant LSI–R as its leading WMD, the prison system is likely to be a powerful incubator for WMDs on a grand scale. So to sum up, these are the three elements of a WMD: Opacity, Scale, and Damage.
Newcomers were required to be on call every thirteen weeks in the futures group. This meant being ready to respond to computer problems whenever any of the world’s markets were open, from Sunday evening our time, when the Asian markets came to life, to New York’s closing bell at 4 p.m. on Friday. Sleep deprivation was an issue. But worse was the powerlessness to respond to issues in a shop that didn’t share information. Say an algorithm appeared to be misbehaving. I’d have to locate it and then find the person responsible for it, at any time of the day or night, and tell him (and it was always
In both cultures, wealth is no longer a means to get by. It becomes directly tied to personal worth. A young suburbanite with every advantage—the prep school education, the exhaustive coaching for college admissions tests, the overseas semester in Paris or Shanghai—still flatters himself that it is his skill, hard work, and prodigious problem-solving abilities that have lifted him into a world of privilege. Money vindicates all doubts. And the rest of his circle plays along, forming a mutual admiration society. They’re eager to convince us all that Darwinism is at work, when it looks very much
In most disciplines, the analysis feeding a model would demand far more rigor. In
Education benefited from the focus. He admitted that the most relevant data—what the students had learned at each school—was inaccessible. But the U.S. News model, constructed from proxies, was the next best thing. However, when you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent. Here’s an example. Let’s say a website is looking to hire a social media maven. Many people apply for the job, and they send information about the various marketing campaigns they’ve run. But it takes
A California-based entrepreneur, Steven Ma, takes this market-based approach to an extreme. Ma, founder of ThinkTank Learning, places the prospective students into his own model and calculates the likelihood that they’ll get into their target colleges. He told Bloomberg BusinessWeek, for example, that an American-born senior with a 3.8 GPA, an SAT score of 2000, and eight hundred hours of extracurricular activities had a 20.4 percent shot of getting into New York University, and a 28.1 percent chance at the University of Southern California. ThinkTank then offers guaranteed consulting
So the government capitulated. And the result might be better. Instead of a ranking, the Education Department released loads of data on a website. The result is that students can ask their own questions about the things that matter to them—including class size, graduation rates, and the average debt held by graduating students. They don’t need to know anything about statistics or the weighting of variables. The software itself, much like an online travel site, creates individual models for each person. Think of it: transparent, controlled by the user, and personal. You might call it the
The top 20 percent of the population controls 89 percent of the wealth in the country, and the bottom 40 percent controls none of it.
From time to time, people ask me how to teach ethics to a class of data scientists. I usually begin with a discussion of how to build an e-score model and ask them whether it makes sense to use “race” as an input in the model. They inevitably respond that such a question would be unfair and probably illegal. The next question is whether to use “zip code.” This seems fair enough, at first. But it doesn’t take long for the students to see that they are codifying past injustices into their model. When they include an attribute such as “zip code,” they are expressing the opinion that the history
As I write this, ten states have passed legislation to outlaw the use of credit scores in hiring. In banning them, the New York City government declared that using credit checks “disproportionately affects low-income applicants and applicants of color.” Still, the practice remains legal in forty states.
The housing authority knew which Catherine Taylor it was dealing with. The question we’re left with is this: How many Wanda Taylors are out there clearing up false identities and other errors in our data? The answer: not nearly enough. Humans in the data economy are outliers and throwbacks. The systems are built to run automatically as much as possible. That’s the efficient way; that’s where the profits are. Errors are inevitable, as in any statistical program, but the quickest way to reduce them is to fine-tune the algorithms running the machines. Humans on the ground only gum up the works.
everything from shades of color to the distance between
That may be true. But punctuation and spelling mistakes also point to low education, which is highly correlated with class and race. So when poor people and immigrants qualify for a loan, their substandard language skills might drive up their fees. If they then have trouble paying those fees, this might validate that they were a high risk to begin with and might further lower their credit scores. It’s a vicious feedback loop, and paying bills on time plays only a bit part.
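The loop described in this highlight can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the function name `simulate_fee_loop`, the fee formula, the 700-point threshold, and the 20-point penalty are all hypothetical and do not come from the book or from any real lender. The lower starting score stands in for a borrower who was marked down for spelling and punctuation rather than for payment history.

```python
def simulate_fee_loop(start_score, monthly_budget, months=6):
    """Toy loop: lower score -> higher fee -> missed payment -> lower score."""
    score = start_score
    history = []
    for _ in range(months):
        # Hypothetical pricing rule: the fee grows as the score falls below 700.
        fee = 50 + max(0, 700 - score) * 0.5
        # The borrower misses the payment when the fee exceeds the budget.
        missed = fee > monthly_budget
        if missed:
            score -= 20  # hypothetical penalty for a missed payment
        history.append((score, fee, missed))
    return history


# Two borrowers with the same budget; only the starting score differs.
steady = simulate_fee_loop(start_score=700, monthly_budget=60)
penalized = simulate_fee_loop(start_score=670, monthly_budget=60)
print(steady[-1])
print(penalized[-1])
```

Under these made-up numbers, the borrower who starts at 700 never misses a payment, while the borrower who starts at 670 is priced out in the first month and slides further every month after, which then looks like confirmation of the original low score.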
Insurers draw these scores from credit reports, and then, using the insurer’s proprietary algorithm, create their own ratings, or e-scores. These are proxies for responsible driving. But Consumer Reports found that the e-scores, which include all sorts of demographic data, often count for more than the driver’s record. In other words, how you manage money can matter more than how you drive a car. In New York State, for example, a dip in a driver’s credit rating from “excellent” to merely “good” could jack up the annual cost of insurance by $255. And in Florida, adults with clean driving
It gets worse. In a filing to the Wisconsin Department of Insurance, the CFA listed one hundred thousand microsegments in Allstate’s pricing schemes. These pricing tiers are based on how much each group can be expected to pay. Consequently, some receive discounts of up to 90 percent off the average rate, while others face an increase of 800 percent. “Allstate’s insurance pricing has become untethered from the rules of risk-based premiums and from the rule of law,” said J. Robert Hunter, CFA’s director of insurance and the former Texas insurance commissioner. Allstate responded that the CFA’s
yellow, blue, and green. The tribes were emerging.
In 1943, at the height of World War II, when the American armies and industries needed every troop or worker they could find, the Internal Revenue Service tweaked the tax code, granting tax-free status to employer-based health insurance. This didn’t seem to be a big deal, certainly nothing to rival the headlines about the German surrender in Stalingrad or Allied landings on Sicily. At the time, only about 9 percent of American workers received private health coverage as a job benefit. But with the new tax-free status, businesses set about attracting scarce workers by offering health insurance.
As I write this, about two-thirds of American adults have a profile on Facebook. They spend thirty-nine minutes a day on the site, only four minutes less than they dedicate to face-to-face socializing. Nearly half of them, according to a Pew Research Center report, count on Facebook to deliver at least some of their news, which leads to the question: By tweaking its algorithm and molding the news we see, can Facebook game the political system?
run by Michael Slaby, the chief technology officer of Obama’s 2012 campaign. The goal, according
Successful microtargeting, in part, explains why in 2015 more than 43 percent of Republicans, according to a survey, still believed the lie that President Obama is a Muslim. And 20 percent of Americans believed he was born outside the United States and, consequently, an illegitimate president. (Democrats may well spread their own disinformation in microtargeting, but nothing that has surfaced matches the scale of the anti-Obama campaigns.)
But even television is moving toward personalized advertising. New advertising companies like Simulmedia, in New York, assemble TV viewers into behavioral buckets, so that advertisers can target audiences of like-minded people, whether hunters, pacifists, or buyers of tank-sized SUVs. As television and the rest of the media move toward profiling their viewers, the potential for political microtargeting grows. As this happens, it will become harder to access the political messages our neighbors are seeing—and as a result, to understand why they believe what they do, often passionately. Even a
The result of these subterranean campaigns is a dangerous imbalance. The political marketers maintain deep dossiers on us, feed us a trickle of information, and measure how we respond to it. But we’re kept in the dark about what our neighbors are being fed. This resembles a common tactic used by business negotiators. They deal with different parties separately so that none of them knows what the other is hearing. This asymmetry of information prevents the various parties from joining forces—which is precisely the point of a democratic government. This growing science of microtargeting, with
As I write this, the entire voting population that matters lives in a handful of counties in Florida, Ohio, Nevada, and a few other swing states. Within those counties is a small number of voters whose opinions weigh in the balance. I might point out here that while many of the WMDs we’ve been looking at, from predatory ads to policing models, deliver most of their punishment to the struggling classes, political microtargeting harms voters of every economic class. From Manhattan to San Francisco, rich and poor alike find themselves disenfranchised (though the truly affluent, of course, can
Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit.
Following the market crash of 2008, two financial engineers, Emanuel Derman and Paul Wilmott, drew up such an oath. It reads: ~ I will remember that I didn’t make the world, and it doesn’t satisfy my equations. ~ Though I will use models boldly to estimate value, I will not be overly impressed by mathematics. ~ I will never sacrifice reality for elegance without explaining why I have done so. ~ Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights. ~ I understand that my work may have enormous effects on
Not all of them would turn out to be nefarious. Following the 2012 presidential election, for example, ProPublica built what it called a Message Machine, which used crowdsourcing to reverse-engineer the model for the Obama campaign’s targeted political ads. Different groups, as it turned out, heard glowing remarks about the president from different celebrities, each one presumably targeted for a specific audience. This was no smoking gun. But by providing information and eliminating the mystery behind the model, the Message Machine reduced (if only by a tad) grounds for dark rumors and
As we discussed in the chapter on credit scores, the civil rights laws referred to as the Fair Credit Reporting Act (FCRA) and the Equal Credit Opportunity Act (ECOA) were meant to ensure fairness in credit scoring. The FCRA guarantees that a consumer can see the data going into their score and correct any errors, and the ECOA prohibits linking race or gender to a person’s score.
If we want to bring out the big guns, we might consider moving toward the European model, which stipulates that any data collected must be approved by the user, as an opt-in. It also prohibits the reuse of data for other purposes. The opt-in condition is all too often bypassed by having a user click on an inscrutable legal box. But the “not reusable” clause is very strong: it makes it illegal to sell user data. This keeps it from the data brokers whose dossiers feed toxic e-scores and microtargeting campaigns. Thanks to this “not reusable” clause, the data brokers in Europe are much more
Finally, models that have a significant impact on our lives, including credit scores and e-scores, should be open and available to the public. Ideally, we could navigate them at the level of an app on our phones. In a tight month, for example, a consumer could use such an app to compare the impact of unpaid phone and electricity bills on her credit score and see how much a lower score would affect her plans to buy a car. The technology already exists. It’s only the will we’re lacking.
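The kind of what-if comparison imagined in this highlight might look roughly like the sketch below. Everything in it is an assumption made for illustration: the functions `toy_score`, `toy_car_loan_rate`, and `compare_options`, the point penalties, and the rate tiers are invented, and real credit-scoring formulas are proprietary and far more complex.

```python
# Hypothetical point penalties per unpaid bill; not real scoring rules.
PENALTIES = {"phone": 15, "electricity": 25}


def toy_score(base_score, unpaid_bills):
    """Subtract a fixed, made-up penalty for each unpaid bill."""
    return base_score - sum(PENALTIES[bill] for bill in unpaid_bills)


def toy_car_loan_rate(score):
    """Map a score to a made-up car loan interest rate (percent)."""
    if score >= 700:
        return 5.0
    if score >= 650:
        return 8.0
    return 12.0


def compare_options(base_score, options):
    """Return {option name: (resulting score, resulting loan rate)}."""
    results = {}
    for name, unpaid in options.items():
        score = toy_score(base_score, unpaid)
        results[name] = (score, toy_car_loan_rate(score))
    return results


# A consumer in a tight month weighs which bill to leave unpaid.
choices = {
    "pay everything": [],
    "skip phone": ["phone"],
    "skip electricity": ["electricity"],
}
for name, (score, rate) in compare_options(710, choices).items():
    print(f"{name}: score {score}, car loan rate {rate}%")
```

The point of the sketch is transparency rather than accuracy: once the scoring rule is visible, the consumer can see the cost of each choice before making it.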