Kindle Notes & Highlights
by Cathy O'Neil
Read between September 24 and September 28, 2021
Statisticians count on large numbers to balance out exceptions and anomalies. (And WMDs, as we’ll see, often punish individuals who happen to be the exception.)
In fact, there are plenty of responsible people and good workers who suffer misfortune and see their credit scores fall. But the belief that bad credit correlates with bad job performance leaves those with low scores less likely to find work. Joblessness pushes them toward poverty, which further worsens their scores, making it even harder for them to land a job. It’s a downward spiral. And employers never learn how many good employees they’ve missed out on by focusing on credit scores.
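The downward spiral described above can be sketched as a toy simulation. Everything here is invented for illustration — the 600-point cutoff, the ±20-point yearly drift — but it shows the feedback structure: a screening rule that keys on the score also moves the score.

```python
def yearly_update(score):
    """One year of the loop. Hypothetical rule: employers screen
    out applicants whose credit score is below 600."""
    hired = score >= 600
    # employment repairs the score; joblessness erodes it (invented magnitudes)
    score += 20 if hired else -20
    return max(300, min(850, score)), hired

score = 590          # an applicant starting just under the cutoff
history = []
for _ in range(5):
    score, hired = yearly_update(score)
    history.append((score, hired))
# the applicant never recovers: each rejection lowers the score,
# which guarantees the next rejection
```

The point is not the numbers but the topology: the model's output feeds back into its own input, so a small initial disadvantage compounds.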
This underscores another common feature of WMDs. They tend to punish the poor. This is, in part, because they are engineered to evaluate large numbers of people. They specialize in bulk, and they’re cheap. That’s part of their appeal. The wealthy, by contrast, often benefit from personal input. A white-shoe law firm or an exclusive prep school will lean far more on recommendations and face-to-face interviews than will a fast-food chain or a cash-strapped urban school district. The privileged, we’ll see time and again, are processed more by people, the masses by machines.
An algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher. That probability is distilled into a score, which can turn someone’s life upside down. And yet when the person fights back, “suggestive” countervailing evidence simply won’t cut it. The case must be ironclad.
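Mechanically, the "slew of statistics distilled into a score" is often nothing more exotic than a weighted sum compared against a cutoff. A minimal sketch, with all inputs and weights invented:

```python
def risk_score(features, weights):
    # a weighted sum of inputs -- many scoring models reduce to this shape
    return sum(f * w for f, w in zip(features, weights))

applicant = [0.4, 0.9, 0.1]             # hypothetical normalized inputs
weights = [0.5, 0.3, 0.2]               # invented weights
score = risk_score(applicant, weights)  # 0.49
rejected = score > 0.45                 # a single threshold decides the outcome
```

Note the asymmetry the passage describes: producing the score takes one line of arithmetic, while contesting it demands an ironclad case.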
The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves.
Ill-conceived mathematical models now micromanage the economy, from advertising to prisons.
When they’re building statistical systems to find customers or manipulate desperate borrowers, growing revenue appears to show that they’re on the right track. The software is doing its job. The trouble is that profits end up serving as a stand-in, or proxy, for truth. We’ll see this dangerous confusion crop up again and again.
“Moneyball” is now shorthand for any statistical approach in domains long ruled by the gut.
This may sound obvious, but as we’ll see throughout this book, the folks building WMDs routinely lack data for the behaviors they’re most interested in. So they substitute stand-in data, or proxies.
A model, after all, is nothing more than an abstract representation of some process, be it a baseball game, an oil company’s supply chain, a foreign government’s actions, or a movie theater’s attendance. Whether it’s running in a computer program or in our head, the model takes what we know and uses it to predict responses in various situations.
There would always be mistakes, however, because models are, by their very nature, simplifications. No model can include all of the real world’s complexity or the nuance of human communication. Inevitably, some important information gets left out.
To create a model, then, we make choices about what’s important enough to include, simplifying the world into a toy version that can be easily understood and from which we can infer important facts and actions. We expect it to handle only one job and accept that it will occasionally act like a clueless machine, one with enormous blind spots.
Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.
The most common model for detecting fires in a home or office weighs only one strongly correlated variable, the presence of smoke. That’s usually enough. But modelers run into problems—or subject us to problems—when they focus models as simple as a smoke alarm on their fellow humans.
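A smoke alarm really is about this simple as a model — one strongly correlated variable, one comparison. A sketch (threshold value invented):

```python
SMOKE_THRESHOLD = 0.3   # hypothetical sensor calibration

def alarm(smoke_level):
    # one variable, one comparison: the entire model
    return smoke_level > SMOKE_THRESHOLD
```

That works for fires because smoke is tightly coupled to the thing being predicted. The trouble the passage warns about begins when an equally crude one-variable rule is pointed at people, where no single signal carries that much truth.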
Racism, at the individual level, can be seen as a predictive model whirring away in billions of human minds around the world. It is built from faulty, incomplete, or generalized data. Whether it comes from experience or hearsay, the data indicates that certain types of people have behaved badly. That generates a binary prediction that all people of that race will behave that same way.
Needless to say, racists don’t spend a lot of time hunting down reliable data to train their twisted models. And once their model morphs into a belief, it becomes hardwired. It generates poisonous assumptions, yet rarely tests them, settling instead for data that seems to confirm and fortify them. Consequently, racism is the most slovenly of predictive models. It is powered by haphazard data gathering and spurious correlations, reinforced by institutional inequities, and poll...
Opaque and invisible models are the rule, and clear ones very much the exception. We’re modeled as shoppers and couch potatoes, as patients and loan applicants, and very little of this do we see—even in applications we happily sign up for. Even when such models behave themselves, opacity can lead to a feeling of unfairness.
WMDs are, by design, inscrutable black boxes. That makes it extra hard to definitively answer the second question: Does the model work against the subject’s interest? In short, is it unfair? Does it damage or destroy lives?
The game for hedge funds was not so much to ride markets up as to predict the movements within them. Down could be every bit as lucrative.
The firm welcomed Alan Greenspan, the former Fed chairman, and Robert Rubin, the former Treasury secretary and Goldman Sachs executive. Rubin had pushed for a 1999 revision of the Depression-era Glass-Steagall Act. This removed the glass wall between banking and investment operations, which facilitated the orgy of speculation over the following decade. Banks were free to originate loans (many of them fraudulent) and sell them to their customers in the form of securities.
However, now that Glass-Steagall was gone, the banks could, and sometimes did, bet against the very same securities that they’d sold to customers. This created mountains of risk—and endless investment potential for hedge funds. We placed our bets, after all, on market movements, up or down, and those markets were frenetic.
But what if the frightening tomorrow on the horizon didn’t resemble any of the yesterdays? What if it was something entirely new and different? That was a concern, because mathematical models, by their nature, are based on the past, and on the assumption that patterns will repeat.
For decades, mortgage securities had been the opposite of scary. They were boring financial instruments that individuals and investment funds alike used to diversify their portfolios. The idea behind them was that quantity could offset risk. Each single mortgage held potential for default: the home owner could declare bankruptcy, meaning the bank would never be able to recover all of the money it had loaned. At the other extreme, the borrower could pay back the mortgage ahead of schedule, bringing the flow of interest payments to a halt.
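The "quantity could offset risk" idea has a standard statistical form: if defaults are independent, the spread of the pooled default rate shrinks like one over the square root of the pool size. A sketch with an assumed per-loan default probability:

```python
import math

p = 0.05  # assumed per-loan default probability, chosen for illustration

# standard deviation of the pooled default rate shrinks as 1/sqrt(n) --
# but only if defaults are independent, the very assumption that
# failed when housing markets fell together in 2008
spread = {n: math.sqrt(p * (1 - p) / n) for n in (1, 100, 10_000)}
```

With 10,000 independent loans the default rate looks almost deterministic, which is why the instruments seemed boring. Correlated defaults destroy that square-root comfort entirely.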
These risk models also created their own pernicious feedback loop. The AAA ratings on defective products turned into dollars. The dollars in turn created confidence in the products and in the cheating-and-lying process that manufactured them. The resulting cycle of mutual back-scratching and pocket-filling was how the whole sordid business operated until it blew up.
Paradoxically, the supposedly powerful algorithms that created the market, the ones that analyzed the risk in tranches of debt and sorted them into securities, turned out to be useless when it came time to clean up the mess and calculate what all the paper was actually worth. The math could multiply the horseshit, but it could not decipher it. This was a job for human beings.
The refusal to acknowledge risk runs deep in finance. The culture of Wall Street is defined by its traders, and risk is something they actively seek to underestimate. This is a result of the way we define a trader’s prowess, namely by his “Sharpe ratio,” which is calculated as the profits he generates divided by the risks in his portfolio. This ratio is crucial to a trader’s career, his annual bonus, his very sense of being.
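The incentive to underestimate risk falls directly out of the arithmetic. The standard Sharpe ratio divides excess return by the volatility of returns — the book's shorthand is simply profits over risk — so shrinking the stated risk inflates the ratio. All figures below are invented:

```python
def sharpe_ratio(mean_return, risk_free_rate, return_stddev):
    # standard form: excess return over the risk-free rate,
    # divided by the volatility (standard deviation) of returns
    return (mean_return - risk_free_rate) / return_stddev

honest = sharpe_ratio(0.12, 0.02, 0.20)      # risk stated honestly: 0.5
flattering = sharpe_ratio(0.12, 0.02, 0.10)  # risk understated: ratio doubles
```

Halve the denominator and the trader looks twice as skilled, with no change in what actually happened — hence the cultural pressure the passage describes.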
A young suburbanite with every advantage—the prep school education, the exhaustive coaching for college admissions tests, the overseas semester in Paris or Shanghai—still flatters himself that it is his skill, hard work, and prodigious problem-solving abilities that have lifted him into a world of privilege. Money vindicates all doubts.
I wondered what the analogue to the credit crisis might be in Big Data. Instead of a bust, I saw a growing dystopia, with inequality rising. The algorithms would make sure that those deemed losers would remain that way. A lucky minority would gain ever more control over the data economy, raking in outrageous fortunes and convincing themselves all the while that they deserved it.
When you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent.
By leaving cost out of the formula, it was as if U.S. News had handed college presidents a gilded checkbook. They had a commandment to maximize performance in fifteen areas, and keeping costs low wasn’t one of them. In fact, if they raised prices, they’d have more resources for addressing the areas where they were being measured. Tuition has skyrocketed ever since.
Between 1985 and 2013, the cost of higher education rose by more than 500 percent, nearly four times the rate of inflation. To attract top students, colleges, as we saw at TCU, have gone on building booms, featuring glass-walled student centers, luxury dorms, and gyms with climbing walls and whirlpool baths. This would all be wonderful for students and might enhance their college experience—if they weren’t the ones paying for it, in the form of student loans that would burden them for decades.
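The headline figure can be converted into an annual rate as a sanity check. Treating "rose by more than 500 percent" as a sixfold increase over the 28 years (an approximation, not a figure from the text):

```python
years = 2013 - 1985         # 28 years
growth_factor = 6.0         # "rose by more than 500 percent" means at least 6x

# implied compound annual growth rate: about 6.6% per year, every year
cagr = growth_factor ** (1 / years) - 1
```

Sustained over nearly three decades, a modest-sounding annual rate compounds into the sixfold increase — which is how the cost curve outran inflation fourfold without any single year looking catastrophic.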
The victims, of course, are the vast majority of Americans, the poor and middle-class families who don’t have thousands of dollars to spend on courses and consultants. They miss out on precious insider knowledge. The result is an education system that favors the privileged.