Kindle Notes & Highlights
by
Ian Ayres
Read between
April 30 - May 19, 2018
Both the wine dealers and writers have a vested interest in maintaining their informational monopoly on the quality of wine. The dealers use the perennially inflated initial rankings as a way to stabilize prices.
Many times this Super Crunching revolution is a boon to consumers as it helps sellers and governments make better predictions about who needs what. At other times, however, consumers are playing against a statistically stacked deck.
Wal-Mart had to apologize when people who searched for Martin Luther King: I Have a Dream were told they might also appreciate a Planet of the Apes DVD collection. Amazon.com similarly offended some customers who searched for “abortion” and were asked “Did you mean adoption?” The adoption question was generated automatically simply because many past customers who searched for abortion had also searched for adoption.
Chicago law professor Cass Sunstein worries that there’s a social cost to exploiting the long tail. The more successful these personalized filters are, the more we as a citizenry are deprived of a common experience.
Nicholas Negroponte, MIT professor and guru of media technology, sees in these “personalized news” features the emergence of the “Daily Me”—news publications that expose citizens only to information that fits with their narrowly preconceived preferences.
In some contexts, collective predictions are more accurate than the best estimate that any member of the group could achieve.
For example, imagine that you offer a $100 prize to a college class for the student with the best estimate of the number of pennies in a jar. The wisdom of the group can be found simply by calculating their average estimate. It’s been shown repeatedly that this average estimate is very likely to be closer to the truth than any of the individual estimates.
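The averaging step in the penny-jar example can be sketched in a few lines of Python. The true count and the six guesses below are invented purely for illustration; they are not data from the book. The sketch also relies on individual errors being roughly unbiased, so that over- and under-estimates cancel. With a small or systematically biased group, the average is not guaranteed to beat every individual.

```python
import statistics

def crowd_vs_individuals(true_count, guesses):
    """Compare the error of the group's average guess with each individual's error."""
    crowd_estimate = statistics.mean(guesses)
    crowd_error = abs(crowd_estimate - true_count)
    # Count the individuals whose own guess is further from the truth than the average.
    beaten = sum(abs(g - true_count) > crowd_error for g in guesses)
    return crowd_estimate, crowd_error, beaten / len(guesses)

# Hypothetical jar and guesses (illustrative values only).
true_count = 1000
guesses = [700, 850, 900, 1100, 1250, 1300]

crowd_estimate, crowd_error, share_beaten = crowd_vs_individuals(true_count, guesses)
print(f"average guess: {crowd_estimate:.1f}, error: {crowd_error:.1f}")
print(f"share of students the average beats: {share_beaten:.0%}")
```

In this made-up class the guesses are off by 100 to 300 pennies each, but because they err in both directions the average lands within about 17 pennies of the truth.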
Regression is a statistical procedure that takes raw historical data and estimates how various causal factors influence a single variable of interest.
In fact, because the algorithm is not public, it is possible that eHarmony puts a normative finger on the scale to favor certain clients.
Firms are figuring out more and more sophisticated ways to treat the price-oblivious differently than the price-conscious.
In each of these cases, the companies not only know the generalized probability of some behavior, they can make incredibly accurate predictions about how individual customers are going to behave.
Indeed, because of Super Crunching, firms sometimes may be able to make more accurate predictions about how you’ll behave than you could ever make yourself.
Visa already does predict the probability of divorce based on credit card purchases.
Coming to the aid of consumers, these firms are using data-crunching to counteract the excesses of seller-side price extraction.
The statistical regression not only produces a prediction, it also simultaneously reports how precisely it was able to predict. That’s right—a regression tells you how accurate the prediction is.
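One way to see what "reports how precisely it was able to predict" can mean is a one-variable least-squares fit that returns a standard error alongside the estimate. The sketch below is a textbook simple linear regression written out by hand, not the procedure used in any study the book describes, and the five data points are invented for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, with precision measures."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation: the typical size of a prediction miss.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Standard error of the slope: how precisely the effect of x is pinned down.
    se_slope = resid_sd / math.sqrt(sxx)
    return slope, intercept, resid_sd, se_slope

# Hypothetical data (illustrative values only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, resid_sd, se_slope = fit_line(xs, ys)
print(f"slope = {slope:.2f} +/- {se_slope:.2f}, intercept = {intercept:.2f}")
print(f"typical prediction error = {resid_sd:.2f}")
```

The slope is the prediction rule; the standard error and the residual standard deviation are the regression's own statement of how far that rule can be trusted.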
“Taguchi Method”
The random trial method is not the end of intuition. Instead it puts intuition to the test.
The NIT pays you money if your income falls below some minimum level and effectively guarantees people a minimum income regardless of how much they earn working. Heather wanted to see whether the NIT reduced people’s incentives to work.
She found that the NIT didn’t reduce employment nearly as much as people feared, but there was a very unexpected spike in divorce. Poor families that were randomly selected to receive the NIT were more likely to split up.
the cost of providing the search assistance. For every dollar invested in job assistance, the government saved about two dollars.
Voilà, we have an informational cascade, where an initial mistaken inference based on an extremely small sample is propagated in generation after generation of education.
The openness of the Internet is even transforming the culture of medicine. The results of regressions and randomized trials are out and available not just for doctors but for anyone who has time to Google a few keywords. Doctors are feeling pressured to read not just because their (younger) peers are telling them to, but because increasingly they read to stay ahead of their patients. Just as car buyers are jumping on the Internet before they visit the showroom, many patients are going to sites like Medline to figure out what might be ailing them. The Medline website was originally intended for
Misdiagnosis accounts for about one-third of all medical error.
Autopsy studies show that doctors seriously misdiagnose fatal illnesses about 20 percent of the time.
Chris Snijders, a professor at Eindhoven University of Technology, in the Netherlands, decided to see whether he could out-purchase professional corporate buyers.
[Snijders] had collected a database on more than 5,200 computer equipment and software purchases by more than 700 Dutch businesses. For each purchase, Snijders had information on more than 300 aspects of the transaction—including several aspects of purchasing satisfaction such as whether the delivery was late or non-conforming, or whether the product had insufficient documentation.
Snijders’s purchasing managers couldn’t outperform a simple statistical formula to predict timeliness of delivery, adherence to the budget, or purchasing satisfaction.
Once we form a mistaken belief about something, we tend to cling to it. As new evidence arrives, we’re likely to discount disconfirming evidence and focus instead on evidence that supports our preexisting beliefs.
In noisy environments, it’s often not clear what factors should be taken into account. This is where we tend mistakenly to bow down to experts who have years of experience—the baseball scouts and physicians—who are confident that they know more than the average Joe.
A statistical procedure cannot estimate the causal impact of rare events
While most consumers now know that the sales price of a car can be negotiated, many do not know that auto lenders, such as Ford Motor Credit or GMAC, often give dealers the option of marking up a borrower’s interest rate.
For example, Ford Motor Credit tells a dealer that it was willing to lend Susan money at a 6 percent interest rate, but that they would pay the dealership $2,800 if the dealership could get Susan to sign an 11 percent loan. The borrower would never be told that the dealership was marking up the loan. The dealer and the lender would then split the expected profits from the markup, with the dealership taking the lion’s share.
As the size of datasets balloons almost beyond the scope of our imagination, it becomes all the more important to continually audit them to check for the possibility of error.
Yahoo! currently records over twelve terabytes of data daily.
On the one hand, this is a massive amount of information—it’s roughly equivalent to more than half the information contained in all the books in the Library of Congress.
The timing is best explained by the digital breakthroughs that make it cheaper to capture, to merge, and to store huge electronic databases.
It often becomes impossible to figure out how an individual input is affecting the predicted outcome.
Lyndon Johnson, as part of his War on Poverty, wanted to “follow through” on the vanishing gains seen from Head Start. Concerned that “poor children tend to do poorly in school,” the Office of Education and the Office of Economic Opportunity sought to determine what types of education models could best break this cycle of failure. The result was Project Follow Through, an ambitious effort that studied 79,000 children in 180 low-income communities for twenty years at a price tag of more than $600 million.
This is a new kind of caveat emptor, where consumers are going to have to search more to make sure that the offered price is fair. Consumers are going to have to engage in a kind of number crunching of their own, creating and comparing datasets of (quality-adjusted) competitive prices.
The idea that a university or insurer could predict your race is itself just another way that Super Crunching is reducing our sphere of effective privacy.
Information is not only easier to capture now in digital form, but it is also virtually costless to copy.
Super Crunching moves us toward a kind of statistical predeterminism.
Everybody knows that a baby is due roughly nine months after conception. However, few people know that the standard deviation is fifteen days.
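To make the fifteen-day standard deviation concrete, the sketch below treats the arrival date as normally distributed around the due date. The normal shape is an assumption made here for illustration; the highlighted passage gives only the standard deviation. The code uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Days early (negative) or late (positive) relative to the due date.
# Assumption: normally distributed, centred on the due date, sd = 15 days.
arrival_offset = NormalDist(mu=0, sigma=15)

def prob_within(days):
    """Probability that the baby arrives within +/- `days` of the due date."""
    return arrival_offset.cdf(days) - arrival_offset.cdf(-days)

p_within_7 = prob_within(7)
p_within_15 = prob_within(15)
p_within_30 = prob_within(30)

print(f"within one week of the due date: {p_within_7:.0%}")
print(f"within 15 days (one sd):         {p_within_15:.0%}")
print(f"within 30 days (two sd):         {p_within_30:.0%}")
```

Under that assumption, only about a third of births fall within a week of the due date, and roughly one in three falls more than fifteen days away from it.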