Kindle Notes & Highlights
It is the peculiar and perpetual error of the human understanding to be more moved and excited by affirmatives than by negatives. Francis Bacon, Novum Organum (1620)
For a scientific finding to be worth taking seriously, it can’t be something that occurred because of random chance, or a glitch in the equipment, or because the scientist was cheating or dissembling. It has to have really happened. And if it did, then in principle I should be able to go out and find broadly the same results as yours. In many ways, that’s the essence of science, and something that sets it apart from other ways of knowing about the world: if it won’t replicate, then it’s hard to describe what you’ve done as scientific at all.
Perhaps Bem himself best summed up many scientists’ attitudes to replication, in an interview some years after his infamous study. ‘I’m all for rigor,’ he said, ‘but I don’t have the patience for it … If you looked at all my past experiments, they were always rhetorical devices. I gathered data to show how my point would be made. I used data as a point of persuasion, and I never really worried about, “Will this replicate or will this not?”’13 Worrying about whether results will replicate or not isn’t optional. It’s the basic spirit of science; a spirit that’s supposed to be made manifest in
...more
Other books feature scientists taking the fight to a rogues’ gallery of pseudoscientists: creationists, homeopaths, flat-Earthers, astrologers, and their ilk, who misunderstand and abuse science – usually unwittingly, sometimes maliciously, and always irresponsibly.14 This book is different. It reveals a deep corruption within science itself: a corruption that affects the very culture in which research is practised and published. Science, the discipline in which we should find the harshest scepticism, the most pin-sharp rationality and the hardest-headed empiricism, has become home to a
...more
doing science involves much more than just running experiments or testing hypotheses. Science is inherently a social thing, where you have to convince other people – other scientists – of what you’ve found. And since science is also a human thing, we know that any scientist will be prone to human characteristics such as irrationality, biases, lapses in attention, in-group favouritism and outright cheating to get what they want.
Much can be learned from mistakes. On one of his albums, the musician Todd Rundgren has a spoken-word introduction encouraging the listener to play a little game he calls ‘Sounds of the Studio’. Rundgren describes all the missteps that can be made when recording music: hums, hisses, pops on the microphone whenever someone sings a word with a ‘p’ in it, choppy editing, and so on. He suggests listening out for these mistakes in the songs that follow, and on other records. And just as a better understanding of recording studio slip-ups can give you a new insight into how music is made,
...more
Most people, including scientists, assume peer review has always been a crucial feature of scientific publication, but its history is more complicated. Although in the seventeenth century the Royal Society tended to ask some of its members whether they thought a paper was interesting enough to publish in Philosophical Transactions, requiring them to provide a written evaluation of each study wasn’t tried until at least 1831.15 Even then, the formal peer review system we know today didn’t become universal until well into the twentieth century (as you can tell from a letter Albert Einstein sent
...more
What ensures that the participants in the process just described – the researcher who submits the paper, the editor at the journal, the peers who review it – all conduct themselves with the honesty and integrity that trustworthy science requires? There’s no law requiring that everyone acts fairly and rationally when evaluating science, so what’s needed is a shared ethos, a set of values that aligns the scientists’ behaviour.19 The best-known attempt to write down these unwritten rules is that of the sociologist Robert Merton. In 1942, Merton set out four scientific values, now known as the
...more
Replication, then, has long been a key part of how science is supposed to work – and incidentally, it’s another of its social aspects, with results only being taken seriously after they’ve been corroborated by multiple observers. But somewhere along the way, between Boyle and modern academia, a great many scientists forgot about the importance of replication. In the collision of our Mertonian ideals with the realities of the scientific publication system – not to mention the realities of human nature – the ideals have proven the more fragile, leaving us with a scientific literature full of
...more
studies we saw above).32 In general, though, the effect on psychology has been devastating. This wasn’t just a case of fluffy, flashy research like priming and power posing being debunked: a great deal of far more ‘serious’ psychological research (like the Stanford Prison Experiment, and much else besides) was also thrown into doubt. And neither was it a matter of digging up some irrelevant antiques and performatively showing that they were bad – like when Pope Stephen VI, in the year 897, exhumed the corpse of one of his predecessors, Pope Formosus, and put it on trial (it was found guilty).
...more
In economics, a miserable 0.1 per cent of all articles published were attempted replications of prior results; in psychology, the number was better, but still nowhere near good, with an attempted replication rate of just over 1 per cent.40 If everyone is constantly marching onwards to new findings without stopping to check if our previous knowledge is robust, is the above list of replication failures that much of a surprise?
You’d think that if you obtained the exact same dataset as was used in a published study, you’d be able to derive the exact same results that the study reported. Unfortunately, in many subjects, researchers have had terrible difficulty with this seemingly straightforward task. This is a problem sometimes described as a question of reproducibility, as opposed to replicability (the latter term usually being reserved for studies that ask the same questions of different data). How is it possible that some results can’t even be reproduced? Sometimes it’s due to errors in the original study.
...more
Although we’ve seen poor replicability in some weightier areas, such as economic theory, how much difference can it make to our lives if a bunch of academics end up disagreeing over whether power posing works, or whether alpha-male sparrows have more black patches of feathers? There are two responses to make. The first is that there’s a wider principle at stake here: science is crucial to our society, and we mustn’t let any of it be compromised by low-quality, unreplicable studies. If we let standards slip in any part of science, we risk tarnishing the reputation of the scientific enterprise
...more
a more comprehensive study took a random sample of 268 biomedical papers, including clinical trials, and found that all but one of them failed to report their full protocol, meaning that, once again, you’d need additional details beyond the paper even to try to replicate the study.53 Another analysis found that 54 per cent of biomedical studies didn’t even fully describe what kind of animals, chemicals or cells they used in their experiment.54 Let’s take a moment to think about how odd this is. If a paper only provides a superficial description of a study, with necessary details only appearing
...more
Replication problems in medicine haven’t only affected laboratory-based, preclinical research: they can have direct effects on the treatments doctors give to their patients. Common treatments, it turns out, are often based on low-quality research; instead of being solidly grounded in evidence, accepted medical wisdom is regularly contradicted by new studies. This phenomenon occurs so frequently that the medical scientists Vinay Prasad and Adam Cifu have a name for it: ‘medical reversal’.57
It’s understandable that doctors, and those who write the guidelines for medical treatments, sometimes find themselves relying on low-quality evidence. Often the alternative is no evidence at all, and their job is to help patients who need treatment right now. And it’s inevitable that advances in technology, methodology, and funding allow scientists to do better research today than was possible a few years ago: that’s normal scientific progress. But scientists have let doctors and patients down by creating such a constant state of flux in the medical literature, running and publishing
...more
A 2016 survey of over 1,500 researchers – though admittedly not a properly representative one, since it just involved those who filled in a questionnaire on the website of the journal Nature – found that 52 per cent thought there was a ‘significant crisis’ of replicability. A further 38 per cent believed there was, in a somewhat peculiar turn of phrase, at least a ‘slight crisis’.69 Nearly 90 per cent of chemists said that they’d had the experience of failing to replicate another researcher’s result; nearly 80 per cent of biologists said the same, as did almost 70 per cent of physicists,
...more
What led to such a drawn-out process, where even the horrific, painful deaths of multiple patients hadn’t moved the university to fire, or even censure, the man responsible? Indeed, what led to the university lashing out against the very people who blew the whistle on the falsified results?33 Macchiarini clearly used his ‘breakthrough’ operations to burnish his reputation and fame. But he was also an asset to the Karolinska Institute and its planned international expansion. It has been suggested that the university’s association with the superstar surgeon helped to grease the wheels of its new
...more
The fact that the scientific community so proudly cherishes an image of itself as objective and scrupulously honest – a system where fraud is complete anathema – might, perversely, be what prevents it from spotting the bad actors in its midst. The very idea of villains like Macchiarini existing in science is so abhorrent that many adopt a see-no-evil attitude, overlooking even the most glaring signs of scientific misconduct. Others are in denial about the prevalence and the effects of fraud. But as we’ll see in this chapter, fraud in science is not the vanishingly rare scenario that we
...more
Both measurement error and sampling error are unpredictable, but they’re predictably unpredictable. You can always expect data from different samples, measures or groups to have somewhat different characteristics – in terms of the averages, the highest and lowest scores, and practically everything else. So even though they’re normally a nuisance, measurement error and sampling error can be useful as a means of spotting fraudulent data. If a dataset looks too neat, too tidily similar across different groups, something strange might be afoot. As the geneticist J. B. S. Haldane put it, ‘man is an
...more
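To make the idea of ‘predictably unpredictable’ variation concrete, here is a minimal sketch of my own, not from the book: genuine random samples drawn from the same population still produce group means that scatter by roughly the amount sampling theory predicts, whereas made-up groups whose means all hover implausibly close to one another stand out. The numbers in the ‘fabricated’ list below are invented for illustration.

```python
# A toy illustration (not from the book) of why 'too neat' data is suspicious:
# honest samples differ from one another because of sampling error alone.
import random
import statistics

random.seed(1)

def sample_means(n_groups=10, n_per_group=20, mu=100, sd=15):
    """Draw several genuine random samples and return each group's mean."""
    return [
        statistics.mean(random.gauss(mu, sd) for _ in range(n_per_group))
        for _ in range(n_groups)
    ]

honest = sample_means()

# Hypothetical 'fabricated' groups whose means were made to look alike:
fabricated = [100.1, 99.9, 100.0, 100.2, 99.8, 100.1, 100.0, 99.9, 100.1, 100.0]

# Sampling error alone should make group means scatter by roughly sd / sqrt(n).
print("Spread of honest group means:      ", round(statistics.stdev(honest), 2))
print("Spread of 'fabricated' group means:", round(statistics.stdev(fabricated), 2))
print("Expected from sampling error alone ~", round(15 / 20 ** 0.5, 2))
```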
The immunologist and Nobelist Sir Peter Medawar has argued, perhaps counter-intuitively, that scientists who commit fraud care too much about the truth, but that their idea of what’s true has become disconnected from reality. ‘I believe,’ he wrote, ‘that the most important incentive to scientific fraud is a passionate belief in the truth and significance of a theory or hypothesis which is disregarded or frankly not believed by the majority of scientists – colleagues who must accordingly be shocked into recognition of what the offending scientist believes to be a self-evident truth.’103 The
...more
Particularly interesting for our hunt for the motives of fraudsters, though, is what comes at the end of the lengthy investigation report: a short reply from Schön himself. He pleads that, although he ‘made mistakes’ and ‘realise[s] there is a lack of credibility’ he also ‘truly believe[s] that the reported scientific effects are real, exciting, and worth working for’.110 We should of course be extremely wary about taking fraudsters at their word. But Schön’s apparent faith in his theory, even after so much of his work had been expunged from the scientific literature (he’s currently fifteenth
...more
The obesity researcher Eric Poehlman, for example, the first scientific fraudster to be jailed in America, wasted millions of dollars of taxpayer money in grants from the US government in producing a decade’s worth of useless, fabricated data.114 And how many perfectly innocent scientists have wasted their own research grant money trying to follow or replicate his work, or that of other fraudsters? Aside from the waste, fraud has a terribly demoralising effect on scientists.
The acceptance of Wakefield’s study in as prominent an outlet as the Lancet will go down as one of the worst decisions in the history of scientific publication. There could hardly be a more concrete example of the importance of reliable science for society’s wellbeing, and of the failure of the peer-review system to screen out bad research. It brings us back once again to the issue of trust – this time, the public’s trust in science. Having your child vaccinated is an act of commission: you’re actively having something done to them, trusting that the medics are right that it’s safe.143 If a
...more
An adopted hypothesis gives us lynx-eyes for everything that confirms it and makes us blind to everything that contradicts it. Arthur Schopenhauer, The World as Will and Presentation (1818)
Science … commits suicide when it adopts a creed. T. H. Huxley, ‘The Darwin Memorial’ (1885)
There’s an age-old philosophical question that goes: ‘Why is there something rather than nothing?’ We can pose a similar query about the scientific process: ‘Why do studies always find something rather than nothing?’
There’s a straightforward, but devastating, reason for this persistent positivity: scientists choose whether to publish studies based on their results. In a perfect world, the methodology of a study would be all that matters: if everyone agrees it’s a good test of its hypothesis, from a well-designed piece of research, it gets published. This would be a true expression of the Mertonian norm of disinterestedness, where scientists are supposed to care not about their specific results (the very idea of having a ‘pet theory’ is an affront to this norm) but just the rigour with which they’re
...more
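As a rough illustration of how result-dependent publication distorts the literature, here is a toy simulation of my own (the effect size, sample sizes and simple z-test are all invented, simplifying assumptions): many small studies of a modest true effect are run, but only the ‘significant’ ones get published, and the average published effect ends up well above the truth.

```python
# A toy simulation (my illustration, not the book's) of publication bias:
# if only 'significant' studies are published, the published literature
# overstates the true effect.
import math
import random
import statistics

random.seed(42)
TRUE_EFFECT = 0.2   # true group difference, in standard-deviation units
N = 30              # participants per group in each small study

def run_study():
    """Simulate one two-group study; return its estimated effect and p-value."""
    treat = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    diff = statistics.mean(treat) - statistics.mean(control)
    se = math.sqrt(2 / N)                                        # known-variance approximation
    z = diff / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))    # two-sided p
    return diff, p

studies = [run_study() for _ in range(5000)]
published = [d for d, p in studies if p < 0.05]   # only 'positive' results survive

print("True effect:                   ", TRUE_EFFECT)
print("Average effect, all studies:   ", round(statistics.mean(d for d, _ in studies), 2))
print("Average effect, published only:", round(statistics.mean(published), 2))
```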
Despite being one of the most commonly used statistics in science, the p-value has a notoriously tricky definition. A recent audit found that a stunning 89 per cent of a sample of introductory psychology textbooks got the definition wrong; I’ll try to avoid making the same mistake here.16 The p-value is the probability that your results would look the way they look, or would seem to show an even bigger effect, if the effect you’re interested in weren’t actually present.17 Notably, the p-value doesn’t tell you the probability that your result is true (whatever that might mean), nor how
...more
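For what it’s worth, here is a small sketch of that definition in code, using made-up numbers of my own: simulate the experiment many times in a world where the effect does not exist, and the p-value is (approximately) the fraction of those simulations that produce a difference at least as large as the one actually observed.

```python
# A sketch (my own example, not from the book) of what a p-value means:
# simulate the study many times assuming NO real effect, and ask how often
# a difference at least as extreme as the observed one turns up by chance.
import random
import statistics

random.seed(7)
N = 25  # hypothetical participants per group

# Suppose our (made-up) experiment found a difference of 0.6 between group means.
observed_diff = 0.6

def null_difference():
    """One simulated study in a world where the treatment does nothing."""
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    return statistics.mean(a) - statistics.mean(b)

simulations = [null_difference() for _ in range(20000)]
p_value = sum(abs(d) >= observed_diff for d in simulations) / len(simulations)
print("Approximate two-sided p-value:", round(p_value, 3))
```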
Outside of exceptions such as the Higgs boson, though, the 0.05 threshold remains, through conformity, tradition and inertia, the most widely used criterion today. It has scientists feverishly rifling through their statistical tables, checking for p-values lower than 0.05 so that they can report their results as being statistically significant. It’s easy to forget the arbitrariness. Richard Dawkins has bemoaned the ‘discontinuous mind’: our human tendency to think in terms of distinct, sharply defined categories rather than the messy, blurry, ambiguous way the world really is.25 One example is
...more
Perhaps the scientists who ran these studies thought something like: ‘well, it was only a small study, and the small effect I found is probably just due to noisy data. Come to think of it, I was silly even to expect to find an effect here! There’s no point in trying to publish this.’ Crucially, though, this post hoc rationalisation wouldn’t have occurred to them if the same small-sample study, with its potentially noisy data, happened to show a large effect: they’d have eagerly sent off their positive results to a journal. This double standard, based on the entrenched human tendency towards
...more
In business and politics, one of the great villains is the Yes Man. Countless books tell aspiring managers and leaders to be careful never to surround themselves with people who’ll just nod along with all their decisions, even the bad ones. Winston Churchill proclaimed that ‘the temptation to tell a chief in a great position the things he most likes to hear is one of the commonest explanations of mistaken policy. Thus the outlook of the leader on whose decisions fateful events depend is usually far more sanguine than the brutal facts admit.’42 In science, publication bias makes Yes Men out of
...more
Wansink wrote a blog post about how he’d encouraged one of his graduate students to analyse a dataset he’d collected in a New York pizza restaurant.46 The original hypothesis, he said, had ‘failed’, but instead of publishing those null results or file-drawering the whole thing, he told the student that ‘there’s got to be something [in the dataset] we can salvage’. She complied, and ‘every day … came back with puzzling new results … and every day … c[a]me up with another way to reanalyse the data’. By openly admitting to dredging through his datasets looking for anything that was ‘significant’,
...more
This latter type of p-hacking is known as HARKing, or Hypothesising After the Results are Known. It’s nicely summed up by the oft-repeated analogy of the ‘Texas sharpshooter’, who takes his revolver and randomly riddles the side of a barn with gunshots, then swaggers over to paint a bullseye around the few bullet holes that happen to be near to one another, claiming that’s where he was aiming all along.50
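A rough simulation of my own (the sample sizes and number of outcomes are invented for illustration, and the test is a simplified z-test) shows why the sharpshooter so rarely goes home empty-handed: measure enough unrelated outcomes on data containing no real effects, and the chance that at least one crosses p < 0.05 becomes much larger than 5 per cent.

```python
# A toy sketch (my illustration) of the Texas sharpshooter problem: with
# enough outcomes measured on pure-noise data, something will come out
# 'significant', ready to have a hypothesis painted around it.
import math
import random

random.seed(3)
N = 40           # hypothetical participants per group
N_OUTCOMES = 20  # number of different outcomes 'measured' in each study

def p_for_one_outcome():
    """Two-sided p-value for one outcome when the true effect is zero."""
    diff = (sum(random.gauss(0, 1) for _ in range(N)) / N
            - sum(random.gauss(0, 1) for _ in range(N)) / N)
    z = diff / math.sqrt(2 / N)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

trials = 2000
hits = sum(
    any(p_for_one_outcome() < 0.05 for _ in range(N_OUTCOMES))
    for _ in range(trials)
)
print(f"Studies with at least one 'significant' result: {hits / trials:.0%}")
# With 20 independent outcomes at alpha = 0.05, theory expects about
# 1 - 0.95**20, i.e. roughly 64 per cent.
```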
Did Carney receive a torrent of vile abuse, or lose her job, for admitting all this? No. In fact, the reaction was the precise opposite. A search of Twitter, often maligned as a haven for online bullies, reveals other researchers (rightly) calling Carney’s statement ‘brave’, ‘impressive’, ‘admirable’, ‘the way forward’, an example of ‘how to deal with failure to replicate one’s work’ and a ‘remarkable demonstration of scientific integrity’. One neuroscientist called Carney an ‘intellectual/academic hero’.64 I couldn’t find a single negative reaction – except the one from Amy Cuddy herself, who
...more
Similarly disheartening findings come from surveys in other fields. A survey of biomedical statisticians in 2018 found that 30 per cent had been asked by their scientist clients to ‘interpret the statistical findings on the basis of expectations, not the actual results’, while 55 per cent had been asked to ‘stress only significant findings but underreport nonsignificant ones’.67 Thirty-two per cent of economists in another survey admitted to ‘present[ing] empirical findings selectively so that they confirm[ed their] argument’, while 37 per cent said they had ‘stopped statistical analysis when they had
...more
if you already believe your hypothesis is true before testing it, it can seem eminently reasonable to give any uncertain results a push in the right direction. And whereas the true fraudster knows that they’re acting unethically, everyday researchers who p-hack often don’t.
Endless choices offer endless opportunities for scientists who begin their analysis without a clear idea of what they’re looking for. But as should now be clear, more analyses mean more chances for false-positive results. As the data scientists Tal Yarkoni and Jake Westfall explain, ‘The more flexible a[n] … investigator is willing to be – that is, the wider the range of patterns they are willing to ‘see’ in the data – the greater the risk of hallucinating a pattern that is not there at all.’71
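One common form of this flexibility is peeking: testing the data, and collecting a few more participants whenever the result isn’t yet significant. A toy simulation of my own, under simplified assumptions (known variance, a basic z-test, invented sample sizes), suggests how even this single flexible choice inflates the false-positive rate above the nominal 5 per cent.

```python
# A toy sketch (my own, under simplified assumptions) of analytic flexibility:
# 'optional stopping'. Peek at null data, and if p isn't yet below 0.05,
# add more participants and test again - each peek is another chance.
import math
import random

random.seed(11)

def p_value(a, b):
    """Two-sided p for a difference in means, assuming known unit variance."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / math.sqrt(1 / len(a) + 1 / len(b))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def study_with_peeking(start=20, step=10, max_n=60):
    """Keep adding participants and re-testing until p < 0.05 or we give up."""
    a = [random.gauss(0, 1) for _ in range(start)]
    b = [random.gauss(0, 1) for _ in range(start)]
    while True:
        if p_value(a, b) < 0.05:
            return True                      # 'significant' - stop and write it up
        if len(a) >= max_n:
            return False
        a += [random.gauss(0, 1) for _ in range(step)]
        b += [random.gauss(0, 1) for _ in range(step)]

trials = 4000
fp = sum(study_with_peeking() for _ in range(trials)) / trials
print(f"False-positive rate with repeated peeking: {fp:.0%} (nominally 5%)")
```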
Even without the trial-and-error of classic p-hacking, then, scientists who don’t come to their data with a proper plan can end up analysing themselves into an unreplicable corner. Why unreplicable? Because when they reach each fork in the path, the scientist is being strung along by the data: making a choice that looks like it might lead to p < 0.05 in that dataset, but that won’t necessarily do the same in others. This is the trouble with all kinds of p-hacking, whether explicit or otherwise: they cause the analysis to – using the technical term – overfit the data.73 In other words, the
...more
This is what scientists are unwittingly doing when they p-hack: they’re making a big deal of what is often just random noise, counting it as part of the model instead of as nuisance variation that should be disregarded in favour of the real signal (if a signal exists in the first place). Woe betide anyone who takes a p-hacked, overfitted model and tries to replicate it in a different sample: its results were contingent on the specific forking paths its creators followed through their noisy data, so it’ll probably tell us very little about the world beyond that single dataset.
You can see why scientists are tempted by overfitting. If you focus only on your own data and forget that your job is to make general statements about the world, a perfectly fitting model like graph 3C is very alluring: there are no uncertainties, no messy datapoints that evade the line you’ve drawn. The neatness alone is not what makes it so compelling, though: simply connecting the dots in the graph doesn’t require any scientific knowledge. But a paper that sounds as if you had come up with the specific shape of the line (your theory) before collecting the data? Now you’ve got the scientific
...more
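To put the connect-the-dots temptation in miniature, here is a toy example of my own (the straight-line ‘world’, the noise level and the sample are all invented, and the statistics.linear_regression call used below requires Python 3.10 or later): a model that simply memorises one noisy dataset fits it perfectly, but fares worse than a plain fitted line as soon as it meets a fresh sample from the same process.

```python
# A small sketch (my own toy example) of overfitting: a model that reproduces
# one noisy dataset perfectly is describing the noise, not the signal, so it
# does worse than a simple straight line on fresh data.
import random
import statistics

random.seed(5)
XS = list(range(10))

def sample(xs):
    """The 'real world': y = 2x plus random noise."""
    return [2 * x + random.gauss(0, 3) for x in xs]

train = sample(XS)
test = sample(XS)   # a replication: same x values, new noise

# Overfitted 'model': memorise the training data point by point.
overfit = dict(zip(XS, train))

# Simple model: an ordinary least-squares line through the training data.
slope, intercept = statistics.linear_regression(XS, train)

def mse(preds, actual):
    """Mean squared prediction error."""
    return statistics.mean((p - a) ** 2 for p, a in zip(preds, actual))

print("Error on the ORIGINAL data:")
print("  memorised:", round(mse([overfit[x] for x in XS], train), 1),
      " line:", round(mse([slope * x + intercept for x in XS], train), 1))
print("Error on NEW data (the replication):")
print("  memorised:", round(mse([overfit[x] for x in XS], test), 1),
      " line:", round(mse([slope * x + intercept for x in XS], test), 1))
```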
Stephen Jay Gould referred to science as ‘a profession that awards status and power for clean and unambiguous discovery’ [my italics]. The social psychologist Roger Giner-Sorolla concurs: ‘In a head-to-head competition between papers … the paper with results that are all significant and consistent will be preferred over the equally well-conducted paper that reports the outcome, warts and all, to reach a more qualified conclusion.’74 Here we see how publication bias and p-hacking are two manifestations of the same phenomenon: a desire to erase results that don’t fit well with a preconceived
...more
The desire for attractive-looking results affects even the ‘hardest’ sciences. In her book Lost in Math, the physicist Sabine Hossenfelder argues that physicists have gotten high on their own supply, focusing on the elegance and beauty of models such as string theory at the expense of being able to test, in practice, whether they’re actually true.77 Although the lofty, mathematical work of these string theorists feels like it could hardly be further from the (almost literally) kitchen-sink science of Brian Wansink, both kinds of research can become saturated with the same kinds of
...more
It’s impossible to know for sure how many patients have been given useless medical treatments – and false hope – because of p-hacked clinical trials, but the number is surely enormous.83 Think back to the meta-analyses we covered earlier. Even leaving aside the fact that some research is often missing due to publication bias, if the studies included in the meta-analysis are all themselves exaggerated by p-hacking, the overall combined effect – in what’s supposed to be an authoritative overview of all the knowledge on the topic – will end up far from reality.84 You might wonder how doctors and
...more
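To illustrate the point about meta-analyses with a made-up example of my own: a standard fixed-effect meta-analysis is just an inverse-variance weighted average of the individual studies’ estimates, so if those estimates are systematically inflated, pooling them simply yields a precise-looking estimate of the wrong number. All figures below are hypothetical.

```python
# A hedged sketch (my own simplified example) of why meta-analysis cannot
# rescue biased inputs: pooling exaggerated estimates gives a tight-looking
# confidence interval around an exaggerated answer.
import math

# Hypothetical standardised effect estimates and standard errors from five
# published studies. Suppose the true effect is 0.1, but p-hacking and
# publication bias have pushed the published estimates up to around 0.5.
estimates = [0.55, 0.48, 0.52, 0.60, 0.45]
std_errors = [0.20, 0.25, 0.18, 0.22, 0.24]

# Fixed-effect meta-analysis: an inverse-variance weighted average.
weights = [1 / se ** 2 for se in std_errors]
pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect: {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.2f} to {pooled + 1.96 * pooled_se:.2f})")
print("True effect assumed in this toy example: 0.10")
```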
In the US, where numbers are easily available, just over a third of registered medical trials in recent years were funded by the pharmaceutical industry.85 To what extent does this funding, from companies who plan to market the drug if it works, influence the results? The consensus of meta-scientific research on clinical trials is that industry-funded drug trials are indeed more likely to report positive results. In a recent review, for every positive trial funded by a government or non-profit source, there were 1.27 positive trials by drug companies.86
It’s currently a requirement at most journals that you disclose at the end of any published paper, in a conflict-of-interest section, any money you’ve received for, say, consulting for a pharmaceutical company.89 But other kinds of financial conflicts aren’t treated the same way. Many scientists, for example, forge lucrative careers based on their scientific results, producing bestselling books and regularly being paid five- or six-figure sums for lectures, business consulting and university commencement addresses.90 People should of course be allowed to pay whatever they like for book deals,
...more
the scientist means well and wants to feel like their research is beneficial. We could even call this the ‘meaning well bias’. It’s crushing when the trial you’ve designed to test your new treatment provides null results, meaning we’re no closer to helping sick people. It’s disheartening when you hypothesise about the link of some biological factor to a disease, and it turns out you’ve been barking up the wrong tree. Or, at least, it might feel that way if you have the wrong attitude to science. The currency of positive, statistically significant results in science is so strong that many
...more
What about researchers who have a more ideological or political stake in the truth or falsity of a result? One of the more remarkable conflict-of-interest sections I’ve come across is in a public health paper on the so-called ‘Glasgow Effect’. This is the phenomenon whereby people from Glasgow, and Scotland more generally, die younger on average than those in other similar cities or countries, even after accounting for levels of poverty and deprivation. After reviewing the evidence on this effect, the paper concluded that the root of the unique problem was the ‘political attack’ on Scotland
...more
The psychologist and philosopher Cordelia Fine – a perspicacious critic of shoddy, often p-hacked studies that purport to explain behavioural differences between the sexes by straightforwardly linking them to testosterone levels, in the process forgetting about social explanations – has also addressed the issue of males being treated as the ‘default’ in medical research, with females being considered a secondary concern or even an aberration.107 In a 2018 opinion piece in the Lancet, Fine suggested that ‘feminist science’ could redress the balance by highlighting these kinds of omissions. She
...more
Trying to correct for bias in science by injecting an equal and opposite dose of bias only compounds the problem, and potentially invites a vicious cycle of ever-increasing division between different ideological camps. Not only that, but the suggestion that scientists should feel proud to let their ideological views impinge on their research seems to offend against both the Mertonian norms of disinterestedness (since it involves allowing non-scientific concerns to encroach on research) and universalism (since it might involve holding scientific arguments to a different standard depending on the
...more