Kindle Notes & Highlights
Biases are an unavoidable part of human nature and it’s naïve to think they could ever be eradicated from anything that we do. We do, however, have tools that are supposed to bring a little more objectivity to the table. Statistical analysis itself is aimed at taking decisions out of our biased human hands – yet we’ve seen how easy it is for the numbers to be nudged in our favour. Peer review is supposed to act as a check on our prejudices – yet we’ve seen how the attempt to persuade reviewers and editors to publish our work leads to inconvenient results being hidden away entirely or else
…
In 2016, the psychologist Matti Heino applied the GRIM test to one of the most famous psychology papers of all time: Leon Festinger and James Carlsmith’s 1959 paper on ‘cognitive dissonance’13. This is the now widely known idea that forcing someone to say or do something inconsistent with their true beliefs will make them psychologically uncomfortable and they’ll do their best to alter those beliefs to make them fit with what they’ve been made to say or do. In the 1959 study, participants were made to complete some tedious, pointless tasks, such as endlessly twisting pegs around on a pegboard.
…
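A note on the mechanics: the GRIM test is simple enough to sketch in a few lines. The Python below is a minimal illustration (not Heino's actual code), assuming the raw responses are whole numbers, so any true mean must be an integer total divided by the sample size.

```python
# Minimal sketch of the GRIM (Granularity-Related Inconsistency of
# Means) test: with n integer responses, a reported mean is only
# possible if some whole-number total divided by n rounds to it.

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """True if `reported_mean` could arise from n integer scores."""
    approx_total = round(reported_mean * n)
    # Try totals just around mean * n, to absorb rounding either way.
    for total in (approx_total - 1, approx_total, approx_total + 1):
        if round(total / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# A mean of 2.57 from 20 integer ratings is impossible: 51/20 = 2.55
# and 52/20 = 2.60, and nothing in between can occur.
print(grim_consistent(2.57, 20))  # False -> fails the GRIM test
print(grim_consistent(2.55, 20))  # True  -> numerically possible
```

Applied to every mean reported in a paper, impossible values stand out immediately.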
Running a study with low statistical power is like setting out to look for distant galaxies with a pair of binoculars: even if what you’re looking for is definitely out there, you have essentially no chance of seeing it. Sadly, this point seems to have passed many scientists by, not least in Macleod’s chosen field of animal research. A 2013 review looked across a variety of neuroscientific studies, including, for example, research on sex differences in the ability of mice to navigate mazes.43 To have enough statistical power to detect the typically expected sex effect in maze-navigating
…
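For concreteness, power is just a probability you can calculate: the chance that a study of a given size detects an effect of a given magnitude at a given significance threshold. An illustrative sketch using the statsmodels library, with an effect size and sample sizes invented for the example rather than taken from the 2013 review:

```python
# Illustrative power calculation for a two-group comparison.
# All numbers are invented; d = 0.3 is a modest standardised effect.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# With 20 animals per group, power is dismal: the great majority of
# such studies would miss a real effect of this size.
power = analysis.solve_power(effect_size=0.3, nobs1=20, alpha=0.05)
print(f"Power with 20 per group: {power:.2f}")        # roughly 0.15

# Sample size per group needed for the conventional 80% power:
n = analysis.solve_power(effect_size=0.3, power=0.80, alpha=0.05)
print(f"Needed per group for 80% power: {n:.0f}")     # roughly 175
```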
At the risk of sounding tautological: since underpowered studies only have the power to detect large effects, those are the only effects they see. This is where the logic leads. If you find an effect in an underpowered study, that effect is probably exaggerated.47 Then comes publication bias: since large effects are exciting effects, they’re much more likely to go on to get published. That’s why, when reading the scientific literature, so many tiny studies seem to be reporting big effects: as we saw with the funnel plots in the last chapter, journals are often missing all the small studies
…
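That logic can be checked by simulation: generate many underpowered studies of the same small true effect, let only the 'significant' ones survive (as publication bias does), and look at the surviving estimates. A rough sketch with invented numbers:

```python
# Winner's curse by simulation: among underpowered studies that
# happen to cross p < .05, the estimated effect is inflated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect, n_per_group = 0.2, 20   # small effect, small samples

published = []
for _ in range(10_000):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:                      # only "significant" results survive
        published.append(treated.mean() - control.mean())

print(f"True effect: {true_effect}")
print(f"Mean published estimate: {np.mean(published):.2f}")  # ~0.7
```

The surviving estimates average roughly three times the true effect – precisely the pattern of small studies reporting big effects.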
When it comes to studies of such extraordinarily complex systems as the body or the brain, or an ecosystem, the economy or society, it’s rare for scientists to find one factor that has a massive effect on another. Instead, most of the psychological, social and even medical phenomena we’re interested in are made up of lots of small effects, each of them playing a small role. For example, if economists want to explain why different people in their sample have different incomes, they’d need to take into account where the participants live, their family backgrounds, their abilities, personalities
…
Reading through the candidate gene literature is, in hindsight, a surreal experience: they were building a massive edifice of detailed studies on foundations that we now know to be completely false. As Scott Alexander of the blog Slate Star Codex put it: ‘This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.’61
In surveys of the public, scientists are generally described as being very competent.66 The most frustrating thing about the errors documented in this chapter is that the overwhelming majority of scientists do indeed know better. They know to check thoroughly for typos and other slip-ups, they know about the importance of randomisation and blinding, and they know (and have known since the 1950s) that contamination of cell lines is a grave problem for fields like oncology. They know, even from their undergraduate statistics lectures, that statistical power is a vital consideration, especially
…
The scenario where an innocent researcher is minding their own business when the media suddenly seizes on one of their findings and blows it out of proportion is not at all the norm.16 The main problem with press releases isn’t that they routinely report earth-shaking findings that turn out to be erroneous. Instead, it’s that they puff up the results of often perfectly serviceable scientific papers, making them seem more important, ground-breaking, or relevant to people’s lives than they really are.
The first was unwarranted advice: press releases gave recommendations for ways readers should change their behaviour – for example, telling them to engage in a specific kind of exercise – that were more simplistic or direct than the results of the study could support.
Another variety of hype was the cross-species leap. As we’ve seen previously, lots of preclinical medical research is done using non-human animals like rats and mice – a practice known as translational research, or animal modelling.18 The idea is that the basic principles of how, say, the brain or the gut or the heart work can be studied in the animal ‘model’ and then, with lots of work, the findings will eventually translate to humans, helping us design better treatments. Yet there are a lot of steps between making a discovery in mice (or in cells in a dish, or in computer simulations) and it
…
The psychophysiologist James Heathers has set up a novelty Twitter account that exists solely to retweet misleading news headlines from translational studies, such as ‘Scientists Develop Jab that Stops Craving for Junk Food’ or ‘Compounds in Carrots Reverse Alzheimer’s-Like Symptoms’ with a simple but accurate addition: ‘… IN MICE’.21 The third kind of hype found by the Cardiff team was possibly the most embarrassing. Everyone, especially scientists, is supposed to know that correlation is not causation.22 This basic insight is taught in every elementary statistics course and is a perennial
…
The Cardiff researchers found that if the press release exaggerated the claim first, similar exaggeration in the media was 6.5 times more likely for advice claims, 20 times more likely for causal claims, and a whopping 56 times more likely for translational claims. (When the press release was more circumspect, journalists only exaggerated a small amount.) And although this itself was merely a correlational study, the Cardiff team followed up in 2019 with an impressive randomised trial.26 They worked with university press offices to modify randomly selected press releases by adding unwarranted
…
A study in 2017 found that only around 50 per cent of health studies covered in the media are eventually confirmed by meta-analyses (that is, 50 per cent are found to be broadly replicable). This finding would be scandalous enough on its own, but what makes it even worse is the fact that those meta-analyses are rarely, if ever, covered in the press.29 By that time the damage might already have been done. With apologies to Jonathan Swift: hyped science flies, and the refutations come limping after it. Arguably, the kind of ephemeral hype that’s spread by news articles isn’t the most concerning
…
The originator of the mindset concept, the Stanford psychologist Carol Dweck, has published hundreds of scientific papers on it, but her book Mindset: The New Psychology of Success is where the idea really hit the big time. She makes the notion of mindsets sound potentially life-altering. ‘The view you adopt for yourself profoundly affects the way you lead your life,’ writes Dweck. And: ‘When you enter a mindset, you enter a new world.’31 Indeed, when you learn about mindsets, ‘You’ll suddenly understand the greats – in the sciences and arts, in sports, and in business – and the
…
Even with such a small benefit, if you could roll it out across thousands or millions of students, then on aggregate you might do a decent amount of good.36 But that’s not how Dweck chose to frame growth mindset, nor would that kind of framing have made parents and teachers flock to buy her book. Instead, she hyped up its individual effects, making it sound almost revelatory.37 The risk of such overhyping is that teachers and politicians begin to view ideas like mindset as a kind of panacea for education, focusing time and resources on them that might be better spent on dealing with the
…
Maybe I’m missing the point here. Maybe popular-science books, which are commercial enterprises, don’t need to be 100 per cent rigorously accurate, resistant to every nit-picking criticism. Maybe writing easy-to-digest treatments of scientific findings, even if they’re a little on the simplified side, is beneficial overall, since it promotes science and makes it relevant to people’s lives. And wouldn’t we rather that such books were being written by people who at least nod at the evidence? There’s some merit in this kind of argument, but it’s bad news in the long run. Letting the facts slide
…
The most glamorous journals state on their websites that they want papers that have ‘great potential impact’ (Nature); that are ‘most influential in their fields’ and ‘present novel and broadly important data’ (Science); and that are of ‘unusual significance’ (Cell) or ‘exceptional importance’ (Proceedings of the National Academy of Sciences).59 Conspicuous by their absence from this list are any words about rigour or replicability – though hats off to the New England Journal of Medicine, the world’s top medical journal, for stating that it’s looking for ‘scientific accuracy, novelty, and
…
There’s a whole literature of studies by scientific spin-watchers, each of them highlighting spin in their own fields. 15 per cent of trials in obstetrics and gynaecology spun their non-significant results as if they showed benefits of the treatment.63 35 per cent of studies of prognostic tests for cancer used spin to obfuscate the non-significant nature of their findings.64 47 per cent of trials of obesity treatments published in top journals were spun in some way.65 83 per cent of papers reporting trials of antidepressant and anxiety medication failed to discuss important limitations of
…
All this spin serves the same ultimate purpose as exaggerations in press releases and books: scientists want to emphasise the impressiveness and ‘impact’ of their work because impressive and impactful work is what attracts grants, publications and plaudits. The problem is that this can create a feedback loop: the hype nurtures an expectation of straightforward, simple stories on the part of funders, publishers and the public, meaning that scientists must dumb down and prettify their work even more in order to maintain interest and continue to receive funding. The science itself comes out of
…
At any given time, there’s usually an ‘emerging’ field that’s subject to the worst hype. Typically, a few publications with easy-to-grasp results in big-name journals get picked up by the media, public interest intensifies, and scientists in the field develop a kind of recklessness, feeding the hype cycle with careless and overblown statements. Then big claims fail to replicate in later experiments, the furore dies away and normal science resumes. Ultra-hyped fields include stem cells, genetics, epigenetics, machine learning and brain imaging; for the past few years, a strong contender for the
…
You might think that the link between these behaviours in mice and autism in humans is, to say the least, somewhat tenuous. You might also wonder whether such a tiny number of donors could possibly be representative of people with autism in general.81 Nevertheless, the authors were happy to draw an impressive conclusion: ‘microbiota-based interventions such as probiotics [or] fecal microbiota transplantation … may offer a timely and tractable approach to addressing the lifelong challenges of [autism spectrum disorder].’82 They put out a press release discussing the ‘profound’ effects of the
…
There have been recent calls from within the scientific community to cool down the gee-whiz hype surrounding the microbiome and its associated treatments, and to improve the quality of the research.88 In the meantime, the grossly exaggerated claims of these papers and press releases provide the semblance of scientific backing to a host of useless, harmful, or just plain daft microbiome-related remedies: a probiotic drink made using microbes found in the guts of elite athletes that can supposedly boost your performance; the craze for ‘colonic irrigation’, which involves flushing out your bowels
…
Fads like microbiome mania wax and wane, but there’s one field of research that consistently generates more hype, inspires more media interest and suffers more from the deficiencies outlined in this book than any other. It is, of course, nutrition. The media has a ravenous appetite for its supposed findings: ‘The Scary New Science That Shows Milk is Bad For You’; ‘Killer Full English: Bacon Ups Cancer Risk’; ‘New Study Finds Eggs Will Break Your Heart’.90 Given the sheer volume of coverage, and the number of conflicting assertions about how we should change our diets, little wonder the public
…
In a now-classic paper entitled ‘Is everything we eat associated with cancer?’, researchers Jonathan Schoenfeld and John Ioannidis randomly selected fifty ingredients from a cookbook, then checked the scientific literature to see whether they had been said to affect the risk of cancer.100 Forty of them had, including bacon, pork, eggs, tomatoes, bread, butter and tea (essentially all the aspects of that Killer Full English). Some foods apparently raised the risk, some reduced it, and some had different effects in different studies. We know that numbers are noisy, so we’d expect the literature
…
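Noise plus a p < 0.05 threshold is enough to generate this kind of literature. The toy simulation below (all numbers invented) tests fifty ingredients that genuinely have no link to the outcome; on average two or three will cross the threshold by chance alone, and the real literature runs many analyses per food across many studies, multiplying the opportunities.

```python
# Fifty null ingredients, one correlation test each: how many look
# "significantly" linked to cancer risk at p < .05 by chance alone?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_people = 500
outcome = rng.normal(size=n_people)        # unrelated to any food

false_positives = 0
for i in range(50):
    intake = rng.normal(size=n_people)     # no true link by design
    r, p = stats.pearsonr(intake, outcome)
    if p < 0.05:
        false_positives += 1
        verdict = "raises" if r > 0 else "lowers"
        print(f"Ingredient {i}: '{verdict} risk' (p = {p:.3f})")

print(f"{false_positives}/50 null ingredients reach significance")
```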
I include PREDIMED here not because it was a particularly bad example of hype, but because it’s an example of some of the best research in an extraordinarily hyped field – and because it shows us how even a poster child for rigour might be subject to hidden flaws. Rather like psychology, nutritional epidemiology is hard. An incredibly complex physiological and mental machinery is involved in the way we process food and decide what to eat; observational data are subject to enormous noise and the vagaries of human memory; randomised trials can be tripped up by the complexities of their own
…
The way OPERA handled its unexpected result was close to exemplary. The physicists drew attention to a strange finding that needed replication while avoiding hype and expressing all the necessary reservations, giving the world a valuable lesson in scientific uncertainty. The initial flurry of reports was followed by further coverage of the story’s resolution.127 Is it too far-fetched to suggest that, if the OPERA scientists had been psychologists instead of physicists, they’d have skipped over the double-checking of their results and rushed to sign book deals for titles like Breaking Barriers:
…
It was a classic case of a perverse incentive. The Corps of Engineers had incentivised not the clean-up, but the mere fact of having a heavier truck, inadvertently creating new problems. It isn’t hard to think of similar examples from other fields: incentives for journalists that reward revenue rather than original reporting, leading to flimsy clickbait articles; incentives for teachers that reward school rankings rather than learning, leading to questionable marking; incentives for politicians that reward short-term vote gains rather than long-term solutions, leading to the subsidy of
…
The scientific incentive system engenders an obsession not just with certain kinds of papers, but with publication itself. The system incentivises scientists not to practise science, but simply to meet its own perverse demands. These incentives are at the root of so many of the dubious practices that undermine our research.
It’s sometimes said, mostly seriously, that Charles Darwin was the last true scientific expert. He knew everything that there was to know at the time about his field of natural history, in no small part due to his global fact-finding voyages and his network of scientific correspondents whom he would – to use his own words – ‘pester with letters’.5 Nowadays, an all-knowing expert like Darwin couldn’t exist in any scientific subject. That’s because we’re now drowning in scientific papers. A modern Darwin would have to keep up with 400,000 or so new studies published annually in the biological
…
Why, you might wonder, do universities prioritise this publication-based measure over others that might have more to do with the quality of research – for example, whether a scientist’s work meets standards like randomisation or blinding, or even replicability? The answer is that they’re subject to financial pressures as well. In many countries, including the UK, universities themselves are ranked by the government on the prestige of the papers their academics produce, with taxpayers’ money divvied up accordingly.11 All of this is what gives rise to the clichéd phrase ‘publish or perish’: keep
…
Having perpetually to seek funding is not just an expenditure of time: it leads to an enormous amount of failure and disappointment. The problem is compounded by what’s known as the Matthew Effect: when scientific grants are allocated, the already rich get richer. (The name is a reference to Matthew 25:29: ‘For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath.’)14 There’s good evidence that this occurs: in one large study, scientists whose early-career grant applications were judged to be just above the
…
In cognitive psychology experiments where participants must press a button as quickly as possible when, and only when, a light flashes, researchers talk about the ‘speed-accuracy trade-off’. When the subjects focus on haste, their accuracy suffers; when they focus on getting it right, they have to slow down. (And by the way, this is a nice straightforward psychological effect that really does replicate, in almost every relevant experiment.)19 It’s no different in scientific publishing.20 Time is finite. Pushing scientists to publish more and more papers and bring in more and more money –
…
Professor Gao is the academic equivalent of the band Status Quo, who found success by churning out endless minor variations on the same couple of basic rock songs from the 1970s onwards. It’s very far indeed from Charles Darwin. Gao was engaging in a process that’s become known as ‘salami-slicing’: taking a set of scientific results, often from a single study, that could have been published together as one paper, and splitting them up into smaller sub-papers, each of which can be published separately.23 It’s the closest analogue to the disaster-clean-up drivers who loaded their trucks with wet
…
Salami-slicing doesn’t in itself mean that the science contained in each of the slices is necessarily of poor quality (though the fact the researchers are willing to take advantage of the publication system so blatantly doesn’t exactly speak to their trustworthiness). However, some salami-slicing may have a more sinister purpose than mere CV-building. In clinical trials, it’s been alleged that pharmaceutical companies and other drug researchers use tactical salami-slicing to take advantage of the fact that readers aren’t paying full attention to every publication. Split up your study into
…
The worst predatory journals will publish literally anything, even the most obvious hoax. In 2014, the computer scientist Peter Vamplew became so irritated by the constant stream of junk emails from the predatory journal International Journal of Advanced Computer Technology that he submitted a joke article entitled ‘Get Me Off Your Fucking Mailing List’. The paper consisted entirely of the sentence ‘Get me off your fucking mailing list’ repeated over eight hundred times (including a helpful flowchart figure with boxes and arrows portraying the message Get → me → off → Your → Fucking → Mail →
…
The trifecta of salami-slicing, dodgy journals and peer-review fraud makes it clear that we shouldn’t be rating scientists on their total number of publications: quantity is far too easily gamed. One response to this has been instead to rate scientists according to the number of citations their papers receive. As we’ve just seen, this measure should give us a better indication of their actual contribution to science or to the community. In an extreme case, however, a scientist could have a single highly successful paper with thousands of citations, then follow it up with dozens of worthless
…
As you might expect, a scientist’s h-index is often explicitly taken into account for hiring and promotion decisions. Scientists, then, have a strong incentive to procure citations, as well as to publish more and more papers that might be cited. Once again, well-intentioned as the h-index is, the incentives it produces can lead to behaviours that cater to the system itself rather than to the goals of science.
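For readers who haven't met it, the h-index is the largest number h such that a scientist has h papers with at least h citations each. A minimal sketch of the calculation (my own illustration, not any official implementation):

```python
# h-index: the largest h such that h of the papers have >= h
# citations each.

def h_index(citations: list[int]) -> int:
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank          # this paper still clears the bar
        else:
            break             # everything after is cited even less
    return h

# One blockbuster followed by duds barely moves the index, which is
# the one-hit-wonder problem the measure was designed to address:
print(h_index([5000, 1, 0, 0]))      # 1
print(h_index([9, 7, 6, 2, 1]))      # 3
```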
A far more effective way to increase your citations, though, is simply to cite yourself. Self-citation, which one analysis found makes up around a third of all citations in the first three years after a paper’s publication, is a grey area.39 Science is incremental, and researchers work for many years on specific topics. It would be senseless to prevent them citing their own previous work when they’re taking the next step in their research programme. But some take it too far. The line between acceptable and problematic self-citation is often blurry, but some cases are clear cut.40
Robert Sternberg also engaged in a kind of hybrid of salami-slicing and self-citation: self-plagiarism. In new papers, he re-used chunks of text that he’d previously published elsewhere. You may wonder how one can self-plagiarise: isn’t the whole point of plagiarism that you steal ideas and phrasing from other people? Recycling text might be lazy, but at least it’s not increasing the number of bad or wrong ideas in the world. However, self-plagiarism breaks the author’s contract – sometimes a literal one in the case of copyright forms, but more importantly the metaphorical one with the reader
…
So just like publication counts and h-indices, impact factors can be deliberately gamed. And as soon as scientists start to artificially inflate these numbers by self-citation, coercive citation and other suspect practices, they lose their meaning as measures of scientific quality. They begin to say less about which scientists and which journals are the best, and more about which have the most single-minded focus on boosting their metrics. It’s a clear example of Goodhart’s Law: ‘when a measure becomes the target, it ceases to be a good measure’.55 As we’ve seen, these measures have very much
…
At first, having numbers that can quantify a scientist’s, or a journal’s, level of contribution might seem scientifically appealing: objective quantification, after all, is one of the unique strengths of science. But as Goodhart’s Law states, once you begin to chase the numbers themselves rather than the principles that they stand for – in this case, the principle of finding research that makes a big contribution to our knowledge – you’ve completely lost your way. The fact that these metrics aren’t just the preserve of individual scientists jockeying for status but are woven into the fabric of
…
Is it right to cast the scientific publication system as some kind of ur-problem that underlies all of the above? Can we truly say that the perverse incentives created by prioritising publications, citations, and grant money have led directly to acts of fraud, bias, negligence, and hype? We can never know for sure what’s going through the mind of a scientist when they commit one of the problematic practices we’ve seen in the preceding chapters. But we can try to make an inference to the best explanation. Scientists are human, and humans respond to incentives. The problems in science that we’ve
…
Perverse incentives work like an ill-tempered genie, giving you exactly what you asked for but not necessarily what you wanted. Incentivise publication quantity, and you’ll get it – but be prepared for scientists to have less time to check for mistakes, and for salami-slicing to become a norm. Incentivise publication in high-impact journals, and you’ll get it – but be prepared for scientists to use p-hacking, publication bias and even fraud in their attempts to get there. Incentivise competing for grant money, and you’ll get it – but be prepared for scientists to hype and spin their findings
…
Knowing about the incentive problem doesn’t mean that scientists should be let off the hook when they engage in malpractice. We all feel the force of the incentives, but we still ought to do our best to defy them, for the sake of the science.63 It would be better, though, if we didn’t have to resist the crushing weight of the publish-or-perish system when trying to make discoveries about the world. It would be better if we could reach a happy medium where scientists are incentivised towards hard work and creativity, but also caution and rigour; where the bias is towards getting it right rather
…
After this cycle of bias and spin, a mere five studies were left that were unambiguously negative – that is, ten times fewer than existed at the start. And as a final, rotten cherry on top, the positive trials ended up being cited by later research three times as often as the negative ones. The whole dispiriting step-by-step process is illustrated in Figure 4 below. This isn’t unique to research on antidepressants: the authors found a similar chain of events in trials of new psychotherapies.3 Indeed, the same thing is happening, to a greater or lesser extent, almost everywhere in scientific
…
Whereas new, exciting results are the engine driving scientific progress, we’ve seen how an obsession with ‘groundbreaking’ results has led to entire fields of research being based on flimsy, unreplicable evidence. To paraphrase the biologist Ottoline Leyser, the point of breaking ground is to begin to build something; if all you do is groundbreaking, you end up with a lot of holes in the ground but no buildings.13
The remedy proposed most often is simply to abandon the idea of statistical significance. In 2019, over 850 scientists signed an open letter in Nature arguing just that. ‘It’s time,’ they wrote, ‘for statistical significance to go.’23 Rather than emphasising significance, they argued, researchers should be clearer about the uncertainty of their findings, reporting instead the margin of error around each number and generally having more humility about what can be derived from often-blurry statistical results.24 There’s a lot to be said for this, although it should be borne in mind that the most
…
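What 'reporting the margin of error' means in practice: give the estimate together with an interval expressing its uncertainty, rather than a bare significant-or-not verdict. A hypothetical sketch with invented data:

```python
# Report an effect estimate with its 95% confidence interval instead
# of a lone p-value. Data below are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treated = rng.normal(0.3, 1.0, 40)
control = rng.normal(0.0, 1.0, 40)

diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / 40 + control.var(ddof=1) / 40)
margin = stats.t.ppf(0.975, df=78) * se   # pooled df = 40 + 40 - 2

print(f"Estimated difference: {diff:.2f} "
      f"(95% CI {diff - margin:.2f} to {diff + margin:.2f})")
```

A wide interval straddling zero says 'we can't tell from these data', which is more honest than either 'significant' or 'no effect'.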
Orben and Przybylski found that there were a few analyses showing fairly substantial negative effects of screen time, some that showed no effect at all, and some that showed that screen time was actually beneficial. They took the average. It was negative, but very weak indeed, with screen time accounting for approximately 0.4 per cent of the variation in wellbeing. To put that into perspective, it’s around the same-sized correlation one finds between wellbeing and regularly eating potatoes, and smaller than the link between wellbeing and wearing glasses. So much for all the scare stories – the
…
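To put the 0.4 per cent figure in statistical terms: variance explained is the square of the correlation coefficient, so the correlation behind it is tiny. A quick check:

```python
# "0.4 per cent of the variation" corresponds to a correlation of
# about 0.06, since variance explained = r squared.
r = 0.004 ** 0.5
print(f"Implied correlation: r = {r:.3f}")   # about 0.063
```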
In addition to pre-registering the fact that a study will take place, researchers can also pre-register a detailed plan for how they intend to analyse the data. We’ve seen how it’s the unplanned nature of statistical analysis – the undisclosed flexibility – that can lead scientists down forking paths to results that are statistically significant (and publishable) but don’t actually correspond to reality. The idea of pre-registering your analyses is a scientific Ulysses pact: by posting a plan for your analysis somewhere public, you lash yourself to the mast and stop yourself giving in to the
…
Not that pre-registration is a silver bullet. Many scientists who pre-register their study still fail to publish it (or at least report the results) in the time period that’s mandated by the trial registry. Others, despite pre-registration, still make changes to their analysis after it begins.46 In the case of clinical trials, the scientists aren’t just flouting best practice, they’re breaking the law – and yet one investigation by the journal Science in 2020 found that over 55 per cent of trials had their results reported late to the US government’s trial registry. Suffice to say, this isn’t
…