Kindle Notes & Highlights
Read between April 26 - June 15, 2021
In macroeconomics (research on, for example, tax policies and how they affect countries’ economic growth), a re-analysis of sixty-seven studies could only reproduce the results from twenty-two of them using the same datasets,
that systematically assesses the quality of medical treatments. Of those, a startling 45 per cent conclude that there’s insufficient evidence to decide whether the treatment in question works or not.
the amount spent every year on low-quality studies that can’t be replicated has been calculated at $28 billion in the US alone
The glowing media coverage of Hwang had been so intense that not even this egregious scientific fraud was enough to put off his admirers.
why did he think he could get away with such blatant, careless fraud? The answer speaks not just to his own (lack of) character, but to something broken about the scientific system.
the system is largely built on trust:
If this kind of fraud occurs at the very highest levels of science, it suggests that there’s much more of it that flies under the radar, in less well-known journals.
Bik calculated that there are up to 35,000 papers in the literature that need to be retracted.
‘man is an orderly animal’ who ‘finds it very hard to imitate the disorder of nature’,
Once again, it was not just the reviewers’ desire for attractive and exciting findings that had been exploited, but also their trusting nature. It’s unavoidable that some trust will be involved in the peer review process: reviewers can’t be expected to double-check every single datapoint for signs of tampering. But the lesson of the data fraud stories is that the bar might be set rather too low for true organised scepticism to be taking place. For the sake of the science, it might be time for scientists to start trusting each other a little less.
18,000 retractions in the scientific literature since the 1970s
Among retractions in general, honest mistakes only make up around 40 per cent or less of the total. The majority are due to some form of immoral behaviour, including fraud (around 20 per cent), duplicate publication and plagiarism.
2 per cent of individual scientists are responsible for 25 per cent of all retractions.
that 1.97 per cent of scientists admit to faking their data at least once.
when scientists were asked how many other researchers they knew who had falsified data, the figure jumped to 14.1 per cent
‘the most important incentive to scientific fraud is a passionate belief in the truth and significance of a theory or hypothesis which is disregarded or frankly not believed by the majority of scientists – colleagues who must accordingly be shocked into recognition of what the offending scientist believes to be a self-evident truth.’
seen his rule-breaking as a necessary evil in his quest to bring them to the world’s attention.
By now, most people know that Wakefield’s findings have been discredited. Since 1998, there have been several large-scale, rigorous studies showing no relation between the MMR vaccine (or any other vaccine) and autism spectrum disorder.
‘I suspect that unconscious or dimly perceived finagling, doctoring, and massaging are rampant, endemic, and unavoidable in a profession that awards status and power for clean and unambiguous discovery’.
Different fields of science had different positivity levels. The lowest rate, though still high, at 70.2 per cent, was space science; you may not be surprised to discover that the highest was psychology/psychiatry, with positive studies making up 91.5 per cent of publications.
the proportion of positive results in the literature isn’t just high, it’s unrealistically high.
the p-value has a notoriously tricky definition. A recent audit found that a stunning 89 per cent of a sample of introductory psychology textbooks got the definition wrong;
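Since so many textbooks get it wrong, it's worth pinning the definition down. A minimal simulation in Python (illustrative only, with hypothetical numbers; not from the book) makes the correct reading concrete:

```python
import random

# A p-value is the probability, ASSUMING the null hypothesis is true,
# of getting data at least as extreme as what was observed. It is NOT
# the probability that the null hypothesis is true -- the error most
# textbooks make. Hypothetical example: test whether a coin that came
# up heads 60 times in 100 flips is fair, by brute-force simulation.

random.seed(0)

def simulated_p_value(observed_heads, n_flips, n_sims=20_000):
    """Two-sided p-value for 'the coin is fair', by Monte Carlo."""
    observed_dev = abs(observed_heads - n_flips / 2)
    extreme = 0
    for _ in range(n_sims):
        heads = sum(random.random() < 0.5 for _ in range(n_flips))
        if abs(heads - n_flips / 2) >= observed_dev:
            extreme += 1
    return extreme / n_sims

p = simulated_p_value(observed_heads=60, n_flips=100)
# Around 0.06: the chance of a result this extreme under the null --
# not the chance that the null is true.
print(round(p, 3))
```

The common textbook mistake is to read that number as "a 6 per cent chance the coin is fair"; the simulation shows it is a statement about the data given the hypothesis, not the other way around.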
But a 2014 survey of reviews in top medical journals found that 31 per cent of meta-analyses didn’t even check for it. (Once it was properly checked for, 19 per cent of those meta-analyses indicated that publication bias was indeed present.)
41 per cent of the studies that were completed found strong evidence for their hypothesis, 37 per cent had mixed results, and 22 per cent were null.
65 per cent of studies with null results had never even been written up in the first place, let alone sent off to a journal.
The second option involves taking an existing dataset, running lots of ad hoc statistical tests on it with no specific hypothesis in mind, then simply reporting whichever effects happen to get p-values below 0.05. The scientist can then declare, perhaps even convincing themselves, that they’d been searching for these results from the start. This latter type of p-hacking is known as HARKing, or Hypothesising After the Results are Known.
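Why HARKing is so dangerous can be shown in a few lines (a hypothetical sketch, not from the book): run many unplanned tests on pure noise, and "significant" results appear with alarming regularity.

```python
import random

# When the null hypothesis is TRUE, each p-value is uniform on [0, 1].
# Run 20 ad hoc tests per dataset and report only those below 0.05.
random.seed(1)

def run_null_experiment(n_tests=20, alpha=0.05):
    """Simulate n_tests independent tests on pure noise; return
    the p-values that would be reported as 'significant'."""
    p_values = [random.random() for _ in range(n_tests)]
    return [p for p in p_values if p < alpha]

# Across many datasets with no real effects at all, how often does
# at least one test come out 'significant'?
hits = sum(bool(run_null_experiment()) for _ in range(10_000))
print(hits / 10_000)  # ~0.64, i.e. roughly 1 - 0.95**20
```

In other words, a researcher who quietly runs twenty tests has close to a two-in-three chance of finding something "significant" in data that contains nothing at all.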
How common are the sorts of analytic biases that pervaded Wansink’s work, and that undermined the study on power posing? In 2012, a poll of over 2,000 psychologists asked them if they had engaged in a range of p-hacking practices. Had they ever collected data on several different outcomes but not reported all of them? Approximately 65 per cent said they had. Had they excluded particular datapoints from their analysis after peeking at the results? 40 per cent said yes. And around 57 per cent said they’d decided to collect further data after running their analyses – and presumably finding them …
whereas the true fraudster knows that they’re acting unethically, everyday researchers who p-hack often don’t.
Even without the trial-and-error of classic p-hacking, then, scientists who don’t come to their data with a proper plan can end up analysing themselves into an unreplicable corner.
If you hide the full extent of your testing from the reader, they won’t be on their guard for the upwardly creeping risk of false positives.
Across all the papers, there were 354 outcomes that simply disappeared between registration and publication (it’s safe to assume that most of these had p-values over 0.05), while 357 unregistered outcomes appeared in the journal papers ex nihilo.
You might wonder how doctors and their patients are supposed to trust a medical literature that’s permeated with bias in this way, outside of a minority of clear, well-replicated findings. My response is that I have no idea.
But the stories of bullying and intimidation that result when researchers challenge the amyloid hypothesis hint at a field where bias has become collective, where new ideas aren’t given the hearing they deserve, and where scientists routinely fail to apply the norm of organised scepticism to their own favoured theories.
‘Glasgow Effect’. This is the phenomenon whereby people from Glasgow, and Scotland more generally, die younger on average than those in other similar cities or countries, even after accounting for levels of poverty and deprivation.
Furthermore, Lewis and his team argued that Morton simply hadn’t manipulated the sample groupings (omitting to mention groups from non-White races with high average skull sizes) in the way Gould had charged. In fact, Lewis and colleagues alleged that Gould made his own mistakes, splitting up Morton’s sample in ways that suited his preferred beliefs about the equality of the skull sizes.
the watchers must also be watched and the debunkers debunked.
Nearly half of the papers that included relevant statistics had at least one numerical inconsistency.
13 per cent had a Reinhart-Rogoff-style serious mistake that might have completely changed the interpretation of their results (for example, flipping a statistically significant p-value into a non-significant one, or vice versa).
Brown and Heathers used the GRIM test to check a selection of seventy-one published psychology papers and found that half of them reported at least one impossible number, while 20 per cent contained several.
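The GRIM test itself is simple enough to sketch in a few lines of Python (an illustrative reimplementation, not Brown and Heathers' own code): given a reported mean of integer data and the sample size, check whether any possible integer total rounds to that mean.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM test sketch: could `reported_mean` (of integer data, e.g.
    Likert-scale responses) actually arise from a sample of size n?
    Checks every plausible integer sum against the reported rounding."""
    target = round(reported_mean, decimals)
    nearest = int(reported_mean * n)
    for total in range(nearest - 1, nearest + 3):
        if round(total / n, decimals) == target:
            return True
    return False

# Hypothetical example: with n = 28 integer responses, a mean of 5.18
# is achievable (145/28 rounds to 5.18), but a mean of 5.19 is not --
# no integer total divided by 28 rounds to 5.19.
print(grim_consistent(5.18, 28), grim_consistent(5.19, 28))  # True False
```

The power of the test is that it needs nothing beyond the published mean and sample size: no raw data, no cooperation from the authors.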
26 per cent of psychologists were willing to send their data to other researchers upon an email request, and similarly dismal figures come from other fields.
Nature Reviews Cancer put it bluntly: ‘The scientific community has failed to tackle this problem and consequently thousands of misleading and potentially erroneous papers have been published using cell lines that are incorrectly identified.’
32,755 papers that used so-called impostor cells, and over 500,000 papers that cited those contaminated studies. (Because so many cell lines studied by scientists are of cancer cells, the field of research with the biggest number of contaminated papers, unsurprisingly, was oncology.)
Large-scale reviews have also found that underpowered research is rife in medical trials, biomedical research more generally, economics, brain imaging, nursing research, behavioural ecology and – quelle surprise – psychology.
In a low-powered study, we can only find a positive result – a significant p-value – if the sample shows an unusually and spuriously large effect.
At the risk of sounding tautological: since underpowered studies only have the power to detect large effects, those are the only effects they see.
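This "winner's curse" can be demonstrated directly (a hypothetical simulation, not from the book): give a study a small true effect and a small sample, and the rare runs that reach significance will badly overestimate the effect.

```python
import random
import statistics

# Underpowered design: a small true effect, a small sample. Only
# flukishly large observed effects cross p < 0.05, so the published
# (significant) estimates systematically exaggerate the truth.
random.seed(2)

TRUE_EFFECT = 0.2   # true group difference, in standard-deviation units
N_PER_GROUP = 20    # a typically underpowered sample size

def one_study():
    control = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    treated = [random.gauss(TRUE_EFFECT, 1) for _ in range(N_PER_GROUP)]
    diff = statistics.mean(treated) - statistics.mean(control)
    z = diff / (2 / N_PER_GROUP) ** 0.5  # known-variance z-test
    return diff, abs(z) > 1.96           # 'significant' at p < .05

results = [one_study() for _ in range(20_000)]
significant = [diff for diff, sig in results if sig]
print(f"power: {len(significant) / len(results):.2f}")        # ~0.10
print(f"true effect {TRUE_EFFECT}; mean significant estimate "
      f"{statistics.mean(significant):.2f}")                   # ~0.7
```

With roughly 10 per cent power, the studies that happen to reach significance report an effect around three times larger than the real one: the tautology made visible.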
Reading through the candidate gene literature is, in hindsight, a surreal experience: they were building a massive edifice of detailed studies on foundations that we now know to be completely false. As Scott Alexander of the blog Slate Star Codex put it: ‘This isn’t just an explorer coming back from the Orient and claiming there are unicorns there. It’s the explorer describing the life cycle of unicorns, what unicorns eat, all the different subspecies of unicorn, which cuts of unicorn meat are tastiest, and a blow-by-blow account of a wrestling match between unicorns and Bigfoot.’
When I’ve made comments about low statistical power in scientific seminars, I’ve often heard replies along the lines of ‘my students need to publish papers to be competitive on the job market, and they can’t afford to do large-scale studies. They need to make do with what they have.’ This is a prime example of well-intentioned individual scientists being systematically encouraged – some would argue, forced – to accept compromises that ultimately render their work unscientific.
The following tale of alien encounters is true. And by true, I mean false. It’s all lies. But they’re entertaining lies. And in the end, isn’t that the real truth? The answer is no.
Leonard Nimoy, The Simpsons
unwarranted advice: press releases gave recommendations for ways readers should change their behaviour – for example, telling them to engage in a specific kind of exercise – that were more simplistic or direct than the results of the study could support.
As a matter of factual accuracy, the sheer density of misleading statements in a book that sold in such numbers should keep us all awake at night.