
How to Read Numbers: A Guide to Statistics in the News (and Knowing When to Trust Them)
by Tom Chivers and David Chivers
News stories often cite misleading statistics. This book aims to teach news consumers how to examine such claims critically. It also includes a “statistical style guide” to help journalists report numbers more responsibly.
Simpson’s paradox. “The median wage of every single educational group went down, and yet the median wage of the population as a whole went up. So what’s going on? … Even though the median wage of people with degrees went down, the number of people with degrees went up considerably… This is called Simpson’s paradox… It doesn’t just apply to medians—it can happen with means as well.”
“The trouble is that, in Simpson’s-paradox situations, you can use the same data to tell diametrically opposed stories, depending on what political point you want to make. The honest thing to do is to explain that the paradox is present.”
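Here’s a toy illustration of the paradox (my own made-up wage figures, not the authors’): each group’s median wage falls between the two years, yet the overall median rises, because the higher-paid degree group becomes a much bigger share of the population.

```python
import statistics

# Hypothetical wages (thousands). Each group's median falls from
# Year 1 to Year 2, but the degree group grows as a share of workers.
year1 = {"no_degree": [20, 25, 30], "degree": [50, 55, 60]}
year2 = {"no_degree": [18, 24], "degree": [45, 48, 52, 58]}

for label, year in [("Year 1", year1), ("Year 2", year2)]:
    everyone = [w for group in year.values() for w in group]
    print(label,
          {g: statistics.median(w) for g, w in year.items()},
          "overall:", statistics.median(everyone))
# Year 1 {'no_degree': 25, 'degree': 55} overall: 40.0
# Year 2 {'no_degree': 21.0, 'degree': 50.0} overall: 46.5
```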
Sample sizes. “As a study increases in size, all else being equal, your confidence in it should increase… The smaller the effect an intervention has, the more people you need to look at to investigate it.”
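A quick simulation (my own sketch, not from the book) makes the point concrete: with a small true effect, small studies usually fail to detect it, so you need far more participants before a result means much.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power(effect, n, trials=2000):
    """Fraction of simulated studies that reach p < 0.05."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0, 1, n)       # control group
        b = rng.normal(effect, 1, n)  # treated group; true effect in SD units
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    return hits / trials

# Big effects are detectable in small studies; small effects are not.
for effect in (0.8, 0.2):
    for n in (25, 100, 400):
        print(f"effect={effect}, n={n}: power ~ {power(effect, n):.2f}")
```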
Statistical significance. “‘Statistical significance’ is a measure of how likely you are to see something by fluke, not how important it is.”
“Let’s suppose that, looking at the data, it seems that people who have read this book did better on the test…How do we know that they did better because of some real difference, not just random variation? … First, we imagine the results we’d expect to see if the book had no effect whatsoever. This is called the ‘null hypothesis’. The other possibility—that the book does have some positive effect—is called your ‘alternative hypothesis.’”
“Scientists measure the chances of coincidence with something called the probability value, or ‘p-value.’ The more unlikely something is to happen by random chance, the lower the p-value is: so if there’s only a 1-in-100 chance that you’d see a result at least that extreme if there was no effect, that would be written as p=0.01, or one divided by 100. (What that doesn’t mean…is that there is only a 1-in-100 chance that the result is wrong.)”
“In many parts of science, there’s a convention that if p is equal to or less than 0.05—if you’d expect to see results that extreme no more than 5% of the time—then the finding is ‘statistically significant,’ meaning that you can reject the null hypothesis.”
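To see what that 5% convention does (and doesn’t) buy you, here’s a simulation of the null hypothesis itself (my own toy setup, assuming normally distributed test scores): the book has no effect at all, yet roughly 5% of simulated studies still come out “significant.”

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate the null hypothesis: the book has NO effect, so readers and
# non-readers draw their test scores from the same distribution.
false_positives = 0
n_studies = 10_000
for _ in range(n_studies):
    readers = rng.normal(70, 10, 50)
    non_readers = rng.normal(70, 10, 50)
    if stats.ttest_ind(readers, non_readers).pvalue <= 0.05:
        false_positives += 1

# By construction there is no real effect, yet about 5% of studies are
# "statistically significant". That is exactly what p <= 0.05 controls.
print(false_positives / n_studies)  # ~0.05
```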
“Statistical significance is confusing… A 2002 study found that 100% of psychology undergraduates misunderstand significance—as, even more shockingly, did 90% of their lecturers. And another study found that 25 of 28 psychology textbooks they looked at contained at least one error in their definition of statistical significance.” Yikes!
Confounding Variable. “On days when ice cream sales go up, so do drownings. But obviously ice cream doesn’t make people drown. Instead, ice cream sales go up on hot days, because ice cream is nice on a hot day; and so is swimming, which unfortunately leads to some people drowning. Once you adjust for temperature—or ‘control’ for it, in statistical language—the link goes away. So if you were to look at ice cream sales and drownings just on cold days, or just on hot days, you wouldn’t see a link.”
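Here’s a minimal simulation of the ice-cream/drowning story (made-up numbers, assuming temperature linearly drives both): the raw correlation is strong, but within a narrow temperature band it vanishes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Temperature drives both variables; neither causes the other.
temp = rng.normal(20, 8, 5000)                   # daily temperature (C)
ice_cream = 10 * temp + rng.normal(0, 30, 5000)  # sales rise when hot
drownings = 0.5 * temp + rng.normal(0, 4, 5000)  # more swimming when hot

print("raw correlation:", np.corrcoef(ice_cream, drownings)[0, 1])  # ~0.66

# "Controlling" for temperature: look only at days in a narrow band.
hot = (temp > 24) & (temp < 26)
print("within hot days:", np.corrcoef(ice_cream[hot], drownings[hot])[0, 1])  # ~0
```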
Collider Bias. “A collider… is the opposite of a confounder: where a confounder causes both the variables you’re looking at, the two things you’re looking at both cause a collider. So where controlling for a confounder helps reduce bias, controlling for (or selecting on) a collider can introduce it.”
“If you were to conduct a study looking at the link between food poisoning and influenza, and you controlled for whether or not people have a fever, it could look as though children who had food poisoning are less likely to have the flu—that food poisoning protects against influenza somehow.”
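A sketch of the same effect in code (my own toy probabilities, with fever as the collider): food poisoning and flu are generated completely independently, but among feverish patients, having food poisoning makes flu look far less likely.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Food poisoning and flu are independent; both cause fever (the collider).
poisoning = rng.random(n) < 0.05
flu = rng.random(n) < 0.05
fever = poisoning | flu

print("flu rate, everyone:          ", flu.mean())                    # ~0.05
print("flu rate, fever patients:    ", flu[fever].mean())             # ~0.51
print("flu rate, fever + poisoning: ", flu[fever & poisoning].mean()) # ~0.05
# Conditioned on fever, poisoning 'explains away' the fever, so poisoned
# patients look far less likely to have flu: a spurious protective link.
```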
Causality. “Most observational studies can simply tell you whether two or more numbers tend to go up and down at around the same time. They can tell you about correlation, but not about causation… This is not always clear in newspaper reporting, however.”
“So how do we go about working out whether one thing caused another thing? Ideally, we use something called a randomized controlled trial [RCT]… Of course, you can’t always do an RCT. Sometimes it’s impractical or unethical—you can’t study the effects of smoking on children by giving 500 of them a pack of Embassy No.1s a day for 10 years and comparing it to a control group… Instead, you can try to look for ‘natural’ experiments—places where groups have been separated at random for other purposes.”
Is that a Big Number? “Often, though, numbers in the news are presented without the context that you need to work out whether it’s a big number or not. The most important piece of context is the denominator.”
Absolute vs Relative Risk. “CNN reported that bacon increases your risk of bowel cancer: the more you eat, the higher your risk, with the risk ‘rising 20% with every 25 grams of processed meat people ate per day.’”
“Say you’re a man in the UK. Your background risk of bowel cancer is about 7%. You eat an extra rasher of bacon every day (about 25g). That puts your risk up by 20%. But remember—that’s 20% of 7%, which is 1.4%. So it goes from 7% to 8.4%. If you’re unwary, or unused to dealing with percentages, you might think it goes up by 20 percentage points, or up to 27%. But it doesn’t.”
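The arithmetic in one place, using the figures quoted above: a relative risk multiplies the baseline; it does not add percentage points.

```python
baseline = 0.07           # background bowel-cancer risk, ~7%
relative_increase = 0.20  # "risk rises 20%" per extra 25g/day

new_risk = baseline * (1 + relative_increase)
print(f"{baseline:.1%} -> {new_risk:.1%}")              # 7.0% -> 8.4%
print(f"absolute increase: {new_risk - baseline:.1%}")  # 1.4 points, not 20
```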
“Scientific journals, university press offices, and the media all need to establish, as an immutable rule, that risk should be presented in absolute, not just relative terms.”
Has What We’re Measuring Changed? “In 2000, the US Centers for Disease Control and Prevention estimated that about one in every 150 children had autism-spectrum disorder; by 2016 that figure was one in 54.”
“Suddenly we have a simple explanation for why autism diagnoses went up so much: the term ‘autism’ changed its meaning several times, expanding to include more people. Plus, as the condition became more widely known both by parents and doctors, and as meaningful ways of improving the lives of autistic children became available, more children were screened to see whether they met the criteria.”
Is it Representative of the Literature? “Imagine that eating fish fingers slightly reduces the risk of snoring… Let’s say that there have been lots of different studies into whether or not fish fingers affect snoring. And let’s say that, while some of the studies are quite small, they’ve all been perfectly well conducted, and there’s no publication bias, p-hacking, or other dodgy statistical practice going on.”
“What we’d expect is that the average finding of the studies would be that fish-finger-eaters snore slightly less. But any individual study could end up returning slightly different results. If the studies truly are unbiased, then you’d expect them to cluster around the true effect in a normal distribution. Some will be higher, some lower, most of them about right.”
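A quick simulation of that picture (my own sketch, assuming each study’s estimate is the true effect plus sampling noise that shrinks as 1/√n): individual studies scatter widely, but their average sits near the truth.

```python
import numpy as np

rng = np.random.default_rng(4)

true_effect = -0.1  # fish fingers reduce snoring slightly (hypothetical)
study_sizes = rng.integers(20, 500, 200)

# Each study's estimate = true effect + noise; bigger studies are less noisy.
estimates = [true_effect + rng.normal(0, 1 / np.sqrt(n)) for n in study_sizes]

print("mean of estimates:", np.mean(estimates))  # close to -0.1
print("range:", min(estimates), max(estimates))  # single studies vary a lot
```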
“The virtuous thing to do is to try to work out what all these studies are clustering around: what the average result is. This is why people do literature reviews at the beginning of academic papers—to put their results in the context of the scientific literature as a whole. Sometimes researchers do meta-analyses—academic papers which go through all the existing literature and try to synthesize the results. If there have been enough studies, and if there isn’t a systematic bias either in the research or the publication process (as we’ve mentioned, two very big ifs), then hopefully the aggregated result will give you a good idea of what the true effect is.”
“This is how science advances, at least in theory. Each time a new study comes out, it gets added to the pile; it’s a new set of data points which—hopefully, on average—will bring the consensus of scientific understanding closer to the underlying reality.”
“Be wary when you see something, especially something health-and-lifestyle-related, which includes the phrase ‘new study says.’”
Demand for Novelty. “If one study finds that psychic powers are real, and 99 find that they aren’t, then you can probably write off the one outlier as a fluke… For that to work, though, it is vital that all the studies performed on a subject are published. But—because scientific journals want to publish scientific articles that are interesting—that doesn’t happen.”
“The Bem study [a 2011 paper by Daryl Bem that claimed evidence for precognition], in fact, caused a huge ruction in psychology, because researchers realized that they had to accept one of two unacceptable truths: either psychic powers were real or the experimental and statistical methods that underpin psychological science were capable of churning out meaningless nonsense.”
“The demand for novelty leads to a fundamental problem in science called publication bias.”
“More than thirty years ago, the researcher R.J. Simes noted that published cancer studies which were registered in advance (registering studies in advance means they can’t so easily be quietly filed away if they didn’t find anything) were much less likely to return positive results than studies which weren’t, suggesting that a lot of the unregistered studies were not being published.”
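You can watch the file drawer distort a literature in a few lines (my own toy simulation, assuming only “significant” results get published): even with a true effect of exactly zero, the published record suggests sizable effects.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# 500 small studies of an intervention with NO true effect.
published = []
for _ in range(500):
    control = rng.normal(0, 1, 30)
    treated = rng.normal(0, 1, 30)  # same distribution: true effect = 0
    diff = treated.mean() - control.mean()
    if stats.ttest_ind(treated, control).pvalue < 0.05:
        published.append(diff)      # only "significant" results are published

# The nulls vanish into the file drawer; what remains looks impressive.
print(len(published), "of 500 published")                     # ~25
print("mean published |effect|:", np.mean(np.abs(published))) # ~0.55 SD
```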
“There are noble steps to reduce this problem in science. The most promising are so-called ‘Registered Reports’ (RRs), in which journals agree to publish studies on the basis of the methods, and then will do so regardless of the results, avoiding publication bias. One study compared standard psychological research to RRs in psychology and found that while 96% of standard papers returned a positive result, only 44% of RRs did, suggesting a huge problem. RRs are catching on fast, and hopefully will become mainstream soon.”
Cherry-picking. “A Sunday Times front-page story in 2019 declared that the ‘suicide rate among teenagers has nearly doubled in eight years.’ … It measured from 2010, which was the lowest year for teen suicides in England and Wales on record. Literally any year measured from 2010 would have shown a rise (or you could pick any year before 2010, and show a decline).”
“Cherry-picking your start and end points is an example of what is known as ‘hypothesizing after results are known’, or HARKing. It means getting your data, and then going through it to find exciting things. In noisy datasets, such as those on climate change or suicides, you’ll find natural variation—things will go up and down for no particular reason.”
“Taking in the wider picture helps, as does checking to see whether there are clear trends, or if it’s just a noisy, wobbly line.”
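Here’s how easy that cherry-pick is with pure noise (my simulated counts, no real trend at all): pick the lowest year as your baseline and you “find” a rise; pick the highest and you “find” a fall.

```python
import numpy as np

rng = np.random.default_rng(6)

# Annual counts from a flat process: no real trend, just Poisson noise.
years = np.arange(2004, 2020)
counts = rng.poisson(200, len(years))
latest = counts[-1]

def change_from(i):
    return (latest - counts[i]) / counts[i]

print("from lowest year :", f"{change_from(counts.argmin()):+.0%}")  # a "rise"
print("from highest year:", f"{change_from(counts.argmax()):+.0%}")  # a "fall"
print("from first year  :", f"{change_from(0):+.0%}")
```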
Confidence Interval. “If you report that the Office for Budget Responsibility’s model says that the economy will grow 2.4% next year, that sounds accurate and scientific. But if you don’t mention that their 95% uncertainty interval is between -1.1% and +5.9%, then you’ve given a spurious sense of precision.”
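For what it’s worth, the quoted interval is consistent with roughly normal forecast errors; this back-of-envelope reconstruction is my own assumption, not the OBR’s actual model.

```python
# Reverse-engineering the quoted interval, assuming normally distributed
# forecast errors (an illustrative assumption only).
point_forecast = 2.4  # central growth forecast, %
sd = 1.79             # implied sd: (5.9 - (-1.1)) / 2 / 1.96

lo = point_forecast - 1.96 * sd
hi = point_forecast + 1.96 * sd
print(f"growth: {point_forecast}% (95% interval {lo:+.1f}% to {hi:+.1f}%)")
# growth: 2.4% (95% interval -1.1% to +5.9%)
```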
Other topics include: sampling bias, leading questions, effect size, forecasting and Brier scores, Goodhart’s law, survivorship bias, the Texas sharpshooter fallacy, and Bayes’ Theorem (conditional probability).
Statistical Style Guide. “Here is our statistical style guide for numerically responsible journalists.
- Put numbers into context…
- Give absolute risk, not just relative…
- Check whether the study you’re reporting on is a fair representation of the literature…
- Give the sample size of the study—and be wary of small samples…
- Be aware of problems that science is struggling with, like p-hacking and publication bias…
- Don’t report forecasts as single numbers. Give the confidence interval and explain it…
- Be careful about saying or implying that something causes something else…
- Be wary of cherry-picking and random variation…
- Beware of rankings…
- Always give your sources…
- If you get it wrong, admit it.”
Tom Chivers has written a more recent book called Everything Is Predictable: How Bayesian Statistics Explain Our World (2024). This hyperbolic title takes me aback, especially since How to Read Numbers is about sniffing out misleading statistics. As Peter Lynch wrote in Beating the Street, “Nobody can predict interest rates, the future direction of the economy, or the stock market. Dismiss all such forecasts and concentrate on what’s actually happening to the companies in which you’ve invested.”
Chivers, Tom and David Chivers. How to Read Numbers: A Guide to Statistics in the News (and Knowing When to Trust Them). London: Weidenfeld & Nicolson, 2021. Buy from Amazon.com
Disclosure: As an Amazon Associate I earn from qualifying purchases.
Note: I have modified the spelling to appease my American spellchecker.
RELATED READING:
- Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics by Gary Smith (2014)
- The Halo Effect and the Eight Other Business Delusions That Deceive Managers by Phil Rosenzweig (2014)
- How to Lie with Statistics by Darrell Huff (1954)