Simpson’s Paradox: Numbers are stranger than we think

Dishonest statisticians can produce misleading data, but some misleading data are the result of curious flukes of statistics. Simpson’s Paradox is one of these flukes.
Here’s an example. Baseball player Mickey has a better batting average than Babe in both April and May. So, in terms of batting average, Mickey is a better baseball player than Babe. Right?
No.
It turns out that Babe’s combined batting average for April and May can be higher than Mickey’s. In fact, Mickey can have a better batting average than Babe every month of the baseball season and Babe may still be a better hitter. How? That’s Simpson’s Paradox…
(Robert J. Marks, “Simpson’s Paradox: Big Data Can Lie,” at Mind Matters)
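The reversal is easier to believe with numbers in hand. The sketch below uses invented hit and at-bat counts (they are not from Marks’s article, and the names `records`, `average` and `combined_average` are only illustrative) to show one way Mickey can lead in each month while Babe leads overall.

```python
from fractions import Fraction

# Hypothetical (hits, at_bats) per month. These figures are invented
# purely for illustration; they do not come from the quoted article.
records = {
    "Mickey": {"April": (4, 10), "May": (25, 100)},
    "Babe": {"April": (35, 100), "May": (2, 10)},
}


def average(hits, at_bats):
    """Batting average as an exact fraction: hits divided by at-bats."""
    return Fraction(hits, at_bats)


def combined_average(months):
    """Pool hits and at-bats across all months, then divide once."""
    total_hits = sum(hits for hits, _ in months.values())
    total_at_bats = sum(at_bats for _, at_bats in months.values())
    return Fraction(total_hits, total_at_bats)


for month in ("April", "May"):
    mickey = average(*records["Mickey"][month])
    babe = average(*records["Babe"][month])
    print(f"{month}: Mickey {float(mickey):.3f} vs Babe {float(babe):.3f}")

mickey_all = combined_average(records["Mickey"])
babe_all = combined_average(records["Babe"])
print(f"Combined: Mickey {float(mickey_all):.3f} vs Babe {float(babe_all):.3f}")
```

With these made-up counts, Mickey is ahead in April (.400 to .350) and in May (.250 to .200), yet Babe is ahead over both months together (about .336 to .264). The flip comes from unequal at-bats: most of Babe’s at-bats fall in April, the month when both players hit well, while most of Mickey’s fall in May, when both hit poorly.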
One outcome of Simpson’s Paradox is that machines cannot replace statisticians in analysing results. A great deal depends on interpretation, as Marks shows. “Clustering remains largely an art.”
Also by Robert J. Marks: Things Exist That Are Unknowable: A tutorial on Chaitin’s number
See also: Too Big to Fail Safe? (cautions on overuse of Big Data in medicine)
and
Machines cannot take over: Fundamental constraints in nature make nonsense of the claim. Great sci-fi plots, though.
Copyright © 2019 Uncommon Descent.
Michael J. Behe's Blog
