Simpson’s Paradox: Numbers are stranger than we think





Dishonest statisticians can produce misleading data, but some misleading data are the result of curious statistical flukes. Simpson’s Paradox is one of these flukes:





Here’s an example. Baseball player Mickey has a better batting average than Babe in both April and May. So, in terms of batting average, Mickey is a better baseball player than Babe. Right?


No.


It turns out that Babe’s combined batting average for April and May can be higher than Mickey’s. In fact, Mickey can have a better batting average than Babe every month of the baseball season and Babe may still be a better hitter. How? That’s Simpson’s Paradox…
Robert J. Marks, “Simpson’s Paradox: Big Data Can Lie” at Mind Matters
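To see how this can happen, here is a small sketch with invented numbers. The hit and at-bat counts below are hypothetical and are not taken from Marks’s article. Mickey out-hits Babe in each month, but he takes most of his at-bats in his weaker month, so his pooled average ends up below Babe’s.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) per month; invented purely for illustration.
stats = {
    "Mickey": {"April": (3, 10), "May": (20, 100)},
    "Babe": {"April": (29, 100), "May": (1, 10)},
}

def average(hits, at_bats):
    """Batting average as an exact fraction: hits / at-bats."""
    return Fraction(hits, at_bats)

def monthly_averages(player):
    """Each month's batting average for one player."""
    return {month: average(h, ab) for month, (h, ab) in stats[player].items()}

def combined_average(player):
    """Pool hits and at-bats across months, then divide (not an average of averages)."""
    hits = sum(h for h, _ in stats[player].values())
    at_bats = sum(ab for _, ab in stats[player].values())
    return average(hits, at_bats)

for month in ("April", "May"):
    m = monthly_averages("Mickey")[month]
    b = monthly_averages("Babe")[month]
    print(f"{month}: Mickey {float(m):.3f}, Babe {float(b):.3f}")

print(f"Combined: Mickey {float(combined_average('Mickey')):.3f}, "
      f"Babe {float(combined_average('Babe')):.3f}")
```

This prints Mickey ahead in April (0.300 vs 0.290) and in May (0.200 vs 0.100), yet Babe ahead overall (0.273 vs 0.209). The reversal comes from the weighting: each player’s combined average is a mean of his monthly averages weighted by at-bats, and the two players weight the months very differently.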





One consequence of Simpson’s Paradox is that machines cannot replace statisticians in analysing results. A great deal depends on interpretation, as Marks shows: “Clustering remains largely an art.”










Also by Robert J. Marks: Things Exist That Are Unknowable: A tutorial on Chaitin’s number





See also: Too Big to Fail Safe? (cautions on overuse of Big Data in medicine)





and





Machines cannot take over: Fundamental constraints in nature make nonsense of the claim. Great sci-fi plots, though.


Copyright © 2019 Uncommon Descent.
Published on April 16, 2019 04:58


Michael J. Behe's Blog

Michael J. Behe