Kindle Notes & Highlights
Read between January 12, 2021 and December 8, 2023
a confidence interval is the range of population parameters for which our observed statistic is a plausible consequence.
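A minimal sketch of this idea in Python, using simulated data (the sample values and the 1.96 multiplier for an approximately normal sampling distribution are illustrative assumptions, not from the book):

```python
import numpy as np

# Hypothetical sample of 50 measurements, simulated purely for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=170, scale=10, size=50)

n = len(sample)
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)   # standard error of the sample mean

# Approximate 95% confidence interval: values of the population mean for which
# the observed sample mean would be a plausible outcome
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```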
A hypothesis can be defined as a proposed explanation for a phenomenon.
The term apophenia describes the capacity to see patterns where they do not exist, and it has been suggested that this tendency might even confer an evolutionary advantage – those ancestors who ran away from rustling in the bushes without waiting to find out whether it was definitely a tiger may have been more likely to survive.
while this attitude may be fine for hunter-gatherers, it cannot work in science – indeed, the whole scientific process is undermined if claims are just figments of our imagination. There must be a way of protecting us against false discoveries, and hypothesis testing attempts to fill that role.
A P-value is the probability of getting a result at least as extreme as we did, if the null hypothesis (and all other modelling assumptions) were really true.
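To make the definition concrete, here is a simulation sketch with made-up data (60 heads in 100 flips is an assumed observation, not an example from the book): we count how often a fair coin produces a result at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical observation: 60 heads in 100 flips; null hypothesis: a fair coin
observed_heads, n_flips = 60, 100

# Simulate many experiments under the null and count how often the outcome is
# at least as far from 50 heads as the observed result, in either direction
sims = rng.binomial(n=n_flips, p=0.5, size=100_000)
extreme = np.abs(sims - n_flips / 2) >= abs(observed_heads - n_flips / 2)
p_value = extreme.mean()
print(f"Approximate two-sided P-value: {p_value:.3f}")  # roughly 0.057
```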
Bayesian approach: the approach to statistical inference in which probability is used not only for aleatory uncertainty, but also for epistemic uncertainty about unknown facts. Bayes’ theorem is then used to revise these beliefs in the light of new evidence.
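A short worked example of that revision of belief, using invented screening-test numbers (the prior, sensitivity and false-positive rate are assumptions for illustration only):

```python
# Bayes' theorem: posterior is proportional to prior times likelihood
prior = 0.01            # assumed prior probability of having the condition
sensitivity = 0.90      # assumed P(positive test | condition)
false_positive = 0.05   # assumed P(positive test | no condition)

# Total probability of a positive test (law of total probability)
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Revised (posterior) probability after observing a positive test
posterior = sensitivity * prior / p_positive
print(f"P(condition | positive test) = {posterior:.2f}")  # about 0.15
```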
Central Limit Theorem: the tendency for the sample mean of a set of random variables to have a normal sampling distribution, regardless (with certain exceptions) of the shape of the underlying distribution of the random variable. If n independent observations each have mean μ and variance σ², then under broad assumptions their sample mean is an estimator of μ, and has an approximately normal distribution with mean μ, variance σ²/n, and standard deviation σ/√n (also known as the standard error of the estimator).
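A quick simulation sketch of the theorem (the exponential distribution and sample size are arbitrary choices): even though the underlying distribution is heavily skewed, the sample means cluster around μ with spread close to σ/√n.

```python
import numpy as np

rng = np.random.default_rng(2)

# Heavily skewed underlying distribution: exponential with μ = 1 and σ = 1
n = 50
sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.3f}  (theory: 1.000)")
print(f"sd of sample means:   {sample_means.std():.3f}  (theory σ/√n = {1 / np.sqrt(n):.3f})")
# A histogram of sample_means looks approximately normal despite the skewed input
```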
interactions: when multiple explanatory variables combine to produce an effect different from that expected from their individual contributions.
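A minimal illustration of an interaction, with made-up coefficients: the combined effect of x1 and x2 is larger than the sum of their separate effects, and a regression that includes the product term x1*x2 picks this up.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data: the effect of x1 depends on the level of x2
n = 1_000
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)
y = 1.0 * x1 + 0.5 * x2 + 2.0 * x1 * x2 + rng.normal(0, 0.1, size=n)

# Least-squares fit of y on x1, x2 and their product; the coefficient on the
# product term estimates the interaction
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated interaction: {coef[3]:.2f}  (value used in the simulation: 2.0)")
```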
Law of Large Numbers: the process by which the sample mean of a set of random variables tends towards the population mean.
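A simple demonstration, assuming simulated fair die rolls (the numbers of rolls are arbitrary): the running mean drifts towards the population mean of 3.5 as more rolls accumulate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Running mean of simulated fair die rolls
rolls = rng.integers(1, 7, size=100_000)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)

for n in (10, 100, 10_000, 100_000):
    print(f"mean after {n:>6} rolls: {running_mean[n - 1]:.3f}")
```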
machine learning: procedures for extracting algorithms, say for classification, prediction or clustering, from complex data.
prosecutor’s fallacy: when a small probability of the evidence, given innocence, is mistakenly interpreted as the probability of innocence, given the evidence.
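A rough numerical sketch of why the two probabilities differ, with entirely hypothetical figures (a 1-in-10,000 match probability and a pool of one million people, one of whom is the culprit):

```python
# P(match | innocent) is tiny, but P(innocent | match) can still be large
p_match_given_innocent = 1 / 10_000
population = 1_000_000  # assumed pool of potential suspects, one true culprit

expected_innocent_matches = (population - 1) * p_match_given_innocent  # about 100
p_innocent_given_match = expected_innocent_matches / (expected_innocent_matches + 1)

print(f"P(match | innocent)  = {p_match_given_innocent:.4f}")  # 0.0001
print(f"P(innocent | match) ~ {p_innocent_given_match:.2f}")   # about 0.99
```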
signal and the noise: the idea that observed data arises from two components: a deterministic signal which we are really interested in, and random noise that comprises the residual error. The challenge of statistical inference is to appropriately identify the two, and not be misled into thinking that noise is actually a signal.
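A small simulated sketch of separating the two components (the straight-line signal and the noise level are invented): fitting a line estimates the signal, and the residuals estimate the noise.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data: a deterministic straight-line signal plus random noise
x = np.linspace(0, 10, 100)
signal = 2.0 + 0.5 * x
y = signal + rng.normal(0, 1.0, size=x.size)

# Fit a straight line: fitted values estimate the signal, residuals the noise
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
print(f"estimated signal: y = {intercept:.2f} + {slope:.2f}x")
print(f"residual (noise) standard deviation: {residuals.std():.2f}")
```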
Simpson’s paradox: when an apparent relationship reverses its sign when a confounding variable is taken into account.
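An illustrative set of counts (modelled loosely on the well-known kidney-stone example, not taken from this book) in which treatment A does better within both subgroups yet worse overall, because it was mostly given to the harder cases:

```python
# (successes, patients) for each treatment within each subgroup
groups = {
    "mild cases":   {"A": (81, 87),   "B": (234, 270)},
    "severe cases": {"A": (192, 263), "B": (55, 80)},
}

totals = {"A": [0, 0], "B": [0, 0]}
for name, arms in groups.items():
    for arm, (s, n) in arms.items():
        totals[arm][0] += s
        totals[arm][1] += n
        print(f"{name:12s} {arm}: {s}/{n} = {s / n:.0%}")

for arm, (s, n) in totals.items():
    print(f"overall      {arm}: {s}/{n} = {s / n:.0%}")
```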
skewed distribution: when a sample or population distribution is highly asymmetric, and has a long left- or right-hand tail. This might typically occur for variables such as income and sales of books, where there is extreme inequality. Standard measures such as means and standard deviations can be very misleading for such distributions.
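A quick simulated illustration of why the mean can mislead here (the log-normal 'incomes' are invented): the long right-hand tail drags the mean well above the median.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated right-skewed 'incomes': most values modest, a few enormous
incomes = rng.lognormal(mean=10, sigma=1.0, size=10_000)

print(f"mean:   {incomes.mean():,.0f}")
print(f"median: {np.median(incomes):,.0f}")
# The mean sits far above the median, so it is a poor summary of a typical value
```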
the ‘law of the transposed conditional’, which sounds delightfully obscure, but simply means that the probability of A given B is confused with the probability of B given A.