More on this book
Community
Kindle Notes & Highlights
Read between
May 19 - May 29, 2023
From the method of moments, Pearson established four parameters for curve-fitting to show how the data clustered (the mean), how it spread (the standard deviation), if there were a loss of symmetry (skewness) and if the shape of the distribution were peaked or flat (kurtosis). These four parameters describe the essential characteristics of any distribution:
The measurement of variation is the lynchpin of mathematical statistics.
the interquartile range = Q3 – Q1 or 8 – 4 = 4.
remained a quick and easy way to hand-calculate an approximate estimate of variation,
the interquartile range is not influenced by outliers:
a mathematically perfect correlation does not mean causation: it simply means that two variables are very highly correlated. This may even be the result of a spurious or illusory correlation due to the influence of a third variable, called a “lurking variable”. While students’ university qualifications are highly correlated with their income later in life (the higher the grades, the higher the salary), this correlation could be due to a third (lurking or hidden) variable, such as the tendency to work hard.
regression to the mean refers to the tendency of a characteristic in a population to move away from extreme values and closer towards average values,
factor analysis, which reduces a set of complex data into a more manageable form that makes it possible to detect structures in the relationship between variables.
goodness of fit test, which allows the statistician to see how well the data matches or corresponds to a normal distribution.
While the normal distribution is used for continuous data that conforms to the symmetric, bell-shaped curves, the chi-square distribution can be used for discrete data that takes on any-shaped distribution, such as asymmetrical, binomial or Poisson distributions.
The first statistical quality control test for industry was devised by the statistician and chemist William Sealy Gosset, a master brewer at Guinness in the early years of the 20th century. Since Gosset was bound by his appointment not to publish under his own name (perhaps because Guinness didn’t want rival breweries to know that they were training some of their scientific staff in statistical theory), he adopted the pseudonym “Student”.
hops
Student’s z-ratio became the first statistical test for quality control in industry.
Fisher was so inspired by Gosset’s statistical test that in 1924 he transformed Gosset’s z-ratio and re-introduced it as “Student’s t-test”. Fisher then recalculated Gosset’s values from the z-table and replaced it with a t-table, which Fisher termed “Student’s t distribution”.
Fisher’s Statistical Analysis of Variance
analysis of variance (ANOVA)
analysis of covariance (ANCOVA)
Hypothesis testing
Estimation theory
Statistics, which use Roman letters such as , s, and r (for the mean, standard deviation and correlation, respectively), were developed primarily by Pearson. Parameters, denoted instead by Greek letters such as μ (mu), σ (“little sigma”) and p (rho), were introduced by Fisher in 1922 to estimate the mean, standard deviation and correlation, respectively, in populations.
statistics are to samples what parameters are to populations.
To make generalizations about a population, statistical information is taken from a representative sample. Every sample drawn from the population has its own statistic (, s, or r), which is used to estimate the parameter (μ, σ, or p) of its population. According to Fisher, a sample statistic should be an unbiased estimator of the corresponding population parameter. (Fisher created three other estimators for parameters, which had to be statistically consistent, efficient and sufficient.)
a population parameter is a way of summarizing a probability distribution, while a sample statistic is a way of summarizing a sample of observations.
The bureaucratic compilation of the vast amount of vital statistical data by mid-Victorian statisticians enabled them to deploy a statistical system to measure the health of the nation, which led to political reform and the creation of Public Health Acts in Britain.
This Darwinian framework prompted a reconceptualization of a new statistical methodology beginning with Francis Galton, whose interest in measuring individual differences brought variation into the forefront of statistics;
The first statistical quality control test for industry was devised by Pearson’s student William Sealy Gosset, whose work inspired Ronald Fisher to create a statistical system for the analysis of small samples, thereby introducing experimental design and randomization into statistical theory. Fisher’s development of inferential statistics inaugurated the second phase in the development of modern mathematical statistics.