There are two types: vital statistics and mathematical statistics. Vital statistics is what most people understand by statistics. It is used as a plural noun and refers to an aggregate set of data.
This process is primarily concerned with average values, and uses life tables, percentages, proportions and ratios: probability is most commonly used for actuarial (i.e. life-insurance) purposes. It was not until the 20th century that the singular form “statistic”, signifying an individual fact, came into use.
Mathematical statistics is used as a singular noun, and it arose out of the mathematical theory of probability in the late 18th century from the work of such continental mathematicians as Jacob Bernoulli, Abraham de Moivre, Pierre-Simon Laplace and Carl Friedrich Gauss.
Mathematical statistics encompasses a scientific discipline that analyses variation, and is often underpinned by matrix algebra. It deals with the collection, classification, description and interpretation of data from social surveys, scientific experiments and clinical trials. Probability is used for statistical tests of significance.
Used in this sense, statistics is a technical discipline, and while it is mathematical, it is essential to understand the statistical concepts underlying the mathematical procedures.
The decision to examine averages or to measure variation is rooted in philosophical ideologies that governed the thinking of statisticians, natural philosophers and scientists throughout the 19th century. The emphasis on statistical averages was underpinned by the philosophical tenets of determinism and typological ideas of biological species, which helped to perpetuate the idea of an idealized mean. Determinism implies that there is order and perfection in the universe …
The typological concept of species, which was the dominant thinking of taxonomists,* typologists and morphologists until the end of the 19th century, gave rise to the morphological concept of species. Species were thought to have represented an ideal type.
Malthus believed that populations would increase geometrically (2, 4, 8, 16, 32, etc.), whereas food supplies would increase arithmetically (2, 4, 6, 8, 10, etc.). Malthus’ hypothesis implied that the actual population would always have a tendency to push above the food supply.
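A small illustrative sketch of the two growth patterns Malthus contrasted (the starting values and number of generations are arbitrary choices):

```python
# Illustrative only: compare geometric (doubling) population growth
# with arithmetic (constant-increment) growth in the food supply.
population = 2
food = 2
for generation in range(1, 9):
    print(f"generation {generation}: population={population}, food={food}")
    population *= 2   # geometric: 2, 4, 8, 16, ...
    food += 2         # arithmetic: 2, 4, 6, 8, ...
```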
Demography began as the numerical study of poverty.
How did 19th-century statisticians reduce data to something more manageable? While data was summarized in diagrams and tables, until the end of the 19th century the two main statistical tools were probability and averages.
Probability is one of the oldest statistical concepts: notions of probability were used as a tool to solve problems in games of chance beginning in the 14th century.
There are different approaches to probability:
1. Subjective
2. Games of chance
3. Mathematical
4. Relative frequency
5. Bayesian
There are six main probability distributions:
1. Binomial distribution
2. Poisson distribution
3. Normal distribution
4. Chi-square distribution
5. t distribution
6. F distribution
There are two types of statistical distributions: probability distributions, which describe the possible events in a sample and the frequency with which each will occur; and frequency distributions, which record how often each value actually occurs in a set of data.
Statisticians use probability distributions to interpret the results from a set of data that has been analysed by using various statistical methods. Frequency distributions transform very large groups of numbers into a more manageable form and show how frequently a particular item or unit in a group occurs.
Variables are characteristics of an individual or a system that can be measured or counted. These can vary over time or between individuals.
The Subjective approach to probability involves a degree of rational belief.
Subjective probability is assessed through a betting scheme based on what the person thinks the probability of some outcome will be. The idea is to locate probability where it should be in the mind of the observer, not in the outside world. The problem is that people with equal knowledge and skills can come to different answers.
Relative frequency is an approach that makes it possible to make formal probability statements P(A) about uncertain events, where “P” is the probability of an uncertain event “A”. Thus, the probability of an event happening is the proportion of times that events of the same kind will appear in the long run.
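A quick simulation can make the long-run idea concrete; the event and the sample sizes below are arbitrary choices for illustration.

```python
import random

random.seed(1)

# Estimate P(A) for the event A = "a fair die shows a six" as the
# relative frequency of sixes in an increasingly long run of trials.
for n in (100, 1_000, 10_000, 100_000):
    sixes = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    print(f"n={n:>7}: relative frequency = {sixes / n:.4f}")  # approaches 1/6, about 0.1667
```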
Bayes’ theorem is a formula that shows how existing beliefs, formally expressed as probability distributions, are modified by new information.
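In symbols, the theorem says P(H | E) = P(E | H) P(H) / P(E). A minimal numerical sketch, with made-up prior and likelihood values chosen purely for illustration:

```python
# Hypothetical numbers: prior belief that a hypothesis H is true,
# and the probability of observing evidence E under H and under not-H.
prior_h = 0.30          # P(H): existing belief
p_e_given_h = 0.80      # P(E | H)
p_e_given_not_h = 0.20  # P(E | not H)

# Total probability of the evidence, P(E)
p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)

# Bayes' theorem: the belief updated by the new information
posterior_h = p_e_given_h * prior_h / p_e
print(f"P(H | E) = {posterior_h:.3f}")   # the prior of 0.30 rises to about 0.63
```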
The Binomial distribution is a discrete probability distribution and represents the probability of two outcomes, which may or may not occur. It describes the possible number of times that a particular event will occur in a sequence of observations. For example, it will give the probability of obtaining five tails when tossing ten coins.
The binomial distribution models experiments in which a repeated binary outcome is counted. Each binary outcome is called a “Bernoulli trial”. The binomial distribution (p + q)^n is determined by the number of observations n and by the probabilities of the two possible outcomes, denoted p and q. This provides a model for the various probabilities of outcomes that can occur. To determine the probability of each outcome, the binomial distribution has to be expanded over the number of observations – by raising (p + q) to the nth power.
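A sketch of the binomial calculation for the coin example above, using the standard term C(n, k) p^k q^(n−k) of the expansion (via Python's math.comb):

```python
from math import comb

n, p = 10, 0.5            # 10 tosses of a fair coin; P(tail) = 0.5
q = 1 - p

# Probability of exactly k tails: the k-th term of the expansion of (p + q)^n
def binomial_pmf(k):
    return comb(n, k) * p**k * q**(n - k)

print(f"P(5 tails in 10 tosses) = {binomial_pmf(5):.4f}")   # about 0.2461
# The n + 1 terms of the expansion sum to 1, as a probability distribution must.
print(f"sum over k = {sum(binomial_pmf(k) for k in range(n + 1)):.4f}")
```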
The Poisson distribution, discovered by Siméon-Denis Poisson (1781–1840), is a discrete probability distribution used to describe the occurrence of unlikely events in a large number of independent repeated trials. The Poisson is a good approximation to the binomial distribution when the probability is small and the number of trials is large.
The analysis of mortality statistics often employs Poisson distributions on the assumption that deaths from most diseases occur independently and at random in populations.
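One way to see the approximation mentioned above, with a small probability and a large number of trials (the parameter values here are arbitrary):

```python
from math import comb, exp, factorial

n, p = 1000, 0.002           # many trials, small probability of the event
lam = n * p                  # Poisson mean = 2

# Compare binomial and Poisson probabilities for small counts k.
for k in range(5):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(f"k={k}: binomial={binom:.4f}  poisson={poisson:.4f}")
```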
The Normal distribution is a continuous distribution, and is related to the binomial. As n approaches infinity, the binomial will approach the normal distribution as its limit. That is, as the binomial’s bars become infinitely many and infinitesimally narrow, the binomial becomes the smooth normal curve.
It is also known as the normal curve, sometimes (inaccurately) referred to as the Gaussian distribution, and has long been used as a yardstick to compare other types of statistical distributions. It plays a vital role in modern statistics because it enables statisticians to interpret their data by using various statistical methods, which are quite often modelled on the normal distribution.
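A rough numerical check of the limiting claim above, comparing binomial probabilities for a fairly large n with the corresponding normal density (the parameters are arbitrary choices):

```python
from math import comb, exp, pi, sqrt

n, p = 100, 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))   # the normal that the binomial approaches

for k in (40, 45, 50, 55, 60):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    normal = exp(-((k - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))
    print(f"k={k}: binomial={binom:.4f}  normal density={normal:.4f}")
```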
The French mathematician and astronomer Pierre-Simon Laplace (1749–1827) was responsible for advancing probability as a tool for the reduction and measurement of uncertainty in data. By 1789 he realized that measurements were affected by a number of independent small errors, and showed that the law of error could be derived mathematically. Following this, he made his most important contribution to statistics through his work on the Central Limit Theorem in 1810.
Or as statisticians would say: the sampling distribution of means gets closer and closer to the normal curve as the sample size increases, despite any departure from normality in the population distribution.
The mathematical underpinnings of this theorem state that data which are influenced by a very large number of many small and unrelated random effects will be approximately normally distributed.
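A small simulation of the statement above: means of samples drawn from a decidedly non-normal (here, exponential) population still cluster in a bell-shaped way, more tightly as the sample size grows. The population and sample sizes are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(42)

# Population: exponential with mean 1, strongly right-skewed (far from normal).
def sample_mean(n):
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

for n in (2, 10, 50):
    means = [sample_mean(n) for _ in range(5_000)]
    # The sampling distribution of the mean centres on 1 (the population mean)
    # and its spread shrinks roughly like 1/sqrt(n), as the theorem predicts.
    print(f"n={n:>2}: mean of means={statistics.fmean(means):.3f}, "
          f"sd of means={statistics.stdev(means):.3f}")
```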
The normal curve has three mathematical properties: 1. It is a bell-shaped symmetrical curve, which is continuous and ranges from negative infinity to positive infinity.
2. The mean and the standard deviation define its shape; the theoretical normal distribution has a population mean of zero and a standard deviation of 1. Different standard deviations will produce slightly different shapes: the mean fixes the placement of the distribution on the X axis, while the standard deviation governs the variability, which shows how the scores scatter or spread. In the original figures, the mean is in the same location but curve B has more variability than curve A.
3. The skewness of the normal curve is zero, because it is symmetric around the mean. If the distribution were skewed to the left side, a measure of skewness would produce a negative value; if skewed to the right, it would produce a positive value. The direction of the tail indicates whether it is positively or negatively skewed.
From the method of moments, Pearson established four parameters for curve-fitting to show how the data clustered (the mean), how it spread (the standard deviation), if there were a loss of symmetry (skewness) and if the shape of the distribution were peaked or flat (kurtosis). These four parameters describe the essential characteristics of any distribution: the system is parsimonious and elegant. These statistical tools are essential for interpreting any set of statistical data, whatever shape the distribution takes.
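A sketch of how these four summary quantities might be computed today for a small, made-up data set; skewness and kurtosis below use simple standardized-moment formulas rather than Pearson's original notation.

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]   # made-up observations

mean = statistics.fmean(data)
sd = statistics.pstdev(data)             # population standard deviation

# Third and fourth standardized moments: skewness and kurtosis.
skewness = sum((x - mean) ** 3 for x in data) / (len(data) * sd**3)
kurtosis = sum((x - mean) ** 4 for x in data) / (len(data) * sd**4)

print(f"mean={mean:.2f}, sd={sd:.2f}, skewness={skewness:.2f}, kurtosis={kurtosis:.2f}")
```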
By calculating the method of moments, Pearson also provided a variety of theoretical curves in varying graduations, which could then be superimposed onto an empirical curve to determine which gave the best “fit”. These curves were referred to as the “Pearsonian family of curves”.
These included:
Type III, the Gamma Curve, which he went on to use for finding the exact chi-square distribution (discussed later)
Type IV, the Family of Asymmetric Curves (created for Weldon’s data)
Type V, the Normal Curve
Type VII, now known as Student’s distribution for t-tests (examined later)
The statistician begins by looking for overall patterns of variation and any striking deviation from that pattern.
The measurement of variation is the lynchpin of mathematical statistics. Galton devised the first measure of statistical variation in 1875 when he introduced the “semi-interquartile range”, which he expressed as (Q3 − Q1)/2, half the distance between the upper and lower quartiles. A quartile is one of the points that divide an ordered distribution into four equal parts.
Like the semi-interquartile range, the interquartile range is not influenced by outliers: it is simply the distance between the lower and upper quartiles, Q3 − Q1.
In his first Gresham Lectures on statistics in 1892, Pearson introduced the range, which is the simplest method used to measure variation. The range measures the distance between the largest and smallest values from a particular set of measurements and gives an idea of the spread of the data.
The virtue of the range is its simplicity, but it’s the least reliable measure of variation, as it doesn’t use all data and is also affected by outliers.
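The measures discussed above, applied to one small, made-up sample (quartile conventions differ slightly between textbooks and software, so the exact quartile values are method-dependent):

```python
import statistics

data = [3, 5, 6, 7, 8, 9, 10, 12, 13, 40]   # note the outlier, 40

# Range: simple, but driven entirely by the two extreme values.
data_range = max(data) - min(data)

# Quartiles: statistics.quantiles splits the ordered data into four parts.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                    # interquartile range: spread of the middle 50%
semi_iqr = iqr / 2               # Galton's semi-interquartile range

print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}, semi-IQR={semi_iqr}")
# Replacing 40 with 14 changes the range dramatically but leaves the IQR untouched.
```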
Pearson introduced the standard deviation in his Gresham lecture of 31 January 1893, referring to it initially as the “standard divergence”. John Venn had used the term “divergence” a couple of years earlier when referring to deviation. The standard deviation is a measure of variation. It indicates how widely or closely spread the values are in a set of data, and shows how much each of these individual values deviates from the average (i.e. the mean).
The covariance is the measure of how much two random variables move together. If two variables tend to move together in the same direction, then the covariance between the two variables will be positive. If two variables move in the opposite direction, the covariance will be negative. If there is no tendency for two variables to move one way or the other, then the covariance will be zero.
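A minimal illustration with made-up paired values, computing the sample covariance directly from its definition:

```python
import statistics

# Made-up paired measurements for two variables.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 7]      # tends to rise with x
y_down = [9, 7, 6, 5, 2]    # tends to fall as x rises

def covariance(a, b):
    """Sample covariance: average product of paired deviations from the means."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

print(f"cov(x, y_up)   = {covariance(x, y_up):+.2f}")    # positive: move together
print(f"cov(x, y_down) = {covariance(x, y_down):+.2f}")  # negative: move oppositely
```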
A large standard deviation (relative to the value of the mean) shows that the frequency distribution is widely spread out from the mean, whereas a small standard deviation indicates that it lies closely concentrated near the mean, with little variability between one observation and another. Although the standard deviation indicates to what extent the whole group deviates from the mean, it does not show how variable a particular group is.
While the standard deviation is a practical measure of variation, the variance is used for theoretical work, especially with the analysis of variance.
The variance is also a measure of variation, but it is used for random variables and indicates the extent to which a variable’s values are spread around its expected value.*
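The relationship between the two measures, shown on a small made-up sample: the variance is the mean squared deviation from the mean, and the standard deviation is its square root.

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9, 2]          # made-up observations

variance = statistics.pvariance(data)     # average squared deviation from the mean
sd = statistics.pstdev(data)              # square root of the variance

print(f"variance = {variance:.2f}")
print(f"standard deviation = {sd:.2f}")
print(f"sd squared = {sd**2:.2f}")        # equals the variance
```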
Since the standard deviation doesn’t show the range of variation within a group, how did Pearson determine how variable a particular group was and how to make comparisons with other groups with widely different means? For this he needed a different statistical method.
Pearson thought the best way to compare deviations in heights of men and women was to alter the deviations in the same ratio. The use of the standard deviation alone, which could be measured in centimetres or inches, would most likely show that men are taller on average, since they would have a higher mean value, but this wouldn’t answer the question: “Who shows more variation within their group?” Pearson devised the coefficient of variation to measure this. This was important to Pearson when he was trying to determine how variable some of Weldon’s prawns and crabs were.
Pearson came up with his new method by expressing the standard deviation as a percentage of the arithmetic mean. The coefficient of variation is a relative measure of variation, whereas the standard deviation is an absolute measure of variation. As Pearson stressed, one has to remember that the relative size influences not only the means, but also the deviation from the mean.
The coefficient of variation has no units, so one can use it to compare variation for different variables with different units.
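A sketch of the kind of comparison Pearson had in mind, using two made-up groups with very different means: the group with the smaller standard deviation turns out to be the more variable relative to its own mean.

```python
import statistics

# Made-up measurements for two groups recorded in the same units.
group_a = [178, 182, 175, 180, 185, 176]   # large mean, larger absolute spread
group_b = [62, 58, 61, 59, 63, 60]         # small mean, tighter absolute spread

for name, data in (("A", group_a), ("B", group_b)):
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    cv = 100 * sd / mean                    # coefficient of variation, in per cent
    print(f"group {name}: mean={mean:.1f}, sd={sd:.2f}, CV={cv:.1f}%")
```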
Pearson first encountered variables that could not be treated as continuous when he began to look at the inheritance of eye colour in man and colour of coat in horses and dogs. In these situations, the only form of classification available for the variables is one that involves “counting”, rather than “measuring”: eye colour cannot be measured in the same way that stature, weight or time can be measured. Pearson referred to variables such as eye colour as nominal.