Introducing Statistics: A Graphic Guide (Graphic Guides)
Nominal variables include nearly all demographic variables such as religious affiliation, political persuasion and socio-economic status.
Ordinal variables are simply ordered and then named. The Mohs Scale, devised by the German mineralogist Friedrich Mohs in 1822, is an example of an ordinal scale.
The American psychologist Stanley Smith Stevens (1906–73) made a further sub-division with “continuous variables” in 1947 when he introduced ratio and interval scales of measurement (most of Pearson’s continuous variables were ratio). Stevens proposed the following: 1. Ratio scales. These differ from interval variables in two ways: a) an absolute zero indicates the absence of the property being measured (e.g. height, weight and blood pressure); and b) ratio scales are additive.
2. Interval scales. The zero point is arbitrary and does not reflect the absence of an attribute (such as readings of 0 °C or 0 °F).
Correlation, one of the most widely used statistical methods, indicates the extent to which two variables go together (e.g., height and weight). The most common type measures a linear relationship between two variables, and refers to how well they go together in a straight line. But not every pair of characteristics or variables can be assessed using a statistical correlation, and different methods of correlation are used within the biological, medical, behavioural, social and environmental sciences, as well as in industry, commerce, economics and education. Different types of correlational …
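To make the idea concrete, here is a minimal sketch (in Python, on invented height and weight figures) of how the most common index, the product-moment correlation coefficient, can be computed as the average product of paired deviations divided by the product of the two standard deviations:

```python
import math

# Invented paired measurements for illustration: height (cm) and weight (kg).
heights = [150, 155, 160, 165, 170, 175, 180]
weights = [52, 57, 60, 63, 68, 74, 80]

def pearson_r(x, y):
    """Product-moment correlation: covariance of x and y divided by
    the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

print(round(pearson_r(heights, weights), 3))  # close to +1: a strong linear relationship
```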
Francis Galton was the first person to devise a method of measuring correlation, when he created a graph to find a relationship between mother and daughter sweet pea seeds. Until Galton invented the idea of correlation, causation was the primary way in which two related events were explained, especially in the physical sciences.
Hence, a mathematically perfect correlation does not mean causation: it simply means that two variables are very highly correlated. This may even be the result of a spurious or illusory correlation due to the influence of a third variable, called a “lurking variable”. While students’ university qualifications are highly correlated with their income later in life (the higher the grades, the higher the salary), this correlation could be due to a third (lurking or hidden) variable, such as the tendency to work hard.
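A small simulation, with assumed numbers, shows how a lurking variable can manufacture a correlation between two variables that do not cause one another:

```python
import random
from statistics import correlation  # Python 3.10+

random.seed(1)

# Hypothetical lurking variable: "tendency to work hard" for 500 people.
effort = [random.gauss(0, 1) for _ in range(500)]

# Grades and later salary each depend on effort plus independent noise;
# neither one causes the other.
grades = [e + random.gauss(0, 1) for e in effort]
salary = [e + random.gauss(0, 1) for e in effort]

print(round(correlation(grades, salary), 2))  # clearly positive, with no direct causal link
```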
Correlation is often depicted graphically on a scatter diagram to see what shape the data produce. If two variables produce a narrow ellipse that resembles a straight line, this indicates a high correlation. A broader, rounder ellipse reveals a moderate correlation, whereas a circle indicates no correlation. In this way, correlation measures the strength (high, medium or low) of the relationship.
Correlation cannot, however, be transformed to a percentage. Thus, a moderate correlation of 0.55 or a high correlation of 0.80 is not equivalent to 55% or 80%, as some people erroneously believe.
The numerical index that correlation yields also measures the direction of the relationship. Either the two variables move up or down the graph together (e.g., height and weight in healthy infants go up together), or one variable moves up while the other moves down (e.g., the faster one travels in a car, the sooner the destination is reached: speed increases as time decreases). The former produces a positive or direct correlation, while the latter yields a negative or inverse correlation.
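A quick check on invented figures confirms the sign convention: values that rise together give a positive coefficient, while speed and travel time over a fixed distance give a negative one (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation  # Python 3.10+

# Invented infant data: age-ordered heights (cm) and weights (kg) rise together.
height = [50, 55, 60, 65, 70]
weight = [3.5, 4.5, 6.0, 7.5, 9.0]

# A fixed 120 km journey: as speed rises, travel time falls.
speed = [40, 60, 80, 100, 120]      # km/h
time = [120 / s for s in speed]     # hours

print(round(correlation(height, weight), 2))  # positive (direct) correlation
print(round(correlation(speed, time), 2))     # negative (inverse) correlation
```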
Though the numerical index provides some information about the degree of a linear relationship, a scatter plot is a useful tool because it may instead reveal a curvilinear relationship. Pearson introduced the correlation ratio in 1905 to measure a curvilinear relationship.
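A tiny example with made-up data shows why the scatter plot matters: a perfect U-shaped (curvilinear) relationship can yield a product-moment correlation of zero:

```python
from statistics import correlation  # Python 3.10+

# A deterministic curvilinear relationship: y depends perfectly on x,
# but not in a straight line.
x = list(range(-5, 6))
y = [v ** 2 for v in x]

print(round(correlation(x, y), 3))  # 0.0: the linear index misses the relationship entirely
```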
Galton measured the diameter and weight of thousands of mother and daughter sweet pea seeds in 1875, and found that the population of the offspring reverted towards the parents and followed the normal distribution. As the size of the mother pea seed increased, so did the size of the daughter pea seed, but the offspring was not as big or as small as the mother pea; it therefore regressed back towards the size of its “ancestor pea”.
Regression to the Mean. This refers to the tendency of a characteristic in a population to move away from extreme values and closer to the average value.
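A short simulation, under assumed numbers, illustrates the effect: people who score at the extreme on a first noisy measurement tend, as a group, to score closer to the population average on a second one:

```python
import random
from statistics import mean

random.seed(2)

# Assumed model: each observed score = stable "true" level + independent noise.
true_level = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [t + random.gauss(0, 10) for t in true_level]
test2 = [t + random.gauss(0, 10) for t in true_level]

# Take the people with the most extreme (highest) scores on the first test.
cutoff = sorted(test1)[-500]
extreme = [i for i, s in enumerate(test1) if s >= cutoff]

print(round(mean(test1[i] for i in extreme), 1))  # far above 100 on test 1
print(round(mean(test2[i] for i in extreme), 1))  # closer to 100 on test 2: regression to the mean
```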
At the end of the 19th century, Pearson’s student George Udny Yule (1871–1951) introduced a novel approach to interpreting correlation and regression with a conceptually new use of the method of least squares, a mathematical tool that fits a regression line by making the sum of the squared distances between the data points and the line as small as possible.
Using the method of least squares, a regression analysis allows statisticians to estimate the response variable “Y” (the dependent variable, the one being predicted) from a specified variable “X” (the independent variable, the one being manipulated or specified by the researcher).
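A minimal sketch on invented revision-hours and exam-mark data of fitting a least-squares line and using it to predict Y from a specified X (the data and variable names are illustrative only):

```python
from statistics import linear_regression  # Python 3.10+

# Invented data: hours of revision (X) and exam mark (Y).
x = [2, 4, 5, 7, 9, 10]
y = [52, 58, 60, 68, 74, 79]

# Least squares chooses the slope and intercept that minimise the sum of
# squared vertical distances between the points and the fitted line.
slope, intercept = linear_regression(x, y)

predicted = intercept + slope * 8          # Y' = a + bX for a specified X of 8 hours
print(round(slope, 2), round(intercept, 2), round(predicted, 1))
```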
Although the method of least squares may be used to fit regression lines, much of the confusion surrounding regression to the mean can be attributed to those who forget that Galton’s regression to the mean involves two regression lines, not simply the single regression line used to predict future outcomes by the method of least squares.
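A short sketch, on made-up father and son heights, of the point: the line for predicting sons from fathers and the line for predicting fathers from sons have different slopes, so there are two regression lines, not one:

```python
from statistics import linear_regression  # Python 3.10+

# Invented father and son heights (cm).
father = [165, 168, 170, 172, 175, 178, 180, 183]
son    = [168, 169, 173, 171, 176, 177, 182, 181]

b_son_on_father, _ = linear_regression(father, son)  # line for predicting sons from fathers
b_father_on_son, _ = linear_regression(son, father)  # line for predicting fathers from sons

print(round(b_son_on_father, 2))
print(round(b_father_on_son, 2))   # a different slope: two regression lines, not one
```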
Though Galton wanted to measure the correlation of stature between father and son, Pearson discovered in 1896 that Galton’s procedure for finding “co-relation”, as he spelt it, actually measured the slope of the regression line (the regression coefficient) rather than the correlation itself.
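A brief numerical illustration (on the same invented heights) of why the two quantities differ: the regression slope equals the correlation multiplied by the ratio of the two standard deviations, so slope and correlation coincide only when the spreads are equal:

```python
from statistics import correlation, linear_regression, pstdev  # Python 3.10+

father = [165, 168, 170, 172, 175, 178, 180, 183]
son    = [168, 169, 173, 171, 176, 177, 182, 181]

r = correlation(father, son)
b, _ = linear_regression(father, son)

# Slope b = r * (sd of son / sd of father).
print(round(r, 3))
print(round(b, 3))
print(round(r * pstdev(son) / pstdev(father), 3))  # reproduces b
```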
The covariance, ∑(xy) (where x and y denote deviations from the respective means), is a measure of how much the deviations of two random variables move together.
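A minimal computation, on invented numbers, of the covariance as the average product of paired deviations, and of the correlation as that covariance rescaled by the two standard deviations:

```python
from statistics import mean, pstdev

# Invented paired observations.
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 15]

mx, my = mean(x), mean(y)

# Average product of the paired deviations from the means.
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

r = cov / (pstdev(x) * pstdev(y))  # correlation is the covariance in standardised units
print(round(cov, 2), round(r, 3))
```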
In 1925, R.A. Fisher (1890–1962) reconstructed Pearson’s notation, introducing Y = a + bX (the general equation for a straight line) and incorporating the terms “dependent” variable and “independent” variable. This was an essential distinction to make for regression, because the independent variable is the predictor and the dependent variable is the criterion. Fisher then produced the equation for the regression (or predicted) line: Y’ = a + bX (where b is the regression coefficient and Y’, pronounced “Y prime”, indicates the predicted value on the regression line).
Pearson introduced the term simple correlation when measuring a linear relationship between two continuous variables only, such as the relationship between stature of father and stature of son.
This work provided the basis for the development of multiple regression. Like simple regression, it involves a linear prediction, but rather than predicting from only one variable, a collection of predictor variables can be used instead.
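A compact sketch of the idea, fitting one response to two invented predictors by ordinary least squares (numpy is assumed to be available; the variable names are illustrative):

```python
import numpy as np

# Invented data: predict a salary index from exam grade and years of experience.
grade = np.array([55, 60, 65, 70, 75, 80, 85, 90])
experience = np.array([1, 3, 2, 5, 4, 6, 8, 7])
salary = np.array([30, 36, 37, 45, 44, 52, 60, 59])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(grade, dtype=float), grade, experience])

# Ordinary least squares: coefficients that minimise the sum of squared errors.
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

a, b1, b2 = coef
print(round(a, 2), round(b1, 2), round(b2, 2))   # Y' = a + b1*grade + b2*experience
```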
To calculate the multiple correlation coefficient, Pearson introduced a higher form of mathematics. This played a pivotal role in the professionalization of mathematical statistics as an academic discipline at the end of the 19th century. Pearson learnt this type of mathematics at Cambridge from J.J. Sylvester and Arthur Cayley (1821–95), who had created matrix algebra out of their discovery of the theory of invariants during the mid-19th century.
This higher level of mathematics enabled statisticians to find complex mathematical solutions for statistical problems in a multivariate (or p-dimensional) space when a bivariate (or two-dimensional) system was insufficient.
Scientists can use two types of control when undertaking research: experimental and statistical control.
Pearson offered one way to statistically control certain variables in 1895 with part correlation, which is used with multiple correlation only and thus involves three or more variables. It is the correlation between the dependent variable and one of the independent variables after the researcher statistically removes the influence of one of the other independent variables from the first independent variable. Thus, the researcher can mathematically isolate the variable when it cannot be experimentally isolated. The statistician is essentially treating the item as if one of the variables doesn’t …
George Udny Yule later introduced partial correlation, in which the statistician removes the effects of one or more of the independent variables from both the dependent variable and one of the other independent variables. Partial correlation helps to identify spurious correlations.
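A hedged sketch of both ideas on invented data, using the standard textbook formulas: the part (semi-partial) correlation removes a third variable Z from only one of the two variables, while the partial correlation removes it from both:

```python
from math import sqrt
from statistics import correlation  # Python 3.10+

# Invented data: Y = outcome, X = predictor of interest, Z = variable to control for.
y = [10, 12, 14, 15, 18, 20, 22, 25]
x = [2, 3, 3, 5, 6, 6, 8, 9]
z = [1, 1, 2, 2, 3, 3, 4, 5]

r_yx = correlation(y, x)
r_yz = correlation(y, z)
r_xz = correlation(x, z)

# Part (semi-partial) correlation: Z is removed from X only.
part = (r_yx - r_yz * r_xz) / sqrt(1 - r_xz ** 2)

# Partial correlation: Z is removed from both Y and X.
partial = (r_yx - r_yz * r_xz) / sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

print(round(part, 3), round(partial, 3))
```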
Pearson introduced two new methods in 1900: the tetrachoric (i.e. “four-fold”) correlation coefficient (rt), and his phi coefficient (φ), known later as “Pearson’s phi coefficient”, for discrete variables. Both methods measure the association between two variables in a 2 × 2 (or four-fold) table, where each variable falls into two mutually exclusive categories (called “dichotomous” variables).
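A minimal sketch, on an invented 2 × 2 table of counts, of the phi coefficient computed from the four cells with the usual four-fold formula:

```python
from math import sqrt

# Invented 2 x 2 table of counts:
#                    disease present   disease absent
# risk factor yes          a = 30            b = 20
# risk factor no           c = 10            d = 40
a, b, c, d = 30, 20, 10, 40

# Phi coefficient for a four-fold table.
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 3))
```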
Pearson’s phi coefficient was designed for two variables where a true dichotomy exists and thus the variables are not continuous. This technique is commonly used by psychometricians for test-construction in situations where a true dichotomy exists, such as “true” or “false” test items, and by epidemiologists who use it to assess a risk factor associated with the “presence” or “absence” of a disease against the incidence of mortality.
Yule proposed the Q statistic, which he named for Quetelet, in 1899 (one month after Pearson introduced the phi coefficient and tetrachoric correlation). Yule was also looking for a measure that didn’t rely on continuous variables or depend on an underlying normal distribution, as was the case with the Pearson product-moment correlation.
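A small sketch, reusing the same invented 2 × 2 counts, of Yule's Q, which depends only on the cross-products of the four cells:

```python
# Same invented 2 x 2 table as above.
a, b, c, d = 30, 20, 10, 40

# Yule's Q: (ad - bc) / (ad + bc), ranging from -1 to +1.
q = (a * d - b * c) / (a * d + b * c)
print(round(q, 3))  # about 0.714 with these counts
```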
Pearson devised the biserial correlation in 1909. This is related to the product-moment correlation (in which both variables are continuous), with one difference: one of the two variables has been reduced to a dichotomy even though the underlying attribute is continuous.
The point-biserial correlation is related to Pearson’s biserial correlation, but one variable is continuous and the other is a “true dichotomy”, such as male/female. The biserial, by contrast, is an estimate of what the product-moment correlation would be if the dichotomised variable were replaced by a continuous variable instead.
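A hedged sketch, on invented test data, of the point-biserial case: because one variable is a genuine 0/1 dichotomy, the ordinary product-moment formula can be applied to it directly:

```python
from statistics import correlation  # Python 3.10+

# Invented data: item answered correctly (1) or not (0), and total test score.
item = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]
score = [12, 18, 10, 20, 17, 11, 22, 19, 14, 21]

# The point-biserial correlation is the product-moment correlation
# computed with the dichotomous variable coded 0/1.
print(round(correlation(item, score), 3))
```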
Rank order correlation is the study of relationships between different rankings of the same set of items. It deals with measuring the correspondence between two rankings and assessing the statistical significance of that correspondence. Two of the main methods were devised by Charles Spearman (1863–1945, a student of Karl Pearson) and Maurice Kendall. Three other rank-based tests are the Wilcoxon signed-rank test, the Mann-Whitney U test and the Kruskal-Wallis analysis of ranks.
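A minimal sketch, on invented judges' scores, of Spearman's approach: rank each variable and then compute the product-moment correlation between the ranks (equivalent to 1 − 6Σd²/(n(n² − 1)) when there are no ties):

```python
from statistics import correlation  # Python 3.10+

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Invented judges' scores for eight items.
judge_a = [7.1, 6.5, 8.2, 5.9, 9.0, 6.8, 7.7, 8.8]
judge_b = [6.9, 6.0, 8.5, 6.2, 8.9, 6.4, 7.2, 9.1]

# Spearman's rho: the product-moment correlation applied to the ranks.
rho = correlation(ranks(judge_a), ranks(judge_b))
print(round(rho, 3))
```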
Spearman was also influenced by Galton’s ideas of measuring individual differences in human abilities and by his early ideas on intelligence testing. Using Pearson’s product-moment correlation and the principal components method that Pearson introduced in 1901, Spearman created a new statistical method, known as factor analysis, which reduces a set of complex data into a more manageable form that makes it possible to detect structures in the relationship between variables.
The English statistician Maurice Kendall (1907–83) created another ranking method of correlation in 1938, known as Kendall’s tau. This method is a scheme based on the number of agreements or disagreements in ranked data.
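A compact sketch of the idea behind Kendall's tau, counting agreeing (concordant) and disagreeing (discordant) pairs in two invented rankings; this simple form assumes no tied ranks:

```python
from itertools import combinations

# Invented rankings of six items by two judges (no ties).
rank_a = [1, 2, 3, 4, 5, 6]
rank_b = [2, 1, 4, 3, 6, 5]

concordant = discordant = 0
for i, j in combinations(range(len(rank_a)), 2):
    # A pair agrees if both rankings order the two items the same way.
    if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) > 0:
        concordant += 1
    else:
        discordant += 1

tau = (concordant - discordant) / (concordant + discordant)
print(round(tau, 3))  # 0.6 here: mostly, but not entirely, in agreement
```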