Kindle Notes & Highlights
by Andy Field
Read between January 9 - February 23, 2021
In other words, the only way to infer causality is through comparing two controlled situations: one in which the cause is present and one in which the cause is absent.
There are two ways to manipulate the independent variable. The first is to test different entities: different groups of entities take part in each experimental condition (a between-groups, between-subjects or independent design). The second is to manipulate the independent variable using the same entities in each experimental condition (a within-subject or repeated-measures design).
For an independent design, the unsystematic variation will be bigger than for a repeated-measures design.
Systematic variation: This variation is due to the experimenter doing something in one condition but not in the other condition.
Unsystematic variation: This variation results from random factors that exist between the experimental conditions (such as natural differences in ability, the time of day, etc.).
In a repeated-measures design, differences between two conditions can be caused by only two things: (1) the manipulation that was carried out on the participants, or (2) any other factor that might affect the way in which an entity performs from one time to the next.
By keeping the unsystematic variation as small as possible we get a more sensitive measure of the experimental manipulation. Generally, scientists use the randomization of entities to treatment conditions to achieve this goal.
Randomization is important because it eliminates most other sources of systematic variation, which allows us to be sure that any systematic variation between experimental conditions is due to the manipulation of the independent variable.
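None of the code below is from the book; it is a minimal Python sketch, with made-up participant labels and group sizes, of randomly assigning entities to two treatment conditions as in a between-groups design:

```python
# A minimal sketch (not from the book): randomly assigning entities to two
# treatment conditions. Participant labels and group sizes are invented.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical participants

random.shuffle(participants)          # randomize the order of entities
half = len(participants) // 2
control = participants[:half]         # condition in which the cause is absent
experimental = participants[half:]    # condition in which the cause is present

print("Control:     ", control)
print("Experimental:", experimental)
```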
Practice effects: Participants may perform differently in the second condition because of familiarity with the experimental situation and/or the measures being used.
Boredom effects: Participants may perform differently in the second condition because they are tired or bored from having completed the first condition.
We can ensure that these effects produce no systematic variation between our conditions by counterbalancing the order in which a person participates in each condition.
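Again as a hedged illustration (not from the book), counterbalancing in Python might look like this, with hypothetical participants split between the two possible condition orders:

```python
# A minimal sketch (not from the book): counterbalancing the order of two
# conditions, A and B, so that practice and boredom effects are spread
# evenly across conditions rather than piling up in the second one.
participants = [f"P{i:02d}" for i in range(1, 9)]  # 8 hypothetical participants

orders = {}
for i, p in enumerate(participants):
    # alternate: half complete A then B, the other half B then A
    orders[p] = ["A", "B"] if i % 2 == 0 else ["B", "A"]

for p, order in orders.items():
    print(p, "->", " then ".join(order))
```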
The final stage of the research process is to analyze the data you have collected.
Once you’ve collected some data a very useful thing to do is to plot a graph of how many times each score occurs. This is known as a frequency distribution or histogram, which is a graph plotting values of observations on the horizontal axis, with a bar showing how many times each value occurred in the data set.
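A minimal Python sketch (not from the book; the scores are invented) of plotting such a frequency distribution with matplotlib:

```python
# A minimal sketch (not from the book): plotting a frequency distribution
# (histogram) of some invented scores.
import matplotlib.pyplot as plt

scores = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7]

plt.hist(scores, bins=range(1, 9), edgecolor="black")
plt.xlabel("Score")        # values of observations on the horizontal axis
plt.ylabel("Frequency")    # how many times each value occurred
plt.show()
```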
If we drew a vertical line through the center of the distribution, then it should look the same on both sides. This is known as a normal distribution and is characterized by the bell-shaped curve with which you might already be familiar.
There are two main ways in which a distribution can deviate from normal: (1) lack of symmetry (called skew) and (2) pointiness (called kurtosis).
Kurtosis, despite sounding like some kind of exotic disease, refers to the degree to which scores cluster at the ends of the distribution (known as the tails), and this tends to express itself in how pointy a distribution is (but there are other factors that can affect how pointy the distribution looks).
A distribution with positive kurtosis has many scores in the tails (a so-called heavy-tailed distribution) and is pointy. This is known as a leptokurtic distribution.
A distribution with negative kurtosis is relatively thin in the tails (has light tails) and tends to be flatter than normal. This is known as a platykurtic distribution.
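To make skew and kurtosis concrete, here is a small Python sketch (not from the book) using scipy; note that scipy's kurtosis() reports excess kurtosis by default, so a normal distribution scores about 0, leptokurtic distributions score above 0 and platykurtic distributions below 0:

```python
# A minimal sketch (not from the book): quantifying skew and kurtosis with
# scipy on three simulated data sets.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal = rng.normal(size=10_000)            # symmetric, normal tails
skewed = rng.exponential(size=10_000)       # positively skewed
heavy = rng.standard_t(df=5, size=10_000)   # heavy-tailed (leptokurtic)

for name, data in [("normal", normal), ("skewed", skewed), ("heavy", heavy)]:
    print(f"{name}: skew = {stats.skew(data):.2f}, "
          f"excess kurtosis = {stats.kurtosis(data):.2f}")
```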
We can calculate where the center of a frequency distribution lies (known as the central tendency) using three commonly used measures: the mean, the mode and the median.
The mode is the score that occurs most frequently in the data set.
Another way to quantify the center of a distribution is to look for the middle score when scores are ranked in order of magnitude. This is called the median.
The median is relatively unaffected by extreme scores at either end of the distribution. It is also relatively unaffected by skewed distributions and can be used with ordinal, interval and ratio data (it cannot, however, be used with nominal data, because these data have no numerical order).
The mean is the measure of central tendency that you are most likely to have heard of, because it is the average score, and the media love an average score. One disadvantage of the mean is that it can be influenced by extreme scores.
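A minimal Python sketch (not from the book; the scores are made up) showing all three measures, and how an extreme score drags the mean while the median barely moves:

```python
# A minimal sketch (not from the book): the mode, median and mean of some
# invented scores, and the effect of one extreme score.
import statistics

scores = [2, 3, 3, 4, 5, 6, 7]
print(statistics.mode(scores))    # 3, the score occurring most frequently
print(statistics.median(scores))  # 4, the middle score when ranked
print(statistics.mean(scores))    # about 4.29, the average score

scores_with_outlier = scores + [100]
print(statistics.median(scores_with_outlier))  # 4.5: barely moves
print(statistics.mean(scores_with_outlier))    # 16.25: pulled by the outlier
```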
The easiest way to look at dispersion is to take the largest score and subtract from it the smallest score. This is known as the range of scores.
It’s worth noting here that quartiles are special cases of things called quantiles. Quantiles are values that split a data set into equal portions.
Quartiles are quantiles that split the data into four equal parts, but there are other quantiles such as percentiles (points that split the data into 100 equal parts), noniles (points that split the data into nine equal parts) and so on.
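As a sketch (not from the book; the scores are invented), the range and some quantiles can be computed with numpy:

```python
# A minimal sketch (not from the book): the range and some quantiles of
# invented scores, computed with numpy.
import numpy as np

scores = np.array([2, 3, 3, 4, 5, 6, 7, 9, 11, 15])

print(scores.max() - scores.min())           # range: largest minus smallest
print(np.percentile(scores, [25, 50, 75]))   # quartiles: four equal parts
print(np.percentile(scores, 90))             # a percentile: 100 equal parts
```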
If we use the mean as a measure of the center of a distribution, then we can calculate the difference between each score and the mean, which is known as the deviance.
If we square each deviance (so that positive and negative deviances cannot cancel each other out) and add the squares up, we get the sum of squares, which we can use as an indicator of the total dispersion or total deviance of scores from the mean.
Because the total dispersion depends on how many scores we have, it can be useful to work not with the total dispersion but with the average dispersion, which is known as the variance. The square root of the variance is the standard deviation, which expresses this average dispersion in the same units as the original scores.
The sum of squares, variance and standard deviation are all measures of the dispersion or spread of data around the mean. A small standard deviation (relative to the value of the mean itself) indicates that the data points are close to the mean. A large standard deviation (relative to the mean) indicates that the data points are distant from the mean.
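The whole chain from deviance to standard deviation in one hedged Python sketch (not from the book; it assumes the usual sample formulas with N - 1):

```python
# A minimal sketch (not from the book), using the sample formulas:
# deviance            = score - mean
# sum of squares (SS) = sum of squared deviances
# variance            = SS / (N - 1)
# standard deviation  = sqrt(variance)
import math

scores = [1, 3, 4, 3, 2, 5]
n = len(scores)
mean = sum(scores) / n                    # 3.0

deviances = [x - mean for x in scores]    # positive and negative deviations
ss = sum(d ** 2 for d in deviances)       # squaring stops them cancelling out: 10.0
variance = ss / (n - 1)                   # average dispersion: 2.0
sd = math.sqrt(variance)                  # about 1.41, in the original units

print(mean, ss, variance, sd)
```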
Another way to think about frequency distributions is not in terms of how often scores actually occurred, but how likely it is that a score would occur (i.e., probability).
For any distribution of scores we could, in theory, calculate the probability of obtaining a score of a certain size – it would be incredibly tedious and complex to do it, but we could. To spare our sanity, statisticians have identified several common distributions. For each one they have worked out mathematical formulae (known as probability density functions, PDF) that specify idealized versions of these distributions.
the important thing to remember is that all of these distributions have something in common: they are all defined by an equation that enables us to calculate precisely the probability of obtaining a given score.
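For example (a sketch, not from the book), scipy exposes these idealized distributions and their probability density functions directly, sparing us the tedious hand calculation:

```python
# A minimal sketch (not from the book): the standard normal distribution's
# probability density and cumulative probabilities via scipy.
from scipy import stats

z = stats.norm(loc=0, scale=1)   # standard normal: mean 0, SD 1

print(z.pdf(0))          # density at the center, about 0.399
print(z.cdf(1.96))       # P(score <= 1.96), about 0.975
print(1 - z.cdf(1.96))   # P(score > 1.96), about 0.025
```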
Scientists tell the world about our findings by presenting them at conferences and in articles published in scientific journals. A scientific journal is a collection of articles written by scientists on a vaguely similar topic.
Fitting models that accurately reflect the observed data is important to establish whether a hypothesis (and the theory from which it derives) is true.
To many students, statistics is a bewildering mass of different tests, each with its own set of equations. The focus is often on ‘difference’.
In doing so, I want to set the tone for you to focus on the similarities between statistical models rather than the differences. If your goal is to use statistics as a tool, rather than to bury yourself in the theory, then I think this approach makes your job a lot easier.
The mathematical form of the model changes, but it usually boils down to a representation of the relations between an outcome and one or more predictors.
We collect data from the real world to test predictions from our hypotheses about a phenomenon. Testing these hypotheses involves building statistical models of the phenomenon of interest.
Once the model has been built, it can be used to predict things about the real world.
It is important that the model accurately represents the real world, otherwise any conclusions the engineer extrapolates to the real-world bridge will be meaningless.
Scientists do much the same: they build (statistical) models of real-world processes to predict how these processes operate under certain conditions
We don’t have access to the real-world situation, and so we can only infer things about psychological, societal, biological or economic processes based upon the models we build.
Our models need to be as accurate as possible so that the predictions we make about the real world are accurate too; the statistical model should represent the data collected (the observed data) as closely as possible. The degree to which a statistical model represents the observed data is known as the fit of the model.
This equation, outcome_i = (model) + error_i, means that the data we observe can be predicted from the model we choose to fit plus some amount of error.
No matter how long the equation that describes your model might be, you can just close your eyes, reimagine it as the word ‘model’ (much less scary) and think of the equation above: we predict an outcome variable from some model (that may or may not be hideously complex), but we won’t do so perfectly, so there will be some error in there too.
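As a hedged illustration (not from the book; the data and the choice of a straight-line model are made up), the outcome = model + error idea for a simple linear model fitted by least squares:

```python
# A minimal sketch (not from the book): outcome_i = (model) + error_i for a
# simple straight-line model fitted by least squares. The data are invented.
import numpy as np

predictor = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
outcome = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

slope, intercept = np.polyfit(predictor, outcome, deg=1)  # fit the model
model = intercept + slope * predictor                     # what the model predicts
error = outcome - model                                   # what the model misses

print("model:", np.round(model, 2))
print("error:", np.round(error, 2))  # small errors mean the model fits well
```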
It’s worth remembering that scientists are usually interested in finding results that apply to an entire population of entities.