Discovering Statistics Using IBM SPSS Statistics: North American Edition
In other words, the only way to infer causality is through comparing two controlled situations: one in which the cause is present and one in which the cause is absent.
There are two ways to manipulate the independent variable. The first is to test different entities.
different groups of entities take part in each experimental condition (a between-groups, between-subjects or independent design).
manipulate the independent variable using the...
In an independent design, the unsystematic variation will be bigger than for a repeated-measures design.
Systematic variation: This variation is due to the experimenter doing something in one condition but not in the other condition.
Unsystematic variation: This variation results from random factors that exist between the experimental conditions (such as natural differences in ability, the time of day, etc.).
In a repeated-measures design, differences between two conditions can be caused by only two things: (1) the manipulation that was carried out on the participants or (2) any other factor that might affect the way in which an entity performs from one time to the next.
By keeping the unsystematic variation as small as possible we get a more sensitive measure of the experimental manipulation. Generally, scientists use the randomization of entities to treatment conditions to achieve this goal.
Randomization is important because it eliminates most other sources of systematic variation, which allows us to be sure that any systematic variation between experimental conditions is due to the manipulation of the independent variable.
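A sketch of how random assignment to conditions might look in code (the function name, the equal-sized groups, and the data are my own assumptions, not the book's):

```python
import random

def randomize_to_conditions(participants, conditions, seed=None):
    # Shuffle the participants, then deal them out to conditions in
    # round-robin order so group sizes stay as equal as possible.
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, p in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(p)
    return groups

groups = randomize_to_conditions(range(1, 21), ["experimental", "control"], seed=1)
```

Because assignment is random, any systematic difference between the two groups should be attributable to the manipulation, not to who ended up in which group.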
Practice effects: Participants may perform differently in the second condition because of familiarity with the experimental situation and/or the measures being used.
Boredom effects: Participants may perform differently in the second condition because they are tired or bored from having completed the first condition.
We can ensure that they produce no systematic variation between our conditions by counterbalancing the order in which a person participates in a condition.
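Counterbalancing with two conditions might be sketched like this (the helper name and the round-robin scheme are my own illustration, not the book's):

```python
from itertools import permutations

def counterbalance(participants, conditions):
    # Cycle through every possible condition order so that, across
    # participants, each order is used (roughly) equally often.
    orders = list(permutations(conditions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}

schedule = counterbalance(["p1", "p2", "p3", "p4"], ["A", "B"])
# p1 and p3 complete condition A then B; p2 and p4 complete B then A
```

Any practice or boredom effect now hits condition A first for half the sample and condition B first for the other half, so it cannot masquerade as an effect of the manipulation.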
The final stage of the research process is to analyze the data you have collected.
Once you’ve collected some data a very useful thing to do is to plot a graph of how many times each score occurs. This is known as a frequency distribution or histogram, which is a graph plotting values of observations on the horizontal axis, with a bar showing how many times each value occurred in the data set.
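A minimal text-mode version of that idea, using made-up scores (the data are hypothetical, and real histograms would be drawn with plotting software):

```python
from collections import Counter

scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]  # made-up data

freq = Counter(scores)
for value in sorted(freq):                      # each observed value
    print(f"{value:>2} | {'#' * freq[value]}")  # a bar per occurrence
```

Each row is one value on the horizontal axis; the run of `#` characters plays the role of the bar showing how many times that value occurred.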
If we drew a vertical line through the center of the distribution then it should look the same on both sides. This is known as a normal distribution and is characterized by the bell-shaped curve with which you might already be familiar.
There are two main ways in which a distribution can deviate from normal: (1) lack of symmetry (called skew) and (2) pointyness (called kurtosis).
Kurtosis, despite sounding like some kind of exotic disease, refers to the degree to which scores cluster at the ends of the distribution (known as the tails) and this tends to express itself in how pointy a distribution is (but there are other factors that can affect how pointy the distribution looks …)
A distribution with positive kurtosis has many scores in the tails (a so-called heavy-tailed distribution) and is pointy. This is known as a leptokurtic distribution.
A distribution with negative kurtosis is relatively thin in the tails (has light tails) and tends to be flatter than normal. This...
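Both quantities can be computed from the data. The sketch below uses the plain moment-based formulas with no small-sample correction (statistics packages often apply corrections, so their values will differ slightly; the function names are my own):

```python
def skewness(xs):
    # Third standardized moment: 0 for a symmetric distribution,
    # positive for right skew, negative for left skew.
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

def excess_kurtosis(xs):
    # Fourth standardized moment minus 3: positive means heavy tails
    # (leptokurtic), negative means light tails (platykurtic).
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** 4 for x in xs) / (n * var ** 2) - 3
```

Subtracting 3 makes the normal distribution the zero point, which is why a leptokurtic distribution comes out positive and a platykurtic one negative.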
We can calculate where the center of a frequency distribution lies (known as the central tendency) using three commonly used measures: the mean, the mode and the median.
The mode is the score that occurs most frequently in the data set.
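A small sketch of finding the mode, allowing for ties (the function name and data are my own illustration):

```python
from collections import Counter

def mode(xs):
    # The most frequent score(s); a tie yields more than one mode
    # (a bimodal or multimodal data set).
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)

mode([3, 5, 5, 6, 5, 2])   # one mode: 5
mode([1, 1, 2, 2, 3])      # two modes: 1 and 2
```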
Another way to quantify the center of a distribution is to look for the middle score when scores are ranked in order of magnitude. This is called the median.
The median is relatively unaffected by extreme scores at either end of the distribution.
The median is also relatively unaffected by skewed distributions and can be used with ordinal, interval and ratio data (it cannot, however, be used with nominal data because these data have no numerical order).
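The ranking-then-middle-score procedure, and the robustness to extreme scores, can be sketched as follows (the data are made up):

```python
def median(xs):
    # Rank the scores, then take the middle one; with an even number
    # of scores, average the two middle values.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

median([22, 40, 53, 57, 93])    # middle of five ranked scores: 53
median([22, 40, 53, 57, 900])   # replacing 93 with 900 changes nothing
```

The second call shows why the median resists extreme scores: only the rank order matters, not how far out the top score sits.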
The mean is the measure of central tendency that you are most likely to have heard of because it is the average score, and the media love an average score.
One disadvantage of the mean is that it can be influenced by extreme scores.
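A quick illustration with made-up scores of how one extreme value drags the mean around while the median stays put:

```python
scores = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]   # one extreme score

mean = sum(scores) / len(scores)                  # (2+3+4+5+6)/5 = 4.0
mean_out = sum(with_outlier) / len(with_outlier)  # 74/5 = 14.8, dragged upward
# The median of both data sets is still 4
```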
The easiest way to look at dispersion is to take the largest score and subtract from it the smallest score. This is known as the range of scores.
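In code, the range is a one-liner (the data are made up):

```python
scores = [18, 22, 22, 25, 30, 55]   # made-up data

score_range = max(scores) - min(scores)   # 55 - 18 = 37
```

Because it uses only the two most extreme scores, the range shares the mean's weakness: a single outlier can change it dramatically.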
It’s worth noting here that quartiles are special cases of things called quantiles. Quantiles are values that split a data set into equal portions.
Quartiles are quantiles that split the data into four equal parts, but there are other quantiles such as percentiles (points that split the data into 100 equal parts), noniles (points that split the data into nine equal parts) and so on.
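One way to compute quartiles is the "median of each half" convention sketched below; note that statistical packages use a variety of interpolation rules, so their answers can differ slightly (the function names are my own):

```python
def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(xs):
    # Q2 is the median of all scores; Q1 and Q3 are the medians of
    # the lower and upper halves (excluding the middle score when n is odd).
    s = sorted(xs)
    n = len(s)
    return median(s[: n // 2]), median(s), median(s[(n + 1) // 2 :])

quartiles([1, 2, 3, 4, 5, 6, 7, 8])   # splits the data into four equal parts
```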
If we use the mean as a measure of the center of a distribution, then we can calculate the difference between each score and the mean, which is known as the deviance.
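With some made-up scores, the deviances look like this:

```python
scores = [1, 3, 4, 3, 2]          # made-up data
mean = sum(scores) / len(scores)  # 13/5 = 2.6

deviances = [x - mean for x in scores]   # each score minus the mean
# The deviances always sum to (essentially) zero, which is why the
# next step is to square them before adding them up.
```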
We can use the sum of squares as an indicator of the total dispersion or total deviance of scores from the mean.
Therefore, it can be useful to work not with the total dispersion, but the average dispersion, which is also known as the variance.
The sum of squares, variance and standard deviation are all measures of the dispersion or spread of data around the mean. A small standard deviation (relative to the value of the mean itself) indicates that the data points are close to the mean. A large standard deviation (relative to the mean) indicates that the data points are distant from the mean.
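The whole chain, from squared deviances to standard deviation, can be sketched with made-up scores. The division by n − 1 is the usual sample estimate (degrees of freedom); dividing by n instead gives the population version:

```python
scores = [1, 3, 4, 3, 2]
n = len(scores)
mean = sum(scores) / n                     # 2.6

ss = sum((x - mean) ** 2 for x in scores)  # sum of squares: total dispersion
variance = ss / (n - 1)                    # average dispersion, in squared units
sd = variance ** 0.5                       # standard deviation, back in original units
```

Taking the square root at the end is what puts the standard deviation back in the same units as the scores themselves, which is why it is easier to interpret than the variance.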
Another way to think about frequency distributions is not in terms of how often scores actually occurred, but how likely it is that a score would occur (i.e., probability).
For any distribution of scores we could, in theory, calculate the probability of obtaining a score of a certain size – it would be incredibly tedious and complex to do it, but we could. To spare our sanity, statisticians have identified several common distributions. For each one they have worked out mathematical formulae (known as probability density functions, PDF) that specify idealized versions of these distributions.
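The best-known of these idealized distributions is the normal distribution, whose density function is short enough to write out directly (a sketch; the function name is my own):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Probability density function of the normal distribution with
    # mean mu and standard deviation sigma.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

normal_pdf(0.0)   # density at the mean of a standard normal, about 0.399
```

Note that a density is not itself a probability; probabilities of a score falling in some interval come from the area under this curve.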
The important thing to remember is that all of these distributions have something in common: they are all defined by an equation that enables us to calculate precisely the probability of obtaining a given score.
Scientists tell the world about our findings by presenting them at conferences and in articles published in scientific journals. A scientific journal is a collection of articles written by scientists on a vaguely similar topic.
Fitting models that accurately reflect the observed data is important to establish whether a hypothesis (and the theory from which it derives) is true.
To many students, statistics is a bewildering mass of different tests, each with their own set of equations. The focus is often on ‘difference’.
In doing so, I want to set the tone by encouraging you to focus on the similarities between statistical models rather than the differences. If your goal is to use statistics as a tool, rather than to bury yourself in the theory, then I think this approach makes your job a lot easier.
The mathematical form of the model changes, but it usually boils down to a representation of the relations between an outcome and one or more predictors.
The SPINE of statistics:
Standard error
Parameters
Interval estimates (confidence intervals)
Null hypothesis significance testing
Estimation
We collect data from the real world to test predictions from our hypotheses about that phenomenon. Testing these hypotheses involves building statistical models of the phenomenon of interest.
Once the model has been built, it can be used to predict things about the real world:
It is important that the model accurately represents the real world, otherwise any conclusions she extrapolates to the real-world bridge will be meaningless.
Scientists do much the same: they build (statistical) models of real-world processes to predict how these processes operate under certain conditions.
We don’t have access to the real-world situation and so we can only infer things about psychological, societal, biological or economic processes based upon the models we build.
Our models need to be as accurate as possible so that the predictions we make about the real world are accurate too; the statistical model should represent the data collected (the observed data) as closely as possible. The degree to which a statistical m...
This equation means that the data we observe can be predicted from the model we choose to fit plus some amount of error.
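Written out, the equation being referred to has this general form, where the subscript i indexes individual observations:

```latex
\text{outcome}_i = (\text{model}) + \text{error}_i
```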
No matter how long the equation that describes your model might be, you can just close your eyes, reimagine it as the word ‘model’ (much less scary) and think of the equation above: we predict an outcome variable from some model (that may or may not be hideously complex) but we won’t do so perfectly so there will be some error in there too.
It’s worth remembering that scientists are usually interested in finding results that apply to an entire population of entities.