Discovering Statistics Using IBM SPSS Statistics: North American Edition
The hypothesis or prediction from your theory would normally be that an effect will be present. This hypothesis is called the alternative hypothesis and is denoted by H1.
There is another type of hypothesis called the null hypothesis, which is denoted by H0. This hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent.
Alternative hypothesis: if you imagine eating chocolate you will eat less of it. Null hypothesis: if you imagine eating chocolate you will eat the same amount as normal.
The null hypothesis is useful because it gives us a baseline against which to evaluate how plausible our alternative hypothesis is.
When we collect data to test theories we work in these terms: we cannot talk about the null hypothesis being true or the experimental hypothesis being true; we can talk only in terms of the probability of obtaining a particular result or statistic if, hypothetically speaking, the null hypothesis were true.
Hypotheses can be directional or non-directional.
A directional hypothesis states that an effect will occur, but it also states the direction of the effect.
A non-directional hypothesis states that an effect will occur, but it doesn’t state the direction of the effect.
NHST is a system designed to tell us whether the alternative hypothesis is likely to be true – it helps us to decide whether to confirm or reject our predictions.
The process starts with a research hypothesis that generates a testable prediction. These predictions are decomposed into a null hypothesis (there is no effect) and an alternative hypothesis (there is an effect). At this point you decide upon the long-run error rate that you are prepared to accept, alpha (α).
This is the significance level: the probability of accepting an effect in our population as true when no such effect exists.
You should determine your error rate based on the nuances of your research area, and what it is you’re trying to test. Put another way, it should be a meaningful decision. In reality, it is not: everyone uses 0.05 (a 5% error rate) with barely a thought for what it means or why they’re using it.
You fit the statistical model that tests your hypothesis to the data.
It is important that you collect the amount of data that you set out to collect; otherwise the p-value you obtain will not be correct.
If you cut data collection short (or extend it) for this sort of arbitrary reason, then whatever p-value you end up with is certainly not the one you want. Again, it’s cheating: you’re changing your team after you have placed your bet, and you will likely end up with research egg on your face when no one can replicate your findings.
Having hopefully stuck to your original sampling frame and obtained the appropriate p-value, you compare it to your original alpha value (usually 0.05). If the p you obtain is less than or equal to the original α, scientists typically use this as grounds to reject the null hypothesis outright; if the p is greater than α, then they accept that the null hypothesis is plausibly true.
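The decision rule described above can be sketched in a few lines of code. The book itself works in SPSS; the following is a minimal Python sketch with made-up data, using an independent-samples t-test purely to illustrate comparing p against a pre-chosen α.

```python
# Minimal sketch of the decision rule: compare the obtained p-value to the
# alpha chosen in advance. The data, group names, and choice of an
# independent-samples t-test are all hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
imagine_eating = rng.normal(loc=45, scale=10, size=30)  # grams of chocolate eaten
control = rng.normal(loc=50, scale=10, size=30)

alpha = 0.05  # long-run Type I error rate, fixed before collecting data
t_stat, p_value = stats.ttest_ind(imagine_eating, control)

if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} > {alpha}: retain the null hypothesis")
```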
Systematic variation is variation that can be explained by the model that we’ve fitted to the data (and, therefore, due to the hypothesis that we’re testing). Unsystematic variation is variation that cannot be explained by the model that we’ve fitted.
It is error or variation not attributable to the effect we’re investigating. The simplest way, therefore, to test whether the model fits the data or whether our hypothesis is a good explanation of the data we have observed, is to compare the systematic variation against the unsystematic variation.
The best way to test a parameter is to look at the size of the parameter relative to the background noise (the sampling variation) that produced it.
The ratio of effect relative to error is a test statistic.
If our model is good then we’d expect it to be able to explain more variance than it can’t explain. In this case, the test statistic will be greater than 1 (but not necessarily significant). Similarly, larger parameters (bigger effects) that are likely to represent the population (smaller sampling variation) will produce larger test statistics.
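As a rough illustration of this ratio, here is a hedged Python sketch (not from the book) that computes a one-way F-ratio by hand: variance explained by group membership (systematic) divided by the variance left over within groups (unsystematic). The scores are invented.

```python
# Illustrative one-way F-ratio computed by hand: mean square between groups
# (variation explained by the model) over mean square within groups
# (variation it cannot explain). The three groups of scores are invented.
import numpy as np

groups = [np.array([3., 4., 5., 4.]),
          np.array([6., 7., 5., 6.]),
          np.array([8., 9., 7., 8.])]

grand_mean = np.mean(np.concatenate(groups))
n_total = sum(len(g) for g in groups)
k = len(groups)

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # systematic
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)             # unsystematic

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
F = ms_between / ms_within  # > 1 means more variance explained than unexplained
print(f"F = {F:.2f}")       # 24.00 for these made-up scores
```

An F above 1 only says the model explains more variance than it leaves unexplained; whether that counts as significant depends on the distribution of F, as the next highlights explain.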
A test statistic is a statistic for which we know how frequently different values occur.
If a test statistic comes from one of these known distributions (e.g., t, F or chi-square), we can calculate the probability of obtaining a certain value.
Given that the statistical model that we fit to the data reflects the hypothesis that we set out to test, a significant test statistic tells us that the model would be unlikely to fit this well if there were no effect in the population (i.e., if the null hypothesis were true).
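Taking the made-up F-ratio from the sketch above, looking up its probability in the corresponding known distribution is one line in SciPy (again, an illustration rather than the book's own procedure):

```python
# Probability of an F at least as large as 24.0 with 2 and 9 degrees of
# freedom, i.e., how likely a value this big would be if the null were true.
from scipy import stats

p = stats.f.sf(24.0, 2, 9)
print(f"p = {p:.5f}")
```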
A statistical model that tests a directional hypothesis is called a one-tailed test, whereas one testing a non-directional hypothesis is known as a two-tailed test.
Imagine we wanted to discover whether reading this book increased or decreased the desire to kill me.
If we make a specific prediction then we need a smaller test statistic to find a significant result (because we are looking in only one tail of the distribution), but if our prediction happens to be in the wrong direction then we won’t detect the effect that does exist.
If the result of a one-tailed test is in the opposite direction to what you expected, you cannot and must not reject the null hypothesis.
One-tailed tests encourage cheating. If you do a two-tailed test and find that your p is 0.06, then you would conclude that your results were not significant.
If we find a two-tailed p that is just non-significant, we might be tempted to pretend that we’d always intended to do a one-tailed test because our ‘one-tailed’ p-value is significant.
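The mechanics behind this temptation are easy to see in code. A hedged sketch with invented data (the `alternative` argument assumes SciPy ≥ 1.6): when the effect lies in the predicted direction, the one-tailed p is half the two-tailed p, which is exactly why a two-tailed 0.06 can masquerade as a one-tailed 0.03.

```python
# Hedged sketch (invented data). With the effect in the predicted direction,
# the one-tailed p-value is half the two-tailed p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(52, 10, 25)
control = rng.normal(48, 10, 25)

_, p_two = stats.ttest_ind(treatment, control, alternative="two-sided")
_, p_one = stats.ttest_ind(treatment, control, alternative="greater")  # predicts treatment > control

print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```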
A Type I error occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.
The opposite is a Type II error, which occurs when we believe that there is no effect in the population when, in reality, there is.
There is a trade-off between these two errors: if we lower the probability of accepting an effect as genuine (i.e., make α smaller) then we increase the probability that we’ll reject an effect that does genuinely exist (because we’ve been so strict about the level at which we’ll accept that an effect is genuine).
The exact relationship between the Type I and Type II error is not straightforward because they are based on different assumptions: to make a Type I error there must be no effect in the population, whereas to make a Type II error the opposite is true.
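A small simulation can make the trade-off concrete. This is an illustrative sketch only; the sample size, effect size, and number of replications are all made up.

```python
# Illustrative simulation of the trade-off: a stricter alpha lowers the
# Type I error rate but raises the Type II error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, reps = 30, 0.5, 2000

for alpha in (0.05, 0.01):
    type1 = type2 = 0
    for _ in range(reps):
        # Null true: both groups drawn from the same population
        a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
        type1 += stats.ttest_ind(a, b).pvalue <= alpha
        # Null false: the second group is shifted by d
        a, b = rng.normal(0, 1, n), rng.normal(d, 1, n)
        type2 += stats.ttest_ind(a, b).pvalue > alpha
    print(f"alpha = {alpha}: Type I rate ~ {type1 / reps:.3f}, Type II rate ~ {type2 / reps:.3f}")
```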
This error rate across statistical tests conducted on the same data is known as the familywise or experimentwise error rate.
To combat this build-up of errors, we can adjust the level of significance for individual tests such that the overall Type I error rate (α) across all comparisons remains at 0.05.
This method is known as the Bonferroni correction, because it uses an inequality described by Carlo Bonferroni, but despite the name its modern application to confidence intervals can be attributed to Olive Dunn.
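A minimal sketch of the arithmetic (the p-values below are invented): across k tests each run at α = 0.05, the familywise error rate climbs towards 1 − (1 − α)^k, and the Bonferroni correction holds it near 0.05 by testing each comparison at α/k.

```python
# Familywise error build-up and the Bonferroni correction, with made-up p-values.
alpha = 0.05
p_values = [0.001, 0.020, 0.049, 0.300]  # hypothetical results of 4 tests
k = len(p_values)

familywise = 1 - (1 - alpha) ** k   # ~0.19 if all 4 tests used alpha = 0.05
alpha_per_test = alpha / k          # Bonferroni-corrected threshold (0.0125)

print(f"uncorrected familywise error ~ {familywise:.2f}")
for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p <= alpha_per_test else "retain H0"
    print(f"test {i}: p = {p:.3f} vs {alpha_per_test:.4f} -> {decision}")
```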
The ability of a test to find an effect is known as its statistical power.
(Not to be confused with statistical powder, which is an illegal substance that makes you better understand statistics.) The power of a test is the probability that a given test will find an effect assuming that one exists in the population.
Calculate the power of a test: Given that we’ve conducted our experiment, we will have already selected a value of α, we can estimate the effect size based on our sample data, and we will know how many participants we used.
Calculate the sample size necessary to achieve a given level of power: We can set the value of α and 1 − β to be whatever we want (normally, 0.05 and 0.8, respectively).
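Both calculations can be sketched with statsmodels' power routines, assuming that library is available; the effect size of d = 0.5 used here is purely hypothetical.

```python
# Hedged sketch of both uses of power analysis described above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 1. Power achieved, given alpha, an estimated effect size, and the n used
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)

# 2. Sample size per group needed for alpha = 0.05 and power (1 - beta) = 0.8
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)

print(f"achieved power ~ {power:.2f}; required n per group ~ {n_per_group:.0f}")
```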
If you find a non-significant effect then you didn’t have enough power; if you find a significant effect, then you did.
What do we mean by moderate overlap? Cumming (2012) defines it as half the length of the average margin of error (MOE). The MOE is half the length of the confidence interval (assuming it is symmetric), so it’s the length of the bar sticking out in one direction from the mean.
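For concreteness, here is a hedged sketch (invented data) of the MOE as half the width of a symmetric 95% confidence interval around a sample mean.

```python
# The margin of error (MOE) as half the width of a symmetric 95% confidence
# interval for a mean; the sample is made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=100, scale=15, size=40)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)  # two-tailed 95% critical value
moe = t_crit * se                                # half the length of the CI

print(f"mean = {mean:.1f}, 95% CI = [{mean - moe:.1f}, {mean + moe:.1f}], MOE = {moe:.1f}")
```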
First, the sample size affects whether a difference between samples is deemed significant or not. In large samples, small differences can be significant, and in small samples large differences can be non-significant.
Large samples have more power to detect effects.
Second, even a difference of practically zero can be deemed ‘significant’ if the sample size is big enough.
The standard error is estimated from the sample size, and the bigger the sample size, the smaller the standard error. Therefore, bigger samples have less ‘noise’, so even a tiny signal can be detected.
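The arithmetic is simply SE = s/√n, so a quick sketch (illustrative numbers only) shows how fast the ‘noise’ shrinks as the sample grows.

```python
# The standard error of the mean is s / sqrt(n), so it shrinks as n grows.
import numpy as np

s = 15.0  # assumed sample standard deviation
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: standard error = {s / np.sqrt(n):.2f}")
```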
Although it is true that statisticians need all the friends they can get, Rosenthal didn’t mean that: he was urging researchers not to rush data analysis.
The first thing is to look at a graph; for data this is the equivalent of a profile picture.
What makes a good profile picture? Lots of people seem to think that it’s best to jazz it up: have some impressive background location, strike a stylish pose, mislead people by inserting some status symbols that you’ve borrowed, adorn yourself with eye-catching accessories, wear your best clothes to conceal the fact you usually wear a onesie, look like you’re having the most fun ever so that people think your life is perfect.