Discovering Statistics Using IBM SPSS Statistics: North American Edition
The hypothesis or prediction from your theory would normally be that an effect will be present. This hypothesis is called the alternative hypothesis and is denoted by H1.
There is another type of hypothesis called the null hypothesis, which is denoted by H0. This hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent.
Alternative hypothesis: if you imagine eating chocolate you will eat less of it. Null hypothesis: if you imagine eating chocolate you will eat the same amount as normal.
The null hypothesis is useful because it gives us a baseline against which to evaluate how plausible our alternative hypothesis is.
When we collect data to test theories we work in these terms: we cannot talk about the null hypothesis being true or the experimental hypothesis being true; we can talk only in terms of the probability of obtaining a particular result or statistic if, hypothetically speaking, the null hypothesis were true.
Hypotheses can be directional or non-directional.
A directional hypothesis states that an effect will occur, but it also states the direction of the effect.
A non-directional hypothesis states that an effect will occur, but it doesn’t state the direction of the effect.
NHST is a system designed to tell us whether the alternative hypothesis is likely to be true – it helps us to decide whether to confirm or reject our predictions.
The process starts with a research hypothesis that generates a testable prediction. These predictions are decomposed into a null hypothesis (there is no effect) and an alternative hypothesis (there is an effect). At this point you decide upon the long-run error rate that you are prepared to accept, alpha (α).
This is the significance level: the probability of accepting an effect in our population as true when no such effect exists.
You should determine your error rate based on the nuances of your research area, and what it is you’re trying to test. Put another way, it should be a meaningful decision. In reality, it is not: everyone uses 0.05 (a 5% error rate) with barely a thought for what it means or why they’re using it.
You fit the statistical model that tests your hypothesis to the data.
It is important that you collect the amount of data that you set out to collect; otherwise the p-value you obtain will not be correct.
If you cut data collection short (or extend it) for this sort of arbitrary reason, then whatever p-value you end up with is certainly not the one you want. Again, it’s cheating: you’re changing your team after you have placed your bet, and you will likely end up with research egg on your face when no one can replicate your findings.
Having hopefully stuck to your original sampling frame and obtained the appropriate p-value, you compare it to your original alpha value (usually 0.05). If the p you obtain is less than or equal to the original α, scientists typically use this as grounds to reject the null hypothesis outright; if the p is greater than α, then they accept that the null hypothesis is plausibly true.
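The decision rule described above can be sketched in a few lines of code. The book itself works in SPSS; the following is a minimal Python sketch with made-up data, using an independent-samples t-test purely to illustrate comparing p against a pre-chosen α.

```python
# Minimal sketch of the decision rule: compare the obtained p-value to the
# alpha chosen in advance. The data, group names, and choice of an
# independent-samples t-test are all hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
imagine_eating = rng.normal(loc=45, scale=10, size=30)  # grams of chocolate eaten
control = rng.normal(loc=50, scale=10, size=30)

alpha = 0.05  # long-run Type I error rate, fixed before collecting data
t_stat, p_value = stats.ttest_ind(imagine_eating, control)

if p_value <= alpha:
    print(f"p = {p_value:.3f} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} > {alpha}: retain the null hypothesis")
```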
Systematic variation is variation that can be explained by the model that we’ve fitted to the data (and, therefore, due to the hypothesis that we’re testing). Unsystematic variation is variation that cannot be explained by the model that we’ve fitted.
It is error or variation not attributable to the effect we’re investigating. The simplest way, therefore, to test whether the model fits the data or whether our hypothesis is a good explanation of the data we have observed, is to compare the systematic variation against the unsystematic variation.
The best way to test a parameter is to look at the size of the parameter relative to the background noise (the sampling variation) that produced it.
The ratio of effect relative to error is a test statistic.
If our model is good then we’d expect it to be able to explain more variance than it can’t explain. In this case, the test statistic will be greater than 1 (but not necessarily significant). Similarly, larger parameters (bigger effects) that are likely to represent the population (smaller sampling variation) will produce larger test statistics.
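As a rough illustration of this ratio, here is a hedged Python sketch (not from the book) that computes a one-way F-ratio by hand: variance explained by group membership (systematic) divided by the variance left over within groups (unsystematic). The scores are invented.

```python
# Illustrative one-way F-ratio computed by hand: mean square between groups
# (variation explained by the model) over mean square within groups
# (variation it cannot explain). The three groups of scores are invented.
import numpy as np

groups = [np.array([3., 4., 5., 4.]),
          np.array([6., 7., 5., 6.]),
          np.array([8., 9., 7., 8.])]

grand_mean = np.mean(np.concatenate(groups))
n_total = sum(len(g) for g in groups)
k = len(groups)

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # systematic
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)             # unsystematic

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
F = ms_between / ms_within  # > 1 means more variance explained than unexplained
print(f"F = {F:.2f}")       # 24.00 for these made-up scores
```

An F above 1 only says the model explains more variance than it leaves unexplained; whether that counts as significant depends on the distribution of F, as the next highlights explain.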
A test statistic is a statistic for which we know how frequently different values occur.
If a test statistic comes from one of these known distributions (e.g., t, F or chi-square), we can calculate the probability of obtaining a certain value.
Given that the statistical model that we fit to the data reflects the hypothesis that we set out to test, a significant test statistic tells us that the model would be unlikely to fit this well if there were no effect in the population (i.e., if the null hypothesis were true).
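Taking the made-up F-ratio from the sketch above, looking up its probability in the corresponding known distribution is one line in SciPy (again, an illustration rather than the book's own procedure):

```python
# Probability of an F at least as large as 24.0 with 2 and 9 degrees of
# freedom, i.e., how likely a value this big would be if the null were true.
from scipy import stats

p = stats.f.sf(24.0, 2, 9)
print(f"p = {p:.5f}")
```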
A statistical model that tests a directional hypothesis is called a one-tailed test, whereas one testing a non-directional hypothesis is known as a two-tailed test.
Imagine we wanted to discover whether reading this book increased or decreased the desire to kill me.
If we make a specific prediction then we need a smaller test statistic to find a significant result (because we are looking in only one tail of the distribution), but if our prediction happens to be in the wrong direction then we won’t detect the effect that does exist.
If the result of a one-tailed test is in the opposite direction to what you expected, you cannot and must not reject the null hypothesis.
One-tailed tests encourage cheating. If you do a two-tailed test and find that your p is 0.06, then you would conclude that your results were not significant.
If we find a two-tailed p that is just non-significant, we might be tempted to pretend that we’d always intended to do a one-tailed test because our ‘one-tailed’ p-value is significant.
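The mechanics behind this temptation are easy to see in code. A hedged sketch with invented data (the `alternative` argument assumes SciPy ≥ 1.6): when the effect lies in the predicted direction, the one-tailed p is half the two-tailed p, which is exactly why a two-tailed 0.06 can masquerade as a one-tailed 0.03.

```python
# Hedged sketch (invented data). With the effect in the predicted direction,
# the one-tailed p-value is half the two-tailed p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(52, 10, 25)
control = rng.normal(48, 10, 25)

_, p_two = stats.ttest_ind(treatment, control, alternative="two-sided")
_, p_one = stats.ttest_ind(treatment, control, alternative="greater")  # predicts treatment > control

print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```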
A Type I error occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.
The opposite is a Type II error, which occurs when we believe that there is no effect in the population when, in reality, there is.
There is a trade-off between these two errors: if we lower the probability of accepting an effect as genuine (i.e., make α smaller) then we increase the probability that we’ll reject an effect that does genuinely exist (because we’ve been so strict about the level at which we’ll accept that an effect is genuine).
The exact relationship between the Type I and Type II error is not straightforward because they are based on different assumptions: to make a Type I error there must be no effect in the population, whereas to make a Type II error the opposite is true.
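A small simulation can make the trade-off concrete. This is an illustrative sketch only; the sample size, effect size, and number of replications are all made up.

```python
# Illustrative simulation of the trade-off: a stricter alpha lowers the
# Type I error rate but raises the Type II error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, reps = 30, 0.5, 2000

for alpha in (0.05, 0.01):
    type1 = type2 = 0
    for _ in range(reps):
        # Null true: both groups drawn from the same population
        a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
        type1 += stats.ttest_ind(a, b).pvalue <= alpha
        # Null false: the second group is shifted by d
        a, b = rng.normal(0, 1, n), rng.normal(d, 1, n)
        type2 += stats.ttest_ind(a, b).pvalue > alpha
    print(f"alpha = {alpha}: Type I rate ~ {type1 / reps:.3f}, Type II rate ~ {type2 / reps:.3f}")
```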
This error rate across statistical tests conducted on the same data is known as the familywise or experimentwise error rate.
To combat this build-up of errors, we can adjust the level of significance for individual tests such that the overall Type I error rate (α) across all comparisons remains at 0.05.
This method is known as the Bonferroni correction, because it uses an inequality described by Carlo Bonferroni, but despite the name its modern application to confidence intervals can be attributed to Olive Dunn.
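A minimal sketch of the arithmetic (the p-values below are invented): across k tests each run at α = 0.05, the familywise error rate climbs towards 1 − (1 − α)^k, and the Bonferroni correction holds it near 0.05 by testing each comparison at α/k.

```python
# Familywise error build-up and the Bonferroni correction, with made-up p-values.
alpha = 0.05
p_values = [0.001, 0.020, 0.049, 0.300]  # hypothetical results of 4 tests
k = len(p_values)

familywise = 1 - (1 - alpha) ** k   # ~0.19 if all 4 tests used alpha = 0.05
alpha_per_test = alpha / k          # Bonferroni-corrected threshold (0.0125)

print(f"uncorrected familywise error ~ {familywise:.2f}")
for i, p in enumerate(p_values, start=1):
    decision = "reject H0" if p <= alpha_per_test else "retain H0"
    print(f"test {i}: p = {p:.3f} vs {alpha_per_test:.4f} -> {decision}")
```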
The ability of a test to find an effect is known as its statistical power.
(Not to be confused with statistical powder, which is an illegal substance that makes you better understand statistics.) The power of a test is the probability that a given test will find an effect assuming that one exists in the population.
Calculate the power of a test: Given that we’ve conducted our experiment, we will have already selected a value of α, we can estimate the effect size based on our sample data, and we will know how many participants we used.
Calculate the sample size necessary to achieve a given level of power: We can set the value of α and 1 − β to be whatever we want (normally, 0.05 and 0.8, respectively).
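Both calculations can be sketched with statsmodels' power routines, assuming that library is available; the effect size of d = 0.5 used here is purely hypothetical.

```python
# Hedged sketch of both uses of power analysis described above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 1. Power achieved, given alpha, an estimated effect size, and the n used
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05)

# 2. Sample size per group needed for alpha = 0.05 and power (1 - beta) = 0.8
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)

print(f"achieved power ~ {power:.2f}; required n per group ~ {n_per_group:.0f}")
```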
If you find a non-significant effect then you didn’t have enough power; if you find a significant effect, then you did.
What do we mean by moderate overlap? Cumming (2012) defines it as half the length of the average margin of error (MOE). The MOE is half the length of the confidence interval (assuming it is symmetric), so it’s the length of the bar sticking out in one direction from the mean.
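For concreteness, here is a hedged sketch (invented data) of the MOE as half the width of a symmetric 95% confidence interval around a sample mean.

```python
# The margin of error (MOE) as half the width of a symmetric 95% confidence
# interval for a mean; the sample is made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=100, scale=15, size=40)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)  # two-tailed 95% critical value
moe = t_crit * se                                # half the length of the CI

print(f"mean = {mean:.1f}, 95% CI = [{mean - moe:.1f}, {mean + moe:.1f}], MOE = {moe:.1f}")
```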
First, the sample size affects whether a difference between samples is deemed significant or not. In large samples, small differences can be significant, and in small samples large differences can be non-significant.
Large samples have more power to detect effects.
Second, even a difference of practically zero can be deemed ‘significant’ if the sample size is big enough.
The standard error is estimated from the sample size, and the bigger the sample size, the smaller the standard error. Therefore, bigger samples have less ‘noise’, so even a tiny signal can be detected.
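The arithmetic is simply SE = s/√n, so a quick sketch (illustrative numbers only) shows how fast the ‘noise’ shrinks as the sample grows.

```python
# The standard error of the mean is s / sqrt(n), so it shrinks as n grows.
import numpy as np

s = 15.0  # assumed sample standard deviation
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: standard error = {s / np.sqrt(n):.2f}")
```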
Although it is true that statisticians need all the friends they can get, Rosenthal didn’t mean that: he was urging researchers not to rush data analysis.
The first thing is to look at a graph; for data this is the equivalent of a profile picture.
What makes a good profile picture? Lots of people seem to think that it’s best to jazz it up: have some impressive background location, strike a stylish pose, mislead people by inserting some status symbols that you’ve borrowed, adorn yourself with eye-catching accessories, wear your best clothes to conceal the fact you usually wear a onesie, look like you’re having the most fun ever so that people think your life is perfect.