Kindle Notes & Highlights
Read between May 29 and June 29, 2023
For those of you who are unfamiliar with this often-used term, a hypothesis is basically “an educated guess.” Its most important role is to reflect the general problem statement or question that was the motivation for asking the research question in the first place. That’s why taking the care and time to formulate a really precise and clear research question is so important. This research question will guide your creation of a hypothesis, and in turn, the hypothesis will determine the techniques you will use to test it and answer the question that was originally asked.
This is an important distinction, because while hypotheses usually describe a population, hypothesis testing deals with a sample, and the results are then generalized to the larger population. We also address the two main types of hypotheses (the null hypothesis and the research hypothesis).
Given the constraints under which almost all scientists live (never enough time and never enough research funds), the next best strategy is to take a portion of a larger group of participants and do the research with that smaller group. In this context, the larger group is referred to as a population, and the smaller group selected from that population is referred to as a sample. Statistics as a field, in fact, is all about looking at a sample and inferring to the population it represents. Indeed, the word statistic technically means a number that describes a sample (and the word we use for a number that describes a population is parameter). A measure of how well a sample approximates the characteristics of a population is called sampling error. Sampling error is basically the difference between the value of the sample statistic and the population parameter. The higher the sampling error, the less precision you have in sampling, and the more difficult it will be to make the case…
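The statistic/parameter/sampling-error vocabulary can be made concrete with a short simulation. This is a sketch, not from the book; the population of scores below is invented for illustration:

```python
import random
import statistics

random.seed(42)

# A made-up "population" of 10,000 test scores (illustrative only).
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.mean(population)   # population parameter

# Draw a sample and compute the sample statistic.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)    # sample statistic

# Sampling error: the difference between the statistic and the parameter.
sampling_error = x_bar - mu
print(f"parameter = {mu:.2f}, statistic = {x_bar:.2f}, "
      f"sampling error = {sampling_error:.2f}")
```

Rerunning with a different sample would give a different statistic, and so a different sampling error; the parameter stays fixed.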
It’s easy to equate “big” with “representative.” Keep in mind that it is far more important to have an accurately representative sample than it is to have a big sample (people often think that big is better—only true on Thanksgiving, by the way). Having lots and lots of participants in a sample may be very impressive, but if the participants do not represent the larger population, then the research will have little value.
Okay. So we have a sample of participants selected from a population, and to begin the test of our research hypothesis, we first formulate the null hypothesis. The null hypothesis is an interesting little creature. If it could talk, it would say something like “I represent no relationship between the variables that you are studying.”
What these four null hypotheses have in common is that they all contain a statement that two or more things are equal or unrelated (that’s the “no difference” and “no relationship” part) to each other.
First, the null hypothesis acts as a starting point because it is the state of affairs that is accepted as true in the absence of any other information.
You might speculate as to why one group might outperform another, using theory or common sense, but if you have no evidence a priori (“from before”), then what choice do you have but to assume that they are equal? This lack of a relationship as a starting point is a hallmark of this whole topic. Until you prove that there is a difference, you have to assume that there is no difference. And a statement of no difference or no relationship is exactly what the null hypothesis is all about. Such a statement ensures that (as members of the scientific community) we are starting on a level playing field with no bias toward one or the other direction as to how the test of our hypothesis will turn out. Furthermore, if there are any differences between these two groups, then you have to assume that these differences are due to the most attractive explanation for differences between any groups on any variable—chance! That’s right: Given no…
And, by the way, you might find it useful to think of chance as being somewhat equivalent to the idea of error. When we can control sources of error, the likelihood that we can offer a meaningful explanation for some outcome increases. The second purpose of the null hypothesis is to provide a benchmark against which observed outcomes can be compared to see if these differences are due to some other factor. The null hypothesis helps to define a range within which any observed differences between groups can be attributed to chance (which is the null hypothesis’s contention) or are due to…
Whereas a null hypothesis is usually a statement of no relationship between variables or that a certain value is zero, a research hypothesis is usually a definite statement that a relationship exists between variables.
Each of these four research hypotheses has one thing in common: They are all statements of inequality. They posit a relationship between variables and not an equality, as the null hypothesis does. The nature of this inequality can take two different forms—a directional or a nondirectional research hypothesis. If the research hypothesis posits no direction to the inequality (such as only saying “different from”), the hypothesis is a nondirectional research hypothesis. If the research hypothesis posits a direction to the inequality (such as “more than” or “less than”), the research hypothesis is a directional research hypothesis.
A nondirectional research hypothesis reflects a difference between groups, but the direction of the difference is not specified.
A directional research hypothesis reflects a difference between groups, and the direction of the difference is specified.
Another way to talk about directional and nondirectional hypotheses is to talk about one- and two-tailed tests. A one-tailed test (reflecting a directional hypothesis) posits a difference in a particular direction, such as when we hypothesize that Group 1 will score higher than Group 2. A two-tailed test (reflecting a nondirectional hypothesis) posits a difference but in no particular direction.
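In terms of probabilities, the one-tailed/two-tailed distinction shows up in how the p value is computed from a test statistic. The sketch below assumes a z statistic of 1.80 (a made-up example value) and uses the standard normal curve:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Suppose a study yields a z statistic of 1.80 (hypothetical value).
z = 1.80

# One-tailed test (directional, e.g., Group 1 > Group 2):
# only the area beyond z in the predicted direction counts.
p_one = 1 - phi(z)

# Two-tailed test (nondirectional): the area beyond |z| in BOTH tails.
p_two = 2 * (1 - phi(abs(z)))

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

Note that the two-tailed p value is exactly double the one-tailed value, which is why a directional hypothesis is "easier" to confirm when the difference falls in the predicted direction.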
First, for a bit of review, the two types of hypotheses differ in that one (the null hypothesis) usually states that there is no relationship between variables (an equality), whereas the research hypothesis usually states that there is a relationship between the variables (an inequality). This is the primary difference. Second, null hypotheses always refer to the population, whereas research hypotheses usually refer to the sample. We select a sample of participants from a much larger population. We then try to generalize the results from the sample back to the population. If you remember your…
Fifth, null hypotheses are always written using Greek symbols, and research hypotheses are always written using Roman symbols. Thus, the null hypothesis that the average score for 9th graders is equal to that of 12th graders is represented like this: H₀: μ₉ = μ₁₂, where μ₉ and μ₁₂ are the population mean scores for 9th and 12th graders.
Finally, because you cannot directly test the null hypothesis, it is an implied hypothesis. But the research hypothesis is explicit and is stated as such. This is another reason why you rarely see null hypotheses stated in research reports and will very often see a statement (be it in symbols or words) of the research hypothesis.
We can’t stress enough how important it is to ask the question you want answered and to keep in mind that any hypothesis you present is a direct extension of the original question you asked.
First, a good hypothesis is stated in declarative form and not as a question.
Second, a good hypothesis posits an expected relationship between variables.
Notice the word expected in the second criterion. Defining an expected relationship is intended to prevent a fishing trip to look for any relationships that may be found (sometimes called the “shotgun” approach), which may be tempting but is not very productive.
Good researchers do not want just anything they can catch or shoot. They want specific results. To get them, researchers need their opening questions and hypotheses to be clear, forceful, and easily understood. Third, hypotheses reflect the theory or literature on which they are based.
A good hypothesis reflects this, in that it has a substantive link to existing literature and theory.
Fourth, a hypothesis should be brief and to the point. You want your hypothesis to describe the relationship between variables in a declarative form and to be as direct and explicit as possible. The more to the point it is, the easier it will be for others (such as your master’s thesis or doctoral dissertation committee members!) to read your research and understand exactly what you are hypothesizing and what the important variables are. In fact, when people read and evaluate research (as you will learn more about later in this chapter), the first thing many of them do is find the hypotheses…
In sum, research hypotheses should be stated in declarative form, posit a relationship between variables, reflect a theory or a body of literature on which they are based, be brief and to the point, and be testable.
Why? First, the normal curve provides us with a basis for understanding the probability associated with any possible outcome (such as the chance of getting a certain score on a test or the chance of a coin flip coming up “heads”). Second, the study of probability is the basis for determining the degree of confidence we have in stating that a particular finding or outcome is “true.” Or, better said, that an outcome (like an average score) may not have occurred due to chance alone.
What is a normal curve? Well, the normal curve (also called a bell-shaped curve, or bell curve) is a visual representation of a distribution of scores that has three characteristics.
First, the normal curve represents a distribution of values in which the mean, the median, and the mode are equal to one another. You probably remember from Chapter 4 that if the median and the mean are different, then the distribution is skewed in one direction or the other. The normal curve is not skewed. It’s got a nice hump (only one), and that hump is right in the middle. Second, the normal curve is perfectly symmetrical about the mean. If you fold one half of the curve along its center line, the two halves would lie perfectly on top of each other. They are identical. One half of the curve is a mirror image of the other. Finally (and get ready for a mouthful), the tails of the normal curve are asymptotic—a big word. What it means is that they come closer and closer to the horizontal axis but never touch it. See if you have some idea (in…
As you will learn later in this chapter, the fact that the tails never touch the x-axis means that there is a vanishingly small (but never zero) likelihood of obtaining a score that is very extreme (way out under the left or right tail of the curve). If the tails did touch the x-axis, then the likelihood of obtaining a very extreme score would be nonexistent.
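The asymptotic-tails point can be illustrated numerically: the area under the standard normal curve beyond a given z value keeps shrinking as z grows, but it never quite reaches zero. A quick sketch:

```python
from math import erf, sqrt

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * (1 - erf(z / sqrt(2)))

# The tail area shrinks rapidly with z but stays strictly positive.
for z in (2, 3, 4, 5):
    print(f"P(Z > {z}) = {upper_tail(z):.2e}")
```

Even at z = 5, five standard deviations out, the probability is tiny but not zero, which is exactly what "asymptotic" buys you.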
We hope your next question is, “But there are plenty of sets of scores where the distribution is not normal or bell shaped, right?” Yes (and it’s a big but). First, most of the time when scores are allowed to vary and we measure a lot of people, the shape of the distribution of all those scores will look pretty normal.
Even if individual scores aren’t normally distributed, though, researchers tend to make statistical inferences about summaries of scores, such as measures of central tendency, and the distribution of those summary values will tend to be normal regardless of the distribution of individual scores. When we deal with big sample sizes (more than 30) and we take repeated samples from a population, the means of those samples will distribute themselves pretty closely to the shape of a normal curve. This is very important, because a lot of what we do when we talk about inferring from a sample to a population…
For any distribution of scores (regardless of the value of the mean and standard deviation), if the scores are distributed normally, almost 100% of the scores (about 99.7%) will fit between −3 and +3 standard deviations from the mean. This is very important, because it applies to all normal distributions. Because the rule applies (once again, regardless of the value of the mean or standard deviation), distributions can be compared with one another.
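A quick check of how much of a normal distribution sits within ±1, ±2, and ±3 standard deviations (a sketch using Python's built-in error function):

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Proportion of a normal distribution within k standard deviations
# of the mean: the familiar 68 / 95 / 99.7 rule.
for k in (1, 2, 3):
    inside = phi(k) - phi(-k)
    print(f"within ±{k} SD: {inside:.2%}")
```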
With that said, we’ll extend this idea a bit more. If the distribution of scores is normal, we can also say that between different points along the x-axis (such as between the mean and 1 standard deviation), a certain percentage of cases will fall. In fact, between the mean (which in this case is 100—got that yet?) and 1 standard deviation above the mean (which is 110), about 34% (actually 34.13%) of all cases in the distribution of scores will fall. That’s about a third of all scores. Because the normal curve is symmetrical, this is true going in the other direction, too. About a third…
All of this is pretty neat, especially when you consider that the values of 34.13% and 13.59% and so on are absolutely independent of the actual values of the mean and the standard deviation. These percentages are due to the shape of the curve and do not depend on the value of any of the scores in the distribution or the value of the mean or standard deviation.
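That independence from the mean and standard deviation is easy to demonstrate: compute the mean-to-+1-SD band for several arbitrary (invented) means and SDs and watch the same 34.13% appear every time:

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal(mean, sd) distribution at x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

# The band percentage is a property of the curve's shape, not of the
# particular mean or SD (the pairs below are arbitrary examples).
for mean, sd in [(0, 1), (100, 10), (72, 6.5)]:
    band = normal_cdf(mean + sd, mean, sd) - normal_cdf(mean, mean, sd)
    print(f"mean={mean}, sd={sd}: mean to +1 SD holds {band:.2%}")
```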
Say hello to standard scores. These are scores that are comparable because they are standardized in units of standard deviations.
Although there are other types of standard scores, the one that you will see most frequently in your study of statistics is called a z score.
First, those scores below the mean (such as 8 and 10) have negative z scores, and those scores above the mean (such as 13 and 14) have positive z scores. Second, positive z scores always fall to the right of the mean and are in the upper half of the distribution. And negative z scores always fall to the left of the mean and are in the lower half of the distribution. Third, when we talk about a score being located 1 standard deviation above the mean, it’s the same as saying that the score is a z score of 1. For our purposes, when comparing scores across distributions, z scores and standard…
No—we did that on purpose to point out how you can compare performance based on scores from different sets of data or distributions. The raw scores of 12.8 and 64.8 are equally distant from their respective means. When these raw scores are represented as standard scores, they are directly comparable to one another in terms of their relative location in their respective distributions.
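As a sketch of that comparison: the means and standard deviations below are invented for illustration (the book's actual figures aren't in this excerpt); only the raw scores 12.8 and 64.8 come from the passage above.

```python
def z_score(raw, mean, sd):
    """Standard score: distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd

# Two hypothetical distributions with very different scales.
z_a = z_score(12.8, mean=12.0, sd=2.0)   # ≈ 0.4 SD above its mean
z_b = z_score(64.8, mean=64.0, sd=2.0)   # ≈ 0.4 SD above its mean

# Different raw scales, identical relative standing.
print(f"z for 12.8: {z_a:.2f}, z for 64.8: {z_b:.2f}")
```

The raw scores look wildly different, but once standardized they occupy the same relative location, which is the whole point of z scores.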
You already know that a particular z score represents not only a raw score but also a particular location along the x-axis of a distribution. And the more extreme the z score (such as –2.0 or +2.6), the farther it is from the mean.
Eighty-four percent of all the scores fall below a z score of +1 (the 50% that fall below the mean plus the 34% that fall between the mean and a z score of 1). Sixteen percent of all the scores fall above a z score of +1 (because the total area under the curve has to equal 100%, and 84% of the scores fall below a score of +1.0).
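Those 84%/16% figures can be checked directly with the standard normal cumulative function (a quick sketch using Python's math.erf):

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

below = phi(1.0)    # proportion of scores below z = +1
above = 1 - below   # the rest of the area, above z = +1
print(f"below z = +1: {below:.2%}, above z = +1: {above:.2%}")
```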
What we are saying is that, given the normal distribution, different areas of the curve are encompassed by different values of standard deviations or z scores. Okay—here it comes. These percentages or areas can also easily be seen as representing probabilities of a certain score occurring.
Now the method we just went through is fine for z values of 1, 2, and 3. But what if the value of the z score is not a whole number like 2 but is instead 1.23 or −2.01? We need to find a way to be more precise. How do we do that? Simple—learn calculus and apply it to the curve to compute the area underneath it at almost every possible point along the x-axis, or (and we like this alternative much more) use Table B.1 found in Appendix B (the normal distribution table). This is a listing of all the values (except the very most extreme) for the areas under a curve that correspond to different z scores.
But are we always interested only in the amount of area between the mean and some other z score? What about between two z scores, neither of which is the mean? For example, what if we were interested in knowing the amount of area between a z score of 1.5 and a z score of 2.5, which translates to a probability that a score falls between the two z scores? How can we use the table to compute the answer to such questions? It’s easy. Just find the…
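The same lookup can be sketched numerically. Assuming the table values come from the standard normal cumulative distribution, the area between z = 1.5 and z = 2.5 is just the difference between two cumulative areas:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Area (probability) between z = 1.5 and z = 2.5: subtract the
# cumulative area up to 1.5 from the cumulative area up to 2.5.
area = phi(2.5) - phi(1.5)
print(f"P(1.5 < Z < 2.5) = {area:.4f}")
```

About 6% of scores fall in that band, which matches what you would get by subtracting the two corresponding table entries.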

