Statistics for People Who (Think They) Hate Statistics
The standard deviation is computed as the average distance from the mean. So, you will need to first compute the mean as a measure of central tendency. Don’t fool around with the median or the mode in trying to compute the standard deviation. The larger the standard deviation, the more spread out the values are, and the more different they are from one another. Just like the mean, the standard deviation is sensitive to extreme scores. When you are computing the standard deviation of a sample and you have extreme scores, note that fact somewhere in your written report and in your interpretation …
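To make the recipe concrete, here is a minimal Python sketch on a small set of hypothetical scores. It assumes the sample formula, in which the summed squared deviations are divided by n − 1 before taking the square root (dividing by n gives the population version). The function name `sample_sd` is illustrative, not from the text.

```python
import math
import statistics

def sample_sd(scores):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(scores)
    mean = sum(scores) / n                              # step 1: compute the mean first
    squared_devs = [(x - mean) ** 2 for x in scores]    # step 2: squared distance of each score from the mean
    return math.sqrt(sum(squared_devs) / (n - 1))       # step 3: "average" with n - 1, then take the square root

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]  # hypothetical data
print(round(sample_sd(scores), 2))  # 1.76

# Cross-check against the standard library, which also divides by n - 1.
print(math.isclose(sample_sd(scores), statistics.stdev(scores)))  # True

# A single extreme score inflates the standard deviation considerably.
print(round(sample_sd(scores + [30]), 2))
```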
Here comes another measure of variability and a nice surprise. If you know the standard deviation of a set of scores and you can square a number, you can easily compute the variance of that same set of scores. This third measure of variability, the variance, is simply the standard deviation squared.
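A quick check of that relationship with Python's standard `statistics` module, using the same kind of hypothetical scores. Both `stdev` and `variance` use the sample (n − 1) formulas, so one is exactly the square of the other.

```python
import math
import statistics

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]  # hypothetical data

sd = statistics.stdev(scores)       # sample standard deviation (divides by n - 1)
var = statistics.variance(scores)   # sample variance (also divides by n - 1)

print(round(sd, 2), round(var, 2))  # 1.76 3.11
print(math.isclose(var, sd ** 2))   # True: the variance is the standard deviation squared
```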
You are not likely to see the variance mentioned by itself in a journal article or see it used as a descriptive statistic. This is because the variance is a difficult number to interpret and apply to a set of data. After all, it is based on squared deviation scores, so it is not expressed in the same units as the original scores.
First, and most important, the standard deviation (because we take the square root of the average summed squared deviation) is stated in the original units from which it was derived. The variance is stated in units that are squared (the square root of the final value is never taken).
Minimize chart or graph junk. “Chart junk” (a close cousin to “word junk”) happens when you use every function, every graph, and every feature a computer program has to make your charts busy, full, and uninformative. With graphs, more is definitely less. Plan out your chart before you start creating the final copy. Use graph paper even if you will be using a computer program to generate the graph. Actually, why not just use your computer to generate and print out graph paper (try www.printfreegraphpaper.com). Say what you mean and mean what you say—no more and no less. There’s nothing worse …
Maintain the scale in a graph. “Scale” refers to the proportional relationship between the horizontal and vertical axes. This ratio should be about 3 to 4, so a graph that is 3 inches wide will be about 4 inches tall. Simple is best and less is more. Keep the chart simple but not simplistic. Convey one idea as straightforwardly as possible, with distracting information saved for the accompanying text. Remember, a chart or graph should be able to stand alone, and the reader should be able to understand the message. Limit the number of words you use. Too many words, or words that are too large …
The most basic way to illustrate data is through the creation of a frequency distribution. A frequency distribution is a method of tallying and representing how often certain scores occur. In the creation of a frequency distribution, scores are usually grouped into class intervals, or ranges of numbers.
As you can see from the above table, a class interval is a range of numbers, and the first step in the creation of a frequency distribution is to define how large each interval will be.
Here are some general rules to follow in the creation of a class interval, regardless of the size of values in the data set you are dealing with: Select a class interval that has a range of 2, 5, 10, 15, or 20 data points. In our example, we chose 5.
Select a class interval so that 10 to 20 such intervals cover the entire range of data. A convenient way to do this is to compute the range and then divide by a number that represents the number of intervals you want to use (between 10 and 20). In our example, there are 50 scores, and we wanted 10 intervals: 50/10 = 5, which is the size of each class interval. If you had a set of scores ranging from 100 to 400, you could start with an estimate of 20 intervals and see if the interval range makes sense for your data: 300/20 = 15, so 15 would be the class interval. Begin listing the class …
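The arithmetic fits in a tiny helper. The function name is hypothetical, and it returns the raw quotient; you would still round the result to a convenient width such as 2, 5, 10, 15, or 20.

```python
def class_interval_size(lowest, highest, n_intervals):
    """Range of the data divided by the number of intervals you want (10 to 20 suggested)."""
    data_range = highest - lowest
    return data_range / n_intervals

print(class_interval_size(100, 400, 20))  # 15.0, the 100-to-400 example in the text
print(class_interval_size(0, 50, 10))     # 5.0, a range of 50 split into 10 intervals
```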
There are some simple steps for creating class intervals on the way to creating a frequency distribution. Here are six general rules:
1. Determine the range.
2. Decide on the number of class intervals.
3. Decide on the size of the class interval.
4. Decide the starting point for the first class.
5. Create the class intervals.
6. Put the data into the class intervals.
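Steps 4 through 6 can be sketched in Python as below. This assumes whole-number scores (so an interval of width 5 that starts at 0 covers 0 to 4) and uses a short hypothetical list of scores rather than the book’s data set; `frequency_distribution` is an illustrative name.

```python
def frequency_distribution(scores, start, width):
    """Return (bottom, top, frequency) for each class interval, lowest first.

    Assumes whole-number scores, so an interval of width 5 starting at 0 is 0-4.
    """
    table = []
    bottom = start
    while bottom <= max(scores):  # keep adding intervals until the highest score is covered
        top = bottom + width - 1
        frequency = sum(1 for s in scores if bottom <= s <= top)  # tally the scores in this interval
        table.append((bottom, top, frequency))
        bottom += width
    return table

# Hypothetical scores (not the 50 scores used in the text)
scores = [2, 7, 8, 12, 13, 14, 19, 21, 23, 24, 26, 31, 33, 38, 44]

for bottom, top, frequency in frequency_distribution(scores, start=0, width=5):
    print(f"{bottom}-{top}: {frequency}")
```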
Now that we’ve got a tally of how many scores fall in what class intervals, we’ll go to the next step and create what is called a histogram, a visual representation of the frequency distribution where the frequencies are represented by bars.
All you need to know is that a graph or a chart is the visual representation of data.
Using a piece of graph paper, place values at equal distances along the x-axis, as shown in Figure 4.1. Now, identify the midpoint of each class interval, which is the middle point in the interval. It’s pretty easy to just eyeball, but you can also just add the top and bottom values of the class interval and divide by 2. For example, the midpoint of the class interval 0–4 is the average of 0 and 4, or (0 + 4)/2 = 2. Draw a bar or column centered on each midpoint that represents the entire class interval to the height representing the frequency of that class interval. For example, in Figure 4.2, you … one time a value between 0 and 4 occurs). Continue drawing bars or columns until each of the frequencies for each of the class intervals is represented. Figure 4.2 is a nice hand-drawn (really!) histogram for the frequency distribution of the 50 scores that we have been working with so far.
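A rough Python sketch of the same procedure: compute each interval’s midpoint as (bottom + top)/2 and draw a sideways text “histogram” with one `#` per score. The frequency table here is hypothetical, and a text bar chart is only a stand-in for a properly drawn histogram.

```python
def midpoint(bottom, top):
    """Midpoint of a class interval: add the bottom and top values and divide by 2."""
    return (bottom + top) / 2

# Hypothetical frequency distribution: (bottom, top, frequency) per class interval
table = [(0, 4, 1), (5, 9, 2), (10, 14, 3), (15, 19, 1), (20, 24, 3)]

# Sideways text histogram: one row per class interval, one '#' per score
for bottom, top, freq in table:
    print(f"{bottom:>2}-{top:<2} (midpoint {midpoint(bottom, top):4.1f}) | {'#' * freq}")
```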
We’re going to use the same data—and, in fact, the histogram that you just saw created—to create a frequency polygon. (Polygon is a word for shape.) A frequency polygon is a continuous line that represents the frequencies of scores within a class interval, as shown in Figure 4.4.
Place a midpoint at the top of each bar or column in a histogram (see Figure 4.2). Connect the midpoints with lines and you’ve got it: a frequency polygon!
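In code, the points of a frequency polygon are just (midpoint, frequency) pairs, one per histogram bar; a plotting tool would then join them with straight lines. A minimal sketch with a hypothetical table and an illustrative function name:

```python
def polygon_points(table):
    """(midpoint, frequency) pairs: the top-center of each histogram bar."""
    return [((bottom + top) / 2, freq) for bottom, top, freq in table]

table = [(0, 4, 1), (5, 9, 2), (10, 14, 3), (15, 19, 1), (20, 24, 3)]  # hypothetical
print(polygon_points(table))
# [(2.0, 1), (7.0, 2), (12.0, 3), (17.0, 1), (22.0, 3)]
```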
Why use a frequency polygon rather than a histogram to represent data? For two reasons. Visually, a frequency polygon appears more dynamic than a histogram (a line that represents change in frequency always looks neat). Also, the use of a continuous line suggests that the variable represented by the scores along the x-axis is also a theoretically continuous, interval-level measurement as we talked about in Chapter 2. (To purists, the fact that the bars touch each other in a histogram suggests the interval-level nature of the variable, as well.)
Once you have created a frequency distribution and have visually represented those data using a histogram or a frequency polygon, another option is to create a visual representation of the cumulative frequency of occurrences by class intervals. This is called a cumulative frequency distribution.
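A cumulative frequency is a running total of the frequencies, starting from the lowest class interval. Here is a minimal Python sketch with hypothetical frequencies, using `itertools.accumulate` from the standard library:

```python
from itertools import accumulate

# Hypothetical frequencies for the class intervals 0-4, 5-9, 10-14, 15-19, 20-24
intervals = ["0-4", "5-9", "10-14", "15-19", "20-24"]
frequencies = [1, 2, 3, 1, 3]

cumulative = list(accumulate(frequencies))  # running total, lowest interval first
for label, freq, cum in zip(intervals, frequencies, cumulative):
    print(f"{label:>6}  f = {freq}  cumulative f = {cum}")

print(cumulative)  # [1, 3, 6, 7, 10]
```

The last cumulative value always equals the total number of scores, which makes a handy arithmetic check.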
Notice this frequency polygon is shaped a little like a letter S. If the scores in a data set are distributed the way scores typically are, cumulative frequencies will often graph this way.
Another name for a cumulative frequency polygon is an ogive. And, if the distribution of the data is normal or bell shaped (see Chapter 8 for more on this), then the ogive is the cumulative, S-shaped counterpart of what is popularly known as a bell curve or a normal distribution.
A column chart should be used when you want to compare the frequencies of different categories with one another. Categories are organized horizontally on the x-axis, and values are shown vertically on the y-axis.
A bar chart is identical to a column chart, but in this chart, categories are organized on the y-axis (which is the vertical one), and values are shown on the x-axis (the horizontal one).
A line chart should be used when you want to show a trend in the data at equal intervals. This sort of graph is often used when the x-axis represents time.
A pie chart should be used when you want to show the proportion or percentage of people or things in various categories.
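Behind every pie chart is a simple proportion: each category’s share of the total, expressed as a percentage and as a slice of the 360 degrees in a circle. A small sketch with hypothetical counts:

```python
# Hypothetical counts of people in three categories
counts = {"Group A": 12, "Group B": 6, "Group C": 2}
total = sum(counts.values())

percentages = {k: 100 * n / total for k, n in counts.items()}  # share of the whole pie
degrees = {k: 360 * n / total for k, n in counts.items()}      # size of each slice

for category in counts:
    print(f"{category}: {percentages[category]:.1f}% ({degrees[category]:.0f} degrees)")
```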
Note that a pie chart describes a nominal-level variable (such as ethnicity, time of enrollment, and age groups).
A correlation coefficient is a numerical index that reflects the relationship or association between two variables. The value of this descriptive statistic ranges between −1.00 and +1.00. A correlation between two variables is sometimes referred to as a bivariate (for two variables) correlation. Even more specifically, the type of correlation that we will talk about in the majority of this chapter is called the Pearson product-moment correlation, named for its inventor, Karl Pearson.
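A minimal Python sketch of the Pearson product-moment correlation, written in deviation-score form: the sum of the cross-products of the deviations, divided by the square root of the product of the two sums of squared deviations. This is algebraically equivalent to the raw-score formulas many textbooks use for hand calculation. The data and the name `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How much X and Y vary together (sum of cross-products of deviations)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own (sums of squared deviations)
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical scores for five people on two variables
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 2))  # 0.77

print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0, a perfect indirect (negative) correlation
```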
The Pearson correlation coefficient examines the relationship between two variables, but both of those vari...
Interval and ratio levels of measurement are continuous. But a host of other variables are not continuous. They’re called discrete or categorical variables, and examples are race (such as black and white), social class (such as high and low), and political affiliation (such as Democrat and Republican). In Chapter 2, we called these types of variables nominal level. You need to use other correlational techniques, such as the phi correlation, in these cases. These topics are for a more advanced course, but you should know they are acceptable and very useful techniques.
A correlation reflects the dynamic quality of the relationship between variables. In doing so, it allows us to understand whether variables tend to move in the same or opposite directions in relationship to each other. If variables change in the same direction, the correlation is called a direct correlation or a positive correlation. If variables change in opposite directions, the correlation is called an indirect correlation or a negative correlation. Table 5.1 shows a summary of these relationships.
We are computing the correlation between the two variables for the group of people, not for any one particular person.
A correlation can range in value from −1.00 to +1.00. The absolute value of the coefficient reflects the strength of the correlation. So a correlation of −.70 is stronger than a correlation of +.50. One frequently made mistake regarding correlation coefficients occurs when students assume that a direct or positive correlation is always stronger (i.e., “better”) than an indirect or negative correlation because of the sign and nothing else. To calculate a correlation, you need exactly two variables and at least two people. Another easy mistake is to assign a value judgment to the sign of the … terms negative and positive, you might prefer to use the terms indirect and direct to communicate meaning more clearly.
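Because strength depends only on the absolute value, ranking coefficients by strength means comparing `abs(r)`, while the sign only reports direction. A small sketch with hypothetical coefficients and an illustrative `describe` helper:

```python
# Hypothetical correlation coefficients from three different studies
coefficients = [0.50, -0.70, 0.20]

def describe(r):
    """Strength comes from the absolute value; the sign only gives the direction."""
    if r > 0:
        direction = "direct (positive)"
    elif r < 0:
        direction = "indirect (negative)"
    else:
        direction = "no linear relationship"
    return f"r = {r:+.2f}: strength {abs(r):.2f}, {direction}"

# Strongest first, regardless of sign
for r in sorted(coefficients, key=abs, reverse=True):
    print(describe(r))

strongest = max(coefficients, key=abs)
print(strongest)  # -0.7
```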
$r_{xy}$ is the correlation between variable X and variable Y. $r_{\text{weight}\cdot\text{height}}$ is the correlation between weight and height. $r_{\text{SAT}\cdot\text{GPA}}$ is the correlation between SAT score and grade point average (GPA).
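On the other hand, if one variable does not change in value and therefore has nothing to share, then there is no correlation between it and another variable. (Strictly speaking, Pearson’s formula then divides by zero, so r is undefined rather than exactly zero.)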
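When one variable is constant, its sum of squared deviations is zero, and Pearson’s formula would divide by zero. The sketch below (hypothetical data, illustrative function name) makes that case explicit by returning `None` instead of a number; other tools may raise an error or report a missing value instead.

```python
import math

def pearson_r(x, y):
    """Pearson r, or None when either variable has no variability at all."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        return None  # a constant variable shares nothing; the formula would divide by zero
    return sum_xy / math.sqrt(sum_xx * sum_yy)

print(pearson_r([3, 3, 3, 3], [1, 4, 2, 5]))  # None: X never changes
```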
When you are interested in the relationship between two variables, try to collect sufficiently diverse data—that way, you’ll get the truest representative result. And how do you do that? Measure a variable as precisely as possible (use higher, more informative levels of measurement) and use a sample that varies greatly on the characteristics you are interested in.
It’s easy to confuse the sum of a set of values squared, $(\sum X)^2$, and the sum of the squared values, $\sum X^2$. The sum of a set of values squared is taking values such as 2 and 3, summing them (to get 5), and then squaring that (to get 25). The sum of the squared values is taking values such as 2 and 3, squaring them (to get 4 and 9, respectively), and then adding those together (to get 13). Just look for the parentheses as you work.
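The same distinction in two lines of Python, using the values 2 and 3 from the text:

```python
values = [2, 3]

square_of_sum = sum(values) ** 2                # (2 + 3)^2 = 5^2 = 25
sum_of_squares = sum(v ** 2 for v in values)    # 2^2 + 3^2 = 4 + 9 = 13

print(square_of_sum, sum_of_squares)  # 25 13
```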
What’s really interesting about correlations is that they measure the extent to which one variable covaries with another. So, if both variables are highly variable (have lots of wide-ranging values), the correlation between them is more likely to be high than if not. Now, that’s not to say that lots of variability guarantees a higher correlation, because the scores have to vary in a systematic way. But if the variance is constrained in one variable, then no matter how much the other variable changes, the correlation will be lower. For example, let’s say you are examining the …
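The text’s own example is cut off here, so this Python sketch uses contrived, hypothetical data to illustrate restricted range. Y tracks X plus a fixed alternating error of +2 and −2; keeping only the cases with X between 3 and 8 lowers r even though the rule linking X and Y is unchanged. `pearson_r` is a local helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation (deviation-score form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Contrived data: Y follows X, plus an alternating error of +2 / -2
x = list(range(1, 11))
error = [2, -2] * 5
y = [a + e for a, e in zip(x, error)]

r_full = pearson_r(x, y)

# Restrict the range: keep only the cases whose X lies between 3 and 8
kept = [(a, b) for a, b in zip(x, y) if 3 <= a <= 8]
x_restricted = [a for a, _ in kept]
y_restricted = [b for _, b in kept]
r_restricted = pearson_r(x_restricted, y_restricted)

print(round(r_full, 2), round(r_restricted, 2))  # 0.79 0.51
```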
There’s a very simple way to visually represent a correlation: Create what is called a scatterplot, or scattergram (in SPSS lingo it’s a scatter/dot graph). This is simply a plot of each set of scores on separate axes.
Don’t ever expect to find a perfect correlation between any two variables in the behavioral or social sciences.
In fact, r values approaching .7 and .8 are just about the highest you’ll see.
Not all relationships are reflected by a straight line. When the X and Y values do fall along a straight line, the relationship is called a linear correlation (see Chapter 16 for tons of fun stuff about this).
It’s a curvilinear relationship, and sometimes, the best description of a relationship is that it is curvilinear.
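As an illustration with made-up numbers: below, Y is exactly X squared, a perfect curvilinear relationship, yet Pearson’s r, which only measures linear association, comes out as zero because the X values are symmetric around zero. The helper `pearson_r` is defined locally for the example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation (deviation-score form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [a ** 2 for a in x]  # a perfect, but curved, relationship

print(y)                 # [9, 4, 1, 0, 1, 4, 9]
print(pearson_r(x, y))   # 0.0
```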
What happens if you have more than two variables and you want to see correlations among all pairs of variables? How are the correlations illustrated? Use a correlation matrix like the one shown in Table 5.2—a simple and elegant solution.
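A correlation matrix is just Pearson’s r computed for every pair of variables, laid out in a grid with 1.00 on the diagonal (each variable correlated with itself). A minimal sketch; the three variables and their values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation (deviation-score form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical scores for five people on three variables
data = {
    "hours_studied": [2, 4, 6, 8, 10],
    "exam_score": [55, 60, 70, 75, 90],
    "hours_tv": [10, 8, 8, 5, 2],
}

names = list(data)
matrix = {(a, b): pearson_r(data[a], data[b]) for a in names for b in names}

# Print the grid: one row and one column per variable
print(" " * 14 + "".join(f"{n:>14}" for n in names))
for a in names:
    print(f"{a:>14}" + "".join(f"{matrix[(a, b)]:>14.2f}" for b in names))
```

The matrix is symmetric, so published tables often show only the cells above (or below) the diagonal.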
In applications like Excel, you can use the Data Analysis ToolPak.
An effect size is an index of the strength of the relationship among variables, and with most statistical procedures we learn about, there will be an associated effect size that should be reported and interpreted.