Statistics for People Who (Think They) Hate Statistics
Statistics—the science of organizing and analyzing information to make it more easily understood—made these tasks doable.
In the most general sense, statistics describes a set of tools and techniques that are used for describing, organizing, and interpreting information or data.
In all these examples, and the million more we could think of, data are collected, organized, summarized, and then interpreted. In this book, you’ll learn about collecting, organizing, and summarizing data as part of descriptive statistics. And then you’ll learn about interpreting data when you learn about the usefulness of inferential statistics.
Descriptive statistics are used to organize and describe the characteristics of a collection of data. The collection is sometimes called a data set or just data. Scientists would say that descriptive statistics describe a sample—a collection of data that you have in front of you.
Inferential statistics are often (but not always) the next step after you have collected and summarized data. Inferential statistics are used to make inferences based on a smaller group of data (such as our group of 22 students) about a possibly larger one (such as all the undergraduate students in the College of Arts and Sciences).
A smaller group of data is often called a sample, which is a portion, or a subset, of a population. For example, all the fifth graders in Newark (Neil’s fair city of origin), New Jersey, would be a population (the population is all the occurrences with certain characteristics, in this case, being in fifth grade and attending school in Newark), whereas a selection of 150 of these students would be a sample. If we think this sample represents the population well, we can make guesses about the population.
An average is the one value that best represents an entire group of scores.
You can usually think of an average as the “middle” space or as a fulcrum on a seesaw. It’s the point in a range of values that seems to most fairly represent all the values. Averages, also called measures of central tendency, come in three flavors: the mean, the median, and the mode.
The mean is the most common type of average that is computed. It is so popular that scientists sometimes sloppily treat the word average as if it means mean when it only sometimes means mean. The mean is simply the sum of all the values in a group, divided by the number of values in that group.
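A minimal Python sketch of that definition may help. The scores are made up for illustration, and the function name `mean` is an assumption of this sketch rather than anything from the book.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty group is undefined")
    return sum(values) / len(values)


scores = [3, 4, 5, 5, 8]   # illustrative scores, n = 5
print(mean(scores))        # 25 / 5 = 5.0
```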
The mean is sometimes represented by the letter M and is also called the typical, average, or most central score. If you are reading another statistics book or a research report and you see something like M = 45.87, it probably means that the mean is equal to 45.87. Technically, that capital letter M is used when you are talking about the mean of the larger population represented by the sample in front of you.
In the formula, a small n represents the sample size for which the mean is being computed. A large N (← like this) would represent the population size.
And if you want to be technical about it, the arithmetic mean (which is the one that we have discussed up to now) is also defined as the point about which the sum of the deviations is equal to zero (whew!). Each score in a sample is some distance from the mean. If you add up all those distances, it will equal zero. Always. Every time.
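The zero-sum property is easy to check by hand or in a few lines of Python. This is only a sketch with made-up scores; a tolerance is used in the comparison because floating-point arithmetic can leave a tiny rounding error with other data.

```python
import math

scores = [3, 4, 5, 5, 8]                  # illustrative scores
m = sum(scores) / len(scores)             # the mean, 5.0

# Signed distance of each score from the mean: -2, -1, 0, 0, 3
deviations = [x - m for x in scores]
total = sum(deviations)

# The deviations cancel out, so the total is zero (up to rounding).
print(math.isclose(total, 0.0, abs_tol=1e-9))
```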
Remember that the word average means only the one measure that best represents a set of scores and that there are many different types of averages.
In basic statistics, an important distinction is made between those values associated with samples (a part of a population) and those associated with populations. To do this, statisticians use the following conventions. For a sample statistic (such as the mean of a sample), Roman letters are used. For a population parameter (such as the mean of a population), Greek letters are used. So, for example, the mean for the spelling score for a sample of 100 fifth graders is represented as whereas the mean for the spelling score for the entire population of fifth graders is represented, using the ...more
The median is also an average, but of a very different kind. The median is defined as the midpoint in a set of scores. It’s the point at which one half, or 50%, of the scores fall above and one half, or 50%, fall below.
When there is an even number of values, the median is simply the mean of the two middle values.
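As a rough sketch, the odd and even cases can be written out like this in Python. The function name `median` and the sample values are assumptions made for illustration.

```python
def median(values):
    """Midpoint of the sorted values; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty group is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle values


print(median([5, 1, 3]))      # middle of 1, 3, 5
print(median([7, 1, 5, 3]))   # mean of 3 and 5
```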
If you know about medians, you should also know about percentile ranks. Percentile ranks are used to define the percentage of cases equal to or below a certain point in a distribution or set of scores. For example, if a score is “at the 75th percentile,” it means that the score is at or above 75% of the other scores in the distribution. The median is also known as the 50th percentile, because it’s the point at or below which 50% of the cases in the distribution fall. Other percentiles are useful as well, such as the 25th percentile, often called Q1, and the 75th percentile, referred to as Q3. ...more
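One way to make the "equal to or below" definition concrete is the short Python sketch below. The helper name `percentile_rank` and the scores are hypothetical, and other textbooks and software packages define percentile ranks slightly differently.

```python
def percentile_rank(scores, value):
    """Percentage of cases in `scores` that are equal to or below `value`."""
    if not scores:
        raise ValueError("need at least one score")
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


scores = [55, 60, 65, 70, 75, 80, 85, 90]   # illustrative test scores
print(percentile_rank(scores, 80))          # 6 of the 8 scores are at or below 80
```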
The median is insensitive to extreme scores, whereas the mean is not.
There are just too many extreme scores that would skew, or significantly distort, what is actually a central point in the set or distribution of scores.
Because the median is based on how many cases there are, and not the values of those cases, extreme scores (sometimes called outliers) only count a little.
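A quick illustration, using Python's standard `statistics` module and made-up incomes (in thousands of dollars): swapping one ordinary value for an extreme one moves the mean a lot and leaves the median where it was.

```python
import statistics

typical = [30, 32, 35, 38, 40]         # illustrative incomes, no extreme scores
with_outlier = [30, 32, 35, 38, 400]   # same group, but the top earner is an outlier

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)       # the mean jumps from 35 to 107
print(median_before, median_after)   # the median stays at 35
```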
Want to know the easiest and most commonly made mistake when computing the mode? It’s selecting the number of times a category occurs rather than the label of the category itself.
If every value in a distribution contains the same number of occurrences, then there really isn’t a single mode. But if more than one value appears with equal frequency, the distribution is multimodal.
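Here is a small Python sketch of finding the mode or modes. The function name `modes` and the hair-color data are assumptions for illustration. Note that it returns the category labels, not their counts, and that when every value occurs equally often it returns all of them, which is the case where there is no single mode.

```python
from collections import Counter


def modes(values):
    """Return the label(s) that occur most often, in order of first appearance."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())   # the highest frequency (this is NOT the mode itself)
    return [label for label, count in counts.items() if count == top]


hair = ["black", "red", "black", "blond", "red"]   # illustrative categories
print(modes(hair))   # two labels tie for most frequent, so the data are bimodal
```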
Can you have a trimodal distribution? Sure—where three values have the same frequency. It’s unlikely, especially when you are dealing with a large set of data points, or observations, but certainly possible. The real answer to the above stand-apart question is that categories have to be mutually exclusive—you simply cannot have both black and red hair (although if you look around the classroom, you may think differently).
Which measure of central tendency you use depends on certain characteristics of the data you are working with—specifically the scale of measurement at which those data occur. And that scale or level dictates the specific measure of central tendency you will use.
Measurement is the assignment of values to outcomes following a set of rules—simple. The results are the different scales we’ll define in a moment, and an outcome is anything we are interested in measuring, such as hair color, gender, test score, or height. These scales of measurement, or rules, are the particular levels at which outcomes are observed. Each level has a particular set of characteristics, and scales of measurement come in four flavors (there are four types): nominal, ordinal, interval, and ratio.
The nominal level of measurement is defined by the characteristics of an outcome that fit into one and only one class or category.
The ord in ordinal level of measurement stands for order, and the characteristic of things being measured here is that they are ordered.
Now we’re getting somewhere. When we talk about the interval level of measurement, a test or an assessment tool is based on some underlying continuum such that we can talk about how much more a higher performance is than a lesser one. For example, if you get 10 words correct on a vocabulary test, that is 5 more than getting 5 words correct. A distinguishing characteristic of interval-level scales is that the intervals or spaces or points along the scale are equal to one another.
An assessment tool at the ratio level of measurement is characterized by the presence of an absolute zero on the scale.
Any outcome can be assigned to one of four levels of measurement. Levels of measurement have an order, from the least precise being nominal to the most precise being ratio. The “higher up” the scale of measurement, the more precise the data being collected, and the more detailed and informative the data are. It may be enough to know that some people are rich and some poor (and that’s a nominal or categorical distinction), but it’s much better to know exactly how much money they make (ratio). We can always make the “rich” versus “poor” distinction if we want to once we have all the information. ...more
points, but you also know that the Cubs are better than the Tigers (but not by how much) and that the Cubs are different from the Tigers (but there’s no direction to the difference).
In general, which measure of central tendency you use depends on the type of data that you are describing, which in turn means at what level of measurement the data occur. Unquestionably, a measure of central tendency for qualitative, categorical, or nominal data (such as racial group, eye color, income bracket, voting preference, and neighborhood location) can be described using only the mode. It’s not meaningful to talk about the mean eye color in a classroom.
In general, the median and the mean are best used with quantitative data, such as height, income level in dollars (not categories), age, test score, reaction time, and number of hours completed toward a degree.
Use the mode when the data are categorical (“nominal”) in nature and the people or things can fit into only one class, such as hair color, political affiliation, neighborhood location, and religion. When this is the case, these categories are called mutually exclusive. Use the median when you have extreme scores and you don’t want an average that is misleading, such as when the variable of interest is income expressed in dollars. Finally, use the mean when you have data that do not include extreme scores and are not categorical, such as the numerical score on a test or the number of seconds it ...more
In the simplest of terms, variability reflects how much scores differ from one another.
Variability (also called spread or dispersion) can be thought of as a measure of how different scores are from one another. It’s even more accurate (and maybe even easier) to think of variability as how different scores are from one particular score. And what “score” do you think that might be? Well, instead of comparing each score to every other score in a distribution, the one score that could be used as a comparison is—that’s right—the average. So, variability becomes a measure of how much each score in a group of scores differs from the average, usually the mean. More about this in a ...more
The range is the simplest measure of variability and kind of intuitive. It is the distance of the biggest score from the smallest score.
There really are two kinds of ranges. One is the exclusive range, which is the highest score minus the lowest score (or h − l) and the one we just defined. The second kind of range is the inclusive range, which is the highest score minus the lowest score plus 1 (or h − l + 1). The difference is that the inclusive range counts both the high and low scores as being among the values in the range. You most commonly see the exclusive range in research articles, but the inclusive range is also used on occasion if the researcher prefers it. The range tells you how different the highest and lowest values in a data set are from one another; that is, the range shows how much spread there is from the lowest to the highest point in a distribution. So, although the range is fine as a general indicator ...more
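The two range formulas, h − l and h − l + 1, can be sketched in Python as follows. The function names and the scores are made up for illustration.

```python
def exclusive_range(values):
    """Highest score minus lowest score: h - l."""
    return max(values) - min(values)


def inclusive_range(values):
    """Highest score minus lowest score plus 1: h - l + 1."""
    return max(values) - min(values) + 1


scores = [3, 4, 5, 5, 8]         # illustrative scores: h = 8, l = 3
print(exclusive_range(scores))   # 8 - 3 = 5
print(inclusive_range(scores))   # 8 - 3 + 1 = 6
```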
Now we get to the most frequently used measure of variability, the standard deviation. Just think about what the term implies: it’s a deviation from something (guess what?) that has been standardized. Actually, the standard deviation (sometimes abbreviated as SD, sometimes s) represents the average amount of variability in a set of scores. In practical terms, it’s the average distance of each score from the mean. The larger the standard deviation, the larger the average distance each data point is from the mean of the distribution and the more variable the set of scores is.
First, why didn’t we just add up the deviations from the mean? Because the sum of the deviations from the mean is always equal to zero.
There’s another type of deviation that you may read about, and you should know what it means. The mean deviation (also called the mean absolute deviation) is the sum of the absolute value of the deviations from the mean divided by the number of scores. You already know that the sum of the deviations from the mean must equal zero (otherwise, the mean is computed incorrectly). That’s why we square the deviations before summing them. Another option, though, would be to take the absolute value of each deviation (which is the value regardless of the sign). Sum the absolute values and divide by the number of data points, and you have the mean deviation. So, if you have a set of scores such as 3, 4, 5, 5, 8 and the arithmetic mean is 5, the mean deviation is the sum of 2 (the absolute value of 5 − 3), 1, 0, 0, and 3, for a total of 6. Then divide this by 5 to get the result of 1.2. (Note: The absolute value of a number is usually represented as that number with a vertical line on each side of it, such as |5|. For example, the absolute value of −6, or |−6|, is 6.) Second, why do we square the deviations? Because we want to get rid of the ...more
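The worked example with the scores 3, 4, 5, 5, 8 can be reproduced in a few lines of Python. The function name `mean_deviation` is an assumption of this sketch.

```python
def mean_deviation(values):
    """Mean absolute deviation: the average of |score - mean| over all scores."""
    if not values:
        raise ValueError("need at least one score")
    m = sum(values) / len(values)
    # Absolute deviations for 3, 4, 5, 5, 8 around a mean of 5: 2, 1, 0, 0, 3
    return sum(abs(x - m) for x in values) / len(values)


scores = [3, 4, 5, 5, 8]        # the scores from the example in the text
print(mean_deviation(scores))   # 6 / 5 = 1.2
```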
Why do we divide by n − 1 rather than just plain ol’ n, like we usually do when we calculate the mean? Good question.
The answer is that s (the standard deviation) is an estimate of the population standard deviation, and it is an unbiased estimate. Unbiased means that your sample estimate of the mean is just as likely to be a little higher than the population mean as it is to be a little lower. It is unbiased, though, only when we subtract 1 from n. By subtracting 1 from the denominator, we artificially force the standard deviation to be a tiny bit larger than it would be otherwise. Why would we want to do that? Because, as good scientists, we are conservative. Being conservative means that if we have to err ...more
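To see the effect of the denominator, here is a hedged Python sketch that computes the standard deviation both ways: squaring the deviations from the mean, summing them, dividing by n − 1 or by n, and taking the square root. The function name, the `unbiased` flag, and the scores are illustrative assumptions.

```python
import math


def standard_deviation(values, unbiased=True):
    """Square root of the summed squared deviations over n - 1 (or n if unbiased=False)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(values) / n
    sum_of_squares = sum((x - m) ** 2 for x in values)
    denominator = n - 1 if unbiased else n
    return math.sqrt(sum_of_squares / denominator)


scores = [3, 4, 5, 5, 8]   # squared deviations: 4, 1, 0, 0, 9 (sum = 14)
s_unbiased = standard_deviation(scores)                 # sqrt(14 / 4)
s_biased = standard_deviation(scores, unbiased=False)   # sqrt(14 / 5)
print(s_unbiased > s_biased)   # dividing by n - 1 gives the larger value
```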
All other things being equal, then, the larger the size of the sample, the less difference there is between the biased and the unbiased estimates of the standard deviation.
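One way to see why is that, for the same set of scores, the n − 1 version equals the n version multiplied by the square root of n / (n − 1), and that factor approaches 1 as n grows. A short Python sketch follows; the function name `inflation_factor` is made up for this illustration.

```python
import math


def inflation_factor(n):
    """Ratio of the (n - 1) standard deviation to the n version: sqrt(n / (n - 1))."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.sqrt(n / (n - 1))


# The factor shrinks toward 1 as the sample size grows.
for n in (5, 30, 100, 1000):
    print(n, round(inflation_factor(n), 4))
```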
The moral of the story? When you compute the standard deviation for a sample, which is an estimate of the population standard deviation, the larger the sample is, the more accurate the estimate will be.