Kindle Notes & Highlights by Andy Field
Read between January 9 - February 23, 2021
all stats software (including IBM SPSS Statistics) enables you to do all of the things that I just told you not to do
Much as pink graphs might send a twinge of delight down your spine, remember why you’re drawing the graph – it’s not to make yourself (or others) purr with delight at the pinkness, it’s to present information (dull, but true).
✓ Show the data.
✓ Induce the reader to think about the data being presented (rather than some other aspect of the graph, like how pink it is).
✓ Avoid distorting the data.
✓ Present many numbers with minimum ink.
✓ Make large data sets (assuming you have one) coherent.
✓ Encourage the reader to compare different pieces of data.
✓ Reveal the underlying message of the data.
✗ The bars have a 3-D effect: Never use 3-D plots for a graph plotting two variables because it obscures the data.
✗ Patterns: The bars also have patterns, which, although very pretty, distract the eye from what matters (namely the data). These are completely unnecessary.
✗ Cylindrical bars: Were my data so sewage-like that I wanted to put them in silos? The cylinder effect muddies the data and distracts the eye from what is important.
✗ Badly labeled y-axis: ‘Number’ of what? Delusions? Fish? Cabbage-eating sea lizards from the eighth dimension? Idiots who don’t know how to draw graphs?
✓ A 2-D plot: The completely unnecessary third dimension is gone, making it much easier to compare the values across therapies and thoughts/behaviors.
✓ I have superimposed the summary statistics (means and confidence intervals) over the raw data so readers get a full sense of the data (without it being overwhelming).
✓ The y-axis has a more informative label: We now know that it was the number of obsessions per day that was measured. I’ve also added a legend to inform readers that obsessive thoughts and actions are differentiated by color.
✓ Distractions: There are fewer distractions like...
Governments lie with statistics, but scientists shouldn’t.
Data analysis is a bit like Internet dating (it’s not, but bear with me). You can scan through the vital statistics and find a perfect match (good IQ, tall, physically fit, likes arty French films, etc.) and you’ll think you have found the perfect answer to your question.
Data analysis is much the same: inspect your data with a picture, see how it looks and only then can you interpret the more vital statistics.
We encountered histograms (frequency distributions) in Chapter 1; they’re a useful way to look at the shape of your data and spot problems
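As a rough illustration of what a histogram summarizes, the sketch below bins scores into a frequency distribution and prints it as text. It uses plain Python rather than SPSS, and the function name `histogram_counts`, the bin width, and the scores are all hypothetical, invented for this example.

```python
from collections import Counter

def histogram_counts(scores, bin_width, start=0):
    """Count how many scores fall in each bin [lower, lower + bin_width)."""
    counts = Counter()
    for score in scores:
        # Lower edge of the bin that contains this score.
        lower = start + ((score - start) // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical success scores (0-100), invented for illustration.
scores = [12, 35, 41, 47, 50, 52, 55, 58, 63, 95]
width = 20
counts = histogram_counts(scores, bin_width=width)

# Text histogram: one '#' per score; the tall middle bar and the lone
# scores at either end are what you would look for in the real plot.
for lower, n in counts.items():
    print(f"{lower:3d}-{lower + width - 1:3d} | {'#' * n}")
```

Bins with no scores are simply absent from the result, so a gap in the printed rows plays the role of an empty bar.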
Simple histogram: Use this option to visualize frequencies of scores for a single variable.
Stacked histogram: If you have a grouping variable (e.g., whether people worked hard or wished upon a star) you can produce a histogram in which each bar is split by group. In this example, each bar would have two colors, one representing people who worked hard and the other those who wished upon a star. This option is a good way to compare the relative frequency of scores across groups (e.g., were those who worked hard more successful than those who wished upon a star?).
Frequency polygon: This option...
To compare frequency distributions of several groups simultaneously we can use a population pyramid.
It shows that for those who wished upon a star there is a fairly normal distribution centered at about the midpoint of the success scale (50%).
A boxplot or box–whisker diagram is one of the best ways to display your data.
Boxplots display a summary for a single outcome variable;
1-D Boxplot: This option produces a single boxplot of all scores for the chosen outcome
Simple boxplot: This option produces multiple boxplots for the chosen outcome by splitting the data by a categorical variable.
Clustered boxplot: This option is the same as the simple boxplot, except that it splits the data by a second categorical variable. Boxplots for this second variable are produced in different colors.
Like histograms, boxplots also tell us whether the distribution is symmetrical or skewed.
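To make the symmetry point concrete, here is a minimal sketch of the numbers a boxplot is built from, in plain Python rather than SPSS. The helper names and the data are hypothetical, and the quartile rule used here (the `inclusive` method of `statistics.quantiles`) is an assumption: packages differ in how they compute quartiles, so values may not match SPSS exactly.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)

def skew_direction(scores):
    """Crude check: compare the two halves of the box around the median."""
    _, q1, median, q3, _ = five_number_summary(scores)
    upper_half, lower_half = q3 - median, median - q1
    if upper_half > lower_half:
        return "positive"   # box stretches further above the median
    if upper_half < lower_half:
        return "negative"   # box stretches further below the median
    return "symmetrical"

# Hypothetical scores, invented for illustration.
even_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed_scores = [1, 2, 2, 3, 3, 4, 5, 8, 20]

print(five_number_summary(even_scores), skew_direction(even_scores))
print(five_number_summary(skewed_scores), skew_direction(skewed_scores))
```

This only compares the two halves of the box; on a real boxplot you would also look at the relative lengths of the whiskers.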
Bar charts are the usual way for people to display means, although they are not ideal because they use a lot of ink to display only one piece of information.
Simple bar: Use this option to display the means of scores across different groups or categories of cases.
Clustered bar: If you have a second grouping variable you can produce a simple bar chart (as above) but with different colored bars to represent levels of a second grouping variable.
Stacked bar: This is like the clustered bar, except that the different-colored bars are stacked on top of each other rather than placed side by side.
Simple 3-D bar: This is also like the clustered bar, except that the second grouping variable is displayed not by different-colored bars, but by an additional axis.
Clustered 3-D bar: This is like the clustered bar chart above, except that you can add a third categorical variable on an extra axis.
Stacked 3-D bar: This graph is the same as the clustered 3-D graph, except the different-colored bars are stacked on top of each other instead of standing side by side.
Simple error bar: This is the same as the simple bar chart, except that, instead of bars, the mean is represented by a dot, and a line represents the precision of the estimate of the mean (usually, the 95% confidence interval is plotted, but you can plot the standard deviation or standard error of the mean instead).
Clustered error bar: This is the same as the clustered bar chart, except that the mean is displayed as a dot with an error bar around it.
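The dot and line of an error bar chart can be sketched as a mean plus a 95% confidence interval. The example below is plain Python, not SPSS; the function name `mean_with_ci` and the group data are hypothetical, and it assumes a normal approximation (critical value of about 1.96) rather than a t-based interval, which would be somewhat wider for samples this small.

```python
import math
import statistics

def mean_with_ci(scores, confidence=0.95):
    """Return (mean, lower limit, upper limit) using a normal approximation."""
    mean = statistics.mean(scores)
    # Standard error of the mean: sample SD divided by the square root of n.
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    # Critical z value (about 1.96 for 95%). Assumption: normal approximation.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * standard_error, mean + z * standard_error

# Hypothetical scores for two groups, invented for illustration.
groups = {"group A": [4, 5, 6, 7, 8], "group B": [10, 12, 14, 16, 18]}
for name, scores in groups.items():
    mean, lower, upper = mean_with_ci(scores)
    print(f"{name}: mean = {mean:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Swapping `z * standard_error` for the standard deviation or the standard error alone would give the other two kinds of error bar mentioned above.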
You can select to show an I-beam (the bar is reduced to a line with horizontal bars at the top and bottom) or just a whisker (the bar is reduced to a vertical line). The I-beam and whisker options might be useful when you’re not planning on adding error bars, but because we are going to show error bars we should stick with a bar.
Graphing means from the same entities is trickier, but, as they say, if you’re going to die, die with your boots on. So, let’s put our boots on and hopefully not die.
Mostly you can let SPSS construct the scale automatically – if it doesn’t do it sensibly you can edit it later.
The Chart Builder can produce graphs of a mixed design (see Chapter 16). A mixed design has one or more independent variables measured using different groups, and one or more independent variables measured using the same entities. The Chart Builder can produce a graph, provided you have only one repeated-measure variable.
Line charts are bar charts but with lines instead of bars. Therefore, everything we have just done with bar charts we can do with line charts instead.
Simple line: Use this option to display the means of scores across different groups of cases.
Multiple line: This option is equivalent to the clustered bar chart: it will plot means of an outcome variable for different categories/groups of a predictor variable and also produce different-colored lines for each category/group of a second predictor variable.
Sometimes we need to look at the relationships between variables (rather than their means or frequencies).
A scatterplot is a graph that plots each person’s score on one variable against their score on another. It visualizes the relationship between the variables, but also helps us to identify unusual cases that might bias that relationship.
Simple scatter: Use this option to plot values of one continuous variable against another.
Grouped scatter: This is like a simple scatterplot, except that you can display points belonging to different groups in different colors (or symbols).
Simple 3-D scatter: Use this option to plot values of one continuous variable agai...
This highlight has been truncated due to consecutive passage length restrictions.
Grouped 3-D scatter: Use this option to plot values of one continuous variable against two others, but differentiating groups o...
Summary point plot: This graph is the same as a bar chart (see Section 5.6), except that a do...
Simple dot plot: Otherwise known as a density plot, this graph is like a histogram (see Section 5.4), except that, rather than having a summary bar representing the frequency of...
Scatterplot matrix: This option produces a grid of scatterplots showing the relationships between multiple pairs of variables in each cell of the grid.
Drop-line: This option produces a plot similar to a clustered bar chart (see, for example, Section 5.6.2) but with a dot representing a summary statistic (e.g., the mean) instead of a bar, and with a line connecting the ‘summary’ (e.g., mean) of each group.
Often it is useful to plot a line that summarizes the relationship between variables on a scatterplot (this is called a regression line,
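One common way to obtain such a summary line is an ordinary least-squares fit. The sketch below is a plain Python illustration under that assumption, not a description of how SPSS computes it; the function name `regression_line` and the data are hypothetical.

```python
def regression_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope = sum of cross-product deviations / sum of squared deviations in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical paired scores, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope = regression_line(x, y)
print(f"predicted y = {intercept:.2f} + {slope:.2f} * x")
```

Drawing this line over the scatterplot makes it easier to see which points sit far from the general trend.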
A 3-D scatterplot displays the relationship between three variables, and the reason why it’s sometimes OK to use a 3-D graph in this context is that the third dimension tells us something useful
Instead of plotting several variables on the same axes on a 3-D scatterplot (which can be difficult to interpret), I think it’s better to plot a matrix of 2-D scatterplots. This type of plot allows you to see the relationship between all combinations of many different pairs
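The idea of a scatterplot matrix can be sketched as enumerating every pair of variables and collecting the points for each panel. This is a plain Python illustration, not SPSS output; the function name `scatterplot_matrix_pairs` and the three variables with their values are hypothetical.

```python
from itertools import combinations

def scatterplot_matrix_pairs(data):
    """Map each pair of variable names to the (x, y) points for that panel."""
    panels = {}
    for name_x, name_y in combinations(data, 2):
        panels[(name_x, name_y)] = list(zip(data[name_x], data[name_y]))
    return panels

# Hypothetical variables (one value per person), invented for illustration.
data = {
    "anxiety": [80, 65, 90, 70],
    "revision": [4, 11, 2, 9],
    "exam": [40, 65, 35, 60],
}

panels = scatterplot_matrix_pairs(data)
for (name_x, name_y), points in panels.items():
    print(f"{name_x} vs {name_y}: {points}")
```

With three variables this gives three distinct pairs; a full grid would typically show each pair twice, once on each side of the diagonal with the axes swapped.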
the simple dot plot, or density plot as it is also known, is a histogram except that each data point is plotted (rather than using a single summary bar to show each frequency). Like a histogram, the data are still placed into bins (SPSS Tip 5.2), but a dot is used to represent each data point.
the drop-line plot is fairly similar to a clustered bar chart (or line chart), except that each mean is represented by a dot (rather than a bar), and within groups these dots are linked by a line (contrast this with a line graph, where dots are joined across groups, rather than within groups).

