The typical way to measure how far off we are from the line of best fit is to assign each residual a “cost” that is 0 if the residual is zero and positive otherwise, and then to add up the costs of all the residuals. Traditionally the cost of a residual is just its square, so that the total cost is the sum of the squared residuals. Typically we divide this total by what it would have been if we had just passed a flat line through the average height of the data, as in Figure 4.9. Passing a flat line like this is about the most naïve “curve fitting” possible, so it is used as the benchmark for the penalty that a very bad fit would incur.
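To make the arithmetic concrete, here is a minimal Python sketch of the squared-residual cost and its ratio to the flat-line benchmark. The function names, the sample data, and the candidate lines are all hypothetical, chosen only for illustration; they do not come from the text.

```python
def squared_cost(residuals):
    """Total cost: the sum of the squared residuals."""
    return sum(r * r for r in residuals)


def relative_cost(xs, ys, slope, intercept):
    """Cost of the line y = slope * x + intercept, divided by the cost of
    a flat line through the average height of the data (the benchmark).

    Assumes the ys are not all equal; otherwise the benchmark cost is zero
    and the division fails.
    """
    # Residuals of the candidate line: observed height minus predicted height.
    line_residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Residuals of the benchmark: observed height minus the average height.
    mean_y = sum(ys) / len(ys)
    flat_residuals = [y - mean_y for y in ys]
    return squared_cost(line_residuals) / squared_cost(flat_residuals)


# Hypothetical data lying exactly on y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

# A line through every point has zero residuals, so its relative cost is 0.
perfect = relative_cost(xs, ys, slope=2.0, intercept=1.0)

# The flat line through the average height (4.0) is the benchmark itself,
# so its relative cost is 1.
flat = relative_cost(xs, ys, slope=0.0, intercept=4.0)
```

Under these assumptions, a ratio near 0 indicates a fit far better than the flat line, while a ratio near 1 indicates a fit that does about as badly as the naïve benchmark.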