An exercise in hypothesis testing

I've just turned in the manuscript for the second edition of Think Stats. If you're dying to get your hands on a copy, you can pre-order one here.

Most of the book is about computational methods, but in the last chapter I break out some analytic methods, too. In the last section of the book, I explain the underlying philosophy:

This book focuses on computational methods like resampling and permutation. These methods have several advantages over analysis:
They are easier to explain and understand. For example, one of the most difficult topics in an introductory statistics class is hypothesis testing. Many students don’t really understand what p-values are. I think the approach I presented in Chapter 9, simulating the null hypothesis and computing test statistics, makes the fundamental idea clearer.

They are robust and versatile. Analytic methods are often based on assumptions that might not hold in practice. Computational methods require fewer assumptions, and can be adapted and extended more easily.

They are debuggable. Analytic methods are often like a black box: you plug in numbers and they spit out results. But it’s easy to make subtle errors, hard to be confident that the results are right, and hard to find the problem if they are not. Computational methods lend themselves to incremental development and testing, which fosters confidence in the results.

But there is one drawback: computational methods can be slow. Taking into account these pros and cons, I recommend the following process:

1. Use computational methods during exploration. If you find a satisfactory answer and the run time is acceptable, you can stop.

2. If run time is not acceptable, look for opportunities to optimize. Using analytic methods is one of several methods of optimization.

3. If replacing a computational method with an analytic method is appropriate, use the computational method as a basis of comparison, providing mutual validation between the computational and analytic results.

For the vast majority of problems I have worked on, I didn’t have to go past Step 1.
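To make the process above concrete, here is a minimal sketch of simulating the null hypothesis with a permutation test and then checking the result against an analytic approximation. It is not the code from the book: the function names, the choice of test statistic (absolute difference in means), the normal approximation, and the data are all assumptions made up for illustration, and it uses only the Python standard library.

```python
import math
import random
import statistics


def diff_means(a, b):
    """Test statistic: absolute difference in group means."""
    return abs(statistics.fmean(a) - statistics.fmean(b))


def permutation_p_value(a, b, iters=2000, seed=0):
    """Estimate a p-value by simulating the null hypothesis.

    Under the null, group labels are interchangeable, so we shuffle the
    pooled data, re-split it, and count how often the shuffled statistic
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = diff_means(a, b)
    pool = list(a) + list(b)
    n = len(a)
    count = 0
    for _ in range(iters):
        rng.shuffle(pool)
        if diff_means(pool[:n], pool[n:]) >= observed:
            count += 1
    return count / iters


def normal_approx_p_value(a, b):
    """Analytic counterpart: two-sided z-test on the difference in means."""
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    z = diff_means(a, b) / se
    # Two-sided tail probability of a standard normal.
    return math.erfc(z / math.sqrt(2))


# Made-up data, for illustration only.
data_rng = random.Random(1)
group_a = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
group_b = [data_rng.gauss(0.5, 1.0) for _ in range(100)]

p_sim = permutation_p_value(group_a, group_b)
p_analytic = normal_approx_p_value(group_a, group_b)
print(f"permutation p = {p_sim:.4f}, normal approximation p = {p_analytic:.4f}")
```

If the two numbers roughly agree, each gives some reason to trust the other, which is the mutual validation described in Step 3. If they disagree, the simulation is usually the easier one to debug, because you can inspect the shuffled samples directly.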

The last exercise in the book is based on a question my colleague, Lynn Stein, asked me for a paper she was working on:

In a recent paper
The reason I ask: Question 1 is pretty much a textbook problem; you can probably find an online calculator to do it for you. But you are less likely to find a canned solution to Question 2, so I am curious to see how people go about it. I hope to post some different solutions soon.

By the way, this is not meant to be a "gotcha" question. If some people get it wrong, I am not going to make fun of them. I am looking for different correct approaches; I will ignore mistakes, and only point out incorrect approaches if they are interestingly incorrect.

You can post a solution in the comments below, or discuss it on reddit.com/r/statistics, or if you don't want to be influenced by others, send me email at downey at allendowney dot com.

Published on August 22, 2014 11:50

Probably Overthinking It

Probably Overthinking It is a blog about data science, Bayesian Statistics, and occasional other topics.
