Gwern's Reviews > Past, Present, and Future of Statistical Science

Past, Present, and Future of Statistical Science by Xihong Lin
Jul 14, 2014

really liked it
Read from June 05 to July 13, 2014

Past, Present, and Future of Statistical Science (ed. Lin et al 2014) is a large (52 chapters by ~50 contributors, 643 pages, 9.8M PDF) anthology of essays/articles/reviews/lists touching on all sorts of topics by many famous names (Efron, Rubin, Gelman, Wasserman, Tibshirani, Laird, Cook) - some of whom I know solely from methods bearing their names! The typesetting is tasteful & of high quality, with so many equations & graphs my PDF viewer visibly lags when scrolling. I read about it on Andrew Gelman's blog & thought it would be interesting to read a broad survey of what's going on in statistics.

The anthology ranges from bureaucracy to professional autobiography to reviews of subfields to speculations & challenges about future developments to advice about publishing & research. (It would probably have been better split into 2 volumes: the readers interested in careers & advice have to wade through the technical material, while the technically-minded readers may not survive the sections about COPSS & the autobiographies.) Since statisticians get involved with any topic they please, the subject areas range from deer in Canada - & trying not to fall out of the helicopter - to traveling to the moon to breast cancer to polygraphs.

Given the heterogeneity, much of it was boring or over my head or both, but much was interesting & I learned about novel topics. In one chapter, a survey statistician reminisces about how she stumbled into statistics accidentally & fought sexism in her early career; another mentions that the methodological debates over the famous Kinsey studies of sexuality were her entree to biostatistics; a third was unfairly treated by a Coast Guard exam & learned statistics to prove the exam was bogus; yet a fourth picked math as his major because the signup line at the college was shorter & thereby wandered into the intersection of statistics & agriculture; in another chapter, Arthur Dempster is still gamely defending the Dempster-Shafer paradigm of statistics after all these years; while in yet another chapter there is a discussion of issues in high-dimensional data I couldn't understand; etc.

The introductory bits about the history of COPSS were boring & self-indulgent, devoid of any explanation of how the organization functioned, what good it did, why outsiders valued it, or what really went on inside it.

The autobiography section features people who can remember all the way back to the 1920s or so, a time when statistics was very different from what it is now. Reading them a few at a time (they're generally easy reads), a number of interesting trends pop up. For example, people seem to get married extremely young, as grad students or undergrads, after short romances. It's impossible to miss the computing revolution: before the 1960s or so, computers & techniques requiring a great deal of computation never come up, but then they become increasingly common (sometimes with shocking details: one person mentions that testing a cool new idea with a simulation method ate their department's entire computer budget for the month) & transformed approaches starting in the '80s; Bickel mentions in his essay his "pleased surprise that some of my asymptotic theory based ideas, in particular, one-step estimates, really worked" when implemented on modern computers. A subtrend here is that Bayesian methods seem to explode overnight then too, & even frequentists begin borrowing Bayesian techniques & logic when useful (thankfully, Tukey's quip that "The collective noun for a group of statisticians is a quarrel" may no longer be true). WWII appears as a clear break-line in the earliest autobios, &, to judge by the autobios (a selected sample to be sure!), academia used to be far less competitive & one could (in the great post-WWII expansion) almost fall into a tenured position. Some bios are humorous, like Olkin's:

...Wald had a classic European lecture style. He started at the upper left corner of the blackboard and finished at the lower right. The lectures were smooth and the delivery was a uniform distribution.

...The notion of an application in its current use did not exist. I don't recall the origin of the following quotation, but it is attributed to Wald: "Consider an application. Let X1, . . . , Xn be i.i.d. random variables."

...The Master's degree program required a thesis and mine was written with Wolfowitz. The topic was on a sequential procedure that Leon Herbach (he was ahead of me) had worked on. Wolfowitz had very brief office hours, so there usually was a queue to see him. When I did see him in his office he asked me to explain my question at the blackboard. While talking at the blackboard Wolfowitz was multi-tasking (even in 1947) by reading his mail and talking on the telephone. I often think of this as an operatic trio in which each singer is on a different wavelength. This had the desired effect in that I never went back.

...Many prominent statisticians attended the meeting, and I had a chance to meet some of them and young students interested in statistics, and to attend the courses. Wolfowitz taught sequential analysis, Cochran taught sampling, and R.A. Fisher taught something.


Or the history related is surprising: for example, the revelation in Chernoff's essay "A career in statistics" that the Chernoff bound was actually proven by Rubin (yes, he did that too); Chernoff also mentions a tragicomic incident in rocketry where a clever method for course-correction turned out to be unnecessary. Cook's distance, for detecting problems in linear models, stems from one bizarre rat ("Reflections on a statistical career and their implications"):

...I redid his calculations, looked at residual plots and performed a few other checks that were standard for the time. This confirmed his results, leading to the possibilities that either there was something wrong with the experiment, which he denied, or his prior expectations were off. All in all, this was not a happy outcome for either of us.

I subsequently decided to use a subset of the data for illustration in a regression course that I was teaching at the time. Astonishingly, the selected subset of the data produced results that clearly supported my colleague's prior expectation and were opposed to those from the full data. This caused some anxiety over the possibility that I had made an error somewhere, but after considerable additional analysis I discovered that the whole issue centered on one rat. If the rat was excluded, my colleague's prior expectations were sustained; if the rat was included his expectations were contradicted. The measurements on this discordant rat were accurate as far as anyone knew, so the ball was back in my now quite perplexed colleague's court.

The anxiety that I felt during my exploration of the rat data abated but did not disappear completely because of the possibility that similar situations had gone unnoticed in other regressions. There were no methods at the time that would have identified the impact of the one unusual rat; for example, it was not an outlier as judged by the standard techniques. I decided that I needed a systematic way of finding such influential observations if they were to occur in future regressions, and I subsequently developed a method that easily identified the irreconcilable rat. My colleagues at Minnesota encouraged me to submit my findings for publication (Cook, 1977), which quickly took on a life of their own, eventually becoming known as Cook's Distance.


And naturally, someone will choose to go meta & criticize the implicit goal of the autobios & explicit goal of the career advice section - as one would hope of statisticians, he recognizes the epistemological peril of a series of highly-selected freeform anecdotes; Terry Speed in "Never ask for or give advice, make mistakes, accept mediocrity, enthuse":

What's wrong with advice? For a start, people giving advice lie. That they do so with the best intentions doesn't alter this fact. This point has been summarized nicely by Radhika Nagpal (2013). I say trust the people who tell you "I have no idea what I'd do in a comparable situation. Perhaps toss a coin." Of course people don't say that, they tell you what they'd like to do or wish they had done in some comparable situation. You can hope for better. What do statisticians do when we have to choose between treatments A and B, where there is genuine uncertainty within the expert community about the preferred treatment? Do we look for a statistician over 40 and ask them which treatment we should choose? We don't, we recommend running a randomized experiment, ideally a double-blind one, and we hope to achieve a high adherence to the assigned treatment from our subjects. So, if you really don't know what to do, forget advice, just toss a coin, and do exactly what it tells you. But you are an experiment with n = 1, you protest. Precisely. What do you prefer with n = 1: an observational study or a randomized trial? (It's a pity the experiment can't be singly, much less doubly blinded.) You may wonder whether a randomized trial is justified in your circumstances. That's a very important point. Is it true that there is genuine uncertainty within the expert community (i.e., you) about the preferred course of action? If not, then choosing at random between your two options is not only unethical, it's stupid.


Not all life incidents are amusing. In Gray's "Promoting equity", in between accounts of fighting the good fight, she proudly relates an incident I would be ashamed of, especially were I a statistician:

Early in my career I received a notice from Teachers Insurance and Annuity Association (TIAA), the retirement plan used at most private and many public universities including American University, listing what I could expect in retirement benefits from my contribution and those of the university in the form of x dollars per $100,000 in my account at age 65. There were two columns, one headed "women" and a second, with amounts 15% higher, headed "men." When I contacted the company to point out that Title VII prohibited discrimination in fringe benefits as well as in salary, I was informed that the figures represented discrimination on the basis of "longevity," not on the basis of sex.

When I asked whether the insurer could guarantee that I would live longer than my male colleagues, I was told that I just didn't understand statistics. Learning that the US Department of Labor was suing another university that had the same pension plan, I offered to help the attorney in charge, the late Ruth Weyand, an icon in women's rights litigation...At first we concentrated on gathering data to demonstrate that the difference in longevity between men and women was in large part due to voluntary lifestyle choices, most notably smoking and drinking. In a settlement conference with the TIAA attorneys, one remarked, "Well, maybe you understand statistics, but you don't understand the law."


A statistician asking for guarantees! & why should voluntary lifestyle choices affect whether a predictable difference be compensated for? Pensions are job compensation, not a moral code handed down from on high, & if men do not live as long as women, 'equal' pay is never equal & defrauds them. Or, would Gray be against maternity leave, seeing as pregnancy is a "voluntary lifestyle choice"? & consider the sophistry: "in large part" - so would she have supported a differential which corresponded to the residual? If their analysis had shown that black men drink & smoke even more than white men, would Gray be pleased to see a 'black penalty' applied to their pension payments? When is equal not equal? As always, one merely needs to ask: "who, whom?"

The autobiographical essays are interesting, but somewhat dry. I was pleased to reach the meat of the anthology: the freeform technical papers. Some of the chapters introduced me to ideas I had missed, such as the "bet on sparsity" argument (Cook, pg103), which reminds me of one folk argument for Occam's razor: you should assume the world is relatively simple & predictable & take actions based on that belief, because if the world is that way, then your actions will attain their ends & that is good, while if the world is inherently complex/unpredictable, then your actions will have no net effect which is neither good nor bad, so the former scenario dominates the latter. I paid close attention to Tibshirani's paper later in the volume, "In praise of sparsity and convexity".

Similarly, Dunson's "Nonparametric Bayes" introduced me to an area I previously had little inkling of. The biostatistics papers (eg Breslow's "Lessons in biostatistics" or Flournoy's "A vignette of discovery") bring up interesting challenges & biases to keep in mind when evaluating the latest clinical research (a skill useful for anyone), & leave me heartened at the life-saving practical work that field is doing. Nan M. Laird's "Meta-analyses: Heterogeneity can be a good thing" reminded me of the need, when doing my own meta-analyses, to not simply ignore high I²/heterogeneity but to think hard about what moderators I should include to try to explain some of it. Others raised interesting questions I've wondered about myself; for example, Xiao-Li Meng in "A trio of inference problems" asks how big a biased sample of a population has to be before it's of comparable quality to a random sample:

Over the century, statisticians, social scientists, and others have amply demonstrated theoretically and empirically that (say) a 5% probabilistic/random sample is better than any 5% non-random samples in many measurable ways, e.g., bias, MSE, confidence coverage, predictive power, etc. However, we have not studied questions such as "Is an 80% non-random sample 'better' than a 5% random sample in measurable terms? 90%? 95%? 99%?" This question was raised during a fascinating presentation by Dr. Jeremy Wu...The synthetic data created for LED used more than 20 data sources in the LEHD (Longitudinal Employer-Household Dynamics) system. These sources vary from survey data such as a monthly survey of 60,000 households, which represent only .05% of US households, to administrative records such as unemployment insurance wage records, which cover more than 90% of the US workforce, to census data such as the quarterly census of earnings and wages, which includes about 98% of US jobs (Wu, 2012 and personal communication from Wu). The administrative records such as those in LEHD are not collected for the purpose of statistical inference, but rather because of legal requirements, business practice, political considerations, etc. They tend to cover a large percentage of the population, and therefore they must contain useful information for inference.


This is what I've wondered while working on my census of biracial characters: my sample is biased, but capture-recapture analysis indicates I've compiled up to 1/3 of the population, so how much does that compensate? Does it drive the error from biases down to the same size as the sampling error? Meng derives an inequality:

For example, even if ns = 100, we would need over 96% of the population if ρN = .5 [level of bias]. This reconfirms the power of probabilistic sampling and reminds us of the danger in blindly trusting that “Big Data” must give us better answers. On the other hand, if ρN = .1, then we will need only 50% of the population to beat a SRS [simple random sample] with ns = 100...the same ρN = .1 also implies that a 96% subpopulation will beat a SRS as large as ns = ... 2400, which is no longer a practically irrelevant sample size.
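The arithmetic behind those quoted thresholds can be checked directly. A minimal sketch, assuming (my paraphrase, not the chapter's notation) that the biased sample's squared error scales as ρ²·(1-f)/f·σ² for captured population fraction f, while a simple random sample of size ns has error σ²/ns, so beating the SRS requires f/(1-f) ≥ ns·ρ²:

```python
# Sketch of the arithmetic behind Meng's quoted numbers. The error-scaling
# assumption (rho^2 * (1-f)/f * sigma^2 for the biased sample vs.
# sigma^2 / ns for a simple random sample) is my reconstruction,
# not the chapter's own derivation.

def fraction_needed(ns, rho):
    """Minimum population fraction f a biased sample needs to match an SRS
    of size ns at defect correlation rho: solve rho^2 * (1-f)/f = 1/ns,
    giving f = ns*rho^2 / (1 + ns*rho^2)."""
    k = ns * rho ** 2
    return k / (1 + k)

def srs_equivalent(f, rho):
    """Effective SRS size matched by a biased sample covering fraction f."""
    return (f / (1 - f)) / rho ** 2

print(fraction_needed(100, 0.5))        # ~0.962: "over 96%" at rho = .5
print(fraction_needed(100, 0.1))        # ~0.5: "only 50%" at rho = .1
print(round(srs_equivalent(0.96, 0.1))) # 2400: 96% coverage beats SRS of 2400
```

This reproduces all three figures in the quote, which suggests the reconstruction is at least consistent with Meng's inequality even if the notation differs.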


Berger's "Conditioning is the issue" is largely lost on me, but one passage's discussion of turning notorious p-values into something more meaningful - error probabilities - is interesting:

The practical import of switching to conditional frequentist testing (or the equivalent objective Bayesian testing) is startling. For instance, Sellke et al. (2001) uses a nonparametric setting to develop the following very general lower bound on α(s), for a given p-value...p = .05, which many erroneously think implies strong evidence against H0, actually corresponds to a conditional frequentist error probability at least as large as .289, which is a rather large error probability. If scientists understood that a p-value of .05 corresponded to that large a potential error probability in rejection, the scientific world would be a quite different place.

TABLE 23.1: Values of the lower bound α(s) in (23.4) for various values of p.

p:     .2     .1     .05    .01    .005   .001    .0001   .00001
α(s):  .465   .385   .289   .111   .067   .0184   .0025   .00031
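Most of the table can be reproduced (up to rounding) from what I believe is the Sellke et al. (2001) calibration, ᾱ(p) = (1 + (-e·p·ln p)^-1)^-1 for p < 1/e; treat the formula as my reconstruction rather than a quotation from the chapter:

```python
import math

def alpha_bound(p):
    """Lower bound on the conditional frequentist error probability for a
    given p-value, per the Sellke-Bayarri-Berger calibration (p < 1/e)."""
    b = -math.e * p * math.log(p)  # the -e * p * ln(p) term
    return b / (1 + b)             # equivalent to 1 / (1 + 1/b)

for p in [0.2, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0001, 0.00001]:
    print(f"p = {p:<7} alpha(s) >= {alpha_bound(p):.4f}")
# p = .05 gives ~0.289, the "rather large error probability" in the text
```

Running this matches the tabled values to within rounding, including the headline .05 → .289 correspondence.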



Other papers are a bit of a misfire: I hadn't heard of "symbolic data" before Lynne Billard's "The past's future is now: What will the present's future bring?", & the paper still leaves me wondering what it really is.

Some I had already read - Gelman & Wasserman have already blogged about their entries.

And still others make one wonder. In Rubin's interesting retrospective of his greatest hits, "Converting rejections into positive stimuli", he encourages the reader not to be discouraged by the journal submission process, as it is so random that some of his best papers were rejected - which makes me wonder: why have this whole journal rigmarole if rejection means so little? Would you use a statistical test which exhibited such poor calibration & discrimination? & his remark that "if you are repeatedly told by some reviewers that everyone knows what you are saying, but without specific references, and other reviewers are saying what you are writing is completely wrong but without decent reasons, you are probably on to something" rings true.

Overall, the anthology is interesting & worth reading (if not each and every paper).