Allen B. Downey's Blog: Probably Overthinking It, page 3

September 16, 2024

Ears Are Weird

In a previous article, I looked at 93 measurements from the ANSUR-II dataset and found that ear protrusion is not correlated with any other measurement. In a followup article, I used principal component analysis to explore the correlation structure of the measurements, and found that once you have exhausted the information encoded in the most obvious measurements, the ear-related measurements are left standing alone.

I have a conjecture about why ears are weird: ear growth might depend on idiosyncratic details of the developmental environment — so they might be like fingerprints. Recently I discovered a hint that supports my conjecture.

This Veritasium video explains how we locate the source of a sound.

In general, we use small differences between what we hear in each ear, specifically differences in amplitude, quality, time delay, and phase. That works well if the source of the sound is to the left or right, but not if it’s directly in front, above, or behind (anywhere on the vertical plane through the centerline of your head), because in those cases the paths from the source to the two ears are symmetric.
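To make the symmetry argument concrete, here is a minimal sketch of the time-delay cue under a simple far-field model, where the extra path to the far ear is roughly the ear spacing times the sine of the azimuth. The ear spacing and the model itself are assumptions for illustration, not values from the video.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C
EAR_SPACING = 0.18      # m, assumed distance between the ears (hypothetical value)

def interaural_delay(azimuth_deg):
    """Far-field estimate of the arrival-time difference between the ears, in seconds.

    Azimuth 0 is straight ahead; 90 is directly to one side.
    """
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

for angle in (0, 30, 90):
    delay_us = interaural_delay(angle) * 1e6
    print(f"azimuth {angle:>2} degrees: {delay_us:.0f} microseconds")
```

In this model, any source on the central plane (front, above, or behind) has azimuth 0 and therefore zero delay, which is why the time-delay cue cannot tell those directions apart.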

Fortunately we have another trick that helps in this case. The shape of the outer ear changes the quality of the sound, depending on the direction of the source. The resulting spectral cues make it possible to locate sources even when they are on the central plane.

The video mentions that owls have asymmetric ears that make this trick particularly effective. Human ears are not as distinctly asymmetric as owl ears, but they are not identical.

And now, based on the Veritasium video, I suspect that might be a feature — the shape of the outer ear might be unpredictably variable because it’s advantageous for our ears to be asymmetric. Almost everything about the way our bodies grow is programmed to be as symmetric as possible, but ears might be programmed to be different.

Published on September 16, 2024 07:01

September 4, 2024

Rip-off ETF?

An article in a recent issue of The Economist suggests, right in the title, “Investors should avoid a new generation of rip-off ETFs”. An ETF is an exchange-traded fund, which holds a collection of assets and trades on an exchange like a single stock. For example, the SPDR S&P 500 ETF Trust (SPY) tracks the S&P 500 index, but unlike traditional index funds, you can buy or sell shares in minutes.

There’s nothing obviously wrong with that – but as an example of a “rip-off ETF”, the article describes “defined-outcome funds” or buffer ETFs, which “offer investors an enviable-sounding opportunity: hold stocks, with protection against falling prices. All they must do is forgo annual returns above a certain level, often 10% or so.”

That might sound good, but the article explains, “Over the long term, they are a terrible deal for investors. Much of the compounding effect of stock ownership comes from rallies.”

To demonstrate, they use the value of the S&P index since 1980: “An investor with returns capped at 10% and protected from losses would have made a real return of 403% over the period, a fraction of the 3,155% return offered by just buying and holding the S&P 500.”

So that sounds bad, but returns from 1980 to the present have been historically unusual. To get a sense of whether buffer ETFs are more generally a bad deal, let’s get a bigger picture.

Click here to run this notebook on Colab

The Dow Jones

The MeasuringWorth Foundation has compiled the value of the Dow Jones Industrial Average at the end of each day from February 16, 1885 to the present, with adjustments at several points to make the values comparable. The series I collected starts on February 16, 1885 and ends on August 30, 2024. The following cells download and read the data.

DATA_PATH = "https://github.com/AllenDowney/ThinkS... = "DJA.csv"
download(DATA_PATH + filename)
djia = pd.read_csv(filename, skiprows=4, parse_dates=[0], index_col=0)
djia.head()

Date        DJIA
1885-02-16  30.9226
1885-02-17  31.3365
1885-02-18  31.4744
1885-02-19  31.6765
1885-02-20  31.4252

To compute annual returns, we’ll start by selecting the closing price on the last trading day of each year (dropping 2024 because we don’t have a complete year).

annual = djia.groupby(djia.index.year).last().drop(2024)
annual

Date  DJIA
1885  39.4859
1886  41.2391
1887  37.7693
1888  39.5866
1889  42.0394
…     …
2019  28538.4400
2020  30606.4800
2021  36338.3000
2022  33147.2500
2023  37689.5400

139 rows × 1 columns

Next we’ll compute the annual price return, which is the ratio of successive year-end closing prices.

annual['Ratio'] = annual['DJIA'] / annual['DJIA'].shift(1)
annual

Date  DJIA        Ratio
1885  39.4859     NaN
1886  41.2391     1.044401
1887  37.7693     0.915861
1888  39.5866     1.048116
1889  42.0394     1.061960
…     …           …
2019  28538.4400  1.223384
2020  30606.4800  1.072465
2021  36338.3000  1.187275
2022  33147.2500  0.912185
2023  37689.5400  1.137034

139 rows × 2 columns

And the relative return as a percentage.

annual['Return'] = (annual['Ratio'] - 1) * 100

Looking at the years with the biggest losses and gains, we can see that most of the extremes were before the 1960s – with the exception of the 2008 financial crisis.

annual.dropna().sort_values(by='Return')

Date  DJIA       Ratio     Return
1931  77.9000    0.473326  -52.667396
1907  43.0382    0.622683  -37.731743
2008  8776.3900  0.661629  -33.837097
1930  164.5800   0.662347  -33.765293
1920  71.9500    0.670988  -32.901240
…     …          …         …
1954  404.3900   1.439623  43.962264
1908  63.1104    1.466381  46.638103
1928  300.0000   1.482213  48.221344
1933  99.9000    1.666945  66.694477
1915  99.1500    1.816599  81.659949

138 rows × 3 columns

Here’s what the distribution of annual returns looks like.

from empiricaldist import Cdf
cdf_return = Cdf.from_seq(annual['Return'])
cdf_return.plot()
decorate(xlabel='Annual return (percent)', ylabel='CDF')

Immediately we see why capping returns at 10% might be a bad idea – this cap is exceeded almost 45% of the time, and sometimes by a lot!

1 - cdf_return(10)
0.4492753623188406

Long-Term Returns

We’ll use the following function to compute long-term returns. It takes a start date and a duration, and computes two ratios:

The total price return based on actual annual returns.
The total price return if annual returns are clipped at 0 and 10 – that is, any negative returns are set to 0 and any returns above 10 are set to 10.

def compute_ratios(start=1993, duration=30):
    end = start + duration
    # Label-based slicing with .loc includes both start and end
    interval = annual.loc[start:end]
    # Total price return from the actual annual ratios
    ratio = interval['Ratio'].prod()
    # Clip each annual ratio to [1.0, 1.10]: no losses, gains capped at 10%
    low, high = 1.0, 1.10
    clipped = interval['Ratio'].clip(low, high)
    ratio_clipped = clipped.prod()
    return start, end, ratio, ratio_clipped

With this function, we can replicate the analysis The Economist did with the S&P 500. Here are the results for the DJIA from the beginning of 1980 to the end of 2023.

compute_ratios(1980, 43)
(1980, 2023, 44.93751117788029, 15.356490985533199)

A buffer ETF over this period would have grown by a factor of more than 15 in nominal dollars, with no risk of loss. But an index fund would have grown by a factor of almost 45. So yeah, the ETF would have been a bad deal.

However, if we go back to the bad old days, an investor in 1900 would have been substantially better off with a buffer ETF held for 43 years – a factor of 7.2 compared to a factor of 2.8.

compute_ratios(1900, 43)
(1900, 1943, 2.8071864303140583, 7.225624631784611)
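To see why the comparison can go either way, here is a toy illustration with made-up annual ratios rather than DJIA data. The helper `total_ratios` is a hypothetical stand-in that applies the same clipping idea to a plain sequence.

```python
import numpy as np

def total_ratios(ratios, low=1.0, high=1.10):
    """Return (unclipped, clipped) cumulative growth factors for annual ratios."""
    ratios = np.asarray(ratios, dtype=float)
    return ratios.prod(), np.clip(ratios, low, high).prod()

# Hypothetical sequences, not taken from the DJIA series
steady_boom = [1.15] * 10          # ten years of 15% gains
boom_and_bust = [1.30, 0.70] * 5   # alternating +30% and -30%

print("steady boom:  ", total_ratios(steady_boom))
print("boom and bust:", total_ratios(boom_and_bust))
```

With steady gains above the cap, clipping only gives returns away. With large swings in both directions, removing the losses matters more than capping the gains, so the clipped series comes out ahead.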

It seems we can cherry-pick the data to make the comparison go either way – so let’s see how things look more generally. Starting in 1886, we’ll compute price returns for all 30-year intervals, ending with the interval from 1993 to 2023.

duration = 30
ratios = [compute_ratios(start, duration) for start in range(1886, 2024-duration)]
ratios = pd.DataFrame(ratios, columns=['Start', 'End', 'Index Fund', 'Buffer ETF'])
ratios.index = ratios['Start']
ratios.tail()

Start  Start  End   Index Fund  Buffer ETF
1989   1989   2019  13.160027   6.532125
1990   1990   2020  11.116693   6.368615
1991   1991   2021  13.797643   7.005476
1992   1992   2022  10.460407   6.368615
1993   1993   2023  11.417232   6.724757

Here’s what the returns look like for an index fund compared to a buffer ETF.

ratios['Index Fund'].plot()
ratios['Buffer ETF'].plot()
decorate(xlabel='Start year', ylabel='30-year price return')

The buffer ETF performs as advertised, substantially reducing volatility. But it has only occasionally been a good deal, and not in my lifetime.

According to ChatGPT, the primary reasons for strong growth in stock prices since the 1960s are “technological advancements, globalization, financial market innovation, and favorable monetary policies”. If you think these elements will generally persist over the next 30 years, you might want to avoid buffer ETFs.

Published on September 04, 2024 13:21

August 23, 2024

Probably the Book

Last week I had the pleasure of presenting a keynote at posit::conf(2024). When the video is available, I will post it here. In the meantime, you can read the slides, if you don’t mind spoilers.

For people at the conference who don’t know me, this might be a good time to introduce you to this blog, where I write about data science and Bayesian statistics, and to Probably Overthinking It, the book based on the blog, which was published by University of Chicago Press last December. Here’s an outline of the book with links to excerpts I’ve published in the blog and talks I’ve presented based on some of the chapters.

For your very own copy, you can order from Bookshop.org if you want to support independent bookstores, or Amazon if you don’t.

Twelve Excellent Chapters

In Chapter 1, we learn that no one is normal, everyone is weird, and everyone is about the same amount of weird. I published an excerpt from this chapter, and talked about it during this section of the SuperDataScience podcast. And it is featured in an interactive article at Brilliant.org, which includes this animation showing how measurements are distributed in multiple dimensions.

Chapter 2 is about the inspection paradox, which affects our perception of many real-world scenarios, including fun examples like class sizes and relay races, and more serious examples like our understanding of criminal justice and ability to track infectious disease. I published a prototype of this chapter as an article called “The Inspection Paradox is Everywhere“, and gave a talk about it at PyData NYC:

Chapter 3 presents three consequences of the inspection paradox in demography, especially changes in fertility in the United States over the last 50 years. It explains Preston’s paradox, named after the demographer who discovered it: if each woman has the same number of children as her mother, family sizes — and population — grow quickly; in order to maintain constant family sizes, women must have fewer children than their mothers, on average. I published an excerpt from this chapter, and it was discussed on Hacker News.
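Here is a minimal sketch of the arithmetic behind Preston’s paradox, using a made-up distribution of family sizes. It assumes every child grows up to have as many children as their mother had, and it ignores the distinction between sons and daughters.

```python
import numpy as np

# Hypothetical distribution: number of mothers with each family size
family_size = np.array([1, 2, 3, 4])
num_mothers = np.array([25, 25, 25, 25])

# Average family size in the mothers' generation
mean_now = np.average(family_size, weights=num_mothers)

# Larger families contribute more children to the next generation
num_children = family_size * num_mothers

# Average family size if each child repeats their mother's family size
mean_next = np.average(family_size, weights=num_children)

print(mean_now, mean_next)
```

The second average weights each family size by the number of children who grew up in it, which is the inspection paradox at work: children from big families are overrepresented, so copying your mother pushes the average up.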

Chapter 4 is about extremes, outliers, and GOATs (greatest of all time), and two reasons the distribution of many abilities tends toward a lognormal distribution: proportional gain and weakest link effects. I gave a talk about this chapter for PyData Global 2023:

And here’s a related exploration I cut from the book.

Chapter 5 is about the surprising conditions where something used is better than something new. Most things wear out over time, but sometimes longevity implies information, which implies even greater longevity. This property has implications for life expectancy and the possibility of much longer life spans. I gave a talk about this chapter at ODSC East 2024 — there’s no recording, but the slides are here.

Chapter 6 introduces Berkson’s paradox — a form of collision bias — with some simple examples like the correlation of test scores and some more important examples like COVID and depression. Chapter 7 uses collision bias to explain the low birthweight paradox and other confusing results from epidemiology. I gave a “Talk at Google” about these chapters:
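As a rough illustration of Berkson’s paradox with test scores, the following sketch simulates two independent scores and then keeps only the people whose total clears a cutoff. The score scale, the cutoff, and the variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(17)
n = 100_000

# Two independent scores, so the population correlation is zero by construction
math_score = rng.normal(500, 100, size=n)
verbal_score = rng.normal(500, 100, size=n)
corr_all = np.corrcoef(math_score, verbal_score)[0, 1]

# Select on the sum of the scores, as a selective admissions process might
admitted = (math_score + verbal_score) > 1200
corr_admitted = np.corrcoef(math_score[admitted], verbal_score[admitted])[0, 1]

print(f"everyone: {corr_all:.3f}, admitted only: {corr_admitted:.3f}")
```

Among the selected group, a low score on one test has to be offset by a high score on the other, so the scores look negatively correlated even though they are unrelated in the full population.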

Chapter 8 shows that the magnitudes of natural and human-caused disasters follow long-tailed distributions that violate our intuition, defy prediction, and leave us unprepared. Examples include earthquakes, solar flares, asteroid impacts, and stock market crashes. I gave a talk about this chapter at SciPy 2023:

The talk includes this animation showing how plotting a tail distribution on a log-y scale provides a clearer picture of the extreme tail behavior.

Chapter 9 is about the base rate fallacy, which is the cause of many statistical errors, including misinterpretations of medical tests, field sobriety tests, and COVID statistics. It includes a discussion of the COMPAS system for predicting criminal behavior.
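A small worked example shows how the base rate changes the interpretation of a positive medical test. The prevalence, sensitivity, and specificity below are hypothetical values, not numbers from the book.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of having the condition given a positive test result."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 90% sensitive, 95% specific, condition affects 1% of people
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.95)
print(f"{ppv:.3f}")
```

Even with a test this accurate, most positive results come from the much larger group of people who do not have the condition, which is the point the base rate fallacy overlooks.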

Chapter 10 is about Simpson’s paradox, with examples from ecology, sociology, and economics. It is the key to understanding one of the most notorious examples of misinterpretation of COVID data. This is the first of three chapters that use data from the General Social Survey (GSS).
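For readers who have not seen Simpson’s paradox before, here is a minimal sketch with made-up counts. The options "A" and "B" and the two subgroups are hypothetical and are not drawn from the book’s examples.

```python
# Hypothetical counts: (successes, trials) for two options in two subgroups
data = {
    "easy cases": {"A": (90, 100), "B": (800, 1000)},
    "hard cases": {"A": (300, 1000), "B": (20, 100)},
}

def rate(successes, trials):
    return successes / trials

# Success rates within each subgroup
for group, options in data.items():
    print(group, {name: rate(*counts) for name, counts in options.items()})

# Success rates after pooling the subgroups
totals = {}
for name in ("A", "B"):
    successes = sum(options[name][0] for options in data.values())
    trials = sum(options[name][1] for options in data.values())
    totals[name] = rate(successes, trials)
print("overall", totals)
```

Option A has the higher rate in both subgroups, but B looks better overall because most of B’s trials are easy cases and most of A’s are hard ones.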

Chapter 11 is about the expansion of the Moral Circle — specifically about changes in attitudes about race, gender, and homosexuality in the U.S. over the last 50 years. I published an excerpt about the remarkable decline of homophobia since 1990, featuring lyrics from “A Message From the Gay Community“.

Chapter 12 is about the Overton Paradox, a name I’ve given to a pattern observed in GSS data: as people get older, their beliefs become more liberal, on average, but they are more likely to say they are conservative. This chapter is the basis of this interactive lesson at Brilliant.org. And I gave a talk about it at PyData NYC 2022:

There are still a few chapters I haven’t given a talk about, so watch this space!

Again, you can order the book from Bookshop.org if you want to support independent bookstores, or Amazon if you don’t.

Supporting code for the book is in this GitHub repository. All of the chapters are available as Jupyter notebooks that run in Colab, so you can replicate my analysis. If you are teaching a data science or statistics class, they make good teaching examples.

Chapter 1: Are You Normal? Hint: No.

Run the code on Colab

Run the code that prepares the BRFSS data

Run the code that prepares the Big Five data

Chapter 2: Relay Races and Revolving Doors

Run the code on Colab

Chapter 3: Defy Tradition, Save the World

Run the code on Colab

Chapter 4: Extremes, Outliers, and GOATs

Run the code on Colab

Run the code that prepares the BRFSS data

Run the code that prepares the NSFG data

Chapter 5: Better Than New

Run the code on Colab

Chapter 6: Jumping to Conclusions

Run the code on Colab

Chapter 7: Causation, Collision, and Confusion

Run the code on Colab

Run the code that prepares the NCHS data

Chapter 8: The Long Tail of Disaster

Run the code on Colab

Run the code that prepares the earthquake data

Run the code that prepares the solar flare data

Chapter 9: Fairness and Fallacy

Run the code on Colab

Chapter 10: Penguins, Pessimists, and Paradoxes

Run the code on Colab

Run the code that prepares the GSS data

Chapter 11: Changing Hearts and Minds

Run the code on Colab

Chapter 12: Chasing the Overton Window

Run the code on Colab

Published on August 23, 2024 07:36

July 25, 2024

Where’s My Train?

Yesterday I presented a webinar for PyMC Labs where I solved one of the exercises from Think Bayes, called “The Red Line Problem”. Here’s the scenario:


The Red Line is a subway that connects Cambridge and Boston, Massachusetts. When I was working in Cambridge I took the Red Line from Kendall Square to South Station and caught the commuter rail to Needham. During rush hour Red Line trains run every 7-8 minutes, on average.


When I arrived at the subway stop, I could estimate the time until the next train based on the number of passengers on the platform. If there were only a few people, I inferred that I just missed a train and expected to wait about 7 minutes. If there were more passengers, I expected the train to arrive sooner. But if there were a large number of passengers, I suspected that trains were not running on schedule, so I expected to wait a long time.


While I was waiting, I thought about how Bayesian inference could help predict my wait time and decide when I should give up and take a taxi.


I used this exercise to demonstrate a process for developing and testing Bayesian models in PyMC. The solution uses some common PyMC features, like the Normal, Gamma, and Poisson distributions, and some less common features, like the Interpolated and StudentT distributions.

The video is on YouTube now:

The slides are here.

This talk will be remembered for the first public appearance of the soon-to-be-famous “Banana of Ignorance”. In general, when the data we have are unable to distinguish between competing explanations, that uncertainty is reflected in the joint distribution of the parameters. In this example, if we see more people waiting than expected, there are two explanations: a higher-than-average arrival rate or a longer-than-average elapsed time since the last train. If we make a contour plot of the joint posterior distribution of these parameters, it looks like this:

The elongated shape of the contour indicates that either explanation is sufficient: if the arrival rate is high, elapsed time can be normal, and if the elapsed time is high, the arrival rate can be normal. Because this shape indicates that we don’t know which explanation is correct, I have dubbed it “The Banana of Ignorance”:
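The trade-off between the two explanations can be reproduced with a simple grid approximation. This is a sketch under stated assumptions, not the PyMC model from the webinar: it uses uniform priors over an assumed range for each parameter, a Poisson likelihood with mean equal to arrival rate times elapsed time, and a hypothetical passenger count.

```python
import numpy as np
from math import lgamma

k = 30  # hypothetical number of passengers seen on the platform

# Grids of candidate values (assumed ranges, uniform prior over the grid)
rates = np.linspace(0.5, 6.0, 111)     # passenger arrival rate, people per minute
elapsed = np.linspace(0.5, 20.0, 196)  # minutes since the last train
R, T = np.meshgrid(rates, elapsed, indexing="ij")

# Poisson log-likelihood of seeing k passengers when the expected count is rate * time
mu = R * T
log_like = k * np.log(mu) - mu - lgamma(k + 1)

# Normalize to get the joint posterior on the grid
posterior = np.exp(log_like - log_like.max())
posterior /= posterior.sum()

# Posterior correlation between arrival rate and elapsed time
mean_r = (posterior * R).sum()
mean_t = (posterior * T).sum()
cov = (posterior * (R - mean_r) * (T - mean_t)).sum()
sd_r = np.sqrt((posterior * (R - mean_r) ** 2).sum())
sd_t = np.sqrt((posterior * (T - mean_t) ** 2).sum())
corr = cov / (sd_r * sd_t)
print(f"posterior correlation: {corr:.2f}")
```

Because the likelihood depends only on the product of the two parameters, the posterior mass lies along a curved ridge where that product is near the observed count, and the two parameters come out negatively correlated.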

For all of the details, you can read the Jupyter notebook or run it on Colab.

The original Red Line Problem is based on a student project from my Bayesian Statistics class at Olin College, way back in Spring 2013.

Published on July 25, 2024 08:50

July 17, 2024

Elements of Data Science

I’m excited to announce the launch of my newest book, Elements of Data Science. As the subtitle suggests, it is about “Getting started with Data Science and Python”.

Order now from Lulu.com and get 20% off!

I am publishing this book myself, which has one big advantage: I can print it with a full color interior without increasing the cover price. In my opinion, the code is more readable with syntax highlighting, and the data visualizations look great!

In addition to the printed edition, all chapters are available to read online, and they are in Jupyter notebooks, where you can read the text, run the code, and work on the exercises.

Description

Elements of Data Science is an introduction to data science for people with no programming experience. My goal is to present a small, powerful subset of Python that allows you to do real work with data as quickly as possible.

Part 1 includes six chapters that introduce basic Python with a focus on working with data.

Part 2 presents exploratory data analysis using Pandas and empiricaldist — it includes a revised and updated version of the material from my popular DataCamp course, “Exploratory Data Analysis in Python.”

Part 3 takes a computational approach to statistical inference, introducing resampling methods, bootstrapping, and randomization tests.

Part 4 is the first of two case studies. It uses data from the General Social Survey to explore changes in political beliefs and attitudes in the U.S. in the last 50 years. The data points on the cover are from one of the graphs in this section.

Part 5 is the second case study, which introduces classification algorithms and the metrics used to evaluate them — and discusses the challenges of algorithmic decision-making in the context of criminal justice.

This project started in 2019, when I collaborated with a group at Harvard to create a data science class for people with no programming experience. We discussed some of the design decisions that went into the course and the book in this article.

Published on July 17, 2024 06:04

June 28, 2024

Have the Nones Leveled Off?

Last month Ryan Burge published “The Nones Have Hit a Ceiling“, using data from the 2023 Cooperative Election Study to show that the increase in the number of Americans with no religious affiliation has hit a plateau. Comparing the number of Atheists, Agnostics, and “Nothing in Particular” between 2020 and 2023, he found that “the share of non-religious Americans has stopped rising in any meaningful way.”

When I read that, I was frustrated that the HERI Freshman Survey had not published new data since 2019. I’ve been following the rise of the “Nones” in that dataset since one of my first blog articles.

As you might guess, the Freshman Survey reports data from incoming college students. Of course, college students are not a representative sample of the U.S. population, and as rates of college attendance have increased, they represent a different slice of the population over time. Nevertheless, surveying young adults over a long interval provides an early view of trends in the general population.

Well, I have good news! I got a notification today that HERI has published data tables for the 2020 through 2023 surveys. They are in PDF, so I had to do some manual data entry, but I have results!

Religious preference

Among other questions, the Freshman Survey asks students to select their “current religious preference” from a list of seventeen common religions, “Other religion,” “Atheist”, “Agnostic”, or “None.”  

The options “Atheist” and “Agnostic” were added in 2015.  For consistency over time, I compare the “Nones” from previous years with the sum of “None”, “Atheist” and “Agnostic” since 2015.

The following figure shows the fraction of Nones from 1969, when the question was added, to 2023, the most recent data available.

The blue line shows data until 2015; the orange line shows data from 2015 through 2019. The gray line shows a quadratic fit.  The light gray region shows a 95% predictive interval.

The quadratic model continues to fit the data well and the recent trend is still increasing, but if you look at only the last few data points, there is some evidence that the rate of increase is slowing.

But not for women

Now here’s where things get interesting. Until recently, female students have been consistently more religious than male students. But that might be changing. The following figure shows the percentages of Nones for male and female students (with a missing point in 2018, when this breakdown was not available).

Since 2019, the percentage of Nones has increased for women and decreased for men, and it looks like women may now be less religious. So the apparent slowdown in the overall trend might be a mix of opposite trends in the two groups.

The following graph shows the gender gap over time, that is, the difference in percentages of male and female students with no religious affiliation.

The gap was essentially unchanged from 1990 to 2020. But in the last three years it has changed drastically. It now falls outside the predictive range based on past data, which suggests a change this large would be unlikely by chance.

Attendance

The survey also asks students how often they “attended a religious service” in the last year. The choices are “Frequently,” “Occasionally,” and “Not at all.” Respondents are instructed to select “Occasionally” if they attended one or more times, so a wedding or a funeral would do it.

The following figure shows the fraction of students who reported any religious attendance in the last year, starting in 1968. I discarded a data point from 1966 that seems unlikely to be correct.

There is a clear dip in 2021, likely due to the pandemic, but the last two data points have returned to the long-term trend.

Data Source

The data reported here are available from the HERI publications page. Since I entered the data manually from PDF documents, it’s possible I have made errors.

Published on June 28, 2024 17:14

June 14, 2024

Should divorce be more difficult?

“The Christian right is coming for divorce next,” according to this recent Vox article, and “Some conservatives want to make it a lot harder to dissolve a marriage.”

As always when I read an article like this, I want to see data — and the General Social Survey has just the data I need. Since 1974, they have asked a representative sample of the U.S. population, “Should divorce in this country be easier or more difficult to obtain than it is now?” with the options to respond “Easier”, “More difficult”, or “Stay as is”.

Here’s how the responses have changed over time:

Since the 1990s, the percentage saying divorce should be more difficult has dropped from about 50% to about 30%. [The last data point, in 2022, may not be reliable. Due to disruptions during the COVID pandemic, the GSS changed some elements of their survey process — in the 2021 and 2022 data, responses to several questions have deviated from long-term trends in ways that might not reflect real changes in opinion.]

If we break down the results by political alignment, we can see whether these changes are driven by liberals, conservatives, or both.

Not surprisingly, conservatives are more likely than liberals to believe that divorce should be more difficult, by a margin of about 20 percentage points. But the percentages have declined in all groups — and fallen below 50% even among self-described conservatives.

As the Vox article documents, conservatives in several states have proposed legislation to make divorce more difficult. Based on the data, these proposals are likely to be unpopular.

To see my analysis, you can run this notebook on Colab. For similar analysis of other topics, see Chapter 11 of Probably Overthinking It.

Published on June 14, 2024 07:00

June 6, 2024

Migration and Population Growth

On a recent run I was talking with a friend from Spain about immigration in Europe. We speculated about whether the population of Spain would be growing or shrinking if there were no international migration. I thought it might be shrinking, but we were not sure. Fortunately, Our World in Data has just the information we need!

I downloaded data from OWID’s interactive graph, “Population growth rate with and without migration”, ultimately from UN, World Population Prospects (2022) and processed by Our World in Data.

It includes “The annual change in population with migration included versus the change if there was zero migration (neither emigration nor immigration). The latter, therefore, represents the population change based only on domestic births and deaths.”

I selected data from 1990 to 2022. Here are the results for Spain.

In this graph, we can see:

Without migration, population growth would have been close to zero – and negative since 2015.
With migration, population growth has been substantially higher except for a few years from 2012 to 2014, during a period of high unemployment and austerity measures following the global financial crisis.

I’m not sure what caused the increased migration around 2001. At first I thought it might be when several Eastern European and Baltic countries joined the EU, but that was not until 2004. If anyone knows the reason, let me know.

From the annual growth rates, we can compute the cumulative growth over the 32-year period.

With migration, Spain grew by 22% over 32 years, which is very slow. Without migration, it would have grown by only 2.6%.
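The step from annual rates to cumulative growth is a product of growth factors. Here is a minimal sketch with made-up annual rates, not the OWID values for Spain.

```python
import numpy as np

# Hypothetical annual growth rates in percent
annual_growth_pct = np.array([0.8, 0.5, -0.1, 0.3])

# Convert each rate to a growth factor and multiply the factors together
cumulative_factor = np.prod(1 + annual_growth_pct / 100)
cumulative_growth_pct = (cumulative_factor - 1) * 100

print(f"{cumulative_growth_pct:.2f}% over {len(annual_growth_pct)} years")
```

Multiplying factors, rather than adding the percentages, accounts for compounding, which matters more as the period gets longer.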

So the answer to our question is that the population of Spain would have grown very slowly without migration – but it is so close to zero, we were probably right to be unsure if it was negative.

All Countries

Looking across other regions and countries (and some territories) we can see more general patterns. The following figure shows actual growth rates with migration on the x-axis, and hypothetical rates without migration on the y-axis. To see the interactive version of this figure, you can run this Jupyter notebook on Colab.

Many high-income countries and regions have low fertility rates; without migration, their populations would grow slowly or even shrink. For example, the population of Europe grew by only 3.3% in 32 years from 1990 to 2022, slower than any other region. But without migration – that is, with population change based only on domestic births and deaths – it would have shrunk by 2.1%.

During the same period, the population of Northern America grew by 37%, much more quickly than Europe. But more than half of that growth was due to international migration; without it, growth would have been only 18%.

Countries in the lower-right quadrant are growing only because of migration; without it, they would be shrinking. They include several European countries and Japan.

As fertility has decreased, the populations of high-income countries have aged, with fewer employed workers to support more retirees. These aging populations depend on the labor of immigrants, who tend to be younger, are willing to work at jobs some native-born workers would not take, and provide skills in areas where there are shortages, including child care and health care.

However, in some countries, the immigration levels needed to stabilize the population face political barriers, and even the perception of increased immigration can elicit anti-migration sentiments. Particularly in Europe and Northern America, concerns about immigration – real and imagined – have contributed to the growth of right wing populism.

Published on June 06, 2024 11:10

May 20, 2024

Bertrand’s Boxes

An early draft of Probably Overthinking It included two chapters about probability. I still think they are interesting, but the other chapters are really about data, and the examples in these chapters are more like brain teasers — so I’ve saved them for another book. Here’s an excerpt from the chapter on Bayes theorem.

In 1889 Joseph Bertrand posed and solved one of the oldest paradoxes in probability. But his solution is not quite correct – it is right for the wrong reason.

The original statement of the problem is in his Calcul des probabilités (Gauthier-Villars, 1889). As a testament to the availability of information in the 21st century, I found a scanned copy of the book online and pasted a screenshot into an online OCR server. Then I pasted the French text into an online translation service. Here is the result, which I edited lightly for clarity:


Three boxes are identical in appearance. Each has two drawers, each drawer contains a medal. The medals in the first box are gold; those in the second box, silver; the third box contains a gold medal and a silver medal.


We choose a box; what is the probability of finding, in its drawers, a gold coin and a silver coin?


Three cases are possible and they are equally likely because the three chests are identical in appearance. Only one case is favorable. The probability is 1/3.


Having chosen a box, we open a drawer. Whatever medal one finds there, only two cases are possible. The drawer that remains closed may contain a medal whose metal may or may not differ from that of the first. Of these two cases, only one is in favor of the box whose parts are different. The probability of having got hold of this set is therefore 1/2.


How can it be, however, that it will be enough to open a drawer to change the probability and raise it from 1/3 to 1/2? The reasoning cannot be correct. Indeed, it is not.


After opening the first drawer, two cases remain possible. Of these two cases, only one is favorable, this is true, but the two cases do not have the same likelihood.


If the coin we saw is gold, the other may be silver, but we would be better off betting that it is gold.


Suppose, to show the obvious, that instead of three boxes we have three hundred. One hundred contain two gold medals; one hundred, two silver medals; and one hundred, one gold and one silver. In each box we open a drawer, we see therefore three hundred medals. A hundred of them are in gold and a hundred in silver, that is certain; the hundred others are doubtful, they belong to boxes whose parts are not alike: chance will regulate the number.


We must expect, when opening the three hundred drawers, to see fewer than two hundred gold coins; the probability that the first that appears belongs to one of the hundred boxes of which the other coin is gold is therefore greater than 1/2.


Now let me translate the paradox one more time to make the apparent contradiction clearer, and then we will resolve it.

Suppose we choose a random box, open a random drawer, and find a gold medal. What is the probability that the other drawer contains a silver medal? Bertrand offers two answers, and an argument for each:

Only one of the three boxes is mixed, so the probability that we chose it is 1/3.

When we see the gold coin, we can rule out the two-silver box. There are only two boxes left, and one of them is mixed, so the probability we chose it is 1/2.

As with so many questions in probability, we can use Bayes’s theorem to resolve the confusion. Initially the boxes are equally likely, so the prior probability for the mixed box is 1/3.

When we open the drawer and see a gold medal, we get some information about which box we chose. So let’s think about the likelihood of this outcome in each case:

If we chose the box with two gold medals, the likelihood of finding a gold medal is 100%.

If we chose the box with two silver medals, the likelihood is 0%.

And if we chose the box with one of each, the likelihood is 50%.

Putting these numbers into a Bayes table, here is the result:

| | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| Two gold | 1/3 | 1 | 1/3 | 2/3 |
| Two silver | 1/3 | 0 | 0 | 0 |
| Mixed | 1/3 | 1/2 | 1/6 | 1/3 |
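The table can be reproduced with exact fractions. This is a minimal sketch, not code from the original article: the function name `bayes_table` and the variable names are my own choices for illustration.

```python
from fractions import Fraction

def bayes_table(priors, likelihoods):
    """Return (products, posteriors) for a discrete Bayesian update."""
    # Unnormalized posterior: prior times likelihood for each hypothesis
    products = [p * l for p, l in zip(priors, likelihoods)]
    # Total probability of the data, summed over all hypotheses
    total = sum(products)
    # Normalize so the posteriors add up to 1
    posteriors = [prod / total for prod in products]
    return products, posteriors

hypotheses = ["Two gold", "Two silver", "Mixed"]
priors = [Fraction(1, 3)] * 3
# Likelihood of drawing a gold medal from each box
likelihoods = [Fraction(1), Fraction(0), Fraction(1, 2)]

products, posteriors = bayes_table(priors, likelihoods)
for name, prod, post in zip(hypotheses, products, posteriors):
    print(f"{name}: product={prod}, posterior={post}")
```

Using `Fraction` rather than floats keeps the results in the same form as the table, so the posterior for the mixed box comes out as exactly 1/3.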

The posterior probability of the mixed box is 1/3. So the first argument is correct. Initially, the probability of choosing the mixed box is 1/3 – opening a drawer and seeing a gold coin does not change it. And the Bayesian update tells us why: if there are two gold coins, rather than one, we are twice as likely to see a gold coin.

The second argument is wrong because it fails to take into account this difference in likelihood. It’s true that there are only two boxes left, but it is not true that they are equally likely. This error is analogous to the base rate fallacy, which is the error we make if we only consider the likelihoods and ignore the prior probabilities. Here, the second argument is wrong because it commits a “likelihood fallacy”, considering only the prior probabilities and ignoring the likelihoods.
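If the Bayesian argument is not convincing, a simulation can serve as a sanity check. The following is a rough sketch under the assumptions stated in the problem (a box chosen uniformly at random, then a drawer chosen uniformly at random); the function name `simulate`, the number of trials, and the seed are arbitrary choices of mine.

```python
import random

def simulate(trials=100_000, seed=1):
    """Estimate P(other drawer is silver | opened drawer is gold)."""
    rng = random.Random(seed)
    boxes = [("gold", "gold"), ("silver", "silver"), ("gold", "silver")]
    saw_gold = 0
    other_silver = 0
    for _ in range(trials):
        box = rng.choice(boxes)   # choose a random box
        i = rng.randrange(2)      # open a random drawer
        if box[i] == "gold":
            saw_gold += 1
            # Check the drawer that remains closed
            if box[1 - i] == "silver":
                other_silver += 1
    return other_silver / saw_gold

estimate = simulate()
print(f"P(other drawer is silver | saw gold) ~ {estimate:.3f}")
```

The estimate should land close to 1/3 rather than 1/2, because the two-gold box produces a gold sighting twice as often as the mixed box does.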

Right for the wrong reason

Bertrand’s resolution of the paradox is correct in the sense that he gets the right answer in this case. But his argument is not valid in general. He asks, “How can it be, however, that it will be enough to open a drawer to change the probability…”, implying that it is impossible in principle.

But opening the drawer does change the probabilities of the other two boxes. Having seen a gold coin, we rule out the two-silver box and increase the probability of the two-gold box. So I don’t think we can dismiss the possibility that opening the drawer could change the probability of the mixed box. It just happens, in this case, that it does not.

Let’s consider a variation of the problem where there are three drawers in each box: the first box contains three gold medals, the second contains three silver, and the third contains two gold and one silver.

In that case the likelihood of seeing a gold coin in each case is 1, 0, and 2/3, respectively. And here’s what the update looks like:

| | Prior | Likelihood | Product | Posterior |
|---|---|---|---|---|
| Three gold | 1/3 | 1 | 1/3 | 3/5 |
| Three silver | 1/3 | 0 | 0 | 0 |
| Two gold, one silver | 1/3 | 2/3 | 2/9 | 2/5 |

Now the posterior probability of the mixed box is 2/5, which is higher than the prior probability, which was 1/3. In this example, opening the drawer provides evidence that changes the probabilities of all three boxes.
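Another way to check both versions of the problem is to count drawers instead of boxes, in the spirit of Bertrand's three hundred boxes. When every box has the same number of drawers, each (box, drawer) pair is equally likely, so the posterior is just the share of gold drawers that belong to the box in question. This is a sketch with hypothetical names (`posterior_by_counting`, `two_drawers`, `three_drawers`), and it relies on that equal-drawer-count assumption.

```python
from fractions import Fraction

def posterior_by_counting(boxes, target):
    """Probability that a gold medal came from the box named `target`.

    `boxes` maps a box name to a list of medals, one per drawer.
    Assumes every box has the same number of drawers, so that all
    (box, drawer) pairs are equally likely.
    """
    # One entry per gold drawer, labeled with the box it belongs to
    gold_drawers = [
        name
        for name, medals in boxes.items()
        for medal in medals
        if medal == "gold"
    ]
    return Fraction(gold_drawers.count(target), len(gold_drawers))

two_drawers = {
    "all gold": ["gold", "gold"],
    "all silver": ["silver", "silver"],
    "mixed": ["gold", "silver"],
}
three_drawers = {
    "all gold": ["gold", "gold", "gold"],
    "all silver": ["silver", "silver", "silver"],
    "mixed": ["gold", "gold", "silver"],
}

print(posterior_by_counting(two_drawers, "mixed"))    # 1/3
print(posterior_by_counting(three_drawers, "mixed"))  # 2/5
```

In the original problem one of the three gold drawers is in the mixed box, which gives 1/3; in the variation two of the five gold drawers are, which gives 2/5.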

I think there are two lessons we can learn from this example. The first is, don’t be too quick to assume that all cases are equally likely. The second is that new information can change probabilities in ways that are not obvious. The key is to think about the likelihoods.

Published on May 20, 2024 07:14

March 8, 2024

Think Python Goes to Production

Think Python has moved into production, on schedule for the official publication date in July — but maybe earlier if things go well.

To celebrate, I have posted the next batch of chapters on the new site, up through Chapter 12, which is about Markov text analysis and generation, one of my favorite examples in the book. From there, you can follow links to run the notebooks on Colab.

And we have a cover!

The new animal is a ringneck parrot, I’ve been told. I will miss the Carolina parakeet that was on the old cover, which was particularly apt because it is an ex-parrot. Nevertheless, I think the new cover looks great!

Huge thanks to Sam Lau and Luciano Ramalho for their technical reviews. Both made many helpful corrections and suggestions that improved the book. Sam is an expert on learning to program with AI assistants. And Luciano was inspired by the turtles to make an improved module for turtle graphics in Jupyter, called jupyturtle. Here’s an example of what it looks like (from Chapter 5):

If you have a chance to check out the current draft, and you have any corrections or suggestions, please create an issue on GitHub.

And if you would like a copy of the book as soon as possible, you can read the Early Release version and order from O’Reilly here or pre-order the third edition from Amazon.

Published on March 08, 2024 07:50

Probably Overthinking It

Allen B. Downey
Probably Overthinking It is a blog about data science, Bayesian Statistics, and occasional other topics.