Brad Balderson’s Kindle Notes & Highlights for The Book of Why: The New Science of Cause and Effect (Penguin Science)

Every one of these had to take shape in someone’s imagination before it was realized in the physical world.

Exactly as Napolean Hill suggests in Think and Grow Rich

“Presumably the child brain is something like a notebook as one buys it from the stationer’s,” he wrote. “Rather little mechanism, and lots of blank sheets.” He was wrong about that: the child’s brain is rich in mechanisms and prestored templates.

Exactly, we are encoded with pre-existing structures which we can use to interpret the world.

10%

Humans must have some compact representation of the information needed in their brains, as well as an effective procedure to interpret each question properly and extract the right answer from the stored representation. To pass the mini-Turing test, therefore, we need to equip machines with a similarly efficient representation and answer-extraction algorithm.

This is the mental representations i have discussed on beautiful bioinformatics... And reading allows us to extract those mental representations from the minds of other people, so that we needn't go through the hardwork necessary to do so ourselves.

11%

Note also that merely collecting Big Data would not have helped us ascend the ladder and answer the above questions. Assume that you are a reporter collecting records of execution scenes day after day. Your data will consist of two kinds of events: either all five variables are true, or all of them are false. There is no way that this kind of data, in the absence of an understanding of who listens to whom, will enable you (or any machine learning algorithm) to predict the results of persuading marksman A not to shoot.

This is an important point; to create a proper causal reasoning model, one needs to bring in extra information in order to know who does what; thus it becomes necessary, for instance, to know what tfs bind to what genes, and what tfs are able to interact with one another. This is an example of mind-first, or knowledge first, data second.

11%

As all three examples show, we have to teach the computer how to selectively break the rules of logic. Computers are not good at breaking rules, a skill at which children excel. (Cavemen too! The Lion Man could not have been created without a breach of the rules about what head goes with what body.)

The importance here of non-conformism.

13%

I usually pay a great deal of attention to what philosophers have to say about slippery concepts such as causation, induction, and the logic of scientific inference. Philosophers have the advantage of standing apart from the hurly-burly of scientific debate and the practical realities of dealing with data. They have been less contaminated than other scientists by the anticausal biases of statistics.

This here is a fantastic point to make; it really points to the importance of stepping back from the work and thinking about it from an extremely high level abstraction as to whether, in general, the beliefs governing the approach actually makes sense.

13%

The main point is this: while probabilities encode our beliefs about a static world, causality tells us whether and how probabilities change when the world changes, be it by intervention or by act of imagination.

So they allow us to truly manipulate the model and ask what might happen.. Exactly as Sydney Brenner suggested. This is clearly exactly what i've been looking for; the main thing now is finding a way to express it.

14%

Sons of tall men tend to be taller than average—but not as tall as their fathers. Sons of short men tend to be shorter than average—but not as short as their fathers. Galton first called this phenomenon “reversion” and later “regression toward mediocrity.”

Regression toward the mean.

15%

height and forearm length were “co-related.” Eventually, he opted for the more normal English word “correlated.”

Ha! So that's where tha word comes from!

17%

“Force as a cause of motion is exactly on the same footing as a tree-god as a cause of growth.” More generally, Pearson belonged to a philosophical school called positivism, which holds that the universe is a product of human thought and that science is only a description of those thoughts. Thus causation, construed as an objective process that happens in the world outside the human brain, could not have any scientific meaning. Meaningful thoughts can only reflect patterns of observations, and these can be completely described by correlations. Having decided that correlation was a more ...more

Clearly a wrong turn and goes completely against what science is really about. It is about expanding the human perception of the world outside our current human bodies capcity to percieve, in order to better predict how the world works from causation and thus decrease the chance of death.

17%

After graduating from Cambridge in 1879, Pearson spent a year abroad in Germany and fell so much in love with its culture that he promptly changed his name from Carl to Karl. He was a socialist long before it became popular, and he wrote to Karl Marx in 1881, offering to translate Das Kapital into English. Pearson, arguably one of England’s first feminists, started the Men’s and Women’s Club in London for discussions of “the woman question.”

So we can see hear that this communism problem seems also to have wormed its way int screwing with how we think about science.

18%

Pearson discovered possibly the most interesting kind of “spurious correlation” as early as 1899. It arises when two heterogeneous populations are aggregated into one. Pearson, who, like Galton, was a fanatical collector of data on the human body, had obtained measurements of 806 male skulls and 340 female skulls from the Paris Catacombs (Figure 2.5). He computed the correlation between skull length and skull breadth. When the computation was done only for males or only for females, the correlations were negligible—there was no significant association between skull length and breadth. But when ...more

This is an interesting thing to know, and somethng i didn't realise... So if there is population heterogeneity, this can result in a correlation betwen two variables due to their common cause (one of the populations). But when correlated within the group, they are no correlated, because there is no variation in the common cause.

20%

Of course, at times scientists do not know the entire web of relationships between their variables. In that case, Wright argued, we can use the diagram in exploratory mode; we can postulate certain causal relationships and work out the predicted correlations between variables. If these contradict the data, then we have evidence that the relationships we assumed were false. This way of using path diagrams, rediscovered in 1953 by Herbert Simon (a 1978 Nobel laureate in economics), inspired much work in the social sciences.

This is a fantastic idea; and should really be used and taken into consideration when dealing with big data, though i'm curious as to how one can infer what correlations to expect..

20%

“The writer [i.e., Wright himself] has never made the preposterous claim that the theory of path coefficients provides a general formula for the deduction of causal relations. He wishes to submit that the combination of knowledge of correlations with knowledge of causal relations to obtain certain results, is a different thing from the deduction of causal relations from correlations implied by Niles’ statement.”

In other words, we must have mind-first, data second. Since correlations betwen variables can occur spuriously, and raw data cannot make sense without a prior structure to use for its interpretation (which also is in-line with Kant's "Critique of Pure Reasoning"). I feel like this is an existential struggle, the struggle of 'gods' or what i call 'old-minds' right to rule; i.e. Listening to yourself versus mindless empricism.

20%

When you grow up, you will understand.” And he continues, “I am not dismissing your gurus, but a spade is a spade. My path coefficients are not correlations. They are something totally different: causal effects.” Imagine that you are in kindergarten, and your friends mock you for believing that 3 + 4 = 7, when everybody knows that 3 + 4 = 8. Then imagine going to your teacher for help and hearing her say, too, that 3 + 4 = 8. Would you not go home and ask yourself if perhaps there was something wrong with the way you were thinking? Even the strongest man would start to waver in his ...more

Exactly, Wright., and as so Judea Pearl, appear to be men of integrity and conviction, who put their mind to good use.

20%

Wright get this inner conviction that he was on the right track and the rest of the kindergarten class was just plain wrong? Maybe his Midwestern upbringing and the tiny college he went to encouraged his self-reliance and taught him that the surest kind of knowledge is what you construct yourself.

Unless it makes sense, based on principle, and not on the emotional opinion of other human beings, then it is clearly not the full truth.

20%

he still declared, “And yet it moves!” Colleagues tell me that when Bayesian networks fought against the artificial intelligence establishment (see Chapter 3), I acted stubbornly, single-mindedly, and uncompromisingly. Indeed, I recall being totally convinced of my approach, with not an iota of hesitation.

This is an example of non-conformism at its finest.

21%

we obtain three equations that can be solved algebraically for the unknown path coefficients, p, lʹ, and l × q. Then we are done, because the desired quantity p has been obtained.

That's awesome! So before Bayesian statistics was introduced to path analysis, instead used averages and simultaneous equations!

21%

Note that this result is biologically meaningful. It tells us how rapidly the pups are growing per day before birth. By contrast, the number 5.66 grams per day has no biological significance, because it conflates two separate processes, one of which is not causal but anticausal (or diagnostic) in the link P ← L.

Thus we see here that this is really Bioinformatics! Or what Bioinformatics should be; derivation of biological quanties based on causal models of the phenomena!

21%

only we could go back and ask Wright’s contemporaries, “Why didn’t you pay attention?” Crow suggests one reason: path analysis “doesn’t lend itself to ‘canned’ programs. The user has to have a hypothesis and must devise an appropriate diagram of multiple causal sequences.” Indeed, Crow put his finger on an essential point: path analysis requires scientific thinking, as does every exercise in causal inference. Statistics, as frequently practiced, discourages it and encourages “canned” procedures instead. Scientists will always prefer routine calculations on data to methods that challenge their ...more

Exactly!! This is the massive problem with Bioinformatics, as opposed to being a filled to help us discover novel biology through the use of proper, interpretable models of biology, instead we have 'canned' 'tools' which are suppose to spit out an answer, when nothing could be further from the truth!! One must mould the solution to the problem!!!

22%

Sociologists renamed path analysis as structural equation modeling (SEM), embraced diagrams, and used them extensively until 1970, when a computer package called LISREL automated the calculation of path coefficients (in some cases). Wright would have predicted what followed: path analysis turned into a rote method, and researchers became software users with little interest in what was going on under the hood. In the late 1980s, a public challenge (by statistician David Freedman) to explain the assumptions behind SEM went unanswered, and some leading SEM experts even disavowed that SEMs had ...more

This here is a big problem, thinking that a method will work out of the box, and nott thinking deeply about the method. Whether a method is applicable depends on the question asked, and therefore one must think deeply to moukd a solution to the problem, as opposed to forcing a solution onto a problem.

22%

Not surprisingly, some economists continue to claim that “it’s all in the data” to this very day.

This is David Hume versus Kant all over again.

22%

What does he mean? First, he means that path analysis should be based on the user’s personal understanding of causal processes, reflected in the causal diagram. It cannot be reduced to mechanical routines, such as those laid out in statistics manuals. For Wright, drawing a path diagram is not a statistical exercise; it is an exercise in genetics, economics, psychology, or whatever the scientist’s own field of expertise is.

In other words, it ensures the person doing the research actually understands it :)

22%

Second, Wright traces the allure of “model-free” methods to their objectivity. This has indeed been a holy grail for statisticians since day one—or since March 15, 1834, when the Statistical Society of London was founded. Its founding charter said that data were to receive priority in all cases over opinions and interpretations. Data are objective; opinions are subjective. This paradigm long predates Pearson. The struggle for objectivity—the idea of reasoning exclusively from data and experiment—has been part of the way that science has defined itself ever since Galileo.

This is the subject-object duality Pirsig denounces in Zen and Motorcycle Maintenence. It's the same thing Hume and Kant were arguing over. I agree, there is no such thing as separation of subject and object. Our minds were moulded by evolution to understand and comprehend the universe; it thus follows our minds might have a thing or two to say after a few billion years of experience. The universe is so complicated I think it requires a point of view; or it requires an angle through which to perceive. So I think that it is actually the opposite of this; pure data is almost pure subjectivity; there are so many ways to interpret it; one needs to use other information to constrain that interpretation.

23%

Having induced several hypotheses, Holmes eliminated them one by one in order to deduce (by elimination) the correct one. Although induction and deduction go hand in hand, the former is by far the more mysterious. This fact kept detectives like Sherlock Holmes in business.

This is exactly the scientific method.

24%

Bayes didn’t need the help. His paper is remembered and argued about 250 years later, not for its theology but because it shows that you can deduce the probability of a cause from an effect. If we know the cause, it is easy to estimate the probability of the effect, which is a forward probability. Going the other direction—a problem known in Bayes’s time as “inverse probability”—is harder. Bayes did not explain why it is harder; he took that as self-evident, proved that it is doable, and showed us how.

This is exactly what Einstein was doing with his thought experiments;inferring the existence of invisible things based on the observed effects and hypothetical mechanisms.

24%

If we observe a cause—for example, Bobby throws a ball toward a window—most of us can predict the effect (the ball will probably break the window). Human cognition works in this direction. But given the effect (the window is broken), we need much more information to deduce the cause (which boy threw the ball that broke it or even the fact that it was broken by a ball in the first place). It takes the mind of a Sherlock Holmes to keep track of all the possible causes. Bayes set out to break this cognitive asymmetry and explain how even ordinary humans can assess inverse probabilities.

Ah, such is the difficulty; the massive number of causes that could explain the same observed effects.

27%

More specifically, instead of representing probability in huge tables, as was previously done, let’s represent it with a network of loosely coupled variables. If we only allow each variable to interact with a few neighboring variables, then we might overcome the computational hurdles that had caused other probabilists to stumble.

Perfect! This is what makes BNs so beautiful, and it's amazing to me that this is actually a follow-on from expert systems !

27%

Reading Rumelhart’s paper, I felt convinced that any artificial intelligence would have to model itself on what we know about human neural information processing and that machine reasoning under uncertainty would have to be constructed with a similar message-passing architecture.

An autoencoder structure could be an extremely interesting and related version of this..

30%

Bayesian networks are by now a mature technology, and you can buy off-the-shelf Bayesian network software from several companies.

I should investigate and learn more about available Bayesian network modelling techniques in python.

45%

However, the cultural shocks that emanate from new scientific findings are eventually settled by cultural realignments that accommodate those findings—not by concealment. A prerequisite for this realignment is that we sort out the science from the culture before opinions become inflamed. Fortunately, the language of causal diagrams now gives us a way to be dispassionate about causes and effects not only when it is easy but also when it is hard.

This is a difficult problem, and unfortunetly probably always will be

46%

TABLE 6.1. The three possible arrangements of doors and goats in Let’s Make a Deal, showing that switching doors is twice as attractive as not.

I don't understand this.. Explained more clearly in a few pages.

48%

Instead of constructing elaborate psychosocial theories, consider a simpler explanation. Your choice of people to date depends on two factors: attractiveness and personality. You’ll take a chance on dating a mean attractive person or a nice unattractive person, and certainly a nice attractive person, but not a mean unattractive person. It’s the same as the two-coin example, when you censored tails-tails outcomes. This creates a spurious negative correlation between attractiveness and personality. The sad truth is that unattractive people are just as mean as attractive people—but you’ll never ...more

That's really interesting; this is a difficult concept to grasp, but is clearly worth the effort.

50%

I don’t want you to get the impression from this example that aggregating the data is always wrong or that partitioning the data is always right. It depends on the process that generated the data. In the Monty Hall paradox, we saw that changing the rules of the game also changed the conclusion. The same principle works here. I’ll use a different story to demonstrate when pooling the data would be appropriate. Even though the data will be precisely the same, the role of the “lurking third variable” will differ and so will the conclusion.

So the question is whether we need to aggregate and take the average, or we need to take the average of the averages. In the case of a confounder, it must be a weighted average of the freq. OF the different values the confounder can take. Other important point here is that the rules of the game fundamentally changes the way we need to interpret the data.

51%

It’s again fictitious, but fictitious examples (like Einstein’s thought experiments) always provide a good way to probe the limits of our understanding.

Pearl knows about the thought experiments ;)

52%

However, for statisticians who are trained in “conventional” (i.e., model-blind) methodology and avoid using causal lenses, it is deeply paradoxical that the correct conclusion in one case would be incorrect in another, even though the data look exactly the same.

I need to learn and memorise these rules properly.

55%

Anytime the causal effect of X on Y is confounded by one set of variables (C) and mediated by another (M) (see Figure 7.2), and, furthermore, the mediating variables are shielded from the effects of C, then you can estimate X’s effect from observational data.

Since you can separate the effect of the confounder by using your knowledge of the mediators.

58%

In mathematical logic, this is known as the “decision problem.” Many logical systems are plagued with intractable decision problems. For instance, given a pile of dominos of various sizes, we have no tractable way to decide if we can arrange them to fill a square of a given size. But once an arrangement is proposed, it takes no time at all to verify whether it constitutes a solution.

There may be multiple paths or none at all. Really this is an NP-hard problem.

62%

On the other hand, starting to reduce your cholesterol when you’re young—whether through diet or exercise or even statins—will have big effects later.

In other words, the earlier you stop looking after yourself; the worse you'll be in the future.

63%

Every other creature can see what is. Our gift, which may sometimes be a curse, is that we can see what might have been.

Humans greatest capacity lies in the ability to see with mindsight, and not hindsight.

64%

in An Enquiry Concerning Human Understanding, he wrote something quite different: “We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed”

The same few philosophers keep popping up as hugely influential in modern academia.

65%

As a licensed Whiggish philosopher, I can explain this consistency quite well: it stems from the fact that we experience the same world and share the same mental model of its causal structure.

Disagree that it's the complete same; there are differences between individuals in the level of development of that mental phenomena. Hence, in a sense, we each occupy a slightly different world, looked at through a slightly different lense, biased in a slightly different way.