TL;DR - amazing, got way more out of it this time around. Rounding down to 4 stars because, as a book, he gets kinda mired in formalism by the second half, but obviously this is some of the most important mathematics of the last hundred years. Really fascinating to see this stuff in its original form, and the sort of generality / breadth of insight it produced. Weaver’s intro is amazing; Shannon’s first half is quite readable and intuitive, but it gets a bit clunkier by the time he deals with continuous signals.
I read this for the first time about a year ago, with essentially none of the formal mathematical training, in an English independent study; I’ve since taken most of a math core (stats, linear algebra, an info theory/ML class, etc) and got way, way more out of it this time around. By any measure this is one of the most influential (and interesting) works of math of the last 100 years, and it’s quite incredible to read as it was originally formulated.
For one, Shannon is a pretty great writer. For the first three or four chapters there’s a remarkable lack of dense formal notation, and the simplicity, elegance, and generality of a lot of the methods, I think, underscore how profound some of the insights here are.
The first time I read this, I had a version without Warren Weaver’s introduction, a huge mistake; I think his work is sort of essential here in bringing information theory’s significance into context. Weaver begins by defining communication broadly—the way one mind might affect another or, even more generally, “procedures by which one mechanism affects another mechanism.” He then breaks communication down into three phases: (A) the technical problem, accuracy of symbols, (B) the semantic problem, how precisely symbols convey meaning (which we might think of, now, as semiotics!), and (C), the effectiveness problem, whether the received meaning affects conduct in a desired way. In this division is an assumption that Weaver spells out: “communication is always attempting to influence the conduct of the receiver,” somewhat obvious, but sort of remarkable in its generality. Perhaps most interestingly for my study, Weaver suggests (and Shannon eventually elaborates), that study of (A), the focus of the book, yields interesting insights into (B) and (C). This seems almost like a trope now, the hasty overapplication of information theory into other disciplines, but the appeal of such an approach is clear in this description.
Weaver describes all of the communications problems of this first level. A message is encoded into a signal, sent over a communication channel, and decoded by a receiver. There is often noise in the channel. This raises several questions that Shannon goes on to answer—measuring the amount of information, capacity of the communication channel, process of encoding (and how this might be optimal), the characteristics of noise, and how solving this problem might differ in the continuous and discrete case.
I always found Norbert Wiener’s work so seductive because of the breadth (and promise!) of his claims. If I recall correctly (from outside this book), Wiener and Shannon actually had considerable conflict over credit for some of this work, but in Weaver’s description they seem to amply acknowledge one another. Regardless, where Wiener’s contributions seem largely sci-fi and have receded into the background, it’s quite remarkable to see Shannon’s, essentially, living up to the tone of generality and consequence described in the work, in their later use (across tons of disciplines!).
Weaver devotes lots of time to this throughout, particularly to the ability to apply such techniques to meaningful or meaningless messages alike—semantics are irrelevant to the engineering problem. He then expands on Shannon’s general formulation—that information is a measure of one’s freedom of choice when selecting a message (the logarithm of the number of available choices) and that probability shapes these concerns. For example, we might see English as a stochastic or Markov process, in that the probabilities of one letter occurring depend directly on the preceding letters. Weaver really elegantly breaks down more of the formal structure here—defining entropy as a degree of randomness (and suggesting how fundamental entropy is as a principle), then the “relative entropy” of a source (the ratio of its actual entropy to the maximum it could have) and redundancy (one minus that ratio: the fraction of the message fixed by the statistical constraints of the structure, e.g. how certain letters cannot follow others).
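To make these definitions concrete for myself: here’s a toy sketch (mine, not from the book) of empirical entropy, relative entropy, and redundancy for a short message, assuming a simple zeroth-order model where we only count letter frequencies:

```python
from collections import Counter
from math import log2

def entropy(text):
    # empirical entropy in bits per symbol: H = -sum(p * log2(p))
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())

msg = "the mathematical theory of communication"
h = entropy(msg)
h_max = log2(len(set(msg)))         # max entropy: all symbols equally likely
relative_entropy = h / h_max        # Weaver's "relative entropy"
redundancy = 1 - relative_entropy   # fraction fixed by statistical structure
```

A fully random string over the same alphabet would come out with redundancy 0; Shannon’s own estimate for ordinary English, once higher-order structure is counted, was roughly 50 percent.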
He similarly expands on coding, differentiating between C (the capacity of the channel, in bits per second) and H (the entropy of the source, in bits per symbol), noting that it is possible to transmit symbols at a rate approaching C / H, but no better. The best (or more nearly ideal) codings maximize the entropy of the transmitted signal, bringing the transmission rate up toward the capacity of the channel, but more ideal coding incurs more delay in encoding.
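A tiny worked example (my own, not one of Shannon’s) of why entropy bounds coding: for a three-symbol source where one symbol is twice as likely as the others, a hand-picked variable-length prefix code hits the entropy exactly:

```python
from math import log2

probs = {"a": 0.5, "b": 0.25, "c": 0.25}
code = {"a": "0", "b": "10", "c": "11"}   # a prefix-free code, chosen by hand

# source entropy, in bits per symbol
H = -sum(p * log2(p) for p in probs.values())
# average code length actually used, in bits per symbol
avg_len = sum(probs[s] * len(code[s]) for s in probs)
# here avg_len == H == 1.5, so this code is ideal for this source;
# for less convenient probabilities, codes can only approach H from above
```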
He also works through noise, which introduces uncertainty about what message was sent, and “equivocation,” the uncertainty that remains about the transmitted message even once the received signal is known. Noisy channels, as such, must be offset with some redundancy.
Weaver ends by attempting to expand some of Shannon’s methods to other domains: suggesting “semantic noise” when concepts are unclear, or the ability of a speaker to overload the capacity of an audience like a communications channel. He similarly gestures toward future breakthroughs in linguistics, suggesting handling language statistically or as a Markov process. Weaver, interestingly, seems to be addressing an only partially mathematically trained audience in his writing—for example, there are footnotes explaining what a logarithm is, or how entropy is formulated.
The book then enters (sidenote: I almost wrote ‘delve’ instead of ‘enter’ here, but find myself avoiding ‘delve’ because of its known overuse in language models, LOL) into Shannon’s portion, which formalizes more of this. Shannon acknowledges his debt to Nyquist and Hartley, and frames the work around the goal of reproducing a message exactly at another point.
His progression is really logical: first, the simplest case, discrete noiseless systems; he then adds noise, expands to the continuous case, and formalizes further. In discrete noiseless systems, he enters into a pretty fascinating description of the structure of the English language, particularly showing its statistical structure using first-order, second-order, and n-th-order approximations. As he writes, suggestively (wow, we really did just need to throw more compute and data at the problem!): “a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.” There are pretty interesting graphical representations of Markov processes, and entropy results, in here.
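These letter-level approximations are easy to replay today; here’s a minimal sketch of a second-order approximation (each letter drawn conditioned on the previous one—a character bigram model; the corpus and function name are my own):

```python
import random
from collections import defaultdict

def second_order_sample(corpus, length=60, seed=0):
    # Shannon-style second-order letter approximation: learn, for each
    # character, the list of characters that follow it in the corpus,
    # then sample a chain from those conditional choices.
    rng = random.Random(seed)
    nxt = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        nxt[a].append(b)
    out = [rng.choice(corpus)]
    for _ in range(length - 1):
        choices = nxt.get(out[-1]) or corpus  # fall back if char has no successor
        out.append(rng.choice(choices))
    return "".join(out)
```

Even with a tiny corpus, the output starts to show English-like digram structure, which is exactly the effect Shannon’s printed examples demonstrate.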
I was pretty amused when he wrote about how James Joyce’s Finnegans Wake is “alleged to achieve a compression of semantic content.” The academic part of my brain is inclined to be pretty suspicious of this move across disciplines, or to dismiss it as overreach. However, I’m much more compelled by the idea that this, actually, is incredibly significant; my English professor, I think correctly, argues that Barthes and many of the others formulating semiotics later on are greatly indebted to information theory. In essence—this seems like a rare example of technical conclusions actually reaching beyond their discipline and having real, foundational impact.
Shannon then extends his early formalizations to a discrete channel with noise. This remains quite intuitive; the real capacity of the channel is reduced by the missing information caused by noise. This is, essentially, the chance that a value is mistaken; Shannon formalizes this, in the binary case, via the probability that a transmitted 0 is received as a 1 and the probability that a 1 is received as a 0. Certain amounts of redundancy, chosen to match the type and amount of noise, allow for arbitrarily reliable reproduction eventually—for example, redundancy in English is desirable.
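For the symmetric binary case (each bit flipped with probability p), the capacity works out to one minus the binary entropy of the flip probability; a quick sketch (the function name is mine, not the book’s notation):

```python
from math import log2

def bsc_capacity(p):
    # capacity, in bits per channel use, of a binary symmetric channel
    # that flips each bit with probability p: C = 1 - H(p),
    # where H(p) is the binary entropy of the noise
    if p in (0.0, 1.0):
        return 1.0  # deterministic channel: no information is lost
    h = -p * log2(p) - (1 - p) * log2(1 - p)
    return 1 - h

bsc_capacity(0.0)  # noiseless: 1 bit per use
bsc_capacity(0.5)  # pure noise: 0 bits per use
```

The p = 0.5 case is the intuition above made exact: when a received bit says nothing about the transmitted one, the missing information eats the entire channel.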
For the continuous case, Shannon notes that one can divide it into a number of small regions and solve on a discrete basis. He also admits that his method does not aim for maximal generality or mathematical rigor, which might require measure theory. This is a pretty interesting challenge to a lot of my personal conception of mathematics, the idea that the most compelling explanations would be those that were maximally foundational and rigorous; this may still be the case, but it’s an interesting move from Shannon.
Regardless, I am not sure I felt like this was sufficient; it’s quite interesting, but, compared with the first two sections, I think Shannon gets mired in formalism and loses a lot of his coherence here. I’m reminded of some of Thomas Kuhn’s work detailing how early formulations of a theory are often clunkier and less well-phrased, and how the goal of “normal science” is to reformulate observations to be sufficiently general, intuitive, and elegant; this section, unlike the first, feels like the sort of observation that will eventually be refined, and Weaver notes that many of the more rigorous proofs would follow Shannon’s work.
Either way, this is the sort of work I (and, I think, anyone in a technical discipline!) should aspire toward. Deeply foundational and paradigm-shifting, intuitive and written without an overabundance of formalism, simple, clear, elegant, and broadly useful across an incredible number of fields. It’s sort of impossible to read this without gaining more admiration for Shannon; it’s really quite inspiring.
My review from my first read is below (note: I came back to it!):
read for deak, rounding up from 3.5. parts of this were really incredible / really interesting and readable (the first section in particular, plus a lot of the introductions at the beginning of chapters), but I found a lot of the actual mathematical expositions / derivations considerably harder to follow / less thoroughly (or entertainingly) explained and sped through a lot of them, although I admittedly don't have a strong foundation in information theory / some of the discrete probability stuff, so this may have just gone over my head. anyways, still a really interesting project / it makes sense why this is considered so foundational / could very much see myself coming back to this again in the future