TfE: The Problem with Bayes / Solomonoff

Here’s a recent thread musing about problems with Bayesian conceptions of general intelligence and the more specific variants based on Solomonoff induction, such as AIXI. I’ve been thinking about these issues a lot recently, in tandem with the proper interpretation of the no free lunch theorems in the context of machine learning. I’m writing something that will hopefully incorporate some of these ideas, but I do not know how much detail I will be able to go into there.

If the problem with Bayesianism is that it takes representational content for granted, the problem with Solomonoff induction is that it obviates representation entirely, abjuring semantics in favour of foundational syntax. Each ignores the underlying dynamics of representation.

By which I mean the non-trivial process through which singular objects are individuated (Kant’s problem) and the general types and predicates that subsume them are revised (Hegel’s problem). Semantic evolution is a constitutive feature of epistemic progress.

The key point is this: if you model epistemic progress as a process of exhausting possibilities, then you need to have a predetermined grasp of the possibility space. Either this is purely combinatorial (mere syntax) or it is meaningfully restricted (added semantics).

The former tends toward computational intractability (and effective vacuity), while the latter risks drawing the bounds of possibility too tightly. The one fails to create a truly coherent possibility space, the other fails to create a truly exhaustive one.

There are two ways around this: one is to insist on semantic atomism (a complete set of atoms from which all meaningful possibilities can be combinatorially generated), the other is to utilise algorithmic information (a universal syntax modulo choice of UTM). Fodor or Solomonoff.
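For reference, the standard formulation of the latter is the Solomonoff prior over binary strings x, relative to a chosen universal prefix machine U:

\[ M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)} \]

where the sum ranges over programs p whose output begins with x. Every such program contributes weight 2^{-ℓ(p)}, so shorter programs dominate, and whatever ‘semantics’ there is has been absorbed into the choice of U.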

It’s a bit trickier to show what is wrong with these perspectives, but I think their errors are complementary and instructive. It all comes down to what differentiates the semantic content of empirical terms from that of mathematical terms: their role in observation and action.

Semantic atomism is forced to countenance basic observations and actions. This actually gets formalised in AIXI extensions of Solomonoff induction. It’s important to stress that epistemology and philosophy of action have spent much of the 20th century critiquing these ideas.

Solomonoff induction folds observation and action into the structure of its Universal Turing Machine and its interface with the environment, i.e., whatever I/O side effects it is capable of. Meaning is provided by black-box computational dynamics.

This black-box character is supposed to be a selling point, insofar as it is precisely what obviates representation. But what it really does is remove any possibility of distinguishing distinct algorithmic hypotheses, e.g., about specific phenomena, states, or laws.

All the Solomonoff engine can do is test hypotheses about the universe as a whole, precisely insofar as the universe is indistinguishable from another UTM generating undifferentiated binary output. The only question is ‘what is the binary expansion of reality?’
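Concretely, all such an engine does is compute (or rather approximate) the ratio

\[ M(x_{t+1} = b \mid x_{1:t}) \;=\; \frac{M(x_{1:t}\,b)}{M(x_{1:t})} \]

for the next bit b, where x_{1:t} is the whole undifferentiated stream observed so far. There is no term in this expression in which a particular phenomenon, state, or law could figure.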

To temper this, as AIXI seems to, you need to open up the black box and identify the basic operations that structure the underlying binary input. The important question is: why would you want to temper it? The relevant answer is: because otherwise it is completely useless.
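For reference (roughly following Hutter’s formulation, and only as a sketch), AIXI chooses actions by an expectimax over all environment programs q consistent with its interaction history, weighted by 2^{-ℓ(q)}:

\[ a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)} \]

The tempering is visible here: the fixed alphabets of actions a, observations o, and rewards r are precisely the basic operations that structure the otherwise undifferentiated binary interface.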

If you don’t know how the binary expansion is encoding the features of the world you might possibly be interested in, then you cannot use its hypotheses to predict and manipulate anything specific. It is about as useful as the thought that the binary expansion of pi contains everything.

To be maximally clear: something cannot count as a general problem solver if it is incapable of solving any specific problems. There is an explanatory gap between optimally predicting any binary stream given an arbitrary UTM and generating discrete usable hypotheses.

There is a related and equally instructive problem here regarding the semantic content of mathematical terms and statements. One might be inclined to think that the lack of connection to observation and action would obviate any representational problems in the mathematical case.

It’s possible to imagine cutting up the world explicitly along AIXI lines and running Solomonoff induction on the pieces, developing progressively better specific hypotheses. But could one do this in a way that would allow one to prove mathematical conjectures? In a word, no.

In more words, I’m sure it’s possible to create sequences that can’t be successfully predicted without proving some corresponding mathematical theorem. Maybe this can even be done for every theorem. But it would generally depend upon encodings that presupposed the proof.
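To give a deliberately crude sketch of the flavour (an illustration, nothing more): take the sequence whose nth bit is 1 just in case 2n + 4 is a sum of two primes. The sequence is all 1s exactly if Goldbach’s conjecture holds, so predicting it with certainty, rather than merely inductively, would require settling the conjecture, and the encoding itself presupposes the relevant mathematics.

```python
# Crude sketch: the nth bit is 1 iff 2n + 4 is a sum of two primes, so the
# sequence is all 1s exactly if Goldbach's conjecture holds. Predicting it
# with certainty (rather than merely inductively) amounts to settling the
# conjecture; the encoding presupposes the mathematics in question.

def is_prime(k: int) -> bool:
    """Trial-division primality test; fine for a toy example."""
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))

def goldbach_bit(n: int) -> int:
    """1 if 2n + 4 can be written as a sum of two primes, else 0."""
    even = 2 * n + 4
    return int(any(is_prime(p) and is_prime(even - p) for p in range(2, even // 2 + 1)))

# The first twenty bits, all expected to be 1.
print([goldbach_bit(n) for n in range(20)])
```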

Mathematics as such is dissolved into the underlying computational processes, either as a hard coded feature of the instruction set of the UTM, or as an emergent feature of the program/hypothesis that is generating predictions. A means but never an end.

The ultimate point that I’m working towards is that whatever semblance of discrete knowledge is accrued by Solomonoff induction cannot play any significant role in guiding where the process goes next. New maths is not incorporated into the UTM. New science does not evolve.

The only sense in which there is any intrinsic progress, any sense in which knowledge is built upon, is the trivial one: any such ampliative algorithmic process would also be run by SI, just insofar as it runs every possible algorithm.

Crucially, knowledge is built on not just in the sense that fixed premises can serve as the basis for new inferences, but also in the sense that existing concepts’ specific inadequacies suggest a range of options for revision going forward. Computational hysteresis, not brute-force search.

To give a related example, I’m a big fan of predictive coding approaches to cognition, which seem superficially similar to SI insofar as they treat experience as prediction (less superficially from the Bayesian brain perspective). However, they treat error very differently.

A predictive coding system has a series of representational layers ranging from the concrete to the abstract (e.g., edges>faces>objects>types>etc.) and two signals: prediction (abstract>concrete) and error (concrete>abstract). All the world provides is error.

This looks superficially similar to SI because all the world provides at each SI step is a bit that lets you discard every program that failed to predict that bit. But error works very differently in the PC case, in at least two ways.
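As a toy sketch of that elimination dynamic (over a tiny hand-picked hypothesis class; real Solomonoff induction runs over all programs, weighted by 2^{-length}):

```python
# Toy sketch: each observed bit simply eliminates every hypothesis that
# failed to predict it. Nothing is corrected or revised, only discarded,
# and the survivors keep their 2^-length prior weights.

from typing import Callable, List, Tuple

# Hand-picked stand-ins for programs: (name, length in bits, predictor of bit t).
Hypothesis = Tuple[str, int, Callable[[int], int]]

hypotheses: List[Hypothesis] = [
    ("all zeros",     3, lambda t: 0),
    ("all ones",      3, lambda t: 1),
    ("alternating",   5, lambda t: t % 2),
    ("ones from t=2", 8, lambda t: int(t >= 2)),
]

def update(live: List[Hypothesis], t: int, observed: int) -> List[Hypothesis]:
    """Keep only the hypotheses whose prediction for step t matches the observed bit."""
    return [h for h in live if h[2](t) == observed]

live = hypotheses
for t, bit in enumerate([0, 0, 1, 1]):
    live = update(live, t, bit)

# Posterior mass is proportional to 2^-length over the survivors.
print({name: 2 ** -length for name, length, _ in live})  # {'ones from t=2': 0.00390625}
```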

First, there is a tolerance for error. If my current world model predicts that I will hear a loud sound from a certain direction but gets the pitch slightly wrong, that is no reason to substantially modify the model. The error signal doesn’t get passed up to the next layer.

Second, there is specificity of error. Any variance between my predictions and my sensory input trickles up through the representational layers until it reaches its appropriate level, triggering a specific and suitable correction that then cascades back down. Rinse and repeat.
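As a very rough schematic of these two features (far simpler than any serious predictive coding model, and only meant to illustrate the contrast): each layer absorbs errors below its tolerance, and larger errors travel up to the level where a specific correction is made.

```python
# Rough schematic only: each layer holds a state, tolerates small errors,
# and passes larger errors upward to the level where correction belongs.

import numpy as np

class Layer:
    def __init__(self, state: np.ndarray, tolerance: float, lr: float = 0.1):
        self.state = state          # this layer's current representation
        self.tolerance = tolerance  # errors below this are simply absorbed
        self.lr = lr                # how strongly this layer corrects itself

def pc_step(layers: list, observation: np.ndarray) -> None:
    """One loop: bottom-up error, correction at the appropriate level."""
    error = observation - layers[0].state    # the world only provides error
    for layer in layers:
        if np.linalg.norm(error) <= layer.tolerance:
            break                            # tolerance: small errors stop here
        layer.state += layer.lr * error      # specificity: correct at this level
        error *= (1.0 - layer.lr)            # the residual continues upward

# The concrete layer is touchy; the abstract layer only moves for large surprises.
layers = [Layer(np.zeros(3), tolerance=0.05), Layer(np.zeros(3), tolerance=0.5)]
pc_step(layers, observation=np.array([0.2, -0.1, 0.0]))
print([layer.state for layer in layers])
```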

It is this looping process of specific error and appropriate response that enables the model’s predictions to home in on a correct representation of the world. It is how error gets used that constitutes intelligence in this context.

I’d argue that the same is broadly the case in the Lakatosian model of progressive research programs, allowing anomalies to rise to the correct level of abstraction and elicit appropriate responses. But that’s beside the point.

The point is that the only way Solomonoff induction simulates a predictive coding system is in the negative. Every time a simulated PC program gets an error bit it is discarded. It just so happens that every possible PC program that predicted the correct bit is still running.

Of course, these programs are thereby necessarily more complex than those that contain no instructions governing the use of the error signal at all, and so will be ranked less probable than their purely predictive counterparts.

Any hypothesis that algorithmically encodes how it should be modified in response to surprising input from the environment is first discounted and then eliminated before it gets the chance to try, because the modified versions have already been running all along.

And this, for me, demonstrates the real problem with Solomonoff induction, and by extension with Bayesianism more generally, at least insofar as they are conceived as methods of exhaustion. To the extent that they are incomputable ideals to be approximated, they idealise the wrong thing.

The ‘shortcuts’ required to make them not just computable, but tractable, essentially involve learning how to make intelligible the specific error signals that steer us toward limited ranges of options for revising our beliefs and the concepts that constitute them.

This is why a few years ago I said that the real coming intellectual conflict is between Bayesian positivists and cybernetic falsificationists. There is a deep connection between semantics, error, and the anticipation of revision that we have yet to fully appreciate.

I’ll finish with one final thought, as I suspect that AIXI fans will not find my gestures toward critiques of basic observations/actions to be sufficiently rigorous. Can we see different representational layers of PC systems as in some sense mapping the same possibility space?

In some sense, everything is being cashed out at the bottom, in the most concrete layer, where our semantic atoms are to be found. Why bother with the tower of abstractions built on top of this bedrock? Can’t we just see them as combinatorial constructions/restrictions?

I think it worth pondering the possibility that the possibility spaces do not neatly map onto one another, and that this is a feature not a bug. Mark Wilson has made the case for this at the level of scientific description already.

Let’s call it there.
