PatternBoost (Yet Another Paper About Math and AI)

I have a new preprint up with some new collaborators: François Charton from Meta, Geordie Williamson from Sydney, and Adam Zsolt Wagner who used to be from WPI and as of last month is from DeepMind. As you can guess from those affiliations, this is a paper at the math/machine learning interface. I’m having a lot of fun working on this stuff!

Like my earlier paper with DeepMind people, the emphasis here is on machines as fast, eccentric example generators. Suppose you wanted to find a graph on 33 nodes with as many edges as possible that has no 4-cycle. It’s not at all easy to see how you’d get started looking. Perhaps you’d think: well, what if it were a 3-cycle to exclude? And you’d quickly come up with the idea of using a bipartite graph, which has tons of edges (a positive density of all edges in the complete graph!) and no odd-length cycles at all. And maybe you’d hope some similarly clever construction would generate a very dense graph with no 4-cycle. But alas there is no such construction, or at least nobody has come up with one.

One natural thing to try is a greedy algorithm: start with the empty graph and add edges at random, subject to the constraint that you can’t add any edge that creates a 4-cycle. Keep doing that until you can’t. Of course if you could do this infinitely many times you’d eventually come up with whatever the maximal squareless graph was. But you can’t. And in practice if you do it 50 million times you never get past 89 edges.
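(If you want to see what a single greedy run looks like concretely, here is a minimal sketch in Python; the function names are mine, not anything from the paper's code. The useful observation is that adding an edge (u, v) to a graph that is already 4-cycle-free creates a 4-cycle exactly when there is already a path of length 3 from u to v.)

```python
import random
from itertools import combinations

def creates_4cycle(adj, u, v):
    # Adding (u, v) to a C4-free graph creates a 4-cycle exactly when
    # there is already a path u - a - b - v of length 3.
    return any(b != u and v in adj[b]
               for a in adj[u] if a != v
               for b in adj[a])

def greedy_c4_free(n, rng=random):
    # One greedy run: scan the edges of the complete graph on n vertices
    # in random order, keeping each edge that doesn't close a 4-cycle.
    adj = {i: set() for i in range(n)}
    edges = list(combinations(range(n), 2))
    rng.shuffle(edges)
    kept = []
    for u, v in edges:
        if not creates_4cycle(adj, u, v):
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept

best = max((greedy_c4_free(33) for _ in range(1000)), key=len)
print(len(best))  # stalls well short of the optimum of 96
```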

So here’s what PatternBoost does. You start by doing that greedy search above. You pick out the 50,000 best graphs, and you pass them to a transformer called Makemore and say “give me 500,000 more like this.” Then you run the greedy search again, now starting from those 500,000 graphs instead of the empty graph. Now you have 550,000 squareless graphs, the ones you just made and the 50,000 you already have stockpiled. Take the 50,000 of these with the most edges and hand them back to Makemore. And now just repeat! The hope is that whatever features are common to graphs with many edges and no squares will be somehow captured by the transformer, leading it to produce candidates which after refinement by local search become still better examples. We call the greedy search the “local” step and the transformer the “global” step. Here’s the metaphor we use in the intro:
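(Schematically, the loop looks something like the sketch below. The helper names local_search, train_and_sample, and score are placeholders for this post, not the actual API of the released code, and the number of seed runs is illustrative; the real global step trains a small makemore-style transformer on the tokenized examples and samples from it.)

```python
def patternboost(n_rounds, local_search, train_and_sample, score,
                 pool_size=50_000, n_samples=500_000):
    # Round 0: seed the pool with greedy runs started from the empty graph.
    pool = [local_search(None) for _ in range(n_samples)]
    pool = sorted(pool, key=score, reverse=True)[:pool_size]
    for _ in range(n_rounds):
        # Global step: train a transformer on the current best examples
        # and sample a large batch of new candidates from it.
        candidates = train_and_sample(pool, n_samples)
        # Local step: finish each sampled candidate off with greedy search.
        refined = [local_search(c) for c in candidates]
        # Keep only the best examples seen so far, then repeat.
        pool = sorted(pool + refined, key=score, reverse=True)[:pool_size]
    return pool
```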

[C]onsider an example of a collective human endeavor, the design of bicycles. The “local” step involves many individual bicyclemakers each making a sequence of careful tweaks to their design, trying to make it as efficient and comfortable as possible. The “global” step involves people living in the world, seeing lots of bicycles around, each one of which has been carefully optimized locally, and then developing a new bicycle design based on those they have observed. Of course this new design would then be carefully optimized by both its designer and others; and if after those tweaks, the new bikes turned out to be notably comfortable and efficient, a lot of them would be sold, and they would join the general fleet of bicycles observed by the next prospective designer,… and on and on we go.

So does this actually work? Yes! Well, kind of. For the problem I just mentioned, PatternBoost does find a squareless graph on 33 nodes with 96 edges, which is known to be best possible. But, as with a lot of problems in math/ML, it turns out to matter a lot how you encode a graph as a string. There are lots of ways to do this, of course! You can treat a graph as an adjacency matrix, and then record the 1s and 0s in some order to get a string of bits. But what order do you put the entries in? To a pure mathematician, this distinction is barely worth mentioning. But it matters to the transformer! In particular, it helped performance a lot when we switched from encoding the matrix row by row to encoding the matrix row by row but including a comma at the end of each row. One of the psychological difficulties mathematicians have to overcome when working in this area is that choices we expect not to matter actually do.
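(To make the encoding question concrete, here are the two variants on a toy example; the helper names are mine, and a real run encodes full 33-node adjacency matrices rather than this 3-by-3 one.)

```python
def encode_plain(matrix):
    # Row-by-row bit string, no separators.
    return "".join(str(bit) for row in matrix for bit in row)

def encode_with_commas(matrix):
    # Same order, but with a comma marking the end of each row; this was
    # the variant that noticeably helped the transformer.
    return "".join("".join(str(bit) for bit in row) + "," for row in matrix)

# Toy example: a triangle on 3 vertices.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
print(encode_plain(triangle))        # 011101110
print(encode_with_commas(triangle))  # 011,101,110,
```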

Also: n=33 is actually the largest number of nodes where PatternBoost finds the best squareless graph. For larger n, it certainly learns to do better than greedy search, but it doesn’t match the best known examples.

But the point of this paper isn’t to construct graphs with no 4-cycles, it’s to develop a protocol that can be used on lots of different problems, so that we can try to understand which problems are likely to be graspable by the transformer. And there are problems (like squareless graphs) where PatternBoost can’t match human performance, others where it’s about equal, and others where it finds things humans weren’t able to find. And you can try it on your own problem, too! The code is all right here.

(One problem we worked on that I’ve developed kind of a taste for: Suppose a 0-1 matrix avoids the pattern (312) — i.e. there is never an i < j < k such that a_{ik}, a_{ji}, and a_{kj} are all 1. How big can its permanent be? This is one where PatternBoost comes up with just about as good a solution as people have been able to find. Maybe in another blog post I’ll write about why I think this is a pretty natural question, as long as you think pattern-avoiding permutations are natural in the first place.)
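(In case you want to poke at that one yourself, here is a minimal sketch of the two ingredients, written to follow the informal condition as I stated it in the parenthetical above; the function names are mine, and the permanent here is brute force, which is fine only for small matrices.)

```python
from itertools import combinations, permutations
from math import prod

def avoids_312(m):
    # The condition as stated above: no i < j < k with
    # m[i][k], m[j][i], m[k][j] all equal to 1.
    n = len(m)
    return not any(m[i][k] and m[j][i] and m[k][j]
                   for i, j, k in combinations(range(n), 3))

def permanent(m):
    # Brute-force permanent: sum over all permutations sigma of the
    # product m[0][sigma(0)] * ... * m[n-1][sigma(n-1)].
    n = len(m)
    return sum(prod(m[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))
```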

To be honest, I still don’t think I have a clear picture of what separates the problems PatternBoost finds hard from the problems it finds tractable. Except for one sort of obvious point — you are a lot less likely to beat the best human efforts on problems many humans have tried really hard to solve. If your goal is to show the superiority of machine cognition to the human version, this might be dispiriting. But if your goal is mathematical progress, it’s fine! Even when we’re working on a famous hard problem, our approach, if it’s at all original, is going to pass through questions and hypotheses that have not been thought about much before. If ML techniques help generate rapid answers to those unfamous questions, that’s great! Who cares if humans would have done better in a counterfactual universe where the question was famous and sought-after — or even in a counterfactual universe where one person had spent a couple of weeks on it? We don’t live in that universe! I’ve said it before and I’ll say it again: the promise of AI is mediocrity at scale.
