More on AI and math: Two open problems and my experience with them

Jun 02, 2026

Update: The main variant of the first open problem has been solved — see here.

This post hardly deserves to be in the AI fails category, but I’ll post it here anyway. At least it does involve AI not succeeding at something (yet), though to its credit, it didn’t pretend that it had.

Researchers from Google Research are organizing a workshop called AI-driven research in EconCS and have put out a call for open problems, approaching a number of people, including me. I have two open math/algorithms problems in this area that I advertised as early as 2015, in a program at the Simons Institute for the Theory of Computing. I don’t know how hard they really are, but I have suggested them to people a number of times over the years, and to my knowledge they’re still open. So they’re not exactly the unit distance problem in terms of impressiveness, but at least they’re on topics I know well.

I appreciate the effort the Google researchers are putting into this workshop and the repository they’re putting together for the community, but why release the problems only to this workshop and repository? So I’m advertising them here as well. Here they are:

Open problem: Computing the optimal deterministic budget-balanced mechanism.
Open problem: Computing Bayes-Nash equilibrium in simple security games.

Actually, before posting this, I first tried to see what would happen with ChatGPT 5.5 Pro — would it just solve the problems right away? The short answer is, no, it didn’t solve them. Maybe that’s my fault for not prompting the right way; anyway, here’s my experience with that.

I spent the most time (and compute) on the first problem, in part because I thought it would be a nice story for AI to discover an algorithm that would then allow AI to design mechanisms for maximizing welfare! We got off to a rocky start: 5.5 Pro was for some reason unable to get the PDF from the link I gave, and instead of pointing this out to me, it spent a lot of time and compute on its guess of what the problem was. (That guess, incidentally, was a different problem, one already solved in one of our early papers, though it neither figured that out nor solved that problem.) I can’t completely blame it: my prompt had said I was busy, so I really wanted it to triple-check its own work before coming back to me with anything. Still, it was amusing to watch it struggle with downloading a PDF, one of those annoying little basic technical problems that somehow don’t get better over the decades even as technology rapidly advances in other ways.

After that, though, it did seem to understand what the problem was and made good partial progress on it, mostly identifying various integrality gaps (which is a natural thing to do on this kind of problem). I didn’t check all of it carefully, but it all seemed plausible, and it realized a few things that I already knew about. If anything, the type of progress it was making felt very human-like; perhaps the only thing that would be unusual for a human being is how inclined it was to write code and try lots of examples. Other than that, it seemed to make progress like a well-trained person in the area (rather than someone/something with a totally new perspective). And it seemed like enough progress that if, 5 years ago, I had given this problem to a PhD student and that student had come back a week later with this much progress, I would have concluded that the student was very competent and had been working seriously on the problem all week.

The experience with the second problem was similar: again, it didn’t resolve the main problem, but it did identify some approaches and some special cases that can be solved. In both cases, I very much appreciate that it didn’t just claim a solution and then present me with a wrong proof, as earlier models tended to do; see my earlier post here. (That said, I haven’t checked all its partial work, so there may be mistakes in it.)

Will someone, or something, or someone and something together, solve these problems soon? Let’s see / have at it!

Funny AI fails with Vincent Conitzer
