AI for math

May 24, 2026

Given the recent breakthrough results using frontier models for mathematics (here and here), below are some of my own experiences using LLMs for mathematics. They are based mostly on a talk I gave in Oxford in October 2025, so they are a bit behind where we are now. There are some fails in this one towards the end, but overall it is extremely impressive how far we have come.

During the Oxford talk, I asked the audience the following question, which I had made up myself for the purpose.

Try it for yourself first (solution below)!

It’s not a particularly insightful problem, and after a little while the audience produced the right answer, but it does take some work. First, the only way to fit a regular polygon with m vertices onto the vertices of the original n-gon is to have each edge of the inserted polygon skip the same number of vertices of the original polygon, so m has to divide n. You could do it with n=9 and three triangles, except then they’d all have the same size. n=10 isn’t going to work because it is 2 times 5, so you could use only pentagons. n=11 is even more hopeless, since it is prime. But n=12 will actually work: one regular hexagon and two regular triangles will do the trick. So it ends up being more a number theory problem than a geometry problem.
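To double-check the answer by brute force, here is a minimal Python sketch. It assumes my reading of the puzzle: find the smallest n such that the vertices of a regular n-gon can be split into regular polygons that are not all the same size. Vertices are labelled 0 to n-1 around the circle, and the function names are hypothetical, made up for this illustration.

```python
from itertools import count


def regular_polygons(n):
    """All regular polygons whose vertices are vertices of a regular n-gon.

    An m-gon must skip the same number of vertices on every edge, so it
    exists only when m divides n; it is then the set
    {r, r + n//m, r + 2*n//m, ...} for some offset r < n//m.
    """
    polys = []
    for m in range(3, n + 1):
        if n % m == 0:
            step = n // m
            for r in range(step):
                polys.append(frozenset(range(r, n, step)))
    return polys


def mixed_partition(n):
    """Return a split of the n vertices into regular polygons of at least
    two different sizes, or None if no such split exists."""
    polys = regular_polygons(n)

    def search(uncovered, chosen):
        if not uncovered:
            # Every vertex is used exactly once; accept only mixed sizes.
            sizes = {len(p) for p in chosen}
            return list(chosen) if len(sizes) >= 2 else None
        # The smallest uncovered vertex must belong to some polygon.
        v = min(uncovered)
        for p in polys:
            if v in p and p <= uncovered:
                result = search(uncovered - p, chosen + [p])
                if result is not None:
                    return result
        return None

    return search(frozenset(range(n)), [])


def smallest_mixed_n():
    """Try n = 3, 4, 5, ... until a mixed split exists."""
    for n in count(3):
        if mixed_partition(n) is not None:
            return n


if __name__ == "__main__":
    n = smallest_mixed_n()
    print(n, [sorted(p) for p in mixed_partition(n)])
```

Running it finds n=12, with the hexagon {1, 3, 5, 7, 9, 11} and the triangles {0, 4, 8} and {2, 6, 10}; every n from 3 to 11 yields None, consistent with the divisibility argument above.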

ChatGPT 5 had no problem with it (and even did some extra credit work).

So it indeed figured out that it should treat this as a number theory problem. What about a more visual geometric proof? I actually have a recent one in The American Mathematical Monthly, which you can find here. It is a visual proof of the parallelogram law, which states that for a parallelogram, the sum of the squares of the diagonals is equal to the sum of the squares of the sides. (It’s a generalization of the Pythagorean theorem.) Here’s a picture from it; the paper has the full explanation, but the proof relies on two overlapping tilings of the plane.
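Separate from the visual proof in the paper, here is the standard algebraic derivation, for reference. Write the two sides of the parallelogram as vectors u and v (the same u and v that come up in the discussion further down), so that the diagonals are u + v and u - v. Expanding the dot products gives:

$$
\|u+v\|^2 + \|u-v\|^2 = \left(\|u\|^2 + 2\,u\cdot v + \|v\|^2\right) + \left(\|u\|^2 - 2\,u\cdot v + \|v\|^2\right) = 2\|u\|^2 + 2\|v\|^2 .
$$

When u and v are perpendicular (a rectangle), both diagonals have the same length, and this reduces to \(\|u+v\|^2 = \|u\|^2 + \|v\|^2\), which is the Pythagorean theorem.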

Could ChatGPT 5 do such a proof? (Was it just going to look up my proof?) Let’s see.

Let’s first take a look at the produced images.

Well, I don’t think it’s possible to get much out of those, but creating images is probably hard when you’re trained mostly on text, so let’s look at the corresponding written proof instead. It doesn’t seem to actually be a tiling proof, but is it correct? Think about it for yourself as you read it.

This looks plausible at first glance, but some of the text is a bit of a head scratcher and some details are missing. Can those be completed, or is this the kind of “proof by bluff” that students sometimes give on an exam? It turns out that it’s the latter. As those of us who grade exams like to do, let’s try to find a decisive argument that this proof can never work. Consider that u and v might be almost the same vector, say both of unit length but pointing in slightly different directions. In that case, the square on u-v is really tiny, and there would be no way to fit even a single one of those right triangles inside it. So it can’t work.
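To put a number on “really tiny”: take u and v of unit length with a small angle θ between them. Then

$$
\|u-v\|^2 = \|u\|^2 - 2\,u\cdot v + \|v\|^2 = 2 - 2\cos\theta = 4\sin^2(\theta/2),
$$

so the square on u-v has side \(2\sin(\theta/2)\), which goes to 0 as θ goes to 0, while u and v themselves keep length 1.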

Perhaps visual proofs are (for now?) a particularly human type of proof. I’m working on other candidates for particularly human proofs…

Here is an example of a simple geometry problem that it gets very strangely wrong. (It does sometimes get it right too; I’m guessing that had it allocated more compute to this response, it wouldn’t have been so wrong.)

Still, we’ve come a long way in little time. Earlier models were willing to prove some exceptionally false results, even when they really, really should have recognized that along the way…

… and of course there is always AI Overview, which here is on an entirely different level (and not in a good way).

Funny AI fails with Vincent Conitzer

