undefined | Better HN

0 pointspclmulqdq1y ago0 comments

Comparing "turn photo into LaTeX" to "translate theorems into Lean" is like comparing a child's watercolor drawing to the Mona Lisa.

0 comments

Smaug1231y ago

… no? After the LaTeX output, I told stock GPT4o that the answer was "all even integers", and asked for the statement in Lean. I had to make two changes to its output (both of which were compile-time errors, not misformalisations), and it gave me the formalisation of the difficult direction of the problem.

Both changes were trivial: it had one incorrect (but unnecessary) import, and it used the syntax from Lean 3 instead of Lean 4 in one lambda definition. A system that was trained harder on Lean would not make those mistakes.

The one actual error it made was in not proposing that the other direction of the "if and only if" is required. Again, I am quite confident that this formalisation failure mode is not hard to solve in a system that is, like, actually trained to do this.

Obviously formalising problems that a working mathematicican solves is dramatically harder than formalising IMO problems, and is presumably way ahead of the state of the art.

lmm1y ago

> I am quite confident that this formalisation failure mode is not hard to solve in a system that is, like, actually trained to do this.

Why?

Smaug1231y ago

This is really not the kind of problem LLMs are bad at! But since you insist, given the LaTeX, Claude 3.5 Sonnet correctly stated the theorem in full while inventing notation for the floor operation (it did correctly note unprompted what the right function was and how to obtain it from mathlib, but it incorrectly attempted to define syntax sugar for it).

lmm1y ago

The hard part isn't getting the formalisation right sometimes, it's getting it right reliably (and unlike mistakes in the formal part, there's no way for the system to check itself in that part).

IDK, even for translation between languages outside of mathematics, missing small qualifiers that change the whole meaning of the sentence is a worrying failure mode I've seen, and with mathematical problems there are a lot more cases like that.

sebzim45001y ago

I think that's exagerating a bit. If you are familiar with both Lean and LaTeX then I think transcribing these problems to Lean only takes about twice as long as transcribing them to LaTeX.

j / k navigate · click thread line to collapse

0 comments

Smaug1231y ago

Obviously formalising problems that a working mathematicican solves is dramatically harder than formalising IMO problems, and is presumably way ahead of the state of the art.

lmm1y ago

> I am quite confident that this formalisation failure mode is not hard to solve in a system that is, like, actually trained to do this.

Why?

Smaug1231y ago

lmm1y ago

The hard part isn't getting the formalisation right sometimes, it's getting it right reliably (and unlike mistakes in the formal part, there's no way for the system to check itself in that part).

sebzim45001y ago

I think that's exagerating a bit. If you are familiar with both Lean and LaTeX then I think transcribing these problems to Lean only takes about twice as long as transcribing them to LaTeX.

j / k navigate · click thread line to collapse