I'm also referring to the faster models, not the slow and expensive deep thinking ones which I have little experience with. I don't see how reasoning would enable deep thinking models to meaningfully evaluate textbook pedagogy, either.
They DO understand what they are doing. When I ask one to solve math problems, it works through the many steps involved (e.g. "apply the chain rule" while doing partial differentiation on a term in a Jacobian matrix), step by step, in absolutely ridiculous detail. It gets pretty tedious when solving systems of linear equations, where it walks through a Gauss-Jordan elimination or an LU decomposition row by row. But one learns to ignore the blah-blah. The point: they absolutely 100% understand what they are doing, and understand it in minute detail.
It's clearly NOT regurgitating something that it has literally seen before, because the level of detail is beyond ridiculous for a human. It is applying generalized rules to specific concrete problems, and doing so with some level of strategic thinking.
Where did it learn those generalized principles, and how did it learn to apply them? There are certainly math textbooks among the materials these models were trained on, and that is almost certainly the source. How did they learn to generalize and think strategically? Well, that's the big mystery, isn't it? But they do.
The very best models achieve high scores on Math Olympiad problem sets (so, competitive with some of the best minds on the planet). And Terence Tao (arguably the greatest living mathematician) has declared state-of-the-art models to be "better than most of my post-graduate students".
And what they can do is expanding by leaps and bounds on a weekly or monthly basis right now. It's hard to keep up. I frequently find that they can do things this week that they could not do a week or a month ago. Startling, and quite utterly amazing.
Most of the time, I am using Claude Sonnet 4.5 as my coding agent, for which I pay $10/month. Its measured IQ is around 110, I think, rising to about 120 if you flip it into thinking mode. But only because there isn't enough undergraduate-level mathematics in a standard IQ test. Claude Sonnet 4.5 is also available for free here: https://claude.ai/chats (during periods of heavy load, it may fall back to simpler models). I often use the free web interface instead of the coding-agent interface for math problems, because it's easier to read mathematical equations in the browser version. And I generally use the free version of Claude instead of Google Search these days.
My experience with people who have LLM subscriptions of any kind is that they use them all the time and would immediately ask an LLM that kind of question, rather than asking on a web forum that's not even dedicated to math. So I think it's a fair presumption that someone asking that question doesn't have access to the best commercial models.
On the largely irrelevant question of what math LLMs can do: although Opus may do better, Sonnet can follow procedures sometimes, but not consistently. It has blind spots and can't scale procedures; beyond certain numbers, dimensions, or levels of problem complexity, it just guesses (wrongly). And those limits are quite low. Two simple examples:
4294967297*1331
Invert this matrix: m=[1 0 5 0 3 7; 2 3 0 3 3 2; 1 0 1 1 0 1; 3 5 3 5 1 2; 2 4 3 2 1 5; 1 0 5 2 1 5]
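Both examples are easy to check mechanically, which is exactly the point: a symbolic engine gets them right every time, while an LLM working token-by-token may not. As a minimal sketch (my own illustration, not anything an LLM produced), Python's arbitrary-precision integers handle the first, and a straightforward Gauss-Jordan elimination over exact rationals handles the second, verified by multiplying back to the identity:

```python
from fractions import Fraction

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination over exact rationals."""
    n = len(m)
    # Build the augmented matrix [A | I], with every entry a Fraction
    # so that each elimination step is exact (no floating-point error).
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Pivot: swap up a row with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row to make the pivot 1, then clear the column.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now A^-1.
    return [row[n:] for row in aug]

# Example 1: exact integer multiplication.
print(4294967297 * 1331)  # 5716601472307

# Example 2: invert the 6x6 matrix, then confirm A * A^-1 == I exactly.
m = [[1, 0, 5, 0, 3, 7],
     [2, 3, 0, 3, 3, 2],
     [1, 0, 1, 1, 0, 1],
     [3, 5, 3, 5, 1, 2],
     [2, 4, 3, 2, 1, 5],
     [1, 0, 5, 2, 1, 5]]
inv = invert(m)
product = [[sum(m[i][k] * inv[k][j] for k in range(6)) for j in range(6)]
           for i in range(6)]
identity = [[Fraction(int(i == j)) for j in range(6)] for i in range(6)]
print(product == identity)  # True
```

The verification step matters: rather than eyeballing 36 fractions, multiplying back to the identity confirms the whole computation at once, something an LLM's step-by-step narration does not give you.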
LLMs follow procedures, but whimsically. Better LLMs will be less whimsical, but they still won't be fully competent unless they translate questions into more formal terms and then hand them off to an engine like Wolfram.