I'm also referring to the faster models, not the slow and expensive deep thinking ones which I have little experience with. I don't see how reasoning would enable deep thinking models to meaningfully evaluate textbook pedagogy, either.
They DO understand what they are doing. When I ask one to solve math problems, it works through the many steps involved (e.g. "apply the chain rule" while doing partial differentiation on a term in a Jacobian matrix), step by step, in absolutely ridiculous detail. It gets pretty tedious when solving systems of linear equations, where it walks through a Gauss-Jordan elimination or an LU decomposition row by row. But one learns to ignore the blah-blah. The point: they absolutely 100% understand what they are doing, and understand it in minute detail.
It's clearly NOT regurgitating something that it has literally seen before, because the level of detail is beyond ridiculous for a human. It is applying generalized rules to specific concrete problems, and doing so with some level of strategic thinking.
Where did it learn those generalized principles, and how did it learn to apply them? There are certainly math textbooks among the materials these models were trained on, and that is almost certainly the source. How did they learn to generalize and think strategically? Well, that's the big mystery, isn't it? But they do.
The very best models achieve high scores on Math Olympiad problem sets (so, competitive with some of the best minds on the planet). And Terence Tao (arguably the greatest living mathematician) has declared state-of-the-art models to be "better than most of my post-graduate students".
And what they can do is expanding by leaps and bounds on a weekly or monthly basis right now. It's hard to keep up. I frequently find that they can do things this week that they could not do a week or a month ago. Startling, and quite utterly amazing.
Most of the time, I am using Claude Sonnet 4.5 as my coding agent, for which I pay $10/month. Its measured IQ is around 110, I think, rising to about 120 if you flip it into thinking mode. But only because there isn't enough undergraduate-level mathematics in a standard IQ test. Claude Sonnet 4.5 is also available for free here: https://claude.ai/chats (during periods of heavy load, it may fall back to simpler models). I often use the free web interface instead of the coding-agent interface for math problems, because it's easier to read mathematical equations in the browser version. And I generally use the free version of Claude instead of Google Search these days.
My experience with people who have LLM subscriptions of any kind is that they use them all the time and would immediately ask an LLM that kind of question, rather than asking on a web forum that's not even dedicated to math. So I think it's a fair presumption that someone asking that question doesn't have access to the best commercial models.
On the largely irrelevant question of what math LLMs can do: although Opus may do better, Sonnet can follow procedures sometimes, but not consistently. It has blind spots and can't scale procedures; beyond certain numbers, dimensions, or levels of problem complexity, it just guesses (wrongly). And those limits are quite low. Two simple examples:
4294967297*1331
Invert this matrix: m=[1 0 5 0 3 7; 2 3 0 3 3 2; 1 0 1 1 0 1; 3 5 3 5 1 2; 2 4 3 2 1 5; 1 0 5 2 1 5]
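Both examples are easy to check mechanically, which is exactly the point: a symbolic engine gets them right every time, while an LLM working token-by-token may not. As a minimal sketch (my own illustration, not anything an LLM produced), Python's arbitrary-precision integers handle the first, and a straightforward Gauss-Jordan elimination over exact rationals handles the second, verified by multiplying back to the identity:

```python
from fractions import Fraction

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination over exact rationals."""
    n = len(m)
    # Build the augmented matrix [A | I], with every entry a Fraction
    # so that each elimination step is exact (no floating-point error).
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Pivot: swap up a row with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row to make the pivot 1, then clear the column.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now A^-1.
    return [row[n:] for row in aug]

# Example 1: exact integer multiplication.
print(4294967297 * 1331)  # 5716601472307

# Example 2: invert the 6x6 matrix, then confirm A * A^-1 == I exactly.
m = [[1, 0, 5, 0, 3, 7],
     [2, 3, 0, 3, 3, 2],
     [1, 0, 1, 1, 0, 1],
     [3, 5, 3, 5, 1, 2],
     [2, 4, 3, 2, 1, 5],
     [1, 0, 5, 2, 1, 5]]
inv = invert(m)
product = [[sum(m[i][k] * inv[k][j] for k in range(6)) for j in range(6)]
           for i in range(6)]
identity = [[Fraction(int(i == j)) for j in range(6)] for i in range(6)]
print(product == identity)  # True
```

The verification step matters: rather than eyeballing 36 fractions, multiplying back to the identity confirms the whole computation at once, something an LLM's step-by-step narration does not give you.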
LLMs follow procedures, but whimsically. Better LLMs will be less whimsical, but they still won't be fully competent unless they translate questions into more formal terms and then hand them off to an engine like Wolfram.