I’ll concede that if you tokenized the equations correctly, you might be able to get a language model to learn arithmetic, since it’s just symbol manipulation; but to make the leap that a general text model has learned anything like arithmetic is more than two bridges too far.
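To make the tokenization point concrete, here's a toy sketch (my own illustration, with made-up merge rules, not any real tokenizer): character-level tokenization keeps every digit aligned, while subword-style merges can glue digits together inconsistently, so the same number gets split differently in different contexts.

```python
def char_tokenize(expr):
    """Digit-level tokenization: every character is its own token."""
    return list(expr)

def bpe_like_tokenize(expr, merges=("12", "23", "+3")):
    """Crude stand-in for subword merges: greedily merge any adjacent
    pair found in the (hypothetical) merge table."""
    tokens = list(expr)
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens) - 1):
            if tokens[i] + tokens[i + 1] in merges:
                tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
                changed = True
                break
    return tokens

print(char_tokenize("123+45="))      # ['1', '2', '3', '+', '4', '5', '=']
print(bpe_like_tokenize("123+45="))  # ['12', '3', '+', '4', '5', '=']
```

With the character-level scheme, carrying a digit is a uniform operation over a fixed vocabulary; with merge-based tokens, "123" and "45" don't even decompose into comparable units, which is part of why arithmetic from raw text is a different claim than arithmetic from well-tokenized equations.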
While deep learning language models are useful in certain cases (e.g. translation and autocomplete), and are better at producing superficially grammatical text than previous models, they are most emphatically not learning anything about general concepts. They can’t even produce coherent text for more than a paragraph, and even then it’s obvious they have no idea what any of the words actually mean.
These large language models are the MOST overhyped piece of AI I’ve seen in my professional career. The fact that they’re neural nets redux is just the chef’s kiss.