I wouldn't trust the LLM's raw output to be correct, but math is provable. If there were a filter between the LLM's output (in some rigid/structured format, not free-form text) and whatever the user sees, a filter that tries to prove the LLM's output correct (and retries if the proof fails[0]), then I can see LLMs being perfectly fine for that.
In fact I'd say that, in general, anything an LLM produces that can be "statically checked" in some way can be fine to rely on. You most likely need more than a chat interface, but I think it's plausible for such solutions to exist.
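Something like this loop, as a minimal Python sketch. The names `propose` and `check` are made up for illustration: `propose` stands in for the model (here just a toy that's wrong on its first try), and `check` is the independent verifier sitting between the model and the user:

```python
def propose(question, attempt):
    # Stand-in for the LLM: returns a structured answer, not free-form text.
    # This toy version is deliberately wrong on the first attempt.
    a, b = question
    return a + b + (1 if attempt == 0 else 0)

def check(question, answer):
    # The "filter": independently verifies the claim, never trusting the model.
    a, b = question
    return answer == a + b

def verified_answer(question, max_attempts=5):
    # Cap the retries so a persistently wrong model can't loop forever.
    for attempt in range(max_attempts):
        answer = propose(question, attempt)
        if check(question, answer):
            return answer
    raise RuntimeError("no verified answer within the attempt budget")

print(verified_answer((2, 3)))
```

Only answers that pass `check` ever reach the user; the attempt cap is what guards against the infinite-loop failure mode mentioned in the footnote.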
[0] Hopefully it won't just fail every time and end up in an infinite loop :-P