A very, very big problem we have with LLM discourse is that LLMs have changed radically since the beginning of last year. If you're making an argument about modern foundation models based on the idea that they can't reliably answer whether 5.11 is greater than 5.9, your mental model is completely out of date.
You don't have to take my word for this; trust your own lying eyes. Go try it yourself right now: ask a current model for dy/dx of h(x)/g(x), where h(x) = x^3 + 1 and g(x) = -2e^x. That's a random Math Academy review problem I did last night, pulled out of Notes.app. Go look.
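If you want something to check the model's answer against, here is the problem worked by hand with the quotient rule, d/dx [h/g] = (h'g - hg')/g^2. This is just one way to get there. A model may legitimately present an equivalent form, such as -(1/2)e^{-x}(3x^2 - x^3 - 1).

```latex
\begin{aligned}
h'(x) &= 3x^2, \qquad g'(x) = -2e^x \\
\frac{dy}{dx} &= \frac{h'(x)\,g(x) - h(x)\,g'(x)}{g(x)^2}
             = \frac{3x^2\,(-2e^x) - (x^3 + 1)(-2e^x)}{4e^{2x}} \\
             &= \frac{2e^x\,(x^3 - 3x^2 + 1)}{4e^{2x}}
             = \frac{x^3 - 3x^2 + 1}{2e^x}
\end{aligned}
```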