Goalpost moving. "Reasonable" is just an arbitrary line, especially since most people, if not all, would make a mistake somewhere along the way.
You can greatly increase GPT's arithmetic capabilities by having it tackle the calculation like a problem worked out "on paper" in context. And this was done on 3.5, not 4.
https://arxiv.org/abs/2211.09066
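
To make the "on paper" idea concrete, here is a rough sketch of how you could generate worked addition examples to paste into the context. This is not the prompt format from the linked paper; the function names (`addition_scratchpad`, `build_prompt`) and the step wording are my own assumptions, and it only handles non-negative integers.

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Write out a + b digit by digit, right to left, with explicit carries.

    Hypothetical helper for illustration; assumes non-negative integers.
    """
    xs, ys = str(a)[::-1], str(b)[::-1]  # reversed so index 0 is the ones digit
    lines = [f"Problem: {a} + {b}"]
    carry = 0
    digits = []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        lines.append(
            f"Step {i + 1}: {x} + {y} + carry {carry} = {total}, "
            f"write {total % 10}, carry {total // 10}"
        )
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))  # leftover carry becomes the leading digit
    lines.append("Answer: " + "".join(reversed(digits)))
    return "\n".join(lines)


def build_prompt(examples: list[tuple[int, int]], question: tuple[int, int]) -> str:
    """Join worked examples, then pose the new problem for the model to continue."""
    worked = [addition_scratchpad(a, b) for a, b in examples]
    a, b = question
    return "\n\n".join(worked + [f"Problem: {a} + {b}"])


if __name__ == "__main__":
    print(build_prompt([(128, 367), (999, 1)], (4821, 7799)))
```

The point is that the model sees every intermediate carry spelled out, so it can imitate the procedure instead of guessing the final number in one shot.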