Granting that, the original point was that they're not excited about this particular paper unless (for example) it improves the networks' general reasoning abilities.
The problem was never "my llm can't do addition" - it can write python code!
The problem is "my llm can't solve hard problems that require reasoning"