You are clear but mistaken.
I give you points for creative thinking, but it’s important not to trust an inference just because it “feels correct.” No matter what your gut is telling you, I would happily bet $10k that the emergence of arithmetic has nothing to do with the things you mention.
If an alternative training scheme were devised that didn’t rely on any of that, it would still result in a model that behaved more or less the same as what we see here. The properties of the training process influence the result, but they don’t cause the result — that would be like saying your vocal cords cause you to be an excellent orator. Vocal cords don’t form the ideas; the training process doesn’t form the arithmetic.
What we’re seeing is a consequence of a large training dataset, which exposes the model to many tasks at once. The more tasks a model can perform, the better it is at any individual task, arithmetic included.