The other parts seem unlikely. The model has no knowledge of number bases, except insofar as they appear in the training set. I saw this in our GPT chess work: even with strange tokenization, it learned chess notation well.
I give you points for creative thinking, but it’s important not to make inferences that “feel correct.” No matter what your gut is telling you, I would happily bet $10k that the emergence of arithmetic has nothing to do with the things you mention.
If an alternative training scheme were devised that didn’t rely on any of that, it would still result in a model that behaved more or less the same as what we see here. The properties of the training process influence the result, but they don’t cause the result — that would be like saying your vocal cords cause you to be an excellent orator. Vocal cords don’t form the ideas; the training process doesn’t form the arithmetic.
What we’re seeing is a consequence of a large training dataset. The more tasks a model can perform, the better it is at any individual task.