The network would have to grasp the "abstract" concept of a number, but it clearly did not.
This paper shows fairly conclusively that the network "groks" modular addition. That is what it learns, not "general arithmetic".
Deep Symbolic Regression for Recurrent Sequences https://arxiv.org/abs/2201.04600
(Interactive demo: http://recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com... )
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets https://arxiv.org/abs/2201.02177
Both of these models can generalize to numbers they have not seen.
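To make "modular addition" concrete, here is a rough sketch of the kind of task set up in the Grokking paper: every equation a + b = c (mod p) is enumerated, and only a fraction is shown to the network during training. The rest is held out. The helper name `modular_addition_dataset`, the modulus p = 97, and the 50% split are illustrative assumptions, not the paper's exact code. This builds only the data; it does not train a model.

```python
import random

def modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    """Hypothetical helper: enumerate all equations (a, b) -> (a + b) mod p
    and randomly split them into disjoint train/test sets.
    p=97 and train_fraction=0.5 are illustrative assumptions."""
    equations = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(equations)
    n_train = int(train_fraction * len(equations))
    return equations[:n_train], equations[n_train:]

train, test = modular_addition_dataset()
print(len(train), len(test))  # 4704 4705
```

Note that the held-out set contains unseen *pairs* (a, b), not unseen numbers: every residue 0..96 still appears as an input somewhere in training. "Generalization" here means getting those unseen pairs right.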