Deep Symbolic Regression for Recurrent Sequences https://arxiv.org/abs/2201.04600
If you look at the embedding visualization, it is very clear that the model learns the order of the numbers.
(Interactive demo: http://recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com... )
There is also:
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets https://arxiv.org/abs/2201.02177
Again, the visualizations show very clearly that the model grasps the structure of the function it models.