The work in this paper is very interesting, and your dismissal of it (“it can’t see a corpus and then generalize to n digits”) is uncalled for. They are training each model from scratch in 24 hours using only 20 million samples. It’s hard to equate that to anything a single human could do. It’s as though you had piles of accounting ledgers filled with sums, with no other information or knowledge of mathematics, numbers or the world, and you worked out how to do addition from that alone. It should also be noted that there is no textbook or tutor helping the model do this.
There is a form of generalization if it can derive an algorithm from operands of at most 20 digits that also works for 120 digits. Is it the same algorithm we use, where we limit ourselves to adding two digits at a time? Probably not, but it may emulate some of what we are doing.
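
For reference, here is roughly what I mean by the human procedure, as a minimal Python sketch. The function name `add_digit_strings` is my own invention for illustration, and nothing here is a claim about what the trained models actually compute. The point is only that the column-by-column rule never mentions the operand length, which is why it carries over from 20 digits to 120 unchanged.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Schoolbook addition on decimal strings: one column at a time.

    Each step looks at only two digits plus a carry, so the same rule
    applies no matter how long the operands are.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter operand with zeros

    carry = 0
    out = []
    # Walk the columns from least significant to most significant.
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))  # final carry becomes a new leading digit
    return "".join(reversed(out))


# Same rule, very different lengths.
short_sum = add_digit_strings("999", "1")
long_sum = add_digit_strings("9" * 120, "1")
```

Whether a model trained on sums ends up with anything resembling this loop is exactly the open question.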