Don't mistake the "what" for the "how". What we ask LLMs to do is predict tokens. How they're any good at doing that is a harder question to answer, and how they keep getting better at it, even with the same training data and model size, is even less clear. We don't program them; we have them train themselves. And there are a huge number of hidden variables that could be encoding things in weird ways.
These aren't n-gram models, and you're not going to make good predictions treating them as such.
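To make the contrast concrete, here's a minimal sketch of what an n-gram model actually is: pure count-and-lookup over observed sequences, with zero generalization to anything unseen. (The tiny corpus and function names here are hypothetical, just for illustration.)

```python
from collections import Counter, defaultdict

# A bigram model: "prediction" is just a frequency table lookup.
corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    # Return the most frequent continuation observed after `prev`,
    # or None if `prev` was never seen. No hidden state, no
    # generalization — memorized counts are all there is.
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

print(predict("the"))  # "cat" — it followed "the" twice, "mat" once
print(predict("dog"))  # None — never seen, so the model has nothing
```

Every prediction here is fully explained by the table; there's nothing hidden to reverse-engineer. That's exactly what stops being true once the model is a trained network rather than a count table.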