As I see it, we do not really know much about how GPT does what it does. The function class it approximates is very general, so we do not really know what is actually being computed. I take issue with people dismissing it as "pattern matching" or "staying close to the training data": to generalise at all, training pushes the model toward the most general rules, and with increasing capacity it learns the most general, simplest computations (for some notion of "simple" and "general").
But we do have fundamental, mathematical bounds on the LLM. We know that a forward pass costs at most O(n^2) in the token length n (from self-attention), and in practice often closer to O(n). It cannot "think" about a problem for an unbounded number of steps or recurse into simulating games; within a single pass it cannot simulate open-endedly. It's an interesting frontier, especially because we also have nice results on the theoretical, universal approximation capabilities of RNNs.
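The O(n^2) bound comes from the attention score matrix, which has one entry per pair of tokens. A minimal sketch (illustrative names, single head, no batching or masking) that makes the quadratic term visible:

```python
import numpy as np

def attention_flops(n, d):
    # Naive single-head self-attention cost: Q @ K.T is (n, d) x (d, n),
    # i.e. n*n*d multiplications, plus (n, n) x (n, d) for the output.
    # Both terms are quadratic in the sequence length n.
    return n * n * d + n * n * d

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a length-n sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])     # (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                        # (8, 4)
print(attention_flops(2 * n, d) / attention_flops(n, d))  # 4.0: doubling n quadruples cost
```

Doubling the context length quadruples the cost of the score matrix, which is exactly the sense in which the model's per-pass compute budget is bounded by the input length.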