The subtlety here is that NNs
do have a model, but it’s hard to see. Not just any neural network can perform as well as GPT-2; only a very specific architecture can. That architecture, coupled with the data it’s trained on, implicitly represents a model, but the model is wildly obscured by the details of the architecture.
In this sense, people like Sutskever think that GPT-2 is a step on the path towards discovering the “correct” model.
It’s probably difficult to make much more progress without making extremely crisp what you mean by a “model”, though, because I feel like it’s just as easy to move the goalposts about what it means to “understand” as about what it means to “model”.
For example, replace every instance of “a model” in your post with “an understanding”, and it parses almost identically.