But even so:
> In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them.
I don't know what it means, for example, for a deep neural network to "more closely resemble" the underlying process of the weather. The claim is also obviously false in general: given a mechanical clock and a quartz-crystal analog clock, you cannot derive the internal workings of either, or distinguish between them, from the hand positions alone. The same is true of two different pseudo-random number generator circuits that produce the same output.
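To make the point concrete, here is a minimal sketch (Python, with hypothetical LCG constants chosen only for the demo) of two structurally different mechanisms that emit identical output sequences:

```python
# Two different "mechanisms" producing the same observable output:
# one iterates a recurrence step by step, the other jumps straight to
# each term with a closed-form solution. From the outputs alone you
# cannot tell which internal mechanism produced them.

A, C, M = 1103515245, 12345, 2**31 - 1  # hypothetical LCG parameters (M prime)

def stepwise(seed, n):
    """Mechanism 1: iterate x <- (A*x + C) mod M."""
    x, out = seed, []
    for _ in range(n):
        x = (A * x + C) % M
        out.append(x)
    return out

def closed_form(seed, n):
    """Mechanism 2: compute term k directly via
    x_k = A^k * seed + C * (A^k - 1) / (A - 1)  (mod M)."""
    inv = pow(A - 1, -1, M)  # modular inverse of (A - 1), valid since M is prime
    return [(pow(A, k, M) * seed + C * (pow(A, k, M) - 1) * inv) % M
            for k in range(1, n + 1)]

print(stepwise(42, 5) == closed_form(42, 5))  # -> True
```

Matching the "surface statistics" here perfectly still tells you nothing about which of the two internals you are looking at.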
> I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy.
I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors: an idealized ball rolling down an inclined plane, Galileo's thought experiment on falling masses, Kepler's laws, etc. Many of these models deliberately ignore less important details in order to focus on the fundamental ones.
> From an information theoretic angle I think it's similar to compression (something that ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where though) that learning (in both the ML and human sense) amounts to constructing a world model via compression and that rings true to me.
In practice you will get nowhere trying to recreate the internals of a cryptographic pseudo-random number generator from the output it produces (perhaps in theory you could, given infinite data and no bounds on computational complexity), even though the generator itself is highly compressible.
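A quick way to see the asymmetry: the generator's description is tiny, but its output looks incompressible to a general-purpose compressor. A rough sketch, using Python's standard `os` and `zlib` modules (with `os.urandom` standing in for CSPRNG output):

```python
import os
import zlib

# A few lines of generator code can emit output that a general-purpose
# compressor cannot shrink: the bytes carry no exploitable statistical
# structure, even though their *source* is highly compressible.
random_bytes = os.urandom(100_000)      # stand-in for CSPRNG output
patterned = b"the same phrase " * 6250  # 100,000 bytes of obvious structure

print(len(zlib.compress(random_bytes)))  # slightly *larger* than 100,000
print(len(zlib.compress(patterned)))     # collapses to a few hundred bytes
```

Compression finds structure in the output; it does not hand you the mechanism that produced it.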
> Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe.
Yes, but if the model does not lead to understanding, you cannot come up with the new ideas.