But, for a subset of topics, say, math and logic, a minimal set of core principles (axioms) is theoretically sufficient to derive the rest. For such topics, it might actually make sense to feed the output of a (very, very advanced) LLM back into itself. No reference to the real world is needed - only the axioms, and what the model knows (and can prove?) about the mathematical world as derived from those axioms.
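To make the axioms-only framing concrete, here is a toy illustration in Lean 4 (picked here only as an example proof assistant, not something the article mentions): a fact about the natural numbers that is verified entirely inside the formal system, with no reference to the real world.

```lean
-- 1 + 1 = 2 follows from the definitions of the natural numbers
-- and addition alone; `rfl` asks the kernel to check it by
-- computation, appealing to nothing outside the axioms and rules.
theorem one_add_one_eq_two : 1 + 1 = 2 := rfl
```

In principle, a model generating and checking statements like this could train on its own verified output, since the proof checker, not the world, is the source of ground truth.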
Next, who's to say that a model can't "build theory", as hypothesized in this article (via the example of arithmetic)? If the model is fed a large amount of (noisy) experimental data, can it derive a theory that satisfactorily explains all of it, thereby compressing the data down to the theory's predictions plus residual noise? Could a hypothetical super-model iteratively derive more and more accurate models of the world via recursive training, assuming it is given access to the raw experimental data?
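The "theory as compression" idea can be sketched in a few lines. This is a deliberately tiny stand-in, not anything a super-model would do: noisy measurements generated from a hypothetical linear law, a least-squares fit that "derives the theory" (two parameters), and a check that what remains after subtracting the theory's predictions is much smaller than the raw data, so storing parameters plus residuals compresses better than storing the data directly.

```python
import random

# Hypothetical "experimental data": noisy observations of an
# underlying law y = 2x + 1 (the model doesn't know this law).
random.seed(0)
xs = [i * 0.1 for i in range(100)]
ys = [2 * x + 1 + random.gauss(0, 0.3) for x in xs]

# "Derive a theory": a closed-form least-squares fit recovers
# the law's two parameters from the noisy data.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Compression: the data is now 2 parameters + small residuals.
# Keeping the residuals exactly is lossless; dropping them
# (keeping only the theory) is the lossy version.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
var_raw = sum((y - mean_y) ** 2 for y in ys) / n
var_resid = sum(r * r for r in residuals) / n

print(slope, intercept)        # close to the true 2 and 1
print(var_resid < var_raw)     # residuals carry far less variance
```

The better the derived theory, the less information the residuals carry, which is one way to score candidate theories against the raw data.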