This is immediately obvious if you look at it through a statistical learning lens and not the crystal ball of mysticism through which many view NNs.
"Play and reflection" is something else, which isn't distillation.
Given this, there's no reason it couldn't even be straightforward to produce a child model from (filtered) parent output that exceeds the parent model on a different, more meaningful objective, like being a useful chatbot. There's no reason this would have to be limited to domains with verifiable answers, either.
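To make that concrete, here is a minimal simulation of the filtering argument, with every number invented for illustration (the parent's 70% per-question accuracy, the 9-sample majority filter, the error model); it is a sketch, not any real training pipeline. The filter never consults ground truth, yet the filtered labels come out more accurate than a single parent sample, which is exactly the headroom a child trained on them needs to exceed its parent:

```python
import random
from collections import Counter

random.seed(0)

P_PARENT = 0.7            # assumed parent accuracy per question (made up)
SAMPLES_PER_QUESTION = 9  # how many parent samples the filter aggregates (made up)

def parent_answer(true_answer):
    """Simulated parent model: right with probability P_PARENT, else off by a little."""
    if random.random() < P_PARENT:
        return true_answer
    return true_answer + random.choice([-2, -1, 1, 2])

def filtered_label(true_answer):
    """Automated curation with no verifier: sample the parent repeatedly, keep the majority."""
    votes = Counter(parent_answer(true_answer) for _ in range(SAMPLES_PER_QUESTION))
    return votes.most_common(1)[0][0]

questions = [(random.randint(0, 500), random.randint(0, 500)) for _ in range(50_000)]

parent_acc = sum(parent_answer(a + b) == a + b for a, b in questions) / len(questions)
filtered_acc = sum(filtered_label(a + b) == a + b for a, b in questions) / len(questions)

print(f"single parent sample: {parent_acc:.1%}")   # ~70%
print(f"majority-filtered:    {filtered_acc:.1%}")  # well above 70%
```

Train the child on the filtered labels and its ceiling is the filtered accuracy, not the parent's; and since majority voting needs no external answer key, the same trick is available outside verifiable domains.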
It is not distillation. It's like how you can arrive at new knowledge by reflecting on existing knowledge.
Unfiltered? Sure. With human curation of the generated data, it certainly can. (Even automated curation can do this, though it's more obvious that human curation can.)
I mean, I can randomly generate factual claims about addition, and, if I curate which ones go into a training set, train a model that reflects addition of integers far more accurately than the random process that generated the pre-curation input data.
Without curation, as I already said, the best you get is a distillation of the source model, which is highly unlikely to be more accurate.
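Here is that addition example as runnable toy code, with the generator, the exact-arithmetic filter, and the sizes all made up for illustration. The raw claims are almost all wrong (roughly one in 201 lands on the right sum), the curated subset is correct by construction, and the gap between the two is the information curation adds before any training happens:

```python
import random

random.seed(0)
MAX_OPERAND = 100

def random_addition_claims(n):
    """Randomly generate claims of the form 'a + b = c', with c a blind guess."""
    return [(random.randint(0, MAX_OPERAND),
             random.randint(0, MAX_OPERAND),
             random.randint(0, 2 * MAX_OPERAND))
            for _ in range(n)]

def curate(claims):
    """Curation step: keep only the claims that actually check out."""
    return [(a, b, c) for a, b, c in claims if a + b == c]

def accuracy(claims):
    """Fraction of claims that are true."""
    return sum(a + b == c for a, b, c in claims) / len(claims) if claims else 0.0

raw = random_addition_claims(200_000)
curated = curate(raw)

print(f"raw generated claims: {accuracy(raw):.2%} correct")      # ~0.50%
print(f"curated training set: {accuracy(curated):.2%} correct")  # 100.00% by construction
```

A model fit to `curated` reflects addition far better than the process that generated `raw`; strip the curation step and all that's left to learn is the generator itself.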
That is the existential, $1T question.
Also, can I have some money to build more data centres pls?