Distilling weights from prompts and responses is even more of a legal grey area. The legal system cannot respond quickly to such technological advancements so things necessarily remain a wild west until technology reaches the asymptotic portion of the curve.
In my view the most interesting question is: do we really need vast data centers and innumerable GPUs for AGI? In other words, if intelligence is ultimately a function of power input, what is the shape of the curve?
Like you've said, it's still a somewhat gray area, and I personally have nothing against them (or anyone else) using copyrighted content to train models.
I do find it annoying that they're so closed-off about their tech when it's built on the shoulders of openness and other people's hard work. And then they turn around and throw hissy fits when someone copies their homework, allegedly.
Actually, unless the law changes, this is pretty settled territory in US law. AI output is not copyrightable and is therefore in the public domain. The only legal avenue of attack OpenAI has is a Terms of Service violation, which is a much weaker claim than copyright infringement, if it even holds.
According to a quick google search, the human body averages ~145 W of power (eating 3000 kcal/day). The brain needs ~20% of that, so ~29 W. Much less than our current designs of software & (especially) hardware for AI.
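A quick sanity check of those numbers, assuming the standard 1 kcal = 4184 J conversion and the commonly cited ~20% brain share:

```python
# Back-of-envelope check: convert daily calorie intake to average power.
KCAL_TO_JOULES = 4184
SECONDS_PER_DAY = 24 * 60 * 60

daily_intake_j = 3000 * KCAL_TO_JOULES          # 3000 kcal/day in joules
body_watts = daily_intake_j / SECONDS_PER_DAY   # average body power, ~145 W
brain_watts = 0.20 * body_watts                 # brain's ~20% share, ~29 W

print(f"Body: {body_watts:.0f} W, brain: {brain_watts:.0f} W")
```

So the ~145 W and ~29 W figures check out; note these are continuous averages, not "per day" amounts, since a watt is already joules per second.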
If they actually figured out how to use the output of existing models to build a model that outperforms them, then that brings us closer to the singularity than any other development so far.