Distilling weights from prompts and responses is even more of a legal grey area. The legal system cannot respond quickly to such technological advancements so things necessarily remain a wild west until technology reaches the asymptotic portion of the curve.
In my view the most interesting question is: do we really need vast data centers and innumerable GPUs for AGI? In other words, if intelligence is ultimately a function of power input, what is the shape of the curve?
Like you've said, it's still a somewhat gray area, and I personally have nothing against them (or anyone else) using copyrighted content to train models.
I do find it annoying that they're so closed-off about their tech when it's built on the shoulders of openness and other people's hard work. And then they turn around and throw hissy fits when someone copies their homework, allegedly.
Actually, unless the law changes, this is pretty settled territory in US law. AI output is not copyrightable and is therefore in the public domain. The only legal avenue of attack OpenAI has is a Terms of Service violation, which is a much weaker claim than copyright infringement, if it even holds.
According to a quick google search, the human body averages ~145 W of power (eating 3000 kcal/day). The brain needs ~20% of that, so ~29 W. Much less than our current designs of software & (especially) hardware for AI.
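A quick sanity check of those numbers, assuming the standard 1 kcal = 4184 J conversion and the commonly cited ~20% brain share:

```python
# Back-of-envelope check: convert daily calorie intake to average power.
KCAL_TO_JOULES = 4184
SECONDS_PER_DAY = 24 * 60 * 60

daily_intake_j = 3000 * KCAL_TO_JOULES          # 3000 kcal/day in joules
body_watts = daily_intake_j / SECONDS_PER_DAY   # average body power, ~145 W
brain_watts = 0.20 * body_watts                 # brain's ~20% share, ~29 W

print(f"Body: {body_watts:.0f} W, brain: {brain_watts:.0f} W")
```

So the ~145 W and ~29 W figures check out; note these are continuous averages, not "per day" amounts, since a watt is already joules per second.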
If they actually figured out how to use the output of existing models to build a model that outperforms them, then that brings us closer to the singularity than any other development so far.