google234123 4y ago

Copilot was trained on leaked internal Microsoft code that's on GitHub at the moment. Anyway, everyone seems perfectly OK with training language models on copyrighted text.

dylan604 4y ago

Everyone is not perfectly OK with training language models on copyrighted text. It's just that evilCorps do it anyway, and there's nothing anyone can do to stop them. I can't do anything. At best, I could get a Twitter account and complain to the ether. The copyright holders can't do anything against the mighty evilCorps, but that doesn't make them okay with it. The fact that you believe this is just sad, and exactly what evilCorps want from you.

This goes beyond fair use or satirical/comedic effect. They are training their models to output text in the style of the authors being absorbed. That style is exactly the artistic effect that is being copyrighted.

gradys 4y ago

Could you explain why you think training models on copyrighted text is illegal or copyright infringement or whatever else it might be?

klyrs 4y ago

Training the models is fine. Applying the models, which reproduces copyrighted works without proper attribution, is where it gets sticky.

dylan604 4y ago

My explanation will not be popular here on HN, but I'm never one to shy away. Especially when asked directly.

Buying a book, an audio CD, or a DVD/Blu-ray grants the holder permission to read, listen to, or view that product as a single instance. You can lend them out, but that's all you're really allowed to do with them. The text, audio, or video is not owned by you to do with as you please. People obviously do not like that, and argue making copies/backups is their right. Maybe that's acceptable, but we can agree that posting them on torrents, or sharing a copy made from the thing you have in any other manner, is not.

That said, training a model on someone's copyrighted text is not part of the agreement covering the usage of said text, whether it's a copyrighted magazine, newspaper, or book. If the people doing the training reach out to the copyright holders and get specific permission to use their copyrighted material in such a manner, then go ahead. The fact that people feel like they can do anything without the common courtesy of asking for permission is troubling to me that we've lost something as a society. There's no acknowledgment that someone has created something by their own work, and that the creator can do with it as they please. A large portion of people believe that because it was created, they deserve or should be able to do whatever they want with someone else's creation, including getting paid for derivative works based on the original creation.

Karrot_Kream 4y ago

> The fact that people feel like they can do anything without the common courtesy of asking for permission is troubling to me that we've lost something as a society.

I see this sentiment a lot in FOSS spaces but I don't really understand why. The role of judicial process _isn't_ to provide a guiding moral philosophy around social organization. Depending on the government in question that's either a role of government functions or isn't something that should be guided at all. The role of law often (and yes, not in all governments, but at least in the US) is to offer a contract between the state and the individual.

I understand the potential for abuse here in using Copilot to regurgitate licensed works without adhering to the terms of the work's license, but I'm not fluent enough in law to know if this is illegal or not. Calling out and specifically applying strict limits to this practice is certainly something I'm sympathetic to, and I'm very curious to see what the courts come up with. But swayed by a moral argument I am not.

alpaca128 4y ago

> People obviously do not like that, and argue making copies/backups is their right.

In some jurisdictions this is in fact their right by law as long as they own the original (the music/film industry of course used this as an excuse to slap additional fees on every sale of any storage medium). Redistribution is different however.

liamwire 4y ago

> My explanation will not be popular here on HN

How is this better than ’bring on the downvotes’?

Moving on, I’ll put this to you: you claim training an ML model against copyrighted text is in violation of the ‘permission’ granted by the rights holder. However, flip this on its head for a moment – that’s basically all human brains do. Clearly, the greatest writers of our time haven’t written their works in a vacuum. Rather, that historical reading and inspiration becomes sufficiently obfuscated that we deem something creative enough to be granted its own copyright.

Fundamentally, how does Copilot differ, other than perhaps being a poor implementation? Is it by not being ‘adequately creative’ enough? Is there some future version you could envision that would be, or is it the principle you’re arguing against?

leereeves 4y ago

If a trained language model exactly reproduces copyrighted text, is there any question about whether copyright still applies?

benhurmarcel 4y ago

But then the infringement is done by the person who publishes that output, not by the text editor that copies the code.

TchoBeer 4y ago

This is a useless hypothetical; no language models do that.

heavyset_go 4y ago

And yet there are plenty of examples of Copilot reproducing copyrighted code verbatim, like it does in this example[1] that was posted on HN.

[1] https://twitter.com/mitsuhiko/status/1410886329924194309

jen20 4y ago

This is precisely what Copilot does, regularly.
