What made Napster illegal is that the company did not create its network for fair use of content, but to explicitly violate copyright for profit.
Copilot is like Napster in this case, in that both services launder copyrighted data and distribute it to users for profit.
Copilot is not like other P2P networks that exist to share data that is either free to distribute or can be used under the fair use doctrine. Copilot takes copyrighted content and distributes it to users in violation of licenses; that's its explicit purpose.
It's entirely possible to make a Copilot-like product that is trained on data that doesn't have restrictive licensing, in the same way it's entirely possible to create a P2P network for sharing files that you have the right to share legally.
So if you produce Napster 2.0 to be the best music piracy tool, and you test it for piracy, and you promote it for piracy... you're going to have trouble.
If you produce Napster 2.0 as a general-purpose file sharing system, let's call it a torrent client, and you can claim no ill intent... you may have trouble, but it's a lot more defensible in court.
I would find it a big stretch to say GitHub's intent here is to illegally distribute copyrighted code. No judgment on whether the class action has any merit; I'm just saying I would be very surprised if discovery turns up lots of emails where GitHub execs are saying "this is great, it'll let people steal code."
Almost everything on GitHub is subject to copyright, except for some very old works (maybe something written by Ada Lovelace?) and US government works, which are not eligible for copyright.
Now, many of the works there are also licensed under permissive licenses, but that is only a defense to copyright infringement if the terms of those licenses are being adequately fulfilled.
Agreed. Like I said, it's about intent. Can anyone say with a straight face that Copilot is an elaborate scheme to profit by duplicating copyrighted work?
I don't think the defense is that it wasn't trained on copyrighted data. It obviously was.
I think the defense is that anything, including a person, that learns from a large corpus of copyrighted data will sometimes produce verbatim snippets that reflect their training data.
So when it comes to copyright infringement, are we moving the goalposts to where merely learning from copyrighted material is already infringement? I'm not sure I want to go there.
The issue isn't downloading copyrighted stuff.
Rather, it's making it available and letting others download it. That was where you got in trouble.
People used to get busted for buying bootleg VHS tapes and DVDs on the street before P2P filesharing was a common thing. Then, early on, people were sued for downloading copyrighted files, before rightsholders switched to a different legal strategy of going after sharers and bootleggers.