It would be interesting to see a court rule that the output of LLMs trained on copyleft code is licensed under the GPL ... and all other viral licenses simultaneously.
It is quantum legality: using copyrighted input is legal or illegal depending on the observer.
If the LLM reproduces a human's copyrighted work, then that copyright still stands. This is, in effect, the same as photocopying someone else's writing. The LLM was trained on the copyrighted work, is incapable of producing new copyrightable work, so if it duplicates the original work then the original author's copyright still stands.
I am not a lawyer
The courts have repeatedly said that copyright only applies to human creativity. The Supreme Court refused to hear the appeal in Thaler v. Perlmutter, which leaves the refusal of registration in place:
https://en.wikisource.org/wiki/Thaler_v._Perlmutter,_Refusal...
> "We affirm our decision to refuse registration for the Work because it lacks the human authorship necessary to be eligible for copyright protection."
So they're saying that the LLM cannot be the author, because LLMs cannot claim copyright.
The related case about patents is more supportive of the narrative that AIs cannot be authors (see https://www.cafc.uscourts.gov/opinions-orders/21-2347.OPINIO...), specifically: "Here, there is no ambiguity: the Patent Act requires that inventors must be natural persons; that is, human beings."
The patent situation is that the Act says an inventor must be an individual, which the courts are interpreting to mean a human, so the LLM cannot be named as the inventor. So, in this case, yes, this is just saying that an LLM cannot be named as the inventor of a patent. That's not the same thing as what the courts are saying about copyright.
They're saying that the LLM can't be the author.
Now suppose you supply the LLM with a prompt that contains human creativity, and it performs a deterministic mathematical transformation on the prompt to produce a derivative text. You want to copyright that text, claiming yourself as the author. What happens then?
If you think the answer is that you can't, how do you distinguish that from what happens when someone writes source code and has a compiler turn it into a binary computer program? Or do you think that e.g. Windows binaries can't be copyrighted because they were compiled by a machine?
So now consider two questions:
1. You actually didn't use an LLM, but they believe & claim you did. Who has the burden of proof to show that you actually own the copyright, and how do they do so?
2. They write new code that you feel is based on yours. They claim they washed it through an LLM, but you don't believe so. Who has the burden of proof here and how do they do so?