You would think having a massive scale just means it has infringed even more copyrights, and therefore should be in even more hot water.
But most people don't want to live in permanent mental distress due to shame of past action or fear of rebellion, I guess.
More generally, we tend to view the number of casualties in a war as one large number, and not as the sum of all the individual tragedies it represents, which we do perceive when fewer people die.
If you're only training on a handful of works then you're taking more from them, meaning it's not de minimis.
For the record, I got this legal theory from Cory Doctorow[0], but I'm skeptical. It's very plausible, but at the same time, we also thought sampling in music was de minimis until the Second Circuit said otherwise. Copyright law is extremely malleable in the presence of moneyed interests, sometimes without Congressional intervention even!
[0] who is NOT pro-AI, he just thinks labor law is a better bulwark against it than copyright
The word-probabilities are transformative use, a form of fair use, and aren't an issue.
The specific output at each point in time is what would be judged to be fair use or copyright infringing.
I'd argue the user would be responsible for ensuring they're not infringing by using the output in a copyright-infringing manner, i.e. for profit, as they've fed certain inputs into the model which led to the output. In the same way, you can't sue Microsoft for someone typing up copyrighted works in Microsoft Word and then distributing them for profit.
De minimis is still helpful here; not all infringements are noteworthy.
The process involved in obtaining that end work is completely irrelevant to any copyright case. It can be a claim against the model's weights (not possible, as it's fair use), or it can be a claim against the specific one-off output end work (less clear), but it can't be looked at as a whole.
> Collateralised Copyright Liability
Is this a real legal / finance term or did you make it up? Also, I do not follow your leap to compare LLMs to CDOs (collateralised debt obligations). And do you specifically mean CDOs, or any kind of mortgage / commercial loan structured finance deal?
It's not merely a compressed version of a song intended to be used in the same way as the original copyrighted work; that would be copyright infringement.
It's up to you if that counts as "a handful" or not.
If we take math or computer science for example: some very important algorithms can be compressed to a few bits of information if you (or a model) have a thorough understanding of the surrounding theory to go with it. Would it not amount to IP infringement if a model regurgitates the relevant information from a patent application, even if it is represented by under a kilobyte of information?
If you were a director at a game company and needed art in that style, it would be cheaper to have the AI do it instead of buying from the artist.
I think this is currently an open question.
An AI-enhanced Photoshop, however, could do wonders, as the base capabilities seem to be mostly there. I haven't used any of the newer AI stuff myself, but https://www.shruggingface.com/blog/how-i-used-stable-diffusi... makes it pretty clear the building blocks are largely there. So my guess is the main disconnect is in making the machines understand natural language instructions for how to change the art.
I would think that if I can recognize exactly what song it comes from, it's not de minimis.
My point isn't to argue merits of that case, it's just to point out that OP's joke is like a stereotypical output of an LLM: seems to make sense, but really doesn't.
1) the purpose and character of the use;
2) the nature of the copyrighted material;
3) the *amount* and *substantiality* of the portion taken; and
4) the effect of the use upon the *potential market*.
So in that regard, if you're training a personal assistant GPT and use some software code to teach your model logic, that is easy to defend as fair use.
But the extent of use matters: if you're training an AI for the sole purpose of regurgitating specific copyrighted material, that is infringement, provided the material is actually copyrighted. In this case, though, it is not a copyright issue; it is a matter of contracts and NDAs.