If you're only training on a handful of works, then you're taking more from each of them, meaning it's not de minimis.
For the record, I got this legal theory from Cory Doctorow[0], but I'm skeptical. It's very plausible, but at the same time, we also thought sampling in music was de minimis until the Sixth Circuit said otherwise. Copyright law is extremely malleable in the presence of moneyed interests, sometimes even without Congressional intervention!
[0] who is NOT pro-AI, he just thinks labor law is a better bulwark against it than copyright
The word-probabilities are a transformative use, which is a form of fair use, so they aren't an issue.
The specific output at each point in time is what would be judged to be fair use or copyright infringing.
I'd argue the user would be responsible for ensuring they're not infringing by using the output in a copyright-infringing manner, i.e. for profit, as they've fed certain inputs into the model which led to the output. In the same way, you can't sue Microsoft when someone types up copyrighted works in Microsoft Word and then distributes them for profit.
De minimis is still helpful here; not all infringements are noteworthy.
The process involved in obtaining that end work is completely irrelevant to any copyright case. A claim can be against the model's weights (not possible, as it's fair use) or against the specific one-off output (less clear), but the two can't be looked at as a whole.
> Collateralised Copyright Liability
Is this a real legal / finance term or did you make it up? Also, I do not follow your leap to compare LLMs to CDOs (collateralised debt obligations). And do you specifically mean CDOs, or any kind of mortgage / commercial loan structured finance deal?
It's not merely a compressed version of a song intended to be used in the same way as the original copyrighted work; that would be copyright infringement.
It's up to you if that counts as "a handful" or not.
If we take math or computer science for example: some very important algorithms can be compressed to a few bits of information if you (or a model) have a thorough understanding of the surrounding theory to go with it. Would it not amount to IP infringement if a model regurgitates the relevant information from a patent application, even if it is represented by under a kilobyte of information?
If you were a director at a game company and needed art in that style, it would be cheaper to have the AI do it instead of buying from the artist.
I think this is currently an open question.
An AI-enhanced Photoshop, however, could do wonders, as the base capabilities seem to be mostly there. I haven't used any of the newer AI stuff myself, but https://www.shruggingface.com/blog/how-i-used-stable-diffusi... makes it pretty clear that the building blocks are largely in place. So my guess is the main disconnect is in making the machines understand natural language instructions for how to change the art.
I would think that if I can recognize exactly what song it comes from, it's not de minimis.