Irrelevant and incorrect.
> It doesn't seem like piracy to me.
It's pretty indisputably piracy, whether or not it's legal/fair use/whatever. Many of the training sets included material like the books3 corpus which was downloaded to a server somewhere. That is simply piracy, doesn't matter why they downloaded it.
I believe many artists rightly refuse to accept this threat to their livelihoods because it was built on their labor. It's so fucking rich to see people patronizingly suggest that this is just an economic problem and those artists better just figure out a new profession.
You built a commercial product on unlicensed data. Do you actually think the law is going to agree that that's fair use?
Ah, this is obviously some strange usage of the word 'indisputably' that I wasn't previously aware of.
> I believe many artists rightly refuse to accept this threat to their livelihoods because it was built on their labor.
This model is trained from scratch using only public domain/CC0 and copyrighted images with specific permission for use: https://huggingface.co/Mitsua/mitsua-diffusion-one
Does it change anything?
If all the other models were deleted, and this was the only one left, and all future models also had to be similarly licensed, would it change even one single point?
Even if it was the only remaining model and this kind of licensing a requirement for all future work, artists would still be automated out of their highly skilled yet poorly paid profession. It still sucks. There's still no nice way to convey that.
> You built a commercial product on unlicensed data. Do you actually think the law is going to agree that that's fair use?
What do you think the Google search engine is, if not a commercial product built on unlicensed data?
The courts go both ways on this specific question with Google depending on the exact details, because nothing in law is as easy or simple as the clear-cut, goodies-vs.-baddies, black-and-white morality play you want this to be.
The fact that Stability AI have not yet been sued out of existence in a simple open-and-shut court case about copyright infringement ought to have demonstrated both this point, and also that the question "is this piracy?" is, in fact, disputable.
It seems incredible to me to suggest that piracy wasn't involved in the collection of training data, regardless of your view on the morality or legality of it. Datasets like books3 indisputably contained copyrighted content that was being distributed without permission from the rightsholder. That's just the definition of piracy. If we can't agree on that then I'm not sure what we're doing here.
More materially to this discussion, yes, it would absolutely make a difference if the AI was only trained on licensed content. I wouldn't use it but I wouldn't have a problem with it. The issue is specifically that much of this work is being used, without permission, to replace the people who made it. If the model is based on ethically acquired data, it would be less able to reproduce the style of specific artists. Imo, there would be more room for both kinds of art in this case.
I'm also aware that it's not a clear-cut case legally, but I think AI advocates and tech enthusiasts believe AI is a lot more likely to win in court than it actually is. Napster took years to litigate and was eventually shut down. There's a really good discussion about this on the Decoder podcast between actual lawyers.
https://transparencyreport.google.com/copyright/overview?hl=...
> It seems incredible to me to suggest that piracy wasn't involved in the collection of training data, regardless of your view on the morality or legality of it. Datasets like books3 indisputably contained copyrighted content that was being distributed without permission from the rightsholder.
Is the Google search engine piracy?
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
https://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Amazon.com....
https://en.wikipedia.org/wiki/Field_v._Google,_Inc.
https://9to5google.com/2016/04/27/getty-images-google-piracy...
https://www.reuters.com/article/idUSN07281154/
> That's just the definition of piracy. If we can't agree on that then I'm not sure what we're doing here.
It literally isn't the definition of piracy.
Piracy exists only with regard to the legal definition: "Copyright infringement (at times referred to as piracy) is the use of works protected by copyright without permission for a usage where such permission is required, thereby infringing certain exclusive rights granted to the copyright holder, such as the right to reproduce, distribute, display or perform the protected work, or to make derivative works."
Even this definition annoys a lot of people, but I will ignore the whole "it's not theft because you're not depriving the original owner of anything" as a case of taking an analogy too literally.
> More materially to this discussion, yes, it would absolutely make a difference if the AI was only trained on licensed content. I wouldn't use it but I wouldn't have a problem with it. The issue is specifically that much of this work is being used, without permission, to replace the people who made it. If the model is based on ethically acquired data, it would be less able to reproduce the style of specific artists. Imo, there would be more room for both kinds of art in this case.
Congratulations on being consistent, but almost all the artists and authors are still permanently out of work.
Even ignoring that style isn't covered by copyright (because you could reasonably argue instead that it's a trademark and/or design right issue), most artists are already extremely poor due to oversupply by other humans.
> I'm also aware that it's not a clear-cut case legally, but I think AI advocates and tech enthusiasts believe AI is a lot more likely to win in court than it actually is. Napster took years to litigate and was eventually shut down. There's a really good discussion about this on the Decoder podcast between actual lawyers.
FWIW, I know better than to trust my own beliefs[0] about law, as (free) ChatGPT is simultaneously bad, and yet vastly better at it than me.
Likewise, I think (but hold the view weakly) the mere existence of AI at even the level it was before ChatGPT's first release, is going to force a radical change in the nature of IP laws — even then these models were too good-and-cheap for countries to not allow them, while also breaking a lot of the current assumptions about everything: https://benwheatley.github.io/blog/2022/10/09-19.33.04.html
[0] I really ought to get a T-shirt printed with "Wittgenstein was wrong!"; there are so many different ways I don't accept one of his famous quotes: https://philosophy.stackexchange.com/questions/72280/first-p...