> In late 2013, after the class action status was challenged, the District Court granted summary judgment in favor of Google, dismissing the lawsuit and affirming that the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeals upheld the District Court's summary judgment in October 2015, ruling that Google's "project provides a public service without violating intellectual property law." The U.S. Supreme Court subsequently denied a petition to hear the case.
[...]
> The court's summary of its opinion is:
[...]
> Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use.
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
This doesn't touch on the ethics of course – at minimum I think allowing people to exclude themselves or their work from a dataset is necessary.
That's kind of what this whole article is about. Training the systems for research alone is arguably fair use, but creating the entire pipeline might not be, and the "loophole" here is trying to claim no responsibility for the training at the center of it because that was technically done by a 3rd party (...funded by the final creator of the full pipeline).
“… the revelations [i.e. the information served by Google Book Search] do not provide a significant market substitute for the protected aspects of the originals.”
This doesn’t apply to AI image generators which are clearly a “market substitute” for the protected originals used to train the system. For this reason I’d expect someone like Getty to want to revisit Authors Guild v Google sooner rather than later.
> It generates new audiences and creates new sources of income for authors and publishers.
This is definitely not the case for artists and photographers, who don't benefit at all from the transformative nature of the AI output, and in fact are significantly harmed since it dilutes the uniqueness of their work by allowing anyone to imitate their style. Though to my knowledge "style" isn't protected by copyright - only trademark - I can't imagine there won't be lawsuits about this in the future.
That one artist who complained that people can't find his original work online now because of so many imitated pics is definitely exhibit A in terms of direct harm.
It does seem like generative AI systems provide a significant market substitute, so this ruling probably wouldn’t apply in court.
edit: see https://news.ycombinator.com/item?id=33194623 for some initial thoughts on how this problem (and others) could be rectified.
For example, with a database of protected works and self-censorship algorithms for generative AI systems, conscientiously objecting creatives could have a mechanism for excluding their works.
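An exclusion mechanism like the one described above could, in its simplest form, be a registry of content fingerprints checked before training. This is a hypothetical sketch, not any real opt-out standard: the registry class, fingerprinting scheme, and function names are all assumptions for illustration.

```python
# Hypothetical sketch of an opt-out registry for training data.
# The registry format and hashing scheme are assumptions, not a real standard.
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint for a work (here: a plain SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()


class OptOutRegistry:
    """Database of works whose creators have excluded them from training."""

    def __init__(self):
        self._excluded: set[str] = set()

    def exclude(self, work: bytes) -> None:
        self._excluded.add(fingerprint(work))

    def is_excluded(self, work: bytes) -> bool:
        return fingerprint(work) in self._excluded


def filter_training_set(works, registry):
    """Drop any work the registry marks as excluded before training."""
    return [w for w in works if not registry.is_excluded(w)]
```

In practice an exact hash is trivially evaded by re-encoding or cropping a work, so a real system would need perceptual fingerprints; this only shows the shape of the mechanism.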
In all the talk about AI data laundering there really hasn't been any indication that the AI generated item substitutes for the item it's alleged to infringe on. Substituting for a whole profession and its practitioners doesn't enter into the concerns of copyright law. There might be some argument that it should (to "promote the progress of science and useful arts" as it were), but copyright law to my knowledge hasn't been used to prevent new tech from putting professionals as a whole out of business.
So is digitizing a copyrighted VHS and hosting it via torrents also fair use? It's transformative, the public display of the video is limited, and there is no market for VHS.
I don't get it; what's the difference, other than Google having deeper pockets than me?
or they could open it all up for everybody and stop protecting the rights of dead people (authors who died less than 70 years ago)
then again, that would make the publishers starve... but why pretend publishing corporations need food?
In my utopia, the end results are models containing the sum total of human output, available to everyone.
What I think is unconscionable is training the models on public works and then retaining them exclusively for private use.
Comment generated with gpt-neox prompt: Comment about AI and data collection and generation and its pitfalls, expressing concern, emphasis on professions, emphasis on automation, written by Stephen King, creative writing, award winning, trending on reddit, trending on hacker news, written by Greg Rutkowski, written by Zola, written by Voltaire, written by author, written by moyix.
(Just kidding, it wasn't AI generated but you see my point.)
Tell me how ML is different than the mind of a toddler ravenous for new information.
For every billion dollar start-up using data at scale, there are tens of thousands more researchers and hobbyists doing the exact same, producing wonderful results and advances.
If we stop this growth dead in the tracks, other countries more willing to look past the IP laws will jump ahead. And if Stability locks away their secret sauce, some new party will come and give away the keys to the kingdom yet again.
You can't block the signal. Except, of course, by legislating against it in some Luddite hope we can prevent the future from happening.
Instead of worrying that careers will end, we should look at this as the end of specialization. No longer do we need to spend 20,000 hours learning one thing to the exclusion of all others we'd like to try. Now we'll be able to clearly articulate ourselves with art, music, poetry. We'll become powerful beings of thought and expression.
Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.
[1] Maybe Disney would like you to pay more for a premium learning plan for your child, but thankfully that's not (yet) possible.
There is no known experimentally verifiable model of toddlers' brains, let alone one based on matrix multiplication and normalization. Developing such a model would be a noteworthy achievement.
Therefore these are different.
I'm a 20,000 hours person. Knowing what I know about what I do, it's real sad to see someone misunderstand what goes into creativity this egregiously. Prompt engineering is such an unbelievably watered-down "version" of making a painting. It's like writing a page (or even a folder!) of bullet points, handing it to a ghostwriter, and telling them to "put the end result between Shakespeare and Poe."
That's not unleashing your creative voice. Unleashing your voice and acquiring technical skills in a chosen field are the same thing. If you endlessly mixed all the prior classical works, it wouldn't matter how you weighted them; it won't spit out Mozart. You're stuck in the gamut of the model, between the maxima of each artist.
It's an incredible tool to generate stuff quickly, and to some extent it will help artists whose work depends on quantity over quality.
If a person published a work that clearly plagiarized or violated a patent, that person would be open to legal action.
I’m all for systemic change, but uses like this may end up having a chilling effect on human-created work.
It’s not that AIs are too good. They look like crude knockoff products to trained eyes. And crude knockoffs are usually considered bad things.
The toddler is human. AIs are not humans.
It's a human right to learn. Non-humans don't (and shouldn't) have human rights.
>Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.
Spoken like a true evolutionary loser.
Well, I can't keep a toddler in a data center, pumping out work on demand. Or copyright it and limit who it chooses to work for when it grows up.
For instance.
If Google Brain/DeepMind were to crack AGI, it would make Google/Alphabet crazy rich to the detriment of millions of YouTubers, book authors, musicians, and drivers.
AI will concentrate power and wealth to fewer individuals.
If I have a photographic memory and I memorize the Coca Cola logo and then draw it into a commercial work by decoding the firing of my neurons into muscle movements, the storage and retrieval method I used has no bearing on whether I infringed on their copyright.
What? It's clearly a derived work.
It is absolutely not clear when a statistical model stops counting n-grams and starts making a derived work.
I can write code to get a list of characters in the book, get their page numbers analysed and draw graphs to help me create my own version. Am I breaking copyright laws? Most likely not.
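The kind of non-expressive, statistical analysis described above can be sketched in a few lines; this is an illustrative example of counting word n-grams over a text, producing statistics about the work rather than a reproduction of it (the function name and parameters are my own, not from any particular system).

```python
# Illustrative sketch: counting word n-grams over a text.
# The output is statistics about the work, not a copy of its expression.
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count word n-grams in `text` (case-insensitive, whitespace-split)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


counts = ngram_counts("the cat sat on the cat", 2)
# counts[("the", "cat")] == 2 — the bigram appears twice in the input.
```

At what point an analysis like this, scaled up and inverted into a generator, crosses into producing derived works is exactly the grey area in question.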
It's a truly grey area which lawmakers never saw coming.
I believe that if events unfold well, we'll eventually come to treat AI tools like sharp knives: it will be up to the user what they do with them.
Most of the predictions in that first comment came true.
This is a very strong and likely inaccurate presumption.
The existing publicly available datasets, algorithms, and weighted models should certainly be expected to be permanently in the hands of some non-law-abiding parties at this point.
I think that it will be important to ensure that we have symmetric information, going forward, otherwise trying to put the genie back in the bottle may just end up further disadvantaging those that try to follow the rules.
Laundering private things through the commons doesn't feel as shady as laundering them through private networks. The commons benefits too.
It's more like open source than money laundering.