> In late 2013, after the class action status was challenged, the District Court granted summary judgment in favor of Google, dismissing the lawsuit and affirming that the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeals upheld the District Court's summary judgment in October 2015, ruling that Google's "project provides a public service without violating intellectual property law." The U.S. Supreme Court subsequently denied a petition to hear the case.
[...]
> The court's summary of its opinion is:
[...]
> Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use.
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
This doesn't touch on the ethics of course – at minimum I think allowing people to exclude themselves or their work from a dataset is necessary.
That's kind of what this whole article is about. Training the systems for research alone is arguably fair use, but creating the entire pipeline might not be, and the "loophole" here is trying to claim no responsibility for the training at the center of it because that was technically done by a 3rd party (...funded by the final creator of the full pipeline).
“… the revelations [i.e. the information served by Google Book Search] do not provide a significant market substitute for the protected aspects of the originals.”
This doesn’t apply to AI image generators which are clearly a “market substitute” for the protected originals used to train the system. For this reason I’d expect someone like Getty to want to revisit Authors Guild v Google sooner rather than later.
> It generates new audiences and creates new sources of income for authors and publishers.
This is definitely not the case for artists and photographers, who don't benefit at all from the transformative nature of the AI output, and in fact are significantly harmed since it dilutes the uniqueness of their work by allowing anyone to imitate their style. Though to my knowledge "style" isn't protected by copyright - only trademark - I can't imagine there won't be lawsuits about this in the future.
That one artist who complained that people can't find his original work online now because of so many imitated pics is definitely exhibit A in terms of direct harm.
It does seem like generative AI systems provide a significant market substitute, so this ruling probably wouldn’t apply in court.
edit: see https://news.ycombinator.com/item?id=33194623 for some initial thoughts on how this problem (and others) could be rectified.
For example, with a database of protected works and self-censorship algorithms for generative AI systems, conscientiously objecting creatives could have a mechanism for excluding their works.
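An exclusion mechanism like the one described above could, in its simplest form, be a registry of content fingerprints checked before training. This is a hypothetical sketch, not any real opt-out standard: the registry class, fingerprinting scheme, and function names are all assumptions for illustration.

```python
# Hypothetical sketch of an opt-out registry for training data.
# The registry format and hashing scheme are assumptions, not a real standard.
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint for a work (here: a plain SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()


class OptOutRegistry:
    """Database of works whose creators have excluded them from training."""

    def __init__(self):
        self._excluded: set[str] = set()

    def exclude(self, work: bytes) -> None:
        self._excluded.add(fingerprint(work))

    def is_excluded(self, work: bytes) -> bool:
        return fingerprint(work) in self._excluded


def filter_training_set(works, registry):
    """Drop any work the registry marks as excluded before training."""
    return [w for w in works if not registry.is_excluded(w)]
```

In practice an exact hash is trivially evaded by re-encoding or cropping a work, so a real system would need perceptual fingerprints; this only shows the shape of the mechanism.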
In all the talk about AI data laundering there really hasn't been any indication that the AI generated item substitutes for the item it's alleged to infringe on. Substituting for a whole profession and its practitioners doesn't enter into the concerns of copyright law. There might be some argument that it should (to "promote the progress of science and useful arts" as it were), but copyright law to my knowledge hasn't been used to prevent new tech from putting professionals as a whole out of business.
So is digitizing a copyrighted VHS and hosting it via torrents also fair use? It's transformative, the public display of the video is limited, and there is no market for VHS.
I don't get it; what's the difference, other than Google having deeper pockets than me?
or they could open it all up for everybody and stop protecting the rights of dead people (authors who died less than 70 years ago)
then again, that would make the publishers starve... but why pretend publishing corporations need food?
In my utopia, the end results are models containing the sum total of human output, available to everyone.
What I think is unconscionable is training the models on public works and then retaining them exclusively for private use.
Comment generated with gpt-neox prompt: Comment about AI and data collection and generation and its pitfalls, expressing concern, emphasis on professions, emphasis on automation, written by Stephen King, creative writing, award winning, trending on reddit, trending on hacker news, written by Greg Rutkowski, written by Zola, written by Voltaire, written by author, written by moyix.
(Just kidding, it wasn't AI generated but you see my point.)
Tell me how ML is different than the mind of a toddler ravenous for new information.
For every billion dollar start-up using data at scale, there are tens of thousands more researchers and hobbyists doing the exact same, producing wonderful results and advances.
If we stop this growth dead in the tracks, other countries more willing to look past the IP laws will jump ahead. And if Stability locks away their secret sauce, some new party will come and give away the keys to the kingdom yet again.
You can't block the signal. Except, of course, by legislating against it in some Luddite hope we can prevent the future from happening.
Instead of worrying that careers will end, we should look at this as the end of specialization. No longer do we need to spend 20,000 hours learning one thing to the exclusion of all others we'd like to try. Now we'll be able to clearly articulate ourselves with art, music, poetry. We'll become powerful beings of thought and expression.
Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.
[1] Maybe Disney would like you to pay more for a premium learning plan for your child, but thankfully that's not (yet) possible.
There is no known experimentally verifiable model of toddlers' brains, let alone one based on matrix multiplication and normalization. Developing such a model would be a noteworthy achievement.
Therefore these are different.
I'm a 20,000 hours person. Knowing what I know about what I do, it's real sad to see someone misunderstand what goes into creativity this egregiously. Prompt engineering is such an unbelievably watered-down "version" of making a painting. It's like writing a page (or even a folder!) of bullet points, handing it to a ghostwriter, and telling them to "put the end result between Shakespeare and Poe."
That's not unleashing your creative voice. Unleashing your voice and acquiring technical skills in a chosen field are the same thing. If you endlessly mixed all the prior classical works, it wouldn't matter how you weighted them; it won't spit out Mozart. You're stuck in the gamut of the model, between the maxima of each artist.
It's an incredible tool to generate stuff quickly, and to some extent it will help artists whose work depends on quantity over quality.
If a person published a work that clearly plagiarized or violated a patent, that person would be open to legal action.
I’m all for systemic change, but uses like this may end up having a chilling effect on human-created work.
It’s not that AIs are too good. They look like crude knockoff products to trained eyes. And crude knockoffs are usually considered bad things.
The toddler is human. AIs are not humans.
It's a human right to learn. Non-humans don't (and shouldn't) have human rights.
>Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.
Spoken like a true evolutionary loser.
Well, I can't keep a toddler in a data center, pumping out work on demand. Or copyright it and limit who it chooses to work for when it grows up.
For instance.
If Google Brain/DeepMind were to crack AGI, it would make Google/Alphabet crazy rich to the detriment of millions of YouTubers, book authors, musicians, and drivers.
AI will concentrate power and wealth to fewer individuals.
If I have a photographic memory and I memorize the Coca Cola logo and then draw it into a commercial work by decoding the firing of my neurons into muscle movements, the storage and retrieval method I used has no bearing on whether I infringed on their copyright.
What? It's clearly a derived work.
It is absolutely not clear when a statistical model stops counting n-grams and starts making a derived work.
I can write code to get a list of characters in the book, get their page numbers analysed and draw graphs to help me create my own version. Am I breaking copyright laws? Most likely not.
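The kind of non-expressive, statistical analysis described above can be sketched in a few lines; this is an illustrative example of counting word n-grams over a text, producing statistics about the work rather than a reproduction of it (the function name and parameters are my own, not from any particular system).

```python
# Illustrative sketch: counting word n-grams over a text.
# The output is statistics about the work, not a copy of its expression.
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Count word n-grams in `text` (case-insensitive, whitespace-split)."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


counts = ngram_counts("the cat sat on the cat", 2)
# counts[("the", "cat")] == 2 — the bigram appears twice in the input.
```

At what point an analysis like this, scaled up and inverted into a generator, crosses into producing derived works is exactly the grey area in question.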
It's a truly grey area which lawmakers never saw coming.
I believe that if events unfold well, we'll eventually come to treat AI tools like sharp knives: it will be up to the user what they do with them.
Most of the predictions in that first comment came true.
This is a very strong and likely inaccurate presumption.
The existing publicly available datasets, algorithms, and weighted models should certainly be expected to be permanently in the hands of some non-law-abiding parties at this point.
I think that it will be important to ensure that we have symmetric information, going forward, otherwise trying to put the genie back in the bottle may just end up further disadvantaging those that try to follow the rules.
Laundering private things through the commons doesn't feel as shady as laundering them through private networks. The commons benefits too.
It's more like open source than money laundering.