
bayindirh 1y ago

The moment you earn money from it, that's not fair use anymore. When I last checked, unlimited access to said models was not free, plus it's not "research" anymore.

- Addenda -

For the interested parties, the law states the following [0].

Notwithstanding the provisions of sections 17 U.S.C. § 106 and 17 U.S.C. § 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include:

    1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
    2. the nature of the copyrighted work;
    3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
    4. the effect of the use upon the potential market for or value of the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.

So, if you say that these factors can be flexed depending on the defendant, and can be just waved away to protect the wealthy, then it becomes something else, but given these factors, and how damaging this "fair use" is, I can certainly say that training AI models with copyrighted corpus is not fair use in any way.

Of course, at the end of the day, IANAL & IANAJ. However, my moral compass directly bars the use of copyrighted corpus in publicly accessible, for-profit models which deprive many people of their livelihoods.

From my perspective, people can whitewash AI training as they see fit to sleep sound at night, but this doesn't change anything from my PoV.

[0]: https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors


FloorEgg 1y ago

I really don't think it's that simple. I can read books and then earn money from applying what I learned in them. I can also study art and then make original art in the same or similar styles. If a person was doing this there would be no one claiming copyright infringement. The only difference is it's a machine doing it and not a person.

The nature of copyright and plagiarism boils down to paraphrasing, and so long as LLMs sufficiently paraphrase the content it's an open question whether it's copyright infringement and requires new law/precedent.

So the fact they are earning money is a red herring unless they are reproducing the exact same content without paraphrasing (with the exception of commentary). E.g. they can quote part of a work while commenting on it.

Where they have gotten into trouble with e.g. NYT afaik is when the LLM reproduced a whole article word for word. I think they have all tried hard to prevent the LLM from ever doing that to avoid that legal risk.

bayindirh OP 1y ago

> I can read books and then earn money from applying what I learned in them.

How many books can you read, understand and memorize in time T, and how many books can an AI ingest in the same time T?

If we're down to paraphrasing, watch this video [1], and think again.

Many models, given that you ask the correct questions, reproduce their training set with great accuracy, and this is only prevented with monkey patching, IIUC.

So, it's still a big mess, even if we don't add copyrighted corpus to the mix. Oh, BTW, datasets like "The Stack" are not as clean as they claim. I have seen at least two non-permissively licensed code repositories inside that dataset.

[1]: https://youtu.be/LrkAORPiaEA

FloorEgg 1y ago

I agree it's a big mess, that was kind of my point.

I am curious about the video, but am not compelled to spend 24 min watching it when you haven't summarized its thesis for me. The title of the video makes it seem adjacent at best to the points I was making. (Some automated flagging system =/= actual law)

o11c 1y ago

"Making money" does not immediately invalidate fair use, but it does wave a big red flag in the courts' faces.

throwaway2037 1y ago

I would be more nuanced on this matter. As I understand it, in the US, fair use allows media to write critiques of cultural artefacts (sorry, I cannot think of a better, broad term). For example, you can include small quotes from a film script when writing a critique of it without requiring permission from the owner of the copyright. And, until the World Wide Web reached the masses in the mid-1990s, most critiques were published by commercial media outlets, such as a daily newspaper. They were certainly published by commercial, for-profit entities. That said, I think the intent of the fair use is very important to the courts, much more than the entity that is doing the fair use (newspaper, blogger, etc.).

Another weird carve-out for copyright law in the US: parody. Honestly, I don't know if other jurisdictions allow parody in the same protected manner.

iggldiggl 1y ago

> Another weird carve-out for copyright law in the US: parody. Honestly, I don't know if other jurisdictions allow parody in the same protected manner.

Germany: https://www.gesetze-im-internet.de/urhg/__51a.html (This explicit carve-out is a recent development, though generally speaking parodies were allowed even under the previous version of the law.)


bayindirh OP 1y ago

So you're saying that every law is a suggestion, depending on who's being tried?

o11c 1y ago

Er, what? I'm speaking directly from the law, 17 U.S.C. § 107. It's deliberately written in terms of "factors to consider", rather than absolutes.

> In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include:

> * the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

> * the nature of the copyrighted work;

> * the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

> * the effect of the use upon the potential market for or value of the copyrighted work.

xvector 1y ago

You can absolutely monetize works altered under fair use.

bayindirh OP 1y ago

Any examples sans current AI models? I have not seen any, or failed to find any, to be precise.

xvector 1y ago

Basically any YouTube video that shows another YouTube video, song, movie, etc. as part of something else (e.g. a voiceover).
