abustamam 2mo ago (0 points)

Fair use by most standards? Which standards are those? I don't think a standard about training an AI on billions of images exists.

oreally 2mo ago

By the same 'transformative' standards that allow satire, reaction and commentary videos to exist. And those take 100% from the source and add context, whereas good generated AI images that aren't wholesale copying take like less than 10% from the original source.

In addition, the idea that you need to pay rent on *your observation* of someone else's work is absurd. No one pays Newton's descendants for making lifts or hosting bungee jump sport activities.

maplethorpe 2mo ago

> good generated AI images that aren't wholesale copying take like less than 10% from the original source.

So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

> In addition, the idea that you need to pay rent on your observation of someone else's work is absurd.

I agree that's absurd. But training a model is no more "observing images" than an F1 car is "walking" down a race track. Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human. That comparison you're making is the real absurdity.

oreally 2mo ago

> So would the model work if it only trained on the top 10% of pixels in every image? Or do they in fact need the entire image before they begin processing it, and therefore use the entire image?

The model trains on the features that humans can make sense of in the image they're presented with, provided the image and the observations of its features are clear enough. Then the generation makes use of those observations. I'm just using 10% as an arbitrary number to describe proportions. If the generation drew 100% of its observations from the same image, the model would be overfitting, and many would deem it to have produced a copy.

> Just because a race car uses kinetic energy, gravity, and friction to propel itself, the same way a human does, doesn't mean it's doing the same thing as a human.

WTF does this even mean? A race car uses concepts from Newton, just as a human uses gravity to train their muscles to move, be it knowingly or unknowingly. But you don't see them (car makers/humans) paying rent to Newton after he discovered gravity. Come on!

tovej 2mo ago

Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?

Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.

protocolture 2mo ago

If I buy a book entitled "How to make a table" and then make a table, the author does not own the table I made.

If I buy a book and use it to prop up a table, the author likewise does not own the table, or any works I undertake on that table.

If I buy a book and rip out the pages to make a collage, the US is the only legal jurisdiction where I run even a slight risk of civil penalties.

An LLM is downstream of a book. Using a book to make an LLM does not confer any rights or privileges towards the LLM on the original author, just as using a hammer or nails doesn't entitle the hammer or nail manufacturers to any royalties on what I make, even if I build a hammer-making machine with them. There's no right to the works of people who build on your work without reproducing your work, at least outside of strict copyleft.

It's like demanding a cut from people who learned how to use Photoshop by watching your Photoshop tutorial YouTube videos.

This is why the most successful cases against LLMs have been on the "Did they purchase the book" side of the fence, and not on the "What did they do with it" side. The one exception is the case where the legal company tried to use the LLM to reproduce 1:1 the content they had a limited license to, but that's obviously a no-go and they should have known better.

oreally 2mo ago

These are my opinions ofc.

> Is it transformative if I take all the pages in Hanya Yanagihara's A Little Life and use a thesaurus to change every second word?

If you meant it literally.. I'd think that such a version would be a sort of parody. It'd be up to lawyers doing their cross-examinations to prove the work was intended for such a purpose though..

> Or a more realistic scenario: what if I translate it to Spanish without license from the author? That's not allowed, and yet I have "transformed" the work in the same way that an LLM does.

A lawyer would probably answer this better than me, but the 'content' is the same and would violate copyright. There are also other factors, like whether it was translated/distributed for free.

Besides that, I regard LLMs as holding mathematical observations, in contrast to a translated work. So long as the user ensures the output isn't close to what's already available, imo it fits the transformative criteria.

tovej 2mo ago

You cannot claim that a formulaic thesaurusing of a text is parody, not unless the process is related to the message of the original text itself. Even then, that's a dubious claim. Especially if it was done automatically.

I can just as well say that a translated work contains "linguistic observations". In fact a translator has to do a lot of transformative work in order to translate a text.

An LLM just takes a set of texts, looks at n-gram distributions, and generates similar text. It is quite literally a fuzzy way of copying. There aren't any mathematical observations in the output. Any math (statistics) is done in the copying process.

protocolture 2mo ago

Google scrapes the entire internet to generate a searchable index of the internet. But the resulting search engine is only infringing where it reproduces entire copies of scraped news articles and images. In both of those areas, Google has been put back in its place through legal means.

Like LLMs, it retains the produced index but not the original data.

The big concern is whether producing an LLM is competing with artists directly, but as artists don't make LLMs, this seems to be consistently ruled as non-competing.

abustamam (OP) 2mo ago

I don't quite follow. People don't go on Google and search for medieval history and pretend they wrote the Wikipedia article on it because they found it on Google.

People _do_ use LLMs to make art in someone else's style (knowingly or unknowingly) and claim it as their own creation.

Also, I wouldn't say the creators of LLMs are competing with artists. The users of LLMs are. Artists don't make LLMs, they make art, and people who use Midjourney and such make art.

But I'd argue that creators of LLMs are still liable for the harm people cause using their tools. Perhaps not legally, but certainly ethically.
