Better HN

0 points | svaha1728 | 2y ago | 0 comments

https://www.washingtonpost.com/technology/interactive/2023/a...

Scribd hosts many PDFs of copyrighted books. The Washington Post article mentions several other sites from which PDFs of copyrighted textbooks and similar works were downloaded and scraped.


fasterik | 2y ago

That's interesting to know, but that doesn't by itself imply that it's illegal. For example, Google Books, which has massive amounts of scanned PDFs of copyrighted works, is considered fair use under US copyright law.

cyanydeez | 2y ago

There's no good-faith world where OpenAI trained only on legally available works.

The only valid argument is whether their model or its output is itself legally protected.

still_grokking | 2y ago

As long as you don't try to scrape all the book's content…

It's only fair use for search purposes.

fasterik | 2y ago

It's fair use if the work is "transformative". GPT-4 isn't publishing the content of the books; it's publishing a model derived from the entire corpus. I'm not a lawyer, but I think there's an argument that it is transformative.


