When someone pirates a book, they're replacing the original without consent or remuneration to the copyright holders.
When you train an AI on the contents of a book, you're not replacing it. If someone is interested in the content, they still need to buy it. Using ChatGPT is not a substitute. If it is, they're gonna have to prove it in court, but I doubt they'll be able to.
Merely summarizing info and attributing it to the source is the basic element of learning, for both machines and human beings.
These suits are necessary because it's not clear where the line is, or whether ChatGPT's functions actually cross it.
What is clear is that OpenAI is doing its best to avoid infringing anyone's copyright, even though infringing would be trivial for them. They have the training data, so they could simply output it word for word, bypassing the LLM. They don't do that, and they further restrain their LLM from reciting overly long passages.
If you can trick / manipulate the LLM into giving you too much then I say that infringement is on you.
The ability to ask a commercial product is. In fact, feeding the book to that commercial product is already infringement.
ClosedAI is doing squat. The very least they could do is ask authors for permission, and of course if they really cared they would have the LLM infer attribution and share revenue with the original creators.
The vast majority of publications (especially those of an explanatory nature) do not contribute original content/information. The exceptions are things like research articles/monographs, historical records, government reports. But copyright infringement doesn't apply here because these things weren't published with a profit motive but precisely to publicize the information as widely as possible. The only problem area I can think of involves books published by commercial publishers which promise an 'exclusive peek' into the life of some famous person (think biographies of celebrities or books like Fire and Fury). In that kind of case there is indeed original content, and revealing it in detail will arguably mean fewer sales for the authors/publishers.
I disagree with this emphasis, given that rote, repetitive or technical material that is not original authorship is not in peril. It is human authors who wrote original creative content, or wrote in a style that is personal and widely recognized, whose rights to trade and commerce are in peril. That is much more important over the long term, and is not worth losing for the sake of convenient information mixers.
If someone makes a commercial activity of "answering any question about book contents at any time 24/7", hires tons of people to read those books and reply to billions of such questions daily thereby helping everyone not buy any books, is that robbing book authors?
Food for thought.
But let's be direct - are we talking about market share in the millions of views, where pirate copies are also available, or about an author selling any books at all versus a few hundred over a year? Quite the difference at the subsistence level of an individual author, no?
Curiously, when I ask GPT-4 about some well-known but under-copyright book, it says it can't answer because of the copyright. For well-known books out of copyright such as Alice in Wonderland, it can recite passages but tends to get lost and start reciting another section or book at some point. Would be real frustrating to use as a substitute.
Don't teachers do the same?
- Trained their minds on existing books
- Tutor the next generation of students
- Give classes on book contents
- Answer questions about those books
The book publishing industry didn't go out of business because there are teachers answering questions. To the contrary, it benefited book sales, because most people aren't good self-learners.
What's wrong with having a machine do the same?
> - Trained their minds on existing books
Training a human = enriching a conscious human mind. "Training" an AI = mechanically creating a derivative work (no conscious mind to enrich). Training a human is to "training" an AI what killing a human is to "killing" a Unix process: same word, different things.