I found the OpenAI bot scraping my blog recently. Assuming they used that data, when will they attribute me?
Given that Google successfully used a fair use defense in Authors Guild, Inc. v. Google, Inc., I think it's likely OpenAI and the others will also win in court.
I do think it's possible for specific uses of the output of LLMs to be copyright infringement. That's why it's interesting to see Microsoft indemnify customers of their commercial products in the event a case is brought against the customer. This is smart on Microsoft's part; the risk probably isn't very high, and by making it a non-issue for their customers, many more will feel comfortable using their LLM-based features and services.
Like GitHub's servers host AGPL code as data, without having to be open source.
The perceived problem there is if their model generates an exact copy of some AGPL code, you use it in your project unknowingly, and then you can get sued.
Note: I'm declaring my comment license as https://creativecommons.org/licenses/by-sa/4.0/
So if you remix or transform my comment by responding to it, please attribute me in your response.
I believe that this fact is and will be exploited to strip copyright and effectively transfer ownership using cleanroom/firewall techniques.
That's the key part. You haven't yet proved they have actually used your content for anything (other than, potentially, reading the license to decide whether to include it in or discard it from their training set).
But in practice we'll never know for sure if they are respecting the terms of licenses until 1) this is tested in court, or 2) there's some internal leak that points in either direction.
I would think OpenAI wants the thornier legal issues actually settled so that the whole ecosystem can grow within those terms & they can lobby for the legal changes they need/want?
Because people train on corpuses of data all the time, without a license or any attribution.
Every piece of text a writer reads is training that writer. Every image an artist sees helps to train that artist. Every sound a musician hears is training that musician.
That doesn't mean they can't exclude their works from training via a license going forward. But that becomes an enforcement problem.
I also doubt "humans are just a larger Markov chain than the LLM and they're allowed to" will hold up in court.
I really hope “copyright can be used to prohibit reading and learning” does not hold up in court.
Copyright is, and should be, a protection from unauthorized reproduction. Extending it to protect abstract ideas would be a disaster. And extending it to control stylistic learning would be even worse.
CliffsNotes is not what lets you replicate the style of the author, etc.
And yeah, you can use "it's just feeding it into a bunch of math" to justify nearly anything that involves software, including good old piracy. What matters is what the math is used for. (Spoiler: to line Microsoft's pockets at the expense of actual writers in this case.)
When someone pirates a book, they're replacing the original without consent or remuneration to the copyright holders.
When you train an AI on the contents of a book, you're not replacing it. If someone is interested in the content, they still need to buy it. Using ChatGPT is not a substitute. If it is, they're gonna have to prove it in court, but I doubt they'll be able to.
https://creativecommons.org/2023/03/23/the-complex-world-of-....
If you dissect the plaintiffs' claim, they are arbitrarily conflating training and regurgitating.
Training is use for criticism and comparison purposes, hence fair use.
And there is no lawsuit against what it regurgitates and the purpose of its output, whether someone asks it to give a list for comparison purposes or specifically asks it for a story that has a plagiarized result.
=====
who is dan markunas
ChatGPT: I'm sorry, but I don't have any information on a person named Dan Markunas in my database ....
who is janet saunders
ChatGPT: I'm sorry, but I don't have any specific information about a person named Janet Saunders in my database,
===========
As far as style goes, copyright doesn't protect that. Trademark MIGHT if your style is distinctive enough to be a trademark (and is used as such), but the "style" of a writer is largely about tempo and word choices, none of which are subject to copyright protections.