A similar instance that bugs me is on the documentation page for their GPTBot scraper (https://platform.openai.com/docs/gptbot), where they say "Allowing GPTBot to access your site can help AI models become more accurate". Strange wording, given that it is specifically OpenAI's models you're allowing, not "AI models" in general.
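For context, OpenAI's documentation says GPTBot is controlled the usual way, through robots.txt. A minimal sketch of opting a site out (the `GPTBot` user-agent token is from their docs; the paths are placeholders):

```robots.txt
# Block OpenAI's GPTBot crawler from the entire site
User-agent: GPTBot
Disallow: /
```

This only works to the extent the crawler honors robots.txt, which is voluntary; it is an opt-out, not an enforcement mechanism.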
The goal in both cases is to make you feel like you're standing in the way of progress by objecting.
OpenAI is actively receiving money from funders and will (potentially, maybe, eventually) make money by using others' copyrighted content at a much larger scale than anything the Internet Archive was doing.
OpenAI should not have permission to soullessly suck up copyrighted material and use it to make money.
On the other hand, countries that don't place ethical/moral/fiscal priority on creating and protecting copyrighted works will eat the West's lunch when it comes to AI, as there's no limitation preventing them from consuming the content.
Not sure what the answer is - maybe copyright is an archaic idea/belief built and maintained by a once well-intended, now corrupted economic system that needs a bit of a shakeup anyways...
> "Because copyright today covers virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code, and government documents — it would be impossible to train today's leading AI models without using copyrighted materials," the company wrote in the evidence filing. "Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today's citizens."
> OpenAI went on to insist in the document, submitted before the House of Lords' communications and digital committee, that it complies with copyright laws and that the company believes "legally copyright law does not forbid training."
Why not just license them like everyone else?
> but would not provide AI systems that meet the needs of today’s citizens.
Needs is doing a lot of work here.
They need a new market. This is precisely the kind of AI system I'd love to use.
They are arguing that the current copyright laws do not forbid training. And they are arguing that they need to train on copyrighted data in order to be able to make an effective tool (and make money).
That second part of the argument is there because, so far as I know, nobody has ruled (in any country) on the legality of using copyrighted material as training for LLMs that will then produce commercially-available output. So the first part is a claim, but it's not a ruled-upon claim. It's not a claim that OpenAI can count on a court agreeing with. So they add the second argument, which amounts to "please interpret copyright law that way, and if the courts don't, please change copyright law that way, or else we can't sell what we make (and therefore can't make any money)".
I take no position on the first claim. All I'm saying is that the appropriate response to the second claim is, "So what? The world doesn't owe you a living."
I know that copyright covers blog posts and generally every immaterial creation published by humans that is reproducible and above a fuzzily defined threshold of "original creativity".
The other day, I was downvoted here for criticizing the often-cited "freeware" claim put out by MS.
The argument was: copyright already covers all this, I must lack knowledge about copyright law.
Now, the argument seems to have shifted to: copyright law doesn't apply the way it used to?
At this point, I think as a society we need to just say copyright as a concept and law has completely failed and scrap the whole thing.
The 0.01% of powerful copyright-cartel publishers get rich while harming 99.99% of people, because we've seen further erosion of fair use rights, absurdly lengthy extensions of copyright terms to prop up Disney's profits, expansive interpretations of how much control copyright holders have, and zero punishment for abuse of the DMCA, among other things.
Students should be able to learn from books, music, film. So should AI training models.
If there is any ambiguity about this, we should immediately write laws making it clear that training and education of all forms is explicitly allowed under fair use. Ideally, we also send anyone trying to prevent this to the guillotines.
I think it should be legal to train a model on anything that is legal to scrape (which is almost everything).
Then, if someone uses a generative AI output in a way that infringes someone's existing IP, go after the person trying to monetize that output, whether it's software, an image, or writing.
The thing is, if you limit what these things can be trained on, it creates a huge power imbalance. The wealthy and nation states are still going to scrape everything under the sun and train AIs with that data along with whatever else their surveillance has gathered. If businesses are neutered from being able to do the same, we all lose.
> Students should be able to learn from books, music, film. So should AI training models.
An AI model is a thing. It is owned and fully controlled by some agent. A student is a sentient, thinking being. Both can be trained, but only one can be educated. Treating the two as comparable is misleading and, in my view, wrong.
Then it shouldn't exist. Bloody profiteers.