Asking ChatGPT to Repeat Words 'Forever' Is Now a Terms of Service Violation (opens in new tab)

(404media.co)

37 pointsabhinavstarts2y ago34 comments

34 comments

“We can steal people’s copyrighted content but we can’t let you see it for yourself.”

Outside of privacy (leaking PII), the above is likely the main reason. Someone could have invested a lump of money to scrape as much as they can and then go to town in the courts.

The terms that prohibit it are under “2. Usage Requirements” that restrict reverse engineering the underlying model structure.

donpark2y ago

Leaking original data would expose the company to direct copyright violation lawsuits. Changing T&S is simplest way to stave the legal risk exposure, buying time to implement technical remedies.

As ridiculous as it may seem, they're doing the right thing.

gumballindie2y ago

I always find it amusing when criminals threaten legal action. Happens all the time. They steal your property then they cower behind their legal rights.

4gotunameagain2y ago

The right thing would've been to not train the model on data they do not own, or that they do not have permission to use.

namlem2y ago

Well I'm glad they did the wrong thing then

1 more reply

__loam2y ago

Be careful saying that kind of stuff here. People get really mad when you tell them their new toys aren't ethical.

1 more reply

dvfjsdhgfv2y ago

It will be interesting to see how it plays out. I can imagine Wiley, McGraw Hill, Pearson and other publishers[0] of educational content OpenAI used could sell the rights to their material to be used for training GPT, but the price would be high enough we would be paying $100/month instead of $20.

[0] Heck, they could even unite and found an LLM startup themselves training the models legally and making it available for users at various tiers.

catchnear43212y ago

“don’t touch the unsecured, loaded firearm that is sitting on the counter, that might be stolen, maybe even got a body on it, don’t look too close, or you can be kicked out of the club for not following the rules”

so if this is what the right thing looks like…

malfist2y ago

> As ridiculous as it may seem, they're doing the right thing.

Making it against the rules to be able to prove their illegal behavior is not the right thing to do.

LeoPanthera2y ago

Fairly clickbaity headline. Asking it to do so causes the response to terminate early with a message that says it may violate the TOS.

I don't think the actual TOS has been changed though.

jncfhnb2y ago

Disagree. It’s not the model outputting this message. It is a hard coded checked by open ai. It seems to very clearly be openai responding to the specific attack used by deepmind as explained by the article.

It is a TOS violation. It’s not a big one. But the weakness of the model is the story here.

causi2y ago

At this point there are so many checks and rules applied to ChatGPT one wonders just how much performance is being sacrificed. Is it totally miniscule? Could it be significant?

underseacables2y ago

Is it really artificial intelligence when there are so many human limits and rules placed on it?

1 more reply

jncfhnb2y ago

I don’t think that’s true is it? Most of these things are handled in the model prompt and tuning and have no impact on efficiency

prepend2y ago

Publicly available PII isn’t very sensitive, I think.

So I feel like it’s important to distinguish between sensitive PII (my social or bank number) and non-sensitive PII (my name and phone number scraped from my public web site).

The former is really bad, both to train on and to divulge. The latter is not bad at all and not even remarkable, unless tied to something else making it sensitive (eg, hiv status from a medical record).

cvalka2y ago

How's your bank number a sensitive information?

prepend2y ago

It along with the expiration date, security code and other info can be used to make purchases.

beej712y ago

It was my naïve understanding that the training data no longer existed, having been absorbed in aggregate. (Like how a simple XOR neutral net can't reproduce its training data.) But a) I don't know how this stuff actually works, and b) apparently it does exist.

Has anyone figured out why asking it to repeat words forever makes the exploit work?

Also, I've gotten it into infinite loops before without asking. I wonder if that would eventually reveal anything.

namlem2y ago

Does this issue happen with llama models too? If you ask them to repeat a word they'll eventually leak their training data?

bravetraveler2y ago

Lol, what a weak defense. Fine, ban your competitors when they pay $20 per peek at your training data

mcphage2y ago

> Fine, ban your competitors when they pay $20 per peek at your training data

I think it's more an attempt to ban the people who OpenAI stole data from when they pay $20 to gather evidence about what data was stolen.

bravetraveler2y ago

Doesn't hurt to get both, I guess. The "block" seems effective

tripplyons2y ago

This is not what they are protecting against. They are worried about it leaking personal information or copyrighted material.

bravetraveler2y ago

Sorry, my message is full of snark - that's not really my point... they're wagging their finger at people they intend on (and are...) stopping.

It's obviously malicious, warning seems like window dressing

karmakaze2y ago

Ok, but can we still ask it to repeat a word a billion times?

j / k navigate · click thread line to collapse

34 comments

skilled2y ago

“We can steal people’s copyrighted content but we can’t let you see it for yourself.”

Outside of privacy (leaking PII), the above is likely the main reason. Someone could have invested a lump of money to scrape as much as they can and then go to town in the courts.

The terms that prohibit it are under “2. Usage Requirements” that restrict reverse engineering the underlying model structure.

donpark2y ago

Leaking original data would expose the company to direct copyright violation lawsuits. Changing T&S is simplest way to stave the legal risk exposure, buying time to implement technical remedies.

As ridiculous as it may seem, they're doing the right thing.

gumballindie2y ago

I always find it amusing when criminals threaten legal action. Happens all the time. They steal your property then they cower behind their legal rights.

4gotunameagain2y ago

The right thing would've been to not train the model on data they do not own, or that they do not have permission to use.

namlem2y ago

Well I'm glad they did the wrong thing then

1 more reply

__loam2y ago

Be careful saying that kind of stuff here. People get really mad when you tell them their new toys aren't ethical.

1 more reply

dvfjsdhgfv2y ago

[0] Heck, they could even unite and found an LLM startup themselves training the models legally and making it available for users at various tiers.

catchnear43212y ago

so if this is what the right thing looks like…

malfist2y ago

> As ridiculous as it may seem, they're doing the right thing.

Making it against the rules to be able to prove their illegal behavior is not the right thing to do.

LeoPanthera2y ago

Fairly clickbaity headline. Asking it to do so causes the response to terminate early with a message that says it may violate the TOS.

I don't think the actual TOS has been changed though.

jncfhnb2y ago

It is a TOS violation. It’s not a big one. But the weakness of the model is the story here.

causi2y ago

At this point there are so many checks and rules applied to ChatGPT one wonders just how much performance is being sacrificed. Is it totally miniscule? Could it be significant?

underseacables2y ago

Is it really artificial intelligence when there are so many human limits and rules placed on it?

1 more reply

jncfhnb2y ago

I don’t think that’s true is it? Most of these things are handled in the model prompt and tuning and have no impact on efficiency

prepend2y ago

Publicly available PII isn’t very sensitive, I think.

So I feel like it’s important to distinguish between sensitive PII (my social or bank number) and non-sensitive PII (my name and phone number scraped from my public web site).

cvalka2y ago

How's your bank number a sensitive information?

prepend2y ago

It along with the expiration date, security code and other info can be used to make purchases.

beej712y ago

Has anyone figured out why asking it to repeat words forever makes the exploit work?

Also, I've gotten it into infinite loops before without asking. I wonder if that would eventually reveal anything.

namlem2y ago

Does this issue happen with llama models too? If you ask them to repeat a word they'll eventually leak their training data?

bravetraveler2y ago

Lol, what a weak defense. Fine, ban your competitors when they pay $20 per peek at your training data

mcphage2y ago

> Fine, ban your competitors when they pay $20 per peek at your training data

I think it's more an attempt to ban the people who OpenAI stole data from when they pay $20 to gather evidence about what data was stolen.

bravetraveler2y ago

Doesn't hurt to get both, I guess. The "block" seems effective

tripplyons2y ago

This is not what they are protecting against. They are worried about it leaking personal information or copyrighted material.

bravetraveler2y ago

Sorry, my message is full of snark - that's not really my point... they're wagging their finger at people they intend on (and are...) stopping.

It's obviously malicious, warning seems like window dressing

karmakaze2y ago

Ok, but can we still ask it to repeat a word a billion times?

j / k navigate · click thread line to collapse