Alleged Distillation Attacks by DeepSeek, Moonshot AI, and MiniMax

(twitter.com)

36 points · mike_kamau · 3mo ago · 16 comments

exq · 3mo ago

So it's okay when big American corps raid the internet ignoring any terms of service or licenses they see in order to train models they rent back to us, but when a foreign entity trains off of Anthropic it's illegal?

riku_iki · 3mo ago

From the tweet, Anthropic's point is that distillation is OK unless the new model has its safeguards removed or is used for military or surveillance purposes.

dmonitor · 3mo ago

The fact that they're calling it an "attack" implies otherwise.

I find the entire premise of this announcement absurd. Fraudulent accounts? They're just accounts. They paid for the access the same as any other. They're accessing Claude just like a human (or *claw) would.

There's no argument against their strategy that doesn't make them complete hypocrites in respect to how they got the model training data in the first place.

riku_iki · 3mo ago

> them complete hypocrites in respect to how they got the model training data in the first place.

Sure, hypocrisy is part of the rules of the big games: politics and business.

> Fraudulent accounts? They're just accounts.

They tell the story in the blog post: they don't allow Claude in China, but those labs use proxy services to access Claude and mix their traffic with that of regular users to hide their activities.

mongrelion · 3mo ago

I agree with you, especially with this:

> They paid for the access the same as any other.

If anything, this makes them more legit than Anthropic because they are paying for the content, whereas Anthropic just stole *all* the data they got a hold of. So, in this case the Chinese AI labs stand on higher moral ground LOL.

_aavaa_ · 3mo ago

I don’t think so. It reads much more like “distillation is okay when you do it to your own models.”

credit_guy · 3mo ago

I don't think this counts as distillation. Distillation is when you use a teacher model to train a student model, but crucially, you have access to the entire probability distribution over the generated tokens, not just to the tokens themselves. That probability distribution tremendously increases the strength of the training signal, so the training converges much faster. Claude does not provide these probabilities. So Claude was used for synthetic training data generation, but not really for distillation.
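To make the distinction above concrete, here is a minimal sketch of the two training signals, not anyone's actual training code. The three-token vocabulary, the logits, and the function names `soft_label_loss` and `hard_label_loss` are all made up for illustration. The soft-label loss compares the student against the teacher's whole distribution, while the hard-label loss only sees the one token the teacher happened to emit.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_probs, student_probs):
    """KL(teacher || student): needs the teacher's full distribution."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )

def hard_label_loss(sampled_token, student_probs):
    """Cross-entropy against one sampled token, which is all a text-only API returns."""
    return -math.log(student_probs[sampled_token])

# Hypothetical values: a 3-token vocabulary, a confident teacher, a uniform student.
teacher = softmax([2.0, 1.0, 0.1])
student = softmax([0.5, 0.5, 0.5])

print("soft-label (distribution) loss:", soft_label_loss(teacher, student))
print("hard-label (token only) loss:  ", hard_label_loss(0, student))
```

With the soft-label loss, a single example tells the student how much mass every token should get. With the hard-label loss, the student learns only that token 0 was chosen this time.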

hooloovoo_zoo · 3mo ago

Sampling repeatedly gives them an estimate of the probability distribution in any case though.
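A rough sketch of that idea, under loud assumptions: `black_box_model` is a stand-in for an API that returns only a sampled token, and `TRUE_PROBS` is an invented distribution that the caller would not be able to see. Counting repeated samples gives an empirical estimate of the hidden distribution for one fixed prompt.

```python
import random
from collections import Counter

# Hypothetical hidden next-token distribution for one fixed prompt.
TRUE_PROBS = {"yes": 0.7, "no": 0.2, "maybe": 0.1}
rng = random.Random(0)  # seeded so the run is repeatable

def black_box_model():
    """Stand-in for an API call: returns one sampled token, no probabilities."""
    tokens, weights = zip(*TRUE_PROBS.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

def estimate_distribution(sample_token, n_samples):
    """Estimate token probabilities from repeated samples (simple Monte Carlo)."""
    counts = Counter(sample_token() for _ in range(n_samples))
    return {tok: c / n_samples for tok, c in counts.items()}

estimate = estimate_distribution(black_box_model, 10_000)
for token, prob in sorted(estimate.items()):
    print(f"{token}: estimated {prob:.3f}, true {TRUE_PROBS[token]:.3f}")
```

The catch is cost: the error of such an estimate shrinks only in proportion to one over the square root of the number of samples, and this covers a single token position for a single prompt, so recovering distributions this way takes many more queries than reading them off directly.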

hooloovoo_zoo · 3mo ago

That would be an interesting paper, actually: what is the optimal sampling technique given that you only have access to the token outputs? Surely someone has already done it.

m4rtink · 3mo ago

Oh no! They are stealing all the data we have stolen ourselves! This needs to be stopped and punished immediately!

ChrisArchitect · 3mo ago

Discussion on Source: https://www.anthropic.com/news/detecting-and-preventing-dist... (https://news.ycombinator.com/item?id=47126177)

https://news.ycombinator.com/item?id=47126614

veunes · 3mo ago

If just 16 million examples were enough to significantly boost model quality (as Anthropic claims), then it turns out that data quality beats quantity.

Instead of vacuuming petabytes of trash from Common Crawl, you can just take high-quality distillate from a SOTA model and get comparable results. Bad news for anyone betting solely on massive compute clusters and closed datasets.

SilverElfin · 3mo ago

One difference between Anthropic and others is that Anthropic is crawling publicly visible information, and their argument is that this is fair use. These Chinese LLM labs, by contrast, are circumventing an account-creation process and terms of service to misuse non-public information.

Lots of people think Anthropic training their own LLM is the same but it really isn’t.

saberience · 3mo ago

Pot, meet kettle!

I don’t think I’m the only one feeling some schadenfreude at this news. I suppose it’s OK when you’re a hot Silicon Valley scale-up to slurp up the rest of the world’s data for free and then hire hotshot lawyers to defend you against all the creatives you ripped off, but when it’s the “evil” Chinese doing the same to you it’s a dastardly “attack”?

m4rtink · 3mo ago

Yeah, we have seen some of the same large companies that trampled regular people and made examples of them in the name of defending copyright fully ignore it when it was time to feed their AI models.

And now the hypocrisy has come full circle, with complaints about others not respecting their rights!

kingstnap · 3mo ago

Cry me a river, build a bridge, and get over it?

They publish weights and useful research for everyone to benefit.

I mean this is incredibly tone deaf for a company facing multiple lawsuits over where they got their training data from.
