Better HN

0 points · mvkel · 1y ago · 0 comments

It's a real shame that we're still calling Llama "open source" when at best it's "open weights."

Not that anyone would go buy 100,000 H100s to train their own Llama, but words matter. Definitions matter.

lolinder 1y ago

Source versus weights seems like a really pedantic distinction to make. As you say, the training code and training data would be worthless to anyone who doesn't have compute on the level that Meta does. Arguably, the weights are source code interpreted by an inference engine, and realistically it's the weights that someone is going to want to modify through fine-tuning, not the original training code and data.

The far more important distinction is "open" versus "not open", and I disagree that we should cede that distinction while trying to fight for "source". The Llama license is restrictive in a number of ways (it incorporates an entire acceptable use policy) that make it most definitely not "open" in the customary sense.

JamesBarney 1y ago

https://llama.meta.com/llama3_1/use-policy/

The acceptable use policy seems fine. Don't use it to break the law, solicit sex, kill people, or lie.

mvkel (OP) 1y ago

This is like saying "You have the right to privacy. The police can tap your phone, but you have nothing to worry about as long as you're not breaking the law."

"we're open source, you can use it for anything you can imagine. But you can't use it for these specific things."

Then there's the added rub of the source not really being source code, but a CSV file.

That's fine. If you want to set that expectation, great! But don't call it open source.

lolinder 1y ago

It's fine in that I'm happy to use it and don't think I'll be breaking the terms anytime soon. It's not fine in that one of the primary things that makes open source open is that an open source license doesn't restrict groups of people or whole fields from usage of the software. The policy has a number of such blanket bans on industries, which, while reasonable, make the license not truly open.

frabcus 1y ago

Meta could change the license of future releases of Llama and kill your business built on it.

If the training data was openly available, even if you can't afford to retrain a new version, a competitor like Amazon could do it for you.

lolinder 1y ago

> Meta could change the license of future releases of Llama and kill your business built on it.

If you built a business on Llama 3.1, you're not going to suddenly go down in flames because you can't upgrade to Llama 4.

Even if you really did need to upgrade, Llama 4 would be a new model that you'd have to adapt your prompts for anyway; you can't just bump the version and call it good. If you're going to update prompts anyway, at that point you can just switch to any competitor's model. Updating models isn't urgent: you have time to do it slowly and right.

> If the training data was openly available, even if you can't afford to retrain a new version, a competitor like Amazon could do it for you.

If Llama 4 changed the license then presumably you wouldn't have access to its training data even if you did have access to Llama 3.1's. So now you have access to Llama 3.1's training data... now what? You want to recreate the Llama 3.1 weights in response to the Llama 4 release?

mvkel (OP) 1y ago

> training code and training data would be worthless to anyone who doesn't have compute on the level that Meta does

I don't fully agree.

Isn't that like saying *nix being open source is worthless unless you're planning to ship your own Linux distro?

Knowing how the sausage is made is important if you're an animal rights activist.

sidcool 1y ago

Honest question. As far as LLMs are concerned, isn't open weights the same as open source?

paulhilbert 1y ago

No, I would argue that of the three main ingredients (training data, model source code, and weights), the weights are the furthest from anything akin to source code.

They're more like obfuscated binaries. If fine-tuning is all you care about, however, things shift a little, yes.

sidcool 1y ago

I don't expect them to release the data used to train the models. But I agree that the code is an important ingredient of 'open'.

aloe_falsa 1y ago

GPL defines the “source code” of a work as the preferred form of the work for making modifications to it. If Meta released a petabyte of raw training data, would that really be easier to extend and adapt (as opposed to fine-tuning the weights)?

blackeyeblitzar 1y ago

No, open weights are the output of a proprietary and secretive training process. It’s like sharing a precompiled application instead of what you need to reproduce the compiled application.

AI2’s OLMo is an example of what open source actually looks like for LLMs:

https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...

mesebrec 1y ago

Open source requires, at the very least, that you can use it for any purpose. This is not the case with Llama.

The Llama license has a lot of restrictions, based on user base size, type of use, etc.

For example, you're not allowed to use Llama to train or improve other models.

But it goes much further than that. The government of India can't use Llama because they're too large. Sex workers are not allowed to use Llama due to the acceptable use policy of the license. Then there is also the vague language prohibiting discrimination, racism, etc. Good luck getting something like that approved by your legal team.
