Not that anyone would go buy 100,000 H100s to train their own Llama, but words matter. Definitions matter.
The far more important distinction is "open" versus "not open", and I disagree that we should cede that distinction while trying to fight for "source". The Llama license is restrictive in a number of ways (it incorporates an entire acceptable use policy) that make it most definitely not "open" in the customary sense.
The acceptable use policy seems fine. Don't use it to break the law, solicit sex, kill people, or lie.
"we're open source, you can use it for anything you can imagine. But you can't use it for these specific things."
Then there's the added rub of the source not really being source code, but a CSV file.
That's fine. If you want to set that expectation, great! But don't call it open source.
If the training data were openly available, then even if you couldn't afford to retrain a new version yourself, a competitor like Amazon could do it for you.
If you built a business on Llama 3.1, you're not going to suddenly go down in flames because you can't upgrade to Llama 4.
Even if you really needed to upgrade, Llama 4 would be a new model that you'd have to adapt your prompts for anyway; you can't just bump the version and call it good. If you're going to update your prompts anyway, you might as well switch to a competitor's model at that point. Updating models isn't urgent; you have time to do it slowly and do it right.
> If the training data were openly available, then even if you couldn't afford to retrain a new version yourself, a competitor like Amazon could do it for you.
If Llama 4 changed the license then presumably you wouldn't have access to its training data even if you did have access to Llama 3.1's. So now you have access to Llama 3.1's training data... now what? You want to recreate the Llama 3.1 weights in response to the Llama 4 release?
I don't fully agree.
Isn't that like saying *nix being open source is worthless unless you're planning to ship your own Linux distro?
Knowing how the sausage is made is important if you're an animal rights activist.
They're more like obfuscated binaries. When it comes to fine-tuning alone, however, things shift a little, yes.
AI2’s OLMo is an example of what open source actually looks like for LLMs:
https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...
The Llama license has a lot of restrictions, based on user base size, type of use, etc.
For example you're not allowed to use Llama to train or improve other models.
But it goes much further than that. The government of India can't use Llama because it's too large. Sex workers aren't allowed to use Llama because of the acceptable use policy in the license. Then there's also the vague language prohibiting discrimination, racism, etc. Good luck getting something like that approved by your legal team.