The source of a language model is the text it was trained on. Llama models are not open source, contrary to Meta's claims; they are open weight.
15T tokens, 45 terabytes. Seems fairly open source to me.
Aside from licensed content, the fact that content creators don't like redistribution means a lawful model could probably only use Project Gutenberg's collection and permissively licensed code. Anything else, including Wikipedia, usually comes with licensing requirements the model might violate.
Regardless, it fits the compute used and the claim that they trained on public web data, and it was suspiciously published by HF staff shortly after L3 released. It's about as official as the Mistral 7B v0.2 base model. I.e. mostly, but not entirely, probably for some weird legal reasons.
Source is the input to some built artifact. It is the source of that artifact. As in: where the artifact comes from. Textual input is absolutely the source of the ML model. The way you are using "source" is analogous to the source of the compiler in traditional programming.
An asset is an artifact used as input that is reproduced verbatim in the output. For example, a logo baked into an application to be rendered in the UI. Compiling the program doesn't make a new logo; it just copies the asset into the built artifact.
Imagine if the source code were in a programming language whose basic syntax and semantics were known to no one but the original developers.
Or more realistically, I think it’s a major problem if an open source project can only be built by an esoteric process that only the original developers have access to.