Wow, an actual open source language model (first of its kind [from a larger company] maybe even?), includes all you need to be able to recreate it from scratch. Thanks AMD!
Available under this funky GitHub organization it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM
I find people get into silly arguments about the terminology because they’re focused on whether the “source” is “open” and not on what the “source” is actually the source of.
“Weights available” indicates even the weights aren’t “open” in the usual software meaning of the term, as they typically come with restrictive licenses (more restrictive than copyleft or attribution).
My limited understanding (finance side) is that most customers prefer to buy from (even open-source) companies, for several reasons:
1) They don't have the time/desire to assemble hundreds of components, preferring a drop-in solution
2) Our manufacturing facility has experience and pro tools, and produces products within tighter tolerances
3) Many people understand their own manufacturing limitations and would prefer warranted solutions, without their understood dangers of DIY
I personally dropped out of US grad school because it was the antithesis of open-source licensing.
Disclosure: I am an AMD shareholder, excited about this recent announcement
Apple research has previously released another example of a model with open training code, data, and weights, but their model was sized for running inference workloads on mobile devices.
However, Apple has a mobile device line of business and AMD has an enterprise AI accelerator line of business, so they are both doing work relevant to their bottom line.
Maybe some other heavy hitter out there can explain what all this whatchamacallit newfangled synergy producing matrix algebra does after you have it running?
After you get it up and running you can just ask it what to do with it.
Something like this would help small teams build an initial POC and do some experimentation.
You have similar issues with robotics projects. It's very expensive for a hobbyist because of the hardware costs, but there's a large number of small companies who benefit from open source tech to get their projects started.
I find it funny that the AI field has somehow normalised the goalpost moving from capabilities all the way to definitions about open source. And people seem really tribal about it...
There absolutely are open source LLMs already. Phi3.5 (MIT), various Mistral models (Apache2.0), various Qwen2 models (Apache2.0) and so on. LLamas are not open source, nor are Gemmas. But to say this is "an actual open source model" is weird nitpicking for the sake of nitpicking, IMO.
Requiring the methods and datasets that someone used to create some piece of IP is in no way a requirement for open sourcing said IP. It never has been!
Imagine this analogy:
A dev comes up with a way to generate source code that solves a real problem. This dev uses a secret seed, that only they know. The dev also uses thousands of hours of compute, and an algorithm that they created. At the end of the exercise they release the results on github, as follows:
- here is a project that takes in a piece of text in english, and translates it into french.
- the resulting source code is massive. 10 billion LOC. The lines of code are just if statements, all the way down, with some hardcoded integer values.
- source code licensed under Apache 2.0, written in let's say python.
- users can see the source code
- users can run the source code
- users can modify the source code and re-release the code
Now, would anyone pre LLMs say "this isn't true open source" because it's too complicated? Because no one can reasonably understand the source code? Because it uses hard coded int values? Because it's 10b LOC? Because the dev never shared how they got those values?
Of course not. The resulting code would have been open source because Apache 2.0 is open source.
It's the same with model weights. Just because they're not source code, and just because you don't know how they were created, it does not mean the weights are not open source.
You can see the weights. You can change the weights. You can re-distribute the weights. It's open source. The definition of something being open source does not cover you understanding why the weights are like they are. Nor do they require you having access to the methods of creating those weights. Or datasets. Or whatever the devs had for breakfast.
Whenever you use an LLM you "load" the weights, using (usually open source) code and you run inference with that code. The weights are not binary and the analogy to the binary form of distributing software is not valid, IMO.
That is why I used the analogy of a python code with ifs all the way, based on hardcoded values. That is what you are arguing is not open source. The weights are just "hardcoded values".
Open source never had the requirement of the author explaining what, why or how they got a hardcoded value in their shared code. Why it suddenly does for LLMs is what I find funny.
Yes. You can use a number of libraries to add, mix, merge, etc. layers [1]
> Not with weights. I need it to learn case histories to extend its "feature-set".
Again, yes. You can add attention heads, other features, heck you can even add classification if you want [2]. Because you are working with an open architecture! What you think of as weights are not binary blobs. That is a common misconception.
[1] - https://github.com/arcee-ai/mergekit
[2] - https://github.com/center-for-humans-and-machines/transforme...
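To make the merging point a bit more concrete, here is a minimal sketch of the simplest kind of merge: linear interpolation between two checkpoints that share an architecture. It uses plain Python dicts of lists as stand-ins for real tensors. The names `linear_merge`, `model_a` and `model_b` are made up for illustration, and this is not mergekit's API, just the underlying idea under those assumptions.

```python
from typing import Dict, List

# Hypothetical stand-in for a state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Interpolate two same-shaped weight sets: alpha * a + (1 - alpha) * b."""
    if a.keys() != b.keys():
        raise ValueError("models must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch in {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Two toy "checkpoints" with identical parameter names and shapes (assumed values).
model_a: Weights = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b: Weights = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

merged = linear_merge(model_a, model_b, alpha=0.5)
print(merged)
```

The point is only that the values are plain numbers you can read, combine and write back out. Real merge tools work on tensors and offer more elaborate strategies than a straight average.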
The problem is that Facebook and others are trying to move the goalpost, while others like me would like the goalpost to remain where it is, namely we call projects "Open source" when the parts required to build them on our own machines are sufficiently accessible.
As I probably wouldn't be a developer in the first place if it wasn't for FOSS, and I spend literally all day long contributing to others' FOSS projects and working on my own, it's kind of scary seeing these large companies trying to change what FOSS means.
I think you're forgetting about the intent and purpose of open source. The goal is that people can run software for whatever purpose they want, and they can modify it for whatever purpose. This is the intent behind the licenses we use when we "create FOSS".
This means, in practice, that the source code has to be accessible somehow, so the compiler I have on my computer, can build a similar binary to the one the project itself offers (if it does). The source code has to be accessible so I can build the project, but also modify it for myself.
Taking this idea that mostly only applied to software before (FOSS) but applying it to ML instead, it's clear to see what we need in order to 1) be able to use it as we want and 2) be able to modify it as we want.
> You can see the weights. You can change the weights. You can re-distribute the weights. It's open source.
Right. If I upload a binary to some website, you can see the binary, you can change the binary and you can re-distribute it. Would you say the binary is open source?
The weights are the binary in ML contexts. It's OK for projects to publish those weights, but it's not OK to suddenly change the definition and meaning of open source because companies want to look like they're doing FOSS, when in reality they're publishing binaries without any ways of building those binaries with your own changes.
Imagine if the Linux kernel was just a big binary blob. Yes, you can change it, re-distribute and what not, but only in a binary-blob shape. You'd be kind of out there if you insist on calling this binary-blob kernel FOSS. I'm sure you'd be able to convince some Facebook engineers about it, seems they're rolling with that idea already, but the rest of us who exist in the FOSS ecosystem? We'd still have the same goalpost in the exact same spot it's been for at least two decades I've been involved.
Great question. Is the assembly code in a git, with an open source license? Then yes! It's open source!
Think about it this way: just because someone wrote hello world in c and then a compiler translated that into assembly, doesn't invalidate the quality of that assembly code being open source! That's the point. Something is open source or not if the resulting stuff is published under an open source license. Can you see the assembly code? Can you change it? Can you re-publish it? If all of these are yes, then it's open source!
> Imagine if the Linux kernel ...
That is semantics. The linux kernel is published in c because it's easier for people to reason in that abstracted language, but it would not suddenly become "closed source" if it were written in asm, assuming it would still be published under an open source license.
In other words, you having access to the "dataset" would not make the weights any easier to work with. They would still be in a "blob" as you call it.
Can't believe it's the second time I end up with the very same argument about what open source is today on HN.
Similar to publishing the source for Oracle (the database) when nobody can build a binary from it, because it needs magic compilers or test suites that aren't open source?
Heck when the browser was open-sourced, there was an explicit test where the source was given to some dude who didn't work for Netscape to verify that he could actually make a working binary. It's a scene in the movie "Code Rush".
It’s like if I said I open-sourced the Matrix trilogy and only gave you the DVD image and the source to the DVD decoder.
(Edit: Sorry, I replied to the wrong comment. I’m talking primarily about the typical sort of release we see, not this one which is a lot closer to actually open.)
I can simplify the task: can you convincingly explain how the same model can be produced from this dataset? We can start simple: how can you possibly get the same weights after the first single iteration, i.e. the same weights the original model got? Pay attention to randomness, data selection, and initial model state.
Ok, if you can't do that, can you explain in a believable way how to prove that a given model was trained on a given dataset? I'm not asking you to actually do all these things, that could be expensive, only to explain how it can be done.
Strict 'open source' includes not only open weights, open data. It also includes the word "reproducible". It's not "reproduced", only "reproducible". And even this is not the case here.
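For what it's worth, here is a toy sketch of what "reproducible" would have to pin down for a single iteration: the initial model state and the data selection both have to come from a recorded seed. Everything here (`train_one_step`, the tiny dataset, the learning rate) is made up for illustration. Real training adds more sources of nondeterminism on top, such as GPU kernels, data loader ordering and distributed reductions.

```python
import random
from typing import List, Tuple


def train_one_step(seed: int, data: List[Tuple[float, float]], lr: float = 0.1) -> List[float]:
    """One SGD step on y = w*x + b, with seeded init and seeded data selection."""
    rng = random.Random(seed)                       # single source of randomness
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # initial model state
    x, y = rng.choice(data)                         # data selection
    err = (w * x + b) - y                           # d/d(pred) of 0.5 * err**2
    return [w - lr * err * x, b - lr * err]         # updated [w, b]


# Hypothetical toy dataset drawn from y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

run_a = train_one_step(seed=42, data=data)
run_b = train_one_step(seed=42, data=data)
run_c = train_one_step(seed=7, data=data)

print("same seed, same weights:", run_a == run_b)
print("different seed, same weights:", run_a == run_c)
```

With the seed published, anyone can redo the step and compare weights bit for bit. Without it, you only get a different but similar model, which is the gap between "open data" and "reproducible" I'm pointing at.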
can we stick to years as a unit of measure and not spread Sam Altman's phrase :)
Twenty two thousand days
It's not a lot, it's all we got
Twenty two thousand days
- Sam Altman?
Anyone know the recommended cloud provider and equivalent rental price?
[1] https://www.wiredzone.com/shop/product/10025451-supermicro-g...
Actually, AMD has excellent reasons to make this kind of development and I hope they continue.
Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.
Very interesting though! I'll be playing around with this on the weekend!
[1] https://www.reddit.com/r/LocalLLaMA/comments/17h4rqz/specula...
- 1.75x-2.80x on MI250
- 2.83x-2.98x on NPU
- 3.57x-3.88x on CPU
Note they were testing AMD-Llama-135m-code as the draft model for CodeLlama-7b, both of which do similarly badly on HumanEval Pass@1 (~30%), so it's likely that if they used a similarly trained 135m model as the speculative decoding (SD) draft for, say, Qwen2.5-Coder (88.4% on HumanEval), the perf gains would be much worse.
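A rough sketch of why the draft/target match matters so much: in greedy speculative decoding the draft proposes a few tokens, the target checks them, and you keep the matching prefix plus the target's correction at the first mismatch. The toy "models" below are made-up deterministic rules over digits, not AMD's implementation. The sketch only counts target passes, ignores the draft's own cost, and skips the bonus token that real implementations get when every proposal is accepted.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # greedy "model": context -> next token


def target_model(ctx: List[int]) -> int:
    # Hypothetical large model: a fixed toy rule over digits 0-9.
    return (ctx[-1] * 3 + 1) % 10


def draft_model(ctx: List[int]) -> int:
    # Hypothetical small model: agrees with the target except after token 4.
    return 9 if ctx[-1] == 4 else (ctx[-1] * 3 + 1) % 10


def greedy_decode(prompt: List[int], target: NextToken, n_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(n_new):  # one target pass per generated token
        out.append(target(out))
    return out


def speculative_decode(prompt: List[int], target: NextToken, draft: NextToken,
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1) The draft proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target checks the proposal; a real system scores all k
        #    positions in one batched forward pass, counted as one pass here.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # on a mismatch this is the target's correction
            if tok != expected:
                break
        out.extend(accepted)
    return out[: len(prompt) + n_new], target_passes


baseline = greedy_decode([1], target_model, 12)
spec_out, passes = speculative_decode([1], target_model, draft_model, n_new=12, k=4)
print(spec_out == baseline, passes)
```

The output is identical to plain greedy decoding with the target, only with fewer target passes. The more often the draft guesses wrong, the fewer tokens each pass yields, which is why a weak draft paired with a much stronger target should see smaller gains.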
For example, the C++ model is really good at writing both OpenGL+GLFW and Raylib.
https://machinelearning.apple.com/research/introducing-apple... (see Model Adaptation)
They've branded their specific architecture and integration, which allows me to easily refer to it as an example.
I understand that it's easy to be cynical about Apple's approach to product development, but it seems unwarranted in this case.
That's already very much a thing. Codestral, Phind, Starcoder etc.
Fine-tuning models on whatever you want is quite accessible if you have a good dataset and 100 bucks of budget.
* https://github.com/amd/RyzenAI-SW - has a list of demos and how to use it directly (including apparently w/ PyTorch and LLMs)
* https://github.com/huggingface/optimum-amd - can use RyzenAI to use the NPU for HF transformers
There's now even a Linux driver (https://github.com/amd/xdna-driver), although it looks like a sufficiently big PITA that I haven't even bothered to try it (my 7940HS only has like 10 TOPS anyway, so not much point even if it worked perfectly).
I thought PyTorch didn't work well with AMD architecture, and read of many people using JAX instead?