Open source AI is the path forward - https://news.ycombinator.com/item?id=41046773 - July 2024 (278 comments)
Quick comparison with GPT-4o:
+-----------+--------+----------------+
| Metric    | GPT-4o | Llama 3.1 405B |
+-----------+--------+----------------+
| MMLU      | 88.7   | 88.6           |
| GPQA      | 53.6   | 51.1           |
| MATH      | 76.6   | 73.8           |
| HumanEval | 90.2   | 89.0           |
| MGSM      | 90.5   | 91.6           |
+-----------+--------+----------------+

I'd think of the 405B model as the equivalent of a big rig tractor trailer. It's not for home use. But also check out the benchmark improvements for the 70B and 8B models.
I agree with you on open source in the original, home tinkerer sense.
Not in the slightest. They even have a table of cloud providers where you can host the 405B model and the associated cost to do so on their website: https://llama.meta.com/ (Scroll down)
"Open Source" doesn't mean "You can run this on consumer hardware". It just means that it's open source. They also released 8B and 70B models for people to use on consumer gear.
This time, I just copy-pasted the raw metrics I found and asked an LLM to format it as an ASCII table.
GPT-4o 30.7
GPT-4 turbo (2024-04-09) 29.7
Llama 3.1 405B Instruct 29.5
Claude 3.5 Sonnet 27.9
Claude 3 Opus 27.3
Llama 3.1 70B Instruct 26.4
Gemini Pro 1.5 0514 22.3
Gemma 2 27B Instruct 21.2
Mistral Large 17.7
Gemma 2 9B Instruct 16.3
Qwen 2 Instruct 72B 15.6
Gemini 1.5 Flash 15.3
GPT-4o mini 14.3
Llama 3.1 8B Instruct 14.0
DeepSeek-V2 Chat 236B (0628) 13.4
Nemotron-4 340B 12.7
Mixtral-8x22B Instruct 12.2
Yi Large 12.1
Command R Plus 11.1
Mistral Small 9.3
Reka Core-20240501 9.1
GLM-4 9.0
Qwen 1.5 Chat 32B 8.7
Phi-3 Small 8k 8.4
DBRX 8.0
If you want to learn more, there is a writeup at https://wow.groq.com/now-available-on-groq-the-largest-and-m....
(disclaimer, I am a Groq employee)
Free trial gets you 50 messages, no credit card required - https://double.bot
(disclaimer, I am the co-founder)
I gave a seminar about the overall approach recently, abstract: https://shorturl.at/E7TcA, recording: https://shorturl.at/zBcoL.
This two-part AMA has a lot more detail if you're already familiar with what we do:
Statement from Mark: https://about.fb.com/news/2024/07/open-source-ai-is-the-path...
Where the right hardware is ten 4090s, even at 4-bit quantization. I'm hoping we'll see these models get smaller, but the GPT-4-competitive one isn't really accessible for home use yet.
Still amazing that it's available at all, of course!
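For a rough sense of where the "ten 4090s" figure comes from, here's a back-of-the-envelope sketch; the ~15% overhead for KV cache, activations, and framework buffers is an assumption, and real quantized builds vary by format:

    # Back-of-the-envelope VRAM estimate for a quantized 405B model.
    # Assumes a uniform bits-per-weight figure plus ~15% overhead for
    # KV cache, activations, and framework buffers; real numbers vary by format.

    def vram_needed_gb(params_billions: float, bits_per_weight: float, overhead: float = 0.15) -> float:
        weight_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
        return weight_gb * (1 + overhead)

    for bits in (16, 8, 4):
        need = vram_needed_gb(405, bits)
        print(f"{bits}-bit: ~{need:.0f} GB -> ~{need / 24:.1f}x 24GB cards")

At 4 bits that works out to roughly ten 24GB cards, and around twenty at 8 bits, which matches the figures floating around this thread.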
https://about.fb.com/news/2024/07/open-source-ai-is-the-path...
[1]: https://opensource.org/blog/metas-llama-2-license-is-not-ope...
As I have stated time and again, it is perfectly fine for them to slap on whatever license they see fit, as it is their work. But it would be nice if they used appropriate terms so as not to disrupt the discourse further than they have already done. I have written several walls of text on why I, as a researcher, find Facebook's behaviour problematic, so I will fall back on an old link [2] this time rather than writing it all over again.
Is it? Has there been a ruling on the enforceability of the license they attach to their models yet? Just because you say what you release can only be used for certain things doesn't mean that restriction will actually hold up.
It's a "Google and Apple can't use this model in production" clause that, frankly, we can all be relatively okay with.
If you want a playground to test this model locally or want to quickly build some applications with it, you can try LLMStack (https://github.com/trypromptly/LLMStack). I wrote last week about how to configure and use Ollama with LLMStack at https://docs.trypromptly.com/guides/using-llama3-with-ollama.
Disclaimer: I'm the maintainer of LLMStack
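If you go the Ollama route, a minimal sketch of querying a locally pulled Llama 3.1 model from Python might look like the following; the `ollama` package, the `llama3.1` model tag, and the response shape are assumptions to check against the Ollama docs:

    # Minimal sketch: chat with a locally served Llama 3.1 model via Ollama's Python client.
    # Assumes `pip install ollama` and that `ollama pull llama3.1` has already been run;
    # the model tag and response shape may differ between Ollama versions.
    import ollama

    response = ollama.chat(
        model="llama3.1",  # local model tag; adjust to the variant you pulled
        messages=[{"role": "user", "content": "Summarize the Llama 3.1 release in one sentence."}],
    )
    print(response["message"]["content"])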
In theory the benchmarks should be a pretty close proxy for quality, but that doesn't match my experience at all.
Examples: OpenAI's GPT-4o mini is second only to 4o on LMSys Overall, but is 6.7 points behind 4o on MMLU. It's "punching above its weight" in real-world contexts. The Gemma series (9B and 27B) are similar, both beating the mean in terms of Elo per MMLU point. Microsoft's Phi series are all below the mean, meaning they have strong MMLU scores but aren't preferred in real-world contexts.
Llama 3 8B previously did substantially better than the mean on LMSys Overall, so hopefully Llama 3.1 8B will be even better! The 70B variant was interestingly right on the mean. Hopefully the 405B variant won't fall below!
Don't expect any meaningful score there before they wipe results.
For my use of the chat interface, I don't think lmsys is very useful. lmsys mainly evaluates relatively simple, low token count questions. Most (if not all) are single prompts, not conversations. The small models do well in this context. If that is what you are looking for, great. However, it does not test longer conversations with high token counts.
Just saying that all benchmarks, including lmsys, have issues and are focused on specific use cases.
Open source models are very exciting for self hosting, but the per-token hosted inference pricing hasn't been competitive with OpenAI and Anthropic, at least for a given tier of quality. (E.g.: Llama 3 70B costing between $1 and $10 per million tokens on various platforms, but Claude Sonnet 3.5 is $3 per million.)
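To put those per-token prices in perspective, here's a quick sketch of what a hypothetical monthly workload would cost at the price points mentioned above; the 50M tokens/month figure is an illustrative assumption, and real pricing splits input and output tokens at different rates:

    # Rough monthly spend at the blended per-million-token price points mentioned above.
    # The 50M tokens/month workload is an illustrative assumption; real pricing
    # also charges input and output tokens at different rates.
    monthly_tokens = 50_000_000

    for name, usd_per_million in [
        ("Llama 3 70B (cheapest host)", 1.0),
        ("Claude 3.5 Sonnet", 3.0),
        ("Llama 3 70B (priciest host)", 10.0),
    ]:
        cost = monthly_tokens / 1_000_000 * usd_per_million
        print(f"{name}: ${cost:,.0f}/month")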
[1]: https://github.com/meta-llama/llama-models/blob/main/models/...
[2]: https://github.com/meta-llama/llama-recipes/blob/main/recipe...
Have other major models explicitly communicated that they're trained on synthetic data?
Why do you think he is surprised? I think very few are surprised.
We had a brief, abnormal, and special moment in time after the crypto wars ended in the mid-2000s where software products were truly global, and the internet was more or less unregulated and completely open (at least in most of the world). Sadly it seems that this era has come to a close, and people have not yet updated their understanding of the world to account for that fact.
People are also not great at thinking through the second order effects of the policies they advocate for (e.g. the GDPR), and are often surprised by the results.
Other than that, and GDPR (which is generally now regarded as a good thing), I'm not sure what requirements you've got in mind.
The only solution is a worldwide government that can impose laws in all countries at once, but that's unlikely to happen any time soon.
A Gibsonesque global Turing Police is a sure sign of Dystopia.
Let's hope the next moustached guy that tries to do this ends up dying in a bunker just like the last one.
https://aider.chat/docs/leaderboards/
77.4% claude-3.5-sonnet
75.2% DeepSeek Coder V2 (whole)
72.9% gpt-4o
69.9% DeepSeek Chat V2 0628
68.4% claude-3-opus-20240229
67.7% gpt-4-0613
66.2% llama-3.1-405b-instruct (whole)

Llama 3 Training System
- Two clusters of 24,000 GPUs each, 9.6 exaFLOPS per cluster
- 400+ TFLOPS per GPU
- Total system: 19,200,000 TFLOPS (19.2 exaFLOPS)

405B is hopelessly out of reach for running in a homelab without spending thousands of dollars. For most people wanting to try out the 405B model, the best option is to rent compute from a datacenter. Looking forward to seeing what it can accomplish.
On a related note, for those interested in experimenting with large language models locally, I've been working on an app called Msty [1]. It allows you to run models like this with just one click and features a clean, functional interface. Just added support for both 8B and 70B. Still in development, but I'd appreciate any feedback.
[1]: https://msty.app
Can you add GCP Vertex AI API support? Then one key would enable Claude, Llama herd, Gemini, Gemma etc
Let us know if you have other needs!
edit: If the AI bubble pops we will be swimming in GPUs... but no new models.
Too bad, too, I don't think my PC will fit 20 4090s (480GiB).
Open Source AI Is the Path Forward
https://about.fb.com/news/2024/07/open-source-ai-is-the-path...
Seems like the biggest GPU node they have is the p5.48xlarge @ 640GB (8xH100s). At bf16 the 405B weights alone are roughly 810GB, so it won't fit in a single node without fp8 or further quantization. Routing between multiple nodes would be too slow unless there's an InfiniBand fabric you can leverage. Interested to know if anyone else is exploring this.
For home users, 7B models (which can fit on an 8GB GPU when quantized) and 13B models (which can fit on a 16GB GPU) are in far more demand. If you're a researcher, you want a 70B model to get the best performance, and so your benchmarks are comparable to everyone else's.
The perplexity per parameter is higher and the delta grows as it scales.
Not per bit, but per parameter.
Why this is happening really needs more attention and more consideration for pretrained model development right now.
A sleeping giant of a difference in a space where even marginal gains make headlines.
And answer queries like:
Give all <myObject> which refer to <location> which refer to an Indo-European <language>.
https://github.com/meta-llama/llama-models/blob/main/models/...
Unless...
You have a couple hundred thousand dollars sitting around collecting dust... then all you need is a DGX or HGX level of VRAM, the power to run it, the power to keep it cool, and a place for it to sit.
* You'll be running a Q5(ish) quantized model, not the full model
* You're OK with buying used hardware
* You have two separate 120V circuits available to plug it into (I assume you're in the US), or alternatively a single 240V dryer/oven/RV-style plug.
The build would look something like (approximate secondary market prices in parentheses):
* Asrock ROMED8-2T motherboard ($700)
* A used Epyc Rome CPU ($300-$1000 depending on how many cores you want)
* 256GB of DDR4, 8x 32GB modules ($550)
* nvme boot drive ($100)
* Ten RTX 3090 cards ($700 each, $7000 total)
* Two 1500 watt power supplies. One will power the mobo and four GPUs, and the other will power the remaining six GPUs ($500 total)
* An open frame case, the kind made for crypto miners ($100?)
* PCIe splitters, cables, screws, fans, other misc parts ($500)
Total is about $10k, give or take. You'll be limiting the GPUs (using `nvidia-smi` or similar) to run at 200-225W each, which drastically reduces their top-end power draw for a minimal drop in performance. Plug each power supply into a different AC circuit, or use a dual 120V adapter with a 240V outlet to effectively accomplish the same thing.
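A minimal sketch of that power limiting, assuming ten GPUs at indices 0-9 and a driver/permission setup that lets you change the limit (typically requires root); the 225W value is just the figure above:

    # Sketch: cap each GPU to a 225W power limit via nvidia-smi.
    # Assumes ten GPUs at indices 0-9 and that the driver allows changing the
    # limit (typically run as root); adjust the wattage to taste.
    import subprocess

    for gpu_index in range(10):
        subprocess.run(
            ["nvidia-smi", "-i", str(gpu_index), "-pl", "225"],
            check=True,
        )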
When actively running inference you'll likely be pulling ~2500-2800W from the wall, but at idle, the whole system should use about a tenth of that.
It will heat up the room it's in, especially if you use it frequently, but since it's in an open frame case there are lots of options for cooling.
I realize that this setup is still out of the reach of the "average Joe" but for a dedicated (high-end) hobbyist or someone who wants to build a business, this is a surprisingly reasonable cost.
Edit: the other cool thing is that if you use fast DDR4 and populate all 8 RAM slots as I recommend above, the memory bandwidth of this system is competitive with that of Apple silicon -- 204.8GB/sec, with DDR4-3200. Combined with a 32+ core Epyc, you could experiment with running many models completely on the CPU, though Llama 405B will probably still be excruciatingly slow.
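The 204.8GB/sec figure is just channel count times transfer rate times bus width; a quick sketch, assuming all 8 channels are populated with DDR4-3200:

    # DDR4-3200 across 8 channels: transfer rate * bytes per transfer * channels.
    channels = 8
    transfers_per_sec = 3.2e9   # DDR4-3200 = 3200 MT/s
    bytes_per_transfer = 8      # 64-bit channel
    print(channels * transfers_per_sec * bytes_per_transfer / 1e9, "GB/s")  # 204.8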
Would love to hear your feedback!
Meta's goal from the start was to target OpenAI and the other proprietary model players with a "scorched earth" approach by releasing powerful open models to disrupt the competitive landscape.
Meta can likely outspend any other AI lab on compute and talent:
- OpenAI makes an estimated revenue of $2B and is likely unprofitable. Meta generated a revenue of $134B and profits of $39B in 2023.
- Meta's compute resources likely outrank OpenAI by now.
- Open source likely attracts better talent and researchers.
- One possible outcome could be the acquisition of OpenAI by Microsoft to catch up with Meta.
The big winners of this: devs and AI product startups
I work at OpenAI and used to work at Meta. Almost every person from Meta that I know has asked me for a referral to OpenAI. I don't know anyone who left OpenAI to go to Meta.
There is no defensible moat unless a player truly develops some secret sauce on training. As of now, it seems that the most meaningful techniques are already widely known and understood.
The money will be made on compute and on applications of the base model (that are sufficiently novel/differentiated).
Investors will lose big on OpenAI and its competitors (outside of a greater-fool approach).
This is why Altman has gone all out pushing for regulation and playing up safety concerns while simultaneously pushing out the people in his company that actually deeply worry about safety. Altman doesn't care about safety, he just wants governments to build him a moat that doesn't naturally exist.
https://github.com/meta-llama/llama-models/blob/main/models/...
Classic strategy.