Interesting to note - we have no idea how much R1 cost to train. To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
FP8 training and GRPO make sense to me, but that only gets you a 4x improvement total, right?
We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-R1, and we don’t have any o1 numbers to compare against.
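To make the "4x" figure above concrete, here's a back-of-the-envelope sketch. Both multipliers are assumptions, not reported numbers: FP8 is often assumed to roughly double training throughput over BF16, and GRPO drops PPO's separate critic model, which could roughly halve RL compute:

```python
# Back-of-the-envelope speedup estimate (all multipliers are assumptions,
# not figures reported by DeepSeek).
fp8_speedup = 2.0   # assumed throughput gain of FP8 over BF16 training
grpo_speedup = 2.0  # assumed saving from GRPO dropping PPO's critic model

total_speedup = fp8_speedup * grpo_speedup
print(f"combined speedup: {total_speedup:.0f}x")  # the "4x" in question
```

If either multiplier is smaller in practice, the combined gain shrinks accordingly, which is the crux of the question: 4x alone doesn't explain an order-of-magnitude cost gap.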
There’s definitely lots of misinformation spreading, though. The $5.5M number refers to DeepSeek-V3, not DeepSeek-R1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced by working around H800 networking limitations, and it's impressive what they've done.
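For context on where that $5.5M figure comes from: the DeepSeek-V3 technical report states about 2.788M total H800 GPU-hours at an assumed rental price of $2 per GPU-hour. The arithmetic:

```python
# Arithmetic behind the widely quoted DeepSeek-V3 training-cost figure,
# using the GPU-hour total and rental rate stated in the V3 technical report.
h800_gpu_hours = 2.788e6   # total reported GPU-hours for DeepSeek-V3 training
rate_per_hour = 2.00       # assumed H800 rental price, USD per GPU-hour

cost = h800_gpu_hours * rate_per_hour
print(f"${cost / 1e6:.3f}M")  # ~$5.576M
```

Note this covers only the final V3 training run; prior experiments, infrastructure, and the R1 RL stage are all outside that number.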
According to this article they are rattled in some way...
Deepseek is open source and based on Meta's open source Llama models. So Meta can easily run Deepseek on their pipeline.
The revenue model for both Meta and Deepseek is to apply the model to their business, not just sell it as a chatbot or API. That's why they publish it, they benefit from the community improvements and ironing out bugs.
I'm imagining four rooms of candlelight and collective reading of publications. "War room" is executive-speak for "Important/Urgent Panic" or "rearranging deck chairs on the Titanic"
Four war rooms to read a document; so Meta
Spun differently - Meta just reacted to take advantage of a new opportunity in a couple of weeks, completely redoing an entire year's worth of work for dozens of engineers. That sounds… appropriate? For an announcement big enough to chop $600B off Nvidia's market cap.
Come to think of it, I wonder how much Meta spends on AI compute. 80% of that number could be a billion dollars.
Meta, on the other hand, makes money off WhatsApp, Facebook, Instagram, and Threads. For Meta, an additional provider of "AI stuff" is not a threat to their source of revenue.
Meta can use cheap models to enhance core business.
"Gentlemen, you can't fight in the war room."