Interesting to note - we have no idea how much R1 cost to train. To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
FP8 training and GRPO make sense to me, but that only gets you a 4x improvement total, right?
We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-R1, and we don’t have any o1 numbers to compare against.
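To make the "4x" figure above concrete, here's a back-of-the-envelope sketch. Both multipliers are assumptions, not reported numbers: FP8 is often assumed to roughly double training throughput over BF16, and GRPO drops PPO's separate critic model, which could roughly halve RL compute:

```python
# Back-of-the-envelope speedup estimate (all multipliers are assumptions,
# not figures reported by DeepSeek).
fp8_speedup = 2.0   # assumed throughput gain of FP8 over BF16 training
grpo_speedup = 2.0  # assumed saving from GRPO dropping PPO's critic model

total_speedup = fp8_speedup * grpo_speedup
print(f"combined speedup: {total_speedup:.0f}x")  # the "4x" in question
```

If either multiplier is smaller in practice, the combined gain shrinks accordingly, which is the crux of the question: 4x alone doesn't explain an order-of-magnitude cost gap.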
There’s definitely lots of misinformation spreading, though. The $5.5M number refers to DeepSeek-V3, not DeepSeek-R1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced by working around H800 networking limitations, and it's impressive what they've done.
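For context on where that $5.5M figure comes from: the DeepSeek-V3 technical report states about 2.788M total H800 GPU-hours at an assumed rental price of $2 per GPU-hour. The arithmetic:

```python
# Arithmetic behind the widely quoted DeepSeek-V3 training-cost figure,
# using the GPU-hour total and rental rate stated in the V3 technical report.
h800_gpu_hours = 2.788e6   # total reported GPU-hours for DeepSeek-V3 training
rate_per_hour = 2.00       # assumed H800 rental price, USD per GPU-hour

cost = h800_gpu_hours * rate_per_hour
print(f"${cost / 1e6:.3f}M")  # ~$5.576M
```

Note this covers only the final V3 training run; prior experiments, infrastructure, and the R1 RL stage are all outside that number.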
According to this article they are rattled in some way...
Deepseek is open source and based on Meta's open source Llama models. So Meta can easily run Deepseek on their pipeline.
The revenue model for both Meta and Deepseek is to apply the model to their business, not just sell it as a chatbot or API. That's why they publish it, they benefit from the community improvements and ironing out bugs.
I'm imagining four rooms of candlelight and collective reading of publications. "War room" is executive-speak for "Important/Urgent Panic" or "rearranging deck chairs on the Titanic"
Four war rooms to read a document; so Meta
Spun differently - Meta just reacted to take advantage of a new opportunity in a couple of weeks, completely redoing an entire year's worth of work for dozens of engineers. That sounds… appropriate? For an announcement big enough to chop $600B off Nvidia's market cap.
Come to think of it, I wonder how much Meta spends on AI compute. 80% of that number could be a billion dollars.
Meta, on the other hand, makes money off WhatsApp, Facebook, Instagram, and Threads. For Meta, an additional provider of "AI stuff" is not a threat to their source of revenue.
Meta can use cheap models to enhance core business.
"Gentlemen, you can't fight in the war room."