Parent is (I assume) talking about the entire budget to get to DeepSeek V3, not the cost of the final training run.
This includes salaries for ~130 ML people plus the rest of the staff; the company is 2 years old.
They trained DeepSeek V1 and V2 (plus a bunch of other less known models) before finally training V3, and then R1/R1-Zero on top of it.
The final V3 run is ~$6M (at least officially... [1]), but that does not factor in the cost of all the failed runs, ablations, etc. that always happen when developing a new model.
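For reference, that official number is just GPU-hour arithmetic from the V3 technical report (2.788M H800 GPU-hours, priced at an assumed $2/GPU-hour rental rate); quick sanity check:

    # Official V3 final-run cost, reproduced from the report's own
    # inputs: 2.788M H800 GPU-hours at an assumed $2/GPU-hour.
    gpu_hours = 2.788e6
    usd_per_gpu_hour = 2.0
    print(f"final run ~ ${gpu_hours * usd_per_gpu_hour / 1e6:.1f}M")  # ~$5.6M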
You also can't rent a cluster of this size on a 3-week commitment, run your training, and then stop paying for it; because of supply/demand, there is always a multi-month (if not 1-year) commitment. Or, if it's a private cluster they own, that's already a $200M-300M+ investment just for the advertised ~2,000 GPUs used for that run.
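Even just a one-year rental commitment on that same cluster adds up fast; a rough sketch (the 2,048-GPU size and $2/GPU-hour rate are the reported figures, the full-utilization assumption is mine):

    # Rough cost of committing to a 2,048-GPU cluster for one year
    # at $2/GPU-hour, assuming (optimistically) full utilization.
    gpus = 2048
    usd_per_gpu_hour = 2.0
    hours_per_year = 365 * 24
    annual_usd = gpus * usd_per_gpu_hour * hours_per_year
    print(f"1-year commitment ~ ${annual_usd / 1e6:.0f}M")  # ~$36M

And that is before salaries, failed runs, and every model that came before V3.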
I don't know if it really is $1B, but it certainly isn't below $100M.
[1] I personally believe they used more GPUs than stated but simply can't be forthcoming about it, for obvious reasons. I of course have no proof of that; my belief is just based on the scaling laws we have seen so far, plus where the incentives lie when stating the # of GPUs. But even if the 2k-GPU figure is accurate, the total is still $100M+.