Kids have security blankets. Tech CEOs have security compute clusters.
Apple did something similar with NAND storage for the iPad mini. They took a bet that could have been wrong. It was not wrong. Competitors had a hard time because of it.
It's not binary where you either have compute or you don't. You definitely do need GPUs, but there are already masses of compute; I believe it doubles every ten months or so just from Nvidia's chips. Many factors make it look like a very irrational decision:
1) Companies were spending hundreds of billions collectively on AI capex; Meta alone projected $75 billion this year. This is an extraordinary bet, given that the highest revenue of any AI company is a few billion, at OpenAI.
2) When DeepSeek came out, it was a huge validation of the moatless idea. These SOTA companies have no moat; at best they are spending tens of billions to maintain a few months' edge.
3) DeepSeek was also a huge validation of the compute saturation idea - that SOTA models were massively inefficient with compute; at best the excess was traded for iteration speed.
4) Many other, more technical arguments: Jevons paradox, data exhaustion (synthetic data can only be generated for a fixed set of things), and apparent diminishing returns (performance relative to compute: the denominator has grown exponentially while the numerator has grown only logarithmically).
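To make point 4 concrete, here is a toy sketch (all numbers purely hypothetical) assuming benchmark score grows like log10 of training compute while compute spend grows exponentially - each fixed gain in score then costs 10x more than the last:

```python
import math

def score(compute: float) -> float:
    """Hypothetical benchmark score, logarithmic in training compute."""
    return 10.0 * math.log10(compute)

compute = 1.0
for step in range(5):
    nxt = compute * 10.0                      # 10x the compute budget
    gain = score(nxt) - score(compute)        # always +10 points...
    cost_per_point = (nxt - compute) / gain   # ...at 10x the previous cost
    print(f"step {step}: +{gain:.1f} points at {cost_per_point:.1f} units/point")
    compute = nxt
```

The gain per step is constant while the cost per point grows tenfold each step - which is what "exponential denominator, logarithmic numerator" cashes out to.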
So on one hand you have these SOTA models which are becoming free. On the other hand you have this terrible business model. I strongly suspect that AI will go the way of Meta's Metaverse - a staggering cash burn with no realistic path to profitability.
It's one thing to invest in a new technology with tangible benefits to your product. It's another to pour vastly, vastly more into vague promises of AGI. To put it into perspective, in a few months of 2025 Meta will spend as much on AI capex as Apple spent on NAND in total. What advantage is there to be had with SOTA models? You do 20% better on some AIME/IQ/competitive-coding benchmark, which still translates atrociously to real-world issues.
But Nvidia will be very successful because these companies frankly have lost a lot of the plot and are FOMOing like mad. I still have memories of the 2013 AI gold rush, where every tech company was grabbing anything with AI in it, which is how Google got DeepMind. They are being enormously rewarded for it by the stock market, with Meta's price up 6x since its lows.
I think your whole argument is based on this being true, but you didn't give much argument about why there is no ROI. 400M USD isn't hard to generate...even a moderate ad engagement lift on X would generate ROI and that's just 1 customer.
Imagine going back in time and showing every VC how great the search business will be in 20-30 years. The only rational response would be to make giant bets on 20 different Googles...and I think that's what's happening. These all seem like rational investments to me.
I think a similar thing is playing out with AI. In 5-10 more years these LLMs will replace a Google search of today (and maybe be even better).
xAI also announced a few days ago they are starting an internal video game studio. How long before AI companies take over Hollywood and Disney? The value available to be captured is massive.
The cluster they’ve built is impressive compared to the competition, and Grok 3 barely scratches the surface of what it’s capable of.
Microsoft is in process of building optical links between existing datacenters to create meta-clusters, and I'd expect that others like Amazon and Meta may be doing the same.
Of course for Musk this is an irrational, ego-driven pursuit, so he can throw as much money at it as he has available, but trying to sell AI when you're paying 10x the competition for FLOPs seems problematic, even if you are capable of building a competitive product.
https://centreforaileadership.org/resources/deepseeks_narrat...
If you’re using your compute capacity at 1.25% efficiency, you are not going to win because your iteration time is just going to be too long to stay competitive.
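The arithmetic behind the iteration-time point, with all numbers hypothetical: at 1.25% utilization versus a rival at, say, 30%, the same training run on equal hardware takes 24x longer.

```python
run_flops = 1e24        # hypothetical size of one training run, in FLOPs
cluster_peak = 1e19     # hypothetical cluster peak throughput, FLOP/s

days = {}
for eff in (0.30, 0.0125):
    # wall-clock time = work / (peak throughput * utilization)
    days[eff] = run_flops / (cluster_peak * eff) / 86400
    print(f"utilization {eff:.2%}: ~{days[eff]:.0f} days per run")
```

A few days per experiment versus three months per experiment is the difference between iterating and stagnating.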
xAI bought hardware off the open market. Their compute edge could disappear in a month if Google or Amazon wanted to raise their compute by a whole xAI.
Also, it isn't clear how the compute requirements of RL-based reasoning-model training compare to those of earlier models. OpenAI has announced that GPT-4.5 will be their last non-reasoning model, so it seems we're definitely at a transition point now.
Ha ha. I'm sure their play to claim airdrop idle game will be groundbreaking.
What you're seeing right now is pure flex and a signal to the future competition. A much-maligned AI team that hasn't even been around for very long just matched or topped the competition without making use of the latest training techniques yet. The message this is intended to send is that xAI is a serious player in the space.
This is a great example of how a misleading narrative can take hold and dominate discussion even when it's fundamentally incorrect.
SemiAnalysis documents that DeepSeek has spent well over $500M on GPUs alone, with total infrastructure costs around $2.5B when including operating costs[0].
The more interesting question is probably: why do people keep repeating this? Why do they want it to be true so badly?
[0]: https://semianalysis.com/2025/01/31/deepseek-debates/#:~:tex...
DeepSeek R1 is literally an open-weight model. It has <40bln active parameters; we know that for a fact. A model of that size is definitely roughly optimally trained over the time period and GPU-hours claimed. In fact, the 70bln-parameter Llama 3 model used almost exactly the same compute as the DeepSeek V3/R1 claims (which makes sense, as you would expect somewhat lower efficiency on the H800 and for the complex DeepSeek MoE architecture).
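A rough sanity check of those claims, using the common ~6*N*D approximation for dense-transformer training FLOPs (N = active parameters, D = tokens). The parameter, token, and GPU-hour figures are the publicly reported DeepSeek V3 numbers; the H800 peak-throughput figure and the utilization interpretation are my assumptions:

```python
active_params = 37e9      # reported V3 active parameters per token (MoE)
tokens = 14.8e12          # reported pretraining tokens
gpu_hours = 2.788e6       # reported H800 GPU-hours
h800_peak_flops = 990e12  # assumed peak dense BF16 FLOP/s per GPU

train_flops = 6 * active_params * tokens        # ~6*N*D rule of thumb
achieved = train_flops / (gpu_hours * 3600)     # FLOP/s per GPU implied
mfu = achieved / h800_peak_flops                # implied utilization

print(f"total training FLOPs ~ {train_flops:.2e}")
print(f"implied per-GPU utilization ~ {mfu:.0%}")
```

The implied utilization comes out around a third of peak, which is an ordinary number for large-scale training - i.e., the claimed GPU-hours are internally consistent, not miraculous.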
It appears that LLM chat interfaces will replace Google SERPs as the arbiters of truth. Getting people to use your LLM allows you to push your world view. Pushing his "unique" world view appears to be the most important thing to modern Musk.
In that light, paying $44B for Twitter and billions for Grok training makes perfect sense.
The beauty of a failed investment is that it never goes below zero. So upside is the only thing they care about. Why invest in a near-zero chance of a random SaaS taking off, when you can invest in a near-zero chance of creating superhuman artificial life?
Yes but why? This is what I really don't understand.
Say AGI is achieved within a reasonable timeframe. Odds are that no single company will achieve it alone; there will be no monopoly. If that's the case, where is the trillion dollars of value for investors? From every claim we hear about it, AGI will lead to hundreds of millions of jobs disappearing (all white-collar jobs) and tens of millions of companies disappearing (all the companies that provide human-produced services). Who is going to buy your AGI-made products or services when nobody is paid anymore, when other companies, big and small, have ceased to exist? Sure, you can make extraordinary accomplishments and advance humanity far, far ahead, but who is going to pay for that? Even states won't be able to pay if their taxable population (individuals and corporations) disappears.
So where will the money come from? How does it work?
> due to the additional reasoning latency.
They're also less creative on non-STEM topics.
In any case, Elon won't win this race because the best talent will not work for him. He used to have a good reputation and a lot of money, which is a deadly combination. Now he only has the latter -- not enough when leading AI people can make 7 figures at other companies.
To be clear 1: I'm not saying that people who currently work on Grok are not great. It's not about hiring some great people. It's about competing in the long run - people with other options (e.g. offers from leading AI labs) are more likely to accept those offers than joining his research lab.
To be clear 2: I'm not talking about Elon's reputation due to his politics. I'm only talking about his reputation as an employer.
He has the vision and marketing skills but it's not going to be enough for leading the AI race.
I think the situations are a bit comparable given timelines however.
A perfect analogy for AI … your ability to replace talent with money. And if you don’t have the talent, it’s gonna cost you 100x more.
That sure seems to be the message given in Apple AI commercials. From those commercials the tag line for AI should be "enabling idiots everywhere".
Any source? I’m a heavy user of Claude and pay for the Teams plan just for myself so I won’t get throttled. Love it. But I’ve been impressed with O1 Pro lately. That said, I don’t like paying both €166 for Claude Teams and €238 for OpenAI Pro. :)
Per court filings by the administration, Musk is not in charge of DOGE, nor does he have any role in DOGE, nor any decision-making function in government at all, he is a White House advisor unconnected to DOGE.
DOGE uses only X links, and I am sure Grok will be the next gov contract. After all, he has all the data on everybody, down to your IRS tax returns.