I've just been experimenting with some local LLMs, and the differences are pretty huge:
Llama 3 8B, Raspberry Pi 5: 2-3 tokens/second (but it works!)
Llama 3 8B, RTX 4080: ~60 tokens/second
Llama 3 8B, groq.com LPU: ~1300 tokens/second
Llama 3 70B, AMD 7800X3D: 1-2 tokens/second
Llama 3 70B, groq.com LPU: ~330 tokens/second
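For anyone who wants to check numbers like these on their own hardware, here's a minimal timing sketch. It assumes llama-cpp-python and a GGUF build of the model; the file name and prompt are placeholders, not my exact setup.

    # Minimal sketch: time local generation with llama-cpp-python.
    # Model path and prompt are placeholders, not the exact setup above.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # placeholder file name
        n_gpu_layers=-1,  # offload all layers to the GPU if one is present
        n_ctx=2048,
    )

    start = time.time()
    out = llm("Explain how speculative decoding works.", max_tokens=256)
    elapsed = time.time() - start

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f}s -> {n / elapsed:.1f} tokens/s")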
There seem to be huge gaps between CPU, GPU and specialized inference ASICs. I'm guessing that right now there aren't many genius-level architecture breakthroughs, and that it's more about how much memory and silicon real estate you're willing to dedicate to AI inference.
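Back-of-the-envelope, decode speed is mostly a memory-bandwidth story: each generated token has to stream essentially all of the weights, so tokens/second is capped at roughly bandwidth divided by model size. A rough sketch; the bandwidth figures are approximate spec-sheet values and the quantization levels are guesses about the runs above:

    # Upper bound on decode speed: tokens/s <= memory bandwidth / bytes of weights
    # streamed per token. Bandwidths are approximate spec values, sizes assume the
    # quantization noted in each label (both are assumptions, not measurements).
    def ceiling_tokens_per_s(bandwidth_gb_s, weights_gb):
        return bandwidth_gb_s / weights_gb

    runs = [
        ("Raspberry Pi 5, 8B @ 4-bit",   17.0,  4.5),
        ("7800X3D (DDR5), 70B @ 4-bit",  83.0, 40.0),
        ("RTX 4080, 8B @ 8-bit",        717.0,  8.5),
    ]
    for name, bw, size in runs:
        print(f"{name}: <= {ceiling_tokens_per_s(bw, size):.0f} tokens/s")

On those assumptions the ceilings come out around 4, 2 and 84 tokens/second, which is in the same ballpark as the measured 2-3, 1-2 and ~60.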
I think Groq doesn't use quantization, so the gap between your hardware and Groq would be even wider.
How much RAM is required for this result? It's quite impressive that it even works as well as it does.
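For a rough feel, the footprint can be estimated from parameter count and quantization; the 4-bit assumption and the overhead constant below are guesses, not measurements:

    # Very rough RAM estimate: weights (params * bits/8) plus a loose allowance
    # for KV cache and runtime overhead. All numbers here are assumptions.
    def approx_ram_gb(params_b, bits_per_weight, overhead_gb=1.5):
        return params_b * bits_per_weight / 8 + overhead_gb

    print(f"Llama 3 8B  @ 4-bit: ~{approx_ram_gb(8, 4):.1f} GB")   # fits an 8 GB Pi 5
    print(f"Llama 3 70B @ 4-bit: ~{approx_ram_gb(70, 4):.1f} GB")  # needs ~40 GB of system RAM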
So really, they lose nothing. They've already booked sales for everything there is to sell, so they might as well turn their attention now to the people who might be customers two years from now and make them feel like the wait will be worth it.
Even for marketing claims that’s pretty wild.
Still lots of trajectory left in the plain scale-up plan, it seems.
How much lower can these go, though? 2-bit? 1.58-bit? 1-bit? It seems these massive gains have a very hard floor, which AMD and Nvidia will use to raise their stock prices before it all comes to a sudden end.
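The floor is easy to put a number on: even at 1 bit per weight, a 70B model is still several gigabytes of weights, so there's maybe one more 4x to be had from quantization alone. A rough sketch that ignores scale factors, codebooks and activations:

    # Weight footprint of a 70B-parameter model at various bit widths.
    # Pure arithmetic; real low-bit formats add scale/codebook overhead.
    params = 70e9
    for bits in (16, 8, 4, 2, 1.58, 1):
        gb = params * bits / 8 / 1e9
        print(f"{bits:>5} bits/weight -> ~{gb:.0f} GB of weights")

At 1 bit the 70B weights still come to roughly 9 GB, so the curve flattens quickly.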
(I miss the old PC era, where the world at large benefited in tandem from new things happening, or fell behind by not adapting.)
Except we call it "cloudframe" now.
I think their "consumer GPU" did so bad recently that AMD could just as well, you know, simply liquidate the "consumer GPU" division and stop pretending.
I'm in the "consumer GPU" market myself; what AMD GPU do I buy today? -- Radeon Pro VII, launched in 2020 and the best AMD consumer GPU I can find today.
It's such a divide. I could optimize my software for GPUs as powerful as the MI300 line... but why do that, given that I'll probably never even see one such GPU in my lifetime?
And they announced a workstation version with 48GB: https://www.phoronix.com/news/AMD-Radeon-PRO-W7900-Dual-Slot
Paper launches aren't anything new. They've always been a thing, especially in hardware.
Why not? Because they’re sold out to hyperscalers?
They're $15k; who exactly is disappointed they won't be able to buy one?
Just a few weeks ago I spoke to someone who shelled out $10k to run LLMs locally. I've seen more expensive builds as well.
HPC centers and research clusters.
8x AMD MI300X (192GB, 750W) GPU
8x H100 (80GB, 700W) GPU
What would be the result against 8x H100 NVL (188GB, <800W) GPUs?
AMD still has to prove itself in this.