
menaerus 1y ago

> 8 channels of DDR5 4800 will still get you something like 300 GB per second bandwidth.

That's the theory. In practice, Sapphire Rapids needs 24-28 cores to hit the 200 GB/s mark, and it doesn't go much further than that. Intel CPU designs generally have a hard time saturating the memory bandwidth, so it remains to be seen whether they managed to fix this, but I wouldn't hold my breath. 200 GB/s is not much. My dual-socket Skylake system hits ~140 GB/s and it's quite slow for larger LLMs.
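For reference, the ~300 GB/s in the quote is the theoretical peak. A rough sketch of that arithmetic, assuming a 64-bit (8-byte) data path per DDR5 channel; the function name is made up for illustration, and the 200 GB/s is just the observed figure mentioned above:

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Assumes each channel moves `bytes_per_transfer` bytes per transfer
    (8 bytes = 64-bit data path, an assumption of this sketch).
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


theoretical = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=4800)
observed = 200.0  # GB/s, the practical Sapphire Rapids figure quoted above

print(f"theoretical peak: {theoretical:.1f} GB/s")
print(f"observed / theoretical: {observed / theoretical:.0%}")
```

So the practical number lands at roughly two thirds of the paper figure.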

> Why does it have to be a dual CPU design?

Because memory bandwidth is one of the most important limiting (compute) factors for inference on larger models. With a dual-socket design you're essentially doubling the available bandwidth.
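A back-of-the-envelope sketch of why bandwidth sets the ceiling, assuming token generation is purely bandwidth-bound and that every generated token streams the active weights from RAM once. The 40 GB per token is a hypothetical placeholder, not a measurement, and real NUMA systems fall short of the ideal doubling:

```python
def max_tokens_per_s(bandwidth_gbs: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed if memory bandwidth is the only limit.

    Ignores compute, caches, and cross-socket (NUMA) traffic, so real
    throughput will be lower than this.
    """
    return bandwidth_gbs / gb_read_per_token


GB_PER_TOKEN = 40.0  # hypothetical: weights touched per generated token

for label, bandwidth in [("single socket", 200.0),
                         ("dual socket (ideal doubling)", 400.0)]:
    ceiling = max_tokens_per_s(bandwidth, GB_PER_TOKEN)
    print(f"{label}: at most {ceiling:.1f} tok/s")
```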

> And the original message you were responding to was using a CPU with AMX and mixing it with a GPU like Nvidia 4090/5090.

A dual-socket CPU setup that costs $10k, in a server that probably costs a couple of times more. You claimed that it doesn't have to be that expensive, but I beg to differ - you still need $20k-$30k worth of equipment to run it. That's a lot and not quite "cost effective".


phonon 1y ago

The proof of the pudding is in the eating. Read the link above. It's one or two mid-range[1] Sapphire Rapids CPUs and a 4090. Dual CPU is faster (partially because of 32->64 cores, not just bandwidth) but also hits data locality issues, limiting the increase to about 30%.

(Dual Socket Skylake? Do you mean Cascade Lake?)

If you price it out, it's basically the most cost effective set-up with reasonable speed for large (more than 300 GB) models. Dual socket basically doubles the motherboard[2] and CPU cost, so maybe another $3k-$6k for a 30% uplift.

[1] https://www.intel.com/content/www/us/en/products/sku/231733/... $3,157

[2] https://www.serversupply.com/MOTHERBOARD/SYSTEM%20BOARD/LGA-... $1,800

menaerus (OP) 1y ago

Yes, dual socket Skylake. What's strange about that?

Please price it out for us, because I still don't see what's cost effective in a system that costs well over $10k and runs at 8 tok/s vs the dual Zen 4 system for $6k running at the same tok/s.
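To put numbers on it, a rough sketch using only the figures above ($10k is a lower bound, since the claim is "well over"; the helper name is made up):

```python
def dollars_per_tok_s(system_cost_usd: float, tokens_per_s: float) -> float:
    """Hardware cost per unit of generation throughput."""
    return system_cost_usd / tokens_per_s


intel = dollars_per_tok_s(10_000, 8)  # "well over $10k", so a lower bound
zen4 = dollars_per_tok_s(6_000, 8)

# Throughput the $10k system would need to match the Zen 4 box per dollar.
break_even_tps = 10_000 / zen4

print(f"Intel: ${intel:.0f} per tok/s, Zen 4: ${zen4:.0f} per tok/s")
print(f"break-even for the $10k system: {break_even_tps:.1f} tok/s")
```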

phonon 1y ago

Sorry. Didn't realize you meant Skylake-SP.

I am not sure what your point is. There are some nice dual-socket Epyc examples floating around as well that claim 6-8 tokens/s. (I think some of those are actually distilled versions with very small context sizes... I don't see any as thoroughly documented/benchmarked as the above.) This is a dual-socket Sapphire Rapids example with similar-sized CPUs and a consumer graphics card that gives about 16 tokens/second. The Sapphire Rapids CPU and MB are a bit more expensive, and a 4090 was $1500 until recently. So for a few thousand more you can double the speed. Also, the prompt processing speed is waaaaay faster as well. (Something like 10x faster than the Epyc versions.)

In any case, these are all vastly cheaper approaches than trying to get enough H100s to fit the full R1 model in VRAM! A single H100 80 GB is more than $20k, and you would need many of them + server just to run R1.
