* Gigabyte MZ73-LM1 with two AMD EPYC GENOA 9334 QS 64c/128t
* 24 sticks of M321R4GA3BB6-CQK 32GB DDR5-4800 RDIMM PC5-38400R
* 24GB A5000
Note that the RAM price has almost doubled since Jan 2024.

How many tokens/s do you get for DeepSeek-R1?
R1 starts at about 10 t/s on an empty context but quickly falls off. I'd say the majority of my tokens are generated at around 6 t/s.
Some of the other big MoE models can be quite a bit faster.
I'm mostly using QwenCoder 480b at Q8 these days, averaging 9 t/s. I've found I get better real-world results out of it than from K2, R1, or GLM4.5.
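Those rates line up with a rough bandwidth-bound estimate: during decode, each token has to stream the model's active weights out of RAM, so peak t/s is roughly memory bandwidth divided by active-weight bytes. A minimal sketch (the channel count, active-parameter figure, and bytes-per-param are assumptions, not measurements from this box):

```python
def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DDR bandwidth in GB/s for one socket."""
    # MT/s * 8 bytes per transfer per channel -> MB/s, then -> GB/s
    return channels * mts * bytes_per_transfer / 1e3

def bandwidth_bound_tps(bandwidth_gbs: float, active_params_b: float,
                        bytes_per_param: float) -> float:
    """Upper bound on decode tokens/s if every token reads all active weights."""
    return bandwidth_gbs / (active_params_b * bytes_per_param)

# Genoa has 12 DDR5 channels per socket; DDR5-4800 as in the build above.
per_socket = peak_bandwidth_gbs(channels=12, mts=4800)   # ~460.8 GB/s

# DeepSeek-R1 activates ~37B of its 671B params per token (assumed figure);
# Q8 is roughly 1 byte per parameter.
r1_tps = bandwidth_bound_tps(per_socket, active_params_b=37, bytes_per_param=1.0)
print(f"{per_socket:.1f} GB/s/socket -> ~{r1_tps:.1f} t/s ceiling")  # ~12.5 t/s
```

That ceiling of ~12 t/s per socket is consistent with seeing ~10 t/s on an empty context; real throughput falls below it because of NUMA effects across the two sockets, KV-cache reads growing with context, and sustained bandwidth being below the theoretical peak.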
I wonder what makes it work so well on yours! My CPU isn't much slower, and my GPU is probably faster.