* Gigabyte MZ73-LM1 with two AMD EPYC GENOA 9334 QS 64c/128t
* 24 sticks of M321R4GA3BB6-CQK 32GB DDR5-4800 RDIMM PC5-38400R
* 24GB A5000
Note that the RAM price has almost doubled since Jan 2024.

How many tokens/s do you get for DeepSeek-R1?
R1 starts at about 10 t/s on an empty context but quickly falls off. I'd say the majority of my tokens are generated at around 6 t/s.
Some of the other big MoE models can be quite a bit faster.
I'm mostly using QwenCoder 480b at Q8 these days, averaging 9 t/s. I've found I get better real-world results out of it than from K2, R1, or GLM4.5.
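Those rates line up with a rough bandwidth-bound estimate: during decode, each token has to stream the model's active weights out of RAM, so peak t/s is roughly memory bandwidth divided by active-weight bytes. A minimal sketch (the channel count, active-parameter figure, and bytes-per-param are assumptions, not measurements from this box):

```python
def peak_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DDR bandwidth in GB/s for one socket."""
    # MT/s * 8 bytes per transfer per channel -> MB/s, then -> GB/s
    return channels * mts * bytes_per_transfer / 1e3

def bandwidth_bound_tps(bandwidth_gbs: float, active_params_b: float,
                        bytes_per_param: float) -> float:
    """Upper bound on decode tokens/s if every token reads all active weights."""
    return bandwidth_gbs / (active_params_b * bytes_per_param)

# Genoa has 12 DDR5 channels per socket; DDR5-4800 as in the build above.
per_socket = peak_bandwidth_gbs(channels=12, mts=4800)   # ~460.8 GB/s

# DeepSeek-R1 activates ~37B of its 671B params per token (assumed figure);
# Q8 is roughly 1 byte per parameter.
r1_tps = bandwidth_bound_tps(per_socket, active_params_b=37, bytes_per_param=1.0)
print(f"{per_socket:.1f} GB/s/socket -> ~{r1_tps:.1f} t/s ceiling")  # ~12.5 t/s
```

That ceiling of ~12 t/s per socket is consistent with seeing ~10 t/s on an empty context; real throughput falls below it because of NUMA effects across the two sockets, KV-cache reads growing with context, and sustained bandwidth being below the theoretical peak.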
I wonder what makes it work so well on yours! My CPU isn't much slower, and my GPU is probably faster.