undefined | Better HN

0 pointshedgehog2y ago0 comments

I'm curious how the newer consumer Ryzens might fare. With LPDDR5X they have >100 GB/s memory bandwidth and the GPUs have been improved quite a bit (16 TFLOPS FP16 nominal in the 780M). There are likely all kinds of software problems but setting that aside the perf/$ and perf/watt might be decent.

0 comments

cjbprime2y ago

Consumer Ryzens only have two-channel memory controllers. Two dual-rank (double sided) DIMMs per channel, which you would need to use to get enough RAM for LLMs, drops the memory bandwidth dramatically -- almost all the way back down to DDR4 speeds.

timschmidt2y ago

Yup. Strix Halo will change this, with a 256bit memory bus (4 channel) which CPU and GPU have access to. However it is only likely to be available in laptop designs and probably with soldered-down RAM to reduce timing and board space issues. So it won't be easy to get enough memory for large LLMs with either. But it should be faster than previous models for LLM work.

hedgehogOP2y ago

For consumer Ryzen to pencil out it would require a cluster of APU-equipped machines with the model striped across them. Given say 16GB of model per machine and 60GBps actual memory bandwidth @ $500 it's favorable vs A100s if the software is workable (which my guess is it's not today due to AMD's spotty support). This is for inference, training probably would be too slow due to interconnect overhead.

j / k navigate · click thread line to collapse

0 comments

cjbprime2y ago

timschmidt2y ago

hedgehogOP2y ago

j / k navigate · click thread line to collapse