I don't think this is correct. For inference, the bottleneck is memory bandwidth, so if you can hook up an FPGA with better memory, it has an outside shot at beating GPUs, at least in the short term.
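The bandwidth argument is easy to sanity-check: at batch size 1, every generated token requires streaming the full weight set from memory, so bandwidth alone caps decode speed. Here's a back-of-envelope sketch (the 70B size, 8-bit weights, and ~4.8 TB/s figure are illustrative assumptions, not measured numbers):

```python
# Back-of-envelope: single-stream decode throughput is bounded by how
# fast the weights can be streamed from memory, not by compute.
# All numbers below are assumptions for illustration.

def max_tokens_per_sec(model_params_b: float, bytes_per_param: float,
                       mem_bandwidth_tb_s: float) -> float:
    """Upper bound on batch-1 decode speed: one full pass over the
    weights per generated token."""
    weight_bytes = model_params_b * 1e9 * bytes_per_param
    return mem_bandwidth_tb_s * 1e12 / weight_bytes

# A 70B model with 8-bit weights on ~4.8 TB/s HBM (roughly H200-class):
print(round(max_tokens_per_sec(70, 1.0, 4.8)))  # ~69 tokens/s ceiling
```

So whichever device wins on effective memory bandwidth wins this regime, regardless of FLOPS on paper.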
I mean, I've worked with FPGAs that outperformed H200s on Llama3-class models, and that was a good while ago now.