Better HN

KeplerBoy · 3mo ago · 0 points · 0 comments

Yes, if you're doing what everyone else is doing you can just use tensor cores and libraries which optimize for that.

Conversely, if you're doing something that doesn't map well to tensor cores, you have a problem: every generation, a larger portion of the die is devoted to low/mixed-precision MMA operations. Maybe FPGAs can find a niche that is underserved by current GPUs, but I doubt it. Writing a CUDA/HIP/Kokkos kernel is just so much cheaper and more accessible than VHDL it's not even funny.

AMD needs to invest in that: let me write a small FPGA kernel inline in a Python script, compile it instantly, and pipe NumPy arrays into it (similar to CuPy's RawKernel). If that workflow works and lets me iterate fast, I could be convinced to get deeper into it.
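For anyone who hasn't used it, the CuPy workflow referred to here looks roughly like the following. This is a minimal sketch, not anything from the comment itself: the names `SAXPY_SRC`, `saxpy` and `run_saxpy` are made up for illustration, and the CPU fallback is an assumption added only so the snippet also runs on a machine without CuPy or a CUDA GPU.

```python
import numpy as np

# CUDA C source kept inline in the Python script (illustrative kernel, not from the thread).
SAXPY_SRC = r'''
extern "C" __global__
void saxpy(const float a, const float* x, const float* y, float* out, const int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        out[i] = a * x[i] + y[i];
    }
}
'''


def run_saxpy(a, x, y):
    """Return a*x + y as a float32 NumPy array.

    Uses cupy.RawKernel when CuPy and a GPU are usable; otherwise falls
    back to plain NumPy (assumption: fallback exists only for portability).
    """
    x = np.asarray(x, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    try:
        import cupy as cp

        kernel = cp.RawKernel(SAXPY_SRC, "saxpy")  # compiled lazily on first launch
        d_x = cp.asarray(x)                        # host -> device copy
        d_y = cp.asarray(y)
        d_out = cp.empty_like(d_x)
        n = d_x.size
        threads = 256
        blocks = (n + threads - 1) // threads
        # Launch: grid size, block size, then the kernel arguments as a tuple.
        kernel((blocks,), (threads,), (np.float32(a), d_x, d_y, d_out, np.int32(n)))
        return cp.asnumpy(d_out)                   # device -> host copy
    except Exception:
        # No CuPy, no GPU, or compilation failed: compute the same thing on the CPU.
        return np.float32(a) * x + y


if __name__ == "__main__":
    x = np.arange(4, dtype=np.float32)
    y = np.ones(4, dtype=np.float32)
    print(run_saxpy(2.0, x, y))
```

The point is the edit-run loop: the kernel source is just a string, compilation happens on first launch, and NumPy arrays go in and come out with one call each way.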


imtringued · 3mo ago

The primary niche of FPGAs is low latency, determinism and low power consumption. Basically: what if you need an MCU, or many MCUs, but the ones on the market don't have enough processing power?

The Versal AI Edge line is very power efficient compared to trying to achieve the same number of FLOPs using a Ryzen-based CPU.
