Better HN

omneity · 2y ago

I believe the fp64 limitation came from the laptop-grade GPU I had rather than being inherent to AMD or ROCm.

The API level I could target was at least two or three versions behind the latest one they offer.


That might very well be true. I don't blame anyone for not digging deeper into why this stuff doesn't work.

But this is one of the great strengths of CUDA: I can develop a kernel on my workstation, my boss can demo it on his laptop, and we can deploy it on Jetsons or the multi-GPU cluster with minimal changes. I can be sure that everything runs everywhere.

brutus1213 · 2y ago

There is indeed something excellent about CUDA from a user perspective that is hard to beat. I do high-level DNN work, and it is not clear to me what that something is or why it is so. Any time I have worked on optimizing for mobile hardware (not Jetson, but actual phones or accelerators), it has been a world of hurt and incompatibilities. This notion that operators or subgraphs can be accelerated by lower-level closed blobs... I wonder if that is part of the issue. But then why doesn't OpenCL just work? I thought it provided a general-purpose abstraction similar to CUDA kernels.

I just don't know the details well enough to understand why things are so problematic without CUDA :(

iopq · 2y ago

Sorry, still trying to install some dependencies for DNN and CUDA; not sure why it says my Clang version is too new (!)
