I'm working on a project ATM at one of the DoE sites you're likely referring to... Maybe we'll bump into each other!
Ah yes, pytorch:
1) Check the issues, PRs, etc. on the torch GitHub. Relative to its market share, ROCm accounts for a disproportionate multiple of the open and closed issues. There is still much work to be done on things as basic as overall stability.
2) torch is the bare minimum. Consider flash attention. On CUDA it just works, of course, with sliding window attention, ALiBi, and PagedAttention. The ROCm fork? Nope. Then check out the xFormers situation on ROCm. Prepare to spend your time messing around with ROCm, spelunking GitHub issues/PRs/blogs, etc., and working one by one through frameworks and libraries instead of running `pip install` and actually doing your work.
3) Repeat for hundreds of libraries, frameworks, etc depending on your specific use case(s).
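That "go one by one through your stack" step can at least be mechanized. Here's a minimal sketch of the kind of audit I mean; the package names in the example list are illustrative of a typical training stack, not a claim about any particular environment:

```python
import importlib.util

def check_stack(packages):
    """Return the subset of packages that can't even be found in this env.

    find_spec() returning None means the package isn't installed at all --
    and on ROCm that's often just the first hurdle before build/runtime
    breakage.
    """
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Illustrative stack audit; adjust the list to your actual dependencies.
missing = check_stack(["torch", "flash_attn", "xformers", "vllm"])
print("not installed:", missing)
```

On a CUDA box this list usually comes back empty after one `pip install`; on ROCm, each missing entry tends to kick off its own round of GitHub spelunking.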
Then, once you have a model and need to serve it for inference so your users can actually make use of it and you can get paid? With CUDA you can choose between torchserve, HF TEI/TGI, Nvidia Triton Inference Server, vLLM, and a number of others. On ROCm, vLLM has what I would call (at best) "early" support: it requires patches to ROCm, isn't feature complete, and regularly sees commits fixing yet another show-stopping bug/crash/performance regression/whatever.
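For context on what you're choosing between: several of those servers (vLLM, TGI) expose an OpenAI-compatible HTTP API, so the client side is trivial once the server actually runs. A sketch of building such a request body; the endpoint path and model name are illustrative assumptions, not tied to any specific deployment:

```python
import json

def completion_request(model, prompt, max_tokens=64):
    """Build a request body for an OpenAI-compatible /v1/completions
    endpoint, as exposed by servers like vLLM or HF TGI.

    The model name is whatever your server was launched with; this is
    a hypothetical example, not a real deployment.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

body = completion_request("my-fine-tuned-model", "Hello")
```

The client code is identical whether the server runs on CUDA or ROCm; the difference is whether the server stays up.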
Torch support is a good start, but it's just that - a start.