AMD may get across the CUDA moat (opens in new tab)

(hpcwire.com)

551 pointsdanzheng2y ago302 comments

302 comments

I was able to use ROCm recently with Pytorch and after pulling some hair it worked quite well. The Radeon GPU I had on hand was a bit old and underpowered (RDNA2) and it only supported matmul on fp64, but for the job I needed done I saw a 200x increase in it/s over CPU despite the need to cast everywhere, and that made me super happy.

Best of all is that I simply set the device to `torch.device('cuda')` rather than openCL, which does wonders for compatibility and to keep code simple.

Protip: Use the official ROCM Pytorch base docker image [0]. The AMD setup is so finicky and dependent on specific versions of sdk/drivers/libraries and it will be much harder to make work if you try to install them separately.

[0]: https://rocm.docs.amd.com/en/latest/how_to/pytorch_install/p...

mikepurvis2y ago

Sigh. It's great that these container images exist to give people an easy on-ramp, but they definitely don't work for every use case (especially once you're in embedded where space matters and you might not be online to pull multi-gb updates from some registry).

So it's important that vendors don't feel let off the hook to provide sane packaging just because there's an option to use a kitchen-sink container image they rebuild every day from source.

xahrepap2y ago

I know it's still different than what you're looking for, so you probably already know this, but many projects like this have the Dockerfile on github which shows exactly how they set up the image. For example:

https://github.com/RadeonOpenCompute/ROCm-docker/blob/master...

They also have some for Fedora. Looks like for this you need to install their repo:

    curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - \
    && printf "deb [arch=amd64] https://repo.radeon.com/rocm/apt/$ROCM_VERSION/ jammy main" | tee /etc/apt/sources.list.d/rocm.list \
    && printf "deb [arch=amd64] https://repo.radeon.com/amdgpu/$AMDGPU_VERSION/ubuntu jammy main" | tee /etc/apt/sources.list.d/amdgpu.list \

then install Python, a couple other dependencies (build-essential, etc) and then the package in question: rocm-dev

So they are doing the packaging. There might even be documentation elsewhere for that type of setup.

mikepurvis2y ago

Oh yeah, I mean... having the source for the container build is kind of table stakes at this point. No one would accept a 10gb mystery meat blob as the basis of their production system. It's bad enough that we still accept binary-only drivers and proprietary libraries like TensorRT.

I think my issue is more just with the mindset that it's okay to have one narrow slice of supported versions of everything that are "known to work together" and those are what's in the container and anything outside of those and you're immediately pooched.

This is not hypothetical btw, I've run into real problems around it with libraries like gproto, where tensorflow's bazel build pulls in an exact version that's different from the default one in nixpkgs, and now you get symbol conflicts when something tries to link to the tensorflow c++ API while linking to another component already using the default gproto. I know these problems are solveable with symbol visibility control and whatever, but that stuff is far from universal and hard to get right, especially if the person setting up the build rules for the library doesn't themselves use it in that type of heterogeneous environment (like, everyone at Google just links the same global proto version from the monorepo so it doesn't matter).

5 more replies

fwsgonzo2y ago

I feel the same way, especially about build systems. OpenSSL and v8 are among a large list of things that have horrid build systems. Only way to build them sanely is to use some randos CMake fork, then it Just Works. Literally a two-liner in your build system to add them to your project with a sane CMake script.

mikepurvis2y ago

I was part of a Nix migration over the past two years, and literally one of the first things we checked is that there was already a community-maintained tensorflow+gpu package in nixpkgs because without that the whole thing would have been a complete non-starter, and we sure as heck didn't have the resources or know-how to figure it out for ourselves as a small DevOps team just trying to do basic packaging.

amelius2y ago

> So it's important that vendors don't feel let off the hook to provide sane packaging just because there's an option to use a kitchen-sink container image they rebuild every day.

Sadly if e.g. 95% of their users can use the container, then it could make economical sense to do it that way.

mathisfun1232y ago

> especially once you're in embedded

is this a real problem? exactly which embedded platform has a device that ROCm supports?

mikepurvis2y ago

Robotic perception is the one relevant to me. You want to do object recognition on an industrial x86 or Jetson-type machine, without having to use Ubuntu or whatever the one "blessed" underlay system is (either natively or implicitly because you pulled a container based on it).

1 more reply

ngcc_hk2y ago

Better to come if the tide shift so we can have compatible layer. The key is the tide. Obviously would n try to sue … it would be a sign that finally we have real competition. Gar is where innovation do.

X86 cannot do 64 bit let us do this and that so the market can use only our cpu. Repeat with me x86-64 is impossible.

Not sure Apple is in this otherwise the real great competition come.

wyldfire2y ago

> Best of all is that I simply set the device to `torch.device('cuda')` rather than openCL, which does wonders for compatibility

Man oh man where did we go wrong that cuda is the more compatible option over OpenCL?

KeplerBoy2y ago

It must be a misnomer on PyTorch's side. Clearly it's neither CUDA nor OpenCL.

AMD should just get it's shit together. This is ridiculous. Not the name, but the fact that you can only do FP64 on a GPU. Everybody is moving to FP16 and AMD is stuck on doubles?

omneity2y ago

I believe the fp64 limitation came from the laptop-grade GPU I had rather than inherent to AMD or ROCm.

The API level I could target was at least two or three versions behind the latest they have to offer.

1 more reply

JonChesterfield2y ago

FP64 is what HPC is built on. F32 works on the cards too (same rate or faster). I don't know the status of F16 or F8.

Some architectures provide fast F16->F32 and F32->F16 conversion instructions so you can DIY the memory bandwidth saving - that always seemed reasonable to me, but I don't know if the AMD hardware people are/will go down that path.

2 more replies

NavinF2y ago

This has always been the case. OpenCL is a shit show

RockRobotRock2y ago

Have you gotten it to work with Whisper by any chance?

kkielhofner2y ago

Whisper is actually a great example of why Nvidia has such a stronghold on ML/AI and why it’s so difficult to compete.

There’s getting something to “work”, which is often enough of a challenge with ROCm. Then there’s getting it to work well (next challenge).

Then there’s getting it to work as well as Nvidia/CUDA.

With Whisper, as one example, you should be running it with ctranslate2[0]. Of all the platforms on their supported list you won’t find ROCm.

When you really start to look around you’ll find that ROCm is (at best) still very much in the “get it to work (sometimes)” stage. In most cases it’s still a long way away from getting it to work well, and even further away from making it actually competitive with Nvidia for serious use cases and applications.

People get excited about the progress ROCm has made getting basic things to work with PyTorch and this is good - progress is progress. But saving 20% on the hardware when the equivalent Nvidia product is often somewhere between 5-10x as performant (at a fraction of the development time) because of vastly superior software support you realize pretty quickly Nvidia is actually a bargain compared to AMD.

I’m desperately rooting for Nvidia to have some actual competition but after six years of ROCm and my own repeated failed attempts to have it make any sense overall I’m only more and more skeptical that real competition in the space will come from AMD.

[0] - https://github.com/OpenNMT/CTranslate2

errnoh2y ago

While I agree that it's much more effort to get things working on AMD cards than it is with Nvidia, I was a bit surprised to see this comment mention Whisper being an example of "5-10x as performant".

https://www.tomshardware.com/news/whisper-audio-transcriptio... is a good example of Nvidia having no excuses being double the price when it comes to Whisper inference, with 7900XTX being directly comparable with 4080, albeit with higher power draw. To be fair it's not using ROCm but Direct3D 11, but for performance/price arguments sake that detail is not relevant.

EDIT: Also using CTranslate2 as an example is not great as it's actually a good showcase why ROCm is so far behind CUDA: It's all about adapting the tech and getting the popular libraries to support it. Things usually get implemented in CUDA first and then would need additional effort to add ROCm support that projects with low amount of (possibly hobbyist) maintainers might not have available. There's even an issue in CTranslate2 where they clearly state no-one is working to get ROCm supported in the library. ( https://github.com/OpenNMT/CTranslate2/issues/1072#issuecomm... )

1 more reply

pedrovhb2y ago

I've had luck with an RX5700XT and whisper.cpp built with clblast. Works like a charm, not entirely a scarring experience getting it to work (easier than most other stuff which was surprising to me).

One arcane detail is that whereas for PyTorch I have to set the env var HSA_OVERRIDE_GFX_VERSION to 10.3.0, getting it to run with whisper.cpp and llama.cpp requires setting it to 10.1.0. Good luck and may it cost you less hair than it did me.

incognition2y ago

Fp64??

latchkey2y ago

https://en.wikipedia.org/wiki/Double-precision_floating-poin...

NVIDIA fp32 (H100) has 2x more TFLOPS than AMD's fp32 (MI250) and AI doesn't need fp64 precision.

incognition2y ago

Lol it was meant as I wouldn't be caught dead using fp64

fransje262y ago

Hardware limitation.

javchz2y ago

CUDA is the only reason I have an Nvidia card, but if more projects start migrating to a more agnostic environment, I'll be really grateful.

Running Nvidia in Linux isn't as much fun. Fedora and Debian can be incredibly reliable systems, but when you add an Nvidia card, I feel like I am back in Windows Vista with kernel crashes from time to time.

distract89012y ago

My Arch system would occasionally boot to a black screen. When this happened, no amount of tinkering could get it back. I had to reinstall the whole OS.

Turns out it was a conflict between nvidia drivers and my (10 year old) Intel integrated GPU. But once I switched to an AMD card, everything works flawlessly.

Ubuntu based systems barely worked at all. Incredibly unstable and would occasionally corrupt the output and barf colors and fragments of the desktop all over my screens.

AMD on arch has been an absolute delight. It just. Works. It's more stable than nvidia on windows.

For a lot of reasons-- but mainly Linux drivers-- I've totally sworn off nvidia cards. AMD just works better for me.

aftbit2y ago

As a counter-argument, I ran Arch Linux + nvidia GPUs + Intel CPUs between 2012 and 2020, and still run Arch + nvidia (now with AMD CPU) to this day. I won't say it has been bug free at all, but it generally works pretty well. If you find a problem in Arch that you cannot fix without reinstalling, you do not sufficiently understand the problem or Arch itself. "Installing" Arch is refreshingly manual and "simple" compared to the magic that is other Linux distros or the closed source OSes.

iopq2y ago

I tried using an Nvidia card with OBS to record my screen and it kind of freezes in Wine. I switched from x11 to Wayland and now Wine shows horizontal lines (!) and performs like crap.

Even my 4GB RX 570 from years ago gives a better experience doing this. You just install OBS from flathub, Wayland works, everything works without any setup or tinkering. You click record and you can record your gameplay footage.

1 more reply

distract89012y ago

I'm sure that I could have fixed it, but I gave up after spending multiple evenings on it. Have you ever spent hours debugging a system exclusively in text mode? It isn't fun. Reinstalling the OS takes less than 30 minutes. It's a clear choice for me

1 more reply

wildzzz2y ago

I ran a laptop with the swappable dedicated Nvidia and integrated Intel GPU for a decade with no issues. Used to use something called Bumblebee to swap between them depending on workload, actually worked surprisingly well given the circumstances. Eventually I just dropped back to integrated only when I stopped doing anything intensive with the machine.

MegaDeKay2y ago

I run Arch as well and AMD is only "good". I would have a problem every now and then where my RX560 would lose its mind coming out of sleep and I'd have to reboot.

But the other problem that really bugs me is the "AMD reset bug" that you trip over with most AMD GPUs. This is when you pass through a second GPU through to another OS running under KVM, and is what lets you run Linux and (say) Windows simultaneously with full GPU hardware acceleration on the guest. The reset bug means the GPU will hang upon shutdown of the guest and only a reboot will let you recover the card. This is a silicon level bug that has existed for many years across many generations of cards and AMD can't be arsed to fix it. Projects like "vendor-reset" help for some cards, but gnif2 has basically given up (he mentioned he even personally raised the issue with Lisa Su). Even AMDs latest cards like the 7800 XT are affected. NVidia works flawlessly here.

__rito__2y ago

I have used Pop OS and Ubuntu with NVIDIA card, and honestly, I never faced any serious problem.

After every kernel upgrade, I just have to reinstall the nvidia drivers and the cuda toolkit.

Everything works as before after I do that. I don't face any problems at all.

hskalin2y ago

I'm not sure what card you have but I've never really had any major problems running Nvidia + Intel integrated graphics on Arch, Ubuntu etc.

nextaccountic2y ago

> CUDA is the only reason I have an Nvidia card, but if more projects start migrating to a more agnostic environment, I'll be really grateful.

What AMD really needs is to have 100% feature parity with CUDA without changing a single line of code. Maybe for this to happen it needs to add hardware features or something (I see people saying that CUDA as an API is very tailored to the capabilities of nvidia GPUs), I don't know.

If AMD relies on people changing their code to make it portable, it already lost.

JonChesterfield2y ago

The idea was supposed to be people convert cuda to hip, which is a pretty similar language, either by hand or by running a tool called 'hipify' that comes with rocm. You can then compile that unmodified for amdgpu or for nvptx.

I think where that idea goes wrong is in order to compile it unmodified for nvptx, you need to use a toolchain which knows hip and nvptx, which the cuda toolchain does not. Clang can mostly compile cuda successfully but it's far less polished than the cuda toolchain. ROCm probably has the nvptx backend disabled, and even if it's built in, best case it'll work as well as clang upstream does.

What I'm told does work is keeping all the source as cuda and using hipify as part of a build process when using amdgpu - something like `cat foo.cu | hipify | clang -x hip -` - though I can't personally vouch for that working.

The original idea was people would write in opencl instead of cuda but that really didn't work out.

pjmlp2y ago

Both ideas are already lost before starting, Hip isn't polyglot as CUDA, and OpenCL is mostly stuck in C.

mrweasel2y ago

> I see people saying that CUDA as an API is very tailored to the capabilities of nvidia GPUs

I'm wondering how true that is, because that could give NVidia issues in the future if they need to redesign their GPU should they hit some limit with the current designs. Dependence on certain instruction makes sense, but there's not technical preventing AMD from implementing those instructions, only legal mumbo jumbo.

javchz2y ago

I think that could work too. I wonder if they could do a translation layer, something like Apple with the M1 chips that translates JIT x86 to ARM.

JonChesterfield2y ago

That's a fun idea. Qemu parses a binary into something very like a compiler IR, optimises it a bit, then writes it out as a binary for the same or another target in JIT like fashion. So that sort of thing can be built. Apple's rosetta is functionally similar, I expect it does the same sort of thing under the hood. Valgrind is another from the same architecture.

It would be a painful reverse engineering process - the cuda file format is sort of like elf, but with undocumented bonus constraints, and you'd have to reverse the instruction encoding to get sass, which isn't documented, or try to take it directly to ptx which is somewhat documented, and then convert that onward.

It would be far more difficult than compiling cuda source directly. I'm not sure anyone would pay for a cuda->amdgpu conversion tool, and it's hard to imagine AMD making one as part of ROCm.

mschuetz2y ago

Not just feature parity, but proper UX. Things need to just work, without spending hours or days to make them work.

weebull2y ago

Blame Nvidia. They are the ones the got the industry hooked on a proprietry API.

1 more reply

PH95VuimJjqBqy2y ago

I see these complains from time to time and I never understand them.

I've literally been running nvidia on linux since the TNT2 days and have _never_ had this sort of issue. That's across many drivers and many cards over the many many years.

LtWorf2y ago

I've had kernel panics that disappeared when I started using the on board intel graphics instead of the nvidia.

Your statement makes no sense. It's like a smoker claiming that since he didn't die of lung cancer, smoke is 100% safe.

kkielhofner2y ago

Describing kernel panics and general nightmare scenarios as the general course with Nvidia doesn’t make sense either.

Nvidia has 80% market share of the discrete GPU desktop market and at least 90% market share of cloud/datacenter.

Nvidia GPUs are used almost exclusively for every cloud powered AI service and to train virtually every ML model in existence. Almost always on Linux.

Do you really think any of this would be possible if what you are describing was anything approaching the typical experience starting at the /driver/ level?

Nvidia would have never achieved their market dominance nor held on to it this long if the issues you’ve experienced impacted anything approaching a statistically significant number of users or applications.

Nvidia gets a lot of hate on HN and elsewhere (much of it fair) but I will never understand the people who claim it doesn’t work and get the job done (often very well).

2 more replies

jjoonathan2y ago

Same but linux experience is a steep and bumpy function of hardware.

My guess: something like laptop GPU switching failed badly in the nvidia binary, earning it a reputation.

HideousKojima2y ago

That was my experience, Nvidia Optimus (which is what allows dynamic switching between the integrated and dedicated GPU in laptops) was completely broken (as in a black screen, not just crashes or other issues) for several years, and Nvidia didn't care to do anything about it.

2 more replies

temp08262y ago

I understand it, but I also haven't had any trouble since I figured out the right procedure for me on fedora (which probably took some time, but it's been so long that I can't remember). Whenever I read people having issues it sounds like they are using a package installed via dnf for the driver/etc. I've always had issues with dkms and the like and just install the latest .run from nvidia's website whenever I have a kernel update (I made a one-line script to call it with the silent option and flags for signing for secure boot so I don't really think about it). No issues in a very long time even with the whackiness of prime/optimus offloading on my old laptop.

PH95VuimJjqBqy2y ago

actually, it's a good point because that's how I always install nvidia drivers as well. Never from the local package manager.

bootsmann2y ago

So you don‘t recommend going the rpm-fusion route?

ant6n2y ago

Well tnt2 should be pretty well supported by now ;-)

PH95VuimJjqBqy2y ago

lmao, touche :)

einpoklum2y ago

I have been NVIDIA cards for compute capabilities only, both personally and at work, for nearly a decade. I've had dozens and dozens of different issues involving the hardware, the drivers, integration with the rest of the OS, version compatibilities, ensuring my desktop environment doesn't try to use the NVIDIA cards, etc. etc.

Having said that - I (or rarely, other people) have almost always managed to work out those issues and get my systems to work. Not in all cases though.

kombine2y ago

I use a rolling distro (OpenSUSE Tumbleweed) and have had zero issues with my NVIDIA card despite it pulling the kernel and driver updates as they get released. The driver repo is maintained by NVIDIA itself, which is amazing.

filterfiber2y ago

Do you use wayland, multiple monitors, and/or play games or is it just for ML/AI?

smoldesu2y ago

I do all of those things with my 3070 and it works just fine. Most of them will depend on your DE's Wayland implementation.

I'm not here to desparage anyone experiencing issues, but my experience on the NixOS rolling-release channel has also been pretty boring. There was a time when my old 1050 Ti struggled, but the modern upstream drivers feel just as smooth as my Intel system does.

chaostheory2y ago

Yeah with my CUDA setup, it feels like I just ducktaped my deployment. I am very hesitant to make changes and it’s not easy to replicate

gymbeaux2y ago

I often have issues booting to the installer or first boot after install with an NVidia GPU.

Pop_OS, Fedora and OpenSUSE work out of the box. Those are all Wayland I believe. Debian/Ubuntu distros are a bad time. I think they’re still X11. It’s ironic because X11 is supposed to be the more stable window manager.

Flameancer2y ago

I think they moved to Wayland on 23.04 or 23.10. I just recently installed both to try and get a 7800xt working with PyTorch and the default was Wayland.

anthk2y ago

X11 is not a window manager.

gymbeaux2y ago

Xorg

1 more reply

smoldesu2y ago

Those problems might just be GNOME-related at this point. I've been daily-driving two different Nvidia cards for ~3 years now (1050 Ti then 3070 Ti) and Wayland has felt pretty stable for the past 12 months. The worst problem I had experienced in that time was Electron and Java apps drawing incorrectly in xWayland, but both of those are fixed upstream.

I'm definitely not against better hardware support for AI, but I think your problems are more GNOME's fault than Nvidia's. KDE's Wayland session is almost flawless on Nvidia nowadays.

arsome2y ago

If GNOME can tank the kernel, it ain't GNOME's fault.

kombine2y ago

I really hope that with KDE 6 I can finally switch to Wayland!

Zardoz842y ago

I'm using KDE on Debian 12 with AMD GPU with Wayland, and works. it keeps being a bit annoying compared with X11 with a few programs (Eclipse, Dbeaver... I need to launch both with flags to not use Wayland backend). But even I can play AAA games without problems

orangetuba2y ago

Nvidia on Linux is more like running Windows 95 from the gulag, and you're covered in ticks. I absolutely detest Nvidia because of the Linux hell they've created.

codemk82y ago

From my recent experience, ROCm and hipcc allow you to port cuda programs fairly easily to AMD arch. The compiling is so much faster than nvcc too.

wubrr2y ago

Yeah, nvidia linux support is meh, but still much better than amd.

phkahler2y ago

>> Yeah, nvidia linux support is meh, but still much better than amd.

Can not confirm. I used nvidia for years when it was the only option. Then used the nouveau driver on a well supported card because it worked well and eliminated hassle. Now I'm on AMD APU and it just works out of the box. YMMV of course. We do get reports of issues with AMD on specific driver versions, but I can't reproduce.

Zambyte2y ago

Is it better than AMD? I have had literally no graphics issues on my 6650 XT with swaywm using the built in kernel drivers.

aseipp2y ago

This week I upgraded my kernel on a 2017 workstation to 6.5.5 and when I rebooted and looked at 'dmesg' there were no less than 7 kernel faults with stack traces in my 'dmesg' from amdgpu. Just from booting up. This is a no-graphical-desktop system using a Radeon Pro W5500, which is 3.5 years old (I just had the card and needed something to plug in for it to POST.)

I have come to accept that graphics card drivers and hardware stability ultimately comes down to whether or not ghosts have decided to haunt you.

christkv2y ago

I think the problems are pro drivers and the issues with ROCm being buggy not the open source graphics drivers.

HansHamster2y ago

Guess I'm also doing something wrong. Never had any serious issues with either Nvidia or AMD on Linux (and only a few annoyances on RNDA2 shortly after release)...

treprinum2y ago

I never had an issue with nVidia drivers on Linux in the past 5 years, but recently bought a laptop with a 4090 and AMD CPU. Now I get random freezes, often right after I login into Cinnamon but can't really tell if it's the nVidia driver for 4090, AMDGPU driver for integrated RDNA, kernel 6.2 or Cinnamon issue. The laptop just hangs and stops responding to keyboard so I can't login to console and dmesg it.

1 more reply

bryanlarsen2y ago

Not my experience. The open source AMD drivers are much more pleasant to deal with than the closed source Nvidia ones.

silisili2y ago

In the closed source days of fglrx or whatever it's called I'd agree. Since they went open source, hard disagree. AMD graphics work in Linux about as well as Intel always has.

acomjean2y ago

As someone who was tasked with trying to get nvidia working on Ubuntu, it’s a pretty terrible experience.

I have a nvidia laptop with popos. That works well.

IronWolve2y ago

Yup, thank the hobbyists. Pytorch is allowing other hardware. Stable diffusion working on m chips, intel arc, and Amd.

Now what I'd like to see is real benchmarks for compute power. Might even get a few startups to compete in this new area.

mandevil2y ago

It isn't the hobbyists who are making sure that PyTorch and other frameworks runs well on these chips, but teams of engineers who work for NVIDIA, AMD, Intel, etc. who are doing this as their primary assigned jobs, in exchange for money from their employer, who are paying those salaries because they want to sell chips into the enormous demand for running PyTorch faster.

Hobbyist and open-source are definitely not synonyms.

janalsncm2y ago

Special mention to Facebook and Google AI research teams that maintain PyTorch and Tensorflow respectively. And also to ptrblck on the PyTorch forums [1] who has the answer to basically every question it seems. He alone is probably responsible for hundreds of millions of dollars of productivity gain.

[1] https://discuss.pytorch.org/u/ptrblck/summary

Eisenstein2y ago

People don't usually get employed to make things with no demand, and people who work for companies with a budget line don't really care how much the nVidia tax is. You can thank hobbyists for creating a lot of demand for compatability with other cards.

kiratp2y ago

There are so many billions of dollar being spent on this hardware that everyone other than Nvidia is doing everything they can to make competition happen.

Eg: https://www.intel.com/content/www/us/en/developer/videos/opt...

https://www.intel.com/content/www/us/en/developer/tools/onea...

https://developer.apple.com/metal/tensorflow-plugin/

Large scale opensource is, outside of a few exceptions, built by engineers paid to build it.

johngossman2y ago

I can only point you to cloud financial results and the huge cost of the AI race. Note also the story recently about OpenAI looking at building their own chips. Companies absolutely care immensely about the cost of GPUs. It's billions of dollars.

roenxi2y ago

There is huge demand for AMD cards that can efficiently multiply matrices together. The issue is that while there are currently isolated cases where people can make them do that, it doesn't seem to be possible at the scale that it needs to happen at.

AMD are being dragged along by the market. Willingly, they aren't fighting it, but their focus has been on other areas.

3 more replies

mattnewton2y ago

Re: startups, Geohotz raised a few million for this already. https://tinygrad.org/

IntelMiner2y ago

Didn't he do what he always does. Rake in a ton of money, fart around and then cash out exclaiming it's everyone else's fault?

The way he stole Fail0verflow's work with the PS3 security leak after failing to find a hypervisor exploit for months absolutely soured any respect I had for him at the time

throwitawayfam2y ago

Yep, did exactly that. IMO he threw a fit, even though AMD was working with him squashing bugs. https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...

4 more replies

ShamelessC2y ago

> The way he stole Fail0verflow's work with the PS3 security leak after failing to find a hypervisor exploit for months absolutely soured any respect I had for him at the time

That sounds interesting. I tried googling about it but can't really find much other than that failoverflow found a key and didn't release it, and then geohot released his own subsequently. I'd love to hear more about how directly he "stole" the work from the Fail0verflow team.

edit: Reading some sibling comments here, it seems you are either mistaken and/or were exaggerating your claim about the "theft" here. As far as I can tell, he simply took their findings and made his own version of an exploit that they had detailed publicly. That may be in poor taste in this particular community but it's certainly not theft. I do agree that his behavior there was lacking in decency, but not to the degree implied here where I was thinking he _literally_ stole their exploit by hacking them, or something similar to that.

1 more reply

kinematikk2y ago

Do you have a source on the stealing part? A quick Google search didn't result in anything

1 more reply

georgehotz2y ago

Are you interested in being factually correct, or are you interested in hating? If it's the former, I think you should do some research. If it's the latter, :salute:

I had GPT-4 do some research for you, hopefully you will incorporate it in future comments you make about me. https://chat.openai.com/share/d0fa24e9-3ed7-4b17-8497-24bfdd...

adastra222y ago

Wow, TIL

nomel2y ago

Obligatory Lex Fridman podcast, where he discusses it: https://youtu.be/dNrTrx42DGQ?t=2408

jauntywundrkind2y ago

Pytorch is just using Google's OpenXLA now, & OpenXLA is the actual cross platform thing, no? I'm not very well versed in this area, so pardon if mistaken. https://pytorch.org/blog/pytorch-2.0-xla-path-forward/

mathisfun1232y ago

> Pytorch is just using Google's OpenXLA now

this is so far from accurate it should be considered libelous; from the link

> PyTorch/XLA is set to migrate to the open source OpenXLA

so PyTorch on the XLA backend is set to migrate to use OpenXLA instead of XLA. but basically everyone moved from XLA to OpenXLA because there is no more OSS XLA. so that's it. in general, PyTorch has several backends, including plenty of homegrown CUDA and CPU kernels. in fact the majority of your PyTorch code runs through PyTorch's own kernels.

fotcorn2y ago

You can use OpenXLA, but it's not the default. The main use-case for OpenXLA is running PyTorch on Google TPUs. OpenXLA also supports GPUs, but I am not sure how many people use that. Afaik JAX uses OpenXLA as backend to run on GPUs.

If you use model.compile() in PyTorch, you use TorchInductor and OpenAIs Triton by default.

jauntywundrkind2y ago

Thank you for saying something useful here. I was vaguely under the impression that pytorch 2.0 had fully flipped to defaulting to openxla. That seems to not be the case.

Good to hear more than a cheap snub. OpenAI Triton as the reason other GPUs work is a real non-shit answer, it seems. And interesting to hear JAX too. Thank you for being robustly useful & informative.

voz_2y ago

Wrong.

withwarmup2y ago

CUDA is the result of years of NVIDIA supporting the ecosystem, some people likes to complain because they bought hardware that was cheaper but can't use it for what they want to use it, when you buy NVIDIA, you aren't buying only the hardware, but the insane amount of work they have put into the ecosystem, the same goes for Intel, mkl and scikit-learn intelex aren't free to develop.

AMD has the hardware but the support for HPC is non-existent outside of the joke that is bliss and AOCL.

I really wish for more competitors to enter the market in HPC, but AMD has a shitload of work to do.

arcanus2y ago

> AMD has the hardware but the support for HPC is non-existent outside of the joke that is bliss and AOCL.

You are probably two years behind the state of the art. The world's largest supercomputer, OLCF's Frontier, runs AMD CPUs and GPUs. It's emphatically using ROCm, not just BLIS and AOCL. See for example: https://docs.olcf.ornl.gov/systems/frontier_user_guide.html

That's hardly non-existent support for HPC.

65a2y ago

Agreed...the main gap is support on consumer and workstation cards, which is where nVidia made headway, but that is starting erode super recently. ROCm works pretty well for me, I have had a lot more problems with specific packagers than the ROCm layer.

aiunboxed2y ago

Exactly, with NVIDIAs core focus on AI way before it was cool has lead to them being in this advantageous position. For AMD just being a price friendly competitor to Intel and Nvidia was the motto.

runiq2y ago

Yeah, that's a pretty shortsighted take of things. Do you really believe that Nvidia hasn't taken steps do make sure their moat is as wide as possible?

Blammar2y ago

The thing about owning the CUDA spec is that Nvidia can add new features quickly without having to argue with other hardware vendors. I find that a positive thing overall.

Also, I choose to pay the ~$120 Windows tax once (per box), everything works very well, and I don't have the driver issues that some fraction of other users seem to have with Linux and Nvidia cards. Seems like a good use of my time.

anon2912y ago

Literally never had an issue with Nvidia and Linux in decades. Despite this, my windows installs have all sorts of issues.. as always

pama2y ago

There is only limited empirical evidence of AMD closing the gap that NVidia has created in the science or ML software. Even when considering pytorch only, the engineering effort to maintain specialized ROCm along with CUDA solutions is not trivial (think flashattention, or any customization that optimizes your own model). If your GPUs only need a simple ML workflow all times for a few years nonstop, maybe there exist corner cases where the finances make sense. It is hard for AMD now to close the gap across the scientific/industrial software base of CUDA. NVidia feels like a software company for the hardware they produce; luckily they make the money from hardware thus cannot lock the software libraries.

(Edited “no” to limited empirical evidence after a fellow user mentioned El Capitan.)

fotcorn2y ago

ROCm has HIP (1) which is a compatibility layer to run CUDA code on AMD GPUs. In theory, you only have to adjust #includes, and everything should just work, but as usual, reality is different.

Newer backends for AI frameworks like OpenXLA and OpenAI Triton directly generate GPU native code using MLIR and LLVM, they do not use CUDA apart from some glue code to actually load the code onto the GPU and get the data there. Both already support ROCm, but from what I've read the support is not as mature yet compared to NVIDIA.

1: https://github.com/ROCm-Developer-Tools/HIP

Certhas2y ago

The fact that El Capitan is AMD says that at least for Science/HPC there definitely is evidence of a closing gap.

pama2y ago

Thanks. You are actually right that this new supercomputer might move the needle once it is in production mode. I will wait and see how it goes.

falconroar2y ago

I don't understand why developers of PyTorch and similar don't use OpenCL. Open standard, runs everywhere, similar performance - what's the downside??

pama2y ago

I don’t know for sure why the early pytorch team picked it, but my guess is due to simplicity and performance. NVidia optimizes CUDA better that OpenCL and provides tons of useful performance tuning tools. It is hard to match the CUDA performance with OpenCL even on the same NVidia GPU hardware, and making performant code compatible across different GPU with OpenCL is also hard. I know examples of scientific codes that became simpler and faster (on nvidia hardware) by going from openCL to CUDA but haven’t yet heard of examples the other way around.

Roark662y ago

I think the article claiming "PyTorch has dropped the drawbridge on the CUDA moat" is way over optimistic. Jest pytorch is widely used by researchers and by users to quickly iterate various over various ways to use the models, but when it comes to inference there are huge gains to be had by going a different route. Llama.cpp has showed 10x speedups on my hardware for example (32gb of gpu ram + 32gb of cpu ram)for models like falcon-40b-instruct, for much smaller models on the cpu (under 10b) I saw up to 3x speedup just by switching to onnc and openvino.

Apple has showed us in practice the benefits of CPU/GPU memory sharing, will AMD be able to follow in their footsteps? The article claims AMD has a design with up to 192gb of shared ram. Apple is already shipping a design with the same amount of RAM(if you can afford it). I wish them-and) success, but I believe they need to aim higher than just matching apple in some unspecified future.

bigcat123456782y ago

Cuda is the foundation

NVIDIA moat is the years of work built by oss community, big corporations, research insistute

They spend all time building for cuda, a lot of implicit designs are derived from cuda's characteristic

That will be the main challenge

mikepurvis2y ago

It depends on the domain. Increasingly people's interfaces to this stuff are the higher level libraries like tensorflow, pytorch, numpy/cupy, and to a lesser degree accelerated processing libraries such as opencv, PCL, suitesparse, ceres-solver, and friends.

If you can add hardware support to a major library and improve on the packaging and deployment front while also undercutting on price, that's the moat gone overnight. CUDA itself only matters in terms of lock-in if you're calling CUDA's own functions.

bigcat123456782y ago

what I meant is that all these stuff have 15 years of implicit accumulation of knowledge and tips and even hacks builtin in the software

No matter what you depends on, you'll have a slew of larger or minor obstacles or annoyance

That collectively is the most itself

As you said, already it's clear that replacing cuda itself is not that daunting

pixelesque2y ago

Does AMD have a solution to forward device combatibility (like PTX for NVidia)?

Last time I looked into ROCm (two years ago?), you seemed to have to compile stuff explicitly for the architecture you were using, so if a new card came out, you couldn't use it without a recompile.

mnau2y ago

Not natively, but AdaptiveCpp (previously hiSycl, then OpenSycl) has a single source single compiler pass, where they basically store LLVM IR as an intermediate representation.

https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/...

Performance penalty was within ew precents, at least according to the paper (figure 9 and 10) https://cdrdv2-public.intel.com/786536/Heidelberg_IWOCL__SYC...

einpoklum2y ago

I don't know what they do with ROCm, but with OpenCL, the answer is: Certainly. It's called SPIR:

https://www.khronos.org/spir/

nabla92y ago

> Crossing the CUDA moat for AMD GPUs may be as easy as using PyTorch.

Nvidia has spent huge amount of work to make code run smoothly and fast. AMD has to work hard to catch up. ROCm code is slower , has more bugs, don't have enough features and they have compatibility issues between cards.

latchkey2y ago

Lisa has said that they are committed to improving ROCm, especially for AI workloads. Recent releases (5.6/5.7) prove that.

einpoklum2y ago

> Nvidia has spent huge amount of work to make code run smoothly and fast.

Well, let's say "smoother" rather than "smoothly".

> ROCm code is slower

On physically-comparable hardware? Possible, but that's not an easy claim to make, certainly not as expansively as you have. References?

> has more bugs

Possible, but - NVIDIA keeps their bug database secret. I'm guessing you're concluding this from anecdotal experience? That's fair enough, but then - say so.

> ROCm ... don't have enough features and

Likely. while AMD has both spent less in that department (and had less to spend I guess); plus, and no less importantly - it tried to go along with the OpenCL initiative, as specified by the Khronos consortium, while NVIDIA has sort of "betrayed" the initiative by investing in it's vendor-locked, incompatible ecosystem and letting their OpenCL support decay in some respects.

> they have compatibility issues between cards.

such as?

kkielhofner2y ago

I wouldn’t say ROCm code is “slower”, per se, but in practice that’s how it presents. References:

https://github.com/InternLM/lmdeploy

https://github.com/vllm-project/vllm

https://github.com/OpenNMT/CTranslate2

You know what’s missing from all of these and many more like them? Support for ROCm. This is all before you get to the really wildly performant stuff like Triton Inference Server, FasterTransformer, TensorRT-LLM, etc.

ROCm is at the “get it to work stage” (see top comment, blog posts everywhere celebrating minor successes, etc). CUDA is at the “wring every last penny of performance out of this thing” stage.

In terms of hardware support, I think that one is obvious. The U in CUDA originally stood for unified. Look at the list of chips supported by Nvidia drivers and CUDA releases. Literally anything from at least the past 10 years that has Nvidia printed on the box will just run CUDA code.

One of my projects specifically targets Pascal up - when I thought even Pascal was a stretch. Cue my surprise when I got a report of someone casually firing it up on Maxwell when I was pretty certain there was no way it could work.

A Maxwell laptop chip. It also runs just as well on an H100.

THAT is hardware support.

RcouF1uZ4gsC2y ago

I am not so sure.

Everyone knows that CUDA is a core competency of Nvidia and they have stuck to it for years and years refining it, fixing bugs, and making the experience smoother on Nvidia hardware.

On the other hand, AMD has not had the same level of commitment. They used to sing the praises of OpenCL. And then there is ROCm. Tomorrow, it might be something else.

Thus, Nvidia CUDA will get a lot more attention and tuning from even the portability layers because they know that their investment in it will reap dividends even years from now, whereas their investment in AMD might be obsolete in a few years.

In addition, even if there is theoretical support, getting specific driver support and working around driver bugs is likely to be more of a pain with AMD.

AnthonyMouse2y ago

This is what people complain about, but at the same time there aren't enough cards, so the people with AMD cards want to use them. So they fix the bugs, or report them to AMD so they can fix them, and it gets better. Then more people use them and submit patches and bug reporters, and it gets better.

At some point the old complaints are no longer valid.

hot_gril2y ago

People complain about Nvidia being anticompetitive with CUDA, but I don't really see it. They saw a gap in the standards for on-GPU compute and put tons of effort into a proprietary alternative. They tied CUDA to their own hardware, which sorta makes technical sense given the optimizations involved, but it's their choice anyway. They still support the open standards, but many prefer CUDA and will pay the Nvidia premium for it because it's actually nicer. They also don't have CPU marketshare to tie things to.

Good for them. We can hope the open side catches up either by improving their standards, or adding more layers like this article describes.

zirgs2y ago

CUDA was released in 2007 and the development of it started even earlier - possibly even in the 90s. Back then nobody else cared about GPU compute. OpenCL came out 2 years after that.

killerstorm2y ago

Not true. People got interested in general-purpose GPU compute (GPGPU) in early 2000s when video cards with programmable shaders became available. https://en.wikipedia.org/wiki/General-purpose_computing_on_g...

People made a programming language & a compiler/runtime for GPGPU in 2004: https://en.wikipedia.org/wiki/BrookGPU

hot_gril2y ago

Everything has old beginnings that the specialists will remember, but GPU compute really reached mass popularity and became a large selling point for Nvidia in the 2010s.

binarymax2y ago

And the question for most that remains once AMD catches up: will the duopoly result in lower prices to a reasonable level for hobbyists or bootstrapped startups, or will AMD just gouge like NVidia?

quitit2y ago

I think in this case the changes needed to make AMD useful will open the market to other players as well (e.g. Intel).

PyTorch is already walking down this path and while CUDA-based performance is significantly better, that is changing and of course an area of continued focus.

It's not that people don't like Nvidia, rather it's just that there is a lot of hardware out there that can technically perform competitively, but the work needs to be done to bring it into the circle.

binarymax2y ago

Last I checked I saw the H100 was about two gens more advanced for certain components (tensor cores, bfloats, cache, mem bandwidth) - but my research may have been wrong as admittedly I'm not as familiar with AMDs offerings for GPU.

FuriouslyAdrift2y ago

They are not behind... https://www.tomshardware.com/news/amd-expands-mi300-with-gpu...

You can also actually buy them as opposed to the nVidia offerings which you are going to have to fight for.

klysm2y ago

A simplistic economic take would suggest that the competition would result in lower prices, but given two players in the market who knows.

binarymax2y ago

My intuition is along the lines that if AMD had a competing product earlier, then it would have kept prices down. But since Nvidia has shown what the market will pay, AMD won't be able to resist overcharging. It will probably come down a little, but nowhere near to the point of affordability.

I sure hope I'm wrong.

tyre2y ago

AMD might have to charge less to break into customers that are already bought into Nvidia. There has to be a discount to cover the switching costs + still provide savings (or access).

1 more reply

sumtechguy2y ago

It is oligopoly pricing.

https://www.investopedia.com/terms/o/oligopoly.asp

With that few competitors pricing would not change much.

AnthonyMouse2y ago

That's mostly when there isn't a lot of price elasticity of demand. If you're Comcast and Verizon, each customer wants one internet connection and you're not going to change the size of the market much by offering better prices.

If you're AMD and NVIDIA and lowering the price would double the number of customers, you might very well want to do that, unless you're supply constrained -- which has been the issue because they're both bidding against everyone else for limited fab capacity. But that should be temporary.

This is also a market with a network effect. If all your GPUs are $1000 and nobody can afford them then nobody is going to write code for them, and then who wants them? So the winning strategy is actually to make sure that there are kind of okay GPUs available for less than $300 and make sure lots of people have them, then sell very expensive ones that use the same architecture but are faster.

That has been the traditional model, but the lack of production capacity meant that they've only been making the overpriced ones recently. Which isn't actually in their interests once the supply of fab capacity loosens up.

2 more replies

ad404b8a372f2b92y ago

Prices seemed to have lowered when AMD came out with CPUs competitive with Intel's.

tibbydudeza2y ago

Price difference between 13900K and AMD Ryzen 9 7950x is not big - the latest 7950X3D is about on par with the higher clocked 13900KS as well.

1 more reply

hackerlight2y ago

Looking at the CPU market, competition did lead to lower prices: AMD are the best, but their CPUs are very reasonably priced because Intel is close behind.

In the gaming market for GPUs, Nvidia has no competition except in some niche areas. Overall, their lead in upscaling software is too commanding so they can price how they want. Customers are paying 15-20% premiums for the same raw hardware performance, all to access Nvidia's DLSS, because there's no good competition.

adamsvystun2y ago

This is not a binary question. Two players, while not ideal, are better then just one.

evanjrowley2y ago

AMD prices will go up because of the newfound ability to gouge for AI/ML/GPGPU workloads. Nvidia's will likely go down, but I don't expect it will be by much. The market demand is high, so the equilibrium price will also be high. Supply isn't at pandemic / crypto-rush lows, but the supply of cards useful for CUDA/ROCm still is.

johngossman2y ago

When AMD caught up to Intel in CPUs, prices went down (at least compared to when Intel had a complete monopoly). The same was true when AMD gaming cards were more competitive. Chip manufacturers have shown themselves willing to both raise prices when they can and lower them when they must.

rafaelmn2y ago

If the margins and demand is there Intel will eventually show up

wmf2y ago

Intel already showed up three or four times but their software is as bad as AMD's used to be.

ilc2y ago

Thankfully, software can be fixed over time as AMD has shown. Lack of another competitor can't be fixed as easily.

Havoc2y ago

Is either in doubt?

rafaelmn2y ago

Wouldn't be surprised if a bunch of investment is hype bubble and demand correction forces price correction. Maybe not immediately but at Intel's pace - they managed to miss out on mining bubble, wouldn't be surprised for them to release in a correction.

rdsubhas2y ago

Demand will push AMD prices up by couple hundred bucks and Nvidia cards down by couple hundred bucks. A hobbyist customer will be neither better or worse.

stjohnswarts2y ago

In general I think it will lower prices, certainly not as much as if there were 4+ on the market where it's hard to anticipate your rivals. a 2 body system is pretty straight forward, 3 body can be stable for a while with some restrictions, a 4 body problem is really damn hard...

wil4212y ago

Why would their investors allow anything else? I’m sure they see it as a huge loss like intel and mobile.

superkuh2y ago

>There is also a version of PyTorch that uses AMD ROCm, an open-source software stack for AMD GPU programming. Crossing the CUDA moat for AMD GPUs may be as easy as using PyTorch.

Unfortunately since the AMD firmware doesn't reliably do what it's supposed to those ROCm calls often don't either. That's if your AMD card is even still supported by ROCm: the AMD RX 580 I bought in 2021 (the great GPU shortage) had it's ROCm support dropped in 2022 (4 years support total).

The only reliable interface in my experience has been via opencl.

htrp2y ago

has opencl actually improved enough to be competitive?

orangepurple2y ago

I thought ONNX is supposed to be the ultimate common denominator for machine learning model cross platform compatibility

1 more reply

zucker422y ago

Do you mean OpenCL using Rusticl or something else? And what DL framework, if any?

superkuh2y ago

I should clarify that I mean for human person uses. Not commercial or institutional. But, clBLAST via llama.cpp for LLM currently. Or far in the past just pure opencl for things with AMD cards.

65a2y ago

ROCm works fine on my 2016 Vega Frontier edition, for what it's worth.

the__alchemist2y ago

When coding using Vulkan, for graphics or compute (The latter is the relevant one here), you need to have CPU code (Written in C++, Rust etc), then serialize it as bytes, then have shaders which run on the graphics card. This 3-step process creates friction, much in the same way as backend/serialization/frontend does in web dev. Duplication of work, type checking not going across the bridge, the shader language being limited etc.

My understanding is CUDA's main strength is avoiding this. Do you agree? Is that why it's such a big deal? Ie, why this article was written, since you could always do compute shaders on AMD etc using Vulkan.

mark_l_watson2y ago

NVidia hardware/CUDA stack is great, but I also love to see competition from AMD, George Hotz’s Tiny Corp, etc.

Off topic, but I am also looking with great interest at Apple Silicon SOCs with large internal RAM. The internal bandwidth also keeps getting better which is important for running trained LLMs.

Back on topic: I don’t own any current Intel computers but using Colab and services like Lambda Labs GPU VPSs is simple and flexible. A few people here mentioned if AMD can’t handle 100% of their workload they will stick with Intel and NVidia - understandable position, but there are workarounds.

physicsguy2y ago

Don’t agree at all. PyTorch is one library - yes, it’s important that it supports AMD GPUs but it’s not enough.

The ROCm libraries just aren’t good enough currently. The documentation is poor. AMD need to heavily invest in their software ecosystem around it, because library authors need decent support to adopt it. If you need to be a Facebook sized organisation to write an AMD and CUDA compatible library then the barrier to entry is too high.

weebull2y ago

Disagree that the Rocm libraries are poor. Their integration with everything else is poor because everything else is so highly Nvidia centric, and AMD can't just write to the same API because it's copyright Nvidia (see Oracle's Java case).

The adoption of CUDA has been such a coop for Nvidia, it's going to take some time to dismantle it.

physicsguy2y ago

I don’t use high level frameworks like PyTorch because my work is in computational physics so I do actually use the lower level libraries. The documentation doesn’t even come close although it has got better. But they’re just not at feature parity, and that’s not on anyone but AMD currently. They need to invest more in the core libraries.

Just look at cuFFT vs rocFFT for e.g… they aren’t even close to being at feature parity - things like multi GPU is totally missing and callbacks are still “experimental”. These are pretty basic features - bear in mind that when people ported from CPU codes CUDA had to support these because they existed in FFTW (transforms over multiple CPUs rather than GPUs though via MPI).

alecco2y ago

Regurgitated months-old content. blogspam

ris2y ago

I don't understand the author's argument (if there is one) - pytorch has existed for ages. AMD's Instinct MI* range has existed for years now. If these are the key ingredients why has it not already happened?

fluxem2y ago

I call it the 90% problem. If AMD works for 90% of my projects, I would still buy NVIDIA, which works for 100%, even though I’m paying a premium

hot_gril2y ago

I'm lazy, so it's 99% for me. I don't even mess with AMD CPUs; I know they're not exactly the same instruction set as Intel, and more importantly they work with a different (and less mainstream) set of mobos, so I don't want em. If AMD manages to pull more customers their way, that's great, it just means lower Intel premium for me.

bornfreddy2y ago

That's an interesting take. AMD mobos are no "less mainstream" than Intel ones are... When you choose a CPU you are also choosing a compatible mobo chipset. The companies that make motherboards are mostly the same, so there should be no big difference between those.

Also, while the CPU instruction sets are not exactly equal, the same is true for Intel processors of different generations too. And it doesn't matter one bit... Unless there is a bug in CPU you will never notice the difference, because it is taken care of at the compiler / kernel level.

Intel does have some advantages (and disadvantages too) over AMD, just not those.

anon2912y ago

I have no idea what you're talking about. Amd and Intel match on the isa in any case you'd see typically. Moreover, Intel is currently using AMDs instruction set. X86_64 was designed my amd and used to be called AMD64

hot_gril2y ago

It's never this simple. Their SIMD extensions differ for one, or at least did in the past.

1 more reply

Flameancer2y ago

What mainstream board company is intel only? Maybe a decade ago on AM3(+) but on AM5/AM5 I haven’t seen a main board partner not offer the same board SKU that works with Intel and AMD.

65a2y ago

As an owner of some Sapphire Rapids parts, let me just direct you to: https://edc.intel.com/content/www/us/en/design/products-and-...

hot_gril2y ago

To see errata tracked by Intel is a good sign.

hot_gril2y ago

Forgot to also mention iGPU and other on-chip accelerators being different and Intel usually having the edge there.

nologic012y ago

If the AI hype persists the CUDA moat will be less relevant in ~2 yrs.

Historically HPC was simply not sufficiently interesting (in commercial sense) for people to throw serious resources in the direction of making it a mass market capability.

NVIDIA first capitalized on the niche crypto industry (which faded) and was then well positioned to jump into the AI hype. The question is how much of the hype will become real business.

The critical factor for the post-CUDA world is not any circumstantial moat but who will be making money servicing stable, long term computing needs. I.e., who will be buying this hardware not with speculative hot money but with cashflow from clients that regularly use and pay for a HPC-type application.

These actors will be the long term buyers of commercially relevant HPC and they will have quite a bit of influence on this market.

ddtaylor2y ago

It's worth noting that AMD also has a ROCm port of Tensorflow.

ginko2y ago

When I try to install rocm-ml-sdk on Arch linux it'll tell me the total installed size would be about 18GB.

What can possibly explain this much bloat for what should essentially be a library on top of a graphics driver as well as some tools (compiler, profiler etc.)? A couple hundred MB I could understand if they come with graphical apps and demos, but not this..

tomsmeding2y ago

A regular TensorFlow installation, just the Python library, is an 184 MB wheel that unpacks to about 1.2 GB of stuff. I have no clue what mess goes in there, but it's a lot.

Still, if you're right that this package seems to take 18 GB disk size, something weird is going on.

slavik812y ago

There's a lot of kernels that are specialized for particular sets of input parameters and tuned for improved performance on specific hardware, which makes the libraries a couple hundred megabytes per architecture. The ROCm libraries are huge because they are fat binaries containing native machine code for ~13 different GPU architectures.

Flameancer2y ago

He’s not wrong. I did a new arch install to try and get a 7800XT working with ROCm and PyTorch and was concussed on how I ran out of space but saw that ROCm was 18GB.

sharonzhou2y ago

ROCm is great. We were able to get run and finetune LLMs on AMD Instincts with parity to NVIDIA A100s - and built an SDK that’s as easy to use as HuggingFace or easier (Lamini). Or at the very least, our designer is able to finetune/train the latest LLMs on them like Llama 2 - 70B and Mistral 7B with ease. The ROCm library isn’t as easy to use as CUDA because as another poster said, the ecosystem was built around CUDA. For example, it’s even called “.cuda()” in PyTorch to put a model on a GPU, when in reality you’d use it for an AMD GPU too.

atemerev2y ago

Nope. PyTorch is not enough, you have to do come C++ occasionally (as the code there can be optimized radically, as we see in llama.cpp and the like). ROCm is unusable compared to CUDA (4x more code for the same problem).

I don't understand why everyone neglects good, usable and performant lower-level APIs. ROCm is fast, low-level, but much much harder to use than CUDA, and the market seems to agree.

voz_2y ago

The amount of random wrong stuff about pytorch in this thread is pretty funny.

whywhywhywhy2y ago

Anyone who has to work in this ecosystem surely thinks this is a naive take

freedomben2y ago

For someone who doesn't work in this ecosystem, can you elaborate? What's the real situation currently?

whywhywhywhy2y ago

Nvidia CUDA was first to market, easier to work with that OpenCL which was the only competition for the first decade then abandoned. Because of this then all the people serious about this are using Nvidia hardware therefore all the code is written for Nvidia hardware.

Only way I could see AMD making inroads if they were willing to provide power of the level Nvidia puts in a data center at consumer prices and relaxed licensing to justify retooling the entire ML chain to work on a different architecture.

Geohot has documented his troubles trying to go all in on AMD and he's back on Nvidia now I believe.

benreesman2y ago

I know a lot of people don’t like George, I dislike plenty of people who are doing the right thing thing (including by some measures sama and siebel while they were pushing YC forward).

But not admitting the tinygrad project is the best Rebel Alliance on this is just a matter of letting vibe overcome results.

frnkng2y ago

As a former ETH miner I learned the hard way that saving a few bucks on hardware may not be worth operational issues.

I had a miner running with Nividia cards and a miner running with AMD cards. One of them had massive maintenance demand and the other did not. I will not state which brand was better imho.

Currently I estimate that running miners and running gpu servers has similar operational requirements and finally at scale similar financial considerations.

So, whatever is cheapest to operate in terms of time expenditure, hw cost, energy use,… will be used the most.

P.s.: I ran the mining operation not to earn money but mainly out of curiosity. And it was a small scale business powered by a pv system and a attached heat pump.

latchkey2y ago

I ran 150,000+ AMD cards for mining ETH. Once I fully automated all the vbios installs and individual card tuning, it ran beautifully. Took a lot of work to get there though!

Fact is that every single GPU chip is a snowflake. No two operate the same.

rottencupcakes2y ago

Have you ever written about this enterprise? This sounds super unique and I would be very interested in hearing about how it was run and how it turned out.

latchkey2y ago

It was unique, not many people on the planet, that I know of, who've run as many GPUs as I have. Especially not working for a giant company with large teams of people. For the tech team, it was just me and one other guy. Everything had to be automated because there was no way we could survive otherwise.

I've put a bunch of comments here on HN about the stuff I can talk about.

It no longer exists after PoS.

1 more reply

pjmlp2y ago

Unless they get their act together regarding CUDA polyglot tooling, I seriously doubt it.

ElectronBadger2y ago

On my PC workstation (Debian Testing) I have absolutely no problems running NVIDIA PNY Quadro P2200, which I'm going to upgrade with PNY Quadro RTX 4000 soon. I'd love to make a switch for AMD Radeon, but the very short (and shrinking) list of ROCm supported cards makes this move highly improbable for the not-so-nearest future.

upbeat_general2y ago

This article doesn’t address the real challenge [in my mind].

Framework support is one thing, but what about the million standalone CUDA kernels that have been written, especially common in research. Nobody wants to spend time re-writing/porting those, especially when they probably don’t understand the low-level details in the first place.

Not to mention, what is the plan for comprehensive framework support? I’ve experienced the pain of porting models to different hardware architectures where various ops are unsupported. Is it realistic to get full coverage of e.g., PyTorch?

bdowling2y ago

Someone could reimplement CUDA for AMD hardware. That would be legal because copying APIs for compatibility purposes is not copyright infringement. (See Google LLC v. Oracle America, Inc., 593 U.S. ___ (2021)).

AMD is unlikely to do this, however, because it would commodify their own products under their competitor’s API.

A third party could do it though. It may make sense as an open source project.

blueboo2y ago

Research kernels mostly turn to ash upon publication anyway. The wheel turns and the next post-doc gives ROCm a try and we move on

hankman862y ago

I suspect that AMD will use their improved compatibility with the leading ML stack for data center deals. Presumably by offering steep discounts over NVIDIA’s GPUs. This might help them to break into the market.

Individual ML practitioners will probably not be tempted to switch to AMD cards anytime soon. Whatever the price difference is: it will hardly offset the time that is subsequently sunk into working around remaining issues resulting from a non-CUDA (and less mature) stack underneath PyTorch.

falconroar2y ago

Is there any reason OpenCL is not the standard in implementations like PyTorch? Similar performance, open standard, runs everywhere - what's the downside?

LoganDark2y ago

IIRC, ease of implementation (for the GPU kernels), and cross-compatibility (the same bytecode can be loaded by multiple models of GPU).

ealloc2y ago

How is CUDA-C that much easier than OpenCL? Having ported back and forth myself, the base C-like languages are virtually identical. Just sub "__syncthreads();" for "barrier(CL_MEM_FENCE)" and so on. To me the main problem is that Nvidia hobbles OpenCL on their GPUs by not updating their CL compiler to OpenCL 2.0, so some special features are missing, such as many atomics.

LoganDark2y ago

Never used it myself, these are just the main reasons I've heard from friends.

jacobgorm2y ago

The ease of implementation using CUDA means that your code because effed for life, because it is no longer valid C/C++, unless you totally litter it with #ifdefs to special case for CUDA. In my own proprietary AI inference pipeline I've ended up code-generating to a bunch of different backends (OpenCL SpirV, Metal, CUDA, HLSL, CPU w. OpenMP), giving no special treatment to CUDA, and the resulting code is much cleaner and builds with standard open source toolchains.

LoganDark2y ago

> The ease of implementation using CUDA means that your code because effed for life

yes, yes it absolutely does. establishing market dominance as everyone wants to use CUDA but almost nobody wants to write their kernel twice.

JonChesterfield2y ago

Downsides are it can't express a bunch of stuff cuda or openmp can plus the nvidia opencl implementation is worse than their cuda one. So opencl is great if you want a lower performance way of writing a subset of the programs you want to write.

tails4e2y ago

AMD playing catch up is a good thing, their SW solution is intended to run on any HW, and with hip being basically line for line compatible with cuda it makes porting very easy. They did it with FSR,and they are doing it with rocm. Hopefully it takes off as it's a more open ecosystem for the industry. Necessity is the mother of invention and all that.

tormeh2y ago

For LLM inference, a shoutout to MLC LLM, which runs LLM models on basically any API that's widely available: https://github.com/mlc-ai/mlc-llm

einpoklum2y ago

TL;DR:

1. Since PyTorch has grown very popular, and there's an AMD backend for that, one can switch GPU vendors when doing Generative AI work.

2. Like NVIDIA's Grace+Hopper CPU-GPU combo, AMD is/will be offering "Instinct MI300A", which improves performance over having the GPU across a PCIe bus from a regular CPU.

ur-whale2y ago

> AMD May Get Across the CUDA Moat

I really wish they would, and properly, as in: fully open solution to match CUDA.

CUDA is a cancer on the industry.

mschuetz2y ago

What's wrong with CUDA? I avoided it for years because it's proprietory but about one year ago I started using it because all the alternatives (OpenGL/Vulkan compute, OpenCL, WebGPU, ...) couldn't quite do what I wanted, and it turned out to be a game changer. Nothing comes close to it. Now I'm hooked because there simply isn't an alternative that's as easy to use, yet powerfull and fast.

I wish there was an open alternative, but NVIDIA did several things right that others, especially Khronos, do not: The UX is top-notch. It makes the common cases easy yet still fast, and from there you can optimize to your hearts content. Khronos, however, usually completely over-engineers things and makes the common case hard and cumbersome with massive entry barriers.

ur-whale2y ago

> What's wrong with CUDA?

Read on

> it's proprietory

Yes indeed, proprietary

> Now I'm hooked

There you go.

> I wish there was an open alternative

So does the rest of the industry.

Specifically, it forces you to run your stuff on NVidia hardware and gives you exactly zero guarantee of future support.

Good luck trying to reproduce whatever research you are currently conducting in 10 years time.

Vendor lock-in + no forward compatibility guarantee = surefire recipe for getting milked to the bone by NVidia.

mschuetz2y ago

Those are some poor arguments, imho, because there literally is no other option than CUDA. The alternatives are so bad, it's far better to be vendor-locked and being able to get stuff done, than not being able to get stuff done at all.

As I said, I avoided it for years because of the reasons you mentioned. Turns out I could not avoid it any longer because it's the only (meaningful) option that could do what I needed, has serious support, and great UX. And NVIDIA is hardly to blame because they simply made sure to build a good product. It can't stop AMD, Intel or Khronos from creating a competitive alternative, but so far they haven't.

And regarding support, so far NVIDIA has shown excellent continuous support for CUDA, whereas OpenCL and OpenGL are the ones that went down. And I've chosen CUDA over rocm precisely due to support reasons, because AMD has always treated it as some kind of side gig with uncertain future.

raggi2y ago

Can we just get wgsl compute good enough and over the line instead, and do away with these moats?

mschuetz2y ago

Not happening. WGSL wants to support the lowest common denominator, so it'll always mainly be a 5-year old mobile-phone API. Also if you want to beat CUDA, you'll need some functionality that's completely missing in compute shaders, especially WGSL. Like pointers and pointer casting (and that glsl buffer reference extension is the worst emulation of that feature I've every seen).

raggi2y ago

The language extensions feature is designed to provide these kinds of facilities is it not?

jeffreygoesto2y ago

I am hoping for SYCL and SPIR-V to gain traction...

jiggawatts2y ago

Can I buy an MI300 or even rent one in a cloud?

arcanus2y ago

Soon. The card is coming in Q4. The early shipments are likely all going to LLNL's El Capitan Exascale computer: https://www.tomshardware.com/news/amds-instinct-mi300-moves-...

spandextwins2y ago

That’s like saying Ford is gonna catch Tesla.

cantaloupe2y ago

Do you see that as an inevitability or an impossibility?

tpmx2y ago

No, not really. They have similar enough silicon, they "just" need some software to make it work.

Zetobal2y ago

They are just too late even if they catch up. Until they make a leap like they did with ryzen nothing will happen.

Havoc2y ago

>They are just too late even if they catch up.

Late certainly, too late I don't think so.

If you can field a competitively priced consumer card that can run llama fast then you're already halfway there because then the ecosystem takes off. Especially since nvidia is being really stingy with their vram amounts.

H100 & datacenter is a separate battle certainly, but on mindshare I think some deft moves from AMD will get them there quite fast once they pull their finger out their A and actually try sorting out the driver stack.

dylan6042y ago

>If you can field a competitively priced consumer card

if this unicorn were to show up, what's to say that all the non-consumers won't just scarf up these equally performant yet lower priced cards causing the supply-demand situation we're in now? the only difference would be a sudden supply of the expensive Nvidia cards that nobody wants because of their price.

AnthonyMouse2y ago

The thing that causes it to be competitively priced is having enough production capacity to prevent that from happening.

One way to do that may be to produce a card on an older process node (or the existing one when a new one comes out) that has a lot of VRAM. There is less demand for the older node so they can produce more of them and thereby sell them for a lower price without running out.

Havoc2y ago

>if this unicorn were to show up

A unicorn like that showed up a couple hours ago. Someone posted a guide for getting llama to run on a 7900xtx

https://old.reddit.com/r/LocalLLaMA/comments/170tghx/guide_i...

It's still slow and janky but this really isn't that far away.

I don't buy that AMD can't make this happen if they actually tried.

Go on fiverr, get them to compile a list of top 100 people in the DIY LLM space, send them all free 7900XTXs. Doesn't matter if half of it is wrong, just send it. Next take 1.2m USD, post a dozen 100k bounties against llama.cpp that are AMD specific - support & optimise the gear. Rinse and repeat with every other hobbyist LLM/stable diffusion project. A lot of these are zero profit open source / passion / hobby projects. If 6 figure bounties show up it'll absolute raise pulses. Next do all the big youtubers in the space - carefully on that one so that it doesn't come across as an attempted pay-off...but you want them to know that you want this space to grow and are willing to put your money where your mouth is.

That'll cost AMD what 2m 3m? To move the needle on a multi billion market? That's the cheapest marketing you've ever seen.

As I said the datacenter & enterprise market is another beast entirely full of moats and strategy, but I don't see why a suitably motivated senior AMD exec can't tackle the enthusiast market single handedly with a couple of emails, a cheque book and a tshirt that has the nike slogan on it.

>what's to say that all the non-consumers won't just scarf up these equally performant yet lower priced cards

It doesn't matter. They're in the business of selling cards. To consumers, to datacenters, to your grandmother. From a profit driven capitalist company the details don't matter as long as there is traction & volume. The above - opening up even the possibility of a new market - is gold in that perspective. And from a consumer perspective anything that breaks the nvidia cuda monopoly is a win.

2 more replies

j / k navigate · click thread line to collapse

302 comments

omneity2y ago

Best of all is that I simply set the device to `torch.device('cuda')` rather than openCL, which does wonders for compatibility and to keep code simple.

[0]: https://rocm.docs.amd.com/en/latest/how_to/pytorch_install/p...

mikepurvis2y ago

So it's important that vendors don't feel let off the hook to provide sane packaging just because there's an option to use a kitchen-sink container image they rebuild every day from source.

xahrepap2y ago

https://github.com/RadeonOpenCompute/ROCm-docker/blob/master...

They also have some for Fedora. Looks like for this you need to install their repo:

    curl -sL https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add - \
    && printf "deb [arch=amd64] https://repo.radeon.com/rocm/apt/$ROCM_VERSION/ jammy main" | tee /etc/apt/sources.list.d/rocm.list \
    && printf "deb [arch=amd64] https://repo.radeon.com/amdgpu/$AMDGPU_VERSION/ubuntu jammy main" | tee /etc/apt/sources.list.d/amdgpu.list \

then install Python, a couple other dependencies (build-essential, etc) and then the package in question: rocm-dev

So they are doing the packaging. There might even be documentation elsewhere for that type of setup.

mikepurvis2y ago

5 more replies

fwsgonzo2y ago

mikepurvis2y ago

amelius2y ago

> So it's important that vendors don't feel let off the hook to provide sane packaging just because there's an option to use a kitchen-sink container image they rebuild every day.

Sadly if e.g. 95% of their users can use the container, then it could make economical sense to do it that way.

mathisfun1232y ago

> especially once you're in embedded

is this a real problem? exactly which embedded platform has a device that ROCm supports?

mikepurvis2y ago

1 more reply

ngcc_hk2y ago

X86 cannot do 64 bit let us do this and that so the market can use only our cpu. Repeat with me x86-64 is impossible.

Not sure Apple is in this otherwise the real great competition come.

wyldfire2y ago

> Best of all is that I simply set the device to `torch.device('cuda')` rather than openCL, which does wonders for compatibility

Man oh man where did we go wrong that cuda is the more compatible option over OpenCL?

KeplerBoy2y ago

It must be a misnomer on PyTorch's side. Clearly it's neither CUDA nor OpenCL.

AMD should just get it's shit together. This is ridiculous. Not the name, but the fact that you can only do FP64 on a GPU. Everybody is moving to FP16 and AMD is stuck on doubles?

omneity2y ago

I believe the fp64 limitation came from the laptop-grade GPU I had rather than inherent to AMD or ROCm.

The API level I could target was at least two or three versions behind the latest they have to offer.

1 more reply

JonChesterfield2y ago

FP64 is what HPC is built on. F32 works on the cards too (same rate or faster). I don't know the status of F16 or F8.

2 more replies

NavinF2y ago

This has always been the case. OpenCL is a shit show

RockRobotRock2y ago

Have you gotten it to work with Whisper by any chance?

kkielhofner2y ago

Whisper is actually a great example of why Nvidia has such a stronghold on ML/AI and why it’s so difficult to compete.

There’s getting something to “work”, which is often enough of a challenge with ROCm. Then there’s getting it to work well (next challenge).

Then there’s getting it to work as well as Nvidia/CUDA.

With Whisper, as one example, you should be running it with ctranslate2[0]. Of all the platforms on their supported list you won’t find ROCm.

[0] - https://github.com/OpenNMT/CTranslate2

errnoh2y ago

1 more reply

pedrovhb2y ago

I've had luck with an RX5700XT and whisper.cpp built with clblast. Works like a charm, not entirely a scarring experience getting it to work (easier than most other stuff which was surprising to me).

incognition2y ago

Fp64??

latchkey2y ago

https://en.wikipedia.org/wiki/Double-precision_floating-poin...

NVIDIA fp32 (H100) has 2x more TFLOPS than AMD's fp32 (MI250) and AI doesn't need fp64 precision.

incognition2y ago

Lol it was meant as I wouldn't be caught dead using fp64

fransje262y ago

Hardware limitation.

javchz2y ago

CUDA is the only reason I have an Nvidia card, but if more projects start migrating to a more agnostic environment, I'll be really grateful.

distract89012y ago

My Arch system would occasionally boot to a black screen. When this happened, no amount of tinkering could get it back. I had to reinstall the whole OS.

Turns out it was a conflict between nvidia drivers and my (10 year old) Intel integrated GPU. But once I switched to an AMD card, everything works flawlessly.

Ubuntu based systems barely worked at all. Incredibly unstable and would occasionally corrupt the output and barf colors and fragments of the desktop all over my screens.

AMD on arch has been an absolute delight. It just. Works. It's more stable than nvidia on windows.

For a lot of reasons-- but mainly Linux drivers-- I've totally sworn off nvidia cards. AMD just works better for me.

aftbit2y ago

iopq2y ago

I tried using an Nvidia card with OBS to record my screen and it kind of freezes in Wine. I switched from x11 to Wayland and now Wine shows horizontal lines (!) and performs like crap.

1 more reply

distract89012y ago

1 more reply

wildzzz2y ago

MegaDeKay2y ago

I run Arch as well and AMD is only "good". I would have a problem every now and then where my RX560 would lose its mind coming out of sleep and I'd have to reboot.

__rito__2y ago

I have used Pop OS and Ubuntu with NVIDIA card, and honestly, I never faced any serious problem.

After every kernel upgrade, I just have to reinstall the nvidia drivers and the cuda toolkit.

Everything works as before after I do that. I don't face any problems at all.

hskalin2y ago

I'm not sure what card you have but I've never really had any major problems running Nvidia + Intel integrated graphics on Arch, Ubuntu etc.

nextaccountic2y ago

> CUDA is the only reason I have an Nvidia card, but if more projects start migrating to a more agnostic environment, I'll be really grateful.

If AMD relies on people changing their code to make it portable, it already lost.

JonChesterfield2y ago

The original idea was people would write in opencl instead of cuda but that really didn't work out.

pjmlp2y ago

Both ideas are already lost before starting, Hip isn't polyglot as CUDA, and OpenCL is mostly stuck in C.

mrweasel2y ago

> I see people saying that CUDA as an API is very tailored to the capabilities of nvidia GPUs

javchz2y ago

I think that could work too. I wonder if they could do a translation layer, something like Apple with the M1 chips that translates JIT x86 to ARM.

JonChesterfield2y ago

It would be far more difficult than compiling cuda source directly. I'm not sure anyone would pay for a cuda->amdgpu conversion tool, and it's hard to imagine AMD making one as part of ROCm.

mschuetz2y ago

Not just feature parity, but proper UX. Things need to just work, without spending hours or days to make them work.

weebull2y ago

Blame Nvidia. They are the ones the got the industry hooked on a proprietry API.

1 more reply

PH95VuimJjqBqy2y ago

I see these complains from time to time and I never understand them.

I've literally been running nvidia on linux since the TNT2 days and have _never_ had this sort of issue. That's across many drivers and many cards over the many many years.

LtWorf2y ago

I've had kernel panics that disappeared when I started using the on board intel graphics instead of the nvidia.

Your statement makes no sense. It's like a smoker claiming that since he didn't die of lung cancer, smoke is 100% safe.

kkielhofner2y ago

Describing kernel panics and general nightmare scenarios as the general course with Nvidia doesn’t make sense either.

Nvidia has 80% market share of the discrete GPU desktop market and at least 90% market share of cloud/datacenter.

Nvidia GPUs are used almost exclusively for every cloud powered AI service and to train virtually every ML model in existence. Almost always on Linux.

Do you really think any of this would be possible if what you are describing was anything approaching the typical experience starting at the /driver/ level?

Nvidia gets a lot of hate on HN and elsewhere (much of it fair) but I will never understand the people who claim it doesn’t work and get the job done (often very well).

2 more replies

jjoonathan2y ago

Same but linux experience is a steep and bumpy function of hardware.

My guess: something like laptop GPU switching failed badly in the nvidia binary, earning it a reputation.

HideousKojima2y ago

2 more replies

temp08262y ago

PH95VuimJjqBqy2y ago

actually, it's a good point because that's how I always install nvidia drivers as well. Never from the local package manager.

bootsmann2y ago

So you don‘t recommend going the rpm-fusion route?

ant6n2y ago

Well tnt2 should be pretty well supported by now ;-)

PH95VuimJjqBqy2y ago

lmao, touche :)

einpoklum2y ago

Having said that - I (or rarely, other people) have almost always managed to work out those issues and get my systems to work. Not in all cases though.

kombine2y ago

filterfiber2y ago

Do you use wayland, multiple monitors, and/or play games or is it just for ML/AI?

smoldesu2y ago

I do all of those things with my 3070 and it works just fine. Most of them will depend on your DE's Wayland implementation.

chaostheory2y ago

Yeah with my CUDA setup, it feels like I just ducktaped my deployment. I am very hesitant to make changes and it’s not easy to replicate

gymbeaux2y ago

I often have issues booting to the installer or first boot after install with an NVidia GPU.

Flameancer2y ago

I think they moved to Wayland on 23.04 or 23.10. I just recently installed both to try and get a 7800xt working with PyTorch and the default was Wayland.

anthk2y ago

X11 is not a window manager.

gymbeaux2y ago

Xorg

1 more reply

smoldesu2y ago

I'm definitely not against better hardware support for AI, but I think your problems are more GNOME's fault than Nvidia's. KDE's Wayland session is almost flawless on Nvidia nowadays.

arsome2y ago

If GNOME can tank the kernel, it ain't GNOME's fault.

kombine2y ago

I really hope that with KDE 6 I can finally switch to Wayland!

Zardoz842y ago

orangetuba2y ago

Nvidia on Linux is more like running Windows 95 from the gulag, and you're covered in ticks. I absolutely detest Nvidia because of the Linux hell they've created.

codemk82y ago

From my recent experience, ROCm and hipcc allow you to port cuda programs fairly easily to AMD arch. The compiling is so much faster than nvcc too.

wubrr2y ago

Yeah, nvidia linux support is meh, but still much better than amd.

phkahler2y ago

>> Yeah, nvidia linux support is meh, but still much better than amd.

Zambyte2y ago

Is it better than AMD? I have had literally no graphics issues on my 6650 XT with swaywm using the built in kernel drivers.

aseipp2y ago

I have come to accept that graphics card drivers and hardware stability ultimately comes down to whether or not ghosts have decided to haunt you.

christkv2y ago

I think the problems are pro drivers and the issues with ROCm being buggy not the open source graphics drivers.

HansHamster2y ago

Guess I'm also doing something wrong. Never had any serious issues with either Nvidia or AMD on Linux (and only a few annoyances on RNDA2 shortly after release)...

treprinum2y ago

1 more reply

bryanlarsen2y ago

Not my experience. The open source AMD drivers are much more pleasant to deal with than the closed source Nvidia ones.

silisili2y ago

In the closed source days of fglrx or whatever it's called I'd agree. Since they went open source, hard disagree. AMD graphics work in Linux about as well as Intel always has.

acomjean2y ago

As someone who was tasked with trying to get nvidia working on Ubuntu, it’s a pretty terrible experience.

I have a nvidia laptop with popos. That works well.

IronWolve2y ago

Yup, thank the hobbyists. Pytorch is allowing other hardware. Stable diffusion working on m chips, intel arc, and Amd.

Now what I'd like to see is real benchmarks for compute power. Might even get a few startups to compete in this new area.

mandevil2y ago

Hobbyist and open-source are definitely not synonyms.

janalsncm2y ago

[1] https://discuss.pytorch.org/u/ptrblck/summary

Eisenstein2y ago

kiratp2y ago

There are so many billions of dollar being spent on this hardware that everyone other than Nvidia is doing everything they can to make competition happen.

Eg: https://www.intel.com/content/www/us/en/developer/videos/opt...

https://www.intel.com/content/www/us/en/developer/tools/onea...

https://developer.apple.com/metal/tensorflow-plugin/

Large scale opensource is, outside of a few exceptions, built by engineers paid to build it.

johngossman2y ago

roenxi2y ago

AMD are being dragged along by the market. Willingly, they aren't fighting it, but their focus has been on other areas.

3 more replies

mattnewton2y ago

Re: startups, Geohotz raised a few million for this already. https://tinygrad.org/

IntelMiner2y ago

Didn't he do what he always does. Rake in a ton of money, fart around and then cash out exclaiming it's everyone else's fault?

The way he stole Fail0verflow's work with the PS3 security leak after failing to find a hypervisor exploit for months absolutely soured any respect I had for him at the time

throwitawayfam2y ago

Yep, did exactly that. IMO he threw a fit, even though AMD was working with him squashing bugs. https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...

4 more replies

ShamelessC2y ago

> The way he stole Fail0verflow's work with the PS3 security leak after failing to find a hypervisor exploit for months absolutely soured any respect I had for him at the time

1 more reply

kinematikk2y ago

Do you have a source on the stealing part? A quick Google search didn't result in anything

1 more reply

georgehotz2y ago

Are you interested in being factually correct, or are you interested in hating? If it's the former, I think you should do some research. If it's the latter, :salute:

I had GPT-4 do some research for you, hopefully you will incorporate it in future comments you make about me. https://chat.openai.com/share/d0fa24e9-3ed7-4b17-8497-24bfdd...

adastra222y ago

Wow, TIL

nomel2y ago

Obligatory Lex Fridman podcast, where he discusses it: https://youtu.be/dNrTrx42DGQ?t=2408

jauntywundrkind2y ago

mathisfun1232y ago

> Pytorch is just using Google's OpenXLA now

this is so far from accurate it should be considered libelous; from the link

> PyTorch/XLA is set to migrate to the open source OpenXLA

fotcorn2y ago

If you use model.compile() in PyTorch, you use TorchInductor and OpenAIs Triton by default.

jauntywundrkind2y ago

Thank you for saying something useful here. I was vaguely under the impression that pytorch 2.0 had fully flipped to defaulting to openxla. That seems to not be the case.

voz_2y ago

Wrong.

withwarmup2y ago

AMD has the hardware but the support for HPC is non-existent outside of the joke that is bliss and AOCL.

I really wish for more competitors to enter the market in HPC, but AMD has a shitload of work to do.

arcanus2y ago

> AMD has the hardware but the support for HPC is non-existent outside of the joke that is bliss and AOCL.

That's hardly non-existent support for HPC.

65a2y ago

aiunboxed2y ago

Exactly, with NVIDIAs core focus on AI way before it was cool has lead to them being in this advantageous position. For AMD just being a price friendly competitor to Intel and Nvidia was the motto.

runiq2y ago

Yeah, that's a pretty shortsighted take of things. Do you really believe that Nvidia hasn't taken steps do make sure their moat is as wide as possible?

Blammar2y ago

The thing about owning the CUDA spec is that Nvidia can add new features quickly without having to argue with other hardware vendors. I find that a positive thing overall.

anon2912y ago

Literally never had an issue with Nvidia and Linux in decades. Despite this, my windows installs have all sorts of issues.. as always

pama2y ago

(Edited “no” to limited empirical evidence after a fellow user mentioned El Capitan.)

fotcorn2y ago

ROCm has HIP (1) which is a compatibility layer to run CUDA code on AMD GPUs. In theory, you only have to adjust #includes, and everything should just work, but as usual, reality is different.

1: https://github.com/ROCm-Developer-Tools/HIP

Certhas2y ago

The fact that El Capitan is AMD says that at least for Science/HPC there definitely is evidence of a closing gap.

pama2y ago

Thanks. You are actually right that this new supercomputer might move the needle once it is in production mode. I will wait and see how it goes.

falconroar2y ago

I don't understand why developers of PyTorch and similar don't use OpenCL. Open standard, runs everywhere, similar performance - what's the downside??

pama2y ago

Roark662y ago

bigcat123456782y ago

Cuda is the foundation

NVIDIA moat is the years of work built by oss community, big corporations, research insistute

They spend all time building for cuda, a lot of implicit designs are derived from cuda's characteristic

That will be the main challenge

mikepurvis2y ago

bigcat123456782y ago

what I meant is that all these stuff have 15 years of implicit accumulation of knowledge and tips and even hacks builtin in the software

No matter what you depends on, you'll have a slew of larger or minor obstacles or annoyance

That collectively is the most itself

As you said, already it's clear that replacing cuda itself is not that daunting

pixelesque2y ago

Does AMD have a solution to forward device combatibility (like PTX for NVidia)?

Last time I looked into ROCm (two years ago?), you seemed to have to compile stuff explicitly for the architecture you were using, so if a new card came out, you couldn't use it without a recompile.

mnau2y ago

Not natively, but AdaptiveCpp (previously hiSycl, then OpenSycl) has a single source single compiler pass, where they basically store LLVM IR as an intermediate representation.

https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/...

Performance penalty was within ew precents, at least according to the paper (figure 9 and 10) https://cdrdv2-public.intel.com/786536/Heidelberg_IWOCL__SYC...

einpoklum2y ago

I don't know what they do with ROCm, but with OpenCL, the answer is: Certainly. It's called SPIR:

https://www.khronos.org/spir/

nabla92y ago

> Crossing the CUDA moat for AMD GPUs may be as easy as using PyTorch.

latchkey2y ago

Lisa has said that they are committed to improving ROCm, especially for AI workloads. Recent releases (5.6/5.7) prove that.

einpoklum2y ago

> Nvidia has spent huge amount of work to make code run smoothly and fast.

Well, let's say "smoother" rather than "smoothly".

> ROCm code is slower

On physically-comparable hardware? Possible, but that's not an easy claim to make, certainly not as expansively as you have. References?

> has more bugs

Possible, but - NVIDIA keeps their bug database secret. I'm guessing you're concluding this from anecdotal experience? That's fair enough, but then - say so.

> ROCm ... don't have enough features and

> they have compatibility issues between cards.

such as?

kkielhofner2y ago

I wouldn’t say ROCm code is “slower”, per se, but in practice that’s how it presents. References:

https://github.com/InternLM/lmdeploy

https://github.com/vllm-project/vllm

https://github.com/OpenNMT/CTranslate2

A Maxwell laptop chip. It also runs just as well on an H100.

THAT is hardware support.

RcouF1uZ4gsC2y ago

I am not so sure.

Everyone knows that CUDA is a core competency of Nvidia and they have stuck to it for years and years refining it, fixing bugs, and making the experience smoother on Nvidia hardware.

On the other hand, AMD has not had the same level of commitment. They used to sing the praises of OpenCL. And then there is ROCm. Tomorrow, it might be something else.

In addition, even if there is theoretical support, getting specific driver support and working around driver bugs is likely to be more of a pain with AMD.

AnthonyMouse2y ago

At some point the old complaints are no longer valid.

hot_gril2y ago

Good for them. We can hope the open side catches up either by improving their standards, or adding more layers like this article describes.

zirgs2y ago

CUDA was released in 2007 and the development of it started even earlier - possibly even in the 90s. Back then nobody else cared about GPU compute. OpenCL came out 2 years after that.

killerstorm2y ago

People made a programming language & a compiler/runtime for GPGPU in 2004: https://en.wikipedia.org/wiki/BrookGPU

hot_gril2y ago

Everything has old beginnings that the specialists will remember, but GPU compute really reached mass popularity and became a large selling point for Nvidia in the 2010s.

binarymax2y ago

And the question for most that remains once AMD catches up: will the duopoly result in lower prices to a reasonable level for hobbyists or bootstrapped startups, or will AMD just gouge like NVidia?

quitit2y ago

I think in this case the changes needed to make AMD useful will open the market to other players as well (e.g. Intel).

PyTorch is already walking down this path and while CUDA-based performance is significantly better, that is changing and of course an area of continued focus.

binarymax2y ago

FuriouslyAdrift2y ago

They are not behind... https://www.tomshardware.com/news/amd-expands-mi300-with-gpu...

You can also actually buy them as opposed to the nVidia offerings which you are going to have to fight for.

klysm2y ago

A simplistic economic take would suggest that the competition would result in lower prices, but given two players in the market who knows.

binarymax2y ago

I sure hope I'm wrong.

tyre2y ago

AMD might have to charge less to break into customers that are already bought into Nvidia. There has to be a discount to cover the switching costs + still provide savings (or access).

1 more reply

sumtechguy2y ago

It is oligopoly pricing.

https://www.investopedia.com/terms/o/oligopoly.asp

With that few competitors pricing would not change much.

AnthonyMouse2y ago

2 more replies

ad404b8a372f2b92y ago

Prices seemed to have lowered when AMD came out with CPUs competitive with Intel's.

tibbydudeza2y ago

Price difference between 13900K and AMD Ryzen 9 7950x is not big - the latest 7950X3D is about on par with the higher clocked 13900KS as well.

1 more reply

hackerlight2y ago

Looking at the CPU market, competition did lead to lower prices: AMD are the best, but their CPUs are very reasonably priced because Intel is close behind.

adamsvystun2y ago

This is not a binary question. Two players, while not ideal, are better then just one.

evanjrowley2y ago

johngossman2y ago

rafaelmn2y ago

If the margins and demand is there Intel will eventually show up

wmf2y ago

Intel already showed up three or four times but their software is as bad as AMD's used to be.

ilc2y ago

Thankfully, software can be fixed over time as AMD has shown. Lack of another competitor can't be fixed as easily.

Havoc2y ago

Is either in doubt?

rafaelmn2y ago

rdsubhas2y ago

Demand will push AMD prices up by couple hundred bucks and Nvidia cards down by couple hundred bucks. A hobbyist customer will be neither better or worse.

stjohnswarts2y ago

wil4212y ago

Why would their investors allow anything else? I’m sure they see it as a huge loss like intel and mobile.

superkuh2y ago

>There is also a version of PyTorch that uses AMD ROCm, an open-source software stack for AMD GPU programming. Crossing the CUDA moat for AMD GPUs may be as easy as using PyTorch.

The only reliable interface in my experience has been via opencl.

htrp2y ago

has opencl actually improved enough to be competitive?

orangepurple2y ago

I thought ONNX is supposed to be the ultimate common denominator for machine learning model cross platform compatibility

1 more reply

zucker422y ago

Do you mean OpenCL using Rusticl or something else? And what DL framework, if any?

superkuh2y ago

I should clarify that I mean for human person uses. Not commercial or institutional. But, clBLAST via llama.cpp for LLM currently. Or far in the past just pure opencl for things with AMD cards.

65a2y ago

ROCm works fine on my 2016 Vega Frontier edition, for what it's worth.

the__alchemist2y ago

mark_l_watson2y ago

NVidia hardware/CUDA stack is great, but I also love to see competition from AMD, George Hotz’s Tiny Corp, etc.

Off topic, but I am also looking with great interest at Apple Silicon SOCs with large internal RAM. The internal bandwidth also keeps getting better which is important for running trained LLMs.

physicsguy2y ago

Don’t agree at all. PyTorch is one library - yes, it’s important that it supports AMD GPUs but it’s not enough.

weebull2y ago

The adoption of CUDA has been such a coop for Nvidia, it's going to take some time to dismantle it.

physicsguy2y ago

alecco2y ago

Regurgitated months-old content. blogspam

ris2y ago

fluxem2y ago

I call it the 90% problem. If AMD works for 90% of my projects, I would still buy NVIDIA, which works for 100%, even though I’m paying a premium

hot_gril2y ago

bornfreddy2y ago

Intel does have some advantages (and disadvantages too) over AMD, just not those.

anon2912y ago

hot_gril2y ago

It's never this simple. Their SIMD extensions differ for one, or at least did in the past.

1 more reply

Flameancer2y ago

What mainstream board company is intel only? Maybe a decade ago on AM3(+) but on AM5/AM5 I haven’t seen a main board partner not offer the same board SKU that works with Intel and AMD.

65a2y ago

As an owner of some Sapphire Rapids parts, let me just direct you to: https://edc.intel.com/content/www/us/en/design/products-and-...

hot_gril2y ago

To see errata tracked by Intel is a good sign.

hot_gril2y ago

Forgot to also mention iGPU and other on-chip accelerators being different and Intel usually having the edge there.

nologic012y ago

If the AI hype persists the CUDA moat will be less relevant in ~2 yrs.

Historically HPC was simply not sufficiently interesting (in commercial sense) for people to throw serious resources in the direction of making it a mass market capability.

NVIDIA first capitalized on the niche crypto industry (which faded) and was then well positioned to jump into the AI hype. The question is how much of the hype will become real business.

These actors will be the long term buyers of commercially relevant HPC and they will have quite a bit of influence on this market.

ddtaylor2y ago

It's worth noting that AMD also has a ROCm port of Tensorflow.

ginko2y ago

When I try to install rocm-ml-sdk on Arch linux it'll tell me the total installed size would be about 18GB.

tomsmeding2y ago

A regular TensorFlow installation, just the Python library, is an 184 MB wheel that unpacks to about 1.2 GB of stuff. I have no clue what mess goes in there, but it's a lot.

Still, if you're right that this package seems to take 18 GB disk size, something weird is going on.

slavik812y ago

Flameancer2y ago

He’s not wrong. I did a new arch install to try and get a 7800XT working with ROCm and PyTorch and was concussed on how I ran out of space but saw that ROCm was 18GB.

sharonzhou2y ago

atemerev2y ago

I don't understand why everyone neglects good, usable and performant lower-level APIs. ROCm is fast, low-level, but much much harder to use than CUDA, and the market seems to agree.

voz_2y ago

The amount of random wrong stuff about pytorch in this thread is pretty funny.

whywhywhywhy2y ago

Anyone who has to work in this ecosystem surely thinks this is a naive take

freedomben2y ago

For someone who doesn't work in this ecosystem, can you elaborate? What's the real situation currently?

whywhywhywhy2y ago

Geohot has documented his troubles trying to go all in on AMD and he's back on Nvidia now I believe.

benreesman2y ago

I know a lot of people don’t like George, I dislike plenty of people who are doing the right thing thing (including by some measures sama and siebel while they were pushing YC forward).

But not admitting the tinygrad project is the best Rebel Alliance on this is just a matter of letting vibe overcome results.

frnkng2y ago

As a former ETH miner I learned the hard way that saving a few bucks on hardware may not be worth operational issues.

I had a miner running with Nividia cards and a miner running with AMD cards. One of them had massive maintenance demand and the other did not. I will not state which brand was better imho.

Currently I estimate that running miners and running gpu servers has similar operational requirements and finally at scale similar financial considerations.

So, whatever is cheapest to operate in terms of time expenditure, hw cost, energy use,… will be used the most.

P.s.: I ran the mining operation not to earn money but mainly out of curiosity. And it was a small scale business powered by a pv system and a attached heat pump.

latchkey2y ago

I ran 150,000+ AMD cards for mining ETH. Once I fully automated all the vbios installs and individual card tuning, it ran beautifully. Took a lot of work to get there though!

Fact is that every single GPU chip is a snowflake. No two operate the same.

rottencupcakes2y ago

Have you ever written about this enterprise? This sounds super unique and I would be very interested in hearing about how it was run and how it turned out.

latchkey2y ago

I've put a bunch of comments here on HN about the stuff I can talk about.

It no longer exists after PoS.

1 more reply

pjmlp2y ago

Unless they get their act together regarding CUDA polyglot tooling, I seriously doubt it.

ElectronBadger2y ago

upbeat_general2y ago

This article doesn’t address the real challenge [in my mind].

bdowling2y ago

AMD is unlikely to do this, however, because it would commodify their own products under their competitor’s API.

A third party could do it though. It may make sense as an open source project.

blueboo2y ago

Research kernels mostly turn to ash upon publication anyway. The wheel turns and the next post-doc gives ROCm a try and we move on

hankman862y ago

falconroar2y ago

Is there any reason OpenCL is not the standard in implementations like PyTorch? Similar performance, open standard, runs everywhere - what's the downside?

LoganDark2y ago

IIRC, ease of implementation (for the GPU kernels), and cross-compatibility (the same bytecode can be loaded by multiple models of GPU).

ealloc2y ago

LoganDark2y ago

Never used it myself, these are just the main reasons I've heard from friends.

jacobgorm2y ago

LoganDark2y ago

> The ease of implementation using CUDA means that your code because effed for life

yes, yes it absolutely does. establishing market dominance as everyone wants to use CUDA but almost nobody wants to write their kernel twice.

JonChesterfield2y ago

tails4e2y ago

tormeh2y ago

For LLM inference, a shoutout to MLC LLM, which runs LLM models on basically any API that's widely available: https://github.com/mlc-ai/mlc-llm

einpoklum2y ago

TL;DR:

1. Since PyTorch has grown very popular, and there's an AMD backend for that, one can switch GPU vendors when doing Generative AI work.

2. Like NVIDIA's Grace+Hopper CPU-GPU combo, AMD is/will be offering "Instinct MI300A", which improves performance over having the GPU across a PCIe bus from a regular CPU.

ur-whale2y ago

> AMD May Get Across the CUDA Moat

I really wish they would, and properly, as in: fully open solution to match CUDA.

CUDA is a cancer on the industry.

mschuetz2y ago

ur-whale2y ago

> What's wrong with CUDA?

Read on

> it's proprietory

Yes indeed, proprietary

> Now I'm hooked

There you go.

> I wish there was an open alternative

So does the rest of the industry.

Specifically, it forces you to run your stuff on NVidia hardware and gives you exactly zero guarantee of future support.

Good luck trying to reproduce whatever research you are currently conducting in 10 years time.

Vendor lock-in + no forward compatibility guarantee = surefire recipe for getting milked to the bone by NVidia.

mschuetz2y ago

raggi2y ago

Can we just get wgsl compute good enough and over the line instead, and do away with these moats?

mschuetz2y ago

raggi2y ago

The language extensions feature is designed to provide these kinds of facilities is it not?

jeffreygoesto2y ago

I am hoping for SYCL and SPIR-V to gain traction...

jiggawatts2y ago

Can I buy an MI300 or even rent one in a cloud?

arcanus2y ago

Soon. The card is coming in Q4. The early shipments are likely all going to LLNL's El Capitan Exascale computer: https://www.tomshardware.com/news/amds-instinct-mi300-moves-...

spandextwins2y ago

That’s like saying Ford is gonna catch Tesla.

cantaloupe2y ago

Do you see that as an inevitability or an impossibility?

tpmx2y ago

No, not really. They have similar enough silicon, they "just" need some software to make it work.

Zetobal2y ago

They are just too late even if they catch up. Until they make a leap like they did with ryzen nothing will happen.

Havoc2y ago

>They are just too late even if they catch up.

Late certainly, too late I don't think so.

dylan6042y ago

>If you can field a competitively priced consumer card

AnthonyMouse2y ago

The thing that causes it to be competitively priced is having enough production capacity to prevent that from happening.

Havoc2y ago

>if this unicorn were to show up

A unicorn like that showed up a couple hours ago. Someone posted a guide for getting llama to run on a 7900xtx

https://old.reddit.com/r/LocalLLaMA/comments/170tghx/guide_i...

It's still slow and janky but this really isn't that far away.

I don't buy that AMD can't make this happen if they actually tried.

That'll cost AMD what 2m 3m? To move the needle on a multi billion market? That's the cheapest marketing you've ever seen.

>what's to say that all the non-consumers won't just scarf up these equally performant yet lower priced cards

2 more replies

j / k navigate · click thread line to collapse