I'm working on a project ATM at one of the DoE sites you're likely referring to... Maybe we'll bump into each other!
Ah yes, pytorch:
1) Check the issues, PRs, etc. on the torch GitHub. Relative to its market share, ROCm accounts for a disproportionate multiple of the open and closed issues. There is still much work to be done on things as basic as overall stability.
2) torch is the bare minimum. Consider flash attention. On CUDA it just works, of course, with sliding window attention, ALiBi, and PagedAttention. The ROCm fork? Nope. Then check out the xFormers situation on ROCm. Prepare to spend your time messing around with ROCm, spelunking GitHub issues/PRs/blogs, etc., and working one by one through frameworks and libraries instead of running `pip install` and actually doing your work.
3) Repeat for hundreds of libraries, frameworks, etc depending on your specific use case(s).
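That "go one by one through your stack" step can at least be mechanized. Here's a minimal sketch of the kind of audit I mean; the package names in the example list are illustrative of a typical training stack, not a claim about any particular environment:

```python
import importlib.util

def check_stack(packages):
    """Return the subset of packages that can't even be found in this env.

    find_spec() returning None means the package isn't installed at all --
    and on ROCm that's often just the first hurdle before build/runtime
    breakage.
    """
    return [p for p in packages if importlib.util.find_spec(p) is None]

# Illustrative stack audit; adjust the list to your actual dependencies.
missing = check_stack(["torch", "flash_attn", "xformers", "vllm"])
print("not installed:", missing)
```

On a CUDA box this list usually comes back empty after one `pip install`; on ROCm, each missing entry tends to kick off its own round of GitHub spelunking.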
Then, once you have a model and need to serve it for inference so your users can actually make use of it and you can get paid? With CUDA you can choose between torchserve, HF TEI/TGI, Nvidia Triton Inference Server, vLLM, and a number of others. On ROCm, vLLM has what I would call (at best) "early" support: it requires patches to ROCm, isn't feature complete, and regularly sees commits fixing yet another show-stopping bug/crash/performance regression/whatever.
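For context on what you're choosing between: several of those servers (vLLM, TGI) expose an OpenAI-compatible HTTP API, so the client side is trivial once the server actually runs. A sketch of building such a request body; the endpoint path and model name are illustrative assumptions, not tied to any specific deployment:

```python
import json

def completion_request(model, prompt, max_tokens=64):
    """Build a request body for an OpenAI-compatible /v1/completions
    endpoint, as exposed by servers like vLLM or HF TGI.

    The model name is whatever your server was launched with; this is
    a hypothetical example, not a real deployment.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

body = completion_request("my-fine-tuned-model", "Hello")
```

The client code is identical whether the server runs on CUDA or ROCm; the difference is whether the server stays up.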
Torch support is a good start, but it's just that - a start.