The point I was glibly trying to make was that even a small effort by AMD to take the software side as seriously as NVIDIA does would have paid off handsomely and kept them from falling so far behind.
Also, there is a lot of work going on in the GCC and LLVM toolchains, not only to use OpenMP to offload computationally intensive loops to accelerators but, in the case of LLVM, also to target tensor instructions for more efficient code generation (https://lists.llvm.org/pipermail/llvm-dev/2021-November/1537...).
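For anyone who hasn't seen what OpenMP offloading of a hot loop looks like, here is a rough sketch. The `saxpy` function and its parameters are my own illustration, not anything taken from the linked thread; only the `target teams distribute parallel for` directive and the `map` clauses are standard OpenMP.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i]. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* Ask the compiler to run the loop on an accelerator if one is available.
     * map(to: ...) copies x to the device; map(tofrom: ...) copies y there
     * and back. Built without OpenMP support, the pragma is ignored and the
     * loop simply runs serially on the host. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both gcc and clang accept `-fopenmp`; whether the loop actually lands on a GPU depends on the toolchain having been built with offload support for your device, otherwise it falls back to the host. The appeal is that the source stays plain C with no vendor-specific kernel language.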
It took the AI folk less than 18 months to move almost completely away from writing CUDA directly to TensorFlow and then PyTorch... LLVM, imho, is going to do the same for Sci/Eng and general code bases in the next 2 years.