> One of the really nice things about Julia for GPU programming is that you can write your own kernels. CUDA.jl isn't just C kernels.
Do y'all understand that this isn't special to Julia, or to any language? It's 100% due to the fact that nvcc's code generator has an open-source counterpart in LLVM's NVPTX backend.
Like literally there are now dozens of languages that can be used to author CUDA kernels, because they all just target the same LLVM IR and PTX that nvcc does. E.g. you can do it in Python like 13 different ways - numba, taichi, etc. all do it. You can even use the MLIR Python bindings to directly emit the nvgpu dialect and then lower it to target-specific LLVM IR.
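For concreteness, here's a sketch of the kind of kernel all of those frontends ultimately compile down to: a minimal CUDA C++ vector add (the names `vadd`, `a`, `b`, `c` are just illustrative). Whether it's written in CUDA C++, Numba, or Julia, the frontend lowers it through an LLVM-style pipeline to PTX for the GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal illustrative kernel: elementwise vector add.
// Each thread computes one output element.
__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    float *a, *b, *c;
    // Unified memory keeps the example short.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // 256 threads per block, enough blocks to cover n elements.
    vadd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

You can see the LLVM connection directly: compiling the kernel with `clang++ -x cuda` (instead of nvcc) goes through upstream LLVM's NVPTX backend, and `--cuda-device-only -S` will dump the generated PTX.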