[1]: https://jobs.amd.com/job/Calgary-GPU-Libraries-Software-Deve... [2]: https://github.com/ROCmSoftwarePlatform/rocFFT
Fluid flow, heat transfer, and other such physical phenomena that you might want to simulate.
Phase correlation in image processing is another example. (https://en.wikipedia.org/wiki/Phase_correlation)
Molecular dynamics (MD) simulations rely on the FFT, but I'm not sure how much of it is typically (or can be) done on the GPU. For example, NAMD employs cuFFT on the GPU in some cases. (https://aip.scitation.org/doi/10.1063/5.0014475)
In general, Vulkan is an API for commanding the GPU; it is not opinionated about the language used to write kernels, as long as that language compiles to SPIR-V. SPIR-V itself is something like a parallel LLVM IR. If you look into the project source, the shaders are written in GLSL and have been pre-compiled into SPIR-V with a cross-compiler. The C file in the project root serves as the loader program for the SPIR-V files.
The Futhark project did some initial benchmarks on translating OpenCL to Vulkan; the results were mainly slowdowns. You can read about it here: https://futhark-lang.org/student-projects/steffen-msc-projec...
https://home.otoy.com/octane2020-rndr-released/
"OTOY | GTC 2020: Real-Time Raytracing, Holographic Displays, Light Field Media and RNDR Network"
There are no error bars on the graphs, so it's very hard to judge whether the minor differences are significant. I work in research, so perhaps I'm particular about this point, but I'd expect better from anyone who has taken basic statistics. From a quick look, though, the performance seems to be pretty much on par.
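For what it's worth, error bars of this kind are cheap to produce: time each FFT size several times and report the mean plus or minus the standard error. A minimal sketch in Python, with made-up run times standing in for real benchmark numbers:

```python
# Sketch: deriving an error bar from repeated benchmark runs.
# The timings below are illustrative, not actual VkFFT/cuFFT results.
from math import sqrt
from statistics import mean, stdev

runs_ms = [4.21, 4.35, 4.18, 4.40, 4.25]  # hypothetical timings of one FFT size

m = mean(runs_ms)
sem = stdev(runs_ms) / sqrt(len(runs_ms))  # standard error of the mean

# The "±" value is exactly what belongs on each bar of the chart.
print(f"{m:.2f} ms ± {sem:.2f} ms")
```

Running each size even five times is usually enough to see whether a few-percent gap is noise or a real difference.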
It would also be nice to know how performance looks on other hardware. I'm assuming it's tuned for Nvidia GPUs (or maybe even for the specific GPU mentioned). But how does this perform on Intel or AMD hardware? How does it compare to `rocFFT` or Intel's own implementation?
I have tested VkFFT on an Intel UHD620 GPU and the performance scaled the same way as in most of the benchmarks. There are a couple of parameters that can be tuned for different GPUs (like the amount of memory coalesced, which is 32 bits on Nvidia GPUs after Pascal and 64 bits on Intel). I have no access to an AMD machine; otherwise I would have refined the launch configuration parameters for it too. I have not tested libraries other than cuFFT yet.
Also, I should have said this in my first post, which in hindsight might come across as too negative: I think this is a cool project and you did a great job! I just thought this might improve the presentation of your results a bit.
Personally, I would have a hard time hiring anyone without a GitHub account, and even less so working in a place where nobody has one.
Seems a bit more feature-complete than my take on the problem: https://github.com/Lichtso/VulkanFFT
Still, a lot is missing before Vulkan can beat CUDA: Scan, Reduce, Sort, Aggregate, Partition, Select, Binning, etc.
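For readers unfamiliar with these primitives: CUDA ships them as ready-made library calls (Scan, for instance, is a prefix sum, available as `thrust::inclusive_scan`), whereas Vulkan has no standard equivalent and you'd write the shader yourself. A toy Python sketch of what the Scan primitive computes:

```python
# Sketch: the result a GPU "Scan" (inclusive prefix sum) primitive produces.
# On the GPU this is computed in parallel; here we just show the output.
from itertools import accumulate

data = [3, 1, 4, 1, 5]
prefix_sums = list(accumulate(data))
print(prefix_sums)  # → [3, 4, 8, 9, 14]
```

Each output element is the sum of all inputs up to and including that position; Reduce, Sort, and the rest are similarly fundamental building blocks for GPU algorithms.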
It is a library.
What about bigger than big, say >2^29 or so? Are these sizes for double precision?
A free GPU FFT implementation will certainly help! Great work.
If your entire stack lived in the GPU, and you're just reading out the result, this is trivial.
If you're constantly copying buffers back and forth because some effects are implemented in the CPU and some in the GPU, not so much!
It's probably the case that a full stack GPU implementation would blow what we have out of the water, but you'd lose your entire ecosystem in the process, so it's probably never going to happen.
But even if that is not the case, machine learning is making its way into music production tools. No doubt a beefy GPU will be useful to a lot of music production professionals in the future, as the tools they use begin to leverage ML more and more.
The time budget to refresh a video frame is about 8 ms at 120 Hz, if everything else came free; in practice it's closer to 4 ms or less. So even under close to worst-case conditions, that's about the delay of sound traveling a meter or so, which should be fine for a lot of real-life applications.
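The arithmetic above is easy to check. A small sketch, assuming a speed of sound of roughly 343 m/s in air:

```python
# Sketch: sanity-checking the frame-budget and sound-travel numbers above.
frame_budget_ms = 1000 / 120           # one frame period at 120 Hz
speed_of_sound_mps = 343.0             # m/s in air at ~20 °C (assumed)
distance_m = 0.004 * speed_of_sound_mps  # how far sound travels in 4 ms

print(f"{frame_budget_ms:.2f} ms per frame")  # → 8.33 ms per frame
print(f"{distance_m:.2f} m in 4 ms")          # → 1.37 m in 4 ms
```

So a 4 ms processing delay is acoustically equivalent to standing about a meter and a half farther from the speaker.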