If possible, the GPU, but not all GPUs have either a library or enough documentation to write one. I’ve seen complaints about this issue on mobile GPUs for years; no idea how widespread it is now.
BTW, this is just one example algorithm that I picked because it does (on the CPU) what the person I replied to said was rare.