The same algorithm can also be put on a GPU. It works by approximating the non-linear activation of a layer: store which neurons get activated for a given input, then compute only those neurons for similar inputs. This costs a lot of memory.
This involves traversing a hash table, which can be done efficiently on a GPU.
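A rough sketch of the idea in NumPy (class and function names are mine, purely illustrative): hash the input with random hyperplanes (SimHash-style LSH), remember which neurons fired for each bucket, and on a hit compute only those candidate neurons. The per-bucket neuron sets are where the memory cost comes from.

```python
import numpy as np

class SimHashActivationCache:
    """Illustrative LSH cache: maps an input's hash bucket to the set of
    neuron indices previously seen active for inputs in that bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))  # random hyperplanes
        self.buckets = {}  # hash key -> set of active neuron indices

    def _hash(self, x):
        # Sign pattern of projections onto the hyperplanes is the bucket key.
        return ((self.planes @ x) > 0).tobytes()

    def record(self, x, active_neurons):
        self.buckets.setdefault(self._hash(x), set()).update(active_neurons)

    def candidates(self, x):
        # Neurons likely to activate for inputs similar to x (None on miss).
        return self.buckets.get(self._hash(x))

def sparse_relu_layer(W, b, x, cache):
    idx = cache.candidates(x)
    if idx is None:
        # Cache miss: do a dense pass, then record which neurons fired.
        out = np.maximum(W @ x + b, 0.0)
        cache.record(x, np.flatnonzero(out))
        return out
    # Cache hit: compute only the candidate neurons, leave the rest at zero.
    idx = np.fromiter(idx, dtype=int)
    out = np.zeros(W.shape[0])
    out[idx] = np.maximum(W[idx] @ x + b[idx], 0.0)
    return out
```

On a GPU the dictionary lookup would become a parallel hash-table probe, but the control flow is the same: one probe, then a gather over the candidate rows of W.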
This entire publication is a red herring for the reason you gave.
If you (or anybody else) happen to be a graphics programmer with experience implementing the bounding-box tree traversal common in CUDA ray-tracing implementations, please get in touch with me: there is a good chance a rusty old 1080 can beat the most expensive Xeon on the market, since it has more memory bandwidth than Intel's part. If you ever wanted to piss on Intel's leg with a multi-weekend project, please get in touch; email in profile. There is a slim chance that this will actually help deep learning.