I’m not familiar with that paper, but it would probably be best to compare speeds against an unoptimized transformer decoder. The Vaswani paper came out 8 years ago, so transformer implementations are pretty highly optimized at this point.
On the other hand, if there were a theoretical reason why text diffusion models could never be faster than autoregressive transformers, that would be notable.