There are format preserving encryption algorithms which act as a pseudo random permutation of integers with a variable upper bound. These slower than Fisher-Yates for in memory data, but for on disk data the random access overhead should exceed their cost. The advantage of this approach is minimal memory use and no need to modify the data.
The downside is that it still needs one random read access per element, so cache friendly hierarchical algorithms, like the one described in the post, are probably still faster for on disk data.