My first thought would be to split the list of numbers into a prefix part and a suffix part and build two tries connected at the leaves [0][1][2], replacing the trie used in the article. Then we sort both tries using the Kirkpatrick-Reisch method (but in reverse order for the suffix trie so that the final result ends up sorted correctly), and finally we would have to reconnect the two while walking the tries.
[0] https://journals.plos.org/plosone/article?id=10.1371/journal...
[1] More or less; the MT in the linked paper works a bit differently and also has a different use case in mind.
[2] Also, I have no idea if it makes sense to have two depth-2 tries, or if there is another algorithm out there with two depth-1 tries that _kind of_ looks like this algorithm.
- split the numbers into a top and bottom half (from now on: prefix and suffix) (linear time)
- make an unordered suffix trie (linear time); the first level holds suffixes, the second level holds prefixes
- make a (recursively sorted) ordered prefix set, and a (recursively sorted) ordered suffix set
- initialize an ordered prefix trie, but only the first level for now - that is, don't insert suffixes yet (linear time over the ordered prefix set)
- in order of the ordered suffix set, walk over the suffix trie and for each prefix leaf insert the parent suffix into the appropriate prefix bucket in the prefix trie (linear time)
- now we can walk the prefix trie in order and combine prefix and suffix again (like in the article)
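The steps above can be sketched roughly like this for a single level (plain dicts stand in for the depth-1 tries, and a plain sort stands in for the recursive Kirkpatrick-Reisch call; all names are mine, not from any paper):

```python
# Hypothetical sketch of the prefix/suffix two-trie idea for one level of
# recursion. Dicts stand in for the tries; sorted() on the small key sets
# stands in for the recursive Kirkpatrick-Reisch call.

def two_trie_sort(nums, half_bits):
    mask = (1 << half_bits) - 1

    # Unordered suffix trie: suffix -> list of prefixes that carry it.
    suffix_trie = {}
    for x in nums:
        prefix, suffix = x >> half_bits, x & mask
        suffix_trie.setdefault(suffix, []).append(prefix)

    # "Recursively sorted" ordered prefix set and ordered suffix set.
    ordered_prefixes = sorted({x >> half_bits for x in nums})
    ordered_suffixes = sorted(suffix_trie)

    # Ordered prefix trie, first level only: an empty bucket per prefix.
    prefix_trie = {p: [] for p in ordered_prefixes}

    # Walk the suffix trie in suffix order; drop each suffix into the
    # bucket of every prefix leaf found under it.
    for s in ordered_suffixes:
        for p in suffix_trie[s]:
            prefix_trie[p].append(s)

    # Walk the prefix trie in order and recombine prefix and suffix.
    return [(p << half_bits) | s
            for p in ordered_prefixes
            for s in prefix_trie[p]]

print(two_trie_sort([0b1110, 0b0111, 0b1001, 0b0010], half_bits=2))
# -> [2, 7, 9, 14]
```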
This feels like it should have comparable computational complexity - as far as I can see the only real difference is that it recursively sorts twice as often (once for the prefix set and once for the suffix set). Either way it still seems to have horrible memory overhead, requiring a trie for each level of recursion and all that.
Then I realized that if we are at the base case where prefix/suffix can be sorted with a counting sort, then the above can actually be simplified to LSB radix sort where we sort the suffixes into a temporary secondary array, and the prefixes from the secondary array into the original array (I think we can safely say that using a plain array of n elements has both lower memory overhead and better computational performance than a trie with n leaves). But... couldn't I then optimize the entire recursion into an LSB radix sort? Which would imply it must have... worse time complexity than Kirkpatrick-Reisch sorting? Wait what? Where did I go wrong then?
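That base-case simplification looks something like this: two stable counting-sort passes, first by suffix into a scratch array, then by prefix back out of it, which is exactly a two-pass LSB radix sort:

```python
# The base case simplified to LSB radix sort: pass 1 counting-sorts by
# suffix into a scratch array, pass 2 counting-sorts by prefix. The
# stability of each counting pass is what keeps pass 1's ordering intact
# through pass 2.

def lsb_radix_sort(nums, half_bits):
    mask = (1 << half_bits) - 1

    def counting_pass(src, key):
        counts = [0] * (1 << half_bits)
        for x in src:
            counts[key(x)] += 1
        # Exclusive prefix sums turn counts into bucket start offsets.
        total = 0
        for d in range(len(counts)):
            counts[d], total = total, total + counts[d]
        out = [0] * len(src)
        for x in src:  # stable: equal keys keep their relative order
            out[counts[key(x)]] = x
            counts[key(x)] += 1
        return out

    tmp = counting_pass(nums, key=lambda x: x & mask)        # by suffix
    return counting_pass(tmp, key=lambda x: x >> half_bits)  # by prefix
```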
My inclination is that this would be slower than "standard" high-perf radix sorting, but I'm not sure whether the high-level overview of this algorithm reflects an equivalently tuned implementation.
Wouldn't this decrease again for large enough n, and even go negative after n=2^(w * 2)?
O(n + max(0, log(w / log n)))

Benchmarks?
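A quick numeric check of that log term for w = 64 (my reading of the bound; the max(0, ...) clamp is what keeps it from going negative):

```python
import math

# Evaluate log2(w / log2(n)) for w = 64 at a few sizes of n. The term hits
# zero at n = 2^w and would go negative beyond that without the clamp.
w = 64
for n in (2**16, 2**32, 2**64, 2**128):
    term = math.log2(w / math.log2(n))
    print(n.bit_length() - 1, round(term, 2))
# log2(n) = 16 -> 2.0, 32 -> 1.0, 64 -> 0.0, 128 -> -1.0
```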
That said, word size w in almost all integer sorting problems is bounded by 128 (by 64, or even 32, with high probability), which makes it acceptable to regard as "constant". In that case both sorts are essentially O(n) and it all comes down to the specific implementations (with radix sort likely significantly faster in practice).
1. It scans in linear order, so if you tune your radix size to L1/L2 cache it will happily beat other "faster" algorithms thanks to the prefetcher.
2. It preserves ordering for keys with the same value.
#2 makes it a really good depth-sorting algorithm for alpha rendering, and #1 just makes it darn fast. There's a nice floating-point implementation of it out there as well.
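The usual floating-point trick (this is the general idea, not any particular library's code) is to map IEEE 754 floats to unsigned ints whose unsigned order matches the floats' order, so a plain unsigned radix sort handles them:

```python
import struct

# Map an IEEE 754 binary32 float to a 32-bit unsigned key whose unsigned
# order matches the float's numeric order: negative floats get all bits
# flipped (so more-negative sorts lower), non-negative floats get just the
# sign bit set (so they all sort above the negatives).

def float_key(f):
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    if bits & 0x80000000:        # negative: flip everything
        return bits ^ 0xFFFFFFFF
    return bits | 0x80000000     # non-negative: set the sign bit

vals = [3.5, -1.25, 0.0, -7.0, 2.0]
assert sorted(vals, key=float_key) == sorted(vals)
```

A radix sort over `float_key` values then sorts the original floats correctly without any comparisons.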
When it applies, there are essentially no faster algorithms - it's O(n) if the word size is constant (it often is), which cannot be beaten asymptotically. KR sort is only asymptotically better if the word size is considered variable.
It’s irrelevant if you have no radix to sort on - a comparison sort provably takes Ω(n log n) time, which is slower.
That's called a stable sort, and it's a standard property present in many sorting algorithms.
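Stability is easy to see with the depth-sorting use case mentioned above (the draw-call names here are just illustrative):

```python
# A stable sort keeps the submission order of draw calls at equal depth,
# which is what alpha rendering needs. Python's sorted() is stable.
draw_calls = [("tree", 5), ("glass_a", 2), ("smoke", 5), ("glass_b", 2)]
by_depth = sorted(draw_calls, key=lambda call: call[1])
# Equal-depth items stay in submission order:
assert by_depth == [("glass_a", 2), ("glass_b", 2), ("tree", 5), ("smoke", 5)]
```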