My first thought would be to split the list of numbers into a prefix part and a suffix part and build two tries connected at the leaves [0][1][2], replacing the trie used in the article. Then we sort both tries using the Kirkpatrick-Reisch method (but in reverse order for the suffix trie so that the final result ends up sorted correctly), and finally we would have to reconnect the two while walking the tries.
[0] https://journals.plos.org/plosone/article?id=10.1371/journal...
[1] More or less; the MT in the linked paper works a bit differently and also has a different use case in mind.
[2] Also, I have no idea if it makes sense to have two depth-2 tries, or if there is another algorithm out there with two depth-1 tries that _kind of_ looks like this algorithm.
- split the numbers into a top and bottom half (from now on: prefix and suffix) (linear time)
- make an unordered suffix trie (linear time); the first level holds suffixes, the second level holds prefixes
- make a (recursively sorted) ordered prefix set, and a (recursively sorted) ordered suffix set
- initialize an ordered prefix trie, but only the first level for now - that is, don't insert suffixes yet (linear time over the ordered prefix set)
- in order of the ordered suffix set, walk over the suffix trie and for each prefix leaf insert the parent suffix into the appropriate prefix bucket in the prefix trie (linear time)
- now we can walk the prefix trie in order and combine prefix and suffix again (like in the article)
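The steps above can be sketched roughly like this for a single level (plain dicts stand in for the depth-1 tries, and a plain sort stands in for the recursive Kirkpatrick-Reisch call; all names are mine, not from any paper):

```python
# Hypothetical sketch of the prefix/suffix two-trie idea for one level of
# recursion. Dicts stand in for the tries; sorted() on the small key sets
# stands in for the recursive Kirkpatrick-Reisch call.

def two_trie_sort(nums, half_bits):
    mask = (1 << half_bits) - 1

    # Unordered suffix trie: suffix -> list of prefixes that carry it.
    suffix_trie = {}
    for x in nums:
        prefix, suffix = x >> half_bits, x & mask
        suffix_trie.setdefault(suffix, []).append(prefix)

    # "Recursively sorted" ordered prefix set and ordered suffix set.
    ordered_prefixes = sorted({x >> half_bits for x in nums})
    ordered_suffixes = sorted(suffix_trie)

    # Ordered prefix trie, first level only: an empty bucket per prefix.
    prefix_trie = {p: [] for p in ordered_prefixes}

    # Walk the suffix trie in suffix order; drop each suffix into the
    # bucket of every prefix leaf found under it.
    for s in ordered_suffixes:
        for p in suffix_trie[s]:
            prefix_trie[p].append(s)

    # Walk the prefix trie in order and recombine prefix and suffix.
    return [(p << half_bits) | s
            for p in ordered_prefixes
            for s in prefix_trie[p]]

print(two_trie_sort([0b1110, 0b0111, 0b1001, 0b0010], half_bits=2))
# -> [2, 7, 9, 14]
```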
This feels like it should have comparable computational complexity - as far as I can see the only real difference is that it recursively sorts twice as often (once for the prefix set and once for the suffix set). Either way it still seems to have horrible memory overhead, requiring a trie for each level of recursion and all that.
Then I realized that if we are at the base case where prefix/suffix can be sorted with a counting sort, then the above can actually be simplified to LSB radix sort where we sort the suffixes into a temporary secondary array, and the prefixes from the secondary array into the original array (I think we can safely say that using a plain array of n elements has both lower memory overhead and better computational performance than a trie with n leaves). But... couldn't I then optimize the entire recursion into an LSB radix sort? Which would imply it must have... worse time complexity than Kirkpatrick-Reisch sorting? Wait what? Where did I go wrong then?
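That base-case simplification looks something like this: two stable counting-sort passes, first by suffix into a scratch array, then by prefix back out of it, which is exactly a two-pass LSB radix sort:

```python
# The base case simplified to LSB radix sort: pass 1 counting-sorts by
# suffix into a scratch array, pass 2 counting-sorts by prefix. The
# stability of each counting pass is what keeps pass 1's ordering intact
# through pass 2.

def lsb_radix_sort(nums, half_bits):
    mask = (1 << half_bits) - 1

    def counting_pass(src, key):
        counts = [0] * (1 << half_bits)
        for x in src:
            counts[key(x)] += 1
        # Exclusive prefix sums turn counts into bucket start offsets.
        total = 0
        for d in range(len(counts)):
            counts[d], total = total, total + counts[d]
        out = [0] * len(src)
        for x in src:  # stable: equal keys keep their relative order
            out[counts[key(x)]] = x
            counts[key(x)] += 1
        return out

    tmp = counting_pass(nums, key=lambda x: x & mask)        # by suffix
    return counting_pass(tmp, key=lambda x: x >> half_bits)  # by prefix
```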
My inclination is that this would be slower than "standard" high-perf radix sorting, but I'm not sure whether the high-level overview of this algorithm reflects an equivalently tuned implementation.
Wouldn't this decrease again for large enough n, and even go negative after n=2^(w * 2)?
O(n + max(0, log(w / log n)))

Benchmarks?
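A quick numeric check of that log term for w = 64 (my reading of the bound; the max(0, ...) clamp is what keeps it from going negative):

```python
import math

# Evaluate log2(w / log2(n)) for w = 64 at a few sizes of n. The term hits
# zero at n = 2^w and would go negative beyond that without the clamp.
w = 64
for n in (2**16, 2**32, 2**64, 2**128):
    term = math.log2(w / math.log2(n))
    print(n.bit_length() - 1, round(term, 2))
# log2(n) = 16 -> 2.0, 32 -> 1.0, 64 -> 0.0, 128 -> -1.0
```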
That said, word size w in almost all integer sorting problems is bounded by 128 (by 64, or even 32, with high probability), which makes it acceptable to regard as "constant". In that case both sorts are essentially O(n) and it all comes down to the specific implementations (with radix sort likely significantly faster in practice).
1. It scans in linear order, so if you tune your radix size to L1/L2 cache it will happily beat other "faster" algorithms thanks to the prefetcher.
2. It preserves ordering for keys with the same value.
#2 makes it a really good depth-sorting algorithm for alpha rendering, and #1 just makes it darn fast. There's a nice floating-point implementation of it out there as well.
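The usual floating-point trick (this is the general idea, not any particular library's code) is to map IEEE 754 floats to unsigned ints whose unsigned order matches the floats' order, so a plain unsigned radix sort handles them:

```python
import struct

# Map an IEEE 754 binary32 float to a 32-bit unsigned key whose unsigned
# order matches the float's numeric order: negative floats get all bits
# flipped (so more-negative sorts lower), non-negative floats get just the
# sign bit set (so they all sort above the negatives).

def float_key(f):
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    if bits & 0x80000000:        # negative: flip everything
        return bits ^ 0xFFFFFFFF
    return bits | 0x80000000     # non-negative: set the sign bit

vals = [3.5, -1.25, 0.0, -7.0, 2.0]
assert sorted(vals, key=float_key) == sorted(vals)
```

A radix sort over `float_key` values then sorts the original floats correctly without any comparisons.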
When it applies, there are essentially no faster algorithms - it's O(n) if the word size is constant (it often is), which cannot be beaten asymptotically. KR sort is only asymptotically better if the word size is considered variable.
It’s irrelevant if you have no radix to sort on - a comparison sort provably takes Ω(n log n) time, which is slower.
That's called a stable sort, and it's a standard property present in many sorting algorithms.
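Stability is easy to see with the depth-sorting use case mentioned above (the draw-call names here are just illustrative):

```python
# A stable sort keeps the submission order of draw calls at equal depth,
# which is what alpha rendering needs. Python's sorted() is stable.
draw_calls = [("tree", 5), ("glass_a", 2), ("smoke", 5), ("glass_b", 2)]
by_depth = sorted(draw_calls, key=lambda call: call[1])
# Equal-depth items stay in submission order:
assert by_depth == [("glass_a", 2), ("glass_b", 2), ("tree", 5), ("smoke", 5)]
```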