Long ago, I wrote adversaries that attempt to force _any_ algorithm that extracts a partial or total order from data toward its worst case.
These adversaries, like McIlroy's, sit in the compare() function, asking themselves: "what answer can I give, consistent with the data so far, that forces the algorithm to ask me the largest number of subsequent questions?"
For non-introspective quicksorts this forces O(n^2) behaviour, but it should also bring out the worst constant factors in algorithms with O(n log n) worst-case time.
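A minimal sketch of the idea in Python, using the "gas" trick from McIlroy's "A Killer Adversary for Quicksort" (the class, its names, and the deliberately naive quicksort below are my own, not McIlroy's code):

```python
class Adversary:
    """Comparison adversary in the spirit of McIlroy's antiqsort:
    keep every value 'gas' (unfrozen) as long as possible, and freeze
    the algorithm's apparent pivot to the smallest unused value."""

    def __init__(self, n):
        self.gas = n            # sentinel larger than any frozen value
        self.values = [self.gas] * n
        self.next_value = 0     # smallest value not yet handed out
        self.candidate = -1     # most recently inspected unfrozen index
        self.ncmp = 0

    def compare(self, x, y):
        self.ncmp += 1
        if self.values[x] == self.gas and self.values[y] == self.gas:
            # Both unfrozen: pin one down, preferring the pivot candidate,
            # so the pivot ends up as small as possible.
            victim = x if x == self.candidate else y
            self.values[victim] = self.next_value
            self.next_value += 1
        if self.values[x] == self.gas:
            self.candidate = x
        elif self.values[y] == self.gas:
            self.candidate = y
        return self.values[x] - self.values[y]

def quicksort(idx, cmp):
    # Deliberately naive victim: first element as pivot, no introspection.
    if len(idx) <= 1:
        return idx
    pivot, less, greater = idx[0], [], []
    for i in idx[1:]:
        (less if cmp(i, pivot) < 0 else greater).append(i)
    return quicksort(less, cmp) + [pivot] + quicksort(greater, cmp)
```

Running quicksort(list(range(64)), adv.compare) against a fresh Adversary(64) drives it to n(n-1)/2 comparisons, because the pivot gets frozen to the minimum at every partition and one side of every split is empty.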
I did some experiments: http://nicknash.me/2012/07/31/adversaries/
A proper reference is Kahn's "Sorting and Entropy", IIRC.
I think it's a good rule of thumb to say "if you need to sort large quantities of untrusted data you should probably use mergesort".
Glibc is the only libc I'm aware of that implements mergesort. It still falls back to quicksort for large inputs though.
Also, nice little write-up, thanks. I also dig the sort implementations you dug out: https://github.com/matslina/qsort
Application-level DDoS attacks leave the kernel/network/socket layers exposed for exploitation, since their tasks will take priority over users' applications.
So what's the benefit of that, just that you can maximize the chaos you cause by attacking in two different ways?
They all use introsort -- basically, use quicksort until you have done some number of partitions, then switch to heapsorting the cells of the partition. This keeps quicksort's fast average-case performance while guaranteeing O(n log n) in the worst case.
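A toy version of that depth-limit logic in Python (my sketch; real introsorts partition in place and also switch to insertion sort for small ranges, which this omits):

```python
import heapq
import math

def heapsort(a):
    # O(n log n) worst case -- the guaranteed fallback.
    h = list(a)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

def introsort(a, depth=None):
    # Depth budget ~ 2*log2(n); exceeding it means the quicksort
    # recursion tree is degenerating, so bail out to heapsort.
    if depth is None:
        depth = 2 * max(1, math.floor(math.log2(max(len(a), 2))))
    if len(a) <= 1:
        return list(a)
    if depth == 0:
        return heapsort(a)
    pivot = a[len(a) // 2]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return introsort(less, depth - 1) + equal + introsort(greater, depth - 1)
```

Even if an adversary steers every partition badly, at most ~2 log2(n) levels of quicksort run before heapsort takes over, so the total stays O(n log n).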
Now this is very interesting, but it could also be prevented with a random number generator influencing the selection of the pivot element. If this had been exploited, people would have looked into mitigating it.
Edit: you can also detect recursion-tree degeneration (just check the depth) and react to it (e.g. by changing the pivot selection).
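A sketch of the randomized-pivot mitigation in Python (my example; note the assumption that the attacker cannot predict the RNG state -- a seeded or predictable PRNG would reopen the attack):

```python
import random

def randomized_quicksort(a):
    # The pivot choice now depends on the RNG rather than the input,
    # so an attacker can't precompute a worst-case array in advance.
    if len(a) <= 1:
        return list(a)
    pivot = random.choice(a)
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

This makes the quadratic case a matter of bad luck rather than bad input, though unlike introsort it offers no hard worst-case guarantee.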
Additionally, quicksort is just as easy to implement in parallel as mergesort and runs about as fast since it can be performed in-place.
That reminds me :)
$ expr \( -2147483648 \) \* -1
-2147483648
$ expr \( -2147483648 \) / -1
Floating point exception: 8 (core dumped)
$ uname -srm
FreeBSD 8.4-RELEASE-p9 i386

$ expr \( -2147483648 \) \* -1
2147483648
$ expr \( -2147483648 \) / -1
2147483648
$ uname -srm
FreeBSD 10.0-RELEASE amd64
Or maybe it doesn't work on i386?

$ echo | awk '{ print 2 ** 31 }'
2147483648
Make sure you are using /bin/expr, and maybe give -9223372036854775808 a whirl too.

Edit: I'm an idiot, totally glossed over the switch-to-heapsort logic.
You can see the fallback to heapsort here: http://referencesource.microsoft.com/#mscorlib/system/collec...
In other words, they use the worst-case O(n log n) heapsort, with an optimization that uses the much faster quicksort for non-pathological inputs -- which is almost always the case.
This is the main motivation behind SipHash, a new hash function designed to make collisions cryptographically hard for an attacker to find (it's a keyed PRF) while staying fast enough to use in hash tables: https://131002.net/siphash/
Though you and the article are both perpetuating the lie that hash table operations are O(1). What magical hash function distributes n inputs into O(n) buckets in O(1) time? (Hint: distributing into O(n) buckets requires looking at O(log n) bits.)
Hash table operations are O(1) on average, assuming a uniform hash function, and assuming that the number of buckets is kept greater than the number of elements. More info here: http://en.wikipedia.org/wiki/Hash_table#Performance_analysis
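Concretely, the "more buckets than elements" invariant is maintained by resizing as the table grows; a toy chaining table showing that bookkeeping (my own sketch, not any particular implementation):

```python
class ChainedTable:
    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.count = 0

    def put(self, key, value):
        if self.count >= len(self.buckets):  # keep load factor below 1
            self._grow()
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1

    def get(self, key):
        # Expected O(1): with a uniform hash and load factor < 1,
        # the average chain holds less than one entry.
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)

    def _grow(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for k, v in chain:
                self.buckets[hash(k) % len(self.buckets)].append((k, v))
```

The doubling makes resizes amortized O(1) per insert; the attack under discussion works precisely by breaking the "uniform hash" assumption so that every entry lands in one chain.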
Computing the hash function itself is not usually factored into this analysis, probably because it's an orthogonal concern (ie. hash functions and hash table implementations can be mixed and matched), and because computing the hash function for every operation is not strictly required (the hash of the key can be cached).
But even if you are factoring in computation of the hash function, this is a strange argument:
> Hint: distributing into O(n) buckets requires looking at O(log n) bits.
Computers can look at O(log n) bits in a single operation for any "n" that can reasonably be held in memory.
For example, if your hash table has fewer than 4 billion entries, "log n" is less than 32, and any 32-bit computer can "look at" 32 bits in a single operation. So I can't see why you think "log n" needs to be a factor in the complexity analysis.
I suspect fluctuations in latency would make this exceedingly difficult, but I am constantly amazed by the statistical attacks people pull off.
This sort of thing might even work where the target comes to you for data (such as a search engine crawling you).
https://www.usenix.org/legacy/publications/library/proceedin...
[1] http://events.ccc.de/congress/2011/Fahrplan/attachments/2007...
http://code.google.com/p/awib/ "Awib is a brainfuck compiler written in brainfuck. "