I've had real-world applications where the whole program ran about 10 times faster after switching from std::unordered_map to khash. I've seen papers that show the two performing fairly comparably, but whenever I've compared them myself, khash has come out well ahead of std::unordered_map. I use std::unordered_map for typical, average use cases (where speed doesn't matter as much and I'm feeling lazy), but khash for inner loops and core functionality.
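I wonder if part of the reason is that the standard effectively forces std::unordered_map to resolve its collisions by separate chaining (the bucket interface and the guarantee that references to elements survive a rehash pretty much require a node-based layout), while khash is free to use open addressing with quadratic probing, keeping everything in flat arrays that are much friendlier to the cache.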
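To make the difference concrete, here is a rough sketch of what an open-addressing table with quadratic probing looks like. This is not khash's actual code: `FlatMap`, `Slot`, and `find_slot` are names I made up for illustration, the key and value types are fixed, and deletion (which would need tombstones) is left out. It assumes a power-of-two capacity so that the triangular probe sequence visits every slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical illustration of open addressing with quadratic probing.
// All entries live in one contiguous array; there are no per-element nodes.
class FlatMap {
public:
    explicit FlatMap(std::size_t capacity_pow2 = 16) : slots_(capacity_pow2) {
        // Triangular probing only covers the whole table for power-of-two sizes.
        assert(capacity_pow2 >= 2 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    }

    // Inserts the key, or overwrites the value if the key is already present.
    void insert(std::uint64_t key, int value) {
        if ((size_ + 1) * 4 > slots_.size() * 3) grow();  // keep load factor <= 0.75
        Slot& s = slots_[find_slot(slots_, key)];
        if (!s.used) { s.used = true; s.key = key; ++size_; }
        s.value = value;
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    // Unlike std::unordered_map, the pointer is invalidated by a later grow().
    const int* find(std::uint64_t key) const {
        const Slot& s = slots_[find_slot(slots_, key)];
        return s.used ? &s.value : nullptr;
    }

    std::size_t size() const { return size_; }

private:
    struct Slot { std::uint64_t key = 0; int value = 0; bool used = false; };

    // Walks the probe sequence until it hits the key or an empty slot.
    static std::size_t find_slot(const std::vector<Slot>& slots, std::uint64_t key) {
        const std::size_t mask = slots.size() - 1;
        std::size_t i = std::hash<std::uint64_t>{}(key) & mask;
        for (std::size_t step = 1; slots[i].used && slots[i].key != key; ++step)
            i = (i + step) & mask;  // cumulative offsets 1, 3, 6, 10, ...
        return i;
    }

    // Doubles the table and reinserts every live entry.
    void grow() {
        std::vector<Slot> bigger(slots_.size() * 2);
        for (const Slot& s : slots_)
            if (s.used) bigger[find_slot(bigger, s.key)] = s;
        slots_.swap(bigger);
    }

    std::vector<Slot> slots_;
    std::size_t size_ = 0;
};
```

A lookup here touches a handful of slots in one array, whereas a chained table typically follows at least one pointer to a separately allocated node per lookup. The price is that pointers into the table go stale on a resize, which is exactly the kind of trade-off std::unordered_map isn't allowed to make.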