On the contrary, the idea that microkernel performance is "horrible" is the current received wisdom, believed by the majority of programmers without critical examination, and based on early, slow microkernel designs like Mach.
The truth is that modern microkernel designs like L4 can perform IPC over 20 times faster than Mach. Another important advance for microkernels is tagged TLBs, which avoid flushing the TLB on every context switch and so alleviate the TLB misses that a switch usually incurs.
Someday, hopefully not too far in the future, someone is going to write a modern, practical, and free microkernel-based OS that implements all of POSIX with performance that rivals Linux, but that offers a level of isolation and modularity that Linux could never offer. When that happens, a lot of people are going to scratch their heads and wonder why they believed the people who told them that microkernels are "fundamentally" slow.
I had hoped that HelenOS would become this (http://www.helenos.org/), but its lowest-layer IPC abstraction is async-based, which I think is a mistake. One of L4's innovations was to use sync IPC, which offers the highest possible performance and gets the kernel out of the business of queueing. You can always build async IPC on top of sync without loss of performance; this puts the queues in user space, which is where they belong (they're easier to account for this way).
I asked the HelenOS people why they decided to go this way, and this is their response (one of their points was "the L4's focus on performance is not that important these days", which I disagree with). http://www.mail-archive.com/helenos-devel@lists.modry.cz/msg...
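To make the "async on top of sync" point concrete, here is a rough sketch in Python. It is only a model: threads stand in for separate address spaces, and SyncEndpoint, UserSpaceMailbox, post and fetch are names I made up for illustration, not L4 or HelenOS APIs. The only primitive assumed is a blocking rendezvous send/receive; the queue lives entirely in an ordinary "user-space" server thread.

```python
import threading
from collections import deque


class SyncEndpoint:
    """Hypothetical stand-in for a sync IPC endpoint: send() blocks
    until a receiver has actually taken the message (no buffering)."""

    def __init__(self):
        self._send_lock = threading.Lock()        # one sender at a time
        self._ready = threading.Semaphore(0)      # a message is in the slot
        self._taken = threading.Semaphore(0)      # the receiver copied it
        self._slot = None

    def send(self, msg):
        with self._send_lock:
            self._slot = msg
            self._ready.release()
            self._taken.acquire()                 # rendezvous with receiver

    def receive(self):
        self._ready.acquire()
        msg = self._slot
        self._taken.release()
        return msg


class UserSpaceMailbox:
    """Async mailbox built only from sync IPC. The queue is plain data
    owned by a server thread, not by the 'kernel' primitive."""

    def __init__(self):
        self._requests = SyncEndpoint()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        queue = deque()     # messages nobody has asked for yet
        waiting = deque()   # reply endpoints of receivers with nothing to read
        while True:
            op, arg = self._requests.receive()
            if op == "put":
                if waiting:
                    waiting.popleft().send(arg)   # hand straight to a waiter
                else:
                    queue.append(arg)
            elif op == "get":
                if queue:
                    arg.send(queue.popleft())
                else:
                    waiting.append(arg)

    def post(self, msg):
        """Async send: returns once the server has accepted the message,
        whether or not any receiver exists yet."""
        self._requests.send(("put", msg))

    def fetch(self):
        """Receive: a sync call to the server followed by a sync reply."""
        reply = SyncEndpoint()
        self._requests.send(("get", reply))
        return reply.receive()


if __name__ == "__main__":
    mailbox = UserSpaceMailbox()
    for word in ("a", "b", "c"):
        mailbox.post(word)          # none of these wait for a consumer
    print([mailbox.fetch() for _ in range(3)])
```

The sender only ever blocks for the short hand-off to the queue server, and the memory for queued messages belongs to that server, so it can be charged to whoever owns the mailbox. A real system would also need timeouts or trust rules so that the server cannot be stalled by a client that never picks up its reply; the sketch ignores that.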