In terms of out-optimizing a modern compiler, it really depends on what you're optimizing for. For performance, a modern compiler is going to destroy you 99% of the time: I've been writing x86 assembly for many years, and in the vast majority of cases I can't write faster code than MSVC. However, if code size is what's important, it's easy to beat any compiler. You can often shrink your code by an order of magnitude, sometimes even two. Sizecoding has become my passion in recent years, and it's amazing how much you can do that the compiler can't if you really care. For instance, here is a boot sector I wrote that reads a kernel from NTFS:
http://demoseen.com/ColorblindBoot.s.txt

As far as I know, this is the only NTFS reader that actually fits into a boot sector (510 bytes, every one of which it uses), which is kinda neat.
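For context on the 510-byte figure: a BIOS boot sector is 512 bytes, and its last two bytes must be the boot signature 0x55 0xAA, which leaves 510 bytes for everything else. Here is a minimal Python sketch of that layout. The helper names (`build_boot_sector`, `is_bootable`) are hypothetical and not taken from the linked source, and the two-byte `jmp $` stub is only a placeholder payload.

```python
SECTOR_SIZE = 512
SIGNATURE = b"\x55\xAA"                        # required in the last two bytes
PAYLOAD_LIMIT = SECTOR_SIZE - len(SIGNATURE)   # 510 bytes left for code/data

def build_boot_sector(code: bytes) -> bytes:
    """Pad `code` with zeros to 510 bytes and append the 0x55AA signature."""
    if len(code) > PAYLOAD_LIMIT:
        raise ValueError(f"code is {len(code)} bytes; only {PAYLOAD_LIMIT} fit")
    return code.ljust(PAYLOAD_LIMIT, b"\x00") + SIGNATURE

def is_bootable(sector: bytes) -> bool:
    """Check the size and signature a BIOS looks for."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == SIGNATURE

if __name__ == "__main__":
    stub = b"\xEB\xFE"  # x86 'jmp $' (infinite loop), placeholder code only
    image = build_boot_sector(stub)
    print(len(image), image[-2:].hex())  # prints: 512 55aa
```

Every byte of real code eats into that fixed 510-byte budget, which is why a full NTFS reader fitting in one sector takes deliberate sizecoding rather than compiler output.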