1. Cache performance and locality may differ considerably for large applications. For example, a microbenchmark may perform well because it can monopolize the L1 cache in a way that is not possible for larger applications.
2. Aggressive inlining, a common factor in microbenchmark performance, may not scale well for large code bases. The tradeoffs here can be rather complex: https://webkit.org/blog/2826/unusual-speed-boost-size-matter...
3. Microbenchmarks often avoid abstraction in order to achieve high performance. This can create unacceptable software engineering costs.