The codes I worked on were complex graph analysis, spatiotemporal behavioral analysis, a bit of geospatial environmental modeling, and, in prehistoric times, thermal and mass transport modeling. These codes (pretty much anything involving reality) are intrinsically tightly coupled across compute nodes. Low-latency interconnects eventually gave way to latency-hiding software architectures, but at no point did we use map-reduce, as that would have been insanely inefficient given the sparsity and unpredictability of the interactions between nodes.
Those latency-hiding architectures were the prototypes for later high-performance databases. Every core handles thousands or millions of independent shards of the larger computational model, which makes latency hiding particularly efficient: while one shard waits on data from another node, the core almost always has other shards ready to run.
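To make the latency-hiding point concrete, here is a toy sketch in Python. It is not how those codes were actually built; it only assumes that one `asyncio` event loop stands in for one core, that `asyncio.sleep` stands in for a remote-node round trip, and that the names `fetch_remote`, `run_shard`, `run_core`, and `timed_run` are made up for illustration.

```python
import asyncio
import time


async def fetch_remote(latency):
    # Hypothetical stand-in for a message from another node.
    # The sleep models network latency; no real I/O happens here.
    await asyncio.sleep(latency)
    return 1


async def run_shard(steps, latency):
    # One shard: repeatedly wait on remote data, then do trivial local work.
    total = 0
    for _ in range(steps):
        # Awaiting here yields the "core" to any other shard that is ready.
        total += await fetch_remote(latency)
    return total


async def run_core(num_shards, steps, latency):
    # One event loop plays the role of one core multiplexing many shards.
    results = await asyncio.gather(
        *(run_shard(steps, latency) for _ in range(num_shards))
    )
    return sum(results)


def timed_run(num_shards, steps=5, latency=0.01):
    # Returns (total work units completed, wall-clock seconds).
    start = time.perf_counter()
    total = asyncio.run(run_core(num_shards, steps, latency))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    total, elapsed = timed_run(1000)
    serial_estimate = 1000 * 5 * 0.01  # seconds if every wait were blocking
    print(f"completed {total} steps in {elapsed:.2f}s "
          f"(blocking one shard at a time would take about {serial_estimate:.0f}s)")
```

With 1000 shards the waits overlap almost entirely, so wall-clock time stays close to the latency of one shard's own chain of requests rather than the sum over all shards. The more independent shards a core holds, the less any single slow round trip matters.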