The differences are well known; it's just still an open problem how to run code written for a stronger memory model on a system with a weaker one, at as high a performance as possible, without explicit hardware support (like Apple's choice of a TSO config bit). A binary compiled for TSO has erased any LoadLoad, LoadStore, and StoreStore barriers, and the emulator has to divine them. The heuristics there are still fraught with peril.
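To make the "erased barriers" point concrete, here is a minimal sketch of the classic message-passing pattern in C++. All names (`payload`, `ready`, `run_message_passing`) are made up for illustration. The assumption, stated loosely, is that a typical x86 compiler lowers the release store and acquire load below to plain moves, because TSO already gives that ordering for free, so nothing in the resulting binary marks which accesses were the ordered ones.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; this only illustrates the pattern.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the data

void producer() {
    payload = 42;
    // Release: earlier writes may not be reordered after this store
    // (StoreStore / LoadStore ordering). On a TSO machine this is
    // typically an ordinary store; a weaker ISA needs a store-release
    // instruction or an explicit barrier here.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: later reads may not be reordered before this load
    // (LoadLoad / LoadStore ordering). Again typically an ordinary
    // load on TSO, but not on a weakly ordered machine.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // guaranteed to observe the producer's write
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(ready.load(std::memory_order_relaxed));
    return seen;
}
```

Calling `run_message_passing()` from a `main` should return 42 on any conforming implementation. An emulator only sees the compiled loads and stores, not these annotations, so it either treats every access as ordered (slow) or guesses which ones matter (risky).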
The JVM absolutely did some great work walking this path, both in defining a memory model in the first place and in supporting that model on weak and strong hardware memory models. But the JMM was specifically designed to run cleanly on WMO platforms to begin with (early SPARC), so it doesn't face a lot of the same problems discussed here.