edit: AFAIK, seq_cst ordering (as opposed to acq_rel) is only relevant when you have more than two threads and you care about things like IRIW. In this case acquires and releases are not enough to capture the full set of constraints, although at the hardware level it is still everything local.
edit2: I guess the missing bit is that beyond the hardware fences you have the hardware cache coherency protocol that makes sure that a total order of operations always exist once load and stores reach the coherence fabric.