In addition to energy, power use is another reason; parking a core will allow it to cool down thermally, so that when it is put back in use (milli)seconds later, it can run at a higher clock speed for longer.
That's not extremely low power compared to real low power states. The main advantage of PAUSE is the scheduling of the other hyperthread (if it exists) and maybe not generating a gratuitous L1 / MESI workload at a crazy rate (well if programmed correctly that should be quite cheap in lots of cases, but still...). To my knowledge this does not cut any clock, so the power economy is going to be minimal.
IIRC the mov imm, %ecx; rep nop sequences are somewhat special cased by modern architectures (and this fact is the only reason why you even would want to execute such code). On the other hand the energy savings are mostly negligible and it is simply an SMT-level equivalent of sched_yield()
Actually I heard that the last few generations of intel (from skylake) enter power state mode more aggressively with pause and the latency of getting out of a pause went up from tens of cycles to hundreds. No first hand testing though.