I'd be interested to know what you are thinking.
The main exotic case I can imagine is an architecture that lacks atomic operations entirely. But even then, C11 has built-in atomic operations [1], so in the worst case the C library for the target architecture would likely boil down to mutex operations.
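Yes. Also, almost every platform I know of that supports multithreading and atomics doesn't support atomics between /all/ possible bus masters. Consider a microcontroller with, say, two Arm cores (multithreaded, atomic-supporting) and a DMA engine: the cores can use atomics with each other, but the DMA engine also reads and writes memory and takes no part in those atomic operations.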
You can't create userspace locks, which is a bummer, but the OS can still enforce locks. That's basically how early locking worked.
The main thing needed to make a correct lock is interrupt protection, which every OS has.
To go fast, you need atomic operations, and they become essential once you are dealing with multiple cores, since disabling interrupts on one core does not stop the others. On a single-core system, however, the OS doesn't need atomics to create locks.
// Not real MIPS, just what I've gleaned from a brief look at some docs
LOAD addr, register
ADD 1, register
STORE register, addr
The LOAD and STORE are atomic, but the `ADD` happens out of band. That's a problem if any sort of interrupt arrives between them (a real possibility if you are multithreading). If it arrives after the LOAD, a separate thread can update "addr", which means the later STORE will stomp on whatever that thread wrote.
x86 (and ARM since ARMv8.1, via its LSE atomic instructions) can do
ADD 1, addr
as well as other instructions like "compare and swap":
LOAD addr, register
MOV register, register2
ADD 1, register2
COMPARE_AND_SWAP addr, register, register2
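if (cas_failed) { try again }
MIPS does have an atomic read-modify-write, though: load-linked/store-conditional (LL/SC). SC only stores if nothing has written the location since the LL, and leaves 0 in its register if the store failed, so the sequence is retried:
loop: LL r2, (r1)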
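ADD r3, r2, 1
SC r3, (r1)        // r3 = 1 if the store succeeded, 0 if it failed
BEQ r3, 0, loop    // SC failed: start over from the LL
NOP                // branch delay slot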