Use a transaction manager (HTM or STM). An STM ultimately boils down to lock-free synchronization, logging, and retry: a transaction can fail, and then it gets retried.
But ultimately, all inter-thread communication boils down to programs (or libraries) using barriers for acquire/release semantics and/or using compare-and-swap and other atomic read-modify-write operations.
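To make the "fail and retry" part concrete, here is a minimal sketch of a compare-and-swap retry loop in C++. It is not an STM, just the primitive such a thing tends to be built on. The names `atomic_update` and `run_counter` are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: apply `f` to the shared value by retrying a
// compare-and-swap until no other thread got in between.
// Returns the number of failed attempts (including spurious failures).
template <typename F>
int atomic_update(std::atomic<long>& cell, F f) {
    int retries = 0;
    long seen = cell.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into
    // `seen`, so the next iteration recomputes f(seen) from fresh data.
    while (!cell.compare_exchange_weak(seen, f(seen),
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
        ++retries;
    }
    return retries;
}

// Hypothetical driver: `threads` workers each add 1 to a shared
// counter `iters` times; returns the final counter value.
long run_counter(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<long> cell{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&cell, iters] {
            for (int i = 0; i < iters; ++i) {
                atomic_update(cell, [](long v) { return v + 1; });
            }
        });
    }
    for (auto& th : pool) th.join();
    return cell.load();
}
```

Calling `run_counter(4, 10000)` should return 40000 no matter how the threads interleave, because every lost race is simply retried.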
> at any point of time there can be either one writer or multiple readers
Time is a slippery notion in concurrency. Most language-level memory models (e.g. Java) and even hardware-level concurrency models focus on happens-before relations, which are induced by executing certain operations. At the hardware level, you can basically think of each CPU as receiving asynchronous updates about cache lines from other processors (potentially out of order). While technically the cache coherence protocol pushes cache lines into processors, you can't ever guarantee that one "writer" can "push" updates into all other CPUs. Instead, you have to rely on those other CPUs either executing fences themselves or being forced to execute fences through global OS means, such as IPIs. Those other CPUs executing fences (barriers) or other atomic operations is what induces the happens-before relations.
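A small C++ sketch of what "the reader has to do its part" looks like at the language level, assuming the C++11 memory model. The writer's release store and the reader's acquire load that observes it together form the happens-before edge. The names `payload`, `ready`, `writer`, `reader` and `run_handoff` are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names: `payload` is ordinary data, `ready` is the
// atomic flag used to publish it.
int payload = 0;
std::atomic<bool> ready{false};

void writer() {
    payload = 42;                                  // plain store
    ready.store(true, std::memory_order_release);  // publish
}

int reader() {
    // The writer cannot force this thread to notice anything. The
    // ordering only exists once this acquire load reads the value
    // written by the release store.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // ordered after the writer's plain store
}

// Hypothetical driver: run one writer and one reader, return what the
// reader saw.
int run_handoff() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread r([&seen] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    assert(seen == 42);
    return seen;
}
```

If the flag were accessed with `memory_order_relaxed` on both sides instead, nothing would order the read of `payload` after the write, and the plain accesses would be a data race.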