While sequentially consistent semantics are efficient to implement on CPUs, that seems to be much less true on GPUs. Thus, Vulkan eliminates sequential consistency entirely and provides only acquire/release semantics[1].
It is extremely difficult to reason about programs using these advanced memory semantics. For example, there is a discussion about whether a spinlock implemented in terms of acquire and release can be reordered in a way that introduces deadlock (see the reddit discussion linked from [2]). I was curious enough about this that I tried to model it in CDSChecker, but did not get definitive results (the deadlock checker in that tool is enabled for mutexes provided by the API, but not for mutexes built out of primitives). I'll also note that AcqRel semantics are not supported by the Rust version of compare_exchange_weak (perhaps a nit on TFA's assertion that Rust adopts the C++ memory model wholesale), so if acquire to lock the spinlock is not adequate, it would likely need to go to SeqCst.
Thus, I find myself quite unsure whether this kind of spinlock would work on Vulkan or would be prone to deadlock. It's also possible it could be fixed by putting a release barrier before the lock loop.
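For concreteness, here is the construction in question sketched with C11 atomics (which mirror the C++ model under discussion): acquire on the CAS that takes the lock, release on the store that drops it. This is a minimal illustration, not a recommendation; whether acquire alone on the lock path suffices is exactly the open question.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal acquire/release spinlock sketch. */
typedef struct { atomic_bool locked; } spinlock_t;

static void spin_lock(spinlock_t *l) {
    bool expected = false;
    /* Acquire on success; relaxed on failure is fine, since a failed
       attempt publishes nothing. */
    while (!atomic_compare_exchange_weak_explicit(
               &l->locked, &expected, true,
               memory_order_acquire, memory_order_relaxed)) {
        expected = false; /* the CAS wrote back the value it observed */
    }
}

static void spin_unlock(spinlock_t *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```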
We have some serious experts on HN, so hopefully someone who knows the answer can enlighten us - mixed in of course with all the confidently wrong assertions that inevitably pop up in discussions about memory model semantics.
[1]: https://www.khronos.org/blog/comparing-the-vulkan-spir-v-mem...
I based my claim about Rust from https://doc.rust-lang.org/nomicon/atomics.html. ("Rust pretty blatantly just inherits the memory model for atomics from C++20.") Perhaps that is out of date?
Taking a lock only needs to be an acquire operation plus a compiler barrier against other lock operations. Using seq_cst or acq_rel semantics is stronger than needed. From my reading, and from discussions with people from WG21, the current argument for why taking a lock requires only acquire semantics is that a compiler optimization that transforms a non-deadlocking program into a potentially deadlocking program is not allowed. There's an interesting twitter thread where we discuss this that I can't find anymore :(.
Transforming

    #include <stdio.h>

    int stop = 1;

    void maybeStop() {
        if (stop)
            for (;;);
    }

    int main() {
        printf("hello, ");
        maybeStop();
        printf("world\n");
    }

into

    int main() {
        printf("hello, world\n");
    }
(as Clang does today) does not inspire confidence about disallowing moving the loop in the other example. If the compiler is allowed to assume that this loop terminates, why not the lock loop? Maybe there is a reason, but none of this inspires confidence.
Is this true? AcqRel seems to be accepted by the compiler for the success ordering of compare_exchange_weak.
It's accepted by the compiler, but if provided, it compiles to a panic.
Even then, I'm pretty sure the spinlock is a bad idea: you should probably be using the GPU as a coprocessor and enforcing "orderings" via CUDA Streams or OpenCL Task Graphs. The kernel-start and kernel-end mechanism provides the synchronization ("happens-before") you need, when you need it.
---------
From there on out, the GPU low-level synchronization of choice is the thread barrier (which can extend beyond a wavefront, but only up to a block).
--------
So that'd be my advice: use a thread barrier at the lowest level for thread blocks (synchronization among 1024 threads or fewer), and use kernel-start / kernel-end graphs (aka CUDA Streams and/or OpenCL Task Graphs) for synchronizing groups of more than 1024 threads.
Otherwise, I've done some experiments with acquire/release and basic lock/unlock mechanisms, and they seem to work as expected. You get deadlocks immediately on older hardware because of the implicit SIMD execution (so you want only thread #0, or the first active thread, to take the lock on behalf of the whole wavefront / thread block). You'll still want thread barriers for higher-performance synchronization.
Frankly, I'm not exactly sure why you'd want to use a spinlock since thread-barriers are simply higher performance in the GPU world.
In any case, I'm interested in pushing the boundaries of lock-free algorithms. It is of course easy to reason about kernel-{start/end} synchronization, but the granularity may be too coarse for some interesting applications.
Another somewhat recently posted (but years-old) page with different but related content is 'Memory Models that Underlie Programming Languages': http://canonical.org/~kragen/memory-models/
a few previous hn discussions of that one:
https://news.ycombinator.com/item?id=17099608
This is not true for Java; see
http://gee.cs.oswego.edu/dl/html/j9mm.html
https://docs.oracle.com/en/java/javase/16/docs/api/java.base...
If you want to test out weaker acquire/release semantics, you need to buy an ARM or POWER9 processor.
As I mentioned in the post (https://research.swtch.com/plmm#sc), Herb Sutter claimed in 2017 that POWER was going to do something to make SC atomics cheaper. If it did, then that might end up being cheaper than the old sync-based acq/rel too, same as ARM, in which case we'd end up with SC = acq/rel on both ARM and POWER. It looks like that didn't happen, but I'd be very interested to know what did, if anything.
Conversely, acq/rel operations range from somewhat to very expensive to implement on ARM/POWER.
The idea of extending programming languages and type systems in that direction is not new: folk who've been using distributed computing for computations already have to think about this, and could teach a few things to folk who use shared-memory multiprocessors.
Here's an idea for ISA primitives that could help a language group variables together: bind/propagate operators on (combinations of) address ranges. https://pure.uva.nl/ws/files/1813114/109501_19.pdf
All variables inside an object (i.e., any instance of a class) are assumed to be related to each other. synchronized(foobar_object){ baz(); } ensures that all uses of foobar_object inside the synchronized {} block are sequential (and therefore correct).
--------
The issue is that some people (a minority) are interested in "beating locks" and making something even more efficient.
synchronized(foobar_object){ foo(); }
synchronized(foobar_object){ bar(); }
synchronized(foobar_object){ baz(); }
Will have the foo, bar, and baz methods well behaved with respect to any data they share, regardless of whether they are foobar methods or methods of any other class(es). It is exactly analogous to the S(a) -> S(a) synchronizing instruction from the article, which establishes a happens-before edge partitioning each thread into before/after the S(a). The only time synchronized(explicit_object) relates to anything else is when also using the keyword form: `synchronized void foo()` is equivalent (with a minor performance difference) to wrapping the entire body of foo in `synchronized(this) { ... }`.
You can read more about this here if you're interested: https://www.isa-afp.org/entries/JinjaThreads.html
AKA why can't I stumble upon such stuff more often. Thanks OP!
Alternative solution: forget all the "atomic" semantics and simply avoid "optimizing" accesses to global variables. Access to any global variable should always go directly to memory. Sure, this will be less than optimal in some cases, but such is the price of using globals. Their use should be discouraged anyway.
In other words, make "atomic" the sensible and logical default with globals. Assignment is an "atomic" operation, just don't circumvent it by using a local copy as an "optimization".
And yes, you can put a full memory fence around every access to a variable that is shared across threads. But doing so would just destroy the performance of your program. Compared to using a register, accessing main memory typically takes something on the order of 100 times as long. Given that we're talking about concerns that are specific to a relatively low-level approach to parallelism, I think it's safe to assume that performance is the whole point, so that would be an unacceptable tradeoff.
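To make the middle ground concrete: the usual compromise between "full fence on every access" and "cached in a register forever" is a relaxed atomic load, which forces a genuine memory access on each iteration without paying for any fencing. A minimal C11 sketch (the flag name is illustrative):

```c
#include <stdatomic.h>

_Atomic int stop_flag; /* hypothetical flag shared with another thread */

void wait_for_stop(void) {
    /* The relaxed load forces a real memory read on every iteration,
       but imposes no ordering and emits no fence. A plain `int` here
       could legally be hoisted into a register, turning the loop into
       an infinite spin that never observes the other thread's write. */
    while (!atomic_load_explicit(&stop_flag, memory_order_relaxed))
        ;
}
```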
Indeed.
Just a reminder to everyone: your pthread_mutex_lock() and pthread_mutex_unlock() functions already contain the appropriate compiler / memory barriers in the correct locations.
This "Memory Model" discussion is only for people who want to build faster systems: for people searching for a "better spinlock", or for writing lock-free algorithms / lock-free data structures.
This is the stuff of cutting-edge research right now: it's a niche subject. Your typical programmer _SHOULD_ just stick a typical pthread_mutex_t onto an otherwise single-threaded data structure and call it a day. Locks work. They're not "the best", but "the best" is constantly being researched and developed right now. I'm pretty sure that any new lock-free data structure with decent performance is pretty much instant Ph.D. thesis material.
-----------
Anyway, the reason "single-threaded data structure behind a mutex" works is that the data structure keeps all of its performance benefits (staying in L1 cache, or letting the compiler "manually cache" data in registers when appropriate), and you only lose performance on the lock() and unlock() calls (which inherently include memory barriers to publish the results).

That's two memory barriers (one for lock() and one for unlock()). The thing about lock-free algorithms is that they __might__ get you down to __one__ memory barrier per operation, if you're a really, really good programmer. But it's not exactly easy. (Or: they might still have two memory barriers, but the lock-free properties of guaranteed forward progress and/or deadlock freedom might be easier to prove.)
Writing a low-performance but otherwise correct lock free algorithm isn't actually that hard. Writing a lock free algorithm that beats your typical mutex + data-structure however, is devilishly hard.
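The "single-threaded data structure behind a mutex" pattern is just this (names are illustrative; the counter stands in for any ordinary structure):

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t mu;
    int count;   /* stand-in for any single-threaded data structure */
} counter;

void counter_inc(counter *c) {
    pthread_mutex_lock(&c->mu);   /* one memory barrier here */
    c->count++;                   /* plain code: registers, L1 cache,
                                     and full compiler optimization
                                     all still apply */
    pthread_mutex_unlock(&c->mu); /* one memory barrier here */
}
```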
---------
"Volatile" is close but not good enough semantically to describe what we want. That's why these new Atomic-variables are being declared with seqcst semantics (be it in Java, C++, C, or whatever you program in).
That's the thing: we need a new class of variables that wasn't known 20 years ago. Variables that follow the sequential-consistency requirements, for this use case.
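In C11 terms, the distinction looks like this (a sketch: volatile forces the access to happen but orders nothing, while a plain operation on an _Atomic object is sequentially consistent by default):

```c
#include <stdatomic.h>

volatile int v_flag;  /* the access can't be elided, but it imposes no
                         ordering on surrounding memory operations */
_Atomic int a_flag;   /* plain ops on _Atomic default to seq_cst */

void publish_both(void) {
    v_flag = 1;  /* just a memory access */
    a_flag = 1;  /* a sequentially consistent atomic store */
}
```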
---------
Note: on ARM systems, even if the compiler doesn't mess up, the hardware can still mess you up. ARM offers multiple load/store orderings, and a load with relaxed (default) ordering may observe a "stale" value.

That is to say: it's not enough for the compiler to know "don't cache the value in a register"; the memory subsystem also needs to be told when a load must observe other cores' writes in the expected order.

x86 does not allow rearranging loads/stores in this manner, but ARM is more "Relaxed" than x86. If observing the freshest value in the right order is important, you need the appropriate memory barriers on ARM (or PowerPC).
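Concretely, the portable fix is a release store paired with an acquire load. A minimal message-passing sketch in C11 (on x86 these compile to plain loads and stores; on ARM/POWER they emit the needed barriers):

```c
#include <stdatomic.h>

int payload;        /* ordinary, non-atomic data */
_Atomic int ready;

void publish(int v) {
    payload = v;
    /* Release: everything written above becomes visible to any thread
       that acquire-loads `ready` and observes 1. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consume(void) {
    /* Acquire: once we see ready == 1, reading `payload` is
       guaranteed to see the published value. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;
}
```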
Since as you say, they are very similar, wouldn't it be reasonable to assume for access purposes that they are effectively global?
What if your function takes a pointer that might be pointing to a global variable? Does that mean all accesses through a pointer are now exempt from optimization unless the compiler can prove the pointer will never point to a global variable?
Pointers can be used to circumvent most safety measures. If you obscure the access, you should assume responsibility for the result.
The common programmer does not understand that you've just transformed their program - for which they were taught merely that multiple threads need synchronization - into a new game, one with an entire separate specification, where every shared variable obeys a set of abstruse rules revolving around the happens-before relationship. Locks, mutexes, and atomic variables are all one thing. Fences are a completely different thing. At least in the way most people intuit programs to work.
Go tries to appeal to programmers as consumers (that is, when given a choice between a cleaner design and pleasing the user who just wants to "get stuff done", they choose the latter), and yet it also adds in traditional complexities like this. Yes, there is a performance trade-off to making shared memory behave intuitively, but that's much better than bugs that 99% of your CHOSEN userbase do not know how to avoid. Also remember that Go has lots of weird edge cases, like sharing a slice across threads leading to memory corruption (in the C / assembly sense, not merely within that array) despite the rest of the language being memory-safe. Multiply that by the "memory model".
Edit: forgot spaces between paragraphs.
It would be nice if sometime we stopped pretending that beginners are too slow to know/understand things and instead faced the fact that their instructors and mentors are bad at teaching.
Also, maybe you are different, but I can only keep so much in my head at a time. If I can keep something simple or abstract it away so I can focus on other details, that doesn't make me a dilettante. It makes me more effective at what I'm actually trying to do.
Source?
The "Exploiting the slice type" section.
Go has no VM but it has a GC. WASM has a VM but no GC.
Everything has been tried, and Java still kicks everything's ass to the moon on the server.
Fragmentation is bad; let's stop using bad languages and focus on the products we build instead.
"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html
"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
From my perspective, Go in the context of serverless programming seems to currently be the best choice for server-side programming.
In the next 20 years I expect Go will be supplanted by a language that is a lot like Go (automatic memory management, simple, easy to learn and write, and performant enough) but with the addition of algebraic data types, named parameters, and a slightly higher level of abstraction.
I'd love for this to be Crystal: https://crystal-lang.org/
> I haven't seen server work being done in Java in ages.
In the meantime, I've been doing a large amount of Java backend server work for the past 10 years.
What have you built with go that is interesting?
To each their own.