But the cost isn't the syscalls. It's all those threads getting descheduled and woken back up, cache contents getting evicted and reloaded on every switch, and so on. The syscall cost is minor by comparison.
At any rate, you've definitely overstated the case against mutexes in this post. At most, the advice should have been "avoid designs that will suffer from a lot of contention," such as locking around lots of shared mutable state. It really has nothing to do with threads vs. processes, as you've made it out to.
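
To make the contention point concrete, here's a rough sketch of the same job done two ways with the same mutex. All the names (`contended_sum`, `low_contention_sum`, `N_THREADS`, `N_ITEMS`) are made up for illustration. It only shows the shape of the two designs; Python's GIL hides most of the real scheduling and cache effects, so don't read it as a benchmark.

```python
import threading

N_THREADS = 4      # hypothetical worker count
N_ITEMS = 10_000   # hypothetical units of work per worker


def contended_sum():
    """Every unit of work takes the lock: lots of shared mutable state."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(N_ITEMS):
            with lock:          # lock acquired N_ITEMS times per thread
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def low_contention_sum():
    """Each thread works on private state and takes the lock once."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        local = 0               # thread-private, no locking needed
        for _ in range(N_ITEMS):
            local += 1
        with lock:              # lock acquired once per thread
            total += local

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Both versions use threads and a mutex and produce the same answer. The only difference is how often the threads have to touch shared state, which is the design choice that actually matters here.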