The book is clearly still very unfinished, but what exists is very good. It got my mind spinning on what a good statically typed language running on a VM could look like.
I have no real experience with OTP, but I have read some books and done some toy projects.
They do not. Supervisor trees are a way to manage failures at the level of one node.
> How would I tell the cluster that there should be N processes running, each on a different node?
It's not something that is built into OTP. There are libraries that help with it, like libcluster, which handles automatic cluster formation.
> And is there anything in OTP that would help me elect a leader or do I still have to implement that myself?
Not directly in OTP, but there are libraries, for example for Raft.
Ok, so why is Erlang/Elixir/OTP good then? Well, first, it makes a single running application more robust to failures thanks to its supervision trees, but it also makes it easier to build distributed applications. GenServers make it very easy to build robust services using common patterns. Local and remote calls to GenServers look the same, which makes it easier to scale services out. Message passing and pattern matching are part of the core of the language (no need for protobuf, for example). Observability and introspection are excellent when a problem arises (inspecting processes, their memory, their message queues, the schedulers, etc.). Immutable data structures and processes that do not share memory also make it easier to scale horizontally, at the cluster level. And probably a lot of other good things I forgot :-).
What you say makes sense. I can see the benefit of message passing as a first-class citizen: it allows some processes to be moved to a different node. But you still have to manage process placement.
Erlang does have some tools that make it legitimately easier to write cluster-aware software, such as the message-passing operator not caring which node the target is on, and transparently handling all network communication, from serializing the Erlang term on one side to deserializing it on the other. Erlang's terms, the basic data types, were designed with exactly this use case as a first-class concern, and as such they are good at it.
However, it's still easy to accidentally write code that will only run on a single node, and you still need to be aware of what you're doing to make sure you don't wire in a hard-coded dependency on running in a single OS process. But when you contrast that with the default state of most other programming languages, which have essentially no concept of cluster communication at all, you can see how this is an advantage over them.
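A rough Python sketch of the "the send operator doesn't care which node the target is on" idea. Everything here is hypothetical: `Node`, `Pid` and the shared `network` dict stand in for real nodes and sockets, and `pickle` stands in for Erlang's external term format. For brevity the local path hands the object over directly, whereas real Erlang generally copies messages between process heaps even on the same node.

```python
import pickle
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Pid:
    node: str   # name of the node that owns the process
    ident: int  # local identifier on that node


class Node:
    """Toy node: holds mailboxes and knows how to reach its peers."""

    def __init__(self, name, network):
        self.name = name
        self.network = network  # shared dict: node name -> Node (stands in for real sockets)
        self.mailboxes = {}
        network[name] = self

    def spawn(self):
        pid = Pid(self.name, len(self.mailboxes))
        self.mailboxes[pid.ident] = queue.Queue()
        return pid

    def send(self, pid, message):
        """Same call whether `pid` is local or remote, like Erlang's `Pid ! Msg`."""
        if pid.node == self.name:
            self.mailboxes[pid.ident].put(message)
        else:
            # pickle stands in for Erlang's external term format here.
            payload = pickle.dumps(message)
            self.network[pid.node].deliver(pid, payload)

    def deliver(self, pid, payload):
        # The receiving node deserializes its own copy of the term.
        self.mailboxes[pid.ident].put(pickle.loads(payload))

    def receive(self, pid, timeout=1.0):
        return self.mailboxes[pid.ident].get(timeout=timeout)
```

The caller's code is identical for `a.send(local_pid, msg)` and `a.send(remote_pid, msg)`; that is the part Erlang gives you for free. The "accidentally single-node" trap mentioned above shows up when code relies on something outside this abstraction, such as a local ETS table or a locally registered name.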
Erlang/Elixir are amazing because they bring so many good things, and they are not that difficult to learn if you have some experience with functional programming.
There is https://github.com/rabbitmq/ra which is a Raft implementation in Erlang that is Jepsen-tested. You could use it to build "etcd in Erlang", or https://github.com/rabbitmq/khepri which is built on top of Ra.
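For anyone wondering what "a Raft implementation" actually has to decide when electing a leader, here is a sketch of Raft's vote-granting rule in Python. This is not Ra's code or API; `VoterState` and `handle_request_vote` are hypothetical names, and the sketch ignores persistence (a real node must store its term and vote before replying), election timeouts and log replication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterState:
    current_term: int
    voted_for: Optional[str]
    last_log_index: int
    last_log_term: int


def handle_request_vote(state, candidate_id, term, last_log_index, last_log_term):
    """Return (vote_granted, new_state) following Raft's RequestVote rules."""
    # A request from a stale term is rejected outright.
    if term < state.current_term:
        return False, state
    # A newer term means we adopt it and forget any vote from the old term.
    if term > state.current_term:
        state = VoterState(term, None, state.last_log_index, state.last_log_term)
    # Election restriction: the candidate's log must be at least as up to date as ours.
    log_ok = last_log_term > state.last_log_term or (
        last_log_term == state.last_log_term and last_log_index >= state.last_log_index
    )
    # At most one vote per term (re-granting to the same candidate is fine).
    if state.voted_for in (None, candidate_id) and log_ok:
        state = VoterState(state.current_term, candidate_id,
                           state.last_log_index, state.last_log_term)
        return True, state
    return False, state
```

Because each node grants at most one vote per term, at most one candidate can collect a majority in a given term. That property is what hand-rolled leader election on top of plain OTP tends to get wrong during partitions.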
global:register_name/3 may be helpful. I haven't used it, so I have no direct experience with it. I think you would need to provide the resolution function for when a cluster merges and the name is registered on both partitions, as well as the logic to register a potential leader if there is none.
From experience with other parts of global, you'll want to be careful and test what happens on your system if a thousand nodes across several locations all try to join/register at once, especially if one or several of those nodes are running really slowly because of hardware issues.
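To illustrate what that resolution function is for: global:register_name/3 takes a resolve fun that, as far as I know, is called as Resolve(Name, Pid1, Pid2) when two partitions both registered the same name, and must return one of the pids or `none` (built-ins include global:random_exit_name/3 and global:notify_all_name/3). The Python below only models that merge step; `merge_registries`, `keep_lowest_node` and the `(node, id)` pid representation are hypothetical, and it just reports losing pids instead of killing or notifying them.

```python
from typing import Callable, Optional

# A pid is modelled as (node_name, local_id); an assumption for this sketch.
Pid = tuple[str, int]
Resolve = Callable[[str, Pid, Pid], Optional[Pid]]


def keep_lowest_node(name: str, pid1: Pid, pid2: Pid) -> Optional[Pid]:
    """Deterministic rule: every node picks the same winner without extra coordination."""
    return min(pid1, pid2)


def merge_registries(left: dict[str, Pid], right: dict[str, Pid], resolve: Resolve):
    """Merge two partitions' name tables; return (merged, pids that lost their name)."""
    merged = dict(left)
    losers: list[Pid] = []
    for name, pid in right.items():
        existing = merged.get(name)
        if existing is None:
            merged[name] = pid
            continue
        if existing == pid:
            continue
        winner = resolve(name, existing, pid)
        if winner is None:
            # Like returning `none` from an Erlang resolve fun: the name is unregistered.
            del merged[name]
            losers.extend([existing, pid])
        else:
            merged[name] = winner
            losers.append(pid if winner == existing else existing)
    return merged, losers
```

A deterministic rule like `keep_lowest_node` avoids two partitions each keeping their own "leader" after a merge, but it says nothing about who registers the name in the first place; that is the "register a potential leader if there is none" part you still have to write.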
I think some of this might be covered by distributed OTP applications with takeover[1], but where I worked with Erlang, we certainly weren't using OTP applications as the OTP team intended, I think because most of the team, including all of the early server engineers, learned Erlang on the job.
[1] https://learnyousomeerlang.com/distributed-otp-applications
and an attempt to correct it by Hans Svensson: https://erlang.org/workshop/2005/NewLeaderElection.pdf
This project attempts to modernize it: https://github.com/lehoff/gen_leader
But from what I can tell, there's no standardized solution. There are quite a few libraries out there, however.