I would push back on the "shared state with locks vs isolated state with message passing" framing. Both approaches model concurrency as execution that needs coordination. Switching from locks to mailboxes changes the syntax of failure, not the structure. A mailbox is still a shared mutable queue between sender and receiver, and actors still deadlock through circular messages.
I've rarely seen naked sends/receives in Erlang; you mostly go through OTP behaviors. And if you do use them and get stuck (without an "after" clause), the fact that you can attach a console to a running system and inspect processes' state makes it much easier to debug.
In practice you'll likely push stuff through Oban, Phoenix PubSub or some other convenience library that gives you some opinions and best practices. It really lowers the bar for building concurrent systems.
That process has the database connection as local state and receives messages that are translated to SQL queries. Here two scenarios are possible:
1) The query is invalid (you are trying to insert a row with a missing foreign key, or a wrong data type). In that case, you send the error back to the caller.
2) There is a network problem between your application and the database (might be temporary).
You just let the process crash (local state is lost); the supervisor restarts it, and the restarted process tries to reconnect to the database (new local state). If it still fails, it will crash again, and the supervisor might decide to notify other parts of the application of the problem. If the network issue was temporary, the restart succeeds.
Before crashing, you notified the caller that there was a problem and that they should retry.
Now, for the caller. You could start a transient process in a dynamic supervisor for every query. That would handle the retry mechanism. The "querier process" would quit only on success and send the result back as a message. When receiving an error, it would crash and then be restarted by the supervisor for the retry.
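That worker might look roughly like this sketch, with `connect!/0` and `run_query/2` as hypothetical stand-ins for a real database driver (here stubbed so the snippet is self-contained):

```elixir
defmodule DBWorker do
  use GenServer

  def start_link(opts \\ []),
    do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # If connecting fails, init crashes and the supervisor retries us.
    {:ok, connect!()}
  end

  @impl true
  def handle_call({:query, sql}, _from, conn) do
    case run_query(conn, sql) do
      {:ok, _rows} = ok ->
        {:reply, ok, conn}

      # Scenario 1: invalid query. Report to the caller and keep running.
      {:error, :invalid_query} = err ->
        {:reply, err, conn}

      # Scenario 2: connection lost. Tell the caller to retry, then crash;
      # the supervisor restarts us with a fresh connection.
      {:error, :disconnected} ->
        {:stop, :disconnected, {:error, :retry}, conn}
    end
  end

  # Stubs standing in for a real driver (assumptions, not a real API).
  defp connect!, do: :fake_conn
  defp run_query(_conn, "bad sql"), do: {:error, :invalid_query}
  defp run_query(_conn, "netsplit"), do: {:error, :disconnected}
  defp run_query(_conn, sql), do: {:ok, [sql]}
end
```

Under `Supervisor.start_link([DBWorker], strategy: :one_for_one)`, a "netsplit" query gets `{:error, :retry}` back, the worker dies, and the supervisor brings it back with a fresh connection.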
There are plenty of other solutions, and in Elixir you have Ecto, which handles all of this for you. Ecto is not an ORM, but rather a data mapper: https://github.com/elixir-ecto/ecto
OTP includes Mnesia, a distributed, optionally transactional database (mostly for key-value data); it's not the easiest thing to use, but it's there. You can also connect to an external database; there's no requirement to stay within the BEAM.
If you want database changes to be persisted to disk, you have to persist them. If you want to wait to show success until the changes have persisted, you have to wait. I don't see how the runtime you use changes that, so I'm not really sure I understand your question? You don't generally persist the process mailboxes; if a process or node crashes, its mailbox is lost.
In a distributed system you rapidly run into Two Generals problems, which are always challenging to address. If I send you a message and I receive a reply, I know you received it. But if I send you a message and don't receive a reply, I don't know what happened: maybe you never got it, maybe you received it and crashed, maybe you replied but I never got it, maybe you replied but I crashed or timed out and moved on. Again, that's the case regardless of runtime. It's hard to find systems with 100% uptime on all individual parts, so you have to set a reasonable timeout on communication, and you have to deal with picking up the pieces when that timeout fires.
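That ambiguity shows up directly in plain send/receive; a toy sketch (names are illustrative) of the timeout discipline described above:

```elixir
# A peer that answers exactly one ping.
peer = spawn(fn ->
  receive do
    {:ping, from} -> send(from, :pong)
  end
end)

send(peer, {:ping, self()})

result =
  receive do
    :pong -> :ok
  after
    5_000 ->
      # Ambiguous: the request may be lost, the peer may have crashed,
      # or the reply may still be in flight. The caller has to decide
      # whether to retry, give up, or surface the error.
      :timeout
  end
```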
> I assume these problems are solved, but the article doesn't demonstrate the solutions.
There isn't really a general solution to the "systems are hard" problem. You have to pick what's appropriate for your system, and many systems will need different solutions for different parts. As an example from my time at WhatsApp: the table indicating which process held the tcp chat connection for a user was never persisted to disk; otoh (towards the end of my time) text messages would not be acknowledged to the client until they were either acknowledged by the destination client or in memory or on disk on multiple servers; the receiving client was responsible for deduplicating messages in cases where the sender did not receive an ack and resent, or where one of the redundant servers was offline when the message was delivered and delivered it again later. Many things less critical than messages were acknowledged when accepted, without waiting for confirmed persistence. Many user actions would not be automatically retried on a timeout or other failure --- letting the user decide what to do.
I guess maybe the question is why use BEAM if it also doesn't solve the general "systems are hard" problem? IMHO, the reason to use BEAM is that it helps you structure your system around easy-to-reason-about parts. You've got to do some work to get messages into the right mailboxes, but the process working on a mailbox usually reads a message, does the work for the message, sends a reply and then gets to the next message in its mailbox. Each individual process can be simple and self-contained. Explicit locking can (hopefully) be avoided by ensuring only a single process is responsible for some piece of state, and that the state is accessed by sending the responsible process a message. BEAM takes care of locking around the mailbox, so you don't need to worry about it.
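A minimal sketch of that "single process owns the state" pattern as a bare receive loop (names are illustrative); there is no locking, because the mailbox serializes all access:

```elixir
defmodule Counter do
  # The only process that ever touches `n`; all access goes through messages.
  def loop(n) do
    receive do
      {:add, k} ->
        loop(n + k)

      {:get, from} ->
        send(from, {:count, n})
        loop(n)
    end
  end
end

pid = spawn(Counter, :loop, [0])
for k <- [1, 2, 3], do: send(pid, {:add, k})

# Messages between the same two processes arrive in order,
# so :get is handled after the three :add messages.
send(pid, {:get, self()})

total =
  receive do
    {:count, n} -> n
  end
# total == 6
```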
Take this example from the article:
    def handle_call({:process, order}, _from, state) do
      customer = Customers.fetch!(order.customer_id)
      charge = PaymentGateway.charge!(customer, order.total)
      Notifications.send_confirmation!(customer, charge)
      {:reply, :ok, state}
    end
I'd assume we want PaymentGateway to commit to a DB. But there's no transactionality with notifications, hence notifications can be lost if the entire runtime goes down. For an article trying to "sell" BEAM to me, I just don't see the value.

> I guess maybe the question is why use BEAM if it also doesn't solve the general "systems are hard" problem?
I interpreted the tone of the article to mean it does solve all these problems, resulting in my general confusion as to the actual advantages. This whole actor business somewhat reminds me of the Smalltalk people saying it's all about message passing, but I just don't understand what the difference is between passing a message to an object and doing obj.function(message). At least for BEAM the whole supervisor tree seems neat, but other than that, it sounds like goroutines with channels, or just a queue in Python.
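One concrete difference, sketched with illustrative names: obj.function(message) blocks the caller until the method returns, while send/2 just enqueues the message and returns immediately; the receiver works through its mailbox on its own schedule, and any reply is itself just another message:

```elixir
worker = spawn(fn ->
  receive do
    {:work, n, from} -> send(from, {:done, n * 2})
  end
end)

# Returns immediately: no result yet, no shared stack with the worker,
# and the worker can crash without taking the caller down.
send(worker, {:work, 21, self()})

# The caller is free to do other things here, and collects the
# reply whenever it chooses (or times out).
result =
  receive do
    {:done, n} -> n
  end
# result == 42
```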
Sorry, but this is wrong. This is no kind of backpressure, as any experienced Erlang developer will tell you: doing backpressure properly is a massive pain in Erlang. By default, your system is almost guaranteed to break under pressure in random places that surprise you.
You're right in correcting the article, but I'd like to add that for probably around a decade, Erlang had 'sender punishment', which is what 'IsTom' who replied to you is probably talking about.
Ulf Wiger referred to sender_punishment as "a form of backpressure" (Erlang-questions mailing list, January 2011). 'sender punishment' was removed around 2018, in ad72a944c/OTP14667. I haven't read the whole discussion carefully, but it seems to be roughly "it wasn't clear that sender punishment solved more problems than it caused, and now that most machines are multi-core, that balance is tipped even more in favour of not having 'sender punishment'".
Is that sufficient and/or desirable backpressure, and does it provide everything your app needs? Maybe close enough for some applications?
You can also do some brute-force backpressure now: you can set a max heap size on a process, and if it uses an on-heap message queue, it should be killed if the queue gets too large. Not very graceful, but it creates some backpressure.
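A sketch of that brute-force approach (the flags are real OTP process flags; the numbers are arbitrary). With `message_queue_data: :on_heap`, the mailbox counts toward the heap, so flooding a process that never drains its queue should get it killed once a GC notices the oversized heap:

```elixir
pid = :erlang.spawn_opt(
  fn -> Process.sleep(:infinity) end,  # never reads its mailbox
  [
    {:message_queue_data, :on_heap},
    {:max_heap_size, %{size: 2_000, kill: true, error_logger: false}}
  ]
)

ref = Process.monitor(pid)
for i <- 1..100_000, do: send(pid, {:flood, i})

reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, r} -> r
  after
    10_000 -> :still_alive
  end
# Expect reason == :killed when the heap limit trips.
```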
I'm a fan of letting back pressure accrue by having clients timeout, and having servers drop requests that arrive too late to be serviced within the timeout, but you've got to couple that with effective monitoring and operations. Sometimes you do have to switch to a quick response to tell the client to try again later or other approaches.
So many threads I wanna jump in to, interesting discussions.
what would it look like if you didn't need concurrency at all - would simply having a step-by-step process be enough, e.g. using DAGs?
what would it look like if, by not letting it crash, you could simply redo the process like a traditional RDBMS does, i.e. ACID?
there are domains where OTP / BEAM are useful - but for the majority of business cases, no
"letting it crash" in BEAM terms often means "simply redo the process". The difference is you end up defining your "transaction" (to borrow database terminology) by concurrency lines. What makes it so pleasant in practice is that you take a bunch of potential failure modes and lump them into a single, unified "this task cannot be completed" failure mode, which includes ~impossible to anticipate failure states, and then only have to expressly deal with the failure modes that do have meaningful resolutions within a task.
With that understanding in mind, I'd argue that nearly all business cases benefit from the BEAM. It's mostly one-off scripts and throwaway tools that don't.
What business systems don't use concurrency in some form? I can only think of the simplest data processing tasks written for batch processing. But even every embedded system I've ever developed or worked on used concurrency. Though for older systems this was often hand rolled, and as error prone as you might expect. For newer systems (developed this century), it was often done using a task system baked into the embedded RTOS.
This fear of better languages being some massive hurdle is either unfounded, or the big tech companies paying top dollar for talent aren't getting their money's worth.
Occam added types to channels and distinguished sends/receives, which is the design also inherited by Go.
In principle you can emulate a mailbox/message queue in CSP by a sequence of processes, one per queue slot, but accounting for BEAM's weak-ish ordering guarantees might be complicated (I suppose you should allow queue slots to swap messages under specific conditions).
In other words, it is exactly a database, albeit an in-memory one.
In practice? Urgh.
It's all so cerebral and theoretical, and I'm certain the right people know how to implement it for the right tasks in the right way and it screams along.
But as yet no one has been able to give me an inkling of how it would work well for me.
I read Learn You Some Erlang for Great Good! quite a while back and loved the idea. But it just never comes together for me in practice. Perhaps I'm simply in the wrong domain for it.
What I really needed was a mentor and existing project to contribute to at work. But it's impossible to get hold of either in the areas I'm in.
Erlang is weird, it helps if you have some Lisp and Prolog background, but for a while it might get in the way of learning how OTP works.
You're not going to be able to add it.
I don't find that to be true of many other ecosystems.
We could and do have a few Rust tools and webapps.
There are a few older Python/Flask internal applications.
If I went to an org with established tools from the ecosystem then that is not a problem!
People tried to introduce threads to Node.js but there was push-back for the very reasons mentioned in this article and so we never got threads.
The JavaScript community watches, nods, and goes back to work.
Work on the BEAM started in the 1990s, over ten years before the first release of Node in 2009.