Therefore, when two processes (let's name them A and B) want to output something at the same time, the result is going to be AAAABBBB or BBBBAAAA (depending on the order in which the messages arrive in the mailbox), but never BBAABBAA or anything similar.
IMHO this is perfectly acceptable behaviour for a `map` function, since that name has gained the connotation that its purpose is to transform one 'collection' (Functor; whatever) into another, by pointwise, independent applications of the given function. Providing a function/action which breaks this independence (by writing to the same handle) breaks this implicit meaning. Heck, I'd consider it a code smell to combine interfering actions like this using a non-concurrent `map` function; I would prefer to define a separate function to make this distinction explicit, e.g.
-- Like 'map', but function invocations may interfere with each other (you've been warned!)
runAtOnce = map
When using `map` functions (which is a lot!) I subconsciously treat them as if they will be executed concurrently, in parallel, in any order. Consider that even an imperative language like JavaScript provides a separate `forEach` function, to prevent "abuses" of `map`. Even Emacs Lisp, not the most highly regarded language, provides separate `mapcar` and `mapc` functions for this reason.

With that said, I recognise that there's a problem here; but the problem seems to be 'mapping a self-interfering function'. If we try to make it non-interfering, we see that the interference is due to the use of a shared global value (`stdout`); another code smell! Whilst stdout is append-only, it's still mutable, so I'd try to remove this shared mutable state. Message passing is one alternative, where we can have each call/action explicitly take in the handle, then pass it along (either directly, or via some sort of "trampoline", like an MVar). This way we get the "concurrent from the outside, single-threaded on the inside" behaviour of actor systems like Erlang. In particular, it's easy to make sure the handle only gets passed along when we're 'finished' with it (i.e. we've written a complete "block" of output).
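A minimal sketch of that "trampoline" idea, using an MVar as the mailbox holding the handle (the names `withHandle` and `box` are mine, not from any library):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newMVar, newEmptyMVar, takeMVar, putMVar)
import System.IO (Handle, hPutStr, stdout)

-- Take the handle, write a complete block, then pass it along.
-- No other thread can write while we hold it.
withHandle :: MVar Handle -> String -> IO ()
withHandle box block = do
  h <- takeMVar box   -- become the (temporary) owner
  hPutStr h block     -- write a whole block, uninterrupted
  putMVar box h       -- pass the handle to the next caller

main :: IO ()
main = do
  box   <- newMVar stdout
  dones <- mapM (\s -> do
             d <- newEmptyMVar
             _ <- forkIO (withHandle box (concat (replicate 4 s)) >> putMVar d ())
             pure d)
           ["A", "B"]
  mapM_ takeMVar dones  -- wait for both writers to finish
```

Since each writer holds the handle for a whole block, this prints `AAAABBBB` or `BBBBAAAA` (depending on scheduling), never an interleaving.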
Surely if you can pass your IO handles to all functions that need them, you can decide on a mutexing/buffering strategy at the top of your program, wrap the standard IO interface with a delegate that does so, and pass it on. Then, for all libraries called thereafter to use it consistently isn't just a no-brainer, it's an outright given, isn't it? There's no global (impure, non-functional) handle to stdout, is there?
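A sketch of that "decide at the top, pass it down" approach, representing the delegate as a record of actions (the `Logger` type and `lockedLogger` name are mine, for illustration):

```haskell
import Control.Concurrent.MVar (newMVar, withMVar)
import System.IO (Handle, hPutStrLn, stdout)

-- The interface we pass to everything else: no direct stdout access.
newtype Logger = Logger { logLine :: String -> IO () }

-- Wrap a handle so each line is written while holding a lock,
-- implementing the mutexing strategy once, at the top of the program.
lockedLogger :: Handle -> IO Logger
lockedLogger h = do
  lock <- newMVar ()
  pure (Logger (\s -> withMVar lock (\_ -> hPutStrLn h s)))

main :: IO ()
main = do
  logger <- lockedLogger stdout
  logLine logger "whole lines, never interleaved"
```

Everything downstream only ever sees the `Logger`, so using it consistently is indeed a given.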
This distinction is often inconsequential, but this case seems to rely on how we combine those "actions" together (which, due to laziness, may happen far away from where/when the functions are called; see 'lazy IO'). For sequential IO we can combine things with Applicative and Monad, which gives us a definite order, but using these in a concurrent setting would impose too much synchronisation and determinism to be useful. I've not done enough concurrent Haskell to know how the various alternatives stack up; although I did play with Arrow many years ago, before it fell out of favour (seemingly in favour of Profunctors?).
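The sequential ordering guarantee is just the usual left-to-right sequencing of effects, e.g. via `traverse_`; the concurrent combinators (such as `mapConcurrently_` from the `async` package) deliberately drop that guarantee:

```haskell
import Data.Foldable (traverse_)

main :: IO ()
main = do
  -- Applicative/Monad sequencing runs effects left to right,
  -- so this always prints AAAABBBB.
  traverse_ putStr ["AAAA", "BBBB"]
  putStrLn ""
  -- By contrast, `mapConcurrently_ putStr ["AAAA", "BBBB"]` (from the
  -- async package) makes no ordering promise at all.
```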
Even if you do as you say, you can still bypass it by writing to stdout directly, outside of your stable buffering mechanism. But at that point the language isn't to blame.
EDIT: this is an honest question, I was shocked to read what was in the article.
Haskell's built-in string type is a list of characters. This is mostly for historical reasons, but it's also handy in education (installing extra packages is a barrier for learners; list processing is common in introductory courses, but lists are polymorphic/generic in their element type; lists of characters are a nice concrete type, which follows on easily from "hello world"); there are also arguments about the theoretical elegance of linked lists, KISS for the builtins, whether there's consensus on what the best alternative is, etc.
Anyone who cares about Haskell performance will have hit this early on, and be using a different string implementation, as mentioned in the article. In particular there's ByteString for C-like arrays of bytes, and there's Text, which stores Unicode text in a packed array with a known encoding. A (strict) ByteString is a pointer to an array plus an offset and a length, so taking substrings is just a matter of adjusting the offset and length; and a lazy ByteString is a list of such "chunks", so we can append ByteStrings cheaply by adding chunks to the list (pointing at the existing arrays). This is all perfectly safe and predictable since the data is immutable, where other languages which allow mutation might prefer to make copies of the data to reduce aliasing.
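For example, slicing a strict ByteString shares the underlying array rather than copying, and lazy ByteString append just adds chunks; a rough sketch:

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy.Char8 as BL

main :: IO ()
main = do
  let s = BS.pack "hello world"
  -- take/drop are O(1): they tweak the offset/length, no copying,
  -- which is safe because the bytes are immutable.
  print (BS.take 5 s)   -- "hello"
  print (BS.drop 6 s)   -- "world"
  -- Lazy ByteStrings are lists of such chunks, so append just
  -- conses chunks, pointing at the existing arrays.
  print (BL.append (BL.fromStrict s) (BL.pack "!"))
```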
The other aspect is the buffering mode of the handle, which is discussed a little in the article and its comments (e.g. line-based buffering, etc.).
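The buffering mode is orthogonal to all of the above, and can be set per handle, e.g.:

```haskell
import System.IO

main :: IO ()
main = do
  hSetBuffering stdout LineBuffering  -- flush on each newline
  -- Other modes: NoBuffering (flush every write) and
  -- BlockBuffering (flush when an internal buffer fills).
  putStrLn "flushed at the newline"
  hGetBuffering stdout >>= print      -- inspect the current mode
```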