Concurrency can result in increased maintenance costs and complexity.
Concurrency is also not more efficient on a single core.
Concurrency can help with latency and response time.
In embedded systems in particular, concurrency is overused, which often results in bloated, complex code.
Concurrency can be more efficient even on a single core. When blocking synchronous I/O is involved, concurrency may help saturate the bandwidth with multiple in-flight requests.
Otherwise, no.
(and in general, people just throw concurrency at the problem instead of analysing whether they are in fact I/O bound or CPU bound).
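To make the single-core point concrete, here is a minimal sketch: `time.sleep` stands in for a blocking network or disk request (the 50 ms latency and request count are made-up numbers for illustration). Even with one core, keeping several requests in flight overlaps the waiting.

```python
# Sketch: overlapping blocking "I/O" with threads on a single core.
# time.sleep stands in for a blocking request; the latency is an assumption.
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_request(_):
    time.sleep(0.05)  # pretend this is a 50 ms network round trip
    return 1

N = 8

start = time.perf_counter()
for i in range(N):  # one request at a time: latencies add up
    blocking_request(i)
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N) as pool:  # N requests in flight
    list(pool.map(blocking_request, range(N)))
concurrent = time.perf_counter() - start

print(f"serial: {serial:.2f}s, concurrent: {concurrent:.2f}s")
```

The speedup comes entirely from overlapping waits, not from parallel computation, which is why it works on a single core and why it does nothing for CPU-bound work.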
(Downvoted why??)
Concurrency is usually not a feature of the algorithm but a feature of the problem domain. If you have many requests coming in at the same time, all competing for a limited amount of computational resources -- you have concurrency.
How you handle it is a different matter. You can decide to handle the requests one by one, but then, by Little's law, your throughput would be very low, and your server will crash if the rate of requests is over some small limit (which depends on the time it takes to service each request).
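Little's law says L = λ·W (requests in the system = throughput × time each request spends in the system). A quick back-of-the-envelope sketch, with purely illustrative numbers: if each request takes 125 ms to service and you handle them strictly one by one, throughput is capped far below what 50 requests in flight would allow.

```python
# Little's law: L = throughput * W, so max throughput = L / W.
# The service time and concurrency level are illustrative assumptions.
W = 0.125         # seconds to service one request
L_serial = 1      # strictly one request in the system at a time
L_concurrent = 50 # 50 requests in flight

max_throughput_serial = L_serial / W        # 8 requests/second
max_throughput_concurrent = L_concurrent / W  # 400 requests/second

print(max_throughput_serial, max_throughput_concurrent)
```

Past 8 requests/second in this example, a one-by-one server falls behind and its queue grows without bound.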
Concurrency improves performance when a process accesses both the same computational resource and some other high-latency sharable resource.
But in fact concurrency is inevitable in the absence of OS threads that can be blocked (another potential clusterfuck) or some form of continuation support, because it is a direct consequence of asynchrony.
And asynchrony isn't avoidable; all you can do is find abstractions that make it more deterministic.
We are all taught at uni how concurrency is implemented, and most applications use concurrency as a design tool to decompose a problem into its functions.
Unfortunately, this tends to produce a sub-optimal result as the inefficiencies become visible on small embedded/real-time systems.
It's difficult to give any general advice, but have a look at real-time analysis to get an idea of the real issues... and don't blindly throw tasks at a problem when a simple superloop is more efficient, simpler, and more maintainable.
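For readers who haven't met the term: a superloop (cyclic executive) is just one plain loop that polls each job in turn, with no RTOS tasks and no context switches. Real firmware would be C on bare metal; this is a Python sketch of the shape only, with made-up handler names and a bounded iteration count so it terminates.

```python
# Sketch of a "superloop" (cyclic executive): one loop polling each job
# in turn -- no tasks, no scheduler, trivially deterministic ordering.
# Handler names and the iteration cap are made up for this illustration.

events = []

def poll_sensors():
    events.append("sensor")   # stand-in for reading a peripheral register

def run_control_step():
    events.append("control")  # stand-in for updating an actuator

def service_comms():
    events.append("comms")    # stand-in for draining a UART buffer

def superloop(iterations):
    # Real firmware loops forever; bounded here so the sketch terminates.
    for _ in range(iterations):
        poll_sensors()
        run_control_step()
        service_comms()

superloop(3)
print(events)
```

The trade-off: everything shares one thread of control, so one slow handler delays all the others, which is exactly what real-time analysis is meant to check.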
This isn't a hard and fast rule. There is overhead to parallelism.
For I/O bound code, concurrency can give you some benefit.
...but in general, people do not analyse this before throwing tasks at a problem.
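One cheap way to do that analysis before reaching for tasks: compare CPU time against wall-clock time. If wall time vastly exceeds CPU time, the workload is mostly waiting (I/O bound); if the two are close, it is CPU bound. The two toy workloads below are assumptions for illustration.

```python
# Rough I/O-bound vs CPU-bound check: compare CPU time to wall time.
# The two workloads are toy stand-ins for real code.
import time

def measure(fn):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def io_ish():
    time.sleep(0.2)  # stands in for blocking I/O

def cpu_ish():
    sum(i * i for i in range(2_000_000))  # pure computation

wall, cpu = measure(io_ish)
print(f"io-ish:  wall={wall:.2f}s cpu={cpu:.2f}s")  # cpu stays near zero

wall, cpu = measure(cpu_ish)
print(f"cpu-ish: wall={wall:.2f}s cpu={cpu:.2f}s")  # cpu tracks wall
```

If the first pattern dominates, concurrency can help; if the second does, only more cores (parallelism) or a better algorithm will.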
When I grew up to be a programmer, I never had much trouble with concurrent stuff.
IMO designing concurrent programs is conceptually similar to building complex high-throughput, low-latency railway networks in the game.
You can serialize the transitions if you want, but that usually costs performance. This is especially true for distributed parallel computing, where the state is also distributed.
This is our partial correctness property
PartialCorrectness ≜ AllDone ⇒ ∃ p ∈ ProcSet : y[p] = 1
And this is the inductive invariant:
Inv ≜ PartialCorrectness ∧ ∀ p ∈ ProcSet : pc[p] ≠ "Line1" ⇒ x[p] = 1
We need to show that Inv implies PartialCorrectness (trivial), that Inv holds in the initial state, and that if it holds in any state s, then it holds in any successor state s'. It's easy to see that it holds in the initial state. Now, let's assume it holds in s, and prove it for s'. To make this transition, some process p either executes line 1 or executes line 2.

If it executes line 1, PartialCorrectness doesn't change because no new process is done, and the second conjunct holds because we've just left line 1 and x has been assigned (by the definition of line 1).

If it executes line 2, the second conjunct of the invariant doesn't change, and by the definition of this action the process will be done. Here we have two cases: either we set y to 1, or we set y to 0. If we set y to 1, we're done and PartialCorrectness holds. If we set y to 0, then by the assumption of the invariant, the process we depend on must not be done, hence AllDone is false, and PartialCorrectness holds. QED.