
kenjackson 15y ago

So here's a typical problem that I'd have in an HPC application. I'd have some space, represented by some 3D structure (maybe an array, or even an object-based particle system). I need to do some computation over this space -- often using some type of stencil, so in order to compute the value at <x,y,z> I need the values at coordinates some distance from <x,y,z>.
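To make the stencil idea concrete, here is a minimal single-process sketch. The 7-point averaging stencil, the nested-list grid, and the names `stencil_average` and `apply_stencil` are illustrative assumptions, not taken from any particular application.

```python
# Hypothetical 7-point stencil: a cell plus its six face neighbours.
OFFSETS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0),
           (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def stencil_average(grid, x, y, z):
    """Average the value at <x,y,z> with its six face neighbours."""
    total = sum(grid[x + dx][y + dy][z + dz] for dx, dy, dz in OFFSETS)
    return total / len(OFFSETS)


def apply_stencil(grid):
    """Return a new grid with the stencil applied to every interior cell.

    Boundary cells are copied unchanged, because their neighbours would
    live on another processor in a distributed run.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[grid[x][y][z] for z in range(nz)]
            for y in range(ny)] for x in range(nx)]
    for x in range(1, nx - 1):
        for y in range(1, ny - 1):
            for z in range(1, nz - 1):
                out[x][y][z] = stencil_average(grid, x, y, z)
    return out
```

The boundary cells that this sketch skips are exactly the ones that need data from a neighbouring processor.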

The part that often ends up being tricky is the fact that I need to send data from processor A to processor B. And I want to send as little data as possible. So one of the first sources of bugs is that when I do my gather-scatter I make a mistake mapping a value to a coordinate. In shared memory you never have to do this mapping back and forth, so it's not an issue.
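One way to reduce that class of mapping bug is to have the packing (gather) and unpacking (scatter) sides share a single function that defines the coordinate order. The sketch below assumes a flat, row-major array and sends one x-face; `flat_index`, `face_coords`, `pack_face`, and `unpack_face` are hypothetical helper names, and no real message-passing library is involved.

```python
def flat_index(x, y, z, dims):
    """Map <x,y,z> to an offset in a flat, row-major array."""
    _, ny, nz = dims
    return (x * ny + y) * nz + z


def face_coords(x, dims):
    """Coordinates of the face at a fixed x, in one agreed-upon order."""
    _, ny, nz = dims
    return [(x, y, z) for y in range(ny) for z in range(nz)]


def pack_face(data, dims, x):
    """Gather: copy only the face at x into a contiguous send buffer."""
    return [data[flat_index(cx, cy, cz, dims)]
            for cx, cy, cz in face_coords(x, dims)]


def unpack_face(buffer, data, dims, x):
    """Scatter: write a received buffer into the face at x."""
    coords = face_coords(x, dims)
    if len(buffer) != len(coords):
        raise ValueError("buffer size does not match face size")
    for value, (cx, cy, cz) in zip(buffer, coords):
        data[flat_index(cx, cy, cz, dims)] = value
```

Because both directions iterate over `face_coords`, a change to the ordering cannot be applied to one side and forgotten on the other.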

The next issue is that I never want to block waiting for data. There are a variety of models for handling this. I can do a non-blocking receive and do some other work while waiting for the data to arrive. This is often another source of bugs: people will do work that depends on the new data but chug along without it. They add the new data when it arrives, and alas, their computation is already hosed.
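The safe version of that overlap can be sketched as follows: start the receive, compute only the cells that do not depend on the incoming data, then wait before touching the cells that do. This is a toy 1D example in which a thread stands in for the non-blocking receive; `fetch_halo`, its fixed return values, and `smooth_step` are assumptions made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_halo():
    """Stand-in for a non-blocking receive of the two neighbour values."""
    time.sleep(0.05)  # pretend the data is in flight
    return 0.0, 5.0   # hypothetical (left, right) halo values


def smooth_step(local):
    """3-point average over a local chunk (needs at least two cells)."""
    n = len(local)
    new = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_halo)  # start the "receive"

        # Interior cells depend only on local data, so this work is safe
        # to do while the halo is still in flight.
        for i in range(1, n - 1):
            new[i] = (local[i - 1] + local[i] + local[i + 1]) / 3

        # Edge cells depend on the halo: wait for it before computing.
        left, right = pending.result()
        new[0] = (left + local[0] + local[1]) / 3
        new[-1] = (local[-2] + local[-1] + right) / 3
    return new
```

The bug described above corresponds to computing `new[0]` or `new[-1]` before `pending.result()` has returned.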

And the last common error in this case is handing the data off to the wrong object (or processor) or being confused as to which data you're receiving at any given point in time.
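A common defence is to label every message with its sender and a tag, and to route on that label instead of on arrival order. The sketch below is a rough illustration under that assumption; the `Message` tuple and the `route` function are hypothetical and not part of any real message-passing API.

```python
from collections import namedtuple

# Hypothetical envelope: who sent it, what it is, and the data itself.
Message = namedtuple("Message", ["source", "tag", "payload"])


def route(messages, expected):
    """Deliver each payload to the buffer named for its (source, tag).

    `expected` maps (source, tag) pairs to buffer names. Unknown or
    missing messages raise instead of being silently misfiled.
    """
    buffers = {}
    for msg in messages:
        key = (msg.source, msg.tag)
        if key not in expected:
            raise ValueError(f"unexpected message from {key}")
        buffers[expected[key]] = msg.payload
    missing = set(expected.values()) - set(buffers)
    if missing:
        raise ValueError(f"never received: {sorted(missing)}")
    return buffers
```

With this in place, messages that arrive out of order still end up in the right buffer, and a message nobody asked for fails loudly.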

Now all of these can be handled by simply being careful, and using some good programming practices. But they are just simple, if not grossly naive, examples of issues you have with traditional message passing that don't exist in shared memory.


gruseom 15y ago

Interesting; thanks.

What you're describing sounds to me like the complexity of ferrying data around and scheduling computations is being offloaded to the app programmer. Presumably the intent behind things like OpenMP on clusters is to take care of all that behind the scenes and let the user pretend that it's all shared and program accordingly. Is that correct? If so, how far would you say such distributed infrastructure has gotten to date? Is it usable for real work, or do people end up having to learn so many limitations and workarounds that they're no better off than programming against the lower-level model in the first place?

Another question: even when there is shared memory you still have to coordinate the various processes that are operating concurrently on it so they don't clobber each other, and that, as everyone knows, is complicated too. So there is a tradeoff here. It sounds like your point is that given a choice, the HPC community would rather program against shared memory using traditional concurrency mechanisms (threads, locks, etc.) than deal with the complexities of the alternatives. Am I reading you correctly? If so, that's a pretty major point which suggests that the general-purpose programming community may be gearing up for a wild goose chase.
