Compiling Array Languages for SIMD [pdf]

(vmchale.com)

52 points by vmchale 1y ago | 13 comments

clausecker 1y ago

I had originally planned something like this for my PhD thesis, but found out that I was in way over my head. So I scaled down my ambitions a little.

Array languages and SIMD are a match made in heaven. This should be the paradigm of choice for high-performance programming, but it's unfortunately pretty obscure.

toasterlovin 1y ago

> Array languages and SIMD are a match made in heaven. This should be the paradigm of choice for high-performance programming, but it's unfortunately pretty obscure.

Huh. I kinda figured the whole point of array programming languages was that the compiler doesn't have to guess which parts of the code are inherently parallel.

jacoblambda 1y ago

As someone who is by no means an expert: you are half right. The compiler doesn't have to guess which parts are parallel, and it's very clear which ops are parallelisable, but how you parallelise them is the name of the game.

For example, if you follow the pattern "do a small op to each part of a large block of data, then do another small op to each part of that block of data, etc.", then at least with CPU SIMD (e.g. AVX) you end up memory-bound, because every pass streams the whole block through memory again.

However, if you can do a bunch of ops on the same small blocks of data before moving on to the next blocks in your overall large block, then those small blocks can fit inside the L1 cache (or directly in registers), and that can run the CPU to its absolute limit.

Hence it becomes a game of scheduling. You already know what you need to optimise, but actually doing so gets really hard really fast. That said, tools like MLIR (which are still very new) are making this easier to approach.


vmchale (OP) 1y ago

>it's unfortunately pretty obscure

NumPy is partly inspired by APL and its descendants. It's one of the few places where programmers commonly get the performance the hardware affords!

fulafel 1y ago

Indeed. It's sad how GPU programming is mostly stuck in the dark ages of C/C++ (well, worse than C++, with proprietary, mutually incompatible variants and buggy software stacks). We have Futhark at least...

vmchale (OP) 1y ago

Aaron Hsu has an APL compiler targeting GPUs that gets tantalizing performance in machine learning: https://dl.acm.org/doi/10.1145/3589246.3595371

fuhsnn 1y ago

It's about the author's language, Apple [1]. It took me a few seconds to figure that out, since "Apple Array System" unfortunately sounds like some macOS framework.

For people who prefer C-like syntax, there is ispc [2], which supports x86 AVX and ARM Neon programming via LLVM.

[1] https://github.com/vmchale/apple

[2] https://github.com/ispc/ispc

ldbeth 1y ago

It does use arm64 and Neon instructions, and talks a bit about macOS-specific Accelerate.framework functions, so it's not totally unrelated.

vmchale (OP) 1y ago

One of my blog posts was posted and got some comments. This is a more refined take in the vein of "C is Not Suited to SIMD".

ldbeth 1y ago

A more fitting title would probably be "Type-directed optimization for SIMD". I don't see it as particularly useful for array languages at large, since many of them, such as APL and J, are untyped by intentional choice.
