Intel proposes XeGPU dialect for LLVM MLIR (opens in new tab)

(discourse.llvm.org)

90 pointsartagnon2y ago14 comments

14 comments

Inference is going to be interesting in 2025.

By that time we will have a good number of MI300 hosts. AMD Strix Halo (and the Intel equivalent?) will be out for high memory jobs locally. Intel Falcon Shores and who knows will finally be coming out, and from the looks of it the software ecosystem will be at least a little more hardware agnostic.

cyanydeez2y ago

https://www.amd.com/en/products/apu/amd-ryzen-7-7840hs this has existed since 2023 with "Neural Processing Unit".

Seems like if you want to catch the wave, it's really already here. Not sure what this thing can do, and hope to find out next year, but local AI is going to be a killer app.

brucethemoose22y ago

The NPU is reportedly slower than the IGP (just like in Apple devices) and restricted in what it can do. Both it and the GPU are bottlenecked by the 128 bit memory bus anyway. The 7840HS is just not fast in generative AI, even in the best case scenario.

Halo Strix's memory bus will be twice as wide, higher speed, and the GPU will be much bigger. It will be closer to a small GPU with a huge VRAM pool, rather than a dreadfully slow IGP.

CalChris2y ago

“XeGPU dialect provides an abstraction that closely models Xe instructions.”

How is that an abstraction? It sounds more like a representation.

viksit2y ago

could someone eli5 about what this means for engineers working on systems from an app perspective / higher level perspective?

(have worked extensively with tf / pytorch)

zer0zzz2y ago

When the core compiler infrastructure generalizes more of the GPU support bits, it means you can pay less for hardware because the software stack is less coupled with one vendor’s hardware.

gavinray2y ago

Absolutely nothing, this is a "sausage being made" sort of thing.

JonChesterfield2y ago

Weird when there's no codegen for it in llvm. I guess the idea is to use MLIR with a toolchain built from intel's GitHub.

westurner2y ago

https://discourse.llvm.org/t/rfc-add-xegpu-dialect-for-intel... :

> XeGPU dialect models a subset of Xe GPU’s unique features focusing on GEMM performance. The operations include 2d load, dpas, atomic, scattered load, 1d load, named barrier, mfence, and compile-hint. These operations provide a minimum set to support high-performance MLIR GEMM implementation for a wide range of GEMM shapes. XeGPU dialect complements Arith, Math, Vector, and Memref dialects. This allows XeGPU based MLIR GEMM implementation fused with other operations lowered through existing MLIR dialects.

1 more reply

KingLancelot2y ago

Not the way to do this.

Accelerators already have a common middle layer.

https://discourse.llvm.org/t/rfc-introducing-llvm-project-of...

superb_dev2y ago

From the thread it looks like AMD and Nvidia have something similar, maybe Intel is wants feature parity

ColonelPhantom2y ago

It looks like offload is mostly about generic abstractions for accelerator devices. XeGPU on the other hand seems to be a way of encoding Xe-specific hardware features/instructions, so pretty different? I don't see how these are anything but orthogonal.

gardenfelder2y ago

Direct link: https://hai.stanford.edu/news/how-well-do-large-language-mod...

mplewis9z2y ago

You replied to the wrong topic.

j / k navigate · click thread line to collapse

14 comments

brucethemoose22y ago

Inference is going to be interesting in 2025.

cyanydeez2y ago

https://www.amd.com/en/products/apu/amd-ryzen-7-7840hs this has existed since 2023 with "Neural Processing Unit".

Seems like if you want to catch the wave, it's really already here. Not sure what this thing can do, and hope to find out next year, but local AI is going to be a killer app.

brucethemoose22y ago

Halo Strix's memory bus will be twice as wide, higher speed, and the GPU will be much bigger. It will be closer to a small GPU with a huge VRAM pool, rather than a dreadfully slow IGP.

CalChris2y ago

“XeGPU dialect provides an abstraction that closely models Xe instructions.”

How is that an abstraction? It sounds more like a representation.

viksit2y ago

could someone eli5 about what this means for engineers working on systems from an app perspective / higher level perspective?

(have worked extensively with tf / pytorch)

zer0zzz2y ago

When the core compiler infrastructure generalizes more of the GPU support bits, it means you can pay less for hardware because the software stack is less coupled with one vendor’s hardware.

gavinray2y ago

Absolutely nothing, this is a "sausage being made" sort of thing.

JonChesterfield2y ago

Weird when there's no codegen for it in llvm. I guess the idea is to use MLIR with a toolchain built from intel's GitHub.

westurner2y ago

https://discourse.llvm.org/t/rfc-add-xegpu-dialect-for-intel... :

1 more reply

KingLancelot2y ago

Not the way to do this.

Accelerators already have a common middle layer.

https://discourse.llvm.org/t/rfc-introducing-llvm-project-of...

superb_dev2y ago