CXL 3.1 was the first version of the spec to add any way for a host CPU to share its own memory with other hosts (host-to-host) and itself take part in RDMA. It seems like it's not going to look exactly like any other CXL memory device, so it'll take some effort before other hosts, or even the local host, can take advantage of it. https://www.servethehome.com/cxl-3-1-specification-aims-for-...
Now work on the bandwidth.
A single HBM3 module has the bandwidth of half-a-dozen data center grade PCIe 5.0 x16 NVME drives.
A single DDR5 DIMM has the bandwidth of a pair of PCIe 5.0 x4 NVME drives.
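For a back-of-envelope check on those ratios, here's a sketch using nominal peak figures (DDR5-4800 and 6.4 Gb/s HBM3 are my assumed speed grades; real drives rarely saturate their links):

```python
# Peak theoretical bandwidths, back-of-envelope.
# PCIe 5.0: 32 GT/s per lane with 128b/130b encoding.
pcie5_lane = 32 * 128 / 130 / 8      # ~3.94 GB/s per lane
pcie5_x4 = pcie5_lane * 4            # ~15.8 GB/s (typical NVMe drive link)
pcie5_x16 = pcie5_lane * 16          # ~63 GB/s

# DDR5-4800 DIMM: 4800 MT/s on a 64-bit (8-byte) channel.
ddr5_dimm = 4800e6 * 8 / 1e9         # 38.4 GB/s

# HBM3 stack: 6.4 Gb/s per pin on a 1024-bit interface.
hbm3_stack = 6.4 * 1024 / 8          # 819.2 GB/s

print(f"PCIe 5.0 x4:  {pcie5_x4:6.1f} GB/s")
print(f"DDR5 DIMM:    {ddr5_dimm:6.1f} GB/s  (~{ddr5_dimm / pcie5_x4:.1f} x4 links)")
print(f"HBM3 stack:   {hbm3_stack:6.1f} GB/s  (~{hbm3_stack / pcie5_x16:.1f} x16 links)")
```

These are link peaks; actual drive throughput is lower, so the multiples versus real drives are larger still.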
If you really wanted very low latency you needed Optane DIMMs. That was problematic because you typically wanted the motherboard loaded with RAM, and it was complex to figure out how to use those DIMMs: the regions of memory that would be slower but persistent. Using the DIMMs was hard.
But Optane existed as a damned fine NVMe product too! The main downside was that latency wasn't as good because it sat behind PCIe. CXL could remove this penalty and potentially make it look more like RAM that's a NUMA hop away, which would be grand. That's not really required to use Optane well, since one can still get epic IOPS at incredibly consistent low latency, but if you do have a latency-sensitive demand it certainly can help!
Poor Optane. I have a hard time understanding how something of such excellent value floundered so. In truth there aren't that many people who need many drive-writes-per-day, but even if you didn't, the promise was that this drive should last you a very, very long time because it had such endurance. That long-term sustainability seemed like an incredible value we simply failed to recognize and tap.
CXL changes the game because of its cache-coherency protocol. You don't have to care, precisely because the hardware deals with coherence transparently. It's just one giant address space: you don't need slow OS-level page faults or page-table updates every time something is loaded or unloaded from memory.
The biggest problem with persistent memory is building an application with transactional semantics. All the hardware and software transactional memory is built around concurrency, not persistence. When you think about it, that's kind of backwards. Persistent memory has very loose performance requirements, since I/O is assumed to be slow. Meanwhile, parallelism and concurrency are about increasing performance, so it defeats the point if the transactional machinery ends up slower than going without.
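To make the ordering problem concrete, here's a minimal undo-log transaction sketch (my own illustration, not from the thread). On real persistent memory you'd flush individual cache lines (CLWB) and fence between steps; here an mmap'd ordinary file with `mmap.flush()` as the durability point stands in for those ordering barriers. The layout and names are invented:

```python
import mmap
import struct
import tempfile

# Invented layout: 8-byte undo record, 1-byte undo-valid flag, 8-byte data slot.
LOG_OFF, LOG_VALID, DATA_OFF = 0, 8, 16

def tx_store(m, new_value):
    """Transactionally overwrite the data slot: log first, then write, then commit."""
    old = struct.unpack_from("<q", m, DATA_OFF)[0]
    struct.pack_into("<q", m, LOG_OFF, old)  # 1. record old value in undo log
    m[LOG_VALID] = 1
    m.flush()                                # ordering point: log durable before data
    struct.pack_into("<q", m, DATA_OFF, new_value)
    m.flush()                                # 2. new value durable
    m[LOG_VALID] = 0                         # 3. commit: invalidate undo record
    m.flush()

def recover(m):
    """If we crashed mid-transaction, the undo flag is still set: roll back."""
    if m[LOG_VALID]:
        struct.pack_into("<q", m, DATA_OFF,
                         struct.unpack_from("<q", m, LOG_OFF)[0])
        m[LOG_VALID] = 0
        m.flush()

with tempfile.TemporaryFile() as f:
    f.truncate(4096)
    m = mmap.mmap(f.fileno(), 4096)
    tx_store(m, 42)
    recover(m)  # no-op here: the transaction committed
    result = struct.unpack_from("<q", m, DATA_OFF)[0]
    m.close()
print(result)
```

The point of the three flushes is exactly the awkwardness being described: every durability boundary is an explicit, slow ordering step, which is the opposite of what concurrency-oriented transactional memory optimizes for.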
It may have been enough if just one other company had managed to create or license something like Optane, even if both companies would have been stuck with large overcapacity for a long time.
Combined with the fact that Intel created both CXL and Optane, it stands to reason that the plan was to combine them eventually. Unfortunately, that never came to pass :(
On the consumer side, you're right, using system RAM is probably a better approach, since most consumer motherboards route NVMe storage up to the CPU interconnect and then back "down" to the GPU (or worse, through the "southbridge" chipset(s), as on X570), so you take that hit anyway.
However, if you have a PCIe switch on board that allows data to flow directly from storage to GPU without a round trip through the CPU, then NVMe/CXL/SCM modules would theoretically be better than system RAM. It depends on the switch, retimers, muxing, topology, etc.
Regardless of what you're using for direct storage and how ideal your topology is, throughput over PCIe is significantly lower than onboard VRAM (be it GDDR or especially HBM). That doesn't make it useless by any means, but it's important to point out that this doesn't turn a 20GB VRAM card into a 2.02TB VRAM card just because you DirectStorage'd a 2TB drive to it, no matter how ideal the setup is. However, as PCIe bandwidth increases and storage-class-memory devices (and storage tech in general) continue to improve, it's rapidly becoming more viable. On PCIe Gen 3 you're probably shooting yourself in the foot; on PCIe Gen 6 you can realistically see a very real benefit. But again, there's a lot of "depends" here, and for now you're probably better off buying a bigger GPU, or more of them, if you're not on the cutting edge with the corporate credit line.
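Some rough peaks behind that Gen 3 vs. Gen 6 point (a sketch: the link figures are theoretical maxima ignoring protocol overhead, and the VRAM figures are published specs for an RTX 3090 and an H100 SXM, chosen only as examples):

```python
# Peak x16 link bandwidth per PCIe generation vs. typical on-card VRAM (GB/s).
# Link numbers are theoretical peaks; PCIe 6.0 ignores FLIT overhead.
GBps = {
    "PCIe 3.0 x16": 8 * 128 / 130 / 8 * 16,    # ~15.8 (128b/130b encoding)
    "PCIe 5.0 x16": 32 * 128 / 130 / 8 * 16,   # ~63
    "PCIe 6.0 x16": 64 / 8 * 16,               # 128, pre-overhead
    "GDDR6X (RTX 3090)": 936,                  # published spec
    "HBM3 (H100 SXM)": 3350,                   # published spec
}
for name, bw in GBps.items():
    print(f"{name:20s} {bw:7.1f} GB/s  ({bw / GBps['PCIe 3.0 x16']:5.1f}x Gen 3)")
```

Even at Gen 6, the link is still well over an order of magnitude behind HBM, which is why this helps with tiering and streaming rather than substituting for VRAM.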