It is plausible that with another layer of in-package cache they could eliminate DRAM altogether, replacing it with ultrafast NVM. Imagine the resume/suspend speed and power savings of a machine whose state is always stored in NVM.
I'm very interested in this. Could you point out which technologies are near ready for commercialization?
My understanding is that the current cost is orders of magnitude higher per unit of storage for these new technologies compared to NAND flash or even DDR3 RAM. But of course, a dedicated fab could change that very quickly.
The issue is the cache: the data is not non-volatile until it has been written back to DRAM. Even then, you need some advance warning of a power outage for it all to work.
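To make the cache point concrete, here is a toy model in Python. It is only a sketch: the class and method names are invented for illustration, and it models a single dictionary as "cache" and another as "persistent medium" rather than any real hardware behaviour. The idea is just that a store is not durable until its line has been explicitly written back.

```python
class WriteBackCache:
    """Toy model: stores land in a volatile cache and only reach
    the persistent medium when they are explicitly flushed."""

    def __init__(self):
        self.cache = {}       # volatile: lost on power failure
        self.persistent = {}  # survives power failure

    def store(self, addr, value):
        # The line is now dirty in cache, but not yet durable.
        self.cache[addr] = value

    def flush(self, addr):
        # Write one dirty line back (loosely analogous to a
        # cache-line write-back instruction followed by a fence).
        if addr in self.cache:
            self.persistent[addr] = self.cache[addr]

    def power_fail(self):
        # Anything that only lived in the cache is gone.
        self.cache.clear()

    def load(self, addr):
        if addr in self.cache:
            return self.cache[addr]
        return self.persistent.get(addr)


def demo():
    mem = WriteBackCache()
    mem.store(0x10, "committed")
    mem.flush(0x10)                # made durable before the outage
    mem.store(0x20, "in flight")   # never flushed
    mem.power_fail()
    return mem.load(0x10), mem.load(0x20)
```

Calling `demo()` shows the flushed value surviving and the unflushed one disappearing, which is the case the early-warning signal is meant to cover.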
Unibus (bus for PDP-11 core memory systems) had an early warning signal, to give the memory controller a chance to write back the previous (destructive) read.
Memory is becoming the new disk. This could have major security implications, as memory contents are unencrypted in general.
Fortunately, Intel CPUs will have hardware support to encrypt SGX enclaves. Perhaps that support can be used for general memory access as well.
Encrypting disks or network is no problem today, but we'll need architectural changes to support full memory encryption without a performance hit.
There are alternatives. Non-volatile memory could be treated as a key/value store, or a tree, with a storage controller between the CPU and the memory device. With appropriate protection hardware, this could be accessed from user space through special instructions. That's what I thought this article indicated. But no. This is just better cache management for the OS.
It's called "single-level store" in System/38 and descendants. File access in Multics was all memory mapped.
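For anyone who hasn't used memory-mapped files, the nearest everyday equivalent is `mmap`. A minimal Python sketch, assuming an ordinary temporary file stands in for the persistent store (the function name `mapped_roundtrip` is made up for this example): the file is updated with plain memory stores, and `flush()` asks the OS to push the dirty pages out.

```python
import mmap
import os
import tempfile


def mapped_roundtrip(payload: bytes) -> bytes:
    """Write payload through a memory mapping, then read it back
    with ordinary file I/O. payload must be non-empty."""
    fd, path = tempfile.mkstemp()
    try:
        # A mapping needs a file that already has the right size.
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as mm:
            mm[:] = payload   # plain memory store, no write() call
            mm.flush()        # ask the OS to write dirty pages to the file
    finally:
        os.close(fd)
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

`mapped_roundtrip(b"hello")` returns the same bytes it was given, having never called `write()` on the file.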
There's nothing inherently rotating-disky about current filesystem APIs from the user point of view; they just provide a database interface which has a certain type of namespace system for access. The block level part is largely invisible to the FS users (modulo leaky abstractions).
It wasn't super-easy to figure out who in the grand ecosystem view of things is going to have to care about these instructions, but I guess database and OS folks.
Also, if the author reads this, the first block quote with instruction descriptions has an editing fail: it repeats the same paragraph three times (text begins "CLWB instruction is ordered only by store-fencing operations").
The non-volatile memory needs a layer of RAID-like volume management. For example, when you transfer the memory from one system to another, there should be a way to determine that the memory is inserted in the correct slots (remember there is RAID-like interleaving/striping across memory modules).
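A rough sketch of what such a slot check could look like, in Python. Everything here is hypothetical: it assumes each module carries a small label recording which interleave set it belongs to and its position in the stripe order, and the names `ModuleLabel` and `check_interleave_set` are invented for illustration, not taken from any real firmware interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleLabel:
    set_id: str    # hypothetical ID shared by all modules of one interleave set
    position: int  # this module's index within the stripe order
    set_size: int  # number of modules in the set


def check_interleave_set(slots):
    """slots: labels listed in physical slot order.
    Returns a list of problems; an empty list means the set looks intact."""
    if not slots:
        return ["no modules present"]
    problems = []
    first = slots[0]
    if len(slots) != first.set_size:
        problems.append(
            f"expected {first.set_size} modules, found {len(slots)}")
    for slot_index, label in enumerate(slots):
        if label.set_id != first.set_id:
            problems.append(
                f"slot {slot_index}: module belongs to set {label.set_id}")
        elif label.position != slot_index:
            problems.append(
                f"slot {slot_index}: holds module {label.position} of the set")
    return problems
```

With two modules labelled positions 0 and 1, inserting them in that order yields no problems, while swapping them reports both slots as wrong, so the system can refuse to assemble the striped region instead of reading scrambled data.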
Now I can do almost all of my I/O, timer and inter-process synchronization without ever entering a kernel or swapping out thread context. I've been waiting for this chip since the Z80.
I dropped this feature when I switched to a single-task stack-based machine (inspired by my adventures with GraFORTH - thank you, Paul Lutus). This ended up being my graduation project.