It also reminds me of attempts to define BUSE[0][1][2], which would have been a block device equivalent of FUSE. IIRC attempts to get BUSE into the Linux kernel have been blocked for performance reasons -- the FUSE protocol isn't well designed and is only barely acceptable for VFS.
If io_uring (+ careful use of zero-copy) has fixed the performance issues with userspace block devices, maybe it would be applicable to FUSE (or FUSE-v2)? I've tried using io_uring with the current FUSE protocol to reduce syscall overhead and it kinda works, but a protocol designed to operate in that mode from the beginning would be even better.
[0] https://github.com/acozzette/BUSE
[1] https://dspace.cuni.cz/bitstream/handle/20.500.11956/148791/...
Block devices operate on blocks of data identified by offset. Hard disks, CD-ROM drives, USB sticks, basically anything where it'd make sense to say "read (or write) these 1024 bytes at offset 0x10000".
You can in principle implement a block device-ish API in FUSE by disabling open/close and requiring all reads/writes to be at given offsets -- IIRC this is how the "fuseblk" mode added for ntfs-3g works -- but the protocol is too chatty to be fast enough for things people want block devices for.
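To make the "read these bytes at this offset" interface concrete, here's a toy sketch in Python using plain pread/pwrite on an ordinary file standing in for a device's backing store. This is just an illustration of offset-addressed I/O, not the fuseblk protocol itself; the block size and device size are made up:

```python
import os
import tempfile

BLOCK = 1024  # toy block size

# An ordinary temp file stands in for a block device's backing store.
with tempfile.TemporaryFile() as backing:
    fd = backing.fileno()
    os.ftruncate(fd, 1 << 20)  # a 1 MiB "device"

    # "write these 1024 bytes at offset 0x10000": no open/close chatter,
    # just offset-addressed I/O, which is all a block device understands.
    os.pwrite(fd, b"\xab" * BLOCK, 0x10000)

    # "read these 1024 bytes at offset 0x10000"
    data = os.pread(fd, BLOCK, 0x10000)
    assert data == b"\xab" * BLOCK
```

Every request is fully described by (offset, length, buffer), which is why a filesystem protocol full of opens, lookups, and per-file state is overkill for this job.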
I've also heard the kernel's block layer error handling doesn't interact well with the FUSE protocol, but I don't know the details there.
- <odd>.x.x: Linus went crazy, broke absolutely _everything_, and rewrote the kernel to be a microkernel using a special message-passing version of Visual Basic. (timeframe: "we expect that he will be released from the mental institution in a decade or two").
[*] https://lkml.org/lkml/2005/3/2/247
It's really interesting to see Linux getting more and more microkernel-like features throughout the years.
Basically, you can implement a virtual SAN for containers efficiently with this.
There's a reason emulators design their virtual devices to resemble real hardware (PCI, SCSI, USB) -- there's already going to be a bunch of code in the hypervisor to create fake hardware. It's also more practical to piggy-back on PCI (etc) when the spec needs to be implemented by competing vendors, since there's no kernel and no OS idioms involved. Not to mention various pre-kernel code such as EFI and bootloaders.
Conversely, userspace developers really do not want to be coding up a fake PCI device with registers and interrupts and so on just to get some bytes into the kernel. They want to invoke system calls (ioctl, mmap, io_uring) and let the OS handle the details.
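As a trivial contrast, the syscall-style interface looks something like this: a file descriptor, an mmap'd region, and ordinary reads. This is a generic sketch against a temp file, not any particular driver's API:

```python
import mmap
import tempfile

# The interface userspace developers actually want: a couple of syscalls
# and a shared mapping -- not emulated PCI registers and interrupts.
with tempfile.TemporaryFile() as f:
    f.truncate(4096)
    with mmap.mmap(f.fileno(), 4096) as mem:
        mem[0:5] = b"hello"   # write through the shared mapping
        f.seek(0)
        seen = f.read(5)      # read() observes the same bytes
assert seen == b"hello"
```

The kernel handles coherence between the mapping and read()/write() behind the scenes, which is exactly the kind of detail userspace doesn't want to reimplement as fake hardware.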
I understand the frustration of having the network driver crash, but couldn't it be run in a way that doesn't bring down the OS?
It seems to me Java would have a no-brainer case for a user-space networking option, since you're already in a VM!?
When I saturate my HTTP server the kernel takes 30% of the CPU just copying data for no good reason?!
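For what it's worth, Linux already has primitives to skip exactly that copy on the file-to-socket path, e.g. sendfile(2), which moves bytes between descriptors inside the kernel so the payload never round-trips through a userspace buffer. A minimal file-to-file sketch (Linux-specific; real servers would use a file-to-socket pair):

```python
import os
import tempfile

payload = b"x" * 65536

with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
    src.write(payload)
    src.flush()

    # The kernel copies from src to dst internally; userspace never
    # touches the payload after the initial write.
    sent = os.sendfile(dst.fileno(), src.fileno(), 0, len(payload))
    assert sent == len(payload)

    dst.seek(0)
    copied = dst.read()
assert copied == payload
```

io_uring's registered buffers and zero-copy send go further in the same direction, cutting both the copies and the per-request syscall.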
What is the whole idea, though? Serving storage to the kernel from userland is decades old and commonly done with both NFS and iSCSI. The fact that this particular implementation uses io_uring instead of a cross-vendor standard like RDMA is just an implementation detail.