The kind of model I prefer is something based on atomicity. Most applications can get by with file-level atomicity: make whole-file reads and writes atomic with a copy-on-write model, and you can eliminate whole classes of filesystem bugs pretty quickly. (Note that something like writeFileAtomic is already a common primitive in many high-level filesystem APIs, and it's easily buildable with regular POSIX APIs.) For cases like logging, you can extend the model slightly with atomic appends, where the only kind of write allowed is to atomically append a chunk of data to the file, so readers can only ever see either no new data or the entire chunk at once.
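The writeFileAtomic-style primitive mentioned above can be sketched with plain POSIX calls: write the new contents to a temporary file, fsync it, then rename(2) over the target. The function name and `.tmp` suffix here are illustrative, not any particular library's API:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the new contents to a temporary file in the same directory,
 * fsync it, then rename(2) it over the target. rename is atomic, so
 * a reader opening the path sees either the complete old file or the
 * complete new one, never a mix. */
int write_file_atomic(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    return rename(tmp, path); /* the atomic commit point */
}
```

(A production version would also handle short writes in a loop and fsync the containing directory so the rename itself is durable.)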
I'm less knowledgeable about the way DBs interact with the filesystem, but there the solution is probably ditching the concept of the file stream entirely and just treating files as a sparse map of offsets to blocks, which can be atomically updated. (My understanding is that DBs basically do this already, except that "atomically updated" is difficult with the current APIs).
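A toy in-memory model of that "sparse map of offsets to blocks" idea, using C11 atomics; all names here are made up for illustration. Each slot holds a pointer to an immutable block, and an update builds a fresh block and publishes it with one atomic pointer store, so a concurrent reader sees either the old block or the new one in full. (The hard part the comment alludes to is doing that publish step durably on disk, which current filesystem APIs don't make easy.)

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 1024

typedef struct { unsigned char data[BLOCK_SIZE]; } block_t;

/* NULL entries are holes in the sparse file */
static _Atomic(block_t *) blocks[MAX_BLOCKS];

/* Build a fresh block, then publish it with a single atomic store. */
void block_write(size_t idx, const void *src, size_t len)
{
    block_t *nb = calloc(1, sizeof *nb); /* zero-filled, like a hole */
    memcpy(nb->data, src, len < BLOCK_SIZE ? len : BLOCK_SIZE);
    block_t *old = atomic_exchange(&blocks[idx], nb); /* atomic publish */
    free(old); /* unsafe with live readers; real systems defer reclamation */
}

/* Readers observe whole blocks only: the old one or the new one. */
const block_t *block_read(size_t idx)
{
    return atomic_load(&blocks[idx]);
}
```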
int fd = open(".config", O_RDWR | O_CREAT | O_SYNC_ON_CLOSE, 0666);
// (hypothetical flag) effects of write(2)/etc. are invisible through any
// other file description until close(2) has been called on every
// descriptor referring to this file description.
close(fd);
So now you can watch for e.g. either IN_MODIFY or IN_CLOSE_WRITE (and you don't need to balance it with IN_OPEN); it doesn't matter, you'll never see partial updates... would be nice!

What happens when a lot of data is written and exceeds the dirty threshold?
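For reference, the watch described above is already expressible with today's inotify API; it's only the partial-update guarantee that the hypothetical O_SYNC_ON_CLOSE flag would add. A minimal sketch (the helper name is mine):

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Block until the inotify descriptor delivers an IN_CLOSE_WRITE event,
 * i.e. some writer closed a file it had open for writing, and copy out
 * the file name. Returns 1 on success, 0 if no such event was read. */
int next_close_write(int ifd, char *name_out, size_t cap)
{
    char buf[4096];
    ssize_t n = read(ifd, buf, sizeof buf); /* blocks until events arrive */
    for (char *p = buf; p < buf + n;
         p += sizeof(struct inotify_event) + ((struct inotify_event *)p)->len) {
        struct inotify_event *ev = (struct inotify_event *)p;
        if (ev->mask & IN_CLOSE_WRITE) {
            snprintf(name_out, cap, "%s", ev->len ? ev->name : "");
            return 1;
        }
    }
    return 0;
}
```

Set it up with `int ifd = inotify_init();` followed by `inotify_add_watch(ifd, dir, IN_CLOSE_WRITE);` on the directory you care about.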
Database developers don’t want the complexity or poor performance of POSIX. It’s wild to me that we still don’t have any alternative to fsync on Linux that can act as a barrier without also flushing caches at the same time.
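To illustrate the complaint: the only portable way to order two writes today is to fsync the first before issuing the second, which buys the ordering a write-ahead log needs but also pays for a full cache flush. A sketch (function and file layout are invented for illustration):

```c
#include <fcntl.h>
#include <unistd.h>

/* Append a log record, then a data block, guaranteeing the record
 * reaches disk before the data can. The fsync in the middle is doing
 * double duty: it is the ordering barrier we want, but it also drains
 * every dirty page and stalls on the device cache flush. */
int append_with_barrier(int log_fd, const void *rec, size_t rec_len,
                        int data_fd, const void *blk, size_t blk_len)
{
    if (write(log_fd, rec, rec_len) != (ssize_t)rec_len)
        return -1;
    /* barrier + (unwanted) full flush */
    if (fsync(log_fd) != 0)
        return -1;
    if (write(data_fd, blk, blk_len) != (ssize_t)blk_len)
        return -1;
    return 0;
}
```

A barrier-only primitive would let the kernel enforce the ordering without forcing the flush to complete before the second write is even submitted.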
https://github.com/openzfs/zfs/blob/34205715e1544d343f9a6414...
Writes on ZFS cease to be atomic at around 32MB in size, if I read the code correctly.
I have many files that are several GB. Are you sure this is a good idea? What if my application only requires best effort?
> eliminate whole classes of filesystem bugs pretty quickly.
Block level deduplication is notoriously difficult.
> where the only kind of write allowed is to atomically append a chunk of data to the file
Which sounds good until you think about the complications of a block-oriented storage medium. You're stuck with read-modify-write (RMW) whether you think you're strictly appending or not.
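The arithmetic behind that point: unless the current end of file happens to land on a block boundary, an "append" has to read back the partial tail block, merge in the new bytes, and rewrite it. A small sketch with made-up helper names, assuming 4 KiB blocks:

```c
#include <stddef.h>

#define BLOCK 4096

/* Bytes of *existing* data that must be re-read and rewritten to
 * append at offset file_size: the old contents of the partial tail
 * block. Zero only when the file size is block-aligned. */
size_t rmw_bytes(size_t file_size)
{
    return file_size % BLOCK;
}

/* Number of device blocks an append of len (> 0) bytes touches. */
size_t blocks_touched(size_t file_size, size_t len)
{
    size_t first = file_size / BLOCK;
    size_t last  = (file_size + len - 1) / BLOCK;
    return last - first + 1;
}
```

So even a strictly-appending writer rewrites old data whenever `rmw_bytes` is non-zero, which is exactly where torn writes can clobber previously "committed" bytes.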
But even then, doing atomic writes of multi-gigabyte files doesn’t sound that hard to implement efficiently. Just write the data to disk first and update the metadata atomically at the end, or whenever you choose to as a programmer.
The downside is that, when overwriting, you’ll need enough free space to store both the old and new versions of your data. But I think that’s usually a good trade-off.
It would allow all sorts of useful programs to be written easily - like an atomic mode for apt, where packages either get installed or not installed. But they can’t be half installed.