The kind of model I prefer is something based on atomicity. Most applications can get by with file-level atomicity: make whole-file reads and writes atomic with a copy-on-write model, and you can eliminate whole classes of filesystem bugs pretty quickly. (Note that something like writeFileAtomic is already a common primitive in many high-level filesystem APIs, and it's easily buildable with regular POSIX APIs.) For cases like logging, you can extend the model slightly with atomic appends, where the only kind of write allowed is to atomically append a chunk of data to the file, so readers can only ever see either no new data or the entire chunk at once.
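The writeFileAtomic-style primitive mentioned above can be sketched with plain POSIX calls: write the new contents to a temporary file, fsync it, then rename(2) over the target. The function name and `.tmp` suffix here are illustrative, not any particular library's API:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the new contents to a temporary file in the same directory,
 * fsync it, then rename(2) it over the target. rename is atomic, so
 * a reader opening the path sees either the complete old file or the
 * complete new one, never a mix. */
int write_file_atomic(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", path);

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    close(fd);

    return rename(tmp, path); /* the atomic commit point */
}
```

(A production version would also handle short writes in a loop and fsync the containing directory so the rename itself is durable.)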
I'm less knowledgeable about the way DBs interact with the filesystem, but there the solution is probably ditching the concept of the file stream entirely and just treating files as a sparse map of offsets to blocks, which can be atomically updated. (My understanding is that DBs basically do this already, except that "atomically updated" is difficult with the current APIs).
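A toy in-memory model of that "sparse map of offsets to blocks" idea, using C11 atomics; all names here are made up for illustration. Each slot holds a pointer to an immutable block, and an update builds a fresh block and publishes it with one atomic pointer store, so a concurrent reader sees either the old block or the new one in full. (The hard part the comment alludes to is doing that publish step durably on disk, which current filesystem APIs don't make easy.)

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 1024

typedef struct { unsigned char data[BLOCK_SIZE]; } block_t;

/* NULL entries are holes in the sparse file */
static _Atomic(block_t *) blocks[MAX_BLOCKS];

/* Build a fresh block, then publish it with a single atomic store. */
void block_write(size_t idx, const void *src, size_t len)
{
    block_t *nb = calloc(1, sizeof *nb); /* zero-filled, like a hole */
    memcpy(nb->data, src, len < BLOCK_SIZE ? len : BLOCK_SIZE);
    block_t *old = atomic_exchange(&blocks[idx], nb); /* atomic publish */
    free(old); /* unsafe with live readers; real systems defer reclamation */
}

/* Readers observe whole blocks only: the old one or the new one. */
const block_t *block_read(size_t idx)
{
    return atomic_load(&blocks[idx]);
}
```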
int fd = open(".config", O_RDWR | O_CREAT | O_SYNC_ON_CLOSE, 0666);
// (hypothetical flag) effects of write(2)/etc. are invisible through any
// other file description until close(2) has been called on every
// descriptor referring to this file description.
close(fd);
So now you can watch for e.g. either IN_MODIFY or IN_CLOSE_WRITE (and you don't need to balance it with IN_OPEN); it doesn't matter, you'll never see partial updates... would be nice!

What happens when a lot of data is written and exceeds the dirty threshold?
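For reference, the watch described above is already expressible with today's inotify API; it's only the partial-update guarantee that the hypothetical O_SYNC_ON_CLOSE flag would add. A minimal sketch (the helper name is mine):

```c
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Block until the inotify descriptor delivers an IN_CLOSE_WRITE event,
 * i.e. some writer closed a file it had open for writing, and copy out
 * the file name. Returns 1 on success, 0 if no such event was read. */
int next_close_write(int ifd, char *name_out, size_t cap)
{
    char buf[4096];
    ssize_t n = read(ifd, buf, sizeof buf); /* blocks until events arrive */
    for (char *p = buf; p < buf + n;
         p += sizeof(struct inotify_event) + ((struct inotify_event *)p)->len) {
        struct inotify_event *ev = (struct inotify_event *)p;
        if (ev->mask & IN_CLOSE_WRITE) {
            snprintf(name_out, cap, "%s", ev->len ? ev->name : "");
            return 1;
        }
    }
    return 0;
}
```

Set it up with `int ifd = inotify_init();` followed by `inotify_add_watch(ifd, dir, IN_CLOSE_WRITE);` on the directory you care about.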
Database developers don’t want the complexity or poor performance of POSIX. It’s wild to me that we still don’t have any alternative to fsync on Linux that can act as a barrier without also flushing caches at the same time.
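To illustrate the complaint: the only portable way to order two writes today is to fsync the first before issuing the second, which buys the ordering a write-ahead log needs but also pays for a full cache flush. A sketch (function and file layout are invented for illustration):

```c
#include <fcntl.h>
#include <unistd.h>

/* Append a log record, then a data block, guaranteeing the record
 * reaches disk before the data can. The fsync in the middle is doing
 * double duty: it is the ordering barrier we want, but it also drains
 * every dirty page and stalls on the device cache flush. */
int append_with_barrier(int log_fd, const void *rec, size_t rec_len,
                        int data_fd, const void *blk, size_t blk_len)
{
    if (write(log_fd, rec, rec_len) != (ssize_t)rec_len)
        return -1;
    /* barrier + (unwanted) full flush */
    if (fsync(log_fd) != 0)
        return -1;
    if (write(data_fd, blk, blk_len) != (ssize_t)blk_len)
        return -1;
    return 0;
}
```

A barrier-only primitive would let the kernel enforce the ordering without forcing the flush to complete before the second write is even submitted.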
https://github.com/openzfs/zfs/blob/34205715e1544d343f9a6414...
Writes on ZFS cease to be atomic at around 32MB in size, if I read the code correctly.
I have many files that are several GB. Are you sure this is a good idea? What if my application only requires best effort?
> eliminate whole classes of filesystem bugs pretty quickly.
Block level deduplication is notoriously difficult.
> where the only kind of write allowed is to atomically append a chunk of data to the file
Which sounds good until you think about the complications of a block-oriented storage medium. You're stuck with read-modify-write (RMW) whether you think you're strictly appending or not.
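The arithmetic behind that point: unless the current end of file happens to land on a block boundary, an "append" has to read back the partial tail block, merge in the new bytes, and rewrite it. A small sketch with made-up helper names, assuming 4 KiB blocks:

```c
#include <stddef.h>

#define BLOCK 4096

/* Bytes of *existing* data that must be re-read and rewritten to
 * append at offset file_size: the old contents of the partial tail
 * block. Zero only when the file size is block-aligned. */
size_t rmw_bytes(size_t file_size)
{
    return file_size % BLOCK;
}

/* Number of device blocks an append of len (> 0) bytes touches. */
size_t blocks_touched(size_t file_size, size_t len)
{
    size_t first = file_size / BLOCK;
    size_t last  = (file_size + len - 1) / BLOCK;
    return last - first + 1;
}
```

So even a strictly-appending writer rewrites old data whenever `rmw_bytes` is non-zero, which is exactly where torn writes can clobber previously "committed" bytes.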
But even then, doing atomic writes of multi-gigabyte files doesn’t sound that hard to implement efficiently. Just write the data to disk first and update the metadata atomically at the end, or whenever you choose to as a programmer.
The downside is that, when overwriting, you’ll need enough free space to store both the old and new versions of your data. But I think that’s usually a good trade-off.
It would allow all sorts of useful programs to be written easily - like an atomic mode for apt, where packages either get installed or not installed. But they can’t be half installed.