undefined | Better HN

0 pointsnaikrovek4y ago0 comments

at that point just use a RAM disk and periodically write that data to physical disk or SSD. no extreme tradeoff required, because RAM disks are WAY faster than SSDs.

manhandling /dev/nvme0 seems equally likely to corrupt data in the event of a power failure.

0 comments

wtallis4y ago

> manhandling /dev/nvme0 seems equally likely to corrupt data in the event of a power failure.

If we make the reasonable assumption that this subthread is discussing a server use case, then we can assume that the SSD is tolerant of power failures and has the capacitors necessary to finish any cached writes it has reported as complete. Thus, having fewer layers between the hardware and the application means there are fewer opportunities for some layer to lie to those above it about whether the data has made it to persistent storage.

Whether or not you're bypassing large parts of the operating system's IO stack, the application needs to have a clear idea of what data needs to be flushed to persistent storage at what times in order to properly survive unexpected power loss without unnecessary data loss or corruption.

10000truths4y ago

> at that point just use a RAM disk and periodically write that data to physical disk or SSD. no extreme tradeoff required, because RAM disks are WAY faster than SSDs.

A storage application that need to bypass the filesystem will already be implementing its own caching system anyways. The idea is to persist the data to maintain durability without sacrificing latency.

> manhandling /dev/nvme0 seems equally likely to corrupt data in the event of a power failure.

That is what O_SYNC flag is for.

natmaka4y ago

Given enough RAM on a Linux machine one may use tmpfs, which maintains a RAM disk and at any moment only uses the amount of RAM needed, with a pre-defined limit.

On PostgreSQL create an adequately-caped tmpfs, create a TABLESPACE on it, then store temporary tables into this TABLESPACE. No SSD (I have access to) beats this. Hint: before shutting PG down you may DROP this TABLESPACE.

It also is useful for a blockchain, amazingly fast (and a relief for HDDs), in most cases alleviating the need for a SSD. Place the blockchain file(s) on the tmpfs mount. Before machine shutdown stop any blockchain-using software, then store a compressed copy of the blockchain file(s) on permanent storage (I use "zstd -T0 --fast"...), and upon reboot restore it on the tmpfs mount. If anything fails the blockchain-writing software will re-download any missing block.

adrian_b4y ago

While tmpfs can be very useful even as it is, users must beware that copying a file from another Linux file system to tmpfs can lose a part of the file metadata, without giving any warnings or errors.

The main problem is that copying a file to tmpfs will drop extended attributes. Old versions of tmpfs dropped all extended attributes, modern versions of tmpfs keep some security-related extended attributes, but they still drop any user-defined extended attributes.

Old versions of tmpfs truncated some high-resolution timestamps, e.g. those coming from xfs, but I do not know if this still happens on modern versions of tmpfs.

Before learning these facts, I could not understand while some file copies lost parts of their metadata, after being copied via /tmp between 2 different users, on a multi-user computer where /tmp was mounted on tmpfs.

Now that I know, when I have to copy a file via tmpfs, I have to make a pax archive, which preserves file metadata. Older tar archive formats may have the same problems like tmpfs.

raffraffraff4y ago

Isn't this extremely dangerous? Disk write caches aren't used most of the time, except on battery backed HBAs. And databases are typically configured to use O_DIRECT for a reason: COMMITs are supposed to be durable. We had this fight at a previous company when an engineer based database server hardware recommendation on a dangerously misconfigured database server, and did not consider the effect of caches. As soon as a safe configuration was used in production, performance dropped off a cliff, particularly on random IO. So the question we had to ask was: do you want to trade durability for performance? Or do you now have to carve up your databases into shards that fit the IO performance characteristics of the badly chosen servers you purchased, and waste rack space and CPU power?

barrkel4y ago

Parent is talking about temporary tables. Those are normally only live for the duration of a transaction (well, session, but in practice if you're using temporary tables across multiple transactions you have a logical application-level transaction which needs to be able to handle failure part-way through). After your transaction the writes to non-temporary tables should be persistent.

Postgres temp tables on ramdisk are a problem for a different reason, the WAL, as pointed to by a sibling comment.

2 more replies

polskibus4y ago

Could you relate your day experience to 2ndquandrant's (contradictory?) advice?

https://www.2ndquadrant.com/en/blog/postgresql-no-tablespace...

natmaka4y ago

TEMPORARY tables are UNLOGGED, and therefore they aren't WALed

See https://www.postgresql.org/message-id/CAB7nPqTkZvESuZ3qcN_Tj...

j / k navigate · click thread line to collapse

0 comments

wtallis4y ago

> manhandling /dev/nvme0 seems equally likely to corrupt data in the event of a power failure.

10000truths4y ago

> at that point just use a RAM disk and periodically write that data to physical disk or SSD. no extreme tradeoff required, because RAM disks are WAY faster than SSDs.

> manhandling /dev/nvme0 seems equally likely to corrupt data in the event of a power failure.

That is what O_SYNC flag is for.

natmaka4y ago

Given enough RAM on a Linux machine one may use tmpfs, which maintains a RAM disk and at any moment only uses the amount of RAM needed, with a pre-defined limit.

adrian_b4y ago

Old versions of tmpfs truncated some high-resolution timestamps, e.g. those coming from xfs, but I do not know if this still happens on modern versions of tmpfs.

Now that I know, when I have to copy a file via tmpfs, I have to make a pax archive, which preserves file metadata. Older tar archive formats may have the same problems like tmpfs.

raffraffraff4y ago

barrkel4y ago

Postgres temp tables on ramdisk are a problem for a different reason, the WAL, as pointed to by a sibling comment.

2 more replies

polskibus4y ago

Could you relate your day experience to 2ndquandrant's (contradictory?) advice?

https://www.2ndquadrant.com/en/blog/postgresql-no-tablespace...

natmaka4y ago

TEMPORARY tables are UNLOGGED, and therefore they aren't WALed

See https://www.postgresql.org/message-id/CAB7nPqTkZvESuZ3qcN_Tj...

j / k navigate · click thread line to collapse