- Use sane defaults for pool creation. ashift=12, lz4 compression, xattr=sa, acltype=posixacl, and atime=off. Don't even ask me.
- Make encryption just on or off instead of offering five or six options
- Generate the encryption key for me, set up the systemd service to decrypt the pool at start up, and prompt me to back up the key somewhere
- `zfs list` should show if a dataset is mounted or not, if it is encrypted or not, and if the encryption key is loaded or not
- No recursive datasets and use {pool}:{dataset} instead of {pool}/{dataset} to maintain a clear distinction between pools and datasets.
- Don't make me name pools or snapshots. Assign pools the name {hostname}-[A-Z]. Name snapshots {pool name}_{datetime created} and give them numerical shortcuts so I never have to type that all out
- Don't make me type disk IDs when creating pools. Store metadata on the disk so ZFS doesn't get confused if I set up a pool with `/dev/sda` and `/dev/sdb` references and then shuffle around the drives
- Always use `pv` to show progress
- Automatically set up weekly scrubs
- Automatically set up hourly/daily/weekly/monthly snapshots and snapshot pruning
- If I send to a disk without a pool, ask for confirmation and then create a new single disk pool for me with the same settings as on the sending pool
- collapse `zpool` and `zfs` into a single command
- Automatically use `--raw` when sending encrypted datasets, default to `--replicate` when sending, and use `-I` whenever possible when sending
- Provide an obvious way to mount and navigate a snapshot dataset instead of hiding the snapshot filesystem in a hidden directory
Naming pools after hostnames: I have pools on a SAN which can be imported by more than one host.
Weekly scrubs, periodic snapshots, periodic pruning: This is really the job of the OS' scheduler (an equally opinionated view, I admit)
collapsing zpool and zfs commands - sure but why? so you can have zfs -pool XXXX and zfs -volume XXXX?
No recursive datasets? I have use cases where it's very useful.
`zfs list` should show if a dataset is mounted or not, if it is encrypted or not, and if the encryption key is loaded or not: Fully agree!
Don't make me type disk IDs when creating pools: You can address them in 3-4 different ways (by id, by WWN, by label, by sdX etc), and you have to specify in _some_ way which disks you want to go there, so not sure what's the point here.
Store metadata on the disk so ZFS doesn't get confused if I set up a pool with `/dev/sda` and `/dev/sdb` references and then shuffle around the drives: Already happening. Swap a few drives around and import the pool, it will find them.
Some of your suggestions are genuinely OK, at least as defaults, but some indicate you aren't really considering much outside your own usage pattern and needs. ZFS caters to a lot more people than you.
I think zpool is unnecessary as an additional command. For example, `zfs scrub`, `zfs destroy [pool | dataset]`, `zfs add`, `zfs remove` would all have clear meanings. There may be a couple commands that would need explicit disambiguation with a flag like `zfs create`.
And under the OP's proposal, those people would continue to use ZFS entirely unaffected. The OP wasn't proposing changing the behaviour of ZFS, but rather "wrapping" this defined set into a well-defined recipe which could be used by people who aren't so opinionated.
This "dumbed down wrapper" wouldn't even need to be called ZFS, to avoid confusion. Personally I'd like to propose the name ZzzFS: which is ZFS made so simple you can do it in your sleep...
To be fair, that's not ZFS's problem, that is your problem for not keeping up with the times. PEBKAC.
For quite some time now, Linux has had fully-qualified references, e.g.: `/dev/disk/by-id/ata-$manufacturer-$serial-$whatever`
That is what you should be using when building your pools.
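In practice that looks something like this (pool name and device IDs here are made up; list your own with `ls -l /dev/disk/by-id/`):

```shell
# Build a mirrored pool from stable by-id paths instead of sdX names.
# The by-id symlinks survive reboots and drive reordering.
zpool create tank mirror \
  /dev/disk/by-id/ata-VendorX_ModelY_SERIAL001 \
  /dev/disk/by-id/ata-VendorX_ModelY_SERIAL002
```

`zpool status` then shows the stable identifiers too, which makes it much easier to work out which physical disk has failed.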
> Don't make me name [...] snapshots.
You might like this little tool I wrote: https://github.com/rollcat/zfs-autosnap
You put "zfs-autosnap snap" in cron hourly (or however often you want a snapshot), and "zfs-autosnap gc" in cron daily, and it takes care of maintaining a rolling history of snapshots, per the retention policy.
It's not hard writing simple ZFS command wrappers, feel free to take my code and make your own tools.
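Assuming the two subcommands described above, the cron side is just a couple of entries (file path, install path, and schedule are only an example):

```shell
# /etc/cron.d/zfs-autosnap (hypothetical file; adjust paths to your install)
0 * * * *   root  /usr/local/bin/zfs-autosnap snap   # hourly snapshot
30 3 * * *  root  /usr/local/bin/zfs-autosnap gc     # daily pruning per retention policy
```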
I ended up finishing neither, and should pick them back up!
(I snapshot in big chunks with xargs to try to minimise temporal smear - snapshots created in the same `zfs snapshot` command are atomic)
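A minimal sketch of that xargs pattern, assuming a pool named `tank`; every snapshot created by the single `zfs snapshot` invocation is taken atomically, which is the whole point of batching:

```shell
# Snapshot all filesystems under tank in one atomic zfs call.
ts="$(date +%Y-%m-%d-%H:%M)"
zfs list -H -o name -t filesystem -r tank \
  | sed "s/\$/@${ts}/" \
  | xargs zfs snapshot
```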
At $DAYJOB I wrote a bunch of scripts to mechanize building ZFS arrays for whatever expected deployment I'd imagined on that day. Among the tasks was to make LUKS-encrypted volumes on which to put the zvols, standardize the naming schemes, sane defaults like ashift=12, lz4 compression, etc. (This was well before encryption was part of ZFS; I haven't updated the scripts to support encryption in ZFS since it's not really been a problem this way.)
I don't remember many of these flags now, but have a script as reference for documentation, and others on the team don't need to know much about ZFS besides run make-zfs-big-mirror or make-big-zfs-undundant-raid0 and magic happens.
Eventually maybe even that stuff will be automated away by our provisioning, if we ever are in a position to provision systems more than 20 times per year.
Not sure why, and I should probably make the test reproducible.
The ones I find most personally objectionable:
> - Don't make me name pools or snapshots. Assign pools the name {hostname}-[A-Z]. Name snapshots {pool name}_{datetime created} and give them numerical shortcuts so I never have to type that all out
Not naming pools is just bonkers. You don't create pools often enough to not simply name them.
Re: not naming snapshots, you could use `httm` and `zfs allow` for that[0]:
$ httm -S .
httm took a snapshot named: rpool/ROOT/ubuntu_tiebek@snap_2022-12-14-12:31:41_httmSnapFileMount
> - collapse `zpool` and `zfs` into a single command

`zfs` and `zpool` are just immaculate Unix commands, each of which has half a dozen subcommands. One of the smartest decisions the ZFS designers made was not giving you a more complicated single administrative command.
> - Provide an obvious way to mount and navigate a snapshot dataset instead of hiding the snapshot filesystem in a hidden directory
Again -- you can do this very easily via `zfs mount`, but you'll have to trust me that a stable virtual interface also makes it very easy to search for all file versions, something which is much more difficult to achieve with btrfs, et al. See again `httm` [1].
[0]: https://kimono-koans.github.io/opinionated-guide/#dynamic-sn... [1]: https://github.com/kimono-koans/httm
TrueNAS
I like how ZFS is put together. I've been running it for about 13 years. I started with Nexenta, a Solaris fork with Debian userland. I've ported my pool twice, had a bunch of HDD failures, and haven't lost a single byte.
I agree with you on most of the encryption stuff. That is very recent and not fully integrated and the user experience isn't fully baked. I don't agree on unifying zpool and zfs; for a good long time, I served zvols from my zpool, and dividing up storage management and its redundancy configuration from file system management makes sense to me. Similarly, recursive datasets make sense; you want inheritance or something very like it when managing more than a handful of filesystems. I don't agree on pool names (why anyone would want ordinal pool naming and just replicate the problem you just stated re sda, sdb etc. is a bit mysterious), and I don't agree on snapshots (to me this is like preferring commit IDs in git to branch and tag names - manually created snapshots outside periodic pruning should be named).
ZoL on Ubuntu does periodic scrubs by default now. Sometimes I have to stop them because they noticeably impact I/O too much. Periodic snapshots is one of the first cronjobs I created on Nexenta, and while there's plenty of tooling, it also needs configuration - if you are not aware of it, it's an easy way to retain references to huge volumes of data, depending on use case. Not all of my ZFS filesystems are periodically snapshotted the same way.
Likewise, I appreciate being able to name snapshots, but it's annoying to have to manually name the snapshot I create in order to zfs send. The solution there is probably to not make me take a manual snapshot in the first place. `zfs send` should automatically make the snapshot for me. But in general, I don't see why zfs can't default to a generic name and let me override it with a `--name` flag.
Giving it more thought, I think I would keep pool naming. What I don't like is the possibility of having pool name collisions which isn't something you have to think about with, say, ext4 filesystems. But the upshot, as you point out, is with zfs you aren't stuck using sda, sdb, etc.
For snapshots and replication take a look at sanoid (https://github.com/jimsalterjrs/sanoid).
— please provide support for multiple key slots as in LUKS
— please build in the functionality of sanoid and syncoid, so that snapshots and replication don’t need a third party tool
— please build a usable deduplication, so that we don’t have to use external tools such as Restic or Borg
https://www.bsdcan.org/events/bsdcan_2023/sessions/session/1...
You can set "encryption=on", and it will select the default strongest option, currently AES-256-GCM
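For example (dataset name hypothetical; a keyformat must still be chosen at creation time):

```shell
# encryption=on picks the current default cipher (AES-256-GCM today),
# so you don't have to choose between the five or six variants yourself.
zfs create -o encryption=on -o keyformat=passphrase tank/secure
```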
> - Generate the encryption key for me, set up the systemd service to decrypt the pool at start up, and prompt me to back up the key somewhere
Technically it does generate encryption keys internally, which is why the ones you provide can be rotated out. If you use a keyfile then automount with key load is easy (`zfs mount -al`). There is already an auto-mount systemd service created automatically for Debian, however they did not add the -l flag for auto-loading keys because they got stuck in a debate about supporting passphrase prompts at boot. For now you can simply edit it to add the -l flag and it works fine for datasets with keyfiles.
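A sketch of that edit as a systemd drop-in, so the packaged unit file stays untouched across upgrades (unit name and `zfs` path may differ by distro; verify with `systemctl cat zfs-mount.service`):

```shell
# Override the packaged mount unit to also load keys (-l) before mounting.
mkdir -p /etc/systemd/system/zfs-mount.service.d
cat > /etc/systemd/system/zfs-mount.service.d/load-keys.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/sbin/zfs mount -al
EOF
systemctl daemon-reload
```

With keyfile-encrypted datasets this brings everything up at boot with no interaction; passphrase-encrypted datasets would still block waiting for input, which is the debate mentioned above.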
> - Don't make me type disk IDs when creating pools. Store metadata on the disk so ZFS doesn't get confused if I set up a pool with `/dev/sda` and `/dev/sdb` references and then shuffle around the drives
This is no longer the case for ZoL. I know because there is an issue with Linode where storage device identifier assignments consistently get jumbled up with Debian on every boot. ZFS finds the devices all the same, even if a device has a different identifier every boot. I believe this is because it stores its own UUID info on the devices. So you can create pools by referring to devices however you like, because they are only a temporary reference, i.e. use /dev/sda etc. (and I have, and it's fine). I think there is a lot of outdated advice about this floating around still.
> - Automatically set up weekly scrubs
This might be ZoL specific, but, the Debian package does exactly this, sets up systemd weekly scrub.
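Recent upstream OpenZFS also ships per-pool scrub timer units (older Debian packages used an `/etc/cron.d` entry instead); if your package includes them, enabling a weekly scrub for a hypothetical pool `tank` is one command:

```shell
# Enable and start the weekly scrub timer for pool "tank"
# (requires the zfs-scrub-weekly@.timer unit shipped with newer OpenZFS).
systemctl enable --now zfs-scrub-weekly@tank.timer
```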
> - No recursive datasets
Why? This is too useful: inherited encryption roots, datasets with different properties so you can have databases and other filesystems all under the same root dataset, which can then be recursively replicated in one command. If you have no need for recursive datasets, just don't use them, but they have many valid purposes.
how many disks per vdev? how much memory? etc
a lot of the things you've outlined are not universal at all, just situational
In fairness, learning about zfs is like learning about mdadm, lvm and a filesystem all at once… so it’s kinda justifiable in my opinion
One pattern I've found useful when writing wrapper shell scripts: Output the actual command(s) that actually get run, to stderr in yellow, before running them. This also serves as a sanity check.
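A minimal sketch of that pattern as a POSIX shell function (the function name `run` is just my choice here):

```shell
#!/bin/sh
# run: print the command to stderr in yellow before executing it,
# so wrapper scripts show exactly what they are about to do.
run() {
  # \033[33m = yellow, \033[0m = reset
  printf '\033[33m+ %s\033[0m\n' "$*" >&2
  "$@"
}

# Example: the wrapped command's own output still goes to stdout untouched.
run echo "creating pool"
```

Because the trace goes to stderr, you can still pipe or capture the real command's stdout cleanly.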
This should have an option to integrate use of a TPM for super-encrypting the ZFS encryption key(s).
Nooo, that should not be done. They are very different tools, for very different things.
"Sane defaults", are you going to be arbiter of sanity? With the #3 recommendation, "set up systemd for me", I'd rather not have you at that position.
Most of the bullets you wrote induce a "...why?" thought in somebody who has ZFS experience. Why would you unify zpool and zfs? Why would you want automatic weekly scrubs on by default? Do you realize what ZFS scrubbing is and when the right time to perform it is?
I'm a bit agitated by your writing I must confess. You want ZFS to exactly reflect your basic use case so you don't have to move your little finger (automatic naming, automagical configuration). It's not meant to be a hands-off filesystem, you are expected to understand encryption in ZFS in order to use it.
But the most annoying thing is that you did face a steep learning curve and want to avoid it. Why don't you write your own ZFS provisioning tool? Why are you still using /dev/sda and not disk-by-UUID or something more 2023? Etc.
??
- get to know the difference between zpool-attach(8) and zpool-replace(8).
- this one will tell you where your space is used:
# zfs list -t all -o space
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
(...)
- ZFS Boot Environments is the best feature to protect your OS before major changes/upgrades - this may be useful for a start: https://is.gd/BECTL
- this command will tell you all history about ZFS pool config and its changes:
# zpool history poolname
History for 'poolname':
2023-06-20.14:03:08 zpool create poolname ada0p1
2023-06-20.14:03:08 zpool set autotrim=on poolname
2023-06-20.14:03:08 zfs set atime=off poolname
2023-06-20.14:03:08 zfs set compression=zstd poolname
2023-06-20.14:03:08 zfs set recordsize=1m poolname
(...)
- the guide misses one important piece of info:
--- you can create a 3-way mirror - requires 3 disks and 2 may fail - still no data lost
--- you can create a 4-way mirror - requires 4 disks and 3 may fail - still no data lost
--- you can create an N-way mirror - requires N disks and N-1 may fail - still no data lost
(useful when data is most important and you do not have that many slots/disks)

[0] https://docs.freebsd.org/en/books/handbook/zfs/
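Creating such a mirror is a single command (pool name and FreeBSD-style device names here are hypothetical):

```shell
# 3-way mirror: 3 disks in one vdev, any 2 may fail without data loss.
zpool create tank mirror da0 da1 da2
```

Attaching another disk to an existing mirror vdev with `zpool attach` widens it to an (N+1)-way mirror later.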
[1] https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux...
I had an old HP Microserver with 1GB of ECC RAM lying around so I installed FreeBSD on it. I had 5 old 500GB hard drives lying around too so I set them up in a 5x mirror with help from the FreeBSD Handbook. First time using FreeBSD and it was a breeze.
(I realize now after writing it that maybe snapchat should have occurred to me first, but I have never used it)
I've been using ZFS in combination with rsync for backups for a long time, so I was fairly comfortable with it... and it all worked out, but it was a way bigger time sink than I expected - because I wanted to do it right - and there is a lot of misleading advice on the web, particularly when it comes to running databases and replication.
For databases (you really should at minimum do basic tuning like block size alignment), by far the best resource I found for mariadb/innoDB is from the lets encrypt people [0]. They give reasons for everything and cite multiple sources, which is gold. If you search around the web elsewhere you will find endless contradicting advice, anecdotes and myths that are accompanied with incomplete and baseless theories. Ultimately you should also test this stuff and understand everything you tune (it's ok to decide to not tune something).
For replication, I can only recommend the man pages... yeah, really! ZFS gives you solid replication tools, but they are too agnostic; they are like git plumbing. They don't assume you're going to be doing it over SSH (even though that's almost always how it's being used), so you have to plug it together yourself, and this feels scary at first, especially because you probably want it to be automated, which means considering edge cases... which is why everyone runs to something like syncoid.

But there's something horrible I discovered with replication scripts like syncoid: they don't use ZFS's send --replicate mode! They try to reimplement it in Perl, for "greater flexibility", but incompletely. This is maddening when you are trying to test this stuff for the first time and find that all of the encryption roots break when you do a fresh restore, and not all dataset properties are automatically synced. ZFS takes care of all of this if you simply use the built-in recursive "replicate" option. It's not that hard to script manually once you commit to it; just keep it simple, don't add a bunch of unnecessary crap into the pipeline like syncoid does (it actually slows things down if you test), just use pv to monitor progress and it will fly.
I might publish my replication scripts at some point because I feel like there are no good functional reference scripts for this stuff that deal with the basics without going nuts and reinventing replication badly like so many others.
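A bare-bones sketch of the kind of pipeline described above - pool names, host, and snapshot labels are all made up; `-w` (raw) keeps encrypted data encrypted on the wire, `-R` replicates the whole dataset tree with its properties, and `-I` includes all intermediate snapshots since the last replicated one:

```shell
# Take a recursive snapshot, then stream everything since the previous
# replica snapshot to the backup host, with pv showing progress.
zfs snapshot -r tank@replica-2023-06-20
zfs send -w -R -I tank@replica-2023-06-13 tank@replica-2023-06-20 \
  | pv \
  | ssh backuphost zfs receive -F backup/tank
```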
IME with a decently busy (120K QPS) MySQL DB, you do not need to touch either of these. If you think you do, monitor the time to fill the redo log, and the dirty page percent in the buffer pool. There are probably other parameters you should tune instead.
[0] https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.ht...
One unexpected thing to check (and do check, because your mileage will vary) - the suggestion is usually to align record sizes, which in practice tends to mean reducing the record size on the ZFS filesystem holding the data. I don't doubt that this is at some level more efficient, but I can empirically tell you that it kills compression ratios. Now the funny knock-on effect is that it can - and again, I say can because it will vary by your workload - but it can actually result in worse throughput if you're bottlenecked on disk bandwidth, because compression lets you read/write data faster than the disk is physically capable of, so killing that compression can do bad things to your read/write bandwidth.
I enabled lz4 compression and set recordsize for database datasets to 16k to match innoDB... turns out even at 16k my databases are extremely compressible 3-4x AFAIR (I didn't write the DB schema for the really big DBs, they are not great, and I suspect that there is a lot of redundant data even within 16k of contiguous data)... maybe I could get even more throughput with larger record sizes, but seems unlikely.
As you say, mileage will vary, it's subjective, but then I wasn't using compression before ZFS, so I don't have a comparison. I have only done basic performance testing, overall it's an improvement over ext4, but I've not been trying to fine tune it, I'm just happy to not have made it worse so far while gaining ZFS.
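The tuning described above boils down to something like this (dataset name hypothetical; 16k matches the InnoDB page size):

```shell
# Align ZFS records with InnoDB's 16k pages and enable cheap compression.
zfs create -o recordsize=16k -o compression=lz4 tank/mysql
```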
My only surprise was the volblocksize default, which is pretty bad for most RAIDZ configurations: you need to increase it to avoid losing 50% of raw disk space...
Articles touching this topic :
https://openzfs.github.io/openzfs-docs/Basic%20Concepts/RAID...
https://www.delphix.com/blog/zfs-raidz-stripe-width-or-how-i...
And you end up on one of the ZFS "spreadsheets" out there:
ZFS overhead calc.xlsx https://docs.google.com/spreadsheets/d/1tf4qx1aMJp8Lo_R6gpT6...
RAID-Z parity cost https://docs.google.com/spreadsheets/d/1pdu_X2tR4ztF6_HLtJ-D...
The documentation in question was a PowerPoint presentation with difficult to read styling, somewhat evangelical language, lots of assumptions about knowledge and it was not regularly updated. It was vague on how much RAM was required, mainly just focused on having as much as possible. Needless to say I ignored all the red flags about the technology, the hype and my own knowledge and lost a load of data. Lots of lessons learnt.
The filesystem has gotten a lot more stable, and imo the documentation clearer.
That said, it's "more powerful and more advanced" than traditional journaling filesystems like ext3, and thus comes with more ways to shoot yourself in the foot.
- All redundancy in ZFS is built in the vdev layer. Zpools are created with one or more vdevs, and no matter what, if you lose any single vdev in a zpool, the zpool is permanently destroyed.
- Historically RAIDZs (parity RAIDs) cannot be expanded by adding disks. The only way to grow a RAIDZ is to replace each disk in the array one at a time with a larger disk (and hope no disks fail during the rebuild). So in my very amateur opinion, I would only consider doing a RAIDZ if it is something like a RAIDZ2 or 3 with a large number of disks. For n<=6 and if the budget can stand it, I would do several mirrored vdevs. (Again as an amateur I am less familiar with RW performance metrics of various RAIDs so do more research for prod).
If and only if you a. Have full, on-site backups b. Are fairly sure of your abilities and monitoring then I can suggest RAIDZ1. I have a pool of 3x3 drives, which ships its snapshots a few U down in my rack to the backup target that wakes up daily, and has a pool of 3x4 drives, also in RAIDZ1.
In the event that I suffer a drive failure in my NAS, my plan of action would be to immediately start up the backup, ingest snapshots, and then replace the drive. That should minimize the chance of a 2nd drive failure during resilvering destroying my data.
Truly important data, of course, has off-site as well.
So there aren't any errors in files. There aren't any errors in devices. There aren't any errors detected in scrub(?). And yet at runtime I get a dozen new "errors" showing up in zpool status per day. How?
- if you want to copy files, for example, and connect your drive to another system and mount your zpool there, it sets some pool-membership value on the filesystem, and when you put it back in your system it won't boot unless you set it back - which involved a chroot
- the default settings I had made a snapshot every time I apt installed something, and because that snapshot included my home drive, when I deleted big files thereafter I didn't get any free space back until I figured out what was going on and arbitrarily deleted some old snapshots
- you can't just make a swap file and use it
Isn't this what `zpool export` is for?
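Right - the intended workflow is roughly (pool name hypothetical):

```shell
# On the original system, before pulling the drive:
zpool export tank
# On the other system, and again when moving back:
zpool import tank
```

Export cleanly releases the pool so the next host can import it without any "pool was in use by another system" complaints.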
cat /etc/apt/apt.conf.d/90_zsys_system_autosnapshot
// Takes a snapshot of the system before package changes.
DPkg::Pre-Invoke {"[ -x /usr/libexec/zsys-system-autosnapshot ] && /usr/libexec/zsys-system-autosnapshot snapshot || true";};
// Update our bootloader to list the new snapshot after the update is done to not block the critical path
DPkg::Post-Invoke {"[ -x /usr/libexec/zsys-system-autosnapshot ] && /usr/libexec/zsys-system-autosnapshot update-menu || true";};
but how would I get this to not snapshot , say /home/Downloads ..
make that its own zpool?

What kind of schedule was it? I feel like the low-impact alternative to no snapshots at all is daily snapshots for half a week to a week, and maybe some n-hourly snapshots that last a day or two, which I would not expect to use up very much space.
It's #3 where I need to do some more research/work. I need to spend some time sending snapshots/diffs to cloud blob storage and make sure I can restore. Yes, I know there is rsync.net.
Any experiences to share?
Clarification: Remote end also uses ZFS, so I can use cheap replication with encryption
Borg splits your files up into chunks, encrypts them and dedupes them client-side and then syncs them with the server. Because of the deduping, versioning is cheap and you can configure how many daily, weekly, monthly, &c. copies to keep. For example you could keep 7 days' worth of copies, 6 monthly copies and 10 yearly copies.
Rsync.net have special pricing for customers using Borg/Restic:
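That retention policy maps directly onto `borg prune` flags (repository path hypothetical):

```shell
# Keep 7 daily, 6 monthly, and 10 yearly archives; everything else is pruned.
borg prune --keep-daily 7 --keep-monthly 6 --keep-yearly 10 /path/to/repo
```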
I'm not working with much data though, so even if I wanted to I couldn't get a ZFS send/receive account with rsync.net. I like the way rsync.net gives you separate credentials for managing the snapshots. This way even if my NAS gets compromised I will still have all the periodic snapshots.
For me privacy is my main concern and Restic's security model is good for me. The backup testing features are good too, and rsync.net doesn't charge for traffic, so these two work well together. I don't use the snapshots though because rsync.net already supports this via ZFS.
I do one about every month or so. I should probably add a crontab for that.
Haha, The only part of maintenance that I need to look up every time I do it is replacing a faulty hard drive.
Even this guide skips that.
(Hey looks like it's a sore spot!)
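For the record, a sketch of that procedure (pool and device names here are hypothetical):

```shell
zpool status tank              # identify the FAULTED/UNAVAIL device
zpool offline tank ata-OLD_DISK_SERIAL
# physically swap the disk, then:
zpool replace tank ata-OLD_DISK_SERIAL /dev/disk/by-id/ata-NEW_DISK_SERIAL
zpool status tank              # watch the resilver until it completes
```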
I very much regret the fragmentation of FS design, it has many mothers. "there can only be one" was never going to work, but we seem to have perhaps 4-5 more than we really need. ZFS manages to wrap up a number of behaviours cohesively with good version-dependent signalling so it should always be possible to know you're risking a non-reversible change to your flags. And, it keeps improving.
But, to counter "it keeps improving": so do all the other current, maintained, actively developed FSes, and if somebody tells me they prefer to use Hammer, or one of the Linux FS models with a discrete volume-management and encryption layer, I don't think that's necessarily wrong.
Mainly I regret Apple walking away. That was about Oracle behaviour. It wasn't helpful. A lot of Apple's FS design ideas persist. I never got resource/data forks; they only ever appeared on my radar as .files in the UNIX AUFS backend model of them. Obviously inside Apple's code, it was dealt with somehow. It felt like the wrong lessons about metadata had been learned. Maybe an ex-VMS person went to Apple? Also Apple has a rather "yea maybe or no, dunno" view about case-independent or case-dependent naming. Time Machine is good. Feels like it should fit ZFS well. Oh well.
There's quite a few "quality of life" differences like boot environments (boot into a pre upgraded OS state, even years old), built in SMB server with NFS v4 style ACLs, dtrace, built in snapshot scheduling and management, and Napp-It is an available web UI for management a la FreeNAS/TrueNAS.
It has a few differences, service management is quite different from other things, but overall very underrated as an OS I think.
Frankly, if even Debian can use it, it's a non-issue.
That depends on what you want. If you want a license that will play well with closed source software, then yeah, it's a downside. But the GPL family comes from the perspective of a developer who wants to retain their rights while respecting others' desire for the same. If you care about your rights, then this is an upside.
I agree this is unlikely, but so is someone being born with as much litigiousness as Larry Ellison.
As someone ignorant to file system development, I would almost expect something more likely to be BTRFS getting sued for copying a feature of ZFS or something like that.
If anyone (Oracle or Linus Torvalds) launches a ZFS-related lawsuit, it'll be as an author of GPLv2 kernel code. For the time being the solution has been to ship ZFS separate from the kernel, as any module with a non-GPL licence typically does.
The biggest hurdle to ZFS isn't legal, but technical. The various teams working on ZFS have so far been able to keep up with kernel churn and symbols (eg FPU ones) being made GPL-only.
That said, the kernel devs have made it clear that they don't care if you're open source or proprietary, they will make changes and mark new symbols GPL-only to fuck with you regardless.