For many applications (file formats), ubiquity is important, so it would be great if zstd became ubiquitous and could be relied on to be available. Say, for example, in future versions of HDF (HDF5 or later).
For context, filesystem compression usually compresses blocks of data individually (for instance, every 64K block of a file will be individually compressed, and when you modify a file in the middle, that block needs to be recompressed entirely). This is usually good enough, and it has some pretty cool properties, like being able to have only compressable parts of a file compressed, or turning on compression on a file and having only new and rewritten blocks get compressed. Because of Zstd's separated dictionary, it seems like it could be feasible to instead store the dictionary in the file's inode and compress the blocks with that dictionary (recomputing the dictionary and recompressing existing blocks when the file allocates 10 4K blocks and then again at 100 blocks, perhaps).
I wonder what different properties such a compression scheme would have. I imagine it would achieve a noticeably smaller size, since each block would no longer need to carry its own modeling overhead and short blocks would compress against a dictionary tuned to the file. One downside would be that a corrupted or overwritten inode would render the file completely unrecoverable, whereas current schemes allow blocks to be decompressed individually. Another downside is that a file couldn't be partially compressed; it would be all or nothing.
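The per-block shared-dictionary idea above can be sketched with Python's stdlib `zlib`, which also supports a preset dictionary via `zdict` (zstd's trained-dictionary API is analogous but lives in a separate library). The dictionary bytes and block contents here are hand-picked for illustration, not trained:

```python
import zlib

# Hypothetical shared dictionary: bytes that occur frequently in the file.
# zstd would *train* this from samples; here we just hand-pick it.
dictionary = b"the quick brown fox jumps over the lazy dog"

# Stand-ins for fixed-size filesystem blocks.
blocks = [b"the quick brown fox " * 10, b"jumps over the lazy dog " * 10]

# Compress each block independently, but against the same shared dictionary,
# so the dictionary is stored once (e.g. in the inode) rather than per block.
compressed = []
for block in blocks:
    c = zlib.compressobj(zdict=dictionary)
    compressed.append(c.compress(block) + c.flush())

# Any single block can still be decompressed on its own -- but only if the
# dictionary survives, which is the inode-corruption downside noted above.
d = zlib.decompressobj(zdict=dictionary)
restored = d.decompress(compressed[1])
assert restored == blocks[1]
```

Rewriting one block only requires recompressing that block; retraining the dictionary (the "at 10 blocks, then at 100 blocks" step) is what would force recompressing everything.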
A trained dictionary is just meaningless but very frequent bits of data, which can be referenced in that fashion as if they preceded the real data.
(But seriously, Mozilla engineers warned the Chrome team that they were rushing the inclusion of Brotli, since the compression wars were heating up. They proceeded anyway, which is unsurprising.)
bz2/gz still predominate for compressed objects in file stores, from what I can see.
It is, and you should definitely at least give it a look. I posted a comment mentioning it the other day in the OpenZFS 2.0 thread [0], and it also came up recently on HN in a thread linked there, but there are some interesting performance graphs comparing different algorithms in the github PR for zstd in ZFS [1]. LZ4 still has its place IMO; ZFS isn't run exclusively on heavier metal, and people use it to good effect on the likes of RPis as well. Sometimes CPU cycles are still the limiter, or every last one is needed elsewhere. I also think it matters a lot less on spinning rust, where $/TB tends to be so much lower. How much one gets out of it is also influenced by application: a general NAS with larger record sizes is going to see different gains vs a database.

But with even vaguely modern desktop CPUs (and their surfeit of cores) and SSDs, particularly in dedicated network storage devices, even an extra 10-30% is worth a lot, and there's usually plenty of CPU to throw at it. Even more so if primary usage is limited to only a 10-50 Gbps connection.
As always though, it's probably best if you can benchmark it with your own stuff and play around a bit, pulling different levers. ZFS is nice that way too, since it's so easy to create a bunch of different test filesystems at the same time.
----
0: https://news.ycombinator.com/item?id=29268907
1: https://github.com/openzfs/zfs/pull/9735#issuecomment-570082...
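The side-by-side benchmarking approach described above might look something like this (pool and dataset names are hypothetical; this is a rough sketch of one way to compare settings on your own data, not a tuned benchmark):

```shell
#!/bin/sh
# Assumes a pool named "tank" and sample data in /data/sample.
# Create one dataset per compression setting and copy the same data in.
for alg in lz4 zstd zstd-9 gzip-6; do
    zfs create -o compression="$alg" "tank/bench-$alg"
    cp -a /data/sample/. "/tank/bench-$alg/"
done

# Compare achieved ratios and on-disk usage per dataset.
zfs get -r compressratio,used tank
```

Timing the copies (or your actual workload) against each dataset then shows the CPU-vs-ratio trade-off for your hardware.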
There's enough grunt left over to be my plex headend as well as long as I avoid transcoding.
I actually wanted to run FreeBSD on it, but the rpi4 wasn't fully ported when I started. In the back of my mind, it should be "safe" to convert over because OpenZFS. Which is kind-of the point.
I avoided de-dup. It burns your CPU. But I think compression is worth it, even with a lot of mp4 and mp3 and jpg content (photo and live-TV PVR archives, aside from music)
There's a lot of FUD about ZFS on small devices and how much memory you "need". I think something said of Solaris got conflated into a ZFS "law" about minimum memory for the ARC, which just isn't really true: it may not be performant, but it works fine on systems with less memory (than 8GB). I chose the 8GB Pi 4 because I could afford it; I would have been fine on 4GB.
Clearly they produce good technology. But the company is morally bankrupt.
Here is a quote from the summary, but I could not find where it was substantiated in the 250-page document:
> Facebook knew that the changes to its policies on the Android mobile phone system, which enabled the Facebook app to collect a record of calls and texts sent by the user, would be controversial. To mitigate any bad PR, Facebook planned to make it as hard as possible for users to know that this was one of the underlying features of the upgrade of their app.
It's also supported by tar in recent Linux distros, if zstd is installed, so "tar acf blah.tar.zst *" works fine, and "tar xf blah.tar.zst" works automatically as well. Give it a try, folks, and retire gzip shortly afterwards.
Just be careful that you're comparing against the best implementation of gzip. One recent re-implementation of zcat was 3.1x faster than /bin/zcat (and the CRC-32 implementation within was 7.3x faster than /bin/crc32). Both programs decode exactly the same file format. They're just different implementations. For details, see: https://nigeltao.github.io/blog/2021/fastest-safest-png-deco...
Was .zs not sufficient if a file format ending in 'std' is so abhorrent?