Now using Zstandard instead of xz for package compression

(archlinux.org)

271 points by nloomans 6y ago | 153 comments

Meta: This post is yet another victim of the HN verbatim title rule despite the verbatim title making little sense as one of many headlines on a news page.

How is "Now using Zstandard instead of xz for package compression" followed by the minuscule low-contrast grey "(archlinux.org)" better than "Arch Linux now using Zstandard instead of xz for package compression" like it was when I originally read this a few hours ago?

nmstoker6y ago

Saying it's "yet another victim" seems slightly too emotive to me.

If people can't read the source site's domain after the headline then I agree there wouldn't be much context, but equally, if they can't read that, surely their best solution is to adjust the zoom level in the browser.

It's clear you won't get complete context from the headline list plus domain, but a hint of it is provided and if you want more you click the link. Maybe I'm being a little uncharitable but I don't see a big problem here.

rat99886y ago

Even using the source domain isn't informative enough. The alternative headline is better. You are being too charitable to an inferior title.

Dylan168076y ago

"archlinux.org" is less informative than "Arch Linux"?

I'm sympathetic to disliking the change, but that's taking it to an extreme.

hinkley6y ago

Archlinux: Now using Zstandard instead of xz for package...

should be allowed. But I'm not sure that it is.

nloomansOP6y ago

I just woke up to this and was surprised the title was edited as well. I looked up the guidelines and it looks like I violated the "If the title includes the name of the site, please take it out, because the site name will be displayed after the link." guideline.

gravitas6y ago

In your defense, the only reason I knew about this being Arch is because I got the email first last night. The belief that "everyone reads the domain in light grey parens on the right" is false; as a reader I 100% ignore that information subconsciously. This article would be a lot better if it started with "ArchLinux: ...." as it apparently used to be last night. This is a 100% bad title edit, "guidelines" be damned: it made your article submission worse, not better.

RivieraKid6y ago

Yes, this is a real problem, verbatim titles are often far from the "optimal" title. In some cases the original title provides almost no information about the content.

The question is what's better than a strict "no editorialization" rule.

Lammy6y ago

The exact guideline is "If the title includes the name of the site, please take it out, because the site name will be displayed after the link", and I think that wording speaks from an outdated mindset where every submission is a standalone web page that _has_ a title, for one thing. This submission is a web page with its own title, of course, but that makes it sound like the guideline hasn't been rethought in too long a time.

For a contrived example of how dated the guideline seems, what if somebody submitted a tweet thread criticizing Twitter the company with a headline/sitebit like "Twitter now banning third-party clients. (twitter.com)". Would it have to be renamed to "Now banning third-party clients. (twitter.com)"? That would make it appear to be a more official statement instead of an unsponsored opinion.

I'm picking on Twitter out of recent memory of this submission of mine a couple weeks ago, where the submission title "Tracking down the true origin of a font used in many games and shareware titles" was 100% my own editorializing for lack of title-worthy material in the linked tweet itself: https://news.ycombinator.com/item?id=21667238

bscphil6y ago

> The exact guideline is "If the title includes the name of the site, please take it out, because the site name will be displayed after the link", and I think that wording speaks from an outdated mindset where every submission is a standalone web page that _has_ a title, for one thing.

I suspect the original intent of the rule was to get rid of pointless redundancy in the title. "The 10 craziest things you don't know about X - clickbait.com" is the sort of thing you see very often in the <title> element, but it adds no new information. Actually, you'll notice even Hacker News posts have " | Hacker News" appended to them.

In an article about Arch Linux, the text "Arch Linux" is much less likely to be redundant than an article about something else that just happens to be on the archlinux.org domain.

imtringued6y ago

The window title of the submission is "Arch Linux - News: Now using Zstandard instead of xz for package compression". There is no need to invent a new title.

hordeallergy6y ago

HN rules are to be ignored. It's _hacker_ news.

throwGuardian6y ago

Guessing such a rule helps in duplicate submissions detection. But, that should be possible from checking the URL.

Unless one uses a link shortener. Are shorteners permitted on HN?

grzm6y ago

Nope. Some sources, such as Medium, embed viewer-dependent identifiers in the URL, which can confound de-duping based on URL alone. I don't know if 'dang et al have figured out a way to handle these cases.

Dylan168076y ago

Most of the text on the front page is that size and that color of gray.

If it's not easy to read, then the problem is between the css and your screen. Not the title rules.

Lammy6y ago

I'm talking about the difference in size and contrast between the actual headline and the trailing HN sitebit "(archlinux.org)". The final size they end up on my screen is irrelevant to my point, because my point is about the size and contrast difference _between_ the two, whatever final sizes those might happen to be. The default HN stylesheet calls for 10pt and 8pt for those, respectively, so it's not like I'm just making this up. I'm saying the verbatim title rule is a poor fit here because it took a relevant (central, even!) part of the headline and moved it to a spot of secondary importance and size. There are cases where I defend the rule, but right now I am talking about this case and only this case :)

Dylan168076y ago

So even if the url got bumped to 12pt, you'd complain if the rest was 15? I think that's weird.

As long as it's on the same line as the title and easily legible, I really don't see a problem.

And it's not a spot of secondary importance. If it was still in the title, making it longer, the spot where you see the url would have the title in it.

WinonaRyder6y ago

Zstandard is awesome!

Earlier last year I was doing some research that involved repeatedly grepping through over a terabyte of data, most of which were tiny text files that I had to un-zip/7zip/rar/tar and it was painful (maybe I needed a better laptop).

With Zstd I was able to re-compress the whole thing down to a few hundred gigs and use ripgrep which solved the problem beautifully.

Out of curiosity I tested compression with (single-threaded) lz4 and found that multi-threaded zstd was pretty close. It was an unscientific and maybe unfair test but I found it amazing that I could get lz4-ish compression speeds at the cost of more CPU but with much better compression ratios.

EDIT: Btw, I use arch :) - yes, on servers too.

bufferoverflow6y ago

Here's a compression benchmark.

http://pages.di.unipi.it/farruggia/dcb/

Looks like Snappy beats both LZ4 and Zstd in compression speed and compression ratio, by a huge margin.

LZ4 is ahead of Snappy in decompression speed.

correct_horse6y ago

Similar to how code is read more times than it is written, files are decompressed more times than compressed.

I have not researched this opinion much.

ncmncm6y ago

I find these numbers for Snappy entirely implausible.

The numbers there that I do know about are wrong: zstd always beats gzip for compression ratio.

I will need to do my own testing.

ncmncm6y ago

I have tested snzip 1.0.4.

It compresses about as well as lz4, but more slowly. It also decompresses more slowly.

It is faster than zstd -1, but compresses less well.

It is possible that it does better with certain kinds of data, but 12x remains implausible.

Apparently the current file format has suffix ".sz".

filereaper6y ago

Apparently this is how to use Zstd with tar if anyone else was wondering:

  tar -I zstd -xvf archive.tar.zst

https://stackoverflow.com/questions/45355277/how-can-i-decom...

Hopefully there's another option added to tar that simplifies this if this compression becomes mainstream.

viraptor6y ago

tar has accepted `-a` for format autodetection for a while now. You can do:

    tar -axf archive.tar.whatever

and it should work for gz, bz2, Z, zstd, and probably more. (verified works for zstd on gnu tar 1.32)

yjftsjthsd-h6y ago

I think that's a GNU extension, so obviously fine on Arch, but probably not on e.g. macOS (Darwin) or Alpine (busybox) by default.

JonathonW6y ago

BSD tar on my Mac (running 10.15.2) has -a for tarfile creation (-c mode); it always autodetects the compression format in extraction mode (-z, -j, etc. are ignored if -x is specified). Not sure when either behavior would've been introduced; the somewhat-older machine I tested on (running 10.13) does not have -a but does have the autodetection behavior on extract.

-I, on the other hand (which, in gnutar, specifies a compression program to run the output through), appears to actually be GNU-specific. BSD tar makes -I synonymous with -T (specifying a file containing the list of filenames to be extracted or archived).

(Please don't use Zstandard if you care about cross-platform compatibility at all, though-- it's fine in controlled environments like an OS package manager, but I don't have it on my Mac, nor do I have it by default on my Ubuntu server (which is still sitting back on 16.04; I should fix that sooner or later).)

_ZeD_6y ago

With GNU tar, the -a flag is not needed

Macha6y ago

It's actually (at least in the implicit tar -xf foo.tar.gz) made it to all major implementations by now. OpenBSD was the sole exception last time I checked, and OpenBSD tar only untars, it won't decompress even with flags.

kdeldycke6y ago

Thanks for that tip, I was able to simplify the magic extract() function from my .bash_profile that is relying on file extension to figure the format of the archive: https://github.com/kdeldycke/dotfiles/commit/8120778f3b968a6...

xorcist6y ago

It's not too bad to specify compression separately, as in:

  zstd -cd archive.tar.zst | tar xvf -

it's needed anyway as soon as you step outside of what somebody made an option for, for example encryption.

cmurf6y ago

    tar -acf blah.tar.zst blah/

-a figures it out from the filename extension, and it's zst, not zstd.

chungy6y ago

tar automatically detects and supports unpacking zstd-compressed archives (as well as other compression types). There's no reason to use -x combined with other compression flags.

For compression, you can use "-c -I zstd"
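
For scripting outside the shell, the same split (detect the format on extract, choose it from the extension on create) can be sketched with Python's standard `tarfile` module. This is only an analogy to the tar flags discussed above, not a wrapper around them: the `WRITE_MODES` table and the helper names are made up for illustration, and the sketch sticks to gzip/bzip2/xz because older Python versions have no zstd support in the standard library.

```python
import io
import tarfile

# Hypothetical suffix table, playing the role of `tar -a` on creation.
WRITE_MODES = {
    ".tar.gz": "w:gz",
    ".tar.bz2": "w:bz2",
    ".tar.xz": "w:xz",
    ".tar": "w",
}


def write_mode_for(filename):
    """Pick a tarfile write mode from the archive's file extension."""
    for suffix, mode in WRITE_MODES.items():
        if filename.endswith(suffix):
            return mode
    raise ValueError(f"unrecognized archive extension: {filename}")


def make_archive(filename, members):
    """Build an in-memory archive; compression is chosen from `filename`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=write_mode_for(filename)) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(archive_bytes):
    """Mode "r:*" lets tarfile detect the compression from the data itself."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        return tar.getnames()


if __name__ == "__main__":
    for name in ("blah.tar.gz", "blah.tar.xz", "blah.tar"):
        blob = make_archive(name, {"blah/readme.txt": b"hello\n"})
        print(name, list_members(blob))
```

Reading never needs the extension, which matches the "no flags needed on extract" point; only the write side has to be told which compressor to use.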

ben0x5396y ago

Use aunpack imo: https://linux.die.net/man/1/aunpack

cmurf6y ago

Fedora 31 switched RPM to use zstd. https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_c...

Package installations are quite a bit faster, and while I don't have any numbers I expect that the ISO image compose times are faster, since it performs an installation from RPM to create each of the images.

Hopefully in the near future the squashfs image on those ISOs will use zstd, not only for the client side speed boost for boot and install, but it cuts the CPU hit for lzma decompression by a lot (more than 50%). https://pagure.io/releng/issue/8581

m4rtink6y ago

BTW, Fedora recently switched to zstd compression for its packages as well. For the same reasons, basically: much better overall de/compression speed while keeping the result mostly the same size.

Also, one more benefit of zstd compression that is not widely noted: a zstd file compressed with multiple threads is binary-identical to one compressed with a single thread. So you can use multi-threaded compression and still end up with the same file checksum, which is very important for package signing.

On the other hand, xz, which was used before, produces a binary-different file depending on whether it is compressed with a single thread or multiple threads. This basically precludes multi-threaded compression at package build time, as the compressed file checksums would not match if the package was rebuilt with a different number of compression threads. (The unpacked payload will always be the same, but the compressed xz file will be binary-different.)

ncmncm6y ago

Zstd has an enormous advantage in compression and, especially, decompression speed. It often doesn't compress quite as much, but we don't care as much as we once did. We rebuild packages more than we once did.

This looks like a very good move. Debian should follow suit.

beatgammit6y ago

I build packages periodically from the AUR, and compression is the longest part of the process much of the time. For a while, I disabled compression on AUR packages because it was becoming enough of a problem for me to look into solutions. If it's annoying for me, I can imagine it's especially problematic for package maintainers. I can only imagine how much CPU time switching the compression tool will save.

SamWhited6y ago

I love the AUR, but every single time I have to wait for it to compress Firefox nightly, and then wait for it to immediately decompress it again (because the only reason I was building the package in the first place was to install it), I about lose my mind. Hopefully this helps, but I really wish AUR helpers would just disable compression and call it a day so I don't have to go mess with config files that would also change my manual package building routine.

EDIT: nevermind, this doesn't seem to have made this the default for building packages locally, just for ones you download from the official repos. Guess I'll go change that by hand and then still be sad that I can't have it easily disabled entirely for AUR helpers but build my packages with compression.

beanaroo6y ago

This isn't a function of an AUR helper but rather makepkg itself.

In makepkg.conf, omit compression by specifying:

    PKGEXT='.pkg.tar'

More information can be found at https://wiki.archlinux.org/index.php/Makepkg#Tips_and_tricks

thenewnewguy6y ago

> EDIT: nevermind, this doesn't seem to have made this the default for building packages locally, just for ones you download from the official repos. Guess I'll go change that by hand and then still be sad that I can't have it easily disabled entirely for AUR helpers but build my packages with compression.

I believe you can supply it via an environment variable if your AUR helper has the ability to set those for `makepkg`.

yjftsjthsd-h6y ago

Why did you re-enable it? Seems to work fine in my experience?

kbumsik6y ago

> Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup.

Impressive. As an AUR package maintainer, I am also wondering what the compression speed is like, though.

ncmncm6y ago

Compression speed is many, many, many times faster than xz, and (only) much faster than gzip. Really, only lz4 beats it.

integricho6y ago

After reading these comments, I can't help but wonder, what is the benefit of Zstd over lz4? Why didn't they switch to lz4 if it was the speed of the algorithm that they favored even with marginally worse compression ratios?

ncmncm6y ago

Where Zstd will reduce, say, 3x, Lz4 reduces only 2.5x. This doesn't seem very different until you look at it from the other end: my .zst file is 3.3 GB, but the .lz4 would have been 4 GB, which is 700 MB more.

Was a time when 700 MB mattered; it was as much as you could get onto a CD.

So, there is a place for each. I would set up the process to use Lz4 when testing, and Zstd for actual delivery to download archives.

In some circumstances, particularly when using a shared file server, Lz4 can be quite a lot faster than writing and reading data uncompressed.
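
The 3x versus 2.5x arithmetic above can be checked in a few lines. The ratios and the 3.3 GB figure are the illustrative numbers from this comment, not measurements, and decimal units (1 GB = 1000 MB) are assumed.

```python
def compressed_size(original_gb, ratio):
    """Size after compression for a given reduction ratio (3.0 means 3x smaller)."""
    return original_gb / ratio


ZSTD_RATIO = 3.0   # illustrative ratio from the comment
LZ4_RATIO = 2.5    # illustrative ratio from the comment
zst_gb = 3.3       # size of the example .zst file

original_gb = zst_gb * ZSTD_RATIO                  # about 9.9 GB uncompressed
lz4_gb = compressed_size(original_gb, LZ4_RATIO)   # about 3.96 GB
extra_mb = (lz4_gb - zst_gb) * 1000                # about 660 MB, i.e. roughly a CD

if __name__ == "__main__":
    print(f"uncompressed: {original_gb:.2f} GB")
    print(f"lz4 output:   {lz4_gb:.2f} GB")
    print(f"difference:   {extra_mb:.0f} MB")
```

A half-point difference in ratio looks small, but it is a 20% larger file once you look at the output size.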

isatty6y ago

Guessing that a 0.8% size increase for a 1300% speedup was worth the tradeoff but maybe ≥1.5 size increase or more was not (especially considering a 1300%->2000% increase is not going to be user visible for 99% of the packages).

the84726y ago

While the speedup is nice pacman still seems to operate sequentially, i.e. download, then decompress one by one. Decompressing while downloading or decompressing in parallel seems like a low-hanging fruit that hasn't been plucked yet that wouldn't have needed any changes to the compressor.

svckr6y ago

I might be wrong, but wouldn't it be prudent to first verify the checksum/signature of the downloaded archive before unpacking it? Even when just decompressing there's at least the danger of being zip-bombed (assuming a zip bomb can be constructed for any dictionary-based compression algorithm.)

FWIW I really applaud Arch here. Even if it's just a small step. Commercial operating systems should take notice. OS updates should really not take as long as they (mostly) do.

the84726y ago

Even then it still could be pipelined. download, check signature, decompress while the next download is running. But yeah, pacman is plenty fast already.

michaelcampbell6y ago

Since most people are interested in the time taken to compress/decompress rather than the speed at which it happens, seems to me a better metric would be:

"... decompression time dropped to 14% of what it was..." (s/14/actual_value)
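
To make that conversion concrete, here is a small sketch that turns a "speedup in percent" into "fraction of the original time". The announcement's "~1300% speedup" can be read two ways, so the helper (a hypothetical name) takes the interpretation as a parameter rather than guessing which one was meant.

```python
def remaining_time_fraction(speedup_percent, percent_is_increase):
    """Fraction of the old time that the new, faster operation takes.

    percent_is_increase=True  reads "N% speedup" as new_speed = old * (1 + N/100)
    percent_is_increase=False reads it as            new_speed = old * (N/100)
    """
    factor = speedup_percent / 100 + (1 if percent_is_increase else 0)
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    return 1 / factor


if __name__ == "__main__":
    for as_increase in (True, False):
        fraction = remaining_time_fraction(1300, as_increase)
        print(f"increase={as_increase}: time drops to {fraction:.1%} of what it was")
```

Under either reading the time drops to well under a tenth of the original, so the placeholder "14" in the suggested sentence would end up closer to 7 or 8.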

JeremyNT6y ago

I learned about this one the hard way when I went to update a really crufty (~ 1 year since last update) Arch system I use infrequently the other day. I had failed to update my libarchive version prior to the change and the package manager could not process the new format.

Luckily updating libarchive manually with an intermediate version resolved my issue and everything proceeded fine.

This is a good change, but it's a reminder to pay attention to the Arch Linux news feed, because every now and then something important will change. The maintainers provided ample warning about this change there (and indeed I had updated my other systems in response) so we procrastinators really had no excuse :)

golergka6y ago

I used zstd for on-the-fly compression of game data for p2p multiplayer synchronization, and got 2-5x as much data (depends on the payload type) in each TCP packet. Sad that it still doesn't get much adoption in the industry.

ncmncm6y ago

Zstd knows how to use a user-supplied dictionary at each end. I hope you are doing that.

But if latency matters you might better use lz4.

golergka6y ago

Yes, I did. Too bad that I didn't get to set up CI in that project, and current maintainers probably forgot to update the dictionary.

loeg6y ago

I'd love to see Zstandard accepted in other places where the current option is only the venerable zlib. E.g., git packing, ssh -C. It's got more breadth and is better (ratio / cpu) than zlib at every point in the curve where zlib even participates.

rurban6y ago

Also, zlib is horrible code. Went away in disgust after finding critical errors (at that time only patched in master).

jacobolus6y ago

It would be great to see better compression supported by browsers.

tyingq6y ago

Chrome apparently tested zstd out, and it's an improvement over Chrome's forked and optimized zlib on x64, but slower on ARM/Android. https://bugs.chromium.org/p/chromium/issues/detail?id=912902...

Few (or none?) of Chrome's fairly dramatic improvements to zlib have been upstreamed. https://github.com/madler/zlib/issues/346

Edit: Also, if browsers do adopt zstd, it's likely you'll end up with the same situation where they fork their own implementation of zstd. Upstreaming requires signing Facebook's CLA, which has patent clauses that don't work for most.

dictum6y ago

Brotli has wide browser support (https://caniuse.com/#feat=brotli) and comes closer to zstd in compression ratio and compression speed, but its decompression speed is significantly lower and closer to zlib.

https://github.com/facebook/zstd#benchmarks

AFAIK (I haven't looked much into it since 2018) it's not widely supported by CDNs, but at least Cloudflare seems to serve it by default (EDIT: must be enabled per-site https://support.cloudflare.com/hc/en-us/articles/200168396-W...)

JyrkiAlakuijala6y ago

Brotli compresses about 5-10 % more than zstd. Benchmarks showing equal compression performance use different window sizes (smaller window sizes for brotli) or do not run at maximum compression density.

https://github.com/google/brotli/issues/642 is the best 3rd party documentation of this behavior.

zstd does decompress fast, but this is not free. The cost is the compression density -- and lesser streaming properties than brotli.

For typical linux package use, one could save 5 % more in density by moving from zstd to large window brotli. The decompression speed for a typical package would be slowed down by 1 ms, but the decompression could happen during the transfer or file I/O if that is an issue.

imtringued6y ago

That's interesting. Brotli has wide browser support although it's less than 5 years old, but webp is reaching a decade and Safari still doesn't support it...

ncmncm6y ago

Wireshark! Wireshark!

Also lz4, of course.

rwmj6y ago

I wish zstd supported seeking and partial decompression (https://github.com/facebook/zstd/issues/395#issuecomment-535...). We could then use it for hosting disk images as it would be a lot faster than xz which we currently use.

ncmncm6y ago

Fun fact: Two zstd files appended is a zstd file.

Also, parallel zstd must have some way to split up the work, that you could maybe use too.

rwmj6y ago

I would suggest reading the github issue that I linked to, you'll see why it's not currently possible.

gravitas6y ago

AUR users -- the default settings in /etc/makepkg.conf (delivered by the pacman package as of 5.2.1-1) are still at xz, you must manually edit your local config:

  PKGEXT='.pkg.tar.zst'

The largest package I always wait on, perfect for this scenario, is `google-cloud-sdk` (the re-compression is a killer; `zoom` is another one in AUR that's a beast), so I used it as a test on my laptop here in "real world conditions" (browsers running, music playing, etc.). It's an old Dell m4600 (i7-2760QM, rotating disk), nothing special. What matters is that with default xz, compression takes twice as long and appears to drive the CPU harder. Using xz my fans always kick in for a bit (normal behaviour); testing zst here did not kick the fans on the same way.

After warming up all my caches with a few pre-builds to try and keep it fair by reducing disk I/O, here's a sampling of the results:

  xz defaults  - Size: 33649964
  real  2m23.016s
  user  1m49.340s
  sys   0m35.132s
  ----
  zst defaults - Size: 47521947
  real  1m5.904s
  user  0m30.971s
  sys   0m34.021s
  ----
  zst mpthread - Size: 47521114
  real  1m3.943s
  user  0m30.905s
  sys   0m33.355s

I can re-run them and get a pretty consistent return (so that's good, we're "fair" to a degree); there's disk activity building this package (seds, etc.) so it's not pure compression only. It's a scenario I live every time this AUR package (google-cloud-sdk) is refreshed and we get to upgrade. Trying to stick with real world, not synthetic benchmarks. :)

I did not seem to notice any appreciable difference in adding the `--threads=0` to `COMPRESSZST=` (from the Arch wiki), they both consistently gave me right around what you see above. This was compression only testing which is where my wait time is when upgrading these packages, huge improvement with zst seen here...
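
For anyone who wants the ratios rather than the raw `time` output, the figures above can be reduced with a short script. The numbers are copied from the listing in this comment; the helper names are hypothetical, and since the timings cover the whole package build (not compression alone) the ratios understate the difference between the compressors themselves.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '2m23.016s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Sizes in bytes and timings exactly as posted above.
RUNS = {
    "xz defaults": {"size": 33649964, "real": "2m23.016s", "user": "1m49.340s"},
    "zst defaults": {"size": 47521947, "real": "1m5.904s", "user": "0m30.971s"},
    "zst mpthread": {"size": 47521114, "real": "1m3.943s", "user": "0m30.905s"},
}


def compare(baseline, other):
    """Return size growth and speedups of `other` relative to `baseline`."""
    base, new = RUNS[baseline], RUNS[other]
    return {
        "size_ratio": new["size"] / base["size"],
        "real_speedup": to_seconds(base["real"]) / to_seconds(new["real"]),
        "user_speedup": to_seconds(base["user"]) / to_seconds(new["user"]),
    }


if __name__ == "__main__":
    for name in ("zst defaults", "zst mpthread"):
        result = compare("xz defaults", name)
        print(name, {key: round(value, 2) for key, value in result.items()})
```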

Foxboron6y ago

It should be noted that the makepkg.conf file distributed with pacman does not contain the same compression settings as the one used to build official packages.

pacman:

    COMPRESSZST=(zstd -c -z -q -)

https://git.archlinux.org/svntogit/packages.git/tree/trunk/m...

devtools:

    COMPRESSZST=(zstd -c -T0 --ultra -20 -)

https://github.com/archlinux/devtools/blob/master/makepkg-x8...

gravitas6y ago

The man page for zstd mentions that using the --ultra flag will cause decompression to take more RAM as well when used to compress. Does this indicate a huge increase in memory to decompress, or just a trivial amount per package, say something large like... `libreoffice-fresh`? Or `go`? They're two of the largest main repo packages I have installed... (followed by linux-firmware)

telendram6y ago

Without `--ultra`, the decompression memory budget is capped at 8 MB. At `--ultra -20`, it's increased to 32 MB.

That's still less than XZ, which reaches 64 MB.

maxpert6y ago

I’ve used LZ4 and Snappy in production for compressing cache/mq payloads. This is on a service serving billions of clicks in a day. So far very happy with the results. I know zstd requires more CPU than LZ4 or Snappy on average, but has someone used it under heavy traffic loads on web services? I am really interested in trying it out but at the same time held back by “don’t fix it if it ain’t broken”.

ncmncm6y ago

Use Lz4 where latency matters, Zstd if you can afford some CPU.

I have a server that spools off the entire New York stock and options market every day, plus Chicago futures, using Lz4. But when we copy to archive, we recompress it with Zstd, in parallel using all the cores that were tied up all day.

There is not much size benefit to more than compression level 3: I would never use more than 6. And, there's not much CPU benefit for less than 1, even though it will go into negative numbers; switch to Lz4 instead.

loeg6y ago

Zstd has "fast" negative levels (-5, -4, ... -1, 1, ..., 22). -4 or -5 are purportedly comparable to (but not quite as good as) LZ4.

ncmncm6y ago

Better to just use Lz4, then.

emn136y ago

Maybe. The thing is, zstd is quite close, and unlike lz4, zstd has a broad curve of supported speed/time tradeoffs. Unless you're huge and engineering effort is essentially free or at least the microoptimization for one specific ratio is worth the tradeoff - you may be better off choosing the solution that's less opinionated about the settings. If it then turns out that you care mostly about decompression speed + compression ratio and a little less about compression speed, it's trivial to go there. Or maybe it turns out you only sometimes need the speed, but usually can afford spending a little more CPU time - so you default to higher compression ratios, but under load use lower ones (there's even a streaming mode built-in that does this for you for large streams). Or maybe your dataset is friendly to the parallelization options, and zstd actually outperforms lz4.

If you know your use case well and are sure the situation won't change (or don't mind swapping compression algorithms when they do), then lz4 still has a solid niche, especially where compression speed matters more than decompression speed. But in many if not most cases I'd say it's probably a kind of premature optimization at this point, even if you think you're close to lz4's sweet spot.

G4E6y ago

For those who want a TLDR : The trade off is 0.8% increase of package size for 1300% increase in decompression speed. Those numbers come from a sample of 542 packages.

agumonkey6y ago

thanks, a great change for those with SSDs

crazysim6y ago

or CPUs!

fctorial6y ago

Or RAM

Phlogi6y ago

The wiki is already up to date if you build your own or AUR packages and want to use multiple cpu cores https://wiki.archlinux.org/index.php/Makepkg#Utilizing_multi...

yjftsjthsd-h6y ago

> If you nevertheless haven't updated libarchive since 2018, all hope is not lost! Binary builds of pacman-static are available from Eli Schwartz' personal repository, signed with their Trusted User keys, with which you can perform the update.

I am a little shocked that they bothered; Arch is rolling release and explicitly does not support partial upgrades (https://wiki.archlinux.org/index.php/System_maintenance#Part...). So to hit this means that you didn't update a rather important library for over a year, which officially implies that you didn't update at all for over a year, which... is unlikely to be sensible.

jpgvm6y ago

Arch is actually surprisingly stable and even with infrequent updates on the order of months still upgrades cleanly most of the time. The caveats to this were the great period of instability when switching to systemd, changing the /usr/lib layout, etc but those changes are now pretty far in the past.

yjftsjthsd-h6y ago

Sure, and I've done partial upgrades and it was mostly fine:) It just surprised me to see the devs going out of their way to support it on volunteer time. On the other hand, maybe that's exactly the reason; maybe someone said "hey look, I can make static packages that are immune to library changes! I guess I'll publish these in case they're useful". Open source is fun like that:)

semi-extrinsic6y ago

Also, Arch devs probably run Arch servers, and I'd not be surprised if some of those have uptimes in hundreds of days.

netfl06y ago

I remember that time. I had successfully migrated 2 systems to systemd from init. One was a production server. I felt like a genius at the time. Of course all the arch devs did all the real work :)

(I wanted the challenge of running arch in production just to learn, good times)

ubercow136y ago

That's only not sensible if you continued to use that computer for the year. You might have just not used it for a year, which doesn't seem unlikely. In fact I just updated my Arch desktop, which I had indeed not used for more than a year :)

jonathonf6y ago

You're shocked that they consider and plan for a worst-case/edge-case scenario?

That sort of attention to detail is what continues to impress me about the Arch methodology.

computerfriend6y ago

I have a laptop in another country that I see quite infrequently and I'm pretty happy this exists.

cjbillington6y ago

pacman-static existed already, and can be used to fix some of the most broken systems in a variety of circumstances. So, they didn't make it just for this, might as well mention it as the right tool to fix the problem should it occur.

xiaq6y ago

I guess that's why it is provided by an individual, instead of as an officially supported solution.

shmerl6y ago

Was XZ used in parallelized fashion? Otherwise comparing is kind of pointless. Single threaded XZ decompression is way too slow.

telendram6y ago

A little-known fact is that parallel XZ compresses worse than XZ! I measured pixz as being approximately 2% worse than xz. That's because the input is split into independent chunks.

In comparison, the 0.8% of zstd looks like a bargain.
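
The chunking effect can be illustrated with Python's standard `lzma` module. This is a simplified model and not how pixz or `xz -T` are actually implemented: it assumes each chunk becomes its own independent xz stream, and the payload is synthetic. It shows that the chunked output decompresses to the same data but is a different byte sequence, and it prints both sizes so the overhead can be inspected.

```python
import lzma


def compress_single(data):
    """One xz stream over the whole input."""
    return lzma.compress(data)


def compress_chunked(data, chunks):
    """Compress equal slices independently and concatenate the xz streams.

    Assumption for illustration: each chunk is a separate stream, so no
    chunk can reference data seen in an earlier chunk.
    """
    step = max(1, -(-len(data) // chunks))  # ceiling division
    return b"".join(
        lzma.compress(data[start:start + step])
        for start in range(0, len(data), step)
    )


def demo_payload(lines=20000):
    """Synthetic, repetitive stand-in for package contents."""
    return b"".join(b"file-%06d: some repetitive package payload\n" % i
                    for i in range(lines))


if __name__ == "__main__":
    payload = demo_payload()
    single = compress_single(payload)
    chunked = compress_chunked(payload, 4)
    # lzma.decompress handles concatenated streams and returns the joined result.
    print("same payload:", lzma.decompress(chunked) == lzma.decompress(single))
    print("same bytes:  ", chunked == single)
    print("sizes:", len(single), "single vs", len(chunked), "chunked")
```

The same mechanism explains why output that depends on how the input was split (for example by thread count) cannot be expected to hash identically across builds.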

shmerl6y ago

Is 0.8% with maximum compression? It's surprising the difference is so small.

telendram6y ago

0.8% is with Arch's default settings. It's fairly strong, but not the strongest, to preserve cpu during compression.

zstd is used at level 20, but it can compress more. Levels can go up to 22, and (complex) advanced commands are available to compress data even more.

esotericn6y ago

Multithreaded xz is non-deterministic and so it's not a candidate.

shmerl6y ago

How is it non deterministic? Works pretty consistently for me with pixz.

tempay6y ago

The bytes of the compressed file are non deterministic and depend on the number of cores used, system load and other “random” factors.

ComputerGuru6y ago

We are talking about decompression speed and not compression. Decompression is necessarily deterministic.

esotericn6y ago

The compression speed is also an issue for developers. In many cases the compression step takes longer than the rest of the build.

shmerl6y ago

Maybe the point is that the compressed package can change every time, which is an issue for the reproducible-builds idea many distros are now using. Though I'm not sure why parallelized xz can't behave in a predictable fashion.

1 more reply

rubicks6y ago

I give thanks every day for pxz. I can churn out apt indices so much faster relative to the alternative.

shmerl6y ago

For general purposes, I like using pixz which is indexable in comparison: https://github.com/vasi/pixz

Do you know if Debian is using parallelized XZ or not with apt / dpkg?

ncmncm6y ago

Maybe worth mentioning that zstd is happy to work in parallel.

zerogara6y ago

Most of the published results show very little difference, positive or negative, in decompression speed; where is all this -1300% coming from?

edit: Sorry, my fault: that was decompression RAM I was thinking about, not speed. I was also influenced by my own test, where, without measuring, both xz and zstd seemed instant.

dhsysusbsjsi6y ago

Quick shout out to LZFSE. Similar compression ratio to zlib but much faster.

https://github.com/lzfse/lzfse

nwah16y ago

I wonder if they will switch to using zstd for mkinitcpio

yjftsjthsd-h6y ago

I thought that was user configurable? Or do you mean by default?

nwah16y ago

Yea, by default. Last time I tried it manually, the kernel wouldn't boot. Best to have these things handled for you.

Squithrilve6y ago

mkinitcpio is being replaced with dracut, so zstd probably won't happen.

Foxboron6y ago

Well, that is a bit of a 50/50 coin flip currently. There have been intentions, but we need some collaboration from upstream to make this happen.

It's not set in stone currently.

nwah16y ago

Man page says zstd is an option on dracut

http://man7.org/linux/man-pages/man5/dracut.conf.5.html

Foxboron6y ago

The kernel doesn't support booting zstd-compressed initramfs images yet, but you can very well use zstd compression with dracut and mkinitcpio

imtringued6y ago

This blog post probably wasted more of my time than I will ever gain from the faster decompression...

vmchale6y ago

What of lzip?

Annatar6y ago

I couldn't care less about decompression speed, because the bottleneck is the network, which means that I want my packages as small as possible. Smaller packages mean faster installation; at 54 MB/s or faster decompression rate of xz, I couldn't care less about a few milliseconds saved during decompression. For me, this decision is dumbass stupid.

nullc6y ago

Per the post, the speedup on decompress is _13x_ while the size is 1.008x.

For those figures, this will be better total time for you if your network connection is faster than about 1.25 Mbit/s. For a slow ARM computer with an xz decompression speed of 3 MB/s, the bandwidth threshold for a speedup drops to _dialup_ speeds.

And no matter how slow your network connection is and how fast your computer is you'll never take more than 0.8% longer with this change.

For many realistic setups it will be faster, in some cases quite a bit. Your 54 MB/s xz host should be about 3% faster if you're on a 6 Mbit/s link, assuming your disk can keep up. A slow host that decompresses xz at 3 MB/s with a 6 Mbit/s link would be a whopping 40% faster.
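The arithmetic behind these figures can be sketched as below. Two things here are assumptions, not stated in the comment: a serial model (download fully, then decompress) and a compression ratio of about 3:1, with decompression speeds measured in MB/s of unpacked output. With those assumptions the sketch lands close to the quoted 1.25 Mbit/s, 3%, and 40% numbers; other assumptions would shift them. The function names are made up for illustration.

```python
def total_time(download_mb, unpacked_mb, link_mbit_s, decomp_mb_s):
    """Seconds to download a package and then decompress it (serial model)."""
    download = download_mb / (link_mbit_s / 8.0)   # Mbit/s -> MB/s
    decompress = unpacked_mb / decomp_mb_s
    return download + decompress


def break_even_mbit_s(xz_mb_s, speedup=13.0, size_factor=1.008, ratio=3.0):
    """Link speed above which the zstd package finishes sooner than the xz one."""
    # Per MB of xz-compressed package: decompression time saved by zstd ...
    saved = ratio * (1.0 / xz_mb_s - 1.0 / (xz_mb_s * speedup))
    # ... versus the extra data that has to be downloaded.
    extra_mb = size_factor - 1.0
    return 8.0 * extra_mb / saved


def time_saved_fraction(xz_mb_s, link_mbit_s, speedup=13.0,
                        size_factor=1.008, ratio=3.0):
    """Fraction of total install time saved by switching from xz to zstd."""
    t_xz = total_time(1.0, ratio, link_mbit_s, xz_mb_s)
    t_zstd = total_time(size_factor, ratio, link_mbit_s, xz_mb_s * speedup)
    return 1.0 - t_zstd / t_xz


print(f"break-even, fast host: {break_even_mbit_s(54.0):.2f} Mbit/s")
print(f"break-even, slow host: {break_even_mbit_s(3.0) * 1000:.0f} kbit/s")
print(f"saved at 6 Mbit/s, fast host: {time_saved_fraction(54.0, 6.0):.1%}")
print(f"saved at 6 Mbit/s, slow host: {time_saved_fraction(3.0, 6.0):.1%}")
```

If downloads and decompression overlap instead of running back to back, the slower of the two stages dominates and the picture changes, so treat this as a back-of-the-envelope model only.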

ubercow136y ago

Why do you care so much about the few extra milliseconds wasted downloading, then? (0.8% size increase is ~ 0). Also don't forget that Arch can also be used on machines with very slow CPU but very fast network connections, like many VPSs. I think this will make a tangible difference on mine. This is also a big improvement for package maintainers and anyone building their own packages without bothering to modify the makepkg defaults, e.g. most people using an AUR helper.

Annatar6y ago

Because size does matter.

powturbo6y ago

There are nice plots [1] to see the transfer+decompression speedup depending on the network bandwidth.

This is for html web compression, but the results are similar for other datasets. For internet transfer more compression is better than more decompression speed.

You can make your own experiments incl. the plots with turbobench [2]

[1] https://sites.google.com/site/powturbo/home/web-compression [2] https://github.com/powturbo/TurboBench

snvzz6y ago

The extra decompression complexity might be a joke on a Zen2 server, but it definitely is not in older systems.

If this was netbsd m68k, you'd probably easily understand.

Annatar6y ago

I use xz on my A1200 all of the time, and Amiga is the stereotypical system where maximum possible compression matters over everything else. Don't make assumptions about me.

snvzz6y ago

I applaud your patience. Even with my vampire, I'll use something faster whenever at all possible.

May I ask, why xz over, say, PAQ8PF?

1 more reply

RivieraKid6y ago

Yes, this is a real problem, verbatim titles are often far from the "optimal" title. In some cases the original title provides almost no information about the content.

The question is what's better than a strict "no editorialization" rule.

Lammy6y ago

bscphil6y ago

In an article about Arch Linux, the text "Arch Linux" is much less likely to be redundant than an article about something else that just happens to be on the archlinux.org domain.

imtringued6y ago

The window title of the submission is "Arch Linux - News: Now using Zstandard instead of xz for package compression". There is no need to invent a new title.

hordeallergy6y ago

HN rules are to be ignored. It's _hacker_ news.

throwGuardian6y ago

Guessing such a rule helps in duplicate submissions detection. But, that should be possible from checking the URL.

Unless one uses a link shortener. Are shorteners permitted on HN?

grzm6y ago

Dylan168076y ago

Most of the text on the front page is that size and that color of gray.

If it's not easy to read, then the problem is between the css and your screen. Not the title rules.

Lammy6y ago

Dylan168076y ago

So even if the url got bumped to 12pt, you'd complain if the rest was 15? I think that's weird.

As long as it's on the same line as the title and easily legible, I really don't see a problem.

And it's not a spot of secondary importance. If it was still in the title, making it longer, the spot where you see the url would have title in it.

2 more replies

WinonaRyder6y ago

Zstandard is awesome!

With Zstd I was able to re-compress the whole thing down to a few hundred gigs and use ripgrep which solved the problem beautifully.

EDIT: Btw, I use arch :) - yes, on servers too.

bufferoverflow6y ago

Here's a compression benchmark.

http://pages.di.unipi.it/farruggia/dcb/

Looks like Snappy beats both LZ4 and Zstd in compression speed and compression ratio, by a huge margin.

LZ4 is ahead of Snappy in decompression speed.

correct_horse6y ago

Similar to how code is read more times than it is written, files are decompressed more times than compressed.

I have not researched this opinion much

ncmncm6y ago

I find these numbers for Snappy entirely implausible.

The numbers I know about are wrong: zstd always beats gzip for compression ratio.

I will need to do my own testing.

ncmncm6y ago

I have tested snzip 1.0.4.

It compresses about as well as lz4, but more slowly. It also decompresses more slowly.

It is faster than zstd -1, but compresses less well.

It is possible that it does better with certain kinds of data, but 12x remains implausible.

Apparently the current file format has suffix ".sz".

filereaper6y ago

Apparently this is how to use Zstd with tar if anyone else was wondering:

  tar -I zstd -xvf archive.tar.zst

https://stackoverflow.com/questions/45355277/how-can-i-decom...

Hopefully there's another option added to tar that simplifies this if this compression becomes mainstream.

viraptor6y ago

tar has accepted `-a` for format autodetection for a while now. You can do:

    tar -axf archive.tar.whatever

and it should work for gz, bz2, Z, zstd, and probably more. (verified works for zstd on gnu tar 1.32)

yjftsjthsd-h6y ago

I think that's a GNU extension, so obviously fine on Arch, but probably not on e.g. macOS (Darwin) or Alpine (busybox) by default.

JonathonW6y ago

1 more reply

_ZeD_6y ago

With GNU tar, the -a flag is not needed

2 more replies

Macha6y ago

kdeldycke6y ago

xorcist6y ago

It's not too bad to specify compression separately, as in:

  zstd -cd archive.tar.zst | tar xvf -

it's needed anyway as soon as you step outside of what somebody made an option for, for example encryption.

cmurf6y ago

    tar -acf blah.tar.zst blah/

-a figures it out from the filename extension, and it's zst, not zstd.

chungy6y ago

tar automatically detects and supports unpacking zstd-compressed archives (as well as other compression types). There's no reason to combine -x with compression flags.

For compression, you can use "-c -I zstd"

ben0x5396y ago

Use aunpack imo: https://linux.die.net/man/1/aunpack

cmurf6y ago

Fedora 31 switched RPM to use zstd. https://fedoraproject.org/wiki/Changes/Switch_RPMs_to_zstd_c...

m4rtink6y ago

BTW, Fedora recently switched to zstd compression for its packages as well. For the same reasons, basically: much better overall de/compression speed while keeping the result mostly the same size.

ncmncm6y ago

This looks like a very good move. Debian should follow suit.

beatgammit6y ago

SamWhited6y ago

beanaroo6y ago

This isn't a function of an AUR helper but rather makepkg itself.

In makepkg.conf, omit compression by specifying:

    PKGEXT='.pkg.tar'

More information can be found at https://wiki.archlinux.org/index.php/Makepkg#Tips_and_tricks

1 more reply

thenewnewguy6y ago

I believe you can supply it via an environment variable if your AUR helper has the ability to set those for `makepkg`.

1 more reply

yjftsjthsd-h6y ago

Why did you re-enable it? Seems to work fine in my experience?

kbumsik6y ago

> Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup.

Impressive. As an AUR package maintainer, I am also wondering what the compression speed is like, though.

ncmncm6y ago

Compression speed is many, many, many times faster than xz, and (only) much faster than gzip. Really, only lz4 beats it.

integricho6y ago

ncmncm6y ago

Was a time when 700 MB mattered; it was as much as you could get onto a CD.

So, there is a place for each. I would set up the process to use Lz4 when testing, and Zstd for actual delivery to download archives.

In some circumstances, particularly when using a shared file server, Lz4 can be quite a lot faster than writing and reading data uncompressed.

isatty6y ago

1 more reply

the84726y ago

svckr6y ago

FWIW I really applaud Arch here. Even if it's just a small step. Commercial operating systems should take notice. OS updates should really not take as long as they (mostly) do.

the84726y ago

Even then it still could be pipelined. download, check signature, decompress while the next download is running. But yeah, pacman is plenty fast already.
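A minimal sketch of that kind of pipelining, assuming a single background download worker. Everything here is a stand-in: `fake_download` fabricates a payload instead of hitting a mirror, a SHA-256 comparison stands in for the signature check, and `lzma` stands in for whatever compressor the packages use. It is not how pacman works, only the shape of the idea.

```python
import hashlib
import lzma
from concurrent.futures import ThreadPoolExecutor


def fake_download(name):
    """Hypothetical download: returns (name, compressed blob, expected digest)."""
    payload = (name + "\n").encode() * 1000
    blob = lzma.compress(payload)
    return name, blob, hashlib.sha256(blob).hexdigest()


def verify(blob, expected_digest):
    """Checksum comparison standing in for a real signature check."""
    if hashlib.sha256(blob).hexdigest() != expected_digest:
        raise ValueError("verification failed")


def install_all(names):
    """Verify and unpack package N while package N+1 downloads in the background."""
    unpacked = {}
    if not names:
        return unpacked
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_download, names[0])
        for next_name in list(names[1:]) + [None]:
            name, blob, digest = pending.result()      # wait for current download
            if next_name is not None:
                pending = pool.submit(fake_download, next_name)  # start the next one
            verify(blob, digest)
            unpacked[name] = lzma.decompress(blob)
    return unpacked


result = install_all(["pkg-a", "pkg-b", "pkg-c"])
print({name: len(data) for name, data in result.items()})
```

With real network I/O the download thread spends most of its time waiting, so the main thread's verify and decompress work overlaps with it almost for free.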

michaelcampbell6y ago

Since most people are interested in the time taken to compress/decompress rather than the speed at which it happens, seems to me a better metric would be:

"... decompression time dropped to 14% of what it was..." (s/14/actual_value)

JeremyNT6y ago

Luckily updating libarchive manually with an intermediate version resolved my issue and everything proceeded fine.

golergka6y ago

ncmncm6y ago

Zstd knows how to use a user-supplied dictionary at each end. I hope you are doing that.

But if latency matters you might better use lz4.
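To illustrate why a shared dictionary helps with small messages, here is a sketch using zlib's preset-dictionary support from the Python standard library. Note the swap: this is zlib, not zstd, chosen only because it ships with Python. The principle is the same (both ends hold the same dictionary, so the compressor can reference it instead of spelling common strings out), but the API and the numbers are not zstd's. The dictionary and message contents are made up.

```python
import zlib

# Hypothetical dictionary: byte strings expected to recur in every message.
shared_dict = b'{"user": "", "status": "active", "timestamp": }'
message = b'{"user": "alice", "status": "active", "timestamp": 1234}'


def compress_with_dict(data, zdict):
    """Compress with a preset dictionary that the receiver must also have."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob, zdict):
    """Decompress using the same preset dictionary."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


plain = zlib.compress(message, 9)
with_dict = compress_with_dict(message, shared_dict)
restored = decompress_with_dict(with_dict, shared_dict)

print(f"raw: {len(message)}, no dict: {len(plain)}, with dict: {len(with_dict)}")
```

The catch mentioned downthread applies here too: if the dictionary drifts away from what the messages actually look like, the benefit shrinks, and both ends must always agree on which dictionary is in use.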

golergka6y ago

Yes, I did. Too bad that I didn't get to set up CI in that project, and current maintainers probably forgot to update the dictionary.

loeg6y ago

rurban6y ago

Also zlib is horrible code. Went away with disgust after finding critical errors (that time only patched in master).

jacobolus6y ago

It would be great to see better compression supported by browsers.

tyingq6y ago

Chrome apparently tested zstd out, and it's an improvement over Chrome's forked and optimized zlib on x64, but slower on ARM/Android. https://bugs.chromium.org/p/chromium/issues/detail?id=912902...

Few (or none?) of Chrome's fairly dramatic improvements to zlib have been upstreamed. https://github.com/madler/zlib/issues/346

dictum6y ago

https://github.com/facebook/zstd#benchmarks

JyrkiAlakuijala6y ago

https://github.com/google/brotli/issues/642 is the best 3rd party documentation of this behavior.

zstd does decompress fast, but this is not free. The cost is the compression density -- and lesser streaming properties than brotli.

1 more reply

imtringued6y ago

That's interesting. Brotli has wide browser support although it's less than 5 years old, but webp is reaching a decade and Safari still doesn't support it...

2 more replies

ncmncm6y ago

Wireshark! Wireshark!

Also lz4, of course.

rwmj6y ago

ncmncm6y ago

Fun fact: Two zstd files appended is a zstd file.

Also, parallel zstd must have some way to split up the work, that you could maybe use too.

rwmj6y ago

I would suggest reading the github issue that I linked to, you'll see why it's not currently possible.

gravitas6y ago

AUR users -- the default settings in /etc/makepkg.conf (delivered by the pacman package as of 5.2.1-1) are still at xz, you must manually edit your local config:

  PKGEXT='.pkg.tar.zst'

After warming up all my caches with a few pre-builds to try and keep it fair by reducing disk I/O, here's a sampling of the results:

  xz defaults  - Size: 33649964
  real  2m23.016s
  user  1m49.340s
  sys   0m35.132s
  ----
  zst defaults - Size: 47521947
  real  1m5.904s
  user  0m30.971s
  sys   0m34.021s
  ----
  zst mpthread - Size: 47521114
  real  1m3.943s
  user  0m30.905s
  sys   0m33.355s
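For a quick read of those numbers, the sizes and wall-clock times can be turned into ratios against the xz run. The figures below are copied from the sample above; the helper names are made up, and the timings presumably cover the whole build and not just the compression step, so the ratios describe the full run.

```python
def seconds(minutes, secs):
    """Convert a `time`-style 'XmY.YYYs' reading to seconds."""
    return minutes * 60 + secs


# Sizes in bytes and 'real' times, as posted above.
runs = {
    "xz defaults": {"size": 33649964, "real": seconds(2, 23.016)},
    "zst defaults": {"size": 47521947, "real": seconds(1, 5.904)},
    "zst mpthread": {"size": 47521114, "real": seconds(1, 3.943)},
}


def compare(runs, baseline_name):
    """Size growth and wall-clock speedup of each run relative to the baseline."""
    base = runs[baseline_name]
    return {
        name: {
            "size_ratio": run["size"] / base["size"],
            "speedup": base["real"] / run["real"],
        }
        for name, run in runs.items()
    }


summary = compare(runs, "xz defaults")
for name, stats in summary.items():
    print(f"{name:13s} size x{stats['size_ratio']:.3f}  speed x{stats['speedup']:.2f}")
```

The size growth here is far above the ~0.8% quoted in the announcement, which fits the point that the stock makepkg.conf does not use the same zstd settings as the official package builds.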

Foxboron6y ago

It should be noted that the makepkg.conf file distributed with pacman does not contain the same compression settings as the one used to build official packages.

pacman:

    COMPRESSZST=(zstd -c -z -q -)

https://git.archlinux.org/svntogit/packages.git/tree/trunk/m...

devtools:

    COMPRESSZST=(zstd -c -T0 --ultra -20 -)

https://github.com/archlinux/devtools/blob/master/makepkg-x8...

gravitas6y ago

telendram6y ago

Without `--ultra`, the decompression memory budget is capped at 8 MB. At `--ultra -20`, it's increased to 32 MB.

That's still less than XZ, which reaches 64 MB.

1 more reply

maxpert6y ago

ncmncm6y ago

Use Lz4 where latency matters, Zstd if you can afford some CPU.

loeg6y ago

Zstd has "fast" negative levels (-5, -4, ... -1, 1, ..., 22). -4 or -5 are purportedly comparable to (but not quite as good as) LZ4.

ncmncm6y ago

Better to just use Lz4, then.

emn136y ago

G4E6y ago

For those who want a TLDR : The trade off is 0.8% increase of package size for 1300% increase in decompression speed. Those numbers come from a sample of 542 packages.

agumonkey6y ago

thanks, a great change for those with SSDs

crazysim6y ago

or CPUs!

fctorial6y ago

Or RAM

Phlogi6y ago

The wiki is already up to date if you build your own or AUR packages and want to use multiple cpu cores https://wiki.archlinux.org/index.php/Makepkg#Utilizing_multi...

yjftsjthsd-h6y ago

jpgvm6y ago

yjftsjthsd-h6y ago

semi-extrinsic6y ago

Also, Arch devs probably run Arch servers, and I'd not be surprised if some of those have uptimes in hundreds of days.

1 more reply

netfl06y ago

I remember that time. I had successfully migrated 2 systems to systemd from init. One was a production server. I felt like a genius at the time. Of course all the arch devs did all the real work :)

(I wanted the challenge of running arch in production just to learn, good times)

ubercow136y ago

jonathonf6y ago

You're shocked that they consider and plan for a worst-case/edge-case scenario?

That sort of attention to detail is what continues to impress me about the Arch methodology.
