http://mattmahoney.net/dc/text.html
http://mattmahoney.net/dc/uiq/
Matt Mahoney maintains the best benchmarks on text and generic compression. Some of the best in the field (like Matt) usually hang out at encode.ru.
What else did you find odd/wrong?
Thanks for the shout-out. BTW, Matt's benchmarks cover all major compression engines, not only bzip2.
I usually use -0 with xz because it is extremely fast and memory-efficient yet still compresses better than gzip or bzip2. You can also use -0e for a slower, better compression that still requires only 1 MB to decompress. This way the decompressor can run entirely within the CPU's cache.
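For instance, a minimal sketch (big.log is a hypothetical file name; exact savings depend on the data):

$ xz -0k big.log                     # -k keeps the original; writes big.log.xz
$ xz -l big.log.xz                   # compare compressed vs. uncompressed size
$ rm big.log.xz && xz -0ek big.log   # -e: slower, better ratio, same small decompressor footprint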
http://mattmahoney.net/dc/text.html#2118
http://mattmahoney.net/dc/text.html#2300
Proper analysis would need benchmarking with different data and different flags for all compressors.
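A rough sketch of such a sweep (the input file, tool list, and flag sets are illustrative; GNU time is assumed for the -f format flag):

f=sample.tar                  # hypothetical representative input
for c in "gzip -9" "bzip2 -9" "xz -0" "xz -6"; do
    cp "$f" t
    /usr/bin/time -f "$c: %es elapsed, %M KB peak" $c t
    ls -l t.*                 # compressed size for this tool/flag combination
    rm -f t t.*
done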
"Pigz compresses using threads to make use of multiple processors and cores. The input is broken up into 128 KB chunks with each compressed in parallel. The individual check value for each chunk is also calculated in parallel. The compressed data is written in order to the output, and a combined check value is calculated from the individual check values.
The compressed data format generated is in the gzip, zlib, or single-entry zip format using the deflate compression method. The compression produces partial raw deflate streams which are concatenated by a single write thread and wrapped with the appropriate header and trailer, where the trailer contains the combined check value.
Each partial raw deflate stream is terminated by an empty stored block (using the Z_SYNC_FLUSH option of zlib), in order to end that partial bit stream at a byte boundary. That allows the partial streams to be concatenated simply as sequences of bytes. This adds a very small four to five byte overhead to the output for each input chunk. The default input block size is 128K, but can be changed with the -b option. The number of compress threads is set by default to the number of online processors, which can be changed using the -p option. Specifying -p 1 avoids the use of threads entirely. The input blocks, while compressed independently, have the last 32K of the previous block loaded as a preset dictionary to preserve the compression effectiveness of deflating in a single thread. This can be turned off using the -i or --independent option, so that the blocks can be decompressed independently for partial error recovery or for random access.
Decompression can’t be parallelized, at least not without specially prepared deflate streams for that purpose. As a result, pigz uses a single thread (the main thread) for decompression, but will create three other threads for reading, writing, and check calculation, which can speed up decompression under some circumstances. Parallel decompression can be turned off by specifying one process ( -dp 1 or -tp 1 )."
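To make the options described above concrete, a few example invocations (big.tar is a placeholder name):

$ pigz -p 4 -b 512 big.tar       # 4 compress threads, 512 KB input blocks; writes big.tar.gz
$ pigz -t big.tar.gz             # verify integrity without extracting
$ pigz -dp 1 big.tar.gz          # decompress on the main thread only
$ pigz -i big.tar                # independent blocks: random access / partial error recovery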
I also just updated the article with this info.
josh@snoopy:~/Downloads $ grep -m2 -i intel /proc/cpuinfo
vendor_id : GenuineIntel
model name : Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
josh@snoopy:~/Downloads $ ls -l test
-rw-r--r-- 1 josh josh 1073741824 2012-03-07 20:06 test
josh@snoopy:~/Downloads $ time gzip test
real 0m16.430s
user 0m10.210s
sys 0m0.490s
josh@snoopy:~/Downloads $ time pigz test
real 0m5.028s
user 0m16.040s
sys 0m0.620s
Looks good... although the man page describes it as being "an almost compatible replacement for the gzip program". I wonder what the speed looks like with two threads (pigz -p 2)?
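For reference, that test is just a one-liner on the same 1 GB file as above:

$ time pigz -p 2 test            # cap pigz at two compress threads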
josh@snoopy:~/Downloads $ dd if=/dev/zero of=test bs=1024K count=1000
josh@snoopy:~/Downloads $ gzip -l test.gz
compressed uncompressed ratio uncompressed_name
1176024 1048576000 99.9% test.txt
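One caveat on that test: /dev/zero compresses almost perfectly, so the 99.9% ratio is a best case rather than a typical one. A sketch of more representative inputs (paths are illustrative):

$ dd if=/dev/urandom of=rand.bin bs=1M count=100   # near-incompressible worst case
$ tar cf docs.tar /usr/share/doc                   # mixed text/binary, closer to real workloads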
With -p2 it runs ~2 sec slower.

One example (using the standard tools, not the multithreaded ones):
igorhvr:tmp/ $ ls -la co_2011-08_import.mdb*
-rw-r--r-- 1 igorhvr igorhvr 261832704 2011-10-11 07:19 co_2011-08_import.mdb
igorhvr:tmp/ $ sudo time nice -n -20 bzip2 -k co_2011-08_import.mdb
42.85user 0.14system 0:43.16elapsed 99%CPU (0avgtext+0avgdata 31968maxresident)k 112inputs+0outputs (3major+2268minor)pagefaults 0swaps
igorhvr:tmp/ $ sudo time nice -n -20 xz -3 -k co_2011-08_import.mdb
30.99user 0.19system 0:31.48elapsed 99%CPU (0avgtext+0avgdata 130640maxresident)k 96inputs+0outputs (2major+8389minor)pagefaults 0swaps
igorhvr:tmp/ $ ls -la co_2011-08_import.mdb*
-rw-r--r-- 1 igorhvr igorhvr 261832704 2011-10-11 07:19 co_2011-08_import.mdb
-rw-r--r-- 1 igorhvr igorhvr 24020243 2011-10-11 07:19 co_2011-08_import.mdb.bz2
-rw-r--r-- 1 igorhvr igorhvr 23949408 2011-10-11 07:19 co_2011-08_import.mdb.xz
In this case xz compressed to roughly the same size (slightly better than bzip2, actually) using only 72% of the time. This is often the case.

In that case you might have, say, 512 partitions, and you can farm out compression, decompression and other tasks to as many CPUs as you want, even other machines in a cluster.
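A sketch of that idea with GNU parallel (the part-*.mdb file names are hypothetical, and GNU parallel is assumed to be installed):

$ parallel xz -3 -k ::: part-*.mdb        # one xz job per core across all partitions
$ parallel -j 8 xz -3 -k ::: part-*.mdb   # or cap the number of concurrent jobs

For other machines, parallel's --sshloginfile option can spread the jobs across a node list, provided each node can reach the files (e.g. on shared storage).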