My understanding is that Zstd was designed to be fast, not necessarily maximally space-efficient, and the other algorithms listed (e.g. 7-zip, gzip) are quite ancient.
However, I don't think the current performance reflects any fundamental limit, and there are clearly some clever optimizations yet to be done.
I was very surprised that the lzma-brotli mashup outperformed either one. This leads me to think that with enough community involvement we could discover some really clever heuristics and algorithms for compression as a whole.
Now LZ77 compressors have the advantage of being really popular, and many of the leading compressors at various points on the space/time frontier use LZ77, but there are a large number of other interesting compression algorithms as well, which this IR cuts out.
A big downside of LZ77 is that it is simple enough that we have had optimal parsing for it for a long time, and it isn't likely to be the subject of any big breakthrough in the future: much of the interesting work will probably happen in non-LZ77-style compressors.
One of the interesting areas for LZ77, however, is coding-sensitive parsing: i.e., basing your match-finding decisions on the actual code lengths required to represent specific matches, literal runs, etc. Since these code lengths are usually dynamic, set by the entropy coding backend, you often want a back-and-forth approach between the two components: either multiple passes or two-way feedback.
The approach of generating IR once and then feeding it into the DivANS entropy coding backend seems to preclude this.
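To make that back-and-forth concrete, here is a toy two-pass sketch in Python. The token names, statistics, and helpers are made up for this comment; it is not any real codec's pipeline. Pass one parses with fixed heuristics and tallies which tokens were emitted; pass two uses -log2 of the observed frequencies as the cost of each candidate token before committing to it.

```python
import math
from collections import Counter

def estimated_bits(counts, total, symbol):
    """Approximate entropy-coded cost of `symbol`: -log2 of its estimated probability."""
    p = (counts.get(symbol, 0) + 0.5) / (total + 1)   # add-half smoothing for unseen tokens
    return -math.log2(p)

def cheaper_choice(literal_bytes, match_symbol, counts, total):
    """Compare the cost of emitting raw literals vs. one match token for the same bytes."""
    lit_cost = sum(estimated_bits(counts, total, ('lit', b)) for b in literal_bytes)
    match_cost = estimated_bits(counts, total, ('match',) + match_symbol)
    return 'match' if match_cost < lit_cost else 'literals'

# Pass 1: parse with fixed heuristic costs, then tally the emitted tokens (made-up numbers).
counts = Counter({('lit', ord('e')): 40, ('match', 2, 8): 10})
total = sum(counts.values())

# Pass 2: the parser consults the measured costs before committing to a token.
print(cheaper_choice(b'eeeeeeee', (2, 8), counts, total))   # -> 'match'
```

In a real multi-pass scheme you would iterate: parse, update the statistics, re-parse with the new costs, and so on until the choices stabilize.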
Without -mixing=2 (e.g., for the results presented in the blog post), DivANS actually relies on a static determination of which models to use in which situations, serialized into the file in the mixing map. It only ever evaluates a single model per nibble.
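For illustration only, here is a hypothetical sketch of what such a static mixing map amounts to: a table, fixed at encode time and shipped with the file, that names exactly one model per context, so only that model is consulted for a given nibble. The names and structure below are invented for this comment and are not DivANS's actual data structures.

```python
# Two toy nibble models: a flat one and a "repeat the previous nibble" one.
MODELS = {
    0: lambda ctx: [1 / 16] * 16,
    1: lambda ctx: [0.5 if n == ctx else 0.5 / 15 for n in range(16)],
}

# Static map from a coarse context (here just the previous nibble) to a model index.
MIXING_MAP = {bucket: (1 if bucket < 8 else 0) for bucket in range(16)}

def nibble_distribution(prev_nibble):
    """Only one model is ever evaluated per nibble: the one the map names for this context."""
    model_index = MIXING_MAP[prev_nibble]
    return MODELS[model_index](prev_nibble)

print(sum(nibble_distribution(3)))   # each per-context distribution sums to ~1.0
```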
The idea of mashing up the intermediate representation with a bunch of different compressors isn't present in zpaq as far as I know, but the spirit of the idea is similar there, just at a per-bit level rather than at the IR level.
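As a rough sketch of what per-bit mixing in the PAQ/zpaq spirit looks like: several models each predict the probability that the next bit is a 1, and a logistic mixer combines those predictions with weights it learns as it codes. The class and constants below are illustrative, not zpaq's actual code.

```python
import math

def stretch(p):   # logit
    return math.log(p / (1 - p))

def squash(x):    # inverse logit
    return 1 / (1 + math.exp(-x))

class LogisticMixer:
    def __init__(self, n_models, lr=0.02):
        self.w = [0.0] * n_models      # mixing weights, learned online
        self.lr = lr
        self.inputs = [0.0] * n_models

    def predict(self, probs):
        """Combine per-model bit probabilities into one probability in the logistic domain."""
        self.inputs = [stretch(p) for p in probs]
        return squash(sum(w * x for w, x in zip(self.w, self.inputs)))

    def update(self, p_mixed, bit):
        """After coding the bit, nudge weights toward the models that predicted it."""
        err = bit - p_mixed
        self.w = [w + self.lr * err * x for w, x in zip(self.w, self.inputs)]

mixer = LogisticMixer(n_models=2)
p = mixer.predict([0.9, 0.3])   # two hypothetical model outputs for "next bit is 1"
mixer.update(p, bit=1)          # adapt the weights once the actual bit is known
```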
I don't understand how probability tables can compress AND be deterministic. Intuitively, I would think it's a trade-off between the two competing effects.
You can view this Huffman table or Morse codebook as a probability distribution over the likely letters to follow. A symbol with a small number of bits == high probability. A symbol needing a large number of bits == low probability. (An ideal code spends about -log2(p) bits on a symbol of probability p.)
The only innovation with arithmetic coding and ANS is that they allow for "fractional bits": you don't need a whole dot (or bit) for a very likely next letter. But the idea is the same: you are taking your understanding of the probability distribution table and using it to come up with codes for the following letters.
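Here is a small worked example of the fractional-bit point, using a made-up three-symbol distribution: an ideal coder spends about -log2(p) bits per symbol, while a whole-bit prefix code such as Huffman has to round to whole bits, which hurts most when one symbol is very likely.

```python
import math

probs = {'e': 0.90, 't': 0.05, 'a': 0.05}   # made-up skewed distribution

ideal_bits = {s: -math.log2(p) for s, p in probs.items()}
# A Huffman code over these three symbols assigns 1 bit to 'e' and 2 bits to the others.
huffman_bits = {'e': 1, 't': 2, 'a': 2}

avg_ideal = sum(p * ideal_bits[s] for s, p in probs.items())
avg_huffman = sum(p * huffman_bits[s] for s, p in probs.items())

print(f"ideal (arithmetic/ANS-style): {avg_ideal:.3f} bits/symbol")    # ~0.569
print(f"whole-bit prefix code:        {avg_huffman:.3f} bits/symbol")  # 1.100
```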
One caveat is that instead of agreeing on a fixed code upfront, you agree on an algorithm that dynamically estimates the future probability tables as you read through the file.
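That is also what resolves the compress-vs-deterministic question above: the encoder and decoder run the same estimation algorithm over the same already-processed symbols, so the table never has to be transmitted, yet both sides always agree on it. A minimal sketch of such an adaptive order-0 model (the names are illustrative, not any particular codec's):

```python
class AdaptiveModel:
    def __init__(self, alphabet_size=256):
        self.counts = [1] * alphabet_size   # start uniform (add-one prior)
        self.total = alphabet_size

    def probability(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        self.counts[symbol] += 1
        self.total += 1

# Encoder: for each symbol s, code it with model.probability(s), then model.update(s).
# Decoder: decode a symbol using the current model, then apply the same model.update(s).
# Identical updates on identical data keep both sides' tables in sync deterministically.
```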
Will pass this to a few colleagues.