The point of compressing is to decompress later. That is what happens during inference, and that is where the extrapolation occurs.
Let's say I tell GPT "write foobar 8 times". Will it? If so, then it understood me and extrapolated from the request to the proper response, without having "write foobar 8 times" stored verbatim in its model.
Most compression algorithms, believe it or not, work by predicting the next token (byte, term, etc.). The more accurately they predict the next token, the less information you need to store to correct the mispredictions.
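To make the idea concrete, here is a toy sketch (not any real codec): the "model" predicts that each byte repeats the previous one, and the encoder stores only a hit marker when the prediction is right, or a marker plus the correction when it is wrong. A real coder would pack the markers into bits or feed the predictor's probabilities into an arithmetic coder; this version just shows that a better predictor means fewer corrections to store.

```python
def compress(data: bytes) -> bytes:
    """Predictive coding sketch: predict each byte equals the previous one."""
    out = bytearray()
    prev = 0  # predictor state: last byte seen (0 before the first byte)
    for b in data:
        if b == prev:
            out.append(0)      # prediction correct: store only a hit marker
        else:
            out.append(1)      # prediction wrong: store marker + correction
            out.append(b)
        prev = b
    return bytes(out)

def decompress(blob: bytes) -> bytes:
    """Re-run the same predictor and apply the stored corrections."""
    out = bytearray()
    prev = 0
    i = 0
    while i < len(blob):
        if blob[i] == 0:
            out.append(prev)   # predictor was right: replay its guess
            i += 1
        else:
            out.append(blob[i + 1])  # predictor was wrong: take the correction
            i += 2
        prev = out[-1]
    return bytes(out)
```

On highly repetitive input, almost every byte becomes a one-bit-worth hit marker; on input the predictor handles badly, the "compressed" form is larger than the original, which is exactly the compression-equals-prediction trade-off.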