I feel like they're two sides of the same coin: learning is about fitting more information into the same amount of data, while compression is about fitting the same information into less data.
I'm wondering if some lossy floating-point compressor (such as zfp) would work.
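To make the idea concrete, here's a rough sketch of what "lossy floating-point compression" buys you. This is not zfp and doesn't use its API; as a stand-in I just zero out the low mantissa bits of each float32 and then run zlib over the bytes. The names (`truncate_mantissa`, `compressed_size`, `weights`) and the sine-wave test data are made up for illustration.

```python
import math
import struct
import zlib

def truncate_mantissa(values, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of each value, stored as float32."""
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF  # keeps sign, exponent, top mantissa bits
    out = []
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        out.append(struct.unpack("<f", struct.pack("<I", bits & mask))[0])
    return out

def compressed_size(values):
    """Size in bytes after packing as float32 and deflating with zlib."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 9))

# Hypothetical stand-in for some smooth array of floats in [-1, 1].
weights = [math.sin(0.001 * i) for i in range(10000)]
lossy = truncate_mantissa(weights, keep_bits=8)

size_exact = compressed_size(weights)
size_lossy = compressed_size(lossy)
max_error = max(abs(a - b) for a, b in zip(weights, lossy))

print(f"exact: {size_exact} bytes, lossy: {size_lossy} bytes, max error: {max_error:.2e}")
```

The trade-off is the usual one: fewer bits kept means a smaller output and a larger error. A real compressor like zfp is presumably much smarter about where it spends its bits, so treat this only as a toy version of the idea.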