It's giant guesswork about what was there originally. Reminds me of Xerox scanners lying about scanned-in numbers: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...
The only difference is that this compression is better at hallucinating, so you don't get ringing artifacts or blocks, but an internally consistent alternate reality.
If you don't want to lose data, you should not use lossy compression at all. JPEG can erase the distinction between digits as well.
Because it turns out that fuzziness and compression artifacts have a higher-level meaning: when you see them, you know something has been lost. That's an important (if inadvertent) signal. We need to make sure the artifacts don't go away.
In other words, mostly only unimportant details are hallucinated, which is what we want.
Kinda what would happen if you used a perfect painter with a blurry memory.
This seems to be for static images, but it gets me wondering whether an RNN could be used to get better motion prediction than other current "hard-coded" solutions.
Also, the more specific the domain, the better the compression, since it can specialize. I'm wondering about the practical applications of this. Do we have different baselines that can be used for different use cases?
source: I worked with both HEVC and neural-network-based compression.
I would be surprised if there wasn't a way to provide a learned, lossless method of compression, but that would be a very different paper and result.
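One way such a learned-but-lossless scheme could work (a sketch of predictive coding, not anything from the paper): a model predicts each value from context, and only the exact residuals are stored. Reconstruction is bit-exact regardless of how good the model is, while a better model makes the residuals smaller and cheaper to entropy-code.

```python
# Sketch: lossless compression via a (potentially learned) predictor.
# The predictor here is a trivial stand-in; the point is that
# prediction + stored residual always reconstructs the input exactly.
import numpy as np

def encode(signal, predict):
    residuals = np.empty_like(signal)
    prev = 0
    for i, x in enumerate(signal):
        residuals[i] = x - predict(prev)  # store only the prediction error
        prev = x
    return residuals

def decode(residuals, predict):
    signal = np.empty_like(residuals)
    prev = 0
    for i, r in enumerate(residuals):
        signal[i] = predict(prev) + r     # exact inverse of encode
        prev = signal[i]
    return signal

predict = lambda prev: prev  # trivial "model": predict the previous sample
data = np.array([10, 12, 13, 13, 40, 41])
assert np.array_equal(decode(encode(data, predict), predict), data)
```

A learned model would replace `predict`, and the residual stream would then be fed to an entropy coder; the round trip stays lossless either way.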
I found this particularly interesting: "To compress an image x ∈ X , we follow the formulation of [20, 8] where one learns an encoder E, a decoder G, and a finite quantizer q."
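A toy illustration of that E / q / G pipeline (my own fixed linear maps, not the learned networks from the paper): E maps the image to a short code, the finite quantizer q snaps each code value to one of L levels, and G reconstructs from the quantized code.

```python
# Toy, non-learned sketch of the quoted formulation: encoder E, finite
# quantizer q, decoder G. Real systems learn E and G; these are fixed.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64)) / 8     # "encoder" E: 64 pixels -> 8 codes
centers = np.linspace(-1.0, 1.0, 5)      # finite codebook: L = 5 levels

def E(x):
    return W @ x

def q(z):
    # nearest-center quantization: each code value becomes an index in {0..L-1}
    return np.abs(z[:, None] - centers[None, :]).argmin(axis=1)

def G(idx):
    # map indices back to center values, then approximately invert E
    return np.linalg.pinv(W) @ centers[idx]

x = rng.standard_normal(64)
x_hat = G(q(E(x)))   # lossy round trip: 64 floats -> 8 small ints -> 64 floats
```

The compressed representation is just the 8 indices in {0..4}; everything lossy about the scheme lives in E, q, and G.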
I feel like this is related to some of the standard human memorization/learning techniques. Example: I'm learning guitar fretboard note placement in E standard. It's difficult for me to visualize the first 4 frets on a 6-string guitar with notes on each fret.
To help me memorize the note placement I develop various mnemonic devices (both lossy and lossless). I know I've memorized the fretboard sufficiently when I can visualize it.
Attempting to translate my reading of the paper, I believe the following analogy is apt. My "encoder" operates on a short-term image when I close my eyes after looking at a fret diagram. It produces semantic objects, i.e. an ordered sequence of "letters" or pairs of letters (letters that are horizontally, vertically, or diagonally aligned). The quantizer takes these objects and looks at their order/distribution, placing more importance on some of the semantic objects than others (the fourth fret has 4 natural notes before an accidental). My decoder interprets the stored/compressed note information to try to reproduce the image. It may be off substantially, so I correct and repeat the process.
The process of optimizing what the semantic objects are, the weight each gets, and how I use them to derive the original image seems like a fairly good representation of what I do (though at least some of that appears to be fixed in the learning algorithm typically). Of course, analogies are just that, and mine doesn't take into account the discriminator or the remaining "heart" of the paper.
I think the heart of the paper is that they're trying to determine through GANs a good way to both store the image and recover it while reducing bits per pixel and increasing the quality of reproduction. Using some classical terms, the GAN algorithm thus tweaks the compressor, the data storage format and the decompressor to optimize what should be "hard-coded" in the compressing/decompressing process or program vs what will be stored as a result of the compression program.
Very handwavey, but I think the general idea is right?
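That reading matches the usual rate-distortion-adversarial trade-off. A sketch of what the generator side's objective might look like (the function names and weights here are mine, not the paper's): reconstruction error plus a penalty on bits spent, plus a term that rewards fooling the discriminator.

```python
# Sketch of a composite compression-GAN objective (illustrative, not the
# paper's exact losses): distortion + lambda * rate + beta * adversarial.
import numpy as np

def distortion(x, x_hat):
    return float(np.mean((x - x_hat) ** 2))   # e.g. MSE between image and output

def rate(code_bits):
    return float(code_bits)                   # bits spent on the stored code

def adversarial(d_score):
    # smaller when the discriminator rates the reconstruction "real" (score ~ 1)
    return float(-np.log(max(d_score, 1e-9)))

def generator_loss(x, x_hat, code_bits, d_score, lam=0.01, beta=0.1):
    return distortion(x, x_hat) + lam * rate(code_bits) + beta * adversarial(d_score)
```

Training alternates this with a discriminator update; tuning `lam` is exactly the knob that trades stored bits against reconstruction quality.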
For example, if trained on faces, it will learn features for things like eyes and mouths. So the image can be encoded as "put a mouth of this type, with this width, at this location" rather than operating at the level of pixels.
If trained on text, it might learn features related to letters and typography (boldness, italics, size, spacing). So it might encode things as "Helvetica, 16pt, italic".
This is a gross oversimplification, and things rarely map exactly to concepts humans would use, but hopefully it communicates the concept.
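A back-of-the-envelope version of the text example (hypothetical numbers, purely to show the scale difference between pixel-level and semantic-level encoding):

```python
# Hypothetical comparison: storing a small page of text as raw pixels
# vs. as a handful of semantic parameters that a renderer can regenerate.
text_as_pixels = 800 * 200 * 3                          # raw RGB bytes: 480,000
text_as_params = ("Helvetica", 16, "italic", "Hello")   # font, size, style, content

# The semantic encoding is a few dozen bytes; the "decoder" is a text
# renderer that regenerates plausible pixels from those parameters.
```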