Interesting that decoding is slower than encoding. Also curious about performance on CPU.
This approach may also be susceptible to "hallucinating" inaccurate detail; you can see a little bit of this on the upper-right of the girl's circled eyelid compared to the original Kodak image. See also: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...
Yes! This. I can see us giving up decisions to 'AI' without realizing that it's just a loose association machine. Would I want my mortgage rate adjusted by a 'loose association'?
Most codecs have been tuned for a mean opinion score (MOS). MS-SSIM is not a metric you can fully rely on [1] [2]; in my experiments it performed poorly.
I think the Google team's effort will have a much bigger impact [3], simply by combining all the recent practical improvements in image compression.
Meanwhile, they could have optimized images on the website a little better. Saved ~15% with my soon-to-be-obsolete tool [4].
[1] https://encode.ru/threads/2738-jpegraw?p=52583&viewfull=1#po...
[2] https://medium.com/netflix-techblog/toward-a-practical-perce...
[3] https://encode.ru/threads/2628-Guetzli-a-new-more-psychovisu...
> Lubomir holds a Ph.D. from UC Berkeley, 20 years of professional experience, 50+ issued patents and 5000+ citations.
I just hope this type of research isn't going to end in a patent encumbrance, like it did with JPEG and MPEG.
These techniques are right around the corner, no matter who invents the file formats.
So if their idea is to lock these general ideas down with more patents, I'd want them to stop their research and let people with more open intentions research this further.
Here's a Google Translate example: https://twitter.com/keff85/status/862690920805916672
I wouldn't like to lose part of a parcel in a lawsuit because an adaptive algorithm made up some details in an aerial photograph so that it compresses better ...
What I want to see is an acceptable looking JPG next to a WaveOne image of the same size. Or an acceptable looking WaveOne next to a JPG of the same size.
How small is good enough? How good is small enough?
I'm struggling a bit with their performance comparison. The graphs they present are very pretty and promising, but for the presented images we're left quite in the dark. They dump some images, theirs look prettier, and the authors give some indication of quality, but that's not conclusive evidence that their method produces better images. Typically, when different compressed images are compared, two things can vary: quality and file size. In the presented images both seem to vary, without telling us which is which. There's also no baseline to compare against, either in terms of file size or what the images should look like. Sure, we humans can make a very educated guess, but it's just sloppy not to include the uncompressed original image.
I will be fully convinced when I can try it for myself on my own image set.
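A fair comparison along those lines is easy to script: fix a byte budget, binary-search the conventional codec's quality setting until its output just fits, and only then compare the images. A minimal sketch, where `encode` is a hypothetical callable (in practice it would wrap a real JPEG encoder):

```python
def match_size(encode, budget_bytes, lo=1, hi=100):
    """Find the highest quality setting whose encoded output fits the budget.

    `encode(q)` must return the compressed bytes at quality q, with size
    (roughly) non-decreasing in q -- true of typical JPEG encoders.
    """
    best = None
    while lo <= hi:
        q = (lo + hi) // 2
        data = encode(q)
        if len(data) <= budget_bytes:
            best = (q, data)   # fits: try a higher quality
            lo = q + 1
        else:
            hi = q - 1         # too big: lower the quality
    return best

# Toy stand-in encoder whose output size grows linearly with quality.
fake_jpeg = lambda q: b"x" * (100 + 50 * q)
q, data = match_size(fake_jpeg, 2000)  # highest q with 100 + 50q <= 2000
```

With matched sizes in hand, the visual (or metric) comparison is at least apples-to-apples.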
However, it also has a kind of adaptive ML-ish approach so it might be technically similar.
> FLIF is based on MANIAC compression. MANIAC (Meta-Adaptive Near-zero Integer Arithmetic Coding) is an algorithm for entropy coding developed by Jon Sneyers and Pieter Wuille. It is a variant of CABAC (context-adaptive binary arithmetic coding), where instead of using a multi-dimensional array of quantized local image information, the contexts are nodes of decision trees which are dynamically learned at encode time. This means a much more image-specific context model can be used, resulting in better compression.
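To make the quoted idea concrete, here is a toy illustration (not FLIF's actual code; all names are invented) of tree-selected adaptive contexts: inner nodes test a local image property, and each leaf keeps its own adaptive bit-probability model that is updated as symbols are coded.

```python
class Context:
    """Adaptive binary probability model (simple counts with add-one smoothing)."""
    def __init__(self):
        self.ones, self.total = 1, 2
    def p_one(self):
        return self.ones / self.total
    def update(self, bit):
        self.ones += bit
        self.total += 1

class Node:
    """Leaf: holds a Context. Inner node: routes on one local property."""
    def __init__(self, prop=None, threshold=None, left=None, right=None):
        self.prop, self.threshold = prop, threshold
        self.left, self.right = left, right
        self.ctx = Context() if prop is None else None
    def select(self, props):
        if self.ctx is not None:
            return self.ctx
        child = self.left if props[self.prop] <= self.threshold else self.right
        return child.select(props)

# Tiny fixed tree: split on a local gradient estimate; each leaf adapts separately.
tree = Node(prop="grad", threshold=10, left=Node(), right=Node())
ctx = tree.select({"grad": 3})  # low-gradient region -> left leaf
ctx.update(1)                   # feed the context the bit just coded
```

In real MANIAC the tree itself is grown at encode time and the leaf probabilities drive an arithmetic coder; the sketch only shows the context-selection part.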
They obviously have some good image priors here, if I were them I would consider applying this tech to other image-related things, like image manipulation, or image search. Although competition is heating up quickly in these fields...
Literally the first sentence of the linked page:
"Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm)."
EDIT: It's embarrassing that not one person in this thread seems to have actually read any of the paper. Even with obvious evidence that this is fiction, people still don't want to believe it.
https://scholar.google.ca/citations?user=reEAEWsAAAAJ&hl=en
https://scholar.google.ca/citations?user=OXFjRnEAAAAJ&hl=en
Company entry on Linkedin: https://www.linkedin.com/company-beta/12953035/
Basically, for each domain you want to do well in, you need a knowledge set trained on that domain's data. Then you need a discriminator on the compression side to classify an image, or subregions of an image, into those categories.
If you can make the knowledge sets downloadable on demand and then cached, you can be incredibly efficient over the long term while keeping the initial download very small. I think knowledge sets that evolve over time also ensure the codec stays flexible enough to handle currently unforeseen situations. Nobody wants a future where a DL-based image/video compression tool only knows a few pre-determined sets and is mediocre on everything else.
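That routing-plus-caching scheme can be sketched in a few lines; everything here (the discriminator, the model format, the function names) is hypothetical:

```python
import functools

def classify(image):
    """Hypothetical discriminator: map an image (or subregion) to a domain label."""
    return "faces" if image.get("has_face") else "generic"

@functools.lru_cache(maxsize=8)
def load_knowledge_set(domain):
    """Stand-in for downloading a domain-specific knowledge set on demand;
    lru_cache plays the role of the local cache."""
    return {"domain": domain}  # imagine fetched model weights

def compress(image):
    model = load_knowledge_set(classify(image))
    return model["domain"], image  # placeholder for model-conditioned coding
```

The cache means the second image from the same domain pays no download cost, which is the "efficient over the long term" part of the argument.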
[1] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b...
[2] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b...
This is great news!
I'd actually like to see the plot, though. (Both for encoding and decoding.) It stands to reason that a neural network can optimize image compression, as it can encode high-level information like "this is a face". But encoding / decoding speed is the sticking point, so I feel successes there should be emphasized.
The necessity of having a GPU doesn't seem problematic nowadays; everything has one. Testing it with a mobile-grade GPU would be interesting.
As long as the decompressor needs just the image file and no other data, it's fair game.
Is this image compression tool good at images it was not trained on?
How bad does it get in those situations?
Is this training data fixed into the codec forever? Will there be slightly different image codecs with different training data? That would be sort of hellish.
One can improve on e.g. H.265 somewhat easily if a software-only solution is an option. But if you need a cheap hardware-only solution, then an ML-required approach seems a bit too expensive.
Directly from the paper's PDF:
"Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."
This whole thread is like being the only sane person in an asylum.
Also, it isn't a 'real implementation' since there isn't any source code to reproduce the results.
On the actual web page the first line of its abstract:
"Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm)."
From the PDF:
https://arxiv.org/pdf/1705.05823.pdf
The end of Section 2.2 ("ML-based lossy image compression"), right above Section 2.3 ("Generative Adversarial Networks"):
"Theis et al. (2016) and Ballé et al. (2016) quantize rather than binarize, and propose strategies to approximate the entropy of the quantized representation: this provides them with a proxy to penalize it. Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."
Edit: as for the other images, it would indeed be nice to see those.
[1] http://r0k.us/graphics/kodak/ [2] http://r0k.us/graphics/kodak/kodim15.html