Interesting that decoding is slower than encoding. Also curious about performance on CPU.
This approach may also be susceptible to "hallucinating" inaccurate detail; you can see a little bit of this on the upper-right of the girl's circled eyelid compared to the original Kodak image. See also: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...
Yes! This. I can see us giving up decisions to 'AI' without realizing that it's just a loose association machine. Would I want my mortgage rate adjusted by a 'loose association'?
Most codecs have been tuned for a mean opinion score (MOS). MS-SSIM is not a metric you can fully rely on [1] [2]; in my experiments it performed poorly.
I think the Google team's effort will have a much bigger impact [3], simply by combining all the recent practical improvements in image compression.
Meanwhile, they could have optimized images on the website a little better. Saved ~15% with my soon-to-be-obsolete tool [4].
[1] https://encode.ru/threads/2738-jpegraw?p=52583&viewfull=1#po...
[2] https://medium.com/netflix-techblog/toward-a-practical-perce...
[3] https://encode.ru/threads/2628-Guetzli-a-new-more-psychovisu...
> Lubomir holds a Ph.D. from UC Berkeley, 20 years of professional experience, 50+ issued patents and 5000+ citations.
I just hope this type of research isn't going to end in a patent encumbrance, like it did with JPEG and MPEG.
These techniques are right around the corner, no matter who invents the file formats.
So if their idea is to lock these general ideas down with more patents, I'd want them to stop their research and let people with more open intentions research this further.
Here's a Google Translate example: https://twitter.com/keff85/status/862690920805916672
I wouldn't like to lose part of a parcel in a lawsuit because an adaptive algorithm made up some details in an aerial photograph so that it compresses better ...
What I want to see is an acceptable looking JPG next to a WaveOne image of the same size. Or an acceptable looking WaveOne next to a JPG of the same size.
How small is good enough? How good is small enough?
I'm struggling a bit with their performance comparison. The graphs they present are very pretty and promising, but for the presented images we're left quite in the dark. They dump some images, theirs look prettier, and the authors give some indication of quality, but that's not conclusive evidence that their method produces better images. Typically, when different compressed images are compared, two things can vary: quality and file size. In the presented images both seem to vary, without telling us which is which. There's also no baseline to compare against, either in terms of file size or what the images should look like. Sure, we humans can make a very educated guess, but it's just sloppy not to include the uncompressed original image.
I will be fully convinced when I can try it for myself on my own image set.
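A fair comparison along those lines is easy to script: fix a byte budget, binary-search the conventional codec's quality setting until its output just fits, and only then compare the images. A minimal sketch, where `encode` is a hypothetical callable (in practice it would wrap a real JPEG encoder):

```python
def match_size(encode, budget_bytes, lo=1, hi=100):
    """Find the highest quality setting whose encoded output fits the budget.

    `encode(q)` must return the compressed bytes at quality q, with size
    (roughly) non-decreasing in q -- true of typical JPEG encoders.
    """
    best = None
    while lo <= hi:
        q = (lo + hi) // 2
        data = encode(q)
        if len(data) <= budget_bytes:
            best = (q, data)   # fits: try a higher quality
            lo = q + 1
        else:
            hi = q - 1         # too big: lower the quality
    return best

# Toy stand-in encoder whose output size grows linearly with quality.
fake_jpeg = lambda q: b"x" * (100 + 50 * q)
q, data = match_size(fake_jpeg, 2000)  # highest q with 100 + 50q <= 2000
```

With matched sizes in hand, the visual (or metric) comparison is at least apples-to-apples.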
However, it also has a kind of adaptive ML-ish approach so it might be technically similar.
> FLIF is based on MANIAC compression. MANIAC (Meta-Adaptive Near-zero Integer Arithmetic Coding) is an algorithm for entropy coding developed by Jon Sneyers and Pieter Wuille. It is a variant of CABAC (context-adaptive binary arithmetic coding), where instead of using a multi-dimensional array of quantized local image information, the contexts are nodes of decision trees which are dynamically learned at encode time. This means a much more image-specific context model can be used, resulting in better compression.
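To make the quoted idea concrete, here is a toy illustration (not FLIF's actual code; all names are invented) of tree-selected adaptive contexts: inner nodes test a local image property, and each leaf keeps its own adaptive bit-probability model that is updated as symbols are coded.

```python
class Context:
    """Adaptive binary probability model (simple counts with add-one smoothing)."""
    def __init__(self):
        self.ones, self.total = 1, 2
    def p_one(self):
        return self.ones / self.total
    def update(self, bit):
        self.ones += bit
        self.total += 1

class Node:
    """Leaf: holds a Context. Inner node: routes on one local property."""
    def __init__(self, prop=None, threshold=None, left=None, right=None):
        self.prop, self.threshold = prop, threshold
        self.left, self.right = left, right
        self.ctx = Context() if prop is None else None
    def select(self, props):
        if self.ctx is not None:
            return self.ctx
        child = self.left if props[self.prop] <= self.threshold else self.right
        return child.select(props)

# Tiny fixed tree: split on a local gradient estimate; each leaf adapts separately.
tree = Node(prop="grad", threshold=10, left=Node(), right=Node())
ctx = tree.select({"grad": 3})  # low-gradient region -> left leaf
ctx.update(1)                   # feed the context the bit just coded
```

In real MANIAC the tree itself is grown at encode time and the leaf probabilities drive an arithmetic coder; the sketch only shows the context-selection part.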
They obviously have some good image priors here, if I were them I would consider applying this tech to other image-related things, like image manipulation, or image search. Although competition is heating up quickly in these fields...
Literally the first sentence of the linked page:
"Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm)."
EDIT: It's embarrassing that not one person in this thread seems to have actually read any of the paper. Even with obvious evidence that this is fiction, people still don't want to believe it.
https://scholar.google.ca/citations?user=reEAEWsAAAAJ&hl=en
https://scholar.google.ca/citations?user=OXFjRnEAAAAJ&hl=en
Company entry on Linkedin: https://www.linkedin.com/company-beta/12953035/
Basically, for each domain you want to do well in, you need a knowledge set trained on that domain's data. Then you need a discriminator on the compression side to classify an image, or subregions of an image, into those categories.
If you can make the knowledge sets downloadable on demand and then cached, you can be incredibly efficient over the long term while keeping the initial download very small. I think knowledge sets that evolve over time also ensure the codec stays flexible enough to handle currently unforeseen situations. Nobody wants a future where a DL-based image/video compression tool only knows a few pre-determined sets and is mediocre on everything else.
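That routing-plus-caching scheme can be sketched in a few lines; everything here (the discriminator, the model format, the function names) is hypothetical:

```python
import functools

def classify(image):
    """Hypothetical discriminator: map an image (or subregion) to a domain label."""
    return "faces" if image.get("has_face") else "generic"

@functools.lru_cache(maxsize=8)
def load_knowledge_set(domain):
    """Stand-in for downloading a domain-specific knowledge set on demand;
    lru_cache plays the role of the local cache."""
    return {"domain": domain}  # imagine fetched model weights

def compress(image):
    model = load_knowledge_set(classify(image))
    return model["domain"], image  # placeholder for model-conditioned coding
```

The cache means the second image from the same domain pays no download cost, which is the "efficient over the long term" part of the argument.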
[1] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b...
[2] https://static1.squarespace.com/static/57c8be4459cc68c3e3d7b...
This is great news!
I'd actually like to see the plot, though. (Both for encoding and decoding.) It stands to reason that a neural network can optimize image compression, as it can encode high-level information like "this is a face". But encoding / decoding speed is the sticking point, so I feel successes there should be emphasized.
The necessity of having a GPU doesn't seem problematic nowadays; everything has one. Testing it with a mobile-grade GPU would be interesting.
As long as the decompressor needs just the image file and no other data, it's fair game.
Is this image compression tool good at images it was not trained on?
How bad does it get in those situations?
Is this training data fixed into the codec forever? Will there be slightly different image codecs with different training data? That would be sort of hellish.
One can improve on e.g. H.265 somewhat easily if a software-only solution is an option. But if you need a cheap hardware-only solution, then an ML-required approach seems a bit too expensive.
Directly from the paper's PDF:
"Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."
This whole thread is like being the only sane person in an asylum.
Also, it isn't a 'real implementation' since there isn't any source code to reproduce the results.
On the actual web page the first line of its abstract:
"Even though over 70% of internet traffic today is digital media, the way images and video are represented and transmitted has not evolved much in the past 20 years (apart from Pied Piper's Middle-Out algorithm)."
From the PDF:
https://arxiv.org/pdf/1705.05823.pdf
The end of Section 2.2 ("ML-based lossy image compression"), right above Section 2.3 ("Generative Adversarial Networks"):
"Theis et al. (2016) and Ballé et al. (2016) quantize rather than binarize, and propose strategies to approximate the entropy of the quantized representation: this provides them with a proxy to penalize it. Finally, Pied Piper has recently claimed to employ ML techniques in its Middle-Out algorithm (Judge et al., 2016), although their nature is shrouded in mystery."
Edit: as for the other images, it would indeed be nice to see those.
[1] http://r0k.us/graphics/kodak/ [2] http://r0k.us/graphics/kodak/kodim15.html