"Because my photos were used heavily in the dataset..."
Jury: So guilty
I'm working on the Docker instance now, that should help anyone with interest/experience in the field compare results easily.
I recall watching a movie as a child that had been converted from black-and-white to color. There were many distracting artifacts. Most notably, the actors' hairlines would shift as they rotated their heads. It made the film unwatchable.
I'd be curious to see an ensemble-based super-resolution, where each model can output the confidence of a pixel region, then have another network learn to blend the result.
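A minimal sketch of that idea, assuming each model emits a per-pixel confidence map alongside its output (the function and array shapes here are illustrative, not part of any existing project):

```python
import numpy as np

def blend_outputs(outputs, confidences):
    """Blend several models' super-resolved outputs per pixel.

    outputs: list of (H, W) float arrays, one per model.
    confidences: list of (H, W) float arrays (higher = more confident).
    Returns a softmax-weighted blend, so the most confident model
    dominates each pixel region.
    """
    outputs = np.stack(outputs)            # (N, H, W)
    conf = np.stack(confidences)           # (N, H, W)
    conf = conf - conf.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(conf)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * outputs).sum(axis=0)

# Two toy "model outputs" that disagree everywhere; model a is
# confident, model b is not, so the blend should follow a.
a = np.zeros((4, 4))
b = np.ones((4, 4))
blended = blend_outputs([a, b],
                        [np.full((4, 4), 10.0), np.full((4, 4), -10.0)])
```

In the full version, the "another network" would learn the blending weights instead of a fixed softmax, but the plumbing is the same.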
Conversely, these results are achieved using a single top-of-range GPU. Everything fits in memory at a batch size of 15 at 192x192. By distributing the training somehow, you could make the network 10x bigger, train for a whole week, and likely get much better general-purpose results.
I have a side business doing film restoration and am not aware of any solution like that. Probably the best upscaling solution out there is from Teranex, acquired by Blackmagic Design. Evertz probably also has something in their offering.
In theory you could use this to increase temporal resolution as well. Turn 24 fps movies into 60 fps, and upscale regular HD to 4k.
- http://www.wisdom.weizmann.ac.il/~vision/SingleImageSR.html
- http://chiranjivi.tripod.com/EDITut.html
- http://www.tecnick.com/pagefiles/appunti/iNEDI_tesi_Nicola_A...
- http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/conte...
- http://bengal.missouri.edu/~kes25c/
http://bengal.missouri.edu/~kes25c/nnedi3.zip
http://forum.doom9.org/showthread.php?t=147695
- http://arxiv.org/pdf/1501.00092v2.pdf
https://github.com/nagadomi/waifu2x
http://waifu2x-avisynth.sunnyone.org/
https://github.com/sunnyone/Waifu2xAvisynth
- http://i-programmer.info/news/192-photography-a-imaging/1010...
I've seen some videos from Cold War satellite photo analysts, and the way they can look at some tiny gray blobs and go "That's a T-64 tank, that's a T-62 tank, that's an SA2 launcher" etc.
However, they can make things more easily interpretable by humans. A rough analogy is turning up the contrast: given a very dark image of a license plate where the black parts are totally black (#000000) and the white parts are just very dark (#010101), the characters can definitely be recognized, even though a human viewing it under normal conditions would just see it as totally black. Processing would help.
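The contrast analogy is easy to demonstrate: a simple min-max stretch maps the 0-vs-1 pixel values onto the full 0-255 range, making the "invisible" text trivially readable. A toy sketch (the tiny array standing in for the plate image is made up):

```python
import numpy as np

def stretch(img):
    """Linearly map img's value range onto the full 0..255 range."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return ((img.astype(float) - lo) * 255.0 / (hi - lo)).astype(np.uint8)

# Black (#000000) text on nearly-black (#010101) background: values 0 and 1.
plate = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=np.uint8)
stretched = stretch(plate)  # values 0 and 1 become 0 and 255
```

No information is added here; the information was already in the pixels, just below the threshold of human perception. That is the key difference from "enhance" fantasies where the information isn't there at all.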
I'm not sure this is correct. In a sense, it does contain information that wasn't in the original inputs: information added by the weights of the neural network, which were themselves obtained by extracting information from an enormous number of previous samples. Of course, even the largest and best-trained neural network won't be able to tell the license number given 2 pixels of information, but I am curious about the theoretical limits of what can be achieved in extreme cases, with very little information as input and a neural network with almost limitless resources.
Nice test images to include would have been an original image, a downsampled image, and the reconstructed image. If the author is reading this, could they add these to the README?
Sci-fi on TV is making it to the real world :)
A good example of this is "Day of the Tentacle Remastered" (http://dott.doublefine.com/). The new game looks extremely similar to the old one, but it has been redrawn.
As someone suggested, you should be able to take an old TV show, train the neural network with HD pictures of the cast, and let it redraw the show in its own "artistic" interpretation of the images.
This example allows easy comparison between common techniques. Choose image 7 to see an example with a person: https://dl.dropboxusercontent.com/u/2810224/Homepage/publica...
Most other approaches don't even try to inject high-frequency detail into the high-resolution images because the PSNR/SSIM benchmarks drop. Until those metrics/benchmarks are dropped, there'll be little more progress in super-resolution.
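The problem with PSNR is easy to show numerically. A sketch (the checkerboard "ground truth" is a made-up stand-in for high-frequency texture): a flat, blurry guess at the average value scores a higher PSNR than an output that contains the correct crisp detail but is misaligned by one pixel.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# High-frequency "ground truth": a checkerboard of 0s and 1s.
truth = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)

flat = np.full((8, 8), 0.5)          # blurry: just the average, no detail
shifted = np.roll(truth, 1, axis=1)  # crisp detail, but one pixel off

psnr_flat = psnr(truth, flat)        # MSE 0.25 -> ~6.02 dB
psnr_shifted = psnr(truth, shifted)  # MSE 1.00 -> 0 dB
```

The shifted image is perceptually far closer to the original, yet PSNR punishes it harder than outright blur, which is why optimizing for these benchmarks pushes models toward smooth, detail-free outputs.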
https://github.com/alexjc/neural-enhance/blob/master/docs/St...
It dies for me with a memory allocation error after eating 28GB of RAM…
Question for the more experienced deep learning folk: if I wanted to use this to upscale textures for a game, would I have to train it on the same type of texture? In other words additional wood textures when upscaling wood, brick when upscaling brick textures, and so on?
If you scroll down on GitHub to see the face examples, those are achieved by a domain-specific network. I suspect you'll similarly get extremely high-quality results if you have good input images.
Would be nice if the author did a comparison.
On GitHub, below each GIF there's a demo comparison, but on the site you can also submit your own to try it out (click on title or restart button). Takes about 60s currently; running on CPU as GPUs are busy training ;-)
To what extent could the need for this trade-off be overcome with a larger network?