"Because my photos were used heavily in the dataset..."
Jury: So guilty
I'm working on the Docker instance now, that should help anyone with interest/experience in the field compare results easily.
I recall watching a movie as a child that had been converted from black-and-white to color. There were many distracting artifacts. Most notably, the actors' hairlines would shift as they rotated their heads. It made the film unwatchable.
I'd be curious to see an ensemble-based super-resolution, where each model can output the confidence of a pixel region, then have another network learn to blend the result.
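A minimal sketch of that idea, assuming each model emits a per-pixel confidence map alongside its output (the function and array shapes here are illustrative, not part of any existing project):

```python
import numpy as np

def blend_outputs(outputs, confidences):
    """Blend several models' super-resolved outputs per pixel.

    outputs: list of (H, W) float arrays, one per model.
    confidences: list of (H, W) float arrays (higher = more confident).
    Returns a softmax-weighted blend, so the most confident model
    dominates each pixel region.
    """
    outputs = np.stack(outputs)            # (N, H, W)
    conf = np.stack(confidences)           # (N, H, W)
    conf = conf - conf.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(conf)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * outputs).sum(axis=0)

# Two toy "model outputs" that disagree everywhere; model a is
# confident, model b is not, so the blend should follow a.
a = np.zeros((4, 4))
b = np.ones((4, 4))
blended = blend_outputs([a, b],
                        [np.full((4, 4), 10.0), np.full((4, 4), -10.0)])
```

In the full version, the "another network" would learn the blending weights instead of a fixed softmax, but the plumbing is the same.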
Conversely, these results are achieved using a single top-of-range GPU. Everything fits in memory at a batch size of 15 at 192x192. By distributing the training somehow, you could make the network 10x bigger, train for a whole week, and likely get much better general-purpose results.
I have a side business doing film restoration and am not aware of any solution like that. Probably the best upscaling solution out there is from Teranex, acquired by Blackmagic Design. Evertz probably also has something in their offering.
In theory you could use this to increase temporal resolution as well. Turn 24 fps movies into 60 fps, and upscale regular HD to 4k.
- http://www.wisdom.weizmann.ac.il/~vision/SingleImageSR.html
- http://chiranjivi.tripod.com/EDITut.html
- http://www.tecnick.com/pagefiles/appunti/iNEDI_tesi_Nicola_A...
- http://www.eurasip.org/Proceedings/Eusipco/Eusipco2009/conte...
- http://bengal.missouri.edu/~kes25c/
http://bengal.missouri.edu/~kes25c/nnedi3.zip
http://forum.doom9.org/showthread.php?t=147695
- http://arxiv.org/pdf/1501.00092v2.pdf
https://github.com/nagadomi/waifu2x
http://waifu2x-avisynth.sunnyone.org/
https://github.com/sunnyone/Waifu2xAvisynth
- http://i-programmer.info/news/192-photography-a-imaging/1010...
I've seen some videos from Cold War satellite photo analysts, and the way they can look at some tiny gray blobs and go "That's a T-64 tank, that's a T-62 tank, that's an SA2 launcher" etc.
However, they can make things more easily interpretable by humans. A rough analogy is turning up the contrast: given a very dark image of a license plate where the black parts are totally black (#000000) and the white parts are just very dark (#010101), the characters can definitely be recognized, even though a human viewing it under normal conditions would just see it as totally black. Processing would help.
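The contrast analogy is easy to demonstrate: a simple min-max stretch maps the 0-vs-1 pixel values onto the full 0-255 range, making the "invisible" text trivially readable. A toy sketch (the tiny array standing in for the plate image is made up):

```python
import numpy as np

def stretch(img):
    """Linearly map img's value range onto the full 0..255 range."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:
        return np.zeros_like(img, dtype=np.uint8)
    return ((img.astype(float) - lo) * 255.0 / (hi - lo)).astype(np.uint8)

# Black (#000000) text on nearly-black (#010101) background: values 0 and 1.
plate = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=np.uint8)
stretched = stretch(plate)  # values 0 and 1 become 0 and 255
```

No information is added here; the information was already in the pixels, just below the threshold of human perception. That is the key difference from "enhance" fantasies where the information isn't there at all.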
I'm not sure this is correct. In a sense, it does contain information that wasn't in the original inputs: information added by the weights of the neural network, which were themselves obtained by extracting information from an enormous number of previous samples. Of course, even the largest and best-trained neural network won't be able to tell the license number given 2 pixels of information, but I am curious about the theoretical limits of what can be achieved in extreme cases, with very little information as input and a neural network with almost limitless resources.
Nice test images to include would have been an original image, a downsampled image, and the reconstructed image. If the author is reading this, could they add these to the README?
Sci-fi on TV is making it to the real world :)
A good example of this is "Day of the Tentacle Remastered" (http://dott.doublefine.com/). The new game looks extremely similar to the old one, but it has been redrawn.
As someone suggested, you should be able to take an old TV show, train the neural network with HD pictures of the cast, and let it redraw the show in its own "artistic" interpretation of the images.
This example allows easy comparison between common techniques. Choose image 7 to see an example with a person: https://dl.dropboxusercontent.com/u/2810224/Homepage/publica...
Most other approaches don't even try to inject high-frequency detail into the high-resolution images because the PSNR/SSIM benchmarks drop. Until those metrics/benchmarks are dropped, there'll be little more progress in super-resolution.
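The problem with PSNR is easy to show numerically. A sketch (the checkerboard "ground truth" is a made-up stand-in for high-frequency texture): a flat, blurry guess at the average value scores a higher PSNR than an output that contains the correct crisp detail but is misaligned by one pixel.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# High-frequency "ground truth": a checkerboard of 0s and 1s.
truth = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)

flat = np.full((8, 8), 0.5)          # blurry: just the average, no detail
shifted = np.roll(truth, 1, axis=1)  # crisp detail, but one pixel off

psnr_flat = psnr(truth, flat)        # MSE 0.25 -> ~6.02 dB
psnr_shifted = psnr(truth, shifted)  # MSE 1.00 -> 0 dB
```

The shifted image is perceptually far closer to the original, yet PSNR punishes it harder than outright blur, which is why optimizing for these benchmarks pushes models toward smooth, detail-free outputs.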
https://github.com/alexjc/neural-enhance/blob/master/docs/St...
It dies for me with a memory allocation error after eating 28GB of RAM…
Question for the more experienced deep learning folk: if I wanted to use this to upscale textures for a game, would I have to train it on the same type of texture? In other words additional wood textures when upscaling wood, brick when upscaling brick textures, and so on?
If you scroll down on GitHub to see the face examples, those are achieved by a domain-specific network. I suspect you'll similarly get extremely high-quality results if you have good input images.
Would be nice if the author did a comparison.
On GitHub, below each GIF there's a demo comparison, but on the site you can also submit your own to try it out (click on title or restart button). Takes about 60s currently; running on CPU as GPUs are busy training ;-)
To what extent could the need for this trade-off be overcome with a larger network?