This happens at the expense of detail in low-contrast areas, producing a plastic-like appearance of human skin and hair, and making low-contrast text unintelligible, which is why it's generally not done by default.
I'm sure you know exactly how much of which filter to apply for similar results. Laymen like ourselves will need a lot more trial and error. Their contribution here is to provide a push-button, automated mechanism.
I would have probably also tried something simple and given up due to the noise. So this is definitely interesting.
What you are describing is usually called automatic tone mapping. This is basically noise reduction and possibly color normalization applied when brightening a dark image. Showing their black image as the starting point is silly, because JPEG will make a mess of the remaining information. What they should show is the raw image brightened by a straight multiplier, to show the noisy version you would get from trying to increase brightness in a trivial way.
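That trivial baseline is a one-liner; a minimal numpy sketch (not anyone's actual pipeline, and `gain=100.0` is an arbitrary illustrative value):

```python
import numpy as np

def naive_brighten(raw, gain=100.0):
    """Trivial brightening: multiply (black-level-subtracted) raw values
    in [0, 1] by a constant gain, clipping at the top. This amplifies
    sensor noise along with the signal, which is exactly the noisy
    baseline a fair comparison should show."""
    return np.clip(raw * gain, 0.0, 1.0)

# A pixel at 0.1% of full scale becomes 10%; anything above 1% clips.
out = naive_brighten(np.array([0.001, 0.02]), gain=100.0)
```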
Huh? At 1:40 in the video that's exactly what they do.
[0]: https://raw.githubusercontent.com/cchen156/Learning-to-See-i...
For those curious, our current approach differs in some very significant ways from the author's implementation, such as performing our denoising and enhancement on a raw Bayer -> raw Bayer basis, with a separate pipeline for tone mapping, white balance, and HDR enhancement. We also explored a fair number of different architectures for the CNN and concluded that a heavily mixed multi-resolution layering approach produces superior results.
As other commenters have pointed out, the most interesting part is really coming to terms with the fact that, as war1025 put it, "The message has an entropy limit, but the message isn't the whole dataset." It is incredible what can be accomplished with even extraordinarily noisy information, as long as one has an extremely "knowledge packed" prior.
If anyone has any questions about our research in this space, please feel free to ask.
Often flash is not the look people are going for, but they would be okay with the flash firing in order to improve the non-flash photo.
As a proof of concept that this task can be tackled directly, a quick search brought up "DeepFlash: Turning a Flash Selfie into a Studio Portrait"[0]
Beyond denoising, we are already running experiments with very promising results on haze, lens flare, and reflection removal; super resolution; region adaptive white balancing; single exposure HDR; and a fair bit more.
One of the other cooler things we are doing is putting together a unified SDK where our algorithms and neural nets will be able to run pretty much anywhere, on any hardware, using transparent backend switching (e.g. CPU, GPU, TPU, NPU, DSP, other accelerator ASICs, etc.).
What would happen if you
- begin capturing video (unsure of fps) on a phone-quality sensor in a near-dark environment
- pulse the phone's flash LED(s) like you're taking a photo
- do super-resolution on the resulting video to extract a photo...
- ...while factoring in the decay in brightness/saturation in consecutive video frames produced by the flash pulse?
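One way to picture that last step: normalize each frame by its own estimated brightness before merging, so frames captured during the flash decay still line up. A hypothetical numpy sketch (assumes frames are already spatially aligned; real super-resolution would also need sub-pixel registration, which this deliberately skips):

```python
import numpy as np

def decay_compensated_merge(frames):
    """Normalize each video frame by its mean luminance (a crude stand-in
    for the flash-decay curve), then average the normalized frames. The
    averaging is a simple multi-frame denoise, not true super-resolution."""
    frames = np.asarray(frames, dtype=np.float64)
    gains = frames.mean(axis=(1, 2), keepdims=True)   # per-frame brightness
    normalized = frames / np.maximum(gains, 1e-6)     # undo the decay
    return normalized.mean(axis=0)                    # merge

# Three frames of the same scene at full, half, and quarter flash output:
base = np.arange(12, dtype=np.float64).reshape(3, 4) + 1.0
merged = decay_compensated_merge([base, base * 0.5, base * 0.25])
```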
I vaguely recall reading somewhere that oversaturated photos have more signal in them and are easier to fix than undersaturated. Hmm.
IIRC super-resolution worked with 30fps source video for better quality; I wonder if 60fps or 120fps source video would produce better brightness decay data, or whether super-resolution could actually help extract more signal out of the decay sequence too.
On the other hand, I'm not sure if super-resolution fundamentally requires largely consistent brightness in order to work as well as it does. :/
Perhaps individual networks could be trained/tuned to specific slices/windows of the brightness gradient. I also wonder if it would be useful to factor the super-resolution process into each of the brightness-specific stages or just to do it at the end.
We also have some more raw data[2] where the original Bayer data is available as .npy files with 40 dB of analog gain applied; however, I think the calibration targets show off what we are able to accomplish more dramatically. Finally, we have a short YouTube video[3] that shows how it works when applied to video.
[0] https://www.dropbox.com/s/0bm4dpxhn35vkhe/ALLIS_Investor_Int...
[1] https://www.dropbox.com/sh/k861saentyq1cs6/AADmO7X_L49nUkEI_...
[2] https://www.dropbox.com/sh/fv8omdf4fbx59m9/AABDnf6sdvv7rtIml...
[1] - https://github.com/cchen156/Learning-to-See-in-the-Dark/blob...
If you want to know what the next hot thing in software engineering will be, just pay attention to whatever Jeff Dean is doing.
IMHO, credit for the rapid spread of deep learning should always go to Alex Krizhevsky. He showed us it was possible. Even without TensorFlow and PyTorch, we would be fine with Caffe, Torch, MXNet, or Julia.
"Generally, you just need to subtract the right black level and pack the data in the same way as the Sony/Fuji data. If using rawpy, you need to read the black level instead of using 512 as in the provided code. The data range may also differ if it is not 14 bits. You need to normalize it to [0,1] for the network input."
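That recipe is short in numpy. A sketch (not the repo's code; the attribute names `raw_image_visible`, `black_level_per_channel`, and `white_level` are from rawpy's documented API, and 512 / 16383 are the 14-bit Sony values the paper's code hard-codes):

```python
import numpy as np

def normalize_raw(bayer, black_level, white_level):
    """Subtract the sensor black level and scale raw Bayer data to [0, 1].
    With rawpy: bayer = raw.raw_image_visible, black_level from
    raw.black_level_per_channel, white_level = raw.white_level.
    For 14-bit Sony data: black_level=512, white_level=2**14 - 1 = 16383."""
    data = bayer.astype(np.float32) - black_level
    data = np.maximum(data, 0)                   # clip negative noise
    return data / (white_level - black_level)    # -> [0, 1]

# Synthetic 14-bit values standing in for real sensor data:
fake_bayer = np.array([[512, 16383], [8447, 300]], dtype=np.uint16)
out = normalize_raw(fake_bayer, black_level=512, white_level=16383)
```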
The Sony and Fuji training code looks mostly the same; they haven't bothered to pull the common code out and reuse it.
But, many DNN concepts (and ML concepts themselves) can be described with a few lines of pseudocode. CNNs, RNNs, etc. can all be described in a few lines.
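As a concrete illustration of the "few lines" claim, here is one CNN layer (valid cross-correlation plus bias and ReLU) in plain numpy; a teaching sketch, not how any framework actually implements it:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation (what DL libraries call convolution)."""
    h, w = k.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def cnn_layer(x, k, b):
    """One convolutional layer: convolve, add bias, apply ReLU."""
    return np.maximum(conv2d(x, k) + b, 0)

y = cnn_layer(np.ones((3, 3)), np.ones((2, 2)), 0.0)
```

A full network is just these layers stacked, with the kernels learned by gradient descent; the hard part is the training, not the description.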
It's really quite amazing; most of the work goes into first creating the network from theory, then training and tuning it until you get good results.
Machine learning is really machine-enhanced educated-guesswork, which has its place but also has its limits.
Being able to read the title on the books in the example photo is great; you could rely on the title for evidentiary purposes, the smaller text probably not so much. So for a security camera it would do poorly at identifying the color of a car, but might well be sufficient to read the license plate.
You show a CNN-enhanced low-light image of a car in a courtroom and there it is, literally 'clear as day' - the jury will find it pretty compelling. But maybe the data really wasn't there in the original image, and the CNN just filled in some blanks based on previous images of license plates, letters, and just random noise it had seen in the past.
The worry is when these kinds of algorithms get built in to basic image capture processes, so you never even see the raw data, only data that has already been filtered through the inbuilt prejudices of the CNN enhancement suite.
The camera never lies, but now it doesn't have to, because it can convince itself it saw something that wasn't really there...
There is an entropy limit to the message, but the message isn't actually the only data.
One thing humans are great at is integrating existing knowledge into a messy situation and intuiting more than is available just from the raw message.
I.e. The message has an entropy limit, but the message isn't the whole dataset.
It's not trying to make things readable; it's trying to make things look like there was more light in the room when they were shot. In rooms with high lighting, some objects have glare. That's "correct"—it's what would appear in the training data.
Wait, did it? Isn't the middle photo being shown for comparison only, rather than as an input?
People bring this up all the time as hot takes in these areas. It's conditional inference. It's no more disingenuous than linear regression.
This seems to replicate the post-processing we do in our brain (which is also a giant neural network). I wonder if the process is similar?
“The human eye can detect a luminance range of 10¹⁴, or one hundred trillion (100,000,000,000,000) (about 46.5 f-stops), from 10⁻⁶ cd/m², or one millionth (0.000001) of a candela per square meter, to 10⁸ cd/m², or one hundred million (100,000,000) candelas per square meter. This range does not include looking at the midday sun (10⁹ cd/m²)[21] or lightning discharge.”
“The pretrained model probably not work for data from another camera sensor. We do not have support for other camera data. It also does not work for images after camera ISP, i.e., the JPG or PNG data.”
It would be cool to see how they come up with better models that would allow them to overcome the above limitations.
https://github.com/cchen156/Learning-to-See-in-the-Dark/blob...
If you take the dark image (a) from that and balance its color, the information that is present in it simply cannot contain the text from the book covers and so on. In fact, it's full of JPEG artifacts despite the image being a PNG. It would be useful if they presented a histogram equalized image of (a).
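For anyone who wants to check that claim themselves, histogram equalization is easy to run on image (a). A self-contained numpy sketch of the standard CDF-based method (for an 8-bit grayscale array; OpenCV's `cv2.equalizeHist` does the same thing):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image: build the intensity
    histogram, normalize its cumulative sum to [0, 1], and use that CDF
    as a lookup table stretching the used range across 0..255."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]

# A very dark image using only levels 0..3 gets spread over the full range:
dark = np.array([[0, 1], [2, 3]], dtype=np.uint8)
eq = equalize_histogram(dark)
```

On a near-black photo this makes the surviving detail (and the noise, and any compression artifacts) immediately visible.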
http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...?
So if you're planning to do crime, make choices where the evidence relies on spectra rather than geometry. Steal Rothkos rather than Mondrians; baggy coveralls are in, form-fitting ninja wear is out.
- Did they create a special network topology for this problem?
- Does the network need to see the entire image, or only an NxN subblock at a time?
- How did they obtain the training data? Is it possible to take daylight images and automatically turn them into nighttime images somehow?
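On that last question: a plausible way to fake nighttime data would be to scale the exposure down and add a simple Poisson-Gaussian sensor noise model. A hypothetical sketch (the paper itself instead captures real short/long-exposure pairs; `exposure` and `read_noise` here are made-up illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_low_light(img, exposure=0.01, read_noise=0.02):
    """Simulate a dark capture from a daylight image in [0, 1]:
    scale the signal down (fewer photons), then add photon shot noise
    (Poisson) and read noise (Gaussian)."""
    dark = img * exposure
    shot = rng.poisson(dark * 1000) / 1000      # shot noise at ~1000 e- full well
    noisy = shot + rng.normal(0.0, read_noise, img.shape)
    return np.clip(noisy, 0.0, 1.0)

daylight = rng.random((8, 8))
night = synthesize_low_light(daylight)
```

Whether a network trained on such synthetic noise transfers to a real sensor is exactly the kind of thing the paper's real captured pairs sidestep.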
I didn't even think this was possible. Have people ever done this manually before? Like without AI?
X27 is also using some kind of neural algorithm to denoise and get the maximum out of the CIS (CMOS image sensor).
>Still images: ISO 100-102400 (expandable to ISO 50-409600),
[1]:https://www.sony.co.uk/electronics/interchangeable-lens-came...
So in this instance they're processing lossily on top of an image already processed lossily in-camera.
The other option is spelling it out.
Most people will read CNN as the news channel. Even those familiar with neural networks.
See also: https://en.m.wikipedia.org/wiki/Thiotimoline
The major peculiarity of the chemical is its "endochronicity": it starts dissolving before it makes contact with water.
But still, could we make an effort not to devolve into what has happened on Reddit, i.e. comment sections which mainly consist of puns and other low effort jokes?
It would make sense to add TensorFlow to make it more specific.
And it is even more "HN" to comment on details of the title or the article because you don't really know what to say about the article.
Look at my comment.
https://www.thedrive.com/the-war-zone/25803/this-is-what-col...