2) Performance gains often depend far more on the choice of features than on the learning algorithm itself, and that seems to be the case here.
However: The original title, "Neural Networks officially best at object recognition", is much more appropriate than the current title, because this is by far the hardest vision contest. It is nearly two orders of magnitude larger and harder than other contests, which is why the winner of this contest is the best at object recognition. The original title is much more accurate and should be restored.
Second, the gap between the first and the second entry is so obviously huge (25% error vs 15% error), that it cannot be bridged with simple "feature engineering". Neural networks win precisely because they look at the data, and choose the best possible features. The best human feature engineers could not come close to a relentless data-hungry algorithm.
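To make the "networks choose their own features" point concrete, here is a toy sketch (not the contest system, just an illustrative model I'm assuming for the example): a single convolutional filter with max-pooling, trained by plain SGD, discovers an edge detector on its own rather than having one designed by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 12, 3  # signal length, filter length

def sample():
    """Class 1: upward step edge at a random position; class 0: downward step."""
    y = int(rng.integers(0, 2))
    pos = rng.integers(1, N - 1)
    x = np.where(np.arange(N) < pos, -1.0, 1.0)
    if y == 0:
        x = -x
    return x + 0.1 * rng.standard_normal(N), y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 0.1 * rng.standard_normal(K)  # the filter: learned from data, not hand-designed
b = 0.0
lr = 0.2
for _ in range(3000):  # SGD on the logistic loss
    x, y = sample()
    z = np.convolve(x, w[::-1], mode="valid")  # cross-correlation with the filter
    j = int(np.argmax(z))                      # max-pooling over positions
    p = sigmoid(z[j] + b)
    w -= lr * (p - y) * x[j:j + K]             # gradient flows to the argmax window
    b -= lr * (p - y)

correct = 0
for _ in range(500):
    x, y = sample()
    z = np.convolve(x, w[::-1], mode="valid")
    correct += int((sigmoid(z.max() + b) > 0.5) == (y == 1))
print("learned filter:", np.round(w, 2), "accuracy:", correct / 500)
```

The learned filter ends up approximating a derivative (edge) kernel, which is exactly the feature a human engineer would have designed, except nobody had to design it.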
Third, there was mention of the no-free-lunch theorem and of how one cannot tell which methods are better. That theorem says that learning is impossible on data that has no structure, which is true but irrelevant. What's relevant is that on the "specific" problem of object recognition, as represented by this million-image dataset, neural networks are the best method.
Finally, if somebody makes SVMs deep, they will become more like neural networks and do better. Which is the point.
This is the beginning of the neural networks revolution in computer vision.
However, my point was that most of the algorithms listed at that link (ANN, SVM, etc.) have similar expressive power (VC dimension) and had been shown to perform similarly on object recognition.
People normally take advantage of their specific properties rather than paying much attention to how well the algorithm would perform (since both SVMs and ANNs are expected to perform reasonably well). I still maintain my opinion that any difference in classification performance is more likely related to how the team managed the data than to the chosen algorithm.
Deep convolutional learning is the difference here, and it does seem to be an interesting architecture, one that the current state of the art supports only for ANNs. But that doesn't mean somebody won't come up with a strategy for deep learning with SVMs or another classification technique in the future.
When you average a learning algorithm's performance over a whole bunch of domains that _NATURE WILL NEVER GENERATE_, all algorithms are equally bad.
Paying attention to the theorem is mostly defeatist and counter-productive.
Imagine some ads serving company improves their learning algorithms 10% and is making 100s of millions more dollars. Are you going to say, well, there are billions of other possible universes in which they'd be losing money, they just got lucky that we don't live in those universes?
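The averaging argument behind no-free-lunch can be made concrete with a toy enumeration (a sketch, not tied to any real dataset): over *all* possible labelings of some unseen points, any fixed predictor averages exactly chance accuracy.

```python
import itertools

# No-free-lunch in miniature: average any fixed predictor over every possible
# binary labeling of n unseen points and you get exactly chance accuracy.
n = 4
predict = lambda i: i % 2  # an arbitrary fixed rule; any rule gives the same average

accs = [
    sum(predict(i) == y for i, y in enumerate(labels)) / n
    for labels in itertools.product([0, 1], repeat=n)
]
print(sum(accs) / len(accs))  # 0.5
```

The point of the comment stands: nature does not draw labelings uniformly from that set, so the uniform average says nothing about real problems.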
I would be more comfortable with a title like "Deep Convolutional Learning Outperforms Traditional Techniques in Object Recognition".
http://www.youtube.com/watch?v=DleXA5ADG78&feature=plcp
And an older talk that covers some of what a deep convolutional net is:
So far I've watched the first lecture and it seems like it'll be exactly the course I've been wanting: starting with the basics of machine learning but quickly diving into the state of the art for neural nets.
Check out: "Unbiased Look at Dataset Bias", A. Torralba, A. Efros, CVPR 2011.
Task 1:
1st 0.15315 (convolutional neural net)
2nd 0.26172
3rd 0.26979
4th 0.27058
5th 0.29576
[...]
Differences: 0.10857
0.00807
0.00079
0.02518
As you can see, the first is way ahead of the rest. The difference between the 1st and 2nd is ~11%, between the second and third ~1%.

Task 2:
1st 0.335463 (convolutional neural net)
2nd 0.500342
3rd 0.536474
Same story here.

But the most exciting thing is that the results were obtained with a relatively general-purpose learning algorithm. No extraction of SIFT features, no "Hough circle transform to find eyes and noses".
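As a quick sanity check, the gaps quoted above follow directly from the posted error rates (numbers copied from the leaderboard figures in this comment):

```python
# Task 1 error rates as listed above; gap between consecutive entries.
task1 = [0.15315, 0.26172, 0.26979, 0.27058, 0.29576]
gaps = [round(b - a, 5) for a, b in zip(task1, task1[1:])]
print(gaps)  # [0.10857, 0.00807, 0.00079, 0.02518]

# Task 2: the winner's margin over 2nd place dwarfs the 2nd-to-3rd gap as well.
task2 = [0.335463, 0.500342, 0.536474]
print([round(b - a, 6) for a, b in zip(task2, task2[1:])])  # [0.164879, 0.036132]
```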
The points of the paper you cite are important concerns, but this result is still very exciting.
This deserves even more emphasis. All of the other teams were writing tons of domain specific code to implement fancy feature detectors that are the results of years of in-depth research and the subject of many PhDs. The machine learning only comes into play after the manually-coded feature detectors have preprocessed the data.
Meanwhile, the SuperVision team fed raw RGB pixel data directly into their machine learning system and got a much better result.
Not to take away from the accomplishment of the SuperVision team, but the claim in the title seems somewhat sensationalist. Is this competition like the world cup of object recognition or something?
"There is now clearly an objective answer to which inductive algorithm to use"