Obviously, the question then becomes: what happens when you have visual situations that violate or come close to violating the assumptions made?
I'm not familiar enough with the specifics of RCNs to be able to answer this; maybe someone else can. Very interesting paper and approach regardless.
I haven't read it closely, but even skimming I could see that there are no formulas in it at all. Which means that, at best, it tells you "we did this thing, which is kind of like X and kind of like Y with Z changes". Essentially, there's no way to reproduce it or understand it on its own. The first reference then had a link behind a paywall...
So despite lots of apparent explanation, what they're actually doing is essentially unspecified (at least to the interested layman). At best, an expert in the field of "compositional models" could say what is happening.
Also, the paper is published under the heading of an AI firm in Fremont, CA, rather than a university group, with the many authors listed only by initial and last name...
PDF for the curious:
http://science.sciencemag.org/content/sci/early/2017/10/26/s...
Edit: tracked down a related paper that apparently has some "real" math. Whether it is even what the OP is doing remains to be seen.
https://staff.fnwi.uva.nl/t.e.j.mensink/zsl2016/zslpubs/lake...
Reference code: https://github.com/vicariousinc/science_rcn
It's certainly true that absolute performance on MNIST isn't the most interesting thing in the world.
But when introducing a new tool or technique, being able to show competitive performance on MNIST is a good way to show that it isn't entirely useless.
I'd note that the recent Sabour, Frosst, and Hinton paper[1] (where they finally got Hinton's capsules to work) spends most of its pages analyzing performance on MNIST, with only a short section on other datasets.
I assume I don't need to point out that Geoff Hinton does know a little about deep learning, and if he thinks submitting a NIPS paper on MNIST is acceptable in 2017 then I'm not going to argue too hard against it.
So no, experiments on MNIST in 2017 shouldn't be dismissed out of hand.
Does your network solve/recognise those?
The title of the paper is: A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs
The title of the article is: Common Sense, Cortex, and CAPTCHA
Neither comes anywhere near the sensationalist title on HN: "RCN is much more data efficient than traditional Deep Neural Networks"
Learning from few examples and generalizing to dramatically different situations are capabilities of human visual intelligence that are yet to be matched by leading machine learning models. By drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which message-passing based inference handles recognition, segmentation and reasoning in a unified way. The model demonstrates excellent generalization and occlusion-reasoning capabilities, and outperforms deep neural networks on a challenging scene text recognition benchmark while being 300-fold more data efficient. In addition, the model fundamentally breaks the defense of modern text-based CAPTCHAs by generatively segmenting characters without CAPTCHA-specific heuristics. Our model emphasizes aspects like data efficiency and compositionality that may be important in the path toward general artificial intelligence.
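For anyone unfamiliar with the "message-passing based inference" the abstract refers to: here is a minimal sketch of sum-product message passing on a toy chain-structured model. To be clear, this is the generic technique family, not Vicarious's actual RCN algorithm (which the paper does not specify in code-level detail); all names here are my own.

```python
import numpy as np

def chain_marginals(unaries, pairwise):
    """Exact sum-product message passing on a chain-structured factor graph.

    unaries:  list of n 1-D arrays, the unary potential of each variable.
    pairwise: list of n-1 2-D arrays; pairwise[i] couples variables i and i+1.
    Returns a list of normalized marginal distributions, one per variable.
    """
    n = len(unaries)
    # Forward pass: fwd[i] is the message arriving at variable i from the left.
    fwd = [np.ones(len(unaries[0]))]
    for i in range(1, n):
        fwd.append((fwd[-1] * unaries[i - 1]) @ pairwise[i - 1])
    # Backward pass: bwd[i] is the message arriving at variable i from the right.
    bwd = [np.ones(len(unaries[-1]))]
    for i in range(n - 2, -1, -1):
        bwd.insert(0, pairwise[i] @ (bwd[0] * unaries[i + 1]))
    # A variable's marginal is the product of its unary and incoming messages.
    marginals = []
    for i in range(n):
        p = fwd[i] * unaries[i] * bwd[i]
        marginals.append(p / p.sum())
    return marginals

# Toy example: three binary variables with smoothing pairwise potentials.
u = [np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
P = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.array([[0.8, 0.2], [0.2, 0.8]])]
marginals = chain_marginals(u, P)
```

On a chain this recovers exact marginals in linear time, versus exponential time for brute-force enumeration of the joint; that efficiency is the usual motivation for message passing in probabilistic generative models.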
It's unclear how to run it on the CAPTCHA examples referenced in the paper, even though they did make the datasets for those examples available.
Bummer; a big part of the paper's claim for the RCN model is its ability to segment sequences of characters (even of indeterminate length!). Sad that I can't easily verify this for myself.
body { text-rendering: optimizeLegibility; }
Ok
The header has the awful "ObjektivMk1-Thin" font mentioned elsewhere, but for me the body is a normal "Roboto","Helvetica Neue",Helvetica,Arial,sans-serif font-family.
To date, my experience with "deep PGM models" (for lack of a better term) is limited to some tinkering with (a) variational autoencoders using ELBO maximization as the training objective, and to a much lesser extent (b) "bi-directional" GANs using a Jensen-Shannon divergence between two joint distributions as the training loss.
Has anyone here with a similar background to mine had a chance to read this paper? Any thoughts?
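For concreteness on (a): this is a minimal numpy sketch of the ELBO being maximized in that setting, i.e. a Monte Carlo estimate of E_q[log p(x|z)] minus a closed-form KL term, for a diagonal-Gaussian encoder and a Bernoulli decoder. The decoder here is just an arbitrary logit function I made up for illustration; it is not from any paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def elbo_estimate(x, mu, logvar, decode, n_samples=64):
    """Monte Carlo ELBO: E_q[log p(x|z)] - KL(q(z|x) || p(z)).

    q(z|x) is a diagonal Gaussian; `decode` maps a latent z to Bernoulli
    logits over the observed binary vector x.
    """
    total = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        logits = decode(z)
        # Bernoulli log-likelihood: x*l - log(1 + e^l), numerically stable.
        total += np.sum(x * logits - np.logaddexp(0.0, logits))
    return total / n_samples - gaussian_kl(mu, logvar)

# Toy usage: a linear decoder from a 2-D latent to 3 Bernoulli logits.
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
decode = lambda z: W @ z
x = np.array([1.0, 0.0, 1.0])
elbo = elbo_estimate(x, mu=np.zeros(2), logvar=np.zeros(2), decode=decode)
```

When q equals the prior (mu = 0, logvar = 0) the KL term vanishes and the ELBO reduces to the expected reconstruction log-likelihood, which is what makes it a lower bound on log p(x).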
I am curious how RCN performs on real-life images like ImageNet, and how it performs against adversarial examples. If it can easily recognize adversarial examples, that would be very interesting...
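For readers who haven't seen adversarial examples before, here is a sketch of the fast gradient sign method (FGSM) against a toy logistic-regression "classifier": a tiny, targeted perturbation of the input flips the prediction. This is the standard attack from the adversarial-examples literature applied to a made-up toy model, not anything from the RCN paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, eps):
    """Fast gradient sign method on a binary logistic-regression model.

    Moves x by eps in the sign of the gradient of the cross-entropy loss
    with respect to the input, the direction that most increases the loss.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # d(cross-entropy)/dx for a linear logit
    return x + eps * np.sign(grad_x)

# Toy model: correctly classifies x as class 1 with modest confidence...
w, b = np.array([1.0, -1.0]), 0.0
x, y = np.array([0.5, 0.2]), 1.0
x_adv = fgsm_attack(x, y, w, b, eps=0.25)
# ...but the perturbed input x_adv is now classified as class 0.
```

The interesting open question in the parent comment is whether a generative, segmentation-based model like RCN resists this kind of perturbation better than discriminative CNNs do.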
66% with reCAPTCHA, and up to 90% when optimised, is much higher than what I can achieve with my actual brain. Maybe I should consider using a neural network to answer those; I quite frequently need 2-3 rounds to get through reCAPTCHA.
PS: thank god for Reader mode in Safari
(mentioned by boltzmannbrain in one of the other comments)
> Use of appearance during the forward pass: Surface appearance is now only used after the backward pass. This means that appearance information (including textures) is not being used during the forward pass to improve detection (whereas CNNs do). Propagating appearance bottom-up is a requisite for high performance on appearance-rich images.
I presume from this that in its current form RCN requires much more computation than a CNN per detection, but I could be wrong.
What I don't quite understand is why Deep Belief Nets don't seem to be getting any press these days. For example, see this paper from 2010: http://proceedings.mlr.press/v9/salakhutdinov10a.html.
https://gizmodo.com/a-new-ai-system-passed-a-visual-turing-t... / http://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.p...