If you're also interested in real-time OCR like this, I did a write-up [2] of the approach that worked well for my project. It only needed to recognize Scrabble fonts, but it could be extended to more fonts by using more training examples.
Can't get [1] https://www.dcs.shef.ac.uk/intranet/teaching/campus/projects...
More generally, I really like the idea of generating controlled synthetic images and then messing them up for regularization.
I've not tried it on anything else, but I remember thinking that it has a lot of potential uses. Also I only used it on gray-scale features, but I'm sure it could make use of full RGB too. I'll have to try it some time!
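For the curious, here's roughly what "generate clean synthetic images, then mess them up" could look like, as a minimal sketch using numpy and PIL (the function names and corruption parameters here are made up for illustration, not taken from the write-up above):

    import numpy as np
    from PIL import Image, ImageDraw

    def synth_char(ch, size=32):
        # Render one character onto a clean grayscale canvas.
        img = Image.new("L", (size, size), color=255)
        ImageDraw.Draw(img).text((size // 4, size // 4), ch, fill=0)
        return np.asarray(img, dtype=np.float32) / 255.0

    def mess_up(img, rng):
        # Corrupt the clean image: noise, a small shift, contrast jitter.
        noisy = img + rng.normal(0.0, 0.1, img.shape)   # sensor noise
        dx, dy = rng.integers(-2, 3, size=2)            # small translation
        noisy = np.roll(np.roll(noisy, dy, axis=0), dx, axis=1)
        noisy = noisy * rng.uniform(0.7, 1.0)           # contrast jitter
        return np.clip(noisy, 0.0, 1.0)

    rng = np.random.default_rng(0)
    clean = synth_char("A")
    training_set = [mess_up(clean, rng) for _ in range(100)]

The nice part is that the labels come for free: you know exactly which character each corrupted image contains, because you generated it.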
Neural networks and deep learning are truly awesome technologies.
A neural net is a graph, in which a subset of nodes are "inputs" (that's where the net gets information), some are outputs, and there are other nodes which are called "hidden neurons".
The nodes are interconnected in a pattern called the "topology" or sometimes the "architecture" of the net. For example, I-H-O is a typical feed-forward net, in which I is the input layer, H the hidden layer, and O the output layer. Every hidden neuron connects to the output of every input neuron, and every output neuron connects to the output of every hidden neuron. The connections carry numbers called "weights", and training adjusts the weights of all the neurons over lots of cases until the desired output is achieved. There are also algorithms and criteria to stop before the net "learns too much" and loses the ability to generalize (this is called overfitting). In particular, a net with one hidden layer (given enough hidden neurons and a suitable activation) and one output layer is a universal function approximator -- that is, it can approximate any continuous function of the form f(x1, x2, x3, ..., xn) = y.
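To make that concrete, here's a toy I-H-O net in numpy -- a sketch for illustration only (2 inputs, 3 hidden neurons, 1 output, plain backprop on XOR; a serious implementation would add a learning rate, biases, and a stopping criterion):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(42)
    W1 = rng.normal(0, 1, (2, 3))   # input -> hidden weights
    W2 = rng.normal(0, 1, (3, 1))   # hidden -> output weights

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

    for _ in range(10000):
        # Forward pass through the I-H-O topology.
        h = sigmoid(X @ W1)
        out = sigmoid(h @ W2)
        # Backward pass: nudge the weights toward the desired output.
        d_out = (y - out) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 += h.T @ d_out
        W1 += X.T @ d_h

    print(out.round(2))  # should be close to [[0], [1], [1], [0]]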
Deep learning means you're using a feed-forward net with lots of hidden layers (I think it's usually between 5 and 15 now), which often apply convolution operators (hence the "convolutional" in the name), and lots of neurons (on the order of thousands). All this was nearly impossible until GPGPUs came along, because of the time it took to train even a modest network (minutes to hours for a net with between 50 and 150 neurons in one hidden layer).
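And here's roughly what a single convolution operator does, in naive numpy (real convolutional layers learn the kernel values instead of hard-coding them, and add strides, padding, and many channels):

    import numpy as np

    def conv2d(image, kernel):
        # "Valid" 2D convolution: slide the kernel over the image and
        # take a dot product at each position, producing a feature map.
        kh, kw = kernel.shape
        ih, iw = image.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # A classic hand-made kernel: responds strongly to vertical edges.
    edge_kernel = np.array([[1, 0, -1],
                            [1, 0, -1],
                            [1, 0, -1]], dtype=float)
    image = np.random.default_rng(0).random((8, 8))
    features = conv2d(image, edge_kernel)  # a 6x6 feature map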
This is a very shortened explanation -- if you want to read more, I recommend this link [1], which gives some simple Python code to illustrate and implement the innards of a basic neural network so you can learn it from the inside. Once you get that, you should move on to more mature implementations, like Theano or Torch, to get the full potential of neural nets without worrying about implementation.
[1] http://iamtrask.github.io/2015/07/12/basic-python-network/
Oh humbug! The black magic comes from the vast resources Google drew on to obtain perfect training datasets. Each step in the process took years to tune, demonstrating that data is indeed for those who don't have enough priors.
> [...] the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers". A brain is a graph, in which a subset of neurons are "inputs", some are outputs, and others are "hidden". The nodes are interconnected in a pattern called the "topology" or sometimes the "architecture" of the net.
The deep question about deep learning is "Why is it so bloody effective?"
Convolutional networks are only one kind of deep learning. In particular, they are mainly applied to image processing.
Two remarks. First, these guys probably don't know very well why what they are doing works so well ;) It requires a lot of trial and error, a lot of patience, and a lot of compute power (the latter being the reason why we are only seeing breakthroughs now).
Second, training a neural net requires far more computing power than deploying it. The neural network that is installed on your phone was trained over a lot of time and/or on a very large cluster. Your phone is merely "running" the network, and that requires much less compute power.
I took a few screenshots. Aligning the phone, focus, light, and shadows on the small menu font was difficult. You must keep steady. Sadly, I ended up hitting the volume control on this best example. Tasty cockroaches! Ha! http://imgur.com/j9iRaY0
It worked OK (much better than nothing). However, there were a number of times when there were very large issues in the translations that created some pretty big misunderstandings. Luckily we had a friend fluent in English and Portuguese who would translate when things got too confused.
To reduce errors, you do need to be really careful to use short, complete sentences with simple and correct grammar. It's also better to use sentences that contain words that aren't ambiguous. (Those two sentences would probably not translate well.)
e.g. Please write simple words, short phrases and simple sentences. Please write words with just one meaning. Those phrases and words are easier to translate.
Those words are very rare and tend to only be useful in very technical contexts.
It seems it can't really handle context, so 'cockroaches' may have been a mistranslation of 'cheap' in some contexts, and 'it had stopped chestnut' may have simply been 'Brazil nuts'.
See here for what they did for offline speech recognition: http://static.googleusercontent.com/media/research.google.co...
WordLens was awesome for translating fragments of foreign languages - stuff like signs, menus and so on. But its offline translation seemed to be little more than a word->word substitution, so there was huge scope for improvement there. Very difficult when working offline!
Now if the features in the problem domain are better defined, as with credit ratings, and data is sparse and domain expertise is available, decision trees are a perfectly valid option.
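As a sketch of what that looks like (the features and numbers below are invented, not from any real credit dataset), scikit-learn makes this a few lines:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical credit features: income, debt ratio, years employed.
    X = np.array([[55000, 0.20, 8],
                  [23000, 0.65, 1],
                  [80000, 0.10, 12],
                  [30000, 0.55, 2],
                  [62000, 0.30, 5],
                  [18000, 0.70, 0]])
    y = np.array([1, 0, 1, 0, 1, 0])  # 1 = good credit, 0 = bad credit

    # A shallow tree stays small enough for a domain expert to audit.
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["income", "debt_ratio", "years_employed"]))

The printed rules are human-readable if/then splits that a domain expert can sanity-check, which is exactly where trees beat neural nets.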
Just wanted to add: and word/character/phrase embeddings.
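For anyone unfamiliar: an embedding is just a learned lookup table from tokens to dense vectors, so the net consumes geometry instead of raw symbols. A toy sketch (the vectors here are random; real ones are trained):

    import numpy as np

    vocab = {"cheap": 0, "cockroach": 1, "menu": 2}
    rng = np.random.default_rng(0)
    E = rng.normal(0, 0.1, (len(vocab), 4))  # |V| x d embedding matrix

    def embed(words):
        # Each word becomes a row of E; the net sees these vectors.
        return np.stack([E[vocab[w]] for w in words])

    print(embed(["cheap", "menu"]).shape)  # (2, 4)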
https://www.youtube.com/watch?v=0zKU7jDA2nc&index=1&list=PLe...
[1] https://github.com/mateogianolio/mlp-character-recognition
[2] https://github.com/mateogianolio/mlp-character-recognition/b...
Was there any advantage of using a deep learning model instead of something more computationally simple?
On my desktop, the English dictionary is ~1 megabyte uncompressed, and compresses to ~250k with gzip. The download for Google Translate is somewhere around 30 megabytes.
The new fad for using the 'deep' learning buzzword annoys me, though. It seems so meaningless. What makes one kind of neural net 'deep', and are all the other ones suddenly 'shallow'?
If this is a serious question, then googling "what is a deep neural network" would take you to any number of explanations. But to summarize very briefly, it's not a buzzword; it's a technical term referring to a network with multiple nonlinear layers that are chained together in sequence. Deep networks have been talked about for as long as neural networks have been a research subject, but it's only in the last few years that the mathematical techniques and computational power have been available to do really interesting things with them.
The "fad" (as you call it) is not mainly because the word "deep" sounds cool, but because companies like Google have been seeing breakthrough results that are being used in production as we speak. For example:
http://papers.nips.cc/paper/4687-large-scale-distributed-dee...
http://static.googleusercontent.com/media/research.google.co...
http://static.googleusercontent.com/media/research.google.co...
Number of layers
It's that simple
Eventually, people realized that you could improve performance by adding more hidden layers. So while Cybenko [1] was theoretically correct, in practice stacking a bunch of hidden layers made more sense. These network architectures with stacks of hidden layers were then labelled "deep" neural networks.
[1] https://en.wikipedia.org/wiki/Universal_approximation_theore...
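For reference, the statement Cybenko proved (roughly): for a sigmoidal activation sigma, finite sums of the form

    G(x) = \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^\top x + b_i)

are dense in the continuous functions on the unit cube -- i.e. one hidden layer suffices in theory, if you allow N to grow.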
> To achieve real-time, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.
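The "fit into cache" part of that quote is loop tiling/blocking. A toy illustration in Python (the real thing would be hand-tuned C with SIMD intrinsics; the block size of 64 here is an arbitrary guess, not Google's number):

    import numpy as np

    def blocked_matmul(A, B, block=64):
        # Multiply block x block tiles so each tile stays resident in
        # cache while it is reused, instead of streaming whole rows.
        n, k = A.shape
        k2, m = B.shape
        assert k == k2
        C = np.zeros((n, m))
        for i in range(0, n, block):
            for j in range(0, m, block):
                for p in range(0, k, block):
                    C[i:i + block, j:j + block] += (
                        A[i:i + block, p:p + block] @ B[p:p + block, j:j + block])
        return C

    A = np.random.default_rng(0).random((128, 128))
    B = np.random.default_rng(1).random((128, 128))
    assert np.allclose(blocked_matmul(A, B), A @ B)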
Let's see how this turns out. I'm still skeptical; other apps might crash because of this.