https://twitter.com/meta_learning
Also, you can register an account to get an occasional email update.
PS) We're completely free and open source: https://github.com/metacademy/metacademy-application
http://343hz.com/general-guidelines-for-deep-neural-networks...
One issue that can be a back breaker, depending on your application, is that nets tend to need much more training data than the alternatives to produce a generalizable model. There are ways to work around this, though (unsupervised pretraining and data augmentation, for example).
The bigger problem to me is interpretability. Deep learning often gives feature sets that are very good for whatever task you are working on, but they are in some sense artificial, and it is difficult to relate changes in the features to changes in the input data. I work with a lot of biological and medical data, and this is an issue because for some applications it is important not just to get accurate classification results, but to understand what your features mean in the context of the original problem. I saw some interesting work in a computer vision paper earlier this year on learning to visualize how changes in a neural net's inputs and outputs are related; I'll try to dig that up later if anyone is interested.
I'm not sure how coherent that was, as I was trying to get this typed out in a hurry.
A few tips for those of you who use neural nets:
Debug the weights with histograms. Track the gradient and make sure its magnitude is not too large and that it is roughly normally distributed.
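For example, a quick NumPy/matplotlib sketch of that check (the function name and the norm threshold here are just illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    def inspect_layer(weights, gradients, max_grad_norm=10.0):
        """Histogram a layer's weights and gradients; warn on huge gradients."""
        fig, (ax_w, ax_g) = plt.subplots(1, 2, figsize=(8, 3))
        ax_w.hist(weights.ravel(), bins=50)
        ax_w.set_title("weights")
        ax_g.hist(gradients.ravel(), bins=50)
        ax_g.set_title("gradients")  # should look roughly bell-shaped around 0
        plt.show()
        grad_norm = np.linalg.norm(gradients)
        if grad_norm > max_grad_norm:
            print("warning: gradient norm %.2f looks too large" % grad_norm)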
Keep track of your gradient changes when using either gradient descent or conjugate gradient.
Plot your filters to visualize what each neuron is learning.
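Something like this works as a first pass (a sketch, assuming the first-layer filters come as an (n, h, w) array):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_filters(filters, cols=8):
        """Tile filters of shape (n, h, w) into one grid image."""
        n, h, w = filters.shape
        rows = int(np.ceil(n / float(cols)))
        grid = np.zeros((rows * h, cols * w))
        for i, f in enumerate(filters):
            r, c = divmod(i, cols)
            f = (f - f.min()) / (np.ptp(f) + 1e-8)  # per-filter contrast normalization
            grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = f
        plt.imshow(grid, cmap="gray")
        plt.show()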
Watch the rate of change of your cost function. If it seems to be changing too fast and then stalls early, lower your learning rate.
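A minimal sketch of that kind of backoff (the patience and factor knobs are placeholders, not anyone's recommended values):

    def adjust_learning_rate(lr, cost_history, patience=5, factor=0.5):
        """Cut the learning rate once the cost stops improving."""
        if (len(cost_history) > patience and
                min(cost_history[-patience:]) >= min(cost_history[:-patience])):
            return lr * factor
        return lr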
Plot your activations: if they start out grey, you're fine. If they start out all black, you need to retune some of your parameters.
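You can also run the grey-vs-black test numerically; rough sketch (the thresholds are arbitrary):

    import numpy as np

    def check_activations(acts, dead_threshold=1e-6):
        """Flag layers whose activation plot would render 'all black'."""
        frac_dead = np.mean(np.abs(acts) < dead_threshold)
        if frac_dead > 0.9:
            print("warning: %.0f%% of activations are ~0" % (100 * frac_dead))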
Lastly, understand the algorithm you're using. Convolutional nets are different from recursive neural tensor networks, which are different from denoising autoencoders, which are different from RBMs/DBNs.
Pay attention to your cost function: reconstruction entropy and negative log likelihood serve different objectives.
If you are doing feature learning with RBMs or denoising autoencoders, you will use reconstruction entropy; this is what you use for feature detectors. You may end up using negative log likelihood if you are dealing with continuous data.
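For concreteness, here is what the two costs look like in plain NumPy (a sketch, not any particular framework's API):

    import numpy as np

    def reconstruction_cross_entropy(x, x_hat, eps=1e-8):
        """Cross-entropy reconstruction cost for binary / [0, 1] data."""
        return -np.mean(np.sum(x * np.log(x_hat + eps)
                               + (1 - x) * np.log(1 - x_hat + eps), axis=1))

    def gaussian_nll(x, x_hat):
        """Unit-variance Gaussian negative log likelihood: squared error
        up to an additive constant -- the continuous-data case."""
        return 0.5 * np.mean(np.sum((x - x_hat) ** 2, axis=1))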
For RBMs, pay attention to the different kinds of units [1]. Hinton recommends Gaussian visible units with rectified linear hidden units for continuous data, and binary visible with binary hidden units otherwise.
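As I understand the noisy rectified linear units from that guide, the hidden activation looks roughly like this (hedged sketch, names are my own):

    import numpy as np

    def sample_nrelu_hidden(v, W, b, rng=np.random):
        """Noisy rectified linear units: max(0, x + N(0, sigmoid(x)))."""
        x = v.dot(W) + b
        var = 1.0 / (1.0 + np.exp(-x))  # per-unit noise variance
        return np.maximum(0.0, x + rng.normal(0.0, np.sqrt(var)))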
For denoising autoencoders, watch your corruption level. A higher one helps generalize better, especially with less data.
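Masking noise is the common choice; a minimal sketch of the knob in question:

    import numpy as np

    def corrupt(x, corruption_level=0.3, rng=np.random):
        """Zero out a random fraction of the inputs before encoding."""
        mask = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
        return x * mask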
For time series or sequential data, you can use a recurrent net, a moving window with DBNs, or a recursive neural tensor network.
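The moving-window option is the easiest to show (sketch; it turns a 1-D series into fixed-size inputs a DBN or any feedforward net can consume):

    import numpy as np

    def sliding_windows(series, window=10, step=1):
        """Slice a 1-D series into overlapping fixed-size windows."""
        return np.array([series[i:i + window]
                         for i in range(0, len(series) - window + 1, step)])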
Other knobs:
If your deep learning framework doesn't have Adagrad, find one that does.
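For reference, the Adagrad update itself is tiny (illustrative sketch, not any framework's actual API):

    import numpy as np

    def adagrad_update(param, grad, cache, lr=0.01, eps=1e-8):
        """Per-parameter step sizes from accumulated squared gradients."""
        cache += grad ** 2
        param -= lr * grad / (np.sqrt(cache) + eps)
        return param, cache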
Dropout: crucial. Dropout is used in combination with mini-batch learning to handle learning different "poses" of images, as well as to generalize feature learning. It can be combined with sampling with replacement to minimize sampling error.
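A sketch of the "inverted" formulation, which rescales at training time so nothing changes at test time (the original paper scales the weights by the retention probability at test time instead):

    import numpy as np

    def dropout(acts, p=0.5, rng=np.random):
        """Drop each unit with probability p during training, then rescale."""
        mask = rng.binomial(1, 1.0 - p, size=acts.shape)
        return acts * mask / (1.0 - p)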
Regularization: L2 is typically used. Hinton once said you want a neural net that always overfits but is regularized (YouTube video... I don't remember the link right now).
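In code, L2 is just a weight-decay term added to the cost and its gradient (illustrative names):

    import numpy as np

    def l2_penalized(cost, grad, weights, lam=1e-4):
        """Add the L2 (weight decay) term to a cost and its gradient."""
        return (cost + 0.5 * lam * np.sum(weights ** 2),
                grad + lam * weights)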
Would love to answer questions! Source: I work on/teach this stuff. Still working my way up there, but it seems to be going well so far.[2][3]
Lastly, tweak one knob at a time. Neural nets have a lot going on. You don't want a situation where you A/B tested 10 different parameters at once and you don't know which one worked or why.
[1]: http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
[2]: http://deeplearning4j.org/
[3]: http://zipfianacademy.com/
[4]: http://arxiv.org/abs/1206.5533 http://deeplearning4j.org/ http://deeplearning4j.org/debug.html http://yosinski.com/media/papers/Yosinski2012VisuallyDebuggi...
We just opened up the roadmap for contributions (click "view source" with a logged-in account). Feel free to add any of these notes where you think they'd fit in nicely -- don't worry about messing anything up; we have version control for a reason. Also, please email me if you run into any problems or confusion.
Then there's also Caffe, a CNN framework from the Berkeley Vision group. Not sure what the differences between the two are.
Which one would you recommend as a learning tool, speed-wise, and as a possible starting point for customization?
http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
"Linguistic Regularities in Continuous Space Word Representations" by Mikolov et al. (2013) is also a nice read.