https://twitter.com/meta_learning
Also, you can register an account to get an occasional email update.
PS) We're completely free and open source: https://github.com/metacademy/metacademy-application
http://343hz.com/general-guidelines-for-deep-neural-networks...
One issue that can be a back breaker, depending on your application, is that nets tend to need much more training data than the alternatives to produce a generalizable model. There are ways to work around this, though (unsupervised pretraining and data augmentation, for example).
The bigger problem to me is interpretability. Deep learning often gives feature sets that are very good for whatever task you are working on, but they are in some sense artificial, and it is difficult to relate changes in the features to changes in the input data. I work with a lot of biological and medical data, and this is an issue because for some applications it is important not just to get accurate classification results, but to understand what your features mean in the context of the original problem. I saw some interesting work in a computer vision paper earlier this year on learning to visualize how changes in a neural net's inputs and outputs are related; I'll try to dig that up later if anyone is interested.
I'm not sure how coherent that was, as I was trying to get this typed out in a hurry.
A few tips for those of you who use neural nets:
Debug the weights with histograms. Track the gradient and make sure its magnitude is not too large and that it is roughly normally distributed.
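For example, a quick NumPy/matplotlib sketch of that check (the function name and the norm threshold here are just illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    def inspect_layer(weights, gradients, max_grad_norm=10.0):
        """Histogram a layer's weights and gradients; warn on huge gradients."""
        fig, (ax_w, ax_g) = plt.subplots(1, 2, figsize=(8, 3))
        ax_w.hist(weights.ravel(), bins=50)
        ax_w.set_title("weights")
        ax_g.hist(gradients.ravel(), bins=50)
        ax_g.set_title("gradients")  # should look roughly bell-shaped around 0
        plt.show()
        grad_norm = np.linalg.norm(gradients)
        if grad_norm > max_grad_norm:
            print("warning: gradient norm %.2f looks too large" % grad_norm)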
Keep track of your gradient changes when using either gradient descent or conjugate gradient.
Plot your filters to visualize what each neuron is learning.
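Something like this works as a first pass (a sketch, assuming the first-layer filters come as an (n, h, w) array):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_filters(filters, cols=8):
        """Tile filters of shape (n, h, w) into one grid image."""
        n, h, w = filters.shape
        rows = int(np.ceil(n / float(cols)))
        grid = np.zeros((rows * h, cols * w))
        for i, f in enumerate(filters):
            r, c = divmod(i, cols)
            f = (f - f.min()) / (np.ptp(f) + 1e-8)  # per-filter contrast normalization
            grid[r * h:(r + 1) * h, c * w:(c + 1) * w] = f
        plt.imshow(grid, cmap="gray")
        plt.show()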
Watch the rate of change of your cost function. If it seems to be changing too fast and then stalls early, lower your learning rate.
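A minimal sketch of that kind of backoff (the patience and factor knobs are placeholders, not anyone's recommended values):

    def adjust_learning_rate(lr, cost_history, patience=5, factor=0.5):
        """Cut the learning rate once the cost stops improving."""
        if (len(cost_history) > patience and
                min(cost_history[-patience:]) >= min(cost_history[:-patience])):
            return lr * factor
        return lr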
Plot your activations: if they start out grey, you're fine. If they start out all black, you need to retune some of your parameters.
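You can also run the grey-vs-black test numerically; rough sketch (the thresholds are arbitrary):

    import numpy as np

    def check_activations(acts, dead_threshold=1e-6):
        """Flag layers whose activation plot would render 'all black'."""
        frac_dead = np.mean(np.abs(acts) < dead_threshold)
        if frac_dead > 0.9:
            print("warning: %.0f%% of activations are ~0" % (100 * frac_dead))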
Lastly, understand the algorithm you're using. Convolutional nets are different from recursive neural tensor networks, which are different from denoising autoencoders, which are different from RBMs/DBNs.
Pay attention to your cost function: reconstruction entropy and negative log likelihood serve different objectives.
If you are doing feature learning with RBMs or denoising autoencoders, you will use reconstruction entropy; this is what you use for feature detectors. You may end up using negative log likelihood if you are dealing with continuous data.
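For concreteness, here is what the two costs look like in plain NumPy (a sketch, not any particular framework's API):

    import numpy as np

    def reconstruction_cross_entropy(x, x_hat, eps=1e-8):
        """Cross-entropy reconstruction cost for binary / [0, 1] data."""
        return -np.mean(np.sum(x * np.log(x_hat + eps)
                               + (1 - x) * np.log(1 - x_hat + eps), axis=1))

    def gaussian_nll(x, x_hat):
        """Unit-variance Gaussian negative log likelihood: squared error
        up to an additive constant -- the continuous-data case."""
        return 0.5 * np.mean(np.sum((x - x_hat) ** 2, axis=1))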
For RBMs, pay attention to the different kinds of units [1]. Hinton recommends Gaussian visible units with rectified linear hidden units for continuous data, and binary visible with binary hidden units otherwise.
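As I understand the noisy rectified linear units from that guide, the hidden activation looks roughly like this (hedged sketch, names are my own):

    import numpy as np

    def sample_nrelu_hidden(v, W, b, rng=np.random):
        """Noisy rectified linear units: max(0, x + N(0, sigmoid(x)))."""
        x = v.dot(W) + b
        var = 1.0 / (1.0 + np.exp(-x))  # per-unit noise variance
        return np.maximum(0.0, x + rng.normal(0.0, np.sqrt(var)))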
For denoising autoencoders, watch your corruption level. A higher one helps generalize better, especially with less data.
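Masking noise is the common choice; a minimal sketch of the knob in question:

    import numpy as np

    def corrupt(x, corruption_level=0.3, rng=np.random):
        """Zero out a random fraction of the inputs before encoding."""
        mask = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
        return x * mask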
For time series or sequential data, you can use a recurrent net, a moving window with DBNs, or a recursive neural tensor network.
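The moving-window option is the easiest to show (sketch; it turns a 1-D series into fixed-size inputs a DBN or any feedforward net can consume):

    import numpy as np

    def sliding_windows(series, window=10, step=1):
        """Slice a 1-D series into overlapping fixed-size windows."""
        return np.array([series[i:i + window]
                         for i in range(0, len(series) - window + 1, step)])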
Other knobs:
If your deep learning framework doesn't have Adagrad, find one that does.
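For reference, the Adagrad update itself is tiny (illustrative sketch, not any framework's actual API):

    import numpy as np

    def adagrad_update(param, grad, cache, lr=0.01, eps=1e-8):
        """Per-parameter step sizes from accumulated squared gradients."""
        cache += grad ** 2
        param -= lr * grad / (np.sqrt(cache) + eps)
        return param, cache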
Dropout: crucial. Dropout is used in combination with mini-batch learning to handle learning different "poses" of images, as well as to generalize feature learning. It can be combined with sampling with replacement to minimize sampling error.
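A sketch of the "inverted" formulation, which rescales at training time so nothing changes at test time (the original paper scales the weights by the retention probability at test time instead):

    import numpy as np

    def dropout(acts, p=0.5, rng=np.random):
        """Drop each unit with probability p during training, then rescale."""
        mask = rng.binomial(1, 1.0 - p, size=acts.shape)
        return acts * mask / (1.0 - p)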
Regularization: L2 is typically used. Hinton once said you want a neural net that always overfits but is regularized (YouTube video... I don't remember the link right now).
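In code, L2 is just a weight-decay term added to the cost and its gradient (illustrative names):

    import numpy as np

    def l2_penalized(cost, grad, weights, lam=1e-4):
        """Add the L2 (weight decay) term to a cost and its gradient."""
        return (cost + 0.5 * lam * np.sum(weights ** 2),
                grad + lam * weights)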
Would love to answer questions! Source: I work on/teach this stuff. Still working my way up there, but it seems to be going well so far.[2][3]
Lastly, tweak one knob at a time. Neural nets have a lot going on. You don't want a situation where you A/B tested 10 different parameters at once and you don't know which one worked or why.
[1]: http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
[2]: http://deeplearning4j.org/
[3]: http://zipfianacademy.com/
[4]: http://arxiv.org/abs/1206.5533 http://deeplearning4j.org/ http://deeplearning4j.org/debug.html http://yosinski.com/media/papers/Yosinski2012VisuallyDebuggi...
We just opened up the roadmap for contributions (click "view source" with a logged-in account). Feel free to add any of these notes where you think they'd fit in nicely -- don't worry about messing anything up; we have version control for a reason. Also, please email me if you run into any problems or confusion.
Then there's also Caffe, a CNN framework from the Berkeley Vision group. Not sure what the differences between the two are.
Which one would you recommend as a learning tool, speed-wise, and as a possible starting point for customization?
http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
"Linguistic Regularities in Continuous Space Word Representations" by Mikolov et al. (2013) is also a nice read.