Tensorflow and Deep Learning, Without a PhD, Martin Gorner, Google [video]

(youtube.com)

218 points by ayanray 9y ago | 33 comments

nl 9y ago

Or you could just implement this (A submission from Google Brain to ICLR 2017):

In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is only 0.1 percent worse and 1.2x faster than the current state-of-the-art model. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art.[1]

To translate that, they built and trained an RNN to design neural networks. These machine-designed networks are almost equal to the best human-designed network on an image-recognition benchmark, and outperform the best human-designed systems on a language-modeling benchmark.

[1] http://openreview.net/forum?id=r1Ue8Hcxg
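
For readers who want something concrete, here is a toy sketch of the loop the abstract describes: a policy proposes an architecture, the architecture's validation accuracy is used as a reward, and a REINFORCE update nudges the policy toward better proposals. This is not the paper's method in detail. A plain softmax over three hypothetical candidates stands in for the RNN controller, the accuracy table is made up instead of coming from real training runs, and the moving-average baseline is an assumption of this sketch.

```python
import math
import random

# Hypothetical search space and made-up validation accuracies. In a real
# search each reward would come from training the sampled network.
CANDIDATES = ["1 conv layer", "3 conv layers", "5 conv layers"]
FAKE_VAL_ACCURACY = [0.60, 0.90, 0.70]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def search(steps=3000, lr=0.1, seed=0):
    """Run REINFORCE on a softmax policy over CANDIDATES."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        reward = FAKE_VAL_ACCURACY[choice]
        advantage = reward - baseline
        # d log pi(choice) / d logit_i = 1[i == choice] - probs[i]
        for i in range(len(logits)):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)


probs = search()
best = max(range(len(probs)), key=probs.__getitem__)
print(CANDIDATES[best], [round(p, 3) for p in probs])
```

The expensive part in practice is the reward: every sample means training a network, which is why this kind of search needs far more compute than the toy table suggests.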

minimaxir 9y ago

Although TensorFlow/Keras allow anyone to implement Deep Learning easily, that doesn't mean that they will get hired for a relevant job position without a PhD. Most Deep Learning jobs, and even relatively mundane Data Scientist jobs nowadays, want a PhD, in my experience. There is a surplus of Statistics/CS PhDs; why would a company hire someone without one if they do not have to?

Without a relevant job position, knowing how to implement Deep Learning is a buzzword trick for Medium thought pieces or for getting $$ in funding from venture capitalists for a generic "AI" startup whose workings no one actually understands.

agibsonccc 9y ago

I have seen this from multiple angles.

I used to teach at a data science bootcamp where many of the students got hired by big companies.

I've also been running a deep learning startup for the last few years and have hired quite a few people.

Many of our team don't have PhDs but can still write backprop code for even complex modules like Inception, among other things. A lot of my students didn't have PhDs either.

A few of us (me included) are self-taught. I've also coauthored the largest O'Reilly book on deep learning: http://shop.oreilly.com/product/0636920035343.do

One piece of advice I would offer is to build something that differentiates you from the rest. Many of these "medium thought pieces" you're talking about are actually very cool applications of deep learning. If you want to get hired for these kinds of roles, I would demonstrate that you understand how to build things with deep learning. The litmus test I would look for is "I trained a net from scratch and innovated in x way". Honestly, it's rare to find talent that can do well at software engineering as well as deep learning. I'm not convinced a PhD is a hard requirement.

I get that recruiters at these larger companies definitely tend to look for the buzzwords and often can't tell the difference, so it's definitely harder going the traditional route.

Tech hiring also tends to be a networking thing as much as it is buzzword bingo, no matter what field you're in. If you can network a bit and build something cool that demonstrates an understanding of deep learning, I don't see the problem.

imakecomments 9y ago

Regarding your book, have you expanded on the math section? I saw a draft of the material somewhere, and the math review seemed to be broken up into short paragraphs. These short paragraphs lacked examples and appeared to assume previous background knowledge in the subject, which seems contradictory to the book's title and aim. For example, I believe you mentioned somewhere "The Jacobian is a m x n matrix containing the 1st order partial derivatives of vectors with respect to vectors." -- Since I have a math background I can understand what you write. But for someone with little to no math background (e.g. a software practitioner) this may throw them off.

I am hesitant to recommend your book to a true practitioner due to the assumed knowledge presented within the math section. I think a better treatment of mathematics would assume the reader has little to no background but is intelligent enough to learn, from the ground up, the specific use cases of the mathematics for the deep learning techniques presented in the book. See http://www.deeplearningbook.org/ for a better treatment of the math review. It seems more thorough and makes fewer assumptions about the math background of the reader.

I would love to recommend your book to a practitioner but I'm afraid the math section (the version I reviewed) would scare them off/they would get little out of it.
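
As an illustration of the kind of example a practitioner might find helpful for the quoted definition, here is a minimal sketch that approximates a Jacobian numerically. The function `f` below is a hypothetical example, not taken from the book: it maps 2 inputs to 3 outputs, so its Jacobian is a 3 x 2 matrix whose entry (i, j) is the partial derivative of output i with respect to input j. The finite-difference helper `jacobian` and its step size are assumptions chosen for illustration.

```python
def jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian of f: R^n -> R^m at x (central differences)."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            # Row i is an output, column j is an input.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def f(v):
    """Hypothetical example function with 2 inputs and 3 outputs."""
    x, y = v
    return [x * y, x + y, x ** 2]


J = jacobian(f, [2.0, 3.0])
# Analytic Jacobian is [[y, x], [1, 1], [2x, 0]], i.e. [[3, 2], [1, 1], [4, 0]] here.
for row in J:
    print([round(value, 4) for value in row])
```

Backpropagation is built from products of matrices like this one, which is why the definition shows up in a deep learning math review at all.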

Jugurtha 9y ago

>I believe you mentioned somewhere "The Jacobian is a m x n matrix containing the 1st order partial derivatives of vectors with respect to vectors." -- Since I have a math background I can understand what you write. But for someone with little to no math background (e.g. a software practitioner) this may throw them off.

This makes sense. However, there will always be prerequisites for understanding any given topic. It is recursive and dangerous to assume otherwise, because knowledge builds on previous knowledge. Gaps in the prerequisites should be an exception handled by the reader, not by the author, because handling them in the book penalizes everyone who doesn't have that gap.

I understand the effort of authors wanting their books to be self-contained and inclusive, bringing everyone up to speed, but this brings up awful college memories of students having to wait for the one person who doesn't know matrix multiplication asking a question in a class that is not about linear algebra. This person was the exception, and instead of learning it on his own time, he was willing to penalize everyone.

Similarly, in the context of books, this is the reason 600 pages is the norm, with the same first 400 pages "bringing everyone up to speed" (100 pages for a Python introduction, 70 pages for elementary linear algebra, etc).

The overlap is just staggering, and it is safe to assume that a 600-page book does not cost the same as a 200-page book. In other words, everyone is paying the price for the one guy who wants to do the sexy Machine Learning/Deep Learning/Pattern Recognition, but doesn't want to bother looking up the Jacobian on his own. We're paying for the 400 pages we'll never read.

A large percentage of books cater to the beginner/neophyte, even though being a beginner is a relatively short step for someone who has a long road ahead. There's an assumption of non-evolution/improvement, an everlasting tutorial 0. Imagine how frustrating it would be to have every item in the world designed for crawling babies, disregarding the fact that they're on their way to being adults.

agibsonccc 9y ago

We have appendixes covering the basics there. I actually recommend deeplearningbook.org myself for a reference.

The book is meant to contain simple examples oriented towards engineers building applications rather than deriving backprop.

The book isn't called the definitive guide for a reason ;)

deepnotderp 9y ago

Yeah, I'm gonna piggyback on this comment. Deep Learning was really introduced to the public 4 years ago. That's not a lot of time...

agibsonccc 9y ago

It's been around for quite a long time though. Neural nets themselves have seen multiple hype cycles now. See the history of CIFAR.

I would maybe rephrase this as "Machine learning really just became mainstream recently and now everyone wants in".

If you are talking about, say, recruiters, they will always tend to piggyback on buzzwords. They don't really learn the technology themselves. Requiring a PhD and some of these other things that are being talked about is a general "data science problem".

I can't count how many candidates I've seen applying to companies who got turned down for jobs because they just went through the traditional HR funnel. Your best bet, as I said earlier, is just to network.

The worst parts of getting a deep learning job are the same ones that plague every tech position out there.

elliott34 9y ago

I think one of the only companies doing this right, and that has the resources to do this right, is Facebook, as they separate AI research from the teams focused on putting these things into production (i.e., ML engineers vs. ML researchers). Trying to combine these two things into the same role results in continued confusion and frustration. I like this Stitch Fix article as an overview (http://multithreaded.stitchfix.com/blog/2016/03/16/engineers...)

elliott34 9y ago

And this is thanks to Yann LeCun, whose vast experience at Bell Labs has shown him how mixing engineering/business requirements/deadlines with research produces shitty results, and so he designed it this way.

lightcycle 9y ago

Getting hired isn't always the point. TensorFlow and other tools make deep learning a fairly accessible solution for ordinary developers to offer to customers. It lets me say "yes" (or "maybe") to requests that would not have been feasible previously.

drvdevd 9y ago

This. For me the whole point of being self-taught has always been so I could get down to work and build things :)

eshvk 9y ago

> Most Deep Learning jobs, and even relatively mundane Data Scientist jobs nowadays, want a PhD from my experience.

Uhh... no. I have been doing this for 5 years or so. I don't have a PhD; most people we have hired don't have a PhD. Some of them write NIPS papers (very different from a "Medium thought piece") in their spare time. Now, what we optimize for is relevant experience and the ability to not just throw a framework at something. That is highly correlated with having worked on this for a while or having strong math skills. Guess what? Some of those who have these skills have a PhD. Some, not all.

FLUX-YOU 9y ago

Deep learning companies will start asking for Deep Learning knowledge on positions posted for Front-end/Back-end/Full-stack developers. It may be a bonus or may be a requirement.

agibsonccc 9y ago

I would caveat this a bit. There are various kinds of "deep learning companies".

These can either be horizontal plays or product-focused. For the latter it doesn't matter as much. For full-stack developers, domain knowledge is usually a lot more helpful there.

For horizontal plays this can matter a bit more. I run a well-funded deep learning startup and we are starting to hire full-stack developers next year.

I have thought about this a bit, and we would be looking for people who, at a minimum, have dealt with some basic machine learning before. Much of the stuff we do in deep learning is displaying some sort of output from a neural network (e.g., various ways of displaying a choice a neural net makes). Being able to do things like visualizing clusters is also important (this would be D3). The other part of this would be a basic understanding of how to communicate with a data pipeline of some kind. We are Java-based, but I'm imagining a lot of startups would be Python-based in this case.

santaclaus 9y ago

> There is a surplus of Statistics/CS PhDs, why would a company hire someone without one if they do not have to?

I thought it was a negative to have a PhD in SV?

throwaway729 9y ago

I'm not sure if you're being sarcastic or not, but I don't think this is at all true at the types of companies that would be hiring deep learning experts in any significant quantity.

Even outside of "hot" research topics, large companies and startups doing technically interesting things recruit heavily out of top PhD programs. Many companies even have different hiring processes for Ph.D. candidates, even for job positions that don't require or recommend a Ph.D., which suggests those companies evaluate Ph.D. candidates differently (and therefore view them as a different sort of asset).

brilee 9y ago

This video has nothing to do with the title "without a PhD". It's a walkthrough of basic techniques for training neural networks on the MNIST handwriting dataset.

paulsutter 9y ago

This is a great tutorial for beginners. His sample code[1] is much better than the MNIST tutorials on the TensorFlow page, especially those visualizations.

[1] https://github.com/martin-gorner/tensorflow-mnist-tutorial/

dorianm 9y ago

Here is the full code from his slides: https://gist.github.com/c4127d2d899386179dbd2e6cd013a87e

I just added a few comments and constant names.

annnnd 9y ago

I am struggling to understand how TensorFlow should be used; the syntax is very alien to me. I found the Theano documentation much easier to follow. Is it just me?

zeratul 9y ago

99.3% accuracy is not that great. Currently the best performer on this data set has 99.77% accuracy:

http://yann.lecun.com/exdb/mnist/

paulsutter 9y ago

99.3% is pretty good for a one-hour lecture that begins with a single-layer network.
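
For context on what a "single-layer network" means here, the following is a minimal sketch of a softmax classifier trained by full-batch gradient descent, written in plain Python rather than TensorFlow. It is not the code from the talk: the six 2-D points are a made-up stand-in for MNIST images, and the learning rate and step count are assumptions chosen for this toy data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict_proba(W, b, x):
    """One layer: class probabilities = softmax(W x + b)."""
    logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]
    return softmax(logits)


def train(X, y, n_classes, lr=0.5, steps=100):
    """Minimize mean cross-entropy with full-batch gradient descent."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(steps):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            p = predict_proba(W, b, x)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit c is p[c] - onehot[c].
                err = p[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for k in range(n_features):
                    grad_W[c][k] += err * x[k]
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / len(X)
            for k in range(n_features):
                W[c][k] -= lr * grad_W[c][k] / len(X)
    return W, b


# Toy, linearly separable data standing in for flattened images.
X = [[-1.0, -1.0], [-2.0, -1.0], [-1.0, -2.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]

W, b = train(X, y, n_classes=2)
preds = [max(range(2), key=lambda c: predict_proba(W, b, x)[c]) for x in X]
print(preds)
```

For MNIST the same structure would take 784 pixel values per image and produce 10 class probabilities; the extra accuracy in the lecture comes from adding layers and other techniques on top of this starting point.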

danieltillett 9y ago

What is the human accuracy on this data set?

mamon 9y ago

A more interesting question: if some of those "digits" are so hard to recognize even for humans, then how can we ever label them with the one true "correct" answer?

Can we find the person who drew those digits and ask what the artist had in mind when making this masterpiece? And even then, someone might have been trying to draw a "2" but the end result looks more like a "3".

I think that some of the test cases simply don't have a definitive answer, and trying to reach 100% accuracy is just a misguided effort.

danieltillett 9y ago

Another interesting question is which approach most closely matches the errors made by humans.

gajomi 9y ago

I wonder, is there a good reason why these accuracies are reported on a log scale by convention (say, as a power of 2 from 50%)?

mamon 9y ago

To make even a small difference look bigger, because even a 0.1% improvement over the previous best result is considered a big deal.
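
To put numbers on the accuracies quoted in this thread, here is a small sketch that converts 99.3% and 99.77% into error rates, misclassified images on the 10,000-image MNIST test set, and a log2 view of the error. The `summarize` helper and the "bits" framing (minus log2 of the error rate) are assumptions for illustration, one possible reading of the log-scale idea rather than an established convention.

```python
import math

MNIST_TEST_IMAGES = 10_000  # size of the standard MNIST test split


def summarize(accuracy_percent):
    """Convert an accuracy in percent into error-based quantities."""
    error = 1.0 - accuracy_percent / 100.0
    return {
        "error_rate": error,
        "misclassified": round(error * MNIST_TEST_IMAGES),
        "bits": -math.log2(error),  # each extra bit halves the error rate
    }


talk = summarize(99.3)   # accuracy reached in the lecture
best = summarize(99.77)  # best result cited in the thread

relative_error_reduction = 1.0 - best["error_rate"] / talk["error_rate"]

print(talk["misclassified"], best["misclassified"])    # 70 23
print(round(talk["bits"], 2), round(best["bits"], 2))  # 7.16 8.76
print(round(relative_error_reduction, 3))              # 0.671
```

Seen this way, the gap that looks like 0.47 percentage points of accuracy is roughly a two-thirds cut in errors, which is the sense in which small accuracy differences near 100% can be a big deal.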
