In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is only 0.1 percent worse and 1.2x faster than the current state-of-the-art model. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art.[1]
To translate that, they built and trained an RNN to design neural networks. These machine-designed networks are almost equal to the best human-designed network on an image-recognition benchmark, and outperform the best human-designed systems on a text-understanding benchmark.
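For anyone who wants the "train a controller with reinforcement learning" idea in concrete terms, here is a minimal sketch. It is not the paper's method: the controller is a pair of independent softmax policies rather than an RNN, the search space values are made up for illustration, and `fake_validation_accuracy` is a hypothetical stand-in for actually training each sampled network and measuring its validation accuracy. It only shows the REINFORCE loop: sample an architecture, score it, and nudge the policy toward choices that scored above a moving-average baseline.

```python
import math
import random

# Hypothetical search space; the option values are illustrative only.
SEARCH_SPACE = {
    "filter_size": [1, 3, 5, 7],
    "num_filters": [24, 36, 48, 64],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(logits, rng):
    """Sample one option index per decision from the current policy."""
    return {
        name: rng.choices(range(len(opts)), weights=softmax(logits[name]))[0]
        for name, opts in SEARCH_SPACE.items()
    }


def fake_validation_accuracy(arch):
    """Stand-in reward (an assumption, not real training).

    A real system would train the sampled network and return its
    validation accuracy. This toy simply prefers filter_size=3 and
    num_filters=64.
    """
    score = 0.5
    if SEARCH_SPACE["filter_size"][arch["filter_size"]] == 3:
        score += 0.2
    if SEARCH_SPACE["num_filters"][arch["num_filters"]] == 64:
        score += 0.2
    return score


def train_controller(steps=3000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline over the toy policy."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = 0.0
    for _ in range(steps):
        arch = sample_architecture(logits, rng)
        reward = fake_validation_accuracy(arch)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for name, choice in arch.items():
            probs = softmax(logits[name])
            for i, p in enumerate(probs):
                # Gradient of log softmax: one-hot(choice) - probs.
                grad = (1.0 if i == choice else 0.0) - p
                logits[name][i] += lr * advantage * grad
    return logits


def best_architecture(logits):
    """Return the highest-logit option for each decision."""
    return {
        name: opts[max(range(len(opts)), key=lambda i: logits[name][i])]
        for name, opts in SEARCH_SPACE.items()
    }
```

Calling `best_architecture(train_controller())` returns the options the toy policy ended up preferring. In the real setting, the expensive part is the reward: every sample means training a full network.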
Without a relevant job position, knowing how to implement Deep Learning is a buzzword trick for Medium thought pieces or for getting $$ in funding from venture capitalists for a generic "AI" startup whose workings no one actually understands.
I used to teach at a data science bootcamp where many of the students got hired by big companies.
I've also been running a deep learning startup for the last few years and have hired quite a few people.
Many of our team don't have PhDs but can still write backprop code for even complex modules like Inception, among other things. A lot of my students didn't have PhDs either.
A few of us (me included) are self-taught. I've also coauthored the largest O'Reilly book on deep learning: http://shop.oreilly.com/product/0636920035343.do
One piece of advice I would offer is to build something that differentiates you from the rest. Many of these "medium thought pieces" you're talking about are actually very cool applications of deep learning. If you want to get hired for these kinds of roles, I would demonstrate that you understand how to build things with deep learning. The litmus test I would also look for is "I trained a net from scratch and innovated in x way". Honestly, it's rare to find talent that can do well at software engineering as well as deep learning. I'm not convinced a PhD is a hard requirement.
I get that recruiters at these larger companies definitely tend to look for the buzzwords and often can't tell the difference, so it's definitely harder going the traditional route.
Tech hiring also tends to be a networking thing as much as it is buzzword bingo, no matter what field you're in. If you can network a bit and build something cool that demonstrates an understanding of deep learning, I don't see the problem.
I am hesitant to recommend your book to a true practitioner due to the knowledge assumed in the math section. I think a better treatment of the mathematics would assume the reader has little to no background but is intelligent enough to learn, from the ground up, the specific uses of the mathematics for the deep learning techniques presented in the book. See: http://www.deeplearningbook.org/ for a better treatment of the math review. It seems more thorough and makes fewer assumptions about the math background of the reader.
I would love to recommend your book to a practitioner but I'm afraid the math section (the version I reviewed) would scare them off/they would get little out of it.
Uhh... no. I have been doing this for 5 years or so. I don't have a PhD; most people we have hired don't have a PhD. Some write NIPS papers (very different from a "Medium thought piece") in their spare time. What we optimize for is relevant experience and the ability to not just throw a framework at something. That is highly correlated with having worked on this for a while or having strong math skills. Guess what? Some of those who have these skills have a PhD. Some, not all.
These can either be horizontal plays or product focused. For the latter it doesn't matter as much. For full stack developers domain knowledge is usually a lot more helpful there.
For horizontal plays this can matter a bit more. I run a well funded deep learning startup and we are starting to hire full stack developers next year.
I have thought about this a bit, and at a minimum we would be looking for people who have dealt with some basic machine learning before. Much of the stuff we do in deep learning is displaying some sort of output from a neural network (e.g., various ways of displaying a choice a neural net makes). Being able to do things like visualizing clusters is also important (this would be d3). The other part of this would be a basic understanding of how to communicate with a data pipeline of some kind. We are Java based, but I'm imagining a lot of startups would be Python based in this case.
I thought it was a negative to have a PhD in SV?
Even outside of "hot" research topics, large companies and startups doing technically interesting things recruit heavily out of top PhD programs. Many companies even have different hiring processes for Ph.D. candidates, even for job positions that don't require or recommend a Ph.D., which suggests those companies evaluate Ph.D. candidates differently (and therefore view them as a different sort of asset).
[1] https://github.com/martin-gorner/tensorflow-mnist-tutorial/
I just added a few comments and constant names.
Do we know the person who drew those digits, so we could ask "what did the artist have in mind when making this masterpiece?" And even then, someone might have been trying to draw a "2" but the end result looks more like a "3".
I think that some of the test cases simply don't have a definitive answer, and trying to reach 100% accuracy is just a misguided effort.