(I think in general when people say that their special area of study is particularly important, it should be taken with a grain of salt!)
Transformer (2017). NET-based diffusion (2015). Score-based diffusion (2019). DDPM (2020), which uses a U-Net (2015). CLIP (~2021) uses ResNet (2015) + ViT (2020). Stable Diffusion also uses a U-Net.
Yes, deep learning is easy. Throw compute at it and you get the answer.
Those with insights into old problems are now churning out papers, because they can throw compute and deep learning at the stuff they already understood well.
You can look into any paper and see inspiration from old methods. Applying deep learning is hard; deep learning itself is quite easy. If it were really hard, it wouldn't have become popular at all! Nowadays most people don't even bother with architectural gains.
Sure, if a better architecture eventually comes up, people will throw the new one at the compute. I don't believe in Bayesian stuff either. However, it's worth learning; otherwise you might miss the insights behind a lot of papers!
Whatever you do, having the insight to understand the problem matters a lot. The deep learning and compute-throwing come later.
If we had unlimited compute and time for training, I don’t think we would’ve really moved on from dense feed forward nets.
"Don't build on top of deep learning. Build on top of MCMC-like methods"
I used to do research into such methods. That game is over; it's a massive waste of time at the moment. The whole question now is how to import what was good about those methods into the modern deep learning toolkit: How do I sample from a distribution with DL? How do I get uncertainty estimates? How do I compose models, get disentangled representations, get few-shot learning, and so on?
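As a concrete example of one of those questions, here is a minimal sketch (my illustration, not from the thread) of Monte Carlo dropout: keep dropout active at inference time and treat the spread across repeated stochastic forward passes as a crude uncertainty estimate. The tiny MLP and all weights here are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed 1-16-1 MLP with random weights (hypothetical model,
# pretend it was trained on some regression task).
W1 = rng.normal(size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def forward(x, drop_p=0.5):
    """One stochastic forward pass with dropout left ON at inference."""
    h = np.tanh(x @ W1 + b1)
    mask = rng.random(h.shape) > drop_p   # randomly drop hidden units
    h = h * mask / (1.0 - drop_p)         # inverted-dropout scaling
    return h @ W2 + b2

x = np.array([[0.3]])
samples = np.stack([forward(x) for _ in range(200)])
mean = samples.mean(axis=0)  # predictive mean
std = samples.std(axis=0)    # spread across passes ~ uncertainty
```

The point is not that this is a rigorous posterior, only that a standard DL component (dropout) can be repurposed to answer a question that used to belong to the Bayesian toolbox.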
The idea that people should go back to tools like MCMC today is pretty absurd. That entire research program was a failure and never scaled to anything. I say this of my many dozens of papers in the area too.
I would never give this advice to my PhD students.
Maybe in a decade or two someone will rescue MCMC like methods. In the meantime your PhD students will suffer by being irrelevant and having skills that no one needs.
Edit: not to mention modern statistical mechanics…
Sure, building a better HMC algorithm might not be a good use of time, but the spirit of MCMC is alive.
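One concrete sense in which that spirit survives: score-based diffusion (mentioned above) samples with Langevin dynamics, an MCMC-flavoured update. A minimal sketch, using the known score of a standard normal target rather than a learned network:

```python
import numpy as np

rng = np.random.default_rng(1)

def score(x):
    """grad log p(x) for a standard normal target, N(0, 1)."""
    return -x

# Unadjusted Langevin dynamics:
#   x_{t+1} = x_t + eps * score(x_t) + sqrt(2 * eps) * noise
eps = 0.01
x = rng.normal(size=5000) * 5.0  # start far from the target distribution
for _ in range(2000):
    x = x + eps * score(x) + np.sqrt(2 * eps) * rng.normal(size=x.shape)

# The particle cloud should now look roughly like N(0, 1).
```

In a score-based generative model, the hand-written `score` above is replaced by a trained neural network; the sampler itself is the same MCMC-style loop.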
Unless you already have deep expertise, I think it's a bad idea to pick a research area and just go and research that. You won't have intuition about why it's a good thing to research. However, you can have intuition about real world problems and the solutions you want to see, and then work backwards to what you need to research.
Too easy and you won’t find anything new, too hard and you won’t make any progress.
Research areas are made of problems to solve.
1. pick a product or user problem and try to solve that problem, then work back into the research you need to do so
2. pick a research problem and try to solve that (again, focusing on the problem, not just an area of study)
The blog post seemed to be more for people doing applied things, so I was speaking to those doing option 1.

With access to unfathomable amounts of data, especially over the last couple of years, the game has changed entirely and doesn't seem to be cooling down anytime soon.
The field certainly values engineering a lot more than it used to, and it is exciting to see where major advances, together with open-source contributions, are going to take us.
Look back to how HN greeted the victory of the SuperVision team (Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton) in the 2012 ImageNet Large Scale Visual Recognition Challenge - https://news.ycombinator.com/item?id=4611830
Same for humans, the better the education and more advanced the science, the more we achieve with the same brains. The key ingredient is ideas, both for humans and AI.
The author wrote this article a month ago, and he mentioned where his thinking about DL turned out to be wrong.