(I think in general when people say that their special area of study is particularly important, it should be taken with a grain of salt!)
Transformer (2017). NET-based diffusion (2015). Score-based diffusion (2019). DDPM (2020), which uses a U-Net (2015). CLIP (~2021) uses ResNet (2015) + ViT (2020). Stable Diffusion also uses a U-Net.
Yes, deep learning is easy. Throw compute at it and you get the answer.
Those with insights into old problems are now churning out papers, because they can throw compute and deep learning at the stuff they already understood well.
You can look into any paper and see inspiration from old methods. Applying deep learning is hard; deep learning itself is quite easy. If it were really hard, it wouldn't have become popular at all! Nowadays most people don't even bother with architectural gains.
Sure, if a better architecture eventually comes up, people will throw the new one at the compute. I don't believe in Bayesian stuff either. However, it's worth learning; otherwise you might miss the insights behind a lot of papers!
Whatever you do, having the insight to understand the problem matters a lot. The deep learning and compute-throwing come later.
If we had unlimited compute and time for training, I don’t think we would’ve really moved on from dense feed forward nets.
"Don't build on top of deep learning. Build on top of MCMC-like methods"
I used to do research into such methods. That game is over; it's a massive waste of time at the moment. The whole question now is how to import what was good about those methods into the modern deep learning toolkit: How do I sample from a distribution with DL? How do I get uncertainty estimates? How do I compose models, get disentangled representations, get few-shot learning, and so on?
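As a concrete example of one of those questions, here is a minimal sketch (my illustration, not from the thread) of Monte Carlo dropout: keep dropout active at inference time and treat the spread across repeated stochastic forward passes as a crude uncertainty estimate. The tiny MLP and all weights here are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed 1-16-1 MLP with random weights (hypothetical model,
# pretend it was trained on some regression task).
W1 = rng.normal(size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def forward(x, drop_p=0.5):
    """One stochastic forward pass with dropout left ON at inference."""
    h = np.tanh(x @ W1 + b1)
    mask = rng.random(h.shape) > drop_p   # randomly drop hidden units
    h = h * mask / (1.0 - drop_p)         # inverted-dropout scaling
    return h @ W2 + b2

x = np.array([[0.3]])
samples = np.stack([forward(x) for _ in range(200)])
mean = samples.mean(axis=0)  # predictive mean
std = samples.std(axis=0)    # spread across passes ~ uncertainty
```

The point is not that this is a rigorous posterior, only that a standard DL component (dropout) can be repurposed to answer a question that used to belong to the Bayesian toolbox.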
The idea that people should go back to tools like MCMC today is pretty absurd. That entire research program was a failure and never scaled to anything. I say this of my many dozens of papers in the area too.
I would never give this advice to my PhD students.
Maybe in a decade or two someone will rescue MCMC like methods. In the meantime your PhD students will suffer by being irrelevant and having skills that no one needs.
Edit: not to mention modern statistical mechanics…
Sure, building a better HMC algorithm might not be a good use of time, but the spirit of MCMC is alive.
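One concrete sense in which that spirit survives: score-based diffusion (mentioned above) samples with Langevin dynamics, an MCMC-flavoured update. A minimal sketch, using the known score of a standard normal target rather than a learned network:

```python
import numpy as np

rng = np.random.default_rng(1)

def score(x):
    """grad log p(x) for a standard normal target, N(0, 1)."""
    return -x

# Unadjusted Langevin dynamics:
#   x_{t+1} = x_t + eps * score(x_t) + sqrt(2 * eps) * noise
eps = 0.01
x = rng.normal(size=5000) * 5.0  # start far from the target distribution
for _ in range(2000):
    x = x + eps * score(x) + np.sqrt(2 * eps) * rng.normal(size=x.shape)

# The particle cloud should now look roughly like N(0, 1).
```

In a score-based generative model, the hand-written `score` above is replaced by a trained neural network; the sampler itself is the same MCMC-style loop.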
Unless you already have deep expertise, I think it's a bad idea to pick a research area and just go and research that. You won't have intuition about why it's a good thing to research. However, you can have intuition about real world problems and the solutions you want to see, and then work backwards to what you need to research.
Too easy and you won’t find anything new, too hard and you won’t make any progress.
Research areas are made of problems to solve.
1. pick a product or user problem and try to solve that problem, then work back into the research you need to do so
2. pick a research problem and try to solve that (again, focusing on the problem, not just an area of study)
The blog post seemed to be more for people doing applied things, so I was speaking to those doing option 1.

With access to unfathomable amounts of data, especially over the last couple of years, the game has changed entirely and doesn't seem to be cooling down anytime soon.
The field certainly values engineering a lot more than it used to, and it is exciting to see where major advances, together with open-source contributions, are going to take us.
Look back to how HN greeted the victory of the SuperVision team (Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton) in the 2012 ImageNet Large Scale Visual Recognition Challenge - https://news.ycombinator.com/item?id=4611830
Same for humans, the better the education and more advanced the science, the more we achieve with the same brains. The key ingredient is ideas, both for humans and AI.
The author wrote this article a month ago, and he mentioned where his thinking about DL turned out to be wrong.