
shawntan 3y ago

But "don't try to codify 'insight' into the process" seems to suggest "don't try different approaches". I'm not sure how people can at once trot out the "Bitter Lesson" and interpret it as it is written, but still say "We're not saying not to think about new approaches".

Is the idea then to work only on methods that allow for faster compute of more data?

FWIW, the Transformer runs faster under current parallelisation methods, allowing for dramatic scaling that RNNs will find hard to compete with. But we do pay for that in terms of what can be computed (https://arxiv.org/pdf/2207.00729.pdf - TL;DR: Transformers are limited in the types of programs/functions they can compute because of parallelism).
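A rough sketch of the dependency structure behind that trade-off, using toy scalar models (the weights, the query = key = value choice, and the function names are all assumptions for illustration, not anything from the linked paper): the RNN state at step t needs the state at step t-1, while each attention output only needs the raw inputs, so positions can be computed in any order or all at once.

```python
import math

def rnn_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h, states = 0.0, []
    for x in xs:  # each step needs the previous h, so the loop is inherently sequential
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attend(i, xs):
    """Toy causal self-attention output at position i (query = key = value = x)."""
    scores = [xs[i] * xs[j] for j in range(i + 1)]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * xs[j] for j, w in enumerate(weights)) / total

xs = [0.1, -0.4, 0.9, 0.3]

# Attention outputs depend only on xs, so the evaluation order is irrelevant.
in_order = [attend(i, xs) for i in range(len(xs))]
reversed_order = [attend(i, xs) for i in reversed(range(len(xs)))][::-1]

hidden = rnn_states(xs)  # no equivalent trick here without changing the model
```

The flip side is that a single attention layer never gets the chain of t dependent steps that the recurrence gets for free, which is roughly the kind of limitation being pointed at.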

Scaling, ironically, does seem to be the 'direction of steepest descent' in terms of what will bring the best performance (for now). Gradient descent does find pleasant local optima that may keep us happy for a while.
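To make the local optima point concrete, here is a minimal sketch on an arbitrary 1-D function (the function, learning rate, and step count are assumptions picked for illustration): plain gradient descent settles into whichever basin it starts in, and one of the two basins is clearly worse.

```python
def f(x):
    # Arbitrary quartic with two minima: a shallow one near x = 1.13
    # and a deeper one near x = -1.30.
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=500):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

right = descend(2.0)   # ends in the shallow (local) minimum
left = descend(-2.0)   # ends in the deeper minimum
```

Both runs follow the steepest local direction at every step; only the starting point decides which minimum you end up "happy" in.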


famouswaffles 3y ago

As far as approach is concerned, all the bitter lesson advises against is trying to shoehorn human high level processes into the architecture. There's still plenty of room for different approaches outside of just faster compute.

CNNs and Transformers are very different. Both can be used for computer vision. The bitter lesson wouldn't stop you from switching from one to the other.

shawntan (OP) 3y ago

The scope of "what to try" is large, so we (as a community) should prioritise things that we think would work. If the criterion is not only "faster compute", it would seem "things that mimic human high level processes" would be a good candidate.

We started with MLPs then CNNs were invented, and that brought on pretty large gains. Arguably CNNs are architectures inspired by "human high level processes".

Edit: I will say though, this is a new take on the nuance of "Bitter Lesson" that I've never heard, though even this interpretation I find to be strangely contradictory for the reasons above.
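For a sense of how much structure the MLP-to-CNN step bakes in, a back-of-the-envelope parameter count (the 32x32 RGB input, 16 output channels, and 3x3 kernel are assumed sizes for illustration only):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a 2-D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

h, w, c_in, c_out = 32, 32, 3, 16

# Map a 32x32x3 image to a 32x32x16 feature map, both ways.
dense = dense_params(h * w * c_in, h * w * c_out)
conv = conv2d_params(3, c_in, c_out)

print(dense, conv)
```

The convolution gets there with local connectivity and weight sharing, i.e. a prior about images that someone chose to hard-code.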

famouswaffles 3y ago

>it would seem "things that mimic human high level processes" would be a good candidate.

That's the natural intuition yes. But I believe Sutton's point is that this very intuition seems to prove itself wrong in the long term.

The way I see it, the problem with the high level is that we don't actually know shit. If we knew so completely what it took to model language or vision in the first place, we wouldn't need deep learning at all.

It seems intuitive that trying to bake in some basic grammar rules might speed things along.

The problem with that is that we often end up overfitting the models to those specific rules and constraints, limiting their ability to generalize and learn the more complex underlying patterns and structures in language. Patterns that we don't actually know of.

The low level processes result in the high level performance but not vice versa.

It's said that a single human neuron is equivalent to a CNN. I wouldn't really call the operations of neurons high level though.

