Right.
So where I end up on this, given the examples of intuitions that DO work, is that it's always the _right_ level of prior knowledge that's needed. The intuitions about language (hand-encoding basic grammar) didn't pan out, but the one for vision (CNNs) did. What further levels of intuition could we use to improve even large language models?
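To make the vision example concrete, here's a minimal sketch (PyTorch, with made-up layer sizes) of the prior a CNN encodes: one small kernel shared across every spatial position, so translation equivariance comes for free instead of having to be learned.

```python
import torch
import torch.nn as nn

# Illustrative input only: a single 32x32 RGB image.
x = torch.randn(1, 3, 32, 32)

# The vision "intuition": one 3x3 kernel slid across the whole image.
# Weight sharing bakes translation equivariance in as a prior.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 448 parameters

# A fully connected layer over the same input encodes no such prior
# and would have to learn spatial structure from scratch.
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)
print(sum(p.numel() for p in fc.parameters()))  # ~50M parameters
```

Same input, same output shape; the conv layer just refuses to spend parameters on something we already know about images. That's the level of prior that happened to pan out.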
That, of course, requires experimentation. If it's not speeding up scaling (which should be done regardless), and it's not mimicking human cognition (the Bitter Lesson says no), what do you decide to try? I guess I'm missing what other heuristics there are to use here.
Just looking at the current state of where NLP is going: prompt engineering and its various 'step-by-step' siblings all look motivated by pretty high-level human cognition to me. Shouldn't that go against the Bitter Lesson as well?
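By 'step-by-step siblings' I mean things like chain-of-thought prompting, where the entire trick is appending a cue that mimics how a human would reason through a problem (a hypothetical sketch; the question is the classic bat-and-ball puzzle, and no particular model or API is assumed):

```python
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)

# Plain prompt: the model answers directly.
plain_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompt: same model, same weights; the only change
# is a cue encoding a high-level human problem-solving strategy.
cot_prompt = f"Q: {question}\nA: Let's think step by step."
```

The improvement comes purely from injecting human cognition into the input, which seems like exactly the kind of knowledge-encoding the Bitter Lesson warns against.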
"The Bitter Lesson" feels like an article that was written at a time when the intuitions that went into deep learning have become common-place, and scaling things up get a lot of leverage out of the 'insights' that came before. Once the returns have diminished to a point of saturation, the 'insights' will likely once again be useful, until methods to scale catch up once again, and "The Bitter Lesson 2.0" will be making the rounds.