The heart of the bitter lesson is "don't try to codify 'insight' into the process." It's basically the age-old "you don't know what you don't know."
The Transformer is kind of a perfect example. It boasts algorithmic improvements over RNNs, and LLMs built on it are by far the best-performing take on language modelling ever. And yet the architecture itself contains basically no breakthrough in our understanding of language. It improves on standard RNNs, but not because of any newfound insight into language itself.
Basically, trying to cram high-level human instincts and insights into the process of solving a problem doesn't work better than giving a general architecture tons of data and letting it figure all of that out by itself.