You are confusing the underlying algorithm (prediction improved by gradient-based optimization) with the algorithms that get learned on top of it.
The latter include all the functional relationships between concepts that end up being modeled, i.e. “understood” and applicable. Those complex relationships are what gets learned in order to predict complex phenomena like real conversations and text, which cover just about every sort of concept or experience that people have.
Deep learning architectures don’t just capture associations, correlations, conditional probabilities, Markov chains, etc. They can learn whatever functional relationships are in the data.
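(Technically, neural-network-style models are “universal approximators”: with enough parameters they can approximate any continuous function on a bounded domain to arbitrary accuracy. Whether training actually finds that approximation depends on having enough data and computation.)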
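To make the “universal approximator” point concrete, here is a minimal sketch in plain Python. It assumes a one-hidden-layer ReLU network whose weights are set by hand, not learned, so it only illustrates what such a network can represent, not that gradient descent will find it. The names `build_relu_net` and `max_error` are made up for this illustration.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, a, b, hidden_units):
    """Hand-set (not trained) weights for a one-hidden-layer ReLU net
    that linearly interpolates f between evenly spaced knots on [a, b]."""
    step = (b - a) / hidden_units
    knots = [a + i * step for i in range(hidden_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(hidden_units)]
    # The output weight of hidden unit i is the change in slope at knot i.
    weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden_units)]
    bias = f(a)

    def net(x):
        # Hidden unit i computes relu(x - knots[i]); the output is a weighted sum.
        return bias + sum(w * relu(x - k) for w, k in zip(weights, knots))

    return net

def max_error(net, f, a, b, samples=1000):
    """Largest absolute gap between net and f on an evenly spaced grid."""
    xs = [a + (b - a) * j / samples for j in range(samples + 1)]
    return max(abs(net(x) - f(x)) for x in xs)

# The same simple building block (ReLU) tracks sin(x) more closely as units are added.
for n in (4, 16, 64):
    net = build_relu_net(math.sin, 0.0, 2 * math.pi, n)
    print(n, "hidden units, max error:", max_error(net, math.sin, 0.0, 2 * math.pi))
```

The error shrinks as the hidden layer grows, even though each unit only does “max(0, x - k)”. Nothing about the individual unit tells you which function the whole network ends up computing.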
Your neurons and your mind/knowledge have exactly the same relationship.
Simple learning algorithms can learn complex algorithms. Saying that all they can do is the simple algorithm is very misleading.
It would be like saying logic circuits can only do logic (ANDs, ORs, NOTs) without realizing that, given enough gates and memory, this includes the ability to perform every possible algorithm.
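As a small illustration of the logic-circuit point, here is a sketch that builds integer addition out of nothing but AND, OR and NOT on single bits. The function names and the 8-bit default width are choices made for this example only.

```python
def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def NOT(a):
    return 1 - a

def XOR(a, b):
    # XOR composed only from the three primitive gates.
    return AND(OR(a, b), NOT(AND(a, b)))

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(carry_in, partial))
    return sum_bit, carry_out

def add(x, y, bits=8):
    """Ripple-carry addition of two non-negative integers, modulo 2**bits."""
    carry = 0
    result = 0
    for i in range(bits):
        a = (x >> i) & 1  # bit i of x
        b = (y >> i) & 1  # bit i of y
        sum_bit, carry = full_adder(a, b, carry)
        result |= sum_bit << i
    return result

print(add(5, 3))  # 8
```

None of the three gates “knows” arithmetic, yet wiring them together gives you an adder. The same holds for multipliers, CPUs, and everything that runs on them.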