To use your metaphor, TF-IDF will result in ‘fixed’ weights.
Attention makes it so that the weights of each token can be different in each sequence of tokens. Same token gets different weights depending on who its ‘neighbors’ in the sequence end up being.
The model 'uses' this property to express context-aware dependencies, and that is what lets it handle such a wide variety of natural language problems.
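A toy sketch of the idea: plain scaled dot-product self-attention over random "embedding" vectors, with no learned projections (everything here is made up for illustration). The same 'bank' vector ends up with different attention weights depending on which neighbor it shares the sequence with, whereas a TF-IDF weight for 'bank' would be the same in both.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(seq):
    # Scaled dot-product self-attention weights (no learned Q/K/V
    # projections -- purely illustrative). Row i says how much token i
    # attends to every token in the sequence.
    d = seq.shape[-1]
    scores = seq @ seq.T / np.sqrt(d)
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
bank, river, money = rng.normal(size=(3, 4))  # toy 4-d "embeddings"

# The exact same 'bank' vector, in two different neighborhoods:
w_river = attention_weights(np.stack([bank, river]))
w_money = attention_weights(np.stack([bank, money]))

print(w_river[0])  # 'bank' weights when its neighbor is 'river'
print(w_money[0])  # 'bank' weights when its neighbor is 'money'
```

The two printed rows generally differ, even though the 'bank' vector itself never changed; only its neighbors did. That context sensitivity is exactly what a fixed per-token weight like TF-IDF cannot express.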