Transformers do have coefficients that are fit, but that's a broader notion: fitted coefficients could come from any sort of regression or optimization, and aren't necessarily indicative of biological analogs.
So I think the terms "learned model" or "weights" are malapropisms for Transformers, carried over from deep nets because of structural similarities (like many layers) and a shared development workflow.
The functional units in Transformers' layers have lost their original biological inspiration and functional analog. The core function in Transformers is more like autoencoding/decoding (concepts from information theory) and model/grammar-free translation, with a unique attention-based optimization. Transformers were developed for translation. The magic is something like "attending" to important parts of the translation inputs and outputs as tokens are generated, maybe as a kind of deviation from pure autoencoding, due to the bias from the... learned model :) See, I can't even escape it.
Attention as a powerful systemic optimization is actually the closer bit of neuro/bio-inspiration here, though more from cognitive psychology than micro/neuro-anatomy.
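To make the "attending" idea above concrete, here's a minimal numpy sketch of scaled dot-product attention, the core operation in Transformers. The names Q, K, V and the toy shapes are the standard textbook formulation, not anything from this note; a real Transformer adds learned projections, multiple heads, and masking on top of this.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each output row is a weighted average of V's rows; the weights say
    # how strongly each query token "attends" to each key token.
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity, scaled by sqrt(d)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query tokens, dimension 4
K = rng.normal(size=(5, 4))   # 5 key tokens
V = rng.normal(size=(5, 4))   # 5 value vectors, one per key
out, w = attention(Q, K, V)
print(out.shape)              # one output vector per query: (3, 4)
print(w.sum(axis=-1))         # attention weights per query sum to 1
```

Note there's nothing biologically "neural" in this core step: it's a softmax-weighted lookup, which is part of why the deep-net vocabulary sits a bit awkwardly on Transformers.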
Btw, not only is attention a key insight for Transformers, but it's an interesting biographical note that the lead inventor of it, Jakob Uszkoreit, went on to work on a bio-AI startup after Google.