Well, when you put a sentiment head on a pretrained language model, you kind of have to train that head a bit on the sentiment task, right?
But if the rest of your model is frozen, the head will never see actual words, just contextual vectors from the LM.
It feels like we are in strong agreement but using slightly different terms or something.
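
To make the frozen-LM point concrete, here is a rough toy sketch of what I mean. Everything in it is made up for illustration: the tiny vocabulary, the `frozen_embeddings` table standing in for the pretrained LM (a real LM would give contextual vectors, this just averages fixed ones), and the logistic-regression head. The only thing it is meant to show is that training touches the head's weights and nothing else, and that the head's input is always a vector, never the words themselves.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabulary and vector size, chosen only for illustration.
VOCAB = {"good": 0, "great": 1, "bad": 2, "awful": 3, "movie": 4}
DIM = 4

# Stand-in for the frozen pretrained LM: a fixed table that is never updated.
frozen_embeddings = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def encode(tokens):
    """Frozen 'LM': mean of the fixed token vectors.

    The head only ever receives this output vector, not the tokens.
    """
    vecs = [frozen_embeddings[VOCAB[t]] for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Trainable sentiment head: logistic regression on the encoded vector.
head_w = [0.0] * DIM
head_b = 0.0


def predict(vec):
    z = sum(w * x for w, x in zip(head_w, vec)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


# Tiny made-up sentiment dataset: 1 = positive, 0 = negative.
data = [
    (["good", "movie"], 1),
    (["great", "movie"], 1),
    (["bad", "movie"], 0),
    (["awful", "movie"], 0),
]

# Snapshot of the frozen part, to check afterwards that it was untouched.
embeddings_before = [row[:] for row in frozen_embeddings]


def train(epochs=200, lr=0.5):
    """Gradient descent on the head only; the encoder is never modified."""
    global head_b
    for _ in range(epochs):
        for tokens, label in data:
            vec = encode(tokens)
            err = predict(vec) - label  # gradient of log loss w.r.t. the logit
            for i in range(DIM):
                head_w[i] -= lr * err * vec[i]
            head_b -= lr * err


train()
```

After `train()` runs, `frozen_embeddings` is identical to `embeddings_before`, while `head_w` and `head_b` have moved away from zero. So the head does get trained on the sentiment task, but only ever through the vectors the frozen model hands it.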