I agree that a Transformer is an example of "reflexive" behavior in that it learns to react to a context (via gradient descent rather than evolution as the learning algorithm). It's a conditional categorical distribution on steroids.
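The "conditional categorical distribution" framing is literal: at each step the model maps a context to a vector of logits, and a softmax over those logits is a categorical distribution from which the next token is sampled. A minimal sketch (the vocabulary size and logit values here are invented for illustration):

```python
import math
import random

def softmax(logits):
    # Turn raw scores into a categorical distribution over the vocabulary.
    # Subtracting the max is the standard trick for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a model might emit for a 4-token vocabulary
# after conditioning on some context (values made up for illustration).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

# "Reacting" to the context = sampling from the conditional distribution.
next_token = random.choices(range(len(probs)), weights=probs)[0]
```

The entire learned behavior lives in how the context is mapped to those logits; the sampling step itself is just drawing from a categorical distribution.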
I also agree it's not much different from what's going on in that Pong-playing petri dish.
But I don't think that's a profound statement.
What I'm saying is that calling what a Transformer does "language development" isn't accurate. A Transformer can't "develop" language in that sense; it can only learn "reflexive" behavior from the data distribution it's trained on (it could never have produced that data distribution itself without the data existing in the first place).