I don't think this is necessarily true. Here is an example where researchers trained a transformer to generate legal sequences of moves in the board game Othello, and then demonstrated that the model's internal state did, in fact, contain a representation of the board.
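
For anyone wondering how you would check whether an internal state "has a representation" of something, the usual tool is a probe: a small classifier trained to read a property out of the hidden activations. Below is a toy sketch of that idea, not the setup from the work mentioned above. Everything in it is an assumption for illustration: the activations are synthetic vectors rather than real transformer states, the "property" is a single made-up binary label (think "is this square occupied"), and the names `make_synthetic_activations`, `train_probe`, and `accuracy` are hypothetical.

```python
import random


def make_synthetic_activations(n, dim, seed=0):
    """Fake 'hidden states': each vector encodes a binary label along one
    fixed direction, plus noise. A stand-in for real model activations."""
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    samples = []
    for _ in range(n):
        label = rng.choice([0, 1])
        sign = 1.0 if label == 1 else -1.0
        hidden = [sign * d + rng.gauss(0.0, 0.5) for d in direction]
        samples.append((hidden, label))
    return samples


def predict(w, b, hidden):
    """Linear probe: threshold a weighted sum of the activations."""
    score = sum(wi * hi for wi, hi in zip(w, hidden)) + b
    return 1 if score > 0 else 0


def train_probe(samples, dim, epochs=20, lr=0.1):
    """Fit the probe with the classic perceptron update rule."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for hidden, label in samples:
            err = label - predict(w, b, hidden)
            if err != 0:
                w = [wi + lr * err * hi for wi, hi in zip(w, hidden)]
                b += lr * err
    return w, b


def accuracy(w, b, samples):
    correct = sum(1 for hidden, label in samples if predict(w, b, hidden) == label)
    return correct / len(samples)


DIM = 16
data = make_synthetic_activations(n=600, dim=DIM, seed=0)
train_set, test_set = data[:400], data[400:]

w, b = train_probe(train_set, DIM)
acc = accuracy(w, b, test_set)
print(f"held-out probe accuracy: {acc:.2f}")
```

If a probe like this scores well on held-out data, the property can be decoded from the activations. With a real model you would replace the synthetic vectors with activations recorded from a chosen layer and use labels derived from the actual board state.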