I might be able to learn more by chatting with you.
I think that a trained transformer has fixed weights at inference time and therefore cannot learn anything new during a conversation.
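To make that concrete, here is a minimal sketch (assuming PyTorch and the Hugging Face transformers library, with gpt2 standing in for any causal LM) showing that generating text leaves every parameter untouched; nothing from the exchange is written back into the weights.

```python
# Sketch only: check that generation does not modify a trained model's weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any small causal LM would do
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Snapshot every parameter before generation.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():  # no gradients are computed, and no optimizer ever runs
    model.generate(**inputs, max_new_tokens=20)

# Every parameter is identical afterwards: nothing was learned into the weights.
assert all(torch.equal(before[name], p) for name, p in model.named_parameters())
print("Weights unchanged after generation.")
```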
I think learning is one aspect of reasoning; it shows up in challenges like navigation or puzzle solving, where it matters to learn that a particular route cannot lead to a solution.
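As a toy illustration of what I mean by that kind of learning (all names and the maze here are made up for the example), a depth-first maze solver marks cells it has already explored without success, so later branches never waste effort re-entering routes that have been proven impossible:

```python
# Sketch only: "learning that a route is impossible" as a mark kept during search.
MAZE = [
    "S.#.",
    ".#..",
    "...#",
    "#..G",
]
ROWS, COLS = len(MAZE), len(MAZE[0])

def solve(pos, path, visited):
    r, c = pos
    if MAZE[r][c] == "G":
        return path
    visited.add(pos)  # mark this cell so no later branch re-enters it
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if (0 <= nr < ROWS and 0 <= nc < COLS
                and MAZE[nr][nc] != "#" and (nr, nc) not in visited):
            result = solve((nr, nc), path + [(nr, nc)], visited)
            if result:
                return result
    # Every continuation from this cell failed; the mark above remembers that.
    return None

start = (0, 0)
print(solve(start, [start], set()))
```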
I also think that because the model produces each output token in a single fixed-depth forward pass, cyclic reasoning isn't feasible, and that conditioning the output by asking the model to "think", when that thinking happens through the same forward passes, rules out genuine logical processes. The model isn't thinking in that case; the probabilities of the final part of the output are simply conditioned by requiring a longer initial output.
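Here is a rough sketch of that last point (same assumptions as before: PyTorch, Hugging Face transformers, gpt2 as a stand-in). The model only ever computes P(next token | tokens so far), so a "think" preamble changes the conditioning context, and with it the later probabilities, but each token still comes out of the same fixed-depth pass.

```python
# Sketch only: a "think" preamble changes the conditioning context, not the computation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def next_token_distribution(prompt):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # one fixed-depth forward pass over the context
    return torch.softmax(logits[0, -1], dim=-1)   # P(next token | prompt)

plain = next_token_distribution("The answer is")
with_think = next_token_distribution(
    "Let's think step by step. Two plus two is four, so the answer is"
)

# Same weights, same depth of computation; only the context (and therefore the
# next-token probabilities) differs between the two calls.
for label, dist in (("plain", plain), ("with 'think' preamble", with_think)):
    top = int(torch.argmax(dist))
    print(f"{label}: next token = {tokenizer.decode([top])!r}, p = {float(dist[top]):.3f}")
```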