The argument is that letter-level information is something LLMs never get a chance to see.
It's a bit like asking a human to read a text and guess the gender or emotional state of the author who wrote it. You just don't have that information.
Similarly, you could ask why ":) is smiling and :D is happy", where the model sees the question as "[50372, 382, 62529, 326, 712, 35, 382, 7150]". The encoding loses the visual shape of the characters, which is only visible when the text is rendered as an image.
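To make the point concrete, here is a minimal sketch of what a model receives instead of letters. It assumes a tiny hypothetical vocabulary with made-up IDs (not the real tokenizer behind the IDs quoted above) and a greedy longest-match rule, which is a simplification of how real subword tokenizers split text.

```python
# Hypothetical toy vocabulary: the IDs are arbitrary, as they are in real vocabularies.
VOCAB = {
    ":)": 9001,
    ":D": 9002,
    " is": 318,
    " smiling": 20473,
    " and": 290,
    " happy": 3772,
    " ": 220,
}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization against the toy VOCAB."""
    max_len = max(len(piece) for piece in VOCAB)
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i += length
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids


ids = tokenize(":) is smiling and :D is happy")
print(ids)  # [9001, 318, 20473, 290, 220, 9002, 318, 3772]
```

The two emoticons come out as 9001 and 9002, numbers with nothing in them describing a curved versus an open mouth, and " smiling" arrives as a single ID that does not expose its seven letters. Anything about shape or spelling has to be learned indirectly from context.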