The problem is that the input is tokenized before the model ever sees it. The model does not receive the individual letters "t" and "o"; it gets a single token, #1462. The word "toe" is likewise a single token, #44579. In principle the model could learn from context that inputs beginning with token #44579 also satisfy the constraint of beginning with token #1462, but that is a lot of statistical work, and it is not going to happen for every combination of letters.
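A toy sketch of why this happens (the vocabulary and the matching rule here are hypothetical, but the IDs echo the ones above, and real BPE tokenizers behave in the same spirit): greedy longest-match tokenization swallows "toe" whole, so the output for "toe" shares no token with the output for "to", even though one string is a prefix of the other.

```python
# Hypothetical toy vocabulary: each entry maps a string piece to an opaque ID.
VOCAB = {"to": 1462, "toe": 44579, "e": 7}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization, loosely in the spirit of BPE.

    The model downstream sees only the integer IDs this returns, never
    the characters they were built from.
    """
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                tokens.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

print(tokenize("to"))   # one opaque ID for "to"
print(tokenize("toe"))  # a *different* single ID; no "to" token appears in it
```

From the model's point of view, `[1462]` and `[44579]` are just two unrelated integers; nothing in the input encodes that one word starts with the letters of the other.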