In general, a model has to learn to affirmatively say "I don't know": that answer has to be emitted as actual tokens, not left implicit in the negative space of a weak, flat next-token distribution. The softmax also normalizes the logits into a probability distribution that always sums to 1, so even when no option is any good (all next tokens suck) the sampler still picks one of the bad choices, and autoregressive decoding then locks the model into a continuation built on that first bad pick.
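
A minimal sketch of that normalization effect, assuming plain NumPy and made-up logit values: uniformly low logits still come out of softmax as a valid distribution, so sampling commits to some token regardless.

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Uniformly low logits: the model "likes" none of these tokens.
bad_logits = np.array([-9.1, -9.3, -9.0, -9.2])
probs = softmax(bad_logits)
print(probs)  # roughly uniform, ~[0.26, 0.21, 0.29, 0.24] -- sums to 1

# Sampling still commits to one of the bad options, and every
# later token is then conditioned on that arbitrary first pick.
rng = np.random.default_rng(0)
token = rng.choice(len(probs), p=probs)
```

Note how the absolute badness of the logits disappears after normalization; only their relative differences survive, which is why a model can't signal "none of the above" through the distribution alone.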