It's a language model. It assigns probabilities to tokens in a sequence. You give it a number of options and it responds with the one that it assigns the highest probability to. If there's nothing in the options you give it that makes sense in the context of your test phrase, then it will return something that doesn't make sense. If some of your options make sense, it might return something that makes sense, or not.
So if you put it in a situation where nothing it outputs makes sense (to you) then none of its output will make sense. But that's not fair to the poor model.