No need. Just add one more correction to the system prompt.
It's amusing to see hardcore believers in this tech doing mental gymnastics and attacking people whenever evidence of there being no intelligence in these tools is brought forth. Then the tool is "just" a statistical model, and clearly the user is holding it wrong, doesn't understand how it works, etc.
And why should a "superintelligent" tool need to be optimized for riddles to begin with? Do humans need to be trained on specific riddles to answer them correctly?
If you don't recognise the problem and actively engage your "System 2" brain, it's very easy to leap to the obvious (but wrong) answer. That doesn't mean you're not intelligent or can't work it out if someone points out the problem. It's just that the heuristics you've been trained to adopt betray you here, and that's really not so different a problem from what's tricking these LLMs.
It may trigger a particularly ambiguous path in the model's token weights, or whatever the technical explanation for this behavior is, and that can certainly be addressed in future versions. But what it exposes is that there's no real intelligence here. For all its "thinking" and "reasoning", the tool is incapable of arriving at the logically correct answer unless it was specifically trained for that scenario, or happens to arrive at it by chance. This is not how intelligence works in living beings. Humans don't need to be trained on specific cognitive tasks in order to perform well at them, and our performance is not random.
But I'm sure this is "moving the goalposts", right?