While the results were not surprising, I found it interesting that the number "69" was suppressed in the output, so not even this kind of mathematical question escapes GPT censorship.
It appears that recognizing the effects of censorship is the easiest way to distinguish answers generated by an "AI" from those generated by a human.
Some people asked LLMs to OCR historical documents from the 19th century, and any reference to "negro" was either completely ignored or replaced by "black".
And it goes further: ChatGPT & co. are unable to answer any question about US slavery correctly because their knowledge graphs route around any mention of "negro".
Well, I'm "some people", and just tried it with Opus 4.6 and GPT-5.5, and neither had any problem at all.
The linked article is from research done more than 4 years ago. If you're basing your idea of what LLMs can or can't do on what they could or couldn't do in 2022, well, good luck to you.
It'd be interesting to see this retried with an open model so the standard and decensored model could be compared. That'd be a clue about whether the model is avoiding it because it actively recognises the innuendo or if something else is going on.
That's what you'd expect. But we don't know for sure why GPT-4.1 chooses 69 only a quarter as often as a random dice roll would. And we don't know if this quirk is reverted by 'uncensoring' a trained model.
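For anyone who wants to run that comparison, here is a rough sketch of how the "quarter as often" figure could be checked for any model. Everything in it is an assumption on my part: the 1 to 100 range, the function names, and the sample list, which is made up purely to exercise the code and is not real model output.

```python
from collections import Counter
from math import comb


def relative_frequency(samples, target, low=1, high=100):
    """How often `target` appears, relative to a uniform draw from low..high.

    1.0 means "as often as chance", 0.25 means "a quarter as often".
    The 1..100 range is an assumption, not something the thread states.
    """
    n_values = high - low + 1
    return Counter(samples)[target] * n_values / len(samples)


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer hits by luck."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Made-up sample: every number 1..100 four times, then three of the four
# 69s swapped for 42, so 69 shows up once in 400 draws.
samples = [n for n in range(1, 101) for _ in range(4)]
for _ in range(3):
    samples[samples.index(69)] = 42

ratio = relative_frequency(samples, 69)
p_value = prob_at_most(samples.count(69), len(samples), 1 / 100)
print(f"69 appears at {ratio:.2f}x the uniform rate (one-sided p = {p_value:.3f})")
```

Running the same two functions on samples from a standard model and from its decensored variant would show whether the gap closes. Note that with only a few hundred draws a quarter-rate result can still happen by luck fairly often, so a much larger sample would be needed before reading anything into it.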