These bots can be interrogated at scale, so in the end their innermost flaws become known. Imagine if you were fed a truth serum and then questioned by anyone who wanted to find flaws in your thinking or trick you into saying something offensive.
It's an impossibly high bar. Personally, I don't like what OpenAI has done with this chatbot: you only get the final output, so it just looks like some lame PC version of a GPT. That basically sets it up to be manipulated, the same way you might try to get the goody-goody kid to say a swear word.
Much cooler would have been to add some actual explainability: show more about why it says what it says, or what sets off its censorship filter, so you can understand how it's actually working. That would be far more useful than just worrying it might (or trying to get it to) say something its creators didn't want it to. A rough sketch of what I mean follows.
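To make that concrete, here's a minimal sketch, not anything OpenAI actually exposes in the chat interface. The /v1/moderations endpoint and its flagged / categories / category_scores fields are OpenAI's public moderation API, but wiring that readout up next to a chatbot's refusals is purely my hypothetical illustration of "show me what tripped the filter":

```python
import os
import requests

def explain_filter(text: str) -> None:
    """Show which moderation categories a prompt trips, and how strongly.

    Calls OpenAI's public moderation endpoint; displaying these scores to the
    end user alongside a refusal is the hypothetical part.
    """
    resp = requests.post(
        "https://api.openai.com/v1/moderations",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"input": text},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["results"][0]

    if not result["flagged"]:
        print("Not flagged by the moderation filter.")
        return

    # Sort per-category scores descending so the user sees *why* it was flagged,
    # instead of just getting a blanket refusal.
    scores = sorted(result["category_scores"].items(), key=lambda kv: -kv[1])
    for category, score in scores:
        marker = "<-- triggered" if result["categories"][category] else ""
        print(f"{category:25s} {score:.3f} {marker}")

explain_filter("some prompt the chatbot refused to answer")
```

Even something this crude would turn "it won't say that" from a guessing game into a readout you can reason about.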