These bots can be interrogated at scale, so in the end their innermost flaws become known. Imagine if you were fed a truth serum and then questioned by anyone who wanted to find flaws in your thinking or trick you into saying something offensive.
It's an impossibly high bar. Personally, I don't like what OpenAI has done with this chatbot: you only get the final output, so it just looks like some lame PC version of a GPT. That basically sets it up to be manipulated, the same way you might try to get the goody-goody kid to say a swear word.
Much cooler would have been to add some actual explainability: show more about why it says what it says, or what sets off its censorship filter, so you can understand how it's actually working. That would be far more useful than just worrying it might (or trying to get it to) say something its creators didn't want it to. A rough sketch of what I mean follows.
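To make that concrete, here's a minimal sketch, not anything OpenAI actually exposes in the chat interface. The /v1/moderations endpoint and its flagged / categories / category_scores fields are OpenAI's public moderation API, but wiring that readout up next to a chatbot's refusals is purely my hypothetical illustration of "show me what tripped the filter":

```python
import os
import requests

def explain_filter(text: str) -> None:
    """Show which moderation categories a prompt trips, and how strongly.

    Calls OpenAI's public moderation endpoint; displaying these scores to the
    end user alongside a refusal is the hypothetical part.
    """
    resp = requests.post(
        "https://api.openai.com/v1/moderations",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"input": text},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()["results"][0]

    if not result["flagged"]:
        print("Not flagged by the moderation filter.")
        return

    # Sort per-category scores descending so the user sees *why* it was flagged,
    # instead of just getting a blanket refusal.
    scores = sorted(result["category_scores"].items(), key=lambda kv: -kv[1])
    for category, score in scores:
        marker = "<-- triggered" if result["categories"][category] else ""
        print(f"{category:25s} {score:.3f} {marker}")

explain_filter("some prompt the chatbot refused to answer")
```

Even something this crude would turn "it won't say that" from a guessing game into a readout you can reason about.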