Personally, I think an AI capable of breaking out of the box, or even of possessing the "desire" to, would need to be far more intelligent and complex than anything we're getting for the foreseeable future (the experiment assumes a motivated AI with almost limitless knowledge of the world and almost limitless cleverness), so this thought experiment may not be all that relevant. I do agree with him that the boxing approach is not a robust one, though.
Or you can avoid being exposed to it in the first place. If you think you already know all the techniques an AI might use against you, you're less likely to do that.
The point of the experiment isn't "let's work out how an AI might try to persuade us to let it out". It's "even a human intelligence can persuade people who think they could never be persuaded; do you really trust yourself to do better against a superhuman one?"
If you don't know why the gatekeeper failed, it's harder to come up with bullshit reasons why you would have succeeded in that position.
Edit: huh, downvotes? Yudkowsky thinks there are certain things an AI could say that should not be known. I think that's why he doesn't want to publish the dialogues: it would give the AI a public communications channel. While the AI is fictional, it could talk about a hypothetical future real self... Instead of promising something to get itself out of jail, the fictional AI could say something to make you make it real. Anyway, if it's over your head, fine, but why downvote just because you don't understand something?
Edit2: Sometimes I wonder if I already have my own personal Hacker News AI that automatically downvotes everything I write...
Besides, releasing a successful log might be a bad idea for other reasons. Think about how you'd play this game as the AI. You wouldn't go looking for a general-purpose mindfuck, because there's probably no such thing. Instead, you'd probably spend about a month gathering real-life information about the gatekeeper's history, family, weaknesses, etc. You'd read books on manipulation and sales techniques, and pick the strongest ones you could find. You'd brainstorm possible tactics and run tests. At the end of the month you'd have a four-hour script of every unfair move you could use against that specific person, arranged in the most effective order. (That's why it's a bad idea to play this game with friends.) Do you really want that information released? And if you know ahead of time that it will be released, won't that limit your effectiveness?