All they essentially did was tell the LLM to test and verify whether the answer is correct with a prompt like the following:
>"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected."
Now whether this is true, I don't know, but I think talking about this kind of stuff is cool!