The 'gotcha' part in this case was that the SSL keygen code did not just use a random number seed but was grasping for entropy from other sources too (PIDs, perhaps? I can't recall the detail). That unknown made initial attempts to recreate the problem difficult.
Plus the failing test in question was not directly testing the SSL, merely making use of it. There were other SSL-specific tests run separately but they missed this strange corner case (in 'normal' use, it wasn't like 1 in 256 transactions would fail, otherwise that would have been much more obvious and we'd have had spurious-seeming failures all over the place).
That brings up another test pain: You can write lots of specific unit tests for every individual feature your code has, and they can all pass just fine. But when feature A, D and H happen to all be in use at once, you hit a separate problem. Onwards to 100% coverage...