We know that the combination of all three results in finding lots of security vulnerabilities. That's what Mozilla is talking about. The quote from the curl story states that just 2 and 3, but with just regular SotA models, would have produced very similar results
Which is really the crux of all this hype around Mythos: would the results really be different if they used Claude Opus instead of Claude Mythos? How much is the model, how much the harness, and how much is just because Anthropic is running a big campaign systematically trying to find vulnerabilities?