There are two things mixed together here. There is targeted scanning that was done by both Anthropic and Mozilla employees, using first Opus and then Mythos. Then there are other non-employee security researchers using AI to find and file bugs, motivated mostly by bug bounties.
The researchers were filing a steady trickle of bugs presumably using Opus 4.6. (Or rather, I saw a steady trickle after other people triaged them; I imagine the incoming stream was a lot busier.) My impression is that those have mostly dried up now. That could be the bias in my sample (I only see a slice of incoming bugs, so my anecdata aren't that strong), or a result of the restrictions added to the generally available models, or a result of there being less to find now that we've fixed so many of the issues found by company-backed bughunts. Or a combination of all three.
I guess my opinion is mostly driven by the difference in the quality and magnitude of bugs coming in from the company-backed scans pre- and post-Mythos. With Opus, there was an initial rush, but then it mostly died down. (For our group. For other groups, it was a series of waves that they never quite made it over before the next one came crashing in.) With Mythos, it was a larger wave and the quality of the bugs was higher. Two quantitative differences that ended up feeling like a qualitative change. So it's my underinformed personal opinion, but to me it feels like: yes, you could continue to find more bugs using a roughly Opus 4.6-strength model, but not that many and not cheaply, and the success rate is going to depend a lot on the harness. In comparison, I don't think we've seen the end of the Mythos wave, and my sense is that Mythos requires much less in the way of a harness.
It feels like the bitter lesson is playing itself out again, which I kinda hate because I want human ingenuity and cleverness to make an important difference, even after the next model has seen what the humans are coming up with.