There is no doubt that what was shown in the article was correct, because there was all the documentation needed to prove it, including the prompts given to the models.
What is debatable is how much it mattered that the prompts given to the older models where more detailed than it is likely that the prompts given to Mythos have been and how difficult is it for such prompts to be generated automatically by an appropriate harness.
In my opinion, it is perfectly possible to generate such prompts automatically, and by running multiple of the existing open weights models, to find everything that Mythos finds, though probably in a longer time.
Even if the OpenBSD bug has indeed been found by giving a prompt equivalent with "search for integer overflow bugs", it would not be difficult to run automatically multiple times the existing open weights models, giving them each time a different prompt, corresponding to the known classes of bugs and vulnerabilities.
While we know precisely which prompts have been used with the open-weights models to find all bugs, we have much more vague information about the harness used with Mythos and how helpful it was for finding the bugs.
Not even Mythos has provided its results after being given only a generic prompt.
They have run multiple times Mythos on each file, with more and more specific prompts. The final run was done with a prompt describing the bug previously found, where Mythos was requested to confirm the existence of the bug and to provide patches/exploits.
See: https://red.anthropic.com/2026/mythos-preview/
So the authors of that article are right, that for finding bugs an appropriate harness is essential. Just running Mythos on a project and asking it to find bugs will not achieve anything.