What it highlights, is that Mythos doesn't seem so much better than other LLM driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.
“Mythos isn’t supposed to be that good at security, because actually Anthropic was referring more to running LLMs than to Mythos specifically”
“The Opus model is worse because they have no compute, because they are training Mythos. The degraded performance is justified!”
“All the bugs in Claude Code are just because the models are so good they are just looping and shipping fast”
I constantly see people crawl out of the woodwork to defend a trillion-dollar company that overhypes every press release it puts out.
No, what others are doing, which I've done myself in the past too, is evaluating how well their marketing matches up with reality, then sharing our experience of that. That's very different from just "putting too much weight on marketing".
Funnily enough that was while Dario Amodei was their research director.
I do think they've said similar things in the past, but regardless Anthropic's BS marketing is something to behold and viewing it with extreme skepticism is smart.
> What it highlights, is that Mythos doesn't seem so much better than other LLM driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.
That's the conclusion Daniel draws, and it definitely seems plausible; his opinion carries a lot of weight with me.
But I hedge a little because we don't really know how much human labor was required to supplement those earlier LLM-assisted reviews of curl, nor do we know how easy it was for the person who used Mythos to generate the new batch. So the kind of bug hunting that is "possible but still labor intensive" with current tooling might be far easier for less skilled developers to accomplish using Mythos.
And who knows, maybe Mythos does better on worse codebases; curl benefits from being very good to start with :)