Our example instruction is here: https://github.com/QuesmaOrg/BinaryAudit/blob/main/tasks/lig...
> However, [the approach of using AI agents for malware detection] is not ready for production.
Then the methodology does not support that. It's "the approach of using AI agents for malware detection with next to zero documentation or guidance is not ready for production."
Agree it is a good test to try, but there are huge benefits to being able to understand (or better, recreate) 0-conf tests.
The question we asked is if they can solve a problem autonomously, with instructions that would be clear for a reverse engineering specialist.
That said, I found these useful for many binary tasks - just not (yet) the end-to-end ones.