botusaurus · 1mo ago

Have you tried stuffing a whole set of tutorials on how to use Ghidra into the context, especially for models with a 1M-token context like Gemini?
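For readers wondering what "stuffing tutorials into the context" might look like in practice, here is a minimal sketch. It is not what the benchmark authors did. The function names, the `(title, body)` document shape, and the 4-characters-per-token estimate are all assumptions for illustration; a real setup would use the model's own tokenizer.

```python
from typing import Callable, Iterable


def rough_token_count(text: str) -> int:
    """Very rough estimate: about 4 characters per token.

    This is an assumption for illustration, not a real tokenizer.
    """
    return max(1, len(text) // 4)


def pack_context(
    docs: Iterable[tuple[str, str]],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> tuple[str, list[str]]:
    """Greedily add (title, body) docs in order while they fit the budget.

    Returns the assembled context string and the titles that did not fit.
    """
    parts: list[str] = []
    skipped: list[str] = []
    used = 0
    for title, body in docs:
        section = f"## {title}\n{body}\n"
        cost = count_tokens(section)
        if used + cost <= budget_tokens:
            parts.append(section)
            used += cost
        else:
            # Too large for the remaining budget; later, smaller docs may still fit.
            skipped.append(title)
    return "\n".join(parts), skipped
```

The returned string would be prepended to the task instruction. The list of skipped titles makes it visible when the budget silently drops a tutorial.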

stared · 1mo ago

No. To give it a fair test, we didn't tinker with model-specific context engineering. Adding skills, examples, etc. is very likely to improve performance. So is any interactive feedback.

Our example instruction is here: https://github.com/QuesmaOrg/BinaryAudit/blob/main/tasks/lig...

anamexis · 1mo ago

Why, though? That would make sense if you were just trying to do a comparative analysis of different agents' ability to use specific tools without context, but if your thesis is:

> However, [the approach of using AI agents for malware detection] is not ready for production.

Then the methodology does not support that. It's "the approach of using AI agents for malware detection with next to zero documentation or guidance is not ready for production."

ronald_petty · 1mo ago

Not the author. Just my thoughts on supplying context during tests like these. When I do tests, I am focused on "out of the box" experiences. I suspect the vast majority of actors (good and bad, junior and senior) will use out of the box more than they will try to affect the outcome through context engineering. We do expect tweaking prompts to provide better outcomes, but that also requires work (for now). Maybe another way to think about it is reducing system complexity by starting at the bottom (no configuration) before moving to the top (more configuration). We can't even replicate out-of-the-box results today, much less any level of configuration (randomness is going to random).

Agree it is a good test to try, but there are huge benefits to being able to understand (and better recreate) 0-conf tests.

stared · 1mo ago

You can solve any problem with AI if you give enough hints.

The question we asked is whether they can solve a problem autonomously, with instructions that would be clear to a reverse engineering specialist.

That said, I found these useful for many binary tasks - just not (yet) the end-to-end ones.

decidu0us9034 · 1mo ago

All the docs are already in its training data, so wouldn't that just pollute the context? I think giving a model better/non-free tooling would help, as mentioned. Binja code mode can be useful, but you definitely need to give these models a lot of babysitting and encouragement, and their limitations show with large binaries or functions. But sometimes, if you have a lot to go through and just need some starting point to triage, false positives are fine.
