What would be really interesting is if the LLM had the ability to write a proof of concept that actually exploits the vulnerability. Then you could filter out false positives by asking it to write a PoC and running that PoC under ASan or similar to get a deterministic crash. Sort of like what Google was doing with the theorem-proving work, where an LLM came up with candidate proofs, but a deterministic checker then evaluated whether those proofs were actually valid.
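To make the idea concrete, here's a rough sketch of what that triage loop might look like. Everything here is hypothetical (`generate_poc` stands in for the LLM call, `run_target` for a sandboxed run of the ASan-instrumented binary); the only real detail is that ASan prints lines like `ERROR: AddressSanitizer: heap-buffer-overflow` to stderr on a detected bug:

```python
# Markers ASan/LSan emit on stderr when they detect a real error.
SANITIZER_MARKERS = (
    "ERROR: AddressSanitizer",
    "ERROR: LeakSanitizer",
    "SUMMARY: AddressSanitizer",
)

def is_confirmed_crash(returncode: int, stderr: str) -> bool:
    """Count the PoC as confirmed only if the sanitizer itself reported
    an error; a nonzero exit alone might just mean the PoC was broken."""
    return returncode != 0 and any(m in stderr for m in SANITIZER_MARKERS)

def triage(report, generate_poc, run_target, max_attempts=5):
    """Hypothetical loop: ask the LLM for a PoC, run it against the
    ASan-instrumented target, keep the finding only on a deterministic
    crash. After max_attempts failures, treat it as a likely false
    positive (or at least unconfirmed)."""
    for _ in range(max_attempts):
        poc = generate_poc(report)          # LLM call (placeholder)
        result = run_target(poc)            # sandboxed execution (placeholder)
        if is_confirmed_crash(result.returncode, result.stderr):
            return poc                      # true positive, with evidence
    return None
```

The nice property is the same as in the proof-checking setup: the expensive, unreliable generator only has to propose candidates, and a cheap deterministic oracle does the accepting.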
Of course, if you try to do that for every potential false positive, it's going to take a _lot_ of tokens. But then we already spend a lot of CPU cycles on fuzzing, so depending on how long you let the LLM churn on trying to get a PoC, maybe it's still reasonable.