
0 points · Aerroon · 1d ago · 0 comments

I think these workflows can be really interesting to read about. The other week I read a Reddit post about how someone got Qwen3.5 35B-A3B to go from 22.2% to 37.8% on the 45 hard problems of SWE-bench Verified (Opus 4.6 gets 40%).

Essentially, all they did was tell the LLM to test and verify that its answer is correct, with a prompt like the following:

>"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected."

Now whether this is true, I don't know, but I think talking about this kind of stuff is cool!
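
To make the idea concrete, here is a rough sketch of what such a verify-after-edit step might look like in an agent harness. This is not the setup from the Reddit post. The names `verification_prompt` and `run_inline_check` are hypothetical, and replacing the "X" in the prompt with a file path is an assumption about how the placeholder gets filled.

```python
import subprocess
import sys
from typing import Tuple

# Prompt text quoted from the post; "{path}" stands in for the "X" placeholder
# (an assumption about how the harness fills it in).
VERIFY_TEMPLATE = (
    "You just edited {path}. Before moving on, verify the change is correct: "
    "write a short inline python -c or a /tmp test script that exercises the "
    "changed code path, run it with bash, and confirm the output is as expected."
)


def verification_prompt(path: str) -> str:
    """Build the follow-up message to send to the model after it edits `path`."""
    return VERIFY_TEMPLATE.format(path=path)


def run_inline_check(snippet: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run a short Python snippet in a subprocess, like `python -c "<snippet>"`.

    Returns (passed, combined_output), where passed means exit code 0.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output
```

A harness could send `verification_prompt("utils.py")` after each edit, then feed the output of whatever check the model writes back into the conversation so it can see whether its change actually works.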


Aerroon (OP) · 6h ago

I forgot to give a link to the post: https://www.reddit.com/r/LocalLLaMA/comments/1rkdlqi/qwen353...
