undefined | Better HN

0 pointsminimaxir19d ago0 comments

Modern agents can write tests that are meaningful and require the agent to pass them with any change to avoid regressions. Humans can review the code/test the downstream application to ensure it works as intended.

It rarely takes a single prompt to get something done, but the agents can figure out as long as the human is specific about what constitutes accurate.

0 comments

vrighter18d ago

you can't use the thing that doesn't quite work to ensure that the output of the thing that doesn't quite work works

minimaxirOP18d ago

It's counterintuitive yes, but you can. You can just look at the tests to ensure they're consistent and with the latest models that has always been the case, and it's very rare that the models try to cheat the tests.

vrighter17d ago

you can convince yourself it can, by all means. But it doesn't make it true. In fact we even have this rule for apecoding: a developer cannot review their own code.

_wire_19d ago

Via Doctorow a day ago:

https://singletrackworld.com/2018/01/collision-course-why-th...

apothegm18d ago

Something about cycling?

j / k navigate · click thread line to collapse