Better HN

gaigalas 5mo ago

> Ask it to write tests that maximise code coverage

That is significantly harder to do than writing an implementation from tests, especially for codebases that previously didn't have any testing infrastructure.


skissane 5mo ago

Give a coding agent a codebase with no tests and tell it to write some, and it will; if you don’t tell it which framework to use, it will just pick one. There’s no denying you’ll get much better results if an experienced developer gives it some guidance on how to test than if you just let it decide for itself.

joshstrange 5mo ago

This is a hilariously naive take.

If you’ve actually tried this and actually read the results, you’d know it does not work well. It might write a few decent tests, but get ready for an impressive number of tests and cases with no real coverage.

I did this literally 2 days ago, and it churned for a while and spit out hundreds of tests! Great news, right? Well, no: they did stupid things like “Create an instance of the class (new MyClass), now make sure it’s the right class type”. It also created multiple tests that built maps and then asserted that the values existed and matched… matched the maps it had created in the test… without ever touching the underlying code it was supposed to be testing.
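To make that failure mode concrete, here is a minimal Python sketch of the two kinds of test. `count_words` is a hypothetical function standing in for the code under test; it does not come from the thread.

```python
from collections import Counter


def count_words(text: str) -> dict[str, int]:
    """Hypothetical code under test: counts whitespace-separated words, case-folded."""
    return dict(Counter(text.lower().split()))


def test_tautological():
    # Builds a map and asserts against that same map. It passes no matter
    # what count_words does, because count_words is never called.
    expected = {"a": 2, "b": 1}
    assert expected["a"] == 2
    assert expected == {"a": 2, "b": 1}


def test_meaningful():
    # Calls the code under test and checks its behaviour, including the
    # case-folding, which the tautological test can never catch.
    assert count_words("A b a") == {"a": 2, "b": 1}
    assert count_words("") == {}
```

Both tests pass, but only the second would fail if `count_words` were broken, and only the second contributes any real coverage.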

I’ve tested this on new codebases, old codebases, and vibe-coded codebases. The results vary slightly, and you absolutely can use LLMs to help write tests, no doubt, but “just throw an agent at it” does not work.

lsaferite 5mo ago

This highlights something I wish were more prevalent: path coverage. I'm not sure which testing suites handle path coverage, but I know Xdebug for PHP could manage it back when I was doing PHP work. Simple line coverage doesn't tell you enough of the story, while path coverage should let you be sure you've tested every code path of a unit. Combine that with input fuzzing and you should be able to develop comprehensive unit tests for the critical units in your codebase. Yes, I'm aware that's just one part of a larger puzzle.
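A small illustration of why line coverage can overstate what was tested, assuming a hypothetical unit with two independent `if`/`else` branches. Two inputs execute every line, yet only two of the four possible paths. The path tracking is done by hand here purely for illustration; it is not how Xdebug or any other tool actually measures paths.

```python
from itertools import product


def shipping_cost(weight_kg: float, express: bool, taken: list[str]) -> float:
    """Hypothetical unit with two independent branches.

    `taken` records which side of each `if` ran, so the path is visible.
    """
    cost = 5.0
    if weight_kg > 10:
        cost += 10.0
        taken.append("heavy")
    else:
        taken.append("light")
    if express:
        cost *= 2
        taken.append("express")
    else:
        taken.append("standard")
    return cost


def paths_exercised(cases: list[tuple[float, bool]]) -> set[tuple[str, ...]]:
    """Return the set of branch paths hit by the given (weight, express) inputs."""
    seen = set()
    for weight, express in cases:
        taken: list[str] = []
        shipping_cost(weight, express, taken)
        seen.add(tuple(taken))
    return seen


# Every combination of branch outcomes: 2 x 2 = 4 paths.
ALL_PATHS = set(product(["heavy", "light"], ["express", "standard"]))

# These two cases execute every line of shipping_cost (full line coverage)...
line_covering_cases = [(20.0, True), (1.0, False)]
covered = paths_exercised(line_covering_cases)
# ...but the heavy+standard and light+express paths are never taken.
missing = ALL_PATHS - covered
print(f"{len(covered)}/{len(ALL_PATHS)} paths covered, missing: {sorted(missing)}")
```

With more independent branches the gap grows quickly (n independent `if`s give up to 2^n paths), which is where generated or fuzzed inputs become useful.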

skissane 5mo ago

But did you actually give the agent access to a tool that measures code coverage?

If it can't measure whether it is succeeding in increasing code coverage, it's no wonder it doesn't do a great job of increasing it.

Also, it can help to have a pair of agents (which could even be two instances of the same agent with different prompts): one to write tests and one to review them. The test-writing agent writes tests and submits them as a PR; the PR-reviewing agent reads the PR and provides feedback; the test-writing agent updates the tests in response to the feedback; iterate until the PR-reviewing agent is satisfied. This can produce much better tests than a single agent writing tests without any automated review process.
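The write/review cycle described above can be sketched as a simple loop. This is a minimal illustration, not any particular agent framework: `write_review_loop`, `Review`, and the toy writer and reviewer are all hypothetical, and in practice each callable would wrap an LLM call with its own prompt. A round limit is added as an assumption so the loop terminates if the reviewer is never satisfied.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    approved: bool
    comments: list[str] = field(default_factory=list)


# Hypothetical agent interfaces.
WriteTests = Callable[[str, list[str]], str]  # (previous PR, feedback) -> updated PR
ReviewTests = Callable[[str], Review]         # PR -> review


def write_review_loop(writer: WriteTests, reviewer: ReviewTests,
                      max_rounds: int = 5) -> tuple[str, int, bool]:
    """Alternate writer and reviewer until approval or max_rounds is reached.

    Returns (final PR text, rounds used, whether the reviewer approved).
    """
    pr, feedback = "", []
    for round_no in range(1, max_rounds + 1):
        pr = writer(pr, feedback)
        review = reviewer(pr)
        if review.approved:
            return pr, round_no, True
        feedback = review.comments
    return pr, max_rounds, False


# Toy stand-ins for the two agents, for illustration only.
def toy_writer(previous_pr: str, feedback: list[str]) -> str:
    tests = previous_pr.splitlines() or ["test_constructs_instance"]
    if "tests never call the code under test" in feedback:
        tests.append("test_parse_calls_real_code")
    return "\n".join(tests)


def toy_reviewer(pr: str) -> Review:
    if "test_parse_calls_real_code" in pr:
        return Review(approved=True)
    return Review(approved=False, comments=["tests never call the code under test"])


pr, rounds, approved = write_review_loop(toy_writer, toy_reviewer)
print(f"approved={approved} after {rounds} round(s)")
```

Here the first PR only contains the kind of trivial "right class type" test described earlier in the thread; the reviewer rejects it, and the second round adds a test that exercises real code.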

gaigalas (OP) 5mo ago

Have you tried it? Not just the first tests, but all the way up to decent coverage?
