undefined | Better HN

0 pointsRealityVoid6mo ago0 comments

> Also, for embedded, forget even thinking about testing harnesses (which at least exist in some form with UEFI, it's just difficult to automate the execution and output for an LLM).

I think this doesn't have to be like this and we can do better for this. If LLMs keep this up, good testing infrastructure might become more important.

0 comments

koito176mo ago

One of my expectations for the future is the development of testing tools whose output is "optimized" in some way for LLM consumption. This is already occurring with Bun's test runner, for instance.[0] They are implementing a flag in the test runner so that the output is structured and optimized for token count.

Overall, I agree with your point. LLMs feel a lot more reliable when a codebase has thorough, easy-to-run tests. For a similar reason, I have been drifting towards strong, statically-typed languages. Both Rust and TypeScript have rich type systems that can express many kinds of runtime behavior with just types. When a compiler can make strong guarantees about a program's behavior, I assume that helps nudge the quality of LLM output a bit higher. Tests then help prevent silly regressions from occurring. I have no evidence for this besides my anecdotal experience using LLMs across several programming languages.

In general, I've had the best experience with LLMs when there's plenty of static analysis (and tests) on the codebase. When a codebase can't be easily tested, then I get much less productivity gains from LLMs. So yeah, I'm all for improving testing infrastructure.

[0] https://x.com/jarredsumner/status/1944948478184186366

j / k navigate · click thread line to collapse

0 comments

koito176mo ago

[0] https://x.com/jarredsumner/status/1944948478184186366

j / k navigate · click thread line to collapse