>Any tips on building robust LLM software?
Honestly, it's difficult.
One reason we're very happy with our results is that the person in charge of experimenting with the LLM-flow, while highly intelligent, is not talented with languages (honestly, he's the opposite).
This forces him to come up with creative solutions where others might get more mileage just with flawless prompts. Thanks to this, we discovered some really interesting tricks which help us solve problems that the available literature does not discuss.
Based on the flows he designs, I revise his prompt templates for more precision and token efficiency.
>And yeah I agree on how important test cases are. Having some sort of objective benchmark for judging how effective prompts are is really useful.
It's not just about effectiveness, but about ensuring that no false-positive inferences make it into production: for a given prompt template, deeply investigate which edge cases of input data would lead to false positives. Then adapt the template and API params until the unit test passes 100%.
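To make that concrete, here's a minimal sketch of what such an edge-case unit test could look like. Everything here is hypothetical: `call_llm` stands in for your real API call (stubbed so the example runs standalone), and the template and edge cases are invented for illustration.

```python
# Hypothetical prompt template for a binary classification step.
PROMPT_TEMPLATE = (
    "Does the following text contain a refund request? Answer YES or NO.\n\n{text}"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real API call (which you'd run with tuned
    # params, e.g. temperature=0). This fake checks only the input text
    # after the instruction block, so the example is self-contained.
    body = prompt.split("\n\n", 1)[-1]
    return "YES" if "refund" in body.lower() else "NO"

# Edge cases that must NOT be classified positive (expected answer: NO).
NEGATIVE_EDGE_CASES = [
    "",                                   # empty input
    "I love this product!",               # clearly unrelated
    "The word YES appears in the text.",  # literal 'YES' inside the input
]

def false_positives(template: str, cases: list[str]) -> list[str]:
    # Collect every edge case the template wrongly classifies as positive.
    hits = []
    for text in cases:
        answer = call_llm(template.format(text=text)).strip().upper()
        if answer == "YES":
            hits.append(text)
    return hits

if __name__ == "__main__":
    fps = false_positives(PROMPT_TEMPLATE, NEGATIVE_EDGE_CASES)
    assert not fps, f"false positives: {fps}"
    print("all edge cases pass")
```

The point is the loop, not the stub: each time you find an input that flips the answer wrongly, it goes into the edge-case list, and the template (or API params) gets adapted until the whole list passes.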