>Any tips on building robust LLM software?
Honestly, it's difficult.
One reason we're very happy with our results is that the person in charge of experimenting with the LLM-flow, while highly intelligent, is not talented with languages (honestly, he's the opposite).
This forces him to come up with creative solutions where others might get more mileage just with flawless prompts. Thanks to this, we discovered some really interesting tricks which help us solve problems that the available literature does not discuss.
Based on the flows he designs, I revise his prompt templates for more precision and token efficiency.
>And yeah I agree on how important test cases are. Having some sort of objective benchmark for judging how effective prompts are is really useful.
It's not just about effectiveness, but about ensuring that no false-positive inferences make it into production: for a given prompt template, deeply investigate which edge cases of input data would lead to false positives. Then adapt the template and API params until the unit test passes 100%.
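To make that concrete, here's a minimal sketch of what such an edge-case unit test could look like. Everything here is hypothetical: `call_llm` stands in for your real API call (stubbed so the example runs standalone), and the template and edge cases are invented for illustration.

```python
# Hypothetical prompt template for a binary classification step.
PROMPT_TEMPLATE = (
    "Does the following text contain a refund request? Answer YES or NO.\n\n{text}"
)

def call_llm(prompt: str) -> str:
    # Stub standing in for a real API call (which you'd run with tuned
    # params, e.g. temperature=0). This fake checks only the input text
    # after the instruction block, so the example is self-contained.
    body = prompt.split("\n\n", 1)[-1]
    return "YES" if "refund" in body.lower() else "NO"

# Edge cases that must NOT be classified positive (expected answer: NO).
NEGATIVE_EDGE_CASES = [
    "",                                   # empty input
    "I love this product!",               # clearly unrelated
    "The word YES appears in the text.",  # literal 'YES' inside the input
]

def false_positives(template: str, cases: list[str]) -> list[str]:
    # Collect every edge case the template wrongly classifies as positive.
    hits = []
    for text in cases:
        answer = call_llm(template.format(text=text)).strip().upper()
        if answer == "YES":
            hits.append(text)
    return hits

if __name__ == "__main__":
    fps = false_positives(PROMPT_TEMPLATE, NEGATIVE_EDGE_CASES)
    assert not fps, f"false positives: {fps}"
    print("all edge cases pass")
```

The point is the loop, not the stub: each time you find an input that flips the answer wrongly, it goes into the edge-case list, and the template (or API params) gets adapted until the whole list passes.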