How are you running evals for AI agents?

1 pointsaneeqdhk1y ago0 comments

I have a couple of projects in my company where wer are creating AI agents to generate code and/or help people in designing software. The agents themselves are conversational. The code generated is most often UI code.

How are people going about evaluating the responses of AI agents these days? Particularly for conversational flows - the problem seems more complex because it could require keeping the entire conversation in context.

Any help or resources will be quite appreciated!

0 comments

No comments yet.