
iamtherhino 1y ago

We've been playing with that in the background. I can try to shoot you a preview in a few weeks. It works pretty well for reasoning tasks and NLP workloads, but for workloads that need a "correct" answer, it's really tough to maintain accuracy when swapping models.

What we've seen work best is making model recommendations during agent creation for a given tool/workload, then leaving them mostly static after creation.


0xDEAFBEAD 1y ago

That's fair. Maybe you could even send the user an email if you detect a new model release or pricing change which handles their workload for cheaper at comparable quality, to notify them to investigate.
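A rough sketch of the check behind such a notification, assuming each model has a known price and an eval score on the user's workload. `ModelOption`, `cheaper_comparable`, the pricing unit, and the `max_quality_drop` threshold are all hypothetical names for illustration, not anything from the product being discussed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing unit
    quality_score: float       # hypothetical eval score on the user's workload, 0..1


def cheaper_comparable(
    current: ModelOption,
    candidates: List[ModelOption],
    max_quality_drop: float = 0.02,  # assumed tolerance for "comparable quality"
) -> List[ModelOption]:
    """Return candidates that cost less than `current` and score within
    `max_quality_drop` of it, cheapest first."""
    matches = [
        c for c in candidates
        if c.cost_per_1k_tokens < current.cost_per_1k_tokens
        and c.quality_score >= current.quality_score - max_quality_drop
    ]
    return sorted(matches, key=lambda c: c.cost_per_1k_tokens)
```

A non-empty result would be the trigger for the email; the quality scores themselves would still have to come from some per-workload eval.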

iamtherhino (OP) 1y ago

That's a good idea. Then give them a link to "replay last X inferences with model ABC" so they can do a quick eyeball eval.
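One way the "replay last X inferences" link could work, as a minimal sketch: it assumes past prompts and outputs are logged, and that the candidate model can be called through a plain function. `LoggedInference`, `replay_last`, and `call_model` are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LoggedInference:
    prompt: str
    output: str  # what the currently configured model returned


def replay_last(
    log: List[LoggedInference],
    call_model: Callable[[str], str],  # hypothetical wrapper around the candidate model
    x: int,
) -> List[Tuple[str, str, str]]:
    """Re-run the last `x` logged prompts through `call_model` and return
    (prompt, original_output, candidate_output) rows for side-by-side review."""
    recent = log[-x:] if x > 0 else []  # guard: log[-0:] would return the whole log
    return [(r.prompt, r.output, call_model(r.prompt)) for r in recent]


def format_rows(rows: List[Tuple[str, str, str]]) -> str:
    """Render the rows as plain text for a quick eyeball comparison."""
    return "\n\n".join(
        f"PROMPT: {p}\nCURRENT: {old}\nCANDIDATE: {new}" for p, old, new in rows
    )
```

For workloads that need a "correct" answer, the eyeball pass could be swapped for an exact-match count over the same rows.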

0xDEAFBEAD 1y ago

Sweet, maybe you'll like my other idea in this thread too: https://news.ycombinator.com/item?id=43929194
