We've been playing with that in the background. I can try to shoot you a preview in a few weeks. It works pretty well for reasoning tasks and NLP workloads, but for workloads that need a "correct" answer, it's really tough to maintain accuracy when swapping models.
What we've seen work best is making model recommendations during agent creation for a given tool/workload, and then leaving those choices mostly static after creation.
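
Rough sketch of the "recommend at creation, then pin" idea, in case it helps. Everything here is made up for illustration: the workload labels, the `model-a`/`model-b` placeholders, and the `create_agent` helper are assumptions, not our actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical workload -> model table. The names are placeholders,
# not real model identifiers.
RECOMMENDED_MODELS = {
    "reasoning": "model-a",
    "nlp": "model-a",
    "exact_answer": "model-b",  # workloads that need a "correct" answer
}
DEFAULT_MODEL = "model-a"


@dataclass(frozen=True)  # frozen: the model cannot be reassigned after creation
class Agent:
    name: str
    workload: str
    model: str


def recommend_model(workload: str) -> str:
    """Return the suggested model for a workload, or the default if unknown."""
    return RECOMMENDED_MODELS.get(workload, DEFAULT_MODEL)


def create_agent(name: str, workload: str, model: Optional[str] = None) -> Agent:
    """Create an agent, using the recommendation unless the caller overrides it."""
    return Agent(name=name, workload=workload, model=model or recommend_model(workload))


agent = create_agent("invoice-checker", "exact_answer")
print(agent.model)
```

The recommendation only runs once, at creation time. After that the agent keeps its model, so accuracy-sensitive workloads don't shift underneath you when something gets swapped.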