FWIW, the “group research” and “chess” examples from the notebooks folder in their repo have been the best for explaining the utility of this tech to others. The meme generator does a good job of showing functions stripped down, but it misses a lot of the important bits.
And the chess one: https://github.com/microsoft/autogen/blob/main/notebook/agen...
However, from his examples (and by his own admission) it seems that AutoGen isn't benefiting from full GPT-4-level performance even though he's pointed it directly at OpenAI's GPT-4 (and other LLMs). The back-and-forth between the agents does not produce great results, even though similar prompts fed directly into ChatGPT seem to give better results.
Anyone know what's going on?
Is there any reason to believe that the interaction of multiple agents (using the same model) will yield some emergent property that is beyond the capabilities of the agent model?
I'm not working with LLMs, but my intuition is that whatever these multi agent setups come up with could also be achieved by a single agent just talking to itself, as they all are "just guessing" what the most probable next token is.
Yes.
> multiple agents model is able to process more context at each steps of the reasoning chain
What?
How can a multi-agent model have more context at a single step? A single step runs on a single agent. Wouldn't it be literally the same as a single agent?
The multi-agent approach is simply packaging up different “personas” for single steps; and yes, it is entirely reasonable to assume that given N configurations for an agent (different prompts, different temperatures, even different models) you would see emergent behaviour that a single agent wouldn't.
For example, you might have a “creative agent” to scaffold something and a “conservative” agent to fix syntax errors.
…but what are you talking about with different context sizes? I think you’re mixing domain terms; context is the input to an LLM. I don’t know what you’re referring to, but multi agent setups make absolutely no difference to the context size.
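To make the “personas for single steps” point concrete, here's a rough sketch. Everything in it is hypothetical: `fake_llm` stands in for a real completion call, and an “agent” is nothing more than a system prompt closed over that one shared model.

```python
def fake_llm(system_prompt: str, message: str) -> str:
    # Stand-in for a real model call; canned replies, for illustration only.
    if "creative" in system_prompt:
        return f"scaffold: bold first pass at '{message}'"
    return f"fix: cleaned-up version of '{message}'"

def make_agent(system_prompt: str):
    # A "persona" is just a system prompt paired with the same underlying model.
    return lambda message: fake_llm(system_prompt, message)

creative = make_agent("You are a creative agent; scaffold something bold.")
conservative = make_agent("You are a conservative agent; fix syntax errors.")

draft = creative("build a CLI parser")   # creative persona proposes
fixed = conservative(draft)              # conservative persona cleans up
```

Note that nothing here changes the context *size*; each persona just assembles a different context for the same model.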
When you give it a specific role it essentially hones in on the relevant part of the training data. Researcher in X field? Papers from that field get priority in formulating responses and the accuracy of token prediction for contextually relevant tasks goes up.
OTOH, if you try to go 'meta' - ie. you give it a scenario where it imagines a group of scholars chatting with each other, then it hones in on situations where there is a dialogue amongst a group (ie. a play/script).
I think of agents more or less as python classes with a mixture of natural language and code functions. You design them to do something with information they produce, and to interface with other agents or “tools” in some way.
But all the agents can be the same language model under the hood, they are frames used to build different kinds of contexts.
And yes I think the idea is that emergent behaviour can be useful. This comes to mind
https://github.com/MineDojo/Voyager
But I think we are still a small ways off from being really smart about agents. My opinion is that we haven’t quite figured out what we are doing yet.
* sometimes we want an LLM with a longer context, faster speed, higher quality, etc.: so even within a model family, on the same job, different model configs
* we do a lot of prompt tuning for agent calls, like what a good Splunk query is, what SQL tables are currently available, what a good chart is, how to use a graph library, ...
* we also do accompanying code-level work, like running a generated python data analysis in a sandbox and feeding back exceptions to the LLM, or checking for parse errors when running a DB query, which feed back to the LLM
* When working directly on data, we might run it through the LLM, which might get into parallel chunked calls, a summary tree, etc, where a single LLM call would be insufficient, costly, slow, etc
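The exception-feedback loop mentioned above can be sketched in a few lines. This is a toy under obvious assumptions: `llm_fix` is a hypothetical stand-in for asking the model to repair its own code, and `exec` into a fresh dict is a very crude "sandbox" (a real setup would isolate far more aggressively).

```python
import traceback

def llm_fix(code: str, error: str) -> str:
    # Stub: pretend the model repairs the specific bug we planted.
    return code.replace("1 / 0", "1 / 1")

def run_with_feedback(code: str, max_rounds: int = 3):
    for _ in range(max_rounds):
        scope = {}
        try:
            exec(code, scope)   # crude "sandbox"; real systems isolate properly
            return scope.get("result")
        except Exception:
            # Feed the full traceback back for the next attempt.
            code = llm_fix(code, traceback.format_exc())
    return None

generated = "result = 1 / 0"            # buggy code "from the LLM"
print(run_with_feedback(generated))     # → 1.0 after one repair round
```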
at the very least, you can get things done that would be extremely difficult or practically impossible to do with a single instance.
If you write a short story, it's often better to split the process into parts (make an outline, write the story, edit the story) than to try to do the whole thing at once. The same can be true for LLMs, I suppose.
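That outline → draft → edit decomposition is easy to sketch; here `model` is a hypothetical stand-in that just records which framed instruction was applied at each step, so you can see the pipeline structure without a real LLM:

```python
def model(instruction: str, text: str) -> str:
    # Stub: a real call would send `instruction` as the system prompt
    # and `text` as the user message; here we just trace the pipeline.
    return f"{instruction}({text})"

def write_story(premise: str) -> str:
    outline = model("outline", premise)  # step 1: plan
    draft = model("draft", outline)      # step 2: write from the plan
    return model("edit", draft)          # step 3: revise the draft

print(write_story("a robot learns to paint"))
# → edit(draft(outline(a robot learns to paint)))
```

Each step sees only the previous step's output, rather than the model juggling all three jobs in one prompt.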
In LLM terms, this would be like different system prompts sampling different parts of the training distribution, but I'm not able to validate such a claim, nor do I know whether someone has validated it before.
Just like in our work environments and in our relationships, HOW conversations occur largely determines the impact of the conversation. With or without AutoGen
We're building a multi-agent postgres data analytics tool. If you're building agentic software, join the conversation: https://youtu.be/4o8tymMQ5GM
So instead of diluting attention across three separate character descriptions, your model will see just the chat log and the single description of the persona it should respond from. This may or may not make a difference.
AutoGen: A Multi-Agent Framework for Streamlining Task Customization - https://news.ycombinator.com/item?id=37855314 - Oct 2023 (1 comment)
Microsoft's AutoGen – Guide to code execution by LLMs - https://news.ycombinator.com/item?id=37822809 - Oct 2023 (1 comment)
Making memes with Autogen AI (open source LLM agent framework) [video] - https://news.ycombinator.com/item?id=37750897 - Oct 2023 (1 comment)
AutoGen: Enabling next-generation large language model applications - https://news.ycombinator.com/item?id=37647404 - Sept 2023 (1 comment)
AutoGen: Enabling Next-Gen GPT-X Applications - https://news.ycombinator.com/item?id=37220686 - Aug 2023 (1 comment)