FWIW, the “group research” and “chess” examples from the notebooks folder in their repo have been the best for explaining the utility of this tech to others. The meme generator does a good job of showing functions stripped down, but it misses a lot of the important bits.
And the chess one: https://github.com/microsoft/autogen/blob/main/notebook/agen...
However, from his examples (and by his own admission) it seems that AutoGen isn't benefiting from full GPT-4-level performance even though he's pointed it directly at OpenAI's GPT-4 (and other LLMs). The back-and-forth between the agents does not produce great results, even though similar prompts fed directly into ChatGPT seem to give better results.
Anyone know what's going on?
Is there any reason to believe that the interaction of multiple agents (using the same model) will yield some emergent property that is beyond the capabilities of the agent model?
I'm not working with LLMs, but my intuition is that whatever these multi agent setups come up with could also be achieved by a single agent just talking to itself, as they all are "just guessing" what the most probable next token is.
Yes.
> multiple agents model is able to process more context at each steps of the reasoning chain
What?
How can a multi-agent model have more context at a single step? A single step runs on a single agent. Wouldn't it be literally the same as a single agent?
The multi-agent approach is simply packaging up different “personas” for single steps; and yes, it is entirely reasonable to assume that given N configurations for an agent (different prompts, different temperatures, even different models) you would see emergent behaviour that a single agent wouldn't.
For example, you might have a “creative agent” to scaffold something and a “conservative” agent to fix syntax errors.
…but what are you talking about with different context sizes? I think you’re mixing domain terms; context is the input to an LLM. I don’t know what you’re referring to, but multi agent setups make absolutely no difference to the context size.
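To make the “personas for single steps” point concrete, here's a rough sketch. Everything in it is hypothetical: `fake_llm` stands in for a real completion call, and an “agent” is nothing more than a system prompt closed over that one shared model.

```python
def fake_llm(system_prompt: str, message: str) -> str:
    # Stand-in for a real model call; canned replies, for illustration only.
    if "creative" in system_prompt:
        return f"scaffold: bold first pass at '{message}'"
    return f"fix: cleaned-up version of '{message}'"

def make_agent(system_prompt: str):
    # A "persona" is just a system prompt paired with the same underlying model.
    return lambda message: fake_llm(system_prompt, message)

creative = make_agent("You are a creative agent; scaffold something bold.")
conservative = make_agent("You are a conservative agent; fix syntax errors.")

draft = creative("build a CLI parser")   # creative persona proposes
fixed = conservative(draft)              # conservative persona cleans up
```

Note that nothing here changes the context *size*; each persona just assembles a different context for the same model.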
When you give it a specific role it essentially hones in on the relevant part of the training data. Researcher in X field? Papers from that field get priority in formulating responses and the accuracy of token prediction for contextually relevant tasks goes up.
OTOH, if you try to go 'meta' - ie. you give it a scenario where it imagines a group of scholars chatting with each other, then it hones in on situations where there is a dialogue amongst a group (ie. a play/script).
I think of agents more or less as python classes with a mixture of natural language and code functions. You design them to do something with information they produce, and to interface with other agents or “tools” in some way.
But all the agents can be the same language model under the hood, they are frames used to build different kinds of contexts.
And yes I think the idea is that emergent behaviour can be useful. This comes to mind
https://github.com/MineDojo/Voyager
But I think we are still a small ways off from being really smart about agents. My opinion is that we haven’t quite figured out what we are doing yet.
* sometimes we want an LLM with a longer context, faster speed, higher quality, etc.: so even within a model family, on the same job, different model configs
* we do a lot of prompt tuning for agent calls, like what a good Splunk query is, what SQL tables are currently available, what a good chart is, how to use a graph library, ...
* we also do accompanying code-level work, like running a generated python data analysis in a sandbox and feeding back exceptions to the LLM, or checking for parse errors when running a DB query, which feed back to the LLM
* When working directly on data, we might run it through the LLM, which might get into parallel chunked calls, a summary tree, etc, where a single LLM call would be insufficient, costly, slow, etc
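The exception-feedback loop mentioned above can be sketched in a few lines. This is a toy under obvious assumptions: `llm_fix` is a hypothetical stand-in for asking the model to repair its own code, and `exec` into a fresh dict is a very crude "sandbox" (a real setup would isolate far more aggressively).

```python
import traceback

def llm_fix(code: str, error: str) -> str:
    # Stub: pretend the model repairs the specific bug we planted.
    return code.replace("1 / 0", "1 / 1")

def run_with_feedback(code: str, max_rounds: int = 3):
    for _ in range(max_rounds):
        scope = {}
        try:
            exec(code, scope)   # crude "sandbox"; real systems isolate properly
            return scope.get("result")
        except Exception:
            # Feed the full traceback back for the next attempt.
            code = llm_fix(code, traceback.format_exc())
    return None

generated = "result = 1 / 0"            # buggy code "from the LLM"
print(run_with_feedback(generated))     # → 1.0 after one repair round
```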
at the very least, you can get things done that would be extremely difficult or practically impossible to do with a single instance.
If you write a short story, it's often better to split the process into parts (make an outline, write the story, edit the story) than to try to do the whole thing at once. The same can be true for LLMs, I suppose.
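That outline → draft → edit decomposition is easy to sketch; here `model` is a hypothetical stand-in that just records which framed instruction was applied at each step, so you can see the pipeline structure without a real LLM:

```python
def model(instruction: str, text: str) -> str:
    # Stub: a real call would send `instruction` as the system prompt
    # and `text` as the user message; here we just trace the pipeline.
    return f"{instruction}({text})"

def write_story(premise: str) -> str:
    outline = model("outline", premise)  # step 1: plan
    draft = model("draft", outline)      # step 2: write from the plan
    return model("edit", draft)          # step 3: revise the draft

print(write_story("a robot learns to paint"))
# → edit(draft(outline(a robot learns to paint)))
```

Each step sees only the previous step's output, rather than the model juggling all three jobs in one prompt.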
In LLM terms, this would be like different system prompts sampling different parts of the training distribution, but I'm not able to validate such a claim, nor do I know whether someone has validated it before.
Just like in our work environments and in our relationships, HOW conversations occur largely determines the impact of the conversation. With or without AutoGen
We're building a multi-agent postgres data analytics tool. If you're building agentic software, join the conversation: https://youtu.be/4o8tymMQ5GM
So instead of diluting attention across three separate character descriptions, your model will see just the chat log and the single description of the persona it should respond from. This may or may not make a difference.
AutoGen: A Multi-Agent Framework for Streamlining Task Customization - https://news.ycombinator.com/item?id=37855314 - Oct 2023 (1 comment)
Microsoft's AutoGen – Guide to code execution by LLMs - https://news.ycombinator.com/item?id=37822809 - Oct 2023 (1 comment)
Making memes with Autogen AI (open source LLM agent framework) [video] - https://news.ycombinator.com/item?id=37750897 - Oct 2023 (1 comment)
AutoGen: Enabling next-generation large language model applications - https://news.ycombinator.com/item?id=37647404 - Sept 2023 (1 comment)
AutoGen: Enabling Next-Gen GPT-X Applications - https://news.ycombinator.com/item?id=37220686 - Aug 2023 (1 comment)