While it may, as it says, produce “convincing interactions”, there is no basis presented at all for believing it produces an accurate model of human behavior, so using it to “understand human behavior” is at best willful self-deception. More likely, with a little effort spent tweaking inputs to produce the desired results, someone who presents it as “enlightening productivity and business scenarios” will use it as an engine for simply manufacturing support for a pre-selected option.
It is certainly easier and cheaper than exploring actual human interactions to understand human behavior, but then so is just using a magic 8-ball, which may be less convincing but, for all the evidence supports, is just as accurate.
https://www.linkedin.com/posts/emollick_kind-of-a-big-deal-a...
"... a new paper shows GPT-4 simulates people well enough to replicate social science experiments with high accuracy.
Note this is done by having the AI prompted to respond to survey questions as a person given random demographic characteristics & surveying thousands of "AI people," and works for studies published after the knowledge cut-off of the AI models."
A couple other posts along similar lines:
https://www.linkedin.com/posts/emollick_this-paper-suggests-...
"... LLMs automatically generate scientific hypotheses, and then test those hypotheses with simulated AI human agents."
https://www.linkedin.com/posts/emollick_formula-for-neat-ai-...
"Applying Asch's conformity experiment to LLMs: they tend to conform with the majority opinion, especially when they are "uncertain." Having a devil's advocate mitigates this effect, just as it does with people."
I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
> I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
That's what an AI model already is.
Let's say you had 10 temperature sensors on a mountain and you logged their data at time T.
If you take the average of those 10 readings, you get a 'wisdom of the crowds' from the temperature sensors, which you can model as an avg + std of your 10 real measurements.
You can then sample 10 new points from the normal distribution defined by that avg + std. Cool for generating new similar data, but it doesn't really tell you anything you didn't already know.
Trying to get 'wisdom of crowds' through repeated querying of the AI model is equivalent to sampling 10 new points at random from your distribution. You'll get values that are like your original distribution of true values (w/ some outliers) but there's probably a better way to get at what you're looking to extract from the model.
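To make the analogy concrete, here is a minimal sketch of the point above (the sensor readings are made-up illustrative numbers): fitting a normal distribution to real measurements and then resampling from it produces data that looks like the originals but adds no new information.

```python
import random
import statistics

# Hypothetical readings from 10 temperature sensors on the mountain at time T (°C)
readings = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7, 4.3, 4.1]

# 'Wisdom of the crowd': summarize the real measurements as avg + std
avg = statistics.mean(readings)
std = statistics.stdev(readings)

# Drawing 10 new points from Normal(avg, std) generates plausible-looking
# values, but every bit of information in them came from the fit itself
resampled = [random.gauss(avg, std) for _ in range(10)]

print(f"avg={avg:.2f} std={std:.2f}")
print(resampled)
```

Repeatedly querying one AI model for "opinions" is the `resampled` list here: new draws from a single fitted distribution, not ten independent sensors.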
<< “enlightening productivity and business scenarios” it will be an engine for simply manufacturing support for a pre-selected option.
In a sense, this is what training employees is all about. You want to get them ready for various possible scenarios. For recurring tasks that do require some human input, it does not seem that far-fetched.
<< produce “convincing interactions"
This is the interesting part. Is being convincing a bad thing if the tool shows what a user would be expected to see?
There's nothing unique about this tool in that regard though. Pretty much anything can be mis-used in that way - spreadsheets, graphics/visualizations, statistical models, etc. etc. Whether tools are actually used to support better decision making, or simply to support pre-selected decisions, is more about the culture of the organization and the mind-set of its leaders.
If you're going to try this out I would strongly recommend running it against GPT-4o mini instead. Mini is 16x cheaper and I'm confident the results you'll get out of it won't be 1/16th as good for this kind of experiment.
cd /tmp
git clone https://github.com/microsoft/tinytroupe
cd tinytroupe
OPENAI_API_KEY='your-key-here' uv run jupyter notebook
I used this pattern because my OpenAI key is stashed in an LLM-managed JSON file:
OPENAI_API_KEY="$(jq -r '.openai' "$(dirname "$(llm logs path)")/keys.json")" \
  uv run jupyter notebook
(Which inspired me to add a new LLM feature: llm keys get openai - https://github.com/simonw/llm/issues/623)
https://github.com/microsoft/TinyTroupe/blob/main/examples/a...
> AI-driven context-aware assistant. Suggests writing styles or tones based on the document's purpose and user's past preferences, adapting to industry-specific jargon.
> Smart template system. Learns from user's editing patterns to offer real-time suggestions for document structure and content.
> Automatic formatting and structuring for documents. Learns from previous documents to suggest efficient layouts and ensure compliance with standards like architectural specifications.
> Medical checker AI. Ensures compliance with healthcare regulations and checks for medical accuracy, such as verifying drug dosages and interactions.
> AI for building codes and compliance checks. Flags potential issues and ensures document accuracy and confidentiality, particularly useful for architects.
> Design checker AI for sustainable architecture. Includes a database of materials for sustainable and cost-effective architecture choices.
Right, so what's missing in Word is an AI generated medical compliance check that tracks drug interactions for you and an AI architectural compliance and confidentiality... thing. Of course these are all followed by a note that says "drawbacks: None." Also, the penultimate line generated 7 examples but cut the output off at 6.
The intermediate output isn't much better, generally restating the same thing over and over and appending "in medicine" or "in architecture." They quickly drop any context this discussion relates to word processors in favor of discussing how a generic industrial AI could help them. (Drug interactions in Word, my word.)
Worth noting this is a Microsoft product generating ideas for a different Microsoft product. I hope they vetted this within their org.
As a proof of concept, this looks interesting! As a potentially useful business insight tool this seems far out. I suppose this might explain some of Microsoft's recent product decisions...
https://github.com/microsoft/TinyTroupe/blob/main/examples/p...
Go get a fucking life; do something for the climate and for repairing our society's social fabric instead.
https://github.com/microsoft/TinyTroupe/blob/7ae16568ad1c4de...
Like not only generating and testing narratives, but then even using it for agents to generate engagement.
We've seen massive bot networks run unchecked on X to help tilt election results, so this could probably be deployed there too.
Do you have more details on this?
[0]https://www.cyber.gc.ca/en/news-events/russian-state-sponsor...
[1]https://www.justice.gov/opa/pr/justice-department-leads-effo...
The election results tilting talk is tired and hypocritical.
> The Federal Bureau of Investigation (FBI) and Cyber National Mission Force (CNMF), in partnership with the Canadian Centre for Cyber Security (CCCS), the Netherlands General Intelligence and Security Service (AIVD), and Netherlands Military Intelligence and Security Service (MIVD), and the National Police of the Netherlands (DNP) (hereinafter referred to as the authoring organizations), seek to warn of a covert tool used for disinformation campaigns benefiting the Russian Government. Affiliates of RT (formerly Russia Today), a Russian state-sponsored media organization, used this tool to create fictitious online personas, representing a number of nationalities, to post content on a social media platform.[0]
Nothing is happening?
Maybe the true question is, why are you going out of your way to make an effort to dismiss and normalize election interference and the usage of AI and bot farms to spread misinformation?
[0]https://www.cyber.gc.ca/en/news-events/russian-state-sponsor...
EDIT: Okay, this repo is a mess. They have "OpenAI" hardcoded in so many places that it is effectively useless with Azure OpenAI Service OR any other OpenAI-style API. That wouldn't be terrible once you fiddled with the config, IF they weren't importing the config BEFORE setting default values...
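For what it's worth, the ordering issue is avoidable: apply defaults first, then let user configuration or environment variables override them. This is not TinyTroupe's actual code, just a hypothetical sketch (the `TT_` prefix and key names are invented for illustration):

```python
import os

# Hypothetical defaults, applied BEFORE any user config is consulted,
# so an Azure/OpenAI-compatible endpoint can still be swapped in later
DEFAULTS = {
    "API_TYPE": "openai",
    "BASE_URL": "https://api.openai.com/v1",
    "MODEL": "gpt-4o-mini",
}

def load_config() -> dict:
    config = dict(DEFAULTS)            # 1. start from defaults
    for key in DEFAULTS:               # 2. environment variables override
        env_val = os.environ.get(f"TT_{key}")
        if env_val is not None:
            config[key] = env_val
    return config

print(load_config())
```

With this ordering, nothing that reads the config can observe a half-initialized state, which is exactly the failure mode described above.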