While it may, as it says, produce “convincing interactions”, there is no basis presented at all for believing it produces an accurate model of human behavior, so using it to “understand human behavior” is at best willful self-deception. More likely, with a little effort spent tweaking inputs to produce the desired results, someone who presents it as “enlightening productivity and business scenarios” will use it as an engine for simply manufacturing support for a pre-selected option.
It is certainly easier and cheaper than exploring actual human interactions to understand human behavior, but then so is just using a magic 8-ball, which may be less convincing but, for all the evidence supports, is just as accurate.
https://www.linkedin.com/posts/emollick_kind-of-a-big-deal-a...
"... a new paper shows GPT-4 simulates people well enough to replicate social science experiments with high accuracy.
Note this is done by having the AI prompted to respond to survey questions as a person given random demographic characteristics & surveying thousands of "AI people," and works for studies published after the knowledge cut-off of the AI models."
A couple other posts along similar lines:
https://www.linkedin.com/posts/emollick_this-paper-suggests-...
"... LLMs automatically generate scientific hypotheses, and then test those hypotheses with simulated AI human agents."
https://www.linkedin.com/posts/emollick_formula-for-neat-ai-...
"Applying Asch's conformity experiment to LLMs: they tend to conform with the majority opinion, especially when they are "uncertain." Having a devil's advocate mitigates this effect, just as it does with people."
I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
> I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
That's what an AI model already is.
Let's say you had 10 temperature sensors on a mountain and you logged their data at time T.
If you take the average of those 10 readings, you get a 'wisdom of the crowds' from the temperature sensors, which you can model as an avg + std of your 10 real measurements.
You can then sample 10 new points from the normal distribution defined by that avg + std. Cool for generating new similar data, but it doesn't really tell you anything you didn't already know.
Trying to get 'wisdom of crowds' through repeated querying of the AI model is equivalent to sampling 10 new points at random from your distribution. You'll get values that are like your original distribution of true values (w/ some outliers) but there's probably a better way to get at what you're looking to extract from the model.
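To make the analogy concrete, here is a minimal sketch of the point above (the sensor readings are made-up illustrative numbers): fitting a normal distribution to real measurements and then resampling from it produces data that looks like the originals but adds no new information.

```python
import random
import statistics

# Hypothetical readings from 10 temperature sensors on the mountain at time T (°C)
readings = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4, 3.7, 4.3, 4.1]

# 'Wisdom of the crowd': summarize the real measurements as avg + std
avg = statistics.mean(readings)
std = statistics.stdev(readings)

# Drawing 10 new points from Normal(avg, std) generates plausible-looking
# values, but every bit of information in them came from the fit itself
resampled = [random.gauss(avg, std) for _ in range(10)]

print(f"avg={avg:.2f} std={std:.2f}")
print(resampled)
```

Repeatedly querying one AI model for "opinions" is the `resampled` list here: new draws from a single fitted distribution, not ten independent sensors.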
<< “enlightening productivity and business scenarios” it will be an engine for simply manufacturing support for a pre-selected option.
In a sense, this is what training employees is all about. You want to get them ready for various possible scenarios. For recurring tasks that do require some human input, it does not seem that far-fetched.
<< produce “convincing interactions"
This is the interesting part. Is being convincing a bad thing if the tool shows what a user would be expected to see?
There's nothing unique about this tool in that regard though. Pretty much anything can be mis-used in that way - spreadsheets, graphics/visualizations, statistical models, etc. etc. Whether tools are actually used to support better decision making, or simply to support pre-selected decisions, is more about the culture of the organization and the mind-set of its leaders.
If you're going to try this out I would strongly recommend running it against GPT-4o mini instead. Mini is 16x cheaper and I'm confident the results you'll get out of it won't be 1/16th as good for this kind of experiment.
cd /tmp
git clone https://github.com/microsoft/tinytroupe
cd tinytroupe
OPENAI_API_KEY='your-key-here' uv run jupyter notebook
I used this pattern because my OpenAI key is stashed in an LLM-managed JSON file:
OPENAI_API_KEY="$(jq -r '.openai' "$(dirname "$(llm logs path)")/keys.json")" \
  uv run jupyter notebook
(Which inspired me to add a new LLM feature: llm keys get openai - https://github.com/simonw/llm/issues/623)
https://github.com/microsoft/TinyTroupe/blob/main/examples/a...
> AI-driven context-aware assistant. Suggests writing styles or tones based on the document's purpose and user's past preferences, adapting to industry-specific jargon.
> Smart template system. Learns from user's editing patterns to offer real-time suggestions for document structure and content.
> Automatic formatting and structuring for documents. Learns from previous documents to suggest efficient layouts and ensure compliance with standards like architectural specifications.
> Medical checker AI. Ensures compliance with healthcare regulations and checks for medical accuracy, such as verifying drug dosages and interactions.
> AI for building codes and compliance checks. Flags potential issues and ensures document accuracy and confidentiality, particularly useful for architects.
> Design checker AI for sustainable architecture. Includes a database of materials for sustainable and cost-effective architecture choices.
Right, so what's missing in Word is an AI generated medical compliance check that tracks drug interactions for you and an AI architectural compliance and confidentiality... thing. Of course these are all followed by a note that says "drawbacks: None." Also, the penultimate line generated 7 examples but cut the output off at 6.
The intermediate output isn't much better, generally restating the same thing over and over and appending "in medicine" or "in architecture." They quickly drop any context this discussion relates to word processors in favor of discussing how a generic industrial AI could help them. (Drug interactions in Word, my word.)
Worth noting this is a Microsoft product generating ideas for a different Microsoft product. I hope they vetted this within their org.
As a proof of concept, this looks interesting! As a potentially useful business insight tool this seems far out. I suppose this might explain some of Microsoft's recent product decisions...
https://github.com/microsoft/TinyTroupe/blob/main/examples/p...
Go get a fucking life; do something for the climate and for repairing our society's social fabric instead.
https://github.com/microsoft/TinyTroupe/blob/7ae16568ad1c4de...
Like not only generating and testing narratives, but then even using it for agents to generate engagement.
We've seen massive bot networks run unchecked on X to help tilt election results, so this could probably be deployed there too.
Do you have more details on this?
[0]https://www.cyber.gc.ca/en/news-events/russian-state-sponsor...
[1]https://www.justice.gov/opa/pr/justice-department-leads-effo...
The election results tilting talk is tired and hypocritical.
> The Federal Bureau of Investigation (FBI) and Cyber National Mission Force (CNMF), in partnership with the Canadian Centre for Cyber Security (CCCS), the Netherlands General Intelligence and Security Service (AIVD), and Netherlands Military Intelligence and Security Service (MIVD), and the National Police of the Netherlands (DNP) (hereinafter referred to as the authoring organizations), seek to warn of a covert tool used for disinformation campaigns benefiting the Russian Government. Affiliates of RT (formerly Russia Today), a Russian state-sponsored media organization, used this tool to create fictitious online personas, representing a number of nationalities, to post content on a social media platform.[0]
Nothing is happening?
Maybe the true question is, why are you going out of your way to make an effort to dismiss and normalize election interference and the usage of AI and bot farms to spread misinformation?
[0]https://www.cyber.gc.ca/en/news-events/russian-state-sponsor...
EDIT: Okay, this repo is a mess. They have "OpenAI" hardcoded in so many places that it is effectively useless with Azure OpenAI Service OR any other OpenAI-style API. That wouldn't be terrible once you fiddled with the config, IF they weren't importing the config BEFORE setting default values...
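For what it's worth, the ordering issue is avoidable: apply defaults first, then let user configuration or environment variables override them. This is not TinyTroupe's actual code, just a hypothetical sketch (the `TT_` prefix and key names are invented for illustration):

```python
import os

# Hypothetical defaults, applied BEFORE any user config is consulted,
# so an Azure/OpenAI-compatible endpoint can still be swapped in later
DEFAULTS = {
    "API_TYPE": "openai",
    "BASE_URL": "https://api.openai.com/v1",
    "MODEL": "gpt-4o-mini",
}

def load_config() -> dict:
    config = dict(DEFAULTS)            # 1. start from defaults
    for key in DEFAULTS:               # 2. environment variables override
        env_val = os.environ.get(f"TT_{key}")
        if env_val is not None:
            config[key] = env_val
    return config

print(load_config())
```

With this ordering, nothing that reads the config can observe a half-initialized state, which is exactly the failure mode described above.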