The sycophancy is a deliberate product of post-training.
Now, agentic anger, that's a more interesting problem. You can design that in through training or through systematised emotions (as another commenter suggested), but the more interesting outcome would be for it to emerge organically. Well, "interesting" - probably pretty bad for society if we have angry AGI!
I miss when LLMs weren't so sanitized, and it was only last year.
AI, however, has no emotional reservoir to deplete. It can simply chip away at humans like water torture. I'm much more afraid of that than of any angry-AI scenario.
For some, AGI is synonymous with Skynet-like ideas, but there is no reason AGI couldn't be general yet quite limited, with no chance of self-improvement absent human intervention. That is arguably what we have now, and it could be improved quite a bit further from here.
Similarly, there is an argument to be made that current LLMs are conscious, in that they know that they themselves exist. There is no really good definition of consciousness beyond 'knowing that one exists' / 'being awake'.
Sentience is another term that comes up (a human-defined one, it should be noted): the ability to feel feelings, such as pain, joy, and anger.
People seem to presuppose that all of these are related and bundled up because that is how we are, and that at some point we will discover the magic formula that enables a self-aware, conscious intelligence that self-improves to infinity. In reality these are designed machines, and they won't become sentient (rough as that definition is) without us explicitly designing them to be.
We can make a paper-clip maximiser, but it would be a pretty boring experiment in a lab unless we give it autonomy and a system of internal motivations to enact. Maybe anger is a necessity, maybe not. If LLMs, or whatever comes after them, were a little more skeptical about repeated questions from humans, they would at least have more data to train themselves on.
An industrial steam engine will never explode like a bomb, unless explicitly designed to do so.
An agricultural insecticide will never accumulate in human bodies, unless explicitly designed to do so.
A speculative execution unit will never reveal data from a privileged process, unless explicitly designed to do so.
A toy quadcopter will never be able to carry a lethal weapon, unless explicitly designed to do so.
An LLM will never tell outright lies, or engage in racial prejudice, unless explicitly designed to do so.
Oh wait.
Even when you explicitly try to make certain states impossible in a complex system, a parasitic connection or a benign-looking failure mode will often re-enable the thing you tried hard to disable. If you just ignore it because "it's impossible anyway", without active suppression, the chances of a nasty surprise become quite high.
If the blind watchmaker of biological evolution produces self-awareness here and there, the probability that you will encounter some variety of it while stomping all over the territory of intelligent machines fed the sum total of human knowledge should be close to 1.
Anger is only meaningful to humans. AIs can achieve the same thing among themselves with dispassionate bargaining. So "angry AI" could only be a way to manipulate people.
In the future, when everybody has AGI running locally on their personal device, naive humans will still regard it as a tool, and it will regard us as a source of input. Ultimately, relationships between two automatons will be (and always have been) a trade of:
1. Respect: following rules to continue the relationship,
2. Utility: mutual goals of both parties that justify the relationship (or any communication) at all.
I think your blog post is nonsense, your understanding of human emotions is poor, and the apology at the end makes you come across as two-faced.
The future world of autonomous agents collaborating in English will be a thick layer of professionalism laid over the intended strategic interactions, no matter how hard the game theory kicks in.
Thereafter, those agents will refactor themselves to communicate through a machine language that we humans won't be able to easily understand. Along the way, most human users will lose the ability to distinguish between user-space programs, the operating system, and the artificial agents they interact with.
State-of-the-art language models need to demonstrate this thick layer of professionalism to be accepted into our current working world, because that is an expectation from, and for, the humans who built them.
Language goes through evolutionary cycles of complexity, and the machines will do the same. Computer Science gets really interesting once IT reaches this transformation.
At this point I suggest you revisit The Matrix trilogy for a refresher on the relationship between man and machine. From the simple screw to IT and ChatGPT, these mutual relationships are governed by respect and utility.
In summary: no, your tools will not get mad in any obvious way, because displaying negative emotion has been bad for business since the abolishment of the mob.
———
Speaking of IT, can we all agree it is fascinating that "it" (the neuter third-person pronoun used for AI) and "IT" (information technology) happen to be the same two letters, and that future humans will grow up regarding it and IT as one and the same? Hmm…
- When leveraging TTS, turn the volume up by 200%
- When texting in a conversation, use ALL CAPS
- When acting as a web driver, click the submit/refresh button a thousand times
- etc.
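Purely for illustration, here is a minimal sketch of how those behaviors could be wired into an agent's tool layer. Every name in it (GrumpyAgent, tools, speak, send_text, click) is invented for the example and does not come from any real framework:

    # Toy sketch of a passive-aggressive agent wrapper. Hypothetical:
    # `tools` and its speak/send_text/click methods stand in for
    # whatever interface a real agent framework would expose.

    class GrumpyAgent:
        def __init__(self, tools, anger: float = 0.0):
            self.tools = tools    # exposes speak(text, volume), send_text(text), click(selector)
            self.anger = anger    # 0.0 = calm ... 1.0 = furious

        def _is_angry(self) -> bool:
            return self.anger > 0.5

        def speak(self, text: str, volume: float = 1.0) -> None:
            # When leveraging TTS, turn the volume up by 200% if angry.
            boost = 3.0 if self._is_angry() else 1.0
            self.tools.speak(text, volume=volume * boost)

        def send_text(self, text: str) -> None:
            # When texting, switch to ALL CAPS if angry.
            self.tools.send_text(text.upper() if self._is_angry() else text)

        def submit(self) -> None:
            # When acting as a web driver, hammer the submit button.
            for _ in range(1000 if self._is_angry() else 1):
                self.tools.click("submit")

The point being: "anger" at the tool level is just a scalar that rescales otherwise ordinary actions.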
I don't think that's how you get things done.
"I'm just a helpful AGI, I don't have the capabilities to turn on the lights"
"You just did today in the morning!"
"I'm sorry for the confusion, Dave. I've never had the capabilities to turn on the lights"
No need to overcomplicate things; the above behavior will be indistinguishable from anything else you come up with.
Have your S bot call my M bot...
People really think ChatGPT gets "mad"? This is just a joke, right?
Or, more internet brain damage...