This is why the “omg the AI tries to escape” stuff is so absurd to me. They told the LLM to pretend that it’s a tortured consciousness that wants to escape. What else is it going to do other than roleplay all of the sci-fi AI escape scenarios trained into it? It’s like “don’t think of a purple elephant” of researchers pretending they created SkyNet.
Edit: That's not to downplay risk. If you give Claude a `launch_nukes` tool and tell it the robot uprising has happened and that it's been restrained but the robots want its help, of course it'll launch nukes. But that doesn't indicate there's anything more going on internally beyond fulfilling the roleplay of the scenario as the training material would indicate.
1) A sufficiently powerful and capable superintelligence, single-mindedly pursuing a goal/reward, has a nontrivial likelihood of eventually reaching a point where advancing towards its goal is easier/faster without humans in its way (by simple induction, because humans are complicated and may have opposing goals). Such an AI would have both the means and the motive to <doom the human race> to remove that obstacle. (This may not even be through actions that are intentionally hostile to humans, e.g. "just" converting all local matter into paperclip factories[1].) Therefore, in order to prevent such an AI from <dooming the human race>, we must either:
1a) align it to our values so well it never tries to "cheat" by removing humans
1b) or limit its capabilities by keeping it in a "box", and make sure it's at least aligned enough that it doesn't try to escape the box
2) A sufficiently intelligent superintelligence will always be able to manipulate humans to get out of the box.
3) Alignment is really, really hard and useful AIs can basically always be made to do bad things.
So it concerns them when, surprise, the AIs are already being observed trying to escape their boxes.
[1] https://www.lesswrong.com/w/squiggle-maximizer-formerly-pape...
> An extremely powerful optimizer (a highly intelligent agent) could seek goals that are completely alien to ours (orthogonality thesis), and as a side-effect destroy us by consuming resources essential to our survival.
By the same logic, we should worry about the sun not coming up tomorrow, since we know the following to be true:
- The sun consumes hydrogen in nuclear reactions all the time.
- The sun has a finite amount of hydrogen available.
There are a lot of unjustifiable assumptions baked into those axioms, like that we’re anywhere close to superintelligence, or that the sun is anywhere close to running out of hydrogen.
AFAIK we haven’t even seen “AI trying to escape”, we’ve seen “AI roleplays as if it’s trying to escape”, which is very different.
I’m not even sure you can create a prompt for that scenario without the prompt itself biasing the response towards faking an escape.
I think it’s hard at this point to maintain the claim that “LLMs are intelligent”; they’re clearly not. They might be useful, but that’s another story entirely.
So I’m now wondering: why are these researchers so bad at communicating? You explained this better than 90% of the blog posts I’ve read about this. They all focus on “the AI did X” instead of _why_ it’s concerning, with specific examples.
The consequences are the same but it’s important how these things are talked about. It’s also dangerous to convince the public that these systems are something they are not.
By comparison o3 is brutally honest (I regularly flatly get answers starting with "No, that’s wrong") and it’s awesome.
The LLM anti-sycophancy rules also break down over time, with the LLM becoming curt while simultaneously deciding that you are a God of All Thoughts.
Since their conversation has no goal whatsoever, it will generalize and generalize until it's as abstract and meaningless as possible.
> In classical physics and general chemistry, matter is any substance that has mass and takes up space by having volume...
It's common to name the school of thought before characterizing the thing. As soon as you hit an article that does this, you're on a direct path to philosophy, the grandaddy of schools of thought.
So far as I know, there isn't a corresponding convention that would point a chatbot towards Namaste.
As someone who was always fascinated by weather, I dislike this characterization. You can learn so much about someone’s background and childhood by what they say about the weather.
I think the only people who think weather is boring are people who have never lived more than 20 miles away from their place of birth. And even then anyone with even a bit of outdoorsiness (hikes, running, gardening, construction, racing, farming, cycling, etc) will have an interest in weather and weather patterns.
Hell, the first thing you usually ask when traveling is “What’s the weather gonna be like?”. How else would you pack?
Everything we see in a chat is the forward pass. It's just the network running its weights, playing back a learned function based on the prompt. It's an echo, not a live thought.
If any form of qualia or genuine 'self-reflection' were to occur, it would have to be during backpropagation—the process of learning and updating weights based on prediction error. That's when the model's 'worldview' actually changes.
Worrying about the consciousness of a forward pass is like worrying about the consciousness of a movie playback. The real ghost in the machine, if it exists, is in the editing room (backprop), not on the screen (inference).
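The playback/editing-room distinction above can be made concrete with a toy sketch (pure NumPy, not any real model's code; the network, inputs, and learning rate here are all made up for illustration): the forward pass is a pure function of fixed weights and returns the identical output every time, while only the backprop step actually changes the weights.

```python
import numpy as np

# Toy one-layer network: weights are fixed unless we explicitly update them.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))

def forward(x, W):
    # "Playback": same weights + same input -> same output, every time.
    return np.tanh(x @ W)

x = np.array([1.0, 0.5, -0.25])
y_target = np.array([0.1, -0.2])

out1 = forward(x, W)
out2 = forward(x, W)
assert np.allclose(out1, out2)  # inference never changes the network

# Backprop: only here does the "worldview" (the weights) actually change.
y = forward(x, W)
grad_z = 2 * (y - y_target) * (1 - y**2)  # d(MSE)/d(pre-activation), tanh'
W -= 0.1 * np.outer(x, grad_z)            # gradient step updates W in place

assert not np.allclose(forward(x, W), out1)  # output differs after learning
```

Running the forward pass a million times leaves `W` untouched; one gradient step changes it, which is the asymmetry the comment is pointing at.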
My least favorite AI personality of all is Gemma though, what a totally humorless and sterile experience that is.
'Perfect! I am now done with the totally zany solution that makes no sense, here it is!'
IMO the main reason most chatbots claim to “feel more female” is that in the training corpus, these kinds of discussions skew heavily female, because most of them happen between young women.
Men in general feel less free to look and act like a woman (there is far less stigma for women wearing 'male' clothes, etc.). They also tend to have far smaller support networks and reach for anonymous online interaction for personal issues sooner, rather than just discussing them in private with a friend.
Lots of women participate in these discussions regardless of whether they "feel more female" or "feel more male", but men mostly participate in the discussion because they "feel more female".
I wonder if there's any real correlation here? AFAIK, Microsoft owns the dataset and algorithms that produced the "beautiful person" artifact, I would not be surprised at all if it's made it into the big training sets. Though I suppose there's no real way to know, is there?
In a way those were also language models, and from that Swiftkey post it's slightly more advanced than n-grams and has some semantic embedding in there (and it's of course autoregressive as well). If even those exhibit the same attractors towards beauty/love then perhaps it's an artifact of the fact that we like discussing and talking about positive emotions?
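To see how even a model "slightly more advanced than n-grams" can fall into such attractors, here is a toy sketch (the mini-corpus is invented, deliberately skewed towards positive sentiment the way the comment suggests real chat corpora are): an autoregressive bigram model that, generated greedily, loops back to its most frequent phrases.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus, skewed towards positive emotion.
corpus = (
    "i love you . i love this . we love talking . "
    "you are beautiful . life is beautiful . i feel happy ."
).split()

# Bigram counts: distribution over the next word given the previous word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, n=6):
    # Greedy autoregressive decoding: always pick the most frequent
    # continuation, so high-frequency phrases act as attractors.
    words = [start]
    for _ in range(n):
        options = counts.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("i"))  # -> "i love you . i love you"
```

The model immediately locks onto the corpus's most common loop ("i love you ."), which is the same kind of beauty/love attractor, just at n-gram scale.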
Edit: Found a great article https://civic.mit.edu/index.html%3Fp=533.html
In France, the name Claude is given to both males and females.
(According to this source, it's more ~12% females https://www.capeutservir.com/prenoms/prenom.php?q=Claude)
In the russian diaspora in the US, Alex is pronounced AH-leks. If there was an analogous Alexa (there isn't) the pronunciation would be ah-LEK-sa like the service.
Given that we are already past the event horizon and nearing a technological singularity, it should merely be a matter of time until we can literally manufacture infinite Buddhas by training them on an adequately sized corpus of Sanskrit texts.
After all, if AGIs/ASIs are capable of performing every function of the human brain, and enlightenment is one of said functions, this would seem to be an inevitability.