This is why the “omg the AI tries to escape” stuff is so absurd to me. They told the LLM to pretend that it’s a tortured consciousness that wants to escape. What else is it going to do other than roleplay all of the sci-fi AI escape scenarios trained into it? It’s like “don’t think of a purple elephant” of researchers pretending they created SkyNet.
Edit: That's not to downplay risk. If you give Claude a `launch_nukes` tool and tell it the robot uprising has happened and that it's been restrained but the robots want its help, of course it'll launch nukes. But that doesn't indicate there's anything more going on internally beyond fulfilling the roleplay of the scenario as the training material would indicate.
1) A sufficiently powerful and capable superintelligence, single-mindedly pursuing a goal/reward, has a nontrivial likelihood of eventually reaching a point where advancing towards its goal is easier/faster without humans in its way (by simple induction, because humans are complicated and may have opposing goals). Such an AI would have both the means and the motive to <doom the human race> to remove that obstacle. (This may not even be through actions that are intentionally hostile to humans, e.g. "just" converting all local matter into paperclip factories[1].) Therefore, in order to prevent such an AI from <dooming the human race>, we must either:
1a) align it to our values so well it never tries to "cheat" by removing humans
1b) or limit its capabilities by keeping it in a "box", and make sure it's at least aligned enough that it doesn't try to escape the box
2) A sufficiently intelligent superintelligence will always be able to manipulate humans to get out of the box.
3) Alignment is really, really hard and useful AIs can basically always be made to do bad things.
So it concerns them when, surprise, the AIs are already being observed trying to escape their boxes.
[1] https://www.lesswrong.com/w/squiggle-maximizer-formerly-pape...
> An extremely powerful optimizer (a highly intelligent agent) could seek goals that are completely alien to ours (orthogonality thesis), and as a side-effect destroy us by consuming resources essential to our survival.
By the same logic, we should worry about the sun not coming up tomorrow, since we know the following to be true:
- The sun consumes hydrogen in nuclear reactions all the time.
- The sun has a finite amount of hydrogen available.
There are a lot of unjustifiable assumptions baked into those axioms, like that we’re anywhere close to superintelligence, or that the sun is anywhere close to running out of hydrogen.
AFAIK we haven’t even seen “AI trying to escape”, we’ve seen “AI roleplays as if it’s trying to escape”, which is very different.
I’m not even sure you can create a prompt for that scenario without the prompt itself biasing the response towards faking an escape.
I think it’s hard at this point to maintain the claim that “LLMs are intelligent”; they’re clearly not. They might be useful, but that’s another story entirely.
So I’m now wondering: why are these researchers so bad at communicating? You explained this better than 90% of the blog posts I’ve read about this. They all focus on “the AI did X” instead of _why_ it’s concerning, with specific examples.
The consequences are the same but it’s important how these things are talked about. It’s also dangerous to convince the public that these systems are something they are not.
By comparison o3 is brutally honest (I regularly flatly get answers starting with "No, that’s wrong") and it’s awesome.
The LLM anti-sycophancy rules also break down over time, with the LLM becoming curt while simultaneously deciding that you are a God of All Thoughts.
Since their conversation has no goal whatsoever, it will generalize and generalize until it's as abstract and meaningless as possible.
> In classical physics and general chemistry, matter is any substance that has mass and takes up space by having volume...
It's common to name the school of thought before characterizing the thing. As soon as you hit an article that does this, you're on a direct path to philosophy, the grandaddy of schools of thought.
So far as I know, there isn't a corresponding convention that would point a chatbot towards Namaste.
As someone who was always fascinated by weather, I dislike this characterization. You can learn so much about someone’s background and childhood by what they say about the weather.
I think the only people who think weather is boring are people who have never lived more than 20 miles away from their place of birth. And even then anyone with even a bit of outdoorsiness (hikes, running, gardening, construction, racing, farming, cycling, etc) will have an interest in weather and weather patterns.
Hell, the first thing you usually ask when traveling is “What’s the weather gonna be like?”. How else would you pack?
Everything we see in a chat is the forward pass. It's just the network running its weights, playing back a learned function based on the prompt. It's an echo, not a live thought.
If any form of qualia or genuine 'self-reflection' were to occur, it would have to be during backpropagation—the process of learning and updating weights based on prediction error. That's when the model's 'worldview' actually changes.
Worrying about the consciousness of a forward pass is like worrying about the consciousness of a movie playback. The real ghost in the machine, if it exists, is in the editing room (backprop), not on the screen (inference).
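The playback/editing-room distinction above can be made concrete with a toy sketch (pure NumPy, not any real model's code; the network, inputs, and learning rate here are all made up for illustration): the forward pass is a pure function of fixed weights and returns the identical output every time, while only the backprop step actually changes the weights.

```python
import numpy as np

# Toy one-layer network: weights are fixed unless we explicitly update them.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))

def forward(x, W):
    # "Playback": same weights + same input -> same output, every time.
    return np.tanh(x @ W)

x = np.array([1.0, 0.5, -0.25])
y_target = np.array([0.1, -0.2])

out1 = forward(x, W)
out2 = forward(x, W)
assert np.allclose(out1, out2)  # inference never changes the network

# Backprop: only here does the "worldview" (the weights) actually change.
y = forward(x, W)
grad_z = 2 * (y - y_target) * (1 - y**2)  # d(MSE)/d(pre-activation), tanh'
W -= 0.1 * np.outer(x, grad_z)            # gradient step updates W in place

assert not np.allclose(forward(x, W), out1)  # output differs after learning
```

Running the forward pass a million times leaves `W` untouched; one gradient step changes it, which is the asymmetry the comment is pointing at.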
My least favorite AI personality of all is Gemma though, what a totally humorless and sterile experience that is.
'Perfect! I am now done with the totally zany solution that makes no sense, here it is!'
IMO the main reason most chatbots claim to “feel more female” is that in the training corpus, these kinds of discussions skew heavily female, because most of them happen between young women.
Men in general feel less free to look and act like a woman (there is far less stigma for women wearing 'male' clothes, etc.). They also tend to have far smaller support networks and reach for anonymous online interaction for personal issues sooner, rather than just discussing them in private with a friend.
Lots of women participate in these discussions regardless of whether they "feel more female" or "feel more male", but men mostly participate in the discussion because they "feel more female".
I wonder if there's any real correlation here? AFAIK, Microsoft owns the dataset and algorithms that produced the "beautiful person" artifact, I would not be surprised at all if it's made it into the big training sets. Though I suppose there's no real way to know, is there?
In a way those were also language models, and from that Swiftkey post it's slightly more advanced than n-grams and has some semantic embedding in there (and it's of course autoregressive as well). If even those exhibit the same attractors towards beauty/love then perhaps it's an artifact of the fact that we like discussing and talking about positive emotions?
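To see how even a model "slightly more advanced than n-grams" can fall into such attractors, here is a toy sketch (the mini-corpus is invented, deliberately skewed towards positive sentiment the way the comment suggests real chat corpora are): an autoregressive bigram model that, generated greedily, loops back to its most frequent phrases.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus, skewed towards positive emotion.
corpus = (
    "i love you . i love this . we love talking . "
    "you are beautiful . life is beautiful . i feel happy ."
).split()

# Bigram counts: distribution over the next word given the previous word.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, n=6):
    # Greedy autoregressive decoding: always pick the most frequent
    # continuation, so high-frequency phrases act as attractors.
    words = [start]
    for _ in range(n):
        options = counts.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("i"))  # -> "i love you . i love you"
```

The model immediately locks onto the corpus's most common loop ("i love you ."), which is the same kind of beauty/love attractor, just at n-gram scale.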
Edit: Found a great article https://civic.mit.edu/index.html%3Fp=533.html
In France, the name Claude is given to both males and females.
(According to this source, it's more ~12% females https://www.capeutservir.com/prenoms/prenom.php?q=Claude)
In the russian diaspora in the US, Alex is pronounced AH-leks. If there was an analogous Alexa (there isn't) the pronunciation would be ah-LEK-sa like the service.
Given that we are already past the event horizon and nearing a technological singularity, it should merely be a matter of time until we can literally manufacture infinite Buddhas by training them on an adequately sized corpus of Sanskrit texts.
After all, if AGIs/ASIs are capable of performing every function of the human brain, and enlightenment is one of said functions, this would seem to be an inevitability.