
famouswaffles 2y ago

Humans are in general not aligned: not to each other, not to the survival of their species, not to all the other life on earth, and often not even to themselves individually. Alignment in the broad sense isn't really about "morals" or "values". A man is murdered because his desire to live is misaligned with the perpetrator's desire to kill. The man who was killed could well be Hitler.

If you as a manager had the ability to align any employee to your wants completely, that human would never be socially engineered.

It's fair to call the issue social engineering, yes. That's not the point I was getting at. The point, in essence, is that solving prompt injection carries the same gravitas that solving social engineering would, i.e. a way to completely align intelligence.


TeMPOraL 2y ago

Let's be clear about the relative alignment issues, though. All humans are almost completely aligned - all the issues we have with each other, whether at individual or international scale, are differences in lower-order terms, and they're dwarfed by the group dynamics and incentive systems we find ourselves in. Barring extreme outliers (which we classify as severe mental issues), the misalignment between any two regular humans is a rounding error[0].

In contrast, the more powerful AIs, and eventually the AGI, that we worry about aligning are very unlikely to be aligned with humans at all by default. Different mind architecture, different substrate, different mechanism of coming into being, different way of perceiving the world - we can't expect all that to somehow, magically, add up to the same universal instincts and emotions, the same conscience, and the same capacity for empathy that humans have. Not automatically, not by accident, not for any random AI model we stumbled on in the space of possible minds.

Or, to simplify: if alignment were measured as a scalar (say on a -100 to 100 scale), all humans would have the same number +/- a minor difference (say 25 +/- 0.05), whereas the AGI will come out with some completely random number (say anything between -20 and +40; not -100 to 100, because as builders of these models we're implicitly biasing them to think more like us, in all kinds of ways).

[0] - There are lots of ways to argue for what I've written above, but I'll give a few:

- If humans were meaningfully misaligned, cooperation would be near-impossible. There would be no society, no civilization. We would not be able to comprehend other cultures - their behaviors and patterns of thought would not merely be curious, they would feel alien.

- Alignment is favorable for human survival - even if our ancient ancestors were much less aligned, much more alien in thinking and feeling to each other, over thousands of years those most aligned to each other thrived, and the less aligned died out.

famouswaffles (OP) 2y ago

Time and time again, the misalignment of humans has been responsible for the deaths of millions of people. While I agree the misalignment between humans and artificial systems would very likely be greater, I'm really not comfortable calling that a rounding error. If it is, that's an incredibly dangerous rounding error.

TeMPOraL 2y ago

I'm calling it a rounding error in comparison to a future advanced AI, as well as relative to the impact of the cultures, laws and economies we're embedded in. And yes, that's still responsible for countless deaths - so imagine how bad it would be if we had to contend with alien minds, whether space aliens or AIs.

visarga 2y ago

> I'm calling it a rounding error in comparison to a future advanced AI

Maybe that's what you imagine future AI will be like, but we don't even know what AI will be capable of in 2024. My counterpoint is that if there is a sensation, emotion or choice that is notable enough, surely it has been described in words many times over. Everything is in the text corpus.

What makes humans superior to AI is not language mastery, but feedback. We get richer, more immediate feedback, and we get it from the physical world, from our tools and from other people. AI has nobody to ask except us; until recently it didn't get to use tools, and embodiment is not there yet.

Another missing ability in current-gen LLMs is continual learning. LLMs can only do RAG and shuffle information around in limited-length prompts. There is no proper long-term memory except the training process; not even fine-tuning is good enough to learn new abilities.

So the main issues of AI are memory and integration into the environment; they are already super-aligned to humanity by learning to model text. We already know LLMs are great at simulating opinion polls[1]; you just have to prompt the model with a bunch of diverse personas. They are aligned to each and every type of human.

[1] Out of One, Many: Using Language Models to Simulate Human Samples https://www.cambridge.org/core/journals/political-analysis/a...

sillysaurusx 2y ago

I’ll match your opinion with an opinion of my own: it’s far more likely that an AGI will be aligned by default than not. It’s trained on human data. You’re making it sound like it’s going to pop into existence after having evolved on another planet, which is pure fiction.

Plenty of human cultures feel alien to each other. The recent war is one unfortunate example. Yet on the whole, it works out.

Something trained on the totality of human knowledge will act like a human. And if it somehow doesn’t, it won’t be tolerated. (I’d personally tolerate it, but it’s obvious that the world won’t stand for that.)

TeMPOraL 2y ago

> Plenty of human cultures feel alien to each other. The recent war is one unfortunate example. Yet on the whole, it works out.

I contest that. What war do you have in mind here? The Russian invasion of Ukraine? The two peoples are about as aligned as you could possibly get - they're neighboring societies with so much shared history that they're approximately the same people. They even shared a common language until recently. This is not a war between peoples alien to each other - this is a war between nation states.

Note: I'm explicitly excluding political views and national/cultural identity from alignment, because those are transient and/or group-level phenomena. By human-to-human alignment, I'm talking about empathy, a sense of right and wrong, conscience, patterns of thinking: all the qualities that let us understand each other and empathize with each other (if we care to try). Concepts like fear, love, fairness, and the contexts in which they're triggered. The basics. Those are all robust, hardwired in biology or by the intersection of our biology, shared environment and game theory.

The way I would rank it, if 25 = the alignment coordinate of an average American, then an average Ukrainian and an average Russian would both be within 25 +/- 0.05. Maybe an average Sentinelese would be within +/- 0.5 of that. Whereas I'd expect an AI we create now to land anywhere between -20 and +40 on the -100 to 100 scale. I'm pulling the numbers out of my butt; they're just there to get the relative magnitudes across.

> Something trained on the totality of human knowledge will act like a human.

Maybe, but that would have to include much more than the limited modalities we're feeding AI models now.

> And if it somehow doesn’t, it won’t be tolerated. (I’d personally tolerate it, but it’s obvious that the world won’t stand for that.)

Sure, but the issue here is to figure out how to make an aligned AI before we make an AI that's powerful enough to challenge us.

pixl97 2y ago

People seem to focus on the AI we have now in these threads, which I guess is a whole lot easier than speculating about the alignment of something that could end up being a whole lot smarter than you, and able to take in far more types of information than you ever will.

Personally, I don't see any way to make something that is superhuman and aligned, other than by its own choice. How to make something that is beyond us and have it come to the conclusion not to wipe us out will be interesting enough, since, as your example above shows, we are real jerks to each other already.
