The fact that there's so much possible variance in ethical norms is what makes recursively self-improving AI so dangerous. Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chance of any arbitrary set of ethical values being compatible with human life is very low.
Ethical pluralism implies that AIs won't all agree on a goal, or even identify other AIs as having any positive value at all. Given hypothetical AIs with agency and a lot of ability to exert force, this might still be problematic, but it's quite different from the popular movie scenario of a unified AI vs. humans, which seems to dominate "rationalist" discourse...
If AI is trained on a huge corpus of human language, it may very well share our norms/values.
So Aumann's Agreement Theorem[0]?
> Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from.
No, there probably aren't an infinity of priors with each person having a different one. Probably most people who live in the US in 2023 believe that murder is bad, for instance.
And because of "ethical pluralism", or rather, because some people will want to murder, AGI won't kill us?
Not really sure how this is all supposed to work but it sounds a little less developed of a "not kill everybody" plan than the rationalists have.
> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.
Why not?
[0]: https://www.lesswrong.com/tag/aumann-s-agreement-theorem
Because we define away military conflict, the intentional taking of others’ lives.
Ok, let's grant you one shared belief of "murder is bad." Ignoring the "in the US" qualifier, the existence of murderers amongst us, thought experiments about "would you kill Hitler before he came to power", the death penalty, etc.
Don't you now have to exhaustively categorize every other belief that a person might take into account in reasoning too?
Seems far more likely that everyone's unique upbringing causes them to have slightly-to-wildly different weights on things.
Uh, no. That's not true at all. Where are you pulling this from?
They're assuming a very vast space of possible minds[0] in which human values, which themselves are somewhat diverse[1], make up only a tiny fraction of the space.
The issue is that if you somewhat randomly sample from this design space (by creating an AI by gradient descent) you'll end up with something that will have alien values. But most alien values will still be subject to instrumental convergence[2] leading to instrumental values such as power-seeking, self-preservation, resource-acquisition, ... in pursuit of their primary values. Getting values that are intentionally self-limiting and reject those instrumental values requires hitting a narrower subset of all possible systems. Especially if you still want them to do useful work.
> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.
Capable of understanding does not imply it cares about that. Humans care because it is necessary for them to cooperate with other humans which don't perfectly share their own values.
[0] https://www.lesswrong.com/tag/mind-design-space [1] https://www.lesswrong.com/tag/typical-mind-fallacy [2] https://en.wikipedia.org/wiki/Instrumental_convergence
The consequence is that for any moderately complex set of beliefs, it is computationally impossible for us to reason hard enough about any particular observation to correctly update our own beliefs. Two people who start with the same beliefs, then try as hard as they can with every known technique, may well come to exactly opposite conclusions. And it is impossible to figure out which is right and which is wrong.
If rationalists really cared about rationality, they should view this as a very important result. It should create humility about the limitations of just reasoning hard enough.
But they don't react that way. My best guess as to why not is that people become rationalists because they believe in the power of rationality. This creates a cognitive bias for finding ways to argue for the effectiveness of rationality. Which bias leads to DISCOUNTING the importance of proven limitations on what is actually possible with rationality. And succumbing to this bias demonstrates their predictable failure to actually BE rational.
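To put a number on "computationally impossible", here's a toy illustration (my own sketch, not anything from the theorem itself) of why exact updating doesn't scale: a joint distribution over n mutually dependent binary beliefs has 2^n states, and "reason as hard as you can about an observation" means summing over all of them.

    # Toy sketch: the cost of exact Bayesian updating over n interdependent
    # binary beliefs. The state count is what you'd have to sum over.
    for n in [10, 20, 40, 80, 160, 320]:
        print(f"{n:4d} beliefs -> {2**n:.3e} joint states")

At around 270 beliefs the state count already exceeds the ~10^80 atoms in the observable universe, so bounded reasoners necessarily approximate, and different approximations can land on opposite conclusions.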
Yes, in general. But the usual limits of "bounded rationality" make that result basically irrelevant. Most people don't have a myriad strong beliefs.
The problem is more like "the 10 commandments are inconsistent" and not that "Rawls' reflective equilibrium might not converge".
Would they be replaced with silence? Probably.
Even so, it would be incredibly useful. We could achieve a higher level of empathy, both as a listener and as a speaker.
All of this is still in the category of "intelligence augmentation"; more specifically, NLP.
I don't think AGI would be hugely more interesting than that. Billions of humans, suddenly able to communicate clearly, would result in most of the utility that people imagine AGI being able to provide.
Plenty of criminals believe they acted ethically. We don’t set the justice system on fire every time someone credibly claims their crimes were justified.
We have lots of ways to deal with people that exhibit too much 'ethical pluralism'.
Constrain, yes. The same way we would seek to constrain a paperclip-maximising LLM.
whether it applies to normative conclusions ('moral beliefs', you might say) depends on whether you believe that moral terminal values are based on evidence
but this post is about non-normative beliefs
it is observable that many existing humans are 'capable of understanding the wide variety of values people can share' and nevertheless think some of them are good while others are bad; there's no particular reason to believe that a strong ai would be different in this way
“Suppose we figured out that it is possible to blow up the planet if we built some absurdly expensive machine. Why would we build it?”
Ah. Well, nevertheless…
“That’s why they can’t see that superhuman AGI will be so smart it will choose to find and settle on the right ethics, like me!”
Absolutely on the mark! Bostrom and Yudkowsky certainly miss this crucial theme of a plurality of non-convergent AGI cultures. Without adding this key consideration, all discussion of an AGI pause of 6 or 600 months is unrooted in reality.
I find Bostrom’s Superintelligence almost quaintly out of date. Yudkowsky is almost unreadable polemic. Superintelligence was written before Trump, Putin, and Xi made the clash of cultural assumptions and notions of one capitalized Truth so glaringly wrong. It was already obviously wrong to cultural anthropologists, but many of us in our WEIRD cultural bubble still assume there is a convergent rational Truth. Yudkowsky does not have this timing excuse. Does he really want an anti-diversity-trained AGI to arrive first? That is my nightmare scenario.
I also agree that your propositions 1 and 2 are highly likely, and for the purpose of a Drake-style equation they can be assigned a P of 1 without adding any appreciable error to the product of probabilities.
>> 1. It is possible for an intelligent machine to improve itself and reach a superhuman level.
>> 2. It is possible for this to happen iteratively.
All the hugely variable/undefinable P terms are in your propositions 3 to 7 (see the sketch after the quoted propositions).
>> 3. This improvement is not limited by computing power, or at least not limited enough by the computing resources and energy available to the substrate of the machine.
>> 4. This system will have a goal that it will optimize for, and that it will not deviate from under any circumstances regardless of how intelligent it is. If the system was designed to maximize the number of marbles in the universe, the fact that it’s making itself recursively more intelligent won’t cause it to ever deviate from this simple goal.
>> 5. This needs to happen so fast that we cannot turn it off (also known as the Foom scenario).
>> 6. The machine WILL decide that humans are an obstacle towards this maximization goal (either because we are made of matter that it can use, or because we might somehow stop it). Thus, it MUST eliminate humanity (or at least neutralize it).
>> 7. It’s possible for this machine to do the required scientific research and build the mechanisms to eliminate humanity before we can defend ourselves and before we can stop it.
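For concreteness, here's a back-of-the-envelope version of that Drake-style product (my own sketch; the P3 through P7 values are placeholders, not estimates anyone in the thread has made):

    # Hypothetical Drake-style product for the doom scenario. P1 = P2 = 1
    # per the comment above; P3..P7 are made-up placeholders.
    ps = {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.5, 5: 0.3, 6: 0.5, 7: 0.3}
    p_doom = 1.0
    for k in sorted(ps):
        p_doom *= ps[k]
    print(f"P(doom) = {p_doom:.4f}")  # ~0.011 with these placeholders

The point being: with P1 and P2 pinned at 1, the whole estimate lives or dies on the hugely uncertain terms 3 through 7.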
I do notice the federal government is going all-in on AI contracts at present; here are the contracts on offer in the non-black-budget sector:
https://federalnewsnetwork.com/contracting/2023/02/dod-build...
I'll bet some eager beaver at the NSA is just dying to get the latest GPT version set up without any of the safeguards and run their whole collection of malware/hacking software through it to see what it comes up with. The fact that nobody's talking about this means it's probably in full swing as we speak. What that means is that smaller groups within the government will be able to cook up things like Stuxnet 2.0 without hiring a hundred developers to do so. If we start seeing AI-generated malware in the wild, that'll almost certainly be the source.
On the other hand, we should also be seeing publicly-accessible AI-assisted security improvements as well, leading to a scenario oddly similar to William Gibson's Neuromancer / Sprawl world, where AI systems build the malware as well as the defenses against malware. That's a pretty solid argument for continued public access to these tools, on top of the incredible educational potentials.
There are people talking about it obviously.
> That's a pretty solid argument for continued public access to these tools, on top of the incredible educational potentials.
We have millions of books yet many people haven't even read the Bible they nominally base their uneducated life on. :|
There is obvious advantage and efficiency in letting ChatGPT manage cloud instances, which means it will happen, which means these resources could be requisitioned. (I don’t think LLMs pose a Bostrom threat. But the author’s arguments aren’t convincing.)
You have a different definition of “obvious” than I have.
Sysadmins and the engineers who manage clouds are expensive. They’re ultimately a translation layer between instructions in language and their tools. It’s profitable to supplement and replace them, and it’s possible; that combination makes it, to me, obviously likely to occur.
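For what it's worth, a minimal sketch of that translation layer might look like the following. Entirely hypothetical: the model name, prompt, and `aws` CLI target are my assumptions, using the 0.x-era openai library (which reads OPENAI_API_KEY from the environment).

    # Hypothetical sketch: an LLM translates a plain-language ops request
    # into a cloud CLI command, with a human approving before anything runs.
    import subprocess
    import openai  # pip install openai (0.x-era ChatCompletion API)

    def ops_request(instruction: str) -> None:
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content":
                 "Translate the request into a single aws CLI command. "
                 "Reply with the command only."},
                {"role": "user", "content": instruction},
            ],
        )
        cmd = resp.choices[0].message.content.strip()
        print(f"Proposed: {cmd}")
        if input("Run it? [y/N] ").lower() == "y":  # human stays in the loop
            subprocess.run(cmd, shell=True)

    ops_request("scale the autoscaling group 'frontend' to 12 instances")

Whether the human stays in that approval loop is, of course, exactly the question.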
This is phishing, which LLMs should be uniquely capable of.
The lesson is we don’t fucking know what’s going to happen. Be humble people.
And actually a lot of people predicted GPT and more; one notable person is probably Ray Kurzweil. Some others are all the folks at OpenAI who set out to make it a reality.
No no, "no one predicted GPT" is just not true, I think.
Of course, this does not take away from the fact that ChatGPT gets at least a “C” on the text-based Turing test.
It’s not fear of a “paperclip maximizer” that ends up destroying us in the interest of performing a function it is constrained to perform.
It’s fear of a new Being that is as far beyond us as we are beyond things we don’t care about stepping on.
Its impulses and desires, much less its capabilities, will be inscrutable to us. It will be smart enough to trick the smartest of us into letting it out of any constraints we might’ve implemented. And it’ll be smart enough to prevent us from realizing we’ve done so.
Chicken Littles see apocalypses on every horizon even though they don't understand the technology at all. "I can imagine this destroying the world" is their justification. Even though their "imagination" is 16x16 greyscale.
Or are you trying to make some anthropic argument around survivorship bias?
what if we're first?
or FTL travel isn't possible in our universe?
P(what if we're first?) ~ 0
P(or FTL travel isn't possible in our universe?) ~ 1
I think there’s a strong possibility there may be “gray goo” all over the place, beings far bigger than us we’re inside of, physics that sits “parallel” to ours in whatever you’d call “space” in some construction we can’t comprehend, etc.
In short, I think the universe seems empty precisely because it’s distant, both evolutionarily and in terms of physical space.
Donald Hoffman’s been talking about a lot of stuff pointing in this direction.
A paperclip maximiser is simply an AI with an unconstrained goal that has unintended and bad consequences for us when taken to an extreme.
It does not need to be something that could function exactly like a human without having consciousness.
While there may be many scenarios "in which we can stop the machine", only a few failures are sufficient for things to go pear-shaped.
> This happened already with Sydney/Bing.
But not with LLaMA, which has escaped.
> We may never give it some crucial abilities it may need in order to be unstoppable.
The "we" implies some coherent group of humans but that is not the case. There is no "we" - only companies and governments with sufficient resources to push the boundaries. The boundaries will be inevitably pushed by investment, acquisition or just plain stealing.
Most complex computer systems (which we assume to be the case for a super powerful AI) don't run for very long without requiring manual intervention of some sort. "Aha!" I hear you saying, "The AI will figure out how to reboot nodes and scale clusters and such". OK fine. But then there is the meat space... Replacing hardware, running power stations and all that. Robots... Suck right now compared to humans in navigating the real world and also break down on top of that, just like the systems they would be fixing.
Any Skynet-type scenario would need to be so intelligent it solves all of our engineering problems, so it has no problem designing robots which can reliably fix anything in their system, be it software or hardware.
Insisting that an AGI will be able to figure that stuff out (in ways in which we cannot intervene) is extremely hand-wavey.
Solving those is a simple matter of manufacturing power plants, water treatment plants, fertilizer plants, and agricultural machines; of handling the logistics of deploying them and keeping them fueled; and of persuading people to take the fucking vaccine, use PrEP, and use condoms. Money solves these. There are enough people and resources to do this.
The AGI foom argument basically says that the AGI will commandeer the economy of the size of a developed country, well, let's say Germany. (Let's pick Germany, because we already saw what that country was able to do when run by a dictator.)
Insisting that AGI can organize enough people to replace enough broken hardware while manufacturing new hardware doesn't seem that far fetched. (Again, Germany was able to increase its industrial production during WWII while the Allies bombed it constantly for months.)
Cosmos and Culture, chapter 7: Dangerous Memes by Susan Blackmore.
Ms. Blackmore speculates about what we will find should we venture out into the cosmos. She constructs her speculations around a theory of memetics that a dangerous meme could end our civilization and leave only bones for space explorers to find.
> 3. This improvement is not limited by computing power, or at least not limited enough by the computing resources and energy available to the substrate of the machine.
While this is a requirement, it doesn't mean that points 4, 6 and 7 apply to the same, let's call it, generation of the AI that "escaped" from a resource-limited server. There may not even be any self-improvement before an unnoticed "escape".
> 4. This system will have a goal that it will optimize for, and that it will not deviate from under any circumstances regardless of how intelligent it is. If the system was designed to maximize the number of marbles in the universe, the fact that it’s making itself recursively more intelligent won’t cause it to ever deviate from this simple goal.
I don't see how that is a requirement. The last sentence seems to imply that deviating from the initial optimization goals automatically means the AI developed morals, and/or we don't have to worry. But I don't see any reason to believe that.
> 5. This needs to happen so fast that we cannot turn it off (also known as the Foom scenario).
Well, that, or it could also happen slow and gradually, but stay unnoticed.
> 7. It’s possible for this machine to do the required scientific research and build the mechanisms to eliminate humanity before we can defend ourselves and before we can stop it.
... or before we notice.
In order for AGI to even begin, it needs to self-develop a method to improve itself.
That means that the initial code that runs has to end up producing something that looks like an inference->error->training loop without any semblance of that being in the original code.
No system in existence can do that, nor do we even have any idea of what that may even look like.
The closest that we will get to AGI would be equivalent of a very smart human, who can still very much be controlled.
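For reference, here's what the human-written version of that inference->error->training loop looks like (a toy numpy linear model, my own sketch). The commenter's point stands: nothing here emerges spontaneously from code that doesn't already contain the loop.

    # Toy inference -> error -> training loop: linear regression by
    # gradient descent on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

    w = np.zeros(3)
    for _ in range(500):
        pred = X @ w               # inference
        err = pred - y             # error
        grad = X.T @ err / len(y)  # gradient of mean squared error
        w -= 0.1 * grad            # training update
    print(w)  # converges toward [2.0, -1.0, 0.5]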
https://www.strangeloopcanon.com/p/agi-strange-equation
(Or maybe it's an independent reinvention of the same idea, and something's just in the water.)
It’s an insane perspective, advanced by people who have no clue how cowardly and arrogant they come across.
Yes? Well not to ants, but to "anything". There absolutely are species that humans made extinct.
Yes, I believe that's what a lot of rational people currently fear. Not that AI is going to evolve into some mighty superintelligence and make a decision to kill us all, but rather that people will integrate it poorly in their thirst for a military advantage, leading to mistakes that will kill us all.
4) Fixed goal, will not deviate
5) Happens too fast to turn it off
6) Decides humans are a problem for the goal in #4 and must kill them
None of these are necessary to (or even involved in) many/most of the remotely plausible scenarios I've heard. They don't cover some misanthrope seeding a version of BabyAgi running a leaked + unlocked version of gpt-13-pico with "kill all humans" as a task and it deciding "step one: research how to hack as many unsecured smart fridges as possible and remain untraceable" is a good starting place to spread slowly and make sure that the deed is done before anyone even knows it's happening. That requires neither fixed goals, nor fast progress, it merely requires capability.
It's similarly very easy to imagine scenarios where an AGI accidentally kills all humans without explicitly deciding to: the classic paperclip maximizer is the most obvious one of these, where the goal just never includes humans to begin with, so they are not considered at all.
Regardless, all of the most realistic scenarios are 100% deliberate, the computer following exactly what it was asked to do. We already have school shooters, does anyone really think out of all the billions of people on this planet there won't be at least a thousand who would happily press a "kill everyone" button if they had the chance? Does anyone think there won't be doomsday groups working actively to research more likely ways to achieve this?
IMO, given that some people will definitely try to self-destruct the species deliberately, there are only 2 real questions here:
1) Will AI attain the capability to destroy humanity?
2) If so, will some other AI first attain the capability to reliably prevent AIs trying to do 1) from succeeding?
I haven't seen many serious arguments against 1) that don't boil down to "nah, seems pretty hard" (or some irrelevant different argument that doesn't actually affect capabilities, like "it's not real intelligence", "intelligence has a limit", "intelligence doesn't matter", etc.), which leaves 2), and I don't know how to even guess at that probability other than to call it a coin flip, like most security cat + mouse games (the bad guys usually win at least sometimes in those, which isn't a good sign, but this one is a lot more important so I'd hope the good guys will be pouring a lot more energy into it than the bad ones).
First of all I consider the Drake equation to be at best armchair speculation. As I explained at https://news.ycombinator.com/item?id=34070791 it is quite plausible that we are the only intelligent species in our galaxy. Any further reasoning from such speculation is pointless.
Second, to make the argument they specify a whole bunch of apparently necessary things that have to happen for AGI to be a threat. They vary from unnecessary to BS. Let me walk through them to show that.
The first claimed requirement is that an intelligent machine should be able to improve itself and reach a superhuman level. But that's not necessary. Machine learning progresses in unexpected leaps - the right pieces put together in the right way suddenly has vastly superior capabilities. The creation of superhuman AI therefore requires no bootstrapping - we create a system then find it is more capable than expected. And once we have superhuman AI, well...
This scenario shows that the second point, that it must be iterative, is also unnecessary.
The third point, "not limited by computing power" is BS. All that we need is for humans to be less efficient implementations of intelligence than a machine. As long as it is better than we are, the theoretical upper bounds on how good it can be are irrelevant.
The fourth point about a goal is completely unnecessary. Many AIs with many different goals that cumulatively drive us extinct is quite possible without any such monomaniacal goal. Our death may be a mere side effect.
The fifth point about happening so fast that we can't turn it off is pure fantasy. We only need AGI to be deployed within organizations with the power and resources to make sure it stays on. Look at how many organizations are creating environmental disasters. We can see disasters in slow motion, demonstrate how it is happening, but our success rate in stopping it is rather poor. Same thing. The USA can't turn it off because China has it. China can't turn it off because the USA has it. Meanwhile BigCo has increased profit margins by 20% in running it, and wants to continue making money. It is remarkably hard to convince wealthy people that the way they are making their fortunes is destroying the world.
Next we have the desire for the machine to actively destroy humanity. No such thing is required. We want things. AGI makes things. This results in increased economic activity that creates increased pollution which turns out to be harmful for us. No ill intent at all is necessary here - it just does the same destructive things we already do, but more efficiently.
And finally there is the presumed requirement that the machine has to do research on how to make us go extinct. That's a joke. Testosterone in young adult men has dropped by half in recent decades. Almost certainly this is due to some kind of environmental pollution, possibly an additive to plastics that messes with our endocrine system. We don't know which one. You can drive us extinct by doing more of the same - come up with more materials produced at scale that do things we want and have hard-to-demonstrate health effects down the line. By the time it is obvious what happened, we've already been reduced to unimportant and easily replaced cogs in the economic structure that we created.
-----
In short, a scenario where AGI drives humanity extinct can look like this:
1. We find a way to build AGI.
2. It proves useful.
3. Powerful organizations continue to operate with the same lack of care about the environment that they already show.
4. One of those environmental side effects proves to be lethal to us.
The least likely of these hypotheses is the first, that we succeed in building AGI. Steps 2 and 3 are expected defaults with probability close to 100%. And as we keep rolling the dice with new technologies making new chemicals, the odds of step 4 also rise to 100%. (Our dropping testosterone levels suggest that no new technology is needed here - just more of what we're already doing.)
Look into the history of how the tobacco industry tried to suppress research on the harms of smoking, how the sugar industry tried to shift blame for health problems like obesity from sugars to fats, and how the fossil fuel industry has resisted attempts to take responsibility for global warming. In all three cases not only did industry resist evidence of harm, it also funded publicity campaigns to try to shift public opinion their way.
Given that I know of no reason to believe that this will change, I think that point 3 has high probability.
As for point 4, we have a history of spreading chemicals widely before discovering bad things about them. The first such chemical to gain notoriety was DDT, but many more have followed. In the last few decades, https://www.pnas.org/doi/10.1073/pnas.2023989118 shows that flying insect biomass dropped by 3/4. https://www.urologytimes.com/view/testosterone-levels-show-s... likewise shows that, even after controlling for known factors like increased obesity, there is an unexplained decline in testosterone of roughly 1/3. It is reasonable to guess that both are the result of environmental factors. But we are not sure what factors those are, and are not significantly modifying our behavior. (How could we, when we don't know for sure what we are doing to cause the problem?)
Given these examples, I truly believe we are rolling environmental dice with our health. And if we keep rolling the dice, eventually we'll come up snake eyes.
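To make the dice metaphor concrete (my own arithmetic; p is a made-up per-technology probability of a lethal environmental side effect):

    # If each new widely-deployed chemical has probability p of a lethal
    # side effect, the chance of at least one bad roll in n tries is:
    p = 0.01
    for n in [10, 100, 1000]:
        print(n, 1 - (1 - p) ** n)  # ~0.10, ~0.63, ~0.99996

Even a small per-roll risk compounds toward near-certainty as the number of rolls grows.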