The paperclip optimizer could be called a "bug" I guess. But the comparison to Tesla autopilot is interesting. First, is anyone calling that an AI? Second, when an objective baseline comparison to human drivers is possible, shouldn't that determine whether the AI is a net benefit or loss? And then (assuming a benefit) we can say with some degree of confidence that the AI is not malicious?
Humanity doesn't hate polar bears. We just love burning oil more than we love them. Polar bears die as an unintended side effect. We might even be sad about that. We might even keep a few alive in zoos. But we won't change our whole economy to save a few cute bears.
Of course these kinds of threats will only start to appear once the AI is smart enough. The main problem with AGI is that we probably won't know until it's too late, because of how fast this technology can develop once it can improve itself.
Can I ask you another question - what do you think will happen?
1. it won't get smarter than us
2. it will care about us
3. we will somehow keep it in check despite the fact that it's smarter than us
Because it seems to me that most people still intuitively think (1) because it's "too sci-fi". And even if they become persuaded that (1) is no longer certain, they don't update the rest of their beliefs with that new information. They still believe AI is safe, like they did when (1) was assumed true, because they haven't updated the cache - or maybe they don't even realize there's a dependency somewhere in their train of thought that needs updating.
This is what living in a world undergoing a singularity will be like, BTW - you can't think a thought through to the end without realizing the assumptions might have changed since the last time you thought about it. So you go down a level and realize the assumptions there are also changing. And so on.
Sure. And TSLA unveils their beta FSD during their AI Day.
Brian Christian's book "The Alignment Problem" contains many examples of harms from systems that can plausibly be seen as precursors to this technology.
A software bug is generally what we call the situation where software does exactly what its creator asked it to do, but that thing is not what its creator actually intended or wanted. The creator did not correctly express their request, and/or did not properly think through all the effects of carrying it out.
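To make that concrete, here's a toy example (mine, not from the thread) where the code does exactly what we asked for, just not what we meant:

```python
prices = [5, 100, 20]

# Intent: "sort the prices". What we literally asked for: sort their
# string representations, so lexicographic order wins.
as_strings = sorted(str(p) for p in prices)
print(as_strings)   # ['100', '20', '5'] - not the order anyone wanted

# What we actually meant:
as_numbers = sorted(prices)
print(as_numbers)   # [5, 20, 100]
```

The computer faithfully executed the request; the bug lives entirely in the gap between the request and the intent.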
Think about people giving each other instructions and making rules for each other as we normally do day to day in English. English is not a very precise language for expressing what we actually want to happen, and also humans are not very good at rigorously specifying what they want to happen, relying instead on assumed implicit shared understanding; these assumptions lead to much misery in human/human interactions, never mind human/computer. Worse, humans are not very good at actually knowing either what they want the world to be like or how to make that happen. With the best of intentions, we make rules and set policies, intending that the world become better for it, and for every instruction we give, rule we set, policy we make, we invariably end up with some unintended consequences. This is the human condition: every day, we try to make the world a little better, but in the end things turn out like they always do. General human communication relies on shared values, but our values are not actually universally shared, we don't know how to even begin rigorously expressing our values, and much of the time we can't agree on what they are or even properly explain our own values to ourselves - all those fuzzy open questions in philosophy arise from this.
When we interact with a general AI, we are programming a computer system, in English. On top of all the usual problems, the computer system lacks our shared understanding, because not only do we not know how to impart it, we don't even really know what to impart. It is not aligned to human values. It can't be: we can't even achieve alignment with each other, never mind a lump of silicon. So miscommunication is inevitable. Worse, the recent direction in AI has been to throw away any attempt to actually express what behaviours we want explicitly, and instead just throw the entire contents of the internet at a giant statistical model and hope the correlations it makes are somehow useful to us. The honestly surprising thing is that this even works to any extent at all. But a few minutes' interaction quickly assures us that the resulting systems react quite unpredictably to our input.
We will ask for things, the AI will do exactly what we asked for, and we will find that what we literally asked for is not actually what we want: the system that is the combination of our request with the AI will contain bugs.
The amount of resulting harm is determined by what it is the software is controlling and how much time we have to react to the unintended consequences. The AI doomer claim is that, unless we do better at the alignment problem, as our software gets faster and we give it control over more stuff, the inevitable bugs will cause bad things to happen faster than we can react to prevent harm, and the consequences will be worse than we can tolerate; worse, as we link everything together, we might not even realise we are working indirectly with safety critical systems until they do whatever it is we told them to but did not mean.
The solution should be obvious: don't put software in control of devices that interact with the real world in ways that could cause serious harm without first rigorously proving that no combination of inputs can result in behaviours that make the situation worse instead of better. Traditional engineering sounds expensive, hard and tedious, and it is, and it is not very shiny or sexy, but we can do it, and do do it in situations where serious harm will otherwise result, like aviation or (most of) the automotive industry. Include the fuzzy ill-conditioned statistics software, by all means, but don't wire it directly to the controls - make it an input to a traditionally engineered and well understood system, treated like any other noisy and potentially broken input, with the system as a whole rigorously designed to produce safe outputs when it can and to safely shut down when it cannot.
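A minimal sketch of that pattern, with hypothetical names and limits invented for illustration: the statistical model's output is treated like any other noisy, possibly broken sensor, and a conventionally engineered layer enforces hard limits and fails safe when the input can't be trusted.

```python
import math

SPEED_LIMIT = 30.0  # hard limit from traditional engineering, set independently of the model

def safe_speed_command(suggestion):
    """Clamp a fuzzy model's suggested speed to a proven-safe range.

    The model output is treated as a noisy, potentially broken input:
    anything unusable results in a safe shutdown (speed 0), never in
    passing garbage through to the actuators.
    """
    if not isinstance(suggestion, (int, float)) or isinstance(suggestion, bool):
        return 0.0                                   # unusable input: fail safe
    if math.isnan(suggestion) or math.isinf(suggestion):
        return 0.0                                   # garbage value: fail safe
    return min(max(float(suggestion), 0.0), SPEED_LIMIT)  # clamp to the safe range

print(safe_speed_command(25.0))          # 25.0 (sensible suggestion passes through)
print(safe_speed_command(500.0))         # 30.0 (clamped to the hard limit)
print(safe_speed_command(float("nan")))  # 0.0  (safe shutdown)
```

The point of the design is that no output of the fuzzy component, however wrong, can push the system outside the envelope the engineered layer was verified for.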
Surely the AI doomers are overstating the risk of doom - surely no-one working in safety critical systems would do things any other way? "Has there ever been a case of harm from AI misalignment?" - this is the real thrust of that question. What sort of idiot would wire an unintended consequence generator directly to anything that might harm or hurt?
Tesla autopilot is interesting precisely as an ongoing real-world example of the new-style fuzzy black-box tech being wired directly to several tons of trundling metal, with harm to property and life resulting from unintended behaviours. And instead of putting things on hold pending fault analysis and a more rigorous approach, we just double down on throwing more data at the fuzzy black box and hope that our inability to reproduce the last bug with a few quick tests means it's gone away.
I agree with what you're saying in your post, but I want to further refine the part of your statement 'in English'....
LLMs are far beyond that... We're not just interacting in English, we're interacting in all the languages the model was trained on. The key word here is language, because for the average human this means the primary language they grew up with. But many people who are bilingual realize that the space of linguistic possibility is far greater than any single language suggests: some languages contain concepts that don't exist in others. Now extend that even further. Programming languages are not much different from human spoken languages, just more formalized. Mathematics is itself a formal language.
And it can extend even further than that... That 802.11 signal in the air is a language, along with all of our other wireless signals. Yes, people have used deep learning to decode wireless.
I brought all this up because, as models become more multi-modal, we humans are going to get stuck thinking that what we say/type to the AI is all it's interpreting, when the actual model may be working on a far larger and richer dataset than we give it credit for. As you said above, this would cause us to misjudge the capabilities of the model, likely significantly underestimating its abilities in places where humans are incapable without additional tooling.
You don't have to look far to find people doing exactly that. AutoGPT, ChaosGPT (!), a lot of random people copying its output into a Python prompt.
"It's not safety-critical", you might say, but that's only because the AI isn't smart enough yet. A human-level AI could easily do a lot of damage this way, and I don't think we'll learn our lesson until it's already happened. Here's hoping it happens before we get superhuman AI!