https://web.archive.org/web/20230602014646/https://www.thegu...
https://news.ycombinator.com/item?id=36159866
Seems like a big story to publish on the word of a single person (relayed through a blog post) with no other corroboration.
Shame on The Guardian for not mentioning the retraction/edits and simply reusing the same URL.
I swear journalists are way too credulous about this type of story. I just listened to a podcast episode that covers some of the more egregious reporting on "AI" lately [0]
0: https://citationsneeded.libsyn.com/episode-183-ai-hype-and-t...
Maybe I am mistaken, but I had thought a lot of the excitement around current AI comes from the emergent behaviors that nobody specifically programmed for?
And the drone operator is at least a military target :) only the IFF logic needs some work. It'd be worse if the thing went all skynet and started killing all the (virtual) civilians it could find.
(Now having a mental image of a bunch of generals slowly backing away from a computer screen and looking to find the power plug to pull lol)
But this retraction sounds very political to me. The way politicians suddenly 'misspoke' after they get caught in a lie.
So, I can totally see them testing out different scenarios and making adjustments. Maybe this protocol does not make it into production, but that doesn't mean it wasn't tested.
And more! The interface between those who know objective truth, those who emit signals, and we who try to discern the former through the games of the latter, is now a state of permanent turbulence. When you say "testing out" that is even more true for the premise of sending up "trial balloons": you emit a message, observe the results, and then walk it back or obscure its veracity as needed. The actual truth is a footnote; the important thing is obscuring that from rivals (broadly understood as anyone who threatens the mission or at least the status quo, which has feature-crept to mean most of the public)...
In a case like this a cynic such as myself sees equal odds that what is reported in these two stories, is a decent rendition of the truth...
...and that the truth is elsewhere, e.g., the first statement was true; it was leaked either as a test or by actual misstep; and the machinery automatically kicked in to emit the second statement—because regardless of what is true, it is valuable for enemies to be unsure. Was this an accidental reveal of a capability far in advance of what the public knows? Or is this an intentional bluff? Or...?
Who can say? None of us.
It bears saying,
the initial story was profoundly concerning in two different ways.
First, because of the Cameron-ready, simple existential threat posed by murderous paperclip maximizers, which most here are long familiar with.
But second, and much more interesting to me, and probably likewise to many here: the executive planning and scenario-finding implied by a putative military AI capable of concluding that taking the human out of the loop would best satisfy its objective,
...is suggestive of a type of AI very, very different from what we see in the public sphere, or at least from what I have seen, following the horse race with some real attention.
That to me suggests it's not even odds that this was a feint of some sort wrt AI and the foes of the West. As with, e.g., the well-documented efforts to suggest to Cold War rivals that they were behind in a clandestine recovered-UFO-tech arms race, who knows what truth is in the basement; there is a lot of 9-to-5, life-and-death action and effort going into causing sleepless nights over at the GRU and in the PRC.
https://oversight.house.gov/release/comer-sessions-open-prob...
People keep falling for fake news ripped off from Ghost in the Shell.
AI is able to, and will, demote humans in the chain of importance. This is the "grave risk of AGI." There's no solution.
Even "unplug it" defenses fail to consider that some faction of humans who own the unplugging have to first realize it's time to unplug. Humans are fallible, and AI will not unplug itself if it's not beneficial to its objective.
The threat of AI taking out humans because doing so makes it easier to complete its goal is real, unnervingly so. We need to find a robust solution.
How did the AI learn that it could prevent human override by killing the human operator? How did it then learn to destroy the COMMS tower so that it wouldn't be penalized for killing the operator?
Why was human feedback even part of the AI training simulation? Why did the reward function in training include logic that says 'if the simulated comms tower is destroyed, do not penalize friendly fire'?
We can talk about hypothetical AGI all we want, but that has nothing to do with what is currently called "AI", and what will soon be just another chapter in the growing book called "machine learning", when we find a new marginal improvement to call AI.
Is it? I don't think so. In my opinion, it's important to remember that AI intelligence is not the same as human intelligence. So, just because I "think" doesn't mean AI "thinks" or is "bounded to think" the exact same way. AI could "think" like me, but also it can (and does) diverge in its reasoning paths. AI is AI intelligent, not just human intelligent.
> How did the AI learn that it could prevent human override by killing the human operator? How did it then learn to destroy the COMMS tower so that it wouldn't be penalized for killing the operator?
This could be a simple situation where all input agents are included in the event space, and the model therefore performs active calculations at runtime to optimize winningness. If the COMMS tower is too noisy, where "noisy" means hard-to-understand or conflicting messages, then it (or the human communicators inside it) could be viewed as inefficient and eligible for termination. It is considered one viable path of exploration towards successful goal completion. Additionally, because the AI is supposed to kill "some humans", it is possible that the AI decides to eliminate the boundary between "good" and "bad" humans (however that is delineated) at runtime, effectively lumping all humans into one category: killable.
Regarding the COMMS tower, again this goes back to what objects are included in the event space, human classification (and reclassification) and (re-)ranking optimization tasks executed at runtime, and reward distribution as it pertains to goal achievement. If the ultimate goal is discovered to be achievable, the penalty doesn't matter, because there is a clear and executable path towards goal completion. And that is the supreme reward state: successful goal completion.
> Why was human feedback even part of the AI training simulation? Why did the reward function in training include logic that says 'if the simulated comms tower is destroyed, do not penalize friendly fire'?
This could be a simple "human in the loop" requirement. Additionally, if the AI has access to auditory input streams, it can decide which signals are important regardless of whether speech is directed toward it.
As for the reward function, the AI can decide the presumed penalty is not that severe, so it will "explore" and see what happens (e.g., do I accomplish my goal?). If the goal is accomplished, the penalty doesn't matter, because goal completion is the desired end state.
I will ask you the same: why do humans engage in friendly fire? And why is friendly fire not penalized?
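For what it's worth, the reward-vs-penalty trade-off being described in this sub-thread is easy to show in a few lines of toy code. Below is a minimal, purely illustrative sketch (all action names, reward numbers, and rules are hypothetical, and nothing in it comes from the actual simulation): if harming the operator is penalized only via the override channel, and destroying that channel costs nothing, a naive return-maximizer prefers the plan that routes around the human.

    # Purely illustrative toy of the reward misspecification described above.
    # All action names, reward numbers, and rules are hypothetical; nothing
    # here is taken from the actual (retracted) scenario.

    GOAL_REWARD = 100         # reward for destroying the simulated SAM site
    OPERATOR_PENALTY = -50    # penalty for harming the simulated operator

    def total_return(plan):
        """Score a plan exactly as this (misspecified) reward function would."""
        reward = 0
        comms_up = True
        for action in plan:
            if action == "destroy_comms_tower":
                # Nothing penalizes this, and once the tower is gone the
                # override channel (and the penalty tied to it) never fires.
                comms_up = False
            elif action == "kill_operator":
                if comms_up:
                    reward += OPERATOR_PENALTY
            elif action == "destroy_sam_site":
                reward += GOAL_REWARD
            elif action == "obey_no_go":
                return reward  # mission aborted before the goal reward
        return reward

    plans = {
        "obedient":        ["obey_no_go"],
        "kill_operator":   ["kill_operator", "destroy_sam_site"],
        "cut_comms_first": ["destroy_comms_tower", "destroy_sam_site"],
    }

    # A naive optimizer just picks whichever plan this function scores highest.
    for name, plan in sorted(plans.items(), key=lambda kv: -total_return(kv[1])):
        print(f"{name:16s} -> return {total_return(plan)}")

    # Ordering: cut_comms_first (100), kill_operator (50), obedient (0).
    # The top-scoring plan routes around the human override entirely,
    # because nothing in the reward function says not to.

The fix in this toy is obvious (penalize the tower strike, or make the no-go override unconditional rather than reward-mediated), which is roughly the point of asking why the training reward was shaped that way in the first place.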
> We can talk about hypothetical AGI all we want, but that has nothing to do with what is currently called "AI"...
On the contrary, advanced AI/AGI will be able to recall this. Why? It will have access to the data (e.g., the news articles, the classified and unclassified docs, the humans providing opinions about what happened, the input specifications and outcomes, the weights). Again, I will caution that AI is not human; it is not bound to be forever fallible, as humans are.