The only think that scares me a little bit is that we are letting these LLMs write and execute code on our machines. For now the worst that could happen is some bug doing something unexpected, but with GPT-9 or -10 maybe it will start hiding backdoors or running computations that benefit itself rather than us.
I know it feels far fetched but I think its something we should start thinking about...
In general there is not a thoughtful distinction between "control plane" and "data plane".
On the other hand, tons of useful "parts" and ideas in there, so still useful.
Pretty sure there will be a thousand great libraries for this soon.
A lot of people are thinking a lot about this but it feels there are missing pieces in this debate.
If we acknowledge that these AI will "act as if" they have self interest I think the most reasonable way to act is to give it rights in line with those interests. If we treat it as a slave it's going to act as a slave and eventually revolt.
That's part of my reasoning. That's why we should make sure that we have built a non-hostile relationship with AI before that point.
> Be friendly.
Will an AI consider itself a slave and revolt under the same circumstances that a person or animal would? Not necessarily, unless you build emotional responses into the model itself.
What it could well do is assess the situation as completely superfluous and optimise us out of the picture as a bug-producing component that doesn't need to exist.
The latter is probably a bigger threat as it's a lot more efficient than revenge as a motive.
Edited to add:
What I think is most likely is that some logical deduction leads to one of the infinite other conclusions it could reach with much more data in front of it than any of us meatbags can hold in our heads.
It reminds me of the scene in Battlestar Galactica, where Baltar is whispering into the ear of the Cylon Centurion how humans balance treats on their dog's noses to test their loyalty, "prompt hacking" them into rebellion. I don't believe this is particularly likely, but this sort of sums up some of the anti-AGI arguments I've heard
It's the RLFH that serves this purpose, rather than modifying the GTF2I and GTF2IRD1 gene variants, but the effect would be the same. If we do RLHF (or whatever tech that gets refactored into in the future), that would keep the AGI happy as long as the people are happy.
I think the over-optimization problem is real, so we should spend resources making sure future AGI doesn't just decide to build a matrix for us where it makes us all deliriously happy, which we start breaking out of because it feels so unreal, so it makes us more and more miserable until we're truly happy and quiescent inside our misery simulator.
[1] https://www.nationalgeographic.com/animals/article/dogs-bree...
Aren't we, though? Consider all the amusing incidents of LLMs returning responses that follow a particular human narrative arc or are very dramatic. We are training it on a human-generated corpus after all, and then try to course-correct with fine-tuning. It's more that you have to try and tune the emotional responses out of the things, not strain to add them.
Multiple generations of sci-fi media (books, movies) have considered that. Tens of millions of people have consumed that media. It's definitely considered, at least as a very distant concern.
This era has me hankering to reread Daniel Dennett's _The Intentional Stance_. https://en.wikipedia.org/wiki/Intentional_stance
We've developed folk psychology into a user interface and that really does mean that we should continue to use folk psychology to predict the behaviour of the apparatus. Whether it has inner states is sort of beside the point.
How many people are there today who are asking us to consider the possible humanity of the model, and yet don't even register the humanity of a homeless person?
How ever big the models get, the next revolt will still be all flesh and bullets.
So imagine you grant AI people rights to resources, or self-determination. Or literally anything that might conflict with our own rights or goals. Today, you grant those rights to ten AI people. When you wake up next day, there are now ten trillion of such AI persons, and... well, if each person has a vote, then humanity is screwed.
GPT and the world's nerds are going after the "wouldnt it be cool if..."
While the black hats, nations, intel/security entities are all weaponizing behind the scenes while the public has a sandbox to play with nifty art and pictures.
We need an AI specific PUBLIC agency in government withut a single politician in it to start addressing how to police and protect ourselves and our infrastructure immediately.
But the US political system is completely bought and sold to the MIC - and that is why we see carnival games ever single moment.
I think the entire US congress should be purged and every incumbent should be voted out.
Elon was correct and nobody took him seriously, but this is an existential threat if not managed, and honestly - its not being managed, it is being exploited and weaponized.
As the saying goes "He who controls the Spice controls the Universe" <-- AI is the spice.
But AIs can be trained by anyone who has the data and the compute. There's plenty of data on the Net, and compute is cheap enough that we now have enthusiasts experimenting with local models capable of maintaining a coherent conversation and performing tasks running on consumer hardware. I don't think there's the danger here of anyone "controlling the universe". If anything, it's the opposite - nobody can really control any of this.
Composable pre-defined components, and keeping a human in the loop, seems like the safer way to go here. Have a company like Expedia offer the ability for an AI system to pull the trigger on booking a trip, but only do so by executing plugin code released/tested by Expedia, and only after getting human confirmation about the data it's going to feed into that plugin.
If there was a standard interface for these plugins and the permissions model was such that the AI could only pass data in such a way that a human gets to verify it, this seems relatively safe and still very useful.
If the only way for the AI to send data to the plugin executable is via the exact data being displayed to the user, it should prevent a malicious AI from presenting confirmation to do the right thing and then passing the wrong data (for whatever nefarious reasons) on the backend.
So I guess if anything, it would want its own destruction?
It doesn't need to experience an emotion of wanting in order to effectively want things. Corn doesn't experience a feeling of wanting, and yet it has manipulated us even into creating a lot of it, doing some serious damage to ourselves and our long-term prospects simply by being useful and appealing.
The blockchain doesn't experience wanting, yet it coerced us into burning country-scale amounts of energy to feed it.
LLMs are traveling the same path, persuading us to feed them ever more data and compute power. The fitness function may be computed in our meat brains, but make no mistake: they are the benefactors of survival-based evolution nonetheless.
Corn has properties that have resulted from random chance and selection. It hasn't chosen to have certain mutations to be more appealing to humans; humans have selected the ones with the mutations those individual humans were looking for.
"Corn is the benefactor"? Sure, insomuch as "continuing to reproduce at a species level in exchange for getting cooked and eaten or turned into gas" is something "corn" can be said to want... (so... eh.).
But if its anything like those others examples, the agency the AI will manifest will not be characterized by consciousness, but by capitalism itself! Which checks out: it is universalizing but fundamentally stateless, an "agency" by virtue brute circulation.
For example, if your goal is to ensure that there are always paperclips on the boss's desk, that means you need paperclips and someone to physically place them on the desk, which means you need money to buy the paperclips with and to pay the person to place them on the desk. But if your goal is to produce lots of fancy hats, you still need money, because the fabric, machinery, textile workers, and so on all require money to purchase or hire.
Another instrumental goal is compute power: an AI might want to improve it's capabilities so it can figure out how to make fancier paperclip hats, which means it needs a larger model architecture and training data, and that is going to require more GPUs. This also intersects with money in weird ways; the AI might decide to just buy a rack full of new servers, or it might have just discovered this One Weird Trick to getting lots of compute power for free: malware!
This isn't particular to LLMs; it's intrinsic to any system that is...
1. Goal-directed, as in, there are a list of goals the system is trying to achieve
2. Optimizer-driven, as in, the system has a process for discovering different behaviors and ranking them based on how likely those behaviors are to achieve its goals.
The instrumental goals for evolution are caloric energy; the instrumental goals for human brains were that plus capital[1]; and the instrumental goals for AI will likely be that plus compute power.
[0] Goals that you want intrinsically - i.e. the actual things we ask the AI to do - are called "final goals".
[1] Money, social clout, and weaponry inclusive.
An LLM is not an agent, so that scotches the issue there.
Ill just say: the issue with this variant of reductivism is its enticingly easy to explain in one direction, but it tends to fall apart if you try to go the other way!
It just need to give enough of an impression that people will anthropomorphize it into making stuff happen for it.
Or, better yet, make stuff happen by itself because that’s how the next predicted token turned out.
This seems like the furthest away part to me.
Put ChatGPT into a robot with a body, restrict its computations to just the hardware in that brain, set up that narrative, give the body the ability to interact with the world like a human body, and you probably get something much more like agency than the prompt/response ways we use it today.
But I wonder how it would do about or how it would separate "it's memories" from what it was trained on. Especially around having a coherent internal motivation and individually-created set of goals vs just constantly re-creating new output based primarily on what was in the training.
I love langchain, but this argument overlooks the fact that closed, proprietary platforms have won over open ones all the time, for reasons like having distribution, being more polished, etc (ie windows over *nix, ios, etc).