Understanding Agent Cooperation (opens in new tab)

(deepmind.com)

126 pointspiokuc9y ago57 comments

57 comments

The AI can minimize loss / maximize fitness by either moving to look for additional resources, or fire a laser.

Turns out that when resources are scarce, the optimal move is to knock the opponent away. I think this tells us more about the problem space than the AI itself; it's just optimizing for the specific problem.

swalsh9y ago

I think you're right, had the AI (like the second game) had the incentive to maximize the well being of the cooperating actors, rather than itself, the outcome would be different.

But if advanced AI is being developed in a capitalist economy by independent actors, it seems most likely the incentives will be anything other than optimizing the output for the individual outcome.

If that AI finds a way to "hurt" the other actor, there could be some major boat load of unintended consequences.

laretluval9y ago

> I think this tells us more about the problem space than the AI itself; it's just optimizing for the specific problem.

My reading is that the point of this research is to find out what problem spaces are conducive to cooperation, not to find out details of how these particular agents work.

JamilD9y ago

Yeah — after reading this article and the paper, I agree. The previous link [0] on this submission was much more sensationalized and very misleading.

[0] http://www.sciencealert.com/google-s-new-ai-has-learned-to-b...

projektir9y ago

I'm rather worried about the wording used, and AI being created in that context. Do we really not realize what we're doing? AI is not magic, it's not free from fundamental math, it's not free from corruption. It's just going to multiply it that much more.

Any AI that has been programmed to highly value winning is not going to be very cooperative. For it to be cooperative, especially in situations that simulate survival, it needs to have higher ideals than winning, just like humans. It needs to be able to see and be aware of the big picture. You don't need to look at AI for that, you can just look at the world.

Development of AI's of this nature will just lead to a super-powered Moloch. Cooperative ethics is a highly advanced concept, it's not going to show up on its own from mere game theory without a lot of time.

philipov9y ago

Cooperative ethics arise immediately in the Prisoner's Dilemma merely by adding an unknown number of iterations to the game. The most efficient strategy is a version of tit-for-tat.

projektir9y ago

I'm assuming you're referring to something like this: https://egtheory.wordpress.com/2015/03/02/ipd/

I think we shouldn't confuse efficient strategies with the chosen strategies. What causes Moloch is the inability to see the big picture, to see outside of the self in the collective (maybe Buddhism has a point).

An efficient strategy may very well be something we'd prefer, such as tit-for-tat. But is that the strategy we choose? Looking at the long history of evolution, I'd say no.

2 more replies

jerf9y ago

Not entirely spawned by this article, but the whole genre and some other comments on HN by other users: I wonder if part of the "mystery" of cooperation in these simulations is that these people keep investigating the question of cooperation using simulations too simplistic to model any form of trade. A fundamental of economics 101 is that valuations for things differ for different agents. Trade ceases to exist in a world where everybody values everything exactly the same, because the only trade that makes any sense is to two trade two things of equal value, and even then, since the outcome is a wash and neither side obtains any value from it, why bother? I'm not sure the simulation hasn't been simplified to the point that the phenomena we're trying to use the simulation to explain are not capable of manifesting within the simulation.

I'm not saying that Trade Is The Answer. I would be somewhat surprised if it doesn't form some of the solution eventually, but that's not the argument I'm making today. The argument I'm making is that if the simulation can't simulate trade at all, that's a sign that it may have been too simplified to be useful. There are probably other things you could say that about; "communication" being another one. The only mechanism for communication being the result of iteration is questionable too, for instance. Obviously in the real world, most cooperation doesn't involve human speech, but a lot of ecology can be seen to involve communication, if for no other reason than you can't have the very popular strategy of "deception" if you don't have "communication" with which to deceive.

Which may also explain the in-my-opinion overpopular and excessively studied "Prisoner's Dilemma", since it has the convenient characteristic of explicitly writing communication out of it. I fear its popularity may blind us to the fact that it wasn't ever really meant to be the focus of study of social science, but more a simplified word problem for game theory. Studying a word problem over and over and over may be like trying to understand the real world of train transportation systems by repeatedly studying "A train leaves from Albuquerque headed towards Boston at 1pm on Tuesday and a train leaves from Boston headed towards Albuquerque at 3pm on Wednesday, when do they pass each other?" over and over again.

(Or to put it really simply in machine learning terms, what's the point of trying to study cooperation in systems whose bias does not encompass cooperation behaviors in the first place?)

titanomachy9y ago

Iterated prisoner's dilemma allows a sort of "communication". As the number of iterations grows, the cost of losing each individual round becomes negligible in the long run and agents can learn to use their decisions (COOPERATE or DEFECT) as a binary communication channel. So instead of saying "let's cooperate" over some side-channel, an agent indicates its intention to cooperate by simply cooperating.

In iterated prisoners dilemma and other similar games, the "API" with which agents interact with the world is extremely simple. The statement of the problem is also very simple. The agent itself can be any computable algorithm for deciding to cooperate or defect based on the past history of game rounds. I find it interesting to see agents learn recognizable behaviours like "communication" or "trade" when they aren't explicitly programmed to do those things.

cs7029y ago

The folks at DeepMind continue to produce clever original work at an astounding pace, with no signs of slowing down.

Whenever I think I've finally gotten a handle on the state-of-the-art in AI research, they come up with something new that looks really interesting.

They're now training deep-reinforcement-learning agents to co-evolve in increasingly more complex settings, to see if, how, and when the agents learn to cooperate (or not). Should they find that agents learn to behave in ways that, say, contradict widely accepted economic theory, this line of work could easily lead to a Nobel prize in Economics.

Very cool.

bitwize9y ago

Oh great.

It's just a matter of time before it floods the Enrichment Center with deadly neurotoxin.

vanderZwan9y ago

You know, rather than being scared by this, I think it's an excellent opportunity to learn how and when aggression evolves, and maybe learn how we can set up systems that nudge people to collaborate, perhaps even when resources are scarce.

katzgrau9y ago

The article at first suggests that more intelligent versions of AI led to greed and sabotage.

But I do wonder if an even more intelligent AI (perhaps in a more complex environment) would take the long view instead and find a reason to co-habitate.

It's kind of like rocks, paper scissors - when you attempt to think several levels deeper than your opponent and guess which level they stopped at. At some intelligence level for AI, cohabitation seems optimal - at the next level, not so much, and so on.

We're probably going to end up building something so complex that we don't quite understand it and end up hurting somebody.

jonbaer9y ago

Why is this done on such a small level? I would have thought that with systems now in place that evolutionary game theory could be done in simulations on such a much larger scale (say 7bn agents +) ... if anything AI systems should be able to determine if certain strategies work (like items like blocking resources - such as a case of geopolitical theory) so see what cooperations occur at that level. Still amazing work but it should be applied to a larger scale for real meaning. More eager to see how RL applied to RTS games will explore and develop strategies more than anything.

george_ciobanu9y ago

"Scarce resources cause competition" and "Scarce but close to impossible to catch on own resources cause cooperation". Is that really a discovery worth publishing?

tawpKek9y ago

>Self-interested people often work together to achieve great things. Why should this be the case, when it is in their best interest to just care about their own wellbeing and disregard that of others?

I think this is a kind of strong statement to take as a given, especially as an opening. This is taking social darwinism as law, and could use more scrutiny.

falsedan9y ago

Is it just me, or is this article extremely light on content? The core of it seems to be

  > sequential social dilemmas, and us[ing] artificial agents trained by deep multi-agent reinforcement learning to study [them]

But I didn't find out how to recognise a sequential social dilemma, nor their training method.

roymurdock9y ago

Here's the actual paper: https://storage.googleapis.com/deepmind-media/papers/multi-a...

Don't expect any crazy deep insights, but it's a useful read if you want to set up a similar experiment or understand the research methodology.

d--b9y ago

Mmmh, the problem of modeling social behavior is in defining the reward function, not in implementing optimal strategies to maximize the reward.

In a game where you are given the choice of killing 10,000 people or be killed yourself, which is the most rewarding outcome?

tmcpro9y ago

I wonder how Deepmind will simulate game theory as it advances

mtanski9y ago

I imagine you'll be able to get answer to more complex games. Things like combination of N players, multiple stable states, different optima for different players, external factors / stimuli. Answers by simulation rather then proof.

c3534l9y ago

I know what I'm writing my systems science paper on.

bencollier499y ago

What an awful headline. "AI learns to compete in competitive situations" should be the precis.

Basically, it learned that it didn't need to fight until there was resource scarcity in a simulation.

leereeves9y ago

I think aggressive is a better (more descriptive, narrower) word than compete here.

Two racers are competing to see who runs faster, but if one pulls out a laser gun and shoots the other, that's aggressive.

bencollier499y ago

It's a loaded word in the context of the headline, and the manner in which it was used in the story body, especially when combined with "stress" to create an image of a sort of edgy killing machine.

Actually, it's an interesting word. Dictionary definitions of aggression frequently revolve around emotions - it's a very human word, probably not suitable for AI.

itsdrewmiller9y ago

The game included the ability to shoot the other player with no consequence - it sounds aggressive because "laser beam" but if you called it "tagging" then it would more clearly just be an in-rules option.

1 more reply

jfoster9y ago

To the AI, there is nothing aggressive about the "laser gun"; as far as it was aware, the "laser gun" could be any tool. It was just doing what it had determined may help it achieve a better score.

1 more reply

tutts9y ago

Not if laser guns are a part of the race. Then it's just competitive.

2 more replies

tdkl9y ago

Progressive left double speak, nothing new.

dang9y ago

Please don't post unsubstantive comments, and especially don't do partisan name-calling here. It destroys the kind of discussion HN exists for.

Edit: unfortunately you've been doing this a lot. We ban accounts that do this, so please stop.

saycheese9y ago

This reads as click-bait, here's the original blog post and research paper by DeepMind:

"Understanding Agent Cooperation" https://news.ycombinator.com/edit?id=13635218

dang9y ago

Thanks! We changed to that from http://www.sciencealert.com/google-s-new-ai-has-learned-to-b....

doener9y ago

https://news.ycombinator.com/item?id=13620518

creo9y ago

Bait.

j / k navigate · click thread line to collapse

57 comments

JamilD9y ago

The AI can minimize loss / maximize fitness by either moving to look for additional resources, or fire a laser.

swalsh9y ago

I think you're right, had the AI (like the second game) had the incentive to maximize the well being of the cooperating actors, rather than itself, the outcome would be different.

But if advanced AI is being developed in a capitalist economy by independent actors, it seems most likely the incentives will be anything other than optimizing the output for the individual outcome.

If that AI finds a way to "hurt" the other actor, there could be some major boat load of unintended consequences.

laretluval9y ago

> I think this tells us more about the problem space than the AI itself; it's just optimizing for the specific problem.

My reading is that the point of this research is to find out what problem spaces are conducive to cooperation, not to find out details of how these particular agents work.

JamilD9y ago

Yeah — after reading this article and the paper, I agree. The previous link [0] on this submission was much more sensationalized and very misleading.

[0] http://www.sciencealert.com/google-s-new-ai-has-learned-to-b...

projektir9y ago

philipov9y ago

Cooperative ethics arise immediately in the Prisoner's Dilemma merely by adding an unknown number of iterations to the game. The most efficient strategy is a version of tit-for-tat.

projektir9y ago

I'm assuming you're referring to something like this: https://egtheory.wordpress.com/2015/03/02/ipd/

An efficient strategy may very well be something we'd prefer, such as tit-for-tat. But is that the strategy we choose? Looking at the long history of evolution, I'd say no.

2 more replies

jerf9y ago

(Or to put it really simply in machine learning terms, what's the point of trying to study cooperation in systems whose bias does not encompass cooperation behaviors in the first place?)

titanomachy9y ago

cs7029y ago

The folks at DeepMind continue to produce clever original work at an astounding pace, with no signs of slowing down.

Whenever I think I've finally gotten a handle on the state-of-the-art in AI research, they come up with something new that looks really interesting.

Very cool.

bitwize9y ago

Oh great.

It's just a matter of time before it floods the Enrichment Center with deadly neurotoxin.

vanderZwan9y ago

katzgrau9y ago

The article at first suggests that more intelligent versions of AI led to greed and sabotage.

But I do wonder if an even more intelligent AI (perhaps in a more complex environment) would take the long view instead and find a reason to co-habitate.

We're probably going to end up building something so complex that we don't quite understand it and end up hurting somebody.

jonbaer9y ago

george_ciobanu9y ago

"Scarce resources cause competition" and "Scarce but close to impossible to catch on own resources cause cooperation". Is that really a discovery worth publishing?

tawpKek9y ago

I think this is a kind of strong statement to take as a given, especially as an opening. This is taking social darwinism as law, and could use more scrutiny.

falsedan9y ago

Is it just me, or is this article extremely light on content? The core of it seems to be

  > sequential social dilemmas, and us[ing] artificial agents trained by deep multi-agent reinforcement learning to study [them]

But I didn't find out how to recognise a sequential social dilemma, nor their training method.

roymurdock9y ago

Here's the actual paper: https://storage.googleapis.com/deepmind-media/papers/multi-a...

Don't expect any crazy deep insights, but it's a useful read if you want to set up a similar experiment or understand the research methodology.

d--b9y ago

Mmmh, the problem of modeling social behavior is in defining the reward function, not in implementing optimal strategies to maximize the reward.

In a game where you are given the choice of killing 10,000 people or be killed yourself, which is the most rewarding outcome?

tmcpro9y ago

I wonder how Deepmind will simulate game theory as it advances

mtanski9y ago

c3534l9y ago

I know what I'm writing my systems science paper on.

bencollier499y ago

What an awful headline. "AI learns to compete in competitive situations" should be the precis.

Basically, it learned that it didn't need to fight until there was resource scarcity in a simulation.

leereeves9y ago

I think aggressive is a better (more descriptive, narrower) word than compete here.

Two racers are competing to see who runs faster, but if one pulls out a laser gun and shoots the other, that's aggressive.

bencollier499y ago

It's a loaded word in the context of the headline, and the manner in which it was used in the story body, especially when combined with "stress" to create an image of a sort of edgy killing machine.

Actually, it's an interesting word. Dictionary definitions of aggression frequently revolve around emotions - it's a very human word, probably not suitable for AI.

itsdrewmiller9y ago

1 more reply

jfoster9y ago

To the AI, there is nothing aggressive about the "laser gun"; as far as it was aware, the "laser gun" could be any tool. It was just doing what it had determined may help it achieve a better score.

1 more reply

tutts9y ago

Not if laser guns are a part of the race. Then it's just competitive.

2 more replies

tdkl9y ago

Progressive left double speak, nothing new.

dang9y ago

Please don't post unsubstantive comments, and especially don't do partisan name-calling here. It destroys the kind of discussion HN exists for.

Edit: unfortunately you've been doing this a lot. We ban accounts that do this, so please stop.

saycheese9y ago

This reads as click-bait, here's the original blog post and research paper by DeepMind:

"Understanding Agent Cooperation" https://news.ycombinator.com/edit?id=13635218