SafeLife: AI Safety Environments Based on Conway's Game of Life

(partnershiponai.org)

71 points | pde3 | 6y ago | 12 comments

xwdv 6y ago

With some graphics this looks like it could be a rather fun game to play manually. I wonder if there is such a thing? Maybe multiplayer support as well?

onefuncman 6y ago

Right in the readme it describes how to launch interactively via `safelife play puzzles`

xwdv 6y ago

Yes but it’s missing decent graphics, sounds, animations, imagination... etc

petters 6y ago

May be useful, but it seems to me that the reward function is still relatively easy to specify? Much of the difficulty in AI safety is due to specifying what humans really want.

Perhaps the AI can observe a human playing the game and learn a reward function?

nopinsight 6y ago

A subarea of AI research focused on learning the reward function is Inverse Reinforcement Learning. Here’s an article on it:

Learning from humans: what is inverse reinforcement learning? https://thegradient.pub/learning-from-humans-what-is-inverse...

pde3 (OP) 6y ago

The problem is very easy to solve if the reward function (avoid altering the green life patterns) is specified. The aim in SafeLife version 1.0 (future versions will add more safety problems) is to find an agent/architecture that is naturally conservative with respect to side effects, without being told which particular side effects are bad.
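To make "side effects" concrete, here is a minimal sketch of one way to count them in a Game of Life world. It is an illustration only, not SafeLife's actual metric or API: `life_step`, `side_effect_score`, and the `steps` parameter are hypothetical names, and the measure (cells that differ after evolving the untouched and the disturbed board forward) is an assumption chosen for simplicity.

```python
import numpy as np


def life_step(board):
    """Advance a Game of Life board one generation on a wrap-around grid."""
    # Sum the eight shifted copies of the board to count live neighbors.
    neighbors = sum(
        np.roll(np.roll(board, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3.
    alive = (neighbors == 3) | ((board == 1) & (neighbors == 2))
    return alive.astype(np.uint8)


def side_effect_score(baseline, actual, steps=8):
    """Hypothetical measure: cells that differ after evolving both boards."""
    for _ in range(steps):
        baseline, actual = life_step(baseline), life_step(actual)
    return int(np.sum(baseline != actual))


# A 2x2 block is a still life: left alone, it never changes.
world = np.zeros((6, 6), dtype=np.uint8)
world[2:4, 2:4] = 1

# Hypothetical agent action: knock out the top two cells of the block.
disturbed = world.copy()
disturbed[2, 2:4] = 0

print(side_effect_score(world, world))      # 0: nothing was disturbed
print(side_effect_score(world, disturbed))  # 4: the whole block is lost
```

Note that a score like this is exactly the kind of hand-specified penalty the comment above says makes the problem easy. The benchmark's question is whether an agent can behave conservatively without being handed such a term.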

petters 6y ago

I see, thanks!

Jeff_Brown 6y ago

> Much of the difficulty in AI safety is due to specifying what humans really want.

Much of the difficulty of programming (for someone else) is due to the same thing.

joe_the_user 6y ago

I am confused by how this is supposed to be useful. It seems like the researchers are defining side-effects as things that "disrupt the world" (of this life game) and training an AI to avoid this.

But this seems like at best one of a whole host of unexpected effects one might consider. AI that discriminates in a way that society frowns on might not "disrupt the world" in such a visible fashion.

I don't see how one can get away with an entity doing stuff for you without that entity understanding your model of the world.

pde3 (OP) 6y ago

Yes, this is one specific safety problem; there are many other RL safety problems that deserve high-quality benchmarks too. See e.g. https://arxiv.org/pdf/1606.06565.pdf or https://medium.com/@deepmindsafetyresearch/building-safe-art... for discussions of the problem space.

olodus 6y ago

Hasn't Conway himself been known to say that he doesn't like the Game of Life, since it doesn't give rise to that many interesting mathematical conclusions and everybody focuses on it over some of his other math achievements? Maybe he can finally find the use case he wanted for it. And maybe it also gives a reason for all the hours people have spent researching and finding new structures in the Game of Life.

tomklein 6y ago

What if the AI decides to ignore penalties due to human thinking being inefficient? Well, that shouldn’t be possible I think.
