In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
"What are you doing?", asked Minsky.
"I am training a randomly wired neural net to play Tic-Tac-Toe," Sussman replied.
"Why is the net wired randomly?", asked Minsky.
"I do not want it to have any preconceptions of how to play", Sussman said.
Minsky then shut his eyes.
"Why do you close your eyes?", Sussman asked his teacher.
"So that the room will be empty."
At that moment, Sussman was enlightened.
According to that link, it's based on a true story:
So Sussman began working on a program. Not long after,
this odd-looking bald guy came over. Sussman figured the
guy was going to boot him out, but instead the man sat
down, asking, "Hey, what are you doing?" Sussman talked
over his program with the man, Marvin Minsky. At one point
in the discussion, Sussman told Minsky that he was using a
certain randomizing technique in his program because he
didn't want the machine to have any preconceived notions.
Minsky said, "Well, it has them, it's just that you don't
know what they are." It was the most profound thing Gerry
Sussman had ever heard. And Minsky continued, telling him
that the world is built a certain way, and the most
important thing we can do with the world is avoid
randomness, and figure out ways by which things can be
planned. Wisdom like this has its effect on
seventeen-year-old freshmen, and from then on Sussman was
hooked.

Convolutional networks, deep learning, and machine learning in general are arguably the most disruptive technologies of our lifetimes. BTW, pardon the shameless plug, but I just started a new blog this morning to host ML resources and my own experiments: http://blog.cognition.tech/?view=classic
Literally any reasonable learning algorithm, even nearest neighbor, will learn that fairly quickly. The article had to use millions of samples, which is extremely excessive; a more efficient algorithm should need only thousands.
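To make that sample-efficiency claim concrete, here is a small sketch (the `life_rule` function is my own stand-in for the article's ground-truth update) of a 1-nearest-neighbor learner, which over 9-bit inputs degenerates into exact memorization of the 512 possible neighborhoods:

```python
import random
from itertools import product

def life_rule(neigh):
    # Ground-truth Game of Life update: 9 binary cells, center at index 4
    s = sum(neigh) - neigh[4]
    return 1 if s == 3 or (neigh[4] == 1 and s == 2) else 0

random.seed(0)

# 1-NN over exact 9-bit inputs is just a memo table: there are only
# 2^9 = 512 distinct neighborhoods to ever see.
memory = {}
for _ in range(5000):  # thousands of samples, not millions
    x = tuple(random.randint(0, 1) for _ in range(9))
    memory[x] = life_rule(x)

# Evaluate on every possible neighborhood, predicting 0 for unseen ones.
correct = sum(memory.get(x, 0) == life_rule(x)
              for x in product((0, 1), repeat=9))
print(f"seen {len(memory)}/512 neighborhoods, accuracy {correct}/512")
```

A few thousand random samples already cover essentially all 512 cases (coupon-collector: roughly 512 ln 512 ≈ 3200 draws).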
It would be interesting to see whether the neural network learns the underlying rule: that placement does not matter, only the sum, and that the crucial value is 3. That would be cool to see.
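That property can at least be verified against the ground-truth rule itself (a sketch; whether a trained net internalizes it is the separate, open question):

```python
from itertools import product

def life_rule(neigh):
    # Standard Game of Life update: 9 binary cells, center at index 4
    s = sum(neigh) - neigh[4]
    return 1 if s == 3 or (neigh[4] == 1 and s == 2) else 0

# Group all 512 neighborhoods by (center state, neighbor sum). If every
# group maps to a single output, placement truly does not matter.
groups = {}
for x in product((0, 1), repeat=9):
    key = (x[4], sum(x) - x[4])
    groups.setdefault(key, set()).add(life_rule(x))

assert all(len(outputs) == 1 for outputs in groups.values())

# And the crucial value is 3: the only sum that brings a dead cell to life.
births = sorted(s for (c, s), v in groups.items() if c == 0 and v == {1})
print(births)
```

One could probe a trained net the same way: feed it pairs of neighborhoods with equal sums but different placements and check whether its outputs agree.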
This is a very good point, and it's the reason why I don't find much value in GAs. There is an inherent conflict between "an algorithm for a specific problem" (which can be optimized to run in a short time; the above problem reduces to a lookup in a 256 x 2 one-bit matrix and thus runs in O(1)) and "an algorithm for any problem (in a class)", which, while theoretically capable of finding a solution, is prohibitive in time and/or space.
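To make the O(1) claim concrete, a sketch (with `life_rule` as my stand-in for the ground-truth update) that precomputes the whole rule as 512 one-bit entries and answers any query with a single lookup:

```python
from itertools import product

def life_rule(neigh):
    # Ground-truth Game of Life update: 9 binary cells, center at index 4
    s = sum(neigh) - neigh[4]
    return 1 if s == 3 or (neigh[4] == 1 and s == 2) else 0

# Precompute the entire rule: 2^9 = 512 one-bit entries (equivalently a
# 256 x 2 one-bit matrix: 2^8 neighbor patterns x 2 center states).
# product() enumerates tuples in binary-counting order, so the list index
# equals the neighborhood read as a 9-bit number.
table = [life_rule(x) for x in product((0, 1), repeat=9)]

def step_cell(neigh):
    # O(1): pack the 9 bits into an index, then one table lookup
    idx = 0
    for b in neigh:
        idx = (idx << 1) | b
    return table[idx]

# Dead center with exactly three live neighbors -> birth
print(step_cell((0, 1, 0, 1, 0, 1, 0, 0, 0)))
```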
As far as I can tell, GAs are barely any better than a random walk and thus not useful for any non-trivial problems. I am not yet sure if NNs are in the same category, though I suspect they are.
I love it when there's a nugget of rigor behind things like this that can be pulled out and applied elsewhere!
- The conv layer has only 20 ReLU units; I'm not sure that suffices to memorize the rules. The net might be "underfitting" the rule set and therefore making mistakes on rare pixel arrangements.
- More worryingly, it's also strange that the author uses a fully connected layer right after the conv layer (this is implicitly added in ConvNetJS when you specify a loss layer). This means that the output neurons are a function of the entire preceding CONV layer's activations everywhere, while the Game of Life rules are local. The way to do this would be to instead use a 1x1 CONV layer with 2 neurons on top of the first 3x3 CONV layer, and interpret it as computing the class scores at every spatial position. However, in this case you'd want to apply the loss at every spatial position, and this "fully-convolutional" loss use case is not supported out of the box in ConvNetJS, but could be written.
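For concreteness, a minimal numpy sketch of the shapes in that fully-convolutional setup (weights are random and untrained; this only shows that a 1x1 CONV over the 20 local activations yields two class scores at every spatial position, on which a per-position loss could then be applied):

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8
grid = rng.integers(0, 2, size=(H, W)).astype(float)

# 3x3 CONV, 20 filters, ReLU (zero-padding of 1 keeps the spatial size)
W1 = rng.normal(0, 0.1, size=(20, 3, 3))
pad = np.pad(grid, 1)
act = np.zeros((20, H, W))
for f in range(20):
    for i in range(H):
        for j in range(W):
            act[f, i, j] = np.sum(W1[f] * pad[i:i+3, j:j+3])
act = np.maximum(act, 0)  # ReLU

# 1x1 CONV with 2 filters: two class scores (dead/alive) at every position,
# each depending only on the 20 activations at that position -- matching
# the locality of the Game of Life rule.
W2 = rng.normal(0, 0.1, size=(2, 20))
scores = np.einsum('cf,fij->cij', W2, act)
print(scores.shape)  # (2, 8, 8): per-pixel class scores for a per-pixel loss
```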
- However, with a fully-convolutional loss, the zero-padding of 1 used around the borders might cause trouble. Normally this is okay with images, but here the neurons all share parameters spatially (and hence compute the same function) and don't "know" whether they are at the border or in the middle of the image. I'm not sure how this Game of Life handles boundary conditions, but if the borders obey different dynamics, you'd want to distinguish the border pixels with a special "border" feature vector at the input. E.g. each pixel is a 3-vector, with a 1-hot encoding for (border, positive pixel, negative pixel).
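A sketch of that input encoding (the function name and channel order are my own choices):

```python
import numpy as np

def encode_with_border(grid):
    """Encode each pixel as a 1-hot 3-vector over (border, live, dead)."""
    H, W = grid.shape
    border = np.zeros((H, W), dtype=bool)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    out = np.zeros((3, H, W))
    out[0] = border                 # special "border" channel
    out[1] = (grid == 1) & ~border  # live interior pixel
    out[2] = (grid == 0) & ~border  # dead interior pixel
    return out

g = np.zeros((4, 4))
g[1, 1] = 1
enc = encode_with_border(g)
print(enc[:, 1, 1])  # interior live pixel -> [0. 1. 0.]
```

Since the conv filters see this border channel in their receptive field, the spatially-shared weights can still learn distinct dynamics near the boundary.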
- And it's also strange that the author uses a "regression" loss for what is a binary classification problem.
So, a nice attempt, but several funny choices, and it's clear why it didn't fully work :)
We could use a softmax classifier, since that is meant for multi-class classification. Each class in this case would represent a neighboring pixel, and predicting that class would mean activating that pixel. However, softmax assumes a single label, with probabilities that add up to 1. Our problem is a multi-label binary classification: several pixels can be active at once.
We could train nine such softmax groupings, one for each neighboring pixel, but that immediately seems like a bad idea.
Another awful solution: make each possible pixel configuration (2^9 of them) a class and do a standard softmax.
I can start to see why the author chose to use regression loss as a sort of hack to get this to work, but I'm trying to think out the best, proper solution.
Any thoughts?
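To make the softmax mismatch concrete, a small numpy sketch (the scores are made up; the independent sigmoids just illustrate the standard multi-label setup, with one per-pixel binary cross-entropy each, not a claim that it is the best solution):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores for 9 output pixels; pixels 0, 3 and 7 should be "on".
scores = np.array([4.0, -2.0, -2.0, 4.0, -2.0, -2.0, -2.0, 4.0, -2.0])

# Softmax: probabilities compete and sum to 1, so it can only express a
# single "on" label; here the three candidates split the mass and none
# reaches 0.5.
p_soft = softmax(scores)
n_on_soft = int((p_soft > 0.5).sum())

# Independent sigmoids: each pixel gets its own binary decision, so any
# subset of pixels can be on simultaneously.
p_sig = sigmoid(scores)
n_on_sig = int((p_sig > 0.5).sum())

print(n_on_soft, n_on_sig)  # softmax flags no pixel; sigmoids flag all three
```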
Do you think a similar approach could be used to learn to generalize Navier-Stokes by looking at fluid flows?