EDIT: Just to be clear, this is a joke based on the game being flappy bird.
Seriously, though, this is awesome. I love this kind of stuff!
while "pigs" != "fly":
#define ever (;;)
...
for ever {
...
}

A k-level breadth-first search mimicking the optimal policy, or a simple learning-to-search algorithm [1] with a cost-sensitive binary linear classifier, would work well too.
After training, deciding what to do next would be a constant-time evaluation.
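To make the first idea concrete, here is a minimal sketch of a k-step breadth-first lookahead for a Flappy Bird-like game. The physics constants, the state representation, and the `survives` check are hypothetical stand-ins, not the real game's values:

```python
from collections import deque

GRAVITY = 1.0          # downward acceleration per tick (assumed)
FLAP_VELOCITY = -6.0   # upward impulse when flapping (assumed)

def step(height, velocity, flap):
    """Advance the bird one tick under toy physics (height grows downward)."""
    velocity = FLAP_VELOCITY if flap else velocity + GRAVITY
    return height + velocity, velocity

def survives(height, gap_top, gap_bottom):
    """Hypothetical safety check: the bird must stay inside the pipe gap."""
    return gap_top < height < gap_bottom

def best_action(height, velocity, gap_top, gap_bottom, k=5):
    """Breadth-first search k ticks ahead; return the first action of a
    trajectory that survives all k ticks, or None if every branch dies."""
    queue = deque([(height, velocity, 0, None)])  # (h, v, depth, first action)
    while queue:
        h, v, depth, first = queue.popleft()
        if not survives(h, gap_top, gap_bottom):
            continue  # prune dead branches
        if depth == k:
            return first  # a full-depth survivor: commit to its first move
        for flap in (False, True):
            nh, nv = step(h, v, flap)
            queue.append((nh, nv, depth + 1, flap if first is None else first))
    return None  # no k-step survivor found
```

With only two actions per tick the frontier is at most 2^k states, so a small k stays cheap; this is the search the classifier would then be trained to imitate.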
The whole point of playing is making your decisions jointly, each dependent on the previous ones. If you train your model that way, it will make its decisions so as to minimize future regret.
Local optimality is a very nice property. It means that if you play out a game, no single change to any of the previous moves could lead you to a better result. Of course, local optimality is hard in general, but for some problems it's fairly easy to achieve if your optimal policy is good and your features are adequate (which they will be if you use neural networks).
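The "train on your own rollouts" idea above can be sketched as a DAgger-style loop: run the current policy, query the reference policy on the states actually visited, aggregate, and retrain. Everything here is a toy stand-in — the physics, the `expert_action` oracle, and the crude cost-weighted perceptron standing in for a proper cost-sensitive classifier:

```python
import random

G = 1.0        # gravity per tick, toy value
FLAP_V = -6.0  # flap impulse, toy value

def features(h, v):
    return [1.0, h, v]  # bias, height (positive = below center), velocity

def expert_action(h, v):
    # Hypothetical reference policy: flap whenever the bird is below center.
    return h > 0

def predict(w, h, v):
    return sum(wi * xi for wi, xi in zip(w, features(h, v))) > 0

def rollout(w, steps=20):
    """Run the current policy and record the states it actually visits."""
    h, v = random.uniform(-5.0, 5.0), 0.0
    visited = []
    for _ in range(steps):
        visited.append((h, v))
        v = FLAP_V if predict(w, h, v) else v + G
        h += v
    return visited

def dagger(iterations=10, passes=5):
    """Aggregate expert labels on self-visited states, retrain each round."""
    w = [0.0, 0.0, 0.0]
    data = []
    for _ in range(iterations):
        for h, v in rollout(w):
            data.append((h, v, expert_action(h, v)))
        for _ in range(passes):  # cost-weighted perceptron passes over the data
            for h, v, y in data:
                if predict(w, h, v) != y:
                    cost = 1.0 + abs(h)  # mistakes far from center cost more
                    sign = cost if y else -cost
                    w = [wi + sign * xi for wi, xi in zip(w, features(h, v))]
    return w
```

Because the classifier keeps seeing the states its own earlier mistakes lead to, it learns to correct them — which is exactly the "minimize future regret" behavior described above.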
Of course, Flappy Bird is a pretty local game, and all of this might be overkill :D
AlphaGo's policy networks weren't trained jointly over whole Go games, so it's lacking in that regard, but the power of neural networks compensates. Who knows what AlphaGo would be like if they trained its policy networks jointly? :D
A nice introduction to LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
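For reference, the cell that post walks through boils down to one step like the following. This is a scalar toy version (real cells use weight matrices over vectors), with hypothetical weight names, following the standard gate equations: c' = f·c + i·g and h' = o·tanh(c'):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step for a 1-dimensional toy cell. W maps each gate name
    to a tuple (input weight, hidden weight, bias)."""
    def gate(name, squash):
        wx, wh, b = W[name]
        return squash(wx * x + wh * h_prev + b)
    i = gate("i", sigmoid)      # input gate: how much new content to write
    f = gate("f", sigmoid)      # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)      # output gate: how much cell state to expose
    g = gate("g", math.tanh)    # candidate cell value
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state
    return h, c
```

Setting the forget gate's bias high and the input gate's bias low makes the cell carry its state forward unchanged, which is the long-term-memory behavior the post explains.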
[1]: http://arxiv.org/pdf/1502.02206.pdf