EDIT: Just to be clear, this is a joke based on the game being flappy bird.
Seriously, though, this is awesome. I love this kind of stuff!
while "pigs" != "fly":
#define ever (;;)
...
for ever {
...
}

A k-level breadth-first search mimicking the optimal policy, or a simple learning-to-search algorithm [1] with a cost-sensitive binary linear classifier, would work well too.
After training, deciding what to do next would be a constant-time evaluation.
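To make the first idea concrete, here is a minimal sketch of a k-step breadth-first lookahead for a Flappy Bird-like game. The physics constants, the state representation, and the `survives` check are hypothetical stand-ins, not the real game's values:

```python
from collections import deque

GRAVITY = 1.0          # downward acceleration per tick (assumed)
FLAP_VELOCITY = -6.0   # upward impulse when flapping (assumed)

def step(height, velocity, flap):
    """Advance the bird one tick under toy physics (height grows downward)."""
    velocity = FLAP_VELOCITY if flap else velocity + GRAVITY
    return height + velocity, velocity

def survives(height, gap_top, gap_bottom):
    """Hypothetical safety check: the bird must stay inside the pipe gap."""
    return gap_top < height < gap_bottom

def best_action(height, velocity, gap_top, gap_bottom, k=5):
    """Breadth-first search k ticks ahead; return the first action of a
    trajectory that survives all k ticks, or None if every branch dies."""
    queue = deque([(height, velocity, 0, None)])  # (h, v, depth, first action)
    while queue:
        h, v, depth, first = queue.popleft()
        if not survives(h, gap_top, gap_bottom):
            continue  # prune dead branches
        if depth == k:
            return first  # a full-depth survivor: commit to its first move
        for flap in (False, True):
            nh, nv = step(h, v, flap)
            queue.append((nh, nv, depth + 1, flap if first is None else first))
    return None  # no k-step survivor found
```

With only two actions per tick the frontier is at most 2^k states, so a small k stays cheap; this is the search the classifier would then be trained to imitate.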
The whole point of playing is making your decisions jointly, each dependent on the previous ones. If you train your model that way, it will make its decisions so as to minimize future regret.
Local optimality is a very nice property. It means that if you play out a game, no single change to any of the previous moves could lead you to a better result. Of course, local optimality is hard in general, but for some problems it's fairly easy to achieve if your optimal policy is good and your features are adequate (which they will be if you use neural networks).
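The "train on your own rollouts" idea above can be sketched as a DAgger-style loop: run the current policy, query the reference policy on the states actually visited, aggregate, and retrain. Everything here is a toy stand-in — the physics, the `expert_action` oracle, and the crude cost-weighted perceptron standing in for a proper cost-sensitive classifier:

```python
import random

G = 1.0        # gravity per tick, toy value
FLAP_V = -6.0  # flap impulse, toy value

def features(h, v):
    return [1.0, h, v]  # bias, height (positive = below center), velocity

def expert_action(h, v):
    # Hypothetical reference policy: flap whenever the bird is below center.
    return h > 0

def predict(w, h, v):
    return sum(wi * xi for wi, xi in zip(w, features(h, v))) > 0

def rollout(w, steps=20):
    """Run the current policy and record the states it actually visits."""
    h, v = random.uniform(-5.0, 5.0), 0.0
    visited = []
    for _ in range(steps):
        visited.append((h, v))
        v = FLAP_V if predict(w, h, v) else v + G
        h += v
    return visited

def dagger(iterations=10, passes=5):
    """Aggregate expert labels on self-visited states, retrain each round."""
    w = [0.0, 0.0, 0.0]
    data = []
    for _ in range(iterations):
        for h, v in rollout(w):
            data.append((h, v, expert_action(h, v)))
        for _ in range(passes):  # cost-weighted perceptron passes over the data
            for h, v, y in data:
                if predict(w, h, v) != y:
                    cost = 1.0 + abs(h)  # mistakes far from center cost more
                    sign = cost if y else -cost
                    w = [wi + sign * xi for wi, xi in zip(w, features(h, v))]
    return w
```

Because the classifier keeps seeing the states its own earlier mistakes lead to, it learns to correct them — which is exactly the "minimize future regret" behavior described above.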
Of course, Flappy Bird is a pretty local game, and all of this might be overkill :D
AlphaGo's policy networks weren't trained jointly over whole Go games, so it's lacking in that regard, but the power of neural networks compensates. Who knows what AlphaGo would be like if they trained its policy networks jointly? :D
A nice introduction to LSTMs: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
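For reference, the cell that post walks through boils down to one step like the following. This is a scalar toy version (real cells use weight matrices over vectors), with hypothetical weight names, following the standard gate equations: c' = f·c + i·g and h' = o·tanh(c'):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step for a 1-dimensional toy cell. W maps each gate name
    to a tuple (input weight, hidden weight, bias)."""
    def gate(name, squash):
        wx, wh, b = W[name]
        return squash(wx * x + wh * h_prev + b)
    i = gate("i", sigmoid)      # input gate: how much new content to write
    f = gate("f", sigmoid)      # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)      # output gate: how much cell state to expose
    g = gate("g", math.tanh)    # candidate cell value
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state
    return h, c
```

Setting the forget gate's bias high and the input gate's bias low makes the cell carry its state forward unchanged, which is the long-term-memory behavior the post explains.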
[1]: http://arxiv.org/pdf/1502.02206.pdf