Do we know what reinforcement method they used? Did training on one level of Breakout let the algorithm perform well on other levels of the same game without any new training?
Did those games have any kind of random behavior, or do the same things happen at the same time, every time?
It is progress, I agree, but all those games are just about issuing sequences of "left/right" commands to maximize the time spent playing the game.
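To make that concrete, here's a toy sketch of the kind of control loop I mean: an agent whose only vocabulary is "left"/"right", nudging its action-value estimates toward whichever command pays off. The environment and rewards are entirely made up for illustration; this is not their actual training setup.

```python
import random

# Toy sketch, NOT the real system: the agent only emits left/right
# commands and learns which one yields more reward on average.
random.seed(0)

ACTIONS = ["left", "right"]
q = {a: 0.0 for a in ACTIONS}   # running value estimate per action
alpha = 0.1                     # learning rate
epsilon = 0.1                   # exploration rate

def reward(action):
    # Hypothetical payoff: "right" happens to score better here.
    return 1.0 if action == "right" else 0.5

for step in range(500):
    # epsilon-greedy: mostly exploit the best-known action, sometimes explore
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(q, key=q.get)
    q[a] += alpha * (reward(a) - q[a])

print(max(q, key=q.get))  # the learned preference
```

Even this trivial bandit-style loop "plays" by reward maximization alone; nothing in it knows what a ball or a paddle is, which is my point.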
Things would be a lot different if they could somehow analyze the structure of the network's "conceptual" layer to identify functions over areas (like "this is where the ball's trajectory is identified, and we can see it rest and activate depending on the ball's motion" or something similar).
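The kind of analysis I'm imagining could be as simple as correlating each hidden unit's activation with some interpretable game variable. Everything below is synthetic (a fake "hidden layer" and a fake ball trajectory), just to show the shape of the probe:

```python
import numpy as np

# Hypothetical probe: if we could log hidden-layer activations alongside
# the game state, we could look for units whose activity tracks the ball.
# All data here is synthetic -- no real network or emulator involved.
rng = np.random.default_rng(0)
frames = 200
ball_x = np.cumsum(rng.normal(size=frames))   # fake ball trajectory

# Fake hidden layer of 8 units: unit 0 noisily tracks ball_x, rest are noise.
hidden = rng.normal(size=(frames, 8))
hidden[:, 0] += 2.0 * ball_x

# Correlate each unit's activation with the ball position.
corrs = np.array([np.corrcoef(ball_x, hidden[:, i])[0, 1] for i in range(8)])
best = int(np.argmax(np.abs(corrs)))
print(f"unit {best} tracks ball position (r={corrs[best]:.2f})")
```

A unit that lights up and goes quiet with the ball's motion would be exactly the kind of "this area identifies trajectory" evidence I'd want to see.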
But the slide in his presentation shows a big question mark there, which isn't really reassuring.