Good discussions on Reddit: https://www.reddit.com/r/MachineLearning/comments/5vh4ae/r_a... https://www.reddit.com/r/smashbros/comments/5vin8x/beating_t...
( http://boards.na.leagueoflegends.com/en/c/gameplay-balance/b... )
With Marines usually.
https://www.cs.mun.ca/~dchurchill/pdf/starcraft_survey.pdf
The competitions that involved humans showed humans destroying the bots by spotting their patterns and exploiting them, and also by bluffing and distraction, such as having one unit do weird things around the bot's base while the human player built up an army. Bots that beat humans will have to learn to spot bluffs and the other weird patterns humans use to screw with them, on top of everything prior AI did with human-level talent. My money is on humans for DeepMind vs StarCraft, although I'm happy to be proven wrong.
Further advancement in this area will require huge leaps in hardware performance. Luckily in the next few years I expect that the pace of improvement in specialized hardware for neural nets will far outpace Moore's Law.
I believe they've actually handicapped themselves with their shortcuts. The agent's performance is crippled by its inability to see projectiles, a consequence of the choice to avoid learning from pixels (which I bet would actually be quite fast, as learning from pixels is not the bottleneck in ALE). Likewise, using the other RAM features is the path of the Dark Side: it allows immediate quick learning through huge dimensionality reduction, seductively simple, yes, yet poison in the end, because the agent can't learn all the other things it would have learned (such as projectiles). I suspect this is why their current implementation can't learn to play multiple characters: it can't see which character it is, and so which play style it should use.
So I would not be surprised at all to hear in a year or two that a human-delay-equivalent agent using raw pixels can beat human champs routinely.
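To make the representation tradeoff concrete, here's a minimal numpy sketch (the sizes and the `mlp_forward` helper are made-up illustrations, not the agent's actual architecture): the policy head is the same whether the input is a hand-picked RAM feature vector or a downsampled frame; what changes is what the agent can ever represent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observation sizes (illustrative only):
RAM_FEATURES = 64    # hand-picked memory values: positions, damage, state IDs
PIXELS = 84 * 84     # downsampled grayscale frame, Atari-style

def mlp_forward(x, hidden=128, actions=30, seed=0):
    """One hidden layer; stands in for either input pipeline."""
    r = np.random.default_rng(seed)
    w1 = r.standard_normal((x.size, hidden)) * 0.01
    w2 = r.standard_normal((hidden, actions)) * 0.01
    h = np.maximum(x @ w1, 0.0)   # ReLU
    return h @ w2                 # action logits

ram_logits = mlp_forward(rng.standard_normal(RAM_FEATURES))
pixel_logits = mlp_forward(rng.standard_normal(PIXELS))

# Same policy head either way, but a projectile that never appears in the
# RAM feature vector is invisible to the first network no matter how long
# it trains; the pixel network at least gets the chance to discover it.
print(ram_logits.shape, pixel_logits.shape)  # (30,) (30,)
```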
Thinking further afield, future models could learn to adapt their expectations to fit the behavior of a particular opponent. This kind of metalearning is pretty much a wide open problem, though a pair of (roughly equivalent) papers in this direction recently came out from DeepMind: https://arxiv.org/abs/1611.05763 and OpenAI: https://arxiv.org/abs/1611.02779 It's going to be really exciting to see how these techniques scale.
So it's cheating, presumably knowing the opponent's action before the animation even starts to play.
But this is what a top player (who regularly beats both of the players tested in the study) looks like playing against a hand-coded bot:
https://www.youtube.com/watch?v=9qWHM8DNdr8
and this is what the humans eventually learned to do:
https://www.youtube.com/watch?v=be8UDlVuAl8
Even if you add reaction time, a big part of Smash skill for humans is accurately manipulating the analog stick. The computer can just declare any angle it wants; you're not having a fair competition until you build a robot thumb that manipulates a joystick the way humans do, IMO. Otherwise a character like Pikachu can recover perfectly every time.
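Here's a tiny sketch of that point (the notch count and distances are made-up illustrative numbers, not Melee's actual input resolution): the bot emits the exact recovery angle, while a human reliably hits only coarse stick positions.

```python
import math

def best_recovery_angle(dx, dy):
    """Exact angle (radians) pointing from the character back to the ledge."""
    return math.atan2(dy, dx)

def human_stick(angle, notches=16):
    """A human reliably hits only coarse stick positions; snap to the
    nearest of `notches` evenly spaced angles (illustrative number)."""
    step = 2 * math.pi / notches
    return round(angle / step) * step

# Knocked off-stage: 40 units out, 25 units below the ledge (made-up numbers).
exact = best_recovery_angle(40, 25)
human = human_stick(exact)

print(f"bot:   {math.degrees(exact):.2f} deg")
print(f"human: {math.degrees(human):.2f} deg")
```

The bot's angle is optimal every time; the human's is off by however far the nearest notch is, which is exactly the gap a robot thumb would have to reintroduce.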
Most mid-level players already have a good grasp of prediction, which is arguably along the same lines as knowing with certainty what action your opponent is taking a few frames before he does it.
Couple that with the pretty obscene frame lag in Smash, and it's not really that much of an advantage.
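For what it's worth, one way to level the field is to force the bot to act on stale observations. A minimal sketch of that idea (the delay values here are illustrative assumptions; ~18 frames at 60 fps is a common approximation of human reaction time, roughly 300 ms):

```python
from collections import deque

class DelayedPolicy:
    """Wrap a policy so it only sees the game state from `delay` frames ago,
    approximating human reaction time. Numbers are illustrative assumptions."""
    def __init__(self, policy, delay, initial_state=None):
        self.policy = policy
        self.buffer = deque([initial_state] * delay)

    def act(self, state):
        stale = self.buffer.popleft()  # observation from `delay` frames ago
        self.buffer.append(state)      # current frame enters the queue
        return self.policy(stale)

# Toy policy: shield only when the (stale) state shows an attack.
bot = DelayedPolicy(lambda s: "shield" if s == "attack" else "wait",
                    delay=3, initial_state="idle")
actions = [bot.act(s) for s in ["attack", "idle", "idle", "idle", "idle"]]
print(actions)  # the shield comes out 3 frames after the attack appeared
```

With a buffer like this, the bot's "superhuman reactions" disappear and it has to predict, just like the mid-level players described above.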
Also, the competitive ruleset isn't really that impressive a target, considering how much your actions are limited by banning items and the more dynamic stages (i.e., restricting RNG). In this way, it's nothing more than a simple chess bot. Now, if it could actually take in complex environments and multiple tools, that'd be pretty next-level.
https://www.youtube.com/watch?v=z-1YfhUFtbY&feature=youtu.be...
Plus, our bot doesn't have any clue about projectiles. We don't know where they live in memory, so the network doesn't get to know about them at all.
My favorite example is Ms. Pac-Man, because it seems so old and simplistic. It's been tried by a dozen teams, and no one can beat a decent human.