I think you could juggle the heuristics to demonstrate a preference for tolerating input error. For ML training, you could just randomly vary input timing by up to 20ms or so to teach the algorithm to favor safer moves. For pathfinding it's trickier, but there's probably a way to favor "wide" paths. I'm less sure how to express the second concept, pausing briefly in "safe areas," but maybe it amounts to noticing places where entering no input for a while doesn't affect the result.
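A minimal sketch of the input-jitter idea, assuming inputs are recorded as (frame, buttons) pairs at ~60 fps. The names and data layout here are hypothetical, not from any real trainer:

```python
import random

FRAME_MS = 1000 / 60  # NES runs at roughly 60 frames per second

def jitter_inputs(inputs, max_jitter_ms=20):
    """Delay each input event by a random amount of up to max_jitter_ms.

    `inputs` is a list of (frame, buttons) pairs. Training against
    jittered replays should push a policy toward moves that still work
    when the timing is slightly off, i.e. safer moves.
    """
    jittered = []
    for frame, buttons in inputs:
        delay_frames = round(random.uniform(0, max_jitter_ms) / FRAME_MS)
        jittered.append((frame + delay_frames, buttons))
    return jittered

run = [(10, {"right"}), (42, {"right", "A"}), (43, {"right", "A"})]
print(jitter_inputs(run))  # each press lands 0-1 frames late (20ms is ~1.2 frames)
```

At 60 fps, 20ms of jitter only ever shifts a press by at most one frame, which is exactly the scale of error that separates frame-perfect tricks from safe ones.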
The reason it’s of engineering interest is, like you observe, that bounded rationality gives you solutions that are sub-optimal but more robust and often simpler.
Moreover, finding wide-path solutions emerges naturally from sampling-based motion planners. These planners are asymptotically optimal, but if you terminate them early, they are more likely to give you a solution that goes through large gaps rather than small ones, because without heavy sampling the planner is unlikely to sample a trajectory that threads a tight space. You could probably formulate that in the rate-distortion framework, but I haven’t thought about how to do it precisely.
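A toy illustration of the early-termination effect, assuming a wall with one wide and one narrow opening and uniform random sampling. This is a stand-in for a real sampling-based planner (RRT and friends), not an implementation of one:

```python
import random

def first_feasible_gap(gaps, trials=10_000, wall_height=10.0):
    """Sample random crossing heights until one falls inside a gap.

    Returns the name of the gap the first feasible sample went through.
    With one wide and one narrow gap, terminating at the first feasible
    sample almost always yields the wide one.
    """
    for _ in range(trials):
        y = random.uniform(0, wall_height)
        for name, (lo, hi) in gaps.items():
            if lo <= y <= hi:
                return name
    return None

gaps = {"wide": (0.0, 4.0), "narrow": (9.0, 9.2)}
wins = {"wide": 0, "narrow": 0}
for _ in range(1000):
    wins[first_feasible_gap(gaps)] += 1
print(wins)  # the wide gap wins roughly 95% of early-terminated runs (4.0 / 4.2)
```

The proportions just track gap width, which is the intuition in the comment above: tight spaces need many more samples before anything lands in them.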
I think you should watch some speedrunners. They don't look human either, since they play with some form of optimization in mind, unlike a casual player.
For a TAS you can justify taking fifty one-in-ten chances in a row, because every time it doesn't come off you just throw that away and re-record, so maybe you do a few hundred re-records for that section, not bad at all. In RTA that's never going to make any sense, it kills essentially 100% of runs.
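The arithmetic behind that claim, as a quick sanity check:

```python
p_single = 0.1                       # each trick is a one-in-ten chance
rta_success = p_single ** 50         # a live run must land all fifty in a row
tas_rerecords = 50 * (1 / p_single)  # expected attempts when each trick is
                                     # retried from a save state until it lands
print(f"RTA success odds: {rta_success:.0e}")           # ~1e-50, effectively never
print(f"Expected TAS re-records: {tas_rerecords:.0f}")  # ~500, "a few hundred"
```

The key difference is that re-recording makes each trick an independent retry, so the cost is additive (50 × 10) instead of multiplicative (10^50).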
That's more interesting and reassuring to watch. I think it's because the player's mind comes across in the playstyle. It's almost as if their entire history with the game is revealed.
It is just the nature of computers that they do simple things very fast. Humans do complex things very slowly (but can actually do them). This is why we are friends, we complement each other.
Although I do wonder, if the paths were not made to such tight tolerances (using your input delay solution), maybe AI Mario would spend a little longer lingering in areas just to let things align nicely for less-tight jumps.
When AI can reliably solve a problem without suffering significant negative consequences from time to time, it's a win. How humans feel about the method is effectively irrelevant.
According to whom?
AI in games has historically been all about human comfort/enjoyment. Extremely good AI that seems "unnatural" to humans is usually not the goal.
That said, I could see some highly-skilled players (like those who do speedruns) showing off their precision and adopting a similar "scare the audience" style for a new genre of competition.
https://github.com/iantbutler01/MarioAIImplementation
Notably, we were the only group to choose Mario over Pacman, and the framework the professor had us use was broken, so on top of the algorithm implementations I also rewrote the forward model and more!
I get that it's a lot of trial and error, and it's a nice attempt. What would it take to beat the maze levels, though? I guess since the reward is always just moving forward, it's hard to tell when you're progressing without an internal map telling you the screen has changed the way you expected?
SMB1, I think, is great to train on for stuff like this, but it does have some secrets and some hiccups. Like, what would it take to get this AI to find the warp zones?
really cool!
Are they just getting images (frame by frame) as input, perceiving them in some way, and producing an output stream of controller button-pushes?
Or does it only detect "death" and start over, systematically changing its robotically timed output stream in response, with the goal of getting further along in the game?
I know the retro games were very deterministic. Pacman had "patterns" you could use. Was Mario like this? No random adversaries at all?
The blog post explains that part a bit under the "Distance" section.
> Was Mario like this? No random adversaries at all?
Yeah. Read all about it here:
https://tasvideos.org/GameResources/NES/SuperMarioBros
https://tasvideos.org/GameResources/CommonTricks#LuckManipul...
https://tasvideos.org/GameResources/CommonTricks#ExamineGame...
Check out `(defn dist` or read the explanation of that function in the blog post. It can't see enemies or pickups at all. It can only tell how far it is from the end of the level and whether it's dead, then it brute-forces every button combination for every frame. The "in progress" video shows that it spends most of its time falling into pits, but eventually makes it to the end.
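A hedged sketch of that loop in Python (the repo itself is Clojure; the forward model here is a toy stand-in, and all the names are made up):

```python
from itertools import combinations, product

BUTTONS = ("left", "right", "A", "B")  # hypothetical subset of the NES pad

def all_presses(buttons):
    """Every subset of buttons that could be held during one frame."""
    return [frozenset(c) for r in range(len(buttons) + 1)
            for c in combinations(buttons, r)]

def best_sequence(simulate, horizon=3):
    """Brute-force every button combination over `horizon` frames,
    keeping whichever sequence gets furthest without dying.

    `simulate(seq)` stands in for the emulator forward model and
    returns (distance, dead) -- a sketch of the approach the post
    describes, not the repo's actual code.
    """
    best, best_dist = None, float("-inf")
    for seq in product(all_presses(BUTTONS), repeat=horizon):
        dist, dead = simulate(seq)
        if not dead and dist > best_dist:
            best, best_dist = seq, dist
    return best

# Toy world: holding "right" moves +1 per frame; not pressing "A" on
# the second frame means falling into a pit and dying.
def toy_simulate(seq):
    dist = sum("right" in presses for presses in seq)
    dead = "A" not in seq[1]
    return dist, dead

print(best_sequence(toy_simulate))
```

Even this toy version searches 16^3 = 4096 sequences for a 3-frame horizon, which is why the real search spends so long falling into pits before anything works.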
A fun metric would be how many deaths it takes to complete a given level. It's got to be in the tens of thousands.