I think you could juggle the heuristics to demonstrate a preference for tolerating input error. For ML training, you could just randomly vary input timing by up to 20ms or so to teach the algorithm to favor safer moves. For pathfinding it's trickier, but there's probably a way to favor "wide" paths. I'm less sure how to express the second concept, pausing briefly in "safe areas," but maybe it amounts to noticing places where entering no input for a while doesn't affect the result.
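A minimal sketch of the input-jitter idea, assuming inputs are recorded as (frame, buttons) pairs at ~60 fps. The names and data layout here are hypothetical, not from any real trainer:

```python
import random

FRAME_MS = 1000 / 60  # NES runs at roughly 60 frames per second

def jitter_inputs(inputs, max_jitter_ms=20):
    """Delay each input event by a random amount of up to max_jitter_ms.

    `inputs` is a list of (frame, buttons) pairs. Training against
    jittered replays should push a policy toward moves that still work
    when the timing is slightly off, i.e. safer moves.
    """
    jittered = []
    for frame, buttons in inputs:
        delay_frames = round(random.uniform(0, max_jitter_ms) / FRAME_MS)
        jittered.append((frame + delay_frames, buttons))
    return jittered

run = [(10, {"right"}), (42, {"right", "A"}), (43, {"right", "A"})]
print(jitter_inputs(run))  # each press lands 0-1 frames late (20ms is ~1.2 frames)
```

At 60 fps, 20ms of jitter only ever shifts a press by at most one frame, which is exactly the scale of error that separates frame-perfect tricks from safe ones.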
The reason it’s of engineering interest is, like you observe, that bounded rationality gives you solutions that are sub-optimal but more robust and often simpler.
Moreover, finding wide-path solutions emerges naturally from sampling-based motion planners. These planners are asymptotically optimal, but if you terminate them early, they are more likely to give you a solution that goes through large gaps rather than small ones, because without heavy sampling the planner is unlikely to sample a trajectory that threads a tight space. You could probably formulate that in the rate-distortion framework, but I haven’t thought about how to do it precisely.
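A toy illustration of the early-termination effect, assuming a wall with one wide and one narrow opening and uniform random sampling. This is a stand-in for a real sampling-based planner (RRT and friends), not an implementation of one:

```python
import random

def first_feasible_gap(gaps, trials=10_000, wall_height=10.0):
    """Sample random crossing heights until one falls inside a gap.

    Returns the name of the gap the first feasible sample went through.
    With one wide and one narrow gap, terminating at the first feasible
    sample almost always yields the wide one.
    """
    for _ in range(trials):
        y = random.uniform(0, wall_height)
        for name, (lo, hi) in gaps.items():
            if lo <= y <= hi:
                return name
    return None

gaps = {"wide": (0.0, 4.0), "narrow": (9.0, 9.2)}
wins = {"wide": 0, "narrow": 0}
for _ in range(1000):
    wins[first_feasible_gap(gaps)] += 1
print(wins)  # the wide gap wins roughly 95% of early-terminated runs (4.0 / 4.2)
```

The proportions just track gap width, which is the intuition in the comment above: tight spaces need many more samples before anything lands in them.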
I think you should watch some speedrunners. They don't look human either, since they play with some form of optimization in mind, unlike a casual player.
For a TAS you can justify taking fifty one-in-ten chances in a row, because every time it doesn't come off you just throw that away and re-record, so maybe you do a few hundred re-records for that section, not bad at all. In RTA that's never going to make any sense, it kills essentially 100% of runs.
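The arithmetic behind that claim, as a quick sanity check:

```python
p_single = 0.1                       # each trick is a one-in-ten chance
rta_success = p_single ** 50         # a live run must land all fifty in a row
tas_rerecords = 50 * (1 / p_single)  # expected attempts when each trick is
                                     # retried from a save state until it lands
print(f"RTA success odds: {rta_success:.0e}")           # ~1e-50, effectively never
print(f"Expected TAS re-records: {tas_rerecords:.0f}")  # ~500, "a few hundred"
```

The key difference is that re-recording makes each trick an independent retry, so the cost is additive (50 × 10) instead of multiplicative (10^50).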
That's more interesting and reassuring to watch. I think it's because the player's mind comes across in the playstyle. It's almost as if their entire history with the game is revealed.
It is just the nature of computers that they do simple things very fast. Humans do complex things very slowly (but can actually do them). This is why we are friends, we complement each other.
Although I do wonder, if the paths were not made to such tight tolerances (using your input delay solution), maybe AI Mario would spend a little longer lingering in areas just to let things align nicely for less-tight jumps.
When AI can reliably solve a problem without suffering significant negative consequences from time to time, it's a win. How humans feel about the method is effectively irrelevant.
According to whom?
AI in games has historically been all about human comfort/enjoyment. Extremely good AI that seems "unnatural" to humans is usually not the goal.
That said, I could see some highly-skilled players (like those who do speedruns) showing off their precision and adopting a similar "scare the audience" style for a new genre of competition.
https://github.com/iantbutler01/MarioAIImplementation
Notably, we were the only group to choose Mario over Pacman, and the framework the professor had us use was broken, so on top of the algorithm implementations I also rewrote the forward model and more!
I get that it's a lot of trial and error, and it's a nice attempt. What would it take to beat the maze levels, though? I guess since the reward is always just moving forward, it's hard to tell when you're progressing without an internal map telling you the screen has changed the way you expected?
SMB1, I think, is great to train on for stuff like this, but it does have some secrets and some hiccups. Like, what would it take to get this AI to find the warp zones?
really cool!
Are they just getting images (frame by frame) as input, perceiving them in some way, and producing an output stream of controller button-pushes?
Or does it only detect "death" and start over, systematically changing its robotically timed output stream in response, with the goal of getting further along in the game?
I know the retro games were very deterministic. Pacman had "patterns" you could use. Was Mario like this? No random adversaries at all?
The blog post explains that part a bit under the "Distance" section.
> Was Mario like this? No random adversaries at all?
Yeah. Read all about it here:
https://tasvideos.org/GameResources/NES/SuperMarioBros
https://tasvideos.org/GameResources/CommonTricks#LuckManipul...
https://tasvideos.org/GameResources/CommonTricks#ExamineGame...
Check out `(defn dist` or read the explanation of that function in the blog post. It can't see enemies or pickups at all. It can only tell how far it is from the end of the level and whether it's dead, then it brute-forces every button combination for every frame. The "in progress" video shows that it spends most of its time falling into pits, but eventually makes it to the end.
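A hedged sketch of that loop in Python (the repo itself is Clojure; the forward model here is a toy stand-in, and all the names are made up):

```python
from itertools import combinations, product

BUTTONS = ("left", "right", "A", "B")  # hypothetical subset of the NES pad

def all_presses(buttons):
    """Every subset of buttons that could be held during one frame."""
    return [frozenset(c) for r in range(len(buttons) + 1)
            for c in combinations(buttons, r)]

def best_sequence(simulate, horizon=3):
    """Brute-force every button combination over `horizon` frames,
    keeping whichever sequence gets furthest without dying.

    `simulate(seq)` stands in for the emulator forward model and
    returns (distance, dead) -- a sketch of the approach the post
    describes, not the repo's actual code.
    """
    best, best_dist = None, float("-inf")
    for seq in product(all_presses(BUTTONS), repeat=horizon):
        dist, dead = simulate(seq)
        if not dead and dist > best_dist:
            best, best_dist = seq, dist
    return best

# Toy world: holding "right" moves +1 per frame; not pressing "A" on
# the second frame means falling into a pit and dying.
def toy_simulate(seq):
    dist = sum("right" in presses for presses in seq)
    dead = "A" not in seq[1]
    return dist, dead

print(best_sequence(toy_simulate))
```

Even this toy version searches 16^3 = 4096 sequences for a 3-frame horizon, which is why the real search spends so long falling into pits before anything works.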
A fun metric would be how many deaths it takes to complete a given level. It's got to be in the tens of thousands.