Show HN: LLM plays Pokémon (open sourced) - https://news.ycombinator.com/item?id=43187231
Now here's a concept for anyone with more money than sense: ClaudePlaysTwitchPlaysPokemon, where it's TPP but every participant is Claude. Would hivemind AI consensus perform better than a single AI? Anthropic's certainly looking into it! [1]
[0]: https://www.oneusefulthing.org/p/a-new-generation-of-ais-cla...
[1]: https://www.anthropic.com/news/visible-extended-thinking
Right now, Claude has been stuck in Mt Moon for nearly a day. It keeps forgetting where it has been. It also almost always runs from battles instead of changing Pokemon or fighting.
At one point it got stuck in a Pokemon center when it mistook the character's red hat for the red carpet around the exit. It kept pressing down and wondering why it wasn't working. It only broke out of that when it mistakenly concluded it had successfully exited the Pokemon center. Then it wandered around a bit and only realized it was still in the Pokemon center after talking to Nurse Joy.
> It also almost always runs from battles instead of changing Pokemon or fighting.
I believe this is because all of its Pokemon are on the verge of fainting, so it's trying to conserve them while it tries to find its way out.
> It keeps forgetting where it has been.
I'm wondering if this could be solved with a better harness. On one hand, that hurts the elegance of having one model dedicated to playing the game, but their existing harness is already cheating a little (they have a second LLM for verification). They're frequently compacting what's in context, which leaves its visual memory quite poor; could that be a point of improvement?
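A minimal sketch of what a harness-side location memory could look like. It assumes the harness can read the player's map name and tile coordinates on each step (for example from emulator RAM). The class, its methods, and the map label `MT_MOON_1F` are all hypothetical and not part of the open-sourced project. Because the counts live outside the model's context, compaction can't erase them, and a short summary can be re-injected into each prompt.

```python
from collections import Counter


class VisitedMemory:
    """Hypothetical harness-side record of tiles the agent has stood on.

    Stored outside the LLM context, so it survives context compaction.
    """

    def __init__(self) -> None:
        # (map_name, x, y) -> number of times the agent was seen on that tile
        self.visits: Counter = Counter()

    def record(self, map_name: str, x: int, y: int) -> None:
        self.visits[(map_name, x, y)] += 1

    def summary(self, map_name: str, limit: int = 5) -> str:
        """Short text note that could be prepended to each prompt."""
        on_map = [(pos, n) for pos, n in self.visits.items() if pos[0] == map_name]
        # Most-revisited tiles first: these hint that the agent is looping.
        on_map.sort(key=lambda item: item[1], reverse=True)
        lines = [f"({x},{y}) visited {n}x" for (_, x, y), n in on_map[:limit]]
        return f"{map_name}: {len(on_map)} tiles seen; most revisited: " + "; ".join(lines)


mem = VisitedMemory()
for x, y in [(3, 4), (3, 5), (3, 4), (3, 4)]:
    mem.record("MT_MOON_1F", x, y)
print(mem.summary("MT_MOON_1F"))
```

Running this prints `MT_MOON_1F: 2 tiles seen; most revisited: (3,4) visited 3x; (3,5) visited 1x`, which is the kind of note that might stop the model from re-walking the same corridor.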
Good luck
One of the biggest challenges this Claude version faces is reading the visual data accurately. It was stuck in Viridian Forest and in Pokemarts for a while because overworld objects like trees and paths kept confusing it.
That's precisely the pet project I'd take on if/when I take the time to build a deep learning agent. There's a bot that already plays one of the ladders, but it's just a decision tree and the best players know how to predict its moves. It's around 1500 Elo on a ladder where the best players are 1800+. Still not bad, to be fair; it would probably beat me.
The bot has a pre-selected team, which I believe always starts with the same mon. I'd be more interested in an agent that fully played the game, start-to-finish, including making a team based on play data and selecting a starter based on the current opponent's team.
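For a sense of what that rating gap means: assuming the ladder uses the standard Elo logistic curve with a 400-point scale, a 1500-rated player is expected to score only about 15% against an 1800-rated one. A quick check (the function name is illustrative):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


bot, top = 1500, 1800
print(f"{elo_expected_score(bot, top):.3f}")  # 0.151
```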
But Nintendo will never take down anything that is related to Showdown because it would highlight their massive hypocrisy!
It would set a precedent. People would go: "wait, but why did they never take down Showdown itself? Could it be that it's because they actually benefit from its existence? Then why did they take down X/Y/Z? Oh! It's because copyright law only applies when you want it to! It's all arbitrary and made up! You just need to be friends with the right people in the VGC and your pet project will be immune from all legal backlash!"
Or something.
Seriously I hate it so fucking much that Nintendo does nothing about Showdown, which blatantly steals a ton of game assets, and then nukes some random guy's fan project that no one ever played.
https://x.com/AnthropicAI/status/1894419017756029427?t=xDXk6...
https://pwhiddy.github.io/pokerl-map-viz/
(works best on desktop)
First get the model to beat a game, then work on better decision-making, then try to speed up the decision-making. Then repeat when better models come out.