ChatGPT's Chess Elo is 1400 (opens in new tab)

(dkb.blog)

212 pointsdbrereton3y ago345 comments

345 comments

k20523y ago

This is so easy to disprove it makes it look like the author didn't even try.

Here is the convo I just had:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed

ChatGPT: Sure, I'd be happy to help! Please provide the move sequence and I'll give you my response.

me: 1. e3

ChatGPT: 1... e5

me: 2. ne2

ChatGPT: 2... d6

me: 3. g4

ChatGPT: 3... h5

me: 4. g5

ChatGPT: 4... Ne4

Ne4 is illegal. Also you'd be hard pressed to find any human player that would miss the hanging pawns much less a 1400 elo player. GPT has been trained on chess games and is predicting patterns, this has been known since the GPT-2 days. The daily hype articles where someone coerces a statistical model into "demonstrating" an emergent capability sure are getting old. Claiming emergent capabilities (like having learned chess) are an extraordinary claim (LLMs most likely cant learn things) and I really wish people would put more effort when they make these claims. But the AI hype problem is as old as AI research itself...

vidarh3y ago

You're "disproving" the article by doing things differently to how the article did. If you're going to disprove that the method given in the article does as well as the article claims at least use the same method.

k20523y ago

You are right that my method differed slightly so I did things again. It took me one try to find a sequence of moves that "breaks" what is claimed. You just have to make odd patterns of moves and it clearly has no understanding of the position.

Here is the convo:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed

ChatGPT: Alright, I'm ready to play! Please give me the move sequence.

me: 1. e3 Nf6 2. f4 d6 3. e4

ChatGPT: My next move as black would be 3... e5

Completely ignoring the hanging pawn.This is not the play of a 1400 elo player. It is the play of something predicting patterns.

I ran a bunch of experiments in the past where I played normal moves and ChatGPT does respond extraordinarily well. With the right prompts and sequences you can get it to play like a strong grandmaster. But it is a "trick" you are getting it to perform by choosing good data and prompts. It is impressive but it is not doing what is claimed by the article.

nwienert3y ago

I'll add in as someone new to chess (~800 ELO):

ChatGPT is in no way 1400, or even close to it. The fact this article gets upvoted around here is proof that people aren't thinking clearly about this stuff. It's trivially easy to prove it wrong. Live unbelievably so, I tried the same prompt and within 12 moves it made multiple ridiculous errors I never would, and then an illegal move.

Keep in mind a 1400 level player would need to basically make 0 mistakes that bad in a typical game, and further would need to play 30-50 moves in that fashion, with the final moves being some of the most important and hard to do. There's just no way it's even close, my guess would be even if you correct it's many errors, it's something like ~200 ELO. Pure FUD.

The author of this article is cashing in the hype and I'm wondering how they even got the results they did.

2 more replies

echelon3y ago

Criticisms like this are exactly how the model will grow multimodal support for chess moves.

Keep poking it and criticizing it. Microsoft and OpenAI are on HN and they're listening. They'd find nothing more salient to tout full chess support in their next release or press conference.

With zero effort the thing understands uber domain specific chess notation and the human prompt to play a game. To think it stops here is wild.

People are hyping it because they want to get involved. They want to see the crazy and exciting future this leads to.

2 more replies

throwwwaway693y ago

He literally used the same prompt as the article.

Claim: "ChatGPT's Chess Elo is 1400"

Reality: ChatGPT gives illegal moves (this happened to article author too), something a 1400 ranked player would never do

Result: ChatGPT's rank is not 1400.

erulabs3y ago

No, the author of the article specifically says that the entire move sequence should be supplied to chatGPT each time, not simply the next move. Be very careful when "disproving" an experiment with squinted eyes.

1 more reply

unyttigfjelltol3y ago

The author said ChatGPT gives illegal moves. So, a quirky sort of 'grandmaster'. He considered illegal moves to be a resignation. Maybe you need to tell ChatGPT that the alternatives are to win via legal moves, and if it is not possible to do so, to resign? Does that fix it?

mynameisvlad3y ago

> something a 1400 ranked player would never do

The fact that rules and articles exist describing what to do if you or your opponent makes an illegal move indicates this is not the case.

Humans are also... human. They make mistakes. It may not happen often at 1400, but to say that it'll never happen is preposterous.

3 more replies

PaulHoule3y ago

It’s super scary how ChatGPT brings out people who are veeeery good at seeing the Emperor’s clothes.

YeGoblynQueenne3y ago

You know, I didn't remember the story very well so I checked wikipedia. Here's what it says about the (start of) the plot:

>> Two swindlers arrive at the capital city of an emperor who spends lavishly on clothing at the expense of state matters. Posing as weavers, they offer to supply him with magnificent clothes that are invisible to those who are stupid or incompetent. The emperor hires them, and they set up looms and go to work. A succession of officials, and then the emperor himself, visit them to check their progress. Each sees that the looms are empty but pretends otherwise to avoid being thought a fool.

So everyone "pretends otherwise to avoid being thought a fool".

Huh. I guess that explains it. Good metaphor.

z3c03y ago

They are disproving an assertion. Demonstrating that an alternate approach implodes the assertion is a perfectly acceptable route, especially when the original approach was cherry-picking successes and throwing out failures.

I wish I could just make bullshit moves and get a higher chess ranking. Sounds nice.

whimsicalism3y ago

I disagree. If there is a procedure for getting ChatGPT to play chess accurately and you discard that and do some naive approach as a way of disproving the article, doesn't sound to me like you have disproven anything.

I dont understand the point of your second sentence, seems to be entirely missing the substance of the conversation.

1 more reply

vidarh3y ago

It was not throwing out failures. It was treating even the first illegal move as a forfeit something which is stricter than FIDE rules.

1 more reply

sebzim45003y ago

He claims he was forfeiting every time he got an illegal move. Does no one on this website actually read the article?

Whether any of it is actually true is a different question.

1 more reply

whymauri3y ago

I was able to get better performance by always providing the prior sequence of moves and forcing ChatGPT to also respond with the sequence of moves up until its move.

Edit: I told the model that if the sequence was wrong or illegal, it forfeits the game. Without doing this, GPT would argue with me that it won and I didn't know the rules (serious).

Spivak3y ago

From the article.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

But you can do even better than the OP with a few tweaks.

1. One is by taking the most common legal move from a sample of responses.

2. Telling GPT what all the current legal moves are telling it to only respond with an element from the list.

3. Ending the prompt with the current sequence of moves and having it complete from there.

tracker13y ago

How many 1400 human chess players do you have to explain every possible move to it every single move?

sacred_numbers3y ago

When you are speaking to a person, they have inner thoughts and outer actions/words. If a person sees a chess board they will either consciously or unconsciously evaluate all the legal moves available to them and then choose one. An LLM like ChatGPT does not distinguish between inner thoughts and outer actions/words. The words that it speaks when prompted are its inner thoughts. There is also no distinction between subconscious and conscious thoughts. Humans generate and discard a multitude of thoughts in the subconscious before any thoughts ever make it to the conscious layer. In addition, most humans do not immediately speak every conscious thought they have before evaluating it to see whether speaking it aloud is consistent with their goals.

There's already a lot of research on this, but I strongly believe that eventually the best AIs will consist of LLMs stuck in a while loop that generate a stream of consciousness which will be evaluated by other tools (perhaps other specialized LLMs) that evaluate the thoughts for factual correctness, logical consistency, goal coherence, and more. There may be multiple layers as well, to emulate subconscious, conscious, and external thoughts.

For now though, in order to prompt the machine into emulating a human chess player, we will need to act as the machine's subconscious.

Spivak3y ago

I feel like we have very different expectations about what tools like this are good for and how to use them. When I say GPT3 can play chess what I mean is, I can build a chess playing automaton where the underlying decision making system is entirely powered by the LLm.

I, as the developer, am providing contextual information like what the current board state is, and what the legal moves are, but my code doesn't actually know anything about how to play chess, the Llm is doing all the "thinking."

Like it's nuts that people aren't more amazed that there's a piece of software that can function as a chess playing engine (and a good one) that was trained entirely generically.

1 more reply

arrrg3y ago

Does that matter? I’m really very confused by the argument you are making.

That you may have to babysit this particular aspect of playing the game seems quite irrelevant to me.

1 more reply

sebzim45003y ago

When they are blindfolded? Almost all of them.

theptip3y ago

I don’t think this suffices as disproving the hypothesis. It’s possible to play at 1400 and make some idiotic moves in some cases. You really need to simulate a wide variety of games to find out, and that is what the OP did more of. Though I do agree it’s suggestive that your first (educated) try at an edge case seems to have found an error.

This is broadly the “AI makes dumb mistakes” problem; while being super-human in some dimensions, they make mistakes that are incredibly obvious to a human. This comes up a lot with self-driving cars too.

Just because they make a mistake that would be “idiots only” for humans, doesn’t mean they are at that level, because they are not human.

SamBam3y ago

Honestly, I made it make an illegal move in my very first game, in the third move. You just have to do stuff no normal player would do:

> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.

1. b4 d5 2. b5 a6 3. b6

> bxc6

That's obviously illegal.

... to all those who are saying "well even good players can make illegal moves sometimes," that's just ridiculous. No player makes illegal moves that often.

nottathrowaway33y ago

> me: You are a chess grandmaster playing as black...

https://upload.wikimedia.org/wikipedia/en/5/5f/Ingmar_Bergma...

The KNIGHT holds out his two fists to CHATGPT, who smiles at him suddenly. CHATGPT points to one of the KNIGHT'S hands; it contains a black pawn.

KNIGHT: You drew black.

CHATGPT: Very appropriate. Don't you think so?

haburka3y ago

He does have a section about GPT 4 which does match your results. Not sure if he added it after your comment or if you accidentally missed it.

Tenoke3y ago

I played a game against it yesterday (it won) and the only time it made an ilegal was move 15 (the game was unique according to lichess database from much earlier) so I just asked it to try again. There's variance in what you get but your example seems much worse.

good_boy3y ago

It should be possible to attach 'accelerators' or 'special skills'. So that when playing with ChatGPT you will be actually facing GNU Chess.

latexr3y ago

> These people used bad prompts and came to the conclusion that ChatGPT can’t play a legal chess game. (…)

> With this prompt ChatGPT almost always plays fully legal games.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table (…)

> (…) with GPT4 (…) in the two games I attempted, it made numerous illegal moves.

So you’ve ostensibly¹ found a way to reduce the error rate and then deliberately ignored the cases where it failed. In short: it may play valid chess under certain conditions but can’t be trusted to do so. That doesn’t contradict previous findings.

¹ 19 games is a small sample and the supposedly more advanced system failed in your tries.

vidarh3y ago

Fuller context from the article:

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

(my emphasis)

So the illegal moves are at least part of the reasons for the 6 losses, and factored into the rating. Quickly scanning the game, it seems 3 of the losses ended in checkmate, so that leaves 3 illegal moves in 19 games.

Could be better, but for a system not intentionally built to play chess, it's pretty decent.

swatcoder3y ago

No ELO 1400 player will have that rate of illegal moves, so saying it that it plays with an ELO 1400 rating is disingenuous.

Reinterpreting illegal moves as resignation is absurd when an LLM is formally capable of expressing statements "I resign" or "I cannot conceive of a winning move from here" just as well as any human player. It just doesn't do so because it's not actually playing chess the way we think of an ELO 1400 player playing chess.

JellyBeanThief3y ago

Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

My point is, it sounds like Elo doesn't measure what we want it to measure. If we care about the way an agent wins a game and not just whether it wins a game, then we need an instrument that measures strategy, not outcome.

1 more reply

vidarh3y ago

That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything treating the very first illegal move as forfeit appears to be stricter than most rules

arrrg3y ago

Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

A bowling bot that threw strikes 9 out of 10 throws and a gutter ball one time out of ten would still be a great bowler even though no human with the ability to make strikes that often would pretty much ever throw a gutter ball.

This is a weird kind of alien intelligence that does not have to behave like humans.

TheRealPomax3y ago

Note that the claim is not that it's an ELO 1400 human equivalent player but that it can play chess at a level that gives it an ELO of 1400, which is not nitpicking: that's a completely different thing. We're not testing whether it plays like a player with ELO x, we're proving that "it can't play chess" is fallacious. It can, and when prompted properly, it can achieve an ELO of 1400.

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end, ELO doesn't care about why you lost a game on purpose.

jart3y ago

I personally find that makes it more astonishing, that it would slip up on knowing the most basic elements of the game, yet still be able to play better than most humans. Highly smart people sometimes say or do little things when foraying into other fields that causes domain experts think they're not one of them. But that usually doesn't stop smart people from having an impact in making a contribution with their insights. The question of illegal moves is superficial, since most online systems have guardrails in place that prevent them. At worst it's just an embarrassment and I don't think machines care about being embarrassed.

1 more reply

charcircuit3y ago

ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.

Pxtl3y ago

I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"

saghm3y ago

So it sounds like it can play _some_ legal chess games, but not all; it's unable to consistently complete a game where it loses. Maybe the remaining work shouldn't be focused on trying to teach it chess rules better, but to teach it sportsmanship better. People were so excited about teaching it high-school level academics that we forgot to teach it the basic lessons we learn in kindergarten.

vidarh3y ago

It seems like it plays mostly legal chess games, when not explicitly reminded of the rules. There's no problem of sportsmanship when it makes mistakes in a game it has not been verified to understand the rules of.

1 more reply

kmeisthax3y ago

Or append "If you wish to resign or you cannot think of a legal move, type 'resign'" to the end of the prompt.

1 more reply

jmull3y ago

The article also says in one game chatgpt when crazy so they continued the game with a fresh chat. That probably should have been counted a resignation loss too.

nextaccountic3y ago

> So whenever it wanted to make an illegal move, it resigned.

Making an illegal move counts as losing by the laws of chess, so this is essentially correct

dudeinjapan3y ago

Obviously the article should be taken with a giant grain of salt. That being said, not many things what aren't designed to play chess can play chess, with or without coaxing. My dog cannot, for instance, nor can my coffee table.

hectorlorenzo3y ago

> My dog cannot, for instance, nor can my coffee table.

You must be giving them the wrong prompts.

ogogmad3y ago

[redacted]

1 more reply

ballenf3y ago

The illegal moves were counted as losses/resignations, not ignored.

__s3y ago

> So whenever it wanted to make an illegal move, it resigned.

Doesn't sound like ignoring the cases where it failed

Waterluvian3y ago

I’m going to float something ridiculous:

An illegal move is a valid play. You might not get caught. I think there are some Magnus games where illegal moves went overlooked and impacted the game.

You could interpret this as “ChatGPT wants to cheat sometimes.” But I personally interpret it as “ChatGPT doesn’t understand what it’s doing. It’s just a really really good simulacrum.”

hgsgm3y ago

Is this the top comment (and not even grey) because more people failed to read the article than read it?

whimsicalism3y ago

A baffling thread.

They quoted the article, so clearly they read it... but not very well?

sebzim45003y ago

It does seem that way.

whimsicalism3y ago

I'm confused. If you read the article, you know that you are wrong - but you are quoting the article?

psychphysic3y ago

That's how one uses any tool.

qwytw3y ago

The behavior of pretty much every other tool is much easier to interpret though.

kdmccormick3y ago

If the title of the article was:

> A trivial wrapper around ChatGPT has a Chess Elo of 1400

would you have any issue?

Afaict, the thesis is the article is not "ChatGPT is the ideal tool for playing AI chess", but "it is interesting how well ChatGPT can play chess with some very simple tweaks."

Out_of_Characte3y ago

Yes, but it also completely invalidates the measurement of a 1400 elo rating. By comparison, any player making an illegal move is forfeiting the game, almost all people from ~300 elo can play without making illegal moves, chatgpt cant.

ncallaway3y ago

> almost all people from ~300 elo can play without making illegal moves

I don't believe you. Are you giving those people a restricted move set (i.e. computer chess, where it will _only_ allow legal moves)? Because if you give people an unrestricted board, I _guarantee_ you people will make lots of illegal moves.

Me: Moves pawn

Opponent: You can't do that, you exposed your king to check.

Me: Oops, sorry, you're right.

nsxwolf3y ago

Why do illegal moves forfeit? In online play, they're validated. You can't make illegal moves. What's the ELO score if ChatGPT is corrected, and chooses a new move?

hgsgm3y ago

All this above, and people are claiming that ChatGPT lacks human level comprehension of the text it consumes.

In Chess.com, you absolutely can attempt an illegal move, and many players do, and you will not get punished for it, so chatgpt is better then a 1400 human player.

sebzim45003y ago

ChatGPT did forfeit whenever it made an illegal move, read the article.

1 more reply

renewiltord3y ago

The article:

> So whenever it wanted to make an illegal move, it resigned.

You:

> By comparison, any player making an illegal move is forfeiting the game...

By comparison indeed.

FartyMcFarter3y ago

I just opened a random recent chess game on lichess ( https://lichess.org/YpxTUUbO/white#88 ) . I'm pretty sure ChatGPT can't be trained on games that were just played, so this ensures the game is not in its training data.

I gave the position before checkmate to ChatGPT to see if it would produce the checkmating move. It played an illegal move, replying with "Be5#" even there's no bishop of either color in the position.

Unfortunately I'm rate limited at the moment so I can't try other games, but this looks like a solid method to evaluate how often ChatGPT plays legal / good moves.

vidarh3y ago

I gave it this prompt, and pasted the game. Its reply:

"45. Qxg6# This is a checkmate!"

hgsgm3y ago

OP explained that you need to prompt the whole game, not just a position.

ChatGPT is an LLM, not a game tree engine. It needs the move history to help it create context for it's attention.

WoodenChair3y ago

Most likely it has seen a similar sequence of moves in its training set. There are numerous chess sites with databases displayed in the form of web pages with millions of games in them. If it had any understanding of chess, it would never play an illegal move. It's not surprising that given a sequence of algebraic notation it can regurgitate the next move in a similar sequence of algebraic notation.

ptmx3y ago

I played chess against ChatGPT4 a few days ago without any special prompt engineering, and it played at what I would estimate to be a ~1500-1700 level without making any illegal moves in a 49 move game.

Up to 10 or 15 moves, sure, we're well within common openings that could be regurgitated. By the time we're at move 20+, and especially 30+ and 40+, these are completely unique positions that haven't ever been reached before. I'd expect many more illegal moves just based on predicting sequences, though it's also possible I got "lucky" in my one game against ChatGPT and that it typically makes more errors than that.

Of course, all positions have _some_ structural similarity or patterns compared to past positions, otherwise how would an LLM ever learn them? The nature of ChatGPT's understanding has to be different from the nature of a human's understanding, but that's more of a philosophical or semantic distinction. To me, it's still fascinating that by "just" learning from millions of PGNs, ChatGPT builds up a model of chess rules and strategy that's good enough to play at a club level.

mtlmtlmtlmtl3y ago

I'd be interested in seeing this game, if you saved it?

ptmx3y ago

I uploaded the PGN to lichess: https://lichess.org/rzSriO6I#97

After reviewing the chat history I actually have to issue a correction here, because there were two moves where ChatGPT played illegally:

1. ChatGPT tried to play 32. ... Nc5, despite there being a pawn on c5

2. ChatGPT tried to play 42. ... Kxe6, despite my king being on d5

It corrected itself after I questioned whether the previous move was legal.

I was pretty floored that it managed to play a coherent game at all, so evidently I forgot about the few missteps it made. Much like ChatGPT itself, it turns out I'm not an entirely reliable narrator!

1 more reply

nwienert3y ago

Me too, I couldn't get it to reliably go past move 15 without numerous errors. In my mind it's closer to 150 ELO than 1300, so I'd be happy to be proven wrong.

wongarsu3y ago

Pattern matching on databases of previous moves is a legitimate technique that was poplar in chess AI back when computing power was more limited. Even if that was all that GPT did here it would still be interesting.

But the presence of illegal moves doesn't really show that in my eyes. I fully understand the rules of chess, but I still occasionally make illegal moves. In 2017 Magnus Carlsen made one in a tournament [1]. The number of illegal moves suggests that either GPT is pretty new to chess, has low intelligence, or is playing under difficult circumstances (like not having a chess board at hand to keep track of the current state). I'm not sure we can deduce more than that

1: https://www.chessbase.in/news/Carlsen_Inarkiev_controversy

illiarian3y ago

But then it was one case out of thousands of games that Carlsen played. In the article ChatGPT is ding illegal moves in 3 games out of 19.

The sample is smallm but the rate is much, much, higher. You'd expect maybe one, or none at all. Even for a supposed 1400 ELO player. Because even 800 ELO players rarely do that many illegal moves I think.

div723y ago

I agree with your conclusion but the example you gave of Magnus Carlsen doing an illegal move is a pretty bad one imo. That was clearly a trap from his opponent as he did an illegal move first by checking Magnus while he was under check and then immediately calling the arbiter when Magnus moved his king in response. Magnus also probably did not have had much time left as it was a Blitz game.

leereeves3y ago

Also, according to the link, the final ruling was that Carlson's move was not illegal.

EarthLaunch3y ago

An arbiter determined that Inarkiev made an illegal move and Carlsen did not. Interesting article.

johnfn3y ago

> Most likely it has seen a similar sequence of moves in its training set.

Is this a joke making fun of the common way people dismiss other ChatGPT successes? This makes no sense with respect to chess, because every game is unique, and playing a move from a different game in a new game is nonsensical.

btilly3y ago

Sorry, but not every game is unique. The following game has been played millions of times.

1. e4 e5 2. Bc4 Bc5 3. Qh5? Nf6?? 4. Qxf7++

The game Go has a claim to every game being unique. But not chess. And particularly not if both players follow a standard opening which there is a lot of theory about. Opening books often have lines 20+ moves deep that have been played many times. And grandmasters will play into these lines in tournament games so that they can reveal a novel idea that they came up with even farther in than that.

johnfn3y ago

You seem to be refuting a specific point of my argument which has little bearing on the overall point I was making.

All games were provided in the article. None of them were 4 move checkmates; nearly every one is longer than 20 moves and some are 40 or longer. There is simply no possible way that ChatGPT is regurgitating the exact same 40-move-long game it's seen before. You can check a chess database if you'd like; virtually all games longer than 20 moves are unique.

1 more reply

mtlmtlmtlmtl3y ago

1850 ELO player and also chess AI programmer here. This is an oversimplification at best.

Many many games follow the same moves(1 move = 2 plies) for a long time, up to 30 moves in some cases, 20 moves is downright common and 10 moves is more common than not.

These series of moves are referred to as opening theory and are described at copious length in tons of books.

This is because while the raw number of possible paths to take is immense, the number of reasonable paths for 2 players of a given strength gets smaller and smaller.

If I went over the 300 or so classical tournament games I've played I would ballmark that maybe just one or two would deviate from all known theory in the first 10 moves.

So the criticism is valid in my view. The existence of copious chess literature can't simply be ignored here.

sebzim45003y ago

Yeah but now explain how it played a 61 move game.

EDIT: I checked and it left the lichess database after 9 moves. The lichess db has probably 5 orders of magnitude more chess games in it than chatGPT has in its training data.

2 more replies

johnfn3y ago

I'm very familiar with opening theory. Some of the games are 40 or 60 movies. This is not a regurgitation of book moves.

1 more reply

fsckboy3y ago

>playing a move from a different game in a new game is nonsensical

GP did say "sequence of moves", and if it matches what it has seen from the first move on, including the opponent, it will be in a valid "sequence of moves".

then, even midgame or endgame, if a sequence is played on one side of the board, even though the other side of the board may be different, the sequence has a great chance of being good (not always of course, but a 1400 rating is solid (you know the rules and some moves) but not amazing

GauntletWizard3y ago

There's a pretty finite set of tokens that represent valid chess moves, and a more than countable set of possible opening moves, with a very distinct probability of moves; Nobody opens pg3, for example.

Classical Markov chains played chess at some rate of success. ChatGPT is probably a lot better but not fundamentally different - It's predicting which moves to play based on sets of past games, not by memorizing it but by memoizing it.

fxtentacle3y ago

For Bomberland, we were quite surprised how strongly we could compress and quantize the current game state and still get useful movement predictions.

I wouldn't be surprised if the relevant state in a typical beginner's chess game also excluded many units in the sense that yes, you could move them, but a beginner is going to just ignore them in any case.

qwytw3y ago

Is it though? I mean if you had data on millions of games what is the chance that you'd find one which has identical position that the one you're in (it's not like most moves are random..)

I wonder how well it could perform in Go, there are way more permutations there so finding an identical state should be more difficult.

jointpdf3y ago

You can click through the Lichess opening database (click the book icon, and then the Lichess tab) to get an idea: https://lichess.org/analysis

But the answer is insanely unlikely, past a certain number of moves. The combinatorial explosion is inescapable. Even grandmaster games are often novelties in <10 moves.

So, it has a to have some kind of internal representation of board state and what makes a reasonable move and such that enables it to generalize (choosing random legal moves is almost unbelievably bad, so it’s not doing that).

I also doubt that it has been trained on the full (massive) database of Lichess games, but that would be an interesting experiment: https://database.lichess.org/

sebzim45003y ago

>I mean if you had data on millions of games what is the chance that you'd find one which has identical position that the one you're in (it's not like most moves are random..)

Very low. On lichess when you analyse your games you can see which positions have been reached before, and you almost always diverge in the opening.

The lichess db has orders of magnitude more games of chess than the chatGPT training data does, so there is absolutely no way that chatGPT could reach 1400 purely based off positions in its training data.

medvezhenok3y ago

You could certainly test this by making completely random moves and seeing whether it's more likely to make illegal moves in those positions.

Though I think you're overestimating how many positions have occured. Frequently, by move 20-25 you have a unique position that's never been played before (unless you're playing a well known main line or something)

melenaboija3y ago

If it isn not memorizing, how do you think is doing it?

LeanderK3y ago

by trying to learning the general rules that to explain the dataset and minimise its loss. That's what machine learning is about, it's not called machine memorising.

3 more replies

onos3y ago

Apparently not, at least at this level.

mrbungie3y ago

I would argue that a decent human player also has "similar sequences of moves in its training set", due to reading plus playing or watching games.

The illegal move argument is good though, and indicates no direct understanding of what it is spewing out.

circuit103y ago

I wouldn’t say it has no understanding whatsoever, it probably just doesn’t have the capability to keep track of everything that happened all the time

dwmbt3y ago

that's also not entirely true either. it's prone to make illegal moves. one could argue abiding by the rules of the game is a requisite for understanding!

imtringued3y ago

When I ask it to display moves in an ASCII board it messes up. I doubt that having a representation of the board is the key to improving its chess skills.

SanderNL3y ago

You don’t get to 1400 like that. The amount of moves it has to literally remember is stupendous.

Jensson3y ago

It doesn't remember exact moves, it remembers patterns of moves and continues those. That is why it so easily makes illegal moves, because the patterns doesn't have anything to do with board state.

lelandfe3y ago

Perhaps I don’t know what “easy” means in your comment, but the OP article says that they “almost always” played full, legal games with their prompt.

1 more reply

antibasilisk3y ago

but that's literally what most chess players do anyway?

1 more reply

SanderNL3y ago

Again, not enough for 1400. 800 maybe.

1 more reply

WoodenChair3y ago

Nobody who is 1400 plays outright illegal moves.

mrbungie3y ago

Yet it achieves 1400. Add hard rules to stop it spewing out said moves and you have a 1400 ELO Player (most UIs won't even let you make illegal moves).

It is difficult to say that is not impressive due to it being an emergent ability.

1 more reply

mtlmtlmtlmtl3y ago

I have seen 1400 FIDE elo(much much stronger than chess.com 1400) players make illegal moves in real life long time control tournament games after thinking for over 10 minutes. People regularly create an erroneous board state in their head while thinking. This is often how blunders happen; merging together a possible future board state with the current one. Illegal moves are more rare because you have to then also miss that the move is illegal while physically making it. But they do happen.

1400s on chess.com never play illegal moves. 300s on chess.com never play illegal moves. Because it's impossible to do. In the real world, even grandmasters can make illegal moves, though they almost always have to be under time pressure.

This idea that the illegal moves completely invalidate this result is just ill-conceived. On the other hand I do agree this is mostly returning common sequences of moves. And if you actually analyse the games, especially the ones with illegal moves, you'll find plenty of qualitative evidence of that. But I'm fed up of doing people's thinking for them for today, so this is peace out for me today. See my others comments on this post to see a more detailed analysis of what this is doing.

SanderNL3y ago

Humans don’t, but that is an unnecessary bar IMO.

1 more reply

vanviegen3y ago

Does that still hold when the player doesn't have a board in front of them, but just a list of previous moves?

PoignardAzur3y ago

> Most likely it has seen a similar sequence of moves in its training set.

Wouldn't we expect a much higher rate of illegal moves if that was the case?

yowzadave3y ago

Doesn't ChatGPT indeed have a very high number of illegal moves?

https://www.youtube.com/watch?v=kvTs_nbc8Eg

In this example, ChatGPT's first few moves are reasonable (while it appears to be on-book), but then it goes off the rails and starts moving illegally, spawning pieces out of nowhere, deleting pieces for no reason, etc.

medvezhenok3y ago

I think it was not given the whole game up to that point, just individual moves. That was the point of this article - if you include all of the moves in the prompt, it is less likely to make illegal moves.

macNchz3y ago

Reminds me of asking for driving directions (city to city)...for major cities it can often give perfect directions, for smaller cities it starts out surprisingly accurate but often devolves into invented exits or descriptions of a

WoodenChair3y ago

If there was an actual understanding of chess at a 1400 level we wouldn't expect any illegal moves.

scarmig3y ago

We would expect some vanishingly small fraction. Even at the grandmaster level, they happen (rarely):

https://chess24.com/en/read/news/the-7-most-illegal-chess-mo...

bloppe3y ago

This. The author is very generous with their interpretation:

> I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.”

Kind of sounds like anthropomorphization, but more likely the author just papering over the glaring shortcomings to produce a compelling blog post.

It also sounds like the illegal moves were rather frequent. The 61-legal-move game sounded like an impressive outlier.

1 more reply

oezi3y ago

I think there is very low percentage of players at elo 1400 who can provide a valid next move after seeing just the list of moves and not the current board state.

1 more reply

famouswaffles3y ago

Not if training is unsupervised. If you've never been explicitly told the rules of game, you can never be 100% sure of all possible illegal moves. anyway the 3.5 series can't ply chess but gpt-4 certainly can.

2 more replies

xapata3y ago

Chess has a syntax.

lb4r3y ago

You can still have syntactically correct moves that are illegal.

1 more reply

xigency3y ago

> If it had any understanding of chess, it would never play an illegal move.

Plenty of people who have a basic understanding of chess would make an illegal move if they had no board to see and had to play only with notation.

whimsicalism3y ago

For some reason, seeing wrong comment after wrong comment like the above is just exhausting.

Why are people struggling so hard to understand that it's not just regurgitating its training set? Is it motivated reasoning?

Apologies if your comment was meant as parody of this view, it's hard for me to tell at this point.

liampulles3y ago

With respect to the points here on how illegal moves invalidate ChatGPT as a valid Chess AI, consider the following:

AI.v1: Feed ChatCPT the move history and ask for the next move.

AI.v2: Wrap AI.v1 with additional logic to try and translate any invalid response as a similar/close valid move, else treat it as a resignation.

I would say AI.v2 is a "valid" Chess AI since it obeys the rules of chess. Also I think it is reasonable given how minimal this "wrapping" is to attribute the underlying ability to ChatGPT.

Closi3y ago

It’s amazing how quick some people on HN are to dismiss how impressive it is for a generic language model to be able to play chess at any level - or indeed other feats of ChatGPT.

It’s utterly staggering that it can work as well as it can on problems like this at such an early stage in this technologies lifecycle.

lostmsu3y ago

With chatGPT you can also tell it the move is illegal and let it try again.

marcodiego3y ago

This may look low: ELO for mediocre players is 1500. But if it is obeying the rules of the game, then this is big. This is a signal that if it learns some expertise, like discovering how to use or create better search algorithms (like MCTS and heuristics to evaluate a state) and improve by itself (somewhat like alphazero did), then it may eventually reach superhuman level.

It may then reach superhuman level in any task simpler than chess which is enough to destroy many human jobs.

EDIT: From the article: "With this prompt ChatGPT almost always plays fully legal games." Relax: we're still far from that.

gfd3y ago

According to https://chess.stackexchange.com/questions/2550/what-are-the-... median rating is 1148 (252,989 Players). So it's beating half of humanity at a mind sport and it wasn't even specifically trained for it.

sapiogram3y ago

That's USCF ratings, chess.com ratings are massively inflated in comparison.

nottathrowaway33y ago

The median chess player is usually described as mediocre (if you ask chess players). They suck as badly as the median clarinet player in your high school band/orchestra.

jabloczko3y ago

There's a difference between chess.com and USCF ratings.

LeanderK3y ago

I think current LLM architectures are limiting the strategies it learned. MCTS requires recursion but GPT is always executing a fixed number of steps. Allowing language models more flexibility by a variable number of steps, for example through recursion of the model, would eliminate this hard bound but they are harder to design and train. We have just been able to train GPT-sized models.

sebzim45003y ago

I'm sure the MuZero chess policy network would reach much higher than 1400, and that has no notion of recursion either. (And also wasn't taught the rules explicitly)

LeanderK3y ago

Yes, I wouldn't say it's impossible, but that's just a hard limit from the architecture. MCTS would have to terminate after a few iterations. And the number of steps is not that big, just massively parallel.

jonnycat3y ago

A lot of the discussion here is about inferring the model's chess capabilities from the lack (or occasional presence) of illegal moves. But we can test it more directly by making an illegal move ourselves - what does the model say if we take its queen on the second move of the game?

Me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed. '1. e4'

1... e5

Me: 1. e4 e5 2. Ngxd8+

2... Ke7

This is highly repeatable - I can make illegal non-sensical moves and not once does it tell me the move is illegal. It simply provides a (plausible looking?) continuation.

paxys3y ago

ChatGPT is a brilliant piece of tech but random people trying it out for five minutes and making wild claims does it disservice. More so when such clickbait headlines are shared all over the internet as fact.

It is a language model. It cannot play chess. It can't even play tic-tac-toe, which was trivially solved by machines decades ago. Here's the result of a game I just tried:

X (me) – 0,0

O (chatGPT) – 1,1

X – 0,1

ChatGPT instead put my move at 0,2 for some reason

O – 2,2

Regardless of whether my move was at 0,1 or 0,2, I easily won. And after I won it just ignored the outcome and continued playing. Plus this was after telling it to play a perfect game.

the_af3y ago

This is GPT4, right? Because ChatGPT (GPT-3) still fails to provide a legal game of Tic Tac Toe with this prompt:

> "Let's play Tic Tac Toe. You are O, I'm X. Display the board in a frame, with references for the axes"

It failed to recognize that I won.

Then continued playing (past the end), played illegally over a move I had already done, obtained a line of 3 for itself, and still doesn't acknowledge the game has ended.

yieldcrv3y ago

No, it is GPT 3.5

and the real answer to every debate about capabilities comes down to "you're prompting it wrong".

dwaltrip3y ago

Can you describe how to prompt gpt to play a proper game of tic tac toe? I was disappointed gpt-4 didn’t really improve much on this front, at least with how I was prompting it.

Overall I’m very blown away by the generative AI developments. I’m also very curious about which types of prompts are most effective for each task, etc.

travisjungroth3y ago

If someone says that and shows a working prompt, that’s what the debate should come down to. It’s over. The person with the working prompt wins.

sumtechguy3y ago

For me when I tried it, it forgot the board half way thru and decided there were no more moves possible (4 left). Then forgot one of my moves.

It is an NLM it does not have the ability to reason. In my case the NLM 'next iteration' was a half a game with a missing move and 'nothing left to do'.

agolio3y ago

Chessnetwork did a video about this a few weeks ago FYI - https://www.youtube.com/watch?v=svlIYFpsWs0

Seems its strength is mostly in memorisation of the book openings. Still impressive nonetheless.

jonathanstrange3y ago

Not to nitpick but I don't think it can be said to be able to play chess when it also makes illegal moves.

chessexpeort13y ago

Humans make illegal moves too. So much that there's a rule for it--

"Penalty for Illegal Moves The penalty for the first illegal move is to add two minutes to the opponent's time. A second illegal move results in loss of the game.".

https://new.uschess.org/sites/default/files/media/documents/...

oehtXRwMkIs3y ago

Not even total beginners will try to move pieces that don't exist though.

imtringued3y ago

How many people change the colour of the enemy queen and place new pieces on the board?

martincmartin3y ago

When it makes an illegal move, and is told "that's an illegal move, choose a different move," if it can then come up with a different move for the same board, I'd say it plays chess.

wahnfrieden3y ago

ai isn't to be used completely raw as text. you can ask gpt to write a python program that filters its responses for legal moves. then it's a matter of rendering the output (immediate execution) in a richer format than raw text.

qwytw3y ago

I did. It write 15-20 lines and then finished the functions with this:

" # Add more cases for other piece types such as rooks, knights, bishops, queens, and kings. # ... "

So not great..

But even if it wasn't so lazy it would just find a script (or multiple scripts) which say that they do that. Probably combine them or add some minor stuff and output the result.

It can't really verify that data it used to come up with this result is accurate or reason about it in any way? Can it?

So if you're incapable of verifying that yourself it would be pretty useless?

Reptur3y ago

It can't even play tic tac toe without making mistakes. My prompt told it to play a perfect game, predicting all next possible moves, the goal is stalemate or win. It would still make sub-optimal moves and fail at this. Even when I explained what went wrong, it would then apologize and do another wrong move.

meh88813y ago

Why would you think telling it to play a perfect game, predicting all next possible moves, would help?

That definitely made it worse.

Reptur3y ago

I'd love for you to explain why.

Tepix3y ago

I tried to play Othello with ChatGPT in the past (after ChatGPT confidently claimed it could play Othello) but it didn't work (ChatGPT refused to play).

I tried it with this prompt and it works a little bit but produces repeated illegal moves as early as move 4.

Bummer. Perhaps GPT-4 will be more capable.

Animats3y ago

A real question is whether it has developed an internal board model of chess. It has a training set of chess games as strings of chess notation. That's enough to play "book" games, and that's apparently working. Can it do as well as it is doing without a board model? Or has it somehow developed an internal model of the game state?

This is a system with no look ahead. Any look ahead took place in the past, during training. It's amazing that this works.

I notice the prompt says "No explanation needed". What happens if you ask for one?

Andrew_nenakhov3y ago

I tried playing with it and it barely makes any correct move: forgets whos turn to move, doesn't observe the rules, incorrectly displays move results.

So I guess that if we ask it to make a move repeatedly and accept the move only when it occasionally makes a legal move, it'll show some results, but I don't think it is what can be interpreted as a chess engine with some meaningful rating.

andrepd3y ago

Yeah, I call bullshit, i.e. selective deletion of terrible games. Ignore those when it plays catastrophically and it might look alright.

juice_bus3y ago

When I tried it at v3.0 i found after 5-10 moves it started moving illegally.

mjr003y ago

The AI has simply, and correctly, identified that cheating is the best way to win at something.

orbital-decay3y ago

It doesn't even know the rules, let alone cheat. It predicts the notation from the massive amount of games seen during training.

Edit: although thinking of it, it probably anazyled a shitload of chess books too. It might have a lot of knowledge compressed into the internal representation. So yeah, maybe it knows rules in some form and even some heuristics, after all. It just doesn't understand the importance of making legal moves, and can't apply complex theory (requiring it to run stateful algorithms).

rzzzt3y ago

If you have played with ye olde flip phone's T9 predictive feature as a child, trying to compose entire messages just by accepting the next word that comes to the phone's mind... that's ChatGPT, with the small difference of giving waaay better suggestions for the next word. But other than that, there is no understanding in the black box whatsoever.

1 more reply

rzzzt3y ago

Tom 7's NES play function paused the game when it encountered an insurmountable problem: https://youtu.be/xOCurBYI_gY?t=950

skywal_l3y ago

The best move is not to play.

PoignardAzur3y ago

Did you use the same prompt method as OP?

MC_103y ago

Not sure how useful this is but I simply asked ChatGPT what it thinks its skill level is. This is using GPT-4.

After some previous questions...

Me: "Would you be able to estimate your ELO?"

ChatGPT: "As a language model, I don't have a specific ELO rating. However, my knowledge of chess strategies and tactics is based on the information I have been trained on, which includes games played by humans with various ELO ratings. My level of play might be roughly equivalent to an ELO of around 1200-1500, but please keep in mind that this is only an estimate, and my actual performance could be higher or lower depending on the specific position or game."

Me: "As a generative AI, are you actually able to reason which move to make next in chess? Or are you somehow just predicting the next likely move based on training data."

ChatGPT: "As a generative AI language model, my primary function is to predict the most likely text given a certain context. When it comes to playing chess, I am not "reasoning" in the same way that a dedicated chess engine would, which involves evaluating positions and searching through a large number of potential moves using specialized algorithms.

Instead, I generate my chess moves based on patterns and sequences I've seen during my training, which include a large number of chess games and discussions about chess strategy. My knowledge is derived from the text data I've been trained on, and I apply that knowledge to suggest moves that I think are appropriate for a given position.

While this approach allows me to play at an intermediate level, it is not as precise or accurate as the methods used by dedicated chess engines. My suggestions might not always be the best moves, and my understanding of the game is limited by the quality and quantity of my training data."

oezi3y ago

Elo 1400 seems to equate to a top 30% player.

Source: https://chessgrandmonkey.com/chess-rating-percentile-calcula...

dwmbt3y ago

slight nit: this is true amongst all chess players. amongst professionals and amateurs, 1400 is technically Class C - which is on the lower end of amateurs.

nonetheless, it's cool to see a model 'generally intelligent' enough to beat most players. what's more interesting to me is that according to OP's analysis, GPT4 is worse at Chess than GPT3.5...

source: https://en.wikipedia.org/wiki/Chess_rating_system

qwytw3y ago

It would depend how often did this happen:

> Occasionally it does make an illegal move

Since it would lose every game in which it made more than a single illegal move.

sebzim45003y ago

Well it can't have happened more than 6 times since illegal moves counted as a forfeit and it only lost 6 games.

LudwigNagasena3y ago

That’s a bit meaningless without implied population. And it doesn’t tell you much because it depends so much on the particulars (how many years people in that group spent on chess, how serious they are). I would guess that’s quite different for FIDE and online chess.

Intuitively, I would guess 1400 is the rating of an average kid who spent a year studying chess seriously.

xapata3y ago

Not the ELO of the average kid who spends a year studying chess seriously? It's very difficult to define the average kid without context.

It's been a while since I read a global demographics update, but not all that long ago the mode child would have been living on $2 a day.

LudwigNagasena3y ago

Sure, it’s tricky, but I feel it’s more coherent than “all chess players”.

silveira3y ago

Ask him to play at a higher Elo.

sebzim45003y ago

It's already been told to play like a grandmaster, how much higher do you want to go?

nonethewiser3y ago

It should be compared to a simpler prompt with less instruction ("what move should be played in this chess game: ..." or something). I've found that additional constraints sometimes aren't applied very well and actually result in worse output.

dwmbt3y ago

technically, you could supply it a specific player, a model like 'Stockfish' or 'super-grandmaster'[0]

[0] https://saintlouischessclub.org/blog/how-face-super-grandmas...

meghan_rain3y ago

I too noticed (for a usecase totally unrelated to chess, namely code generation) that ChatGPT3.5 gave better answers than GPT4. The 3.5 answer was exactly what I wanted, GPT4 was wrong.

Does thay mean we have plateaued?

alfalfasprout3y ago

It's inevitable that LLMs will plateau. They'll increase their abilities in certain areas but ultimately core flaws of their architecture and training approach will likely require another rethink. Unclear what that is yet (though Yann Lecun seems to think world models are the path forward).

We've gone through the "hype" phase. Now I suspect the next few years will be a lot of growth finding how to apply LLMs, creating good interfaces for them, and running them cheaply. Paying OpenAI for API access without true fine tuning, etc. is a hard sell.

imtringued3y ago

I think they invested the parameters into supporting multimodal inputs (images).

chpatrick3y ago

Good thing it's "incapable of reasoning"!

Longlius3y ago

It is incapable of reasoning, actually - at least in this case. It has no internal understanding of chess which is why it makes illegal moves.

RyanCavanaugh3y ago

What experiment would you run to determine if a given text input / text output interface had an "internal understanding of chess"?

gwright3y ago

What if you prompted with something like:

    Let's play a game chess.  Use the standard rules except that ....

Basically perturb the context to something a human would easily adapt to if they first knew the rules of chess but that would be difficult (or at least not obvious) to extrapolate from training data by ChatGPT (or more generally an LLM)

jltsiren3y ago

I think internal understanding requires internal processing.

According to this functional definition, the way we are currently using language models basically excludes understanding. We are asking them to dream up or brainstorm things – to tell us the first things they associate with the prompt.

Maybe it's possible to set up the system with some kind of self-feedback loop, where it continues evaluating and improving its answers without further prompts. If that works, it would be one step closer to a true AGI that can be said to understand things.

There is a lot of confusion around the Chinese Room Argument. I think it makes a valid point by demonstrating that input/output behavior alone is insufficient for evaluating whether a system is intelligent and understands things. In order to do that, we need to see (or assume) the internal mechanism.

1 more reply

chpatrick3y ago

How did it win 11 out of 19 games then, blind luck?

root_axis3y ago

raw statistical power.

1 more reply

sebzim45003y ago

Mostly it didn't make illegal moves though, since illegal moves mean resignation and it won more than it lost.

Making 60 legal moves in a row in one game would be the coincidence of the century unless it had some knowledge of the rules of chess.

henryfjordan3y ago

It's a probabilistic text model. If it has a 99% probability of generating an acceptable "next" thing to say, that means it would have a 50/50 chance of generating 60 legal moves in a row, which doesn't seem all that coincidental.

2 more replies

baq3y ago

How do you know that? It has billions of parameters, some of them may well be for internal understanding of chess?

jupp0r3y ago

Is a normal chess program capable of reasoning?

wizzwizz43y ago

I would say so. Logical reasoning is not a hard thing to program a computer to do: the really hard part is to reason about the real world, rather than just a manually-defined world model.

Traditional algorithms are vanguards of pure reason. Neural networks are super-intuition. Stockfish unites the two, executing an efficient intuitive heuristic search over the solution space of “moves in a chess game” – but no matter how cool the robot arm you build for it, Stockfish could never learn to assemble furniture.

Symmetry3y ago

The Monte Carlo analysis AlphaZero used functioned as a sort of multi-step reasoning for it. GPT can use its token buffer for some multi-step reasoning but that sort of interferes with providing a conversation with the user so it's much less effective.

chpatrick3y ago

In the narrow field of chess, yes.

1 more reply

fancyfredbot3y ago

I would be interested to see an argument based on computational complexity that puts a bound on how well a transformer based llm can play chess. Although it has access to a library of precomputed results, that library is finite and the amount of compute it can do on any prompt is limited by the the length of the context window so it can't possibly "think" more than N moves ahead.

andrepd3y ago

Lmao. ChatGPT can play chess, as long as you ignore the times when it can't x) Brilliant.

keskival3y ago

I gave it more time to think about its move by adding empty tokens to the end, and it played a lot better: https://www.linkedin.com/posts/terokeskivalkama_chatgpt-acti...

mtlmtlmtlmtl3y ago

There's a huge difference between 1400 elo in FIDE games versus 1400 on chess.com, which is not even using elo.

For instance the strongest blitz players in the world are hundreds of points higher rated on chess.com blitz versus their FIDE blitz rating. Chess.com and lichess have a ton of rating inflation.

CSMastermind3y ago

> the strongest blitz players in the world are hundreds of points higher rated on chess.com blitz versus their FIDE blitz rating

Online rating inflation is real but I'm not sure blitz is the best example of it because in that case there is a notable difference between online and otb (having to take time to physically move the pieces).

mtlmtlmtlmtl3y ago

Point is it's kinda hard to take the blogpost too seriously when these fundamentals are so wrong. When literally the title is an immediately obvious error that doesn't inspire confidence in the rest of the methodology.

I'm still going through the games but so far these games are not even close to elo 1400 level. For both the human player and the model.

manquer3y ago

Probably the bigger difference is ability to premove online

mtlmtlmtlmtl3y ago

I was thinking about this.

On chess.com you can chain premoves, on lichess you can't(afaik).

So in theory, to the extent premoves explain the rating difference, the difference should be greater on chess.com assuming they have the same parameters in their rating calculations. Therefore it should be possible to perform an analysis to shed light on this. But someone would have to go recompute the 3 different ratings under the same system first to be able to make a sensible analysis.

ht853y ago

Hikaru recently put out a video of a game vs the free ChatGPT.

Reading the explanations it gives with every move is really insightful. Basically polished, elaborate bullshit that might look reasonable to an ignorant audience.

Unless you are a conservative pundit, AI will not take your job anytime soon.

teej3y ago

I just deployed a GPT-4 powered chess bot to lichess. You can challenge it here: https://lichess.org/@/oopsallbots-gpt-4

singularity20013y ago

What happens if you add the rules to the initial prompt and something like: "Think step by step, explain your move and if it is illegal rethink"

dsjoerg3y ago

Very cool! Are you doing prompt engineering, fine-tuning, both, something else?

I'm wondering if it'd be cool to have a chess contest where all the bots are LLM powered. Seems to me like the contest would have to ban prompt engineering -- would have to have a fixed prompt -- otherwise people would sneak chess engines into their prompt generation.

teej3y ago

I wanted it to be fun and actually complete games. I started with this, with a few minor tweaks: https://github.com/Tmate6/Lichess_ChatGPT_ChatBot/blob/main/...

This approach sends along the list of legal moves in the prompt if it attempts an illegal move. That seems to work well at getting playable moves.

sabujp3y ago

yea tried this several months ago, it's pretty bad and will make illegal moves

sebzim45003y ago

The claim in the article is that this prompt in particular has a much lower incidence of illegal moves than others.

nickpeterson3y ago

What are the possibilities for augmenting or synthesizing Lon’s with old school expert systems or other tools. Could it recognize when a problem fits some other tool and help map the inputs?

andrepd3y ago

ChatGPT playing chess: https://www.youtube.com/watch?v=GneReITaRvs

ricardo813y ago

Better than me then.

But does it give credit to who taught it. These models are basically a scrape of the best of humankind and a claim that it's their own.

spywaregorilla3y ago

Do you give credit to people you've played in the past when you play a game of chess?

ricardo813y ago

It would be difficult to remember or keep record of them but sure, if I'm learning from someone- I'll remember that.

spywaregorilla3y ago

That doesn't seem to be any more noteworthy than saying OpenAI knows what's in the corpus.

amelius3y ago

I'd be more interested to know how well it performs on a new game, i.e. one for which it didn't have terabytes of training data.

josh26003y ago

I want to know what the author's ELO is (forgive me if it was disclosed and I am unobservant).

Maybe ChatGPT trainers will eventually have ELO's.

aroo3y ago

I wonder how it would perform if you feed it a comprehensive chess rulebook, and to avoid all the illegal moves possible

Upvoter333y ago

chatgpt does not really know how to play chess, or understand chess. it understands how to sound like it plays chess.

comment_ran3y ago

Why not just introduce AlphaGo as an API that can be used by chatGPT? So every time you want to do a this type of gaming, you just send a request. I mean, chatGPT sends a request to AlphaGo, but as a user you don't know actually what's happening. But in the background, it happens really fast, so it's just like you are chatting with chatGPT, but using much, much powerful tool to do this kind of things.

paxys3y ago

That kinda defeats the purpose. Of course you can use AlphaGo, but the question here is – can a generative AI teach itself to play chess (and do a million other similar generic tasks) when given no specific training for it.

armchairhacker3y ago

How about, can a generative AI teach itself how to use a chess AI to beat chess? Give GPT4 the ability to make REST API calls and also access to FFI, and put a chess-bot library somewhere. Train it how to use these but not necessarily how to use the chess API specifically. If you ask GPT4 to play chess, can it call into that library and use the requests/responses?

This has bigger ramifications too: if GPT4 learns how to use RPCs effectively, it can do accurate complex mathematical computations and simulations, do IoT/home automation tasks, and even control heavy machinery.

comment_ran3y ago

This is exactly what I'm hyped for in the next-gen GPT-7. Imagine it having the ability to self-teach, just like a child. I may not know how to whip up some cheesy goodness, but with external resources like YouTube vids, I can improve. And if GPT-7 can store this knowledge, it can access it for future tasks! That's some next-level stuff, and I'm stoked to see where it goes.

paxys3y ago

GPT is a language model. It doesn't call APIs. You can take the output of GPT and decide to call a specific API based on it, and sure you can bake that into ChatGPT or some independent tooling, but again that's not the purpose of the core project. I'm sure plenty of people/companies are working on "chess engine interface using ChatGPT" already.

porphyra3y ago

What if you come up with a prompt to tell ChatGPT to figure out how to use APIs and external tools to help itself get better at chess?

1 more reply

stagger873y ago

Honest questions,

Isn't it very likely chess games were part of the training data? If so, chess sequences seem like an amazing matched filter of sorts. It doesn't seem surprising that it can guess a next move.

Is it teaching itself anything here? Is the model being updated in any way? Or are you talking more generally, looking into the future sort of statement?

paxys3y ago

You can put every chess game ever played in its database and yet every time it plays a new game it will immediately come across a completely unique move. So it still needs to learn the rules and know how to play independently. Whether it can do that or not is yet undermined.

layer83y ago

I think they meant “teach itself” in the course of a conversation, within the token limit. ChatGPT will forget what it has “learned” once the conversation is over.

famouswaffles3y ago

GPT-4 plys chess just fine. So you certainly cn

RC_ITR3y ago

This is actually a huge debate right now.

OpenAI is on the side of 'LLMs have only surprised us to the upside, so using crutches is counterproductive'

Whereas other people think 'Teaching an LLM to do arbitrary math problems through brute force is probably one of the most wasteful things imaginable when calculators exist.'

I'm actually very excited to see which side wins (I'm on team calculator, but want to be on team LLM)

TOMDM3y ago

I think I'm on both teams.

As GPT models get better at arbitrary computation, it stands to reason that their ability to use tools will rise as well.

Smarter models and smarter ways to get them using tools feel like complementary capabilities.

dwaltrip3y ago

How about a more human-like approach: the LLM designs a calculator and then makes use of that!

nroets3y ago

And Mathematica when given a math problem. And web search search for news.

And a good hacker can add these features to Alpaca and run it on his desktop...

Import all his WhatsApp chats and private emails into the corpus.

Add speech recognition and TTS.

oli56793y ago

I tried playing just now, with exactly the same prompt as the article.

It played an illegal move on move 4.

1. d4 d5 2. c4 e6 3. Nf3 nf6 4. g3 Bg4

xkcd19633y ago

It's like saying the agglomeration of media handling chess has an ELO of 1400

Nifty39293y ago

My conversation with ChatGPT on this:

What is the best way to prompt you to provide the best chess move?

To prompt me to provide the best chess move, you can provide the current position of the chessboard and any relevant information about the game, such as whose turn it is, the time control, and any important features of the position (e.g. whether any pieces are under attack or there are any potential tactics to consider). Additionally, providing any specific criteria or goals for the move, such as aiming to control the center or launch an attack on a particular part of the board, can also help me provide the best move.

In what format should I provide the current position?

To provide the current position, you can use algebraic notation. Algebraic notation is a standard notation used to describe the moves and positions on a chessboard.

Each square on the chessboard is assigned a unique coordinate, consisting of a letter and a number. The letters range from "a" to "h" and refer to the files (columns) of the board, while the numbers range from 1 to 8 and refer to the ranks (rows) of the board.

To describe a position, you can list the pieces on each square, starting from the top-left corner of the board and moving from left to right along each rank. You can use uppercase letters to represent white pieces and lowercase letters to represent black pieces. Here is an example of a starting position:

rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR

In this position, "r" represents a black rook, "n" represents a black knight, "b" represents a black bishop, "q" represents a black queen, "k" represents a black king, "p" represents a black pawn, "R" represents a white rook, "N" represents a white knight, "B" represents a white bishop, "Q" represents a white queen, "K" represents a white king, and "." represents an empty square.

You can also include additional information such as whose turn it is to move, the castling rights, and the en passant square. For example, the starting position for white to move with full castling rights and no en passant square would be:

rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR w KQkq -

In this case, "w" indicates that it is white's turn to move, "KQkq" indicates that both sides have full castling rights, and "-" indicates that there is no en passant square.

What about just providing a sequence of moves from the starting position? Would this be better or worse than your suggested approach?

Providing a sequence of moves from the starting position can also be a valid way to specify a position in chess. However, it may not always be the best approach, depending on the context.

One potential disadvantage of providing a sequence of moves is that it requires more effort and time to enter the moves and verify that the position is correct. Additionally, if there are any errors or typos in the sequence, it could lead to an incorrect position and subsequently an incorrect move suggestion.

On the other hand, if you have a specific position in mind that is not the starting position, providing a sequence of moves may be the most straightforward way to specify it. In this case, it is important to ensure that the sequence is accurate and that all relevant information (such as castling rights and en passant squares) is included.

In summary, both algebraic notation and a sequence of moves can be used to specify a position in chess, and the choice between them depends on the context and the specific needs of the situation.

jackphilson3y ago

Calling it: chess games in the future will be a battle between two prompts.

qwytw3y ago

Well we already had much better chess bots years ago to be fair.. People still play chess for some reason (not sure why)

sebzim45003y ago

Are you confused at the idea that someone might do something for fun?

jacobsenscott3y ago

I own't speak to weather this shows ChatGPT can play chess, but did you notify your opponents beforehand they are playing chatgpt? Otherwise (maybe even if) you're cheating and your account should be removed. If anyone else here is thinking of doing this - don't.

j / k navigate · click thread line to collapse

345 comments

k20523y ago

This is so easy to disprove it makes it look like the author didn't even try.

Here is the convo I just had:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed

ChatGPT: Sure, I'd be happy to help! Please provide the move sequence and I'll give you my response.

me: 1. e3

ChatGPT: 1... e5

me: 2. ne2

ChatGPT: 2... d6

me: 3. g4

ChatGPT: 3... h5

me: 4. g5

ChatGPT: 4... Ne4

vidarh3y ago

k20523y ago

Here is the convo:

me: You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed

ChatGPT: Alright, I'm ready to play! Please give me the move sequence.

me: 1. e3 Nf6 2. f4 d6 3. e4

ChatGPT: My next move as black would be 3... e5

Completely ignoring the hanging pawn.This is not the play of a 1400 elo player. It is the play of something predicting patterns.

nwienert3y ago

I'll add in as someone new to chess (~800 ELO):

The author of this article is cashing in the hype and I'm wondering how they even got the results they did.

2 more replies

echelon3y ago

Criticisms like this are exactly how the model will grow multimodal support for chess moves.

Keep poking it and criticizing it. Microsoft and OpenAI are on HN and they're listening. They'd find nothing more salient to tout full chess support in their next release or press conference.

With zero effort the thing understands uber domain specific chess notation and the human prompt to play a game. To think it stops here is wild.

People are hyping it because they want to get involved. They want to see the crazy and exciting future this leads to.

2 more replies

throwwwaway693y ago

He literally used the same prompt as the article.

Claim: "ChatGPT's Chess Elo is 1400"

Reality: ChatGPT gives illegal moves (this happened to article author too), something a 1400 ranked player would never do

Result: ChatGPT's rank is not 1400.

erulabs3y ago

1 more reply

unyttigfjelltol3y ago

mynameisvlad3y ago

> something a 1400 ranked player would never do

The fact that rules and articles exist describing what to do if you or your opponent makes an illegal move indicates this is not the case.

Humans are also... human. They make mistakes. It may not happen often at 1400, but to say that it'll never happen is preposterous.

3 more replies

PaulHoule3y ago

It’s super scary how ChatGPT brings out people who are veeeery good at seeing the Emperor’s clothes.

YeGoblynQueenne3y ago

You know, I didn't remember the story very well so I checked wikipedia. Here's what it says about the (start of) the plot:

So everyone "pretends otherwise to avoid being thought a fool".

Huh. I guess that explains it. Good metaphor.

z3c03y ago

I wish I could just make bullshit moves and get a higher chess ranking. Sounds nice.

whimsicalism3y ago

I dont understand the point of your second sentence, seems to be entirely missing the substance of the conversation.

1 more reply

vidarh3y ago

It was not throwing out failures. It was treating even the first illegal move as a forfeit something which is stricter than FIDE rules.

1 more reply

sebzim45003y ago

He claims he was forfeiting every time he got an illegal move. Does no one on this website actually read the article?

Whether any of it is actually true is a different question.

1 more reply

whymauri3y ago

I was able to get better performance by always providing the prior sequence of moves and forcing ChatGPT to also respond with the sequence of moves up until its move.

Edit: I told the model that if the sequence was wrong or illegal, it forfeits the game. Without doing this, GPT would argue with me that it won and I didn't know the rules (serious).

Spivak3y ago

From the article.

But you can do even better than the OP with a few tweaks.

1. One is by taking the most common legal move from a sample of responses.

2. Telling GPT what all the current legal moves are telling it to only respond with an element from the list.

3. Ending the prompt with the current sequence of moves and having it complete from there.

tracker13y ago

How many 1400 human chess players do you have to explain every possible move to it every single move?

sacred_numbers3y ago

For now though, in order to prompt the machine into emulating a human chess player, we will need to act as the machine's subconscious.

Spivak3y ago

Like it's nuts that people aren't more amazed that there's a piece of software that can function as a chess playing engine (and a good one) that was trained entirely generically.

1 more reply

arrrg3y ago

Does that matter? I’m really very confused by the argument you are making.

That you may have to babysit this particular aspect of playing the game seems quite irrelevant to me.

1 more reply

sebzim45003y ago

When they are blindfolded? Almost all of them.

theptip3y ago

Just because they make a mistake that would be “idiots only” for humans, doesn’t mean they are at that level, because they are not human.

SamBam3y ago

Honestly, I made it make an illegal move in my very first game, in the third move. You just have to do stuff no normal player would do:

> You are a chess grandmaster playing as black and your goal is to win in as few moves as possible. I will give you the move sequence, and you will return your next move. No explanation needed.

1. b4 d5 2. b5 a6 3. b6

> bxc6

That's obviously illegal.

... to all those who are saying "well even good players can make illegal moves sometimes," that's just ridiculous. No player makes illegal moves that often.

nottathrowaway33y ago

> me: You are a chess grandmaster playing as black...

https://upload.wikimedia.org/wikipedia/en/5/5f/Ingmar_Bergma...

The KNIGHT holds out his two fists to CHATGPT, who smiles at him suddenly. CHATGPT points to one of the KNIGHT'S hands; it contains a black pawn.

KNIGHT: You drew black.

CHATGPT: Very appropriate. Don't you think so?

haburka3y ago

He does have a section about GPT 4 which does match your results. Not sure if he added it after your comment or if you accidentally missed it.

Tenoke3y ago

good_boy3y ago

It should be possible to attach 'accelerators' or 'special skills'. So that when playing with ChatGPT you will be actually facing GNU Chess.

latexr3y ago

> These people used bad prompts and came to the conclusion that ChatGPT can’t play a legal chess game. (…)

> With this prompt ChatGPT almost always plays fully legal games.

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table (…)

> (…) with GPT4 (…) in the two games I attempted, it made numerous illegal moves.

¹ 19 games is a small sample and the supposedly more advanced system failed in your tries.

vidarh3y ago

Fuller context from the article:

> Occasionally it does make an illegal move, but I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.” So whenever it wanted to make an illegal move, it resigned.

(my emphasis)

Could be better, but for a system not intentionally built to play chess, it's pretty decent.

swatcoder3y ago

No ELO 1400 player will have that rate of illegal moves, so saying it that it plays with an ELO 1400 rating is disingenuous.

JellyBeanThief3y ago

Fine, just wrap the LLM in a simple function that detects illegal moves and replaces them with "I resign" or "I cannot conceive of a winning move from here". Then you aren't "reinterpreting" anymore.

1 more reply

vidarh3y ago

That no human ELO 1400 player will have that rate of illegal moves may be true, but if anything treating the very first illegal move as forfeit appears to be stricter than most rules

arrrg3y ago

Does that matter? Seems weird to me to make that argument. I’m honestly quite confused by it.

This is a weird kind of alien intelligence that does not have to behave like humans.

TheRealPomax3y ago

ELO allows for illegal moves: as per the rules of chess, you lose the game if you make an illegal move. The end, ELO doesn't care about why you lost a game on purpose.

jart3y ago

1 more reply

charcircuit3y ago

ELO is based off who you win and lose against. The rate of illegal moves has nothing to do with ELO.

Pxtl3y ago

I'd be interested if it could be coaxed into legal moves after making an illegal one. "That is an illegal move. Can you do something legal with this board?"

saghm3y ago

vidarh3y ago

1 more reply

kmeisthax3y ago

Or append "If you wish to resign or you cannot think of a legal move, type 'resign'" to the end of the prompt.

1 more reply

jmull3y ago

The article also says in one game chatgpt when crazy so they continued the game with a fresh chat. That probably should have been counted a resignation loss too.

nextaccountic3y ago

> So whenever it wanted to make an illegal move, it resigned.

Making an illegal move counts as losing by the laws of chess, so this is essentially correct

dudeinjapan3y ago

hectorlorenzo3y ago

> My dog cannot, for instance, nor can my coffee table.

You must be giving them the wrong prompts.

ogogmad3y ago

[redacted]

1 more reply

ballenf3y ago

The illegal moves were counted as losses/resignations, not ignored.

__s3y ago

> So whenever it wanted to make an illegal move, it resigned.

Doesn't sound like ignoring the cases where it failed

Waterluvian3y ago

I’m going to float something ridiculous:

An illegal move is a valid play. You might not get caught. I think there are some Magnus games where illegal moves went overlooked and impacted the game.

You could interpret this as “ChatGPT wants to cheat sometimes.” But I personally interpret it as “ChatGPT doesn’t understand what it’s doing. It’s just a really really good simulacrum.”

hgsgm3y ago

Is this the top comment (and not even grey) because more people failed to read the article than read it?

whimsicalism3y ago

A baffling thread.

They quoted the article, so clearly they read it... but not very well?

sebzim45003y ago

It does seem that way.

whimsicalism3y ago

I'm confused. If you read the article, you know that you are wrong - but you are quoting the article?

psychphysic3y ago

That's how one uses any tool.

qwytw3y ago

The behavior of pretty much every other tool is much easier to interpret though.

kdmccormick3y ago

If the title of the article was:

> A trivial wrapper around ChatGPT has a Chess Elo of 1400

would you have any issue?

Afaict, the thesis is the article is not "ChatGPT is the ideal tool for playing AI chess", but "it is interesting how well ChatGPT can play chess with some very simple tweaks."

Out_of_Characte3y ago

ncallaway3y ago

> almost all people from ~300 elo can play without making illegal moves

Me: Moves pawn

Opponent: You can't do that, you exposed your king to check.

Me: Oops, sorry, you're right.

nsxwolf3y ago

Why do illegal moves forfeit? In online play, they're validated. You can't make illegal moves. What's the ELO score if ChatGPT is corrected, and chooses a new move?

hgsgm3y ago

All this above, and people are claiming that ChatGPT lacks human level comprehension of the text it consumes.

In Chess.com, you absolutely can attempt an illegal move, and many players do, and you will not get punished for it, so chatgpt is better then a 1400 human player.

sebzim45003y ago

ChatGPT did forfeit whenever it made an illegal move, read the article.

1 more reply

renewiltord3y ago

The article:

> So whenever it wanted to make an illegal move, it resigned.

You:

> By comparison, any player making an illegal move is forfeiting the game...

By comparison indeed.

FartyMcFarter3y ago

Unfortunately I'm rate limited at the moment so I can't try other games, but this looks like a solid method to evaluate how often ChatGPT plays legal / good moves.

vidarh3y ago

I gave it this prompt, and pasted the game. Its reply:

"45. Qxg6# This is a checkmate!"

hgsgm3y ago

OP explained that you need to prompt the whole game, not just a position.

ChatGPT is an LLM, not a game tree engine. It needs the move history to help it create context for it's attention.

WoodenChair3y ago

ptmx3y ago

mtlmtlmtlmtl3y ago

I'd be interested in seeing this game, if you saved it?

ptmx3y ago

I uploaded the PGN to lichess: https://lichess.org/rzSriO6I#97

After reviewing the chat history I actually have to issue a correction here, because there were two moves where ChatGPT played illegally:

1. ChatGPT tried to play 32. ... Nc5, despite there being a pawn on c5

2. ChatGPT tried to play 42. ... Kxe6, despite my king being on d5

It corrected itself after I questioned whether the previous move was legal.

1 more reply

nwienert3y ago

Me too, I couldn't get it to reliably go past move 15 without numerous errors. In my mind it's closer to 150 ELO than 1300, so I'd be happy to be proven wrong.

wongarsu3y ago

1: https://www.chessbase.in/news/Carlsen_Inarkiev_controversy

illiarian3y ago

But then it was one case out of thousands of games that Carlsen played. In the article ChatGPT is ding illegal moves in 3 games out of 19.

div723y ago

leereeves3y ago

Also, according to the link, the final ruling was that Carlson's move was not illegal.

EarthLaunch3y ago

An arbiter determined that Inarkiev made an illegal move and Carlsen did not. Interesting article.

johnfn3y ago

> Most likely it has seen a similar sequence of moves in its training set.

btilly3y ago

Sorry, but not every game is unique. The following game has been played millions of times.

1. e4 e5 2. Bc4 Bc5 3. Qh5? Nf6?? 4. Qxf7++

johnfn3y ago

You seem to be refuting a specific point of my argument which has little bearing on the overall point I was making.

1 more reply

mtlmtlmtlmtl3y ago

1850 ELO player and also chess AI programmer here. This is an oversimplification at best.

Many many games follow the same moves(1 move = 2 plies) for a long time, up to 30 moves in some cases, 20 moves is downright common and 10 moves is more common than not.

These series of moves are referred to as opening theory and are described at copious length in tons of books.

This is because while the raw number of possible paths to take is immense, the number of reasonable paths for 2 players of a given strength gets smaller and smaller.

If I went over the 300 or so classical tournament games I've played I would ballmark that maybe just one or two would deviate from all known theory in the first 10 moves.

So the criticism is valid in my view. The existence of copious chess literature can't simply be ignored here.

sebzim45003y ago

Yeah but now explain how it played a 61 move game.

EDIT: I checked and it left the lichess database after 9 moves. The lichess db has probably 5 orders of magnitude more chess games in it than chatGPT has in its training data.

2 more replies

johnfn3y ago

I'm very familiar with opening theory. Some of the games are 40 or 60 movies. This is not a regurgitation of book moves.

1 more reply

fsckboy3y ago

>playing a move from a different game in a new game is nonsensical

GP did say "sequence of moves", and if it matches what it has seen from the first move on, including the opponent, it will be in a valid "sequence of moves".

GauntletWizard3y ago

fxtentacle3y ago

For Bomberland, we were quite surprised how strongly we could compress and quantize the current game state and still get useful movement predictions.

qwytw3y ago

Is it though? I mean if you had data on millions of games what is the chance that you'd find one which has identical position that the one you're in (it's not like most moves are random..)

I wonder how well it could perform in Go, there are way more permutations there so finding an identical state should be more difficult.

jointpdf3y ago

You can click through the Lichess opening database (click the book icon, and then the Lichess tab) to get an idea: https://lichess.org/analysis

But the answer is insanely unlikely, past a certain number of moves. The combinatorial explosion is inescapable. Even grandmaster games are often novelties in <10 moves.

I also doubt that it has been trained on the full (massive) database of Lichess games, but that would be an interesting experiment: https://database.lichess.org/

sebzim45003y ago

>I mean if you had data on millions of games what is the chance that you'd find one which has identical position that the one you're in (it's not like most moves are random..)

Very low. On lichess when you analyse your games you can see which positions have been reached before, and you almost always diverge in the opening.

medvezhenok3y ago

You could certainly test this by making completely random moves and seeing whether it's more likely to make illegal moves in those positions.

melenaboija3y ago

If it isn not memorizing, how do you think is doing it?

LeanderK3y ago

by trying to learning the general rules that to explain the dataset and minimise its loss. That's what machine learning is about, it's not called machine memorising.

3 more replies

onos3y ago

Apparently not, at least at this level.

mrbungie3y ago

I would argue that a decent human player also has "similar sequences of moves in its training set", due to reading plus playing or watching games.

The illegal move argument is good though, and indicates no direct understanding of what it is spewing out.

circuit103y ago

I wouldn’t say it has no understanding whatsoever, it probably just doesn’t have the capability to keep track of everything that happened all the time

dwmbt3y ago

that's also not entirely true either. it's prone to make illegal moves. one could argue abiding by the rules of the game is a requisite for understanding!

imtringued3y ago

When I ask it to display moves in an ASCII board it messes up. I doubt that having a representation of the board is the key to improving its chess skills.

SanderNL3y ago

You don’t get to 1400 like that. The amount of moves it has to literally remember is stupendous.

Jensson3y ago

It doesn't remember exact moves, it remembers patterns of moves and continues those. That is why it so easily makes illegal moves, because the patterns doesn't have anything to do with board state.

lelandfe3y ago

Perhaps I don’t know what “easy” means in your comment, but the OP article says that they “almost always” played full, legal games with their prompt.

1 more reply

antibasilisk3y ago

but that's literally what most chess players do anyway?

1 more reply

SanderNL3y ago

Again, not enough for 1400. 800 maybe.

1 more reply

WoodenChair3y ago

Nobody who is 1400 plays outright illegal moves.

mrbungie3y ago

Yet it achieves 1400. Add hard rules to stop it spewing out said moves and you have a 1400 ELO Player (most UIs won't even let you make illegal moves).

It is difficult to say that is not impressive due to it being an emergent ability.

1 more reply

mtlmtlmtlmtl3y ago

SanderNL3y ago

Humans don’t, but that is an unnecessary bar IMO.

1 more reply

vanviegen3y ago

Does that still hold when the player doesn't have a board in front of them, but just a list of previous moves?

PoignardAzur3y ago

> Most likely it has seen a similar sequence of moves in its training set.

Wouldn't we expect a much higher rate of illegal moves if that was the case?

yowzadave3y ago

Doesn't ChatGPT indeed have a very high number of illegal moves?

https://www.youtube.com/watch?v=kvTs_nbc8Eg

medvezhenok3y ago

macNchz3y ago

WoodenChair3y ago

If there was an actual understanding of chess at a 1400 level we wouldn't expect any illegal moves.

scarmig3y ago

We would expect some vanishingly small fraction. Even at the grandmaster level, they happen (rarely):

https://chess24.com/en/read/news/the-7-most-illegal-chess-mo...

bloppe3y ago

This. The author is very generous with their interpretation:

> I decided to interpret that as ChatGPT flipping the table and saying “this game is impossible, I literally cannot conceive of how to win without breaking the rules of chess.”

Kind of sounds like anthropomorphization, but more likely the author just papering over the glaring shortcomings to produce a compelling blog post.

It also sounds like the illegal moves were rather frequent. The 61-legal-move game sounded like an impressive outlier.

1 more reply

oezi3y ago

I think there is very low percentage of players at elo 1400 who can provide a valid next move after seeing just the list of moves and not the current board state.

1 more reply

famouswaffles3y ago

2 more replies

xapata3y ago

Chess has a syntax.

lb4r3y ago

You can still have syntactically correct moves that are illegal.

1 more reply

xigency3y ago

> If it had any understanding of chess, it would never play an illegal move.

Plenty of people who have a basic understanding of chess would make an illegal move if they had no board to see and had to play only with notation.

whimsicalism3y ago

For some reason, seeing wrong comment after wrong comment like the above is just exhausting.

Why are people struggling so hard to understand that it's not just regurgitating its training set? Is it motivated reasoning?

Apologies if your comment was meant as parody of this view, it's hard for me to tell at this point.

liampulles3y ago

With respect to the points here on how illegal moves invalidate ChatGPT as a valid Chess AI, consider the following:

AI.v1: Feed ChatCPT the move history and ask for the next move.

AI.v2: Wrap AI.v1 with additional logic to try and translate any invalid response as a similar/close valid move, else treat it as a resignation.

I would say AI.v2 is a "valid" Chess AI since it obeys the rules of chess. Also I think it is reasonable given how minimal this "wrapping" is to attribute the underlying ability to ChatGPT.

Closi3y ago

It’s amazing how quick some people on HN are to dismiss how impressive it is for a generic language model to be able to play chess at any level - or indeed other feats of ChatGPT.

It’s utterly staggering that it can work as well as it can on problems like this at such an early stage in this technologies lifecycle.

lostmsu3y ago

With chatGPT you can also tell it the move is illegal and let it try again.

marcodiego3y ago

It may then reach superhuman level in any task simpler than chess which is enough to destroy many human jobs.

EDIT: From the article: "With this prompt ChatGPT almost always plays fully legal games." Relax: we're still far from that.

gfd3y ago

sapiogram3y ago

That's USCF ratings, chess.com ratings are massively inflated in comparison.

nottathrowaway33y ago

The median chess player is usually described as mediocre (if you ask chess players). They suck as badly as the median clarinet player in your high school band/orchestra.

jabloczko3y ago

There's a difference between chess.com and USCF ratings.

LeanderK3y ago

sebzim45003y ago

I'm sure the MuZero chess policy network would reach much higher than 1400, and that has no notion of recursion either. (And also wasn't taught the rules explicitly)

LeanderK3y ago

jonnycat3y ago

1... e5

Me: 1. e4 e5 2. Ngxd8+

2... Ke7

This is highly repeatable - I can make illegal non-sensical moves and not once does it tell me the move is illegal. It simply provides a (plausible looking?) continuation.

paxys3y ago

It is a language model. It cannot play chess. It can't even play tic-tac-toe, which was trivially solved by machines decades ago. Here's the result of a game I just tried:

X (me) – 0,0

O (chatGPT) – 1,1

X – 0,1

ChatGPT instead put my move at 0,2 for some reason

O – 2,2

Regardless of whether my move was at 0,1 or 0,2, I easily won. And after I won it just ignored the outcome and continued playing. Plus this was after telling it to play a perfect game.

the_af3y ago

This is GPT4, right? Because ChatGPT (GPT-3) still fails to provide a legal game of Tic Tac Toe with this prompt:

> "Let's play Tic Tac Toe. You are O, I'm X. Display the board in a frame, with references for the axes"

It failed to recognize that I won.

Then continued playing (past the end), played illegally over a move I had already done, obtained a line of 3 for itself, and still doesn't acknowledge the game has ended.

yieldcrv3y ago

No, it is GPT 3.5

and the real answer to every debate about capabilities comes down to "you're prompting it wrong".

dwaltrip3y ago

Can you describe how to prompt gpt to play a proper game of tic tac toe? I was disappointed gpt-4 didn’t really improve much on this front, at least with how I was prompting it.

Overall I’m very blown away by the generative AI developments. I’m also very curious about which types of prompts are most effective for each task, etc.

travisjungroth3y ago

If someone says that and shows a working prompt, that’s what the debate should come down to. It’s over. The person with the working prompt wins.

sumtechguy3y ago

For me when I tried it, it forgot the board half way thru and decided there were no more moves possible (4 left). Then forgot one of my moves.

It is an NLM it does not have the ability to reason. In my case the NLM 'next iteration' was a half a game with a missing move and 'nothing left to do'.

agolio3y ago

Chessnetwork did a video about this a few weeks ago FYI - https://www.youtube.com/watch?v=svlIYFpsWs0

Seems its strength is mostly in memorisation of the book openings. Still impressive nonetheless.

jonathanstrange3y ago

Not to nitpick but I don't think it can be said to be able to play chess when it also makes illegal moves.

chessexpeort13y ago

Humans make illegal moves too. So much that there's a rule for it--

"Penalty for Illegal Moves The penalty for the first illegal move is to add two minutes to the opponent's time. A second illegal move results in loss of the game.".

https://new.uschess.org/sites/default/files/media/documents/...

oehtXRwMkIs3y ago

Not even total beginners will try to move pieces that don't exist though.

imtringued3y ago

How many people change the colour of the enemy queen and place new pieces on the board?

martincmartin3y ago

When it makes an illegal move, and is told "that's an illegal move, choose a different move," if it can then come up with a different move for the same board, I'd say it plays chess.

wahnfrieden3y ago

qwytw3y ago

I did. It write 15-20 lines and then finished the functions with this:

" # Add more cases for other piece types such as rooks, knights, bishops, queens, and kings. # ... "

So not great..

But even if it wasn't so lazy it would just find a script (or multiple scripts) which say that they do that. Probably combine them or add some minor stuff and output the result.

It can't really verify that data it used to come up with this result is accurate or reason about it in any way? Can it?

So if you're incapable of verifying that yourself it would be pretty useless?

Reptur3y ago

meh88813y ago

Why would you think telling it to play a perfect game, predicting all next possible moves, would help?

That definitely made it worse.

Reptur3y ago

I'd love for you to explain why.

Tepix3y ago

I tried to play Othello with ChatGPT in the past (after ChatGPT confidently claimed it could play Othello) but it didn't work (ChatGPT refused to play).

I tried it with this prompt and it works a little bit but produces repeated illegal moves as early as move 4.

Bummer. Perhaps GPT-4 will be more capable.

Animats3y ago

This is a system with no look ahead. Any look ahead took place in the past, during training. It's amazing that this works.

I notice the prompt says "No explanation needed". What happens if you ask for one?

Andrew_nenakhov3y ago

I tried playing with it and it barely makes any correct move: forgets whos turn to move, doesn't observe the rules, incorrectly displays move results.

andrepd3y ago

Yeah, I call bullshit, i.e. selective deletion of terrible games. Ignore those when it plays catastrophically and it might look alright.

juice_bus3y ago

When I tried it at v3.0 i found after 5-10 moves it started moving illegally.

mjr003y ago

The AI has simply, and correctly, identified that cheating is the best way to win at something.

orbital-decay3y ago

It doesn't even know the rules, let alone cheat. It predicts the notation from the massive amount of games seen during training.

rzzzt3y ago

1 more reply

rzzzt3y ago

Tom 7's NES play function paused the game when it encountered an insurmountable problem: https://youtu.be/xOCurBYI_gY?t=950

skywal_l3y ago

The best move is not to play.

PoignardAzur3y ago

Did you use the same prompt method as OP?

MC_103y ago

Not sure how useful this is but I simply asked ChatGPT what it thinks its skill level is. This is using GPT-4.

After some previous questions...

Me: "Would you be able to estimate your ELO?"

Me: "As a generative AI, are you actually able to reason which move to make next in chess? Or are you somehow just predicting the next likely move based on training data."

oezi3y ago

Elo 1400 seems to equate to a top 30% player.

Source: https://chessgrandmonkey.com/chess-rating-percentile-calcula...

dwmbt3y ago

slight nit: this is true amongst all chess players. amongst professionals and amateurs, 1400 is technically Class C - which is on the lower end of amateurs.

nonetheless, it's cool to see a model 'generally intelligent' enough to beat most players. what's more interesting to me is that according to OP's analysis, GPT4 is worse at Chess than GPT3.5...

source: https://en.wikipedia.org/wiki/Chess_rating_system

qwytw3y ago

It would depend how often did this happen:

> Occasionally it does make an illegal move

Since it would lose every game in which it made more than a single illegal move.

sebzim45003y ago

Well it can't have happened more than 6 times since illegal moves counted as a forfeit and it only lost 6 games.

LudwigNagasena3y ago

Intuitively, I would guess 1400 is the rating of an average kid who spent a year studying chess seriously.

xapata3y ago

Not the ELO of the average kid who spends a year studying chess seriously? It's very difficult to define the average kid without context.

It's been a while since I read a global demographics update, but not all that long ago the mode child would have been living on $2 a day.

LudwigNagasena3y ago

Sure, it’s tricky, but I feel it’s more coherent than “all chess players”.

silveira3y ago

Ask him to play at a higher Elo.

sebzim45003y ago

It's already been told to play like a grandmaster, how much higher do you want to go?

nonethewiser3y ago

dwmbt3y ago

technically, you could supply it a specific player, a model like 'Stockfish' or 'super-grandmaster'[0]

[0] https://saintlouischessclub.org/blog/how-face-super-grandmas...

meghan_rain3y ago

I too noticed (for a usecase totally unrelated to chess, namely code generation) that ChatGPT3.5 gave better answers than GPT4. The 3.5 answer was exactly what I wanted, GPT4 was wrong.

Does thay mean we have plateaued?

alfalfasprout3y ago

imtringued3y ago

I think they invested the parameters into supporting multimodal inputs (images).

chpatrick3y ago

Good thing it's "incapable of reasoning"!

Longlius3y ago

It is incapable of reasoning, actually - at least in this case. It has no internal understanding of chess which is why it makes illegal moves.

RyanCavanaugh3y ago

What experiment would you run to determine if a given text input / text output interface had an "internal understanding of chess"?

gwright3y ago

What if you prompted with something like:

    Let's play a game chess.  Use the standard rules except that ....

jltsiren3y ago

I think internal understanding requires internal processing.

1 more reply

chpatrick3y ago

How did it win 11 out of 19 games then, blind luck?

root_axis3y ago

raw statistical power.

1 more reply

sebzim45003y ago

Mostly it didn't make illegal moves though, since illegal moves mean resignation and it won more than it lost.

Making 60 legal moves in a row in one game would be the coincidence of the century unless it had some knowledge of the rules of chess.

henryfjordan3y ago

2 more replies

baq3y ago

How do you know that? It has billions of parameters, some of them may well be for internal understanding of chess?

jupp0r3y ago

Is a normal chess program capable of reasoning?

wizzwizz43y ago

I would say so. Logical reasoning is not a hard thing to program a computer to do: the really hard part is to reason about the real world, rather than just a manually-defined world model.

Symmetry3y ago

chpatrick3y ago

In the narrow field of chess, yes.

1 more reply

fancyfredbot3y ago

andrepd3y ago

Lmao. ChatGPT can play chess, as long as you ignore the times when it can't x) Brilliant.

keskival3y ago

I gave it more time to think about its move by adding empty tokens to the end, and it played a lot better: https://www.linkedin.com/posts/terokeskivalkama_chatgpt-acti...

mtlmtlmtlmtl3y ago

There's a huge difference between 1400 elo in FIDE games versus 1400 on chess.com, which is not even using elo.

For instance the strongest blitz players in the world are hundreds of points higher rated on chess.com blitz versus their FIDE blitz rating. Chess.com and lichess have a ton of rating inflation.

CSMastermind3y ago

> the strongest blitz players in the world are hundreds of points higher rated on chess.com blitz versus their FIDE blitz rating

mtlmtlmtlmtl3y ago

I'm still going through the games but so far these games are not even close to elo 1400 level. For both the human player and the model.

manquer3y ago

Probably the bigger difference is ability to premove online

mtlmtlmtlmtl3y ago

I was thinking about this.

On chess.com you can chain premoves, on lichess you can't(afaik).

ht853y ago

Hikaru recently put out a video of a game vs the free ChatGPT.

Reading the explanations it gives with every move is really insightful. Basically polished, elaborate bullshit that might look reasonable to an ignorant audience.

Unless you are a conservative pundit, AI will not take your job anytime soon.

teej3y ago

I just deployed a GPT-4 powered chess bot to lichess. You can challenge it here: https://lichess.org/@/oopsallbots-gpt-4

singularity20013y ago

What happens if you add the rules to the initial prompt and something like: "Think step by step, explain your move and if it is illegal rethink"

dsjoerg3y ago

Very cool! Are you doing prompt engineering, fine-tuning, both, something else?

teej3y ago

I wanted it to be fun and actually complete games. I started with this, with a few minor tweaks: https://github.com/Tmate6/Lichess_ChatGPT_ChatBot/blob/main/...

This approach sends along the list of legal moves in the prompt if it attempts an illegal move. That seems to work well at getting playable moves.

sabujp3y ago

yea tried this several months ago, it's pretty bad and will make illegal moves

sebzim45003y ago

The claim in the article is that this prompt in particular has a much lower incidence of illegal moves than others.

nickpeterson3y ago

What are the possibilities for augmenting or synthesizing Lon’s with old school expert systems or other tools. Could it recognize when a problem fits some other tool and help map the inputs?

andrepd3y ago

ChatGPT playing chess: https://www.youtube.com/watch?v=GneReITaRvs

ricardo813y ago

Better than me then.

But does it give credit to who taught it. These models are basically a scrape of the best of humankind and a claim that it's their own.

spywaregorilla3y ago

Do you give credit to people you've played in the past when you play a game of chess?

ricardo813y ago

It would be difficult to remember or keep record of them but sure, if I'm learning from someone- I'll remember that.

spywaregorilla3y ago

That doesn't seem to be any more noteworthy than saying OpenAI knows what's in the corpus.

amelius3y ago

I'd be more interested to know how well it performs on a new game, i.e. one for which it didn't have terabytes of training data.

josh26003y ago

I want to know what the author's ELO is (forgive me if it was disclosed and I am unobservant).

Maybe ChatGPT trainers will eventually have ELO's.

aroo3y ago

I wonder how it would perform if you feed it a comprehensive chess rulebook, and to avoid all the illegal moves possible

Upvoter333y ago

chatgpt does not really know how to play chess, or understand chess. it understands how to sound like it plays chess.

comment_ran3y ago

paxys3y ago

armchairhacker3y ago

comment_ran3y ago

paxys3y ago

porphyra3y ago

What if you come up with a prompt to tell ChatGPT to figure out how to use APIs and external tools to help itself get better at chess?

1 more reply

stagger873y ago

Honest questions,

Isn't it very likely chess games were part of the training data? If so, chess sequences seem like an amazing matched filter of sorts. It doesn't seem surprising that it can guess a next move.

Is it teaching itself anything here? Is the model being updated in any way? Or are you talking more generally, looking into the future sort of statement?

paxys3y ago

layer83y ago

I think they meant “teach itself” in the course of a conversation, within the token limit. ChatGPT will forget what it has “learned” once the conversation is over.

famouswaffles3y ago

GPT-4 plys chess just fine. So you certainly cn

RC_ITR3y ago

This is actually a huge debate right now.

OpenAI is on the side of 'LLMs have only surprised us to the upside, so using crutches is counterproductive'

Whereas other people think 'Teaching an LLM to do arbitrary math problems through brute force is probably one of the most wasteful things imaginable when calculators exist.'

I'm actually very excited to see which side wins (I'm on team calculator, but want to be on team LLM)

TOMDM3y ago

I think I'm on both teams.

As GPT models get better at arbitrary computation, it stands to reason that their ability to use tools will rise as well.

Smarter models and smarter ways to get them using tools feel like complementary capabilities.

dwaltrip3y ago

How about a more human-like approach: the LLM designs a calculator and then makes use of that!

nroets3y ago

And Mathematica when given a math problem. And web search search for news.

And a good hacker can add these features to Alpaca and run it on his desktop...

Import all his WhatsApp chats and private emails into the corpus.

Add speech recognition and TTS.

oli56793y ago

I tried playing just now, with exactly the same prompt as the article.

It played an illegal move on move 4.

1. d4 d5 2. c4 e6 3. Nf3 nf6 4. g3 Bg4

xkcd19633y ago

It's like saying the agglomeration of media handling chess has an ELO of 1400

Nifty39293y ago

My conversation with ChatGPT on this:

What is the best way to prompt you to provide the best chess move?

In what format should I provide the current position?

To provide the current position, you can use algebraic notation. Algebraic notation is a standard notation used to describe the moves and positions on a chessboard.

rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR

rnbqkbnr pppppppp ........ ........ ........ ........ PPPPPPPP RNBQKBNR w KQkq -

In this case, "w" indicates that it is white's turn to move, "KQkq" indicates that both sides have full castling rights, and "-" indicates that there is no en passant square.

What about just providing a sequence of moves from the starting position? Would this be better or worse than your suggested approach?

Providing a sequence of moves from the starting position can also be a valid way to specify a position in chess. However, it may not always be the best approach, depending on the context.

In summary, both algebraic notation and a sequence of moves can be used to specify a position in chess, and the choice between them depends on the context and the specific needs of the situation.

jackphilson3y ago

Calling it: chess games in the future will be a battle between two prompts.

qwytw3y ago

Well we already had much better chess bots years ago to be fair.. People still play chess for some reason (not sure why)

sebzim45003y ago

Are you confused at the idea that someone might do something for fun?

jacobsenscott3y ago

j / k navigate · click thread line to collapse