gengelbro 1y ago

Possible it's in the training set then?

unbrice 1y ago

The authors note that this is probably the case:

> we wanted to verify whether the model is actually capable of reasoning by building a simulation for a much simpler game - Connect 4 (see 'llmc4.py').
>
> When asked to play Connect 4, all LLMs fail to do so, even at most basic level. This should not be the case, as the rules of the game are simpler and widely available.

bongodongobob 1y ago

Wouldn't there have to be historical matches to train on? There are tons of chess games out there, but I doubt there are any Connect 4 games. Is there even an official notation for that?

My assumption is that ChatGPT can play chess because it has studied the games rather than just reading the rules.

mewpmewp2 1y ago

Good point, would be interesting to have one public dataset and one hidden as well, just to see how scores compare, to understand if any of it might actually have got to a dataset somewhere.
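A rough sketch of that comparison, assuming each set yields a list of per-item results (1 = solved, 0 = failed). The function name and the numbers are made up for illustration; nothing here comes from the authors' repo.

```python
from statistics import mean


def contamination_gap(public_scores, hidden_scores):
    """Return mean(public) - mean(hidden).

    A clearly positive gap would hint that the public items leaked into
    training data; a gap near zero would suggest they did not. This is a
    heuristic, not a proof either way.
    """
    if not public_scores or not hidden_scores:
        raise ValueError("both score lists must be non-empty")
    return mean(public_scores) - mean(hidden_scores)


# Hypothetical per-item results (1 = solved, 0 = failed), illustration only.
public = [1, 1, 0, 1]
hidden = [1, 0, 0, 0]

gap = contamination_gap(public, hidden)
print(f"public minus hidden accuracy: {gap:.2f}")
```

With sets this small the gap is mostly noise, so in practice you'd want enough items (and ideally a significance test) before reading anything into it.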

freediver 1y ago

I'd be quite surprised if OpenAI took such a niche and small dataset into consideration. Then again...

mewpmewp2 1y ago

I would assume it goes over all the public GitHub codebases, but no clue if there's some sort of filtering for file types, sizes, or number of stars on a repo, etc.
