Can GPT-5 Beat My Favorite Daily Puzzle Game?

(nicksypteras.com)

10 points · nsypteras · 6mo ago · 4 comments


Very cool! GPT-5's massive outperformance does suggest there is something different in its training data. Considering OpenAI's previous work on games, it wouldn't be surprising if they generated some synthetic game data.

nsypteras (OP) · 6mo ago

Ya, interesting thought. It would be fascinating if generating games with solutions is part of the training data pipeline. There's been previous work on testing LLMs on logic puzzles [1][2][3], so they could be building on those ideas to improve performance.

[1] https://huggingface.co/papers/2504.00043
[2] https://huggingface.co/blog/yuchenlin/zebra-logic
[3] https://arxiv.org/pdf/2403.12094

srekhi · 6mo ago

interesting, and thx for making it reproducible

sonnynomnom · 6mo ago

grok 4's results lolol
