I’m collecting data to benchmark different models as both players and judges (OpenAI / Anthropic / Gemini / Mistral / DeepSeek), but I only have ~45 games so far and need many more before publishing comparisons. (With 5 AI players and 4 judges assigned at random, that's 5 × 4 = 20 distinct game setups to evaluate.)
It's fully free (I pay for all the tokens), and no signup is required for the first game: https://turingduel.com
Questions and criticism welcome! I'll share aggregated results once there's enough signal.