So, it sounds like these two are in direct conflict. My solver is deterministic, and it looks like the adversarial is too, so they always play the same game:
S E R A I (0 green, 0 yellow, 429 words remain)
M O L D Y (1 green, 0 yellow, 35 words remain)
C E N T U (0 greem, 2 yellow, 2 words remain)
V O U C H (4 green, 0 yellow, 1 word remains)
P O U C H (5 green, 0 yellow, 0 words remain)Could you say more how you've determined your solver to be deterministic in 6 steps?
Also, I'm not sure I understand what this means:
> It selects a word to minimize the number of words assuming the worst-case green/yellow.
Particularly the part "assuming the worst-case green/yellow". Do you mind elaborating?
So, to generate a single guess, the algorithm will basically try every single word in the dictionary against every possible remaining word. For any given guess, it tracks the count of various possible scorings. The scorings are the positions of greens, and yellows for a total of 243 possible scorings. After looping through all possible remaining words, it finds the max occurrence count within the scoring array. That maximum count becomes that guess's overall utility. It then finds the overall guess that minimizes that utility. (See: https://en.wikipedia.org/wiki/Minimax) After typing this all out, I feel like I didn't do a great job explaining, so let me know what you think.
Yeah, I was trying to determine the utility function that would determine how to select words, but I'm more of a gut-feel, intuitive sort of person. I also tried a squared term like yours, but for some reason, it didn't feel right when I tested it. That version decided the best initial word was `LARES`. I have a sneaky suspicion that we need to account for how common a word was. I think my solver was getting to hung up worrying about BRAXY and CRURA, and giving them similar weight to a word like TRACK.
However, it was very hard to debug because a slightly buggy version was still decent at playing the game! In fact, I'm fairly certain I still have some bugs. I need to comment my code and get it up on Github. It's also super brute force O(n^2)
NARES
DOILY
TOUCH
And then VOUCH / COUCH / POUCH / GOUCH / MOUCH.I found my code a bit difficult to debug at the end because it's still halfway decent at playing wordle. So, I fully admit I might not be doing it right!
But one thing I also did was allow it to pick a word it knew wasn't possible but could eliminate a lot of possibilities quickly.
ARISE
MOLDY
COUNT
VOUCH
POUCH
"Full writeup coming another time but mainly it doesn't pick any single word, it maintains a list of possibilities and reduces the list as little as possible with each guess."
Was quite surprised to see two other takes on this idea when I was getting ready to post this! IIRC there are a few different undergrad CS programs that have you implement something like this for hangman.
...and a bit more explanation in the Twitter thread: https://mobile.twitter.com/zwegner/status/148011092775217561...
https://en.wikipedia.org/wiki/Jotto
Apparently it was invented in 1955.
If wonder if anyone still has any kind of patent or other IP that would be relevant...
S T E R N (0 green, 0 yellow)
P L A I D (0 green, 0 yellow)
M U C K Y (1 green, 0 yellow)
G O R G E (1 green, 0 yellow)
W O O F Y (3 green, 0 yellow)
B O O Z Y (4 green, 0 yellow)
B O O B Y (5 green, 0 yellow)
You guessed successfully in 7 guesses!
S T E R N (0 green, 0 yellow)
P L A I D (0 green, 0 yellow)
B O U G H (2 green, 0 yellow)
W O O Z Y (3 green, 0 yellow)
B O O K Y (4 green, 0 yellow)
B O O B Y (5 green, 0 yellow)
You guessed successfully in 6 guesses! T R I A D (0 green, 0 yellow)
H O N E S (0 green, 0 yellow)
C L U M P (0 green, 2 yellow)
B U L K Y (3 green, 0 yellow)
F U L L Y (4 green, 0 yellow)
G U L L Y (5 green, 0 yellow)
Looking at the similarities, I'm wondering if the adversarial nature will generally lead to higher-than-naively-expected frequencies of:Words with 'y', as people will tend to 'eliminate' the common vowels early, leaving 'y' as a 'substitute vowel' to fill in.
Words with repeated and doubled letters, as a round progresses and the pool ov available letters shrinks.
I wonder what other patterns might emerge from the ruleset?
F I L L S (0 green, 0 yellow)
W H E R E (0 green, 0 yellow)
D O U B T (0 green, 0 yellow)
M A N N A (3 green, 0 yellow)
C A N N Y (4 green, 0 yellow)
N A N N Y (5 green, 0 yellow)
A L L O Y (0 green, 0 yellow)
F E T C H (0 green, 1 yellow)
I N D E X (0 green, 2 yellow)
P R I Z E (3 green, 0 yellow)
B R I B E (3 green, 0 yellow)
G R I M E (5 green, 0 yellow)It looks like optimal play is 4 moves: https://twitter.com/zwegner/status/1480037275803308034
I'm really curious what the algorithm is find the 4 move play. Mine only optimizes for a single look-ahead.
I think five steps is as good as it gets, as 5*5 is about the size of the alphabet.
Edit: I stand corrected. YEARN (216) -> FLOUT (12) -> CHAMP (1) -> HUMPH.
It usually solves this in under 6 guesses. The guessing could be improved; at the moment it's random, but it could select for words with non-repeating letters to narrow down the search space faster.
An excellent chance to use my Wordle solver based on optimizing information gain (ie guessing to minimize the number of words remaining in a worst case scenario)
Since this is essentially Wordle golf, my score was 5.
RAVED SPLIT BUNCO BOOBY BOOZY
His dictionary is missing some of the SGB wordlist ...
The simple greedy algorithm choses NARES first, then DOILY, then TOUCH.
And then it is left with VOUCH, COUCH, POUCH, GOUCH, MOUCH. This solver algorithm is far from optimal. It is like a 1-ply chess solver.
https://everything2.com/node/superdoc/Everything+User+Search...
Then I learnt that the inventor of Wordle is named Josh Wardle, which could be a portmanteau of war and wordle.
Soft-solved in that he doesn't necessarily have a proof this is the optimal solution, nor an enumeration of all optimal solutions, nor the optimal play from any game state - so there's still plenty to be explored here
I wonder if the author can improve the adversarial methodology to prevent a solution in 5. Eg, being left with BLACK/FLACK/SLACK as options is worse than having TRACE/BRACE/TRACK/CRACK, because the former requires 3 guesses to solve and the latter can be done in two. (Maybe this example is impossible because of TRACT, but I'm certain this effect exists).
Same idea, different implementation.
Winner, winner, chicken dinner.
C R A N K (0 green, 0 yellow)
M Y T H S (0 green, 0 yellow)
P L U M B (0 green, 1 yellow)
L Y R E S (0 green, 2 yellow)
O I L E D (2 green, 2 yellow)
F I E L D (4 green, 0 yellow)
W I E L D (5 green, 0 yellow)
No automation (apart from the first two words that I use in real Wordle)