Case in point: If you get 3 green 2 yellow in the first word, you can solve on the next guess.
Of course, you can constrain your strategies to "I always use the same two/three starting words", and in many cases that will be fine. But it's quite obviously not optimal.
Also, the optimal strategy must depend on your goal metric. Do you go for "least average guesses", "least maximum guesses", or "least average guesses while never losing"? There's lots of unstated assumptions in all the analyses thrown around...
I took one step further and calculated the optimal starting word for obtaining green matches (I assume that this also makes it likely to produce yellow matches, although I did not explicitly optimize for that).
Beginning with the full list of 5-letter words, I calculated the frequency of each letter of the alphabet in each of the 5 possible positions for a 5 letter word.
Then I iterated through the list a second time, this time assigning a score for each word equal to the sum of frequencies for each letter in its respective position.
By a significant margin, the highest score is SLATE (over 1400). Runners up (over 1300) are SAUTE, SHIRE, and CRATE.
Caveat: this approach assumes that all possible words are equally likely to be the answer.
[0] https://gist.github.com/popey456963/a654e98d0180566b897b70ee...
There’s a great many 5-letter words that the creator will never consider because they’re too obscure. Treating all 5-letter words as possible is a mistake when calculating strategy for this.
High-frequency letters will be over-represented in your analysis.
That's a very specific edge case, though.
I agree with you that the second word should vary depending on the first result, but picking the best first and second words requires lots of analysis. The first word should be picked to open up the best options for the second word. And while I believe the second word should not contain letters from the first, unless you got extremely lucky, the distribution of letters among the remaining available words probably changes which letters you want to cover.
Failing to do the required analysis, my current strategy is to pick two words that cover the 10 most common letters in english, ETAOINSHRD. Sometimes it's "ethos" and "nadir", sometimes "thine" and "roads", etc. So far, it's worked well.
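Checking whether a candidate pair covers those ten letters is a one-liner (a sketch; the example pairs are the ones from the comment above):

```python
def covers_top10(w1, w2):
    """Do two words jointly cover the 10 most common English letters?"""
    return set("etaoinshrd") <= set(w1) | set(w2)

covers_top10("ethos", "nadir")  # True: together they cover all of ETAOINSHRD
covers_top10("thine", "roads")  # True as well
```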
Definitely. The likelihood of a letter appearing in a given place changes depending on the letters around it. A Q will almost always be followed with a U, for example.
I wrote a script yesterday which spits out the relative probabilities of possible letters in each unknown position, given the current known/excluded letters -- it was interesting to see the effect in action.
An optimal guess is found by looking at the list of possible solutions and, for each of those possible solutions, checking how much each possible guess would narrow down the list of possible solutions.
Once you have, for every possible guess, how much it narrows the set of possible solutions under each possible answer, sort the guesses by their worst-case number of remaining solutions.
The optimal first guesses from my solver are ARISE, RAISE, AESIR, REAIS, or SERAI because each will narrow down the possible word list to at minimum 168 remaining words.
Each guess after that uses the same algorithm with the list of possible guesses filtered with the information you learned in previous guesses.
edit: formatting
1. https://wordlesolver.com source: https://github.com/christiangenco/wordlesolver
2. https://math.stackexchange.com/questions/1192961/knuths-mast...
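That scoring step can be sketched as follows: group the possible solutions by the feedback pattern a guess would produce, then rank guesses by their largest group (smaller is better). The feedback function here is an assumed duplicate-aware implementation, not the solver's own code:

```python
from collections import Counter

def feedback(solution, guess):
    """Wordle-style feedback that handles repeated letters:
    greens consume letters first, then yellows left to right."""
    res = ["grey"] * len(guess)
    counts = Counter(solution)
    for i, (s, g) in enumerate(zip(solution, guess)):
        if s == g:
            res[i] = "green"
            counts[g] -= 1
    for i, g in enumerate(guess):
        if res[i] != "green" and counts[g] > 0:
            res[i] = "yellow"
            counts[g] -= 1
    return tuple(res)

def worst_case_remaining(guess, solutions):
    # Each distinct feedback pattern is a bucket of still-possible
    # solutions; the worst case for this guess is the largest bucket.
    buckets = Counter(feedback(s, guess) for s in solutions)
    return max(buckets.values())

feedback("FAVOR", "ROARS")  # ('yellow', 'yellow', 'yellow', 'grey', 'grey')
```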
After reading your code, I see an error in it. In fact, only RAISE is optimal; the others are worse, and leave bigger lists (according to my code):
RAISE: 168
…
REAIS: 203
…
AESIR: 220
ARISE: 220
…
SERAI: 241
The error in your code is that your "evaluateGuess" function (right at the top, in the first 10 or so lines of https://github.com/christiangenco/wordlesolver/blob/9c3bd94a...):

    function evaluateGuess({ solution, guess }) {
      return [...guess].map((letter, index) => {
        return {
          letter,
          included: solution.includes(letter),
          position: letter === solution[index],
        };
      });
    }
is too simplistic, and not actually what the real game does. In the game, for each letter position, there are three possible responses:
• Correct (Green), what you call "position"
• Present (Yellow), what you call "included"
• Absent (Grey)
Here are three test cases, that you could try out on the real Wordle in recent days:
• When the solution is "FAVOR" and our guess is "ERROR", Wordle's response is [Grey, Grey, Grey, Green, Green] — note that for the first two Rs in "ERROR", the correct response is Grey (Absent), because the last R has already "used up" the "Green" response.
• When the solution is "FAVOR" and our guess is "ROARS", Wordle's response is [Yellow, Yellow, Yellow, Grey, Grey] — note that only the first R in "ROARS" gets a Yellow response and the second one gets Grey, because there's only one R in the solution.
• (As pointed out by @pedrosorio in a sibling comment) When the solution is "ABBEY" and our guess is "APNEA", Wordle's response is [Green, Grey, Grey, Green, Grey], but your solver thinks that the second "A" would get a "Yellow" response too.
You mentioned Donald Knuth's Mastermind paper; in fact in the paper (http://www.cs.uni.edu/~wallingf/teaching/cs3530/resources/kn...) Knuth points this out on the very first page:
> Rule 2 is somewhat difficult to state precisely and unambiguously, and the manufacturers have in fact not succeeded in doing so on the directions they furnish with the game […]
and gives an exact rule that you may want to study carefully.
In my code, the `response` function I use (it's not the most efficient, but we can just memoize it) is:
    def response(h, g):
        '''
        - The hidden word is h.
        - The guess is g.
        For each position in the word g, some color:
        - 'green' if in the same position
        - 'yellow' if present (after subtracting 'green's)
        - 'grey' if absent (after subtracting "green"s and "yellow"s)
        '''
        assert len(h) == len(g)
        L = len(h)
        green = [i for i in range(L) if h[i] == g[i]]
        yellow = []    # positions in g that get a yellow response
        yellow_h = []  # positions in h already "used up" by a yellow
        for i in range(L):
            # We want to check whether g[i] is "present" in h
            if i in green: continue
            for j in range(L):
                if j in green: continue
                if j in yellow_h: continue
                if h[j] == g[i]:
                    yellow.append(i)
                    yellow_h.append(j)
                    break
        return (green, yellow)
Note the three "continue" statements — they are crucial, to match the behaviour of the real Wordle (or Master Mind) on the three test cases I mentioned above.

    def response(h, g):
        assert len(h) == len(g)
        L = len(h)
        correct = [i for i in range(L) if h[i] == g[i]]
        present_h = []
        present_g = []
        for i in range(L):
            # We want to check whether g[i] is "present" in h
            if i in correct: continue
            for j in range(L):
                if j in correct: continue
                if j in present_h: continue
                if h[j] == g[i]:
                    present_g.append(i)
                    present_h.append(j)
                    break
        return (correct, present_g)
and now I too get (AESIR, ARISE, RAISE, REAIS, SERAI) all leaving 168 words. (But the test cases in the above comment still hold — make sure your code works for them.)

For today's word (abbey):
- input "arise" (match "a", wrong position "e")
- solver says there are 10 possible words, one of them is "apnea"
- input "apnea" (match "a", match "e") EDIT: fixed
- solver says there are 0 possible words (it assumes last "a" should appear yellow, but the wordle page has the last "a" as grey because there is only one "a" in the word)
For instance, "waxed" is not suggested even though it is a legitimate word
5.291 soare
5.294 roate
5.299 raise
5.311 raile
5.311 reast
5.321 slate
5.342 crate
5.342 salet
5.345 irate
5.346 trace
5.356 arise
5.360 orate
5.370 stare
5.382 carte
5.390 raine
5.400 caret
5.402 ariel
5.406 taler
5.406 carle
5.407 slane
Shown are the twenty best initial guesses using Claude Shannon's definition of information entropy. Each number is the expected number of yes/no questions needed to resolve the remaining uncertainty. Shannon is the "father of information theory", and this is the right measure.

One might recognize SOARE as identified elsewhere by a different measure. I am relieved that I give up very little by moving down to the first word I recognize on this list.
This takes about 20 minutes to code in Ruby, and four minutes to run. There's no point in using a more efficient language, or in wasting part of an hour, as I did, searching for a clever way to score guess words.
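The arithmetic behind those numbers can be sketched from the bucket sizes alone (the "buckets" being how a guess's feedback patterns partition the candidates; the 64/32/16/16 split below is an invented example):

```python
import math

def expected_remaining_bits(bucket_sizes):
    """Expected number of yes/no questions still needed after a guess
    whose feedback splits the candidates into buckets of these sizes."""
    n = sum(bucket_sizes)
    return sum((b / n) * math.log2(b) for b in bucket_sizes)

def information_gain(bucket_sizes):
    """Expected bits learned from the guess (Shannon entropy of the split)."""
    n = sum(bucket_sizes)
    return -sum((b / n) * math.log2(b / n) for b in bucket_sizes)

# A guess splitting 128 candidates into buckets of 64/32/16/16:
expected_remaining_bits([64, 32, 16, 16])  # 5.25 bits left on average
information_gain([64, 32, 16, 16])         # 1.75 bits gained (7 - 5.25)
```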
In the 1960's my dad used entropy to program Jotto on Kodak's computers. It wasn't feasible then to evaluate every possible clue word, but he determined that one did well enough with a random subset.
I have a counterexample for a simplified version of the game with the following rule changes:
1. The player is only told which letters in the guess are correct (i.e. they are not told about letters that are present but in a different location).
2. If the player knows there is only one possible solution, the player wins immediately (without having to explicitly guess that word).
3. The set of words that the player is allowed to guess may be disjoint from the set of possible solutions.
Here is the list of possible solutions:
aaaa
aaab
aaba
babb
abaa
bbab
bbba
bbbb
(There are 8 words. The 2nd, 3rd and 4th letters are the binary patterns of length 3, and the 1st letter is a carefully chosen "red herring".)

Here is the dictionary of words the player is allowed to guess:
axxx
xaxx
xxax
xxxa
(Each guess effectively lets the player query a single letter of the solution.)

The information gain for each possible initial guess is identical (all guesses result in a 4-4 split), so a strategy based on information gain would have to make an arbitrary choice.
If the initial guess is axxx (the "red herring"), the expected number of guesses is 3.25.
But a better strategy is to guess xaxx (then guess xxax and xxxa). The expected number of guesses is then 3.
(In this example information gain was tied, but I have a larger example where the information gain for the "red herring" is greater than the information gain for the optimal first guess.)
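The counterexample is small enough to check by direct simulation (a sketch; query positions 0..3 stand for the allowed guesses axxx, xaxx, xxax, xxxa):

```python
SOLUTIONS = ["aaaa", "aaab", "aaba", "babb", "abaa", "bbab", "bbba", "bbbb"]

def expected_guesses(query_order, solutions):
    """Average number of queries under the modified rules: each guess
    reveals one letter of the secret, and the player wins without
    guessing once a single candidate remains."""
    total = 0
    for secret in solutions:
        cands = list(solutions)
        guesses = 0
        for pos in query_order:
            guesses += 1
            cands = [w for w in cands if w[pos] == secret[pos]]
            if len(cands) == 1:
                break
        total += guesses
    return total / len(solutions)

expected_guesses([0, 1, 2, 3], SOLUTIONS)  # red herring first: 3.25
expected_guesses([1, 2, 3], SOLUTIONS)     # skip the red herring: 3.0
```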
I suspect a law of large numbers / central limit theorem type result that Shannon entropy is asymptotically optimal for randomly chosen lists, even those generated by state machines like gibberish generators that nearly output English words. In other words, I conjecture that your configurations are rare for long lists.
Early in my career, I was naive enough to code up Grobner bases with a friend, to tackle problems in algebraic geometry. I didn't yet know that computer scientists at MIT had tried random equations with horrid running times, and other computer scientists at MIT had established special cases with exponential space complete complexity. Our first theorem explained why algebraic geometers were lucky here. This is a trichotomy one often sees: "Good reason for asking" / "Monkeys at a keyboard" / "Troublemakers at a demo".
Languages evolve like coding theory, attempting a Hamming distance between words to enhance intelligibility. It could well be that the Wordle dictionary behaves quasirandomly, more uniformly spaced than a true random dictionary, so Shannon entropy behaves better than expected.
The best first guess using Shannon entropy is a dead heat between 12953 and 12539. Not surprisingly to the number theorists out there...
If you study the JavaScript, apparently this guy's girlfriend knows every prime.
I'd love to see scoring a Wordle guess added to one of those programming problem sites, as code golf in any language would be amusing. I was once a commercial APL programmer, so APL comes to mind.
It's not cheating, it's the better way. If you don't like it, you can turn on hard mode!
So, hard mode, plus no reusing grey letters, plus no using yellow letters in the same incorrect slot.
but i do slip up at times and jump to punch in a word that i later realize violates one of those yellow rules for instance
If you're going to write a program to optimise your guesses then why not just go to the source and pull the answer straight from the encoded answer list? I know people derive fun from these things in different ways but that's kind of where I landed.
Wordle's wordlist is interesting. There is a large (~10,000 words) list of words that it will accept as guesses but that will never be the answer, and a much smaller (~2500 words) list of words it will both accept and could be the answer.
My tool's simple algorithm scores words by taking the product of the frequency with which each of its letters appears in the wordlist multiplied by the frequency that the letter appears in that specific spot. When looking at the entire list of words the top five choices are:
1. tares 2. lares 3. cares 4. pares 5. dares
However, if you just look at the words that could be answers (which I what I've decided to do), the list changes to:
1. soare 2. saine 3. slane 4. saice 5. slate
In either case I've never had a puzzle where repeatedly choosing the first suggested word didn't get me to the answer in six or fewer guesses.
I would recommend not even trying to guess the word on attempt 2 - much better almost always to guess 5 totally different letters.
I didn't entirely succeed due to the obvious exponential runtime, but I tried a couple constraints: using only words in the dictionary of possible answers; hard mode; or "harder mode" where all guesses must be consistent with the information you have. (The actual hard mode is a weaker constraint, but I misinterpreted it as "harder mode" at first.) Harder mode is of course much easier to brute force, because there are fewer options.
Anyway I tried on several tuples: (max guesses, mode, dictionary size). What I found:
- (4, harder, large): no solution.
- (4, normal, small): no solution.
- (4, normal, large): didn't finish after like a day, but no solution found.
- (5, harder, small): this is pretty hard to guarantee a win; best starting word is SCAMP (-FLOUT-DEIGN if no hits).
- (6, harder, small): best starting word is PLATE (-CHURN-MOODY-SKIFF if no hits).
- (5, normal, small): didn't finish; best starting word so far is TRACE (-GODLY-SPUNK if no hits).
- (5, harder, large): didn't finish; best starting word so far is PALET.
- (6, harder, large): didn't finish; best starting word so far is SALET.
Hard mode really is hard. There are lots of clusters with many options, such as /.OUND/, /.IGHT/, /.ASTE/, /S.ORE/ etc. You can easily end up matching a cluster without enough freedom to solve it in time, especially if you start with common letters.
It isn't possible to guarantee a hard mode win by starting with e.g. RAISE, because if you get e.g. RS in yellow and E in green, then you have 6 remaining words matching /S.ORE/ (plus 4 others), and you can't deal with more than one consonant per guess. Starting with TRACE is even worse: you can't even guarantee a win in 7 guesses due to the cluster /.ATCH/.
Whether or not you use the target word list, it's important to recognize that there *are* two separate lists: you can guess almost any plausible 5-letter word, even bullshit Scrabble words like SOARE. But the target word will always be a relatively common word, because the game is designed to be winnable for ordinary English speakers. Honestly the word list is still a little harder than I would have picked for a broad audience; eg REBUS the other day was on the obscure side.
It's also worth noting that the target word will never be a 3- or 4-letter word pluralized by adding -ES or -S. This is something that you could notice simply by playing for a couple weeks. These seem to have been excluded manually or by a regex: it does have plural words of other forms, and it has words with other endings like -ER or -ING.
One strategy which is obviously optimal is to use minimax (recursively, not like Knuth's Mastermind strategy which was mentioned by @christiangenco), however this strategy is not computationally feasible.
---
Aside: There is something broken with this site's history manipulation. When I open the blog post it creates two entries in the history list, and clicking the browser's back button takes me to a page with the same URL as the blog post that displays "Error Page Not Found".
Full brute force is allllmost computationally feasible, at least for some metrics. Like you could probably exhaust the search space in a month on a small cluster, at least if you're minimizing either (worst case, average case) or (pr(lose), pr(take 6 guesses), pr(take 5 guesses) ...). It's also significantly easier to brute-force in hard mode.
With some restrictions it's possible to do a full minimax. In that case, you still don't know if it's optimal, but you've at least got an upper bound. And it turns out, it's possible to guarantee victory while finishing in an expected 3.554 guesses. Alternatively, you can finish in an expected 3.212 guesses if you don't mind a small chance of taking more than six guesses.
I realise that this blog post is about a strategy for humans to use rather than for an ideal solver, so a degree of inexactness is to be expected. Still, I dislike the way the words "best" and "optimal" are thrown around. The guesses presented in the blog post are "best" in the sense that they maximise some heuristic (namely covering the most frequent letters), but the blog post doesn't explain why that heuristic is good.
Regarding an ideal solver, to prove that a strategy is optimal (in terms of the worst-case number of guesses) one would need to show two things:
- That the worst-case number of guesses of the strategy is N, and
- That there is no strategy whose worst-case number of guesses is less than N.
The first is quite easy to do (for an automated strategy): simply run the strategy against each possible secret "wordle" and keep track of the maximum number of guesses. The second is much harder.
(Since the worst-case number of guesses is likely to be small, this is not a fine-grained way to compare strategies. One could also look at the distribution of the number of guesses to get more information; for example a strategy that takes 6 guesses half the time and 5 guesses the other half is clearly better than a strategy that always takes 6 guesses. Still, it would be really nice if people who published automated strategies also published their worst-case number of guesses.)
Knuth's strategy is optimal (in terms of the worst-case number of guesses) for Mastermind, but it might not be optimal for Wordle. Knuth showed that his strategy takes at most 5 guesses for Mastermind, and presumably there is no strategy that takes at most 4 guesses. But his strategy is not optimal in terms of the distribution of the number of guesses; there are strategies with a lower average number of guesses (and the same worst-case number of guesses; see https://mathworld.wolfram.com/Mastermind.html). Reducing the set of possibilities as much as possible is a very sensible strategy, but it might not be optimal because the number of guesses required to solve a set depends on the nature of its elements, not just on the size of the set. In the case of Mastermind, this inefficiency does not affect the worst-case number of guesses; but in the case of Wordle it might.
One thing I think this analysis seems to be missing is taking into account letter position within the word set. If you take that into account, it actually shifts the first guesses, since you're far better off to discover a green square than a yellow one.
There are some pretty weird words that pop up if you do this analysis, so I've settled on some that are slightly suboptimal but won't make me feel like a robot every day.
I will say that "cares", in my analysis, is a significantly better option than the author's suggestion of aeros, because "e" is more common in position 4 than in 2, and a is much more common in position 2 than 1.
For me the most useful immediate information is about vowels, making "adieu" quite useful. But "irate" and "inter" are also pretty good.
Most people focus on the yellow and green results, but the grey results can be even more informative. A grey result doesn't just tell you about its own position, but about all five. So each grey result from the first guess rules out huge swaths of the dictionary.
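That pruning is easy to see in code (a sketch; as stated it is only valid when the guess has no repeated letters, per the duplicate-letter discussion elsewhere in the thread):

```python
def eliminate_grey(words, grey_letters):
    # A grey letter (from a guess without duplicates) appears nowhere
    # in the answer, so drop every candidate containing it.
    grey = set(grey_letters)
    return [w for w in words if not (set(w) & grey)]

candidates = ["slate", "crane", "pound", "moody"]
eliminate_grey(candidates, "sl")  # ["crane", "pound", "moody"]
```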
The more letters from ETAOIN SHRDLU you can cram into the first two guesses, the better off you are. Thus, any letters in color on the first guess would be wasted if repeated in the second word.
Knowing what answers are actually "true" ahead of time is more information than I'm comfortable taking advantage of.
And yes, eliminating letters is crucial - in my opinion it's always better to guess 5 totally new letters for the second word.
Player 1) create a predictable process that can guess wordle words
Player 2) given the full code of 1 find a word that will not be found by that process in 6 steps
The game is over if someone can devise a process that catches all valid 5-letter words in 6 steps.
https://poetix.medium.com/playing-wordle-with-python-6750185...
I note that a very recent Wordle game used the same letter twice, rudely negating one of his otherwise quite reasonable assumptions there.
I've been playing Wordle for a while and while I'm no wordsmith, if you take an analytical strategy to word selection from line 2 onward it's never that difficult to get 3/6 (which I presume is the highest score attainable without significant luck).
English is one of the most irregular languages, with so many influences and variations, but it still has structure and common patterns. Once you have knowledge from a well-selected exploratory 1st line, there's a lot you can deduce about potential variants in the word structure of the answer to make an informed 2nd line choice that'll be dual purpose: both a good exploratory word and a reasonably hopeful lucky guess.
After all, if you're aiming for 4/6 you're not really aiming very high.
I've done about 30 now and my highest amount of guesses was 5 when I had the "ank" of CRANK and from there it was pure guessing as there were multiple, equally likely (as I understand it) possibilities.
My best score was 2 guesses and it felt like a very hollow victory - just a lucky guess, really. 3 feels like you've done some work and yeah, 4 feels average.
In the example given, the first guess already told you there is an E, but then it’s not used in the next guesses. Figuring out the position of that letter, instead of trying to find a third one, will massively reduce your search space, I’m sure someone can do the math on this being more beneficial.
Yesterday I got one green one yellow in the first guess, and got the word in three steps from there. There are very few words that could fit after those two letters + all the excluded ones. You literally just iterate through the alphabet and possible words in your head while excluding the gray letters.
EDIT: just did today in 4 guesses again using this approach. Lucky streak?
My starting word is BACON.
I see what you did there
ARTSY MODEL CHUNK
They may not be the best words in the world, but I thought of them, so they are best to me.
Also, one version of Wordle people can play is to pick the best starting word every time, another involves starting with a different word every time.
1. Some sense of "par". Can you crawl twitter for everyone's solution tweets and get a sense of the average guesses for the day? 2. "Unwordle". For people who follow the hard-mode rules, how far can you get deciphering their guesses based on their shared color grid? Could you make it competitive between friends to see how well you can guess each others guesses? Would that encourage more creative guesses to trick your friends?
Someone in this discussion suggested using frequency analysis at position, which seems interesting, especially when trying to locate misplaced letters. I might have to try that.
The biggest problem is when your guessing dictionary contains words that aren’t in the acceptable word list in Wordle. For that, I generate 10 guesses on each turn. At least one of them should show up. I didn’t want to use Wordle’s dictionary because that felt like cheating.
/usr/share/dict/words
When I did the frequency analysis, I scoped it to only 5-letter words.

If you're looking for a great "first guess word", "trace" is one of the words that will yield the most clues.
IRATE SOUND PLUMB FOUND (the answer)
In fact I nearly always play "SOUND" after "IRATE", even if I know it can't be correct, and in this case I knew PLUMB couldn't be correct - after the 2nd word I knew it was -OUND, but there are a lot of words ending in -OUND, so I wanted to eliminate as many as possible in the next move. PLUMB would either eliminate or indicate POUND MOUND or BOUND as the solution, and once it did the former, it was either FOUND HOUND WOUND, and I got lucky. In fact I should have played WHOMP which would have given me a better chance (given LOUND is not a word). In this sense the game is somewhat different to the coloured peg game "Mastermind", where there's no benefit in making a guess that can't possibly be correct.
- least time
- fewest moves
- fewest unique letters
- etc.
Most interesting to me was solving in the least time and having fixed words to maximize letter coverage, which theoretically was 25/26 letters during first five guesses. Access to such a list would make it such that the goal is given those words, can we find the answer in less than some fixed time, e.g., one minute from start to finish.
Further interesting was attempting to find such a list without computer assistance, short of having access to word lists and filters, which ultimately led to a near optimal list of words for the first five moves with 24/26 letters:
** spoiler -> the (5) seed words are located at https://pastebin.com/D1DkzXA4 -- the password is gTgRCFrLYL
The 'a' is notably repeated, and worse, is in the same position; was planning to run a computer search to (1) explore if a perfect solution of five dictionary words with entirely unique letters exists &/or failing (1), then (2) is there a list with 24/25 unique letters or 1-2 repeated letters s.t. all letters are in different positions.
There are of course places where such a set of words with fairly maximal coverage still falls short -- a nearly so example is the word pair UNLIT and UNTIL. Determining how many such combos that might be nondeterministic with such a fixed set of starting words exist in the dictionary would be good if someone wanted to dig deeper, e.g., {ARISE, RAISE, AESIR, REAIS, SERAI} which was cited in another comment.
One is to try to maximize the expected information gain of your guess. If we have an initial set of 128 possible words, we begin with log2(128) = 7 bits of entropy. When we make a guess and receive a response, we narrow down the list of words to the set compatible with that response. If, for example, there are 32 words compatible with our response, then we now have log2(32) = 5 bits of entropy, and our guess was worth 2 bits. For a given guess, there are many possible replies each with its own information gain - in the best case, we get all greens, and are left with 0 bits of uncertainty (for a gain of 7), while in other cases we may get all greys, and learn comparatively little. Further, each reply has its own probability of occurring - all greens is only 1/128, but other replies might be more likely if there are several possible targets that would generate the same reply. Thus, we weight the information gain of each reply by its probability to arrive at the expected information gain for that guess. For the word list provided in the article, I get TARES as the best first guess by this metric.
The second strategy is to continue down the game tree, and find the guess with the lowest number of expected (or alternatively the lowest worst-case) number of guesses. In principle, we might find that while TARES gains us a lot of information as a first guess, it leaves us without a good second guess (since we are restricted to guessing real words, and those words might all be redundant in some way), and thus our total expected number of guesses is larger than if we had taken a slightly less informative first guess. My hunch is that in practice this sort of situation is unlikely to occur, and the best first guess by this metric is probably similar to that of the first metric, although I haven't tried.
E.g. assuming you make the best possible future guesses while your opponent has chosen the least convenient word, and propagating these minimums and maximums up the tree.
I am trying to think if this can be tweaked?
So I Googled the last several solutions to find such a list and instead I found... a gist where someone had reverse engineered the game and put up a list of every solution, including for the next several months. On the plus side, I now don't spend time on the game.. but this is perhaps a warning to not go down the rabbit hole yourself if you want to keep playing ;-)
So you don't just have solutions for the next several months. You have all the solutions ever :)
There's probably more tuning I can do for the algo, but roughly:
- I took all the words from the site's js as the dictionary.
- From remaining eligible words, compute the letter distribution (ignoring letters you already know are in the solution).
- Pick a word that uses as many of the most frequent letters as possible.
- Use one of those as a guess.
The goal is essentially to greedily reduce the remaining candidate words as much as possible per guess.
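A sketch of that greedy step (the names here are placeholders, not the author's code; letters counted once per word):

```python
from collections import Counter

def greedy_guess(candidates, known_letters):
    """Pick the candidate covering the most frequent still-unknown
    letters, counting each distinct letter once per word."""
    freq = Counter(
        c for w in candidates for c in set(w) if c not in known_letters
    )
    # Known letters score 0 since they were excluded from freq.
    return max(candidates, key=lambda w: sum(freq[c] for c in set(w)))

greedy_guess(["slate", "crane", "pound"], "")  # "crane" on this toy list
```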
I do wonder if looking at how a letter splits the space of letters and words would be interesting
I think if I work on this some more I'd try to factor in letter positioning when deciding what to guess. My hunch is that it won't make too much of a difference though.
If no answer, use reliable hints ( i.e on Twitter) to better inform the initial guess.
Beyond that, the best initial word goes beyond information theory, letter frequency, etc to also be a word that has a realistic chance of being word of the day. As a trivial example, AEIOU may reveal some information on the first guess, but I find it extremely unlikely that you will win Wordle in 1 with this guess.
Also added a command line version of wordle in case you want to play more and practice and a simulator I'm using now to explore optimal strategy more.
Considering that e.g. the letter e is used very frequently, one might try words with more than one e, for instance? Of course this does not mean that I have a different, best strategy, only that it might be more complicated.
https://github.com/nikitaborisov/autowordl
We choose the guess that is expected to result in the smallest set of possible solutions after one guess.
For the secret answer "QUERY" it suggests the following sequence of guesses:
1. LARES
90 possible answers after this guess.
2. GROUT
Only four possibilities now: ['ENURE', 'INURE', 'QUERY', 'QUIRE']
3. BRIBE
This narrows the field down to one possibility:
4. QUERY
The main problem I think is that the words certainly aren’t chosen purely randomly, so the actual letter frequency could be completely different…
From there it's simple to rotate if any of the vowels get picked up into words with common non-vowels like S and T etc.
If no vowel appears out of OUIA, I always go for a word like STREP.
Becomes relatively trivial from there.
Today (01/13/2022) the word was ABBEY.
I'll represent GREEN squares with * and YELLOW squares with ^ (grey letters are unmarked)
It went as follows:
OUIJA^ B^E^A^ST FA^B*LE^ ABBEY
After FABLE, the first word I could think of that had at least 1 A, at least 1 B, and at least one E was ABBEY.
[1] https://matt-rickard.com/wordle-whats-the-best-starting-word...
If your goal is to minimize the number of remaining words in the worst case, I think SERAI is the winner, leaving you with at most 697 choices. [Assuming my code is right]
To be more specific, I counted 882 possible words in the worst case using ARISE, versus 697 using SERAI.
/usr/share/dict/words

Discussed yesterday at https://news.ycombinator.com/item?id=29906892
Left: GJQVXZ