magnusviri

Wordle

So I've played several different word guessing games. I am not good at word games. But they are fun so I play them.

Here's my Wordle notes.

TLDR

pares, monty, build, chawk (untried letters are gfvzjxq)

The time machine

Want to see the old words? Wordle Time Machine and Wordle Archive to the rescue.

Statistics

I wrote a script to parse the wordle word list. Here's some of what I found from it. As of this writing, there are 12,947 words in the list.

Letters

Here's the alphabet sorted by how many times a letter appears in a word. The letter "q" is in only 112 words.

I also tracked how often a letter appears in a certain position.

1   2   3   4   5  
s 1560 a 2260 a 1235 e 2323 s 3950
c 920 o 2093 r 1197 a 1073 e 1519
b 908 e 1626 i 1047 t 897 y 1297
p 857 i 1380 o 989 i 880 d 822
t 815 u 1185 n 962 n 786 t 726
...                  
z 105 f 24 h 120 j 29 q 4
q 78 q 15 j 46 x 12 v 4
x 16 j 11 q 13 q 2 j 3

It really surprised me to see how positions affect the popularity of the letters. 2 letters stick out, "s" and "y". They are very lopsided and occur in certain positions far more than others.

Overall the "s" is the 2nd most popular letter. It is the most popular character in the 1st and 5th positions. It is more than double the "e" in the last position. But in the 2nd, 3rd, and 4th position the "s" is astonishingly in 17th, 10th, and 9th place. 12% start with "s" and 23% of words end with "s". That's a lot of words. If you have an "s", it's probably the 1st or 5th character.

The "y" is even more lopsided. Overall it's 12th popular. But when you examine each position it is in the 1st, 3rd, and 4th position it is 22nd, 20th, and 23rd (out of 26). In the 2nd position it's 10. In the 5th position it gets a bronze metal and comes in at whooping 3rd.

Letters like "c", "b", and "p" came in 2nd, 3rd, and 4th in the 1st position, but are 13th, 18th, and 14th overall. Vowels aren't popular in the 1st position. But all 5 vowels were the most popular in the 2nd position and comprise about 66% of all words. The 3rd position was closest to the overall popularity, except for "e" and "s" are far less popular than overall. The 4th position has "e" more than twice the next letter, the "a".

Because the 5th position is the most lopsided position. 53% of words end with "s", "e", or "y". The 5th most popular letter, "5" is less popular (726 words) than any of the other positions (815, 1185, 962, and 786).

Words

If you allow repeating letters, there are 7 words that begin and end with "s".

The 8th and 9th words dont' have duplicate letters.

Side note, "James" has 3 of the most popular letters in the right place, "a", "e", and "s". But ironically, the "J" is the 3rd least popular letter coming in at 24th place.

Word sets

I am really bad at thinking of words that have certain characters. I've played Scattergories and lose badly every time. But given 4 out of 5 letters, I can often come up with the right word. So my Wordle strategy basically involves revealing the 19 most popular letters. That leaves the 7 least common letters hidden.

When I first started playing I used grep to find words that contain the most common letters in the English language and don't repeat any letters (until word 4). This reveals 19 letters. These are the 2 sets I came up with. I have used the 1st one for several weeks and had good luck.

But I wanted to know if I could find words with the letters in more popular spots. I had to script to do this for me (see below). I was surprised how many word sets there are. I picked this one.

If you wanted to reveal 22 of the characters, you can use this set. Beware, it only gives you one guess left.

The antre and inter sets reveal "g", which is more popular than "b" and "k", which is revealed by the pares set. But I think I prefer pares because I think a character in the right position is more valuable than revealing more popular letters.

The stats

Here's a csv. I could've made it a download but it's so small.

Letter,Pos1,Pos2,Pos3,Pos4,Pos5,Overall
a,736,2260,1235,1073,679,5983
b,908,81,334,242,59,1624
c,920,176,392,406,127,2021
d,681,84,390,471,822,2448
e,303,1626,882,2323,1519,6653
f,595,24,178,233,82,1112
g,637,75,362,422,143,1639
h,488,544,120,235,367,1754
i,165,1380,1047,880,280,3752
j,202,11,46,29,3,291
k,375,95,268,500,257,1495
l,575,697,848,771,475,3366
m,693,188,510,402,182,1975
n,325,345,962,786,530,2948
o,262,2093,989,696,388,4428
p,857,228,363,418,147,2013
q,78,15,13,2,4,112
r,628,940,1197,716,673,4154
s,1560,93,531,515,3950,6649
t,815,239,615,897,726,3292
u,189,1185,666,401,67,2508
v,242,52,240,155,4,693
w,411,163,271,128,64,1037
x,16,57,133,12,70,288
y,181,267,213,108,1297,2066
z,105,29,142,126,32,434

The script

If you want to play with the data yourself, here's the script I wrote.

Cheating with grep

When I first played Wordle, I couldn't help but cheat because I'm so bad at word games. I don't like losing and part of knowing how to use computers means doing things better and easier. Steve Jobs did say that computers are bicycles for our minds ya know. I'm just using my bicycle.

Here's how I used grep to solve Wordle. I'm on macOS so my grep doesn't have look-ahead or look-behind (I think the gnu version does--yeah I could install it with brew, I just haven't done it).

Get the word list

Download the wordle word list.

Or you can use /usr/share/dict/words, but there's words in there that aren't in the Wordle list (like "gawby") and there's words missing (like "betas", "fetas", and "zetas"). This command will save all 5 character words and convert to lower case.

egrep "^.{5}$" /usr/share/dict/words | awk '{print tolower($0)}' > words.txt

Counting words

You can count the words by piping to wc -l.

cat words.txt | grep e | wc -l
5705

Character classes

A character class lets you search for 2 characters and acts like an or operator.

cat words.txt | grep a | grep e | wc -l
1746
cat words.txt | grep "[ae]" | wc -l
9289

Note, egrep can do the same thing as character classes with the | (or) operator.

cat words.txt | egrep "a|e" | wc -l
9289

Green letters

Require letters in specific places. The "t" is in the 3rd position. The periods mean any character.

cat words.txt | grep "..t.."

Grey letters

This will remove all words with any of these characters.

cat words.txt | grep -v "[nr]"

Yellow letters

Require words with these letters. The position is not specific. Note, you can't use a character class here because you want to require all the letters.

cat words.txt | grep a | grep e

Remove words with letters from the specific locations with the -v flag. Use egrep and | (or) to specify multiple patterns.

cat words.txt | egrep -v "...e.|.a..."

Combine them.

cat words.txt | grep a | grep e | egrep -v "...e.|.a..."

Putting it all together

This is the regex I'd use if I tried the words, "antre", "solid", "chump", and "gawky". The only green letter was "t", in the 3rd position. Yellow letters were "a", "e", and "s". All others were grey.

cat words.txt | grep "..t.." | grep -v "[nrolidchumpgwky]" | grep a | grep e | grep s | egrep -v "a....|....e|s....|.a..."

That gives me these possible words. The last letter is either "b", "f" or "z".

betas
fetas
zetas

I can use a word with "b", "f" and "z" to find the last letter. Here's a grep to find a word with those letters.

cat words.txt | grep b | grep f | grep z

There's none. So let's search for "b" and "f".

cat words.txt | grep b | grep f

There are 49 words. Let's just pick one, "fable". That would be the 5th guess, leaving one last guess. If the "b" is present, the word is "betas". If the "f" is present, the word is "fetas". If the "b" and "f" are grey, then the word is "zetas".

Have fun.

Published: 2022-02-16, last edited: 2022-02-16

Copyright 2020 James Reynolds