Build a Wordle Solver Using Rust

The Game

Wordle is a relatively simple game. If you have ever played Mastermind, it should sound familiar. The goal is to figure out a mystery word with as few guesses as possible. The mystery word changes each day. Here are two example guesses.

Two example guesses from the game Wordle

After a guess, each letter’s color changes.

  • Green: the letter is correct and in the right position.
  • Yellow: the letter exists in the word, but not in that position.
  • Gray: the letter does not exist in the word.

As you can see, there are a maximum of six guesses. If you cannot find the mystery word within six guesses, you lose. I have been competing with my grandmother each day to find the word in as few guesses as possible.

XKCD Comic #356

This totally nerd-sniped me. I felt an overwhelming urge to build an app that could, conceivably, find the mystery word in as few guesses as possible.

How I Did It

First things first, we need a list of English words. I initially used the corncob list, but I found greater success with dwyl’s list.

For this project, I decided to use Rust, just because I felt most confident in my ability to make an MVP quickly.

Both the word lists I used are formatted as a sequence of individual words, separated by \n characters. On Windows (which is what I am using), they also have those pesky little \r characters.
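As an aside, one way to sidestep the \r handling is to let the standard library split the lines. This is only a sketch of an alternative, not what the program below does: parse_words is a hypothetical helper, and it assumes the file has already been read into a string.

```rust
/// Hypothetical helper: split the raw file contents into words.
/// `str::lines` splits on both `\n` and `\r\n`, so no manual `\r`
/// handling is needed. Empty lines are skipped.
fn parse_words(contents: &str) -> Vec<Vec<char>> {
    contents
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.chars().collect())
        .collect()
}
```

Feeding it the output of std::fs::read_to_string would give the same Vec<char> representation the rest of the program works with.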

Wordle is heavily focused on letters. I can remove items from the word list based on what letters I know aren’t in the mystery word (these are gray letters in-game), and I can remove items based on what letters I know are in the mystery word (the yellow or green letters), but in a lot of cases that still leaves a lot of possible words. I need a way to sort words based on how likely their letters are.

To do this, I count how frequently each letter appears in the word list, and give each word a score based on how frequently its components appear.
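Here is a rough sketch of what such a score could look like. It is an illustration under assumptions, not necessarily the exact formula this solver uses: score_word and rank_words are hypothetical names, and summing the counts of a word's distinct letters is just one simple way to combine them.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: score a word by summing how often each of its
/// distinct letters appears across the whole word list.
/// Counting each letter only once is an assumption; it favors guesses
/// that test as many different letters as possible.
fn score_word(word: &[char], letter_scores: &HashMap<char, u32>) -> u32 {
    let distinct: HashSet<char> = word.iter().copied().collect();
    distinct
        .iter()
        .map(|c| letter_scores.get(c).copied().unwrap_or(0))
        .sum()
}

/// Hypothetical helper: sort candidates so the highest-scoring word comes first.
fn rank_words(words: &mut [Vec<char>], letter_scores: &HashMap<char, u32>) {
    words.sort_by_key(|w| std::cmp::Reverse(score_word(w, letter_scores)));
}
```

After ranking, the first entry in the list would be the suggested next guess.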

The first step in the program is to load the word list and count the letters:

use std::collections::HashMap;

// Store the total number of times a letter appears.
let mut letter_scores: HashMap<char, u32> = HashMap::new();
// The final list of words. It will make life easier later in the program to store the words as Vec<char>.
let mut word_list = Vec::new();
let mut last_word = Vec::new();
let file = std::fs::read("corncob_lowercase.txt")?;

// Iterate through all the bytes in the wordlist file, ignoring all `\r` instances.
for letter in file {
    let letter = letter as char;
    match letter {
        // A newline ends the current word.
        '\n' => {
            word_list.push(last_word);
            last_word = Vec::new();
        }
        '\r' => (),
        _ => {
            let entry = letter_scores.entry(letter).or_default();
            *entry += 1;
            last_word.push(letter);
        }
    }
}

// Keep the final word if the file does not end with a newline.
if !last_word.is_empty() {
    word_list.push(last_word);
}

Using the default HashMap (which uses SipHash, a hasher that isn’t great for single-character lookups) probably isn’t the best choice performance-wise, but this is just a toy program and doesn’t need to be the fastest thing in the world.
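If the hashing overhead ever did matter, one alternative would be a fixed 26-slot array indexed by letter. A minimal sketch, assuming the word list contains only lowercase ASCII letters plus line breaks (count_letters is a hypothetical name, not part of the program above):

```rust
/// Hypothetical alternative to the HashMap: count lowercase ASCII letters
/// in a fixed array, where index 0 is 'a' and index 25 is 'z'.
/// Bytes outside a-z (newlines, `\r`, punctuation) are skipped.
fn count_letters(bytes: &[u8]) -> [u32; 26] {
    let mut counts = [0u32; 26];
    for &b in bytes {
        if b.is_ascii_lowercase() {
            counts[(b - b'a') as usize] += 1;
        }
    }
    counts
}
```

The trade-off is that anything outside a-z is silently ignored, which the HashMap version does not do.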

Next, we need to go through the word list and eliminate words that conflict with what we know so far, starting with the ones that contain gray letters. Here is a function that helps do that:

use std::cmp::min;

// Returns true if the word is still consistent with everything we know so far.
fn matches_found(
    word: &[char],
    found: &[char],
    not: &[char],
    must: &[char],
    masks: &[Vec<char>],
) -> bool {
    // Reject the word if it contains a letter we know *isn't* in the mystery word. <-- The gray letters.
    for c in not {
        if word.contains(c) {
            return false;
        }
    }

    // Reject the word if it is missing a letter we know is in the mystery word, even though we don't know its position. <-- The yellow letters.
    for c in must {
        if !word.contains(c) {
            return false;
        }
    }

    // Reject the word if it has a letter in a spot where we know that letter can't be. <-- The positions of the yellow letters.
    for mask in masks {
        for i in 0..min(word.len(), mask.len()) {
            if word[i] == mask[i] {
                return false;
            }
        }
    }

    // Reject the word if it disagrees with the already found (green) letters.
    for i in 0..min(word.len(), found.len()) {
        if found[i] != ' ' && word[i] != found[i] {
            return false;
        }
    }

    true
}

It accepts four char slices and one slice of masks:

  • Word: the word we want to check.
  • Found: a slice containing the letters we have found (the green ones), with ' ' (space) characters in the positions where we don’t know the letter yet.
  • Not: a slice containing the letters we know aren’t in the mystery word.
  • Must: a slice containing the letters we know are in the mystery word, but whose positions we don’t know.
  • Masks: a series of masks. We remove every word that has a letter matching any mask at the same position. This is useful for eliminating words in the wordlist that contain punctuation and for eliminating words that contain yellow letters, but in positions we know they aren’t.

Now all we have to do is run each word in the word list through this function to see if it matches the characters we already know, updating the contents of each slice with new information after each guess.
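The updating step could be sketched like this. Everything here is an assumption made for illustration: the Color enum, the Knowledge struct, and the record method are hypothetical names, and the handling of repeated letters (a gray tile only rules a letter out entirely when no other tile in the same guess marks it green or yellow) is my reading of the game's rules rather than something the original program is known to do.

```rust
/// Feedback for one letter of a guess (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Green,
    Yellow,
    Gray,
}

/// The four pieces of knowledge that `matches_found` expects.
struct Knowledge {
    found: Vec<char>,      // green letters, ' ' where still unknown
    not: Vec<char>,        // letters known to be absent
    must: Vec<char>,       // letters known to be present, position unknown
    masks: Vec<Vec<char>>, // per-guess positions where a letter cannot be
}

impl Knowledge {
    fn new(word_len: usize) -> Self {
        Knowledge {
            found: vec![' '; word_len],
            not: Vec::new(),
            must: Vec::new(),
            masks: Vec::new(),
        }
    }

    /// Fold the feedback from one guess into the stored knowledge.
    fn record(&mut self, guess: &[char], colors: &[Color]) {
        let mut mask = vec![' '; guess.len()];
        for (i, (&c, &color)) in guess.iter().zip(colors).enumerate() {
            match color {
                Color::Green => {
                    if let Some(slot) = self.found.get_mut(i) {
                        *slot = c;
                    }
                }
                Color::Yellow => {
                    if !self.must.contains(&c) {
                        self.must.push(c);
                    }
                    mask[i] = c;
                }
                Color::Gray => {
                    // Assumption: if the same letter is green or yellow
                    // elsewhere in this guess, it is only wrong *here*.
                    let elsewhere = guess
                        .iter()
                        .zip(colors)
                        .any(|(&g, &col)| g == c && col != Color::Gray);
                    if elsewhere {
                        mask[i] = c;
                    } else if !self.not.contains(&c) {
                        self.not.push(c);
                    }
                }
            }
        }
        self.masks.push(mask);
    }
}
```

With something like this in place, each round could call k.record(&guess, &colors) and then word_list.retain(|w| matches_found(w, &k.found, &k.not, &k.must, &k.masks)) to shrink the list of candidates.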

Why You Should Care

This sounds like a useless problem. It is. There is no way this will benefit anyone other than me, and I definitely won’t use this when I’m actually competing with my grandmother.

Then why did you do it?

Useless answers to useless problems are useful. They teach us how to improve, without the pressure of real stakes. They are also just plain fun.

It’s also a reflection. How would you have approached this problem in the past? How has your thinking improved? Maybe it’s a bit grandiose to say this little Wordle solver is the key to self-reflection, but I don’t think it’s that far off.


A Reflection From Months Later

Hi! I am returning to this project months later with a few thoughts.

When I first wrote this article, I completely neglected to share my fitness test for each word. In hindsight, it’s a good thing I didn’t. It was the exact method 3Blue1Brown described as “naive” in his (fantastic) video on this very topic.