The Shannon Game: Guessing the Next Letter in a Text

In the Shannon Game, a player tries to guess the first letter in a string. Eventually, after some guesses, the player makes a correct guess. Then, the player tries to guess the next letter. And so on, until all the letters have been revealed. For fun, try playing it at this website.

What if we want to make a computer program play the game? We can again use the structure of a rule-based classifier or predictor. Here, we have a guesser function whose inputs are a string of letters that have already been revealed, plus a list of tuples that represent the guessing rules. The output will again be an ordered list of letters to guess. One natural rule to use is that if the last letter was ‘q’, the first thing to guess for the next letter is ‘u’. If that fails, the next best guess is ‘a’, since a few English words contain the combination ‘qa’, and ‘i’ is the third best guess, in case the text includes transliterations of Chinese words or names that contain the combination ‘qi’.

In the code below, we implement a guesser function and a list of two rules. The first handles what to guess if the previous letter was ‘q’, and the second is the default or fallback case, which just guesses all the letters in alphabetic order. To make the code a little easier to read, the sequence of guesses is represented here as a string rather than as a list of characters.
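The original code cell is not shown here, so the following is a minimal sketch of what such a guesser could look like. The names (`guesser`, `rules`) and the rule format, a `(condition, guesses)` tuple whose first element is a function of the revealed text, are assumptions, not the book's exact code.

```python
import string

def guesser(prev_txt, rules):
    """Return a string of letters to guess, in order, given the text
    revealed so far and a list of (condition, guesses) rules."""
    for condition, guesses in rules:
        if condition(prev_txt):        # the first matching rule wins
            return guesses
    return string.ascii_lowercase      # safety net if no rule matches

rules = [
    # if the last revealed letter was 'q', guess 'u' first, then 'a', then 'i'
    (lambda prev: prev.endswith('q'), "uai" + string.ascii_lowercase),
    # default/fallback rule: matches any text, guesses in alphabetic order
    (lambda prev: True, string.ascii_lowercase),
]

print(guesser("the q", rules)[:3])   # first three guesses after a 'q': uai
print(guesser("the", rules)[:3])     # default alphabetic guessing: abc
```

Note that a guess sequence can safely repeat letters (as "uai" + the alphabet does): a repeated letter is simply a wasted guess, and keeping the full alphabet at the end guarantees every letter appears somewhere in the sequence.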

Let’s simplify that a little. Notice that in the above, the first rule considers only the very last letter in prev_txt (the most recent letter that was revealed). Suppose we restrict ourselves to guessing rules that consider only the last letter or last few letters in prev_txt. Then we don’t need the first element of a rule to be a function that takes prev_txt as an input. Instead, the first part of the rule can just be a string to compare against the ending of prev_txt.

For example, if the rule was (“ca”, “bdklmnp”), it would match any text where the last two revealed letters were “ca”.

With that simplification, the following code is equivalent to the code above.
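Again, the course's code cell is omitted, but the simplified version could be sketched as follows, using `str.endswith` to compare each rule's string against the end of `prev_txt`; the "ca" rule below uses the hypothetical guess sequence from the example above.

```python
import string

def guesser(prev_txt, rules):
    """Rules are (ending, guesses) pairs: a rule matches when prev_txt
    ends with the given string. The first matching rule wins."""
    for ending, guesses in rules:
        if prev_txt.endswith(ending):
            return guesses
    return string.ascii_lowercase

rules = [
    ("q",  "uai" + string.ascii_lowercase),
    ("ca", "bdklmnp" + string.ascii_lowercase),
    ("",   string.ascii_lowercase),  # every string ends with "", so this is the fallback
]

print(guesser("antiq", rules)[:3])   # uai
print(guesser("inca", rules)[:3])    # bdk
print(guesser("the", rules)[:3])     # abc
```

Because rules are tried in order, more specific endings should come before the empty-string fallback, which matches everything.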

Let’s see whether having that extra rule makes us any better at playing the Shannon Game. Below we define a function performance that takes a text and a list of rules as inputs. It goes through the letters in the text one at a time. For each letter, it calls guesser to generate a list of guesses, figures out how many guesses would be required to reach the correct next character by going through that list one at a time, and adds that number to a running total. At the end, we have a count of the total number of guesses our guesser would need to play the Shannon Game on the provided text. We’ve included a snippet of text from Sherlock Holmes to run the performance function on. Adding the rule of guessing ‘u’ after ‘q’ reduces our total guesses, but only by a little, because the text we’re guessing doesn’t contain very many q’s.
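Since the original cell isn't reproduced here, here is one way performance could be written, assuming the string-ending rule format from above and a text containing only lowercase letters (a real text would first need lowercasing and handling of spaces and punctuation, which this sketch omits).

```python
import string

def guesser(prev_txt, rules):
    for ending, guesses in rules:
        if prev_txt.endswith(ending):
            return guesses
    return string.ascii_lowercase

def performance(text, rules):
    """Total number of guesses needed to play the Shannon Game on text.

    Assumes every character of text appears in each rule's guess sequence
    (e.g., lowercase letters only)."""
    total = 0
    for i, letter in enumerate(text):
        guesses = guesser(text[:i], rules)   # what's been revealed so far
        total += guesses.index(letter) + 1   # 1-based position of the right guess
    return total

default_only = [("", string.ascii_lowercase)]
with_q_rule = [("q", "uai" + string.ascii_lowercase),
               ("",  string.ascii_lowercase)]

# tiny illustration: 'q' is the 17th letter, 'u' the 21st
print(performance("qu", default_only))  # 17 + 21 = 38
print(performance("qu", with_q_rule))   # 17 + 1  = 18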

Claude Shannon originally devised this game as a way to estimate the entropy of the English language. Suppose we had the very best guesser possible (perhaps some combination of a human who understands English and a computer program that does all kinds of statistical and computational wizardry). Rather than just counting the total number of guesses it makes, as our performance function did, we could keep track of how often it took one guess, two guesses, three guesses, and so on. From those frequencies, we can infer probabilities (the fraction of all letters that required that many guesses). Then, a lower bound on the entropy of English can be computed from those probabilities, as argued in his classic paper: Shannon, Claude E. “Prediction and Entropy of Printed English.” Bell System Technical Journal 30.1 (1951): 50-64.
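The bookkeeping described above can be sketched as follows. The function names are hypothetical, and the entropy formula used here, the plain −Σ pᵢ log₂ pᵢ over the guess-number frequencies, is just one simple bits-per-character estimate; Shannon's paper derives sharper upper and lower bounds from these same frequencies.

```python
from collections import Counter
from math import log2
import string

def guesser(prev_txt, rules):
    for ending, guesses in rules:
        if prev_txt.endswith(ending):
            return guesses
    return string.ascii_lowercase

def guess_counts(text, rules):
    """For each letter in text, how many guesses the guesser needed."""
    return [guesser(text[:i], rules).index(letter) + 1
            for i, letter in enumerate(text)]

def entropy_estimate(counts):
    """-sum p_i log2 p_i over the guess-number frequencies,
    where p_i is the fraction of letters that took i guesses."""
    n = len(counts)
    return -sum((c / n) * log2(c / n) for c in Counter(counts).values())
```

For instance, if half the letters took one guess and half took two, the estimate is exactly one bit per character; if every letter took the same number of guesses, it is zero.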

We can see below that guessing in alphabetic order yields an estimate of 4.85 bits per character, much higher than Shannon’s estimate of around 1.3 bits per character. So we’ll need to make a better guesser!

Next Section - Training a Classifier/Predictor