How to recognise an E-mail address with (a variation of) Ludo

Published on Thursday, 02. September 2021

This post is about one of my favourite concepts in Computer Science. It always frustrates me, when I think back on how complicated it was introduced at University. So many people interested in Computer Science will see its mathematical definition and the theory behind it and quit learning about it before they even get started. But it's such a great concept to get people excited about Computer Science. The concept I'm talking about is called finite state machines. In essence, understanding them isn't much harder than understanding how to play a game of Ludo.

What you see above is a snippet of a simplified Ludo board. Each circle represents one of the squares on the field. The arrows are included to show that you can move between the squares. What I'm going to do in this post is to change the rules of Ludo step by step to end up with a game to detect email addresses (Don't worry, it will make sense later). The first change is a simplification of the rules. The version of Ludo I'm constructing is a single-player game. You (the player) only has on token, no player's yard and no home column. Instead, the token directly starts on the starting square. Marking it with an "S" and the finishing square with a double-lined circle, the complete board now looks like this:

Next, I'm adding the rule that whenever you cast your die with a move you can't execute completely, you've lost immediately. This happens for example when your token is placed on square 3 and you're casting a two or a three. With this board and set of rules, you are winning the game only when all your casts add up to 4. At this point, the use of the word "game" is purely academical.

Now comes the most complicated step. The actual game will stay exactly the same, but I'm changing how its board looks like. In Ludo, you have just one line of squares you are moving on. In the simplified version I used here, I marked this by arrows between the squares (or states, as I'll call them from now). Each arrow means that you can move one step. Each turn, you are moving as many steps as your cast die shows. In our new version each arrow (or edge from now in) is assigned a value. You can only move along an edge if your die shows the edge's value. You are also only making at most one move per turn. To make the game equivalent to the one before, new edges need to be added. For example, to support the case where you are casting a 4 from the starting square, a direct edge with value four will be added between state "S" and state "4". When all new edges have been added, the new board looks like this:

If you just want to play Ludo, drawing the board like this is a horrific overcomplication. But while the board has become more complicated, the rules actually became easier. Instead of moving along multiple edges per turn depending on your cast, you move along exactly one edge. You can also directly see to which state you can move from any other state. For Ludo, this simplification doesn't matter. For finite state machines, it does. Because now, we can replace the numbers on the die with anything we want. Doing this in Ludo would require a definition of adding up letters. This doesn't make really sense (in this case, anyhow). Replacing the 1 with "l", 2 with "o", 3, with "f", and 4 with "x" creates the following board:

Notice how I left the state names unchanged. Their names aren't important. What matters are the state we're starting in, and the final state marked by the double-lined circle.

Let's come back to our die whose faces now show letters. Assume that it gets cast multiple times in a row. For each cast you are writing down the letter. This gives you a word (albeit not necessarily a meaningful one). With this, the board we have created above now defines a language. A language in the sense that is used here is defined as a set of words that all fulfil a certain condition. One example for such a condition is that the word is in the English dictionary. But a list of all female first names starting with the letter D is a language as well. The language that is defined by our board is the list of words that, when interpreted as a series of die casts and played on said board ends up in the final state. This is exactly what finite state machines are doing.

Finite state machines are only one mechanism to check such a condition. They also can only be used to check a certain subset of all conditions one can come up with. For example, to check if a word is a female first name, we need to know something about the world that can't be easily encoded on a board like this. In the context of Computer Science, this brings us neatly to the wonderful world of computational complexity and concepts like the Chomsky hierarchy. Unfortunately, these topics go beyond the scope of this post. So I'm leaving you with one final state machine instead:

Can you guess what this machine accepts? As the title of this post already spoils, this is the finite state machine that accepts all possible emails addresses (only slightly simplified).