From Regular Expression to Finite Automaton

For each regular expression — and I mean the three operators and the six recursive rules style — there is a finite automaton that accepts exactly the same strings. Since this is not a university book in mathematics, I’ll show you an inductive reasoning about this and not a formal proof.

The hypothesis is thus that for an arbitrary regular expression p, we can create a finite automaton that has exactly one start state, no paths into the start state, no paths out from the acceptance state and that accepts exactly the same strings that are matched by p.

  • The empty string ε is a regular expression corresponding to a finite automaton with a start state, a path that accepts the empty string ε and leads from the start state to an acceptance state. We’ll call this an //ε-path//.
  • The empty set Ø is the set equivalent to a regular expression that can’t match any single string — not even the empty string ε. It is the same as a two-state automaton, with no single path. One state is start and the other one acceptance. But, they are not linked.
  • A regular expression that only matches the symbol b corresponds to a finite automaton with two states: start and acceptance. There’s a path from start t acceptance, and it only accepts the symbol b.

All three finite automata above have two states. One is start and the other one is acceptance. The difference is that the first one has an ε-path from start to acceptance, the second one has no path, and the third one has a b-path. Now we’ll continue. Imagine that we have two regular expressions p and q corresponding to finite automata s and t respectively.

  • Concatenation of two regular expressions p and q means that we first match a string with p, directly followed by a string that’s matched by q. To create this finite automaton we first add ε-paths from every acceptance state in s to the start state of t. Then we deprive all acceptance states in s their acceptance status and we’ll also withdraw the start status of the start state in t.
  • Alternation of two regular expressions p and q, i.e. p|q is like a finite automaton with a new start state that has ε-paths to all start staes of s and t. The new finite automaton also has a new acceptance state that is reached with ε-paths from all acceptance states of s and t. The start and acceptance states of s and t are thus not start and acceptance state in our new automaton.
  • Kleene star is the concatenation closure. Assume that p = q*. Then s is the finite automaton we get if we take t and add two states and four paths as follows: One new state is the start state and the other one is an acceptance state. All acceptance states of ´s´ loses that status in s, but instead gets an ε-path to the new acceptance state. We add two ε-paths from the new initial state — one to the old start state and one to the new acceptance state. In addition to that, we insert one ε-path from each of the old acceptance states to the old start state.

Look at the pictures above. Then take a deep breath and feel if you can translate an arbitrary regular expression to a finite automaton. Finally assess the last picture where the regular expression (w|bb)* is depicted as a graph using the method described above. Does it feel reasonable?

Pomodoro Technique Illustrated -- New book from The Pragmatic Programmers, LLC

%d bloggers like this: