The Kleene Star operator

All finite languages ​​can be described by regular expressions. You can simply list the strings as an alternation string1|string2|string3 etc. But there are also some languages with an infinite number of strings that can be described by regular expressions. To achieve that, we use an operator we call Kleene star after the American mathematician Stephen Cole Kleene. If p is a regular expression, then p* is the smallest superset of p that contains ε (the empty string) and is closed under the concatenation operation. It is the set of finite length strings that can be created by concatenating strings that match the expression p. If p can match any other string than ε, then p* will match an infinite number of possible strings.

The real name of the typographic symbol (glyph) used to denote the Kleene Star is asterisk. It’s a Greek (not geek) word and it logically enough means little star. Normally, it has five or six arms, but in its original purpose — to describe birth dates in a family tree — it had seven arms. This is a very popular symbol. In Unicode, there are a score of different variants and many communities have chosen their own meaning of the asterisk. In typography it means a footnote. In musical notation, it may mean that the sustain pedal of the piano should be lifted. On our cell phones, we use the asterisk to navigate menus in touch-tone systems. So there is no world-wide interpretation of *. However, in this book it’ll always mean the operator Kleene star.

Do you want to see a simple example? The concatenation closure of one single symbol, such as a, is a* = { ε, a, aa, aaa,... }. Want to see a more academic example? The concatenation closure of the set consisting solely of the empty string ε is — well, the set consisting solely of the empty string ε* = ε. Want to see a more complicated example? (1|0)* = { ε, 0, 1, 01, 10, 001, 010, 011,... }. It may thus be different matches of the expression that concatenates. Can you write a regular expression that matches all binary strings that contain at least one zero? Or all binary strings with an even number of ones?

The operator Kleene star — pronounced /ˈkleɪniː stɑ:r/ klay-nee star — is unary, i.e. it only takes one operand. The operand is the regular expression to the left, which allows us to say that it’s a postfix operator. It has the highest priority of all operators and it is associative. The latter means that if two operators of the same priority are competing, then the operator closest to the operand wins. Since p** = p*, we say that the Kleene star is idempotent. And I want to emphasize again that p* = (p|)*, i.e. the empty string ε is always present in a closure. We’ll later on see that there is a very common — and none the less disastrous — mistake to forget this very fact.

Here are some possible answers to two the questions above:

'110'.match /(0|1)*0(0|1)*/ #=> #<MatchData "110"> - all strings with at least oe zero
'1111'.match /(0|1)*0(0|1)*/ #=> nil
'1001'.match /((10*1)|0*)*/ #=> #<MatchData "1001"> - all strings with an even number of ones
'11001'.match /((10*1)|0*)*/ #=> #<MatchData "1100">
''.match /((10*1)|0*)*/ #=> #<MatchData ""> - even the empty string has an even number of ones
'1001'.match /((10*1)|0)|/ #=> #<MatchData "1001"> - another way to express an even number of ones
'11001'.match /((10*1)|0)|/ #=> #<MatchData "11">
''.match /((10*1)|0)|/ #=> #<MatchData "">
'1'.match /((10*1)|0)|/ #=> #<MatchData "">
'010'.match /((10*1)|0)|/ #=> #<MatchData "0">
'01'.match /((10*1)|0)|/ #=> #<MatchData "0">

The positive closure operator + and the at least n operator {n,} are abstractions for expressing infinite concatenation. We’ll soon explore them in more detail.

Pomodoro Technique Illustrated -- New book from The Pragmatic Programmers, LLC

About these ads


Get every new post delivered to your Inbox.

%d bloggers like this: