Regular expressions are patterns that match strings in text files or text streams. The set of all strings that a particular expression can match is called a regular language. Every regular language has at least one corresponding finite automaton, which is a very simple state machine.
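To make the state machine idea concrete, here is a minimal sketch of a finite automaton for one small regular language: a single a followed by any number of b. The state names, the transition table, and the helper accepts are assumptions made up for this illustration, not part of any library.

```python
# Hypothetical finite automaton for the language "one 'a' followed by
# zero or more 'b'". State names are illustrative only.
TRANSITIONS = {
    ("start", "a"): "accept",   # the mandatory leading 'a'
    ("accept", "b"): "accept",  # any number of trailing 'b'
}
ACCEPTING = {"accept"}


def accepts(text):
    """Return True if the automaton ends in an accepting state."""
    state = "start"
    for ch in text:
        # A missing table entry means the automaton has no valid move.
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING
```

The automaton reads its input one character at a time and remembers nothing but its current state, which is all a regular language needs.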
In order to match regular languages we need only three operations – Les trois Mousquetaires de Regex:
- Kleene star, i.e. * – repeat the preceding pattern zero or more times
- Concatenation – match two consecutive patterns, e.g. 73 matches 7 followed by 3
- Alternation – match exactly one out of a list of patterns, i.e. logical OR
Precedence follows the order of the list above, from highest to lowest: Kleene star binds tightest, then concatenation, then alternation. Sometimes, that’s not enough and thus we also need D’Artagnan, the fourth musketeer. I’m talking about parentheses for grouping – just like in any programming language.
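The three operations, their precedence, and grouping can be tried out in a short sketch. It assumes Python's standard re module, whose syntax goes well beyond regular languages, and uses only the regular subset described above. The helper name matches is made up for this example.

```python
import re


def matches(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Kleene star: zero or more repetitions of the preceding pattern.
star_empty = matches("7*", "")          # zero repetitions
star_many = matches("7*", "777")        # three repetitions

# Concatenation: 7 followed by 3.
concat = matches("73", "73")

# Alternation: exactly one of the listed patterns.
alt_seven = matches("7|3", "7")
alt_both = matches("7|3", "73")         # neither alternative is "73"

# Precedence: the star applies to 3 only, not to 73.
star_binds_tight = matches("73*", "7333")
star_not_group = matches("73*", "7373")

# Grouping overrides precedence: now the star applies to 73.
grouped_star = matches("(73)*", "7373")

# Grouping limits alternation to the middle character.
grouped_alt = matches("7(3|4)2", "742")
```

Here star_not_group and alt_both come out False while the other results are True. That is the reason grouping is needed: without parentheses there is no way to apply the star to 73 as a unit.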
The Unix command-line tool egrep passes the regular expression litmus test. It supports concatenation, Kleene star, alternation, and grouping.
What about Windows then?
There’s a built-in command-line tool in Windows called findstr, and it supports Kleene star and concatenation. That’s two of the four operations we need to match regular languages. Unlike egrep, it lacks alternation and forced precedence with parentheses – and thus you can’t use findstr for regular expressions.
What to do then if you want to write regular expressions at the command line in Windows? There are at least two alternatives:
- UnxUtils are native Win32 ports of common GNU utilities. Native means that the executables only depend on the Microsoft C-runtime (msvcrt.dll) and not an emulation layer. One of the executables is egrep.
- Cygwin provides a Linux API emulation layer together with a vast number of tools – among them egrep.
D’Artagnan and his three musketeer friends Athos, Porthos, and Aramis lived by the motto tous pour un, un pour tous. The same holds here: unless you have all four, grouping and the three mandatory operations, you can’t write regular expressions.