Regular Expressions – a brief history

Regular Expressions is a programming language with which we can specify a set of strings. With the help of two operators and one function, we can be more concise than if we would have to list all the strings that are included in the set. Where does Regular Expressions come from? Why is it called Regular and how does it differ from Regex?

The story begins with a neuroscientist and a logician who together tried to understand how the human brain could produce complex patterns using simple cells that are bound together. In 1943, Warren McCulloch and Walter Pitts published “A logical calculus of the ideas immanent in nervous activity”, in the Bulletin of Mathematical Biophysics 5:115-133. Although it was not the objective, it turned out this paper had a great influence on computer science in our time. In 1956, mathematician Stephen Kleene developed this model one step further. In the paper “Representation of events in nerve nets and finite automata” he presents a simple algebra. Somewhere at this point the terms Regular Sets and Regular Expressions were born. As mentioned above, Kleene’s algebra had only two operations and one function.

In 1968, the Unix pioneer Ken Thompson published the article “Regular Expression Search Algorithm” in Communications of the ACM (CACM), Volume 11. With code and prose he described a Regular Expression compiler that creates IBM 7094 object code. Thompson’s efforts did not end there. He also implemented Kleene’s notation in the editor QED. The aim was that the user could do advanced pattern matching in text files. The same feature appeared later on in the editor ed.

To search for a Regular Expression in ed you wrote g/<regular expression>/p The letter g meant global search and p meant print the result. The command — g/re/p — resulted in the standalone program grep, released in the fourth edition of Unix 1973. However, grep didn’t have a complete implementation of regular expressions, and it was not until 1979, in the seventh edition of Unix that we were blessed with Alfred Aho’s egrepextended grep. Now the circle was closed. The program egrep translated any regular expressions to a corresponding DFA.

Larry Wall’s Perl programming language from the late 80’s helped Regular Expressions to become mainstream. Regular Expressions are seamlessly integrated in Perl, even with its own literals. New features were also added to Regular Expressions. The language was extended with abstractions and syntactic sugar, and also brand new features that may not even be possible to implement in Finite Automata. This raises the question if modern Regular Expressions can be called Regular Expressions. I don’t think so. The term Regex denotes not only Kleene’s algebra but also the additions made by Perl, Java, Ruby, and other implementations.

Pomodoro Technique Illustrated -- New book from The Pragmatic Programmers, LLC

Advertisements


%d bloggers like this: