Regex is not Regular

Regular expressions span exactly the same languages as the mathematical construction called Finite atomata. With the Pumping lemma, we can prove that it’s impossible to create regular expressions for many problems, e.g.:

  1. All palindromes
  2. All text strings that doesn’t consist of a prime number of identical characters

Since Regular expressions have become popular as extensions to imperative programming languages, new features have been added. Languages from Context-free grammars and Pushdown automata (essentially a finite automata with an attached stack), can be matched with these features. Regular expressions are Type-3 and Context-free grammars are type-2 in the Chomsky hierarchy.

To not be confused we call these built-in languages that are more than Type-3, Regex. Regexes are more powerful than Regular expressions.

Non regular prime number finder

With the (non regular) feature back-reference, it’s possible to create a regex that only match strings with a prime number of identical characters:

irb(main):001:0> r = /^.?$|^((.)\2+?)\1+$/
=> /^.?$|^((.)\2+?)\1+$/
irb(main):002:0> r.match "22"
=> nil
irb(main):003:0> r.match "333"
=> nil
irb(main):004:0> r.match "4444"
=> #<MatchData "4444" 1:"44" 2:"4">
irb(main):005:0> r.match "55555"
=> nil
irb(main):006:0> r.match "tttttt"
=> #<MatchData "tttttt" 1:"tt" 2:"t">

Non regular palindrome finder

If we also add the (non regular) feature recursion, we can match all palindromes:

irb(main):001:0> p =
/\A(?<palindrome>|.|(?:(?<prefix>.)\g<palindrome>\k<prefix+0>))\z/
=> /\A(?<palindrome>|.|(?:(?<prefix>.)\g<palindrome>\k<prefix+0>))\z/
irb(main):002:0> p.match "otto"
=> #<MatchData "otto" palindrome:"otto" prefix:"t">
irb(main):003:0> p.match "dallas sallad"
=> #<MatchData "dallas sallad" palindrome:"dallas sallad" prefix:"s">
irb(main):004:0> p.match "rais air"
=> nil
irb(main):005:0> p.match "1010"
=> nil

Advertisements


%d bloggers like this: