Recursive Regexes

Oniguruma supports recursive regular expressions. They can e.g. be used for matching a generative grammar like the following:

  1. expression -> expression + term
  2. expression -> expression - term
  3. expression -> term
  4. term -> ( expression )
  5. term -> digit

I wrote the following regex that corresponds to the grammar above:

  • /^(?<expression>(?<term>\d|\(\g<expression>\))([-+]\g<expression>)?)$/

Note that \g<expression> is invoked recursively – like rule number 4 in the grammar. I chose to only allow single digit numbers to make the expression crisper as example. This could of course easily be changed to general integers, floats or imaginary numbers.

Here goes some testing:

irb(main):001:0> r = /^(?<expression>(?<term>\d|\(\g<expression>\))([-+]\g<expression>)?)$/
=> /^(?<expression>(?<term>\d|\(\g<expression>\))([-+]\g<expression>)?)$/
irb(main):002:0> "2+4"[r]
=> "2+4"
irb(main):003:0> "2+45"[r]
=> nil
irb(main):004:0> "2-3(4+3)"[r]
=> nil
irb(main):005:0> "2-3(+3)"[r]
=> nil
irb(main):006:0> "2-31(+3)"[r]
=> nil
irb(main):007:0> "2-3+(+3)"[r]
=> nil
irb(main):008:0> "2-3+(4+3)"[r]
=> "2-3+(4+3)"

Expressions that return nil are obviously not correct.

Pomodoro Technique Illustrated -- New book from The Pragmatic Programmers, LLC


%d bloggers like this: