The Scent of Regex Requisite

Did you know that the famous quote “Some people, when confronted with a problem, think: I know, I’ll use regular expressions. Now they have two problems.” dates all the way back to 1997? Still, most programmers agree that regex has its time and place. But how can we know when to use regex? It’s really simple. We must use our nose and catch the scent of regex requisite. Below is a list of five scents that put the R-word in our working memory.

Text to type

A text sequence is also a kind of data type. You may have read it from a file, or perhaps a user entered it into your system. But you don’t want to confine yourself to text: you want to transform it into a bunch of structured data records. You read record after record from the text hank. Each record consists of a series of comma-delimited fields, and each record is terminated with a semicolon. Regex loves to parse text hanks.
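
As a minimal sketch of that idea, here is how such a text hank could be split into records in Python. The sample data and the names `RECORD` and `parse_records` are made up for illustration, and the sketch assumes that fields never contain a comma or a semicolon themselves.

```python
import re

# Hypothetical input: comma-delimited fields, each record terminated by a semicolon.
text = "Ada,Lovelace,1815;Alan,Turing,1912;"

# One record is everything up to the next semicolon; the semicolon is not captured.
RECORD = re.compile(r"([^;]*);")

def parse_records(raw):
    """Return a list of records, each record being a list of field strings."""
    return [match.group(1).split(",") for match in RECORD.finditer(raw)]

records = parse_records(text)
print(records)  # [['Ada', 'Lovelace', '1815'], ['Alan', 'Turing', '1912']]
```

Text after the last semicolon is silently ignored here. Whether that is right depends on how strict you want to be about unterminated records.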

Non recursive

A recursively defined data type may be instantiated with values that contain values of the very same type. Think of fir branches. Each branch consists of one stem, zero or more sub branches, and many needles. The sub branches are fir branches as well. A branch may have a sub branch, which may have a sub branch, which may have a sub branch, and so on. In theory there is no limit to how many levels we can have. As soon as you want to translate text into recursive data, regex is usually not the best tool. Parsing an entire HTML document with nested div tags is an example of translating text into recursive data.
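
A small Python sketch shows where it goes wrong. The HTML snippet and the helper name `max_div_depth` are invented for this example. A single pattern pairs the outer opening tag with the inner closing tag, while a plain counter, with regex demoted to finding the tags, keeps track of the nesting.

```python
import re

html = "<div>outer <div>inner</div> tail</div>"

# The non-greedy match stops at the FIRST closing tag, which belongs to the inner div.
match = re.search(r"<div>(.*?)</div>", html)
print(match.group(1))  # outer <div>inner

# Regex can still find the individual tags; the nesting is tracked by ordinary code.
TAG = re.compile(r"</?div>")

def max_div_depth(document):
    """Return the deepest nesting level of div tags in the document."""
    depth = deepest = 0
    for tag in TAG.findall(document):
        if tag == "<div>":
            depth += 1
            deepest = max(deepest, depth)
        else:
            depth -= 1
    return deepest

print(max_div_depth(html))  # 2
```

The counter assumes bare, well-formed div tags without attributes. For real HTML a proper parser is the safer choice.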

Not lucid

If the input is small, regex often doesn’t add anything. You can easily see what should be changed and to what, and you can modify it by hand. But when you do search-and-replace in 2000 files, and what you want to replace varies in appearance, a neat little regex is the general solution. You can capture the different variants and replace them with something that actually depends on the input data. You gain in quality (no mistakes) and in quantity (no misses).
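
Here is a hedged sketch of such a replacement in Python. The scenario (dates written with varying separators and padding) and the names `DATE` and `normalize_dates` are assumptions chosen for illustration. The point is that the replacement is computed from what was captured.

```python
import re

# Year, then month and day with one or two digits, separated by "-" or "/".
DATE = re.compile(r"\b(\d{4})[-/](\d{1,2})[-/](\d{1,2})\b")

def normalize_dates(text):
    """Rewrite every matched date as zero-padded YYYY-MM-DD."""
    def fix(match):
        year, month, day = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return DATE.sub(fix, text)

print(normalize_dates("Released 2013/4/7, patched 2013-04-19."))
# Released 2013-04-07, patched 2013-04-19.
```

The pattern does not check that the month and day are valid calendar values. It only normalizes the appearance.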


Suddenly it happens: you have input from a user, from the network, from another system or from a file. You cannot predict what will come, beyond the fact that it’s text. It may be a lot and it may be a tiny little piece. Yes, it can even be an empty string. This very uncertainty is what makes a generalized description of the input data’s characteristics useful. You describe a pattern, not a specific entity. Regex is a superhero when it comes to describing generalized patterns.
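
To make that concrete, here is a small Python sketch. The format (zero or more integers separated by commas) and the names `INT_LIST` and `is_int_list` are hypothetical. One pattern covers the long input, the tiny input and the empty string alike.

```python
import re

# Zero or more comma-separated integers; the trailing "?" makes the empty string valid.
INT_LIST = re.compile(r"(?:[0-9]+(?:,[0-9]+)*)?")

def is_int_list(text):
    """Return True if the whole text fits the pattern."""
    return INT_LIST.fullmatch(text) is not None

for candidate in ["", "7", "1,22,333", "1,,2", "abc"]:
    print(repr(candidate), is_int_list(candidate))
```

Using `fullmatch` rather than `search` matters here: the pattern has to describe the whole input, not just some part of it.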

Complex logic

I’ve described before how 20+ lines of Java code could be transformed into one small regex. This is not a general law. Regex is a limited programming language, suited to solving a very specific class of problems. However, in this case the imperative Java code had a lot of nested as well as consecutive conditional statements (if-else-if-if-else), i.e. complex logic. Regex is a declarative language. You describe what you want, not how to get there. Thus, you don’t have to spell out all these scrubby paths.
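
The Java code from that earlier post is not reproduced here. As a stand-in, the following Python sketch uses an invented task: checking whether a string is a signed decimal number. It compares an imperative version with a declarative one. All names are illustrative.

```python
import re

DIGITS = "0123456789"

def is_decimal_imperative(text):
    """Walk the string by hand: optional sign, digits, optional '.' plus digits."""
    i = 0
    if i < len(text) and text[i] in "+-":
        i += 1
    digits = 0
    while i < len(text) and text[i] in DIGITS:
        i += 1
        digits += 1
    if digits == 0:
        return False
    if i < len(text):
        if text[i] == ".":
            i += 1
            fraction = 0
            while i < len(text) and text[i] in DIGITS:
                i += 1
                fraction += 1
            if fraction == 0:
                return False
        else:
            return False
    return i == len(text)

# The same rule stated declaratively: what a decimal looks like, not how to scan it.
DECIMAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?")

def is_decimal_regex(text):
    return DECIMAL.fullmatch(text) is not None

for sample in ["42", "-3.14", "3.", ".5", "1.2.3", ""]:
    print(repr(sample), is_decimal_imperative(sample), is_decimal_regex(sample))
```

Both functions should agree on every sample. The difference is that the regex version has no paths to get wrong.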

Some of these five scents partly overlap, but each of them is well worth remembering. When you face a programming problem and can’t catch any of them, you can be pretty sure that there are better tools than regex.
