Dot — The Regex Barbapapa

Remember Barbapapa, the children’s books and films by Annette Tison and Talus Taylor from the 1970s? The hero was a pink, pear-shaped guy with the ability to take on almost any shape whatsoever. The Regex equivalent is the dot (.).

Dot is a character class — a generic character. Instead of using literal characters, like 2, a or #, you can use dot to specify that you accept almost any character.

Ruby> 'mama 2 ##'.gsub /a|2|#/, '¤' #=> "m¤m¤ ¤ ¤¤"
Ruby> 'mama 2 ##'.gsub /./, '¤' #=> "¤¤¤¤¤¤¤¤¤"
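A dot always stands for exactly one character, never zero and never two. To match a literal dot outside a character class, you escape it with a backslash: \. . Here is a small Ruby sketch of the difference. The input string is made up for illustration:

```ruby
text = '1.5 1x5 15'

# Unescaped dot: any single character (except a line break) between 1 and 5
loose = text.scan(/1.5/)
p loose      # => ["1.5", "1x5"]

# Escaped dot: only the literal "." character
literal = text.scan(/1\.5/)
p literal    # => ["1.5"]

# "15" has no character at all between 1 and 5, so neither pattern matches it
```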

There are two cultural problems with the dot that are important to be aware of:

  1. The character class dot . and the closure function * are, together and separately, the most abused features of Regex. If you use them perfunctorily, you’ll often end up with regexes that are too general, sometimes even incorrect. Every time you intend to write ., * or even .*, you should consider whether you really mean something more specific.
  2. A majority of Regex books, including the most popular one, are unclear or even entirely incorrect, as they claim that “dot matches any character.” In most cases dot matches “any character except line breaks.” It’s a very, very important difference.
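A small, made-up illustration of point 1 in Ruby: suppose you want the quoted words in a sentence. The quick choice ".*" is too general, because .* grabs as much as it can. A negated character class states what you actually mean:

```ruby
text = 'say "hi" and "bye"'

# Greedy .* runs as far as it can, so one match swallows both quoted words
too_general = text.scan(/".*"/)
p too_general   # => ["\"hi\" and \"bye\""]

# A negated class says what we really mean: anything except a quote
specific = text.scan(/"[^"]*"/)
p specific      # => ["\"hi\"", "\"bye\""]
```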

Why doesn’t dot normally match line breaks?
The original implementations of Regex operated line by line. Programs like grep handle one line at a time, and trailing line breaks are filtered out before processing. Hence, there are no line breaks to match. NASA engineer Larry Wall created Perl in the 1980s, the programming language that evangelized Regex more than anything else. Its original purpose was to make report processing easier. What would then be more natural than to continue on the path of line-oriented work? Another argument is that the idiom .* would change meaning if dot matched line breaks: instead of "the rest of this line" it would mean "the rest of the input". Perl set the agenda and now, a few decades later, we can only accept that dot typically doesn’t match line breaks, no matter what you and I believe is logical.

Ruby> "grey gr y gr\ny gray gr\ry".scan /gr.y/
#=> ["grey", "gr y", "gray", "gr\ry"]
# Ruby's dot rejects only "\n", so the carriage return in "gr\ry" is matched
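Here is a small Ruby sketch, on a made-up two-line string, of how the idiom .* would change meaning. The m flag in the second match is Ruby's way of letting dot match line breaks:

```ruby
log = "first line\nsecond line"

# Without a flag, .* means "the rest of this line"
rest_of_line = log[/.*/]
p rest_of_line    # => "first line"

# With m (dot match all), the same idiom means "the rest of the input"
rest_of_input = log[/.*/m]
p rest_of_input   # => "first line\nsecond line"
```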

How can you force dot to match line breaks?
You set a flag. Unfortunately, this flag has different names in different Regex dialects. In Perl, it’s called single-line mode. Imagine what happens if dot matches all characters, including line breaks: the input becomes one long line in which the line break is a character like any other, hence the name. Single-line mode should not be confused with what Perl calls multi-line mode. Multi-line mode affects the anchors ^ and $, and it’s orthogonal to single-line mode. To add to the confusion, Ruby uses the term multiline mode for what Perl calls single-line mode. And Perl’s real multi-line mode is always on in Ruby; there is no flag for it. The best approach to this mess is that you and I call the flag Dot match all, no matter how it is written syntactically in different dialects. In Ruby, you append m to the Regex literal when you want the dot to match any character.

Ruby> "grey gr y gr\ny gray gr\ry".scan /gr.y/
#=> ["grey", "gr y", "gray", "gr\ry"]
Ruby> "grey gr y gr\ny gray gr\ry".scan /gr.y/m
#=> ["grey", "gr y", "gr\ny", "gray", "gr\ry"]
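To see that the two settings are independent, here is a small Ruby sketch on a made-up two-line string. In Ruby, ^ and $ always behave as in Perl's multi-line mode, \A and \z anchor to the whole string, and the m flag only affects what the dot matches:

```ruby
text = "one\ntwo"

# ^ and $ match at every line boundary in Ruby, without any flag
lines = text.scan(/^\w+$/)
p lines                               # => ["one", "two"]

# \A and \z anchor to the start and end of the whole string instead
single_word = text.match?(/\A\w+\z/)
p single_word                         # => false

# The m flag changes only what dot matches, not what ^ and $ mean
dot_default = text.match?(/\A.+\z/)
dot_all     = text.match?(/\A.+\z/m)
p dot_default                         # => false
p dot_all                             # => true
```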

And if there is no flag?
In some Regex dialects, most notably JavaScript before ECMAScript 2018 (which added the s flag), there’s no flag for dot match all. A workaround is to replace the dot with the idiom [\s\S]. This idiom matches exactly one character: either whitespace or anything that is not whitespace. Together, these two classes cover every character, including line breaks.

JavaScript> 'grey gr y gr\ny gray gr\ry'.match(/gr.y/g);
[ 'grey', 'gr y', 'gray' ]
JavaScript> 'grey gr y gr\ny gray gr\ry'.match(/gr[\s\S]y/g);
[ 'grey',
'gr y',
'gr\ny',
'gray',
'gr\ry' ]
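The same idiom works in Ruby, where \s includes \n, so [\s\S] matches the line break even without the m flag. A minimal sketch on a shortened version of the input above (the \r case is left out):

```ruby
text = "grey gr y gr\ny gray"

# [\s\S] means "whitespace or non-whitespace", i.e. any character,
# so it matches the line break even without the m flag
with_class = text.scan(/gr[\s\S]y/)
with_flag  = text.scan(/gr.y/m)

p with_class                 # => ["grey", "gr y", "gr\ny", "gray"]
p with_class == with_flag    # => true
```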

Is dot too general?
I also argued above that the dot is often abused in our community. What does that mean? Imagine that you want to find all time strings in a text. You’ve got the following specification:

  • Time always includes hours and minutes, sometimes even seconds.
  • Hours, minutes and seconds are always written with two digits.
  • You don’t have to reject impossible numbers, such as minute 61.
  • Between hours, minutes and seconds you’ll find one of the separators dot . or colon :.

The results of a simple regex like \d\d.\d\d(.\d\d)? might surprise you:

Ruby> "12:34 09.00 24.56.33".scan /\d\d.\d\d(?:.\d\d)?/
#=> ["12:34 09", "00 24.56"]
# (?:...) groups without capturing, so scan returns whole matches

That’s not what you wished for. Dot matches space! If you replace the dot with the more specific character class [.:], you get closer to the target. Don’t forget that a dot inside a character class literally matches the dot character.

Ruby> "12:34 09.00 24.56.33".scan /\d\d[.:]\d\d(?:[.:]\d\d)?/
#=> ["12:34", "09.00", "24.56.33"]

Pomodoro Technique Illustrated -- New book from The Pragmatic Programmers, LLC

8 management ideas for 2013

It’s 2013 now, a new year, and you’re struggling with inspiration. How can I be a modern manager? Here are eight management ideas you might want to focus more on:

1. Autonomous Teams
An autonomous team has the skills (cross-functional) and is empowered (self-organized) to make its own decisions. The team has clear constraints for its mission and works towards goals based on outcomes and impacts. Everyone must be comfortable with working in an autonomous team.

2. Beyond Budgeting
Swedish bank SHB has been managed for over 30 years without budgets. Norwegian Statoil is another similar example. Annual budgets encourage managers to focus on making the numbers instead of making a difference. The alternative is dynamic and relative targets, holistic reviews, dynamic forecasts, dynamic resource allocation, and being event-driven rather than calendar-driven.

3. Holistic Thinking
The effect on the customer’s business, or the customer’s customer’s business, is more important than whether individual projects hit their estimated time, quality and cost. Fewer parallel projects, less formal roles and more just-in-time decisions make the organization more flexible in adapting to the prevailing reality. When the allocation of individuals is limited to 70-80%, there’s even more room for dynamics. Collaboration and shared goals across project boundaries increase the total effect. Profitability is more important than cost control.

4. Non-financial incentive models
Team-based incentive programs might reduce the individual’s willingness to corrupt the system. And incentives don’t always have to be financial. With creativity and by listening to the employee, we can discover completely different things that are highly valued by our employees. With digital social tools, some of the rewards may come from colleagues as real-time feedback.

5. Knowledge-creating
Innovation-driven product development is more long-lasting than maintenance-driven product development. Successful innovation requires that people from all levels of the organization put effort into monitoring the external environment. New combinations of explicit knowledge need to be internalized and shared by all colleagues. Growing employees’ tacit knowledge (talent), rather than building formal processes (structural capital), gives us an outstanding capacity.

6. Real-time Performance
Rather than annual performance reviews, try 15-30 minute coordination meetings with your employees every week or every other week. Focus on individual development, not individual measurement. Targets are based on outcomes and impact. The manager’s mission is to help employees achieve their goals by removing impediments.

7. Recruit the right people, rather than the right experience
Don’t overvalue experience from your own field, your tools and your processes when recruiting. Other proficiencies have high value, such as personal energy, ability to complete, ability to learn, social skills, and ability to help the team grow. The new employee shouldn’t only look for the best financial solution. It must be her strategic decision that this is the best environment for her to grow. She values teamwork and aims for T-shaped skills: breadth across related skills and depth of expertise in a single field.

8. Transparency and Visualization
To make all employees feel really involved, decisions must be accessible to everyone. The fact that information is stored somewhere isn’t enough. Abstract views of the current state are visualized on walls of wonder, in office areas where most people are. The visualizations are used as decision support while prioritizing.

And finally, here’s a bonus idea: the office as a laboratory where there are always small experiments under way.

Regular Expressions – a brief history

Regular Expressions is a programming language with which we can specify a set of strings. With the help of two operators and one function, we can be more concise than if we would have to list all the strings that are included in the set. Where does Regular Expressions come from? Why is it called Regular and how does it differ from Regex?
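The post does not name the three primitives. Assuming the usual reading, with concatenation and alternation (union) as the two operators and the Kleene star as the function, a small Ruby sketch can test which strings belong to the set (ab|c)*. The anchors \A and \z are Ruby additions that force the whole string to match; they are not part of Kleene’s algebra:

```ruby
# Assumed reading of Kleene's primitives, written in Ruby syntax:
#   concatenation: ab   alternation: ab|c   star: (?:ab|c)*
# \A and \z force the whole string to be a member of the set.
SET = /\A(?:ab|c)*\z/

candidates = ["", "ab", "cab", "abcc", "ac", "ba"]
members = candidates.select { |s| SET.match?(s) }
p members   # => ["", "ab", "cab", "abcc"]
```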

The story begins with a neuroscientist and a logician who together tried to understand how the human brain could produce complex patterns using simple cells that are bound together. In 1943, Warren McCulloch and Walter Pitts published “A logical calculus of the ideas immanent in nervous activity”, in the Bulletin of Mathematical Biophysics 5:115-133. Although it was not the objective, it turned out this paper had a great influence on computer science in our time. In 1956, mathematician Stephen Kleene developed this model one step further. In the paper “Representation of events in nerve nets and finite automata” he presents a simple algebra. Somewhere at this point the terms Regular Sets and Regular Expressions were born. As mentioned above, Kleene’s algebra had only two operations and one function.

In 1968, the Unix pioneer Ken Thompson published the article “Regular Expression Search Algorithm” in Communications of the ACM (CACM), Volume 11. With code and prose he described a Regular Expression compiler that creates IBM 7094 object code. Thompson’s efforts did not end there. He also implemented Kleene’s notation in the editor QED. The aim was that the user could do advanced pattern matching in text files. The same feature appeared later on in the editor ed.

To search for a Regular Expression in ed, you wrote g/<regular expression>/p. The letter g meant global search and p meant print the result. The command g/re/p gave rise to the standalone program grep, released in the fourth edition of Unix in 1973. However, grep didn’t have a complete implementation of regular expressions, and it was not until 1979, in the seventh edition of Unix, that we were blessed with Alfred Aho’s egrep (extended grep). Now the circle was closed. The program egrep translated any regular expression into a corresponding DFA.
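Ruby’s Enumerable#grep borrows both the name and the idea. As a loose analogue of g/re/p, and not a description of how ed or grep is implemented, keeping and printing the matching lines of a made-up text looks like this:

```ruby
text = <<~TEXT
  grey
  gray
  green
  blue
TEXT

# Roughly g/gr[ae]y/p: keep every line that matches, then print it
matches = text.lines.grep(/gr[ae]y/)
matches.each { |line| print line }
```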

Larry Wall’s Perl programming language from the late 1980s helped Regular Expressions become mainstream. Regular Expressions are seamlessly integrated in Perl, even with their own literals. New features were also added to Regular Expressions. The language was extended with abstractions and syntactic sugar, and also with brand new features that may not even be possible to implement with finite automata. This raises the question whether modern Regular Expressions can be called Regular Expressions. I don’t think so. The term Regex denotes not only Kleene’s algebra but also the additions made by Perl, Java, Ruby, and other implementations.

Give feedback on my Regular Expressions book prototype

I’ve published a prototype of an upcoming illustrated Regular Expressions book here: http://www.staffannoteberg.com/regexbook

Any feedback is very much appreciated. What would make this book more useful for you?

Best Regards // Staffan

GrafDok — Graphical Documentation and Visualizing community

Graphical documentation and visualizing are hot topics in the second wave of Agile. We have started a Swedish-language community that we call GrafDok. So far we have a discussion list on Google Groups, and yesterday we had our first IRL gathering. The latter was great fun and very worthwhile. There’s a report at http://grafdok.wordpress.com with loads of pictures and a few Swedish words from that event.

How can you join the GrafDok community? Join the discussion list:

What if you don’t speak Swedish? There’s a sister discussion list in English, started by the Belgian Agile profile Yves Hanoulle. You find it here:

Interview on Time Management and Future Book Projects

Baris: Effectively managing your to-do list is a big part of the Pomodoro Technique. I really like the simplicity of having a super simple list with items grouped as “now”, “today”, “later”. Is the “now list” your invention? Please tell me the thought process behind it.

Staffan: I think it’s my invention, even though many other people most certainly have similar concepts. Even if you decide to focus on just one thing, your thoughts easily start to wander now and then. Writing the title of your current activity on a slip of paper and putting it next to the keyboard reminds you within a fraction of a second what it was.

I’m interviewed by Baris Sarer. The full text is here:

  • Part one: http://www.pomodorotime.org/pomodoro-technique-2/staffan-noteborg-interview-on-pomodor-technique-part-i/
  • Part two: http://www.pomodorotime.org/pomodoro-technique-2/staffan-noteborg-interview-on-pomodoro-technique-part-ii/

New interview about Time Management by Turing China

‘I’ve used “give it a try for five minutes — I’ll start the kitchen timer now” with children in several situations where I expect that their major impediment is to get started’

Turing China published a long interview with me in Chinese http://bit.ly/yTt4zP and in English http://bit.ly/A4q6MW.
