HTML5 Form Validation With Regex

Client-side validation has always been a potential headache for front-end programmers. Embedded blocks with a mixture of imperative JavaScript and declarative regex can be a mess. HTML5 has the ambition to add abstraction layers that would make this a bit easier. As I’ll explain below, there’s still a long way to go before it’s rock solid.

Two ideas enter the scene now:

  1. The <input> tag has new type attribute values like url, email, date, tel (telephone number), and color.
  2. The <input> tag has the new attribute pattern where you can describe allowed input with a regex.

Note that it’s only validation. It would have been nice to have filtering (e.g. removing spaces from a credit card number) or even replacing (euro is sent to the server, whether the user enters euro or €).
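Since the new attributes only validate, filtering of this kind still has to be done in script. Here is a minimal sketch of the credit card case; the function name normalizeCardNumber is made up for illustration, and the choice to strip hyphens as well as spaces is my assumption:

```javascript
// Hypothetical helper: removes spaces and hyphens so that
// "4111 1111 1111 1111" and "4111-1111-1111-1111" end up in the same form.
function normalizeCardNumber(input) {
  return input.replace(/[\s-]+/g, "");
}

console.log(normalizeCardNumber("4111 1111 1111 1111"));
```

A filter like this would typically run before the value is validated or sent to the server.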

In case (1) as well as (2), nice red-green feedback lets the user know whether the entered text is valid. The tooltip of the input widget can also carry a descriptive message of what the system expects from the user: you just set the value of the title attribute. More on that below.

1. New values for the type attribute of the <input> tag

Using the type attribute is simple. Here’s an example with the new value email:

<input type="email" required />

This made me curious. I guess that email is implemented with a regex under the hood. What does it look like? I don’t know, but it’s not correct. As a matter of fact, the spec’s own definition of the email value is incorrect. It looks like this:

A valid e-mail address is a string that matches the ABNF production 1*( atext / “.” ) “@” ldh-str *( “.” ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.

So currently, the HTML5 browsers accept the email -@- and don’t accept "staffan nöteberg"@rekursiv.se; I tried both. It should be the other way around. (Yes, spaces and diaeresis make sense to the left of the @ sign, as it’s a local mailbox routing that might involve a not so SMTP:ish system. For the record I tried…

echo 'hello!' |
  /usr/lib/sendmail '"staffan nöteberg"@rekursiv.se'

…and it works!).
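To see why the production quoted above behaves this way, it can be transcribed into a JavaScript regex. This is only my own transcription, not the code any browser actually runs; the names LOCAL, LDH, and specEmail are made up, the atext character list is taken from RFC 5322 section 3.2.3, and ldh-str is read as one or more letters, digits, or hyphens:

```javascript
// 1*( atext / "." ): one or more atext characters or dots.
const LOCAL = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+";
// ldh-str: letters, digits, and hyphens.
const LDH = "[A-Za-z0-9-]+";
// "@" ldh-str *( "." ldh-str )
const specEmail = new RegExp("^" + LOCAL + "@" + LDH + "(?:\\." + LDH + ")*$");

console.log(specEmail.test("-@-"));                            // true
console.log(specEmail.test('"staffan nöteberg"@rekursiv.se')); // false
console.log(specEmail.test("staffan@rekursiv.se"));            // true
```

A lone hyphen is valid atext and a valid ldh-str, so -@- passes, while the double quote, the space, and ö are not atext, so the quoted local part fails.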

However, even though it’s already implemented in many browsers, W3C makes it clear that it’s only a working draft. For the moment there’s a note in the document that they are aware of this error:

NOTE: This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the “@” character), too vague (after the “@” character), and too lax (allowing comments, white space characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

My recommendation is to NOT use the email type value until it has a better implementation.

2. New attribute pattern of the <input> tag

The input tag has several new attributes to specify constraints: autocomplete, min, max, multiple, pattern, and step. I’m particularly interested in the pattern attribute. It’s more generic than the new values of the type attribute mentioned above.

The pattern value is a regex. In what regex dialect? Yes, you guessed it: JavaScript according to ECMA-262 Edition 5. This is a major drawback, since the regex support in JavaScript is modest (e.g. there isn’t even a character class that matches any letter, whereas many other regex engines support the Unicode \p{L}). The whole user input must be matched by the regex, not only a fraction of it. You can look at it as if your regex is prefixed with ^(?: and suffixed with )$.
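The implicit anchoring can be mimicked in script. The helper below is a sketch with a made-up name, matchesPattern, and it only illustrates the wrapping described above, not how a browser is implemented:

```javascript
// Hypothetical helper: wraps the pattern in ^(?: ... )$ so that
// the whole value has to match, not just a part of it.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(matchesPattern("\\d{3}", "123"));  // true
console.log(matchesPattern("\\d{3}", "1234")); // false, one digit too many

// The non-capturing group matters when the pattern contains alternation:
console.log(new RegExp("^a|b$").test("ab"));   // true, ^ and $ bind to one branch each
console.log(matchesPattern("a|b", "ab"));      // false, the whole value must be a or b
```

Without the group, ^ would apply only to the first alternative and $ only to the last, which is why the prefix is ^(?: and not just ^.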

Here are three pragmatic (but not globally perfect) examples I created:

  • Strong password: <input title="at least eight symbols containing at least one number, one lower, and one upper letter" type="text" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" required />
  • Email address: <input type="text" title="email" required pattern="[^@]+@[^@]+\.[a-zA-Z]{2,6}" />
  • Phone number: <input type="text" required pattern="(\+?\d[- .]*){7,13}" title="international, national or local phone number"/>
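If you want a head start on the exercise, the three patterns can be tried outside the browser. The sketch below assumes the anchoring described earlier (the regex wrapped in ^(?: and )$); the helper matchesPattern and the sample inputs are my own inventions for illustration:

```javascript
// The three pattern values from the list above, as JavaScript strings
// (backslashes are doubled only because of string-literal escaping).
const patterns = {
  password: "(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).{8,}",
  email: "[^@]+@[^@]+\\.[a-zA-Z]{2,6}",
  phone: "(\\+?\\d[- .]*){7,13}",
};

// Hypothetical helper: anchors the pattern so the whole value must match.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const samples = [
  ["password", "Passw0rdX"],        // digit, lower, upper, nine characters
  ["password", "password1"],        // no upper-case letter
  ["email", "staffan@rekursiv.se"],
  ["email", "user@127.0.0.1"],      // IP address instead of a domain name
  ["phone", "+46 70 123 45 67"],    // eleven digits with separators
  ["phone", "12345"],               // only five digits
];

for (const [name, value] of samples) {
  console.log(name, JSON.stringify(value), matchesPattern(patterns[name], value));
}
```

In each pair the first sample is accepted and the second is rejected. The rejected IP address shows one of the ways in which the email pattern is pragmatic and not globally perfect.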

I leave it as a reader exercise to interpret these regexes. And you can try them too! They are online in this test page:

If you combine type="email" and pattern then both constraints must be fulfilled.
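The effect of combining the two can be sketched as a logical AND of two checks. Here I assume that type="email" behaves like the ABNF production quoted earlier (transcribed by me into the regex specEmail) and I reuse my email pattern from the list above; the names specEmail, customPattern, and passesBoth are made up for illustration:

```javascript
// Assumed stand-in for the type="email" check, transcribed from the ABNF production.
const specEmail = /^[A-Za-z0-9!#$%&'*+\/=?^_`{|}~.-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*$/;
// The email pattern from the list above, with the implicit anchoring written out.
const customPattern = /^(?:[^@]+@[^@]+\.[a-zA-Z]{2,6})$/;

// Both constraints must be fulfilled for the value to be valid.
function passesBoth(value) {
  return specEmail.test(value) && customPattern.test(value);
}

console.log(passesBoth("staffan@rekursiv.se"));            // true
console.log(passesBoth("-@-"));                            // false, fails the pattern
console.log(passesBoth('"staffan nöteberg"@rekursiv.se')); // false, fails the type check
```

So adding a pattern can only narrow what type="email" accepts; it cannot let through an address that the type check rejects.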

Summary

HTML5 form validation is a good idea. The pattern attribute is very generic, despite its rather limited regex dialect. Be careful with the new values of the type attribute, as they are only in prototype status currently.

Finally: what about browser support? I’m in deep water here, but as I understand it there’s support for this kind of validation in IE 10+, Firefox 8+, Chrome 16+, Opera 11.6+, and Opera Mobile 10+. There’s partial or no support in Safari and Android.

19 Responses to “HTML5 Form Validation With Regex”


  1. chris@leipold.ws 2012-03-2 at 13.48

    A regex that matches any correct email address is almost impossible to write; Jeffrey Friedl has an example in Mastering Regular Expressions which is 3 printed pages long.
    Your example would forbid an IP address as server name, which is neither nice nor common but possible.
    So it would be really nice if browser vendors would implement something useful …

  2. Staffan Nöteberg 2012-03-2 at 14.06

    I agree with you, Chris!

    If front-end programmers want to stop the worst non-email entries with the pattern attribute, then it’s about writing a pragmatic regex, e.g. like my example above (good point however about IP domains). These kinds of regexes will be different in different countries, applications, and cultures.

    If the browser vendors add an email abstraction, like type="email", then it should adhere exactly to RFC 5322, since it will potentially be used in any kind of application, anywhere in the world.

    // Staffan

  3. Chris 2012-03-2 at 14.59

    I agree.
    And it seems I switched name and address in my previous comment.

  4. Salagir 2012-03-5 at 08.22

    I totally agree that -@- isn’t the kind of email I want. But “name surname”@domain.com, I don’t want it either.

    I think there are two kinds of emails: the one everybody knows and that is usual, aka [a-z0-9_+.-]+@(valid domain&tld) (case-insensitive),
    and the one that was cool for geeks 30 years ago but nobody will use today in reality: mails with quotes, spaces, IP after the @, etc.

    type=email should be checking the first one only. If I put something with space and quotes in my database, I’m sure the programs that will read this DB after will fail in their parsing.

    To be honest: we don’t need a code that works for 0.01% of the cases of us geeks being geeks. I don’t even complain anymore about mail validations that don’t accept “+”.

  5. Staffan Nöteberg 2012-03-5 at 21.15

    In my opinion, [a-z0-9_+.-] is the past and Unicode \p{L}, \p{N} etc. is the future. In Swedish [a-z] is only a part of the alphabet and in many written languages it’s not even a part. Ideally the part to the left of @ should potentially be any ‘@’ free Unicode string — intended to be routed by the terminal mail server.

  6. Salagir 2012-03-6 at 10.16

    It is another debate, but to me, emails should be looked at as simple identifiers. Something easy to copy from a business card or even to remember. Like a domain name (that’s why I’m also against using UTF-8 in domain names).

    The complex and fun name can still be used on the left part of a “To:” Sir Jöhñ Åbæ-Ølýk’n Jr. <john.abae-olykn@gmail.com>

    Unicode is the future (even the present), but please, not in identifiers…

    The simplest reason? I don’t know how to type Å or 漢 on my keyboard. If I absolutely need to be able to type them to send an email, I won’t send it.

    Sorry for being quite off-topic.

  7. Rajesh 2013-01-15 at 13.49

    thanks for the validation

