
Client-side validation has always been a potential headache for front-end programmers. Embedded blocks with a mixture of imperative JavaScript and declarative regex can be a mess. HTML5 has the ambition to add abstraction layers that would make this a bit easier. As I’ll explain below, there’s still a long way to go before it’s rock solid.
Two ideas enter the scene now:
- The <input> tag has new type attribute values like url, email, date, tel (telephone number), and color.
- The <input> tag has the new attribute pattern, where you can describe allowed input with a regex.
Note that it’s only validation. It would have been nice to have filtering (e.g. remove spaces in a credit card number) or even replacing (euro is sent to server, whether the user enters euro or €).
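Since the markup only validates, filtering and replacing still have to be done in script. A minimal sketch of the two cases just mentioned, assuming plain string handling is enough; filterCardNumber and normalizeCurrency are hypothetical helper names, not part of HTML5 or any library:

```javascript
// Hypothetical helpers: HTML5 form validation does not filter or replace
// input, so this kind of normalization has to be written by hand.

// Filtering: remove the whitespace users often type in a credit card number.
function filterCardNumber(raw) {
  return raw.replace(/\s+/g, "");
}

// Replacing: send "euro" to the server whether the user typed "euro" or "€".
function normalizeCurrency(raw) {
  const trimmed = raw.trim();
  return trimmed === "€" || trimmed.toLowerCase() === "euro" ? "euro" : trimmed;
}

console.log(filterCardNumber("4111 1111 1111 1111")); // 4111111111111111
console.log(normalizeCurrency("€"));                   // euro
```

Such helpers would typically run before the form is submitted, or again on the server.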
In case (1) as well as (2), nice red-green feedback lets the user know whether the entered text is correct. The tooltip of the input widget can also show a descriptive message of what the system expects from the user: you just set the title attribute. More on that below.
1. New values for the type attribute of the <input> tag
Using the type attribute is simple. Here’s an example with the new value email:
<input type="email" required />
This made me curious. I guess that email is implemented with a regex under the hood. What does it look like? I don’t know, but it’s not correct. As a matter of fact the spec for the email attribute value is incorrect. It looks like this:
A valid e-mail address is a string that matches the ABNF production 1*( atext / “.” ) “@” ldh-str *( “.” ldh-str ) where atext is defined in RFC 5322 section 3.2.3, and ldh-str is defined in RFC 1034 section 3.5.
So currently, HTML5 browsers accept the email -@- and don’t accept "staffan nöteberg"@rekursiv.se (I tried). It should be the other way around. (Yes, spaces and diaereses make sense to the left of the @ sign, as it’s a local mailbox routing that might involve a not so SMTP:ish system. For the record I tried…
echo 'hello!' |
/usr/lib/sendmail '"staffan nöteberg"@rekursiv.se'
…and it works!).
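To see why -@- slips through, the ABNF production quoted above can be transcribed into a JavaScript regex. This is only a sketch under one reading of atext (RFC 5322 section 3.2.3) and ldh-str (RFC 1034 section 3.5), not the code any browser actually runs; specEmail is a made-up name:

```javascript
// Sketch: the quoted production  1*( atext / "." ) "@" ldh-str *( "." ldh-str )
// written as a regex. This is a transcription of the spec text, not a
// browser's real implementation.

// atext: letters, digits and !#$%&'*+-/=?^_`{|}~ ; the production also allows "."
const LOCAL = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+";

// ldh-str: one or more letters, digits or hyphens
const LDH = "[A-Za-z0-9-]+";

const specEmail = new RegExp("^" + LOCAL + "@" + LDH + "(?:\\." + LDH + ")*$");

console.log(specEmail.test("-@-"));                            // true
console.log(specEmail.test('"staffan nöteberg"@rekursiv.se')); // false
```

The hyphen is a legal atext character and also a complete ldh-str, which is why -@- passes, while the quotes and the space in the second address are outside atext.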
However, even though it’s already implemented in many browsers, W3C makes it clear that it’s only a working draft. For the moment there’s a note in the document that they are aware of this error:
NOTE: This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the “@” character), too vague (after the “@” character), and too lax (allowing comments, white space characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
My recommendation is NOT to use the email type value until it has a better implementation.
2. New attribute pattern of the <input> tag
The input tag has several new attributes to specify constraints: autocomplete, min, max, multiple, pattern, and step. I’m particularly interested in the pattern attribute. It’s more generic than the new values of the type attribute mentioned above.
The pattern value is a regex. In what regex dialect? Yes, you guessed it: JavaScript according to ECMA-262 Edition 5. This is a major drawback, since the regex support in JavaScript is modest (e.g. there isn’t even a character class to match a letter, whereas many other regex engines support the Unicode \p{L}). The whole user input must be matched by the regex, not only a fraction. You can look at it as if your regex is prefixed with ^(?: and suffixed with )$.
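The whole-input rule can be imitated outside the browser. A small sketch, assuming the wrapping described above is all that happens; matchesPattern is a hypothetical helper, not a browser API:

```javascript
// Hypothetical helper: apply a pattern attribute value the way the text
// describes, i.e. wrapped in ^(?: ... )$ so the whole input must match.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

console.log(matchesPattern("\\d{3}", "123"));   // true
console.log(matchesPattern("\\d{3}", "a123b")); // false: only a fraction matches
console.log(/\d{3}/.test("a123b"));             // true: an unanchored regex is laxer

// The non-capturing group matters when the pattern contains alternation.
console.log(matchesPattern("cat|dog", "hotdog"));     // false
console.log(new RegExp("^cat|dog$").test("hotdog"));  // true: "dog$" matches alone
```

Without the (?: ) group, the anchors would bind to only the first and last alternative, so the wrapping is more than a plain ^ and $.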
Here are three pragmatic (but not globally perfect) examples I created:
- Strong password:
<input title="at least eight symbols containing at least one number, one lower, and one upper letter" type="text" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" required />
- Email address:
<input type="text" title="email" required pattern="[^@]+@[^@]+\.[a-zA-Z]{2,6}" />
- Phone number:
<input type="text" required pattern="(\+?\d[- .]*){7,13}" title="international, national or local phone number"/>
I leave it as a reader exercise to interpret these regexes. And you can try them too! They are online in this test page:
If you combine type="email" and pattern then both constraints must be fulfilled.
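The three example patterns from this section can also be exercised in plain JavaScript. This is a rough sketch: matchesPattern is a hypothetical helper that mimics the implicit ^(?: … )$ wrapping, and the sample inputs are made up for illustration:

```javascript
// Hypothetical helper mimicking how a pattern attribute value is applied.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

// The pattern values from the three <input> examples (backslashes doubled
// because they are written here as JavaScript string literals).
const password = "(?=.*\\d)(?=.*[a-z])(?=.*[A-Z]).{8,}";
const email = "[^@]+@[^@]+\\.[a-zA-Z]{2,6}";
const phone = "(\\+?\\d[- .]*){7,13}";

console.log(matchesPattern(password, "Passw0rd")); // true
console.log(matchesPattern(password, "password")); // false: no digit, no upper

console.log(matchesPattern(email, "staffan@rekursiv.se"));            // true
console.log(matchesPattern(email, '"staffan nöteberg"@rekursiv.se')); // true
console.log(matchesPattern(email, "-@-"));                            // false

console.log(matchesPattern(phone, "+46 70 123 45 67")); // true: 11 digits
console.log(matchesPattern(phone, "12345"));            // false: too few digits
```

Note that the email pattern behaves the opposite way from the spec's production on the two addresses discussed in section 1.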
Summary
HTML5 form validation is a good idea. The pattern attribute is very generic, despite its rather limited regex dialect. Be careful with the new values of the type attribute, as they are only in prototype status currently.
Finally: what about browser support? I’m in deep water here, but as I understand it there’s support for this kind of validation in IE 10+, Firefox 8+, Chrome 16+, Opera 11.6+, and Opera Mobile 10+. There’s partial or no support in Safari and Android.

A regex that matches any correct email address is almost impossible to write; Jeffrey Friedl has an example in Mastering Regular Expressions which is 3 printed pages long.
Your example would forbid an IP address as server name, which is neither nice nor common but possible.
So it would be really nice if browser vendors would implement something useful …
I agree with you, Chris!
If front-end programmers want to stop the worst non-email entries with the pattern attribute, then it’s about writing a pragmatic regex, e.g. like my example above (good point however about IP domains). These kinds of regexes will be different in different countries, applications, and cultures.
If the browser vendors add an email abstraction, like type="email", then it should adhere exactly to RFC 5322, since it will potentially be used in any kind of application, anywhere in the world.
// Staffan
I agree.
And it seems I switched name and address in my previous comment.
I totally agree that -@- isn’t the kind of email I want. But “name surname”@domain.com, I don’t want it either.
I think there are two kinds of emails : the one everybody knows and that is usual, aka [a-z0-9_+.-]+@(valid domain&tld) (ci)
And the one that was cool for geeks 30 years ago but nobody will use today in reality : mails with quotes, spaces, ip after the @, etc.
type=email should be checking the first one only. If I put something with space and quotes in my database, I’m sure the programs that will read this DB after will fail in their parsing.
To be honest: we don’t need a code that works for 0.01% of the cases of us geeks being geeks. I don’t even complain anymore about mail validations that don’t accept “+”.
In my opinion, [a-z0-9_+.-] is the past and Unicode \p{L}, \p{N} etc. is the future. In Swedish [a-z] is only a part of the alphabet and in many written languages it’s not even a part. Ideally the part to the left of @ should potentially be any ‘@’ free Unicode string, intended to be routed by the terminal mail server.
It is another debate, but to me, emails should be looked at as simple identifiers. Something easy to copy from a business card or even to remember. Like a domain name (that’s why I’m also against using utf8 in domain names).
The complex and fun name can still be used on the left part of a “To:” Sir Jöhñ Åbæ-Ølýk’n Jr. <john.abae-olykn@gmail.com>
Unicode is the future (even the present), but please, not in identifiers…
The simplest reason ? I don’t know how to type Å or 漢 on my keyboard. If I absolutely need to be able to type them to send an email, I won’t send it.
Sorry for being quite off-topic.
thanks for the validation