validating forms (and more) with the html5 pattern attribute
DESCRIPTION
In the past, validating forms in the client has typically required doing some heavy lifting with JavaScript. But you may not know HTML5 changes all that. Browsers now check that the content of an input match its type (and we've got new types like email, url and number to make that even more useful). But, what you might not know about is the pattern attribute, which lets us use regular expressions directly in HTML to specify what format the user's input should have. In this session, Chris Lienert will look at some of the common regex patterns you can use to validate user input, coupled with some of the many tricks he's learned to help users complete those forms we all love to hate.TRANSCRIPT
Validating Forms(AND MORE)
WITH THEHTML5 Pattern
Attribute
@cliener
HTML5 RECAP
http://wufoo.com/html5/Before I get stuck into patterns, I’ll quickly go through some of the important parts of HTML5 forms. If you want to play along at home, have a look at Wufoo’s HTML Forms guide.
HTML5 RECAP
<input type="email">
Most of the new input types come with awesome custom keyboards for iOS and more recent Android (amongst others). Some browser versions offer basic validation in but inconsistent implementations mean they can’t be relied upon.The key new inputs types are “email”,
HTML5 RECAP
<input type="tel">
“tel”,
HTML5 RECAP
<input type="url">
“url” and
<input type="number">
HTML5 RECAP
“number”.
<input type="datetime">
HTML5 RECAP
Unfortunately “datetime”, “time” and “year” input types have been marked as “in danger” by the W3C due to poor and inconsistent browser support. There’s something of a chicken and egg situation when it comes to standards and browser support. Unfortunately for the HTML5 date and time inputs,
the chicken looks something like this.
<input required>
HTML5 RECAP
Important attributes to note are “required”,
<input placeholder= "dd/mm/yyyy">
HTML5 RECAP
“placeholder”
PATTERNS
<input pattern="/d*">
and, of course, “pattern”. Patterns allow you to provide a regular expression string primarily for input validation. A numeric regulation expression like the this one can trigger a numeric keyboard. You’ll often see the above pattern as “[0-9]*” because most people don’t know how to write regular expressions to save themselves.
NOW YOU HAVE TWO PROBLEMS
http://evolt.org/node/36435I can’t remember the origin so I’ll call it an old saying: “You have a problem which can be solved with a regular expression. Now you have two problems.”I won’t pretend regular expressions are the easiest to write however a good guide the one on Evolt will certainly make life easier for you.
PATTERNS
/\d*/.test("12345");>>> true
I find the JS console (in a browser near you) are a handy way to test out regular expressions. This one matches any numeric value
/\d*/.test("Once I caught a fish alive");>>> false
PATTERNS
And correctly returns false to a string.
POSTCODE
/\d{4}/.test("605");>>> false
Australian postcodes are easily matched with this pattern - exactly four digits - returning false here for “605”
/\d{4}/.test("6050");>>> true
POSTCODE
and true here for a correct value.
YEAR
/\d{4}|\d{2}/.test("13");>>> true
Getting slightly more complex with a pipe which acts as an “or” switch, we can match either two or four digit years.
TIME
/(([0-‐1]\d)|(2[0-‐3])):[0-‐5]\d/.test("08:37");>>> true
Square brackets specify a range of values, round brackets group sub-expressions together: here matching a twenty hour time value.
/(([0-‐1]?\d)|(2[0-‐3]))(:|\.)[0-‐5]\d/.test("8.37");>>> true
TIME
With a slight modification, we’re now matching a wider range of natural inputs.
DATE
/\d{2}\/\d{2}\/\d{4}/.test("24/10/2013");>>> true
Dates seem fairly straight-forward
/\d{2}\/\d{2}\/\d{4}/.test("99/99/9999");>>> true
DATE
but are a lot more complex than they seem.
/(?=\d)(?:(?:31(?!.(?:0?[2469]|11))|(?:30|29)(?!.0?2)|29(?=.0?2.(?:(?:(?:1[6-‐9]|[2-‐9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00)))(?:\x20|$))|(?:2[0-‐8]|1\d|0?[1-‐9]))([-‐\/])(?:1[012]|0?[1-‐9])\1(?:1[6-‐9]|[2-‐9]\d)?\d\d(?:(?=\x20\d)\x20|$))?/.test("99/99/9999");>>> false
DATE
A bit more work and we end up with this.
At this point I can understand if you’re a little shocked, so I’ll pull it back a little.
/\w+@\w+\.\w+/.test("[email protected]")>>> true
Email addresses are often something of a mine field but, on the surface, the pattern is fairly simple: “[email protected]”
/\w+@\w+\.\w+/.test("cli+ner@ server.com.au");>>> true
Already we’re matching better than most email validation rules out there.
/\w+@\w+\.\w+/.test("[email protected]");>>> true
Except the simplicity soon gets us unstuck when we get false positives. Given how scary date validation was, it’s tempting to see what Google has to offer
/(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\]\000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-‐\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)/.test("[email protected]");>>> true
http://www.codinghorror.com/blog/2005/02/regex-use-vs-regex-abuse.html
And here is 6.2KB of RFC standard-compliant email regular expression.
Which brings me to a major problem when regular expressions.
Unfortunately, a fair number of programmers out in the wild haven’t quite worked out regular expressions yet so you need to be really careful about what you’re write.
/([\w!#%&$'*+\-‐=?^_`{|}~]+[\w!#%&$'*+\-‐=?^_`{|}~.]+[\w!#%&$'*+\-‐=?^_`{|}~]+|[\w!#%&$'*+\-‐=?^_`{|}~]{1,2})@([A-‐Za-‐z0-‐9-‐]+\.)+[A-‐Za-‐z0-‐9-‐]+/.test("[email protected]");>>> false
For the record, here’s my RFC-compliant email regular expression.
SECURITY
As a whole, password validation is awful.
SECURITY
Over-validation is a great way to make people hate you
SECURITY
because, as much as you might think otherwise, you don’t know as much about them as you think you do.
SECURITY
But then there are people like, presumably, Bob.
And here be dragyns. In the last couple of years, millions of passwords - hashed and otherwise - have been grabbed from various online services which means that everything we though we knew about password validation is, in essence, wrong. This also means that enforced password validation is probably wrong too. So, please don’t. Invest your energy into two factor authentication or something else that’s actually secure rather than incurring the wrath of the users who do know what they’re doing.
"4123 5678 9012 3456".replace(/(\s)/g, "");>>> "4123567890123456"
CREDIT CARDS
While I’ve shown how awesome regular expressions are for pattern validation, they can also manipulate data. Right here is everything you need to strip spaces from a credit card number so I can enter it with spaces as it is on my card. You no longer have any excuses.
ERROR MESSAGES
<input pattern="\d{4}" title="Please enter a valid postcode e.g. 3000">
Rather crucial when doing pattern validation is presenting a meaningful error message to users. In HTML5, browsers are abusing the title attribute to this end. You can also set the .setCustomValidity() DOM property but title will work for most instances.
ERROR MESSAGES
The results aren’t perfect, but they’re a whole lot better than server validation.
In summary, are regular expressions better than kittens? I think the answer is definitively yes.
@cliener
http://www.slideshare.net/cliener/
FURTHER READING
HTML5 Rocks: Constraint Validation: Native Client Side Validation for Web Formshttp://www.html5rocks.com/en/tutorials/forms/constraintvalidation/