regular expressions: javascript and beyond


Upload: max-shirshin

Post on 03-Jul-2015


DESCRIPTION

Regular expressions are a powerful tool for text and data processing. What kind of support do browsers provide for them? What are the little misconceptions that prevent people from using regular expressions effectively? The talk gives an overview of regular expression syntax and typical usage examples.

TRANSCRIPT

Page 1: Regular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And Beyond

Max Shirshin, Frontend Team Lead

deltamethod

Introduction

Types of regular expressions

• POSIX (BRE, ERE)
• PCRE = Perl-Compatible Regular Expressions

From the JavaScript language specification:

"The form and functionality of regular expressions is modelled after the regular expression facility in the Perl 5 programming language".

JS syntax (overview only)

var re = /^foo/;

// boolean
re.test('string');

// null or Array
re.exec('string');
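The difference between the two methods above is easiest to see side by side. The following is a minimal sketch; the variable names and sample strings are made up for illustration.

```javascript
// Illustrative names and inputs; only test() and exec() come from the slide.
const startsWithFoo = /^foo/;

// test() only answers "does it match?"
const hit = startsWithFoo.test('foobar');
const miss = startsWithFoo.test('barfoo');

// exec() returns null, or an array carrying the match and its position.
const found = startsWithFoo.exec('foobar');
const notFound = startsWithFoo.exec('barfoo');

console.log(hit, miss);             // true false
console.log(found[0], found.index); // foo 0
console.log(notFound);              // null
```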

Regular expressions consist of...

● Tokens
  - common characters
  - special characters (metacharacters)

● Operations
  - quantification
  - enumeration
  - grouping

Tokens and metacharacters

Any character

/./.test('foo'); // true

/./.test('\r\n') // false

What you need instead:

/[\s\S]/ for JavaScript, or
/./s (works in Perl/PCRE; not in JS when this talk was given, the s flag arrived later with ES2018)
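As a small sketch of why the dot is not really "any character", the snippet below compares it with the [\s\S] class on a two-line string. The sample text and variable names are assumptions for illustration.

```javascript
// A string containing a line break (illustrative input).
const text = 'line one\r\nline two';

// The dot stops at line terminators, so it cannot span the whole string.
const dotSpansAll = /^.+$/.test(text);

// [\s\S] means "whitespace or non-whitespace", i.e. every character.
const classSpansAll = /^[\s\S]+$/.test(text);

console.log(dotSpansAll);   // false
console.log(classSpansAll); // true
```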

String boundaries

>>> /^something$/.test('something')
true

>>> /^something$/.test('something\nbad')
false

>>> /^something$/m.test('something\nbad')
true
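The effect of the m flag on ^ and $ can be sketched as follows; the log string is a made-up example.

```javascript
// Three lines of illustrative input.
const log = 'ok\nsomething\nbad';

// Without m, ^ and $ only match at the very start and end of the string.
const wholeString = /^something$/.test(log);

// With m, they also match around every line break.
const anyLine = /^something$/m.test(log);

// Combined with g, this collects every single-word line.
const lines = log.match(/^\w+$/gm);

console.log(wholeString); // false
console.log(anyLine);     // true
console.log(lines);       // [ 'ok', 'something', 'bad' ]
```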

Word boundaries

>>> /\ba/.test('alabama')
true
>>> /a\b/.test('alabama')
true

>>> /a\b/.test('naïve')
true
(\b only knows the characters matched by \w, so "ï" counts as a non-word character)

\B is "not a word boundary":
/\Ba/.test('alabama');
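A short sketch of \b and \B in practice, including the non-ASCII pitfall from the slide. The sample words are illustrative assumptions.

```javascript
// \b on both sides: match "cat" only as a whole word.
const wholeWord = /\bcat\b/;
const standalone = wholeWord.test('a cat sat');
const embedded = wholeWord.test('concatenate');

// \B on both sides: match "cat" only inside a longer word.
const inside = /\Bcat\B/.test('concatenate');

// \b treats the non-ASCII letter U+00EF as a word separator.
const pieces = 'na\u00efve'.match(/\b\w+\b/g);

console.log(standalone, embedded, inside); // true false true
console.log(pieces);                       // [ 'na', 've' ]
```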

Character classes

Whitespace

/\s/ (inverted version: /\S/)

FF:
\t \n \v \f \r \u0020 \u00a0 \u1680 \u180e \u2000 \u2001 \u2002 \u2003 \u2004 \u2005 \u2006 \u2007 \u2008 \u2009 \u200a \u2028 \u2029 \u202f \u205f \u3000

Chrome, IE 9:
as in FF, plus \ufeff

IE 7, 8 :-(
only: \t \n \v \f \r \u0020
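To see which characters \s accepts in a given engine, a quick probe like the one below can help. It is only a sketch: the selection of characters and the names are assumptions, and the comment on the last line reflects what a current spec-conforming engine should report, not the older browsers listed on the slide.

```javascript
// A few candidate characters to probe (illustrative selection).
const candidates = {
  tab: '\t',
  nbsp: '\u00a0',
  emSpace: '\u2003',
  lineSeparator: '\u2028',
  bom: '\ufeff',
  letter: 'a'
};

// Record whether \s matches each candidate.
const results = {};
Object.keys(candidates).forEach(function (name) {
  results[name] = /\s/.test(candidates[name]);
});

console.log(results);
// In a spec-conforming engine only "letter" should be false.
```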

Alphanumeric characters

/\d/ ~ digits from 0 to 9

/\w/ ~ Latin letters, digits, underscore
Does not work for Cyrillic, Greek etc.

Inverted forms:
/\D/ ~ anything but digits
/\W/ ~ anything but Latin letters, digits and underscore
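The ASCII-only nature of \w and \d can be checked with a few strings. This is a minimal sketch with made-up inputs.

```javascript
// \w covers Latin letters, digits and underscore...
const latinWord = /^\w+$/.test('hello_42');

// ...but not Cyrillic letters (here: the Russian word for "hello").
const cyrillicWord = /^\w+$/.test('\u043f\u0440\u0438\u0432\u0435\u0442');

// \d covers 0-9 only, not other scripts' digits (U+0663 is ARABIC-INDIC DIGIT THREE).
const asciiDigits = /^\d+$/.test('2015');
const arabicDigit = /\d/.test('\u0663');

console.log(latinWord, cyrillicWord); // true false
console.log(asciiDigits, arabicDigit); // true false
```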

Custom character classes

Example:
/[abc123]/

Metacharacters and ranges supported:
/[A-F\d]/

More than one range is okay:
/[a-cG-M0-7]/

IMPORTANT: ranges come from Unicode, not from national alphabets!

"dot" means just dot!
/[.]/.test('anything') // false

adding \ ] -
/[\\\]-]/
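The points above can be sketched in one place. The Cyrillic case illustrates the "ranges come from Unicode" warning: the letter U+0451 belongs to the Russian alphabet but sits outside the code point range U+0430..U+044F. Variable names and inputs are illustrative.

```javascript
// A range plus a metacharacter: uppercase hex digits.
const hex = /^[A-F\d]+$/;
const upperHex = hex.test('C0FFEE');
const lowerHex = hex.test('c0ffee');

// Ranges follow code points, not alphabets: U+0451 is outside U+0430..U+044F.
const yoInRange = /[\u0430-\u044f]/.test('\u0451');
const yoListed = /[\u0430-\u044f\u0451]/.test('\u0451');

// Inside a class the dot is a literal dot.
const dotInWord = /[.]/.test('anything');
const dotInPath = /[.]/.test('a.b');

// A class containing backslash, closing bracket and hyphen.
const special = /[\\\]-]/;
const specials = ['\\', ']', '-', 'abc'].map(function (s) {
  return special.test(s);
});

console.log(upperHex, lowerHex);   // true false
console.log(yoInRange, yoListed);  // false true
console.log(dotInWord, dotInPath); // false true
console.log(specials);             // [ true, true, true, false ]
```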

Inverted character classes

anything except a, b, c:
/[^abc]/

^ as a character:
/[abc^]/

/[^]/ matches ANY character; could be a nice alternative to /[\s\S]/

Chrome, FF:
>>> /([^])/.exec('a');
['a', 'a']

IE:
>>> /([^])/.exec('a');
['a', '']

IE:
>>> /([\s\S])/.exec('a');
['a', 'a']
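A minimal sketch of inverted classes follows. The last check uses /[^]/ and assumes a current engine; given the IE behaviour shown above, /[\s\S]/ is the more portable spelling.

```javascript
// Anything except a, b, c.
const notAbc = /[^abc]/;
const onlyAbc = notAbc.test('abc');
const hasOther = notAbc.test('abcd');

// A caret that is not in first position is an ordinary character.
const caretLiteral = /[abc^]/.test('^');

// Portable "any character", including a line break.
const portable = /([\s\S])/.exec('\n');

// The empty inverted class; assumes a modern engine.
const emptyInverted = /[^]/.test('\n');

console.log(onlyAbc, hasOther);    // false true
console.log(caretLiteral);         // true
console.log(portable[1] === '\n'); // true
console.log(emptyInverted);        // true
```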

Quantifiers

Zero or more, one or more

/bo*/.test('b') // true

/.*/.test('') // true

/bo+/.test('b') // false

Zero or one

/colou?r/.test('color');  // true
/colou?r/.test('colour'); // true
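The three basic quantifiers can be compared by filtering a few sample strings. This is a sketch with made-up inputs; the patterns are anchored so that the whole string has to match.

```javascript
const samples = ['b', 'bo', 'boooo'];

// * allows zero repetitions, + requires at least one.
const zeroOrMore = samples.filter(function (s) { return /^bo*$/.test(s); });
const oneOrMore = samples.filter(function (s) { return /^bo+$/.test(s); });

// ? makes the preceding token optional (zero or one).
const spellings = ['color', 'colour', 'colouur'].filter(function (s) {
  return /^colou?r$/.test(s);
});

console.log(zeroOrMore); // [ 'b', 'bo', 'boooo' ]
console.log(oneOrMore);  // [ 'bo', 'boooo' ]
console.log(spellings);  // [ 'color', 'colour' ]
```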

How many?

/bo{7}/ exactly 7

/bo{2,5}/ from 2 to 5; in {x,y}, x must not exceed y

/bo{5,}/ 5 or more

This does not work in JS:
/b{,5}/.test('bbbbb')
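The counted quantifiers can be sketched as below. The {,5} part relies on the lenient parsing that browsers and Node.js apply to patterns without the u flag, where the braces are read as literal text; treat that detail as an assumption about the engine.

```javascript
// Helper for building test strings: "b" followed by n letters "o".
function bo(n) {
  return 'b' + 'o'.repeat(n);
}

const range = /^bo{2,5}$/;
const rangeResults = [1, 2, 5, 6].map(function (n) { return range.test(bo(n)); });

const exactlySeven = /^bo{7}$/.test(bo(7));
const fiveOrMore = /^bo{5,}$/.test(bo(9));

// {,5} is not a quantifier here: it is matched as the literal text "{,5}".
const missingLowerBound = /b{,5}/.test('bbbbb');
const literalBraces = /b{,5}/.test('b{,5}');

// Spell out the lower bound instead.
const zeroToFive = /^b{0,5}$/.test('bbbbb');

console.log(rangeResults);                     // [ false, true, true, false ]
console.log(exactlySeven, fiveOrMore);         // true true
console.log(missingLowerBound, literalBraces); // false true
console.log(zeroToFive);                       // true
```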

Greedy quantifiers

var r = /a+/.exec('aaaaa');
>>> r[0]
"aaaaa"

Lazy quantifiers

var r = /a+?/.exec('aaaaa');
>>> r[0]
"a"

r = /a*?/.exec('aaaaa');
>>> r[0]
""
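A typical place where the greedy/lazy distinction matters is matching a delimited fragment. The sketch below uses a made-up markup string; it illustrates the quantifiers only and is not a recommendation to parse HTML this way.

```javascript
// Illustrative input with two tag pairs.
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ grabs as much as possible, up to the last ">".
const greedy = /<.+>/.exec(html)[0];

// Lazy: .+? stops at the first ">" that lets the match succeed.
const lazy = /<.+?>/.exec(html)[0];

console.log(greedy); // <b>bold</b> and <i>italic</i>
console.log(lazy);   // <b>
```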

Groups

capturing
/(boo)/.test("boo");

non-capturing
/(?:boo)/.test("boo");
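Both kinds of group apply a quantifier to a whole sequence; they differ in what ends up in the result array. A minimal sketch with an illustrative input:

```javascript
// Capturing: the group's last match is stored at index 1.
const capturing = /(boo)+!/.exec('booboo!');

// Non-capturing: groups for quantification only, nothing is stored.
const nonCapturing = /(?:boo)+!/.exec('booboo!');

console.log(capturing[0], capturing[1], capturing.length); // booboo! boo 2
console.log(nonCapturing[0], nonCapturing.length);         // booboo! 1
```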

Grouping and the RegExp constructor

var result = /(bo)o+(b)/.exec('the booooob');
>>> RegExp.$1
"bo"
>>> RegExp.$2
"b"
>>> RegExp.$9
""
>>> RegExp.$10
undefined
>>> RegExp.$0
undefined
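The static RegExp.$1 ... RegExp.$9 properties are a legacy feature shared by all regular expressions in the program. The same information is available on the array returned by exec(), which is usually the safer place to read it from. A minimal sketch using the slide's own pattern:

```javascript
const result = /(bo)o+(b)/.exec('the booooob');

// Index 0 is the whole match; 1..n are the capturing groups.
const whole = result[0];
const first = result[1];
const second = result[2];

// Where the match starts in the input.
const position = result.index;

// Groups that do not exist are simply undefined.
const missing = result[9];

console.log(whole, first, second); // booooob bo b
console.log(position);             // 4
console.log(missing);              // undefined
```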

Numbering of capturing groups

/((foo) (b(a)r))/

Groups are numbered by the position of their opening parenthesis, left to right:

$1   foo bar
$2   foo
$3   bar
$4   a
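The numbering can be confirmed by running the pattern against the string "foo bar"; this short sketch only assumes that input.

```javascript
const match = /((foo) (b(a)r))/.exec('foo bar');

// Copy into a plain array to drop the extra index/input properties.
const groups = Array.from(match);

console.log(groups);
// [ 'foo bar', 'foo bar', 'foo', 'bar', 'a' ]
```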

Lookahead

var r = /best(?= match)/.exec('best match');

>>> !!r
true

>>> r[0]
"best"

>>> /best(?! match)/.test('best match')
false
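A lookahead checks what follows without consuming it. The sketch below makes that visible through lastIndex, which stays right after "best"; the second input string is a made-up example.

```javascript
// Positive lookahead: "best" must be followed by " match".
const positive = /best(?= match)/g;
const found = positive.exec('best match');

// The lookahead text is not part of the match and is not consumed.
const matched = found[0];
const resumeAt = positive.lastIndex;

// Negative lookahead: "best" must NOT be followed by " match".
const negative = /best(?! match)/;
const rejects = negative.test('best match');
const accepts = negative.test('best effort');

console.log(matched, resumeAt); // best 4
console.log(rejects, accepts);  // false true
```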

Lookbehind

NOT supported in JavaScript at all (true when this talk was given; lookbehind was standardized later, in ES2018)

/(?<=text)match/ positive lookbehind

/(?<!text)match/ negative lookbehind

Enumerations

Logical "or"

/red|green|blue light/ matches "red", "green" or "blue light"
/(red|green|blue) light/ matches any of the three colours followed by " light"

>>> /var a(;|$)/.test('var a')
true
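Alternation has the lowest precedence, so without a group each alternative extends to the ends of the pattern. A minimal sketch with illustrative inputs:

```javascript
const ungrouped = /red|green|blue light/;
const grouped = /(red|green|blue) light/;

// Ungrouped: "red" alone is a complete alternative.
const ungroupedRed = ungrouped.test('red');
const ungroupedMatch = ungrouped.exec('red light')[0];

// Grouped: " light" is required after any of the colours.
const groupedRed = grouped.test('red');
const groupedMatch = grouped.exec('red light')[0];

// An alternative may also be an anchor: ";" or end of string.
const endOfStatement = ['var a', 'var a;', 'var ab'].map(function (s) {
  return /var a(;|$)/.test(s);
});

console.log(ungroupedRed, ungroupedMatch); // true red
console.log(groupedRed, groupedMatch);     // false red light
console.log(endOfStatement);               // [ true, true, false ]
```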

Backreferences

>>> /(red|green) apple is \1/.test('red apple is red')
true

>>> /(red|green) apple is \1/.test('green apple is green')
true
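A backreference matches the same text the group captured, not the group's pattern again. The sketch below adds a failing case and a doubled-word check; both inputs are made up for illustration.

```javascript
const sameColour = /(red|green) apple is \1/;

const consistent = sameColour.test('red apple is red');
// \1 must repeat "red" here, so "green" does not qualify.
const inconsistent = sameColour.test('red apple is green');

// A common use: spotting an accidentally repeated word.
const doubled = /\b(\w+) \1\b/.exec('it is is fine');

console.log(consistent, inconsistent); // true false
console.log(doubled[1]);               // is
```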

Alternative character representations

Representing a character

\x09 === \t (not Unicode but ASCII/ANSI)
\u20AC === € (in Unicode)

backslash takes away special character meaning:

/\(\)/.test('()') // true
/\\n/.test('\\n') // true

...or vice versa!
/\f/.test('f') // false!
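The escapes above can be verified directly. In this sketch the string literals use their own escapes, so '\\n' is a backslash followed by the letter n, while '\n' is a line feed.

```javascript
// Hex and Unicode escapes name a character by its code.
const tab = /\x09/.test('\t');
const euro = /\u20AC/.test('\u20ac');

// A backslash makes metacharacters literal.
const parens = /\(\)/.test('()');
const backslashN = /\\n/.test('\\n'); // backslash + "n"
const lineFeed = /\\n/.test('\n');    // a real line feed does not match

// A backslash can also give a plain letter a special meaning: \f is form feed.
const letterF = /\f/.test('f');
const formFeed = /\f/.test('\f');

console.log(tab, euro);                   // true true
console.log(parens, backslashN, lineFeed); // true true false
console.log(letterF, formFeed);           // false true
```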

Flags

Regular expression flags

g i m s x y

g  global match
i  ignore case
m  multiline matching for ^ and $

JavaScript does NOT provide support for (the s flag was added later, in ES2018):
s  string as single line
x  extend pattern

Mozilla-only, non-standard (when this talk was prepared; the y flag has since been standardized in ES2015):
y  sticky
Match only from the .lastIndex index (a regexp instance property). Thus, ^ can match at a predefined position.
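The three flags that JavaScript has always supported can be sketched as follows. The last part shows a side effect of g that is easy to overlook: exec() becomes stateful through lastIndex. Inputs are illustrative.

```javascript
const text = 'Foo foo FOO';

// No flags: first case-sensitive match only.
const first = text.match(/foo/);

// g + i: every match, ignoring case.
const all = text.match(/foo/gi);

// m: ^ and $ match at line breaks as well.
const lines = 'a\nb'.match(/^\w$/gm);

// With g, exec() resumes from lastIndex on each call.
const letterO = /o/g;
const indexes = [];
let hit;
while ((hit = letterO.exec('foo')) !== null) {
  indexes.push(hit.index);
}

console.log(first[0], first.index); // foo 4
console.log(all);                   // [ 'Foo', 'foo', 'FOO' ]
console.log(lines);                 // [ 'a', 'b' ]
console.log(indexes);               // [ 1, 2 ]
```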

Alternative syntax for flags

/(?i)foo/
/(?i-m)bar$/
/(?i-sm).x$/
/(?i)foo(?-i)bar/

Some implementations do NOT support flag switching on-the-go.

In JS, flags are set for the whole regexp instance and you can't change them.
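Since flags belong to the whole instance, changing them means building a new regexp. The sketch below shows that, plus one possible workaround for the /(?i)foo(?-i)bar/ case using explicit character classes; the workaround is an illustration, not something the slides prescribe.

```javascript
const strict = /foo/;

// Same pattern source, different flags: a new instance is required.
const relaxed = new RegExp(strict.source, 'i');

const strictResult = strict.test('FOO');
const relaxedResult = relaxed.test('FOO');

// Emulating "case-insensitive foo, case-sensitive bar" by hand.
const mixed = /[Ff][Oo][Oo]bar/;
const mixedOk = mixed.test('FOObar');
const mixedBad = mixed.test('FOOBAR');

console.log(strict.ignoreCase, relaxed.ignoreCase); // false true
console.log(strictResult, relaxedResult);           // false true
console.log(mixedOk, mixedBad);                     // true false
```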

RegExp in JavaScript

Methods

RegExp instances:

/regexp/.exec('string')
    null or array ['whole match', $1, $2, ...]
/regexp/.test('string')
    false or true

String instances:

'str'.match(/regexp/)
'str'.match('\\w{1,3}')
    - same as /regexp/.exec if no 'g' flag used;
    - array of all matches if 'g' flag used (internal capturing groups ignored)

'str'.search(/regexp/)
'str'.search('\\w{1,3}')
    first match index, or -1
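The behaviour of match() with and without the g flag, and of search(), can be sketched on one sample string (the string and names are illustrative).

```javascript
const text = 'one two three';

// Without g: like exec(), groups included.
const single = text.match(/(t)(\w+)/);

// With g: all whole matches, capturing groups are dropped.
const every = text.match(/(t)(\w+)/g);

// A string argument is converted to a regexp first.
const fromString = text.match('\\w{1,3}');

// search() returns an index, or -1 when nothing matches.
const foundAt = text.search(/t/);
const notFoundAt = text.search(/z/);

console.log(Array.from(single), single.index); // [ 'two', 't', 'wo' ] 4
console.log(every);                            // [ 'two', 'three' ]
console.log(fromString[0]);                    // one
console.log(foundAt, notFoundAt);              // 4 -1
```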

String instances:

'str'.replace(/old/, 'new');

WARNING: special magic supported in the replacement string:
    $$  inserts a dollar sign "$"
    $&  substring that matches the regexp
    $`  substring before $&
    $'  substring after $&
    $1, $2, $3 etc.: string that matches n-th capturing group

'str'.replace(/(r)(e)gexp/g, function(matched, $1, $2, offset, sourceString) {
    // what should replace the matched part on this iteration?
    return 'replacement';
});
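A few concrete replacements, as a sketch of the special sequences and of the callback form. All inputs are made up; note that when the pattern has no capturing groups, the callback receives the offset as its second argument.

```javascript
// $1 and $2 refer to the capturing groups.
const swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1');

// $$ is a literal dollar sign, $& is the matched text.
const priced = 'price: 5'.replace(/\d+/, '$$$&');

// Callback without groups: (matched, offset, sourceString).
const offsets = 'a-b-c'.replace(/-/g, function (matched, offset) {
  return '[' + offset + ']';
});

// Callback with groups: (matched, group1, group2, offset, sourceString).
const reordered = 'regexp'.replace(/(r)(e)gexp/g, function (matched, g1, g2) {
  return g2 + g1 + 'gexp';
});

console.log(swapped);   // Smith, John
console.log(priced);    // price: $5
console.log(offsets);   // a[1]b[3]c
console.log(reordered); // ergexp
```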

RegExp injection

// BAD CODE
var userInput = '[abc]'; // oops!
// ...
var re = new RegExp('^' + userInput + '$');

// GOOD, DO IT AT HOME
RegExp.escape = function(text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
};

var re = new RegExp('^' + RegExp.escape(userInput) + '$');
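The difference between the two versions can be demonstrated as follows. This sketch uses a standalone helper, hypothetically named escapeRegExp, with the same character class as above, so that it does not touch the built-in RegExp object.

```javascript
// Hypothetical helper name; the character class is the one from the slide.
function escapeRegExp(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}

const userInput = '[abc]';

// Unescaped: the input is interpreted as a character class.
const unsafe = new RegExp('^' + userInput + '$');

// Escaped: the input is matched literally.
const safe = new RegExp('^' + escapeRegExp(userInput) + '$');

console.log(escapeRegExp(userInput));                  // \[abc\]
console.log(unsafe.test('a'), unsafe.test('[abc]'));   // true false
console.log(safe.test('a'), safe.test('[abc]'));       // false true
```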

Recommended reading

Online, just google it:
MDN Guide on Regular Expressions

The Book:
Mastering Regular Expressions
O'Reilly Media

Thank you!