An Introduction to Regular Expressions
![Page 1: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/1.jpg)
![Page 2: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/2.jpg)
And are they contagious?
![Page 3: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/3.jpg)
There is no single official standard for regular expressions, so there is no one authoritative definition.
Simply put, a regular expression is a text pattern used to search and/or replace text.
Easy peasy!
![Page 4: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/4.jpg)
Perl programming language
Perl-compatible
.NET
Java
JavaScript
… What, no cherry flavour?
![Page 5: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/5.jpg)
Back to grammar school!
![Page 6: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/6.jpg)
Literal characters match themselves.
The regex a matches every occurrence of that character: both a's in "Jack is a boy."
The regex cat matches the cat inside cats in "About cats and dogs."
![Page 7: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/7.jpg)
Special characters:
opening square bracket [
backslash \
caret ^
dollar sign $
period or dot .
vertical bar or pipe symbol |
question mark ?
asterisk or star *
plus sign +
opening round bracket (
closing round bracket )
opening curly bracket {
![Page 8: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/8.jpg)
Special characters are reserved for special use. To match one of them as a literal character, precede it with a backslash. This is called escaping.
If you want to match 1+1=2, the correct regex is 1\+1=2
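A minimal sketch of escaping in Python. The slides do not name a programming language, so the use of Python's `re` module is an assumption, and the sample sentence is made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "We all know that 1+1=2, not 11=2."

# Unescaped, + is a quantifier: "one or more 1s, followed by 1=2".
unescaped = re.findall(r"1+1=2", text)

# Escaped, \+ matches a literal plus sign.
escaped = re.findall(r"1\+1=2", text)

# re.escape() inserts the backslashes for you.
auto = re.findall(re.escape("1+1=2"), text)

print(unescaped)  # ['11=2']
print(escaped)    # ['1+1=2']
print(auto)       # ['1+1=2']
```

Note that the unescaped pattern does not fail loudly: it silently matches a different string, which is why forgetting to escape is easy to miss.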
![Page 9: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/9.jpg)
tab \t
carriage return \r
line feed \n
beginning of line ^
end of line $
word boundary \b
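A short sketch of the anchors and the word boundary, assuming Python's `re` module (the slides are not tied to one language). In Python, ^ and $ only match at every line when the `re.MULTILINE` flag is set; other flavours may behave differently. The sample text is made up.

```python
import re

# Made-up two-line sample text.
text = "cat concatenate\ncatalog cat"

# \b marks a word boundary, so \bcat\b skips "concatenate" and "catalog".
whole_words = re.findall(r"\bcat\b", text)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(whole_words)  # ['cat', 'cat']
print(line_starts)  # ['cat', 'catalog']
print(line_ends)    # ['concatenate', 'cat']
```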
![Page 10: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/10.jpg)
If the regex engine is Unicode-enabled, you can search for any character by its Unicode value.
Depending on the syntax: \u0000 or \x{0000}
Non-breaking (hard) space: \u00A0 or \x{00A0}
® sign: \u00AE or \x{00AE}
...
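As an illustration, Python's `re` module accepts the \uXXXX form in string patterns (the choice of Python is an assumption; other tools may need \x{XXXX}). The product name in the sample text is invented.

```python
import re

# Invented sample text containing a registered sign and a non-breaking space.
text = "Acme\u00AE costs 5\u00A0EUR"

# Replace every non-breaking space with a regular space.
cleaned = re.sub(r"\u00A0", " ", text)

# Check whether the text contains a registered sign.
has_registered = re.search(r"\u00AE", text) is not None

print(cleaned)
print(has_registered)  # True
```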
![Page 11: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/11.jpg)
Quantifiers allow you to specify the number of occurrences to match against:
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
![Page 12: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/12.jpg)
The regex colou?r matches both colour and color.
You can also group items together with round brackets: Nov(ember)? matches both Nov and November.
The regex a+ is the same as a{1,} and matches a or aaaaa.
The regex w{3} matches the www in www.qa-distiller.com
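The quantifier examples above can be tried out with a small Python sketch (Python is an assumption; `fully_matches` is a hypothetical helper name, and the extra test words are made up).

```python
import re

def fully_matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ? makes the preceding item optional.
colour = [w for w in ["color", "colour", "colouur"] if fully_matches(r"colou?r", w)]

# ? applied to a group makes the whole group optional.
months = [w for w in ["Nov", "November", "Novem"] if fully_matches(r"Nov(ember)?", w)]

# a+ and a{1,} find the same runs of a.
a_plus = re.findall(r"a+", "a aaaaa b")
a_brace = re.findall(r"a{1,}", "a aaaaa b")

# w{3} matches exactly three w characters.
www = re.search(r"w{3}", "www.qa-distiller.com").group(0)

print(colour)   # ['color', 'colour']
print(months)   # ['Nov', 'November']
print(a_plus)   # ['a', 'aaaaa']
print(www)      # www
```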
![Page 13: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/13.jpg)
Simply place the characters you want to match between square brackets.
If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey.
A character class matches only a single character, and the order of the characters inside it is not important.
You can also use ranges: [0-9] matches a single digit between 0 and 9.
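A quick Python sketch of character classes (Python is an assumption; the sample strings are made up). It shows that a class consumes exactly one character and that the order inside the brackets does not matter.

```python
import re

sample = "gray, grey, griy, graey"

# [ae] matches exactly one character, either a or e.
greys = re.findall(r"gr[ae]y", sample)

# The order inside the class makes no difference.
greys_reordered = re.findall(r"gr[ea]y", sample)

# A range: one digit per match.
digits = re.findall(r"[0-9]", "Room 42, floor 7")

print(greys)   # ['gray', 'grey']
print(digits)  # ['4', '2', '7']
```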
![Page 14: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/14.jpg)
Typing a caret after the opening square bracket negates the character class.
q[^u] means: "a q followed by a character that is not a u".
In "Iraq is a political quagmire." it matches the q of Iraq together with the space after it, but not the q of quagmire, because that q is followed by the letter u.
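The negated class can be checked with a small Python sketch (Python is an assumption). It also shows a common surprise: a negated class still has to match a character, so a q at the very end of the text does not match.

```python
import re

text = "Iraq is a political quagmire."

# q followed by any single character that is not u.
matches = re.findall(r"q[^u]", text)

# [^u] must consume a character, so a trailing q finds nothing.
at_end = re.findall(r"q[^u]", "Iraq")

print(matches)  # ['q ']
print(at_end)   # []
```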
![Page 15: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/15.jpg)
\d digit [0-9]
\w word character [A-Za-z0-9_]
\s whitespace [ \t\r\n]
Negated versions:
\D not a digit [^\d]
\W not a word character [^\w]
\S not whitespace [^\s]
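A short sketch of the shorthand classes, assuming Python. By default Python's \d, \w and \s also match non-ASCII digits, letters and whitespace in text strings; the `re.ASCII` flag used here restricts them to ASCII-only sets like the ones listed above. The sample text is made up.

```python
import re

# Made-up sample text; \t is a real tab character.
text = "Order 66\twas_executed at 10:45"

digits = re.findall(r"\d+", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)
fields = re.split(r"\s+", text, flags=re.ASCII)

# \D is the negation of \d: everything that is not a digit.
non_digits = re.sub(r"\d", "", "10:45", flags=re.ASCII)

print(digits)      # ['66', '10', '45']
print(words)       # ['Order', '66', 'was_executed', 'at', '10', '45']
print(fields)      # ['Order', '66', 'was_executed', 'at', '10:45']
print(non_digits)  # :
```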
![Page 16: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/16.jpg)
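The dot matches a single character, without caring what that character is. In most flavours it does not match a line break unless a single-line (dot-all) option is enabled.
The regex e. matches three times in "Houston, we have a problem": the e of we plus the following space, the e of have plus the following space, and the em of problem.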
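The dot example can be reproduced in Python (an assumed language choice). The second half shows Python's behaviour with line breaks, where the dot only matches a newline when `re.DOTALL` is set.

```python
import re

# e followed by any one character.
matches = re.findall(r"e.", "Houston, we have a problem")

# By default the dot does not match a newline in Python.
no_newline = re.findall(r"e.", "e\n")
with_dotall = re.findall(r"e.", "e\n", flags=re.DOTALL)

print(matches)      # ['e ', 'e ', 'em']
print(no_newline)   # []
print(with_dotall)  # ['e\n']
```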
![Page 17: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/17.jpg)
If you want to search for cat or dog, separate both options with a vertical bar or pipe symbol:
cat|dog matches the cat in "Are you sure you want a cat?"
You can add more options like this: green|black|yellow|white
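A minimal Python sketch of alternation (Python is an assumption; the second sentence in each sample is made up to give every option a chance to match).

```python
import re

pets = re.findall(r"cat|dog", "Are you sure you want a cat? I would pick a dog.")

colours = re.findall(
    r"green|black|yellow|white",
    "A black and white photo of a yellow car",
)

print(pets)     # ['cat', 'dog']
print(colours)  # ['black', 'white', 'yellow']
```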
![Page 18: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/18.jpg)
Which of the following completely matches regex a(ab)*a
1) abababa
2) aaba
3) aabbaa
4) aba
5) aabababa
![Page 19: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/19.jpg)
Which of the following completely matches regex ab+c?
1) abc
2) ac
3) abbb
4) bbc
5) abbcc
![Page 20: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/20.jpg)
Which of the following completely matches regex a.[bc]+
1) abc
2) abbbbbbbb
3) azc
4) abcbcbcbc
5) ac
6) asccbbbbcbcccc
![Page 21: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/21.jpg)
Which of the following completely matches regex (very )+(fat )?(tall|ugly) man
1) very fat man
2) fat tall man
3) very very fat ugly man
4) very very very tall man
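If you would rather check your answers than take them on trust, the four quizzes can be run through a short Python script (Python is an assumption; `complete_matches` is a hypothetical helper name). `re.fullmatch` succeeds only when the entire candidate matches, which is what "completely matches" asks for. The second pattern is read here as ab+c? with the trailing question mark as part of the regex, since the other quiz slides carry no question mark; that reading is an assumption. The output is left out on purpose so the quiz is not spoiled.

```python
import re

# Pattern -> candidate strings, copied from the four quiz slides.
quizzes = {
    r"a(ab)*a": ["abababa", "aaba", "aabbaa", "aba", "aabababa"],
    r"ab+c?": ["abc", "ac", "abbb", "bbc", "abbcc"],  # trailing ? assumed to be part of the regex
    r"a.[bc]+": ["abc", "abbbbbbbb", "azc", "abcbcbcbc", "ac", "asccbbbbcbcccc"],
    r"(very )+(fat )?(tall|ugly) man": [
        "very fat man",
        "fat tall man",
        "very very fat ugly man",
        "very very very tall man",
    ],
}

def complete_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

answers = {p: complete_matches(p, c) for p, c in quizzes.items()}

for pattern, matched in answers.items():
    print(pattern, "->", matched)
```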
![Page 22: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/22.jpg)
Still awake?
![Page 23: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/23.jpg)
Positive lookahead: X(?=Y)
Match X only if it is followed by Y. The Y part is checked but is not included in the match.
Yamagata(?= Europe) matches the first Yamagata in "Yamagata Europe, Yamagata Intech Solutions"
Negative lookahead: X(?!Y)
Match X only if it is not followed by Y.
Yamagata(?! Europe) matches the second Yamagata in "Yamagata Europe, Yamagata Intech Solutions"
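A small Python sketch (Python is an assumption) makes the difference visible by printing where each lookahead pattern matches. In both cases the matched text is just Yamagata; the lookahead only decides which occurrence qualifies.

```python
import re

text = "Yamagata Europe, Yamagata Intech Solutions"

# Start positions of Yamagata when it IS followed by " Europe".
positive = [m.start() for m in re.finditer(r"Yamagata(?= Europe)", text)]

# Start positions of Yamagata when it is NOT followed by " Europe".
negative = [m.start() for m in re.finditer(r"Yamagata(?! Europe)", text)]

# The lookahead is not part of the match itself.
matched_text = re.search(r"Yamagata(?= Europe)", text).group(0)

print(positive)      # [0]
print(negative)      # [17]
print(matched_text)  # Yamagata
```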
![Page 24: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/24.jpg)
Positive lookbehind: (?<=Y)X
Match X only if it follows Y.
(?<=a)b matches the first b in thingamabob (the one preceded by a)
Negative lookbehind: (?<!Y)X
Match X only if it does not follow Y.
(?<!a)b matches the last b in thingamabob (the one preceded by o)
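The same check works for lookbehind. This is a sketch assuming Python; note that not every flavour supports lookbehind, and Python's `re` only accepts lookbehind patterns of fixed width.

```python
import re

word = "thingamabob"

# Positions of b that are preceded by a.
after_a = [m.start() for m in re.finditer(r"(?<=a)b", word)]

# Positions of b that are not preceded by a.
not_after_a = [m.start() for m in re.finditer(r"(?<!a)b", word)]

print(after_a)      # [8]
print(not_after_a)  # [10]
```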
![Page 25: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/25.jpg)
Round brackets create a capturing group. You can refer back to the text a group matched with a backreference: a backslash followed by the number of the group.
The regex Java(script) is a \1ing language matches "Javascript is a scripting language"
The regex (Java)(script) is a \2ing language that is not the same as \1 matches "Javascript is a scripting language that is not the same as Java"
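Both backreference examples can be verified with a brief Python sketch (Python is an assumption; the non-matching sentence is made up to show that \1 must repeat exactly what the group captured).

```python
import re

one_group = re.fullmatch(
    r"Java(script) is a \1ing language",
    "Javascript is a scripting language",
)

two_groups = re.fullmatch(
    r"(Java)(script) is a \2ing language that is not the same as \1",
    "Javascript is a scripting language that is not the same as Java",
)

# \1 must repeat the captured text, so "coding" does not satisfy "\1ing".
mismatch = re.fullmatch(
    r"Java(script) is a \1ing language",
    "Javascript is a coding language",
)

print(one_group.group(1))   # script
print(two_groups.groups())  # ('Java', 'script')
print(mismatch)             # None
```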
![Page 26: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/26.jpg)
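Use the regex \b(\w+) \1\b to find doubled words. In the Dutch sentence "Ze streelde haar haar in in de auto." ("She stroked her hair in in the car.") it finds haar haar and in in.
With exceptions: \b(?!haar\b)(\w+) \1\b skips haar haar, which is correct Dutch ("her hair"), and finds only in in.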
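A sketch of the doubled-word search in Python (an assumed language choice). `finditer` with `group(0)` is used rather than `findall`, because `findall` would return only the captured word instead of the whole doubled pair.

```python
import re

text = "Ze streelde haar haar in in de auto."

# Every word that is immediately repeated.
doubled = [m.group(0) for m in re.finditer(r"\b(\w+) \1\b", text)]

# Same search, but the negative lookahead rejects the word "haar".
with_exception = [
    m.group(0) for m in re.finditer(r"\b(?!haar\b)(\w+) \1\b", text)
]

print(doubled)         # ['haar haar', 'in in']
print(with_exception)  # ['in in']
```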
![Page 27: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/27.jpg)
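You want to add round brackets around step numbers:
This is step 5 from chapter 1. Continue with step 45 from page 15.
Use the regex ([sS]tep) (\d+) to find all instances. Replace it by \1 (\2)
Or alternatively, replace (?<=[sS]tep )\d+ by (\0), where \0 stands for the whole match. The exact replacement syntax depends on the tool; some use $0 or $& instead.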
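Both replacement strategies can be sketched in Python (Python is an assumption). Python's `re.sub` does not use \0 for the whole match in a replacement string, so the second variant writes it as \g<0>.

```python
import re

text = "This is step 5 from chapter 1. Continue with step 45 from page 15."

# Variant 1: capture the word and the number, then rebuild with brackets.
with_groups = re.sub(r"([sS]tep) (\d+)", r"\1 (\2)", text)

# Variant 2: match only the number after "step " and wrap the whole match.
with_lookbehind = re.sub(r"(?<=[sS]tep )\d+", r"(\g<0>)", text)

print(with_groups)
# This is step (5) from chapter 1. Continue with step (45) from page 15.
print(with_groups == with_lookbehind)  # True
```

The chapter and page numbers are left alone in both variants, because neither pattern matches a number that does not directly follow the word step.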
![Page 28: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/28.jpg)
Powerful, for individual text-based files
More powerful, batch operations, command line
No back references
RegEx Text File Filter
RegEx search
Very limited
Powerful, called GREP
![Page 29: An Introduction to Regular expressions](https://reader033.vdocuments.mx/reader033/viewer/2022052315/55658e4bd8b42a2b6d8b4f05/html5/thumbnails/29.jpg)
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-> Do not try to do everything in one uber-regex
-> Regular expressions are not parsers