An Introduction to Regular Expressions

Posted on 27-May-2015

And are they contagious?

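There is no single official standard for regular expressions, so there is no one definition. (POSIX does define basic and extended regular expression syntaxes, but most tools go well beyond them.)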

Simply put, a regular expression is a text pattern that you use to search for and/or replace text.

Easy peasy!
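To make "search and/or replace" concrete, here is a minimal sketch using Python's standard re module. Python is only our choice for illustration (the slides are not tied to one tool), and the sample sentence is made up.

```python
import re

text = "Jack is a boy. Jack likes cats."

# Search: re.search returns the first match (or None if there is none)
found = re.search(r"Jack", text)
print(found.group(), found.start())  # Jack 0

# Replace: re.sub replaces every match of the pattern
replaced = re.sub(r"Jack", "Jill", text)
print(replaced)  # Jill is a boy. Jill likes cats.
```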

Regular expressions come in several flavours, for example: the Perl programming language, Perl-compatible (PCRE), .NET, Java and JavaScript.

… What, no cherry flavour?

Back to grammar school!

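A literal character matches any occurrence of that character: the regex a matches both a characters in "Jack is a boy." A sequence of characters works the same way: cat matches the cat in "About cats and dogs."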
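A quick check of the two literal examples with Python's re module (used here only for illustration). Note that a literal pattern also matches inside a longer word, which is why cat is found in cats.

```python
import re

# A single literal character matches every occurrence of that character
a_matches = re.findall(r"a", "Jack is a boy.")
print(a_matches)  # ['a', 'a']

# A literal sequence matches wherever it occurs, even inside a longer word
m = re.search(r"cat", "About cats and dogs.")
print(m.group(), m.span())  # cat (6, 9)
```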

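The following special characters (often called metacharacters) have a special meaning: the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket (, the closing round bracket ) and the opening curly bracket {.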

These special characters are reserved for special use. To match one of them as a literal character, precede it with a backslash; this is called escaping. For example, to match 1+1=2 the correct regex is 1\+1=2, because an unescaped + means "one or more times".
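The sketch below shows why the backslash matters, again using Python's re module as an illustrative engine. The helper re.escape is specific to Python; the exact set of characters it escapes changed in Python 3.7, so the printed form assumes 3.7 or later.

```python
import re

text = "1+1=2"

# Unescaped, + is a quantifier: "1+1=2" means "one or more 1s, then 1=2"
unescaped = re.search(r"1+1=2", text)
print(unescaped)  # None

# Escaped, \+ matches a literal plus sign
escaped = re.search(r"1\+1=2", text)
print(escaped.group())  # 1+1=2

# re.escape adds the backslashes for you
print(re.escape(text))  # 1\+1=2  (Python 3.7 and later)
```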

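Special sequences for characters and positions:
\t tab
\r carriage return
\n line feed
^ beginning of line
$ end of line
\b word boundary
In many flavours, ^ and $ match at the beginning and end of every line only in multiline mode; otherwise they match at the beginning and end of the whole text.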
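A small sketch of these sequences in Python's re module. The sample string is invented; re.MULTILINE is Python's name for multiline mode.

```python
import re

text = "cat\tcatalog\nconcat cat"

# \t and \n match a tab and a line feed
print(re.split(r"\t|\n", text))  # ['cat', 'catalog', 'concat cat']

# \b: match cat only as a whole word (not inside catalog or concat)
whole_words = re.findall(r"\bcat\b", text)
print(whole_words)  # ['cat', 'cat']

# With re.MULTILINE, ^ and $ match at the start and end of every line;
# without it, ^ matches only at the start of the string and $ only at
# its end (or just before a final newline)
starts = re.findall(r"^\w+", text, re.MULTILINE)
ends = re.findall(r"\w+$", text, re.MULTILINE)
print(starts, ends)  # ['cat', 'concat'] ['catalog', 'cat']
```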

If your regex engine supports Unicode, you can match any character by its Unicode value. Depending on the syntax, this is written \u0000 or \x{0000}. Hard (non-breaking) space: \u00A0 or \x{00A0}. ® sign: \u00AE or \x{00AE}. ...
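Python's re module uses the \u0000 form (it has no \x{0000} syntax). A minimal sketch, with a made-up sample string containing a hard space and a ® sign:

```python
import re

# Sample text containing a non-breaking (hard) space and a registered sign
text = "Yamagata\u00a0Europe\u00ae"

# Python's re understands \uXXXX escapes in str patterns (not \x{XXXX})
print(re.findall(r"\u00AE", text))  # ['®']

# Replace every hard space with an ordinary space
normal = re.sub(r"\u00A0", " ", text)
print(normal)  # Yamagata Europe®
```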

Quantifiers let you specify how many occurrences of an item to match:
X? matches X once or not at all
X* matches X zero or more times
X+ matches X one or more times
X{n} matches X exactly n times
X{n,} matches X at least n times
X{n,m} matches X at least n but not more than m times

The regex colou?r matches both colour and color. You can also group items with round brackets: Nov(ember)? matches both Nov and November. The regex a+ is the same as a{1,} and matches a as well as aaaaa. The regex w{3} matches the www in www.qa-distiller.com.
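The quantifier examples can be checked with re.fullmatch, which in Python succeeds only when the whole string matches. The a{2,3} case is our own addition to show an upper bound being exceeded.

```python
import re

# re.fullmatch succeeds only if the pattern matches the entire string
cases = [
    (r"colou?r", "colour"), (r"colou?r", "color"),
    (r"Nov(ember)?", "Nov"), (r"Nov(ember)?", "November"),
    (r"a+", "a"), (r"a+", "aaaaa"), (r"a{1,}", "aaaaa"),
]
results = [bool(re.fullmatch(p, s)) for p, s in cases]
print(results)  # [True, True, True, True, True, True, True]

# w{3} finds exactly three w characters in a row
print(re.search(r"w{3}", "www.qa-distiller.com").group())  # www

# {n,m} sets both bounds: four a's is one too many for a{2,3}
print(re.fullmatch(r"a{2,3}", "aaaa"))  # None
```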

To match one character out of several, simply place those characters between square brackets; this is called a character class. To match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey. A character class matches only a single character, and the order of the characters inside it is not important. You can also use ranges: [0-9] matches a single digit between 0 and 9.
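A short illustration of character classes and ranges in Python's re module; the sample sentences are invented.

```python
import re

text = "Is it gray or grey? Not gruy."

# [ae] matches exactly one character: an a or an e
print(re.findall(r"gr[ae]y", text))  # ['gray', 'grey']

# The order inside the class does not matter
print(re.findall(r"gr[ea]y", text))  # ['gray', 'grey']

# A range: [0-9] matches one digit, so "42" gives two matches
print(re.findall(r"[0-9]", "Room 42"))  # ['4', '2']
```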

Typing a caret right after the opening square bracket negates the character class. q[^u] means: "a q followed by a character that is not a u". In "Iraq is a political quagmire." it matches the q and the space after it, but not the q of quagmire, because that q is followed by the letter u.
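The negated class in Python's re module, plus one edge case worth knowing: a negated class still has to consume a character, so it cannot match "nothing".

```python
import re

text = "Iraq is a political quagmire."
# q followed by any single character other than u
print(re.findall(r"q[^u]", text))  # ['q ']

# A negated class still has to match one character, so a q
# at the very end of the text is not matched
print(re.findall(r"q[^u]", "Iraq"))  # []
```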

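Shorthand character classes:
\d digit, [0-9]
\w word character, [A-Za-z0-9_]
\s whitespace, [ \t\r\n] (most flavours also include \f and \v)
Negated versions:
\D not a digit, [^\d]
\W not a word character, [^\w]
\S not whitespace, [^\s]
In Unicode-aware flavours, \d, \w and \s can also match non-ASCII characters.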
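The shorthand classes in Python's re module. The sample text is invented; re.ASCII is Python's flag for restricting \d, \w and \s to ASCII, and U+0661 is an Arabic-Indic digit one.

```python
import re

text = "Order 66\tshipped_today"
print(re.findall(r"\d+", text))    # ['66']
print(re.findall(r"\w+", text))    # ['Order', '66', 'shipped_today']
print(re.findall(r"\s", text))     # [' ', '\t']
print(re.findall(r"\D+", "a1b2"))  # ['a', 'b']

# In Python 3, \d is Unicode-aware by default; re.ASCII restricts it
print(re.findall(r"\d", "\u0661"))            # ['١']
print(re.findall(r"\d", "\u0661", re.ASCII))  # []
```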

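The dot matches any single character, without caring what that character is (in most flavours, any character except a line break, unless you enable a "dot matches all" option). In "Houston, we have a problem", the regex e. matches the e and the following space in we and in have, and em in problem.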
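The dot example in Python's re module, including the line-break behaviour; re.DOTALL is Python's name for the "dot matches all" option.

```python
import re

print(re.findall(r"e.", "Houston, we have a problem"))  # ['e ', 'e ', 'em']

# By default the dot does not match a newline; re.DOTALL changes that
print(re.findall(r"e.", "one\ntwo"))             # []
print(re.findall(r"e.", "one\ntwo", re.DOTALL))  # ['e\n']
```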

To search for cat or dog, separate both options with a vertical bar or pipe symbol: cat|dog matches the cat in "Are you sure you want a cat?" You can add more options like this: green|black|yellow|white.
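Alternation in Python's re module. The last line uses (?:...), a non-capturing group that the slides do not cover; it is there only to show that alternation has the lowest precedence, so a group is needed to limit its scope.

```python
import re

print(re.findall(r"cat|dog", "Are you sure you want a cat?"))  # ['cat']
print(re.findall(r"green|black|yellow|white", "black and white"))  # ['black', 'white']

# Group the alternatives so the s? and \b apply to both of them
print(re.findall(r"\b(?:cat|dog)s?\b", "cats and dogs"))  # ['cats', 'dogs']
```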

Quiz: for the regex a(ab)*a, which of the following strings are complete matches? 1) abababa 2) aaba 3) aabbaa 4) aba 5) aabababa

For the regex ab+c?, which of the following strings are complete matches? 1) abc 2) ac 3) abbb 4) bbc 5) abbcc

For the regex a.[bc]+, which of the following strings are complete matches? 1) abc 2) abbbbbbbb 3) azc 4) abcbcbcbc 5) ac 6) asccbbbbcbcccc

For the regex (very )+(fat )?(tall|ugly) man, which of the following strings are complete matches? 1) very fat man 2) fat tall man 3) very very fat ugly man 4) very very very tall man
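Spoiler: a small script that checks the quiz answers with Python's re.fullmatch. We assume the question mark in ab+c? is part of the regex (an optional c); if it is only punctuation, ab+c matches option 1 alone.

```python
import re

quiz = {
    r"a(ab)*a": ["abababa", "aaba", "aabbaa", "aba", "aabababa"],
    r"ab+c?": ["abc", "ac", "abbb", "bbc", "abbcc"],
    r"a.[bc]+": ["abc", "abbbbbbbb", "azc", "abcbcbcbc", "ac", "asccbbbbcbcccc"],
    r"(very )+(fat )?(tall|ugly) man": [
        "very fat man", "fat tall man",
        "very very fat ugly man", "very very very tall man",
    ],
}

answers = {}
for pattern, candidates in quiz.items():
    # Option numbers (starting at 1) whose whole string matches the pattern
    answers[pattern] = [i for i, s in enumerate(candidates, 1)
                        if re.fullmatch(pattern, s)]
    print(pattern, answers[pattern])
```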

Still awake?

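Positive lookahead: X(?=Y) matches X only if it is followed by Y. Yamagata(?= Europe) matches the Yamagata in "Yamagata Europe", but not the one in "Yamagata Intech Solutions". Negative lookahead: X(?!Y) matches X only if it is not followed by Y. Yamagata(?! Europe) matches the Yamagata in "Yamagata Intech Solutions", but not the one in "Yamagata Europe". In both cases the lookahead is not part of the match: only Yamagata is matched.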
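Both lookaheads in Python's re module, printing the start position of each Yamagata that is matched:

```python
import re

text = "Yamagata Europe, Yamagata Intech Solutions"

pos = [m.start() for m in re.finditer(r"Yamagata(?= Europe)", text)]
neg = [m.start() for m in re.finditer(r"Yamagata(?! Europe)", text)]
print(pos, neg)  # [0] [17]

# The lookahead checks " Europe" but does not include it in the match
print(re.search(r"Yamagata(?= Europe)", text).group())  # Yamagata
```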

Positive lookbehind: (?<=Y)X matches X only if it is preceded by Y. (?<=a)b matches the first b in thingamabob (the one right after an a). Negative lookbehind: (?<!Y)X matches X only if it is not preceded by Y. (?<!a)b matches the last b in thingamabob.
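The lookbehind examples in Python's re module, printing the position (span) of each matched b. Note that Python, like several other flavours, only accepts fixed-width lookbehind patterns.

```python
import re

word = "thingamabob"
after_a = [m.span() for m in re.finditer(r"(?<=a)b", word)]
not_after_a = [m.span() for m in re.finditer(r"(?<!a)b", word)]
print(after_a)      # [(8, 9)]
print(not_after_a)  # [(10, 11)]
```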

Round brackets create a capturing group. You can refer to the text a group matched with a backreference: a backslash followed by the number of the group. The regex Java(script) is a \1ing language matches "Javascript is a scripting language". The regex (Java)(script) is a \2ing language that is not the same as \1 matches "Javascript is a scripting language that is not the same as Java".
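Python's re module supports the same \1 and \2 syntax inside a pattern. The third check, with "typing" instead of "scripting", is our own counterexample.

```python
import re

s1 = "Javascript is a scripting language"
s2 = s1 + " that is not the same as Java"

m1 = re.fullmatch(r"Java(script) is a \1ing language", s1)
m2 = re.fullmatch(r"(Java)(script) is a \2ing language that is not the same as \1", s2)
print(m1.group(1))               # script
print(m2.group(1), m2.group(2))  # Java script

# \1 must repeat exactly what group 1 captured
miss = re.fullmatch(r"Java(script) is a \1ing language", "Javascript is a typing language")
print(miss)  # None
```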

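Use the regex \b(\w+) \1\b to find doubled words. In the Dutch sentence "Ze streelde haar haar in in de auto." ("She stroked her hair in in the car.") it finds both haar haar and in in. But haar haar ("her hair") is correct Dutch, so you can exclude it with a negative lookahead: \b(?!haar\b)(\w+) \1\b finds only in in.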
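Running both doubled-word patterns with Python's re.findall, which returns the captured group (the repeated word) for each match:

```python
import re

text = "Ze streelde haar haar in in de auto."

doubled = re.findall(r"\b(\w+) \1\b", text)
print(doubled)  # ['haar', 'in']

# The negative lookahead skips any candidate that starts with the word haar
doubled_except_haar = re.findall(r"\b(?!haar\b)(\w+) \1\b", text)
print(doubled_except_haar)  # ['in']
```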

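Suppose you want to add brackets around the step numbers in: "This is step 5 from chapter 1. Continue with step 45 from page 15." Use the regex ([sS]tep) (\d+) to find all instances and replace them with \1 (\2). Alternatively, find (?<=[sS]tep )\d+ and replace it with (\0), where \0 stands for the whole match. The syntax for the whole match differs between tools: some use \0, others $0, $& or \g<0>.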
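Both replacements in Python's re.sub. In Python's replacement syntax the whole match is written \g<0> rather than \0, so that is what the second call uses.

```python
import re

text = "This is step 5 from chapter 1. Continue with step 45 from page 15."

# Two groups: the word step and the number
with_groups = re.sub(r"([sS]tep) (\d+)", r"\1 (\2)", text)

# Lookbehind: match only the number, then wrap the whole match
with_lookbehind = re.sub(r"(?<=[sS]tep )\d+", r"(\g<0>)", text)

print(with_groups)  # This is step (5) from chapter 1. Continue with step (45) from page 15.
print(with_groups == with_lookbehind)  # True
```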

Powerful, for individual text-based files

More powerful, batch operations, command line

No back references

RegEx Text File Filter

RegEx search

Very limited

Powerful, called GREP

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (a quip usually attributed to Jamie Zawinski)
-> Do not try to do everything in one uber-regex.
-> Regular expressions are not parsers.
