Perl regular expressions: string matching

Post on 22-Dec-2015



string matching

• For this lecture, we focus on string matching using an if statement
• The format
—if ($str =~ /pattern to match/) # true when match
—if ($str !~ /pattern to match/) # true when no match
—the same as
—if ($str =~ m/pattern to match/) # true when match
—if ($str !~ m/pattern to match/) # true when no match
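A minimal sketch of the two operators, assuming a sample string and variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $str = "hot dog stand";

# =~ is true when the pattern is found somewhere in the string.
my $has_dog = ($str =~ /dog/)  ? 1 : 0;
# !~ is true when the pattern is NOT found.
my $no_cat  = ($str !~ /cat/)  ? 1 : 0;
# The leading m is optional when the delimiter is /.
my $same    = ($str =~ m/dog/) ? 1 : 0;

print "has_dog=$has_dog no_cat=$no_cat same=$same\n";
```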


simple matching

• match a string or string variable
• if ($str =~ /dog/)
—true if $str contains dog
• If $str and the =~ or !~ operator are left off, the match is made against $_
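The $_ shorthand is easiest to see in a loop. This is a small sketch; the word list and the @hits array are made up for the example:

```perl
use strict;
use warnings;

my @words = ("bulldog", "cat", "dogma");
my @hits;
for (@words) {                  # each element is aliased to $_ in turn
    push @hits, $_ if /dog/;    # shorthand for: $_ =~ /dog/
}
print "@hits\n";
```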


case insensitive matching

• /i ignore case
• if ($str =~ /dog/i)
—true if $str contains dog; the match is case insensitive.
—if ($str =~ /DOG/i) #same
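As a quick illustration (the input list is an assumption), /i accepts any mix of upper and lower case:

```perl
use strict;
use warnings;

my @inputs  = ("dog", "DOG", "Hot Dog", "cat");
# /i makes the match ignore case, so three of the four inputs qualify.
my @matched = grep { /dog/i } @inputs;
# Without /i only the all-lowercase spelling is found.
my @exact   = grep { /dog/ } @inputs;

print scalar(@matched), " case-insensitive, ", scalar(@exact), " exact\n";
```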


alternation matching

• | allows matching with an or
• if ($str =~ /Fred|Wilma|Pebbles/)
—true if $str contains Fred, Wilma, or Pebbles
• if ($str =~ /Fred|Wilma|Pebbles Flintstone/)
—matches Fred, Wilma, or Pebbles Flintstone
• Grouping
• if ($str =~ /(Fred|Wilma|Pebbles) Flintstone/)
—matches Fred Flintstone, Wilma Flintstone, or Pebbles Flintstone
• if ($str =~ /(Blue|Song)bird/)
—matches Bluebird or Songbird
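The effect of grouping can be checked with a string that only the ungrouped pattern accepts. In this sketch the patterns are stored with qr//, which the slides do not use; it is only a convenient way to name them here, and the test string is invented:

```perl
use strict;
use warnings;

my $ungrouped = qr/Fred|Wilma|Pebbles Flintstone/;
my $grouped   = qr/(Fred|Wilma|Pebbles) Flintstone/;

my $str = "Fred Rubble";
# Ungrouped: "Fred" on its own is one complete alternative, so this matches.
my $u = ($str =~ $ungrouped) ? 1 : 0;
# Grouped: every name must be followed by " Flintstone", so this fails.
my $g = ($str =~ $grouped) ? 1 : 0;

print "ungrouped=$u grouped=$g\n";
```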


alternation matching (2)

• if ($str =~ /th(is|at)/)
—true if $str contains this or that
• if ($str =~ /(p|g|m|s|b)et/)
—true if $str contains pet, get, met, set, or bet


Single character matching

• Use []
• if ($str =~ /[abc]/)
—true if $str contains a and/or b and/or c
• if ($str =~ /[pgmsb]et/)
—true if $str contains pet, get, met, set, or bet
• if ($str =~ /[Fred]/)
—true if $str contains F and/or r and/or e and/or d
• Characters not listed: a leading ^ inside []
• if ($str =~ /[^abc]/)
—true if $str contains at least one character other than a, b, or c
• if ($str =~ /[a-z]/)
—true if $str contains any lowercase letter a through z


Single character or'd matching (2)

• if ($str =~ /[0-9]/)
—true if $str contains any digit 0 through 9
• if ($str =~ /[0-9\-]/)
—matches 0 through 9 or the minus sign
• if ($str =~ /[a-z0-9\^]/)
—matches any single lowercase letter, digit, or ^
• if ($str =~ /[a-zA-Z0-9_]/)
—matches any single letter, digit, or underscore
• if ($str =~ /[^aeiouAEIOU]/)
—matches any non-vowel character in $str
• if ($str !~ /[aeiouAEIOU]/)
—true only if there are no vowels in $str
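The last two patterns are easy to confuse: a negated class asks whether some character is a non-vowel, while !~ asks whether no character is a vowel. A short sketch with three invented test words shows the difference:

```perl
use strict;
use warnings;

my %report;
for my $s ("rhythm", "dog", "aeiou") {
    # True when at least one character is not a vowel.
    my $has_non_vowel = ($s =~ /[^aeiouAEIOU]/) ? 1 : 0;
    # True only when the string has no vowel at all.
    my $has_no_vowel  = ($s !~ /[aeiouAEIOU]/)  ? 1 : 0;
    $report{$s} = "$has_non_vowel$has_no_vowel";
    print "$s: has_non_vowel=$has_non_vowel has_no_vowel=$has_no_vowel\n";
}
```

Only "dog" separates the two tests: it contains a non-vowel, but it also contains a vowel.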


matching quantifiers

• multiple uses {min,max}
• if ($str =~ /a{3}/)
—true if $str contains aaa
• common mistake
• if ($str =~ /Fred{3}/)
—matches Freddd, not FredFredFred
• if ($str =~ /(Fred){3}/)
—matches FredFredFred
• if ($str =~ /a{3,}/)
—matches aaa, aaaa, aaaaa, aaaaaa, etc.
• if ($str =~ /a{3,5}b/)
—matches aaab, aaaab, aaaaab
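The following sketch checks the common mistake directly. The variable names are illustrative, and the last line is an extra observation rather than something stated on the slide: because the pattern is not anchored, a longer run of a's still contains a match.

```perl
use strict;
use warnings;

# {3} applies only to the item right before it: the single letter d.
my $ddd      = ("Freddd"       =~ /Fred{3}/)   ? 1 : 0;   # 1
my $no_group = ("FredFredFred" =~ /Fred{3}/)   ? 1 : 0;   # 0
# Parentheses make the whole word the repeated item.
my $group    = ("FredFredFred" =~ /(Fred){3}/) ? 1 : 0;   # 1

# Too few a's fails; seven a's still match, using the last five before b.
my $too_few  = ("aab"      =~ /a{3,5}b/) ? 1 : 0;         # 0
my $long_run = ("aaaaaaab" =~ /a{3,5}b/) ? 1 : 0;         # 1

print "$ddd $no_group $group $too_few $long_run\n";
```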


matching quantifiers (2)

• if ($str =~ /a{0,5}/)
—matches a, aa, aaa, aaaa, aaaaa, and also succeeds when there are no a's
• if ($str =~ /a*/)
—* match 0 or more times (max match)
• if ($str =~ /a*?/)
—*? match 0 or more times (min match)
• Difference between min and max matching
• $_ = "aaaa"; #matches all three above
—Difference: * matches "aaaa", while *? matches the empty string (zero a's)
—max matches as many characters as it can
—while min matches as few characters as it can
—This becomes important in the next lecture.


matching quantifiers (3)

• + 1 or more times (max match)
• +? 1 or more times (min match)
• if ($str =~ /a+/)
—true if there are 1 or more "a"s
• ? match 0 or 1 time (max match)
• ?? match 0 or 1 time (min match)
• if ($str =~ /a?/)
—true if there is 1 "a" or no "a"s
• Also {3,5}? min match: tries to match only 3 where possible
• and {3,5} max match: tries to match 5 where possible
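Max and min matching only differ in how much text the quantifier takes, so the sketch below captures the matched text with parentheses (covered later in this lecture) to make that visible. The seven-letter test string is an assumption:

```perl
use strict;
use warnings;

my $text = "aaaaaaa";    # seven a's

# In list context a match returns the text captured by its parentheses.
my ($max_star)  = $text =~ /(a*)/;        # all seven a's
my ($min_star)  = $text =~ /(a*?)/;       # empty string: zero a's is enough
my ($min_plus)  = $text =~ /(a+?)/;       # a single a
my ($max_range) = $text =~ /(a{3,5})/;    # five a's
my ($min_range) = $text =~ /(a{3,5}?)/;   # three a's

printf "%d %d %d %d %d\n",
    length($max_star), length($min_star), length($min_plus),
    length($max_range), length($min_range);
```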


matching quantifiers (4)

• if ($str =~ /fo+ba?r/)
—matches f, 1 or more o's, b, 0 or 1 a, then an r
—match: fobar, foobar, foobr
—Non-match: fbar (missing o), foobaar (too many a's)
• if ($str =~ /fo*ba?r/)
—matches f, 0 or more o's, b, 0 or 1 a, then an r
—match: fobar, fbr, fooobr, etc.
• Inside [], matching quantifiers are "normal" characters.
• if ($str =~ /[.?!+]*/)
—matches zero or more ., ?, !, or +
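The two patterns can be compared by running both over the same candidate list. This is a sketch; the list simply collects the words mentioned above:

```perl
use strict;
use warnings;

my @candidates = qw(fobar foobar foobr fbar foobaar fbr);

# o+ needs at least one o, so fbar and fbr drop out.
my @plus = grep { /fo+ba?r/ } @candidates;
# o* also accepts no o at all; foobaar still fails because a? allows one a.
my @star = grep { /fo*ba?r/ } @candidates;

print "plus: @plus\n";
print "star: @star\n";
```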


Exercise 7

• What will the following match?
1. /a+[bc]/
2. /(a|be)t/i
3. /Hi{1,3} There\!?/
4. /(Foo)?Bar/i
5. /[1-9][1-9][a-z]*/
6. /[a-zA-Z]+, [A-Z]{2} [0-9]{5}/
• Write a regular expression for these
1. Match a social security number (with or without dashes)
2. A street address: number Name with either St, Ln, Rd or nothing. Also case insensitive


metasymbols

• . match one character (except newline)
• if ($str =~ /./)
—Always true, except when $str = "" or $str contains only newlines
• if ($str =~ /d.g/)
—true for d, any character, and g
– so dog, dbg, dag, dcg, d g, etc.
• if ($str =~ /d.*g/)
—true for d, 0 or more characters, and g
– so dg, dog, dasdfg, d g, etc.
• if ($str =~ /d.+g/)
—true for d, 1 or more characters, and g
– so NOT dg, but the rest: dog, dasdfg, d g, etc.


metasymbols (2)

• if ($str =~ /d.?g/)
—true for d, any single character, and g, AND for dg
• if ($str =~ /d.{0,1}g/)
—true for d, any single character, and g, AND for dg
—same as above
• if ($str =~ /d.{2}g/)
—true for d, 2 characters, and g
– so doog, dafg, dghg, etc.
• if ($str =~ /d.{2,5}g/)
—true for d, 2 to 5 characters, and g
– so dooog, doog, dXXXXXg, dXobgg, etc.
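To compare the dot patterns side by side, the sketch below counts how many strings from one test list each pattern accepts. The list is invented, and the patterns are held in a variable (variable interpolation is covered later in this lecture). The entry "d\ng" is there to show that . does not match a newline:

```perl
use strict;
use warnings;

my @words = ("dg", "dog", "d g", "doog", "dXXXXXg", "d\ng");

my %count;
for my $pattern ('d.g', 'd.*g', 'd.+g', 'd.?g', 'd.{2,5}g') {
    # grep in scalar context returns the number of matching elements.
    $count{$pattern} = grep { /$pattern/ } @words;
    print "$pattern accepts $count{$pattern} of ", scalar(@words), "\n";
}
```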


metasymbols (3)

• Anchoring
• ^ beginning of the string (only when not inside [])
• $ end of the string
• if ($str =~ /^dog$/)
—true only for "dog", not "ddogg"
• if ($str =~ /^dog/)
—true only when the string starts with "dog"
—so "dog", "doga", etc.


metasymbols (4)

• if ($str =~ /dog$/)
—true when the string ends with "dog"
—"dog", "asdfadfdog", "ddddooodog"
• if ($str =~ /^.$/)
—true when the string is one character long and that character is not the newline symbol
• if ($str =~ /^[abc]+/)
—true when the string starts with
– "a", "aa", "aaa", etc. with any characters following.
– "b", "bb", "bbb", etc. with any characters following.
– "c", "cc", "ccc", etc. with any characters following.
– As well as any combination of a's, b's, and c's: "abcabc", etc.
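A small sketch of the three anchoring cases, using an invented list of strings:

```perl
use strict;
use warnings;

my @strings = ("dog", "ddogg", "doga", "hotdog");

my @exact  = grep { /^dog$/ } @strings;   # the whole string must be dog
my @starts = grep { /^dog/ }  @strings;   # dog at the beginning
my @ends   = grep { /dog$/ }  @strings;   # dog at the end

print "exact: @exact\nstarts: @starts\nends: @ends\n";
```

Note that "ddogg" contains dog but satisfies none of the anchored patterns.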


metasymbols (5)

• \d match a Digit [0-9]
• \D match a Nondigit [^0-9]
• \s match whitespace [ \t\n\r\f]
• \S match a Nonwhitespace [^ \t\n\r\f]
• \w match a Word character [a-zA-Z0-9_]
• \W match a Non word Character [^a-zA-Z0-9_]


Examples
• if ($str =~ /\d/) #true when $str contains a digit
• if ($str =~ /\d+/) #true when $str contains 1 or more digits
• if ($str =~ /\w\d/) #true when $str contains a word character and 1 digit
• if ($str =~ /\w+\d/) #true when $str contains 1 or more word characters and 1 digit
—true for "abc1" "a1" "11" "_9" "Z8" and "a1a1"
• if ($str =~ /^\s\w\d/)
—true when it starts with a whitespace, then a word character, and then a digit
—" 11" "\ta1" "\n11" etc.
• if ($str =~ /^\s*\w\d/)
—true when it starts with 0 or more whitespaces, then a word character, and then a digit
—" 11" "11" " \t a1" etc.
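These examples can be tried against a short list. In this sketch the four test strings are assumptions chosen so that each pattern accepts a different number of them:

```perl
use strict;
use warnings;

my @strings = ("abc1", "_9", " 11", "abc");

# Word characters followed by a digit, anywhere in the string.
my $anywhere = grep { /\w+\d/ }    @strings;   # abc1, _9, " 11"
# Exactly one leading whitespace character is required.
my $one_ws   = grep { /^\s\w\d/ }  @strings;   # " 11"
# Leading whitespace is optional, but \w\d must come right after it.
my $any_ws   = grep { /^\s*\w\d/ } @strings;   # _9, " 11"

print "$anywhere $one_ws $any_ws\n";
```

"abc1" fails the last pattern because, counted from the start, the character after the first word character is b, not a digit.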


boundaries assertions

• \b matches at any word boundary
—as defined by \w and \W
• \B matches at any non word boundary
—as defined by \W and \w
/\bis\b/ #matches "what it is" and "that is it"
—similar to /\Wis\W/, except that \b also matches at the start and end of the string
—won't match "tist"
/\Bis\B/ #matches "thistle" and "artist"
—can also be written as /\wis\w/
—won't match "that is it"


boundaries assertions (2)

/\bis\B/ #matches "istanbul" and "so—isn't that"
—similar to /\Wis\w/
– but /\Wis\w/ won't match "istanbul", because "is" is at the front of the string and there is no character for \W to match.
—Since \w is [a-zA-Z0-9_], all punctuation counts as a word boundary.
—So /\bisn\B/ won't match "isn't", because ' is not a word character
/\Bis\b/ #matches "this" and "this is for you"
—similar to /\wis\W/
—For the second example, the match is on "this", not on "is".
—As in the example above, \W won't match at the end of a string.
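The difference between a boundary assertion and a \W or \w character shows up at the edges of the string. A hedged sketch follows; the helper sub hit is hypothetical and exists only to turn a match into 1 or 0:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the string matches the pattern, else 0.
sub hit {
    my ($s, $re) = @_;
    return $s =~ $re ? 1 : 0;
}

# \b needs no character, so it works at the end of the string; \W does not.
my $b_end   = hit("what it is", qr/\bis\b/);   # 1
my $w_end   = hit("what it is", qr/\Wis\W/);   # 0
# The same holds at the front of the string.
my $b_front = hit("istanbul", qr/\bis\B/);     # 1
my $w_front = hit("istanbul", qr/\Wis\w/);     # 0
# The apostrophe is not a word character, so a boundary follows the n.
my $apos    = hit("isn't", qr/\bisn\B/);       # 0

print "$b_end $w_end $b_front $w_front $apos\n";
```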


Exercise 8

• What will the following match?
1. /a+\w*?/
2. /\w\s*\w+/
3. /\bHi\bThere/
4. /\b\w+\b.+There[!]?$/i
5. /^\d+[a-z]*/
6. /\w+,\s\w{2}\s{2}\d{5}/
• Write a regular expression for these
1. Rewrite #6 so the city can be two or more words.
2. Must start with a letter, then have any number of letters and/or numbers or none at all, but end with a number


Parentheses as memory

• special variables \1 .. \9
• \1 holds the first match inside a ()
• if ($str =~ /(\d)asdf\1/)
—true when $str has a digit, then asdf, then the same digit
—examples: 1asdf1, 3asdf3
• if ($str =~ /(\w+)(\d+)as\2\1/)
—true for a word, then digits, as, same digits, then same word
—examples: "hi12as12hi" "1_31as311_"


Parentheses as memory (2)

if ($str =~ /(\d)+asdf\1/)
• Note: (\d)+ is different from (\d+)
—(\d+) matches as many digits as possible, and all of them go into \1
—(\d)+ matches one digit at a time, and only the last digit matched goes into \1
—example: (\d)+ on 123 gives \1 = 3, although the match covers 123
– So 123asdf3 would match the if at the top
– In the next lecture, it does some strange things on substitutions.
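The difference between (\d)+ and (\d+) can be checked by looking at what the group holds after a match. This is a sketch with invented test strings; the captured text is read in list context, as described later in the lecture:

```perl
use strict;
use warnings;

# (\d)+ : the group is overwritten on every repetition, so \1 is the last digit.
my ($last_digit) = "123asdf3"   =~ /(\d)+asdf\1/;    # "3"
# (\d+) : the group holds the whole run of digits.
my ($all_digits) = "123asdf123" =~ /(\d+)asdf\1/;    # "123"
# With (\d)+ the text after asdf would have to be just "3", so this fails.
my $mixed = ("123asdf123" =~ /(\d)+asdf\1/) ? 1 : 0; # 0

print "$last_digit $all_digits $mixed\n";
```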


Parentheses as memory (3)

• parentheses around parentheses
• if ($str =~ /((\w+) (\w+))/)
—\1, \2, \3 are bound to values
—$str = "Hi There"; \2 = "Hi", \3 = "There", \1 = "Hi There"
• Perl numbers groups by their opening parenthesis, left to right, so an outer group comes before the groups inside it: the outermost ( is 1, the first (\w+) is 2, the second (\w+) is 3
• (((\w+) )(\w+)) has \1, \2, \3, \4
—the opening parentheses are numbered 1, 2, 3, 4 from left to right
—\1 = "Hi There", \2 = "Hi ", \3 = "Hi", \4 = "There"


Variable Interpolation

• Using variables inside the match
• $find = "abc";
• if ($str =~ /$find/)
—matches when $str contains the value of $find
• $dog = "dog"; $str = "ddogg";
• if ($str =~ /\w$dog\w/)
—true if $str contains the string in $dog with a word character in front of and behind it.
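A minimal sketch of interpolation, with variable names and values assumed for the example. The last case is an extra observation not made on the slide: the variable's contents are treated as a pattern, so any metasymbols inside it stay active.

```perl
use strict;
use warnings;

my $find = "dog";

# The value of $find is inserted into the pattern before matching.
my $wrapped = ("ddogg" =~ /\w$find\w/) ? 1 : 0;   # 1: d + dog + g
my $bare    = ("dog"   =~ /\w$find\w/) ? 1 : 0;   # 0: nothing around dog

# The . inside the variable still means "any character".
my $pattern = "d.g";
my $dig     = ("dig" =~ /$pattern/) ? 1 : 0;      # 1

print "$wrapped $bare $dig\n";
```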


Special Read-Only variables

• We've seen \1 .. \9. They only have a value inside the match. But $1 .. $9 hold their value (same as \1 .. \9) after the match
if ($str =~ /(\d+)asd\1/) {
    print "matched $1 \n";
}
• If $str = "123asd123", then the output would be: matched 123
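To see which numbered variable each group fills after a match, including nested groups, here is a small sketch. The strings reuse examples from this lecture, and the variable names are illustrative:

```perl
use strict;
use warnings;

my $digits = "";
if ("123asd123" =~ /(\d+)asd\1/) {
    $digits = $1;               # $1 is still set after the match succeeds
}

my ($outer, $first, $second) = ("", "", "");
if ("Hi There" =~ /((\w+) (\w+))/) {
    # Groups are numbered by the position of their opening parenthesis.
    ($outer, $first, $second) = ($1, $2, $3);
}

print "digits=[$digits] 1=[$outer] 2=[$first] 3=[$second]\n";
```

Copying $1, $2, and $3 into named variables right away is a common precaution, since the next successful match overwrites them.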


capturing matches

• $str = "a xxx c xxxxxc xxx d";
• ($a, $b) = ($str =~ m/(.+)x(.+)c/);
—$a = "a xxx c xxx"; Also $1 = "a xxx c xxx";
—$b = "x"; Also $2 = "x";
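The capture result above comes from max matching: the first (.+) takes as much as it can while still leaving room for x, one more character, and c. The sketch below reproduces it and, as an assumption-labeled extra, repeats the match with min quantifiers for contrast. The variable names are invented to avoid the special variables $a and $b:

```perl
use strict;
use warnings;

my $str = "a xxx c xxxxxc xxx d";

# Max match: the first group runs up to the last possible x before a c.
my ($max_first, $max_second) = ($str =~ m/(.+)x(.+)c/);
# Min match: each group stops at the earliest point that lets the rest succeed.
my ($min_first, $min_second) = ($str =~ m/(.+?)x(.+?)c/);

print "max: [$max_first] [$max_second]\n";
print "min: [$min_first] [$min_second]\n";
```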


match as a true value

• / / returns a true/false value
• returning to the switch structure
SWITCH: {
    $str =~ /abc/ && do { $a = 1; last SWITCH; };
    $str =~ /def/ and do { $d = 1; last SWITCH; };
    $c = 1;
}
• Strange looking code. Also, this is one of the very few places a ; is needed after a }
—NOTE: either && or and could be used.
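The switch idiom can be wrapped in a sub to make it testable. This is a sketch only: the sub name classify and its return values are hypothetical, not part of the lecture.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the SWITCH idiom from the slide.
sub classify {
    my ($str) = @_;
    my $kind;
    SWITCH: {
        # A true match runs the do block; last SWITCH leaves the bare block.
        $str =~ /abc/ && do { $kind = "abc"; last SWITCH; };
        $str =~ /def/ and do { $kind = "def"; last SWITCH; };
        $kind = "other";        # default case when nothing matched
    }
    return $kind;
}

print join(" ", map { classify($_) } ("xabcx", "undefined", "zzz")), "\n";
```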


Commenting your matches

• /x ignores most white space and allows comments
/\w+:    #Match a word and a colon
 (       #Begin group
   \s+   #match one or more spaces
   \w+   #match another word
 )       #end group
 \s*     #match zero or more spaces
 \d+     #match 1 or more digits
/x;
• same as
/\w+:(\s+\w+)\s*\d+/;
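As a check that the commented and compact forms behave the same, the sketch below runs both against one invented input line. The patterns are stored with qr// purely for convenience:

```perl
use strict;
use warnings;

my $commented = qr/
    \w+:        # a word and a colon
    (           # begin group
      \s+       # one or more spaces
      \w+       # another word
    )           # end group
    \s*         # zero or more spaces
    \d+         # one or more digits
/x;
my $compact = qr/\w+:(\s+\w+)\s*\d+/;

my $line = "total: items 42";
my ($from_commented) = $line =~ $commented;
my ($from_compact)   = $line =~ $compact;

print "[$from_commented] [$from_compact]\n";
```

Both forms capture the same text, including the leading space, because white space inside the pattern is ignored under /x while \s+ still matches it in the input.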


Commenting your matches (2)

• Be careful in comments that you don't use / otherwise perl thinks it is the end of the match

• You have to think about where the whitespace is in the match.

• If you need to match a #, use \#


more flags for pattern matching

• matching with newline in the string
• //s lets the . match the newline (\n)
—$str = "asdf\n asdf\n";
—/(f.)/; # no match
—/(f.)/s; # $1 = "f\n";
• //m lets ^ and $ match next to embedded \n
—$str = "af\nasdf\n";
—/(af$)/; # won't match
—/(af$)/m; # $1 = "af";
—/^(as)/; # won't match
—/^(as)/m; # matches, $1 = "as";
—/(f.)$/ms; # matches only the last "f\n": after the first f the . consumes the \n, so $ is no longer next to a line end there
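The effect of each flag can be confirmed on the second example string. A sketch, with result variables named only for this illustration:

```perl
use strict;
use warnings;

my $str = "af\nasdf\n";

# Without /m, $ only matches at the very end (or before a final newline).
my $end_plain   = ($str =~ /af$/)  ? 1 : 0;   # 0
my $end_multi   = ($str =~ /af$/m) ? 1 : 0;   # 1
# Without /m, ^ only matches at the very start of the string.
my $start_plain = ($str =~ /^as/)  ? 1 : 0;   # 0
my $start_multi = ($str =~ /^as/m) ? 1 : 0;   # 1
# Without /s, . refuses the newline that follows each f.
my $dot_plain   = ($str =~ /f./)   ? 1 : 0;   # 0
my $dot_single  = ($str =~ /f./s)  ? 1 : 0;   # 1

print "$end_plain $end_multi $start_plain $start_multi $dot_plain $dot_single\n";
```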


Pattern Delimiters

• if ($str =~ /\/usr\/local/)
—true if $str contains /usr/local
• To avoid backslashing / we can change the delimiter
—choose another delimiter, which is a nonalphanumeric character, such as %%, ##, {}, [], <>, etc.
—must use the m in front of the match so perl knows what you want
• if ($str =~ m%/usr/local%)
—true if $str contains /usr/local
• if ($str =~ m[/usr/local])
—true if $str contains /usr/local, but confusing since it can be mistaken for [] single character matching.
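All of the delimiter choices give the same result, which the sketch below confirms on a sample path (the path itself is an assumption):

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";

my $slashes  = ($path =~ /\/usr\/local/) ? 1 : 0;   # every / escaped
my $percent  = ($path =~ m%/usr/local%)  ? 1 : 0;   # no escaping needed
my $braces   = ($path =~ m{/usr/local})  ? 1 : 0;   # paired delimiters
my $brackets = ($path =~ m[/usr/local])  ? 1 : 0;   # legal, but easy to misread

print "$slashes $percent $braces $brackets\n";
```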


Exercise 9

• Which of the following will this match? /\-?(\|)?m\(\d+\)\1/i
1. "-|m(12)|"
2. "|M(12)|"
3. "-|M(12)"
4. "m(12)|"
5. "M(12)"
• For /\-?(\|)?m\(\d+\)\1?/i
6. "|m(12)|"
7. "m(12)"
8. "-|m(12)|"
9. "-|m(12)"


Q&A