Advanced String Handling
Things we might want to do:
- Find patterns using regular expressions
- Manipulate strings
- Split strings
- Process tokens
Basic String methods to review: substring(), charAt(), indexOf(), lastIndexOf(), toLowerCase(), startsWith(), endsWith(), trim(), length()
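As a quick refresher, the sketch below calls each of the basic methods listed above on one sample string. The class name BasicStringMethods and the sample text are made up for illustration; the expected values in the comments follow from zero-based indexing and from substring's end index being exclusive.

```java
// Hypothetical demo class: exercises the basic String methods on one sample string.
class BasicStringMethods {
    public static void main(String[] args) {
        String s = "  Hello, World  ";
        String t = s.trim();                       // "Hello, World" (leading/trailing spaces removed)
        System.out.println(t.length());            // 12
        System.out.println(t.charAt(0));           // H
        System.out.println(t.indexOf('o'));        // 4  (first 'o')
        System.out.println(t.lastIndexOf('o'));    // 8  (last 'o')
        System.out.println(t.substring(7));        // World
        System.out.println(t.substring(0, 5));     // Hello (end index 5 is exclusive)
        System.out.println(t.toLowerCase());       // hello, world
        System.out.println(t.startsWith("Hell"));  // true
        System.out.println(t.endsWith("World"));   // true
    }
}
```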
Regular expressions are a syntax for pattern matching used by many programming languages.
Examples of regular expression syntax:
- [aceF] matches any one of the characters enclosed in [ ]
- * matches zero or more occurrences of the preceding element
- + matches one or more occurrences of the preceding element
- \s matches a whitespace character
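A small sketch of the four syntax elements above, using String.matches(), which returns true only if the entire string matches the pattern. The class name and test strings are illustrative assumptions. Note that in Java source code the backslash in \s has to be escaped, so the pattern is written "\\s".

```java
// Hypothetical demo class: checks each regular expression element with String.matches().
class RegexSyntaxDemo {
    public static void main(String[] args) {
        // [aceF] matches exactly one character from the set {a, c, e, F}
        System.out.println("c".matches("[aceF]"));          // true
        System.out.println("b".matches("[aceF]"));          // false

        // * allows zero or more repetitions of the preceding element
        System.out.println("".matches("[aceF]*"));          // true (zero occurrences)
        System.out.println("aceFca".matches("[aceF]*"));    // true

        // + requires at least one repetition
        System.out.println("".matches("[aceF]+"));          // false
        System.out.println("Face".matches("[aceF]+"));      // true

        // \s matches one whitespace character (space, tab, newline, ...)
        System.out.println("a \t c".matches("a\\s+c"));     // true
    }
}
```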
String methods that use regular expressions include matches(), split(), and replaceAll().
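To show the three methods side by side, here is a minimal sketch; the class name and input strings are invented for illustration. split() breaks a string at every match of the pattern, replaceAll() substitutes every match, and matches() tests the whole string.

```java
import java.util.Arrays;

// Hypothetical demo class: split(), replaceAll() and matches() with regular expressions.
class RegexStringMethods {
    public static void main(String[] args) {
        String text = "one,  two,three ,four";

        // split(): break wherever a comma appears, with optional whitespace on either side
        String[] parts = text.split("\\s*,\\s*");
        System.out.println(Arrays.toString(parts));                     // [one, two, three, four]

        // replaceAll(): collapse every run of whitespace into a single space
        System.out.println("too   many    spaces".replaceAll("\\s+", " ")); // too many spaces

        // matches(): true only if the entire string is one or more digits
        System.out.println("2024".matches("[0-9]+"));                   // true
        System.out.println("20x4".matches("[0-9]+"));                   // false
    }
}
```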
More concrete examples are on the following slides
Note: This slide is only a brief overview; regular expression syntax includes much more.
Problem: A String is an immutable object, so every "modification" actually creates a new String.
Bad solution (1453 milliseconds on my computer): repeatedly create a new string from the old one.
    String str = "";
    for (int i = 0; i < 10000; i++)
        str += "abcdef";
Better solution (0 milliseconds on my computer): use StringBuilder, a mutable string class.
    StringBuilder build = new StringBuilder(10000);
    for (int i = 0; i < 10000; i++) {
        build.append("abcdef");
    }
    String str = build.toString();
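The two approaches can be compared in one program. This is a rough sketch, not a proper benchmark: timings depend on the machine and JVM, and the class and method names are hypothetical. One deliberate difference from the slide: the initial capacity is set to n * 6, enough for the whole result, whereas the slide's 10000 is simply grown automatically by StringBuilder when exceeded. Both methods return the same 60,000-character string.

```java
// Hypothetical comparison class: String concatenation vs. StringBuilder.
class ConcatVsBuilder {
    static String withConcat(int n) {
        String str = "";
        for (int i = 0; i < n; i++) {
            str += "abcdef";          // builds a new String, copying all previous characters
        }
        return str;
    }

    static String withBuilder(int n) {
        StringBuilder build = new StringBuilder(n * 6); // room for n copies of "abcdef"
        for (int i = 0; i < n; i++) {
            build.append("abcdef");   // appends in place; capacity is preset, so no regrowth
        }
        return build.toString();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        String a = withConcat(10000);
        long t1 = System.nanoTime();
        String b = withBuilder(10000);
        long t2 = System.nanoTime();

        System.out.println("concatenation: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println(a.equals(b) + ", length " + b.length()); // true, length 60000
    }
}
```

The StringBuilder version does linear work overall, while repeated concatenation copies the growing string on every iteration, which is why its cost grows roughly with the square of the iteration count.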
Extracting information from a string. Example:
    String date = "January 23, 1923 10:32:15 pm";
    String[] data = date.split("[, :]+");
    for (int i = 0; i < data.length; i++)
        System.out.println(data[i]);
The argument to split() is a regular expression that determines how the date is broken into array elements:
- The characters enclosed between [ and ] (space, comma, and colon) are the delimiters
- The + means a split occurs wherever one or more delimiters appear in a row
Output:
January
23
1923
10
32
15
pm
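To see why the + matters, the sketch below splits the same date with and without it (the class name is hypothetical). Without +, the two adjacent delimiters in ", " after 23 are two separate split points, so an empty string appears between them.

```java
import java.util.Arrays;

// Hypothetical demo class: split() with and without the + quantifier.
class SplitPlusDemo {
    public static void main(String[] args) {
        String date = "January 23, 1923 10:32:15 pm";

        // Without +: every single delimiter is a split point
        String[] single = date.split("[, :]");
        System.out.println(Arrays.toString(single));   // [January, 23, , 1923, 10, 32, 15, pm]

        // With +: a run of delimiters counts as one split point
        String[] grouped = date.split("[, :]+");
        System.out.println(Arrays.toString(grouped));  // [January, 23, 1923, 10, 32, 15, pm]
    }
}
```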
Definition: A token is a group of characters treated as a unit
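The StringTokenizer class (java.util.StringTokenizer):
    import java.util.StringTokenizer;

    String expression = "((3.5 + 52)/234 + 75.2*83.9 - 9.0";
    // The next line removes white space
    expression = expression.replaceAll("\\s+", "");
    // Delimiters are ( ) + - / and *; the third argument, true,
    // makes the tokenizer return each delimiter as a token as well
    StringTokenizer tokenizer = new StringTokenizer(expression, "()+-/*", true);
    while (tokenizer.hasMoreTokens()) {
        System.out.println(tokenizer.nextToken());
    }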
Output:
(
(
3.5
+
52
)
/
234
+
75.2
*
83.9
-
9.0
Definition: White space includes space, tab, and newline characters.
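The Java documentation describes StringTokenizer as a legacy class and recommends String.split() or the java.util.regex package for new code. The sketch below is one possible regex-based alternative for the expression tokenized above. The token pattern is an assumption: it recognizes only unsigned decimal numbers and the six characters ( ) + - * /. Because Matcher.find() skips any text that matches no token, white space is dropped without a separate replaceAll() step; the trade-off is that unexpected characters are silently ignored rather than reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer built on java.util.regex instead of StringTokenizer.
class RegexTokenizer {
    // A token is a number (digits with an optional decimal part)
    // or a single parenthesis/operator character.
    private static final Pattern TOKEN = Pattern.compile("\\d+(?:\\.\\d+)?|[()+\\-*/]");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {            // skips characters that start no token, e.g. white space
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String expression = "((3.5 + 52)/234 + 75.2*83.9 - 9.0";
        System.out.println(tokenize(expression));
        // [(, (, 3.5, +, 52, ), /, 234, +, 75.2, *, 83.9, -, 9.0]
    }
}
```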