Characters and Strings

Post on 22-Dec-2015

Page 1: Characters and Strings. Representation of single characters Data type char is the data type that represents single characters, such as letters, numerals,

Characters and Strings

Representation of single characters

• Data type char is the data type that represents single characters, such as letters, numerals, and punctuation marks

• A literal value of type char is written as a single character enclosed within single quotation marks

• Examples: 'a', 'F', '9', '&', ' ', ','

Character encoding

• ASCII stands for American Standard Code for Information Interchange.

• ASCII is one of the document coding schemes widely used today. This coding scheme allows different computers to share information easily.

• Most programming languages support ASCII characters

ASCII Encoding

• ASCII works well for English-language documents because the English letters, digits, and common punctuation marks are all included in the ASCII codes.

• ASCII does not represent the full character sets of other languages.

ASCII Encoding

For example, in the ASCII table the character 'O' is 79 (row value 70 + column value 9 = 79).
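
The row-plus-column lookup can be sketched in a few lines of Java. This is only an illustration: it assumes the table is laid out in rows of ten decimal codes, and the class and method names are made up for the example.

```java
class AsciiTableLookup {
    // Row label: the character code rounded down to a multiple of ten.
    static int rowValue(char c) {
        return (c / 10) * 10;
    }

    // Column label: the last decimal digit of the character code.
    static int columnValue(char c) {
        return c % 10;
    }

    public static void main(String[] args) {
        char c = 'O';
        System.out.println(c + " = " + rowValue(c) + " + " + columnValue(c) + " = " + (int) c);
    }
}
```

Adding the two labels back together always reproduces the character's code.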

Limitations of ASCII

• Standard ASCII is a 7-bit code, although each character is usually stored in an 8-bit byte
– The eighth (high-order) bit carries no character data in standard ASCII; historically it often served as a parity bit
– This leaves 2^7 (128) unique combinations of bits to represent characters
– The extended ASCII sets use all 8 bits to represent a character, giving 2^8 (256) unique combinations

Unicode Encoding

• The Unicode Worldwide Character Standard (Unicode) supports the interchange, processing, and display of the written texts of diverse languages.

• Java uses the Unicode standard for representing char constants.

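• A Java char occupies 16 bits, allowing for 2^16 (65,536) unique bit combinations; Unicode characters outside that range are represented in Java by a pair of char values

• When these slides were written, 34,168 distinct characters were defined, covering most of the major world languages; later versions of the standard define many more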
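
A small sketch of what the 16-bit char size means in practice. The class and helper names are illustrative; Character.SIZE, Character.MAX_VALUE, and String.codePointCount are standard library members.

```java
class CharWidthDemo {
    // Number of 16-bit char values used to store the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // bits in one char
        System.out.println((int) Character.MAX_VALUE); // largest char value
        // One character from outside the 16-bit range, written as two char values.
        String outside = "\uD83D\uDE00";
        System.out.println(charCount(outside));
        System.out.println(codePointCount(outside));
    }
}
```

For text made only of characters inside the 16-bit range, the two counts are the same.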

ASCII/Unicode equivalence

• Unicode uses the same bit combinations for the characters that exist in the ASCII set

• Thus, an English alphabetic character has the same numeric value in both ASCII and Unicode

Special characters

• Several keys on a standard keyboard don’t translate directly into printable (or displayable) characters

• For example, the Enter key moves the cursor to a new line; we already know that the character that corresponds to this action can be represented as '\n'

Special characters

• Some other special characters used in Java include:
– '\t': horizontal tab character
– '\\': a single backslash
– '\u0007': alarm (bell) "character", which may cause the system speaker to beep; Java has no '\a' escape sequence, so the Unicode escape is used instead
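
As a rough illustration of the escape sequences listed above, the following sketch prints strings that contain them and shows the numeric codes behind two of them. The class and method names are made up for the example.

```java
class EscapeDemo {
    // Returns the numeric code stored for a character.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println("name\tvalue");   // a tab separates the two words
        System.out.println("C:\\temp");      // the doubled backslash prints as one
        System.out.println(codeOf('\t'));    // code of the tab character
        System.out.println(codeOf('\n'));    // code of the newline character
    }
}
```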

Converting between char and int

We can convert between a numeric (int) value and its corresponding character by using type casting, as the examples below illustrate:

char ch1 = 'X';
System.out.println(ch1);        // prints X
System.out.println((int) ch1);  // prints 88

int x = 99;
System.out.println(x);          // prints 99
System.out.println((char) x);   // prints c

Character comparison

• Values of type char can be compared just like integers are compared, since they are actually stored as binary whole numbers

• In the ASCII (and Unicode) set, uppercase letters have lower numeric value than lowercase letters

• So, for example, 'A' is less than 'a', and 'b' is greater than 'Z'
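
Because char values compare like integers, range checks and simple arithmetic work on them. The sketch below handles only the 26 unaccented English letters; the method names are illustrative, and the library method Character.toLowerCase is the more general tool.

```java
class CharCompareDemo {
    // True for 'A' through 'Z', using ordinary numeric comparison.
    static boolean isUpperCaseLetter(char c) {
        return c >= 'A' && c <= 'Z';
    }

    // Shifts an uppercase English letter by the fixed gap between 'A' and 'a'.
    static char toLowerCaseLetter(char c) {
        return isUpperCaseLetter(c) ? (char) (c + ('a' - 'A')) : c;
    }

    public static void main(String[] args) {
        System.out.println('A' < 'a');              // uppercase letters come first
        System.out.println('b' > 'Z');
        System.out.println(toLowerCaseLetter('Q'));
    }
}
```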

Strings

• A string is a sequence of characters that is treated as a single value.

• Instances of the String class are used to represent strings in Java.

• We access individual characters of a string by calling the charAt method of the String object.

Strings

• Each character in a string has an index we use to access the character.

• Java uses zero-based indexing; the first character’s index is 0, the second is 1, and so on.

• To refer to the first character of the string name, we write

name.charAt(0)
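
A short sketch of zero-based indexing with charAt. The helper names are hypothetical; only charAt and length come from the String class.

```java
class CharAtDemo {
    // The last character sits at index length() - 1 because indexing starts at 0.
    static char lastChar(String s) {
        return s.charAt(s.length() - 1);
    }

    // Visits every index from 0 to length() - 1 and counts matches.
    static int countOccurrences(String s, char target) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Kona";
        System.out.println(name.charAt(0));
        System.out.println(lastChar(name));
        System.out.println(countOccurrences("banana", 'a'));
    }
}
```

Calling charAt with an index equal to length() or larger throws a StringIndexOutOfBoundsException, so the loop condition uses a strict less-than.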

String indexing with charAt method

• An indexed expression is used to refer to individual characters in a string.

Constructing strings

• Since String is a class, we can create an instance of the class by using the new operator.
– The statements we have used so far, such as

String name1 = "Kona";

– work much like String name1 = new String("Kona"); although the two forms differ in how objects are created, as discussed under Comparing Strings.

– This shorthand is available for the String class only.

Example: Counting Vowels

Here's the code to count the number of vowels in the input string.

char letter;
String name = JOptionPane.showInputDialog(null, "Your name:");
int numberOfCharacters = name.length();
int vowelCount = 0;

for (int i = 0; i < numberOfCharacters; i++) {
    letter = name.charAt(i);
    if (letter == 'a' || letter == 'A' || letter == 'e' || letter == 'E' ||
        letter == 'i' || letter == 'I' || letter == 'o' || letter == 'O' ||
        letter == 'u' || letter == 'U') {
        vowelCount++;
    }
}

System.out.print(name + ", your name has " + vowelCount + " vowels");

Example: Counting 'Java'

Continue reading words and count how many times the word Java occurs in the input, ignoring the case.

int javaCount = 0;
boolean repeat = true;
String word;

while (repeat) {
    word = JOptionPane.showInputDialog(null, "Next word:");
    if (word.equals("STOP")) {
        repeat = false;
    } else if (word.equalsIgnoreCase("Java")) {
        javaCount++;
    }
}

Notice how the comparison is done. We are not using the == operator.

Other Useful String Methods

Method       Meaning                                                          Example
compareTo    Compares the two strings.                                        str1.compareTo( str2 )
substring    Extracts a substring from a string.                              str1.substring( 1, 4 )
trim         Removes the leading and trailing spaces.                         str1.trim( )
valueOf      Converts a given primitive data value to a string.               String.valueOf( 123.4565 )
startsWith   Returns true if a string starts with a specified prefix string.  str1.startsWith( str2 )
endsWith     Returns true if a string ends with a specified suffix string.    str1.endsWith( str2 )
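
The methods in the table can be combined as in the sketch below. The classNameOf helper is hypothetical and exists only to show trim, endsWith, and substring working together.

```java
class StringMethodsDemo {
    // Hypothetical helper: strips surrounding spaces and a trailing ".java".
    static String classNameOf(String fileName) {
        String trimmed = fileName.trim();
        if (trimmed.endsWith(".java")) {
            return trimmed.substring(0, trimmed.length() - ".java".length());
        }
        return trimmed;
    }

    public static void main(String[] args) {
        String str1 = "Kona";
        System.out.println(str1.substring(1, 4));    // characters at indexes 1, 2, and 3
        System.out.println(str1.startsWith("Ko"));
        System.out.println(str1.compareTo("Kona"));  // 0 because the strings are equal
        System.out.println(String.valueOf(123.4565));
        System.out.println(classNameOf("  Hello.java "));
    }
}
```

Note that substring(1, 4) stops before index 4, so it returns three characters.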

Comparing Strings

• Comparing String objects is similar to comparing other objects.

• The equality test (==) is true if the contents of the variables are the same.

• For a reference data type, the equality test is true if both variables refer to the same object, because they both contain the same address. Thus, the “contents of the variable” does not mean “the sequence of characters in the String”

Comparing Strings

• We don’t usually use the == operator to compare Strings

• The equals method is true if the String objects to which the two variables refer contain the same string value.

String s1 = new String("hello");
String s2 = new String("hello");
if (s1 == s2)
    System.out.println("They are equal");        // this won't print
if (s1.equals(s2))
    System.out.println("No, really, they are");  // this will print

[Figure: the difference between the equality test (==) and the equals method]

Comparing Strings

• String comparison may be done in several ways.
– The methods equals and equalsIgnoreCase compare string values; one is case-sensitive and one is not.
– The method compareTo returns a value:
  • Zero (0) if the strings are equal.
  • A negative integer if the first string is less than the second.
  • A positive integer if the first string is greater than the second.
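
A minimal sketch of using the sign of compareTo. The smaller helper is made up for the example; it relies only on whether the result is negative, zero, or positive, not on its exact value.

```java
class CompareToDemo {
    // Returns whichever string compareTo orders first.
    static String smaller(String a, String b) {
        return a.compareTo(b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(smaller("apple", "banana"));
        // Uppercase letters have lower codes, so "Zebra" is ordered before "apple".
        System.out.println(smaller("apple", "Zebra"));
        System.out.println("Java".compareTo("Java"));
    }
}
```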

Comparing Strings

• As long as a new String object is created using the new operator, the rule for comparing objects applies to comparing strings.

String str = new String("Java");

• If the new operator is not used, identical string literals share a single String object, so comparing such variables with == behaves as it would for primitive data.

String str = "Java";
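
The effect of using or not using new can be observed directly. In this sketch the two helper methods are illustrative names for the == test and the equals test.

```java
class StringIdentityDemo {
    // True only when both variables refer to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // True when the two strings hold the same sequence of characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal1 = "Java";
        String literal2 = "Java";
        String created = new String("Java");

        System.out.println(sameObject(literal1, literal2));   // literals share one object
        System.out.println(sameObject(literal1, created));    // new always makes another object
        System.out.println(sameContents(literal1, created));
    }
}
```

Even though == happens to work for identical literals, equals remains the safer choice whenever the contents are what matter.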

[Figure: the difference between using and not using the new operator for String]

Pattern Matching and Regular Expressions

• Pattern matching is a common function in many applications.

• In Java 2 SDK 1.4, two new classes, Pattern and Matcher, were added.

• The String class also includes several new methods that support pattern matching.

Pattern Example

• Suppose students are assigned a three-digit code:
– The first digit represents the major (5 indicates computer science);
– The second digit represents either in-state (1), out-of-state (2), or foreign (3);
– The third digit indicates campus housing:
  • On-campus dorms are numbered 1-7.
  • Students living off-campus are represented by the digit 8.

The 3-digit pattern to represent computer science majors living on-campus is

5[123][1-7]

– 5: the first character is 5
– [123]: the second character is 1, 2, or 3
– [1-7]: the third character is any digit between 1 and 7
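
The student-code pattern can be tried out with the matches method of the String class. This is a small sketch; the class and method names are made up for the example.

```java
class StudentCodeDemo {
    // True when the whole code fits the pattern 5[123][1-7].
    static boolean isOnCampusCsMajor(String code) {
        return code.matches("5[123][1-7]");
    }

    public static void main(String[] args) {
        System.out.println(isOnCampusCsMajor("512"));  // CS major, in-state, dorm 2
        System.out.println(isOnCampusCsMajor("518"));  // off-campus, so no match
        System.out.println(isOnCampusCsMajor("412"));  // a different major
    }
}
```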

Pattern Matching and Regular Expression

• Such a pattern is called a regular expression; it allows us to denote a large set of "words" (any sequence of symbols) succinctly.

• Brackets [ ] represent choices, so [abc] means a, b, or c.

• For example, a simplified definition of a valid Java identifier (one that begins with a letter) may be stated as

[a-zA-Z][a-zA-Z0-9_$]*

Regular Expressions

• Rules
– The brackets [ ] represent choices.
– The asterisk symbol * means zero or more occurrences.
– The plus symbol + means one or more occurrences.
– The hat symbol ^ means negation.
– The hyphen - means ranges.
– The parentheses ( ) and the vertical bar | mean a range of choices for multiple characters.

Regular Expression Examples

Expression                  Description
[013]                       A single digit 0, 1, or 3.
[0-9][0-9]                  Any two-digit number from 00 to 99.
[0-9&&[^4567]]              A single digit that is 0, 1, 2, 3, 8, or 9.
[a-z0-9]                    A single character that is either a lowercase letter or a digit.
[a-zA-Z][a-zA-Z0-9_$]*      A valid Java identifier consisting of alphanumeric characters, underscores, and dollar signs, with the first character being a letter.
[wb](ad|eed)                Matches wad, weed, bad, and beed.
(AZ|CA|CO)[0-9][0-9]        Matches AZxx, CAxx, and COxx, where x is a single digit.
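
A few of the expressions above can be checked with a short sketch like the following. The fullMatch helper is an illustrative wrapper around String.matches, which succeeds only when the entire input fits the pattern.

```java
class RegexExamplesDemo {
    // Wraps String.matches so the pattern is written first, as in the table.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[0-9&&[^4567]]", "8"));
        System.out.println(fullMatch("[0-9&&[^4567]]", "5"));
        System.out.println(fullMatch("[wb](ad|eed)", "weed"));
        System.out.println(fullMatch("(AZ|CA|CO)[0-9][0-9]", "CO42"));
        System.out.println(fullMatch("[a-zA-Z][a-zA-Z0-9_$]*", "total_2"));
    }
}
```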

More Examples

Expression   Description
X{N}         Repeat X exactly N times, where X is a regular expression for a single character.
X{N,}        Repeat X at least N times.
X{N,M}       Repeat X at least N but no more than M times.
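
The three repetition forms can be compared side by side. The sketch below uses [0-9] as X with N = 3 and M = 5; these values and the method names are arbitrary choices for the example.

```java
class QuantifierDemo {
    static boolean exactlyThreeDigits(String s) {
        return s.matches("[0-9]{3}");
    }

    static boolean atLeastThreeDigits(String s) {
        return s.matches("[0-9]{3,}");
    }

    static boolean threeToFiveDigits(String s) {
        return s.matches("[0-9]{3,5}");
    }

    public static void main(String[] args) {
        String sample = "123456";
        System.out.println(exactlyThreeDigits(sample));
        System.out.println(atLeastThreeDigits(sample));
        System.out.println(threeToFiveDigits(sample));
    }
}
```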

Pattern Matching and Regular Expression

• The matches method from the String class is similar to the equals method.

• However, unlike equals, the argument to matches can be a pattern.

Pattern Matching and Regular Expression

• The period symbol (.) is used to match any character except a line terminator (\n or \r).

String document;
document = ...; // assign text to 'document'
if (document.matches(".*zen of objects.*")) {
    System.out.println("Found");
} else {
    System.out.println("Not found");
}

Pattern Matching and Regular Expression

• Brackets ([ ]) are used for expressing a range of choices for a given character.

• To express a range of choices for multiple characters, use parentheses and the vertical bar.

Pattern Matching and Regular Expression

Expression             Description
[wb](ad|eed)           Matches wad, weed, bad, and beed.
(pro|anti)-OOP         Matches pro-OOP and anti-OOP.
(AZ|CA|CO)[0-9]{4}     Matches AZxxxx, CAxxxx, and COxxxx, where x is a single digit.

Pattern Matching and Regular Expression

• The replaceAll method was added to the String class in version 1.4.

• This method allows us to replace all occurrences of a substring that matches a given regular expression with a given replacement string.

Pattern Matching and Regular Expression

• For example, to replace all lowercase vowels in a string with the @ symbol:

String originalText, modifiedText;
originalText = ...; // assign string to 'originalText'
modifiedText = originalText.replaceAll("[aeiou]", "@");

• Note that this method does not change the original text; it simply returns a modified text as a separate string.
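
A runnable sketch of the same idea, extended to uppercase vowels as well. The maskVowels name is made up for the example; the main method also shows that the original string is left untouched.

```java
class ReplaceAllDemo {
    // Returns a new string in which every vowel, in either case, becomes '@'.
    static String maskVowels(String text) {
        return text.replaceAll("[aeiouAEIOU]", "@");
    }

    public static void main(String[] args) {
        String originalText = "Hello World";
        String modifiedText = maskVowels(originalText);
        System.out.println(modifiedText);
        System.out.println(originalText);  // still the text we started with
    }
}
```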

Pattern Matching and Regular Expression

• To match a whole word, use the \b symbol to designate the word boundary.

str.replaceAll("\\btemp\\b", "temporary");

• Two backslashes are necessary because we must write the expression as a Java String literal. Doubling the backslash prevents the compiler from interpreting the regular-expression backslash as the start of a String escape sequence.

Pattern Matching and Regular Expression

• The backslash is also used to search for a character that has a special meaning in regular expressions. For example:
– To search for the plus symbol (+) in text, we use the backslash as \+.
– To express it as a string, we write "\\+".
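
Both uses of the doubled backslash can be seen in the following sketch. The helper names are hypothetical; the patterns are the ones discussed on these slides.

```java
class RegexEscapeDemo {
    // Replaces the whole word "temp" but leaves longer words such as "temperature" alone.
    static String expandTemp(String text) {
        return text.replaceAll("\\btemp\\b", "temporary");
    }

    // Counts plus signs by measuring how much shorter the text gets without them.
    static int countPlusSigns(String text) {
        return text.length() - text.replaceAll("\\+", "").length();
    }

    public static void main(String[] args) {
        System.out.println(expandTemp("temp file, temperature"));
        System.out.println(countPlusSigns("1+2+3"));
    }
}
```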

The Pattern and Matcher Classes

• The matches and replaceAll methods of the String class are shorthand for using the Pattern and Matcher classes from the java.util.regex package.

The Pattern and Matcher Classes

• If str and regex are String objects, then both

str.matches(regex);

and

Pattern.matches(regex, str);

are equivalent to

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
matcher.matches();

The Pattern and Matcher Classes

• Creating Pattern and Matcher objects gives us more options and efficiency.

• The compile method of the Pattern class converts the stated regular expression to an internal format to carry out the pattern-matching operation.

• This conversion is carried out every time the matches method of the String or Pattern class is executed, so compiling a pattern once and reusing the Pattern object avoids repeating the work.

The Pattern and Matcher Classes

/*
    Chapter 9 Sample Program: Checks whether the input string is
    a valid identifier. This version uses the Matcher and Pattern classes.

    File: Ch9MatchJavaIdentifier2.java
*/
import javax.swing.*;
import java.util.regex.*;

class Ch9MatchJavaIdentifier2 {

    private static final String STOP = "STOP";
    private static final String VALID = "Valid Java identifier";
    private static final String INVALID = "Not a valid Java identifier";
    private static final String VALID_IDENTIFIER_PATTERN = "[a-zA-Z][a-zA-Z0-9_$]*";

    public static void main(String[] args) {
        String str, reply;
        Matcher matcher;
        // Compile the pattern once and reuse it for every input.
        Pattern pattern = Pattern.compile(VALID_IDENTIFIER_PATTERN);

        while (true) {
            str = JOptionPane.showInputDialog(null, "Identifier:");
            if (str.equals(STOP)) break;

            matcher = pattern.matcher(str);
            if (matcher.matches()) {
                reply = VALID;
            } else {
                reply = INVALID;
            }
            JOptionPane.showMessageDialog(null, str + ":\n" + reply);
        } // ends loop
    } // ends main
} // ends class

The Pattern and Matcher Classes

• The find method is another powerful method of the Matcher class.

• The method searches for the next sequence in a string that matches the pattern, and returns true if the pattern is found.

The Pattern and Matcher Classes

• When a matcher finds a matching sequence of characters, we can query the location of the sequence by using the start and end methods.

The Pattern and Matcher Classes

• The start method returns the position in the string where the first character of the matched sequence is found.

• The end method returns the value 1 more than the position in the string where the last character of the matched sequence is found.
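
The find, start, and end methods are typically used together in a loop. The following is a minimal sketch; the locate helper and its "start-end" output format are choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Collects the position of every match as a "start-end" string.
    static List<String> locate(String regex, String text) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {            // moves on to the next match, if any
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(locate("Java", "Java and JavaScript"));
    }
}
```

Because end is one past the last matched character, text.substring(start, end) gives back exactly the matched sequence.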

The String Class is Immutable

• In Java a String object is immutable
– This means once a String object is created, it cannot be changed, such as replacing a character with another character or removing a character
– The String methods we have used so far do not change the original string. They create a new string from the original. For example, substring creates a new string from a given string.

• The String class is defined in this manner for efficiency reasons.
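
Immutability is easy to observe: methods that appear to modify a string return a new one and leave the original as it was. A small sketch, with a hypothetical helper name:

```java
class ImmutabilityDemo {
    // Builds a new string; the argument itself is never modified.
    static String replaceFirstLetter(String s, char newFirst) {
        return newFirst + s.substring(1);
    }

    public static void main(String[] args) {
        String original = "Java";
        String changed = replaceFirstLetter(original, 'L');
        System.out.println(changed);
        System.out.println(original);   // unchanged by the call above

        original.toUpperCase();         // result discarded, so nothing changes
        System.out.println(original);
    }
}
```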

Effect of Immutability

We can do this because String objects are immutable.

The StringBuffer Class

• In many string processing applications, we would like to change the contents of a string. In other words, we want it to be mutable.

• Manipulating the content of a string, such as replacing a character, appending a string with another string, deleting a portion of a string, and so on, may be accomplished by using the StringBuffer class.

StringBuffer Example

Changing a string Java to Diva

StringBuffer word = new StringBuffer("Java");
word.setCharAt(0, 'D');
word.setCharAt(1, 'i');

Before: word refers to a StringBuffer containing Java
After: word refers to the same StringBuffer, now containing Diva

Sample Processing

Replace all vowels in the sentence with 'X'.

char letter;
String inSentence = JOptionPane.showInputDialog(null, "Sentence:");
StringBuffer tempStringBuffer = new StringBuffer(inSentence);
int numberOfCharacters = tempStringBuffer.length();

for (int index = 0; index < numberOfCharacters; index++) {
    letter = tempStringBuffer.charAt(index);
    if (letter == 'a' || letter == 'A' || letter == 'e' || letter == 'E' ||
        letter == 'i' || letter == 'I' || letter == 'o' || letter == 'O' ||
        letter == 'u' || letter == 'U') {
        tempStringBuffer.setCharAt(index, 'X');
    }
}

JOptionPane.showMessageDialog(null, tempStringBuffer);

The append and insert Methods

• We use the append method to append a String or StringBuffer object to the end of a StringBuffer object.
– The method can also take an argument of a primitive data type.
– Any primitive data type argument is converted to a string before it is appended to a StringBuffer object.

• We can insert a string at a specified position by using the insert method.
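
A brief sketch of append and insert on a StringBuffer. The text being built and the method name are arbitrary; append(String), append(int), and insert(int, String) are standard StringBuffer methods.

```java
class AppendInsertDemo {
    static String build() {
        StringBuffer buffer = new StringBuffer("Java");
        buffer.append(" version ");   // append a String to the end
        buffer.append(5);             // an int is converted to "5" before appending
        buffer.insert(0, "Using ");   // insert a String at index 0
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```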

The StringBuilder Class

• This class was added in Java 5.0 (SDK 1.5).

• It was introduced as a faster alternative to the StringBuffer class; unlike StringBuffer, its methods are not synchronized.

• StringBuffer and StringBuilder support the same set of methods, so in single-threaded code they are interchangeable.
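
To illustrate the interchangeability, here is a sketch of the earlier vowel-replacement loop written with StringBuilder instead of StringBuffer. The method name and the use of indexOf to test for a vowel are choices made for this example, not part of the original sample.

```java
class StringBuilderDemo {
    // Replaces each vowel with the given character, editing the builder in place.
    static String replaceVowels(String sentence, char replacement) {
        StringBuilder builder = new StringBuilder(sentence);
        for (int index = 0; index < builder.length(); index++) {
            // indexOf returns -1 when the character is not one of the vowels.
            if ("aeiouAEIOU".indexOf(builder.charAt(index)) >= 0) {
                builder.setCharAt(index, replacement);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceVowels("Java Diva", 'X'));
    }
}
```

Changing the two occurrences of StringBuilder to StringBuffer leaves the rest of the code as it is.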

The StringBuilder Class

• There are advanced cases where we must use StringBuffer, but in all sample applications in the book, StringBuilder can be used.

• Since performance is not our main concern and the StringBuffer class is usable with all versions of Java, we will use only StringBuffer in this book.