Computation with strings 2
Day 2 - 8/29/14
LING 3820 & 6820
Natural Language Processing
Harry Howard
Tulane University
Course organization
29-Aug-2014, NLP, Prof. Howard, Tulane University
http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction. http://www.tulane.edu/~howard/CompCultEN/
Is there anyone here that wasn't here on Wednesday?
I didn't put together any practice, because we have done too little.
I will e-mail you some practice to do over the weekend.
Computer hygiene
You must turn your computer off every now and then, so that it can clean itself.
By the same token, you should close applications every now and then.
Review
What is a string?
What is an escape character?
What do these do: +, *, len(), sorted(), set()?
What is the difference between a type & a token?
Does Python know what you mean?
§3. Computation with strings
A string is a sequence of characters enclosed in single or double quotes.
Open Spyder
Method notation
The material appended to a method in parentheses is called its argument(s).
In the examples above, the argument S can be thought of linguistically as the object of a noun: the length of S, the alphabetical sorting of S, the set of S. But what if a method needs two pieces of information to work, for instance, to count the number of o’s in otolaryngologist?
To do so, Python allows information to be prefixed to a method with a dot:
>>> S.count('o')
The example can be read as “in S, count the o’s”, with the argument being the substring to be counted, 'o', and the attribute being the string over which the count proceeds, or more generally:
attribute.method(argument)
What can serve as attribute and argument varies from method to method and so must be memorized.
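As a minimal sketch of the two notations, assuming S holds the word otolaryngologist from the question above (the names length and o_count are only illustrative):

```python
# S is assumed to hold the example word from the slide.
S = 'otolaryngologist'

# Function notation: the string is the single argument.
length = len(S)

# Method notation: the string comes before the dot,
# and the substring to be counted is the argument.
o_count = S.count('o')

print(length)   # 16
print(o_count)  # 4
```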
How to clean up a string
There is a group of methods for modifying the properties of a string, illustrated below. You can guess what they do from their names:
>>> S = 'i lOvE yOu'
>>> S
>>> S.lower()
>>> S.upper()
>>> S.swapcase()
>>> S.capitalize()
>>> S.title()
>>> S.replace('O','o')
>>> S.strip('i')
>>> S2 = ' '+S+' '
>>> S2
>>> S2.strip()
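For checking your guesses, here is a sketch of the same calls with the values they should return shown as comments. The variable names on the left are only illustrative. Note that each method returns a new string, and S itself is left unchanged.

```python
S = 'i lOvE yOu'

lowered = S.lower()              # 'i love you'
uppered = S.upper()              # 'I LOVE YOU'
swapped = S.swapcase()           # 'I LoVe YoU'
capitalized = S.capitalize()     # 'I love you'
titled = S.title()               # 'I Love You'
replaced = S.replace('O', 'o')   # 'i lovE you'
stripped_i = S.strip('i')        # ' lOvE yOu' (only the leading i goes)

# Pad S with a space on each side, then strip the whitespace off again.
S2 = ' ' + S + ' '
stripped = S2.strip()            # 'i lOvE yOu'

# None of the calls above changed S.
print(S)                         # i lOvE yOu
```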
3.3. How to find your way around a string
index() or rindex()
You can ask Python for a character’s index with the index() or rindex() methods, which take the string as an attribute and the character as an argument:
>>> S = 'otolaryngologist'
>>> S.index('o')
>>> S.rindex('o')
>>> S.index('t')
>>> S.rindex('t')
>>> S.index('l')
>>> S.rindex('l')
>>> S.index('a')
>>> S.rindex('a')
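The following sketch runs the same calls and records the expected results as comments, so that you can compare them with what you see in Spyder. The variable names are only illustrative. index() reports the first occurrence counting from the left, and rindex() reports the last one.

```python
S = 'otolaryngologist'

first_o = S.index('o')    # 0
last_o = S.rindex('o')    # 11
first_t = S.index('t')    # 1
last_t = S.rindex('t')    # 15
first_l = S.index('l')    # 3
last_l = S.rindex('l')    # 10

# 'a' occurs only once, so both methods agree.
first_a = S.index('a')    # 4
last_a = S.rindex('a')    # 4

print(first_o, last_o, first_t, last_t, first_l, last_l, first_a, last_a)
```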
find() & rfind()
Python also has a method find(), which appears to do the same thing as index():
>>> S.find('o')
>>> S.rfind('o')
>>> S.find('t')
>>> S.rfind('t')
>>> S.find('l')
>>> S.rfind('l')
>>> S.find('a')
>>> S.rfind('a')
index() or find()
They differ in how they handle a search that finds nothing: find() returns -1, while index() raises an error:
>>> S.find('z')
-1
>>> S.index('z')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
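In practice the difference decides how you write the code around the search. The sketch below is one possible way to handle each case, assuming S is still otolaryngologist. The name index_failed is only illustrative.

```python
S = 'otolaryngologist'

# find() signals failure with -1, so the result can be tested directly.
position = S.find('z')
if position == -1:
    print("find: 'z' does not occur in S")

# index() signals failure with an exception, so it has to be caught.
index_failed = False
try:
    position = S.index('z')
except ValueError as error:
    index_failed = True
    print('index:', error)
```

Which one to use is a design choice: find() suits a quick test, while index() makes sure a missing substring cannot go unnoticed.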
How to find substrings
These two methods can also find substrings:
>>> S.find('oto')
>>> S.index('oto')
>>> S.find('ist')
>>> S.index('ist')
>>> S.find('ly')
>>> S.index('ly')
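A short sketch of what to expect, again assuming S is otolaryngologist: the number returned is the position at which the substring begins. The variable names are only illustrative.

```python
S = 'otolaryngologist'

oto_start = S.find('oto')   # 0, the substring begins the word
ist_start = S.find('ist')   # 13
ly_start = S.find('ly')     # -1, l and y both occur but never side by side

# index() agrees with find() whenever the substring is present.
same_answer = S.index('ist') == ist_start

print(oto_start, ist_start, ly_start, same_answer)
```

Since 'ly' is absent, S.index('ly') would raise a ValueError rather than return -1.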
Limiting the search to a substring
index() and find() allow optional arguments for the beginning and end positions of a substring, in order to limit searching to a substring’s confines:
>>> S.index('oto', 0, 3)
>>> S.index('oto', 3)
>>> S.find('oto', 0, 3)
>>> S.find('oto', 3)
index/find(string, beginning, end)
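The sketch below uses find() to show the effect of the limits without triggering an error, assuming S is otolaryngologist. The variable names are only illustrative. Two details are worth noticing: the end position itself is not included in the search, and the position reported is still counted from the start of the whole string.

```python
S = 'otolaryngologist'

in_first_three = S.find('oto', 0, 3)   # 0, positions 0 to 2 are searched
from_three = S.find('oto', 3)          # -1, no 'oto' from position 3 onward
too_short = S.find('oto', 0, 2)        # -1, 'oto' does not fit in positions 0 to 1

# The answer is a position in S, not in the part that was searched.
later_o = S.find('o', 3)               # 9

print(in_first_three, from_three, too_short, later_o)
```

With the same arguments, S.index('oto', 3) would raise a ValueError instead of returning -1.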
3.3.2. Zero-based indexation
0 = 1
You probably thought that the first character in a string should be given the number 1, but Python actually gives it 0, and the second character gets 1.
There are some advantages to this convention which do not concern us here, but we will mention a real-world example. In Europe, the floors of buildings are numbered so that the ground floor is considered the zeroth one, and the first floor up from the ground is the first floor, though in the USA it would be called the second floor.
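To see the numbering for yourself, the following sketch pairs every character with its position by means of the built-in enumerate(), which has not been introduced in class yet. The names positions and last_position are only illustrative.

```python
S = 'otolaryngologist'

# enumerate() pairs each character with its position, starting from 0.
positions = list(enumerate(S))
for position, character in positions:
    print(position, character)

# Because counting starts at 0, the last position is one less than the length.
last_position = len(S) - 1
print(last_position)   # 15
```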
In a picture
Finding characters given a position
>>> S = 'abcdefgh'
>>> S[2]
>>> S[5]
>>> S[2:5]
>>> S[-6]
>>> S[-3]
>>> S[-6:-3]
>>> S[-6:-3] == S[2:5]
>>> S[-6:5]
>>> S[5:-6]
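Here is a sketch of the same expressions with the expected values as comments. Negative positions count backwards from the end of the string, so in this eight-character string -6 names the same character as 2. repr() is used only so that the empty string shows up as a pair of quotes.

```python
S = 'abcdefgh'

print(S[2])                 # c
print(S[5])                 # f
print(S[2:5])               # cde (position 5 itself is left out)
print(S[-6])                # c
print(S[-3])                # f
print(S[-6:-3])             # cde
print(S[-6:-3] == S[2:5])   # True

# Positive and negative positions can be mixed.
print(S[-6:5])              # cde

# A slice whose start lies after its end is empty.
print(repr(S[5:-6]))        # ''
```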
More slicing
If no beginning or end position is mentioned for a slice, Python defaults to the beginning or end of the string:
>>> S[2:]
>>> S[-2:]
>>> S[:2]
>>> S[:-2]
>>> S[:]
The result of a slice is a string object, so it can be concatenated with another string or repeated:
>>> S[:-1] + '!'
>>> S[:2] + S[2:]
>>> S[:2] + S[2:] == S
>>> S[-2:] * 2
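The expected values, as a sketch that assumes S is still 'abcdefgh':

```python
S = 'abcdefgh'

# A missing start defaults to the beginning, a missing end to the end.
print(S[2:])     # cdefgh
print(S[-2:])    # gh
print(S[:2])     # ab
print(S[:-2])    # abcdef
print(S[:])      # abcdefgh

# Slices are strings, so + and * work on them.
print(S[:-1] + '!')          # abcdefg!
print(S[:2] + S[2:])         # abcdefgh
print(S[:2] + S[2:] == S)    # True
print(S[-2:] * 2)            # ghgh
```

The line before last shows a handy property: cutting a string at any position and gluing the two halves back together gives the original string.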
Extended slicing
Slice syntax allows a mysterious third argument, added by appending another colon and an integer. What do these do?
>>> S[::1]
>>> S[::2]
>>> S[::3]
>>> S[::4]
All three arguments together
Of course, you can still use the first two arguments to slice out a substring, which the third one steps through:
>>> S[1:7:1]
>>> S[1:7:2]
>>> S[1:7:3]
>>> S[1:7:6]
Thus the overall format of a slice is:
string[start:end:step]
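A sketch of the step at work, assuming S is still 'abcdefgh'. A step of n keeps the character at the start position and then every nth character after it, stopping before the end position.

```python
S = 'abcdefgh'

# Step only: walk through the whole string.
print(S[::1])     # abcdefgh
print(S[::2])     # aceg
print(S[::3])     # adg
print(S[::4])     # ae

# Start, end and step: walk through the substring S[1:7], i.e. bcdefg.
print(S[1:7:1])   # bcdefg
print(S[1:7:2])   # bdf
print(S[1:7:3])   # be
print(S[1:7:6])   # b
```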
How to reverse a string
>>> S[::-1]
>>> S[::-2]
>>> S[::-3]
>>> S[::-4]
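A negative step walks through the string from right to left. The sketch below assumes S is still 'abcdefgh'. The name reversed_S is only illustrative.

```python
S = 'abcdefgh'

# A step of -1 visits every character, starting from the end.
reversed_S = S[::-1]
print(reversed_S)   # hgfedcba

# Larger negative steps skip characters on the way back.
print(S[::-2])      # hfdb
print(S[::-3])      # heb
print(S[::-4])      # hd

# Reversing twice gives back the original string.
print(reversed_S[::-1] == S)   # True
```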
Next time
The rest of §3
I will send you some practice for what we have done this week.