Methods in Computational Linguistics II
Queens College
Lecture 6: Comparing Things
Word Similarity
Today
• List Comprehensions
• Determining Word Similarity
• Co-occurrences
• WordNet
List Comprehensions
• Compact way to process every item in a list.
• [x for x in array]
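As a minimal sketch, the comprehension above does the same work as a for loop that appends each item to a new list. The contents of `array` are assumed here for illustration only.

```python
# Hypothetical example list; the contents are assumed for illustration.
array = ["the", "cat", "sat", "on", "that", "mat"]

# Loop form: visit every item and append it to a new list.
copied_loop = []
for x in array:
    copied_loop.append(x)

# Comprehension form: the same result in a single expression.
copied = [x for x in array]

print(copied == copied_loop)
```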
Methods
• Methods and functions can be applied to the iterating variable, x.
• Their return values are stored in the resulting list.
• [len(x) for x in array]
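A short sketch of this idea, with an assumed example list: the expression on the left of `for` is evaluated once per item, and each result becomes an element of the new list.

```python
# Hypothetical example list; the contents are assumed for illustration.
array = ["the", "cat", "sat", "on", "that", "mat"]

# Apply a built-in function to each item.
lengths = [len(x) for x in array]

# Apply a string method to each item.
uppercased = [x.upper() for x in array]

print(lengths)
print(uppercased)
```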
Conditionals
• Elements of the original list can be left out of the resulting list by adding an if condition.
• [x for x in array if len(x) == 3]
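For illustration, here is the filter applied to an assumed list of words. Only items for which the condition is true reach the result.

```python
# Hypothetical example list; the contents are assumed for illustration.
array = ["the", "cat", "sat", "on", "that", "mat"]

# Keep only the three-letter words; "on" and "that" are filtered out.
three_letter = [x for x in array if len(x) == 3]

print(three_letter)
```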
Building up
• Method calls and conditions can be combined to build up more complicated lists.
• [x.upper() for x in array if len(x) > 3 and x.startswith('t')]
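A sketch of the combined form on an assumed word list. The condition is checked first, and the method call is applied only to the items that pass.

```python
# Hypothetical example list; the contents are assumed for illustration.
array = ["the", "cat", "that", "table", "mat", "tree"]

# Keep words longer than three letters that start with "t", then uppercase them.
result = [x.upper() for x in array if len(x) > 3 and x.startswith("t")]

print(result)
```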
Lists Containing Lists
• Lists can contain lists
• [[a, 1], [b, 2], [d, 4]]
• ...or tuples
• [(a, 1), (b, 2), (d, 4)]
• [ [d, d*d] for d in array if d < 4]
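The last comprehension above can be sketched as follows, assuming `array` holds small integers. Each element of the result is itself a two-item list (or tuple) pairing a number with its square.

```python
# Hypothetical example list of integers, assumed for illustration.
array = [1, 2, 3, 4, 5]

# Each result element is a list: [number, number squared].
square_lists = [[d, d * d] for d in array if d < 4]

# The same idea with tuples instead of inner lists.
square_tuples = [(d, d * d) for d in array if d < 4]

print(square_lists)
print(square_tuples)
```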
Lists within lists are often called 2-d arrays
• This is another way we store tables.
• Similar to nested dictionaries.
• a = [[0,1], [1,0]]
• a[1][1]
• a[0][0]
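A brief sketch of the indexing: the first index selects a row (an inner list) and the second selects a position within that row.

```python
# A 2-by-2 table stored as a list of rows.
a = [[0, 1], [1, 0]]

first_row = a[0]        # the inner list [0, 1]
bottom_right = a[1][1]  # row 1, column 1
top_left = a[0][0]      # row 0, column 0
top_right = a[0][1]     # row 0, column 1

print(first_row, bottom_right, top_left, top_right)
```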
Using multiple lists
• Multiple lists can be processed in a single list comprehension; every x is combined with every y.
• [x*y for x in array1 for y in array2]
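As a sketch with two assumed lists: two `for` clauses behave like nested loops, so the result has one entry per (x, y) combination. If the intent is instead to walk both lists in step, the built-in `zip` is one way to do that; it is shown here for contrast and is not part of the slide.

```python
# Hypothetical example lists, assumed for illustration.
array1 = [1, 2, 3]
array2 = [10, 20]

# Two for clauses act like nested loops: every x with every y.
all_products = [x * y for x in array1 for y in array2]

# For contrast: zip pairs items position by position and stops at the shorter list.
paired_products = [x * y for x, y in zip(array1, array2)]

print(all_products)
print(paired_products)
```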
Co-occurrences
• How would you identify common co-occurrences?
• Define a co-occurrence:
– “school bus” vs. “school river”
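One possible answer, sketched under an assumption the slide does not state: treat a co-occurrence as two words appearing side by side, then count how often each adjacent pair appears. The toy sentence is invented for illustration, and other definitions (for example, words within the same sentence or window) are equally reasonable.

```python
from collections import Counter

# Hypothetical toy text, invented for illustration.
text = "the school bus stopped near the school and the school bus left"
words = text.split()

# Assumed definition: a co-occurrence is a pair of adjacent words.
pairs = [(words[i], words[i + 1]) for i in range(len(words) - 1)]

# Count how many times each pair appears.
counts = Counter(pairs)

print(counts[("school", "bus")])
print(counts[("school", "river")])
print(counts.most_common(1))
```

With counts like these, a frequent pair such as "school bus" stands out from a pair such as "school river" that never appears in the text.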
How are words related?
Some relations
Anything else?
• What relationships would you like to know about between words?
WordNet
Synsets
Other relationships in WordNet
WordNet Similarity
WordNet Similarity
Word sense disambiguation
Stemming and Lemmatizing
Stemming and Lemmatization in NLTK
WordNet Demo
Next Time
• Word Similarity
– WordNet
• Data structures
– 2-d arrays
– Trees
– Graphs