USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS
İlknur Durgar El-Kahlout and Kemal Oflazer
Sabancı University, İstanbul, Turkey
Problem
For a given definition, find the appropriate word (or words)
A traditional dictionary is of no use
From a dictionary, find an appropriate word that has a “similar” definition
Examples
User definition:
Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
In the dictionary:
akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)
Applications
Computer-assisted language learning
Solving crossword puzzles
Reverse dictionary
Outline
Problem Statement
Meaning-to-Word System (MTW)
Our Approach
Methods
Results
Result Summary
Conclusion
Problem Statement
Find the “similarity” between two definitions:
Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
Meaning-to-Word (MTW)
MTW addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition
Two subproblems:
Finding words whose definitions are "similar" to the query in some sense
Ranking the candidate words in a variety of ways
Information Flow in MTW
User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words
Available Resources
Turkish Monolingual Dictionary: about 50,000 entries
Turkish WordNet: about 11,000 synsets
Normalization
Tokenization
Stemming
Stop word elimination
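The three normalization steps can be sketched as below. This is only an illustration: the stop word list and the stem table are toy, hypothetical stand-ins for the real resources (a Turkish stop word list and a morphological analyzer), and the function names are not taken from MTW.

```python
import re

# Hypothetical resources; a real system would use a full Turkish stop word
# list and a morphological analyzer instead of these toy tables.
STOP_WORDS = {"daha", "hiç"}
STEMS = {"evlenmemiş": "evlen"}

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Note: str.lower() is not Turkish-aware (it maps "I" to "i", not "ı").
    """
    return re.findall(r"\w+", text.lower())

def normalize(text):
    """Tokenize, map each token to its stem, then drop stop words."""
    stems = [STEMS.get(token, token) for token in tokenize(text)]
    return [stem for stem in stems if stem not in STOP_WORDS]

print(normalize("daha önce hiç evlenmemiş kişi"))
```

With these toy tables, the query used in the later slides reduces to the stems önce, evlen, kişi.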
Query Processing: Subset Generation
Search with different sets of words
Select informative words from the user’s query
Query: daha önce hiç evlenmemiş kişi (a person who has never been married)
{önce, evlen, kişi} (before, marry, person)
{evlen, kişi} (marry, person), {önce, kişi} (before, person), {önce, evlen} (before, marry)
{evlen} (marry), {önce} (before), {kişi} (person)
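A minimal sketch of subset generation, assuming every non-empty subset of the normalized query stems is enumerated (the slides do not say whether MTW prunes any). The function name `generate_subsets` is hypothetical.

```python
from itertools import combinations

def generate_subsets(stems):
    """Return every non-empty subset of the query stems, largest first."""
    subsets = []
    for size in range(len(stems), 0, -1):
        subsets.extend(combinations(stems, size))
    return subsets

for subset in generate_subsets(["önce", "evlen", "kişi"]):
    print(subset)
```

A query with n stems yields 2^n - 1 subsets, so exhaustive enumeration is only practical for short queries.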
Query Processing: Subset Sorting
An unordered list of subsets is insufficient
Rank the generated subsets:
1) By the number of words: {önce, evlen, kişi} (before, marry, person), then {evlen, kişi} (marry, person)
2) By the sum of the logarithms of the word frequencies: {evlen, kişi} (marry, person), then {önce, kişi} (before, person)
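The two sorting criteria can be sketched as a single sort key. Several things here are assumptions rather than facts from the slides: the frequency values are invented, the second criterion is used as a tie-breaker for the first, and a lower sum of log-frequencies (rarer words) is treated as more informative.

```python
import math

# Hypothetical stem frequencies; real values would come from the dictionary
# or a corpus.
FREQ = {"önce": 900, "evlen": 40, "kişi": 300}

def subset_key(subset):
    """Sort key: more words first, then lower sum of log-frequencies first."""
    log_sum = sum(math.log(FREQ[word]) for word in subset)
    return (-len(subset), log_sum)

subsets = [("önce", "kişi"), ("evlen",), ("evlen", "kişi"),
           ("önce", "evlen", "kişi"), ("önce",)]
ranked = sorted(subsets, key=subset_key)
for subset in ranked:
    print(subset)
```

Under these assumed frequencies, {evlen, kişi} sorts before {önce, kişi}, which agrees with the order shown in the example above.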
Searching for Meanings
Two methods:
Stem Matching
Query Expansion (using WordNet)
Stem Matching
Morphological normalization of words
Find meanings that contain morphological variants of the words in the original definition
Stem Matching (Ex.)
Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
Candidate stems for each word:
akımı: ak (white), akım (current), akı (flux)
ölçmek: ölç (measure)
için: için (to), iç (drink)
kullanılan: kullan (use), kul (slave)
alet: alet (device)
On the slide, the matching stems are shown in color
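A rough sketch of stem matching on the example above, assuming each query word carries all of its candidate stems and each dictionary definition has already been reduced to a set of stems. Both tables are toy, hypothetical data (the second entry is invented for contrast); a real system would obtain the stems from a morphological analyzer.

```python
# Hypothetical analyzer output: every candidate stem of each query word.
QUERY_STEMS = {
    "akımı": {"ak", "akım", "akı"},
    "ölçmek": {"ölç"},
    "için": {"için", "iç"},
    "kullanılan": {"kullan", "kul"},
    "alet": {"alet"},
}

# Hypothetical pre-stemmed dictionary: headword -> stems of its definition.
DICTIONARY = {
    "akımölçer": {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"},
    "cetvel": {"düz", "çizgi", "çiz", "yara", "araç"},
}

def matched_words(query_stems, definition_stems):
    """Return the query words that share at least one stem with the definition."""
    return [word for word, stems in query_stems.items() if stems & definition_stems]

def search(query_stems, dictionary):
    """Return (headword, matched query words) for every definition with a match."""
    hits = []
    for headword, definition_stems in dictionary.items():
        matched = matched_words(query_stems, definition_stems)
        if matched:
            hits.append((headword, matched))
    return hits

print(search(QUERY_STEMS, DICTIONARY))
```

In this toy data only akımı and ölçmek match the akımölçer definition; alet and araç share no stem, so plain stem matching cannot relate them.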
Stem Matching (Ex.)
akımı ölçmek için kullanılan alet (A device that is used to measure the current)
elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
Stem Matching: Drawbacks
Generates noisy stems: ilim (science, my city) → ilim (science), il (city)
Conflates two words with very different meanings to the same stem: ilim (science, my city), ilde (in the city) → il (city)
Cannot find relations between similar words: kimse (someone) and kişi (person); bölüm (part) and kısım (portion)
Using Query Expansion
Two different approaches:
Expand the query with relations (synonyms, specializations, generalizations)
Expand the query with the relevant answers to the unexpanded query
WordNet synonyms are used in MTW:
{besin, gıda} (food, nourishment)
{iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)
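Synonym-based expansion can be sketched as follows. The synonym sets are copied from the examples above and stand in for WordNet synsets; this is a toy lookup, not a real WordNet API, and `expand` is a hypothetical name.

```python
# Toy synonym sets standing in for Turkish WordNet synsets.
SYNSETS = [
    {"besin", "gıda"},
    {"iyileş", "düzel"},
    {"iyileş", "geliş"},
]

def expand(stems, synsets):
    """Add every word that shares a synset with one of the query stems."""
    expanded = set(stems)
    for stem in stems:
        for synset in synsets:
            if stem in synset:
                expanded |= synset
    return expanded

print(sorted(expand({"besin"}, SYNSETS)))
print(sorted(expand({"iyileş"}, SYNSETS)))
```

The sketch does no sense disambiguation: a stem that appears in several synsets, such as iyileş, pulls in the synonyms of every sense.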
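Query Expansion (Ex.)
Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
Candidate stems for each word, with the synonyms added by expansion:
akımı: ak (white), akım (current), akı (flux); added: beyaz, debi, akış
ölçmek: ölç (measure)
için: için (to), iç (drink)
kullanılan: kullan (use), kul (slave); added: faydalan, yararlan, köle
alet: alet (device); added: araç, gereç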
Query Expansion (Ex.)
akımı ölçmek için kullanılan alet (A device that is used to measure the current)
elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
Ranking
A very important part of MTW
Having the right answer in the retrieved set is not enough
The aim is to have the right answer near the top of the retrieved set (e.g., in the first 50 answers)
Ranking
Simple but effective methods:
Number of matched words
Subset informativeness: frequency of the words in the subset
Ratio of the number of matched words to the number of words in the candidate dictionary definition
Longest Common Subsequence: order of the matched words
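Three of the ranking signals listed above can be sketched as below, assuming the query and the candidate definition are both lists of stems. The slides do not say how MTW combines the signals into a final score, so the sketch only computes them; the function names and the sample stems are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def ranking_features(query, definition):
    """Compute simple ranking signals for one candidate definition."""
    matched = len(set(query) & set(definition))
    return {
        "matched": matched,  # number of matched words
        "ratio": matched / len(definition) if definition else 0.0,
        "lcs": lcs_length(query, definition),  # rewards matches in the same order
    }

query = ["akım", "ölç", "kullan", "alet"]
definition = ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"]
features = ranking_features(query, definition)
print(features)
```

The ratio penalizes long definitions that match only a few query words, while the longest common subsequence favors candidates whose matched words appear in the same order as in the query.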
Some Statistics
Training sets: 50 queries from users, 50 queries from a dictionary
Test sets: 50 queries from users, 50 queries from a separate dictionary

                        Test set 1 (user)   Training set 1   Test set 2 (dict.)   Training set 2
# of queries            50                  50               50                   50
Avg. # of query words   5.66                4.64             9.24                 13.98
Max. # of query words   17                  12               23                   45
Min. # of query words   2                   1                1                    6
Stem Matching (all stems included)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        13 (26%)     18 (36%)         45 (90%)     41 (82%)
11-50       7 (14%)      12 (24%)         2 (4%)       5 (10%)
>50         19 (38%)     10 (20%)         3 (6%)       4 (8%)
Not found   11 (22%)     10 (20%)         0 (0%)       0 (0%)

Low percentage in the top 10 for user queries, but very high results for dictionary queries
Stem Matching (longest stem included, heuristics)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     21 (42%)         46 (92%)     43 (86%)
11-50       5 (10%)      9 (18%)          1 (2%)       5 (10%)
>50         18 (36%)     9 (18%)          3 (6%)       2 (4%)
Not found   13 (26%)     11 (22%)         0 (0%)       0 (0%)

Improvement in user queries, slightly better performance in dictionary queries
Query Expansion (WordNet) (all stems included)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         45 (90%)     41 (82%)
11-50       9 (18%)      9 (18%)          2 (4%)       5 (10%)
>50         18 (36%)     12 (24%)         3 (6%)       4 (8%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)

Better results in user queries, no change in dictionary queries
Query Expansion (WordNet) (longest stem included, heuristics)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         41 (82%)     39 (78%)
11-50       6 (12%)      8 (16%)          5 (10%)      6 (12%)
>50         21 (42%)     13 (26%)         1 (2%)       5 (10%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)

Better performance than ‘longest stem matching’ in user queries, but worse performance in dictionary queries
Result Summary
Stem Matching (longest stem included): 60% success in real user queries, 96% success in dictionary queries
Query Expansion (all stems included): 68% success in real user queries, 92% success in dictionary queries
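As a quick arithmetic check, assuming "success" means the correct word is ranked in the first 50 results, the two stem matching figures can be reproduced from the columns labelled Training set 1 and Training set 2 in the stem matching (longest stem included) table. The variable names are illustrative.

```python
QUERIES_PER_SET = 50

# Counts copied from the stem matching (longest stem included) table:
# rank 1-10 plus rank 11-50.
top50_counts = {
    "Training set 1": 21 + 9,
    "Training set 2": 43 + 5,
}

rates = {name: 100 * count / QUERIES_PER_SET for name, count in top50_counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f}% found in the first 50 results")
```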
Conclusion
We have implemented a ‘Meaning to Word’ system for Turkish
Results on unseen data are rather satisfactory
Query expansion is better, although it cannot find the words for all queries
68% of real user queries and 90% of dictionary queries are found in the first 50 results
THANK YOU !