USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS İlknur Durgar El-Kahlout and Kemal Oflazer Sabancı University İstanbul, Turkey

Upload: ayanna-dodgen

Post on 01-Apr-2015


Page 1

USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS

İlknur Durgar El-Kahlout and Kemal Oflazer

Sabancı University, İstanbul, Turkey

Page 2

Problem

For a given definition, find the appropriate word (or words).

A traditional dictionary is of no use, since it is indexed by words rather than by meanings. Instead, find a dictionary word that has a “similar” definition.

Page 3

Examples

User definition:

Akımı ölçmek için kullanılan alet (A device that is used to measure the current)

In the dictionary:

akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)

?

Page 4

Applications

Computer-assisted language learning
Solving crossword puzzles
Reverse dictionary

Page 5

Outline

Problem statement
Meaning-to-Word System (MTW)
Our Approach
Methods
Results
Result Summary
Conclusion

Page 6

Problem Statement

Find the “similarity” between two definitions:

Akımı ölçmek için kullanılan alet
(A device that is used to measure the current)

Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
(a device that measures the intensity of electrical current, amperemeter)

Page 7

Meaning-to-Word (MTW) addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition.

Two subproblems:
Finding words whose definitions are “similar” to the query in some sense
Ranking the candidate words using a variety of criteria

Page 8

Information Flow in MTW

User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words

Page 9

Available Resources

Turkish monolingual dictionary: about 50,000 entries
Turkish WordNet: about 11,000 synsets

Page 10

Information Flow in MTW: Normalization

User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words

Page 11

Normalization

Tokenization
Stemming
Stop-word elimination
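The three normalization steps can be illustrated with a short Python sketch. It is not the MTW implementation: the stop-word list and the stem lexicon are hypothetical toy sets, and the "stemmer" simply takes the longest lexicon entry that is a prefix of each token, standing in for a real Turkish morphological analyzer.

```python
import re

# Hypothetical stop words (toy list; the actual MTW list is not given).
STOP_WORDS = {"için", "daha", "hiç", "ve", "bir"}

# Hypothetical stem lexicon standing in for a Turkish morphological analyzer.
STEM_LEXICON = {"ak", "akı", "akım", "ölç", "iç", "için", "kul", "kullan",
                "alet", "önce", "evlen", "kişi"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters.

    Note: str.lower() is not Turkish-aware ('I' becomes 'i', not 'ı'),
    so a real system needs locale-specific case folding.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def longest_stem(token):
    """Return the longest lexicon stem that is a prefix of the token,
    or the token itself when no stem matches."""
    for end in range(len(token), 0, -1):
        if token[:end] in STEM_LEXICON:
            return token[:end]
    return token


def normalize(text):
    """Tokenize, stem, then drop stop words (the order listed above)."""
    stems = [longest_stem(token) for token in tokenize(text)]
    return [stem for stem in stems if stem not in STOP_WORDS]


print(normalize("Akımı ölçmek için kullanılan alet"))  # ['akım', 'ölç', 'kullan', 'alet']
print(normalize("daha önce hiç evlenmemiş kişi"))      # ['önce', 'evlen', 'kişi']
```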

Page 12

Information Flow in MTW: Query Processing

User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words

Page 13

Query Processing: Subset Generation

Search with different sets of words, selecting the informative words from the user’s query.

Query: daha önce hiç evlenmemiş kişi (a person who has never been married)

Informative words: {önce, evlen, kişi} (before, marry, person)

Two-word subsets: {evlen, kişi} (marry, person), {önce, kişi} (before, person), {önce, evlen} (before, marry)

One-word subsets: {evlen} (marry), {önce} (before), {kişi} (person)
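Subset generation can be sketched as enumerating every non-empty subset of the informative words, largest first. This minimal Python version uses the standard library's `itertools.combinations`; the function name `generate_subsets` is our own, not MTW's.

```python
from itertools import combinations


def generate_subsets(words):
    """Return all non-empty subsets of `words`, largest subsets first.

    Within one size, subsets follow the order produced by
    itertools.combinations, i.e. the order of `words`.
    """
    subsets = []
    for size in range(len(words), 0, -1):
        subsets.extend(combinations(words, size))
    return subsets


for subset in generate_subsets(["önce", "evlen", "kişi"]):
    print(subset)
```

Because n words give 2^n - 1 subsets, the enumeration grows quickly with query length, which presumably motivates keeping only the informative words.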

Page 14

Query Processing: Subset Sorting

An unordered list of subsets is insufficient, so the generated subsets are ranked:

1) By the number of words: {önce, evlen, kişi} (before, marry, person) comes before {evlen, kişi} (marry, person)

2) By the sum of the logarithms of the word frequencies: {evlen, kişi} (marry, person) comes before {önce, kişi} (before, person)
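A possible implementation of the two sorting keys follows. The slides do not state the direction of the frequency key; this sketch assumes that a lower sum of log frequencies (rarer, more informative words) ranks first. The counts in `FREQ` are hypothetical values picked only so that the output reproduces the order shown on the slide.

```python
import math
from itertools import combinations

# Hypothetical word frequencies (not real MTW corpus counts).
FREQ = {"önce": 5000, "evlen": 200, "kişi": 100}


def log_freq_sum(subset, freq):
    """Sum of log frequencies; unseen words count as frequency 1 (log 0)."""
    return sum(math.log(freq.get(word, 1)) for word in subset)


def sort_subsets(words, freq):
    """Rank subsets: more words first, then (assumption) a lower
    log-frequency sum first, i.e. subsets of rarer words first."""
    subsets = [c for size in range(1, len(words) + 1)
               for c in combinations(words, size)]
    return sorted(subsets, key=lambda s: (-len(s), log_freq_sum(s, freq)))


for subset in sort_subsets(["önce", "evlen", "kişi"], FREQ):
    print(subset)
```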

Page 15

Information Flow in MTW: Searching for Meanings

User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words

Page 16

Searching for Meanings

Two methods:
Stem matching
Query expansion (using WordNet)

Page 17

Stem Matching

Morphological normalization of words

Find dictionary meanings that contain morphological variants of the words in the original definition

Page 18

Stem Matching (Example)

(A device that is used to measure the current)

{ akımı ölçmek için kullanılan alet }

Candidate stems of each query word:
akımı: ak (white), akı (flux), akım (current)
ölçmek: ölç (measure)
için: için (to), iç (drink)
kullanılan: kullan (use), kul (slave)
alet: alet (device)

Colored stems are the matching ones
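Stem matching can be sketched with an inverted index from stems to dictionary headwords. Everything here is illustrative: `STEM_LEXICON` replaces a morphological analyzer, the second dictionary entry (`termometre`, thermometer) is hypothetical, and the `longest_only` flag only mimics the "longest stem included" heuristic reported in the results.

```python
import re
from collections import defaultdict

# Hypothetical stem lexicon standing in for a Turkish morphological analyzer.
STEM_LEXICON = {"ak", "akı", "akım", "ölç", "iç", "için", "kul", "kullan",
                "alet", "araç", "elektrik", "şiddet", "yara", "sıcak"}

# Toy dictionary: the ammeter entry from the slides plus a hypothetical
# entry "termometre" (thermometer: a device used to measure temperature).
DICTIONARY = {
    "akımölçer": "elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre",
    "termometre": "sıcaklığı ölçmeye yarayan alet",
}


def tokenize(text):
    """Lowercase and split into runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def candidate_stems(token, longest_only=False):
    """Lexicon stems that are prefixes of the token, shortest first.

    With longest_only=True only the longest one is kept. A token with
    no known stem stands for itself.
    """
    stems = [token[:end] for end in range(1, len(token) + 1)
             if token[:end] in STEM_LEXICON]
    if not stems:
        return [token]
    return stems[-1:] if longest_only else stems


def build_index(dictionary):
    """Inverted index: stem -> headwords whose definition contains it.
    Definitions are always indexed with all of their candidate stems."""
    index = defaultdict(set)
    for headword, definition in dictionary.items():
        for token in tokenize(definition):
            for stem in candidate_stems(token):
                index[stem].add(headword)
    return index


def search(query_words, index, longest_only=False):
    """Headwords matching every query word through at least one stem."""
    result = None
    for word in query_words:
        hits = set()
        for stem in candidate_stems(word, longest_only):
            hits |= index.get(stem, set())
        result = hits if result is None else result & hits
    return result or set()


index = build_index(DICTIONARY)
print(sorted(search(["akımı", "ölçmek"], index)))  # ['akımölçer']
print(sorted(search(["ölçmek"], index)))           # ['akımölçer', 'termometre']
print(sorted(search(["alet"], index)))             # ['termometre']
```

The last call shows a weakness of pure stem matching: alet and araç both mean "device", yet the ammeter entry is not retrieved for alet.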

Page 19

Stem Matching

(A device that is used to measure the current)

akımı ölçmek için kullanılan alet

elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre

(a device that measures the intensity of electrical current, amperemeter)


Page 21

Stem Matching: Drawbacks

Generates noisy stems: ilim (science, my city) yields both ilim (science) and il (city)

Conflates two words with very different meanings to the same stem: ilim (science, my city) and ilde (in the city) both yield il (city)

Cannot find relations between similar words:
kimse (someone) and kişi (person)
bölüm (part) and kısım (portion)

Page 22

Using Query Expansion

Two different approaches:
Expand the query with related words (synonyms, specializations, generalizations)
Expand the query with the relevant answers returned by the unexpanded query

MTW uses WordNet synonyms, for example:
{besin, gıda} (food, nourishment)
{iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)

Page 23

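Query Expansion (Example)

(A device that is used to measure the current)

{ akımı ölçmek için kullanılan alet }

Candidate stems of each query word:
akımı: ak (white), akı (flux), akım (current)
ölçmek: ölç (measure)
için: için (to), iç (drink)
kullanılan: kullan (use), kul (slave)
alet: alet (device)

Synonyms added from WordNet:
stems of akımı: beyaz, debi, akış
stems of kullanılan: faydalan, yararlan, köle
alet: araç, gereç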
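Synonym-based expansion can be sketched as replacing each query stem with the union of all synsets that contain it. The synsets and the pre-stemmed definitions below are toy data modeled on these slides, not the contents of the Turkish WordNet, and merging all senses of an ambiguous stem (such as iyileş) is a simplifying assumption.

```python
# Toy synsets modeled on the examples in these slides (NOT the real
# Turkish WordNet content).
SYNSETS = [
    {"besin", "gıda"},
    {"iyileş", "düzel"},
    {"iyileş", "geliş"},
    {"kullan", "faydalan", "yararlan"},
    {"alet", "araç", "gereç"},
]

# Hypothetical pre-stemmed definitions; "termometre" is an invented entry.
DEFINITIONS = {
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"],
    "termometre": ["sıcak", "ölç", "yara", "alet"],
}


def expand(stem):
    """Return the stem plus every synonym from each synset containing it
    (no word-sense disambiguation, so all senses are merged)."""
    expanded = {stem}
    for synset in SYNSETS:
        if stem in synset:
            expanded |= synset
    return expanded


def search_expanded(query_stems, definitions):
    """Headwords whose definition contains, for every query stem,
    the stem itself or one of its synonyms."""
    return {headword for headword, stems in definitions.items()
            if all(expand(q) & set(stems) for q in query_stems)}


print(sorted(expand("iyileş")))                              # ['düzel', 'geliş', 'iyileş']
print(search_expanded(["akım", "ölç", "alet"], DEFINITIONS))  # {'akımölçer'}
```

Without expansion, alet would not match araç and the ammeter entry would be missed for this three-word query.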

Page 24

Query Expansion (Example)

(A device that is used to measure the current)

akımı ölçmek için kullanılan alet

elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre

(a device that measures the intensity of electrical current, amperemeter)


Page 26

Information Flow in MTW: Ranking

User Definition → (query) → Search in Dictionary → (candidates) → Rank Candidates → List of words

Page 27

Ranking

A very important part of MTW.

Having the right answer in the retrieved set is not enough.

The aim is to have the right answer near the top of the retrieved set (e.g., within the top 50 answers).

Page 28

Ranking

Simple but effective methods:
Number of matched words
Subset informativeness: the frequency of the words in the subset
Ratio of the number of matched words to the number of words in the candidate dictionary definition
Longest common subsequence: the order of the matched words
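The four ranking criteria can be combined into one score tuple, as in the sketch below. The slides do not say how the criteria are combined; comparing them lexicographically in the listed order is an assumption, as are the frequencies in `FREQ` and the candidate definitions. The longest common subsequence uses the standard dynamic-programming recurrence.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two word sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score(query, definition, freq):
    """Score tuple from the four criteria, compared lexicographically:
    matched-word count, informativeness (rarer matches score higher),
    matched/definition length ratio, and word-order agreement (LCS)."""
    matched = [w for w in query if w in definition]
    informativeness = -sum(math.log(freq.get(w, 1)) for w in matched)
    ratio = len(matched) / len(definition) if definition else 0.0
    return (len(matched), informativeness, ratio, lcs_length(query, definition))


def rank(query, candidates, freq):
    """Sort candidate headwords by descending score."""
    return sorted(candidates,
                  key=lambda h: score(query, candidates[h], freq),
                  reverse=True)


# Hypothetical frequencies and pre-stemmed candidate definitions.
FREQ = {"akım": 50, "ölç": 300, "alet": 400}
CANDIDATES = {
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"],
    "termometre": ["sıcak", "ölç", "yara", "alet"],
}
print(rank(["akım", "ölç", "alet"], CANDIDATES, FREQ))  # ['akımölçer', 'termometre']
```

Both candidates match two query words here, so the tie is broken by informativeness: akım is assumed rarer than alet.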

Page 29

Some Statistics

Training sets: 50 queries from users; 50 queries from a dictionary

Test sets: 50 queries from users; 50 queries from a separate dictionary

                         Test set 1 (user)   Training set 1   Test set 2 (dict.)   Training set 2
# of queries             50                  50               50                   50
Avg. # of query words    5.66                4.64             9.24                 13.98
Max. # of query words    17                  12               23                   45
Min. # of query words    2                   1                1                    6

Page 30

Stem Matching (all stems included)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        13 (26%)     18 (36%)         45 (90%)     41 (82%)
11-50       7 (14%)      12 (24%)         2 (4%)       5 (10%)
>50         19 (38%)     10 (20%)         3 (6%)       4 (8%)
Not found   11 (22%)     10 (20%)         0 (0%)       0 (0%)

Low percentage in the top 10 for user queries, but very high results for dictionary queries.
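Each result table groups the queries by the rank at which the right word appears. A small helper that produces these bands from per-query ranks might look like the following; the function name and the use of None for "not found" are our own choices.

```python
def rank_buckets(ranks):
    """Count queries per rank band, as in the result tables.

    `ranks` holds, for each query, the 1-based rank of the right word in
    the returned list, or None when the word was not retrieved at all.
    Returns {band: (count, percentage)}.
    """
    counts = {"1-10": 0, "11-50": 0, ">50": 0, "Not found": 0}
    for r in ranks:
        if r is None:
            counts["Not found"] += 1
        elif r <= 10:
            counts["1-10"] += 1
        elif r <= 50:
            counts["11-50"] += 1
        else:
            counts[">50"] += 1
    total = max(len(ranks), 1)  # avoid division by zero on an empty list
    return {band: (n, 100.0 * n / total) for band, n in counts.items()}


print(rank_buckets([3, 12, None, 75]))
# {'1-10': (1, 25.0), '11-50': (1, 25.0), '>50': (1, 25.0), 'Not found': (1, 25.0)}
```

Summing the "1-10" and "11-50" bands gives the share of queries answered within the top 50.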

Page 31

Stem Matching (longest stem included, heuristics)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     21 (42%)         46 (92%)     43 (86%)
11-50       5 (10%)      9 (18%)          1 (2%)       5 (10%)
>50         18 (36%)     9 (18%)          3 (6%)       2 (4%)
Not found   13 (26%)     11 (22%)         0 (0%)       0 (0%)

Improvement on user queries, slightly better performance on dictionary queries.

Page 32

Query Expansion (WordNet, all stems included)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         45 (90%)     41 (82%)
11-50       9 (18%)      9 (18%)          2 (4%)       5 (10%)
>50         18 (36%)     12 (24%)         3 (6%)       4 (8%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)

Better results on user queries, no change on dictionary queries.

Page 33

Query Expansion (WordNet, longest stem included, heuristics)

Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         41 (82%)     39 (78%)
11-50       6 (12%)      8 (16%)          5 (10%)      6 (12%)
>50         21 (42%)     13 (26%)         1 (2%)       5 (10%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)

Better performance than ‘longest stem matching’ on user queries, but worse performance on dictionary queries.

Page 34

Result Summary

Stem matching (longest stem included): 60% success on real user queries, 96% success on dictionary queries

Query expansion (all stems included): 68% success on real user queries, 92% success on dictionary queries

Page 35

Conclusion

We have implemented a ‘Meaning to Word’ system for Turkish.

Results on unseen data are rather satisfactory.

Query expansion performs better, although it cannot find the words for all queries.

68% of real user queries and 90% of dictionary queries are found in the first 50 results.

Page 36

THANK YOU!