Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Gerard de Melo and Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany, 2009-09-18

DESCRIPTION

Example sentences provide an intuitive means of grasping the meaning of a word, and are frequently used to complement conventional word definitions. When a word has multiple meanings, it is useful to have example sentences for specific senses (and hence definitions) of that word rather than indiscriminately lumping all of them together. In this paper, we investigate to what extent such sense-specific example sentences can be extracted from parallel corpora using lexical knowledge bases for multiple languages as a sense index. We use word sense disambiguation heuristics and a cross-lingual measure of semantic similarity to link example sentences to specific word senses. From the sentences found for a given sense, an algorithm then selects a smaller subset that can be presented to end users, taking into account both representativeness and diversity. Preliminary results show that a precision of around 80% can be obtained for a reasonable number of word senses, and that the subset selection yields convincing results.

TRANSCRIPT

Page 1: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Introduction | Example Extraction | Example Selection | Results | Conclusion

Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Gerard de Melo and Gerhard Weikum

Max Planck Institute for Informatics, Saarbrücken, Germany

2009-09-18

G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences

Page 2: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Outline

1 Introduction

2 Example Extraction

3 Example Selection

4 Results

5 Conclusion

Page 4: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Introduction

a thin, sharp-pointed metal pin with a raised spiral thread running around it and a slotted head, used to join things together by being rotated in under pressure (Concise OED)

Use a crosshead screwdriver to tighten the screws inserted in the pre-fixed connectors. (Web)

Page 6: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Introduction

Example Sentences

any sentence that contains a word (being used in a specific sense)

allow the user to grasp a word’s meaning
  traditional intensional word definitions may be too confusing
  humans used to deriving meaning from context
  users can verify whether they have understood definition correctly

see circumstances a word would typically be used in
  possible contexts, e.g. child vs. youngster
  typical collocations, e.g. to give birth or birth rate, but not *to give nascence or *nascence rate

Page 9: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Introduction

Example Sentences

⟹ most modern dictionaries include example sentences

number, length often limited

digital dictionaries: tight space constraints of print media no longer apply!

larger number of example sentences can be presented to the user on demand

Page 13: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Introduction

Goals

automatically obtain example sentences for a specific sense of a word
  distinguish senses: "There were many bats flying out of the cave." vs. "In professional baseball, only wooden bats are permitted."

choose set of representative sentences to present to user
  not all examples are equally useful
  screen space may be limited (can still show more on demand)

Page 16: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Extraction

Sense Index (Dictionary)

English: Princeton WordNet 3.0
Spanish: Spanish WordNet

σ(t): get senses for term t
σ(t, s): is sense s a sense of term t?

behind the scenes: morphological analysis, multi-word expression detection
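The two lookups σ(t) and σ(t, s) can be pictured with a small sketch. Everything here is assumed for illustration: the toy inventory, the made-up sense identifiers (not real WordNet synset IDs), and the lemma table standing in for morphological analysis. The slides use Princeton WordNet 3.0 and the Spanish WordNet instead.

```python
from typing import Dict, Set

# Toy sense inventory; sense identifiers are invented for illustration.
SENSE_INDEX: Dict[str, Set[str]] = {
    "bat": {"bat.animal", "bat.club"},
    "screw": {"screw.fastener"},
}

# Hypothetical stand-in for morphological analysis: inflected form -> lemma.
LEMMAS: Dict[str, str] = {"bats": "bat", "screws": "screw"}


def normalize(term: str) -> str:
    """Lowercase the term and map known inflected forms to their lemma."""
    term = term.lower()
    return LEMMAS.get(term, term)


def senses(term: str) -> Set[str]:
    """sigma(t): return the set of senses listed for term t."""
    return SENSE_INDEX.get(normalize(term), set())


def has_sense(term: str, sense: str) -> int:
    """sigma(t, s): 1 if s is listed as a sense of t, else 0."""
    return 1 if sense in senses(term) else 0
```

Returning 0/1 from `has_sense` lets it be used directly as the indicator factor in the scoring formulas.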

Page 19: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Extraction

Sense Disambiguation

disambiguate word occurrences, then use sentence as example for those word senses

fine-grained WSD not reliable enough (cf. SemEval results)

idea: use parallel corpora, jointly look at both versions of text

⟹ greater accuracy can be achieved

Page 23: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Introduction

Sense Disambiguation

There were many bats flying out of the cave.

Aus der Höhle flogen viele Fledermäuse.

Page 25: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Extraction

Parallel Disambiguation

req.: text available in two languages a, b
good news: more and more parallel corpora available

word alignment

compute score

\[
\mathrm{wsd}(s_a \mid t_a, t_b) = \mathrm{wsd}(s_a \mid t_a)\,
\frac{\sigma(t_a, s_a)\,\mathrm{csim}(t_b, s_a)}
     {\sum_{s' \in \sigma(t_a)} \sigma(t_a, s')\,\mathrm{csim}(t_b, s')}
\]

Page 28: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Extraction

Components

csim(t_b, s_a): cross-lingual similarity

\[
\mathrm{csim}(t_b, s_a) = \sum_{s_b \in \sigma(t_b)} \mathrm{sim}(s_a, s_b)\,\mathrm{wsd}(s_b \mid t_b)
\]

monolingual WSD: compare bag-of-words vector for term context with sense context (constructed from definitions, etc.)

\[
\mathrm{wsd}(s \mid t) = \sigma(t, s)\left(\alpha + \frac{v(s)^{T} v(t)}{\lVert v(s)\rVert\,\lVert v(t)\rVert}\right)
\]

Semantic Similarity sim(s_1, s_2): identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door)

\[
\mathrm{sim}(s_1, s_2) =
\begin{cases}
1 & s_1 = s_2 \\
1 & s_1, s_2 \text{ in near-synonymy relationship} \\
1 & s_1, s_2 \text{ in hypernymy/hyponymy relationship} \\
0 & \text{otherwise}
\end{cases}
\]
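A minimal sketch of how the three components could fit together. The value of α, the toy index, the Spanish gloss words used as a sense vector, and the representation of related senses as a set of unordered pairs are all assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, Set

ALPHA = 0.1  # smoothing constant alpha; this value is an assumption


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def wsd(sense: str, term: str, context: Counter,
        index: Dict[str, Set[str]], sense_vecs: Dict[str, Counter]) -> float:
    """wsd(s|t) = sigma(t, s) * (alpha + cos(v(s), v(t)))."""
    if sense not in index.get(term, set()):  # sigma(t, s) = 0
        return 0.0
    return ALPHA + cosine(sense_vecs.get(sense, Counter()), context)


def sim(s1: str, s2: str, related: Set[FrozenSet[str]]) -> float:
    """1 for identical or directly related senses, 0 otherwise."""
    return 1.0 if s1 == s2 or frozenset((s1, s2)) in related else 0.0


def csim(t_b: str, s_a: str, context_b: Counter, index: Dict[str, Set[str]],
         sense_vecs: Dict[str, Counter], related: Set[FrozenSet[str]]) -> float:
    """csim(t_b, s_a) = sum over s_b in sigma(t_b) of sim(s_a, s_b) * wsd(s_b | t_b)."""
    return sum(sim(s_a, s_b, related) * wsd(s_b, t_b, context_b, index, sense_vecs)
               for s_b in index.get(t_b, set()))


# Toy data, invented for illustration only.
INDEX = {"bat": {"bat.animal", "bat.club"}, "murciélago": {"bat.animal"}}
SENSE_VECS = {"bat.animal": Counter(["mamífero", "volador", "cueva"])}
RELATED: Set[FrozenSet[str]] = set()
context_es = Counter(["cueva", "volar"])

score_animal = csim("murciélago", "bat.animal", context_es, INDEX, SENSE_VECS, RELATED)
score_club = csim("murciélago", "bat.club", context_es, INDEX, SENSE_VECS, RELATED)
```

In this toy setup the aligned Spanish word only has the animal sense, so only that sense of the English word receives cross-lingual support.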

Page 31: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Extraction

Extraction

Use as example sentence iff score sufficiently high

\[
\mathrm{wsd}(s_a \mid t_a, t_b) = \mathrm{wsd}(s_a \mid t_a)\,
\frac{\sigma(t_a, s_a)\,\mathrm{csim}(t_b, s_a)}
     {\sum_{s' \in \sigma(t_a)} \sigma(t_a, s')\,\mathrm{csim}(t_b, s')}
\]
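The extraction rule can be sketched as follows, assuming the monolingual scores wsd(s' | t_a) and the cross-lingual scores csim(t_b, s') have already been computed for every sense s' of t_a. The threshold value and the function names are hypothetical; the slides only say the score must be "sufficiently high".

```python
from typing import Dict

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the slides


def joint_score(s_a: str, wsd_a: Dict[str, float], csim_b: Dict[str, float]) -> float:
    """wsd(s_a | t_a, t_b) for one aligned word pair (t_a, t_b).

    wsd_a maps each sense s' in sigma(t_a) to wsd(s' | t_a).
    csim_b maps senses to csim(t_b, s'); missing senses count as 0.
    Because wsd_a only contains senses of t_a, the indicator
    sigma(t_a, s') is 1 for every term of the sum.
    """
    if s_a not in wsd_a:  # sigma(t_a, s_a) = 0
        return 0.0
    total = sum(csim_b.get(s, 0.0) for s in wsd_a)
    if total == 0.0:  # no cross-lingual support for any sense
        return 0.0
    return wsd_a[s_a] * csim_b.get(s_a, 0.0) / total


def is_example_for(s_a: str, wsd_a: Dict[str, float], csim_b: Dict[str, float]) -> bool:
    """Accept the sentence as an example for sense s_a iff the score is high enough."""
    return joint_score(s_a, wsd_a, csim_b) >= THRESHOLD


# Toy values, invented for illustration.
wsd_bat = {"bat.animal": 0.6, "bat.club": 0.5}
csim_fledermaus = {"bat.animal": 0.5, "bat.club": 0.0}
```

With these toy values the sentence would be kept for the animal sense and rejected for the club sense.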

Page 33: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Selection

Task Motivation

for computational applications: more data ⟹ better data

for human users: a good limited selection should be provided at first

assumption: space constraint k (number of sentences)
goal: choose a good set of k example sentences

Page 36: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Selection

What is a good example sentence?

one that showcases how a word is used in context (typical prepositions, collocations, etc.)

one that helps in grasping the meaning

related work by Rychlý et al. (2008): one that is intelligible to learners

Page 39: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Selection

Assets

example sentences can be thought of as having certain assets
  e.g. for account:
  containing frequent bigram "bank account" can be an asset
  containing frequent trigram "to account for" can be an asset

6 different classes of n-gram assets
  1: original unigram
  2-5: bigram with preceding/following, trigram, etc.
  weights $w(a) = \frac{f(x, a)}{\sum_{i=1}^{n} f(x_i, a)}$, etc.
  ⟹ higher weight for "open an account" than for "Peter’s chequing account"

1 class of assets for entire original sentence
  weight: cosine similarity with sense definition
  ⟹ bias towards explanatory sentences
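To make the idea of n-gram assets concrete, here is a rough sketch that collects the n-grams around one occurrence of the target word. The class labels (`unigram`, `bigram_prev`, `bigram_next`, `trigram`) are hypothetical names and cover only part of the six classes mentioned on the slide, whose exact definitions are not given. Weighting is left out.

```python
from typing import List, Set, Tuple

Asset = Tuple[str, Tuple[str, ...]]  # (class label, n-gram)


def ngram_assets(tokens: List[str], i: int) -> Set[Asset]:
    """Collect n-gram assets around the target word at position i."""
    assets: Set[Asset] = {("unigram", (tokens[i],))}
    if i >= 1:
        # bigram formed with the preceding word
        assets.add(("bigram_prev", tuple(tokens[i - 1:i + 1])))
    if i + 1 < len(tokens):
        # bigram formed with the following word
        assets.add(("bigram_next", tuple(tokens[i:i + 2])))
    # every trigram window that contains position i
    for start in range(max(0, i - 2), min(i, len(tokens) - 3) + 1):
        assets.add(("trigram", tuple(tokens[start:start + 3])))
    return assets


sentence = ["open", "a", "bank", "account", "today"]
account_assets = ngram_assets(sentence, 3)
```

Sentences that share an asset such as the bigram "bank account" can then be compared through the weight assigned to that asset.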

Page 43: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Selection

What is a good set of example sentences?

Select sentences with greatest total asset weights?
  No: Most frequent expressions would dominate the result set. Need diversity!

Instead: maximize

\[
\sum_{a \,\in\, \bigcup_{x \in C} A(x)} w(a)
\]

  each asset in result set only counted once

Problem is NP-hard (proof via Vertex Cover reduction)
  → use greedy heuristic

Page 47: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Example Selection

Greedy Heuristic to select set C

Greedily select highest-weighted sentence x:

\[
x \leftarrow \operatorname*{arg\,max}_{x \in X \setminus C} \sum_{a \in A(x)} w(a)
\]

Then set the weights w(a) of all assets a ∈ A(x) to zero

⟹ ranked list obtained (useful!)
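The greedy heuristic can be sketched in a few lines. The function name, the dictionary-based inputs, and the alphabetical tie-breaking are assumptions for illustration; the slides only specify the argmax step and the zeroing of weights.

```python
from typing import Dict, Hashable, List, Set


def greedy_select(assets: Dict[str, Set[Hashable]],
                  weights: Dict[Hashable, float],
                  k: int) -> List[str]:
    """Return up to k sentence ids, ranked in the order they were selected.

    assets maps each candidate sentence id x to its asset set A(x);
    weights maps each asset a to w(a).
    """
    remaining = dict(weights)      # copy, so the caller's weights stay untouched
    candidates = set(assets)       # X \ C
    chosen: List[str] = []         # C, kept as a ranked list
    while candidates and len(chosen) < k:
        # Sorting first makes ties deterministic (smallest id wins).
        best = max(sorted(candidates),
                   key=lambda x: sum(remaining.get(a, 0.0) for a in assets[x]))
        chosen.append(best)
        candidates.remove(best)
        # Each asset counts only once: zero the weights of covered assets.
        for a in assets[best]:
            remaining[a] = 0.0
    return chosen


# Toy data, invented for illustration.
toy_assets = {
    "s1": {"open an account", "account"},
    "s2": {"open an account"},
    "s3": {"bank account"},
}
toy_weights = {"open an account": 3.0, "account": 1.0, "bank account": 2.0}
ranking = greedy_select(toy_assets, toy_weights, 2)
```

In the toy data, s2 has the second-highest raw total, but after s1 is chosen its only asset is already covered, so s3 is ranked second. This is the diversity effect the objective is meant to produce.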

Page 51: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Results

Resources

OPUS corpora (OpenSubtitles, OpenOffice.org collections) and Reuters RCV1 for non-disambiguated sentence selection

GIZA++/UPlug for lexical alignment

Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility)

TreeTagger for morphological analysis

Page 55: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Results

Examples from OpenSubtitles Corpus

line (something, as a cord or rope, that is long and thin and flexible)
  I got some fishing line if you want me to stitch that.
  Von Sefelt, get the stern line.

line (the descendants of one individual)
  What line of kings do you descend from?
  My line has ended.

catch (catch up with and possibly overtake)
  He’s got 100 laps to catch Beau Brandenburg if he wants to become world champion.
  They won’t catch up.

catch (grasp with the mind or develop an understanding of)
  I didn’t catch your name.
  Sorry, I didn’t catch it.

Page 56: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Results

Examples from OpenSubtitles Corpus

talk (exchange thoughts, talk with)
  Why don’t we have a seat and talk it over.
  Okay, I’ll talk to you but one condition...

talk (use language)
  But we’ll be listening from the kitchen so talk loud.
  You spit when you talk.

opening (a ceremony accompanying the start of some enterprise)
  We don’t have much time until the opening day of Exhibition.
  What a disaster tomorrow is the opening ceremony!

opening (the first performance, as of a theatrical production)
  It will be rehearsed in the morning ready for the opening tomorrow night.
  You ready for our big opening night?

Page 57: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Results

Sense-disambiguated Example Sentences

Corpus                 Covered Senses   Example Sentences   Accuracy (Wilson interval)
OpenSubtitles en-es    13,559           117,078             0.815 ± 0.081
OpenSubtitles es-en    8,833            113,018             0.798 ± 0.090
OpenOffice.org en-es   1,341            13,295              0.803 ± 0.081
OpenOffice.org es-en   932              11,181              0.793 ± 0.087
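For readers unfamiliar with the Wilson score interval used for the accuracy column, the following sketch computes its centre and half-width from a count of correct judgements. The evaluation sample sizes are not given in the slides, so any `n` passed in is hypothetical. The default `z = 1.96` corresponds to a 95% confidence level, which is also an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Return (centre, half_width) of the Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half_width


# Hypothetical sample: 80 of 100 assessed sentences judged correct.
centre, half_width = wilson_interval(80, 100)
```

Note that the centre of a Wilson interval is pulled slightly towards 0.5, so it generally differs from the raw proportion of correct items.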

Page 58: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

Results

error analysis:
  (1) incorrect lexical alignments unlikely to lead to incorrect disambiguation
  (2) incomplete sense inventories can lead to mistakes
  (3) also, on a few occasions, the morphological analyser led to wrong results

implications for future work:
  (1) better monolingual WSD
  (2) additional languages, especially phylogenetically unrelated

Page 60: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

IntroductionExample ExtractionExample Selection

ResultsConclusion

Results

Rankings for OpenSubtitles Corpus

being or located on or directed toward the side of the body to the east when facing north
1. In America we drive on the right side of the road.
2. I’ll tie down your right arm so you can learn to throw a left.
3. If we wait from the right side, we have an advantage there.

put up with something or somebody unpleasant
1. You can’t stand it, can you?
2. You really think I can tolerate such an act?
3. No one can stand that harmonica all day long.
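Rankings like the ones above come from a selection step that balances representativeness against diversity. The following is a minimal sketch of one generic way to make such a trade-off: a greedy, maximal-marginal-relevance-style selection over bag-of-words Jaccard similarity. It is not the authors' algorithm; the function names, the similarity measure and the `trade_off` parameter are all assumptions made for illustration.

```python
from typing import List, Set


def tokens(sentence: str) -> Set[str]:
    """Lower-cased bag of words with surrounding punctuation stripped."""
    words = {w.strip(".,!?\"'’“”").lower() for w in sentence.split()}
    return words - {""}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_examples(sentences: List[str], k: int, trade_off: float = 0.5) -> List[str]:
    """Greedily pick up to k sentences (hypothetical scoring, for illustration).

    Representativeness: mean similarity to all other candidates.
    Diversity: penalty for similarity to sentences already chosen.
    """
    bags = [tokens(s) for s in sentences]
    n = len(sentences)
    rep = [
        sum(jaccard(bags[i], bags[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    chosen: List[int] = []
    while len(chosen) < min(k, n):
        def score(i: int) -> float:
            redundancy = max((jaccard(bags[i], bags[j]) for j in chosen), default=0.0)
            return trade_off * rep[i] - (1.0 - trade_off) * redundancy

        best = max((i for i in range(n) if i not in chosen), key=score)
        chosen.append(best)
    return [sentences[i] for i in chosen]


candidates = [
    "In America we drive on the right side of the road.",
    "I'll tie down your right arm so you can learn to throw a left.",
    "If we wait from the right side, we have an advantage there.",
]
for rank, sentence in enumerate(select_examples(candidates, k=2), start=1):
    print(rank, sentence)
```

Raising `trade_off` favours sentences that resemble many other candidates; lowering it favours sentences that differ from those already selected.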


Page 61: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora


Results

Rankings for OpenSubtitles Corpus

using or providing or producing or transmitting or operated by electricity
1. Not the electric chair.
2. Some electrical current circulating through my body.
3. Near as I can tell it’s an electrical impulse.

take something or somebody with oneself somewhere
1. And they were kind enough to take me in here.
2. It conveys such a great feeling.
3. We interrupt this program to bring you a special news bulletin.


Page 62: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora


Results

Rankings for long in RCV1 Corpus

1. In the long term interest rate market, the yield of the key 182nd 10 year Japanese government bond (JGB) fell to 2.060 percent early on Tuesday, a record low for any benchmark 10-year JGB.

2. “The government and opposition have gambled away the last chance for a long time to prove they recognise the country’s problems, and that they put the national good above their own power interests”, news weekly Der Spiegel said.

3. As long as the index keeps hovering between 957 and 995, we will maintain our short term neutral recommendation.


Page 63: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora


Results

Rankings for purchase in RCV1 Corpus

1. Romania’s State Ownership Fund (FPS), the country’s main privatisation body, said on Wednesday it had accepted five bids for the purchase of a 50.98 percent stake in the largest local cement maker Romcim.

2. Grand Hotel Group said on Wednesday it has agreed to procure an option to purchase the remaining 50 percent of the Grand Hyatt complex in Melbourne from hotel developer and investor Lustig & Moar.

3. The purchase price for the business, which had 1996 calendar year sales of about $25 million, was not disclosed.


Page 64: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora


Results

Rankings for colonial in RCV1 Corpus

1. Hong Kong came to the end of 156 years of British colonial rule on June 30 and is now an autonomous capitalist region of China, running all its own affairs except defence and diplomacy.

2. The letter was sent in error to the embassy of Portugal – the former colonial power in East Timor – and was neither returned nor forwarded to the Indonesian embassy.

3. Sino-British relations hit a snag when former Governor Chris Patten launched electoral reforms in the twilight years of colonial rule despite fierce opposition by Beijing.


Page 65: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora


Outline

1 Introduction

2 Example Extraction

3 Example Selection

4 Results

5 Conclusion


Page 66: Extracting Sense-Disambiguated Example Sentences From Parallel Corpora


Conclusion

Framework for:

extracting sense-disambiguated example sentences from parallel corpora
selecting limited numbers of sentences given space constraints

Future Work:

better disambiguation, e.g. additional languages, better techniques
additional input for selection: sentence length, definition extraction techniques
integrated user interface for UWN (Universal Wordnet)

Contact: [email protected]
