introduction motivation linguistic levels types of mwes approaches to identify mwes limitations...


Upload: ayla

Post on 23-Mar-2016



TRANSCRIPT

Page 1

Multiword Expressions

Presented by:

Bhuban Seth (09305005)

Somya Gupta (10305011)

Advait Mohan Raut (09305923)

Victor Chakraborty (09305903)

Under the guidance of: Prof. Pushpak Bhattacharyya

Page 2

Contents

• Introduction
• Motivation
• Linguistic Levels
• Types of MWEs
• Approaches to identify MWEs
• Limitations
• Conclusion
• References

Page 3

Introduction

Put the sweater on
Put the sweater on the table
Put the light on

Page 4

Introduction

Put the sweater on
Put the sweater on the table
Put the light on

Roughly defined as: idiosyncratic interpretations that cross word boundaries (or spaces)

Page 5

Examples

His grandfather kicked the bucket.
This job is a piece of cake.
Put the sweater on.
He is the dark horse of the match.

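Google Translate output for the sentences above (with literal English back-translations):

अपने दादा बाल्टी लात मारी (literally: "his grandfather kicked a bucket")

इस काम के केक का एक टुकड़ा है (literally: "is a piece of this job's cake")

स्वेटर पर रखो (literally: "put it on top of the sweater")

वह मैच के अंधेरे घोड़ा है (literally: "he is the match's dark, unlit horse")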

Page 6

Motivation

Multiword expressions:

• Their number in a speaker's lexicon is "of the same order of magnitude as the number of single words" (Jackendoff 1997).
• 41% of the entries in WordNet 1.7 are multiword expressions (Fellbaum 1999).

Resolution needed in:

• Machine translation (e.g. the poor Google Translate output above)
• Information retrieval
• Tagging, parsing, question answering systems, and WSD

Page 7

Linguistic Levels

• Lexicology: In short, ad hoc
• Morphology and Syntax: Put on weight, put the sweater on
• Semantics: Spill the beans
• Pragmatics: Kick the bucket vs. kick the bucket filled with water

Page 8

How to Handle These?

Variation in Flexibility

Syntactic Idiomaticity

Page 9

Types (Sag et al. 2002)

Page 10

Types - Examples

• Fixed expressions: In short, ad hoc, Palo Alto, Alta Vista
• Compound nominals: Congressman, car park, part of speech
• Proper names: Deccan Chargers, Delhi Daredevils
• Non-decomposable idioms: Kick the bucket
• Decomposable idioms: Spill the beans, let the cat out (of the bag)
• Verb-particle constructions: Take off, put on
• Light verb constructions: Give a demo, take a shower
• Institutionalized phrases: Black and white, traffic light, telephone booth

Page 11

Approaches

Page 12

Knowledge Based Approach

1) Words with spaces: fixed expressions
• A stemmer may be used to detect MWEs.
• But it fails. Why? "Kicks the bucket" is an MWE, whereas "kick the buckets" is not; a stemmer reduces both to "kick the bucket" and cannot tell them apart.
• Princeton WordNet: flaw

2) Circumscribed constructions:
• A sequence of consecutive nouns is most probably an MWE.

3) Inflecting head: semi-fixed expressions
• E.g. "part of speech" and "parts of speech"
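The three heuristics above can be sketched as a lexicon lookup in which only a designated head token may inflect, plus a scan for runs of consecutive nouns. This is a minimal illustration, not the presenters' system: the `LEXICON` entries, the head-index convention, the toy `-s` inflection rule, and the hand-written Penn-style tags are all assumptions.

```python
# Sketch of the knowledge-based heuristics (words with spaces, inflecting
# head, consecutive nouns). LEXICON, its head-index convention and the toy
# inflection rule are hypothetical illustrations, not a real MWE lexicon.

# Maps each MWE (token tuple) to the index of the token allowed to inflect,
# or None for a fully fixed expression.
LEXICON = {
    ("kick", "the", "bucket"): 0,   # semi-fixed: only "kick" inflects
    ("part", "of", "speech"): 0,    # semi-fixed: only "part" inflects
    ("in", "short"): None,          # fixed: no variation allowed
}


def inflections(word):
    """Toy inflection rule (assumption): base form plus a regular -s form."""
    return {word, word + "s"}


def match_mwe(tokens):
    """Return the lexicon entry that `tokens` realises, or None.

    Only the head may inflect; every other token must match exactly, so
    "kicks the bucket" matches while "kick the buckets" does not (unlike a
    stemmer, which maps both to "kick the bucket").
    """
    tokens = tuple(t.lower() for t in tokens)
    for entry, head in LEXICON.items():
        if len(entry) == len(tokens) and all(
            tok in (inflections(lex) if i == head else {lex})
            for i, (lex, tok) in enumerate(zip(entry, tokens))
        ):
            return entry
    return None


def noun_runs(tagged):
    """Circumscribed constructions: runs of two or more consecutive nouns.

    `tagged` is a list of (word, tag) pairs; noun tags start with "NN".
    """
    runs, current = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith("NN"):
            current.append(word)
        else:
            if len(current) >= 2:
                runs.append(" ".join(current))
            current = []
    return runs


print(match_mwe("kicks the bucket".split()))  # ('kick', 'the', 'bucket')
print(match_mwe("kick the buckets".split()))  # None
print(match_mwe("parts of speech".split()))   # ('part', 'of', 'speech')
print(noun_runs([("the", "DT"), ("car", "NN"), ("park", "NN"), ("is", "VBZ")]))
# ['car park']
```

Restricting variation to the head is what lets the lookup accept "parts of speech" while rejecting the literal "kick the buckets".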

Page 13

Statistical Approaches

Co-occurrence properties

Substitutability

Distributional Similarity

Semantic Similarity

Page 14

Co-occurrence properties

Example: Black and White

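Scan a corpus and estimate the probabilities of bigrams and trigrams.

P(X | Y) = P(Y X) / P(Y)

where P(Y X) is the probability of the bigram "Y X" (word Y immediately followed by word X). If P(X | Y) is high, the word sequence "Y X" may be an MWE.

Demerit:
• "I am" has a high conditional probability P(am | I), yet it is not an MWE.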
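A minimal sketch of the conditional-probability score, assuming maximum-likelihood estimates from raw counts with no smoothing (P(X | Y) ≈ count(Y X) / count(Y)); the toy corpus is invented for illustration.

```python
from collections import Counter


def conditional_bigram_probs(tokens):
    """Maximum-likelihood estimate of P(X | Y) = count(Y X) / count(Y).

    Keys are (Y, X) pairs of adjacent tokens. No smoothing is applied.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(y, x): c / unigrams[y] for (y, x), c in bigrams.items()}


corpus = "i am here . black and white photos . i am late . black and white tv".split()
probs = conditional_bigram_probs(corpus)
print(probs[("and", "white")])  # 1.0
print(probs[("i", "am")])       # 1.0
print(probs[("am", "here")])    # 0.5
```

The output reproduces the demerit: "i am" scores exactly as high as "and white", because the measure ignores how frequent the conditioning word is on its own.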

Page 15

Point-wise Mutual Information (PMI)

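PMI(X, Y) = log [ P(X, Y) / (P(X) · P(Y)) ]

The PMI of a word pair (X, Y) measures the strength of their collocation: it compares how often the two words actually occur together with how often they would co-occur if they were independent.

Other association measures, such as Student's t-test and Pearson's chi-square test, can also be used.

Demerit:
• Need to differentiate between systematic and chance co-occurrence.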
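A sketch of PMI over adjacent word pairs, assuming maximum-likelihood probabilities from a toy corpus and log base 2 (the slide does not fix a base).

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """PMI(X, Y) = log2( P(X, Y) / (P(X) * P(Y)) ) for adjacent pairs (X, Y).

    P(X, Y) is estimated over bigram positions, P(X) and P(Y) over tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_bigrams = n - 1
    scores = {}
    for (x, y), c in bigrams.items():
        p_xy = c / n_bigrams
        p_x, p_y = unigrams[x] / n, unigrams[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


s = pmi_scores("new york new york red fox".split())
print(round(s[("new", "york")], 3))  # 1.848
print(round(s[("red", "fox")], 3))   # 2.848
```

The pair "red fox", seen only once, outscores the repeated "new york": PMI rewards rare co-occurrences, which is why it needs help to separate systematic from chance co-occurrence.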

Page 16

Pearson’s chi-square test

Unlike the t-test, it does not assume normally distributed word probabilities; it does, however, rely on a large-sample approximation to the chi-square distribution, which could be a limitation.

Null hypothesis: the words are independent of each other.

The higher the chi-square statistic, the stronger the evidence against independence, i.e. the stronger the association between the words.

Demerit:
• For small data collections (low expected counts), the chi-square approximation does not hold. Hence, a large corpus is required.
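For a bigram the test reduces to a 2×2 contingency table (X first or not, Y second or not). The sketch below uses the standard closed form for a 2×2 table; the counts in the usage lines are invented, and the argument names are our own.

```python
def chi_square_2x2(c_xy, c_x, c_y, n):
    """Pearson's chi-square for the bigram "X Y" from a 2x2 contingency table.

    c_xy: count of "X Y"; c_x: bigrams with X first; c_y: bigrams with Y
    second; n: total number of bigrams.
    """
    o11 = c_xy                  # X followed by Y
    o12 = c_x - c_xy            # X followed by something else
    o21 = c_y - c_xy            # something else followed by Y
    o22 = n - c_x - c_y + c_xy  # neither
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if den == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (o11 * o22 - o12 * o21) ** 2 / den


print(chi_square_2x2(c_xy=1, c_x=10, c_y=10, n=100))   # 0.0 (exactly as expected under independence)
print(chi_square_2x2(c_xy=10, c_x=10, c_y=10, n=100))  # 100.0 (X always followed by Y)
```

With one degree of freedom, a value above 3.841 rejects independence at the 5% level. The approximation is unreliable when expected cell counts are small, which is the demerit noted above.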

Page 17

Substitutability

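The ability to replace parts of a lexical item with alternatives.

Depending on the task and approach, the alternatives can be similar words (synonyms) or opposite words (antonyms).

For a genuine MWE, the phrase usually stops being an MWE after substitution: "kick the pail" does not mean "die".

Hence, substitution can be used to remove likely non-MWEs: if the substituted variants are about as common as the original, the original is probably compositional.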

Src: Kim, 2008
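One simple way to operationalise this filter is to measure what share of corpus occurrences belong to substituted variants. The sketch below is a simplification, not Kim's (2008) method: the `COUNTS` table, the `SYNONYMS` list, and the 0.1 threshold are all hypothetical.

```python
# Hypothetical corpus counts and synonym list, for illustration only.
COUNTS = {
    "kick the bucket": 50, "kick the pail": 0, "boot the bucket": 0,
    "red car": 40, "red automobile": 15, "crimson car": 5,
}
SYNONYMS = {"bucket": ["pail"], "kick": ["boot"], "car": ["automobile"], "red": ["crimson"]}


def substitutes(phrase):
    """Yield variants of `phrase` with exactly one word replaced by a synonym."""
    words = phrase.split()
    for i, w in enumerate(words):
        for s in SYNONYMS.get(w, []):
            yield " ".join(words[:i] + [s] + words[i + 1:])


def substitution_ratio(phrase, counts=COUNTS):
    """Share of all occurrences (original + variants) taken by the variants."""
    original = counts.get(phrase, 0)
    variants = sum(counts.get(v, 0) for v in substitutes(phrase))
    total = original + variants
    return variants / total if total else 0.0


def likely_non_mwe(phrase, threshold=0.1):
    """Flag a candidate as compositional if its variants are common enough."""
    return substitution_ratio(phrase) > threshold


print(substitution_ratio("kick the bucket"))  # 0.0
print(likely_non_mwe("kick the bucket"))      # False
print(likely_non_mwe("red car"))              # True
```

Here "red car" is filtered out because its synonym variants account for a third of the occurrences, while "kick the bucket" survives.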

Page 18

Distributional Similarity

A method to estimate semantic similarity from context.

Underlying assumption: when two words are similar, their context words are also similar.

Src: Kim, 2008
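A minimal sketch of distributional similarity, assuming bag-of-words context vectors built from a ±2-word window and compared with cosine similarity. The window size and the three toy sentences are our own choices.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words appearing within `window` positions of `target`."""
    vec = Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = ["i drank hot coffee today", "i drank hot tea today", "i read the newspaper today"]
coffee, tea, paper = (context_vector(w, corpus) for w in ("coffee", "tea", "newspaper"))
print(round(cosine(coffee, tea), 3))    # 1.0
print(round(cosine(coffee, paper), 3))  # 0.333
```

"coffee" and "tea" share every context word, so they come out as distributionally similar; "newspaper" shares only "today".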

Page 19

Semantic Similarity

Similar noun compounds (NCs) are likely to share the same semantic relation.

Src: Kim, 2008
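This idea can be sketched as nearest-neighbour classification: an unseen NC gets the relation of the most similar labelled NC, where NC similarity averages modifier-modifier and head-head word similarity. The word vectors, relation labels, and training NCs below are invented, and toy cosine similarity stands in for a WordNet-based measure such as WordNet::Similarity.

```python
import math

# Hypothetical 2-d word vectors ("substance/food" vs "place" dimensions).
VECTORS = {
    "apple": (1.0, 0.0), "cherry": (0.9, 0.1), "steel": (0.8, 0.2),
    "kitchen": (0.0, 1.0), "garden": (0.1, 0.9), "shed": (0.05, 0.95),
    "pie": (1.0, 0.0), "tart": (0.95, 0.05), "bridge": (0.7, 0.3),
    "knife": (0.3, 0.7), "tool": (0.2, 0.8),
}

# Hypothetical training NCs (modifier, head) labelled with a semantic relation.
LABELLED = {
    ("apple", "pie"): "MADE_OF",
    ("steel", "bridge"): "MADE_OF",
    ("kitchen", "knife"): "LOCATION",
    ("garden", "tool"): "LOCATION",
}


def cos(u, v):
    """Cosine similarity of two dense 2-d vectors."""
    return sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


def nc_similarity(nc1, nc2):
    """Average of modifier-modifier and head-head word similarity."""
    (m1, h1), (m2, h2) = nc1, nc2
    return (cos(VECTORS[m1], VECTORS[m2]) + cos(VECTORS[h1], VECTORS[h2])) / 2


def predict_relation(nc):
    """Assign the relation of the most similar labelled NC (1-nearest neighbour)."""
    best = max(LABELLED, key=lambda known: nc_similarity(nc, known))
    return LABELLED[best]


print(predict_relation(("cherry", "tart")))  # MADE_OF
print(predict_relation(("shed", "knife")))   # LOCATION
```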

Page 20

Method

Src: Kim, 2008

Page 21

MWE Resources

Corpora
• British National Corpus (BNC)
• Brown Corpus

Lexical Resources
• WordNet
• Moby's Thesaurus: contains 30K root words and 2.5M synonyms and related words

Tools
• WordNet::Similarity: gives a measure of semantic similarity between two given words

Page 22

Limitations of current Approaches

Many NLP approaches treat MWEs according to the words-with-spaces method

Many approaches get commonly-attested MWE usages right, sometimes using “ad hoc” methods, e.g. preprocessing

However, most approaches handle variation badly, fail to generalize, and result in NLP systems that are difficult to maintain and extend

Page 23

Conclusion

MWEs have been classified into lexicalized phrases (fixed, semi-fixed, and syntactically flexible expressions) and institutionalized phrases.

MWE analysis is as important a problem in NLP as other areas such as MT or WSD.

A hybrid approach, combining knowledge-based and statistical methods, is probably the best method so far for extracting MWEs from a corpus.

Page 24

References

Kim, S. N. (2008). Statistical modeling of multiword expressions.

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2002).

Calzolari, N., et al. (2002). Towards best practice for multiword expressions in computational lexicons. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), pp. 1934-1940.

Page 25

Thank You

Questions?