Post on 30-Jan-2016
Hindi POS tagging and chunking : An MEMM approach
Aniket Dalal Kumar Nagaraj Uma Sawant Sandeep Shelke
Under the guidance of Prof. P. Bhattacharyya
Goal
Lexical analysis : part-of-speech (POS) tagging, i.e. assigning a part of speech to each word, e.g. noun, verb...
Syntactic analysis : chunking, i.e. identifying and labelling phrases as verb phrase, noun phrase, etc.
Language : Hindi
Approach : MEMM
Outline
Maximum Entropy Markov Model (MEMM) : principle, mathematical formulation
System overview : parameter estimation and classification
POS tagging features
Chunking features
Results and error analysis
Future work
Conclusion
Maximum Entropy Markov Model
Maximum entropy principle : the least biased model that accounts for all known information is the one that maximizes entropy.
Entropy
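For a conditional model of a tag y given a context x, the entropy being maximized is commonly written as a conditional entropy, as in Berger et al. (1996). The notation below is assumed rather than taken from the slide: \tilde{p}(x) is the empirical distribution of contexts in the training corpus.

```latex
H(p) = -\sum_{x,\,y} \tilde{p}(x)\, p(y \mid x)\, \log p(y \mid x)
```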
Maximum Entropy Markov Model
Mathematical formulation...
The distribution with the maximum entropy is equivalent to
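In the standard result (Berger et al., 1996), the maximum entropy distribution under the feature constraints is the maximum likelihood member of the log-linear family. A sketch in assumed notation, where f_j are the feature functions, \lambda_j their weights, h_i the context available at position i, and t_i the tag at position i:

```latex
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)} \exp\Big( \sum_j \lambda_j f_j(x, y) \Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big( \sum_j \lambda_j f_j(x, y') \Big)

P(t_1, \dots, t_n \mid w_1, \dots, w_n) = \prod_{i=1}^{n} p_\lambda(t_i \mid h_i)
```

The second line is the usual MEMM factorization: the probability of a tag sequence is the product of the per-position conditional probabilities.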
System overview
Parameter estimation and classification
GIS (Generalized Iterative Scaling) : finds the model parameters that define the maximum entropy classifier for a given feature set and training corpus
Beam search : a heuristic search algorithm that optimizes best-first search by expanding only the m most promising nodes at each depth
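The GIS parameter estimation named above can be sketched as follows. This is a minimal illustration, not the system's implementation: `train_gis`, `toy_features`, and the three-word training set are hypothetical, and the toy feature function emits exactly two features per (word, tag) pair so that the feature sum is constant, which is the condition GIS relies on (a general implementation would add a correction feature).

```python
import math
from collections import defaultdict

def train_gis(samples, labels, feature_fn, iterations=50):
    """Estimate log-linear weights with Generalized Iterative Scaling.

    samples: list of (context, label) pairs.
    feature_fn(context, label): list of active binary feature names.
    """
    # C is the (constant) number of active features per (context, label).
    C = max(len(feature_fn(x, y)) for x, _ in samples for y in labels)

    # Empirical feature counts from the training data.
    empirical = defaultdict(float)
    for x, y in samples:
        for f in feature_fn(x, y):
            empirical[f] += 1.0
    weights = {f: 0.0 for f in empirical}

    def predict(x):
        scores = {
            y: math.exp(sum(weights.get(f, 0.0) for f in feature_fn(x, y)))
            for y in labels
        }
        z = sum(scores.values())
        return {y: s / z for y, s in scores.items()}

    for _ in range(iterations):
        # Expected feature counts under the current model.
        expected = defaultdict(float)
        for x, _ in samples:
            dist = predict(x)
            for y in labels:
                for f in feature_fn(x, y):
                    expected[f] += dist[y]
        # GIS update: move each weight towards matching the empirical count.
        for f in weights:
            weights[f] += math.log(empirical[f] / expected[f]) / C
    return weights, predict

# Hypothetical toy features: one word/tag conjunction and one tag bias.
def toy_features(word, tag):
    return [f"word={word}&tag={tag}", f"tag={tag}"]

toy_samples = [("Ram", "PN"), ("Sita", "PN"), ("aur", "CC")]
gis_weights, gis_predict = train_gis(toy_samples, ["PN", "CC"], toy_features)
```

After training, `gis_predict("aur")` returns a distribution over the two toy tags that favours `CC`.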
What are features?
Feature function : an indicator function that captures a useful fact about the modelling task
For example,
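A feature function of this kind might look like the sketch below. The specific feature is hypothetical (it is built from the tagged sentence Ram/PN aur/CC that appears in the deck), and the `history` dictionary layout is an assumption made for illustration.

```python
def f_aur_is_cc(history, tag):
    """Indicator feature: fires when the current word is 'aur' and the candidate tag is CC."""
    return 1 if history["word"] == "aur" and tag == "CC" else 0

def f_cc_after_pn(history, tag):
    """Indicator feature: fires when the previous tag is PN and the candidate tag is CC."""
    return 1 if history["prev_tag"] == "PN" and tag == "CC" else 0

# Assumed layout of a history: the current word plus the previous tag.
example_history = {"word": "aur", "prev_tag": "PN"}
```

Each feature returns 1 for the (history, tag) pairs it describes and 0 otherwise; the model learns one weight per feature.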
POS tagging features
Context-based : POS tag of previous word, current word
Word-dependent : suffixes, digits, special characters, English words
POS tagging features
Dictionary-based : possible tags for the word, according to the dictionary
Corpus-driven : occurrence of a word and its tag(s) according to the training data
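The four feature groups listed above could be extracted roughly as in the sketch below. Everything here is assumed for illustration: the function name, the feature-name strings, the maximum suffix length of 3, and the rule that an all-ASCII alphabetic token counts as an English word (which presumes the corpus is written in Devanagari).

```python
import unicodedata

def pos_features(words, i, prev_tag, dictionary, corpus_tags):
    """Return the active feature names for the word at position i."""
    word = words[i]
    # Context-based: previous POS tag and the current word.
    feats = [f"prev_tag={prev_tag}", f"word={word}"]
    # Word-dependent: suffixes (assumed maximum length 3).
    for n in range(1, min(3, len(word) - 1) + 1):
        feats.append(f"suffix={word[-n:]}")
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    # Punctuation (P*) and symbol (S*) categories count as special characters;
    # Devanagari vowel signs are marks (M*) and are deliberately not flagged.
    if any(unicodedata.category(ch)[0] in "PS" for ch in word):
        feats.append("has_special_char")
    if word.isascii() and word.isalpha():
        feats.append("is_english")
    # Dictionary-based: tags the dictionary allows for this word.
    for tag in sorted(dictionary.get(word, ())):
        feats.append(f"dict_tag={tag}")
    # Corpus-driven: tags seen with this word in the training data.
    for tag in sorted(corpus_tags.get(word, ())):
        feats.append(f"seen_tag={tag}")
    return feats

# Hypothetical inputs.
sample_words = ["राम", "2005", "train"]
sample_dictionary = {"राम": {"PN"}}
sample_corpus_tags = {"राम": {"PN", "N"}}
sample_feats = pos_features(sample_words, 0, "START", sample_dictionary, sample_corpus_tags)
```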
Chunking features
Context-based features : word itself (conditionally), POS tag, chunk label of previous word
Current POS tag based feature : tag class
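A rough sketch of how these chunking features might be computed follows. The slides do not say which condition governs the use of the word itself or how tags are grouped into classes, so `LEXICALIZED_TAGS`, `TAG_CLASS`, and the chunk label `NP` are all hypothetical placeholders.

```python
# Hypothetical grouping of POS tags into classes; the actual grouping is not given.
TAG_CLASS = {"N": "nominal", "PN": "nominal", "VM": "verbal", "VAUX": "verbal"}
# Hypothetical condition: the word itself is used only for these POS tags.
LEXICALIZED_TAGS = {"CC"}

def chunk_features(words, pos_tags, i, prev_chunk):
    """Return the active chunking feature names for position i."""
    tag = pos_tags[i]
    feats = [f"pos={tag}", f"prev_chunk={prev_chunk}"]
    if tag in LEXICALIZED_TAGS:
        feats.append(f"word={words[i]}")
    feats.append(f"tag_class={TAG_CLASS.get(tag, 'other')}")
    return feats

chunk_words = ["Ram", "aur", "Sita"]
chunk_pos = ["PN", "CC", "PN"]
```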
Experimental Setup
26 POS tags
6 chunk labels
75 - 25 split of training and test data
Results averaged over 10 data sets
Results
POS tagging accuracy
Best : 89.346 %
Average : 88.4 %
Chunk labelling accuracy (per word basis)
Best : 87.399 %
Average : 86.45 %
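The evaluation protocol (75 - 25 split, per-word accuracy, best and average over 10 runs) can be sketched as below. How the 10 data sets were actually constructed is not stated, so the seeded random shuffle is an assumption, and `train_and_tag` is a hypothetical callback standing in for the tagger or chunker.

```python
import random

def split_75_25(sentences, seed):
    """Shuffle with a fixed seed and split into 75% training, 25% test."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4
    return shuffled[:cut], shuffled[cut:]

def per_word_accuracy(gold, predicted):
    """Fraction of words whose predicted label equals the gold label."""
    total = correct = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        for g, p in zip(gold_sent, pred_sent):
            total += 1
            correct += int(g == p)
    return correct / total if total else 0.0

def average_accuracy(tagged_sentences, train_and_tag, runs=10):
    """Return (average, best) per-word accuracy over several random splits.

    train_and_tag(train, test_words) is a hypothetical callback that trains
    on `train` and returns one label sequence per test sentence.
    """
    scores = []
    for run in range(runs):
        train, test = split_75_25(tagged_sentences, seed=run)
        gold = [[tag for _, tag in sent] for sent in test]
        test_words = [[word for word, _ in sent] for sent in test]
        scores.append(per_word_accuracy(gold, train_and_tag(train, test_words)))
    return sum(scores) / len(scores), max(scores)
```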
Accuracy across runs
Error Analysis : POS tagging
Good performance for : VAUX, VFM, VNN, postpositions
Need to improve : compound tags, proper nouns
Error Analysis : Chunking
Good performance for : noun phrase
Need to improve : verb phrase
Future Work
Morphological Features
Enriching dictionary
Hybrid models
Thank you!
Questions ?
Example
Ram/PN aur/CC Sita/PN Shaadi/N karne/GRND ja/VM rahen/VAUX hain/VAUX
Beam Search
[Figure: beam search over candidate tags for the words "Ram" and "Aur"; node scores shown: N:0.3, CC:0.005, PN:0.4, CC:0.2, CC:0.15, CC:0.25, INJ:0.10, VA:0.05]
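Beam search over tag sequences can be sketched as below. This is an illustration only: `beam_search` and `toy_prob` are hypothetical, and the probabilities in `toy_prob` are made-up numbers, not the scores from the figure. At each word, every hypothesis in the beam is extended with each candidate tag, and only the `beam_width` highest-scoring hypotheses are kept.

```python
import math

def beam_search(words, tags, prob_fn, beam_width=3):
    """Return the best (tag_sequence, log_probability) found by beam search.

    prob_fn(words, i, prev_tag) returns a dict mapping each tag to
    P(tag | context) for the word at position i.
    """
    beam = [([], 0.0)]  # each hypothesis: (tags so far, log probability)
    for i in range(len(words)):
        candidates = []
        for seq, logp in beam:
            prev_tag = seq[-1] if seq else None
            dist = prob_fn(words, i, prev_tag)
            for tag in tags:
                p = dist.get(tag, 0.0)
                if p > 0.0:
                    candidates.append((seq + [tag], logp + math.log(p)))
        # Keep only the beam_width most promising hypotheses at this depth.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0] if beam else ([], float("-inf"))

# Hypothetical conditional probabilities for a three-word toy sentence.
def toy_prob(words, i, prev_tag):
    word = words[i]
    if word == "Ram":
        return {"PN": 0.6, "N": 0.35, "CC": 0.05}
    if word == "aur":
        if prev_tag == "PN":
            return {"CC": 0.8, "N": 0.1, "PN": 0.1}
        return {"CC": 0.6, "N": 0.3, "PN": 0.1}
    return {"PN": 0.5, "N": 0.4, "CC": 0.1}

best_tags, best_logp = beam_search(["Ram", "aur", "Sita"], ["PN", "N", "CC"], toy_prob, beam_width=2)
```

With these toy numbers the search returns the sequence PN, CC, PN. A narrower beam is faster but may discard the hypothesis that would have been best overall.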