urdu noun phrase chunking - center for language 2010-08-24¢  urdu noun phrase chunking ms...

Download Urdu Noun Phrase Chunking - Center for Language 2010-08-24¢  Urdu Noun Phrase Chunking MS Thesis Submitted

Post on 03-Apr-2020

4 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Urdu Noun Phrase Chunking

    MS Thesis

    Submitted in Partial Fulfillment of the Requirements for

    the Degree of

    Master of Science (Computer Science)

    at the

    National University of Computer and Emerging Sciences `

    by

    Shahid Siddiq MS (CS) 06-0817

    2009

  • Approved by Committee Members:

    Approved:

    ______________________________ Head (Department of Computer Sciences) __________________ 20 ________

    Approved by Committee Members: Advisor ___________________________________

    Dr. Sarmad Hussain Professor National University of Computer & Emerging Sciences

    Co –Advisor ___________________________________

    Mr. Shafiq-ur-Rahman Associate Professor National University of Computer & Emerging Sciences

  • Acknowledgements

    If the oceans turn into ink and all the wood becomes pen, even then, the praises of Allah

    Almighty cannot be expressed. Unfeigned and meek thanks are before him, who created the

    universe and bestowed the mankind with knowledge and wisdom to search for its secret and

    invigorated me with fortunate and capability to complete my research work on this interesting

    topic.

    The warmest and deepest sense of obligation are to respected advisor Prof. Dr. Sarmad Hussain,

    for his valuable guidance in this thesis work, kind and humble attitude during different stages of

    this work.

    Many thanks to all of my friends in the field of Natural Language processing for guiding me

    during experimentation and literature review, particularly to Mr. Asim Ali, PhD student,

    National University of Computer and Emerging Sciences, Mr. Umer Khalid Qureshi, Center of

    Research in Urdu Language Processing and Ahmed Muaz, Center of Research in Urdu Language

    Processing for kind guidance.

    Special Thanks to Co-advisor Mr. Shafiq-ur-Rahman for his kind guidance during this work and

    then review of this report.

    Shahid Siddiq

  • Table of Contents

    1 Introduction............................................................................................................................. 1 1.1 Organization of Thesis Report .............................................................................................. 1

    2 Background............................................................................................................................. 2 2.1 Parts of Speech...................................................................................................................... 2

    2.1.1 Parts of Speech Tags...................................................................................................... 3 2.1.1.1 Part of Speech Tags of English............................................................................... 3 2.1.1.2 Part of Speech Tags of Urdu................................................................................... 4

    2.2 Phrases in Urdu ..................................................................................................................... 6 2.3 Important Characteristics of Urdu ........................................................................................ 7

    2.3.1 Free Word Order Property ............................................................................................. 7 2.3.2 Case Markers ................................................................................................................. 8

    3 Chunking............................................................................................................................... 10 3.1 Benefits of Chunking .......................................................................................................... 11 3.2 NP Chunking....................................................................................................................... 11

    4 Literature Review.................................................................................................................. 13 4.1 Methods of Chunking ......................................................................................................... 13

    4.1.1 Rule based Chunking ................................................................................................... 13 4.1.2 Corpus based Chunking ............................................................................................... 14 4.1.3 Hybrid approach of Chunking ..................................................................................... 17

    4.2 Tools for Chunking ............................................................................................................. 18 4.3 SNoW based Chunking tag set comparison........................................................................ 18

    5 Current Work ........................................................................................................................ 20 5.1 Motivation........................................................................................................................... 20 5.2 Problem Statement .............................................................................................................. 20 5.3 Scope................................................................................................................................... 21 5.4 Methodology....................................................................................................................... 21

    5.4.1 Computational Model .................................................................................................. 21 5.4.2 Architecture.................................................................................................................. 23 5.4.3 Tagger .......................................................................................................................... 24 5.4.4 Preparation of Data ...................................................................................................... 24

    5.4.4.1 Revision of Data ................................................................................................... 24 5.4.4.2 Identification of Boundaries for Noun Phrases..................................................... 25

    5.5 Experimentation.............................................................................................................. 26 5.5.1 First Phase of Experiments (Statistical Method Implementation) ............................... 27

    5.5.1.1 Experiment 1: Base Experiment Using Basic Methodology ................................ 27 5.5.1.2 Experiment 2: Extended Experiment Using Transformation of All POS............. 27 5.5.1.3 Experiment 3: Extended Experiment Using Transformation of Nouns Only....... 28

    5.5.2 Second Phase of Experiments (Implementation of Rules) .......................................... 28 6 Results and Discussion ......................................................................................................... 31

    6.1 Results................................................................................................................................. 31 6.1.1 Overall Accuracy of Experiments................................................................................ 31 6.1.2 Precision....................................................................................................................... 31 6.1.3 Recall ........................................................................................................................... 31 6.1.4 Experiment 1: Base Experiment Using Basic Methodology ....................................... 32

    6.1.4.1 Right to Left Training and Testing (Natural Direction of Urdu) .......................... 32 6.1.4.2 Left to Right Training and Testing ....................................................................... 33

    6.1.5 Experiment 2: Extended Experiment using Transformation of All POS..................... 35

  • 6.1.6 Experiment 3: Extended Experiment using Transformation of Nouns Only............... 36 6.2 Discussion ........................................................................................................................... 38

    7 Conclusion and Future Work ................................................................................................ 43 7.1 Conclusion .......................................................................................................................... 43 7.2 Directions for Future Work................................................................................................. 43

    References..................................................................................................................................... 45 Appendices.................................................................................................................................... 49

    Appendix A: Terms of Reference ............................................................................................. 49 Appendix B: Rule Set of Experiments...................................................................................... 50 Appendix C: Tag Sequence Examples of Experiments ............................................................ 66

    Training Tag sequence of Experiment 1A: Base Experiment using Basic Methodology Right to Left Direction (Sample Data)........................................

Recommended

View more >