Page 1: Part of speech tagging for Arabic

Part-of-Speech Tagging

Alkhalaf.H, Alotaibi.S, Alruhaili.Sh


Outline:

• Introduction

• Methods

  • Constructing An Automatic Lexicon for Arabic Language

  • APT: Arabic Part-of-speech Tagger

  • The HMM-Based POS Tagger

    • The Stemmer

    • The POS Tagger

• Results

  • Constructing An Automatic Lexicon for Arabic Language

  • APT: Arabic Part-of-speech Tagger

  • The HMM-Based POS Tagger

• Conclusion


Introduction: Arabic language

• Arabic is the language of millions of people all over the world; for that reason, interest in the Arabic language is growing fast.

• Language processing tools for Arabic have yet to achieve the desired quality and robustness.

• The field has not been covered enough so far and is still fertile ground for research.


In the study of languages

• Corpus linguistics refers to a methodology that derives the set of abstract rules governing a natural language from a body of texts (a corpus) written in that language.

• Corpus linguistics work, originally done by hand, is now performed by automated processes using algorithms in software applications.


Part-of-Speech Tagging (POS tagging or POST)

• Part of the annotation methods in corpus linguistics: the process of assigning a part of speech to each word in a sentence, taking into account the word's context, that is, its relationship with adjacent and related words in a phrase, sentence, or paragraph.

• A simplified form of this is commonly associated with identifying words as nouns, verbs, adjectives, adverbs, etc.


Arabic words are classified into three classes:

• Noun: a name, or a word that describes a person, thing, or idea.

• Verb: a word that denotes an action; it can be combined with some particles.

• Particle: everything that is neither a verb nor a noun, such as prepositions and conjunctions.
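A small illustration of the three-class scheme, using a sentence chosen only for this purpose (ذهب الولد إلى المدرسة, "the boy went to the school"). The `Tag` enum and `tagged_sentence` names are hypothetical, not taken from any of the surveyed taggers:

```python
from enum import Enum


class Tag(Enum):
    """The three traditional Arabic word classes."""
    NOUN = "noun"
    VERB = "verb"
    PARTICLE = "particle"


# Illustrative sentence: "The boy went to the school".
tagged_sentence = [
    ("ذهب", Tag.VERB),        # dhahaba, "went"
    ("الولد", Tag.NOUN),      # al-walad, "the boy"
    ("إلى", Tag.PARTICLE),    # ila, "to" (a preposition)
    ("المدرسة", Tag.NOUN),    # al-madrasa, "the school"
]

if __name__ == "__main__":
    for word, tag in tagged_sentence:
        print(f"{word}/{tag.name}")
```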


APT: Arabic Part-of-speech Tagger

Methodology:

Previously:

Word → Search in lexicon → Found?

• Yes: assign all possible tags.

• No: do not assign any tag.
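The earlier lexicon-only strategy amounts to a dictionary lookup. In the sketch below, the toy `LEXICON` and the `lookup_tags` function are hypothetical; the point is that an ambiguous word such as unvowelled ذهب ("went" or "gold") receives every tag it can take, while an unknown word receives none:

```python
# Hypothetical toy lexicon: word -> every tag the word can take.
LEXICON = {
    "ذهب": {"VERB", "NOUN"},   # unvowelled: dhahaba "went" or dhahab "gold"
    "الولد": {"NOUN"},         # "the boy"
    "إلى": {"PARTICLE"},       # "to"
}


def lookup_tags(word):
    """Earlier strategy: all possible tags if the word is found, else none."""
    return set(LEXICON.get(word, ()))


if __name__ == "__main__":
    for w in ["ذهب", "إلى", "كتاب"]:  # the last word ("book") is not in the lexicon
        print(w, sorted(lookup_tags(w)))
```

Ambiguity is left unresolved here, which is what motivates the newer flow.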


APT: Arabic Part-of-speech Tagger (cont.)

Now:

Word → Search root in lexicon → More than one tag, or no tag found?

• Yes: stemming, then assign a tag by affixes.

• No: tagging with the single tag found in the lexicon.
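A minimal sketch of the newer flow, assuming the root has already been obtained by the caller and using a handful of illustrative affix rules (the definite article ال marks a noun, final وا marks a plural past-tense verb, final ة usually marks a noun). These rules and the names `ROOT_LEXICON`, `tag_by_affixes`, and `apt_tag` are assumptions for illustration, not APT's actual rule set:

```python
# Hypothetical root lexicon: root -> tags recorded for it.
ROOT_LEXICON = {
    "ذهب": {"VERB", "NOUN"},  # ambiguous: "go" / "gold"
    "درس": {"VERB", "NOUN"},  # ambiguous: "study" / "lesson"
    "إلى": {"PARTICLE"},      # unambiguous particle "to"
}

# Illustrative affix rules only; real Arabic needs far richer rules.
PREFIX_RULES = [("ال", "NOUN")]                 # definite article -> noun
SUFFIX_RULES = [("وا", "VERB"), ("ة", "NOUN")]  # past plural verb; ta marbuta


def tag_by_affixes(word, min_stem=3):
    """Fallback: guess a tag from the word's affixes, or return None."""
    for prefix, tag in PREFIX_RULES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            return tag
    for suffix, tag in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return tag
    return None


def apt_tag(word, root):
    """Use the lexicon tag if it is unique; otherwise fall back to affixes."""
    tags = ROOT_LEXICON.get(root, set())
    if len(tags) == 1:
        return next(iter(tags))
    return tag_by_affixes(word)


if __name__ == "__main__":
    print(apt_tag("المدرسة", "درس"), apt_tag("ذهبوا", "ذهب"), apt_tag("إلى", "إلى"))
```

A word with an ambiguous or unknown root and no matching affix stays untagged (`None`) in this sketch.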


APT: Arabic Part-of-speech Tagger (cont.)

Results:


APT: Arabic Part-of-speech Tagger (cont.)

• The statistical tagger achieved an accuracy of around 90% when disambiguating ambiguous words with this tagset.


Constructing An Automatic Lexicon for Arabic Language

Methodology:


Constructing An Automatic Lexicon for Arabic Language (cont.)

Results:

• Stemming errors were ignored when calculating the efficiency.

• The algorithm extracts only triliteral (three-letter) roots.

# Words | # Correct words | # Incorrect words | % Correct words | % Incorrect words | % Total
313     | 302             | 11                | 96.50%          | 3.50%             | 96.50%
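As a hedged illustration of the triliteral-root restriction, the sketch below builds a root-to-tags lexicon from a few tagged words using a deliberately naive affix stripper. The affix lists and the names `naive_root` and `build_lexicon` are hypothetical, not the algorithm from the paper. Words whose stripped form is not exactly three letters are skipped, which also shows how a real root (درس for المدرسة) can be missed when the stripper is too simple:

```python
from collections import defaultdict

# Hypothetical affix lists for a deliberately naive stripper.
PREFIXES = ("وال", "ال")
SUFFIXES = ("وا", "ة")


def naive_root(word):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[: -len(s)]
            break
    return word


def build_lexicon(tagged_words):
    """Map each triliteral root to the set of tags seen with it."""
    lexicon = defaultdict(set)
    for word, tag in tagged_words:
        root = naive_root(word)
        if len(root) == 3:  # keep triliteral roots only
            lexicon[root].add(tag)
    return dict(lexicon)


if __name__ == "__main__":
    sample = [("ذهب", "VERB"), ("ذهبوا", "VERB"), ("الذهب", "NOUN"), ("المدرسة", "NOUN")]
    print(build_lexicon(sample))  # المدرسة strips to مدرس (4 letters) and is skipped
```

For reference, the table's percentages follow from its counts: 302 / 313 ≈ 96.5% correct and 11 / 313 ≈ 3.5% incorrect.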


The HMM-Based POS Tagger


The Tokenizer

• The purpose of the tokenization phase is to carry out some pre-processing steps that prepare the input text for the remaining modules.

• The HMM POS tagger architecture includes a tokenizer that separates the punctuation marks from the words.

• Since punctuation marks also need to be tagged, the tokenizer tags them as PUNC and passes them directly to the POS tagger.

• The tokenizer then converts the input text into a list of words, using the space as a delimiter, and passes the resulting list to the stemmer.
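A minimal tokenizer sketch along these lines, assuming a particular punctuation set (Latin marks plus the Arabic comma ،, semicolon ؛ and question mark ؟). The names `PUNCT_CHARS`, `tokenize`, and `pretag` are hypothetical; the sketch splits on any whitespace rather than only the space character:

```python
import re

# Assumed punctuation set: Latin marks plus Arabic comma, semicolon, question mark.
PUNCT_CHARS = ".,!?:;()\"'،؛؟"
PUNCT_SET = set(PUNCT_CHARS)
_PUNCT_RE = re.compile("([" + re.escape(PUNCT_CHARS) + "])")


def tokenize(text):
    """Separate punctuation from words, then split on whitespace."""
    spaced = _PUNCT_RE.sub(r" \1 ", text)  # surround each mark with spaces
    return spaced.split()


def pretag(tokens):
    """Tag punctuation as PUNC; words stay untagged (None) for the stemmer."""
    return [(tok, "PUNC" if tok in PUNCT_SET else None) for tok in tokens]


if __name__ == "__main__":
    print(pretag(tokenize("ذهب الولد، إلى المدرسة.")))
```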


The Stemmer

• Stemming is the process of segmenting a word by separating its affixes from the stem, producing prefix, stem, and suffix parts.
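A minimal sketch of prefix/stem/suffix segmentation, assuming short hypothetical affix inventories and a minimum stem length of three letters, with the longest affixes tried first. This is not the paper's stemmer, only an illustration of the segmentation it describes; heuristics like this over-strip words whose first or last letter belongs to the root:

```python
# Hypothetical affix inventories; longest affixes are tried first.
PREFIXES = sorted(["وال", "بال", "ال", "و", "ف", "ب", "ل", "س"], key=len, reverse=True)
SUFFIXES = sorted(["ات", "ون", "ين", "وا", "ها", "هم", "ة"], key=len, reverse=True)


def segment(word, min_stem=3):
    """Split a word into (prefix, stem, suffix), keeping at least min_stem letters."""
    prefix = next(
        (p for p in PREFIXES if word.startswith(p) and len(word) - len(p) >= min_stem), ""
    )
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) - len(s) >= min_stem), ""
    )
    stem = rest[: len(rest) - len(suffix)]
    return prefix, stem, suffix


if __name__ == "__main__":
    for w in ["والمدرسة", "ذهبوا", "بيت"]:
        print(w, segment(w))
```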


The Stemmer (cont.)


The POS Tagger

• The POS tagger is a Hidden Markov Model (HMM), built by constructing trigram language models.
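A compact sketch of a trigram HMM tagger: tag-trigram transition probabilities P(t | u, v) and word emission probabilities P(w | t) are estimated from a tagged corpus, and the Viterbi algorithm finds the most probable tag sequence. The add-k smoothing, the `<s>` start symbol, the two-sentence training corpus, and all names are assumptions for illustration; the slides do not say how the models were smoothed or trained:

```python
import math
from collections import Counter

START = "<s>"  # padding symbol for the two positions before a sentence


def train(tagged_sentences, k=0.1):
    """Estimate add-k smoothed trigram transitions and emissions (log space)."""
    trigrams, contexts = Counter(), Counter()
    emissions, tag_counts = Counter(), Counter()
    vocab = set()
    for sentence in tagged_sentences:
        tags = [START, START] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            trigrams[(tags[i - 2], tags[i - 1], tags[i])] += 1
            contexts[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sentence:
            emissions[(tag, word)] += 1
            tag_counts[tag] += 1
            vocab.add(word)
    tagset = sorted(tag_counts)
    n_tags, n_words = len(tagset), len(vocab) + 1  # +1 reserves mass for unknown words

    def log_trans(u, v, t):  # log P(t | u, v)
        return math.log((trigrams[(u, v, t)] + k) / (contexts[(u, v)] + k * n_tags))

    def log_emit(t, w):  # log P(w | t)
        return math.log((emissions[(t, w)] + k) / (tag_counts[t] + k * n_words))

    return tagset, log_trans, log_emit


def viterbi(words, tagset, log_trans, log_emit):
    """Return the most probable tag sequence under the trigram HMM."""
    if not words:
        return []
    best = {(START, START): 0.0}  # (previous tag, current tag) -> best log-probability
    backpointers = []
    for word in words:
        new_best, bp = {}, {}
        for (u, v), score in best.items():
            for t in tagset:
                s = score + log_trans(u, v, t) + log_emit(t, word)
                if s > new_best.get((v, t), -math.inf):
                    new_best[(v, t)] = s
                    bp[(v, t)] = u  # remember the tag two positions back
        backpointers.append(bp)
        best = new_best
    seq = list(max(best, key=best.get))  # best final pair of tags
    for j in range(len(words) - 1, 1, -1):
        seq.insert(0, backpointers[j][(seq[0], seq[1])])
    return seq[-len(words):]


# Hypothetical two-sentence training corpus.
corpus = [
    [("ذهب", "VERB"), ("الولد", "NOUN"), ("إلى", "PARTICLE"), ("المدرسة", "NOUN")],
    [("الولد", "NOUN"), ("ذهب", "VERB"), ("إلى", "PARTICLE"), ("المدرسة", "NOUN")],
]

if __name__ == "__main__":
    tagset, log_trans, log_emit = train(corpus)
    print(viterbi(["ذهب", "الولد", "إلى", "المدرسة"], tagset, log_trans, log_emit))
```

Working in log space avoids numeric underflow on long sentences; the Viterbi state is a pair of tags because a trigram model conditions on the two previous tags.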


The POS Tagger (cont.)


The HMM-Based POS Tagger

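• F-measure = (2 × Precision × Recall) / (Precision + Recall)

where Precision = Ncorrect / Nresponse and Recall = Ncorrect / Nkey

(Ncorrect: number of correctly tagged words; Nresponse: number of tags produced by the tagger; Nkey: number of tags in the hand-tagged reference, or key.)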
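The formula translates directly into code. The function name `f_measure` and the counts in the usage line are hypothetical. The expression simplifies to 2 × Ncorrect / (Nresponse + Nkey), and when the tagger outputs exactly one tag per reference token (Nresponse = Nkey), precision, recall, and F-measure are all equal:

```python
def f_measure(n_correct, n_response, n_key):
    """Harmonic mean of precision and recall, as defined above."""
    precision = n_correct / n_response  # share of the tagger's tags that are correct
    recall = n_correct / n_key          # share of the reference tags that were recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(round(f_measure(90, 100, 95), 4))
```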


The HMM-Based POS Tagger (cont.)

• The performance of the POS tagger decreased to 55% when it was used to tag non-stemmed text.

• Using the F-measure, the HMM tagger achieved 97%.


Conclusion

• Part-of-speech (POS) tagging is one of the most important and basic applications of Natural Language Processing.

• In this paper we highlighted the importance of part-of-speech tagging in a wide range of NLP applications.

• We have presented the most important techniques used so far in part-of-speech taggers for Arabic text, drawn from several papers.


Thanks.

