Research ArticleExploring the Performance of Tagging for the Classical andthe Modern Standard Arabic
Dia AbuZeina 1 and Taqieddin Mostafa Abdalbaset2
1College of Information Technology and Computer Engineering Palestine Polytechnic University Hebron State of Palestine2Palestine Technical UniversityndashKadoorie AL-Aroub Branch Hebron State of Palestine
Correspondence should be addressed to Dia AbuZeina abuzeinappuedu
Received 7 August 2018 Accepted 23 October 2018 Published 23 January 2019
Guest Editor Omar Abu Arqub
Copyright copy 2019 Dia AbuZeina and Taqieddin Mostafa Abdalbaset This is an open access article distributed under the CreativeCommons Attribution License which permits unrestricted use distribution and reproduction in any medium provided theoriginal work is properly cited
The part of speech (PoS) tagging is a core component in many natural language processing (NLP) applications In fact thePoS taggers contribute as a preprocessing step in various NLP tasks such as syntactic parsing information extraction machinetranslation and speech synthesis In this paper we examine the performance of a modern standard Arabic (MSA) based taggerfor the classical (ie traditional or historical) Arabic In this work we employed the Stanford Arabic model tagger to evaluatethe imperative verbs in the Holy Quran In fact the Stanford tagger contains 29 tags however this work experimentally evaluatesjust one that is the VB equiv imperative verb The testing set contains 741 imperative verbs which appear in 1848 positions in the HolyQuran Despite the previously reported accuracy of the Arabic model of the Stanford tagger which is 9626 for all tags and 8014for unknown words the experimental results show that this accuracy is only 728 for the imperative verbs This result promotesthe need for further research to expose why the tagging is severely inaccurate for classical Arabic The performance decline mightbe an indication of the necessity to distinguish between training data for both classical and MSA Arabic for NLP tasks
1 Introduction
The part of speech (PoS) tagging also known as word-category disambiguation is a process to determine the tagof each word in a given input text The tagging processuses the context to label words using syntactic tags such asnoun adjective verb or preposition that are also known asparts of speech word-classes grammatical categories lexicalclass markers or syntactic categories Tagging is performedeither manually by linguistic experts or automatically bymachine learning algorithms intuitively this work considersthe computational track Word tags are mainly used todescribe the words and their jobs according to the context forfurther processing That is each word has a particular rolebased on the position and the adjacent words in the sentenceThe tagset is a predefined list that generally includes somesymbols such as nouns pronouns adjectives verbs adverbspropositions conjunctions and the definite and indefinitearticles (sometimes called ldquodeterminersrdquo) Of course thetagset is prepared by the language linguistic scholars to
describe the languagersquos membership or word family The sizeof the tagset is variable and depends on the requirements orthe capacity of developing applications In any case the tagsetshould best fit and efficiently serve the intended purposesHence there is no predefined tagset for all languages andthere is no standard (ie unique) tagset for a certain languageRather it is a debatable matter
The PoS is increasingly becoming a vital factor in therelated natural language processing (NLP) applications Infact creating knowledge base resources (eg tag relation-ships) is one objective of the PoS tagging that can be laterused in other NLP tools In fact PoS tagging has many rolesin the field of NLP as a basic prepossessing step For instancesome ofNLPPoS tagging based applications include syntacticparsing information extraction machine translation speechsynthesis and named entity recognition (NER) This workis aimed at exploring the performance of the PoS for theclassical Arabic using a modern standard Arabic (MSA)tagger that is the Stanford tagger [1] Since it is difficult toevaluate the Stanford tagger for all tags (29 tags) as it requires
HindawiAdvances in Fuzzy SystemsVolume 2019 Article ID 6254649 10 pageshttpsdoiorg10115520196254649
2 Advances in Fuzzy Systems
a large annotated corpus the Quranic imperative verbs werechosen in the evaluation process The Stanford tagger usesthe label VB to mark the imperative verbs That is this workis restricted to a testing dataset that contains a list of allimperative verbs in the Holy Quran that is obtained from[2]This work is distinguished by presenting an experimentalstudy of the classical Arabic performance using one of thefreely available taggers and therefore making it clear forcomparison purposes This work also aims to demonstratethe tagging problems from different points of view suchas the Arabic PoS tagging benefits and challenges tagsetscapacities tagging algorithms and the recent studies in thisfield
In spite of the importance of taggersrsquo performance forboth classical andMSAArabic few studies have explored theaccuracy for the classical Arabic On the other hand most ofthe previous studies focused on the tagsets and the taggingapproaches For instance one study [3] proposed an Arabictagset with detailed hierarchical levels of the categories andtheir relationships (ie a tree of different levels) As indicatedthis study focused on the imperative verbs in theHoly QuranThe reason for choosing the imperative is that it is easier tofind such annotated testing collections due to the previouseffort of Arabic scholars to serve the Quranic studies Inaddition theArabic language is distinguished to have a stand-alone form of imperative verbs whereas it is mixed with thepresent verb as found in the English language For instancethe English language has the verb ldquogordquo as an imperative andpresent verb while the same verbs have a different form in theArabic language as the imperative is ldquo
rdquo and the presentis ldquo
rdquo which are completely different words in terms oftranscription and tense
Even though the documentation of the Stanford tagger[4] indicates that the accuracy of the Arabic model is 9626on an MSA test portion as described in [5] and 8014 forunknown words our measure shows extremely less accuracyIn this work the Stanford tagger scored only 728 accuracyfor a collection of Arabic imperative verbs It is worthindicating that the Stanford tagger works at word level (iethe tag is given to the whole word instead of its parts suchas prefixes stems and suffixes as some other taggers do)Despite diacritics playing an important role in the taggingprocess nevertheless they are discarded in this work sincethe Stanford tagger does not consider the diacritics of theinput text However we do keep the Hamza (eg
) and the
Madd () symbols in the corresponding characters That is
the testing dataset is a nonvocalized Arabic text The outputof this work highlights the importance of reinvestigatingthe tagging problem for the Arabic language since many ofprevious studies report accuracies into the nineties percentileReinvestigation includes different aspects of training data aseither classical or MSA the tagsets the corpora sizes etc
The rest of this paper is organized as follows In the nextsection we demonstrate the benefits of the tagging for variousNLP applications In Section 3 we present why tagging is achallenging task We exhibit the literature review in Section 4followed by the Stanford tagset in Section 5 The proposed
method is described in Section 6 and the experimental resultsin Section 7 Finally we conclude in Section 8
2 The Benefits of Tagging
The PoS tagging is the core of many NLP algorithms dueto the useful information it gives about a word and itsneighbors In fact NLP applications employ the output ofthe PoS tagging for different purposes such as checkingthe correctness of the syntactic structure around the wordFor instance regarding the Arabic language adjectives arepreceded by nouns while nouns are preceded by adjectivesin the English language such as ldquoA beautiful schoollarrrarr13
13 rdquo Similarly nouns are preceded by verbs inthe English such as ldquoHe runs fastrdquo while the Arabic allowsboth directions such as ldquo 13 larrrarr the teacher
writes the lessonrdquo and ldquo 13 larrrarr theteacher writes the lessonrdquo Therefore the Google translatorgives the same translation for two different word orderArabic sentences Hence knowing the syntax of word orderis extremely important for some NLP applications since itlimits the output candidates and increases the probabilities ofcorrect answers The following are some of NLP applicationsthat utilize the PoS tagging
(i) Capturing common syntactical rules [6] presentsa data mining based method to extract the com-mon syntactical rules in the Holy Quran The studyreported that the common relationships between thewordsrsquo tags (ie the common rule) are tag1=RPtag2=NN tag3=WP 91 997904rArr tag4=VBD 90 accuracy(097912) Formore information of the tags the readerrefers to Section 5 in this paper
(ii) Enhancing the performance in speech recognition[7] employs the PoS to generate new words basedon the neighboring word tags The study used com-pound nouns that are followed by adjectives and thepreposition followed by any word After recognitionthe compound words were placed back to theiroriginal states (ie two parts) This method showsperformance enhancement
(iii) Named Entity Recognition (NER) [8] employs atagger for named entities recognition (NER) NERaims at extracting the names such as people orga-nizations locations cities or companies NER isbeneficial for certain applications such as classifyingcontent for news providers This facilitates catego-rization and content discovery NER also speeds upthe search process in sizeable data that containsfor instance millions of articles Other applicationsinclude using powering content recommendationscustomer support and research papers
(iv) Syntactic Parsing [9] employs the PoS tagging forsyntactic parsing Syntactic parsing is a process toconfirm that the input sentence follows the languagersquosformal grammar Figure 1 shows a parsing tree fora simple sentence The parsing tree represents the
Advances in Fuzzy Systems 3
sentence
noun phrasenoun phrase verb phrase
noun nounverbarticle
the cat ate mouse
Figure 1 An example of a parsing tree
syntactic structure of the text and is mainly used foranalyzing the input sentence
(v) Other PoS tagging based applications includesemantic role labeling [10] speech synthesis [11]speech recognition [12] information extraction [13]summarization [14] sentiment analysis also calledopinion mining [15] diacritization [16] softwareengineering [17] question answering [18] translation[19] plagiarism detection [20] key phrases extraction[21] ontology [22] and extracting Arabic noun com-pound [23]
3 The Challenge of Tagging
That fact that a word can take different tags makes the PoStagging a challenging task That is a word can be labeled bydifferent tags based on the context Therefore the goal of thePoS tagging algorithms is to remove such ambiguity and labelthe words correctly Table 1 shows some examples of wordsthat take different tags based on the context As shown in thetable the word ldquogold larrrarr rdquo in sentence 1 is taggedas VBD (verb past tense) while it is tagged as NN (nounsingular or mass) in sentence 2 Similarly the word ldquoSaidlarrrarr rdquo in the first sentence is tagged as NNP (propernoun singular) while it is tagged as JJ (adjective) in sentence3 This shows how a particular word can have different labelswhich is the challenge of the PoS tagging process Hencethe problem of the PoS tagging is to resolve ambiguities bychoosing the proper tag considering the surrounded wordsOf course the absence of diacritics in the Arabic formalwriting system adds evenmore ambiguity For instance thereis no ambiguity to know that the diacritized word ldquogoldlarrrarr rdquo is a noun and the diacritized word ldquowent larrrarr
rdquo is a verb The figure also shows the tagging output for thetranslated sentences using the English model of the Stanfordtagger
4 Literature Review
Despite the importance of the PoS tagging for both MSAand classical Arabic most of the previous tagging studieshave mainly focused on the MSA In addition the literatureshows there is an active research to consider suitable tagsets
that truly reflect the linguistic items of Arabic as one ofthe morphologically rich languages In this literature wedemonstrate the up-to-date Arabic tagging research whichfocused on the main aspects and components such as thetype of the training text (ie MSA classical tweets) tagsetstagging algorithms unknown words stemming In [30] thestudy indicated that stemming (ie removing prefixes andpostfixes or suffixes) enhances the tagging performance In[31] the study presented a method to tag tweets that isusually written out of the formal and proper spelling of thelanguage In [28] the study considered a method to handlethe ldquounknown wordsrdquo which are the words that did notappear in the training corpus In [26] the study consideredthe problem which arises when estimating the transitionprobabilities in limited amounts of training data The studyproposed decision trees basedmethod to handle this problemthat generally occurs in the hidden Markov models (HMM)tagging technique In [32] the study implemented themaster-slave technique for the PoS tagging they used HMM as amaster tagger and maximum match (MM) and Brill taggersas slaves There are many approaches to perform the PoStagging the most widely used is the statistical approach thatis based on the HMM Another approach is not-statisticalwhich is on a number of hand-crafted disambiguation rulesto find the most appropriate tag for each word as in[33]
The recent studies of part of speech tagging includedifferent aspects For instance [34] developed a part of speechtagger for the Arabic heritage They scored an accuracy of9622They also reported that themost of the tagging errorsare results of segmentation Reference [35] employs part ofspeech tagging to enhance the performance of Arabic textclassification Reference [36] demonstrates part of speechtagging for the Arabic Gulf dialect For the tagging processthey employ Support Vector Machine (SVM) classifier andbidirectional Long Short Term Memory (Bi-LSTM) Refer-ence [37] presents a tagging based study regarding Arabicdialects identification Reference [38] uses part of speechand semantic tagging to extract features for training NeuralMachine Translation
Table 2 presents some information regarding taggingsystems such as tagging algorithms tagsets corpora andaccuracies We are aware that the accuracy is not a mattersince each work has its own corpus nevertheless reportingthese measures might give an indication of the overallaccuracy of the Arabic PoS tagging Similarity even the tagsetsize is important however it is more important to haveenough training set to cover the tags used otherwise zerovaluesmight be assigned to theHMM transition probabilitieswhich raises a tagging problem
5 Stanford Tagset
As indicated in the literature review there are many tagsetsthat are used in the previous studies Mainly the tags aredivided into two classes (ie categories) which are closedclass and open classThe closed class has a fixedmembershipsuch as prepositions while the open class can accept new
4 Advances in Fuzzy Systems
Tabl
e1So
meS
tanf
ordtagg
edAra
bica
ndEn
glish
sent
ence
s
Inpu
tsen
tenc
esan
dth
etra
nslatio
nus
ingGoo
glet
rans
lator
Sent
ence
1Se
nten
ce2
Sent
ence
313
13 $
13 amp
( )
+
( -13
Said
wen
ttoscho
olTh
ewin
nerg
otag
oldne
cklace
Hap
pyda
ywew
ishyo
uStan
ford
tagg
erou
tput
s(Ara
bicm
odel)
DTN
N13
IN N
NP
VBD
NN
IN
N
N13
$13 IN
amp
DTN
N
( ) V
BD
+VBG
V
BP(
-13 JJ
NN
Stan
ford
tagg
erou
tput
s(En
glish
mod
el)Sa
idN
NPwen
tVBD
toT
Oscho
olN
NTh
eDT
win
nerNN
gotV
BDaDT
goldJJ
neck
lace
NN
Hap
pyJJ
dayNN
wePR
Pwish
VBP
you
PRP
Advances in Fuzzy Systems 5
The proposed algorithm1 Obtain the text of the Holy Quran from [24] and remove the diacritics2 Install the full version of the Stanford Arabic model tagger from [25]3 Have the text of the Holy Quran tagged4 Obtain a list of all imperative verbs in the Holy Quran from [2]5 Find all words that have the tag VB equiv imperative verb6 Compare the two lists the one we obtained in step 5 and the list we obtained in step 4 to find the correctly tagged imperative verb7 Find the accuracy based on the information that is obtained in step 6
Algorithm 1 Evaluating imperative verbs using the Stanford tagger
Table 2 Some of the literature tagging research
No Ref Tagging Method Tagset Size Corpus Size Accuracy(tags) (words) ()
1 [26] A decision tree based tagger 110 78K amp500K 9165 997888rarr 97182 [27] Support Vector Machines (SVM) 24 140K 95493 [28] Hidden Markov Models 24 29300 950 997888rarr 9714 [29] SVM and a Neural network 21 6844 9105 [1] Maximum Entropy based tagger 29 588244 961
words especially in the technology fields as ldquoto faxrdquo Table 3shows the 29 tags of the Arabic model of the Stanford tagger
6 The Proposed Method
This section presents the steps that we follow to find theperformance of the Stanford tagger against the Quranicimperative verbs The first step is the tagging process thatproduces an annotated text file of the entire Quranic sen-tencesThenwe used a number of Python programs to extractthe correctly tagged imperative verbs as well as the wronglytagged imperative verbs etc The textual version of theHoly Quran is obtained from the Quran Printing ComplexSaudi Arabia website [36] Algorithm 1 summarizes theimplemented steps
The input testing set is the nondiacritized textual form ofthe Holy Quran Figure 2 shows what the testing set lookslike The figure contains the first chapter or Surah of theHoly Quran (Surat al-Fatih ahmdashThe Opening) in additionto the first three sentence of the second chapter (SuratAl-BaqarahmdashThe Cow) Figure 3 shows the output of theStanford tagger for the Quranic sentences that appear inFigure 2 As it is observed Figure 3 shows some correctlytagged words such as the following 0
13)131 DTJJ 13-
VBD WP 2 VBP The figure also shows some
wrongly tagged words such as the following 034NNP 5(VBD ( VBD
The tagger output that is shown in Figure 3 is the maincontent that can be used for further analysis to find thebehavior of the tagger Of course the correctly tagged wordsare required (ie the correct labels of the testing words) inorder to measure the accuracy which adds more difficulty inthis kind of research In other words if we want to measure
the accuracy for the ldquoentirerdquo Holy Quran we have to preparean annotated version of the Holy Quran which is a difficulttask This is why we chose a subset that contains only theimperative verbs
7 The Experimental Results
For the evaluation we used the full Stanford tagger (129MB) that is freely available at the website of the Stanfordnatural language processing group through the link [37]It is relatively simple to execute the tagger by running thecommand shown in Figure 4 in the Windows CommandPrompt program That is the tagger does not require specialsystems as we run it on the Command prompt of theWindows 10 home operating system The figure shows that77749 words are tagged in a very short time
The experimental results are demonstrated in Table 4The table exposes the information regarding the imperativeverbs however this work can be expanded to measurethe performance for different tags such as noun or verbSimilarly it is possible to find the performance of the Stanfordtagger regarding the prepositions in the Holy Quran inwhich the same steps can be followed to get the accuracyfor prepositions or the overall accuracy of all tags Finallyexploring the performance for the Stanford tagger as wellas for the other taggers will lead to discover more weaknesspoints to be avoided in future NLP systems
8 Conclusions
This work explored the performance of the Stanford taggerfor the Arabic language The experimental results show theimportance of distinguishing between training data whenpreparing taggers That is the tagger that is prepared forpoetry is different from the tagger that is prepared for prose
6 Advances in Fuzzy SystemsTa
ble3
Thet
agse
toft
heStan
ford
Ara
bicm
odel
tagg
er
Tag
Mea
ning
with
exam
ples
Ta
gM
eani
ngwith
exam
ples
1DTJ
JDT
+Ad
jective
16PR
PPe
rson
alpr
onou
n13
13 )6
7 8 9
7 lt
713 = )
)
(gt
lt 7
7A
2DTJ
JRDT
+Ad
jective
com
parativ
e17
PRP$
Possessiv
epro
noun
13
13 (13 ) -
13 ) 6
A=
7(
7B
C
D
(gt
E 3
DTN
NDT
+Nou
nsin
gularo
rmass
18RB
Adve
rb F
0G
7H(
713 gt(
713 -
=
2(
F I
J
75(
4DTN
NP
DT
+Pr
oper
noun
sin
gular
19RP
Particle
F
amp0G
K
1 713
(13 )
713 L
A )
F
79
75
DTN
NS
DT
+Nou
nplur
al20
VB
Verb
the
impe
rativ
efor
mD
0G
13 M-J
713 M( 9
7
13 M(
1
7
13 7N
7
6IN
Prep
ositi
onor
subo
rdin
atin
gco
njun
ction
(B
+|
F
)21
VBD
Verb
pas
tten
se
F
O(
2(P713
(13 7 amp
7JJ
Adjective
22VBG
Verb
ger
undor
presen
tpar
ticiple
QH13
7( 13 amp
713 R
+13 )6
J 7D gtG
713 ( 13 713
8JJR
Adjective
com
parativ
e23
VBN
Verb
pastp
artic
iple
13 (13 ) (
13 S13 T
13 13 ) 6
U 7
BC V
7 U
HWX
- 0
13 13
7
7(13 )
9NN
Nou
nsin
gularo
rmas
s24
VBP
Verb
non
3rdpe
rson
singu
larp
rese
ntQ
- R
N( +
13 0G
13 7
E lt 7
Y (13
10NNP
Prop
erno
uns
ingu
lar
25VN
Verb
3rd
person
singu
larp
rese
ntQ13 amp
713
X 1R
amp0G
0
7
2(
7Z [
H )
0G
11NNS
Nou
nplur
al26
WP
Whpr
onou
n
H6
0G
7
B
13 D
0G
13 M(
713 M( I
713 M(13 13
12NOUN
QUA
NT
Nou
nqu
antit
y27
WRB
Whad
verb
8
7P7
J 0J 7D
2(
F I
( -P7
] V
7J
13CC
Coo
rdin
atin
gco
njun
ction
28ADJNUM
Adjective
Num
eric
]=
F[
7(-V7[
7^J lt
(
13 S13
J (J
713
7D (1
14
CDCa
rdin
alnu
mbe
r29
UH
Interje
ctionun
usua
lkin
dof
wor
dQE
7$
P7EWR
(13
D 7J M
$J 7 C ) 7
13 8( amp
15DT
Dem
onstr
ativep
rono
uns
13 (J 9
_( gtG
7`
7Z
7
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
2 Advances in Fuzzy Systems
a large annotated corpus the Quranic imperative verbs werechosen in the evaluation process The Stanford tagger usesthe label VB to mark the imperative verbs That is this workis restricted to a testing dataset that contains a list of allimperative verbs in the Holy Quran that is obtained from[2]This work is distinguished by presenting an experimentalstudy of the classical Arabic performance using one of thefreely available taggers and therefore making it clear forcomparison purposes This work also aims to demonstratethe tagging problems from different points of view suchas the Arabic PoS tagging benefits and challenges tagsetscapacities tagging algorithms and the recent studies in thisfield
In spite of the importance of taggersrsquo performance forboth classical andMSAArabic few studies have explored theaccuracy for the classical Arabic On the other hand most ofthe previous studies focused on the tagsets and the taggingapproaches For instance one study [3] proposed an Arabictagset with detailed hierarchical levels of the categories andtheir relationships (ie a tree of different levels) As indicatedthis study focused on the imperative verbs in theHoly QuranThe reason for choosing the imperative is that it is easier tofind such annotated testing collections due to the previouseffort of Arabic scholars to serve the Quranic studies Inaddition theArabic language is distinguished to have a stand-alone form of imperative verbs whereas it is mixed with thepresent verb as found in the English language For instancethe English language has the verb ldquogordquo as an imperative andpresent verb while the same verbs have a different form in theArabic language as the imperative is ldquo
rdquo and the presentis ldquo
rdquo which are completely different words in terms oftranscription and tense
Even though the documentation of the Stanford tagger[4] indicates that the accuracy of the Arabic model is 9626on an MSA test portion as described in [5] and 8014 forunknown words our measure shows extremely less accuracyIn this work the Stanford tagger scored only 728 accuracyfor a collection of Arabic imperative verbs It is worthindicating that the Stanford tagger works at word level (iethe tag is given to the whole word instead of its parts suchas prefixes stems and suffixes as some other taggers do)Despite diacritics playing an important role in the taggingprocess nevertheless they are discarded in this work sincethe Stanford tagger does not consider the diacritics of theinput text However we do keep the Hamza (eg
) and the
Madd () symbols in the corresponding characters That is
the testing dataset is a nonvocalized Arabic text The outputof this work highlights the importance of reinvestigatingthe tagging problem for the Arabic language since many ofprevious studies report accuracies into the nineties percentileReinvestigation includes different aspects of training data aseither classical or MSA the tagsets the corpora sizes etc
The rest of this paper is organized as follows In the nextsection we demonstrate the benefits of the tagging for variousNLP applications In Section 3 we present why tagging is achallenging task We exhibit the literature review in Section 4followed by the Stanford tagset in Section 5 The proposed
method is described in Section 6 and the experimental resultsin Section 7 Finally we conclude in Section 8
2 The Benefits of Tagging
The PoS tagging is the core of many NLP algorithms dueto the useful information it gives about a word and itsneighbors In fact NLP applications employ the output ofthe PoS tagging for different purposes such as checkingthe correctness of the syntactic structure around the wordFor instance regarding the Arabic language adjectives arepreceded by nouns while nouns are preceded by adjectivesin the English language such as ldquoA beautiful schoollarrrarr13
13 rdquo Similarly nouns are preceded by verbs inthe English such as ldquoHe runs fastrdquo while the Arabic allowsboth directions such as ldquo 13 larrrarr the teacher
writes the lessonrdquo and ldquo 13 larrrarr theteacher writes the lessonrdquo Therefore the Google translatorgives the same translation for two different word orderArabic sentences Hence knowing the syntax of word orderis extremely important for some NLP applications since itlimits the output candidates and increases the probabilities ofcorrect answers The following are some of NLP applicationsthat utilize the PoS tagging
(i) Capturing common syntactical rules [6] presentsa data mining based method to extract the com-mon syntactical rules in the Holy Quran The studyreported that the common relationships between thewordsrsquo tags (ie the common rule) are tag1=RPtag2=NN tag3=WP 91 997904rArr tag4=VBD 90 accuracy(097912) Formore information of the tags the readerrefers to Section 5 in this paper
(ii) Enhancing the performance in speech recognition[7] employs the PoS to generate new words basedon the neighboring word tags The study used com-pound nouns that are followed by adjectives and thepreposition followed by any word After recognitionthe compound words were placed back to theiroriginal states (ie two parts) This method showsperformance enhancement
(iii) Named Entity Recognition (NER) [8] employs atagger for named entities recognition (NER) NERaims at extracting the names such as people orga-nizations locations cities or companies NER isbeneficial for certain applications such as classifyingcontent for news providers This facilitates catego-rization and content discovery NER also speeds upthe search process in sizeable data that containsfor instance millions of articles Other applicationsinclude using powering content recommendationscustomer support and research papers
(iv) Syntactic Parsing [9] employs the PoS tagging forsyntactic parsing Syntactic parsing is a process toconfirm that the input sentence follows the languagersquosformal grammar Figure 1 shows a parsing tree fora simple sentence The parsing tree represents the
Advances in Fuzzy Systems 3
sentence
noun phrasenoun phrase verb phrase
noun nounverbarticle
the cat ate mouse
Figure 1 An example of a parsing tree
syntactic structure of the text and is mainly used foranalyzing the input sentence
(v) Other PoS tagging based applications includesemantic role labeling [10] speech synthesis [11]speech recognition [12] information extraction [13]summarization [14] sentiment analysis also calledopinion mining [15] diacritization [16] softwareengineering [17] question answering [18] translation[19] plagiarism detection [20] key phrases extraction[21] ontology [22] and extracting Arabic noun com-pound [23]
3 The Challenge of Tagging
That fact that a word can take different tags makes the PoStagging a challenging task That is a word can be labeled bydifferent tags based on the context Therefore the goal of thePoS tagging algorithms is to remove such ambiguity and labelthe words correctly Table 1 shows some examples of wordsthat take different tags based on the context As shown in thetable the word ldquogold larrrarr rdquo in sentence 1 is taggedas VBD (verb past tense) while it is tagged as NN (nounsingular or mass) in sentence 2 Similarly the word ldquoSaidlarrrarr rdquo in the first sentence is tagged as NNP (propernoun singular) while it is tagged as JJ (adjective) in sentence3 This shows how a particular word can have different labelswhich is the challenge of the PoS tagging process Hencethe problem of the PoS tagging is to resolve ambiguities bychoosing the proper tag considering the surrounded wordsOf course the absence of diacritics in the Arabic formalwriting system adds evenmore ambiguity For instance thereis no ambiguity to know that the diacritized word ldquogoldlarrrarr rdquo is a noun and the diacritized word ldquowent larrrarr
rdquo is a verb The figure also shows the tagging output for thetranslated sentences using the English model of the Stanfordtagger
4 Literature Review
Despite the importance of the PoS tagging for both MSAand classical Arabic most of the previous tagging studieshave mainly focused on the MSA In addition the literatureshows there is an active research to consider suitable tagsets
that truly reflect the linguistic items of Arabic as one ofthe morphologically rich languages In this literature wedemonstrate the up-to-date Arabic tagging research whichfocused on the main aspects and components such as thetype of the training text (ie MSA classical tweets) tagsetstagging algorithms unknown words stemming In [30] thestudy indicated that stemming (ie removing prefixes andpostfixes or suffixes) enhances the tagging performance In[31] the study presented a method to tag tweets that isusually written out of the formal and proper spelling of thelanguage In [28] the study considered a method to handlethe ldquounknown wordsrdquo which are the words that did notappear in the training corpus In [26] the study consideredthe problem which arises when estimating the transitionprobabilities in limited amounts of training data The studyproposed decision trees basedmethod to handle this problemthat generally occurs in the hidden Markov models (HMM)tagging technique In [32] the study implemented themaster-slave technique for the PoS tagging they used HMM as amaster tagger and maximum match (MM) and Brill taggersas slaves There are many approaches to perform the PoStagging the most widely used is the statistical approach thatis based on the HMM Another approach is not-statisticalwhich is on a number of hand-crafted disambiguation rulesto find the most appropriate tag for each word as in[33]
The recent studies of part of speech tagging includedifferent aspects For instance [34] developed a part of speechtagger for the Arabic heritage They scored an accuracy of9622They also reported that themost of the tagging errorsare results of segmentation Reference [35] employs part ofspeech tagging to enhance the performance of Arabic textclassification Reference [36] demonstrates part of speechtagging for the Arabic Gulf dialect For the tagging processthey employ Support Vector Machine (SVM) classifier andbidirectional Long Short Term Memory (Bi-LSTM) Refer-ence [37] presents a tagging based study regarding Arabicdialects identification Reference [38] uses part of speechand semantic tagging to extract features for training NeuralMachine Translation
Table 2 presents some information regarding taggingsystems such as tagging algorithms tagsets corpora andaccuracies We are aware that the accuracy is not a mattersince each work has its own corpus nevertheless reportingthese measures might give an indication of the overallaccuracy of the Arabic PoS tagging Similarity even the tagsetsize is important however it is more important to haveenough training set to cover the tags used otherwise zerovaluesmight be assigned to theHMM transition probabilitieswhich raises a tagging problem
5 Stanford Tagset
As indicated in the literature review there are many tagsetsthat are used in the previous studies Mainly the tags aredivided into two classes (ie categories) which are closedclass and open classThe closed class has a fixedmembershipsuch as prepositions while the open class can accept new
4 Advances in Fuzzy Systems
Tabl
e1So
meS
tanf
ordtagg
edAra
bica
ndEn
glish
sent
ence
s
Inpu
tsen
tenc
esan
dth
etra
nslatio
nus
ingGoo
glet
rans
lator
Sent
ence
1Se
nten
ce2
Sent
ence
313
13 $
13 amp
( )
+
( -13
Said
wen
ttoscho
olTh
ewin
nerg
otag
oldne
cklace
Hap
pyda
ywew
ishyo
uStan
ford
tagg
erou
tput
s(Ara
bicm
odel)
DTN
N13
IN N
NP
VBD
NN
IN
N
N13
$13 IN
amp
DTN
N
( ) V
BD
+VBG
V
BP(
-13 JJ
NN
Stan
ford
tagg
erou
tput
s(En
glish
mod
el)Sa
idN
NPwen
tVBD
toT
Oscho
olN
NTh
eDT
win
nerNN
gotV
BDaDT
goldJJ
neck
lace
NN
Hap
pyJJ
dayNN
wePR
Pwish
VBP
you
PRP
Advances in Fuzzy Systems 5
The proposed algorithm1 Obtain the text of the Holy Quran from [24] and remove the diacritics2 Install the full version of the Stanford Arabic model tagger from [25]3 Have the text of the Holy Quran tagged4 Obtain a list of all imperative verbs in the Holy Quran from [2]5 Find all words that have the tag VB equiv imperative verb6 Compare the two lists the one we obtained in step 5 and the list we obtained in step 4 to find the correctly tagged imperative verb7 Find the accuracy based on the information that is obtained in step 6
Algorithm 1 Evaluating imperative verbs using the Stanford tagger
Table 2 Some of the literature tagging research
No Ref Tagging Method Tagset Size Corpus Size Accuracy(tags) (words) ()
1 [26] A decision tree based tagger 110 78K amp500K 9165 997888rarr 97182 [27] Support Vector Machines (SVM) 24 140K 95493 [28] Hidden Markov Models 24 29300 950 997888rarr 9714 [29] SVM and a Neural network 21 6844 9105 [1] Maximum Entropy based tagger 29 588244 961
words especially in the technology fields as ldquoto faxrdquo Table 3shows the 29 tags of the Arabic model of the Stanford tagger
6 The Proposed Method
This section presents the steps that we follow to find theperformance of the Stanford tagger against the Quranicimperative verbs The first step is the tagging process thatproduces an annotated text file of the entire Quranic sen-tencesThenwe used a number of Python programs to extractthe correctly tagged imperative verbs as well as the wronglytagged imperative verbs etc The textual version of theHoly Quran is obtained from the Quran Printing ComplexSaudi Arabia website [36] Algorithm 1 summarizes theimplemented steps
The input testing set is the nondiacritized textual form ofthe Holy Quran Figure 2 shows what the testing set lookslike The figure contains the first chapter or Surah of theHoly Quran (Surat al-Fatih ahmdashThe Opening) in additionto the first three sentence of the second chapter (SuratAl-BaqarahmdashThe Cow) Figure 3 shows the output of theStanford tagger for the Quranic sentences that appear inFigure 2 As it is observed Figure 3 shows some correctlytagged words such as the following 0
13)131 DTJJ 13-
VBD WP 2 VBP The figure also shows some
wrongly tagged words such as the following 034NNP 5(VBD ( VBD
The tagger output that is shown in Figure 3 is the maincontent that can be used for further analysis to find thebehavior of the tagger Of course the correctly tagged wordsare required (ie the correct labels of the testing words) inorder to measure the accuracy which adds more difficulty inthis kind of research In other words if we want to measure
the accuracy for the ldquoentirerdquo Holy Quran we have to preparean annotated version of the Holy Quran which is a difficulttask This is why we chose a subset that contains only theimperative verbs
7 The Experimental Results
For the evaluation we used the full Stanford tagger (129MB) that is freely available at the website of the Stanfordnatural language processing group through the link [37]It is relatively simple to execute the tagger by running thecommand shown in Figure 4 in the Windows CommandPrompt program That is the tagger does not require specialsystems as we run it on the Command prompt of theWindows 10 home operating system The figure shows that77749 words are tagged in a very short time
The experimental results are demonstrated in Table 4The table exposes the information regarding the imperativeverbs however this work can be expanded to measurethe performance for different tags such as noun or verbSimilarly it is possible to find the performance of the Stanfordtagger regarding the prepositions in the Holy Quran inwhich the same steps can be followed to get the accuracyfor prepositions or the overall accuracy of all tags Finallyexploring the performance for the Stanford tagger as wellas for the other taggers will lead to discover more weaknesspoints to be avoided in future NLP systems
8 Conclusions
This work explored the performance of the Stanford taggerfor the Arabic language The experimental results show theimportance of distinguishing between training data whenpreparing taggers That is the tagger that is prepared forpoetry is different from the tagger that is prepared for prose
6 Advances in Fuzzy SystemsTa
ble3
Thet
agse
toft
heStan
ford
Ara
bicm
odel
tagg
er
Tag
Mea
ning
with
exam
ples
Ta
gM
eani
ngwith
exam
ples
1DTJ
JDT
+Ad
jective
16PR
PPe
rson
alpr
onou
n13
13 )6
7 8 9
7 lt
713 = )
)
(gt
lt 7
7A
2DTJ
JRDT
+Ad
jective
com
parativ
e17
PRP$
Possessiv
epro
noun
13
13 (13 ) -
13 ) 6
A=
7(
7B
C
D
(gt
E 3
DTN
NDT
+Nou
nsin
gularo
rmass
18RB
Adve
rb F
0G
7H(
713 gt(
713 -
=
2(
F I
J
75(
4DTN
NP
DT
+Pr
oper
noun
sin
gular
19RP
Particle
F
amp0G
K
1 713
(13 )
713 L
A )
F
79
75
DTN
NS
DT
+Nou
nplur
al20
VB
Verb
the
impe
rativ
efor
mD
0G
13 M-J
713 M( 9
7
13 M(
1
7
13 7N
7
6IN
Prep
ositi
onor
subo
rdin
atin
gco
njun
ction
(B
+|
F
)21
VBD
Verb
pas
tten
se
F
O(
2(P713
(13 7 amp
7JJ
Adjective
22VBG
Verb
ger
undor
presen
tpar
ticiple
QH13
7( 13 amp
713 R
+13 )6
J 7D gtG
713 ( 13 713
8JJR
Adjective
com
parativ
e23
VBN
Verb
pastp
artic
iple
13 (13 ) (
13 S13 T
13 13 ) 6
U 7
BC V
7 U
HWX
- 0
13 13
7
7(13 )
9NN
Nou
nsin
gularo
rmas
s24
VBP
Verb
non
3rdpe
rson
singu
larp
rese
ntQ
- R
N( +
13 0G
13 7
E lt 7
Y (13
10NNP
Prop
erno
uns
ingu
lar
25VN
Verb
3rd
person
singu
larp
rese
ntQ13 amp
713
X 1R
amp0G
0
7
2(
7Z [
H )
0G
11NNS
Nou
nplur
al26
WP
Whpr
onou
n
H6
0G
7
B
13 D
0G
13 M(
713 M( I
713 M(13 13
12NOUN
QUA
NT
Nou
nqu
antit
y27
WRB
Whad
verb
8
7P7
J 0J 7D
2(
F I
( -P7
] V
7J
13CC
Coo
rdin
atin
gco
njun
ction
28ADJNUM
Adjective
Num
eric
]=
F[
7(-V7[
7^J lt
(
13 S13
J (J
713
7D (1
14
CDCa
rdin
alnu
mbe
r29
UH
Interje
ctionun
usua
lkin
dof
wor
dQE
7$
P7EWR
(13
D 7J M
$J 7 C ) 7
13 8( amp
15DT
Dem
onstr
ativep
rono
uns
13 (J 9
_( gtG
7`
7Z
7
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
Advances in Fuzzy Systems 3
sentence
noun phrasenoun phrase verb phrase
noun nounverbarticle
the cat ate mouse
Figure 1 An example of a parsing tree
syntactic structure of the text and is mainly used foranalyzing the input sentence
(v) Other PoS tagging based applications includesemantic role labeling [10] speech synthesis [11]speech recognition [12] information extraction [13]summarization [14] sentiment analysis also calledopinion mining [15] diacritization [16] softwareengineering [17] question answering [18] translation[19] plagiarism detection [20] key phrases extraction[21] ontology [22] and extracting Arabic noun com-pound [23]
3 The Challenge of Tagging
That fact that a word can take different tags makes the PoStagging a challenging task That is a word can be labeled bydifferent tags based on the context Therefore the goal of thePoS tagging algorithms is to remove such ambiguity and labelthe words correctly Table 1 shows some examples of wordsthat take different tags based on the context As shown in thetable the word ldquogold larrrarr rdquo in sentence 1 is taggedas VBD (verb past tense) while it is tagged as NN (nounsingular or mass) in sentence 2 Similarly the word ldquoSaidlarrrarr rdquo in the first sentence is tagged as NNP (propernoun singular) while it is tagged as JJ (adjective) in sentence3 This shows how a particular word can have different labelswhich is the challenge of the PoS tagging process Hencethe problem of the PoS tagging is to resolve ambiguities bychoosing the proper tag considering the surrounded wordsOf course the absence of diacritics in the Arabic formalwriting system adds evenmore ambiguity For instance thereis no ambiguity to know that the diacritized word ldquogoldlarrrarr rdquo is a noun and the diacritized word ldquowent larrrarr
rdquo is a verb The figure also shows the tagging output for thetranslated sentences using the English model of the Stanfordtagger
4 Literature Review
Despite the importance of the PoS tagging for both MSAand classical Arabic most of the previous tagging studieshave mainly focused on the MSA In addition the literatureshows there is an active research to consider suitable tagsets
that truly reflect the linguistic items of Arabic as one ofthe morphologically rich languages In this literature wedemonstrate the up-to-date Arabic tagging research whichfocused on the main aspects and components such as thetype of the training text (ie MSA classical tweets) tagsetstagging algorithms unknown words stemming In [30] thestudy indicated that stemming (ie removing prefixes andpostfixes or suffixes) enhances the tagging performance In[31] the study presented a method to tag tweets that isusually written out of the formal and proper spelling of thelanguage In [28] the study considered a method to handlethe ldquounknown wordsrdquo which are the words that did notappear in the training corpus In [26] the study consideredthe problem which arises when estimating the transitionprobabilities in limited amounts of training data The studyproposed decision trees basedmethod to handle this problemthat generally occurs in the hidden Markov models (HMM)tagging technique In [32] the study implemented themaster-slave technique for the PoS tagging they used HMM as amaster tagger and maximum match (MM) and Brill taggersas slaves There are many approaches to perform the PoStagging the most widely used is the statistical approach thatis based on the HMM Another approach is not-statisticalwhich is on a number of hand-crafted disambiguation rulesto find the most appropriate tag for each word as in[33]
The recent studies of part of speech tagging includedifferent aspects For instance [34] developed a part of speechtagger for the Arabic heritage They scored an accuracy of9622They also reported that themost of the tagging errorsare results of segmentation Reference [35] employs part ofspeech tagging to enhance the performance of Arabic textclassification Reference [36] demonstrates part of speechtagging for the Arabic Gulf dialect For the tagging processthey employ Support Vector Machine (SVM) classifier andbidirectional Long Short Term Memory (Bi-LSTM) Refer-ence [37] presents a tagging based study regarding Arabicdialects identification Reference [38] uses part of speechand semantic tagging to extract features for training NeuralMachine Translation
Table 2 presents some information regarding taggingsystems such as tagging algorithms tagsets corpora andaccuracies We are aware that the accuracy is not a mattersince each work has its own corpus nevertheless reportingthese measures might give an indication of the overallaccuracy of the Arabic PoS tagging Similarity even the tagsetsize is important however it is more important to haveenough training set to cover the tags used otherwise zerovaluesmight be assigned to theHMM transition probabilitieswhich raises a tagging problem
5 Stanford Tagset
As indicated in the literature review there are many tagsetsthat are used in the previous studies Mainly the tags aredivided into two classes (ie categories) which are closedclass and open classThe closed class has a fixedmembershipsuch as prepositions while the open class can accept new
4 Advances in Fuzzy Systems
Tabl
e1So
meS
tanf
ordtagg
edAra
bica
ndEn
glish
sent
ence
s
Inpu
tsen
tenc
esan
dth
etra
nslatio
nus
ingGoo
glet
rans
lator
Sent
ence
1Se
nten
ce2
Sent
ence
313
13 $
13 amp
( )
+
( -13
Said
wen
ttoscho
olTh
ewin
nerg
otag
oldne
cklace
Hap
pyda
ywew
ishyo
uStan
ford
tagg
erou
tput
s(Ara
bicm
odel)
DTN
N13
IN N
NP
VBD
NN
IN
N
N13
$13 IN
amp
DTN
N
( ) V
BD
+VBG
V
BP(
-13 JJ
NN
Stan
ford
tagg
erou
tput
s(En
glish
mod
el)Sa
idN
NPwen
tVBD
toT
Oscho
olN
NTh
eDT
win
nerNN
gotV
BDaDT
goldJJ
neck
lace
NN
Hap
pyJJ
dayNN
wePR
Pwish
VBP
you
PRP
Advances in Fuzzy Systems 5
The proposed algorithm1 Obtain the text of the Holy Quran from [24] and remove the diacritics2 Install the full version of the Stanford Arabic model tagger from [25]3 Have the text of the Holy Quran tagged4 Obtain a list of all imperative verbs in the Holy Quran from [2]5 Find all words that have the tag VB equiv imperative verb6 Compare the two lists the one we obtained in step 5 and the list we obtained in step 4 to find the correctly tagged imperative verb7 Find the accuracy based on the information that is obtained in step 6
Algorithm 1 Evaluating imperative verbs using the Stanford tagger
Table 2 Some of the literature tagging research
No Ref Tagging Method Tagset Size Corpus Size Accuracy(tags) (words) ()
1 [26] A decision tree based tagger 110 78K amp500K 9165 997888rarr 97182 [27] Support Vector Machines (SVM) 24 140K 95493 [28] Hidden Markov Models 24 29300 950 997888rarr 9714 [29] SVM and a Neural network 21 6844 9105 [1] Maximum Entropy based tagger 29 588244 961
words especially in the technology fields as ldquoto faxrdquo Table 3shows the 29 tags of the Arabic model of the Stanford tagger
6 The Proposed Method
This section presents the steps that we follow to find theperformance of the Stanford tagger against the Quranicimperative verbs The first step is the tagging process thatproduces an annotated text file of the entire Quranic sen-tencesThenwe used a number of Python programs to extractthe correctly tagged imperative verbs as well as the wronglytagged imperative verbs etc The textual version of theHoly Quran is obtained from the Quran Printing ComplexSaudi Arabia website [36] Algorithm 1 summarizes theimplemented steps
The input testing set is the nondiacritized textual form ofthe Holy Quran Figure 2 shows what the testing set lookslike The figure contains the first chapter or Surah of theHoly Quran (Surat al-Fatih ahmdashThe Opening) in additionto the first three sentence of the second chapter (SuratAl-BaqarahmdashThe Cow) Figure 3 shows the output of theStanford tagger for the Quranic sentences that appear inFigure 2 As it is observed Figure 3 shows some correctlytagged words such as the following 0
13)131 DTJJ 13-
VBD WP 2 VBP The figure also shows some
wrongly tagged words such as the following 034NNP 5(VBD ( VBD
The tagger output that is shown in Figure 3 is the maincontent that can be used for further analysis to find thebehavior of the tagger Of course the correctly tagged wordsare required (ie the correct labels of the testing words) inorder to measure the accuracy which adds more difficulty inthis kind of research In other words if we want to measure
the accuracy for the ldquoentirerdquo Holy Quran we have to preparean annotated version of the Holy Quran which is a difficulttask This is why we chose a subset that contains only theimperative verbs
7 The Experimental Results
For the evaluation we used the full Stanford tagger (129MB) that is freely available at the website of the Stanfordnatural language processing group through the link [37]It is relatively simple to execute the tagger by running thecommand shown in Figure 4 in the Windows CommandPrompt program That is the tagger does not require specialsystems as we run it on the Command prompt of theWindows 10 home operating system The figure shows that77749 words are tagged in a very short time
The experimental results are demonstrated in Table 4The table exposes the information regarding the imperativeverbs however this work can be expanded to measurethe performance for different tags such as noun or verbSimilarly it is possible to find the performance of the Stanfordtagger regarding the prepositions in the Holy Quran inwhich the same steps can be followed to get the accuracyfor prepositions or the overall accuracy of all tags Finallyexploring the performance for the Stanford tagger as wellas for the other taggers will lead to discover more weaknesspoints to be avoided in future NLP systems
8 Conclusions
This work explored the performance of the Stanford taggerfor the Arabic language The experimental results show theimportance of distinguishing between training data whenpreparing taggers That is the tagger that is prepared forpoetry is different from the tagger that is prepared for prose
6 Advances in Fuzzy SystemsTa
ble3
Thet
agse
toft
heStan
ford
Ara
bicm
odel
tagg
er
Tag
Mea
ning
with
exam
ples
Ta
gM
eani
ngwith
exam
ples
1DTJ
JDT
+Ad
jective
16PR
PPe
rson
alpr
onou
n13
13 )6
7 8 9
7 lt
713 = )
)
(gt
lt 7
7A
2DTJ
JRDT
+Ad
jective
com
parativ
e17
PRP$
Possessiv
epro
noun
13
13 (13 ) -
13 ) 6
A=
7(
7B
C
D
(gt
E 3
DTN
NDT
+Nou
nsin
gularo
rmass
18RB
Adve
rb F
0G
7H(
713 gt(
713 -
=
2(
F I
J
75(
4DTN
NP
DT
+Pr
oper
noun
sin
gular
19RP
Particle
F
amp0G
K
1 713
(13 )
713 L
A )
F
79
75
DTN
NS
DT
+Nou
nplur
al20
VB
Verb
the
impe
rativ
efor
mD
0G
13 M-J
713 M( 9
7
13 M(
1
7
13 7N
7
6IN
Prep
ositi
onor
subo
rdin
atin
gco
njun
ction
(B
+|
F
)21
VBD
Verb
pas
tten
se
F
O(
2(P713
(13 7 amp
7JJ
Adjective
22VBG
Verb
ger
undor
presen
tpar
ticiple
QH13
7( 13 amp
713 R
+13 )6
J 7D gtG
713 ( 13 713
8JJR
Adjective
com
parativ
e23
VBN
Verb
pastp
artic
iple
13 (13 ) (
13 S13 T
13 13 ) 6
U 7
BC V
7 U
HWX
- 0
13 13
7
7(13 )
9NN
Nou
nsin
gularo
rmas
s24
VBP
Verb
non
3rdpe
rson
singu
larp
rese
ntQ
- R
N( +
13 0G
13 7
E lt 7
Y (13
10NNP
Prop
erno
uns
ingu
lar
25VN
Verb
3rd
person
singu
larp
rese
ntQ13 amp
713
X 1R
amp0G
0
7
2(
7Z [
H )
0G
11NNS
Nou
nplur
al26
WP
Whpr
onou
n
H6
0G
7
B
13 D
0G
13 M(
713 M( I
713 M(13 13
12NOUN
QUA
NT
Nou
nqu
antit
y27
WRB
Whad
verb
8
7P7
J 0J 7D
2(
F I
( -P7
] V
7J
13CC
Coo
rdin
atin
gco
njun
ction
28ADJNUM
Adjective
Num
eric
]=
F[
7(-V7[
7^J lt
(
13 S13
J (J
713
7D (1
14
CDCa
rdin
alnu
mbe
r29
UH
Interje
ctionun
usua
lkin
dof
wor
dQE
7$
P7EWR
(13
D 7J M
$J 7 C ) 7
13 8( amp
15DT
Dem
onstr
ativep
rono
uns
13 (J 9
_( gtG
7`
7Z
7
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
4 Advances in Fuzzy Systems
Tabl
e1So
meS
tanf
ordtagg
edAra
bica
ndEn
glish
sent
ence
s
Inpu
tsen
tenc
esan
dth
etra
nslatio
nus
ingGoo
glet
rans
lator
Sent
ence
1Se
nten
ce2
Sent
ence
313
13 $
13 amp
( )
+
( -13
Said
wen
ttoscho
olTh
ewin
nerg
otag
oldne
cklace
Hap
pyda
ywew
ishyo
uStan
ford
tagg
erou
tput
s(Ara
bicm
odel)
DTN
N13
IN N
NP
VBD
NN
IN
N
N13
$13 IN
amp
DTN
N
( ) V
BD
+VBG
V
BP(
-13 JJ
NN
Stan
ford
tagg
erou
tput
s(En
glish
mod
el)Sa
idN
NPwen
tVBD
toT
Oscho
olN
NTh
eDT
win
nerNN
gotV
BDaDT
goldJJ
neck
lace
NN
Hap
pyJJ
dayNN
wePR
Pwish
VBP
you
PRP
Advances in Fuzzy Systems 5
The proposed algorithm1 Obtain the text of the Holy Quran from [24] and remove the diacritics2 Install the full version of the Stanford Arabic model tagger from [25]3 Have the text of the Holy Quran tagged4 Obtain a list of all imperative verbs in the Holy Quran from [2]5 Find all words that have the tag VB equiv imperative verb6 Compare the two lists the one we obtained in step 5 and the list we obtained in step 4 to find the correctly tagged imperative verb7 Find the accuracy based on the information that is obtained in step 6
Algorithm 1 Evaluating imperative verbs using the Stanford tagger
Table 2 Some of the literature tagging research
No Ref Tagging Method Tagset Size Corpus Size Accuracy(tags) (words) ()
1 [26] A decision tree based tagger 110 78K amp500K 9165 997888rarr 97182 [27] Support Vector Machines (SVM) 24 140K 95493 [28] Hidden Markov Models 24 29300 950 997888rarr 9714 [29] SVM and a Neural network 21 6844 9105 [1] Maximum Entropy based tagger 29 588244 961
words especially in the technology fields as ldquoto faxrdquo Table 3shows the 29 tags of the Arabic model of the Stanford tagger
6 The Proposed Method
This section presents the steps that we follow to find theperformance of the Stanford tagger against the Quranicimperative verbs The first step is the tagging process thatproduces an annotated text file of the entire Quranic sen-tencesThenwe used a number of Python programs to extractthe correctly tagged imperative verbs as well as the wronglytagged imperative verbs etc The textual version of theHoly Quran is obtained from the Quran Printing ComplexSaudi Arabia website [36] Algorithm 1 summarizes theimplemented steps
The input testing set is the nondiacritized textual form ofthe Holy Quran Figure 2 shows what the testing set lookslike The figure contains the first chapter or Surah of theHoly Quran (Surat al-Fatih ahmdashThe Opening) in additionto the first three sentence of the second chapter (SuratAl-BaqarahmdashThe Cow) Figure 3 shows the output of theStanford tagger for the Quranic sentences that appear inFigure 2 As it is observed Figure 3 shows some correctlytagged words such as the following 0
13)131 DTJJ 13-
VBD WP 2 VBP The figure also shows some
wrongly tagged words such as the following 034NNP 5(VBD ( VBD
The tagger output that is shown in Figure 3 is the maincontent that can be used for further analysis to find thebehavior of the tagger Of course the correctly tagged wordsare required (ie the correct labels of the testing words) inorder to measure the accuracy which adds more difficulty inthis kind of research In other words if we want to measure
the accuracy for the ldquoentirerdquo Holy Quran we have to preparean annotated version of the Holy Quran which is a difficulttask This is why we chose a subset that contains only theimperative verbs
7 The Experimental Results
For the evaluation we used the full Stanford tagger (129MB) that is freely available at the website of the Stanfordnatural language processing group through the link [37]It is relatively simple to execute the tagger by running thecommand shown in Figure 4 in the Windows CommandPrompt program That is the tagger does not require specialsystems as we run it on the Command prompt of theWindows 10 home operating system The figure shows that77749 words are tagged in a very short time
The experimental results are demonstrated in Table 4The table exposes the information regarding the imperativeverbs however this work can be expanded to measurethe performance for different tags such as noun or verbSimilarly it is possible to find the performance of the Stanfordtagger regarding the prepositions in the Holy Quran inwhich the same steps can be followed to get the accuracyfor prepositions or the overall accuracy of all tags Finallyexploring the performance for the Stanford tagger as wellas for the other taggers will lead to discover more weaknesspoints to be avoided in future NLP systems
8 Conclusions
This work explored the performance of the Stanford taggerfor the Arabic language The experimental results show theimportance of distinguishing between training data whenpreparing taggers That is the tagger that is prepared forpoetry is different from the tagger that is prepared for prose
6 Advances in Fuzzy SystemsTa
ble3
Thet
agse
toft
heStan
ford
Ara
bicm
odel
tagg
er
Tag
Mea
ning
with
exam
ples
Ta
gM
eani
ngwith
exam
ples
1DTJ
JDT
+Ad
jective
16PR
PPe
rson
alpr
onou
n13
13 )6
7 8 9
7 lt
713 = )
)
(gt
lt 7
7A
2DTJ
JRDT
+Ad
jective
com
parativ
e17
PRP$
Possessiv
epro
noun
13
13 (13 ) -
13 ) 6
A=
7(
7B
C
D
(gt
E 3
DTN
NDT
+Nou
nsin
gularo
rmass
18RB
Adve
rb F
0G
7H(
713 gt(
713 -
=
2(
F I
J
75(
4DTN
NP
DT
+Pr
oper
noun
sin
gular
19RP
Particle
F
amp0G
K
1 713
(13 )
713 L
A )
F
79
75
DTN
NS
DT
+Nou
nplur
al20
VB
Verb
the
impe
rativ
efor
mD
0G
13 M-J
713 M( 9
7
13 M(
1
7
13 7N
7
6IN
Prep
ositi
onor
subo
rdin
atin
gco
njun
ction
(B
+|
F
)21
VBD
Verb
pas
tten
se
F
O(
2(P713
(13 7 amp
7JJ
Adjective
22VBG
Verb
ger
undor
presen
tpar
ticiple
QH13
7( 13 amp
713 R
+13 )6
J 7D gtG
713 ( 13 713
8JJR
Adjective
com
parativ
e23
VBN
Verb
pastp
artic
iple
13 (13 ) (
13 S13 T
13 13 ) 6
U 7
BC V
7 U
HWX
- 0
13 13
7
7(13 )
9NN
Nou
nsin
gularo
rmas
s24
VBP
Verb
non
3rdpe
rson
singu
larp
rese
ntQ
- R
N( +
13 0G
13 7
E lt 7
Y (13
10NNP
Prop
erno
uns
ingu
lar
25VN
Verb
3rd
person
singu
larp
rese
ntQ13 amp
713
X 1R
amp0G
0
7
2(
7Z [
H )
0G
11NNS
Nou
nplur
al26
WP
Whpr
onou
n
H6
0G
7
B
13 D
0G
13 M(
713 M( I
713 M(13 13
12NOUN
QUA
NT
Nou
nqu
antit
y27
WRB
Whad
verb
8
7P7
J 0J 7D
2(
F I
( -P7
] V
7J
13CC
Coo
rdin
atin
gco
njun
ction
28ADJNUM
Adjective
Num
eric
]=
F[
7(-V7[
7^J lt
(
13 S13
J (J
713
7D (1
14
CDCa
rdin
alnu
mbe
r29
UH
Interje
ctionun
usua
lkin
dof
wor
dQE
7$
P7EWR
(13
D 7J M
$J 7 C ) 7
13 8( amp
15DT
Dem
onstr
ativep
rono
uns
13 (J 9
_( gtG
7`
7Z
7
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
Advances in Fuzzy Systems 5
The proposed algorithm1 Obtain the text of the Holy Quran from [24] and remove the diacritics2 Install the full version of the Stanford Arabic model tagger from [25]3 Have the text of the Holy Quran tagged4 Obtain a list of all imperative verbs in the Holy Quran from [2]5 Find all words that have the tag VB equiv imperative verb6 Compare the two lists the one we obtained in step 5 and the list we obtained in step 4 to find the correctly tagged imperative verb7 Find the accuracy based on the information that is obtained in step 6
Algorithm 1 Evaluating imperative verbs using the Stanford tagger
Table 2 Some of the literature tagging research
No Ref Tagging Method Tagset Size Corpus Size Accuracy(tags) (words) ()
1 [26] A decision tree based tagger 110 78K amp500K 9165 997888rarr 97182 [27] Support Vector Machines (SVM) 24 140K 95493 [28] Hidden Markov Models 24 29300 950 997888rarr 9714 [29] SVM and a Neural network 21 6844 9105 [1] Maximum Entropy based tagger 29 588244 961
words especially in the technology fields as ldquoto faxrdquo Table 3shows the 29 tags of the Arabic model of the Stanford tagger
6 The Proposed Method
This section presents the steps that we follow to find theperformance of the Stanford tagger against the Quranicimperative verbs The first step is the tagging process thatproduces an annotated text file of the entire Quranic sen-tencesThenwe used a number of Python programs to extractthe correctly tagged imperative verbs as well as the wronglytagged imperative verbs etc The textual version of theHoly Quran is obtained from the Quran Printing ComplexSaudi Arabia website [36] Algorithm 1 summarizes theimplemented steps
The input testing set is the nondiacritized textual form ofthe Holy Quran Figure 2 shows what the testing set lookslike The figure contains the first chapter or Surah of theHoly Quran (Surat al-Fatih ahmdashThe Opening) in additionto the first three sentence of the second chapter (SuratAl-BaqarahmdashThe Cow) Figure 3 shows the output of theStanford tagger for the Quranic sentences that appear inFigure 2 As it is observed Figure 3 shows some correctlytagged words such as the following 0
13)131 DTJJ 13-
VBD WP 2 VBP The figure also shows some
wrongly tagged words such as the following 034NNP 5(VBD ( VBD
The tagger output that is shown in Figure 3 is the maincontent that can be used for further analysis to find thebehavior of the tagger Of course the correctly tagged wordsare required (ie the correct labels of the testing words) inorder to measure the accuracy which adds more difficulty inthis kind of research In other words if we want to measure
the accuracy for the ldquoentirerdquo Holy Quran we have to preparean annotated version of the Holy Quran which is a difficulttask This is why we chose a subset that contains only theimperative verbs
7 The Experimental Results
For the evaluation we used the full Stanford tagger (129MB) that is freely available at the website of the Stanfordnatural language processing group through the link [37]It is relatively simple to execute the tagger by running thecommand shown in Figure 4 in the Windows CommandPrompt program That is the tagger does not require specialsystems as we run it on the Command prompt of theWindows 10 home operating system The figure shows that77749 words are tagged in a very short time
The experimental results are demonstrated in Table 4The table exposes the information regarding the imperativeverbs however this work can be expanded to measurethe performance for different tags such as noun or verbSimilarly it is possible to find the performance of the Stanfordtagger regarding the prepositions in the Holy Quran inwhich the same steps can be followed to get the accuracyfor prepositions or the overall accuracy of all tags Finallyexploring the performance for the Stanford tagger as wellas for the other taggers will lead to discover more weaknesspoints to be avoided in future NLP systems
8 Conclusions
This work explored the performance of the Stanford taggerfor the Arabic language The experimental results show theimportance of distinguishing between training data whenpreparing taggers That is the tagger that is prepared forpoetry is different from the tagger that is prepared for prose
6 Advances in Fuzzy SystemsTa
ble3
Thet
agse
toft
heStan
ford
Ara
bicm
odel
tagg
er
Tag
Mea
ning
with
exam
ples
Ta
gM
eani
ngwith
exam
ples
1DTJ
JDT
+Ad
jective
16PR
PPe
rson
alpr
onou
n13
13 )6
7 8 9
7 lt
713 = )
)
(gt
lt 7
7A
2DTJ
JRDT
+Ad
jective
com
parativ
e17
PRP$
Possessiv
epro
noun
13
13 (13 ) -
13 ) 6
A=
7(
7B
C
D
(gt
E 3
DTN
NDT
+Nou
nsin
gularo
rmass
18RB
Adve
rb F
0G
7H(
713 gt(
713 -
=
2(
F I
J
75(
4DTN
NP
DT
+Pr
oper
noun
sin
gular
19RP
Particle
F
amp0G
K
1 713
(13 )
713 L
A )
F
79
75
DTN
NS
DT
+Nou
nplur
al20
VB
Verb
the
impe
rativ
efor
mD
0G
13 M-J
713 M( 9
7
13 M(
1
7
13 7N
7
6IN
Prep
ositi
onor
subo
rdin
atin
gco
njun
ction
(B
+|
F
)21
VBD
Verb
pas
tten
se
F
O(
2(P713
(13 7 amp
7JJ
Adjective
22VBG
Verb
ger
undor
presen
tpar
ticiple
QH13
7( 13 amp
713 R
+13 )6
J 7D gtG
713 ( 13 713
8JJR
Adjective
com
parativ
e23
VBN
Verb
pastp
artic
iple
13 (13 ) (
13 S13 T
13 13 ) 6
U 7
BC V
7 U
HWX
- 0
13 13
7
7(13 )
9NN
Nou
nsin
gularo
rmas
s24
VBP
Verb
non
3rdpe
rson
singu
larp
rese
ntQ
- R
N( +
13 0G
13 7
E lt 7
Y (13
10NNP
Prop
erno
uns
ingu
lar
25VN
Verb
3rd
person
singu
larp
rese
ntQ13 amp
713
X 1R
amp0G
0
7
2(
7Z [
H )
0G
11NNS
Nou
nplur
al26
WP
Whpr
onou
n
H6
0G
7
B
13 D
0G
13 M(
713 M( I
713 M(13 13
12NOUN
QUA
NT
Nou
nqu
antit
y27
WRB
Whad
verb
8
7P7
J 0J 7D
2(
F I
( -P7
] V
7J
13CC
Coo
rdin
atin
gco
njun
ction
28ADJNUM
Adjective
Num
eric
]=
F[
7(-V7[
7^J lt
(
13 S13
J (J
713
7D (1
14
CDCa
rdin
alnu
mbe
r29
UH
Interje
ctionun
usua
lkin
dof
wor
dQE
7$
P7EWR
(13
D 7J M
$J 7 C ) 7
13 8( amp
15DT
Dem
onstr
ativep
rono
uns
13 (J 9
_( gtG
7`
7Z
7
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
6 Advances in Fuzzy SystemsTa
ble3
Thet
agse
toft
heStan
ford
Ara
bicm
odel
tagg
er
Tag
Mea
ning
with
exam
ples
Ta
gM
eani
ngwith
exam
ples
1DTJ
JDT
+Ad
jective
16PR
PPe
rson
alpr
onou
n13
13 )6
7 8 9
7 lt
713 = )
)
(gt
lt 7
7A
2DTJ
JRDT
+Ad
jective
com
parativ
e17
PRP$
Possessiv
epro
noun
13
13 (13 ) -
13 ) 6
A=
7(
7B
C
D
(gt
E 3
DTN
NDT
+Nou
nsin
gularo
rmass
18RB
Adve
rb F
0G
7H(
713 gt(
713 -
=
2(
F I
J
75(
4DTN
NP
DT
+Pr
oper
noun
sin
gular
19RP
Particle
F
amp0G
K
1 713
(13 )
713 L
A )
F
79
75
DTN
NS
DT
+Nou
nplur
al20
VB
Verb
the
impe
rativ
efor
mD
0G
13 M-J
713 M( 9
7
13 M(
1
7
13 7N
7
6IN
Prep
ositi
onor
subo
rdin
atin
gco
njun
ction
(B
+|
F
)21
VBD
Verb
pas
tten
se
F
O(
2(P713
(13 7 amp
7JJ
Adjective
22VBG
Verb
ger
undor
presen
tpar
ticiple
QH13
7( 13 amp
713 R
+13 )6
J 7D gtG
713 ( 13 713
8JJR
Adjective
com
parativ
e23
VBN
Verb
pastp
artic
iple
13 (13 ) (
13 S13 T
13 13 ) 6
U 7
BC V
7 U
HWX
- 0
13 13
7
7(13 )
9NN
Nou
nsin
gularo
rmas
s24
VBP
Verb
non
3rdpe
rson
singu
larp
rese
ntQ
- R
N( +
13 0G
13 7
E lt 7
Y (13
10NNP
Prop
erno
uns
ingu
lar
25VN
Verb
3rd
person
singu
larp
rese
ntQ13 amp
713
X 1R
amp0G
0
7
2(
7Z [
H )
0G
11NNS
Nou
nplur
al26
WP
Whpr
onou
n
H6
0G
7
B
13 D
0G
13 M(
713 M( I
713 M(13 13
12NOUN
QUA
NT
Nou
nqu
antit
y27
WRB
Whad
verb
8
7P7
J 0J 7D
2(
F I
( -P7
] V
7J
13CC
Coo
rdin
atin
gco
njun
ction
28ADJNUM
Adjective
Num
eric
]=
F[
7(-V7[
7^J lt
(
13 S13
J (J
713
7D (1
14
CDCa
rdin
alnu
mbe
r29
UH
Interje
ctionun
usua
lkin
dof
wor
dQE
7$
P7EWR
(13
D 7J M
$J 7 C ) 7
13 8( amp
15DT
Dem
onstr
ativep
rono
uns
13 (J 9
_( gtG
7`
7Z
7
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
Advances in Fuzzy Systems 7
Tabl
e4
Thee
xper
imen
talr
esults
Mea
sure
Total
Then
umbe
rofim
perativ
everbs
inth
eHolyQur
an18
48ve
rbs
Then
umbe
rofim
perativ
everbs
after
rem
ovin
gdu
plicates
741v
erbs
Then
umbe
rofw
ords
tagg
edas
impe
rativ
everbs
(ie
VB)
282wor
dsTh
ecor
rectly
tagg
edim
perativ
everbs
byco
mpa
ringwith
thec
orrect
list
This
lead
stoth
ecor
rectly
tagg
edlis
ttha
tinc
lude
s
54wor
dsA
7
= 7
- 13 )13
( 7[ aJ 3 47
(
7[ aJ 3 4(
7 (
7
7
[C 6
7N
( 7[
(
713 13
7[
C 6(
7 (
7P7[
C 13 amp(
7
713
7N
7
( 7[
V 7
amp(
7 2 (
7 ) amp7
7
13 7
7[
( 6[
77Y
7 13 (
7 V
7
7
V7
7
(
7
7 = 47
1
7 amp
713 [
7
(
7
[ = 47
6
7[
bG(
7[
amp(
7
13 7 amp
7N
6(
7 X )
713 )
13 c(
713
7-
amp(
7
13 C amp(
Ther
estist
hewro
ngly
classifi
edve
rbs(
iec
lassifi
edou
tofV
Btag)
This
listi
nclude
sfori
nstanc
e68
7
[7
( 7
7(
(
7(
7
[
7$ 7
7
E W
[7
0 7
( [
7 [
This
listi
nten
tiona
llyco
ntains
wor
dsth
atha
veth
esam
eroo
tldquo equiv
enterrdquoTh
isgive
sanin
dica
tionof
ther
ichn
essa
ndth
ederivative
natu
reof
theA
rabicl
angu
age
Accu
racy=
Cor
rectly
tagg
ged
impe
rativ
eve
rbs
All
impe
rativ
eve
rbs
Accu
racy=54
741=728
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
8 Advances in Fuzzy Systems
Figure 2 A part of Quranic testing set
Figure 3 A part of the Stanford tagger output
Figure 4 The running command of the Stanford tagger
Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort
towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers
Data Availability
The data used to support the findings of this study areavailable from the corresponding author upon request
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
Advances in Fuzzy Systems 9
References
[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000
[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814
[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017
[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef
ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006
[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013
[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011
[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008
[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016
[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002
[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006
[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007
[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008
[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018
[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017
[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015
[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015
[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012
[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003
[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015
[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015
[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015
[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015
[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp
stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-
ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016
[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004
[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016
[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018
[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017
[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017
[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015
[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000
[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018
[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
10 Advances in Fuzzy Systems
[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012
[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017
[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772
[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015
[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom
Computer Games Technology
International Journal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Journal ofEngineeringVolume 2018
Advances in
FuzzySystems
Hindawiwwwhindawicom
Volume 2018
International Journal of
ReconfigurableComputing
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
thinspArtificial Intelligence
Hindawiwwwhindawicom Volumethinsp2018
Hindawiwwwhindawicom Volume 2018
Civil EngineeringAdvances in
Hindawiwwwhindawicom Volume 2018
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawiwwwhindawicom Volume 2018
Hindawi
wwwhindawicom Volume 2018
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
RoboticsJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Computational Intelligence and Neuroscience
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018
Human-ComputerInteraction
Advances in
Hindawiwwwhindawicom Volume 2018
Scientic Programming
Submit your manuscripts atwwwhindawicom