International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 – 6367 (Print),
ISSN 0976 – 6375 (Online), Volume 4, Issue 4, July-August (2013), © IAEME
ENGLISH TO PUNJABI MACHINE TRANSLATION SYSTEM USING
HYBRID APPROACH OF WORD SENSE DISAMBIGUATION AND
MACHINE TRANSLATION
¹Gurleen Kaur Sidhu, ²Navjot Kaur
¹Department of Computer Science and Engineering, Sri Guru Granth Sahib World University,
Fatehgarh Sahib, Punjab 140406, India
²Department of Computer Science and Engineering, Punjabi University, Patiala, Punjab 140406, India
ABSTRACT
Machine Translation and Word Sense Disambiguation are among the most popular applications of
Natural Language Processing. Machine Translation is a cheap and convenient way to understand
another language during conversation, whereas Word Sense Disambiguation helps to get the correct
meaning of a word in the context in which it is used. Our system uses a hybrid approach with which
we can disambiguate words and obtain better machine translation results. The Conditional Random
Field algorithm with a decision list using direct mapping is the easiest method and gave the best
results for the disambiguation problem. In our system, the Conditional Random Field step divides
the data into categories and calculates the frequency of words with respect to each category. The
meaning of the sentence is taken to relate to the category with the maximum frequency in that
sentence. The accuracy of our system for correct sentences is 81.2%, based on the tested sentences
only.
Keywords: Conditional Random Field, Machine Translation, Natural Language, Word Sense
Disambiguation, Hybrid Approach.
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 4, Issue 4, July-August (2013), pp. 350-357
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com
I. INTRODUCTION
During automatic translation of sentences, words are often given an incorrect sense in the target
text. The process of assigning the correct sense to a word according to its context is known as
Word Sense Disambiguation. Many applications and online sites give the meaning of an input text,
but they are not able to disambiguate between meanings. We try to solve this problem using a hybrid
approach of word sense disambiguation and machine translation. Machine translation and word sense
disambiguation are among the most popular applications of natural language processing. Natural
language processing is the processing of data presented in natural language, such as the content
available on the Internet in blogs, websites, social sites, and business sites. The history and an
overview of the applications are summarized in Fig. 1.
Fig. 1. Introduction
The Literature Survey reviews the techniques previously applied to different languages. The
Methodology section explains the proposed technique, which combines several sub-techniques and
algorithms of word sense disambiguation and machine translation. Results and Discussion covers the
advantages and disadvantages of the system. The Conclusion explains how beneficial the proposed
system is and reports its accuracy. Future Work gives directions for further research in this
field.
II. LITERATURE SURVEY
A review of English grammar is given in Fig 2, which briefly introduces six parts of speech and
their subtypes. The remaining two parts of speech are the preposition and the article. Articles
distinguish between words beginning with vowels and words beginning with consonants; the article
“a” is used to mark the singular.
Fig 2. Review of Parts of Speech in English
Review of research papers for techniques:
[1] A hybrid (statistical + rules) approach to the transliteration of person names: from a person
name written in Punjabi (Gurmukhi script), the system produces its English (Roman script)
transliteration. Experiments showed that the performance is sufficiently high; the overall accuracy
of the system is 95.23%. The reasons behind wrong answers for named entities are multiple
transliterations, wrong input of words, character gaps, and the one-to-multi mapping problem.
[2] Natural language processing is a multidisciplinary field at the intersection of linguistics,
psycholinguistics, computer science and engineering, machine learning, and statistics. The paper
also gives reasons for the growing popularity of natural language processing: as the business world
expands, more people move from one country to another, and the help counters established everywhere
need to process natural language in order to convey the proper message.
[6] Machine translation translates a source text into a target text with or without human
assistance. Machine translation has various approaches:
o Direct translation: words are translated directly, word for word.
o Transfer-based translation: translation is done with proper knowledge of the rules of the
language into which we want to translate.
o Interlingua-based translation: an intermediate representation is used to convert the source into
the target language.
o Corpus-based translation: a parallel corpus of source and target texts is used.
o Hybrid translation: a combination of all of the above.
[7] Nancy Ide (1998) defines the various applications in which word sense disambiguation can be
used.
[11] A supervised learning method for word sense disambiguation based on cosine similarity. The
researchers extract two sets of features, including the set of words that occur frequently in the
text. The cosine similarity algorithm uses the inner product of two vectors: after each context is
converted to a vector of words, cosine similarity measures the similarity between a new context and
each existing context in the training corpus.
[12] The researchers work on Shahmukhi to Gurmukhi transliteration and try to remove the ambiguity
problem. Two different approaches are used for word sense disambiguation: a state sequence
representation as a Hidden Markov Model, and an N-gram model that uses a small window of size ±5.
The accuracy of word sense disambiguation using both approaches is reported to be more than 92%.
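The context comparison described for [11] can be illustrated with a short sketch. It assumes a plain bag-of-words vector per context and a tiny, invented training corpus; the function names and example sentences are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter


def cosine_similarity(context_a, context_b):
    """Cosine of the angle between two bag-of-words vectors."""
    vec_a = Counter(context_a.lower().split())
    vec_b = Counter(context_b.lower().split())
    # Inner product over the words the two contexts share.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_sense(new_context, training):
    """Return the sense label of the most similar training context."""
    best = max(training, key=lambda item: cosine_similarity(new_context, item[0]))
    return best[1]


# Invented training contexts for the ambiguous word "bank".
training = [
    ("deposit money in the bank", "finance"),
    ("the river bank was muddy", "geography"),
]
print(nearest_sense("she sat on the bank of the river", training))
```

The new context shares "the", "bank", and "river" with the second training context but only "the" and "bank" with the first, so the geography sense is selected.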
III. METHODOLOGY
Fig 3. Flow Chart for Proposed System
Algorithm for Proposed English to Punjabi Machine Translation System:
Step 1: START; input the text
Step 2: Check whether the text is present
o If present, move to Step 3
o Else, display the message “please enter the text first”
Step 3: ANALYSIS of the sentence
o TOKENIZATION (split the sentence on the basis of white-space and count the words)
Repeat the next two steps for every token
o PREPROCESSING (further divided into 2 subparts)
  o Text normalization (optional)
    ▪ Implement the proposed algorithm for American to British English
  o Sentence differentiation
    ▪ Implement rules to check whether the sentence is simple or compound
o PART OF SPEECH TAGGING (DIRECT MAPPING IMPLEMENTED)
After analysis of the sentence, move on to Step 4
Step 4: SYNTHESIS of the sentence
o DIRECT MAPPING (WORD + POS)
  o If PRESENT, FETCH the MEANING (move on to REORDER)
  o Otherwise, implement the HYBRID APPROACH FOR WSD on the sentence
    ▪ If (WORD + POS) has multiple CATEGORIES
    ▪ Increase the counter of every category (repeat the above step for all tokens)
    ▪ Check which category has (ambiguous word + maximum frequency) and assign that meaning to
      the ambiguous word
    ▪ Fetch the meaning and move on to REORDER
o REORDER
  o According to the target text
Step 5: TRANSLATION ENGINE
o OUTPUT (after reordering, combine the words into a sentence and display it)
Step 6: END
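The analysis stage of the algorithm (Steps 2 and 3) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lookup tables, the tag names, the lowercasing of input, and the rule that a conjunction marks a compound sentence are all assumptions made for the example, since the paper does not publish its database or rules.

```python
# Hypothetical lookup tables; the system's real database is not given in the paper.
AMERICAN_TO_BRITISH = {"color": "colour", "center": "centre"}
POS_TABLE = {"we": "PRON", "visited": "VERB", "the": "DET", "bank": "NOUN",
             "and": "CONJ", "colour": "NOUN", "centre": "NOUN"}
CONJUNCTIONS = {"and", "but", "or"}


def analyse(text):
    """Steps 2-3: validate the input, tokenize, normalize, classify, and tag."""
    # Step 2: refuse empty input with the message quoted in the algorithm.
    if not text or not text.strip():
        raise ValueError("please enter the text first")

    # Tokenization: split on white-space (lowercasing is an assumption).
    tokens = text.lower().split()

    # Text normalization (optional): American to British spelling.
    tokens = [AMERICAN_TO_BRITISH.get(tok, tok) for tok in tokens]

    # Sentence differentiation: assumed rule, a conjunction marks a compound sentence.
    kind = "compound" if any(tok in CONJUNCTIONS for tok in tokens) else "simple"

    # Part-of-speech tagging by direct mapping; unknown words are tagged "UNK".
    tagged = [(tok, POS_TABLE.get(tok, "UNK")) for tok in tokens]
    return {"word_count": len(tokens), "kind": kind, "tagged": tagged}


print(analyse("We visited the color center"))
```

The tagged (word, POS) pairs returned here are what Step 4 would look up by direct mapping before falling back to word sense disambiguation.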
IV. RESULTS AND DISCUSSION
• First case: the general case is explained with two main examples, given in the figures below
and discussed according to their results. Here a simple sentence in the correct format is
entered as input, and our system shows better output than the previous one.
Fig.4: Correct and incorrect Sentence with discussion
• Random words used in a sentence: the system gives their meanings if they are present in the
database, but avoids generating a sentence.
Fig. 5 shows the error given by our system when the input sentence is incorrectly formed. The
system checks whether the sentence formation is incorrect and, if so, displays the message
“try again”.
Fig. 5: System gives Error
Fig.7 : Lack of Word Sense disambiguation
Fig.8: Remove ambiguity of Words
Our system uses the Conditional Random Field to remove the ambiguity of words. In the figure
above, the input sentence is “we visited the bank and that was situated at the bank”.
The word “bank” is ambiguous here. First we look for the conjunction, so that the meanings of
words are fetched separately for each sub-part of the sentence. The first sub-part contains no word
from a specific category, so it relates to the general category and we fetch the most commonly used
meaning, the financial bank. We then resolve the second sub-part, which contains the word
“situated”, belonging to the geography category. Both meanings of “bank” are fetched, but the
condition applied is that the meaning whose category has the maximum frequency in the sentence is
used. We therefore use the meaning of “bank” related to the geography category for the second
sub-part. Finally, the words are reordered with respect to their POS and the target sentence is
generated, as displayed in Fig. 8.
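The worked example above can be traced with a small sketch. It implements only the category-frequency vote as the text describes it (split on the conjunction, count category words per sub-part, fall back to the most common sense); it is not a trained conditional random field model. The category lexicon, sense labels, and default sense are hypothetical stand-ins for the system's database.

```python
from collections import Counter

# Hypothetical data: which category a context word signals, and the senses per category.
CATEGORY_WORDS = {"situated": "geography", "river": "geography",
                  "money": "finance", "loan": "finance"}
SENSES = {"bank": {"finance": "bank (financial institution)",
                   "geography": "bank (edge of a river)"}}
DEFAULT_CATEGORY = {"bank": "finance"}  # assumed most commonly used sense
CONJUNCTIONS = {"and", "but", "or"}


def split_clauses(tokens):
    """Split a token list into sub-parts at each conjunction."""
    clauses, current = [], []
    for tok in tokens:
        if tok in CONJUNCTIONS:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def disambiguate(sentence):
    """Return (word, chosen category) for every ambiguous word, sub-part by sub-part."""
    results = []
    for clause in split_clauses(sentence.lower().split()):
        # Increase the counter of every category signalled by a token in this sub-part.
        counts = Counter(CATEGORY_WORDS[t] for t in clause if t in CATEGORY_WORDS)
        for tok in clause:
            if tok in SENSES:
                candidates = [c for c in SENSES[tok] if counts[c] > 0]
                if candidates:
                    category = max(candidates, key=lambda c: counts[c])
                else:
                    category = DEFAULT_CATEGORY[tok]  # no category evidence: general sense
                results.append((tok, category))
    return results


print(disambiguate("we visited the bank and that was situated at the bank"))
```

For the example sentence, the first sub-part has no category word and falls back to the finance sense, while "situated" in the second sub-part selects the geography sense, matching the discussion above.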
Sources of inaccuracy in the results: character gaps, wrong input, and words not present in the
database.
V. CONCLUSION
We conclude that Machine Translation and Word Sense Disambiguation are among the most popular
applications of Natural Language Processing. Machine Translation is a cheap and convenient way to
understand another language during conversation, whereas Word Sense Disambiguation helps to get the
correct meaning of a word in the context in which it is used. From the Literature Survey, we
learned the basic structure and the various sub-parts of the parts of speech of both languages,
English and Punjabi, as well as the techniques previously implemented by different researchers.
Our system uses a hybrid approach with which we can disambiguate words and obtain better machine
translation results. The Conditional Random Field algorithm with a decision list using direct
mapping is the easiest method and gave the best results for the disambiguation problem.
The accuracy of our system is given below:
Fig.9: Accuracy table for testing the system
VI. FUTURE WORK
• More techniques can be combined with this system for greater accuracy.
• More data can be used.
• Categories can be further classified into sub-parts.
• Parts of speech can be explored further with sub-categories.
VII. ACKNOWLEDGEMENTS
As a part of my course, I took up “English to Punjabi Machine Translation System using Hybrid
Approach of Word Sense Disambiguation and Machine Translation” as my thesis topic. I am very
thankful to Mrs. Navjot Kaur, Assistant Professor, Punjabi University, Patiala, for her valuable
support in my work. She provided all the relevant material I needed to complete my thesis work, and
gave her help and time whenever asked. Last but not least, a word of thanks to the authors of all
the books and papers that I consulted during my thesis work and while preparing the report.
Finally, thanks to the Almighty for not letting me down at the time of crisis and for showing me
the silver lining in the dark clouds.
VIII. REFERENCES
JOURNAL
[1]. Kamal Deep, Dr. Vishal Goyal, Hybrid Approach for Punjabi to English Transliteration
System, International Journal of Computer Applications (0975 – 8887), Volume 28, No. 1,
August 2011.
[2]. Fabio Ciravegna, Recent Advances in Natural Language Processing, IEEE Computer
Society, 2003.
[4]. J. Hutchins, An Introduction to Machine Translation, Academic Press, 1992.
[7]. Nancy Ide, Jean Veronis, Introduction to the Special Issue on Word Sense Disambiguation:
The State of the Art, 1998.
[8]. Pushpak Bhattacharyya, CS460/626: Natural Language Processing/Speech, NLP and the
Web (Lecture 25 – Knowledge Based and Supervised WSD), IIT Bombay, 6th March, 2012,
p. 24.
[9]. Pushpak Bhattacharyya, CS460/626: Natural Language Processing/Speech, NLP and the
Web (Lecture 25 – Knowledge Based and Supervised WSD), IIT Bombay, 6th March, 2012,
p. 35.
[10]. Durgesh D Rao, Machine Translation, pp. 61-70, July 1998.
[13]. Kamaljeet Kaur Batra, G S Lehal, Rule Based Machine Translation of Noun Phrases from
Punjabi to English, IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 5,
September 2010.
[14]. P. Tamilselvi, S. K. Srivatsa, Case Based Word Sense Disambiguation Using Optimal
Features, 2011 International Conference on Information Communication and Management,
IPCSIT vol. 16, Singapore, 2011.
BOOKS
[15]. Wren & Martin, English Grammar and Composition, S. CHAND Publication.
THESIS
[6]. R. Harshawardhan, Rule Based Machine Translation System for English to Malayalam
Language, Centre for Excellence in Computational Engineering and Networking, December
2011.
[28]. Kamal Deep, Dr. Vishal Goyal, Hybrid Approach for Punjabi to English Transliteration
System, Punjabi University, Patiala, September 2011.
PROCEEDING PAPER
[3]. Available at: http://en.wikipedia.org/wiki/Natural_language_processing
[11]. M. Nameh, S. M. Fakhrahmad, M. Zolghadri Jahromi, A New Approach to Word Sense
Disambiguation Based on Context Similarity, Proceedings of the World Congress on
Engineering 2011, Vol. I, pp. 456-459.
[12]. Tejinder Singh Saini, Gurpreet Singh Lehal, Word Disambiguation in Shahmukhi to
Gurmukhi Transliteration, Proceedings of the 9th Workshop on Asian Language Resources,
Chiang Mai, Thailand, November 12 and 13, 2011, pp. 79–87.
[26]. Available at: http://en.wikipedia.org/wiki/Machine_translation
[27]. Available at: http://en.wikipedia.org/wiki/Word-sense_disambiguation