a pan indian perspective in mt ver 5.0.doc

Upload: rashi100

Post on 07-Jul-2018


  • 8/18/2019 A Pan Indian perspective in MT ver 5.0.doc

    1/12

    TITLE OF THE PAPER

    EILMT: A Pan-Indian Perspective in Machine Translation

    AUTHORS

    Hemant Darbari, Executive Director, C-DAC, Pune, [email protected]

    Anuradha Lele, Group Co-ordinator, C-DAC, Pune, lele@cdac.in

    Aparupa Dasgupta, Team Co-ordinator, C-DAC, Pune, aparup[email protected]

    Priyanka Jain, Project Leader, C-DAC, Pune, priyankaj@cdac.in

    Sarvanan, Amrita University, sarwanster@gmail.com

    ABSTRACT

    To cut across the language barrier and to encourage the language pluralism of morphologically complex languages [Sproat 199?], especially South-Asian languages [Krishnamurti et al. 1986] in India, a consortium-mode, robust Machine Translation System (MTS) that is able to raise the accuracy of generation has been developed jointly by C-DAC, Pune and DIT, GOI. In Natural Language Processing (NLP) and Natural Language Understanding (NLU), Machine Translation plays a vital role in today's India for any sort of e-language processing and understanding by machine. In every quarter of the electronic era, machine translation, information retrieval or speech processing becomes obligatory for a multilingual community.

    This paper describes a hybrid machine translation system from English to Indian languages. It also proposes the TAG based, memory managed Machine Translation System [Joshi et al. 1981], aligned with other rule based, example based and statistical Machine Translation Systems, for English-Hindi, English-Urdu, English-Oriya, English-Bangla, English-Marathi and English-Tamil.

    EILMT has been designed to translate in platform independent modules. It is a proposed hybrid thin-client/thick-server design, where users (clients) of the system use a standard browser to access the translation services of the server. We call this a Pan-Indian perspective on Machine Translation. In this paper, we explain the challenges faced and the solutions drawn at the various levels of architecture, language and linguistic computation. While building the Machine Translation System, we have taken care of the speed and accuracy of syntactically and morphologically diversified languages at the modules and phases of the EILMT system.


    1.0 Introduction to Machine Translation

    In the present paper, we explain the challenges encountered in coping with the speed and accuracy of syntactically and morphologically diversified languages, tested and developed for a consortium-mode Machine Translation system from English to Indian Languages, in collaboration with C-DAC, Pune and DIT, Govt. of India.

    The idea of Machine Translation evolved in 1629, when Rene Descartes proposed a Universal Language. In 1954, the Georgetown experiment (1954) involved fully-automatic translation of sixty Russian sentences into English. In the late 1980s, machine translation inclined towards statistical models, and example based models evolved gradually. Machine Translation systems such as Systran, used by the AltaVista search engine; METEO, used at the Canadian Meteorological Centre; Example-based machine translation, proposed by Makoto Nagao; and several other hybrid Machine Translation systems came into existence. During the year 1990-91, DIT (Department of Information Technology) of the Government of India initiated the TDIL (Technology for Development of Indian Languages) project to encourage Indian language processing in the area of IT. The institutions, namely C-DAC, Pune (MANTRA); NCST (now C-DAC, Mumbai; MATRA); IIIT-Hyderabad (Anusaaraka, and SHAKTI) and IIT-Kanpur (Anglabharati), have taken Machine Translation from English to Hindi to greater heights by developing applications using cutting edge technology.

    2.0 Introduction to EILMT

    To overcome the language barrier and to encourage the language pluralism of morphologically complex languages [Sproat 199?], especially South-Asian languages [Krishnamurti et al. 1986] in India, a consortium-mode, robust Machine Translation System (MTS) that is able to raise the accuracy of generation has been developed jointly by C-DAC, Pune and DIT, Govt. of India. It is a domain specific Machine Translation system for the domain of tourism. The project is developed by 10 consortium institutes: C-DAC, Mumbai, IIIT-Hyderabad, IISc-Bangalore, IIT-Bombay, Jadavpur University Kolkata, Amrita University Coimbatore, IIIT-Allahabad, Banasthali Vidyapeeth Banasthali, Utkal University Bhubaneshwar, and C-DAC, Pune as the consortium leader.

    EILMT is a hybrid Machine Translation system combining the TAG formalism (Tree Adjoining Grammar based MT developed by C-DAC, Pune), SMT (statistical MT developed by C-DAC, Mumbai), ANALGEN (Rule based MT by IIIT-Hyderabad) and EBMT (Example based system developed by IISc, Bangalore). To measure the performance of these translation engines and to evaluate the language-pair-wise translation accuracy, we present here the internal testing carried out by the consortia and the feedback on the test report provided by the GIST Group, C-DAC, Pune on the EILMT alpha version [See EILMT Progress Report, 2009]. The translation output accuracy of each of these translation engines is given below. The table gives the average score of each engine for each sentence structure type:

    Sentence Structure type    AnalGen (%)    EBMT (%)    SMT (%)    TAG (%)

    Copula                     22.F           B?.>        33.?       2F.>

    Simple                     2.F            3>.>>       F.>        /.>>

    Appositional               F.>            BF.>        3F.>       /.?

    Relative Clause            F.>>           G.F         .>>        /.>>

    That-Clause                3?.>           .?          3>.>>      /?.>

    Wh-Clause                  3.33           G2.GG       G.F        3G.GG

    Co-ordinate                3.>>           G?.>        F2.F       /.>>

    Conditional                B2.>           3.?         3G.F       FF.>

    PP Initial                 3>.>>          BG.F        F.>        /G.F

    Adverb Initial             2>.>>          G.GG        G.GG       /.>>

    Gerundial                  G3.>>          G.3>        F.>>       2G.GG

    Participle                 2.?            ?.>>        2.?        /.?

    Infinitive                 >.>>           B3.?        3>.>>      /G.F

    Discourse Connector        F>.>>          .>>         F.>>       F.>>

    Table 1: Engine wise Translation output accuracy

    Similarly, the language-pair-wise translation accuracy of the TAG translation engine was evaluated. The approximate translation accuracy is as follows: for the English-Hindi pair, approximately 85%; for English-Urdu, approximately 75%; for English-Oriya, approximately 80%; for English-Bangla, approximately 70%; for English-Marathi, approximately 65%; and for English-Tamil, approximately 70%.

    3.0 Introduction to EILMT Architecture: The Challenges

    EILMT is a web-based Machine Translation system solution with a hybrid approach across six language-pairs, from English to Hindi, Urdu, Oriya, Bangla, Marathi and Tamil. Along with the four different machine translation engines, the Named Entity Recognizer [NER] and Word Sense Disambiguation [WSD] modules are developed by IIT, Mumbai. The EILMT system architecture is represented in the following diagram:

    Diagram 1: EILMT system architecture

    The basic system facilitations of the EILMT consortium are: User Log module; Pre-Processing module; four Translation Engines: AnalGen, EBMT, SMT & TAG for six language pairs; Post-Processing module; Collation and Ranking module; a system compatible with W3C; and browser compatibility for IE, Mozilla Firefox, Google Chrome, Apple Safari & Opera. (See Annexure 1 A for detailed EILMT system specifications.)

    3.1 Overall Architecture of EILMT

    EILMT is a web based translation system accessed simultaneously by multiple users and requests. JBoss is the application server, with robust database I/O file/exe; for a rapid, transactional, secure and portable application, EILMT is supported by EJB (Enterprise Java Bean). EILMT follows a centralized design in which Internet clients submit their documents to a multi-core server, where parsing and generation are spawned as multi-threaded embedding. Significantly, the outer layer thread connects to the ANALGEN engine (implemented in PERL on the Linux platform), another thread connects to the SMT server, and another to the EBMT engine. The Ranking module then collates and ranks the translations from the above mentioned translation engines.
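
    The dispatch-and-rank flow described above can be sketched as follows. This is a minimal illustration only: the actual system runs on JBoss/EJB, whereas the sketch uses Python threads, and the engine stubs, their confidence scores and the function names (make_stub, translate_and_rank) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# An engine takes a source sentence and returns (translation, confidence).
Engine = Callable[[str], Tuple[str, float]]


def make_stub(name: str, score: float) -> Engine:
    """Build a hypothetical stand-in for a real translation engine."""
    def translate(sentence: str) -> Tuple[str, float]:
        return (f"[{name}] {sentence}", score)
    return translate


# Hypothetical engines and scores; the real engines are separate servers.
ENGINES: Dict[str, Engine] = {
    "AnalGen": make_stub("AnalGen", 0.6),
    "EBMT": make_stub("EBMT", 0.5),
    "SMT": make_stub("SMT", 0.7),
    "TAG": make_stub("TAG", 0.9),
}


def translate_and_rank(
    sentence: str, engines: Dict[str, Engine] = ENGINES
) -> List[Tuple[str, str, float]]:
    """Send the sentence to every engine on its own thread, then rank."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, sentence) for name, fn in engines.items()}
        results = [(name, *future.result()) for name, future in futures.items()]
    # Collate and rank: highest confidence first.
    return sorted(results, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for engine, output, score in translate_and_rank("Visit the fort."):
        print(engine, score, output)
```

    One thread per engine keeps a slow engine from blocking the others; the ranking criterion here (a single confidence score) is an assumption, since the ranking function itself is not specified in the text.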

    Initially, the NER used in the SMT system developed by C-DAC, Mumbai followed a Maximum Entropy based approach. This system had an accuracy of 81.33 on the CoNLL-2003 dataset (Precision: 83.85, Recall: 78.95, F-Measure: 81.33). The current system uses two stages: SVMs followed by MEMMs. Using 2 phases improved the accuracy to 93 (Precision: 92.6, Recall: 93.48, F-Measure: 93).
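
    For reference, the F-Measure quoted with precision and recall is conventionally their harmonic mean. A minimal sketch follows, assuming the balanced F1 variant is meant (the text does not say which variant was used); the input values are illustrative and not taken from the evaluation.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative values only.
    print(round(f_measure(80.0, 60.0), 2))
```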

    3.2 Implementation of TAG Formalism

    Tree Adjoining Grammar [Kroch and Joshi, 1985] is implemented for all 6 language-pairs in EILMT on the TAG translation engine. The JAVA based TAG parser translates English documents to Hindi, Urdu, Oriya, Bangla, Marathi and Tamil. The significant feature of this parser is that it is incremental: (a) it identifies the clause or phrase on the basis of the probable declarative clause boundary, and (b) after identifying the clause boundary, the TAG tree derivation structure identifies the probable parent derivation for the nearest child derivation structure, to give the final integrated derivational tree to the TAG Generator. The TAG engine is enriched in such a way that it can handle parsing and generation for interrogative sentences, negation, gerundial constructions, relative clause constructions, past & progressive participles, etc. The pre-processing is controlled by supervised modules, such as the syntactic TAG tree disambiguator module, with optimized code and a database design written in regular expressions.

    Consider the following description of the incremental parser, which has given modularity, extensionality and speed to the translation process of the TAG engine. The probability of adjoining the parent derivations to the nearest probable child derivation is given by the following equation:

    K c(N)O ; where N = number of child derivations, K = number of parser derivations, c = combination.

    Consider the sentence “The 18th century Bharatpur-Bird-Sanctuary, which is also known as the Keoladeo-Ghana-National Park, is famous as the most important bird breeding and feeding habitat of the world.”

    The following is the parse derivation of one of its clauses:

    Diagram 2: Parser Derivation of clause

    The following is the complete generated derivation (or derived tree):

    Diagram 3: Complete Generated derivation

    The present paper reports our internal analysis and testing of the tourism corpus on the TAG translation engine through an accuracy graph based on a grade scale of 4 points, judged on syntactic and semantic well-formedness and the choice of proper lexical selection.

    Diagram 4: Accuracy Graph for TAG Engine

    4.0 Linguistic Diaspora of EILMT: Morphologically Diversified Languages

    The English corpus of ,?>> sentences from the tourism domain was collected, organized, vetted and aligned [Sinclair, J. 1991 and 2004] for all 6 language-pairs.

    India being a Linguistic Area [see Krishnamurti et al, 1986] in the South-Asian sub-continent, both the Indo-Aryan (eastern and western Indo-Aryan) and Dravidian language families, with their rich morphological heritage, have separate and distinct linguistic identities in the source-target TAG grammar, transfer grammar (a source-target link grammar), rule-normalizer, rules for morphological analysis and synthesis, transliteration and typing-tool rules. The stylistic trend observed in the EILMT tourism corpus is: simple sentences (B./B frequency of occurrence), copulative constructions (G.B/ frequency of occurrence), co-ordinate sentences (?>.> frequency of occurrence), appositional sentences (.GG frequency of occurrence), various declarative clause structures (?? frequency of occurrence), gerundial constructions (.G frequency of occurrence), conditional sentences ( frequency of occurrence), discourse connectors (>.FF frequency of occurrence) and infinitival sentences (/.>G frequency of occurrence). Thus, the parallel corpus is created for all 6 language-pairs, and features such as intelligibility, comprehensibility and fluency in translation are maintained to set a reference for the machine output; as E.M. Enquest has said very correctly, “Proper words in proper places creates styles”.

    In Natural Language Processing (NLP) and Natural Language Understanding (NLU) [Terry Patten, 1985], Machine Translation plays a vital role in the Indian sub-context for e-language processing by machine. In the Eng-Hindi EILMT system, the localization of linguistic peculiarities of Hindi, such as oblique formation, ergativity, the marked-gender system, case-marking, direct-oblique pluralization etc., is handled in a controlled environment through the morph-synthesizer, finite and non-finite generators, POS conversion rules etc. Similarly, for the other Indo-Aryan language pairs, i.e., Eng-Urdu, Eng-Oriya, Eng-Bangla and Eng-Marathi, linguistic features such as the Perso-Arabic and Indic pluralization systems, lexico-semantic peculiarities, copula drop, dropping of the existential subject, post-position synthesis, synthesis of case-marking, emphatic clitic formation, usage of classifiers, verb root alteration, strong and three level gender systems, gender based noun synthesis, compounding etc. [Bhattacharya, T. et al. 1996; Krishnamurti, Bh. et al. 1986; Selkirk 1982; Williams 1981] are incorporated through a feature-based lexicon, an ordered rule-based normalizer etc. Approximately GF,>>> bilingual lexicon entries, ?>>> phrasal lexicon entries, /F TAG tree disambiguation rules, ?-> source TAG trees, ?-?G> target TAG trees, 2>> transfer grammar mappings and F>-F morph-synthesis rules are developed for each Indo-Aryan language-pair.

    The following section explains the linguistic challenges faced and the language computing solutions drawn in the EILMT system for syntactically and morphologically diversified and complex languages, to raise the speed and translation accuracy:

    4.1 Raising Translation Speed and Accuracy: Intermediate Solutions

    a). Rectifying wrong POS tagging of the Stanford tagger (version 1.6) through rule based POS tagging (See Computational Linguistics, volume 19, number 2, pp. 313-330). Consider the following example, which shows the internal POS conversion rule that rectifies the erroneous tagging output of the Stanford tagger:

    “Visit the Sheesh-Mahal or the Hall of Victory glittering with mirrors and ascend the fort on elephant's back”

    Stanford Output: [Visit@@@@NN, the@@@@DT, Sheesh@@@@NNP, Mahal@@@@NNP, or@@@@CC, the@@@@DT, Hall@@@@NNP, of@@@@IN, Victory@@@@NNP, glittering@@@@VBG, with@@@@IN, mirrors@@@@VBZ, and@@@@CC, ascend@@@@VB, the@@@@DT, fort@@@@NNP, on@@@@IN, elephant@@@@NN, zxtd@zxtdzxtd@zxdt@@@@NN, back@@@@RB]

    Internal POS Category String: [Visit@@VERB the-Sheesh-Mahal@@NOUN or@@CONJ the-Hall@@NOUN of@@PREP Victory@@NOUN glittering@@PrPART with@@PREP mirrors@@NOUN and@@CONJ back@@ADV ascend@@TYPEAPPOINT the-fort@@NOUN on@@PREP elephant@@NOUN zxtd@zxtdzxtd@zxdt@@AS]

    Apart from POS tagging, emotion and sense tagging are necessary in Machine Translation to capture the semantic anomalies of natural language.
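
    A rule of the kind described in (a) can be sketched as below. The `@@@@` separator follows the tagger output shown above; the tag-to-category table and the single correction rule (a sentence-initial noun directly followed by a determiner is re-tagged as an imperative verb) are assumptions made for illustration, not the actual EILMT rule set.

```python
from typing import List, Tuple

SEP = "@@@@"

# Hypothetical mapping from Penn Treebank tags to internal categories.
PTB_TO_INTERNAL = {
    "NN": "NOUN", "NNP": "NOUN", "NNS": "NOUN",
    "DT": "DET", "IN": "PREP", "CC": "CONJ",
    "VB": "VERB", "VBG": "PrPART", "RB": "ADV",
}


def parse_tagged(output: str) -> List[Tuple[str, str]]:
    """Split 'word@@@@TAG, word@@@@TAG' into (word, tag) pairs."""
    pairs = []
    for item in output.split(", "):
        word, tag = item.split(SEP)
        pairs.append((word, tag))
    return pairs


def correct(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map tags to internal categories and apply one illustrative fix."""
    corrected = []
    for i, (word, tag) in enumerate(tokens):
        category = PTB_TO_INTERNAL.get(tag, tag)
        # Assumed rule: a sentence-initial noun directly followed by a
        # determiner is treated as an imperative verb ("Visit the ...").
        if (i == 0 and tag in {"NN", "NNP"}
                and len(tokens) > 1 and tokens[1][1] == "DT"):
            category = "VERB"
        corrected.append((word, category))
    return corrected


if __name__ == "__main__":
    tagged = parse_tagged("Visit@@@@NN, the@@@@DT, fort@@@@NNP")
    print(correct(tagged))
```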

    b). Chunking is an important part of the shallow parsing level. It minimizes the number of tokens to be sent to the core parser, thus reducing the number of possible adjunctions, which affects the translation time as well as the translation quality. We perform noun phrase chunking and verb group collation. Consider the following example at level-1 chunking:

    The-Prince/NNP of/IN Wales/NNP Museum]/NNP ,/, the-Jahangir-art-Gallery]/NNP ,/, the-various-churches]/NNS ,/, temples/NNS and/CC shrines/NNS including/VBG the/DT one/CD]/NNP of/IN Haji-Ali]/NNP out/IN on/IN an-island]/NN linked/VBN by/IN a-causeway]/NN ,/, are/VBP worth/JJ a-glimpse]/NN

    And at level-2 chunking:

    The-Prince-of-Wales-Museum]/NNP ,/, the-Jahangir-Art-Gallery]/NNP ,/, the-various-churches]/NNS ,/, temples/NNS and/CC shrines/NNS including/VBG the-one]/NNP of/IN Haji-Ali]/NNP out/IN on/IN an-island]/NN linked/VBN by/IN a-causeway]/NN ,/, are/VBP worth/JJ a-glimpse]/NN
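
    The effect of noun phrase chunking on the token count can be illustrated with a small sketch. It assumes a simple pattern (determiner, optional modifiers, one or more nouns) joined with hyphens as in the examples above; the real chunker's rules and its level-1/level-2 distinction are not specified in the text, so this is not a reimplementation of it.

```python
from typing import List, Tuple

NOUN_TAGS = {"NN", "NNS", "NNP"}
MODIFIER_TAGS = {"JJ", "CD"}  # assumed set of pre-nominal modifiers

Token = Tuple[str, str]  # (word, POS tag)


def chunk_noun_phrases(tagged: List[Token]) -> List[Token]:
    """Merge 'DT (modifier)* noun+' runs into one hyphenated token."""
    chunks: List[Token] = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "DT":
            j = i + 1
            while j < len(tagged) and tagged[j][1] in MODIFIER_TAGS:
                j += 1
            if j < len(tagged) and tagged[j][1] in NOUN_TAGS:
                k = j
                while k + 1 < len(tagged) and tagged[k + 1][1] in NOUN_TAGS:
                    k += 1
                words = [w for w, _ in tagged[i:k + 1]]
                # The chunk carries the tag of its last (head) noun.
                chunks.append(("-".join(words), tagged[k][1]))
                i = k + 1
                continue
        chunks.append((word, tag))
        i += 1
    return chunks


if __name__ == "__main__":
    sentence = [("an", "DT"), ("island", "NN"), ("linked", "VBN"),
                ("by", "IN"), ("a", "DT"), ("causeway", "NN")]
    print(chunk_noun_phrases(sentence))
```

    Here six tokens shrink to four, which is the reduction in parser input that the paragraph above refers to.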


    c). We use a TAG (Tree Adjoining Grammar) [Joshi et al., 1975] parser, and for that we have created a number of trees to represent the structure of the source and target languages. In this formalism each token is tagged with a POS tag/category, on the basis of which a set of possible tree tags is assigned to the token. This process is called tree tagging. A sentence, as a string of tree-tagged tokens, is then sent to the parser. When a token in a sentence is tagged with a number of trees, the parser is liable to produce multiple derivations, most of them inappropriate. This reduces accuracy and speed. To eliminate these spurious derivations, or at least minimize them, we adopted the technique of TAG tree pruning. Our pruning module further disambiguates according to the syntactic context and helps select the TAG tree in a more precise way. The accuracy and speed of the system were thus substantially improved.
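
    Why pruning helps can be seen from a small sketch: the number of tree combinations the parser may have to consider is bounded by the product of the candidate counts per token, so removing even one candidate per token shrinks the search space multiplicatively. The tree names and the context rules below are hypothetical and only illustrate the idea of context-based pruning.

```python
from math import prod
from typing import Dict, List

# Hypothetical inventory: POS category -> candidate elementary trees.
TREES_BY_POS: Dict[str, List[str]] = {
    "VERB": ["alpha_intransitive", "alpha_transitive", "alpha_ditransitive"],
    "NOUN": ["alpha_np", "beta_noun_modifier"],
    "PREP": ["beta_pp_noun_adjunct", "beta_pp_verb_adjunct"],
}


def tree_tag(pos_tags: List[str]) -> List[List[str]]:
    """Assign every candidate tree to each token (tree tagging)."""
    return [list(TREES_BY_POS.get(pos, ["alpha_default"])) for pos in pos_tags]


def prune(pos_tags: List[str], candidates: List[List[str]]) -> List[List[str]]:
    """Drop candidates that the neighbouring category rules out."""
    pruned = []
    for i, (pos, trees) in enumerate(zip(pos_tags, candidates)):
        nxt = pos_tags[i + 1] if i + 1 < len(pos_tags) else None
        kept = trees
        if pos == "VERB" and nxt == "NOUN":
            # Assumed rule: verb directly followed by a noun is transitive.
            kept = [t for t in trees if t == "alpha_transitive"]
        elif pos == "VERB" and nxt is None:
            # Assumed rule: sentence-final verb is intransitive.
            kept = [t for t in trees if t == "alpha_intransitive"]
        elif pos == "NOUN" and nxt != "NOUN":
            # Assumed rule: a noun modifies only a following noun.
            kept = [t for t in trees if t != "beta_noun_modifier"]
        pruned.append(kept or trees)  # never leave a token without a tree
    return pruned


def derivation_bound(candidates: List[List[str]]) -> int:
    """Upper bound on tree combinations: product of candidate counts."""
    return prod(len(trees) for trees in candidates)


if __name__ == "__main__":
    tags = ["VERB", "NOUN", "PREP", "NOUN"]
    before = tree_tag(tags)
    after = prune(tags, before)
    print(derivation_bound(before), derivation_bound(after))
```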

    d). To handle the synthesis of constructions in Indian Languages, the morphological complexities of nouns and verbs and their inter-relationships, together with the kaaraka formalism [Gangopadhyay, M. 1990] in a defined context, play a major role in noun or verb synthesis. Hence various categories at adjoining positions, i.e., adpositional words like adjectives, post-positions (parasargas), and various particles like the avyayas etc., are also within this defined context. Basically, post-positions and avyayas have a modified-modifier [Aronoff, M. 1976] function, adding more linguistic information for the end-users of the target language. The feature embedded morphological rules (and sometimes also gender agreement) written for the synthesizer can be seen through the synthesized output. Verbs in the language demand the karaka identities, and the nouns fulfil the demands according to the yogyataa. And, in a defined context, nouns demand parasargas or post-positions on a semantic account. The following diagram explains the synthesis process in the EILMT system:

    Diagram 5: Morph-synthesis process of EILMT system

    The linguistic variations and complexities described in points a) to d) above, handled through pre-processing or post-processing generative modules, have raised translation accuracy and speed considerably. The following graph compares the translation speed of the old and new versions of the EILMT system (i.e., the speed of translation before and after pruning and context disambiguation of the POS tagsets, TAG tree tagging and noun-verb synthesis). These developments at the parsing and generation stages have raised the speed of translation in the latter (or new) version of EILMT:

    Diagram 6: Comparison of speed of old and new version

    5.0 English-Tamil EILMT System: an Overview

    In the English-Tamil EILMT system, special attention has been given to the Tamil morphological system. As Tamil belongs to the Dravidian language family and is an agglutinative language, the synthesis of finite and non-finite forms, the synthesis of nouns or noun groups and the gender based system are handled through a feature based lexicon and the noun and verb morph-synthesizers. In modern Tamil three types of words are found: noun, verb and itaiccol (particles). The noun indicates animate and inanimate categories (tiNai, classified into uyartiNai and akRiNai). There are three genders in Tamil: masculine, feminine and neuter, where masculine and feminine indicate singular number and neuter gender indicates plural number. There are three persons in Tamil (first, second and third person). Case inflexion is prominent with suffixes in Tamil. Tamil, being agglutinative in nature [See Varadarajan, Mu. 1988], is found to parse and generate differently from the manner in which Indo-Aryan languages are generated in the EILMT system. Approximately G,>>> bilingual lexicon entries, /? phrasal lexicon entries, /F TAG tree disambiguation rules, ? source TAG trees, ?F target TAG trees, BF transfer grammar mappings and >> morph-synthesis rules are developed for the English-Tamil version. Consider the following example from the tourism domain with EILMT TAG output in Tamil:

    English: ‘Mother Earth’ is kind in return. / Tamil:  அன பம தரல வகயக இரற

    The following diagram represents the English-Tamil User Interface (with Tamil output):

    Diagram 7: English-Tamil User Interface output

    5.1 Translation Accuracy of English-Tamil EILMT System

    The translation accuracy of the English-Tamil system was scored through Subjective/Human Evaluation. The parameters for testing the translation accuracy of the EILMT system in subjective/human evaluation are: POS tagging, P-Syntax, G-Syntax, Morph-Synthesis, Lexicon availability and phrase marking. We present here the internal testing carried out by the consortia and the feedback on the test report provided by the GIST Group, C-DAC, Pune on the EILMT alpha version [See Appendix I B for English-Tamil output]. The Human Evaluator's evaluation of Eng-Tamil is shown in the following bar-chart (evaluation parameters as given above).

    Diagram 8: Bar-Chart of Eng-Tamil translation output Evaluation
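
    A bar-chart of this kind is typically produced by averaging each evaluation parameter over all evaluated sentences. The sketch below shows that aggregation under the assumption that every sentence receives one score per parameter; the scores in the example are made up for illustration and are not the evaluation results.

```python
from statistics import mean
from typing import Dict, List

PARAMETERS = ["POS tagging", "P-Syntax", "G-Syntax",
              "Morph-Synthesis", "Lexicon availability", "Phrase marking"]


def average_by_parameter(evaluations: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each parameter's score over all evaluated sentences."""
    return {p: mean(e[p] for e in evaluations) for p in PARAMETERS}


if __name__ == "__main__":
    # Hypothetical scores for two sentences.
    scores = [
        {"POS tagging": 100, "P-Syntax": 100, "G-Syntax": 80,
         "Morph-Synthesis": 90, "Lexicon availability": 70, "Phrase marking": 100},
        {"POS tagging": 80, "P-Syntax": 60, "G-Syntax": 80,
         "Morph-Synthesis": 70, "Lexicon availability": 90, "Phrase marking": 100},
    ]
    for parameter, value in average_by_parameter(scores).items():
        print(parameter, value)
```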

    5.2 Scope of improvement for Eng-Tamil TAG EILMT system

    It is evident from the above figure that further linguistic development is required in the English-Tamil system to increase the translation accuracy. The following points are to be considered for future improvement of the English-Tamil system:

    a). Re-framing and enhancing the Noun Collation module on the basis of Phrase Tagging and the agglutinating character of Tamil.

    b). The process of new tree set creation / existing tree set modification / deletion of redundant tree sets is to be improved, and target tree set pruning and purification of the transfer grammar are to be enhanced.

    c). The feature based linguistic rule-set in the verb generator and noun generator modules is to be enhanced, and bilingual lexicon correction, purification and feature attachment are to be completed.

    6.0 Conclusion

    All the above findings, research and implementations in the EILMT system provide a more productive and evolutionary ground for e-corpus work in the Indian subcontinent. This ground will definitely raise critical questioning and re-analysis of machine translation, text mining, data pruning, information extraction and retrieval, and speech pathology and technology in IL to IL information exchange and access. Thus the research and study on EILMT for Indian languages should be guided and formalized as follows:

    a). Standardization of the Indian tagset, considering the factor of morphologically rich language families, and formal tagging, sense tagging and emotion tagging of the e-corpora available in Indian languages.

    b). Memory based parsing management to organize multiple languages with multiple domains.

    c). Further feature-based development of morphology based modules for morphologically rich Indian languages.

    d). Further, memory managed MT will increase the system efficiency -?> more. The scope of this analysis and synthesis can be extended to reverse translation as well.

    7.0 Reference

    Aronoff, Mark. 1976. Word Formation in Generative Grammar. Cambridge, MA: MIT Press.

    Bhattacharya, T. and P. Dasgupta 1996. Classifiers, word order and definiteness in Bangla. In V.S. Lakshmi and A. Mukherjee, ed. Word order in Indian languages. 73-94. Hyderabad: Booklinks.

    Gangopadhyay, Malaya. (1990). The Noun Phrase in Bengali: Assignment of Role and the Kaaraka Theory. Delhi: Motilal Banarsidass.

    Joshi, Arvind, Bonnie Webber and Ivan Sag. 1981. Elements of Discourse Understanding. Cambridge University Press, New York.

    Krishnamurti, Bh., C.P. Masica and A. K. Sinha (eds). (1986). South Asian Languages: Structure, Convergence and Diglossia. Delhi: Motilal Banarsidass.

    Kroch, T. and A. Joshi (1985). The Linguistic Relevance of Tree Adjoining Grammar. University of Pennsylvania. Department of Computer and Information.

    Patten, Terry 1985. A problem solving approach to generating text from systemic grammars. Proceedings of 2nd Conference on European chapter of Association for Computational Linguistics. Geneva. Switzerland.

    Sinclair, J. 1991. Corpus, concordance, collocation. Tuscan Word Centre, Oxford: Oxford University Press.

    Sinclair, J. 2004. Developing Linguistic Corpora: A Guide to good practice. Oxford: Oxford University Press.

    Selkirk (1982). The Syntax of Words. MIT Press.

    Varadarajan, Mu. 1988. A History of Tamil Literature. Translated from Tamil by E. Sa. Viswanathan. Sahitya Academy. New Delhi.


    ANNEXURE 1 A

    (EILMT system specifications)

    Server Machine(s)

    Server Machine : 1 (Main EILMT Server)

       Hardware:

    HP DL rack Mount Server, 8 core Xeon 3.0 GHz, 32 GB RAM, B2!U3 SAS Disk.

       Operating System:

    Windows Server 2008

       Software Required:

    Java platform: jdk..>> or jre..>>.

    Server version: jboss-4.0.2 (Application server).

    Database version: MySQL Server 6.0, MySQL Tools for 5.0

    Server Machine : 2 (AnalGen Server)

       Hardware:

    HP DL rack Mount Server, 8 core Xeon 3.0 GHz, 32 GB RAM, B2!U3 SAS Disk.

       Operating System:

    Linux Fedora 8 (for Consortia Engine Name: AnalGen).

    Client Machine

       Hardware:

    PC with 486 MHz or higher (Pentium processor recommended).

    3 M AM minimum.

       Operating System:

    Windows 98 & above

    Linux Fedora 8

       Browser support:

    Internet Explorer IE6, IE7

    Mozilla Firefox G.>.-G,G..?-3

    Google Chrome 2.0.172.28, 2.0.172.37, 3.0.195.24, 3.0.195.27

    Apple Safari 3.2.2, 4.0.2, 4.0.3 (531.9.1), 4.0.4 (531.21.10)

    Opera 9.64

    ANNEXURE I B

    (Evaluation of EILMT system Translation output for English-Tamil language pair)

    Version 5.0

    Columns: English Sentence / Translated output (E-T) / Analysis of Translation

    Simple ; Copula

    Vrindavan is a pilgrimage.

    ரநவன ஒர யதயக இரற

    POS >>

    P syntax >>

    G-syntax >>

    Lexicon F

    Phrase-Marking >>

    Synthesizer >>

    Simple ; Copula (Possessive form)

    The Ganga Golden Jubilee Museum has a large collection of pottery, paintings, carpets, coins, and armory.

    கஙக கடன !"#$ ம%&ய'()*)ட+, , -யஙக. , க*/ஙக.

    , 01யஙக. (23 *டக45ன ஒர +*6ய7க6" இரனற

    POS >>

    P syntax >>

    G-syntax >>

    Lexicon >>

    Phrase Marking >>

    Synthesizer >>

    Simple ; Copula (co-ordinate)

    The Mehrangarh Fort has seven gates and provides wonderful views of the city.

    (கன89 :*9;>

    P syntax >>

    G-syntax >>

    Lexicon F>

    Phrase Marking >>

    Synthesizer >>

    (In) Simple ; Copula

    The best time to visit Jaipur is December to February

    +@"ப க1 மகA&ற"* 0 December to F B bரஅ9யக  இரற

    POS >>

    P syntax 2>

    G-syntax 2>

    Lexicon >>

    Phrase Marking 2>

    Synthesizer >>

    Simple

    G-syntax F>

    Lexicon 2>

    Phrase Marking >>

    Synthesizer />

    Relative Clause (subordinate clause) : Hidden

    The Jal-Mahal is a picturesque palace built for royal duck shooting parties

    (8 அM'6ய வ வ;டய<க;&கIகக க;ட"*;ட ஒர க)கவ9அ)(யக இரற

    POS >>

    P syntax >>

    G-syntax >>

    Lexicon 3>

    Phrase Marking >>

    Synthesizer >>

    Appositional (verb participial ; initial)

    Phetchaburi, a very old city, an important town, had been given several names such as, Phripphri, Phripphli or Phetchaphli

    +8;Sஅ b K6 , ஒர மக *?ய 0க , -ரJய( 0க , T6""T6 , T6""Tலஅ4 +8;Sஅ"Tல * *4 +*ய9க.அகயவ அ" *;Hரந

    POS >>

    P syntax >>

    G-syntax >>

    Lexicon 2>

    Phrase Marking >>

    Synthesizer F>

    Appositional (complement ; initial)

    Balsamand Lake & Palace, an artificial lake, is a splendid spot and was built in 1159 AD.

    *7(ந U6 & அ)( , -ர +7ய2கயU6 , ஒர அ?க இட(க இரற (231159 V D இ க;ட"*;ட

    POS >>

    P syntax >>

    G-syntax >>

    Lexicon /

    Phrase Marking >>

    Synthesizer >>

    That Complement

    Madurai is the oldest city in Tamil Nadu and was home to the ancient Tamil Sangam, the literary conclave that produced the first epic, Silappathikaram.

    ( மP 0;H மக"*?(ய 0க(கஇரந (23 J க"#ய ,W4"*;Xக *=9/ *)டய Tamil Sangam , இ4ய"*)டயY;ட' இர"#ட(க இரந

    POS >>

    P syntax F>

    G-syntax F>

    Lexicon F>

    Phrase Marking >>

    Synthesizer >

    (Co-ordinate) ; That Complement

    The picturesque Kangra valley has several spots that offer mahaseer river carp.

    க)கவ9 கங *./'' (8Z9 V3'றYற அ$ற *4 இடஙக. இரனற

    POS >>

    P syntax >>

    G-syntax >>

    Lexicon 2>

    Phrase Marking >>

    Synthesizer >>

    Co-ordinates

    Udaipur is situated in the southern part of Rajasthan and is surrounded by the Aravali range.

    K@"ப9 C5ன +2' *'த=இட'[அ(க"*

    P syntax >>

    G-syntax >>

    Lexicon 2>

    Phrase Marking >>

    Synthesizer >>

    Discourse connector

    Udaipur is known for its beautiful lakes, well structured palaces, lush green gardens and temples but the major attractions of this place are the Lake Palace and the City Palace.

    K@"ப9 அQடய அ?க U6கIககஅ[ய"*ட"*3 +க)Hர'

    POS 2>

    P syntax F>

    G-syntax F>

    Lexicon F>

    Phrase Marking >>

    Synthesizer F>

    Copula

    அ?க (யஙக$ன ஒனறக இரற

    POS /

    P syntax /

    G-syntax F>

    Lexicon F

    Phrase Marking >>

    Synthesizer 2>