Revision of morphological analysis errors ... nlp.dse.ibaraki.ac.jp/~shinnou/papers/copy/amta-1998.pdf ...
Revision of Morphological Analysis Errors
Through the Person Name Construction Model

Hiroyuki Shinnou
Ibaraki University
Dept. of Systems Engineering, Nakanarusawa 4-12-1
Hitachi, Ibaraki, 316-8511, Japan
shinnou@lily.dse.ibaraki.ac.jp
Abstract. In this paper, we present a method to automatically revise morphological analysis errors caused by unregistered person names. In order to detect and revise these errors, we propose the Person Name Construction Model for the kanji characters composing Japanese names. Our method has the advantage of not using context information, such as a suffix, to recognize person names, so it remains applicable when such clues are absent. Through an experiment, we show that our proposed model is effective.
1 Introduction
It is clear that morphological analysis is an important module for an NLP system such as an MT system. One problem in morphological analysis is word segmentation errors caused by unregistered words. Most unregistered words are proper nouns, such as place names, organization names, and person names. In this paper, we focus on person names, and propose the Person Name Construction Model to correct morphological analysis errors caused by unregistered person names. This model gives a score to a given word sequence. The score indicates the degree to which the given word sequence appears to be a person's name. Using this score, we can extract names from the morphological analysis result. If an extracted name is not consistent with the morphological analysis result, we revise the result to take the extracted person name into account.

The Person Name Construction Model is based on the heuristic that a person's name is composed of kanji characters which are placed in the first position, the middle position and the last position of the name. For example, in the case of family names, kanji characters frequently used in the first position are "中", "松" and "長". In the middle position, they are "谷", "" (that is, no middle character) and "曽". And in the last position, they are "田", "藤" and "井". Our proposed model therefore judges that the character sequences "中谷田", "長藤" and "松曽井", which are combinations of these characters, have the appearance of being names¹. However, this model tends to judge too many character sequences to be person names. So, in order to remove non-names from the extraction, we also use heuristics based on the morphological analysis error patterns caused by unregistered person names.

¹ These character sequences are not registered in the dictionary, and we do not know whether they are real person names. However, most Japanese would agree that these character sequences seem to be family names.
A feature of our proposed model is that it makes no use of contextual clues. Strategies to recognize unregistered words are divided into two types. One type uses contextual clues, such as a suffix (ex. "氏 (Mr.)", "さん (Ms.)"), a prefix (ex. "故 (the late)", "長女 (the first-born daughter)"), a title (ex. "社長 (the president)", "大統領 (the President)"), a verb (ex. "逮捕される (be arrested)", "殺される (be killed)") and so on, in order to recognize unregistered words [3, 8]. The other type uses only clues inside the given word sequence, and does not use information outside it. The former is powerful, and the automatic acquisition of such contextual clues is currently being researched [6, 1, 5]. However, we often face situations without contextual clues, so the former strategy also needs a module of the latter type. For example, in the case of the phrase "~社長 (the president ~)", the "~" part often includes a name. Thus, the phrase "~社長" is a contextual clue to recognize person names. However, the "~" words in this phrase do not always include a person's name. Therefore, from only the information in the "~" sequence, we must judge whether it includes a person's name or not. Our proposed model is useful for doing this, and can be combined with all sorts of strategies of the former type.
Lastly, we ran an experiment on a small sample. For morphological analysis errors caused by unregistered person names, our system revised them with 63.8% precision and 72.5% recall. Investigating our system's failures, we found most of them acceptable and reasonable. Our proposed model was thus shown to be useful and effective for the recognition of unregistered person names.
2 Extraction of person names and revision of morphological analysis errors
2.1 Basic procedures
First, we perform a morphological analysis of a sentence and pick out its kanji word sequences. Here, we define the term "kanji word" as a word composed only of kanji characters. For example, for the following sentence (1), we get sentence (2) as the result of the morphological analysis, and we pick out the three kanji word sequences shown in (3).
(1) あの千葉大学の学生が鈴木健四郎社長です (That student at Chiba University is the president Suzuki Kensirou.)
(2) /あの/千葉/大学/の/学生/が/鈴木/健/四郎/社長/です/
(3) /千葉/大学/, /学生/, /鈴木/健/四郎/社長/
    (Chiba University, student, the president Suzuki Kensirou)
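The step from (2) to (3) can be sketched as follows. This is a minimal illustration rather than the system described here: it assumes the analysis result arrives as a "/"-delimited string as in (2), and it approximates "kanji character" by the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus the iteration mark 々; the function names are hypothetical.

```python
import re

# Assumption: "kanji" is approximated by CJK Unified Ideographs plus 々 (U+3005).
KANJI = re.compile(r"[\u4e00-\u9fff\u3005]+")


def is_kanji_word(word: str) -> bool:
    """A kanji word is a word composed only of kanji characters."""
    return KANJI.fullmatch(word) is not None


def kanji_word_sequences(segmented: str) -> list[list[str]]:
    """Split a '/'-delimited analysis result into maximal runs of kanji words."""
    words = [w for w in segmented.split("/") if w]
    runs: list[list[str]] = []
    current: list[str] = []
    for word in words:
        if is_kanji_word(word):
            current.append(word)      # extend the current kanji run
        elif current:
            runs.append(current)      # a non-kanji word closes the run
            current = []
    if current:
        runs.append(current)
    return runs


analysis = "/あの/千葉/大学/の/学生/が/鈴木/健/四郎/社長/です/"
for run in kanji_word_sequences(analysis):
    print("/" + "/".join(run) + "/")
```

Running it on (2) prints the three sequences listed in (3).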
A name is extracted from each kanji word sequence if the sequence contains a person's name. If the extracted name is not consistent with the morphological analysis result, we correct the morphological analysis result to account for the extracted name.
In the above example, we have extracted the names shown in Table 1. From the kanji sequence "/千葉/大学/", we extract the name "/千葉/" as a last name; this is consistent with the result of the morphological analysis. On the other hand, the name "/健四郎/", extracted as the first name from the kanji sequence "/鈴木/健/四郎/社長/", is not consistent with the result of the morphological analysis, in which "健四郎" is segmented into "/健/" and "/四郎/". Therefore, we revise the morphological analysis result to the sequence "/健四郎/".

We now describe the procedure to extract a person's name from a kanji word sequence. First, we extract kanji word subsequences as parts of the given kanji word sequence. We give each kanji word subsequence a score which indicates the degree to which it appears to be a person's name, and we output the subsequence with the maximum score as the person's name, classified by type (last name, first name, or their combination).

Take the case of the kanji word sequence "/鈴木/健/四郎/社長/". We extract its kanji word subsequences and give each a score, as shown in Table 2. We output "/鈴木/健四郎/", which has the maximum score.
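The enumerate-and-maximise procedure just described can be sketched as below. It is a hedged illustration, not the paper's code: `best_subsequence` accepts any scoring function, and `toy_score` with its numbers is a hypothetical placeholder for the Person Name Construction Model defined in Section 2.2.

```python
from typing import Callable, Optional, Sequence, Tuple

Span = Tuple[int, int]


def word_spans(n_words: int) -> list[Span]:
    """All contiguous spans [i, j) over a sequence of n_words kanji words."""
    return [(i, j) for i in range(n_words) for j in range(i + 1, n_words + 1)]


def best_subsequence(
    words: Sequence[str], score: Callable[[Sequence[str]], float]
) -> Optional[Tuple[float, Span]]:
    """Score every contiguous kanji word subsequence and keep the best one.

    Ties go to the earliest span enumerated; an empty input returns None.
    """
    best: Optional[Tuple[float, Span]] = None
    for i, j in word_spans(len(words)):
        s = score(words[i:j])
        if best is None or s > best[0]:
            best = (s, (i, j))
    return best


# Hypothetical stand-in for the model score (values are invented).
TOY_SCORES = {"鈴木健四郎": 20.0, "鈴木": 5.0, "健四郎": 4.0}


def toy_score(ws: Sequence[str]) -> float:
    return TOY_SCORES.get("".join(ws), 0.0)


words = ["鈴木", "健", "四郎", "社長"]
best_score, (i, j) = best_subsequence(words, toy_score)
print(best_score, "/" + "/".join(words[i:j]) + "/")  # 20.0 /鈴木/健/四郎/
```

Splitting the winning characters into "/鈴木/健四郎/" is the job of the model's last-name and first-name scores and is not shown here.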
2.2 Person Name Construction Model
We need a score which indicates the degree to which a given word sequence appears to be a person's name. To compute this score, we propose the Person Name Construction Model.
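Japanese names consist of a last name and a first name. Last names can be divided into three character parts: the first position character (LFC), the middle position character (LMC) and the last position character (LLC). For example, the last name "中曽根" has the following three character parts:
LFC = "中", LMC = "曽", and LLC = "根".
In the case of the last name "鈴木", the character parts are:
LFC = "鈴", LMC = "" (empty), and LLC = "木".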
Table 2. Score for the kanji word subsequences
In the same way, first names can be divided into three character parts: the first position character (FFC), the middle position character (FMC) and the last position character (FLC).
Our model assumes that any kanji character "a" has a score which indicates how often "a" is used as an LFC. The character "a" also has scores for LMC and LLC. We define $S_{lfc}(a)$ to be the LFC score for a character "a", and we define $S_{lmc}(a)$ and $S_{llc}(a)$ similarly. With the following expression, we define the score $S_l(\alpha)$, which indicates the degree to which a character sequence $\alpha = a_1 a_2 a_3 \cdots a_n$ appears to be a last name.
$$S_l(\alpha) = \frac{S_{lfc}(a_1) + \sum_{i=2}^{n-1} S_{lmc}(a_i) + S_{llc}(a_n)}{n}$$
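As a worked instance of this definition for the two last names decomposed above (no score values assumed): for "中曽根" ($n = 3$) each position contributes one term, while for "鈴木" ($n = 2$) the middle sum runs from $i = 2$ to $1$ and is empty, matching the empty LMC. Dividing by $n$ makes $S_l(\alpha)$ the average positional score per character.

$$
\begin{aligned}
S_l(\text{中曽根}) &= \frac{S_{lfc}(\text{中}) + S_{lmc}(\text{曽}) + S_{llc}(\text{根})}{3} \\
S_l(\text{鈴木}) &= \frac{S_{lfc}(\text{鈴}) + S_{llc}(\text{木})}{2}
\end{aligned}
$$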
Similarly, with the following expression, we define the score $S_f(\beta)$, which indicates the degree to which a character sequence $\beta = b_1 b_2 b_3 \cdots b_n$ appears to be a first name.
$$S_f(\beta) = \frac{S_{ffc}(b_1) + \sum_{i=2}^{n-1} S_{fmc}(b_i) + S_{flc}(b_n)}{n}$$

Finally, with the following expression, we define a score indicating the degree to which a string $\alpha$ appears to be a last name and a string $\beta$ a first name:

$$S_l(\alpha) \times S_f(\beta)$$
If the length of the character sequence is 2 or more, we can calculate its scores with the expressions above. If the length of the character sequence is 1, i.e. the character sequence is $\alpha = a_1$, we define the scores as follows:
$$S_l(\alpha) = S_l'(a_1)$$
$$S_f(\alpha) = S_f'(a_1)$$
We will define $S_l'(a_1)$ and $S_f'(a_1)$ later.
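When we are given the kanji word subsequence $P = w_1 w_2 \cdots w_m$, we regard it as the character sequence $P = a_1 a_2 \cdots a_n$. Next, we compute each of the scores $S_l(P)$, $S_f(P)$ and $S_l(a_1 a_2 \cdots a_i) \times S_f(a_{i+1} a_{i+2} \cdots a_n)$ for $1 \le i < n$, and output the one with the maximum score.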
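The scoring procedure above (averaged positional scores, the single-character scores $S_l'$ and $S_f'$, and the maximum over $S_l(P)$, $S_f(P)$ and every split into last name and first name) can be sketched in Python. The class and function names and the score values are hypothetical, the word subsequence is passed as its concatenated character string, and the 10% rule for zero scores described below is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class NameModel:
    """Positional character scores for one name type (hypothetical structure).

    first/middle/last hold S_xfc, S_xmc, S_xlc; single holds S_x' for
    one-character names. Characters missing from a table score 0.
    """
    first: dict[str, float] = field(default_factory=dict)
    middle: dict[str, float] = field(default_factory=dict)
    last: dict[str, float] = field(default_factory=dict)
    single: dict[str, float] = field(default_factory=dict)

    def score(self, chars: str) -> float:
        n = len(chars)
        if n == 0:
            return 0.0
        if n == 1:
            return self.single.get(chars, 0.0)   # length-1 case: S'(a_1)
        total = self.first.get(chars[0], 0.0)
        total += sum(self.middle.get(c, 0.0) for c in chars[1:-1])
        total += self.last.get(chars[-1], 0.0)
        return total / n                          # average over n characters


def best_reading(chars: str, last: NameModel, first: NameModel):
    """Return (score, parts, type) maximising S_l(P), S_f(P) and S_l x S_f splits."""
    candidates = [
        (last.score(chars), (chars,), "last name"),
        (first.score(chars), (chars,), "first name"),
    ]
    for i in range(1, len(chars)):
        s = last.score(chars[:i]) * first.score(chars[i:])
        candidates.append((s, (chars[:i], chars[i:]), "last name/first name"))
    return max(candidates, key=lambda c: c[0])


# Toy scores (hypothetical values, not trained).
LAST = NameModel(first={"鈴": 4.0}, last={"木": 6.0})
FIRST = NameModel(first={"健": 2.0}, middle={"四": 1.0}, last={"郎": 9.0})

print(best_reading("鈴木健四郎", LAST, FIRST))
# (20.0, ('鈴木', '健四郎'), 'last name/first name')
```

With these toy tables the split "鈴木" + "健四郎" wins with 5.0 × 4.0 = 20.0, reproducing the kind of revision described in Section 2.1.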
Finally, we must explain how to construct the scores $S_{lfc}(a)$ and so on. In this work, we used one year of newspaper articles as the training corpus. First, we segmented the training corpus into words by morphological analysis. We picked out the person names from the result of the morphological analysis and made a frequency table (T1) for these names. Then we picked out the person names registered in the dictionary of the morphological analysis system and made a frequency table (T2) for these names; every entry in T2 has a frequency of 1. Next, we merged T1 and T2 and divided the result into a frequency table (TL) for last names and a frequency table (TF) for first names. Further, we divided TL into a frequency table (TL1) for names of length 1 and a frequency table (TL2) for names of length 2 or more. Similarly, we obtained TF1 and TF2. Next, if the frequency of the name $a_1 a_2 a_3 \cdots a_n$ ($n > 1$) in TL2 is $f$, we added the value $f$ to $S_{lfc}(a_1)$, $S_{lmc}(a_2)$, $S_{lmc}(a_3)$, ..., $S_{lmc}(a_{n-1})$ and $S_{llc}(a_n)$. We repeated this procedure for all names in TL2. As a result, we arrived at the scores $S_{lfc}(a)$, $S_{lmc}(a)$ and $S_{llc}(a)$. We obtained the scores $S_{ffc}(a)$, $S_{fmc}(a)$ and $S_{flc}(a)$ from TF2 in the same way.
We defined $S_l'(a)$ and $S_f'(a)$ to be the frequencies of the last name "a" and the first name "a", so these scores can be obtained from TL1 and TF1.
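The table construction can be sketched as follows. Assumptions: T1 and T2 are "merged" by adding their frequencies, the last-name and first-name tables (TL and TF) are already separated so the function is run once per table, and all counts are invented for illustration.

```python
from collections import Counter


def positional_scores(name_freq: Counter):
    """Turn a name -> frequency table into positional score tables.

    Names of length >= 2 (the TL2/TF2 part) add their frequency f to the
    first-, middle- and last-position scores of their characters; names of
    length 1 (the TL1/TF1 part) give the single-character score S'.
    """
    first, middle, last, single = Counter(), Counter(), Counter(), Counter()
    for name, f in name_freq.items():
        if len(name) == 1:
            single[name] += f
        elif len(name) >= 2:
            first[name[0]] += f
            for c in name[1:-1]:
                middle[c] += f
            last[name[-1]] += f
    return first, middle, last, single


# Hypothetical counts: T1 from the analysed corpus, T2 from the dictionary (f = 1).
t1 = Counter({"鈴木": 3, "中曽根": 1})
t2 = Counter({"鈴木": 1, "中谷": 1, "林": 1})
tl = t1 + t2  # assumption: "merging" sums the frequencies
first, middle, last, single = positional_scores(tl)
print(first["中"], middle["曽"], last["木"], single["林"])  # 2 1 4 1
```

Because two-character names have no middle character, they contribute nothing to the middle-position table, which is consistent with the empty LMC of "鈴木".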
Lastly, we explain the case in which $S_l'(a)$ or $S_f'(a)$ is equal to zero. In that case, $S_l(\alpha)$ or $S_f(\alpha)$ is basically defined to be zero. However, if the character sequence $\alpha$ has the form
last name + first name,
we used 10% of $S_l(\alpha)$ as $S_l(\alpha)$, and 10% of $S_f(\alpha)$ as $S_f(\alpha)$.
2.3 Use of morphological analysis results
The Person Name Construction Model tends to extract too many names from kanji word sequences. This occurs because the model measures only how much a sequence looks like a person's name, and this appearance is a weak indication of an actual person's name. Therefore, it is difficult to judge from these characteristics alone whether a given kanji word sequence is a person's name.
In this paper, we use the result of the morphological analysis together with the Person Name Construction Model. First, we apply the following heuristic:

H0: A morphological analysis error caused by an unregistered person name includes a kanji word whose length is 1.

For example, the first name "健四郎" is segmented into "/健/" and "/四郎/", but this segmentation is wrong. This morphological analysis error includes the kanji word "/健/", whose length is 1. Because most Japanese names are short, if a morphological analysis has incorrectly segmented a person's name, a kanji word with length 1 is almost always included.
By using the heuristic H0 and the dictionary, we can judge that a kanji word sequence is not a person's name. It should be noted that the heuristic H0 does not help us to judge that a kanji word sequence is a name. If we can judge that the given kanji word sequence is not a name, its score is zero; if we cannot judge, the score is obtained by using the Person Name Construction Model.
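A minimal sketch of the H0 filter follows. How H0 is combined with the dictionary is not spelled out in the text, so the rule in `h0_rules_out` (no length-1 word and no word registered as a person name means "not a name") is an assumption, as are the function names; `model_score` stands for any implementation of the Person Name Construction Model score.

```python
from typing import Callable, Iterable, Sequence


def h0_rules_out(words: Sequence[str], registered_names: Iterable[str]) -> bool:
    """Return True if H0 plus the dictionary says the sequence is not a name.

    Assumed reading: without a length-1 word there is no segmentation error
    from an unregistered name, so only already-registered names can occur.
    """
    names = set(registered_names)
    if any(len(w) == 1 for w in words):
        return False  # H0 cannot decide; fall back to the model score
    return not any(w in names for w in words)


def score_with_h0(
    words: Sequence[str],
    registered_names: Iterable[str],
    model_score: Callable[[Sequence[str]], float],
) -> float:
    """Score 0 when H0 rules the sequence out, otherwise defer to the model."""
    if h0_rules_out(words, registered_names):
        return 0.0
    return model_score(words)
```

For instance, "/学生/" contains no length-1 word and no registered name, so it scores 0 without consulting the model, while "/鈴木/健/四郎/" is passed on to the model.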
Next, for a kanji word sequence which includes a kanji word with length 1, we use the following heuristic:
H1: If a morphological analysis error caused by an unregistered person name includes a kanji word whose length is 2, this kanji word is a person's name.

In the above example, the morphological analysis error (segmentation into "/健/" and "/四郎/") for the first name "健四郎" includes the kanji word "/四郎/" with length 2, and this word is a person's name. The heuristic H1 may seem tenuous. However, we confirmed its effectiveness with the following experiment. First, we picked person names with length 3 from the dictionary. For each picked name with the character string $k_1 k_2 k_3$, we made the character strings $k_1 k_2$ and $k_2 k_3$, and checked whether $k_1 k_2$ or $k_2 k_3$ is a person's name. For 78.0% of the picked names, $k_1 k_2$ or $k_2 k_3$ turned out to be a person's name. This result supports the heuristic H1.
By using the heuristic H1, we can judge that a kanji word sequence is not a person's name. Again, note that the heuristic H1 cannot judge that a kanji word sequence is a person's name. If we can judge that the kanji word sequence is not a person's name, its score is zero; if we cannot judge it, the score is obtained by the Person Name Construction Model.
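Both uses of H1 described above, the validation experiment and the filtering step, can be sketched as follows. `h1_support` computes the statistic of the experiment over any name list; `h1_rules_out` encodes one possible reading of how H1 rejects a candidate (the exact check is not spelled out in the text). The toy dictionary is invented, so its ratio has nothing to do with the reported 78.0%.

```python
from typing import Iterable, Sequence


def h1_support(dictionary_names: Iterable[str]) -> float:
    """Fraction of 3-character names k1k2k3 for which k1k2 or k2k3 is a name."""
    names = set(dictionary_names)
    triples = [n for n in names if len(n) == 3]
    if not triples:
        return 0.0
    hits = sum(1 for n in triples if n[:2] in names or n[1:] in names)
    return hits / len(triples)


def h1_rules_out(words: Sequence[str], registered_names: Iterable[str]) -> bool:
    """Assumed reading of H1: a sequence with a length-1 word (a possibly
    mis-segmented name) and a length-2 word that is not a registered person
    name is judged not to be a person's name."""
    names = set(registered_names)
    if not any(len(w) == 1 for w in words):
        return False  # H1 only applies when a length-1 word is present
    return any(len(w) == 2 and w not in names for w in words)


# Hypothetical toy dictionary; only the three 3-character names are tested.
toy_dictionary = {"健四郎", "四郎", "金治郎", "治郎", "中曽根", "鈴木"}
print(round(h1_support(toy_dictionary), 3))  # 0.667 on this toy list
```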
Lastly, we use the following heuristic:
H2: "numeral word + suffix word" is not a person's name.
This pattern appears frequently; the kanji word sequence "/千/円/" is an example. We assume that such kanji word sequences are not person names.
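H2 can be sketched as a check on part-of-speech tags. The tag names `numeral` and `suffix` are hypothetical placeholders for whatever labels the morphological analyser emits, and accepting several numeral words before the suffix (as in /三/千/円/) is an extension assumed here, not stated in the text.

```python
from typing import Sequence, Tuple

# Hypothetical part-of-speech labels; a real system would map the
# morphological analyser's own tags onto these.
NUMERAL, SUFFIX = "numeral", "suffix"


def h2_rules_out(tagged: Sequence[Tuple[str, str]]) -> bool:
    """H2: 'numeral word(s) + suffix word' is not a person's name."""
    if len(tagged) < 2:
        return False
    *head, (_, last_pos) = tagged
    return last_pos == SUFFIX and all(pos == NUMERAL for _, pos in head)


print(h2_rules_out([("千", NUMERAL), ("円", SUFFIX)]))  # True
```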
2.4 Collection of revision errors
Even if we use the proposed model and the heuristics H0, H1 and H2, some kanji word sequences are wrongly judged to be person names. However, such wrong revision patterns are few, so we gathered the frequent revision errors in order to avoid them.
First, we performed morphological analysis on a part of the training corpus². Next, we revised the morphological analysis errors with our system. We collected the revised person names and made a frequency table for these names. Because the frequency of an ordinary person name is low, names with high frequency are regarded as wrong revisions. Through these experiments, we registered the following 10 phrases as non-names:

"日米", "対米", "対中", "国問", "花博", "各行", "一一極", "信金", "安門", "日債"

² 10% of the training corpus.
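The collection step can be sketched as a frequency count over the system's own revisions. The text does not say what counts as "high frequency", so `min_count` is a hypothetical threshold, and the revision log below is invented.

```python
from collections import Counter
from typing import Iterable


def frequent_revisions(revised_names: Iterable[str], min_count: int) -> set[str]:
    """Names revised at least min_count times are treated as wrong revisions."""
    counts = Counter(revised_names)
    return {name for name, c in counts.items() if c >= min_count}


def drop_non_names(candidates: Iterable[str], non_names: set[str]) -> list[str]:
    """Remove registered non-names from the extracted person-name candidates."""
    return [c for c in candidates if c not in non_names]


# Hypothetical revision log from running the system on part of the corpus.
log = ["日米", "健四郎", "日米", "対米", "日米", "対米", "午良"]
non_names = frequent_revisions(log, min_count=2)
print(sorted(non_names))                                      # ['対米', '日米']
print(drop_non_names(["日米", "健四郎", "大二郎"], non_names))  # ['健四郎', '大二郎']
```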
3 Experiment
To confirm that our proposed model is useful and effective, we picked 1,095 sentences from the beginning of newspaper articles³ and experimented with them. We performed a morphological analysis of these sentences using the JUMAN system⁴. Investigating the results of the morphological analysis, we found 51 errors (42 kinds) caused by unregistered person names. Our system revised 58 phrases (41 kinds) in the morphological analysis results, and a correct revision was made for 37 phrases (28 kinds). The precision rate was therefore 63.8% and the recall rate was 72.5%.
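As a check, the two rates follow from the counts above with the usual definitions (precision = correct revisions / all revisions, recall = correct revisions / all errors found); this short snippet only re-derives the reported figures.

```python
# Counts reported in this section.
errors_in_text = 51   # analysis errors caused by unregistered person names
revised = 58          # phrases revised by the system
correct = 37          # revisions that were right

precision = correct / revised      # share of revisions that were right
recall = correct / errors_in_text  # share of the errors that were fixed
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# precision = 63.8%, recall = 72.5%
```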
The corrections are shown in Table 3.

Table 3. Right revisions

kanji word sequence | correction
/吉村/午/良/知事/ | /吉村/午良/ (last name/first name)
/橋本/大/二郎/知事/ | /橋本/大二郎/ (last name/first name)
/小/渕/恵三/自民党/副総裁/ | /小渕/恵三/ (last name/first name)
/木/見/金治郎/門下/ | /木見/金治郎/ (last name/first name)
… | /米長/ (last name)
/沢/近/… | /沢近/ (last name)
… | /岩國/哲人/ (last name/first name)

³ Mainichi Shinbun '95 CD-ROM.
⁴ JUMAN is a standard Japanese morphological analysis system.

Our system could not detect 14 morphological analysis errors (13 kinds) caused by unregistered person names. We have classified the reasons for this into the following 4 types.

1. Segmentation of a registered word is wrong (2 errors, 2 kinds). These two kanji word sequences were segmented as follows:
   - /井上/雅/晶代/表/ (the right segmentation is /井上/雅晶/代表/)
   - /31/日田/嶌/徳弘/ (the right segmentation is /31/日/田嶌/徳弘/)
   The registered words ("/代表/" and "/日/") were also wrongly segmented, as shown above. Because our system assumes that there are no errors of this type, it cannot extract the name or revise this type of error.
2. Person's name is foreign (7 errors, 2 kinds). For example, "/鐘/仕/梅/" and "/王/文/煥/" are morphological analysis errors, but these names are Chinese or Korean names. Because our proposed model is based on heuristics founded on Japanese person names, it basically cannot detect this type of error.
3. Person's name is old (3 errors, 3 kinds). The three names are "/大橋/宗/桂/", "/本因坊/算/砂/" and "/算/砂/". Because we used current newspapers as the training corpus, it is difficult to estimate the model parameters for old Japanese names. These names are not covered by our model.
4. Person's name is very rare (2 errors, 2 kinds). These names are "/楠/部/彌/弍/" and "/田原/護/立/". Our model should revise these errors, but it was not able to do so.
Only the 4th error type reflects an inadequacy of our proposed model, so it is reasonable to conclude that our proposed model is effective.

Next, we classify the revision errors (19 errors, 16 kinds) into 4 types as follows.
- The detection that the given kanji word sequence includes a person's name is successful, but the revision fails (4 errors, 3 kinds). In the case of the morphological analysis error "/楠/部/彌/弍/", we revised "/楠/部/" to "/楠部/" (last name), but this is wrong. The right revision is "/楠部/彌弍/" (last name/first name). In this case, the system successfully detected that the kanji word sequence "/楠/部/彌/弍/" includes a person's name.
- An unregistered proper noun which is not a person's name is revised (9 errors, 7 kinds). In the case of the morphological analysis error "/星/島/日報/", we revised "/星/島/" to "/星島/" (last name). The kanji word sequence "/星/島/" is an unregistered proper noun, and the right segmentation is "/星島/". So our revision of the segmentation is effective, but the word "/星島/" is not a person's name.
- An unregistered proper noun which is not a person's name is detected, but the revision fails (5 errors, 5 kinds). In the case of the morphological analysis error "/油/麻/地/", we revised "/麻/地/" to "/麻地/" (last name). The kanji word sequence "/油/麻/地/" is an unregistered proper noun, and the right segmentation is "/油麻地/". We successfully detected that this kanji word sequence includes unregistered words, but failed to judge whether the unregistered word is a person's name, and the segmentation also failed.
- Correct results of morphological analysis were revised (1 error, 1 kind). For example, we revised the correct segmentation "/東/口/" to "/東口/" (last name).
Generally, without contextual information we cannot judge whether a proper noun is a person's name or not. Therefore, we cannot avoid the 2nd error type. Because the recognition of an unregistered word is itself useful in NLP systems, from this viewpoint only the 4th error type is regarded as a failure.
In conclusion, these results indicate that our proposed model is useful and effective.
4 Remarks
The aim of our system is the automatic revision of morphological analysis errors caused by unregistered person names. However, our system can also be used as an extraction system for person names. In recent years, information extraction systems have been actively researched [4]. In these systems, it is important to correctly extract person names from texts, and our system is useful for this task. The problems in extracting person names are classified into the following three types, each of which makes it difficult to extract names:
1. Morphological analysis errors caused by unregistered words. For example, the right segmentation for the character sequence "鈴木健四郎" is "/鈴木/健四郎/", but a morphological analysis wrongly segments it as "/鈴木/健/四郎/", because the first name "健四郎" is unregistered.
2. Assignment of the part of speech fails. For example, a morphological analysis correctly segments "細川正" as "/細川/正/", but the part of speech for "細川" is assigned as a general noun. This is wrong: the part of speech for "細川" should be person name.
3. A word is correctly judged to be a person's name by the morphological analysis, but the word is not a person's name in the context. For example, a morphological analysis correctly segments "松下塾" as "/松下/塾/", and the part of speech for "松下" is correctly assigned as a person's name. However, in information extraction, the word "松下" should not be extracted as a person's name, because the phrase "/松下/塾/" is an organization name.
Our system can be useful in solving the first problem. The 2nd and 3rd problems cannot be solved without contextual information. Contextual information is also useful for the 1st problem. However, as mentioned in the introduction, even a method using contextual information needs to judge whether the given word sequence is a person's name or not, and our model can be used together with any method that uses contextual information. Improving the module which judges whether a given word sequence is a person's name therefore also improves the extraction system for person names.
A weakness of our system is that the scores are defined by a heuristic method; ideally, the scores should be defined by probability. However, it is unclear how our scores correspond to probabilities, and how those probabilities should be determined. A definition of the score based on frequency, as in our system, is simple, and we think that further consideration of this aspect will improve our system.
Our model deals with Japanese names and not foreign names. However, foreign names expressed in kanji characters are almost always Chinese or Korean names. There are a limited number of Chinese and Korean last names, and there is a heuristic that the length of the last name is 1 and the length of the first name is 2 [2]. We therefore believe that it is relatively easy to recognize unregistered Chinese and Korean names in Japanese texts.
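The heuristic cited from [2] can be illustrated with a small sketch. The surname set below is a hypothetical, deliberately incomplete placeholder, and the function only checks the "length 1 + length 2" shape; it is not a recognizer proposed in this paper.

```python
from typing import Optional, Set, Tuple

# Hypothetical, deliberately tiny surname list; a real system would need a
# complete list of Chinese and Korean single-character surnames.
SURNAMES: Set[str] = {"王", "李", "金", "張", "朴"}


def split_cjk_foreign_name(
    chars: str, surnames: Set[str] = SURNAMES
) -> Optional[Tuple[str, str]]:
    """Apply the 'last name length 1, first name length 2' heuristic.

    Returns (last name, first name) if the string fits the pattern and its
    first character is a known surname, otherwise None.
    """
    if len(chars) == 3 and chars[0] in surnames:
        return chars[0], chars[1:]
    return None


print(split_cjk_foreign_name("王文煥"))  # ('王', '文煥')
```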
5 Conclusion
In this paper, we presented a method to automatically revise morphological analysis errors caused by unregistered person names. The main part of our method is the module that gives a word sequence a score indicating the degree to which it appears to be a person's name. To implement this module, we proposed the Person Name Construction Model, which applies a heuristic rule to the kanji characters composing Japanese names. Through the experiment, we have shown that our proposed model is effective and useful. The remaining problem of our revision system is how to define the scores; introducing probabilities may be effective for this. This is our future work.
Acknowledgments
We used the Nikkei Shinbun CD-ROM '90 and the Mainichi Shinbun CD-ROM '95 as the corpus. The Nihon Keizai Shinbun company and the Mainichi Shinbun company gave us permission to use their collections. We appreciate the assistance granted by both companies.
References
1. Bikel, D., Miller, S., Schwartz, R. and Weischedel, R.: "Nymble: a High-Performance Learning Name-finder", Proc. of ANLP-97, pp. 194-201 (1997).
2. Chen, H-H. and Lee, J-C.: "Identification and Classification of Proper Nouns in Chinese Texts", Proc. of COLING-96, pp. 418-424 (1996).
3. Chen, K-J. and Liu, S-H.: "Word Identification for Mandarin Chinese Sentences", Proc. of COLING-92, pp. 101-107 (1992).
4. Grishman, R. and Sundheim, B.: "Message Understanding Conference-6: A Brief History", Proc. of COLING-96, pp. 446-471 (1996).
5. Sekine, S., Grishman, R. and Shinnou, H.: "A Decision Tree Method for Finding and Classifying Names in Japanese Texts", Proc. of WVLC-6, to appear (1998).
6. Strzalkowski, T. and Wang, J.: "A Self-Learning Universal Concept Spotter", Proc. of COLING-96, pp. 931-936 (1996).
7. Wakao, T., Gaizauskas, R. and Wilks, Y.: "Evaluation of an Algorithm for the Recognition and Classification of Proper Names", Proc. of COLING-96, pp. 418-424 (1996).
8. Wang, L-J., Li, W-C. and Chang, C-H.: "Recognizing Unregistered Names for Mandarin Word Identification", Proc. of COLING-92, pp. 1239-1243 (1992).