![Page 1: Morphology - Carnegie Mellon School of Computer …tbergkir/11711fa17/morphology-F17.pdfmorphology is a solved problem (as long as you can afford to write rules by hand). •Finite](https://reader030.vdocuments.mx/reader030/viewer/2022040121/5eb2d07f6850034e3511d37d/html5/thumbnails/1.jpg)
Morphology
11-711 Algorithms for NLP, 21 November 2017 – Part I
(Some slides from Lori Levin, David Mortenson)
Types of Lexical and Morphological Processing
• Tokenization
  • Input: raw text
  • Output: sequence of tokens normalized for further processing
• Recognition
  • Input: a string of characters
  • Output: is it a legal word? (yes or no)
• Morphological Parsing
  • Input: a word
  • Output: an analysis of the structure of the word
• Morphological Generation
  • Input: an analysis of the structure of the word
  • Output: a word
But first: What is a word?
• The things that are in the dictionary?
  • But how did the lexicographers decide what to put in the dictionary?
• The things between spaces and punctuation?
• The smallest unit that can be uttered in isolation?
  • You could say this word in isolation: Unimpressively
  • This one too: impress
  • But you probably wouldn’t say these in isolation, unless you were talking about morphology:
    • un
    • ive
    • ly
So what is a word?
• Can get pretty tricky:
  • didn’t
  • would’ve
  • gonna
  • shoulda woulda coulda
  • Ima
  • blackboard (vs. school board)
  • baseball (vs. golf ball)
  • the person who left’s hat; Jim and Gregg’s apartment
  • acct.
  • LTI
About 1000 pages. $139.99
You don’t have to read it.
The point is that it takes 1000 pages just to survey the issues related to what words are.
So what is a word?
• It is up to you or the software you use for processing words.
• Take linguistics classes.
• Make good decisions in software design and engineering.
Tokenization
Tokenization
Input: raw text
Output: sequence of tokens normalized for easier processing.
Tokenization
• Some Asian languages have obvious issues: [example text garbled in source]
• But German too: noun-noun compounds: Gesundheitsversicherungsgesellschaften = Gesundheits-versicherungs-gesellschaften (health insurance companies)
• Spanish clitics: Darmelo = Dar-me-lo (to give me it)
• Even English has issues, to a smaller degree: Gregg and Bob’s house
Tokenization
Input: raw text
Dr. Smith said tokenization of English is “harder than you’ve thought.” When in New York, he paid $12.00 a day for lunch and wondered what it would be like to work for AT&T or Google, Inc.
Output from Stanford Parser (http://nlp.stanford.edu:8080/parser/index.jsp) with part-of-speech tags:
Dr./NNP Smith/NNP said/VBD tokenization/NN of/IN English/NNP is/VBZ ``/`` harder/JJR than/IN you/PRP 've/VBP thought/VBN ./. ''/'' When/WRB in/IN New/NNP York/NNP ,/, he/PRP paid/VBD $/$ 12.00/CD a/DT day/NN for/IN lunch/NN and/CC wondered/VBD what/WP it/PRP would/MD be/VB like/JJ to/TO work/VB for/IN AT&T/NNP or/CC Google/NNP ,/, Inc./NNP ./.
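The difficulties in this example (abbreviations, clitics, currency, names such as AT&T) can be illustrated with a small rule-based tokenizer. This is only a sketch under stated assumptions, not the Stanford tokenizer: the names `ABBREVIATIONS`, `TOKEN_RE`, and `tokenize` are hypothetical, the abbreviation list is a toy, and the output does not reproduce every Stanford convention (for example, it does not rewrite quotation marks).

```python
import re

# Hypothetical abbreviation list; a real tokenizer needs a far larger one.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Inc."}

TOKEN_RE = re.compile(r"""
    \$                          # currency sign as its own token
  | \d+(?:\.\d+)?               # numbers such as 12.00
  | [A-Za-z]+(?:&[A-Za-z]+)+    # names such as AT&T
  | [A-Za-z]+\.                 # word plus period: abbreviation or sentence end
  | ['’][A-Za-z]+               # clitics such as 've
  | [A-Za-z]+                   # ordinary words
  | \S                          # any other single non-space character
""", re.VERBOSE)


def tokenize(text):
    """Split raw text into tokens, keeping known abbreviations intact."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        # A word followed by a period is split unless it is a known abbreviation.
        if len(tok) > 1 and tok.endswith(".") and tok not in ABBREVIATIONS:
            tokens.extend([tok[:-1], "."])
        else:
            tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(tokenize("Dr. Smith paid $12.00 for lunch at AT&T."))
```

The abbreviation check is the design decision that matters here: without a lexicon of abbreviations, a period cannot be classified as sentence-final or word-internal.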
Morphological Phenomena
What is Linguistic Morphology?
• Morphology is the study of the internal structure of words.
• Derivational morphology. How new words are created from existing words.
  • [grace]
  • [[grace]ful]
  • [un[[grace]ful]]
• Inflectional morphology. How features relevant to the syntactic context of a word are marked on that word.
  • This example illustrates number (singular and plural) and tense (present and past).
  • Green indicates irregular. Blue indicates zero marking of inflection. Red indicates regular inflection.
  • This student walks.
  • These students walk.
  • These students walked.
• Compounding. Creating new words by combining existing words.
  • With or without spaces: surfboard, golf ball, blackboard
Morphemes
• Morphemes. Minimal pairings of form and meaning.
• Roots. The “core” of a word that carries its basic meaning.
  • apple: ‘apple’
  • walk: ‘walk’
• Affixes (prefixes, suffixes, infixes, and circumfixes). Morphemes that are added to a base (a root or stem) to perform either derivational or inflectional functions.
  • un-: ‘NEG’
  • -s: ‘PLURAL’
Language Typology
Types of Languages:
• In order of morphological complexity:
  • Isolating (or Analytic)
  • Fusional (or Inflecting)
  • Agglutinative
  • Polysynthetic
  • Others
Isolating Languages: Chinese
Little morphology other than compounding
• Chinese inflection
  • few affixes (prefixes and suffixes):
    • 们 mén: 我们 wǒmén, 你们 nǐmén, 他们 tāmén, 同志们 tóngzhìmén
      plural: we, you (pl.), they, comrades/LGBT people
    • “suffixes” that mark aspect: 着 -zhe ‘continuous aspect’
• Chinese derivation
  • 艺术家 yìshùjiā ‘artist’
• Chinese is a champion in the realm of compounding: up to 80% of Chinese words are actually compounds.
  毒 dú ‘poison, drug’ + 贩 fàn ‘vendor’ → 毒贩 dúfàn ‘drug trafficker’
Agglutinative Languages: Swahili
Verbs in Swahili have an average of 4-5 morphemes (http://wals.info/valuesets/22A-swa).

| Swahili | English |
|---|---|
| m-tu a-li-lala | ‘The person slept’ |
| m-tu a-ta-lala | ‘The person will sleep’ |
| wa-tu wa-li-lala | ‘The people slept’ |
| wa-tu wa-ta-lala | ‘The people will sleep’ |

• Words are written without hyphens or spaces between morphemes.
• Orange prefixes (m-/wa- on the noun, a-/wa- on the verb) mark noun class (like gender, except Swahili has nine instead of two or three).
  • Verbs agree with nouns in noun class.
  • Adjectives also agree with nouns.
  • Very helpful in parsing.
• Black prefixes (li- past, ta- future) indicate tense.
Turkish
Example of extreme agglutination
But most Turkish words have around three morphemes
uygarlaştıramadıklarımızdanmışsınızcasına
“(behaving) as if you are among those whom we were not able to civilize”
uygar “civilized” + laş “become” + tır “cause to” + ama “not able” + dık past participle + lar plural + ımız first person plural possessive (“our”) + dan ablative case (“from/among”) + mış past + sınız second person plural (“y’all”) + casına finite verb → adverb (“as if”)
Operationalization
• operate (opus/opera + ate)
• ion
• al
• ize
• ate
• ion
Fusional Languages: Spanish

| | Singular 1st | Singular 2nd | Singular 3rd (formal 2nd) | Plural 1st | Plural 2nd | Plural 3rd |
|---|---|---|---|---|---|---|
| Present | am-o | am-as | am-a | am-a-mos | am-áis | am-an |
| Imperfect | am-ab-a | am-ab-as | am-ab-a | am-áb-a-mos | am-ab-ais | am-ab-an |
| Preterit | am-é | am-aste | am-ó | am-a-mos | am-asteis | am-aron |
| Future | am-aré | am-arás | am-ará | am-are-mos | am-aréis | am-arán |
| Conditional | am-aría | am-arías | am-aría | am-aría-mos | am-aríais | am-arían |
Polysynthetic Languages: Yupik
• Polysynthetic morphologies allow the creation of full “sentences” by morphological means.
• They often allow the incorporation of nouns into verbs.
• They may also have affixes that attach to verbs and take the place of nouns.
• Yupik Eskimo
  tuntu-ssur-qatar-ni-ksaite-ngqiggte-uq
  reindeer-hunt-FUT-say-NEG-again-3SG.INDIC
  ‘He had not yet said again that he was going to hunt reindeer.’
Root-and-Pattern Morphology: Arabic
• Root-and-pattern. A special kind of fusional morphology found in Arabic, Hebrew, and their cousins.
• Root usually consists of a sequence of consonants.
• Words are derived and, to some extent, inflected by patterns of vowels intercalated among the root consonants.
  • kitaab ‘book’
  • kaatib ‘writer; writing’
  • maktab ‘office; desk’
  • maktaba ‘library’
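The intercalation of a consonantal root into a vowel pattern can be sketched in a few lines. The template notation below (a `C` for each root-consonant slot) and the names `interdigitate` and `TEMPLATES` are assumptions made for this illustration; the four output words are the ones listed above.

```python
def interdigitate(root, template):
    """Fill each 'C' slot of the template with the next root consonant."""
    consonants = list(root)
    out = []
    for ch in template:
        if ch == "C":
            if not consonants:
                raise ValueError("template has more slots than the root has consonants")
            out.append(consonants.pop(0))
        else:
            out.append(ch)  # vowels and affix material are copied as-is
    return "".join(out)


# Hypothetical templates chosen to reproduce the forms on the slide.
TEMPLATES = {
    "CiCaaC": "book",
    "CaaCiC": "writer; writing",
    "maCCaC": "office; desk",
    "maCCaCa": "library",
}

if __name__ == "__main__":
    for template, gloss in TEMPLATES.items():
        print(interdigitate("ktb", template), "-", gloss)
```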
Other Non-Concatenative Morphological Processes
Non-concatenative morphology involves operations other than the concatenation of affixes with bases.
• Infixation. A morpheme is inserted inside another morpheme instead of before or after it.
• Reduplication. Can be prefixing, suffixing, and even infixing.
  • Tagalog:
    • sulat (write, imperative)
    • susulat (reduplication) (write, future)
    • sumulat (infixing) (write, past)
    • sumusulat (infixing and reduplication) (write, present)
• Apophony, including the umlaut in English tooth → teeth; subtractive morphology, including the truncation in English nickname formation (David → Dave); and so on.
• Tone change; stress shift. And more...
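The Tagalog forms can be reproduced with two string operations. This is a simplified sketch that assumes the stem begins with a single consonant followed by a vowel, as sulat does; the function names are hypothetical and the rules are not a full account of Tagalog.

```python
def reduplicate(stem):
    """Copy the initial consonant-vowel pair: sulat -> susulat."""
    return stem[:2] + stem


def infix_um(stem):
    """Insert -um- after the initial consonant: sulat -> sumulat."""
    return stem[0] + "um" + stem[1:]


if __name__ == "__main__":
    stem = "sulat"
    print(reduplicate(stem))             # reduplication only
    print(infix_um(stem))                # infixation only
    print(infix_um(reduplicate(stem)))   # reduplication, then infixation
```

Neither operation is a simple prefix or suffix, which is why these processes need extra machinery in a concatenation-oriented finite-state setup.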
Type-Token Curves
[Figure: type-token curves for English, Arabic, Hocąk, Iñupiaq, and Finnish. x-axis: tokens (0 to 10,000); y-axis: types (0 to 6,000).]
Finnish is agglutinative.
Iñupiaq is polysynthetic.
Types and Tokens: “I like to walk. I am walking now. I took a long walk earlier too.”
The type walk occurs twice. So there are two tokens of the type walk.
Walking is a different type that occurs once.
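The type and token counts for this example, and the points of a type-token curve, can be computed as follows. This is a sketch that assumes lowercasing and a letters-only tokenizer; the function name `type_token_curve` is hypothetical.

```python
import re
from collections import Counter

TEXT = "I like to walk. I am walking now. I took a long walk earlier too."

# Assumption: tokens are maximal runs of letters, compared case-insensitively.
tokens = re.findall(r"[a-z]+", TEXT.lower())
counts = Counter(tokens)


def type_token_curve(tokens):
    """Return (tokens seen so far, distinct types seen so far) after each token."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve


if __name__ == "__main__":
    print("tokens:", len(tokens), "types:", len(counts))
    print("walk:", counts["walk"], "walking:", counts["walking"])
    print(type_token_curve(tokens)[-1])
```

A morphologically rich language keeps adding new types as more tokens are read, so its curve climbs more steeply.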
Morphological Processing
Recognizing the words of a language
• Input: a string (from some alphabet)
• Output: is it a legal word? (yes or no)
FSA for English Nouns
Lexicon:
Note: “fox” becomes plural by adding “es” not “s”. We will get to that later.
Finite-State Automaton
• $Q$: a finite set of states
• $q_0 \in Q$: a special start state
• $F \subseteq Q$: a set of final states
• $\Sigma$: a finite alphabet
• Transitions: $q_i \xrightarrow{\;s\;} q_j$, with $s \in \Sigma^*$
• Encodes a set of strings that can be recognized by following paths from $q_0$ to some state in $F$.
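A recognizer built on this definition can be sketched directly, with string-labeled transitions as the definition allows. The state names, the toy lexicon, and the function `accepts` are assumptions for illustration; "fox" is left out on purpose because its plural needs the spelling rule discussed later.

```python
START = "q0"
FINAL = {"q1", "q2"}

# state -> list of (label, next state); labels are strings over the alphabet.
TRANSITIONS = {
    "q0": [
        ("cat", "q1"), ("dog", "q1"),       # regular noun stems
        ("goose", "q2"), ("geese", "q2"),   # irregular singular and plural
    ],
    "q1": [("s", "q2")],                    # regular plural suffix
}


def accepts(string, state=START):
    """True if some path from `state` consumes the whole string and ends in a final state."""
    if string == "" and state in FINAL:
        return True
    for label, target in TRANSITIONS.get(state, []):
        if string.startswith(label) and accepts(string[len(label):], target):
            return True
    return False


if __name__ == "__main__":
    for word in ["cat", "cats", "geese", "gooses", "catss"]:
        print(word, accepts(word))
```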
FSA for English Adjectives
But note that this accepts words like “unbig”.
Big, bigger, biggest
Happy, happier, happiest, happily
Unhappy, unhappier, unhappiest, unhappily
Clear, clearer, clearest, clearly
Unclear, unclearly
Cool, cooler, coolest, coolly
Red, redder, reddest
Real, unreal, really
FSA for English Derivational Morphology
How big do these automata get? Reasonable coverage of a language takes an expert about two to four months.
What does it take to be an expert? Study linguistics to get used to all the common and not-so-common things that happen, and then practice.
Morphological Parsing
Input: a word
Output: the word’s stem(s) and features expressed by other morphemes.
Example:
geese → goose +N +Pl
gooses → goose +V +3P +Sg
dog → {dog +N +Sg, dog +V}
leaves → {leaf +N +Pl, leave +V +3P +Sg}
Upper Side/Lower Side
talk+Past (upper side or underlying form)
↕ FST
talked (lower side or surface form)
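Finite State Transducers
• $Q$: a finite set of states
• $q_0 \in Q$: a special start state
• $F \subseteq Q$: a set of final states
• $\Sigma$ and $\Delta$: two finite alphabets
• Transitions: $q_i \xrightarrow{\;s:t\;} q_j$, with $s \in \Sigma^*$ and $t \in \Delta^*$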
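A transducer of this form can be sketched as a table of arcs labeled with upper:lower string pairs. Swapping the two sides of every arc gives a machine for the opposite direction, so one traversal function serves both generation and analysis. The arc table, state names, and the functions `transduce` and `invert` are assumptions for illustration, and labels are assumed non-empty on the input side so the recursion terminates.

```python
# state -> list of (upper label, lower label, next state)
TRANSITIONS = {
    "q0": [("talk", "talk", "q1"), ("walk", "walk", "q1")],
    "q1": [("+Past", "ed", "q2")],
}
FINAL = {"q1", "q2"}


def transduce(string, transitions, final, state="q0"):
    """Return every output string obtained by reading `string` on the input side."""
    results = []
    if string == "" and state in final:
        results.append("")
    for inp, out, target in transitions.get(state, []):
        if string.startswith(inp):
            for rest in transduce(string[len(inp):], transitions, final, target):
                results.append(out + rest)
    return results


def invert(transitions):
    """Swap the input and output labels on every arc."""
    return {
        state: [(out, inp, target) for inp, out, target in arcs]
        for state, arcs in transitions.items()
    }


if __name__ == "__main__":
    print(transduce("talk+Past", TRANSITIONS, FINAL))      # generation
    print(transduce("talked", invert(TRANSITIONS), FINAL))  # analysis
```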
Morphological Parsing with FSTs
Note “same symbol” shorthand.
^ denotes a morpheme boundary.
# denotes a word boundary.
English Spelling
Getting back to fox+s = foxes
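The E Insertion Rule as a FST
$$\epsilon \rightarrow e \;/\; \begin{Bmatrix} s \\ x \\ z \end{Bmatrix}\ \text{\textasciicircum}\ \underline{\hspace{1em}}\ s\#$$
Generate a normally spelled word from an abstract representation of the morphemes:
Input: fox^s# (fox^εs#)
Output: foxes# (foxεes#)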
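The generation direction of the e-insertion rule can be approximated with a regular expression substitution. This is a sketch of the rule's effect, not a compiled transducer; the function name `generate` is hypothetical.

```python
import re


def generate(intermediate):
    """Apply e-insertion, then delete morpheme boundaries: fox^s# -> foxes#."""
    # Insert e after the boundary when s, x, or z precedes it and s# follows it.
    with_e = re.sub(r"([sxz])\^(?=s#)", r"\1^e", intermediate)
    # The morpheme boundary ^ has no surface realization.
    return with_e.replace("^", "")


if __name__ == "__main__":
    for form in ["fox^s#", "cat^s#", "buzz^s#"]:
        print(form, "->", generate(form))
```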
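The E Insertion Rule as a FST
$$\epsilon \rightarrow e \;/\; \begin{Bmatrix} s \\ x \\ z \end{Bmatrix}\ \text{\textasciicircum}\ \underline{\hspace{1em}}\ s\#$$
Parse a normally spelled word into an abstract representation of the morphemes:
Input: foxes# (foxεes#)
Output: fox^s# (fox^εs#)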
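Run in the parsing direction, the spelling rule alone is ambiguous: several intermediate forms yield the same surface string, and a lexicon is needed to choose among them. The sketch below makes that concrete by proposing candidate intermediate forms, keeping those that regenerate the surface form, and then filtering by a toy lexicon. `LEXICON`, `generate`, and `parse` are hypothetical names, and only the plural suffix is considered.

```python
import re

LEXICON = {"fox", "cat", "dog"}  # assumed toy lexicon of noun stems


def generate(intermediate):
    """E-insertion followed by boundary deletion: fox^s# -> foxes#."""
    with_e = re.sub(r"([sxz])\^(?=s#)", r"\1^e", intermediate)
    return with_e.replace("^", "")


def candidates(surface):
    """Intermediate forms that the spelling rule maps onto `surface`."""
    word = surface.rstrip("#")
    options = [word + "#"]                     # no morpheme boundary
    if word.endswith("s"):
        options.append(word[:-1] + "^s#")      # boundary, nothing inserted
    if word.endswith("es"):
        options.append(word[:-2] + "^s#")      # boundary, inserted e undone
    return [c for c in options if generate(c) == surface]


def parse(surface):
    """Keep only the candidates whose stem is in the lexicon."""
    return [c for c in candidates(surface)
            if c.rstrip("#").split("^")[0] in LEXICON]


if __name__ == "__main__":
    print(candidates("foxes#"))
    print(parse("foxes#"))
```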
Combining FSTs
[Figure: combined FSTs; the two directions of application are labeled “parse” and “generate”.]
FST Operations
Input: fox+N+pl
Output: foxes#
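The mapping from fox+N+pl to foxes# can be viewed as two stages applied in sequence: a lexicon stage producing an intermediate form with boundary symbols, and a spelling stage producing the surface form. The sketch below models each stage as a plain function and composes them; real FST toolkits compose the machines themselves into a single transducer. The lexicon, the `sg` tag, and all function names are assumptions for illustration.

```python
import re

NOUNS = {"fox", "cat", "dog"}  # assumed toy lexicon


def lexical_to_intermediate(analysis):
    """Lexicon stage: fox+N+pl -> fox^s#."""
    stem, *tags = analysis.split("+")
    if stem not in NOUNS:
        raise ValueError(f"unknown stem: {stem}")
    if tags == ["N", "pl"]:
        return stem + "^s#"
    if tags == ["N", "sg"]:
        return stem + "#"
    raise ValueError(f"unsupported tags: {tags}")


def intermediate_to_surface(intermediate):
    """Spelling stage: e-insertion, then boundary deletion."""
    with_e = re.sub(r"([sxz])\^(?=s#)", r"\1^e", intermediate)
    return with_e.replace("^", "")


def compose(first, second):
    """Feed the output of `first` into `second`."""
    return lambda x: second(first(x))


generate = compose(lexical_to_intermediate, intermediate_to_surface)

if __name__ == "__main__":
    print(generate("fox+N+pl"))
    print(generate("cat+N+pl"))
```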
Language Type Comparison wrt FSTs
• Morphologies of all types can be analyzed using finite state methods.
• Some present more challenges than others:
  • Analytic languages. Trivial, since there is little or no morphology (other than compounding).
  • Agglutinating languages. Straightforward: finite state morphology was “made” for languages like this.
  • Polysynthetic languages. Similar to agglutinating languages, but with blurred lines between morphology and syntax.
  • Fusional languages. Easy enough to analyze using finite state methods as long as one allows “morphemes” to have lots of simultaneous meanings and one is willing to employ some additional tricks.
  • Root-and-pattern languages. Require some very clever tricks.
Stemming (“Poor Man’s Morphology”)
Input: a word
Output: the word’s stem (approximately)
Examples from the Porter stemmer:
• -sses → -ss
• -ies → -i
• -ss → -ss
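The suffix rules above belong to the first step (step 1a) of Porter's published algorithm, which tries the longest matching suffix first and also includes a final rule that deletes a plain -s. A minimal sketch of that single step follows; the function name `step_1a` is hypothetical, and the later steps of the stemmer, which produce stems such as nobl, are not implemented here.

```python
def step_1a(word):
    """Porter step 1a: plural-like suffixes, longest match first."""
    if word.endswith("sses"):
        return word[:-2]   # -sses -> -ss   (caresses -> caress)
    if word.endswith("ies"):
        return word[:-2]   # -ies  -> -i    (ponies -> poni)
    if word.endswith("ss"):
        return word        # -ss   -> -ss   (caress -> caress)
    if word.endswith("s"):
        return word[:-1]   # -s    -> ""    (cats -> cat)
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats", "nods"]:
        print(w, "->", step_1a(w))
```

Checking the -ss case before the -s case is what keeps words such as caress from losing their final letter.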
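| Word | Stem |
|---|---|
| no | no |
| noah | noah |
| nob | nob |
| nobility | nobil |
| nobis | nobi |
| noble | nobl |
| nobleman | nobleman |
| noblemen | noblemen |
| nobleness | nobl |
| nobler | nobler |
| nobles | nobl |
| noblesse | nobless |
| noblest | noblest |
| nobly | nobli |
| nobody | nobodi |
| noces | noce |
| nod | nod |
| nodded | nod |
| nodding | nod |
| noddle | noddl |
| noddles | noddl |
| noddy | noddi |
| nods | nod |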
The Good News
• More than almost any other problem in computational linguistics, morphology is a solved problem (as long as you can afford to write rules by hand).
• Finite state methods provide a simple and powerful means of generating and analyzing words (as well as the phonological alternations that accompany word formation/inflection).
• Finite state morphology is one of the great successes of natural language processing.
• One brilliant aspect of using FSTs for morphology: the same code can handle both analysis and generation.