cmsc 723: computational linguistics i session #3 finite...

67
Finite-State Morphology CMSC 723: Computational Linguistics I Session #3 Jimmy Lin Jimmy Lin The iSchool University of Maryland Wednesday, September 16, 2009

Upload: others

Post on 11-Sep-2019

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Finite-State MorphologyCMSC 723: Computational Linguistics I ― Session #3

Jimmy LinJimmy LinThe iSchoolUniversity of Maryland

Wednesday, September 16, 2009

Page 2: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Today’s AgendaComputational tools

Regular expressionsFinite-state automata (deterministic vs. non-deterministic)Finite-state transducers

Overview of morphological processesOverview of morphological processes

Computational morphology with finite-state methods

Page 3: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular ExpressionsA metalanguage for specifying simple classes of strings

Very useful in searching and matching text strings

Everyone does it!Implementations in the shell, Perl, Java, Python, …

Page 4: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular ExpressionsBasic regular expressions/happy/ → happy/[abcd]/ → a, b, c, d/[a-d]/ → a, b, c, d/[^a-d]/ → e, f, g, … z/[ a d]/ e, f, g, … z/[Tt]he/ → the, The/(dog|cat)/ → dog, cat

Special metacharacters/colou?r/ → color, colour/oo*h!/ → oh! ooh! oooh!/oo h!/ → oh!, ooh!, oooh!, …/oo+h!/ → ooh!, oooh!, ooooh!, …/beg.n/ → began, begin, begun, begbn, …

Page 5: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

NLP* with Regular ExpressionsTranscript with Eliza, simulation of a Rogerian psychotherapist (Weizenbaum, 1966)

User: Men are all alike

ELIZA: IN WHAT WAY

User: They’re always bugging us about something or other

ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?

User: Well, my boyfriend made me come here

ELIZA: YOUR BOYFRIEND MADE YOU COME HERE

User: He says I’m depressed much of the timeUser: He says I m depressed much of the time

ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

Page 6: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

How did it work?.* all .* → IN WHAT WAY

.* always .* → CAN YOU THINK OF A SPECIFIC EXAMPLE

.* I’m (depressed|sad) .* → I AM SORRY TO HEAR YOU ARE \1

.* I’m (depressed|sad) .* → WHY DO YOU THINK YOU ARE \1?

Page 7: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Aside…What is intelligence?

What does Eliza tell us about intelligence?at does a te us about te ge ce

Page 8: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Equivalence RelationsWe can say the following

Regular expressions describe a regular languageRegular expressions can be implemented by finite-state automataRegular languages can be generated by regular grammars

So what?So what?

RegularLanguages

Regular ExpressionsFinite-State Automata

Languages

Regular Grammars

Page 9: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Sheeptalk!

baa!b !

Language:

R l E ibaaa!baaaa!baaaaa!...

/baa+!/Regular Expression:

Finite State Automaton:b a a !

Finite-State Automaton:

q0 q1 q2 q3 q4

a

Page 10: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Finite-State AutomataWhat are they?

What do they do?at do t ey do

How do they work?

Page 11: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: What are they?Q: a finite set of N states

Q = {q0, q1, q2, q3, q4}The start state: q0

The set of final states: F = {q4}

Σ: a finite input alphabet of symbolsΣ: a finite input alphabet of symbolsΣ = {a, b, !}

δ(q i): transition functionδ(q,i): transition function Given state q and input symbol i, return new state q'δ(q3,!) → q4

q0 q1 q2 q3 q4

b a a !

q0 q1 q2 q3 q4

a

Page 12: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: State Transition Table

InputState b a !State b a !

0 1 ∅ ∅

1 ∅ 2 ∅1 ∅ 2 ∅2 ∅ 3 ∅3 ∅ 3 43 ∅ 3 44 ∅ ∅ ∅

q0 q1 q2 q3 q4

b a a !

q0 q1 q2 q3 q4

a

Page 13: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: What do they do?Given a string, a FSA either rejects or accepts it

ba! → rejectbaa! → acceptbaaaz! → rejectbaaaa! → acceptbaaaa! acceptbaaaaaa! → acceptbaa → rejectmoooo rejectmoooo → reject

What does this have to do with NLP?Think grammaticality!Think grammaticality!

Page 14: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: How do they work?

q0 q1 q2 q3 q3 q4

b a a a ! ACCEPT

b a a !

q0 q1 q2 q3 q4

a

Page 15: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: How do they work?

q0 q1 q2

b a ! ! ! REJECT

b a a !

q0 q1 q2 q3 q4

a

Page 16: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

D-RECOGNIZE

Page 17: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Accept or Generate?Formal languages are sets of strings

Strings composed of symbols drawn from a finite alphabet

Finite-state automata define formal languages Without having to enumerate all the strings in the language

Two views of FSAs:Acceptors that can tell you if a string is in the languageGenerators to produce all and only the strings in the languageGenerators to produce all and only the strings in the language

Page 18: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Simple NLP with FSAs

Page 19: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Introducing Non-DeterminismDeterministic vs. Non-deterministic FSAs

Epsilon (ε) transitions

Page 20: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Using NFSAs to Accept StringsWhat does it mean?

Accept: there exist at least one path (need not be all paths)Reject: no paths exist

General approaches:Backup: add markers at choice points, then possibly revisit unexplored arcs at marked choice pointLook-ahead: look ahead in input to provide cluesParallelism: look at alternatives in parallel

Recognition with NFSAs as search through state space( )Agenda holds (state, tape position) pairs

Page 21: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

ND-RECOGNIZE

Page 22: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

ND-RECOGNIZE

Page 23: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

State OrderingsStack (LIFO): depth-first

Queue (FIFO): breadth-firstQueue ( O) b eadt st

Page 24: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

ND-RECOGNIZE: Example

ACCEPT

Page 25: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

What’s the point?NFSAs and DFSAs are equivalent

For every NFSA, there is a equivalent DFSA (and vice versa)

Equivalence between regular expressions and FSAEasy to show with NFSAs

Why use NFSAs?

Page 26: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular Language: Definition∅ is a regular language

a Σ ε, {a} is a regular language a ε, {a} s a egu a a guage

If L1 and L2 are regular languages, then so are:L1 · L2 = {x y | x L1 , y L2 }, the concatenation of L1 and L2L1 L2 {x y | x L1 , y L2 }, the concatenation of L1 and L2

L1 L2, the union or disjunction of L1 and L2

L1 , the Kleene closure of L1

Page 27: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular Languages: Starting Points

Page 28: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular Languages: Concatenation

Page 29: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular Languages: Disjunction

Page 30: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Regular Languages: Kleene Closure

Page 31: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Finite-State Transducers (FSTs)A two-tape automaton that recognizes or generates pairs of strings

Think of an FST as an FSA with two symbol strings on each arc

One symbol string from each tape

Page 32: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Four-fold view of FSTsAs a recognizer

As a generators a ge e ato

As a translator

As a set relaterAs a set relater

Page 33: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Summary: Computational ToolsRegular expressions

Finite-state automata (deterministic vs. non-deterministic)te state auto ata (dete st c s o dete st c)

Finite-state transducers

Page 34: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Computational MorphologyDefinitions and problems

What is morphology?Topology of morphologies

Computational morphologyFinite-state methods

Page 35: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

MorphologyStudy of how words are constructed from smaller units of meaning

Smallest unit of meaning = morphemefox has morpheme foxcats has two morphemes cat and –sNote: it is useful to distinguish morphemes from orthographic rules

Two classes of morphemes:Two classes of morphemes:Stems: supply the “main” meaningAffixes: add “additional” meaning

Page 36: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Topology of MorphologiesConcatenative vs. non-concatenative

Derivational vs. inflectionale at o a s ect o a

Regular vs. irregular

Page 37: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Concatenative MorphologyMorpheme+Morpheme+Morpheme+…

Stems (also called lemma, base form, root, lexeme):Ste s (a so ca ed e a, base o , oot, e e e)hope+ing → hopinghop+ing → hopping

Affixes:Prefixes: AntidisestablishmentarianismSuffixes: AntidisestablishmentarianismSuffixes: Antidisestablishmentarianism

Agglutinative languages (e.g., Turkish)uygarlaştıramadıklarımızdanmışsınızcasına →uygarlaştıramadıklarımızdanmışsınızcasına →uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casınaMeaning: behaving as if you are among those whom we could not cause to become civilizedcause to become civilized

Page 38: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Non-Concatenative MorphologyInfixes (e.g., Tagalog)

hingi (borrow)humingi (borrower)

Circumfixes (e.g., German)sagen (say)gesagt (said)

Reduplication (e g Motu spoken in Papua New Guinea)Reduplication (e.g., Motu, spoken in Papua New Guinea)mahuta (to sleep)mahutamahuta (to sleep constantly)mamahuta (to sleep, plural)

Page 39: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Templatic MorphologiesCommon in Semitic languages

Roots and patternsoots a d patte s

تك ב תכArabic Hebrewب تك ב תכ

و? ??َم ו? ??

بوكتم בוכת

maktuub ktuuvmaktuubwritten

ktuuvwritten

Page 40: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Derivational MorphologyStem + morpheme →

Word with different meaning or different part of speechExact meaning difficult to predict

Nominalization in English: -ation: computerization, characterization-ee: appointee, advisee-er: killer, helper

Adjective formation in English:-al: computational, derivational-less: clueless, helpless-able: teachable, computable

Page 41: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Inflectional MorphologyStem + morpheme →

Word with same part of speech as the stem

Adds: tense, number, person,…

Plural morpheme for English nouncat+sdog+s

Progressive form in English verbswalk+ingrain+ingrain+ing

Page 42: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Noun Inflections in EnglishRegular

cat/catsdog/dogs

Irregularmouse/miceox/oxengoose/geese

Page 43: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Verb Inflections in English

Page 44: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Verb Inflections in Spanish

Page 45: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Morphological ParsingComputationally decompose input forms into component morphemes

Components needed:A lexicon (stems and affixes)A model of how stems and affixes combineOrthographic rules

Page 46: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Morphological Parsing: ExamplesWORD STEM (+FEATURES)*

cats cat +N +PLcats cat

cat cat +N +SG

cities city +N +PLcities city +N +PL

geese goose +N +PL

ducks (duck +N +PL) or (duck +V +3SG)

merging merge +V +PRES-PART

caught (catch +V +PAST-PART) or (catch +V +PAST)

Page 47: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Different ApproachesLexicon only

Rules onlyu es o y

Lexicon and rulesfinite-state automatafinite state automatafinite-state transducers

Page 48: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Lexicon-onlySimply enumerate all surface forms and analyses

So what’s the problem?So at s t e p ob e

When might this be useful?

acclaim acclaim $N$acclaim acclaim $V+0$acclaimed acclaim $V+ed$acclaimed acclaim $V+en$acclaiming acclaim $V+ing$acclaims acclaim $N+s$acclaims acclaim $V+s$acclamation acclamation $N$

$ $acclamations acclamation $N+s$acclimate acclimate $V+0$acclimated acclimate $V+ed$acclimated acclimate $V+en$

$ $acclimates acclimate $V+s$acclimating acclimate $V+ing$

Page 49: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Rule-only: Porter StemmerCascading set of rules

ational → ate (e.g., reational → relate)ing → ε (e.g., walking → walk)sses → ss (e.g., grasses → grass)……

Examplescities → citicity→ citigeneralizations → generalization ge e a at o→ generalize → general → gener

Page 50: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Porter Stemmer: What’s the Problem?Errors…

Why is it still useful?

Page 51: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Lexicon + RulesFSA: for recognition

Recognize all grammatical input and only grammatical input

FST: for analysisIf grammatical, analyze surface form into component morphemesOtherwise, declare input ungrammatical

Page 52: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: English Noun MorphologyLexicon

i l i l lreg-noun irreg-pl-noun irreg-sg-noun pluralfoxcat

geesesheep

goosesheep

-s

R le

dog mice mouse

Note problem with orthography!Rule Note problem with orthography!

Page 53: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: English Noun Morphology

Page 54: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: English Verb Morphology

reg-verb-stem

irreg-verb-stem

irreg-past-verb

past past-part

pres-part

3sg

Lexicon

stem stem verb part partwalkfrytalk

cutspeakspoken

caughtateeaten

-ed -ed -ing -s

impeach singsang

R leRule

Page 55: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: English Adjectival MorphologyExamples:

big, bigger, biggestsmaller, smaller, smallesthappy, happier, happiest, happilyunhappy, unhappier, unhappiest, unhappilyunhappy, unhappier, unhappiest, unhappily

Morphemes:Roots: big, small, happy, etc.Affixes: un-, -er, -est, -ly

Page 56: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: English Adjectival Morphology

adj root : {happy real }adj-root1: {happy, real, …}adj-root2: {big, small, …}

Page 57: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSA: Derivational Morphology

Page 58: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Morphological Parsing with FSTsLimitation of FSA:

Accepts or rejects an input… but doesn’t actually provide an analysis

Use FSTs instead!One tape contains the input the other tape as the analysisOne tape contains the input, the other tape as the analysisWhat if both tapes contain symbols?What if only one tape contains symbols?

Page 59: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

TerminologyTransducer alphabet (pairs of symbols):

a:b = a on the upper tape, b on the lower tapea:ε = a on the upper tape, nothing on the lower tapeIf a:a, write a for shorthand

Special symbolsSpecial symbols# = word boundary^ = morpheme boundary(For now, think of these as mapping to ε)

Page 60: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FST for English NounsFirst try:

What’s the problem here?

Page 61: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FST for English Nouns

Page 62: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Handling Orthography

Page 63: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Complete Morphological Parser

Page 64: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

FSTs and Ambiguityunionizable

union +ize +ableun+ ion +ize +able

assessassess +Vass +N +essN

Page 65: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Optimizations

Page 66: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

Practical NLP ApplicationsIn practice, it is almost never necessary to write FSTs by hand…

Typically, one writes rules:Chomsky and Halle Notation: a → b / c__d

rewrite a as b when occurs between c and d= rewrite a as b when occurs between c and dE-Insertion rule

xε → e /

xsz

^ __ s #

Rule → FST compiler handles the rest…

Page 67: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics

What we covered today…Computational tools

Regular expressionsFinite-state automata (deterministic vs. non-deterministic)Finite-state transducers

Overview of morphological processesOverview of morphological processes

Computational morphology with finite-state methods

One final question: is morphology actually finite state?