Finite-State Technology and Linguistic Applications
![Page 1: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/1.jpg)
Beesley 2001
Finite-State Technology and Linguistic Applications
12-16 March 2001
Xerox Research Centre Europe, Grenoble Laboratory
6, chemin de Maupertuis, 38240 MEYLAN, France
Kenneth BEESLEY
![Page 2: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/2.jpg)
Ken Beesley: Brief Introduction
B.A., Linguistics and Computer Science, Brigham Young University, 1978
Diploma, Linguistics and Phonetics, Univ. of Glasgow, 1979
D.Phil., “Epistemics” (Cognitive Science), Univ. of Edinburgh, 1983
ALPNET, computer assisted translation, 1984-1990
1988-1990 Arabic morphology project, exposure to Finite-State Morphology from Lauri Karttunen at COLING 1988
Microlytics (Xerox spinoff), 1990-1993
Xerox Corporation 1993-present
Morphology projects: Arabic, Spanish, Portuguese, Italian, Dutch, (Malay), (Aymara); also teaching finite-state programming techniques
Some people are into finite-state programming for the mathematics and algorithms; I’m in it because it lets me build working systems for interesting natural languages.
![Page 3: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/3.jpg)
Goals for the Week
• Introduce finite-state theory
• Introduce the Xerox Finite-State “Calculus”, a practical software implementation of the theory: xfst, lexc
• Try to convince you that finite-state natural-language processing is a Good Thing
• The Hope: Inspire a few of you to start your own computational projects, perhaps on Maltese
Finite-state techniques are widely used today in both research and industry for natural-language processing. The software implementations and documentation are improving steadily, and they are increasingly available to all of us.
![Page 4: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/4.jpg)
Schedule
Monday 12 March, 17.00-19.00, LC1117: Gentle Introduction
Tuesday 13, 17.00-19.30, Unix Lab: Intro. to xfst
Wednesday 14, 10.00-12.30, Unix Lab: More on xfst
Thursday 15, 17.00-19.30, Unix Lab: Intro. to lexc
Friday 16, 18.30-20.00, 403 CCT: Linguistics Circle
![Page 5: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/5.jpg)
Today’s Goals
• Understand “Regular” Languages and Relations.
• Understand the mathematical operations that can be performed on such Languages and Relations.
• Understand how Languages, Relations, Regular Expressions, and Networks are interrelated.
• Understand that we can create finite-state networks and compute with them using Xerox Finite-State Technology.
• xfst interface
– Regular-Expression Compiler
– Access to Finite-State Algorithms
• lexc language
– Used mainly for lexicons and for describing morphotactics
![Page 6: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/6.jpg)
Why is “Finite State” Computing So Interesting?
• Finite-state systems are mathematically elegant, easily manipulated and modifiable.
• Computationally efficient. Usually very compact.
• The programming we linguists do is declarative. We describe the facts of our natural language; i.e. we write grammars. We do not hack ad hoc code.
• The runtime code, which applies our systems to linguistic input, is already written and it is completely language-independent.
• Finite-state systems are inherently bidirectional: we can use the same system to analyze and to generate.
![Page 7: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/7.jpg)
What is Finite-State Computing Good For?
Mostly “lower-level” natural language processing:
• Tokenization
• Spelling checking/correction
• Phonology
• Morphological Analysis/Generation (emphasis this week)
• Part-of-Speech Tagging
• “Shallow” Syntactic Parsing and “Chunking”
Finite-state techniques cannot do everything; but for tasks where they do apply, they are extremely attractive.
![Page 8: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/8.jpg)
Where is Xerox Finite-State Technology Used?
Xerox Research
• Xerox Palo Alto Research Center
• Xerox Research Centre Europe
Xerox Business Units and Partners
• ATS
• MKMS
• Inxight
Universities and Research Groups
• Over 70 licensees
We would like to make Xerox technology the de facto standard.
![Page 9: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/9.jpg)
The Gentle Introduction
• Chapter 1 of The Book
• Physical Finite-State Machines (Automata)
• Linguistic Finite-State Machines
– Symbol
– Alphabet
– Language
• Lookup and Generation
• Quick Review of Set Theory
• Languages, Relations and Transducers
![Page 10: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/10.jpg)
Physical Machines with Finite States
The Lightswitch Machine
[Diagram: two states, OFF and ON. The PUSH UP arc leads from OFF to ON; the PUSH DOWN arc leads from ON to OFF.]
![Page 11: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/11.jpg)
Physical Machines with Finite States
The Lightswitch Toggle Machine
[Diagram: two states, OFF and ON. One PUSH arc leads from OFF to ON; a second PUSH arc leads from ON back to OFF.]
![Page 12: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/12.jpg)
Physical Machines with Finite States
The Fan in Ken’s Old Car
[Diagram: four states, OFF, LOW, MED and HI. Three R arcs step up one setting at a time; three L arcs step back down.]
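The fan can be written down as a transition table. The following is a minimal Python sketch, not part of the original slides. It assumes the states are ordered OFF, LOW, MED, HI, that R steps up and L steps down, and that an input with no arc leaves the state unchanged. The names `FAN` and `run` are hypothetical.

```python
# Transition table for the fan: (current state, input) -> next state.
# The state ordering and the direction of R/L are read off the diagram;
# treating a missing arc as "stay put" is an assumption of this sketch.
FAN = {
    ("OFF", "R"): "LOW",
    ("LOW", "R"): "MED",
    ("MED", "R"): "HI",
    ("HI", "L"): "MED",
    ("MED", "L"): "LOW",
    ("LOW", "L"): "OFF",
}


def run(table, start, inputs):
    """Follow one arc per input symbol and return the state reached."""
    state = start
    for symbol in inputs:
        state = table.get((state, symbol), state)
    return state


print(run(FAN, "OFF", "RRL"))  # OFF -> LOW -> MED -> LOW
```

The machine has a finite number of states and its behavior is fully described by the table; nothing else needs to be remembered between inputs.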
![Page 13: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/13.jpg)
Physical Machines with Finite States
Three-Way Lightswitch
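[Diagram: four states, OFF, LOW, MED and HI. Three R arcs step up one setting at a time; a fourth R arc leads from HI back to OFF.]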
![Page 14: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/14.jpg)
The Cola Machine
• Need to enter 25 cents (USA) to get a drink
• Accepts the following coins:
– Nickel = 5 cents
– Dime = 10 cents
– Quarter = 25 cents
• For simplicity, our machine needs exact change
• We will model only the coin-accepting mechanism
![Page 15: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/15.jpg)
Physical Machines with Finite States
The Cola Machine
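[Diagram: six states labeled 0, 5, 10, 15, 20 and 25. State 0 is the Start State and state 25 is the Final/Accept State. Each N arc advances by 5, each D arc advances by 10, and a single Q arc leads from 0 directly to 25.]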
![Page 16: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/16.jpg)
The Cola Machine Language
• List of all the sequences of coins accepted: Q, DDN, DND, NDD, DNNN, NDNN, NNDN, NNND, NNNNN
• Think of the coins as SYMBOLS or CHARACTERS
• The set of symbols accepted is the ALPHABET of the machine
• Think of sequences of coins as WORDS or “strings”
• The set of words accepted by the machine is its LANGUAGE
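The Cola Machine's language is small enough to enumerate by program. The sketch below is an illustration in Python, not part of the original slides. It names each state by the number of cents entered so far and rejects any coin that would overshoot 25. The names `accepts` and `language` are hypothetical.

```python
from itertools import product

COINS = {"N": 5, "D": 10, "Q": 25}  # the machine's alphabet
START, FINAL = 0, 25                # states are named by cents entered so far


def accepts(word):
    """Return True if the coin sequence leads from the start to the final state."""
    state = START
    for coin in word:
        if coin not in COINS:
            return False
        state += COINS[coin]
        if state > FINAL:           # no such arc: the machine needs exact change
            return False
    return state == FINAL


def language(max_len=5):
    """Enumerate every accepted word; five nickels is the longest possible one."""
    words = []
    for length in range(1, max_len + 1):
        for combo in product("NDQ", repeat=length):
            word = "".join(combo)
            if accepts(word):
                words.append(word)
    return words


print(language())
```

The enumeration yields exactly the nine words listed on the slide, though not necessarily in the same order.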
![Page 17: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/17.jpg)
Linguistic Machines
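[Diagram: a network whose paths spell the words canto, tigre and mesa, one symbol per arc. The input string mesa is shown being applied (“Apply”) to the network.]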
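A linguistic machine of this kind can be sketched in Python as a letter network with one arc per symbol. This is an illustration only: it builds a simple trie, which is an assumption of the sketch rather than the minimized network a finite-state compiler would produce, and the names `build_network` and `apply_word` are hypothetical.

```python
def build_network(words):
    """Build a letter network (a trie) with one arc per symbol."""
    start = {"arcs": {}, "final": False}
    for word in words:
        state = start
        for symbol in word:
            state = state["arcs"].setdefault(symbol, {"arcs": {}, "final": False})
        state["final"] = True
    return start


def apply_word(network, word):
    """Return True if the word follows arcs from the start to a final state."""
    state = network
    for symbol in word:
        if symbol not in state["arcs"]:
            return False
        state = state["arcs"][symbol]
    return state["final"]


NETWORK = build_network(["canto", "tigre", "mesa"])
print(apply_word(NETWORK, "mesa"))  # True
print(apply_word(NETWORK, "mes"))   # False: the path stops before a final state
```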
![Page 18: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/18.jpg)
More Linguistic Machines
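[Diagram: a letter network with arcs labeled c, l, e, a, e, v, r, e.]
[Diagram: A Transducer. One path pairs the upper-side symbols m e s a +Noun +Fem +Pl with the lower-side symbols m e s a 0 0 s. “Apply Up” maps the surface string mesas to the analysis mesa+Noun+Fem+Pl; “Apply Down” maps mesa+Noun+Fem+Pl to mesas.]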
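The transducer path above can be sketched in Python as a list of (upper, lower) symbol pairs. This is a simplified illustration, not how xfst works internally: it compares whole paths instead of walking arcs, it assumes the slide's 0 stands for the empty symbol, and the names `apply_up`, `apply_down` and `side` are hypothetical.

```python
EPSILON = "0"  # the slide writes the empty symbol as 0

# The single path shown on the slide, as (upper, lower) symbol pairs.
MESAS_PATH = [
    ("m", "m"), ("e", "e"), ("s", "s"), ("a", "a"),
    ("+Noun", EPSILON), ("+Fem", EPSILON), ("+Pl", "s"),
]
PATHS = [MESAS_PATH]


def side(path, index):
    """Spell out one side of a path (0 = upper, 1 = lower), skipping epsilons."""
    return "".join(pair[index] for pair in path if pair[index] != EPSILON)


def apply_up(surface):
    """Map a surface (lower-side) string to its analysis strings."""
    return [side(path, 0) for path in PATHS if side(path, 1) == surface]


def apply_down(analysis):
    """Map an analysis (upper-side) string to its surface strings."""
    return [side(path, 1) for path in PATHS if side(path, 0) == analysis]


print(apply_up("mesas"))               # ['mesa+Noun+Fem+Pl']
print(apply_down("mesa+Noun+Fem+Pl"))  # ['mesas']
```

The same data serves both directions, which is the sense in which a transducer is bidirectional.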
![Page 19: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/19.jpg)
A Morphological Analyzer
[Diagram: a Transducer relating the Analysis Word Language to the Surface Word Language.]
![Page 20: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/20.jpg)
A Quick Review of Set Theory
A set is a collection of objects.
[Diagram: a set containing the objects A, B, D and E.]
We can enumerate the “members” or “elements” of finite sets: { A, D, B, E }.
There is no significant order in a set, so { A, D, B, E } is the same set as { E, A, D, B }, etc.
![Page 21: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/21.jpg)
Uniqueness of Elements
You cannot have two or more ‘A’ elements in the same set.
[Diagram: a set containing the objects A, B, D and E.]
{ A, A, D, B, E } is just a redundant specification of the set { A, D, B, E }.
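Python's built-in `set` type behaves the same way, so the two properties (no significant order, no repeated elements) can be checked directly. A short sketch, with the variable names `a`, `b` and `c` chosen only for illustration:

```python
a = {"A", "D", "B", "E"}
b = {"E", "A", "D", "B"}       # same members, different order
c = {"A", "A", "D", "B", "E"}  # 'A' listed twice

print(a == b)  # True: order is not significant
print(a == c)  # True: the second 'A' is redundant
print(len(c))  # 4
```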
![Page 22: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/22.jpg)
Cardinality of Sets
The Empty Set: { }
A Finite Set: e.g. { Norway, Denmark, Sweden }
An Infinite Set: e.g. the set of all positive integers
![Page 23: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/23.jpg)
Simple Operations on Sets: Union
Set 1: { A, B, C }
Set 2: { D, E }
Union of Set 1 and Set 2: { A, B, C, D, E }
![Page 24: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/24.jpg)
Simple Operations on Sets (2): Union
Set 1: { A, B, C }
Set 2: { C, D }
Union of Set 1 and Set 2: { A, B, C, D }
![Page 25: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/25.jpg)
Simple Operations on Sets (3): Intersection
Set 1: { A, B, C }
Set 2: { C, D }
Intersection of Set 1 and Set 2: { C }
![Page 26: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/26.jpg)
Simple Operations on Sets (4): Subtraction
Set 1: { A, B, C }
Set 2: { C, D }
Set 1 minus Set 2: { A, B }
![Page 27: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/27.jpg)
Formal Languages
Very Important Concept in Formal Language Theory:
A Language is just a Set of Words.
• We use the terms “word” and “string” interchangeably.
• A Language can be empty, have finite cardinality, or be infinite in size.
• You can union, intersect and subtract languages, just like any other sets.
![Page 28: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/28.jpg)
Union of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { elephant, mouse }
Union of Language 1 and Language 2: { dog, cat, rat, elephant, mouse }
![Page 29: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/29.jpg)
Intersection of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { elephant, mouse }
Intersection of Language 1 and Language 2: { } (the empty language)
![Page 30: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/30.jpg)
Intersection of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { rat, mouse }
Intersection of Language 1 and Language 2: { rat }
![Page 31: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/31.jpg)
Subtraction of Languages (Sets)
Language 1: { dog, cat, rat }
Language 2: { rat, mouse }
Language 1 minus Language 2: { dog, cat }
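Because a finite language is just a set of strings, the union, intersection and subtraction examples above can be reproduced with Python sets. This is a sketch for illustration; the slides call both second languages "Language 2", so the name `language_3` is introduced here only to keep the two apart.

```python
language_1 = {"dog", "cat", "rat"}

# Disjoint languages: the union keeps every word, the intersection is empty.
language_2 = {"elephant", "mouse"}
print(sorted(language_1 | language_2))
print(language_1 & language_2 == set())  # True

# Overlapping languages share the word "rat".
language_3 = {"rat", "mouse"}
print(language_1 & language_3)           # {'rat'}
print(sorted(language_1 - language_3))   # ['cat', 'dog']
```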
![Page 32: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/32.jpg)
Languages
• A language is a set of words (=strings).
• Words (strings) are composed of symbols (letters) that are “concatenated” together.
• At another level, words are composed of “morphemes”.
• In most natural languages, we concatenate morphemes together to form whole words.
For sets consisting of words (i.e. for Languages), the operation of concatenation is very important.
![Page 33: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/33.jpg)
Concatenation of Languages
Root Language: { work, talk, walk }
Suffix Language: { 0, ing, ed, s }
The concatenation of the Suffix language after the Root language: { work, working, worked, works, talk, talking, talked, talks, walk, walking, walked, walks }
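Concatenating two languages means appending every word of the second to every word of the first. A minimal Python sketch follows, assuming the slide's 0 denotes the empty string; the function name `concatenate` is hypothetical.

```python
def concatenate(first, second):
    """Every word of `first` followed by every word of `second`."""
    return {x + y for x in first for y in second}


roots = {"work", "talk", "walk"}
suffixes = {"", "ing", "ed", "s"}  # "" stands in for the slide's 0

words = concatenate(roots, suffixes)
print(sorted(words))
print(len(words))  # 12
```

Three roots and four suffixes give twelve words, matching the list on the slide.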
![Page 34: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/34.jpg)
Languages and Networks
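[Diagram: Network/Language 1 has paths for the roots work, talk and walk, sharing arcs where the letters coincide. Network/Language 2 has paths for the suffixes 0, s, ed and ing. A third network shows the concatenation of Network 1 and Network 2.]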
![Page 35: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/35.jpg)
Grammars, Languages, Networks
[Diagram: a Grammar written in xfst or lexc Describes a Language or Relation and Compiles Into a Finite-State Network; the network is used to Recognize the language or Map the relation.]
In the coming days, we will learn how to write xfst and lexc grammars and compile them into working systems.
![Page 36: Finite-State Technology and Linguistic Applications](https://reader035.vdocuments.mx/reader035/viewer/2022070406/5681427a550346895daea0ed/html5/thumbnails/36.jpg)
Tasks/Exercises
• Read Chapter 1, at least up to page 28.
• Do Exercises 1.10.1 (page 34) and 1.10.2 (page 36).
• For more rigor, read Chapter 2. Do the graphing exercise in Appendix B (page 381).