Book Reviews
Finite-State Language Processing
Emmanuel Roche and Yves Schabes (editors)
(Teragram Corporation)
Cambridge, MA: The MIT Press (The MIT Press series in Language, Speech,
and Communication), 1997,
xvii + 464 pp; hardbound, ISBN 0-262-18182-7, $45.00
Reviewed by Mario José Cáccamo and Tomasz Kowaltowski
University of Campinas, Brazil
1. Introduction
Finite-state automata have been widely used in computer science since the field's beginnings.
For a long time, however, the NLP community preferred tools based on more powerful
formalisms. One reason was Chomsky's argument that finite-state devices cannot represent
natural language structures, especially those involving center embedding (recursion).
Finite-state automata were first introduced to NLP as tools for the efficient computational
implementation of large vocabularies and lexicons. The excellent results achieved in that
area prompted interest in applying finite-state formalisms to other fields of computational
linguistics.
Finite-State Language Processing is a collection of 15 papers by 21 authors on
state-of-the-art finite-state models. It contains papers reporting research on morphology,
lexicon construction, (surface) parsing, part-of-speech tagging, phonetic conversion,
information retrieval, and speech recognition. Although many of the authors currently work
at American research centers, most of the contributions come from the European NLP school.
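The reviewers mention that automata first entered NLP as a compact, fast representation for large vocabularies. The Python sketch below illustrates the idea under simple assumptions: a toy word list is stored as an acyclic deterministic automaton built as a prefix tree (trie), and lookup runs the automaton over the input. The class name `TrieLexicon` and the word list are hypothetical, and the construction is deliberately unminimized: it shares common prefixes but not common suffixes.

```python
from typing import Dict, List, Set


class TrieLexicon:
    """Acyclic deterministic automaton for a finite word list (a prefix tree, not minimized)."""

    def __init__(self, words: List[str]) -> None:
        # transitions[state] maps an input character to the next state; state 0 is the start state
        self.transitions: List[Dict[str, int]] = [{}]
        self.final: Set[int] = set()
        for word in words:
            self._add(word)

    def _add(self, word: str) -> None:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                # create a fresh state for an unseen continuation of this prefix
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final.add(state)  # the state reached after the whole word is accepting

    def accepts(self, word: str) -> bool:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                return False  # no transition: the word is not in the lexicon
            state = nxt
        return state in self.final

    def num_states(self) -> int:
        return len(self.transitions)


lexicon = TrieLexicon(["walk", "walks", "walked", "talk", "talks"])
print(lexicon.accepts("walked"))  # True
print(lexicon.accepts("walke"))   # False
print(lexicon.num_states())       # 13
```

Real automaton lexicons are usually also minimized, which merges states with identical futures (for example, the states reached after walks, walked, and talks, from which only the empty string leads to acceptance); that step is omitted here.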
The preface, the 15 chapters, and the index make up a book of 464 pages. Each chapter is
an independent paper on a specific topic. Most of the papers require a reasonable background
in computational linguistics and in computer science.
Chapter 1 presents a general introduction to the theory of finite-state automata and
transducers. The order of the remaining chapters does not seem to follow any specific
criterion, although some related chapters are consecutive. The book includes a list of
contributors with their affiliations and addresses, and a short index of terms.
2. Contents
Chapter 1, written by the editors of the book, introduces most of the concepts needed to
read the book, though it does assume some previous knowledge of the subject. The most
valuable aspect of this chapter is its presentation of finite-state concepts through NLP
examples.
Computational Linguistics Volume 24, Number 4
Chapters 2 and 6 deal with morphological processing. In Chapter 2, David Clemenceau
addresses the well-known problems of derivational morphology and the opposition between
lexical methods (electronic dictionaries) and heuristic methods (derivation rules). He
advocates combining the two approaches. However, one of his arguments, about
"unrealistically" large dictionaries, seems to be losing ground owing to recent advances
in technology. This chapter also presents MORPHO, a morphological analyzer based on
Koskenniemi's two-level model. In Chapter 6, Max Silberztein adopts a similar approach to
the lexical analysis of natural language, especially in the presence of compound words.
Chapter 3, by Kimmo Koskenniemi, discusses the relevance of the two-level morphology model
to finite-state calculus and how the ideas behind two-level morphology can be extended to
the design of Finite-State Intersection Grammars (FSIGs).
In Chapter 4, Lauri Karttunen extends the calculus of regular expressions with the replace
operator, another tool for capturing a common operation in NLP: string replacement. The
theoretical aspects behind the operator are clarified and illustrated with examples. The
relation between the replace operator and both rewrite rules and two-level rules is also
discussed.
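Karttunen's operator is defined as a regular relation and compiled into a transducer. As a rough illustration of what such a relation computes, here is a plain-Python sketch of one simple regime, assumed for illustration: obligatory, left-to-right, longest-match replacement of a finite set of strings, computed directly on strings rather than by building a transducer. The function name `replace_longest_match` and the dictionary encoding of rules are hypothetical; the chapter's operator is more general (for example, it allows contexts and arbitrary regular languages on the left-hand side).

```python
from typing import Dict


def replace_longest_match(text: str, rules: Dict[str, str]) -> str:
    """Scan left to right; at each position rewrite the longest key of `rules` found there.

    Assumes every key is non-empty. Characters that no key matches are copied unchanged.
    """
    longest = max((len(k) for k in rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # try candidate lengths from longest to shortest so the longest match wins
        for n in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in rules:
                out.append(rules[piece])
                i += n  # replaced material is not rescanned
                break
        else:
            out.append(text[i])  # no key starts here: copy one character
            i += 1
    return "".join(out)


print(replace_longest_match("aab", {"a": "x", "aa": "Y"}))          # Yb
print(replace_longest_match("banana", {"an": "AN", "ana": "ANA"}))  # bANAna
```

Because "aa" is preferred over "a" at position 0, the first example yields "Yb" rather than "xxb"; a conditional or composed replacement, as treated in the chapter, is beyond this sketch.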
In Chapter 5, Fernando C. N. Pereira and Rebecca N. Wright propose an algorithm to compute
finite-state approximations of context-free grammars. The authors' underlying interest is
the efficient computational implementation of phrase structure grammars. The approximation
algorithm accepts any context-free grammar as input and returns a finite-state automaton
that recognizes all the sentences generated by the grammar, and possibly some more. The
approximation is proved to be exact if the grammar is left-linear or right-linear. This is
not surprising, as right-linear and left-linear grammars describe exactly the languages
recognized by finite-state automata.
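The remark about linear grammars can be made concrete. A right-linear grammar has rules of the form A -> a B, A -> a, or A -> epsilon, and it maps directly onto a nondeterministic finite-state automaton whose states are the nonterminals. The sketch below assumes single-character terminals and rejects unit rules A -> B to stay short; the dictionary encoding of the grammar and the function names are hypothetical. It shows only this easy, exact case, not Pereira and Wright's approximation algorithm for arbitrary context-free grammars.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: grammar[A] lists A's right-hand sides as (terminal, next_nonterminal).
# ("a", "B") is A -> a B, ("a", None) is A -> a, and ("", None) is A -> epsilon.
Grammar = Dict[str, List[Tuple[str, Optional[str]]]]
FINAL = "<final>"  # extra accepting state that rules of the form A -> a lead to


def to_nfa(grammar: Grammar) -> Tuple[Dict[Tuple[str, str], Set[str]], Set[str]]:
    """Nonterminals become states; each rule A -> a B becomes a transition from A to B on a."""
    delta: Dict[Tuple[str, str], Set[str]] = {}
    accepting: Set[str] = {FINAL}
    for lhs, rules in grammar.items():
        for terminal, rhs in rules:
            if terminal == "" and rhs is None:
                accepting.add(lhs)  # A -> epsilon makes A an accepting state
            elif terminal != "":
                target = rhs if rhs is not None else FINAL
                delta.setdefault((lhs, terminal), set()).add(target)
            else:
                raise ValueError("unit rules A -> B are not handled in this sketch")
    return delta, accepting


def accepts(grammar: Grammar, start: str, word: str) -> bool:
    delta, accepting = to_nfa(grammar)
    current: Set[str] = {start}
    for symbol in word:
        # track every state the nondeterministic automaton could be in
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)


# S -> a S | b T ;  T -> b T | epsilon   (the language a*b+)
g: Grammar = {"S": [("a", "S"), ("b", "T")], "T": [("b", "T"), ("", None)]}
print(accepts(g, "S", "aabb"), accepts(g, "S", "aa"))  # True False
```

A left-linear grammar can be handled symmetrically by reversing its rules and the input, which is why the two families together coincide with the finite-state languages.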
A finite-state implementation of Eric Brill's part-of-speech tagger is presented in
Chapter 7 by the editors of the book. Their construction is rule-based, as opposed to the
usual stochastic approach, and produces an efficient deterministic transducer without
sacrificing the quality of the results. This tagger is faster than the original one
devised by Brill.
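Brill's tagger first assigns each word an initial tag and then applies an ordered list of contextual transformation rules. The sketch below applies rules of a single hypothetical template, "change tag X to Y when the previous tag is Z", one pass per rule, and reads each pass's contexts from the tags as they stood before that pass (a simplifying assumption). It is not the editors' construction: their contribution is to compile the whole rule sequence into one deterministic transducer that tags in a single pass, which this rule-by-rule loop does not do.

```python
from typing import List, Tuple

# Hypothetical rule template: (from_tag, to_tag, previous_tag), read as
# "change from_tag to to_tag when the preceding word is tagged previous_tag".
Rule = Tuple[str, str, str]


def apply_rules(tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply the rules in order, one full pass over the sentence per rule."""
    tags = list(tags)  # do not modify the caller's list
    for from_tag, to_tag, prev_tag in rules:
        before = list(tags)  # contexts are read from the tags as they were before this pass
        for i in range(1, len(tags)):
            if before[i] == from_tag and before[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


# "to race the race": both occurrences of "race" are assumed to start with tag NN.
initial = ["TO", "NN", "DT", "NN"]
print(apply_rules(initial, [("NN", "VB", "TO")]))  # ['TO', 'VB', 'DT', 'NN']
```

With n rules this loop scans the sentence n times; collapsing those passes into one deterministic transducer is the source of the speed-up the reviewers report.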
The application of transducers is also the central theme of Chapters 8, 12, and 14. In
Chapter 8, Emmanuel Roche shows how to use transducers to build parsers for context-free
grammars and for even more complex linguistic phenomena. Chapter 12 is devoted to the
computational manipulation of sequential transducers (transducers that are deterministic
on the input). In this chapter, Mehryar Mohri presents the theory of sequential transducers
and briefly discusses their applications to phonology, morphology, lexicon construction,
syntax, and speech processing. In Chapter 14, Eric Laporte proposes a solution to phonetic
conversion based on transducers and bimachines (devices that read the input both from left
to right and from right to left). Implementation details of BiPho, a phonetic conversion
system, are also discussed.
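The bimachines mentioned above can be sketched as follows, assuming the standard definition: a left automaton reads the input from left to right, a right automaton reads it from right to left, and the output for each input symbol depends on that symbol and on the states both automata are in at that position. The toy rule, writing an intervocalic s as z (as in casa), is a hypothetical example and is not taken from BiPho; the function names are likewise illustrative.

```python
from typing import Callable

VOWELS = set("aeiou")


def bimachine_run(word: str,
                  left_delta: Callable[[str, str], str], left_start: str,
                  right_delta: Callable[[str, str], str], right_start: str,
                  output: Callable[[str, str, str], str]) -> str:
    n = len(word)
    # left_states[i]: state of the left automaton after reading word[:i] left to right
    left_states = [left_start]
    for ch in word:
        left_states.append(left_delta(left_states[-1], ch))
    # right_states[i]: state of the right automaton after reading word[i:] right to left
    right_states = [right_start] * (n + 1)
    for i in range(n - 1, -1, -1):
        right_states[i] = right_delta(right_states[i + 1], word[i])
    # the output for word[i] uses the left context word[:i] and the right context word[i+1:]
    return "".join(output(left_states[i], word[i], right_states[i + 1]) for i in range(n))


def vowel_class(_state: str, ch: str) -> str:
    # both automata only remember whether the last symbol they read was a vowel
    return "V" if ch in VOWELS else "C"


def voice_intervocalic_s(left: str, ch: str, right: str) -> str:
    return "z" if ch == "s" and left == "V" and right == "V" else ch


print(bimachine_run("casa", vowel_class, "C", vowel_class, "C", voice_intervocalic_s))   # caza
print(bimachine_run("rosas", vowel_class, "C", vowel_class, "C", voice_intervocalic_s))  # rozas
```

When the right automaton has a single state, the output depends only on the left context, which is the kind of mapping a sequential transducer can compute in one left-to-right pass.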
The book returns to the FSIG formalism in Chapters 9 and 10. FSIGs deserve mention as one
of the first finite-state applications in NLP. In Chapter 9, Atro Voutilainen examines the
preparation of the information needed to build a database for a finite-state parser. Many
remarks made in this chapter apply to parsers in general. A drawback pointed out in the
first works on FSIGs was the time and space requirements of an actual implementation of a
parser based on that formalism. Some algorithms that overcome these problems have been
proposed in the literature. In Chapter 10, Pasi Tapanainen sheds light on those solutions
and gives a comparative complexity analysis of the algorithms.
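In an FSIG, each grammatical constraint is a finite-state automaton, and an analysis is accepted only if every constraint accepts it, so parsing amounts to intersecting the constraint automata. Below is a minimal sketch of the product construction for two deterministic automata, assuming partial transition functions (a missing transition means rejection) and two hypothetical constraints over part-of-speech tag sequences; the `DFA` class and the tag set are illustrative only.

```python
from collections import deque
from collections.abc import Hashable, Iterable
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class DFA:
    delta: Dict[Tuple[Hashable, str], Hashable]  # partial transition function
    start: Hashable
    finals: FrozenSet[Hashable]

    def accepts(self, word: Iterable[str]) -> bool:
        state = self.start
        for sym in word:
            if (state, sym) not in self.delta:
                return False  # a missing transition means rejection
            state = self.delta[(state, sym)]
        return state in self.finals


def intersect(a: DFA, b: DFA) -> DFA:
    """Product construction, exploring only state pairs reachable from the start pair."""
    start = (a.start, b.start)
    delta: Dict[Tuple[Hashable, str], Hashable] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        for (src, sym), p_next in a.delta.items():
            # a pair moves on a symbol only if both automata can move on it
            if src != p or (q, sym) not in b.delta:
                continue
            target = (p_next, b.delta[(q, sym)])
            delta[((p, q), sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    finals = frozenset(s for s in seen if s[0] in a.finals and s[1] in b.finals)
    return DFA(delta, start, finals)


# Constraint 1 (hypothetical): every determiner D is immediately followed by a noun N.
det_noun = DFA({(0, "D"): 1, (0, "N"): 0, (0, "V"): 0, (1, "N"): 0}, 0, frozenset({0}))
# Constraint 2 (hypothetical): the sequence contains exactly one verb V.
one_verb = DFA({(0, "D"): 0, (0, "N"): 0, (0, "V"): 1, (1, "D"): 1, (1, "N"): 1}, 0, frozenset({1}))

both = intersect(det_noun, one_verb)
print(both.accepts(["D", "N", "V", "D", "N"]))  # True
print(both.accepts(["D", "V"]))                 # False
```

With k constraints, the product can have as many states as the product of their sizes, which is the kind of time and space problem that the algorithms compared in Chapter 10 are meant to contain.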
In Chapter 11, Maurice Gross addresses the lack of a systematic categorization
of the objects in linguistics. The author concludes that constraints encoded in finite-
state automata can be locally described, and therefore a cumulative approach to the
construction of grammars is possible.
The implementation of the FASTUS system is discussed in Chapter 13 by Jerry R. Hobbs and
his colleagues from SRI. FASTUS is a system for extracting information from running text.
Its architecture is a cascade of finite-state transducers that splits the processing into
several stages. Unlike the other papers in the collection, this one focuses on the
implementation of a real system.
The final chapter, by Fernando C. N. Pereira and Michael D. Riley, presents a
general framework for implementing speech recognizers. The interesting point of this
chapter is the application of weighted finite-state automata and transducers to represent
data structures common in speech recognition.
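In speech recognition, weights are commonly negative log probabilities: costs add along a path, and the best alternative is the one with minimum total cost (the so-called tropical semiring). Below is a minimal sketch of the search step only, assuming a small hypothetical word lattice encoded as an adjacency dictionary: Dijkstra's algorithm finds the lowest-cost accepting path, which is valid here because all costs are non-negative. The chapter's framework goes further, representing and combining the recognizer's models as weighted transducers; that part is not shown, and `best_path` is an illustrative name.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# arcs[state] = list of (label, cost, next_state); costs are non-negative, e.g. -log probabilities
Arcs = Dict[int, List[Tuple[str, float, int]]]


def best_path(arcs: Arcs, start: int, finals: Dict[int, float]) -> Optional[Tuple[float, List[str]]]:
    """Lowest-cost accepting path under (min, +), found with Dijkstra's algorithm.

    `finals` maps each final state to its final cost. Returns None if no final state is reachable.
    """
    dist: Dict[int, float] = {start: 0.0}
    back: Dict[int, Tuple[int, str]] = {}  # best predecessor state and arc label
    heap: List[Tuple[float, int]] = [(0.0, start)]
    done: Set[int] = set()
    while heap:
        d, state = heapq.heappop(heap)
        if state in done:
            continue  # stale heap entry
        done.add(state)
        for label, cost, nxt in arcs.get(state, []):
            new_cost = d + cost  # costs combine along a path by addition
            if new_cost < dist.get(nxt, float("inf")):  # alternatives combine by min
                dist[nxt] = new_cost
                back[nxt] = (state, label)
                heapq.heappush(heap, (new_cost, nxt))
    candidates = [(dist[f] + w, f) for f, w in finals.items() if f in dist]
    if not candidates:
        return None
    total, state = min(candidates)
    labels: List[str] = []
    while state != start:
        state, label = back[state]
        labels.append(label)
    return total, labels[::-1]


# Hypothetical lattice with two competing transcriptions of the same audio.
lattice: Arcs = {
    0: [("recognize", 1.0, 1), ("wreck", 0.7, 2)],
    1: [("speech", 0.5, 3)],
    2: [("a", 0.4, 4)],
    4: [("nice", 0.2, 5)],
    5: [("beach", 0.3, 3)],
}
print(best_path(lattice, 0, {3: 0.0}))  # (1.5, ['recognize', 'speech'])
```

The "wreck a nice beach" path costs about 1.6, so the cheaper "recognize speech" path wins even though its first arc is more expensive.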
3. Evaluation
Finite-State Language Processing is probably the first book to cover current work in the
area comprehensively. It will be valuable to many researchers in linguistics, especially
those interested in nonclassical approaches to NLP. It should also appeal to readers who
come from computer science and are motivated to work in computational linguistics. As a
textbook, it is appropriate for a postgraduate seminar.
The chapters are generally well written, in a direct and easy-to-read style with many
examples. Each chapter includes its own list of references; a unified list, which the book
lacks, might have been useful. There are some minor mistakes and inconsistencies, which are
quite common in a collection of loosely related papers.
The text is not directed at readers looking for immediate implementation solutions. Much
of the material is treated in a fairly theoretical way, despite the discussion of many
practical aspects. Finally, some subjects, such as applications to corpus processing and
machine translation, are not covered or are only mentioned.
Mario José Cáccamo is a doctoral student in computer science. In his recent Master's thesis,
he described the implementation of an FSA-based environment for syntactic pattern processing
that can be used for applications requiring surface parsing, such as agreement advisers.
Tomasz Kowaltowski is a Professor of Computing at the University of Campinas whose interests
include applications of FSAs to the representation of large linguistic databases. The
reviewers' address is: Institute of Computing, University of Campinas, Caixa Postal 6176,
13083-970 Campinas, SP, Brazil; e-mail: {mcaccamo,tomasz}@dcc.unicamp.br