


Book Reviews

Computational Linguistics, Volume 24, Number 4

Finite-State Language Processing

Emmanuel Roche and Yves Schabes (editors)
(Teragram Corporation)

Cambridge, MA: The MIT Press (The MIT Press series in Language, Speech and Communication), 1997, xvii+464 pp; hardbound, ISBN 0-262-18182-7, $45.00

Reviewed by Mario José Cáccamo and Tomasz Kowaltowski
University of Campinas, Brazil

1. Introduction

Finite-state automata have been widely used in computer science since its beginning. However, for a long time the NLP community has preferred other tools based on more powerful formalisms. Chomsky's argument that finite-state devices are not able to represent natural language structures, especially those involving central embedding (recursion), was one of the reasons for this fact.

Finite-state automata were introduced first to NLP as tools for efficient computational implementation of large vocabularies and lexicons. Excellent results achieved in that area brought interest in applying finite-state formalisms to other fields of computational linguistics.

Finite-State Language Processing is a collection of 15 papers written by 21 authors focused on state-of-the-art finite-state models. It contains papers reporting research on morphology, lexicon construction, (surface) parsing, part-of-speech tagging, phonetic conversion, information retrieval, and speech recognition. Although many chapters are written by researchers who are currently developing their work at American research centers, most contributions to this book came from the European NLP school.

The preface, the 15 chapters, and the index comprise altogether a book of 464 pages. Each chapter is an independent paper about a specific topic. Most of the papers require some reasonable background in computational linguistics and in computer science.

Chapter 1 presents a general introduction to the theory of finite-state automata and transducers. The order of the next chapters does not seem to follow any specific criterion; however, some related chapters are consecutive. The book includes a list of contributors with their affiliations and addresses, and a small term index.

2. Contents

Chapter 1, written by the editors of the book, introduces most of the concepts necessary to read the book, though it does require some previous knowledge of the subject. The most valuable point of this chapter is the presentation of some finite-state concepts through NLP examples.

Chapters 2 and 6 are relevant to morphological processing. In Chapter 2, David Clemenceau addresses the well-known problems of derivational morphology and the opposition between lexical methods (electronic dictionaries) and heuristic methods (derivation rules). He advocates mixing the two approaches. However, one of his arguments about "unrealistically" large dictionaries seems to be losing some ground due to recent advances in technology. This chapter also presents MORPHO, a morphological analyzer based on Koskenniemi's two-level model. In Chapter 6, Max Silberztein adopts a similar approach to natural language lexical analysis, especially in the presence of compound words.

Chapter 3, by Kimmo Koskenniemi, discusses the relevance of the two-level morphology model to finite-state calculus and how the ideas behind two-level morphology can be extended to design Finite-State Intersection Grammars (FSIGs).

In Chapter 4, Lauri Karttunen extends the calculus of regular expressions with the replace operator as another tool to capture a common operation in NLP: string replacement. The theoretical aspects behind the operator are clarified and explained by examples. The relation between the replace operator and rewrite and two-level rules is also discussed.

In Chapter 5, Fernando C. N. Pereira and Rebecca N. Wright propose an algorithm to compute finite-state approximations of context-free grammars. The underlying interest of the authors is the efficient computational implementation of phrase structure grammars. The approximation algorithm accepts any context-free grammar as input and returns a finite-state automaton that recognizes all the sentences generated by the grammar and perhaps some more. It is proved that the approximation is exact if the grammar is left-linear or right-linear. This is not surprising, as linear grammars (right or left) describe the same languages as finite-state automata.
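
The equivalence between right-linear grammars and finite-state automata can be illustrated with a minimal sketch. It is not taken from the book and is not the approximation algorithm of Chapter 5; it only shows the standard construction in which each nonterminal becomes a state, each rule A -> a B becomes a transition, and each rule A -> a leads to an accepting state. The toy grammar, the function names, and the FINAL marker are all assumptions made for illustration.

```python
from collections import defaultdict

FINAL = "<final>"  # hypothetical name for the single accepting state


def grammar_to_nfa(rules):
    """Build NFA transitions from right-linear rules.

    Each rule is (head, terminal, next_nonterminal_or_None), meaning
    head -> terminal next_nonterminal, or head -> terminal when None.
    """
    delta = defaultdict(set)
    for head, terminal, nxt in rules:
        delta[(head, terminal)].add(FINAL if nxt is None else nxt)
    return delta


def accepts(delta, start, word):
    """Simulate the NFA by tracking the set of reachable states."""
    states = {start}
    for symbol in word:
        states = {t for s in states for t in delta.get((s, symbol), ())}
    return FINAL in states


# Toy grammar (an assumption): S -> a S | b T | b,  T -> b T | b
# It generates zero or more a's followed by one or more b's.
RULES = [
    ("S", "a", "S"),
    ("S", "b", "T"),
    ("S", "b", None),
    ("T", "b", "T"),
    ("T", "b", None),
]
NFA = grammar_to_nfa(RULES)
```

With this toy grammar, accepts(NFA, "S", "aab") holds while accepts(NFA, "S", "ba") does not, which matches the language generated by the rules.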

A finite-state implementation of Eric Brill's part-of-speech tagger is presented in Chapter 7 by the editors of the book. Their construction is rule-based, as opposed to the usual stochastic approach, and produces an efficient deterministic transducer without sacrificing the quality of results. This tagger is faster than the original one devised by Brill.

The application of transducers is also the central theme of Chapters 8, 12, and 14. In Chapter 8, Emmanuel Roche shows how to build parsers for context-free grammars or even more complex linguistic situations using transducers. Chapter 12 is devoted to the computational manipulation of sequential transducers (deterministic over the input). In this chapter, Mehryar Mohri presents the theory related to sequential transducers and briefly discusses their applications to phonology, morphology, lexicon construction, syntax, and speech processing. In Chapter 14, Eric Laporte proposes a solution to phonetic conversion based on transducers and bimachines (transducers where input reading is carried both from left to right and from right to left). Implementation details of BiPho, a phonetic conversion system, are also discussed.
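
To make "deterministic over the input" concrete, here is a small sketch of a sequential transducer with a final output function (a variant often called subsequential). It is a toy, not code from Chapter 12 or from BiPho: it rewrites the letter pair "ph" as "f" in a single left-to-right pass, and the state numbering and function names are assumptions chosen for illustration.

```python
def step(state, symbol):
    """Deterministic transition: return (next_state, output_string).

    State 0: no pending input.
    State 1: a 'p' has been read but not yet written to the output.
    """
    if state == 0:
        return (1, "") if symbol == "p" else (0, symbol)
    if symbol == "h":
        return 0, "f"           # 'p' followed by 'h' is emitted as 'f'
    if symbol == "p":
        return 1, "p"           # emit the earlier 'p', keep the new one pending
    return 0, "p" + symbol      # emit the pending 'p' and the current symbol


# Output emitted when the input ends in a given state.
FINAL_OUTPUT = {0: "", 1: "p"}


def transduce(text):
    """Run the transducer over text; exactly one move per input symbol."""
    state, pieces = 0, []
    for symbol in text:
        state, piece = step(state, symbol)
        pieces.append(piece)
    pieces.append(FINAL_OUTPUT[state])
    return "".join(pieces)
```

Because every state has exactly one move per input symbol, processing time is linear in the input length, which is the practical appeal of sequential devices. The final output function is needed so that a trailing "p" (as in "top") is not lost.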

The book returns to the FSIG formalism in Chapters 9 and 10. FSIGs deserve to be mentioned as one of the first finite-state applications in NLP. Atro Voutilainen examines in Chapter 9 the preparation of the information to build a database for a finite-state parser. Many remarks made in this chapter are applicable to parsers in general. A drawback pointed out in the first works on FSIGs was the time and space requirements of an actual implementation of a parser based on that formalism. Some algorithms that overcome these problems were suggested in the literature. In Chapter 10, Pasi Tapanainen sheds light on those solutions and gives a comparative complexity analysis of the algorithms.


In Chapter 11, Maurice Gross addresses the lack of a systematic categorization of the objects in linguistics. The author concludes that constraints encoded in finite-state automata can be locally described, and therefore a cumulative approach to the construction of grammars is possible.

The implementation of the system FASTUS is discussed in Chapter 13 by Jerry R. Hobbs and his colleagues from SRI. FASTUS is a system for extracting information from running texts. The architecture of the system consists of a cascade of finite-state transducers splitting the processing into several different stages. Unlike other papers in the collection, this paper focuses on the implementation of a real system.

The final chapter, by Fernando C. N. Pereira and Michael D. Riley, presents a general framework for implementing speech recognizers. The interesting point of this chapter is the application of weighted finite-state automata and transducers to represent data structures common in speech recognition.
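
As a rough illustration of how a weighted automaton can stand in for a speech-recognition data structure, the sketch below encodes a tiny word lattice as a weighted acyclic automaton and extracts its cheapest path. It is not the framework described in the chapter. The lattice, its costs (which play the role of negative log probabilities, so lower is better), and the function name are all hypothetical.

```python
import heapq

# Hypothetical word lattice: state -> list of (label, cost, next_state).
LATTICE = {
    0: [("recognize", 1.5, 1), ("wreck a nice", 1.0, 2)],
    1: [("speech", 0.5, 3)],
    2: [("beach", 1.25, 3)],
    3: [],
}


def best_path(lattice, start, final):
    """Return (total_cost, labels) of the cheapest path, or None if unreachable.

    Uses Dijkstra's algorithm, which is valid here because costs are non-negative.
    """
    queue = [(0.0, start, [])]
    done = set()
    while queue:
        cost, state, labels = heapq.heappop(queue)
        if state == final:
            return cost, labels
        if state in done:
            continue
        done.add(state)
        for label, weight, nxt in lattice[state]:
            heapq.heappush(queue, (cost + weight, nxt, labels + [label]))
    return None
```

Calling best_path(LATTICE, 0, 3) selects the hypothesis "recognize speech" with total cost 2.0, ahead of the competing path with cost 2.25. Adding costs along a path and taking the minimum over paths is one common way to interpret the weights; other interpretations are possible.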

3. Evaluation

Finite-State Language Processing is probably the first book covering the current work in the area in a comprehensive way. It will be valuable to many researchers in linguistics, especially those who are interested in nonclassical approaches to NLP. It should also be appealing to those who come from computer science and are motivated to work in computational linguistics. As a textbook it is appropriate for a postgraduate seminar.

The chapters are in general well written in a direct and easy-to-read style with many examples. Each chapter includes its own list of references; there is no unified list, which might have been useful. There are some minor mistakes and inconsistencies, quite common in a collection of loosely related papers.

The text is not directed to those who look for immediate implementation solutions. Much of the material is treated in a fairly theoretical way in spite of the discussion of many practical aspects. Finally, there are some subjects that are not covered or are just mentioned, such as applications to corpus processing and machine translation.

Mario José Cáccamo is a doctoral student in computer science. In his recent Master's thesis, he described the implementation of an FSA-based environment for syntactic pattern processing that can be used for applications that require surface parsing such as agreement advisers. Tomasz Kowaltowski is a Professor of Computing at the University of Campinas whose interests include applications of FSAs in representing large linguistic databases. The reviewers' address is: Institute of Computing, University of Campinas, Caixa Postal 6176, 13083-970 Campinas, SP, Brazil; e-mail: {mcaccamo,tomasz}@dcc.unicamp.br
