PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is a publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/15650 Please be advised that this information was generated on 2018-07-07 and may be subject to change.


THE PROSODIC STRUCTURE OF INITIAL SYLLABLES IN ENGLISH

Anne Cutler* and David Carter**

ABSTRACT

Studies of human continuous-speech recognition suggest that listeners use a strategy of postulating a word boundary, and initiating a lexical access procedure, at each metrically strong syllable. The likely success of this strategy was estimated here against the characteristics of the English vocabulary. Computerised dictionaries of English were found to list approximately three times as many words beginning with strong syllables (i.e. syllables containing a full vowel) as beginning with weak syllables (i.e. syllables containing a reduced vowel). Furthermore, the mean frequency of occurrence of words beginning with strong syllables is nearly twice as great as that of words beginning with weak syllables. These findings motivated an estimate for everyday speech recognition that approximately 85% of lexical words (i.e. excluding function words) will begin with strong syllables. In fact, in a large corpus of spontaneous conversation 90% of lexical words were found to begin with strong syllables.

INTRODUCTION

Word recognition in continuous speech is complicated by the absence of reliable correlates of word boundaries. Human listeners nevertheless recognise words in running speech at least as efficiently as they recognise words in isolation, if not more efficiently (ref 1). Recent studies of human speech processing have suggested that listeners may use heuristic strategies to overcome the absence of word boundary information. Such strategies may allow listeners to guide their lexical access attempts by postulating word onsets where linguistic experience suggests word onsets are most likely to occur.

Cutler and Norris (ref 2) have proposed such a strategy based on prosodic structure. In a stress language like English, syllables can be either strong or weak; strong syllables contain full vowels, while weak syllables contain reduced vowels (usually schwa). Cutler and Norris found that listeners were slower to detect the embedded real word in mintayf (in which the second vowel is strong) than in mintef (in which the second vowel is schwa). They suggested that listeners were segmenting mintayf prior to the second syllable, so that detection of mint required combining speech material from parts of the signal which had been segmented from one another. No such difficulty would arise for the detection of mint in mintef, since the weak second syllable would not be segmented from the preceding material. Cutler and Norris proposed that, in English, listeners use strong syllables as the basis for a segmentation strategy in continuous speech processing: strong syllables are taken to be likely word onsets, and the continuous speech stream is segmented at strong syllables so that lexical access attempts can be initiated.
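The segmentation strategy can be sketched in a few lines of Python. This is a minimal illustration, not Cutler and Norris's model: it assumes the input has already been divided into syllables, each tagged (hypothetically) as strong (full vowel) or weak (reduced vowel), and it opens a new chunk, i.e. postulates a word onset, at every strong syllable. The function name and the syllabifications of mintayf and mintef are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical input format: (syllable_text, is_strong), where is_strong is
# True for a full vowel and False for a reduced vowel such as schwa.
Syllable = Tuple[str, bool]


def segment_at_strong(syllables: List[Syllable]) -> List[List[str]]:
    """Postulate a word onset (start a new chunk) at every strong syllable.

    Weak syllables are attached to the preceding chunk; a weak syllable at the
    very start of the input simply opens the first chunk.
    """
    chunks: List[List[str]] = []
    for text, is_strong in syllables:
        if is_strong or not chunks:
            chunks.append([text])  # boundary postulated: lexical access would start here
        else:
            chunks[-1].append(text)  # weak syllable: no boundary postulated
    return chunks


print(segment_at_strong([("min", True), ("tayf", True)]))  # [['min'], ['tayf']]
print(segment_at_strong([("min", True), ("tef", False)]))  # [['min', 'tef']]
```

With a strong second syllable, mintayf is split into two chunks, so the min- and the -t of mint end up in different segments; with a weak second syllable, mintef stays in one piece.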

* MRC Applied Psychology Unit, 15 Chaucer Rd., Cambridge CB2 2EP.

** Computer Laboratory, University of Cambridge, Corn Exchange St., Cambridge CB2 3QG.


The success rate of such a strategy, however, depends at least in part on how realistically it reflects the characteristics of the vocabulary. Hypothesising that strong syllables may be word onsets is unlikely to be an efficient strategy for detecting actual word onsets if most actual words do not begin with strong syllables. The present study estimates the likely success rate of the strategy proposed by Cutler and Norris against the characteristics of the English vocabulary, and then tests this estimate against an actual corpus of English conversation.
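One way to make the notion of success concrete, assuming an utterance annotated with its actual words and with each syllable marked strong or weak: because the strategy postulates a boundary at every strong syllable, an actual word onset is caught exactly when that word's first syllable is strong. The proportion of onsets detected is therefore the proportion of words with strong initial syllables, which is what the rest of the study measures. The data layout and function name below are hypothetical.

```python
from typing import List

# Hypothetical annotation: each word is a list of syllable strengths,
# True for a strong (full-vowel) syllable, False for a weak (reduced-vowel) one.
Word = List[bool]


def onset_detection_rate(words: List[Word]) -> float:
    """Proportion of actual word onsets at which the strategy postulates a boundary.

    A boundary is postulated at every strong syllable, so an actual onset is
    detected exactly when the word's first syllable is strong.
    """
    if not words or any(len(w) == 0 for w in words):
        raise ValueError("expected a non-empty list of non-empty words")
    detected = sum(1 for w in words if w[0])
    return detected / len(words)


# bone, lettuce, trombone, annoy (syllable codings assumed for illustration)
utterance = [[True], [True, False], [True, True], [False, True]]
print(onset_detection_rate(utterance))  # 0.75
```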

WORD-INITIAL SYLLABLES IN ENGLISH

The MRC Psycholinguistic Database (ref 3) is a lexicon of over 98000 words, based on the Shorter Oxford Dictionary. Over 33000 entries have phonetic transcriptions. Fig. 1 shows the prosodic characteristics of the initial syllables of the transcribed words, divided into four categories (labelled mono, poly1, poly2 and poly0 in the figures): monosyllables (such as bone or splint), polysyllables with primary stress on the first syllable (such as lettuce or splendour), polysyllables with secondary stress on the first syllable (such as trombone or polysyllabicity), and polysyllables with weak initial syllables, in which the vowel in the first syllable is usually schwa, as in annoy or trapeze, but may also be a reduced form of another vowel, as in invest or external. Any of the first three categories would satisfy the segmentation strategy proposed by Cutler and Norris. It can be seen that together these three categories account for 73% of the words analysed.
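Assigning a word to one of the four categories (figure labels mono, poly1, poly2 and poly0) depends only on its syllable count and its first syllable. The sketch below assumes a hypothetical per-syllable coding (P for primary stress, S for secondary stress, W for a weak, reduced-vowel syllable), not the MRC Database's actual transcription format. The toy lexicon contains only the example words from the text, with assumed codings, so its proportions say nothing about the real vocabulary.

```python
from collections import Counter
from typing import Dict, List

CATEGORIES = ("mono", "poly1", "poly2", "poly0")
# Hypothetical syllable coding: "P" = primary stress, "S" = secondary stress,
# "W" = weak syllable with a reduced vowel.
FIRST_SYLLABLE_TO_CATEGORY = {"P": "poly1", "S": "poly2", "W": "poly0"}


def prosodic_category(syllables: List[str]) -> str:
    """Assign a word to one of the four categories by its initial syllable."""
    if len(syllables) == 1:
        return "mono"  # lexical monosyllables are treated as strong-initial
    return FIRST_SYLLABLE_TO_CATEGORY[syllables[0]]


def category_proportions(lexicon: Dict[str, List[str]]) -> Dict[str, float]:
    """Proportion of word types (not tokens) falling in each category."""
    counts = Counter(prosodic_category(sylls) for sylls in lexicon.values())
    total = sum(counts.values())
    return {cat: counts[cat] / total for cat in CATEGORIES}


# Example words from the text; codings are assumed for illustration.
toy_lexicon = {
    "bone": ["P"], "splint": ["P"],
    "lettuce": ["P", "W"], "splendour": ["P", "W"],
    "trombone": ["S", "P"],
    "annoy": ["W", "P"], "trapeze": ["W", "P"],
    "invest": ["W", "P"], "external": ["W", "P", "W"],
}
props = category_proportions(toy_lexicon)
# The first three categories satisfy the strong-syllable strategy.
strong_share = props["mono"] + props["poly1"] + props["poly2"]
print(round(strong_share, 3))  # 0.556
```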

Since the proposed strategy is aimed at the efficient initiation of lexical access, it is reasonable to exclude from our analysis those words whose interpretation in a speech context relies not upon a lexical lookup but upon strictly contextual factors; that is, it is reasonable to exclude grammatical words (such as articles, conjunctions and pronouns). The distribution of the prosodic characteristics of the initial syllables of lexical words (nouns, verbs, adjectives and most adverbs) in the MRC Database is, however, virtually identical to Fig. 1, since excluding grammatical words reduced the total corpus size by less than 1%.

[Figure: pie chart with segments for mono, poly1, poly2 and poly0]

Fig. 1. Prosodic categories as proportions of the MRC Database.


[Figure: bar chart of mean frequency for the categories mono, poly1, poly2 and poly0]

Fig. 2. Mean frequency of occurrence for lexical items by prosodic category.

WORD PROSODY AND FREQUENCY OF OCCURRENCE

The most common word type in the English vocabulary is clearly a polysyllable with initial stress. However, individual word types differ in the frequency with which they occur. Frequency of occurrence statistics (ref 4) are listed in the MRC Database. Fig. 2 shows the mean frequency for the four prosodic word categories (lexical words only). It can be seen that monosyllables occur on average far more frequently than other prosodic types. Thus although there are more than seven times as many polysyllables in the language as there are monosyllables, average speech contexts are likely to contain almost as many monosyllables as polysyllables. Fig. 3 shows an estimate of the likely distribution of prosodic categories in a real speech context, derived from a combination of the data in Figs. 1 and 2; this suggests that only 17% of lexical tokens will begin with weak syllables.
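The text does not spell out how Figs. 1 and 2 were combined. A natural reading, assumed here, is that each category's share of word types is multiplied by its mean frequency of occurrence and the products are renormalised to give expected shares of tokens. The Python sketch below implements that assumption; the function name is hypothetical, and the inputs are the rounded two-way figures from the text (73% of types with strong initial syllables, hence 27% weak, and a mean frequency ratio taken as exactly 2, rounding up the abstract's "nearly twice").

```python
from typing import Dict


def predicted_token_shares(type_props: Dict[str, float],
                           mean_freqs: Dict[str, float]) -> Dict[str, float]:
    """Estimate each category's share of tokens in running speech.

    Assumes expected tokens per category are proportional to
    (proportion of word types) x (mean frequency per type), then renormalises
    so that the shares sum to 1.
    """
    weights = {cat: type_props[cat] * mean_freqs[cat] for cat in type_props}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


# Rounded figures from the text; mean frequencies are relative (weak = 1).
shares = predicted_token_shares({"strong": 0.73, "weak": 0.27},
                                {"strong": 2.0, "weak": 1.0})
print(round(shares["strong"], 3))  # 0.844
```

Only the ratio of the mean frequencies matters, so they can be given in any unit. With a ratio of exactly 2 the sketch gives about 84% strong-initial tokens; the 17% weak-initial estimate above corresponds to a ratio a little under 2, consistent with "nearly twice".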

[Figure: pie chart with segments for mono, poly1, poly2 and poly0]

Fig. 3. Predicted distribution of prosodic categories in real speech.


WORD PROSODY IN A NATURAL SPEECH SAMPLE

We tested the estimate shown in Fig. 3 against a natural speech sample, the London-Lund Corpus of English Conversation (ref 5), using the frequency count of this corpus prepared by Brown (ref 6). The London-Lund corpus consists of approximately 190,000 words of spontaneous British English conversation. Fig. 4 shows the distribution of prosodic categories for lexical words in this corpus. The three categories with strong initial syllables account for 90% of the tokens; only 10% of the lexical words have weak initial syllables.
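The corpus test counts tokens rather than types, so frequent words weigh more than rare ones. A minimal sketch, assuming hypothetical rows of (word, prosodic category, token count) such as might result from joining a corpus frequency count to a prosodically coded lexicon; the function name is illustrative, and the counts below are invented and are not London-Lund data.

```python
from collections import Counter
from typing import Iterable, Tuple

# Categories whose initial syllable is strong (figure labels from the text).
STRONG_INITIAL = ("mono", "poly1", "poly2")


def strong_token_share(rows: Iterable[Tuple[str, str, int]]) -> float:
    """Share of lexical tokens whose word begins with a strong syllable.

    Each row is a hypothetical (word, prosodic_category, token_count) triple;
    token counts are summed per category before taking the proportion.
    """
    tokens = Counter()
    for _word, category, count in rows:
        tokens[category] += count
    total = sum(tokens.values())
    if total == 0:
        raise ValueError("no tokens counted")
    return sum(tokens[cat] for cat in STRONG_INITIAL) / total


# Invented counts for illustration only.
rows = [
    ("think", "mono", 40),
    ("really", "poly1", 25),
    ("afternoon", "poly2", 5),
    ("remember", "poly0", 10),
]
print(strong_token_share(rows))  # 0.875
```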

[Figure: pie chart with segments for mono, poly1, poly2 and poly0]

Fig. 4. Distribution of prosodic categories in the Corpus of English Conversation.

CONCLUSION

The distribution of word types in the English vocabulary, combined with the relative frequency of occurrence across types, provides an adequate basis for the implementation of a segmentation strategy in continuous speech recognition whereby strong syllables are assumed to be the onsets of lexical words.

ACKNOWLEDGEMENTS

This research was supported by a grant from the Alvey Directorate (MMI-069) to Cambridge University, the Medical Research Council and Standard Telecommunications Laboratories. We thank Gordon Brown for making available the machine-readable version of his frequency count of the London-Lund corpus.

REFERENCES

1. E.C. Schwab, H.C. Nusbaum & D.B. Pisoni, Hum. Factors, 27, 395 (1985).
2. A. Cutler & D.G. Norris, J. Exp. Psychol.: Hum. Percept. Perform. (in press).
3. M. Coltheart, Quart. J. Exp. Psychol., 33A, 497 (1981).
4. H. Kucera & W.N. Francis, Computational Analysis of Present-Day American English (Brown Univ. Press, Providence, 1967).
5. J. Svartvik & R. Quirk, A Corpus of English Conversation (Gleerup, Lund, 1980).
6. G.D.A. Brown, Beh. Res. Meth. Instr. & Comp., 16, 502 (1984).