Haskins Laboratories Status Report on Speech Research 1992, SR-109/110, 179-190
Phonetic Recoding of Print and Its Effect on the Detection of Concurrent Speech in Amplitude Modulated Noise*
Ram Frost†
When an amplitude-modulated noise generated from a spoken word is presented simultaneously with the word's printed version, the noise sounds more speechlike. This auditory illusion, obtained by Frost, Repp, and Katz (1988), suggests that subjects detect correspondences between speech amplitude envelopes and printed stimuli. The present study investigated whether the speech envelope is assembled from the printed word or whether it is lexically addressed. In two experiments subjects were presented with speech-plus-noise and with noise-only trials, and were required to detect the speech in the noise. The auditory stimuli were accompanied by matching or nonmatching Hebrew print, which was unvoweled in Experiment 1 and voweled in Experiment 2. The stimuli of both experiments consisted of high-frequency words, low-frequency words, and nonwords. The results demonstrated that matching print caused a strong bias to detect speech in the noise when the stimuli were either high- or low-frequency words, whereas no bias was found for nonwords. The bias effect for words or nonwords was not affected by spelling-to-sound regularity; that is, similar effects were obtained in the voweled and the unvoweled conditions. These results suggest that the amplitude envelope of the word is not assembled from the print. Rather, it is addressed directly from the printed word and retrieved from the mental lexicon. Since amplitude envelopes are contingent on detailed phonetic structures, this outcome suggests that representations of words in the mental lexicon are not only phonologic but also phonetic in character.
It is generally assumed that the processing of words in the visual and auditory modalities differs in the initial phase because of different input characteristics, but converges at later stages. Hence, findings regarding the influence of orthographic information on the perception of speech, and findings showing how spoken information affects visual word perception, may suggest how print and speech are integrated in the mental lexicon.
This work was supported in part by a grant awarded to the author by the Basic Research Foundation administered by the Israel Academy of Sciences and Humanities, and in part by National Institute of Child Health and Human Development Grant HD-01994 to Haskins Laboratories. I am indebted to Liora Yancovitch for her help in conducting the experiments, and to Benny Shanon, Bruno Repp, and three anonymous reviewers for their comments on earlier drafts of this paper.
The present study is concerned with a special form of interaction between the visual and auditory modalities during word recognition. It discusses the possible origins of an illusion of hearing speech in noise caused by simultaneous presentation of printed information.
The convergence of printed and spoken stimuli representations during processing has been previously demonstrated in unimodal studies. It has been shown that lexical decisions to spoken words are facilitated if successive words share the same spelling (Jakimik, Cole, & Rudnicky, 1980). Similarly, Hillinger (1980) has shown that priming effects with printed words were enhanced when primes and targets were phonemically similar. However, the influence of one modality on processing in the other modality can be shown more directly in cross-modal studies. It has been established that printed words can prime lexical decisions to spoken words and vice versa (Hanson, 1981; Kirsner, Milech, & Standen, 1983).
Similarly, using the naming task, Tanenhaus, Flanigan, and Seidenberg (1980) have demonstrated a visual-auditory interference in a Stroop paradigm. These results were interpreted to show that reading and listening share one lexicon, which allows identical messages to be understood in the two modalities in the same way.
Stronger but more controversial evidence concerning the interaction of the visual and the auditory modalities comes from studies demonstrating cross-modal influence occurring before the completion of input analysis. According to a strongly interactive view, some or all stages of the perceptual process in one modality may be influenced by activation in the other modality. For example, it has been suggested that automatic grapheme-to-phoneme activation might occur prior to word recognition, thereby affecting the process of auditory lexical access through sublexical activation in the visual modality (e.g., Frost & Katz, 1989; Dijkstra, Schreuder, & Frauenfelder, 1989). Dijkstra et al. (1989) have shown that a visual letter prime can facilitate the auditory detection of a vowel in a syllable. Similarly, Layer, Pastore, and Rettberg (1990) have reported results showing faster identification of an initial auditory phoneme when congruent visual information was presented simultaneously.
Perceptual cross-modal influences can be shown at levels higher than graphemes and phonemes. In a recent study, Frost et al. (1988) have reported an auditory illusion occurring when printed words and masked spoken words appear simultaneously. Subjects were presented with speech-plus-noise and with noise-only trials, and were required to detect the masked speech in a signal detection paradigm. The auditory stimuli were accompanied by print which either matched or did not match the masked speech. Since the noise used in this experiment was amplitude modulated (i.e., the spoken word was masked by noise with the same amplitude envelope), when a printed word matched the spoken word, it also matched the amplitude envelope of the noise generated from it. Frost et al. (1988) have shown that, whether speech was indeed present in the noise or not, subjects had the illusion of hearing it in the noise when the printed stimuli matched the auditory input. These results demonstrate that subjects automatically detected a correspondence between noise amplitude envelopes and printed stimuli when they matched. The detection of this correspondence made the amplitude-modulated noise sound more speechlike, causing a strong
response bias. This effect was extremely reliable and appeared for every subject tested. The bias effect did not appear when the printed words and the spoken words from which the amplitude envelopes were generated were merely similar in their syllabic stress pattern or phonologic structure. These results suggest that the printed words were recoded into a very detailed, speechlike, phonetic representation that matched the auditory information, thereby causing the illusion.
One important finding reported by Frost et al. (1988) relates to the processing of nonwords. When the printed and spoken stimuli were pseudowords (nonwords which were phonotactically regular), the bias to hear speech in the noise in the matching condition was much smaller. This result is of special interest because subjects could not identify the masked spoken stimuli, and therefore were unaware that they consisted of nonwords. Nevertheless, they could not detect a correspondence between a printed letter string and its amplitude envelope if it was not a legal word. One possible interpretation of this outcome is that, in contrast to words, the covert pronunciation of nonwords is generated either pre-lexically from the print, or indirectly by accessing similar words in the lexicon. Apparently, either process is too slow or too tentative to enable subjects to match the resulting internal phonetic representation to a simultaneous auditory stimulus before that stimulus is fully processed.
However, a more radical interpretation of the word-nonword differences can be suggested. It is possible that amplitude envelopes are stored as holistic patterns in the lexicon, and are addressed automatically by printed words. According to this interpretation, the bias effect could not have been obtained for nonwords, because nonwords are not represented in the mental lexicon, and their printed forms could not have addressed any stored amplitude envelope. It is important to explore this hypothesis further since it has direct relevance to models concerned with the representations of spoken words in the mental lexicon, and with models of visual lexical access. Models of spoken word recognition often assume that representations of words in the lexicon are phonologic in nature, and that the contact representations generated from the speech wave are abstract linguistic units like phonemes and syllables (see Frauenfelder & Tyler, 1987, for a review). According to the above interpretation,
however, representations of spoken words are maximally rich, consisting not only of abstract linguistic units, but also of detailed phonetic information such as spectral templates and amplitude envelopes. Amplitude envelopes cannot be considered phonological representations because they do not provide the explicit phonemic or syllabic structure of the word. Rather, they retain some speechlike features and convey mostly prosodic and stress information. A similar nonphonologic approach to the mental lexicon was advocated by Klatt in his LAFS (Lexical Access From Spectra) model (Klatt, 1979; see Klatt, 1989, for a review; see also Gordon, 1988; Jusczyk, 1985).
This issue is also relevant to current discussions concerning the processing of printed words. Models of visual word perception are in disagreement concerning the extent of phonological recoding during printed word recognition (e.g., Seidenberg, 1985; Van Orden, 1987). One class of models assumes that phonological codes are generated automatically following visual presentation and mediate lexical access (Perfetti, Bell, & Delaney, 1988; Van Orden, Johnston, & Hale, 1988; and see Van Orden, Pennington, & Stone, 1990, for a review). In contrast, it has been suggested that phonological codes are seldom generated during visual word recognition, and that, with the exception of very infrequent words, printed words activate orthographic units that are directly related to meaning in semantic memory (e.g., Seidenberg, 1985; Seidenberg, Waters, Barnes, & Tanenhaus, 1984). Thus, results demonstrating that a visual presentation of a printed word produces a detailed phonetic representation that includes the word's amplitude envelope, even when the experimental task does not require it, provide support for automatic and rapid phonetic recoding in silent reading.
The aim of the present study was to examine further the hypothesis that the amplitude envelope representations of spoken words are not assembled pre-lexically from the print, but are stored holistically in the mental lexicon, and are addressed directly and automatically by matching printed words following lexical access. The generation of a phonetic representation from the print can theoretically be achieved through a pre-lexical process that maps representations of graphemes into representations of phonemes by applying grapheme-phoneme correspondence rules, and subsequently by transforming the abstract phonologic structure into a detailed
representation for silent or overt reading. This process has often been suggested to characterize the naming of novel words or of nonwords (e.g., Coltheart, 1978). Note that whether the phonologic and phonetic structures are derived by applying grapheme-phoneme correspondence rules (Venezky, 1970) or by analogy (Glushko, 1979) is irrelevant in the present context, since both procedures assume that the phonologic code is generated prior to the selection of a lexical candidate (i.e., prior to lexical access). In contrast to this account, the hypothesis forwarded in the present study suggests that possible differences in bias between words and nonwords do not result from the relative speed or ease with which the graphemic structure is transformed pre-lexically into a phonetic code. Rather, they emerge because printed words address a maximally rich lexical representation which contains, among other things, the amplitude envelope of the spoken word. Nonwords, on the other hand, are not represented in the mental lexicon, and therefore cannot address their amplitude envelope.
For this purpose, the present study employed the speech detection task proposed by Frost et al. (1988) and examined whether the bias effect caused by matching print depends on the speed of print processing and on spelling-to-sound regularity, or whether it has a lexical origin. Spelling-to-sound regularity and the speed of generating phonological codes were manipulated by using word frequency and the unique characteristics of the Hebrew orthography.
In the two experiments reported here, subjects were presented auditorily with speech-plus-noise or with noise-only trials, simultaneously with a visual presentation of printed Hebrew words. In Hebrew, letters represent mostly consonants, while vowels can optionally be superimposed on the consonants as diacritical marks. Like other Semitic languages, Hebrew is based on word families derived from tri-consonantal roots. Therefore, many words share a similar or an identical letter configuration. If the vowel marks are absent, a single printed consonantal string usually represents several different spoken words. Thus, in its unvoweled form, the Hebrew orthography is considered a very deep orthography: it does not convey to the reader the full phonemic structure of the printed word, and the reader is often faced with phonological ambiguity.1 In contrast, the voweled form is a very shallow writing system. The vowel marks convey the missing phonemic information, making
the printed word phonemically unequivocal (but see Frost, in press, for a discussion).
Several studies in Hebrew have established that the presentation of unvoweled print encourages the use of orthographic codes to access the lexicon. In order to assign a correct vowel configuration to the printed consonants to form a valid word, readers of Hebrew have to draw upon their lexical knowledge. The complete phonological structure of the printed word can only be retrieved post-lexically, after one word candidate has been accessed (Bentin, Bargai, & Katz, 1984; Frost, Katz, & Bentin, 1987). In contrast, the explicit presentation of vowel marks provides the reader with the complete phonemic structure of the word (or nonword). Because the voweled orthography is characterized by grapheme-to-phoneme regularity, the diacritical marks enable the generation of a pre-lexical phonologic code by using simple spelling-to-sound conversion rules. This special characteristic of the Hebrew orthography was exploited in order to investigate whether the bias effect caused by matching print on speech detection in noise is affected by the print's phonologic transparency. Specifically, we examined whether the bias effect is dependent on the presentation or the omission of vowel marks.
EXPERIMENT 1

In Experiment 1 subjects were presented with high- and low-frequency spoken words, as well as with nonwords, which were masked by noise with the same amplitude envelope. In addition, the noises were presented alone. The subjects' task consisted of deciding in each trial whether speech was present in the noise, or whether there was noise only. Simultaneously with the auditory presentation, a printed unvoweled Hebrew letter string appeared on a computer screen. Sometimes the printed word or nonword matched the auditory stimulus, and sometimes it did not. In each of these experimental conditions it was determined whether the print caused a bias to hear speech in the noise.
The purpose of this experiment was threefold. First, to examine whether the bias effect obtained in the shallower English orthography can be obtained in the deeper Hebrew orthography. If the effect depends on the speed of generating amplitude envelopes pre-lexically from the print by using spelling-to-sound conversion rules, then unvoweled Hebrew is at a clear disadvantage: it does not convey explicitly to the reader the full phonemic information necessary for the construction of the amplitude envelope. Although the vowel information can be retrieved from the lexicon following visual lexical access, this process is slower to develop. Indeed, a multilingual comparison of naming latencies (Frost et al., 1987) revealed that naming in unvoweled Hebrew is slower than naming in shallower orthographies like English and Serbo-Croatian. Moreover, in contrast to English and Serbo-Croatian, naming latencies in Hebrew were found to be slower than lexical decisions. This is because the phonemic structure necessary for naming is not conveyed directly by the print, but retrieved from the lexicon (Frost et al., 1987).
Another factor which affects the speed of generating phonetic codes from print is word frequency. Hence, the second aim of Experiment 1 was to examine whether the bias effect, if obtained, depends on word frequency or merely on word lexicality. If our previous differences in bias between words and nonwords resulted from the speed by which the printed words and nonwords were transformed into a phonetic structure, then one should expect a stronger bias effect for high-frequency words relative to low-frequency words. This is because it is easier to retrieve the phonetic structure of high-frequency words (as reflected by faster RTs for these words in the naming task). If, on the other hand, the origin of the bias effect is purely lexical, then a bias should be obtained for all words, whether frequent or infrequent, but not for nonwords.
Finally, the third aim of the experiment was to examine the bias effect in a mixed design of words and nonwords. Note that in the original study reported by Frost et al. (1988) we employed a blocked design. One serious handicap of our previous blocked design was that subjects knew in advance whether the auditory stimuli were words or nonwords. This might have encouraged the adoption of different strategies for words and for nonwords, thereby causing the differences we obtained in the bias effect. In a mixed design such a uniform strategy cannot be adopted. Thus, if our previous results were caused by this methodological factor, then no significant differences in bias between words and nonwords should emerge in the present mixed design, and nonwords would show the effect as well.
Methods

Subjects. Twenty-four undergraduate students, all native speakers of Hebrew, participated in the experiment for course credit or for payment.
Stimulus preparation. The stimuli were generated from 24 disyllabic words and 12 disyllabic nonwords that had a stop consonant as their initial phoneme. The number of phonemes for all stimuli was either four or five. The 24 words consisted of 12 high-frequency words and 12 low-frequency words. Because there are no reliable sources of standard objective word frequency counts in Hebrew, subjective frequencies were assessed by averaging the ratings of 50 subjects on a 1 (least frequent) to 7 (most frequent) scale. The mean ratings of the high- and the low-frequency words were 5.3 and 3.1, respectively. The 24 words were all unambiguous in unvoweled print; that is, their orthographic form represented only one lexical entry. Thus, each letter string could be read as a meaningful word in only one way, by assigning to the consonants one specific vowel configuration. The nonwords were, in fact, pseudowords that were constructed by altering one or two phonemes of real words. All nonwords conformed to the phonotactic rules of the Hebrew language.
The auditory stimuli were originally spoken by a male native speaker in an acoustically shielded booth and recorded on an Otari MX5050 tape recorder. The speech was digitized at a 20 kHz sampling rate. From each digitized word, a noise stimulus with the same amplitude envelope was created by randomly reversing the polarity of individual samples with a probability of 0.5 (Schroeder, 1968). This signal-correlated noise retains a certain speechlike quality, even though its spectrum is flat and it cannot be identified as a particular utterance unless the choices are very limited (see Van Tasell, Soli, Kirby, & Widin, 1987). The speech-plus-noise stimuli were created by adding the waveform of each digitized word to that of the matched noise, adjusting their relative intensity to yield a signal-to-noise ratio of -10.7 dB.
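The stimulus construction described above can be sketched in a few lines. The following is a minimal illustration, not the original processing code: it applies Schroeder's (1968) random polarity reversal to obtain signal-correlated noise, and then mixes speech and noise at a target signal-to-noise ratio. The function names and the synthetic "word" waveform are assumptions introduced here for illustration.

```python
import numpy as np

def signal_correlated_noise(speech, rng):
    # Invert the polarity of each sample with probability 0.5 (Schroeder, 1968).
    # The result keeps the amplitude envelope of the speech sample for sample,
    # but its long-term spectrum is flat.
    flips = rng.choice([-1.0, 1.0], size=speech.shape)
    return speech * flips

def mix_at_snr(speech, noise, snr_db):
    # Scale the speech so that the speech-to-noise power ratio equals snr_db,
    # then add the two waveforms.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt((10 ** (snr_db / 10)) * p_noise / p_speech)
    return gain * speech + noise

rng = np.random.default_rng(0)
word = rng.standard_normal(20000)   # stand-in for a digitized word (1 s at 20 kHz)
noise = signal_correlated_noise(word, rng)
stimulus = mix_at_snr(word, noise, -10.7)  # speech-plus-noise trial at -10.7 dB
```

Because each sample only changes sign, the noise is exactly as loud as the speech it was derived from, which is why a matching printed word also "matches" the noise-only trials generated from it.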
Each digitized stimulus was edited using a waveform editor. The stimulus onset was determined visually on an oscilloscope and was verified auditorily through headphones. A mark tone was then inserted at the onset of each stimulus, on a second track that was inaudible to the subjects. The digitized edited stimuli were recorded at three-second intervals on a two-track audiotape, one track containing the spoken words while the other track contained the mark tones. The purpose of the mark tone was to trigger the presentation of the printed stimuli on a Macintosh computer screen.
Design. Each of the high-frequency words, low-frequency words, and nonwords was presented in two auditory forms: (1) speech-plus-noise trials, in which the spoken stimulus was presented masked by noise; (2) noise-only trials, in which the noise was presented by itself without the speech. Each of these auditory presentations was accompanied by two possible visual presentations: (1) a matching condition (i.e., the same word or nonword that was presented auditorily, and/or that was used to generate the amplitude-modulated noise, was presented in print); (2) a nonmatching condition (i.e., a different word or nonword, having the same number of phonemes and a similar phonologic structure as the word or nonword presented auditorily, or that was used to generate the noise, was presented in print). Thus, there were four combinations of visual/auditory presentations for each word or nonword, making a total of 144 trials in the experiment.
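The trial count follows from crossing the 36 items (12 high-frequency words, 12 low-frequency words, 12 nonwords) with the two auditory forms and the two visual conditions. A brief sketch of this enumeration (the condition labels are illustrative, not taken from the original materials):

```python
from itertools import product

# 36 items: 12 of each word type (indices stand in for actual Hebrew stimuli).
items = [(word_type, i)
         for word_type in ("high-frequency", "low-frequency", "nonword")
         for i in range(12)]
auditory = ("speech-plus-noise", "noise-only")
visual = ("matching", "nonmatching")

# Full crossing: 36 items x 2 auditory forms x 2 visual conditions = 144 trials.
trials = list(product(items, auditory, visual))
```

Half of the trials contain speech and half do not, and within every visual condition the proportion of "yes" and "no" trials is equal, which is what licenses the signal detection analysis that follows.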
Procedure and apparatus. Subjects were seated in front of a Macintosh SE computer screen (9" diagonal screen size), and listened binaurally over Sennheiser headphones at a comfortable intensity. The subjects sat approximately 70 cm from the screen, so that the stimuli subtended a horizontal visual angle of 4 degrees on the average. A bold Hebrew font, size 24, was used. The task consisted of pressing a "yes" key if speech was detected in the noise, and a "no" key if it was not. The dominant hand was always used for the "yes" responses. Although the task was introduced as purely auditory, the subjects were requested to attend carefully to the screen as well. They were told in the instructions that, when a word or a nonword was presented on the screen, it was sometimes similar to the speech or noise presented auditorily, and sometimes not. However, they were informed about the equal proportions of "yes" and "no" trials in each of the different visual conditions.
The tape containing the auditory stimuli was placed on a two-channel Otari MX5050 tape recorder. The verbal stimuli were transmitted to the subject's headphones through one channel, and the trigger tones were transmitted through the other channel to an interface that was directly connected to the Macintosh, where they triggered the visual presentation.

The experimental session began with 24 practice trials, after which the 144 experimental trials were presented in one block.
Results and Discussion

The indices of bias in the different experimental conditions were computed following the procedure suggested by Luce (1963). Results computed
according to Luce's procedure tend to be very similar to results produced by the standard signal detection computations (e.g., Wood, 1976). However, Luce's indices do not require any assumptions about the shapes of the underlying signal and noise distributions, and are easier to compute than the standard measures of signal detection theory. The Luce indices of bias and sensitivity, originally named ln b and ln η but renamed here for convenience b and d, are:
b = 1/2 ln [ (p(yes|s+n) p(yes|n)) / (p(no|s+n) p(no|n)) ],

and

d = 1/2 ln [ (p(yes|s+n) p(no|n)) / (p(yes|n) p(no|s+n)) ],
where s+n and n stand for speech-plus-noise and noise-only, respectively. The index b assumes positive values for a tendency to say "yes" and negative values for a tendency to say "no." For example, according to the above formula, in order to obtain an average b of +0.5, the subject must generate on the average 60 percent more positive responses than negative ones. The index d assumes values in the same general range as the d' of signal detection theory, with zero representing chance performance.

The average values for the bias indices in each experimental condition are shown in Table 1 (top). There was a bias to say "yes" in the matching condition for high-frequency and for low-frequency words, whereas there was no bias in the nonmatching condition. The bias effect found for high-frequency words was not stronger than that for low-frequency words. In fact, the opposite pattern was obtained. In contrast to the high- and the low-frequency words, there was no bias to say "yes" for nonwords in the matching condition. There was, however, a bias to say "no" in the nonmatching condition.

The bias indices were subjected to a two-way analysis of variance with the factors of word type (high-frequency words, low-frequency words, and nonwords) and visual condition (matching print, nonmatching print). The main effects of word type and visual condition were significant (F(2,46)=17.3, MSe=0.48, p<0.001, and F(1,23)=22.0, MSe=0.64, p<0.001, respectively). The two-way interaction was also significant (F(2,46)=6.5, MSe=0.19, p<0.003). A Tukey post-hoc analysis revealed that the differences in bias between either type of words and the nonwords were reliable, as was the difference between the high- and the low-frequency words (p<0.05). The apparently greater bias to say "no" in the nonmatching condition relative to the matching condition for the nonwords was not significant.

Table 1. Bias indices (b), and (standard errors of the means), for high-frequency words, low-frequency words, and nonwords, when matching and nonmatching print is presented simultaneously with masked speech. Print is presented unvoweled. The top b indices were averaged over all subjects, whereas the bottom b indices were averaged over the 12 subjects with the highest detectability scores (d).

                      High-Frequency    Low-Frequency    Nonwords
                      Words             Words

All subjects (average d = 0.15, n=24)
  Match                0.55 (0.14)       0.94 (0.18)     -0.19 (0.14)
  No Match            -0.11 (0.12)       0.02 (0.14)     -0.48 (0.13)

Highest-d subjects (average d = 0.32, n=12)
  Match                0.57 (0.20)       0.99 (0.19)     -0.20 (0.15)
  No Match            -0.35 (0.17)      -0.15 (0.13)     -0.67 (0.17)

The average d in the experiment was 0.15. Hence, the signal-to-noise ratio which was employed in the experiment resulted in a very low level of detection.2 In order to ensure that the obtained pattern of bias was not affected by the low detection level, the subject sample was split in half, and the average b indices were recomputed for those subjects with the highest d. The average b values for this sample in the different experimental conditions are presented in Table 1 (bottom), and confirm that the bias was unaffected by the level of detection. This outcome is in accordance with results presented by Frost and his colleagues showing significant bias effects over a wide range of signal-to-noise ratios.3

The data of Experiment 1 thus reveal that the bias in the visual matching condition was obtained even in the unvoweled Hebrew orthography. However, similar to our previous study, this effect can be demonstrated only when legal words are presented in the visual modality. The bias effect was not reduced when the printed words were relatively infrequent. This suggests that the speed by which the phonetic structure of the word is retrieved does not affect the illusion of hearing speech in the noise in the matching condition.4
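Luce's b and d defined above can be computed directly from the hit rate p(yes|s+n) and the false-alarm rate p(yes|n) observed in a given condition. A minimal sketch (the function name is introduced here for illustration):

```python
import math

def luce_indices(p_yes_sn, p_yes_n):
    # Luce's (1963) bias (b) and sensitivity (d) from the hit rate p(yes|s+n)
    # and the false-alarm rate p(yes|n); rates must lie strictly in (0, 1).
    p_no_sn = 1.0 - p_yes_sn
    p_no_n = 1.0 - p_yes_n
    # b > 0: overall tendency to say "yes"; b < 0: tendency to say "no".
    b = 0.5 * math.log((p_yes_sn * p_yes_n) / (p_no_sn * p_no_n))
    # d: sensitivity, in the same general range as d'; 0 = chance.
    d = 0.5 * math.log((p_yes_sn * p_no_n) / (p_yes_n * p_no_sn))
    return b, d

# Equal hit and false-alarm rates of 0.5 give no bias and no sensitivity:
b, d = luce_indices(0.5, 0.5)  # b == 0.0, d == 0.0
```

Raising both rates together moves only b (pure response bias, as in the matching-print conditions), whereas separating the hit rate from the false-alarm rate moves only d, which is why the two indices cleanly dissociate the illusion from genuine detection.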
The unexpectedly stronger bias effect obtained for the low-frequency words may possibly be related to the phonetic features of the words employed. This possibility will be considered further in the General Discussion.
The most significant outcome of the experiment is that there was no bias in the matching condition for nonwords. Since in the present study a mixed design was employed, this effect cannot be attributed to a uniform "set" strategy adopted for the nonwords. Although there was a greater tendency to say "no" when nonwords appeared in the nonmatching condition relative to the matching condition, this tendency was not found to be statistically reliable. These results suggest, then, that in contrast to words, the presentation of printed nonwords did not easily invoke a phonetic representation which could be compared to the amplitude envelopes presented auditorily. Hence, the outcome of Experiment 1 lends support to the hypothesis that the bias to say "yes" in the matching condition for words only, regardless of their frequency, results from the automatic retrieval of their amplitude envelopes from the lexicon. This process does not appear to be affected by factors related to the speed of generating a phonetic code.
EXPERIMENT 2

One possible criticism of the results of Experiment 1 is that in the unvoweled Hebrew orthography the phonemic structure of printed words can be retrieved from the mental lexicon following visual access. In contrast, the phonemic structure of nonwords cannot be determined unequivocally, since the printed consonants do not specify how exactly a nonword should be read. It might be argued that this caused the different pattern of bias found for words relative to nonwords. According to this interpretation, the amplitude envelopes of words were not stored as such in the lexicon, but generated on-line from more abstract phonologic or phonetic structures which were retrieved post-lexically for the words. Because nonwords are not represented in the lexicon, and because the complete phonetic structure of the nonwords was not specified by the unvoweled print, no bias was obtained for nonwords.
In order to ascertain that this factor did not affect our previous findings, in Experiment 2 the bias effect was measured when the printed stimuli were voweled. With the addition of the diacritical vowel marks to the consonants, the Hebrew orthography is as shallow as other orthographies which have a clear and unequivocal mapping of spelling to sound (e.g., Serbo-Croatian). The marks convey the full phonemic information that is necessary to produce a pre-lexical phonologic code for both words and nonwords. Therefore, the explicit presentation of vowels eliminates the superiority of words over nonwords in regard to phonologic and phonetic processing: phonologic recoding of both words and nonwords can occur easily and unequivocally through a fast pre-lexical process by applying grapheme-to-phoneme correspondence rules, and a phonetic representation that includes the word's amplitude envelope may be generated subsequently from the pre-lexical phonologic representation. If, indeed, an amplitude envelope can be formed on-line from such pre-lexical representations, then the addition of vowel marks should produce a bias effect for nonwords as well as for words in the matching condition.
Method

Subjects. Twenty-four undergraduate students, all native speakers of Hebrew, participated in the experiment for course credit or for payment. None of the subjects participated in Experiment 1.

The design, procedure, and apparatus were identical to Experiment 1, except that the printed words and nonwords were voweled by adding their diacritical marks.
Results and DiscussionThe bias indices are presented in Table 2. As in
Experiment 1, there was a bias to say "yes" in thematching condition for high- and for low-frequencywords, but no positive bias whatsoever for nonwords. There was no positive bias in the nonmatching condition for words. However, similar tothe pattern obtained in Experiment 1, there was abias to say "no" in the nonmatching condition fornonwords. The b indices were subjected to a twoway ANOVA with the factors of word type and visual presentation. The main effects of word typeand visual presentation were significant(F(2,46)=10.8, MSe=O.5, p<O.001, F(l,23)=20.2,
Table 2. Bias indices and (standard error of the means) for high-frequency words, low-frequency words, and nonwords, when matching and nonmatching print is presented simultaneously with masked speech. Print is presented voweled. Average d = 0.28 (n=24).
MSe=0.99, p<0.001, respectively). The interaction of word type and visual presentation was significant (F(2,46)=3.30, MSe=0.15, p<0.04). A Tukey post-hoc analysis revealed that the differences in bias between either type of words and the nonwords were significant (p<0.05). The greater bias for a "no" response found for nonwords in the nonmatching condition relative to the matching condition was significant as well. The difference in bias between the high- and the low-frequency words was not statistically reliable.
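For readers unfamiliar with these indices, the separation of detectability from response bias can be illustrated with a short sketch. An assumption to flag: the study computes its b index following Luce (1963), whereas the function below computes the more common Gaussian measures d' and criterion c, a related but different pair of statistics, shown only to illustrate how sensitivity and bias are disentangled; the function name and the example rates are illustrative.

```python
from statistics import NormalDist

def detection_indices(hit_rate, fa_rate):
    """Gaussian signal-detection sensitivity (d') and criterion (c).

    Illustrative only: the study uses Luce's (1963) b index, not c.
    Rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)       # detectability
    c = -0.5 * (z(hit_rate) + z(fa_rate))    # bias: c < 0 means a "yes" bias
    return d_prime, c
```

With equal hit and false-alarm rates, d' is zero (no detectability) while c can still reveal a response bias, which is exactly the dissociation the matching-print conditions exploit.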
The results of Experiment 2 suggest that the addition of vowel marks did not produce a bias to say "yes" in the matching condition for nonwords. It could be pointed out that there was a significantly greater tendency to say "no" when nonwords appeared in the nonmatching condition relative to the matching condition. However, even if the absolute difference between the two visual conditions serves as a measure of the effect, this difference was almost twice as large for words as for nonwords, as revealed by the significant two-way interaction. Thus, although the vowel marks conveyed an unequivocal phonemic structure for the printed nonwords, and allowed the generation of a phonological representation for both words and nonwords, the difference in bias between words and nonwords remained unchanged. This suggests that the phonetic representation that includes the amplitude envelope information was available only for words to influence the subjects' judgment. The overall similarity in the effects of bias in Experiments 1 and 2 is striking. This outcome confirms that the bias is independent of the print's spelling-to-sound regularity, and provides additional support for the claim that the effect is lexically mediated.
General Discussion
The present study investigated the source of readers' ability to detect a correspondence between a printed word and its amplitude envelope. Experiment 1 revealed that matching print caused a bias to detect speech in a noise amplitude envelope, even in the unvoweled Hebrew orthography. This effect of bias was demonstrated only for words, whether high- or low-frequency, and not for nonwords. In Experiment 2 we found an identical pattern of bias when the printed words were voweled, and therefore phonologically unequivocal. All voweled words produced the effect, but not the voweled nonwords. Moreover, the overall difference between the matching and the nonmatching conditions was much larger for words than for nonwords.
The bias to perceive speech embedded in amplitude-modulated noise derives from an automatic detection of correspondence between the printed letter string and the speech envelope related to it. The present study was concerned with how exactly this correspondence is detected. In order to match the visual to the auditory information, subjects had to generate from the print the relevant amplitude envelope. We examined whether this can be done by simply applying spelling-to-sound conversion rules to assemble a phonologic representation, and by generating the envelope on-line from a phonetic structure that is contingent on the phonologic representation derived from the print. The results of both experiments suggest that it cannot. Subjects did not show any bias to detect speech in the noise in the matching condition when nonwords were presented. Note that Frost et al. (1988) did find a small bias effect for nonwords in the matching condition. This small effect for nonwords is reflected in the present study by the greater tendency to say "no" in the nonmatching condition relative to the matching condition. This tendency might be related to the overall lower detectability level obtained in the present study. In any event, the absolute difference in bias in the matching relative to the nonmatching condition was much larger for words than for nonwords.
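The amplitude-modulated noise in this paradigm preserves the word's envelope while masking its spectral fine structure. A minimal sketch of one standard way to generate such a stimulus, signal-correlated noise in the sense of Schroeder (1968, see References), is given below; the function name is illustrative, and the exact procedure used in the present study is not detailed in this section.

```python
import numpy as np

def signal_correlated_noise(waveform, rng=None):
    """Flip the sign of each sample at random, keeping its magnitude.

    The result has a flat, noise-like spectrum but retains the
    original amplitude envelope, which is the property the speech
    detection task exploits (cf. Schroeder, 1968). Illustrative sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice(np.array([-1.0, 1.0]), size=waveform.shape)
    return waveform * signs
```

Because only the sample magnitudes survive, any phonemic detail a listener reports hearing in such noise must be supplied by the accompanying print, which is what makes the bias measure informative.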
Although a phonetic representation could have been easily generated from the printed nonwords when they were voweled, the difference in bias
              High-Frequency   Low-Frequency   Nonwords
Match              0.67             0.84          0.00
                  (0.16)           (0.18)        (0.15)
No Match          -0.00            -0.16         -0.50
                  (0.15)           (0.15)        (0.13)
between words and nonwords remained unchanged. This outcome suggests that the bias effect is independent of spelling-to-sound regularity. Moreover, the effect seemed unaffected by the speed of print processing. Since the phonetic representation of low-frequency words is slower to generate from the print, the strong bias effect found for low-frequency words relative to high-frequency words suggests that speed of print processing is not a crucial determinant of the effect. Note that the addition of vowels in Hebrew was previously shown to accelerate the phonologic processing of low-frequency words more than that of high-frequency words (Koriat, 1985). Nevertheless, the bias effect found for low-frequency words did not increase in the voweled condition relative to the unvoweled condition.
The stronger bias obtained for the low-frequency words might be related to the phonetic features of the stimuli employed. The magnitude of the bias effect depends, among other things, on the distinctiveness (or uniqueness) of the amplitude envelope, which affects the clarity of correspondence between the amplitude envelope presented auditorily and the word depicted by the print. It is possible that for some low-frequency words this correspondence was exceptionally clear. Recent results by Frost (submitted) support the conclusion that the bias effect is not affected by word frequency per se. In this study the bias for high- and low-frequency phonological alternatives of heterophonic homographs was examined, with an identical signal-to-noise ratio. The results demonstrated very similar bias effects for the high- and the low-frequency phonological alternatives (0.55 and 0.51, respectively).
Taken together, the results of Experiments 1 and 2 suggest that the effect of bias reported in the present and in previous studies is lexically mediated. We assume that the printed word addressed a lexical entry which contained, among other phonologic and phonetic information, the word's amplitude envelope. Thus, the envelope was retrieved from the mental lexicon. On this view, a strong effect of bias can be shown only if the printed letter string can be related to an existing lexical entry. Nonwords do not satisfy this requirement, and therefore did not produce the effect to the same extent.
The conclusion that envelopes are stored as lexical representations is supported by a recent study that examined the influence of lipreading on the detection of speech in noise (Repp, Frost, & Zsiga, 1991). This study examined the effect of a visual presentation of a speaker's face on the detection of words and nonwords in amplitude-modulated noise. The results demonstrated that an audio-visual match created a strong bias to respond "yes" when the stimuli were words, whereas no bias emerged when the stimuli were nonwords. In contrast to orthographic information, there is a natural isomorphism between some visible articulatory movements and some acoustic properties of speech. Thus, the relations of articulatory movements to phonological and phonetic structure are nonarbitrary, and the correspondence between articulatory information and amplitude envelopes may be perceived without lexical mediation. Nevertheless, subjects did not produce any bias in the matching condition for nonwords.
The proposal that amplitude envelopes are contained as holistic acoustic patterns in the mental lexicon is consistent with a view that lexical representations of spoken or printed words are not exclusively phonologic. Models of speech perception often assume that the speech processing system transforms the physical acoustic pattern into a more abstract linguistic representation which makes contact with the lexicon during word recognition. Regardless of the nature of this representation (i.e., what specific unit serves for activating a lexical candidate), lexical access is often viewed as a process which mediates access to more abstract linguistic information (e.g., Mehler, 1981; Pisoni & Luce, 1987). Our present results seem to suggest that the information contained in the lexicon is richer. In the present study, the presentation of a printed word resulted in the retrieval of an acoustic template, the word's amplitude envelope, from the lexicon.
At first glance, storing the word's amplitude envelope as a holistic pattern might seem to be without apparent benefits. The envelopes cannot identify a specific lexical candidate. However, they do convey prosodic and segmental information (e.g., speech timing, number of syllables, relative stress, and several major classes of consonant manner) that might help in selecting a lexical candidate among a highly constrained set of response alternatives (Van Tasell et al., 1987). Thus, the amplitude envelope might serve as additional information used by the listener in order to identify spoken words which have several acoustic realizations, or whose phonemic structure was not clearly conveyed (cf. Gordon, 1988). In these cases, a match between the
perceived amplitude envelope and the stored template might confirm the identity of a lexical candidate. Clearly, richer representations do not constitute a parsimonious storage system. Nevertheless, the advantage of a more complex representational system is that it often allows more efficient performance by the native speaker/listener.
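The envelope information discussed above (timing, syllable count, relative stress) survives even a very simple extraction procedure. A minimal sketch follows, assuming a 16 kHz sampling rate and an illustrative 10 ms smoothing window; the study does not specify its own extraction parameters.

```python
import numpy as np

def amplitude_envelope(waveform, window=160):
    """Crude amplitude envelope: full-wave rectification followed by a
    moving-average smoother. A 160-sample window corresponds to 10 ms
    at 16 kHz; both choices are illustrative, not the study's own."""
    rectified = np.abs(np.asarray(waveform, dtype=float))
    kernel = np.ones(window) / window
    return np.convolve(rectified, kernel, mode="same")
```

Even this crude estimate flattens the phonemic detail while keeping the coarse temporal shape, which is why an envelope can corroborate, but not by itself select, a lexical candidate.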
One possible role of amplitude envelopes can be suggested in regard to the psychological distinction between words and nonwords. It is often assumed that positive lexical decisions given to a letter string or to a spoken phonemic sequence are based on their relation to a semantic representation, whereas negative decisions result from the lack of such connections to the semantic network. In other words, positive and negative decisions are related to the meaningfulness of the presented stimuli. The results of the present study suggest a possibly different type of criterion. If words address stored amplitude envelopes and nonwords do not, fast lexical decisions might be based, at least in part, on whether the printed letter string invoked a detailed phonetic representation such as the amplitude envelope. According to this interpretation, one factor that differentiates between words and nonwords, and contributes to the word/nonword differences in the lexical decision task, is the generation of a phonetic code that contains envelope information. This suggestion, however, remains speculative and deserves further investigation.
The present study has additional relevance to old and recent debates concerning the processing of printed words. Models of printed word recognition are in disagreement concerning the extent of phonological recoding during visual word recognition. One important controversy relates to the automaticity of phonologic recoding. It is often assumed that phonologic recoding is very slow to develop, and that lexical access occurs (with the possible exception of very infrequent words) directly from the visual structure of the printed words to meaning. This view is supported by results demonstrating that spelling-to-sound regularity affects lexical decisions only for low-frequency words (e.g., Seidenberg et al., 1984). In contrast, several studies have suggested that phonologic information is available very rapidly as part of visual access to the lexicon (Perfetti et al., 1988; Van Orden et al., 1988). Perfetti and his colleagues have shown that the effect of a pseudoword mask on the perception of a target word was reduced if there was a phonemic similarity between mask and target (e.g., "made," "mayd"). Using a different experimental technique, Van Orden (1987) and Van Orden et al. (1988) showed that when subjects had to decide whether a visually presented word belonged to a semantic category, they often made errors to homophones or pseudohomophones of category instances (e.g., positive responses were given to "rows" in the category of flowers, or to "sute" in the category of clothing). These results were taken to demonstrate that phonologic recoding occurs automatically and pre-lexically during lexical access.
The results of the present study support the view that phonetic recoding occurs automatically following the presentation of a printed word. What our results teach us is that the processing of a printed word results not only in a pre-lexical phonologic representation but also in a very detailed phonetic speech representation that is lexical and includes the word's amplitude envelope. This representation is automatically retrieved from the lexicon. Note that in this and previous studies which used the speech detection technique, subjects were not required to respond to the printed information. Nevertheless, they automatically detected the correspondence between the visual stimulus and the speech envelope.
In summary, the present study suggests that the presentation of words in the visual or the auditory modality results in the generation of a rich array of orthographic, phonologic, and phonetic representations. One of these representations is the word's amplitude envelope. Because each of these representations may contact the mental lexicon, auditory illusions can be caused by visual printed information. The bias to detect speech in noise is caused by matching print because printed information arouses very detailed speech codes.
REFERENCES
Bentin, S., Bargai, N., & Katz, L. (1984). Orthographic and phonemic coding for lexical access: Evidence from Hebrew. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 353-368.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing (pp. 151-216). London: Academic Press.
Dijkstra, T., Schreuder, R., & Frauenfelder, U. H. (1989). Grapheme context effects on phonemic processing. Language and Speech, 32, 89-108.
Frauenfelder, U. H., & Tyler, L. K. (1987). The process of spoken word recognition: An introduction. Cognition, 25, 1-20.
Frost, R. (submitted). Phonetic recoding of phonologically ambiguous printed words: Evidence from a speech detection task.
Frost, R. (in press). Orthography and phonology: The psychological reality of orthographic depth. In M. Noonan, P. Downing, & S. Lima (Eds.), Literacy and linguistics. Benjamins.
Frost, R., & Katz, L. (1989). Orthographic depth and the interaction of visual and auditory processing in word recognition. Memory & Cognition, 17, 302-311.
Frost, R., Katz, L., & Bentin, S. (1987). Strategies for visual word recognition and orthographical depth: A multilingual comparison. Journal of Experimental Psychology: Human Perception and Performance, 13, 104-115.
Frost, R., Repp, B. H., & Katz, L. (1988). Can speech perception be influenced by a simultaneous presentation of print? Journal of Memory and Language, 27, 741-755.
Glushko, R. J. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674-691.
Gordon, P. C. (1988). Induction of rate-dependent processing by coarse-grained aspects of speech. Perception & Psychophysics, 43, 137-146.
Hanson, V. L. (1981). Processing of written and spoken words: Evidence for common coding. Memory & Cognition, 9, 93-100.
Hillinger, M. L. (1980). Priming effects with phonemically similar words: The encoding bias hypothesis reconsidered. Memory & Cognition, 8, 115-123.
Jakimik, J., Cole, R. A., & Rudnicky, A. I. (1980). The influence of spelling on speech perception. Paper presented at the XXI annual meeting of the Psychonomic Society, St. Louis, Missouri.
Jusczyk, P. W. (1985). Auditory versus phonetic coding of speech signals during infancy. In J. Mehler, M. Garrett, & E. Walker (Eds.), Perspectives in mental representation: Experimental and theoretical studies of cognitive processes and capacities. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kirsner, K., Milech, D., & Standen, P. (1983). Common and modality-specific processes in the mental lexicon. Memory & Cognition, 11, 621-630.
Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical access. In R. A. Cole (Ed.), Perception and production of fluent speech. Hillsdale, NJ: Erlbaum.
Klatt, D. H. (1989). Review of selected models of speech perception. In W. Marslen-Wilson (Ed.), Lexical representation and process. Cambridge, MA: MIT Press.
Koriat, A. (1985). Lexical access for low- and for high-frequency words in Hebrew. Memory & Cognition, 13, 37-44.
Luce, R. D. (1963). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.
Layer, J. K., Pastore, R. E., & Rettberg, E. (1990). The influence of orthographic information on the identification of an auditory speech event. Paper presented at the 119th annual meeting of the Acoustical Society of America, Pennsylvania.
Mehler, J. (1981). The role of syllables in speech processing: Infant and adult data. Philosophical Transactions of the Royal Society, B295, 333-352.
Perfetti, C. A., Bell, L. C., & Delaney, S. M. (1988). Automatic (prelexical) phonetic activation in silent word reading: Evidence from backward masking. Journal of Memory and Language, 27, 59-70.
Pisoni, D. B., & Luce, P. A. (1987). Acoustic-phonetic representations in word recognition. Cognition, 25, 21-52.
Repp, B. H., Frost, R., & Zsiga, E. (1991). Lexical mediation between sight and sound in speech reading. Haskins Laboratories Status Report on Speech Research, SR-107/108, 243-254.
Schroeder, M. R. (1968). Reference signal for signal quality studies. Journal of the Acoustical Society of America, 43, 1735-1736.
Seidenberg, M. S. (1985). The time course of phonological code activation in two writing systems. Cognition, 19, 1-30.
Seidenberg, M. S., Waters, G. S., Barnes, M., & Tanenhaus, M. K. (1984). When does irregular spelling or pronunciation influence word recognition? Journal of Verbal Learning and Verbal Behavior, 23, 383-404.
Tanenhaus, M. K., Flanigan, H. P., & Seidenberg, M. S. (1980). Orthographic and phonological activation in auditory and visual word recognition. Memory & Cognition, 8, 513-520.
Van Orden, G. C. (1987). A ROWS is a ROSE: Spelling, sound, and reading. Memory & Cognition, 15, 181-198.
Van Orden, G. C., Johnston, J. C., & Hale, B. L. (1988). Word identification in reading proceeds from spelling to sound to meaning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 371-386.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.
Van Tasell, D. J., Soli, S. D., Kirby, V. M., & Widin, G. P. (1987). Speech waveform envelope cues for consonant recognition. Journal of the Acoustical Society of America, 82, 1152-1161.
Venezky, R. L. (1970). The structure of English orthography. The Hague: Mouton.
Wood, C. C. (1976). Discriminability, response bias, and phoneme categories in discrimination of voice onset time. Journal of the Acoustical Society of America, 60, 1381-1389.
FOOTNOTES
*Cognition, in press.
†Department of Psychology, The Hebrew University, Jerusalem, Israel.
1A demonstration of this form of ambiguity may be portrayed in English by the following example: The consonantal string "bttr" may be read as "better," "butter," "bitter," or "batter," all of which are meaningful words. In addition, many other vowel configurations could be added to the consonants to form nonwords. The Hebrew reader is faced with this form of phonological ambiguity regularly in the unvoweled orthography. The addition of the diacritical marks specifies uniquely one phonological alternative.
2The discriminability indices obtained in the present experiments were lower than those obtained by Frost et al. (1988), with a comparable signal-to-noise ratio. This difference may possibly be attributed to differences in the spoken stimuli from which the envelopes were generated. In the present study the spoken stimuli were recorded by a male speaker, whereas Frost et al. (1988) employed stimuli recorded by a female speaker. The detection of speech in amplitude-modulated noise is achieved by perceiving local spectral peaks that rise above the flat spectral level represented by the masking noise. Such peaks are more salient with a female speaker because of the higher frequencies that are characteristic of female voices.
3Although Frost et al. (1988) showed significant bias effects over a wide range of signal-to-noise ratios, they found reduced bias indices at the lowest ratios. Hence, it is possible that the bias values obtained in the present study were lower than those obtained by Frost et al. (1988) because of the lower level of detection in the present experiment. Note, however, that the interpretation of the results is unaffected by the overall bias values, since it is concerned with the differences in bias between the matching and the nonmatching conditions.
4Throughout this paper the assumption is made that the processing of printed low-frequency words is slower than the processing of printed high-frequency words. This assumption is supported by the well documented frequency effect in visual word recognition, but was not directly examined in the present
study (subjects were not required to convey their decisions under any time constraints). Decision latencies in the speech detection task do not reflect exclusively the speed of processing the printed words, but also the complexity of processing the auditory stimuli. Hence, the monitoring of reaction times in this task does not necessarily portray the speed of processing the printed information. Note, however, that the set of stimuli employed in Experiments 1 and 2 is a subset of the stimuli examined by Frost, Katz, and Bentin (1987) in the lexical decision and naming tasks. The frequency effect obtained by Frost et al. (1987) was over 100 ms, supporting the assumption that the processing of the low-frequency words was indeed slower.