
Journal of Speech and Hearing Research, Volume 33, 103-115, March 1990

LISTENER EXPERIENCE AND PERCEPTION OF VOICE QUALITY

JODY KREIMAN
BRUCE R. GERRATT
KRISTIN PRECODA
VA Medical Center, West Los Angeles, and UCLA School of Medicine

Five speech-language clinicians and 5 naive listeners rated the similarity of pairs of normal and dysphonic voices. Multidimensional scaling was used to determine the voice characteristics that were perceptually important for each voice set and listener group. Solution spaces were compared to determine if clinical experience affects perceptual strategies. Naive and expert listeners attended to different aspects of voice quality when judging the similarity of voices, for both normal and pathological voices. All naive listeners used similar perceptual strategies; however, individual clinicians differed substantially in the parameters they considered important when judging similarity. These differences were large enough to suggest that care must be taken when using data averaged across clinicians, because averaging obscures important aspects of an individual's perceptual behavior.

KEY WORDS: voice, vocal quality, perception of voices, listener perceptions

© 1990, American Speech-Language-Hearing Association 0022-4685/90/3301-0103$01.00/0

Downloaded From: http://jslhr.pubs.asha.org/pdfaccess.ashx?url=/data/journals/jslhr/929128/ by University of California, Los Angeles - YRL Serials Dept, Jody Kreiman on 07/14/2017
Terms of Use: http://pubs.asha.org/ss/rights_and_permissions.aspx

Surprisingly little is known about the cognitive and perceptual processes underlying voice discrimination and recognition, despite a long history of active research in this area (see Bricker & Pruzansky, 1976; Hecker, 1971, for review). Most studies of voice perception have focused on stimulus characteristics, rather than on listener behavior. Researchers traditionally have favored designs in which stimulus conditions are varied and differences in listener performance are measured as a function of these variations. Such studies argue that changes in recognition and discrimination performance emerge because these stimulus dimensions are important for voice perception or recognition.

Few studies have examined variations in the perceptual strategies used by different listeners or listener groups to evaluate voice quality. These have failed to detect differences. Murry, Singh, and Sargent (1977) examined the relative perceptual importance of different voice characteristics by informally comparing subject weights for highly and moderately experienced clinicians in a multidimensional scaling study of pathological voice quality. No apparent group differences were observed. Kempster (1984) also compared subject weights in another multidimensional scaling study of abnormal voice quality. She found good agreement among listeners (graduate students in speech-language pathology) on the relative importance of the obtained perceptual characteristics, with 19 of 25 listeners relying most heavily on the first perceptual dimension. Her listeners did differ in their relative reliance on the second and third dimensions in the study, however. Kreiman (1987; Kreiman & Papcun, 1986) used multidimensional scaling to argue that listeners who differed significantly in discrimination accuracy did not differ correspondingly in their perceptual strategies. In all these studies, it is possible that differences among listeners in perceptual strategies have failed to emerge simply because the investigators did not use research designs that were sensitive to such differences. For example, it is not clear that the Murry et al. (1977) or Kreiman (1987) listeners differed enough in their level of experience for differences in perceptual strategies to occur.

A recently proposed model of long-term memory for voice quality (Papcun, Kreiman, & Davis, 1989) suggests that such differences should exist. This model states that listeners code voice information in terms of a prototype or central category member, and a set of deviations from that prototype. A number of studies using artificial visual stimuli (e.g., Homa, Cross, Cornell, Goldman, & Schwartz, 1973; Posner & Keele, 1968, 1970; see Mervis & Rosch, 1981, for review) have demonstrated that prototypes are built up with repeated exposure to a class of stimuli: Subjects who have seen sets of patterns that vary around prototypical values "recognize" the prototypes (which they have not in fact seen) with greater certainty than the figures with which they were actually trained. Grieser and Kuhl (1989) recently have demonstrated that infants as young as 6 months have developed auditory prototypes for vowels. Papcun et al. (1989) contend that listeners, by virtue of their life-long experience with voices, have developed central category members for vocal quality and use them when judging or remembering voices. Because prototypes derive from perceptual experience, listeners who differ significantly in experience presumably would differ in perceptual strategy.

In this study, we used multidimensional scaling to determine the characteristics of dysphonic and normal voices that are perceptually important for listeners with and without clinical training. We hypothesized that the perceptual dimensions used by naive listeners to evaluate vocal qualities would differ significantly from those used by listeners with extensive training in the clinical evaluation of voices.

METHOD

Voice Samples

The voices of 18 male speakers with voice disorders were selected from a library of 67 audio recordings. Because we were interested in comparing listener groups, rather than in the specific perceptual qualities of the stimulus voices per se, voice selection was random, although mildly and severely disordered vocal qualities (as judged by the second author) were approximately equally represented. Samples of 18 normal male voices were also selected at random from a similar library of samples. No attempt was made to match normal and pathological speakers. All speakers were originally recorded using a Bruel and Kjaer condenser microphone and a reel-to-reel tape recorder (Revox B77, model MKII). Speakers were asked to sustain the vowel /a/ as long as possible at a conversational level of pitch and loudness. Only native speakers of American English served as speakers.

Voice samples were lowpass filtered using two 4-pole Butterworth filters with cutoff frequencies of 6300 Hz, and two with cutoff frequencies of 7500 Hz, for a total reduction in amplitude of 3.2 dB at 5.6 kHz and 39.4 dB at 9 kHz. They were then sampled at 17.8 kHz using a 16-bit A/D converter. A 1.67-second sample was taken from the middle portion of each speaker's /a/. These digitized segments were normalized for peak voltage, and onsets and offsets were multiplied by a 10-ms ramp to eliminate click artifacts. Stimuli were then output through a 16-bit D/A converter using the same antialiasing filters. An experimental tape was constructed for each set of voices (normal and disordered). Each tape included both orders (AB and BA) of all possible pairs of the 18 voices, for a total of 306 trials per voice set. Voice samples within a pair were separated by 1 second; pairs were separated by 6 seconds. All listeners heard the voice pairs in the same random order; again, because our primary interest was in comparing listener groups, and because both orders of all pairs were used, it was judged unnecessary to re-randomize stimuli for each listener. Each voice pair was preceded on the experimental tape by its consecutive number.
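The stimulus preparation described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' software: the function names are hypothetical, and the ramp is assumed to be linear because the text gives only its 10-ms duration, not its shape.

```python
from itertools import permutations


def ordered_pairs(n_voices):
    """All ordered pairs (A, B) with A != B, so every pair occurs as AB and BA."""
    return list(permutations(range(n_voices), 2))


def peak_normalize(samples, peak=1.0):
    """Scale a segment so that its largest absolute value equals `peak`."""
    largest = max(abs(s) for s in samples)
    if largest == 0:
        return list(samples)
    return [s * peak / largest for s in samples]


def apply_linear_ramps(samples, sample_rate_hz, ramp_ms=10.0):
    """Fade the onset in and the offset out (linear shape is an assumption)."""
    n_ramp = int(round(sample_rate_hz * ramp_ms / 1000.0))
    n_ramp = min(n_ramp, len(samples) // 2)
    out = list(samples)
    for i in range(n_ramp):
        gain = i / n_ramp  # 0 at the very edge, rising toward 1
        out[i] *= gain
        out[-1 - i] *= gain
    return out


trials = ordered_pairs(18)
print(len(trials))  # 18 * 17 = 306 trials per voice set
```

Using ordered rather than unordered pairs is what yields 306 trials (18 × 17) instead of 153, and it is also what later allows the AB and BA ratings of a pair to be compared.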

Acoustic Measures and Perceptual Ratings

Both time- and frequency-domain measurements were made on each voice sample, for use in interpreting the derived perceptual dimensions. These measures are routinely made on voice samples recorded in our laboratory and together generally provide a good description of voices (see, e.g., Baken, 1987). The fundamental frequency (F0) and the frequencies of the first three formants (F1, F2, and F3) were measured from spectrographic displays (Kay Elemetrics Model 5500). F0 was measured from narrow-band displays with a frequency range of 0-1 kHz; the center frequencies of the three clearest harmonics were measured to ensure accuracy. Formants were measured with reference to both narrow- and wide-band displays (with a frequency range of 0-4 kHz), and to displays of line spectra of the vowels. Measurements were taken from sections of the display where the formants appeared most steady and level.

¹ The voice of 1 pathological speaker was clearly diplophonic. Thus, only formant measurements and perceptual ratings were available for him.

For jitter and shimmer measurements, a point on each waveform cycle that could be identified reliably from cycle to cycle was selected by hand. Measurements of mean jitter, standard deviation of jitter, percent jitter, directional jitter, and the coefficient of variation for jitter were then calculated using parabolic interpolation when the point marked was a peak and linear interpolation when the zero crossing was marked (Titze, Horii, & Scherer, 1987). Analogous shimmer measures were also calculated, using the difference in dB between the highest and lowest points in each marked cycle as the amplitude (except that percent shimmer was not calculated, because these measures were already normalized by the use of dB).
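As a rough illustration of the jitter measures listed above, the sketch below computes them from a list of cycle periods. The formulas are commonly used textbook definitions and are assumptions here; the text does not give the exact formulas the authors used, and the hand marking and interpolation steps are not reproduced. The function name is hypothetical.

```python
from statistics import mean, stdev


def jitter_measures(periods_ms):
    """Cycle-to-cycle period perturbation measures (assumed definitions)."""
    if len(periods_ms) < 3:
        raise ValueError("need at least three periods")
    diffs = [b - a for a, b in zip(periods_ms, periods_ms[1:])]
    abs_diffs = [abs(d) for d in diffs]
    mean_jitter = mean(abs_diffs)
    sd_jitter = stdev(abs_diffs)
    # Directional jitter: share of successive differences that change sign.
    sign_changes = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return {
        "mean_jitter_ms": mean_jitter,
        "sd_jitter_ms": sd_jitter,
        "percent_jitter": 100.0 * mean_jitter / mean(periods_ms),
        "directional_jitter_pct": 100.0 * sign_changes / (len(diffs) - 1),
        "cv_jitter": sd_jitter / mean_jitter if mean_jitter else float("nan"),
    }


# Hypothetical periods (ms) for a voice near 100 Hz.
measures = jitter_measures([10.0, 10.2, 9.9, 10.1, 10.0])
for name, value in measures.items():
    print(name, round(value, 3))
```

The same function could be applied to per-cycle amplitudes in dB to obtain shimmer analogues, leaving out the percent measure as the text describes.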

Several additional acoustic measures were also obtained. The natural logarithm of the standard deviation of the period lengths (LNSD; see Wolfe & Steinfatt, 1987) was calculated for each voice sample, as was LNSD divided by the mean of the period lengths. The harmonics-to-noise ratio (HTN) was calculated as described by Yumoto, Gould, and Baer (1982), and the ratio of the amplitude of the first to the second harmonic (H1/H2; Bickley, 1982; Ladefoged, 1981) and the number of visible harmonics were calculated using a smoothed linear magnitude spectrum. Finally, an unnamed algorithm described by Ladefoged, Maddieson, and Jackson (1988), which we will call a "partial period comparison," was used to calculate the "measured roughness" of the voice samples. This algorithm is a time-domain comparison of the standard deviations of differences between moving vectors (i.e., portions of the acoustic signal, each about 0.6 times the estimated period in length). In order to generate this measure over a long segment of phonation while limiting computational time, our analysis differed somewhat from that reported previously. We applied the algorithm to a sample approximately three glottal cycles long, skipped the next two cycles, applied it to the next three cycles, and so on for the duration of the vowel sample. The mean and standard deviation of the indices generated for the entire voice sample constitute our roughness measures.²
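Two of the steps above are simple enough to sketch: the LNSD measures and the analyze-three, skip-two windowing used for the roughness index. The sketch makes several assumptions: the sample standard deviation is used, "LNSD divided by the mean" is read literally, and the function names are hypothetical. The partial period comparison itself is not implemented.

```python
import math
from statistics import mean, stdev


def lnsd(periods):
    """Natural log of the standard deviation of the period lengths."""
    return math.log(stdev(periods))  # sample SD is an assumption


def lnsd_over_mean(periods):
    """LNSD divided by the mean period length (literal reading of the text)."""
    return lnsd(periods) / mean(periods)


def analysis_windows(n_cycles, analyze=3, skip=2):
    """Yield (start, stop) cycle indices: analyze three cycles, skip two, repeat."""
    start = 0
    while start + analyze <= n_cycles:
        yield (start, start + analyze)
        start += analyze + skip


print(list(analysis_windows(13)))  # [(0, 3), (5, 8), (10, 13)]
```

A roughness index would be computed once per window, and the mean and standard deviation of those indices would then serve as the two roughness measures.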

² Several parameters were set differently from those described by Ladefoged, Maddieson, & Jackson (1988). The vector length was 0.6 times the estimated cycle length; the searching window was 2.3 times the estimated cycle length; the reference vector moved in 1-ms steps; and the comparison vectors moved by one point at a time.

In addition to these acoustic measures, ratings of the pathological voices gathered 1 year prior to the present study were used to help interpret the derived perceptual dimensions described below. Note that these ratings were used only for interpretation, and did not affect the structure of the perceptual spaces in any way. Five clinicians (four of whom participated in this experiment) rated the pathological voice samples (along with 49 other voices) for their breathiness, roughness, instability and overall abnormality. Listeners heard the voice samples individually over a loudspeaker in a sound-treated booth, and rated them using a visual-analog scale by marking a point along a 7-inch line to indicate the extent to which each voice possessed a given characteristic. Ratings from the five clinicians were averaged to produce a single value for each scale for each voice.

Listeners

Two groups of five listeners participated in the two listening tests. Each group included 2 females and 3 males. The first group (expert listeners) consisted of three speech pathologists and two otolaryngologists, each with a minimum of 4 years' experience evaluating and treating voice disorders. The second group (naive listeners) included five listeners with no training in linguistics, audiology, or speech pathology, and with no previous formal exposure to pathological voices.

Listening Tasks

Each listener participated in two experimental sessions, one for each voice set. Sessions were held on separate days; order of presentation of the voice sets was counterbalanced across listeners. Listeners were tested individually in a sound-treated room. Stimuli were presented at a constant level (approximately 80 dB SPL) in free field over two loudspeakers equidistant from the listener.

At each session, listeners were first instructed that we were interested in how they as individuals judged the similarity of each voice pair. They were asked to listen carefully to each pair, and to rate the similarity of the voices on a 7-point equal-appearing-interval scale ranging from exact same (1) to most different (7). Listeners then heard four practice pairs of voices (normal or disordered, as appropriate, but not drawn from the stimulus set). Finally, they heard all 18 voice samples in random order, to familiarize them with the range of stimuli they would encounter at that session. Listeners then heard the experimental tape and rated the voice pairs. They were given a break half-way through the task. Each test session lasted approximately 1 1/2 hours.

RESULTS

Reliability of Individual Similarity Ratings

Because the experimental task required listeners to concentrate for extended intervals, individual similarity matrices were examined to ensure that the ratings of the two presentation orders of each voice pair did not differ systematically. Each instance where a single listener's ratings for the AB and BA orders of a pair of voices differed by a scale value of three or more was noted, and the total pattern of such asymmetrical ratings across all listeners was examined. Listeners showed no tendency to agree on the pairs to rate asymmetrically: across the four experimental conditions, only 28 voice pairs attracted more than a single asymmetrical rating (25 voice pairs were asymmetrically rated by two listeners, three by three listeners, and none by more than 3 listeners). Further, asymmetries did not cluster in any particular portion of the experimental tape. The experimental tape was divided into thirds, and the frequency of asymmetrical ratings for AB/BA combinations occurring in different parts of the tape (e.g., both AB and BA orders in the first third of the tape, AB in the second third and BA in the last third, etc.) was determined. No significant difference in rates of asymmetrical ratings was observed when these portions of the test tape were compared (χ² = 3.01, df = 5, p > .05). Ratings asymmetries thus did not seem to represent practice or fatigue effects. Therefore, all differences in ratings were treated as noise in the data, and similarity matrices were symmetrized by averaging across the diagonal in all subsequent analyses.
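The asymmetry check and the symmetrization step can be illustrated as follows. This is a minimal sketch with hypothetical function names and a made-up 3-voice matrix; `ratings[i][j]` is assumed to hold one listener's rating when voice i was presented first and voice j second, and the diagonal is a placeholder because same-voice pairs were not presented.

```python
def asymmetric_pairs(ratings, threshold=3):
    """Unordered pairs (i, j), i < j, whose AB and BA ratings differ by >= threshold."""
    n = len(ratings)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(ratings[i][j] - ratings[j][i]) >= threshold
    ]


def symmetrize(ratings):
    """Average each AB rating with its BA counterpart (averaging across the diagonal)."""
    n = len(ratings)
    return [[(ratings[i][j] + ratings[j][i]) / 2 for j in range(n)] for i in range(n)]


# Hypothetical ratings for three voices on the 7-point scale.
ratings = [
    [1, 2, 6],
    [5, 1, 4],
    [7, 3, 1],
]
print(asymmetric_pairs(ratings))  # [(0, 1)]
print(symmetrize(ratings)[0])     # [1.0, 3.5, 6.5]
```

Averaging treats order effects as noise, which is the assumption the preceding analysis was meant to justify.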

Multidimensional Scaling Analyses

Separate nonmetric multidimensional scaling solutions in 2-6 dimensions were found for each listener group and voice set using the individual differences model of SAS PROC ALSCAL (SAS Institute, Inc., 1983; Schiffman, Reynolds, & Young, 1981). R² values, which indicate how much variance in the original similarity ratings is accounted for by each solution, are shown in Figure 1. Based on the location of elbows in these plots, shown by arrows in the figure, and on interpretability, solutions were selected as indicated.³ The solutions selected fit the data quite well, accounting for an average of 76.5% of the variance in the underlying similarity ratings. Note that R² values are higher for the naive listeners for both voice sets. We will return to this point below.

³ Solution selection for the pathological voices was straightforward: Both stress and R² values clearly pointed to the same solutions and these solutions had reasonable interpretations. Solution selection was more complicated for the normal voice set. The R² curve for the expert group had elbows at 3 and 5 dimensions, but stress values suggested the 3-dimensional solution was the correct one. For naive listeners, it was not clear whether to select the 2- or the 5-dimensional solution. Both stress and R² values seemingly pointed to the 5-dimensional solution, which did account for 9% more variance in similarity ratings than did the 2-dimensional solution. However, the 2- and 3-dimensional solutions differed by less than 1% in their fit to the data. Further, the correlation between the unscaled similarity ratings and F0 was .76, suggesting that approximately half the variance in similarity ratings could be accounted for by a single dimension, and thus, that there was a true elbow at 2 dimensions for these data. Finally, when the 5-dimensional solution was interpreted, dimensions were found to relate to the same acoustic parameters as did dimensions in the 2-dimensional solution. Because the extra dimensions did not provide any additional information about parameters underlying perceived similarity, and because the three extra dimensions added only 9% to the variance accounted for, the 2-dimensional solution was ultimately chosen for these data.
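The "approximately half the variance" estimate in footnote 3 follows from squaring the reported correlation between the unscaled similarity ratings and F0, since the squared correlation gives the proportion of shared variance:

```latex
r = .76 \quad\Rightarrow\quad r^{2} = (.76)^{2} = .5776 \approx .58
```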

[Figure 1: four panels (Pathological Voices: Experts, Naive; Normal Voices: Experts, Naive), each plotting R² against Number of Dimensions.]

FIGURE 1. Values of R² for the four multidimensional scaling analyses. Arrows indicate elbows in the curves. Stress values for the selected solutions are given in parentheses.

Perceptual Spaces for the Pathological Voice Set

ALSCAL calculates coordinates for each stimulus voice on each dimension, such that voices that are perceptually similar are close together in the space described by the coordinates. Both the expert and naive listener group spaces were interpreted by examining the correlation of these stimulus coordinates with the acoustic measures and descriptive scale ratings described above. Because many of the variables used to interpret the dimensions are themselves intercorrelated, multiple regression was used to determine which measures explained unique parts of the variance on each dimension. Results for the pathological voice solutions are given in Table 1; complete intercorrelation matrices are included as Appendix A. Table 1 also includes the values of R² for each dimension, which indicate the average importance of a dimension in the perceptual space. R² values for individual dimensions sum to the value for the entire space, which represents the amount of variance in the underlying similarity ratings that is accounted for by the solution as a whole.
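The first interpretive step, correlating stimulus coordinates on a dimension with each candidate measure, might be sketched as below. The function names and all data values are hypothetical, and the follow-up multiple regression used to separate intercorrelated measures is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rank_measures(coords, measures):
    """Rank candidate measures by the absolute size of their correlation with one dimension."""
    results = {name: pearson_r(coords, values) for name, values in measures.items()}
    return sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical coordinates of four voices on one dimension, with two made-up measures.
coords = [-1.0, 0.0, 1.0, 2.0]
measures = {"f0_hz": [100, 110, 120, 130], "h1_h2_db": [3, 1, 4, 2]}
for name, r in rank_measures(coords, measures):
    print(name, round(r, 2))
```

Ranking by absolute value matters because the sign of an MDS dimension is arbitrary; a strong negative correlation is as informative as a strong positive one.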

As Table 1 shows, expert and naive listeners differed both in the stimulus characteristics on which they relied when making their similarity judgments, and in the relative importance of the dimensions they shared. For the pathological voices, the first dimension (D1) in the expert space was correlated with F0, rated breathiness, and H1/H2; multiple regression indicates that all three explained independent parts of the variance on this dimension (multiple R = .94). The second dimension (D2) was correlated with various shimmer measures, with rated roughness and abnormality, and with measured roughness (partial period comparison); measured roughness provided the best interpretation (R = .82). The third dimension (D3) was significantly correlated only with F0 (R = .69). Note that on the average the expert listeners weighed these dimensions almost equally: the R² values in Table 1 indicate that each explains about the same amount of variance in the similarity ratings.

TABLE 1. Multiple regression results, pathological voices.

Dimension         Multiple R   Stimulus characteristics   Standardized coefficient (B)

Expert listeners
1 (R² = .27)      .94          F0                         -.575
                               H1/H2                      -.394
                               Rated breathiness          -.298
2 (R² = .25)      .82          Measured roughness
3 (R² = .23)      .69          F0

Naive listeners
1 (R² = .35)      .92          F0
2 (R² = .21)      .65          Rated abnormality
3 (R² = .14)      .55          Directional jitter
4 (R² = .12)      .83          H1/H2                      -.580
                               Rated roughness            .404

Note: Only coefficients significant at p < .05 are listed.

For naive listeners, D1 in the space for pathological voices was correlated only with F0 (R = .92). D2 was correlated with rated abnormality and rated breathiness; abnormality provided the best interpretation (R = .65). D3 was correlated with directional jitter (R = .51); and the fourth dimension (D4) was significantly correlated with both rated roughness and H1/H2, each of which contributed significantly to the explained variance (multiple R = .83). Unlike the expert listeners, who weighed their dimensions roughly equally, naive listeners as a group relied primarily on the first dimension (F0), which accounted for nearly half the variance explained by the total solution (35% out of 83%).
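The contrast between roughly equal weighting (experts) and reliance on one dimension (naive listeners) can be checked directly from the per-dimension R² values in Table 1. This is a small arithmetic sketch with a hypothetical function name; because the tabled values are rounded to two decimals, the totals come out near, but not exactly at, the percentages quoted in the text.

```python
# Per-dimension R² values as printed in Table 1 (rounded to two decimals).
expert_r2 = [0.27, 0.25, 0.23]
naive_r2 = [0.35, 0.21, 0.14, 0.12]


def total_and_shares(r2_by_dimension):
    """Return the summed R² and each dimension's share of that total."""
    total = sum(r2_by_dimension)
    return total, [r2 / total for r2 in r2_by_dimension]


for label, values in (("expert", expert_r2), ("naive", naive_r2)):
    total, shares = total_and_shares(values)
    print(label, round(total, 2), [round(s, 2) for s in shares])
```

The expert shares all fall near one third, whereas the first naive dimension alone carries a little over 40% of the explained variance.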

Further correlations compared stimulus coordinates in the expert and naive listeners' perceptual spaces (Table 2). All three expert dimensions were highly correlated with dimensions in the naive listeners' space. The first expert dimension (F0 + breathiness + H1/H2) was related most strongly to D4 (H1/H2 + roughness) in the naive space, supporting a "breathy" interpretation for both. The experts' D2 (correlated with measured roughness) was significantly related to D2 (rated abnormality) in the naive space. Expert D3 (F0) was related to naive D1 (F0). Note that naive D3 (jitter) was not significantly related to any dimension in the expert space, and evidently represented a perceptual dimension unique to naive listeners in this study.
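The pairing of expert and naive dimensions described above can be reproduced from the Table 2 correlations by finding, for each expert dimension, the naive dimension with the largest absolute correlation. The sketch below uses the tabled values; the function name is hypothetical, and it identifies only the strongest match without testing significance.

```python
# Rows: naive D1..D4; columns: expert D1..D3 (values from Table 2).
table2 = [
    [-0.55, 0.01, -0.82],
    [0.41, 0.75, 0.06],
    [0.03, -0.45, 0.29],
    [0.76, -0.36, -0.42],
]


def best_match_per_expert_dimension(table):
    """Map each expert dimension to (naive dimension, r) with the largest |r|."""
    matches = {}
    for col in range(len(table[0])):
        column = [row[col] for row in table]
        best_row = max(range(len(column)), key=lambda r: abs(column[r]))
        matches[col + 1] = (best_row + 1, column[best_row])
    return matches


print(best_match_per_expert_dimension(table2))
# {1: (4, 0.76), 2: (2, 0.75), 3: (1, -0.82)}
```

Naive D3 is not the strongest match for any expert dimension, which is consistent with its description as a dimension unique to naive listeners.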

To summarize, experts' ratings of the similarity of pairs of pathological voices may be explained primarily in terms of breathiness (measured by H1/H2 and by previously obtained voice quality ratings), measured and rated roughness, and fundamental frequency. These dimensions were weighed approximately equally, and together accounted for 75.3% of the variance in the original similarity ratings. Naive listeners judged the similarity of pathological voices primarily in terms of differences in F0, although rated abnormality, jitter, and roughness/breathiness (H1/H2) also played a role. The four dimensions together accounted for 82.3% of the variance in the similarity ratings. Expert listeners paid more attention to breathiness and roughness than did naive listeners; naive listeners relied more on F0 than did the experts.

TABLE 2. Correlations between dimensions for pathological voices.

                   Expert dimension
Naive dimension    1        2        3
1                  -.55     .01      -.82
2                  .41      .75      .06
3                  .03      -.45     .29
4                  .76      -.36     -.42

Perceptual Spaces for the Normal Voice Set

For the normal voice set, the three-dimensional solution for the expert listeners accounted for 71.6% of the variance in the similarity ratings. Significant correlations between stimulus coordinates and voice characteristics are given in Table 3, along with the variance explained by each dimension. (The complete intercorrelation matrix is included as Appendix B.) Multiple regression was again used to eliminate redundancies in these correlations, but detailed results for expert listeners will not be presented because in each case a single parameter provided the best interpretation of the dimension.

The first dimension in the expert space was not significantly correlated with any rated or measured voice characteristics. D2 was correlated with various jitter and shimmer measures, with shimmer standard deviation providing the best interpretation⁴ (R = -.83). D3 was significantly correlated with F0, directional jitter, and the natural logarithm of the standard deviation of the periods; only F0 contributed uniquely to the variance on this dimension (R = -.94). As in the pathological voice space, on the average experts weighed each dimension in this solution roughly equally.

The two-dimensional solution for naive listeners and normal voices accounted for 77% of the variance in this set of similarity ratings. As for the pathological voices, naive listeners relied primarily on vocal pitch when judging the similarity of normal voices. The first dimension accounted for most of the variance in the solution space (56.5% out of 77%), and was highly correlated with F0 (R = -.98). The dimension was also significantly correlated with a variety of jitter and shimmer measures. D2 was correlated with shimmer and formant frequencies; a combination of F2/F1 and shimmer covariance provided the best interpretation (multiple R = .73).

⁴ Note that the correlations between coordinates and several jitter/shimmer measurements are roughly equal. Although in each case we have selected the one variable best correlated with these dimensions, they are perhaps best thought of as generalized "jitter/shimmer" dimensions (although as shown by D2 in the expert pathological voice space, it is possible to have one without the other).

Additional correlations again tested the equivalence of the expert and naive listeners' perceptual spaces. Results are given in Table 4. D1 (uninterpreted) in the experts' space was significantly related to D2 (resonances + shimmer) in the naive listeners' space. Expert D2 (shimmer) was only weakly related to both D1 and D2 (resonances + shimmer) in the naive space, and may represent a dimension unique to experts. Expert D3 (F0) was well correlated with D1 (F0) in the naive space.

TABLE 3. Multiple regression results, normal voices.

Dimension         Multiple R   Stimulus characteristic    Standardized coefficient (B)

Expert listeners
1 (R² = .26)      -
2 (R² = .23)      .84          Shimmer SD
3 (R² = .23)      .94          F0

Naive listeners
1 (R² = .57)      .98          F0
2 (R² = .21)      .73          Shimmer covariance         .492
                               F2/F1                      .484

Note: Only coefficients significant at p < .05 are listed.

TABLE 4. Correlations between dimensions for normal voices.

                   Expert dimension
Naive dimension    1        2        3
1                  0.38     -0.48    0.93
2                  -0.66    -0.45    0.23

Thus, for normal voices, as for pathological voices, expert and naive listeners differed both in the particular aspects of voice quality to which they attended when making their similarity judgments, and in the relative importance of common dimensions. Fo and shimmer appeared in both solutions; Fo was again much more important for naive listeners than for experts, and shimmer was more important to experts than to naive listeners. Naive listeners also relied on formant frequencies/vowel quality information, which experts largely ignored: Although expert D1 was significantly related to naive D2 (which was correlated with both F2/F1 and F3), the correlation between expert D1 and F2/F1 was only -.08 (p > .05), and the correlation with F3 was .38 (p > .05).⁵
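The variance bookkeeping behind this apparent contradiction (taken up in Footnote 5) can be checked with a few lines of arithmetic. The sketch below only squares correlations reported in the text; the variable names are illustrative, and the comparison is a plausibility check rather than a formal test.

```python
# Correlations reported in the text for the normal-voice solutions.
multiple_r_naive_d2 = 0.73     # naive D2 regressed on F2/F1 and shimmer covariance
r_expert_d1_naive_d2 = -0.66   # expert D1 vs. naive D2 (Table 4)

# Share of naive D2 variance tied to its interpretation (resonances + shimmer).
explained = multiple_r_naive_d2 ** 2
# Share of naive D2 variance not tied to that interpretation.
left_over = 1.0 - explained
# Share of naive D2 variance shared with expert D1.
shared = r_expert_d1_naive_d2 ** 2

print(f"explained by resonances + shimmer: {explained:.2f}")
print(f"left over:                         {left_over:.2f}")
print(f"shared with expert D1:             {shared:.2f}")

# True when the shared variance is small enough to lie entirely in the left-over part.
fits_in_left_over = shared <= left_over
print(fits_in_left_over)
```

Under these figures the variance shared between expert D1 and naive D2 (about 44%) is smaller than the portion of naive D2 left unexplained by resonances and shimmer (about 47%), so the two dimensions can be related without expert D1 being related to F2/F1 or F3.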

Differences Among Individual Listeners

In addition to calculating the overall importance of each dimension in a group perceptual space, ALSCAL calculates the importance of each dimension to each individual subject and outputs a set of weights for each subject on each dimension. These weights, which reflect the relative importance of a dimension to an individual listener, are given in Tables 5 and 6.

These tables show that expert listeners differed significantly in which dimension they weighed most heavily: For both pathological and normal voices, each dimension was both most and least salient to some listener, and no significant agreement on perceptual importance was found (Kendall's Coefficient of Concordance: for the pathological voices = .04, S = 2, p > .05; for the normal voices = .04, S = 2, p > .05).

In contrast, naive listeners differed in precisely how heavily they weighed each dimension. However, they agreed significantly on the relative order of importance of the dimensions, with the first dimension most important for every listener, the second dimension usually second most important, and so on, for both voice sets. (Kendall's Coefficient of Concordance: for the pathological voices = .712, S = 89, p < .05. No statistic was calculated for the normal voice set because of the small number of degrees of freedom.)

⁵This apparent contradiction (one dimension significantly correlated with another without being significantly correlated with the interpretation of that dimension) may be explained by the relatively low values of both correlations. Note that only about 53% of the variance on naive D2 is explained by the combination of resonances and shimmer. The correlation between expert D1 and naive D2 (which is only -.66) may involve the "left over" 47% of variance on this dimension.

TABLE 5. Relative importance of individual dimensions to expert listeners.

                            Weight on dimensionᵃ
Voice set      Listener      1        2        3

Pathological   1           .600     .485     .495
               2           .471     .587     .292
               3           .467     .501     .546
               4           .602     .445     .475
               5           .437     .469     .554

Normal         1           .646     .372     .475
               2           .547     .481     .349
               3           .389     .551     .500
               4           .602     .493     .304
               5           .299     .470     .661

ᵃSquared weights sum to R² for the individual subject.
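The concordance statistics reported above can be reproduced from the subject weights in Tables 5 and 6. The sketch below is a minimal illustration, not a description of the software used in the study: it assumes each listener's dimensions are ranked by weight with no ties, and it uses the standard formula W = 12S / (m²(n³ - n)), where m is the number of listeners, n the number of dimensions, and S the sum of squared deviations of the rank sums from their mean. The function name `kendalls_w` is illustrative.

```python
def kendalls_w(weights):
    """Return (S, W) for a listeners-by-dimensions table of weights (no ties assumed)."""
    m = len(weights)        # number of listeners (judges)
    n = len(weights[0])     # number of dimensions (objects being ranked)
    rank_sums = [0] * n
    for row in weights:
        # Rank the dimensions for this listener: 1 = smallest weight, n = largest.
        order = sorted(range(n), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return s, w


# Subject weights copied from Table 5 (experts) and Table 6 (naive listeners).
expert_pathological = [
    [0.600, 0.485, 0.495],
    [0.471, 0.587, 0.292],
    [0.467, 0.501, 0.546],
    [0.602, 0.445, 0.475],
    [0.437, 0.469, 0.554],
]
expert_normal = [
    [0.646, 0.372, 0.475],
    [0.547, 0.481, 0.349],
    [0.389, 0.551, 0.500],
    [0.602, 0.493, 0.304],
    [0.299, 0.470, 0.661],
]
naive_pathological = [
    [0.483, 0.457, 0.406, 0.399],
    [0.614, 0.471, 0.371, 0.238],
    [0.559, 0.457, 0.465, 0.354],
    [0.695, 0.558, 0.275, 0.208],
    [0.591, 0.313, 0.342, 0.466],
]

for label, table in [
    ("expert, pathological", expert_pathological),
    ("expert, normal", expert_normal),
    ("naive, pathological", naive_pathological),
]:
    s, w = kendalls_w(table)
    print(f"{label}: S = {s:g}, W = {w:.3f}")
# expert, pathological: S = 2, W = 0.040
# expert, normal: S = 2, W = 0.040
# naive, pathological: S = 89, W = 0.712
```

The near-zero W for experts reflects rank sums that are almost identical across dimensions (each dimension is ranked first by some listener and last by another), whereas the naive listeners' rank sums fall off steadily from the first to the fourth dimension.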

DISCUSSION

Table 7 summarizes the multidimensional scaling solutions found for naive and expert listeners and for pathological and normal voices. As suggested by prototype models (Papcun et al., 1989), a listener's background affects the perceptual strategy used when judging pairs of voices. Naive listeners differed from experts both in the dimensions that emerged, and in the relative salience of those dimensions the two groups shared. These results suggest that, because a listener's background and experience affect perceptual strategy, models of voice perception should incorporate the notion "population of listeners." A similar phenomenon has been reported by Terbeek (1977), whose multidimensional scaling study of vowel quality perception revealed systematic differences in perceptual spaces for listeners whose native languages differed in vowel inventories.

TABLE 6. Relative importance of individual dimensions to naive listeners.

                            Weight on dimensionᵃ
Voice set      Listener      1        2        3        4

Pathological   1           .483     .457     .406     .399
               2           .614     .471     .371     .238
               3           .559     .457     .465     .354
               4           .695     .558     .275     .208
               5           .591     .313     .342     .466

Normal         1           .697     .472
               2           .787     .311
               3           .634     .567
               4           .937     .168
               5           .662     .598

ᵃSquared weights sum to R² for the individual subject.

TABLE 7. Summary of multidimensional scaling results.

                                        Group
Voice set      Expert listeners                      Naive listeners

Pathological   D1: Fo + Breathiness + H1/H2          D1: Fo
               D2: Roughness (+ shimmer)             D2: Rated Abnormality
               D3: Fo                                D3: Rated Roughness + H1/H2

Normal         D1: -                                 D1: Fo
               D2: Jitter/Shimmer (+ Breathiness)    D2: Resonances + Shimmer
               D3: Fo

Recall that the scaling solutions found for naive listeners accounted for a greater portion of the variance in similarity ratings than did the expert listeners' solutions. This finding is consistent with our view that voice perception involves both "features," which are useful across voice pairs, and idiosyncratic details of vocal quality, which may be useful when comparing one pair of voices, but not another (see, e.g., Kreiman & Papcun, 1988; Van Lancker & Kreiman, 1987). Naive listeners in this study seem to have relied heavily on features: they apparently applied a rather inflexible perceptual strategy in roughly the same fashion across voice pairs, so their similarity judgments could be very well accounted for with only a few perceptual dimensions. Experts, on the other hand, may have varied their strategies somewhat, depending on the characteristics of the voice pair in question. Clinicians' training and experience allowed them to access a larger range of information than naive listeners were able to use. This sort of flexible strategy is not easy to summarize in a few features of the sort extracted by multidimensional scaling. To the extent that clinicians are able to adjust their perceptual strategies to the demands of a given task, their R² values would be expected to be lower than those of naive listeners.

Individual naive listeners showed the same pattern of subject weights on their perceptual dimensions: They agreed about which voice quality dimensions were most important, for both pathological and normal voices. Expert listeners, on the other hand, did not agree about the relative importance of different aspects of voice quality. These data suggest that clinical training and experience cause listeners to differ more, not less, in how they perceive voice quality, at least in tasks that involve unstructured similarity judgments. The differences between clinicians were large enough to suggest that averaging data across subjects may produce misleading results and obscure important aspects of an individual subject's perceptual behavior. Recall that examination of expert group data gave the impression that each dimension is equally important to the similarity of these voices. Subject weights show that this is not the case. Rather, the substantial differences among individual clinicians are averaged away when group data are considered. Care must therefore be taken when using data averaged across clinicians.
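The way group averages can mask individual differences is easy to see in the expert weights for the normal voices (Table 5). The sketch below is illustrative only. It assumes that a dimension's group-level share of variance is the mean of the squared subject weights; that assumption is not stated in the text, but it reproduces the R² values listed for the expert dimensions in Table 3 to rounding. The function names are hypothetical.

```python
# Expert listeners' weights for the normal voices (Table 5).
# Rows are listeners 1-5; columns are dimensions D1-D3.
expert_normal = [
    [0.646, 0.372, 0.475],
    [0.547, 0.481, 0.349],
    [0.389, 0.551, 0.500],
    [0.602, 0.493, 0.304],
    [0.299, 0.470, 0.661],
]


def group_variance_by_dimension(weights):
    """Mean squared weight per dimension (assumed group-level share of variance)."""
    n_listeners = len(weights)
    n_dims = len(weights[0])
    return [sum(row[d] ** 2 for row in weights) / n_listeners for d in range(n_dims)]


def most_salient_dimension(weights):
    """1-based index of the most heavily weighted dimension for each listener."""
    return [row.index(max(row)) + 1 for row in weights]


group = group_variance_by_dimension(expert_normal)
salient = most_salient_dimension(expert_normal)

print([round(v, 2) for v in group])   # [0.26, 0.23, 0.23]
print(salient)                        # [1, 1, 2, 1, 3]
```

At the group level the three dimensions look nearly interchangeable (each accounts for roughly a quarter of the variance), yet the individual listeners do not agree on which dimension matters most, and the fifth listener's heaviest weight falls on the dimension that the second and fourth listeners weight least.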

These differences in clinicians' perceptual strategies have several possible explanations, including differences in training (e.g., speech pathology vs. otolaryngology) and in the populations of patients most frequently treated (e.g., cancer patients, stroke patients, patients with neurological disorders). Although it is not possible to eliminate either explanation based on our sample of five experts, both sources of variability are probably implicated in our results. It is also possible that clinicians have developed more than one prototype for different sorts of pathological voices. More detailed analyses of individual data from a larger set of clinicians will help answer these questions.

The finding that clinicians differ significantly in their perceptual strategies apparently contradicts a large number of studies reporting good agreement among experts in rating scale tasks (e.g., Bassich & Ludlow, 1986; Darley, Aronson, & Brown, 1969; Kruel & Hecker, 1971). Such studies examined the extent to which clinicians can be made to agree, via task-specific training and restriction of the rating task to a few dimensions (e.g., breathiness, hoarseness). In contrast, our purpose was to determine how listeners vary when left to structure the perceptual task for themselves. Listeners in the present study were free to attend to whatever voice characteristics seemed relevant for a given stimulus pair, and were not forced to restrict their judgments to a single feature that might provide little information about that pair. Recent studies suggest voice quality information is normally processed in a very flexible fashion, with individual voice features attended to or ignored, as appropriate for a given voice or voice pair (see Van Lancker, Kreiman, & Cummings, in press; Van Lancker, Kreiman, & Emmorey, 1985; Van Lancker, Kreiman, & Wickens, 1985). Because the dimensions along which listeners judged the voices were not specified in advance, and because clinicians rarely attend perceptual training sessions like those used in structured rating scale tasks in the course of their everyday clinical work, the present task may more nearly approximate the sort of perceptual judgment listeners make every day. Structured rating scale tasks may well force listeners to behave in ways that are not consistent with normal perceptual processing, and we question the external validity of conclusions such studies draw regarding the evaluation of vocal quality.

Our results also suggest that those parameters which emerge as perceptually salient from a multidimensional scaling analysis depend in part on the populations of voices under study, at least for naive listeners. Experts evidently used roughly the same parameters (Fo, breathiness, roughness and jitter/shimmer) for both voice sets.


TABLE 8. Summary of previous multidimensional scaling studies of voice quality perception.

Study                                    Speakers    Listeners   Dimensions                Variance accounted for

Normal voices
Murry & Singh (1980)                     20 male     10 naive    Fo                        54%
                                                                 Resonances
                                                                 Nasality
                                                                 F2
                                         20 female   10 naive    Fo                        42%
                                                                 Breathiness
                                                                 F2/F1
                                                                 Effort/hoarseness
Matsumoto, Hiki, Sone, & Nimura (1973)   8 male      6 naive     Fo                        not given
                                                                 Glottal source spectrum
                                                                 F1 + F2

Pathological voices
Murry, Singh, & Sargent (1977)           20 male     16 expert   Periodicity               48%
                                                                 +/- tumor
                                                                 Volume velocity
                                                                 Fo
                                                                 Uninterpreted
Kempster (1984)                          30 female   25 expert   Intensity + HTN           60%
                                                                 Fo
                                                                 Perturbation

Naive listeners relied primarily on Fo for both voice sets, but also attended to abnormality and breathiness for the pathological voices (but not for normal voices), and to resonance information for normal voices (but not for the pathological set). However, the large differences between results of other multidimensional scaling studies in the literature suggest that within-population differences are as great as those between different populations of voices, and thus that the concept of populations of voices may not be justified. Table 8 summarizes the previous studies (Kempster, 1984; Matsumoto, Hiki, Sone, & Nimura, 1973; Murry & Singh, 1980; Murry et al., 1977) using steady-state vowel stimuli to examine the perception of normal and pathological voice quality. Some of the differences shown in this table are attributable to methodological variations; for example, stimuli were equated for amplitude by Murry et al. (1977) but not by Kempster (1984). Nevertheless, only a single parameter (Fo) is common to all solutions. Murry and Singh (1980) suggested, "... besides the Fo/pitch measure, there is no common set of acoustic parameters for judging voices applicable to both sexes and phonation conditions [i.e., for both vowel and phrase stimuli]" (p. 1300). Our finding that groups differed in their perceptions of the same voices argues further that, apart from Fo, a set of parameters that is common to different populations of listeners judging the same voices may not exist. Perceptual heterogeneity seems to be the rule in voice quality evaluations.

ACKNOWLEDGMENTS

This research was supported by NINCDS award NS20707, by a NINCDS post-doctoral traineeship to the first author (NS07059), and by Veterans Administration Rehabilitation Research and Development grant C468-R. We thank Gerald Berke, David Hanson, Jean Holle, and Jill Zweier for serving as subjects. We also thank Peter Ladefoged and the UCLA Phonetics Laboratory for use of their digital spectrograph.

REFERENCES

BAKEN, R. J. (1987). Clinical measurement of speech and voice. Boston: College-Hill.

BASSICH, C. J., & LUDLOW, C. L. (1986). The use of perceptual methods by new clinicians for assessing voice quality. Journal of Speech and Hearing Disorders, 51, 125-133.

BICKLEY, C. (1982). Acoustic analysis and perception of breathy vowels. M.I.T., R.L.E. Speech Communications Group: Working Papers, 1, 71-82.

BRICKER, P., & PRUZANSKY, S. (1976). Speaker recognition. In N. J. Lass (Ed.), Contemporary issues in experimental phonetics (pp. 295-326). New York: Academic Press.

DARLEY, F. L., ARONSON, A. E., & BROWN, J. R. (1969). Differential diagnostic patterns of dysarthria. Journal of Speech and Hearing Research, 12, 246-269.

GRIESER, D., & KUHL, P. (1989). Categorization of speech by infants: Support for speech-sound prototypes. Developmental Psychology, 25, 577-589.

HECKER, M. H. L. (1971). Speaker recognition: An interpretive survey of the literature. ASHA Monographs, 16.

HOMA, D., CROSS, J., CORNELL, D., GOLDMAN, D., & SCHWARTZ, S. (1973). Prototype abstraction and classification of new instances as a function of number of instances defining the prototype. Journal of Experimental Psychology, 101, 116-122.

KEMPSTER, G. (1984). A multidimensional analysis of vocal quality in two dysphonic groups. Unpublished doctoral dissertation, Northwestern University.

KREIMAN, J. (1987). Human memory for unfamiliar voices. Unpublished doctoral dissertation, University of Chicago.

KREIMAN, J., & PAPCUN, G. (1986, May). The perception of voice quality: Multidimensional scaling evidence. Paper presented at the 111th Meeting of the Acoustical Society of America, Cleveland, OH.

KREIMAN, J., & PAPCUN, G. (1988, May). Voice "features" in long-term memory. Paper presented at the 115th Meeting of the Acoustical Society of America, Seattle, WA.

KRUEL, E. J., & HECKER, M. H. L. (1971). Descriptions of the speech of patients with cancer of the vocal folds. Part II: Judgments of age and voice quality. Journal of the Acoustical Society of America, 49, 428-435.

LADEFOGED, P. (1981, May). The relative nature of voice quality. Paper presented at the 101st Meeting of the Acoustical Society of America, Ottawa, Ontario.

LADEFOGED, P., MADDIESON, I., & JACKSON, M. (1988). Investigating phonation types in different languages. In O. Fujimura (Ed.), Vocal fold physiology: Voice production, mechanisms and functions (pp. 297-317). New York: Raven Press.

MATSUMOTO, H., HIKI, S., SONE, T., & NIMURA, T. (1973). Multidimensional representation of personal quality of vowels and its acoustical correlates. IEEE Transactions on Audio and Electroacoustics, AU-21, 428-436.

MERVIS, C., & ROSCH, E. (1981). Categorization of natural objects. Annual Review of Psychology, 32, 89-115.

MURRY, T., & SINGH, S. (1980). Multidimensional analysis of male and female voices. Journal of the Acoustical Society of America, 68, 1294-1300.

MURRY, T., SINGH, S., & SARGENT, M. (1977). Multidimensional classification of abnormal voice qualities. Journal of the Acoustical Society of America, 61, 1630-1635.

PAPCUN, G., KREIMAN, J., & DAVIS, A. (1989). Long-term memory for unfamiliar voices. Journal of the Acoustical Society of America, 85, 913-925.

POSNER, M., & KEELE, S. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77, 353-363.

POSNER, M., & KEELE, S. (1970). Retention of abstract ideas. Journal of Experimental Psychology, 83, 304-308.

SAS INSTITUTE, INC. (1983). SUGI supplemental library user's guide. Cary, NC: SAS Institute, Inc.

SCHIFFMAN, S. S., REYNOLDS, M. L., & YOUNG, F. W. (1981). Introduction to multidimensional scaling. New York: Academic.

TERBEEK, D. (1977). A cross-language multidimensional scaling study of vowel perception. UCLA Working Papers in Phonetics, 37.

TITZE, I., HORII, Y., & SCHERER, R. (1987). Some technical considerations in voice perturbation measurements. Journal of Speech and Hearing Research, 30, 252-260.

VAN LANCKER, D., & KREIMAN, J. (1987). Voice discrimination and recognition are separate abilities. Neuropsychologia, 25, 829-834.

VAN LANCKER, D., KREIMAN, J., & CUMMINGS, J. (in press). Voice perception deficits: Neuroanatomical correlates of phonagnosia. Journal of Clinical and Experimental Neuropsychology.

VAN LANCKER, D., KREIMAN, J., & EMMOREY, K. (1985). Familiar voice recognition: Patterns and parameters. Part I: Recognition of backward voices. Journal of Phonetics, 13, 19-38.

VAN LANCKER, D., KREIMAN, J., & WICKENS, T. (1985). Familiar voice recognition: Patterns and parameters. Part II: Recognition of rate-altered voices. Journal of Phonetics, 13, 39-52.

WOLFE, V., & STEINFATT, T. (1987). Prediction of vocal severity within and across voice types. Journal of Speech and Hearing Research, 30, 230-240.

YUMOTO, E., GOULD, W., & BAER, T. (1982). Harmonics-to-noise ratio as an index of the degree of hoarseness. Journal of the Acoustical Society of America, 71, 1544-1550.

Received May 5, 1989
Accepted August 18, 1989

Requests for reprints should be sent to Jody Kreiman, VA Medical Center, Audiology and Speech Pathology (126), Wilshire and Sawtelle Boulevards, Los Angeles, CA 90073.


APPENDIX A

CORRELATIONS BETWEEN DIMENSIONS AND STIMULUS CHARACTERISTICS FOR THE PATHOLOGICAL VOICES

Key to abbreviations: ExDn = Stimulus coordinates on expert dimension n; NaDn = Stimulus coordinates on naive dimension n; Abnormal = Rated abnormality; Breathy = Rated breathiness; Instable = Rated instability; Rough = Rated roughness; #Harmon = The number of visible harmonics; HTN = Harmonics-to-noise ratio; ShCovar = Covariance of shimmer; DirSh = Directional shimmer; ShMean = Mean shimmer; ShSD = Shimmer standard deviation; JitCovar = Covariance of jitter; DirJit = Directional jitter; JitMean = Mean jitter; JitSD = Jitter standard deviation; LNSD = The natural logarithm of the standard deviation of the period lengths; LNSDMN = LNSD divided by the mean period length; MeasRough = Measured roughness; RoughSD = Standard deviation of the roughness measure (see text for details).

              ExD1    ExD2    ExD3    NaD1    NaD2    NaD3    NaD4  Abnorm

ExD2           .08
ExD3           .03     .02
NaD1          -.55     .01    -.82
NaD2           .41     .75     .06    -.13
NaD3           .03    -.45     .29    -.15     .11
NaD4           .76    -.36    -.42    -.09    -.01     .09
Abnormal       .01    -.68    -.30     .15    -.65     .01     .41
Breathy       -.63    -.56     .10     .23    -.62     .36    -.27     .51
Instable       .49     .09    -.03    -.27     .21    -.03     .36     .37
Rough          .38     .67    -.38     .07    -.38     .20     .73     .79
Fo            -.75     .02    -.69     .92    -.32    -.26    -.37     .18
F1             .23    -.32    -.32     .16     .16     .49     .45     .21
F2             .20     .05    -.23     .15     .36     .40     .30    -.07
F3            -.29     .16    -.24     .32    -.18     .02    -.02     .14
F2-F1          .0      .42     .06     .02     .28    -.04    -.12    -.33
F2/F1         -.13     .43     .23    -.09     .10    -.26    -.32    -.32
H1/H2         -.63    -.09     .34     .04    -.31     .18    -.75    -.15
#Harmon        .46     .33     .14    -.31     .42     .07     .16    -.19
HTN           -.32     .07     .41    -.19    -.30    -.10    -.52    -.42
ShCovar       -.30     .40     .05     .12    -.57     .04    -.05     .28
DirSh         -.02    -.32     .02    -.04    -.23     .08    -.08     .19
ShMean         .32    -.73     .04    -.13    -.25     .44     .51     .31
ShSD           .36    -.74     .05    -.17    -.22     .47     .58     .27
JitCovar       .27     .05    -.42     .12     .14    -.28     .35     .48
DirJit        -.03    -.55     .23     .14    -.15     .51    -.05     .37
% Jitter       .11    -.18    -.44     .22    -.10    -.21     .25     .61
JitMean        .29    -.16    -.20    -.08     .0     -.15     .29     .49
JitSD          .34    -.07    -.28    -.04     .03    -.29     .36     .48
LNSD           .69     .20     .22    -.58     .49    -.05     .41     .03
LNSDMN         .24     .01    -.29     .04     .15    -.17     .27     .42
MeasRough      .19    -.82    -.07    -.02    -.39     .45     .56     .69
RoughSD        .40    -.37    -.17    -.08    -.12     .08     .59     .67

           Breathy Instable  Rough      Fo      F1      F2      F3   F2-F1

Instable      -.17
Rough          .19     .41
Fo             .36    -.26    -.07
F1             .10     .09     .40    -.03
F2             .08    -.03     .13    -.03     .65
F3             .26    -.11     .21     .31     .30     .45
F2-F1         -.01    -.13    -.28    -.01    -.28     .55     .24
F2/F1         -.03    -.13    -.40     .0     -.72     .05     .02     .86
H1/H2          .45    -.50    -.42     .18    -.19    -.24     .01    -.10
#Harmon       -.51     .15    -.05    -.43     .07     .21    -.07     .19
HTN            .26    -.40    -.40    -.01    -.39     .05     .20     .46
ShCovar        .52    -.43     .09     .11    -.10    -.30    -.09    -.26
DirSh         -.13    -.21     .25     .0      .14    -.20    -.06    -.38
ShMean         .07    -.28     .62    -.31     .26     .11    -.10    -.13
ShSD           .08    -.15     .62    -.34     .26     .02    -.17    -.23
JitCovar      -.40     .55     .49     .08     .10    -.35    -.06    -.55
DirJit         .21    -.05     .38    -.09     .13     .03    -.02    -.10
% Jitter      -.17     .27     .57     .16     .07    -.27     .05    -.42
JitMean       -.28     .24     .52    -.14     .06    -.26     .01    -.39
JitSD         -.35     .29     .53    -.11     .03    -.27     .04    -.37
LNSD          -.63     .52     .24    -.62     .11    -.22    -.23    -.39
LNSDMN        -.37     .56     .41     .04     .10    -.40    -.12    -.61
MeasRough      .27     .19     .78    -.16     .26    -.07    -.03    -.36
RoughSD        .04     .55     .77    -.20    -.05    -.13    -.12    -.11

             F2/F1   H1/H2 #Harmon     HTN ShCovar   DirSh  ShMean    ShSD

H1/H2          .03
#Harmon        .10    -.18
HTN            .53     .21    -.17
ShCovar       -.16     .47    -.45     .02
DirSh         -.39     .13     .16    -.07    -.17
ShMean        -.23    -.11    -.13    -.02     .17     .46
ShSD          -.30    -.21    -.31    -.01     .23     .29     .94
JitCovar      -.46    -.35     .06    -.72    -.26     .33    -.09    -.09
DirJit        -.13     .20    -.06    -.25     .03     .59     .62     .60
% Jitter      -.35    -.20     .03    -.57    -.15     .50     .15     .12
JitMean       -.32    -.17     .15    -.52    -.17     .54     .27     .26
JitSD         -.29    -.28     .15    -.50    -.20     .47     .20     .20
LNSD          -.33    -.31     .37    -.55    -.32     .28     .11     .14
LNSDMN        -.50    -.29     .0     -.76    -.24     .33    -.08    -.07
MeasRough     -.40    -.34    -.13    -.53     .14     .34     .74     .81
RoughSD       -.05    -.53     .10    -.44    -.07     .02     .35     .44

          JitCovar  DirJit % Jitter JitMean  JitSD    LNSD  LNSDMN MeasRough

DirJit         .19
% Jitter       .89     .29
JitMean        .83     .36     .95
JitSD          .86     .24     .95     .99
LNSD           .69     .32     .49     .66     .66
LNSDMN         .96     .33     .78     .75     .76     .75
MeasRough      .45     .59     .63     .56     .50     .15     .42
RoughSD        .65     .16     .75     .70     .73     .35     .52     .73


APPENDIX B

CORRELATIONS BETWEEN DIMENSIONS AND STIMULUS CHARACTERISTICS FOR THE NORMAL VOICES

Key to abbreviations: ExDn = Stimulus coordinates on expert dimension n; NaDn = Stimulus coordinates on naive dimension n; #Harmon = The number of visible harmonics; HTN = Harmonics-to-noise ratio; ShCovar = Covariance of shimmer; DirSh = Directional shimmer; ShMean = Mean shimmer; ShSD = Shimmer standard deviation; JitCovar = Covariance of jitter; DirJit = Directional jitter; JitMean = Mean jitter; JitSD = Jitter standard deviation; LNSD = The natural logarithm of the standard deviation of the period lengths; LNSDMN = LNSD divided by the mean period length; MeasRough = Measured roughness; RoughSD = Standard deviation of the roughness measure (see text for details).

              ExD1    ExD2    ExD3    NaD1    NaD2      Fo      F1      F2

ExD2          -.07
ExD3           .21    -.21
NaD1           .38    -.48     .93
NaD2          -.66    -.45     .23     .17
Fo            -.40     .39    -.94    -.98    -.11
F1             .19     .11    -.30    -.28    -.29     .18
F2             .17    -.39    -.18    -.09     .23     .08     .45
F3             .38    -.13    -.30    -.06    -.53     .09     .04    -.09
F2-F1          .06    -.51     .0      .10     .45    -.04    -.19     .80
F2/F1         -.08    -.48     .09     .16     .54    -.07    -.54     .51
H1/H2         -.15    -.02     .06     .03     .23     .0      .05    -.22
#Harmon        .17    -.01    -.08    -.03    -.29     .08     .20     .03
HTN           -.40    -.04    -.42    -.40     .15     .42     .17     .35
ShCovar       -.29    -.34     .31     .27     .55    -.24    -.16    -.07
DirSh          .14    -.33     .22     .29     .23    -.26    -.19     .39
ShMean         .06    -.83     .49     .65     .53    -.60    -.10     .29
ShSD           .04    -.83     .48     .65     .52    -.60    -.10     .25
JitCovar      -.07     .02     .47     .39     .27    -.35    -.62    -.30
DirJit         .23    -.76     .61     .81     .28    -.75    -.15     .10
% Jitter       .13    -.77     .42     .64     .29    -.57    -.14     .07
JitMean        .20    -.76     .56     .76     .27    -.69    -.18     .01
JitSD          .19    -.76     .53     .74     .26    -.66    -.21     .0
LNSD           .20    -.20     .83     .79     .23    -.78    -.52    -.23
LNSDMN        -.02     .02     .50     .41     .24    -.39    -.61    -.27
MeasRough      .20    -.64     .33     .51     .31    -.51     .25     .33
RoughSD        .18    -.48     .43     .54     .24    -.55     .20     .08

                F3   F2-F1   F2/F1   H1/H2 #Harmon     HTN ShCovar   DirSh

F2-F1         -.12
F2/F1         -.17     .93
H1/H2         -.17    -.27    -.22
#Harmon        .14    -.10    -.18    -.17
HTN            .12     .27     .19    -.11     .21
ShCovar       -.41     .03     .12     .46    -.12    -.24
DirSh          .21     .56     .52    -.50     .13     .11    -.09
ShMean        -.08     .38     .36     .20    -.13    -.27     .64     .32
ShSD          -.02     .34     .33     .21    -.15    -.25     .59     .29
JitCovar      -.12     .10     .31    -.08    -.59    -.43     .05     .15
DirJit         .09     .21     .23     .05     .01    -.23     .33     .28
% Jitter       .11     .17     .20     .06    -.13    -.28     .31     .19
JitMean        .09     .14     .18     .08    -.10    -.35     .36     .21
JitSD          .12     .14     .19     .05    -.12    -.33     .34     .23
LNSD          -.14     .10     .26    -.03    -.45    -.53     .20     .20
LNSDMN        -.13     .11     .32    -.05    -.63    -.46     .08     .10
MeasRough      .06     .20     .07     .20     .08    -.20     .30     .18
RoughSD        .0     -.05    -.12     .22     .25    -.35     .49     .14

            ShMean    ShSD JitCovar  DirJit % Jitter JitMean  JitSD    LNSD

ShSD           .99
JitCovar       .12     .11
DirJit         .83     .83     .16
% Jitter       .80     .80     .22     .93
JitMean        .84     .84     .26     .95     .98
JitSD          .82     .82     .28     .94     .98     .99
LNSD           .43     .42     .85     .52     .46     .56     .55
LNSDMN         .15     .14     .99     .18     .22     .27     .29     .88
MeasRough      .73     .75    -.24     .71     .63     .63     .58     .12
RoughSD        .69     .69    -.23     .68     .56     .60     .55     .15

            LNSDMN MeasRough

MeasRough     -.22
RoughSD       -.21     .90
