homepages.wmich.eduhomepages.wmich.edu/~hillenbr/501/breathyvoicelab.docx · web viewrounding of...

Lab: Objective Measurement of Breathy Voice

Background

Breathy voice (also known as murmur) is one of the most common symptoms of voice disorders, both organic and functional (Aronson, 1971; Aronson, 1990; Boone & McFarlane, 1988; Colton & Casper, 1990). There is also evidence that breathiness is associated with aging (Hollien, 1987; Ryan & Burk, 1974), and there is a tendency for women, on average, to produce somewhat breathier voices than men (Klatt & Klatt, 1990; McKay, 1987).

In terms of the underlying laryngeal vibratory pattern, two major features that are associated with breathy voice are described below.

1. Aspiration noise. The most obvious aerodynamic feature of breathy voice is that air escapes during the “closed” phase of the vibratory cycle. (The word closed is in quotation marks because, in the case of breathy voice, the vocal folds do not entirely meet at midline during the portion of the phonatory cycle in which glottal area reaches a minimum.) There is a surprising variety of laryngeal configurations that can be responsible for this air leakage, but the most commonly described (e.g., Södersten et al., 1995) is the posterior glottal (or glottic) chink, in which the anterior portion of the vocal folds periodically meets at (or near) midline (producing the buzzy component of breathy voice) while the posterior portion of the vocal folds remains open (producing the aperiodic, hissy component of breathy voice). This hissy component is called aspiration or aspiration noise. (This use of the term aspiration is entirely distinct from the use of this word in swallowing, referring to the entry of liquids or other unwanted gunk into the trachea and lungs.) Air leakage during the phonatory cycle results in turbulence, which is heard as noise.

The spectrum of aspiration noise – the hissy component of breathy voice – is stronger in the mid and high frequencies than it is in the lower frequencies. The spectrum of the buzzy (periodic) component of breathy voice, on the other hand, is exactly the opposite; i.e., it is stronger in the lower frequencies than it is in the highs. For this reason, aspiration is more easily seen in the mid and high frequencies, while the opposite is true of the periodic/harmonic component. These features can be seen in the spectra of Figure 2.

The presence of aspiration can be seen in the time domain: the waveform of a clearly phonated (modal) voice will appear highly periodic, and the degree of periodicity will decrease as the voice becomes more breathy. These differences in waveform periodicity are not always easy to see by eye, but they can usually be seen easily in the spectrum. For example, compare the narrow band spectra for the clear (non-breathy) and breathy voices in Figure 3. Notice that the degree of harmonic organization is much greater for the non-breathy voice; i.e., most of the energy is at harmonically related frequencies. The breathy voice, on the other hand, shows a reasonable degree of harmonic organization mainly in the low frequencies (see the glottal source spectra in the lower part of Figure 2). Differences in the degree of harmonic organization can also be seen in the output spectra (as opposed to source spectra) of Figure 3.

Figure 1. Glottal source waveforms for breathy and non-breathy voice.

Figure 2. Airflow functions and spectra for non-breathy (left) and breathy voice. Notice the more rounded glottal wave for breathy voice, which produces a spectrum with a stronger 1st harmonic or, to state it the other way around, less energy spread into the higher frequencies.

Figure 3. Narrow band spectra for clear phonation (left) and for breathy voice. Notice that the degree of harmonic organization is much greater for the non-breathy voice on the left; i.e., most of the energy is at harmonically related frequencies, while the breathy voice shows a reasonable degree of harmonic organization mainly in the low frequencies. Also, note the strong 1st harmonic in the breathy signal – more on this below.
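The idea of "harmonic organization" can be made concrete with a small numerical sketch. The Python below is not the CPPS program used later in the lab; it is a simpler, hypothetical index (`harmonic_energy_fraction`, a name invented here) that asks what share of a signal's energy falls exactly on whole-number multiples of f0. The test signals are synthetic assumptions: a six-harmonic "buzz", and the same buzz with random noise standing in for aspiration.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """One-sided DFT power spectrum (bins 0 .. N/2), computed directly."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def harmonic_energy_fraction(signal, period):
    """Share of the non-DC energy that sits at multiples of f0.

    Assumes len(signal) is a whole-number multiple of `period` (in samples),
    so that every harmonic falls exactly on a DFT bin.
    """
    power = power_spectrum(signal)
    f0_bin = len(signal) // period
    harmonic = sum(power[k] for k in range(f0_bin, len(power), f0_bin))
    return harmonic / sum(power[1:])


N, PERIOD = 256, 16      # arbitrary illustrative sizes
rng = random.Random(0)   # fixed seed so the example is repeatable

# Synthetic periodic "buzz": six harmonics with amplitudes falling as 1/h.
buzz = [sum(math.sin(2 * math.pi * h * t / PERIOD) / h for h in range(1, 7))
        for t in range(N)]
# The same buzz plus Gaussian noise, a crude stand-in for aspiration noise.
breathy = [s + rng.gauss(0.0, 0.5) for s in buzz]

clear_fraction = harmonic_energy_fraction(buzz, PERIOD)
breathy_fraction = harmonic_energy_fraction(breathy, PERIOD)
print(f"clear: {clear_fraction:.3f}  breathy: {breathy_fraction:.3f}")
```

The clean buzz puts essentially all of its energy on the harmonics, while the noisy version spreads a noticeable share between them, which is the difference Figure 3 shows visually.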

2. Rounding of the glottal source waveform for breathy voice. In non-breathy phonation in modal register, airflow rises gradually to a peak during the opening phase, but typically falls more abruptly during the closing phase (Figure 1). However, in breathy voice, the closing phase of the glottal source function is more gradual, producing a more rounded source signal. This can be seen in the top of Figure 2, which shows a more rounded (i.e., more sinusoidal) source waveform for the breathy voice. Once again, these breathiness-related differences in the degree of rounding of the glottal signal are more easily observed in the spectrum than in the time domain – in this case, by measuring the relative amplitude of the first harmonic (H1). Here is the reasoning: For a perfect sinusoid, all of the energy in the signal is at H1, with no spread of energy into the higher frequencies. (One definition of a sinusoid is that changes over time are as smooth as they can possibly be. As we discussed in the section on basic acoustics, the sinusoid is the extension over time of motion around a circle, with a circle being the smoothest shape possible.) As the source waveform becomes more abrupt (i.e., more like an impulse), the spread of energy to higher frequencies increases – meaning that the relative amplitude of H1 will decrease. The bottom line is that we would generally expect to see higher amplitude first harmonics for more breathy voices and weaker first harmonics for less breathy voices. Compare the first harmonic amplitudes for the breathy and non-breathy voices in Figures 2 and 3. Which of the spectra show stronger 1st harmonics – the breathy or the non-breathy?
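The reasoning about H1 can be checked numerically. The sketch below is an illustration under stated assumptions, not a model of real glottal flow: the two pulse shapes, the cycle length `N`, and the helper names `harmonic_energies`, `h1_share`, and `abrupt_pulse` are all invented for this example. It compares the share of non-DC energy carried by H1 for a fully rounded (sinusoidal) pulse and for a pulse that rises slowly, falls quickly, and then stays closed.

```python
import cmath
import math


def harmonic_energies(one_period):
    """Energy at harmonics 1 .. N/2 of a waveform spanning exactly one period."""
    n = len(one_period)
    energies = []
    for k in range(1, n // 2 + 1):
        coeff = sum(one_period[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies.append(abs(coeff) ** 2)
    return energies


def h1_share(one_period):
    """Fraction of the non-DC energy carried by the first harmonic (H1)."""
    energies = harmonic_energies(one_period)
    return energies[0] / sum(energies)


N = 64  # samples per cycle (arbitrary choice)

# Rounded, breathy-like pulse: a sinusoid riding on a constant flow.
rounded = [0.5 * (1 - math.cos(2 * math.pi * t / N)) for t in range(N)]


def abrupt_pulse(t, rise=48, fall=8):
    """Modal-like pulse: slow rise, fast fall, then a closed (zero-flow) phase."""
    if t < rise:
        return 0.5 * (1 - math.cos(math.pi * t / rise))
    if t < rise + fall:
        return 0.5 * (1 + math.cos(math.pi * (t - rise) / fall))
    return 0.0


abrupt = [abrupt_pulse(t) for t in range(N)]

rounded_share = h1_share(rounded)
abrupt_share = h1_share(abrupt)
print(f"H1 share, rounded pulse: {rounded_share:.3f}")
print(f"H1 share, abrupt pulse:  {abrupt_share:.3f}")
```

The rounded pulse keeps virtually all of its energy at H1, while the abrupt pulse hands part of it to the higher harmonics, so its relative H1 amplitude is lower.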

The Lab

The lab uses ten sustained [a] vowels out of 25 voice samples that were used many years ago in a study of breathy vocal quality in dysphonic speakers (Hillenbrand & Houde, 1996). These voice samples, in turn, were drawn from a large database of recordings that were made at Massachusetts Eye and Ear Infirmary by Robert Hillman. The samples that we picked were intended to represent a fairly broad range of breathiness percepts from clear phonation to very breathy voice.

Procedure

1. Open SpeechTool/Ztool, then use the File menu to open ‘br01.wav’ (‘c:\ztool\br01.wav’ – on the LRC machines, it’s ‘r:\ztool\br01.wav’).

2. Play the signal as many times as you wish and rate how breathy the voice is on a scale of 1 to 5, with 5 being the most breathy. Record your rating for this signal. So, the row of data for this signal will look like this:

br01.wav 3 (or whatever)

3. Toward the end of the string of buttons at the top, you will see one called ‘CPP’. Clicking this button will run a program that estimates how periodic the signal is by measuring the degree of harmonic organization in its narrow band spectrum. The very last number that the program gives you is called “Mean CPPS”. Larger CPPS values indicate a higher degree of harmonic organization – i.e., most of the energy is at harmonically related frequencies, indicating a more periodic signal. Consequently, small values of CPPS should be associated with less periodic (breathier) voices. Write this number down in the same row as your breathiness rating. Your row of data should now look something like this:

br01.wav 3 0.71

4. Do the same thing for the remaining nine signals. In your table of results, you should have your breathiness rating (1-5) and a CPPS value for each test signal.

5. Below is a table of breathiness ratings for each signal.

br01.wav 8.22
br02.wav 5.21
br03.wav 2.12
br04.wav 2.86
br05.wav 5.33
br06.wav 1.51
br07.wav 7.94
br08.wav 3.42
br09.wav 4.07
br10.wav 4.87

These are very much like the breathiness ratings (BR) that you made, except that these ratings are averages from a panel of 21 listeners doing pretty much what you did. (These values vary from ~1.5 to ~8.2 instead of 1-5, but this doesn’t matter.) Copy these breathiness ratings into a new column of the table you created. So, each row in your table will have, in this order: (1) the name of the signal, (2) your breathiness rating, (3) the CPPS value from Ztool, and (4) the average rating from the 21-listener panel. Use Word to create a file called ‘brXXXX.txt’ (where ‘XXXX’ is the last 4 digits of your WIN – or any random string of numbers, e.g., ‘br1598.txt’) in the ztool folder with all of these numbers in it (filename, your BR, CPP, and panel BR – for all 10 signals). The 1st line should look something like this:

br01.wav 3 0.71 8.22

All 10 lines need to be in exactly this format; e.g., you want “0.71” NOT “0.71 dB”.

In Word, set the font to Courier and use the space bar only, not the Tab key. SAVE YOUR FILE AS PLAIN TEXT (File>Save as>Choose plain text, using the name ‘brXXXX.txt’; e.g., ‘br4598.txt’). (If Word asks you about “text encoding”, just leave it at the Windows default setting.)
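If you would like to double-check the file before moving on, the formatting rules above can be tested mechanically. The following Python is only a sketch: `check_rows` is a hypothetical helper written for this handout, not part of Ztool, and it tests only the rules stated here (four space-separated fields, plain numbers in fields 2 to 4, no Tab characters, no extra lines).

```python
def check_rows(lines, expected_rows=None):
    """Return a list of problems found; an empty list means the rows look fine."""
    problems = []
    if expected_rows is not None and len(lines) != expected_rows:
        problems.append(f"expected {expected_rows} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            problems.append(f"line {number}: contains a Tab character")
        fields = line.split()
        if len(fields) != 4:
            problems.append(f"line {number}: expected 4 fields, found {len(fields)}")
            continue
        for field in fields[1:]:
            try:
                float(field)
            except ValueError:
                problems.append(f"line {number}: '{field}' is not a plain number")
    return problems


# The example row from this handout, plus two deliberately broken copies of it.
good = ["br01.wav 3 0.71 8.22"]
bad = ["br01.wav 3 0.71 dB 8.22", "br01.wav\t3 0.71 8.22"]

good_problems = check_rows(good)
bad_problems = check_rows(bad)
for problem in bad_problems:
    print(problem)
```

To check a real file, pass in its lines, for example `check_rows(open("br1598.txt").read().splitlines(), expected_rows=10)`.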

6. The last step is to measure correlations between: (1) your BR and the panel BR (columns 2 and 4), (2) your BR and CPPS (columns 2 and 3), and (3) the panel BR and CPPS (columns 3 and 4). A correlation is a measure of the strength of the relationship between two sets of numbers (see footnote 1). The easiest way to measure a correlation happens to be the most arcane, but it’s not that bad:

a. Hold down the Windows key (the one with the flag-looking thing on it) and hit ‘R’.

b. Type ‘cmd’ into the text box that pops up.

c. Put your cursor in the black window that appears and type:

c: <ENTER>

cd c:\ztool <ENTER>

(LRC people: Use r: instead of c:)

d. Let’s assume you want to measure the correlation between your BR (col 2) and CPPS (col 3). Type this arcane thing:

.\tcor brXXXX.txt 2 brXXXX.txt 3 (measure the correlation between col 2 and col 3)

(e.g.: .\tcor br1760.txt 2 br1760.txt 3)

Notes:

1. The weird ‘.\’ thing has to be there. It needs to be a backslash (‘\’) and not a forward slash (‘/’).

2. If you get an error from tcor, take a close look at the format of your data file. All 10 lines need to be in exactly this format, with no extra lines:

br01.wav 3 0.71 8.22

‘tcor’ will type out a bunch of stuff; the only numbers you need are the values for ‘r’ and ‘rsq’ (r², aka variance explained); e.g.:

r: -0.92022
rsq: 0.84680

Do the same thing for the two other correlations that you need; e.g.,

.\tcor brXXXX.txt 2 brXXXX.txt 4

.\tcor brXXXX.txt 3 brXXXX.txt 4
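For anyone curious about what a correlation program computes, here is a minimal Python sketch. It is not the source of tcor, whose internals this handout does not describe; it assumes tcor reports the ordinary Pearson coefficient. The helper names `column` and `pearson` are invented here, and the rows are made-up toy numbers in the lab's four-column layout, not real lab data.

```python
import math


def column(rows, col):
    """Pull one column out of whitespace-separated rows; columns count from 1."""
    return [float(row.split()[col - 1]) for row in rows]


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up toy rows (NOT real lab data): name, rating, second measure, third measure.
toy_rows = [
    "toy01.wav 1 9.0 2.0",
    "toy02.wav 2 7.5 3.5",
    "toy03.wav 3 6.0 6.5",
    "toy04.wav 4 4.5 7.0",
    "toy05.wav 5 3.0 9.0",
]

# Column 3 falls in a straight line as column 2 rises: a perfect negative relationship.
r_23 = pearson(column(toy_rows, 2), column(toy_rows, 3))
# Column 4 rises with column 2, but not perfectly.
r_24 = pearson(column(toy_rows, 2), column(toy_rows, 4))

for label, r in (("cols 2 and 3", r_23), ("cols 2 and 4", r_24)):
    print(f"{label}  r: {r:.5f}  rsq: {r ** 2:.5f}")
```

The column numbers follow the same convention as the tcor commands above: column 1 is the file name, so the first numeric column is column 2.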

Results: r r²

correlation between your BR and the panel BR _______ _______

correlation between your BR and CPPS _______ _______

correlation between the panel BR and CPPS _______ _______

Footnote 1: Correlations vary from 0 to +1 for positive relationships (large values on one variable tend to be associated with large values on the other variable) or from 0 to -1 for negative relationships (large values on one variable are associated with small values on the other variable).
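For reference, the quantity described in this footnote can be written out as below, assuming tcor reports the ordinary Pearson coefficient (the handout does not say which formula it uses). The second line shows why the panel's wider rating scale does not matter: multiplying either variable by a positive factor and adding a constant leaves r unchanged. The third line works through the rsq arithmetic for the example tcor output shown earlier.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1

r(a x + b,\; y) = r(x,\; y) \quad \text{for any } a > 0

r = -0.92022 \;\Rightarrow\; r^2 = (-0.92022)^2 \approx 0.8468
```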

Questions:

1. How well do your breathiness ratings agree with the panel ratings? Note that the more important measure of the strength of a relationship is rsq (r²) rather than r: for example, an r value of 0.8 is not 80% of perfect (it corresponds to an rsq of only 0.64), but an rsq value of 0.8 is 80% of perfect.

Answer here:

2. How well do the CPPS measures predict your breathiness ratings?

Answer here:

3. How well do the CPPS measures predict the panel breathiness ratings?

Answer here:

4. Why is the correlation between breathiness ratings and CPPS negative? (If you’re not sure, see footnote 1.)

Answer here:

5. What do you make of all this? For example, is there any advantage to using this measure of periodicity in place of your own subjective estimate?

Answer here:

6. Look at the figures on the last two pages of this document and read the description at the top of the page. Pick the two spectra that seem to show the most harmonic organization, and the two spectra that seem to show the least harmonic organization. (These are subjective judgments, so there are no right and wrong answers. The signal you pick for the most harmonic organization should be very clean looking, with most of its energy at harmonically related frequencies; vice versa for the signal with the least harmonic organization.) Record your results below, along with the panel breathiness rating and the CPPS value for each signal:

Signal with the most harmonic organization

File name (e.g., br09) Panel BR CPPS

________________ ______ ____

Signal with the second most harmonic organization

File name Panel BR CPPS

________________ ______ ____

Signal with the least harmonic organization

File name Panel BR CPPS

________________ ______ ____

Signal with the second least harmonic organization

File name Panel BR CPPS

________________ ______ ____

7. Last question: Do the voices that you judged to have the most harmonic organization tend to be among the signals with: (a) the lowest breathiness ratings and/or (b) the largest CPPS values?

Answer here:

REFERENCES

Aronson, A.E. (1971). Early motor unit disease masquerading as psychogenic breathy dysphonia: A clinical case presentation. Journal of Speech and Hearing Disorders, 36, 115-124.

Aronson, A.E. (1990). Clinical voice disorders (3rd ed). New York: Thieme.

Boone, D.R., and McFarlane, S.C. (1988). The voice and voice therapy (4th ed). Englewood Cliffs, NJ: Prentice Hall.

Colton, R.A., and Casper, J.K. (1990). Understanding voice problems: A physiological perspective for diagnosis and treatment. Baltimore: Williams and Wilkins.

Hillenbrand, J.M., and Houde, R.A. (1996). Acoustic characteristics of breathy vocal quality: Dysphonic voices and continuous speech. Journal of Speech and Hearing Research, 39, 311-321.

Hollien, H. (1987). "Old voices": What do we really know about them? Journal of Voice, 1, 2-17.

Klatt, D.H., and Klatt, L.C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. Journal of the Acoustical Society of America, 87, 820-857.

McKay, I. (1987). Phonetics: The Science of Speech Production (2nd ed.). Boston: College Hill.

Ryan, W.J., and Burk, K.W. (1974). Perceptual and acoustic correlates of aging in the speech of males. Journal of Communication Disorders, 1, 181-192.

Södersten, M., Hertegård, S., and Hammarberg, B. (1995). Glottal closure, transglottal airflow, and voice quality in healthy middle-aged women. Journal of Voice, 9, 182-197.

Narrow Band Spectra of the Test Signals

The figures below are narrow band amplitude spectra of the ten test signals. Notice that the spectra vary quite a bit in the degree of harmonic organization, which reflects how periodic the signal is. For example, for br06 nearly all of the energy is at harmonic frequencies (whole number multiples of f0). The same is true of br03, though to a somewhat lesser extent. The spectra of some of the other signals, however, show all kinds of energy at non-harmonic frequencies; e.g., br01, br05, br07, and br10.

The CPPS algorithm attempts to measure these variations in harmonic organization, with large CPPS values reflecting a high degree of harmonic organization (i.e., high periodicity). The assumption is that signals with large CPPS values tend to be less breathy – and vice versa.
