

Sources of discrepancy between loudness perception and model predictions

Tzu-Ling J. Yu, Robert S. Schlauch, Heekyung J. Han, & Edward Carney

Department of Speech-Language-Hearing Sciences, University of Minnesota

I. Introduction

Background

A clinically significant discrepancy between the pure-tone average and spondee threshold is observed in persons feigning hearing loss. It has been assumed that this discrepancy is due to their reliance on loudness for setting a criterion for responding in a threshold task. If so, this loudness-related discrepancy should be predictable by loudness models.

Schlauch et al. (2015) found that a model designed for predicting the loudness of sustained sounds (ANSI S3.4-2007) tended to overestimate the discrepancy between speech and a 1-kHz tone. The model predicted a 27.6 dB difference, whereas the behavioral data showed a difference of only 11.2 dB. Schlauch et al. (2015) reasoned that a loudness model designed for sustained stimuli may not capture the dynamic nature of speech. However, Han et al. (2015) found that this loudness-related discrepancy was still overestimated by a time-varying loudness model (Glasberg & Moore, 2002). As shown in Figure 1, compared to the behavioral findings, the model for earphone stimuli predicted larger loudness differences between tones and the other stimuli (speech-shaped noise, sustained vowels, and spondees).

Purpose of Study

To eliminate possible confounds from 1) headphone frequency response corrections; 2) dynamic stimulus variations; and 3) a judgment bias based on blocked stimuli (Poulton, 1979), the current study investigated magnitude estimations of sustained speech-shaped noise and 1.0-kHz tones measured in a free field with two kinds of blocked presentation: (1) a single stimulus type within a block; and (2) the noise and tone presented within the same block.

Second, this study compared two published loudness models for dynamic sounds, Glasberg & Moore (2002) and ISO 532-1 (“Time-Dependent Loudness”, 2014), in predicting the loudness difference between speech-shaped noise and the 1.0-kHz tone.

V. Concluding Remarks

Schlauch et al. (2015) found that a loudness model for sustained stimuli overestimated the equal-loudness level difference between speech and a 1-kHz tone. Han et al. (2015) found that a model for time-varying stimuli produced less discrepancy than a static model, but still predicted a larger discrepancy than behavioral findings of perceived loudness for stimuli presented through earphones.

After controlling for possible procedural confounds (Han et al., 2015), the current study found that the time-dependent models still overestimated the equal-loudness level difference between sustained speech-shaped noise and a 1-kHz tone. Moreover, the ISO 532-1 model (“Time-Dependent Loudness”, 2014) produced less discrepancy from perceived loudness than the Glasberg & Moore (2002) model.

Data from our lab suggest that the current ANSI standard for loudness should include the ability to analyze time-varying sounds. The standard should also be modified to reduce high-frequency loudness summation, a problem noted in this study and in Schlittenlacher et al. (2015).

Abstract

Schlauch et al. (2015) found that a static model of loudness overestimates pure-tone-average/spondee-threshold differences in functional hearing loss. To explore this discrepancy, loudness was measured under earphones and in a free field for stimuli chosen to reveal possible mechanisms contributing to the static model’s over-prediction. Compared to a static model of loudness (ANSI S3.4, 2007), a time-varying model (Glasberg & Moore, 2002) for earphone stimuli produced more accurate estimates for spondees but still over-predicted loudness differences between tones and the other stimuli (speech-shaped noise, sustained vowels, and spondees). To eliminate possible confounds from headphone frequency response corrections and dynamic variations in a model’s predictions, magnitude estimations of sustained speech-shaped noise and 1.0-kHz tones were measured in a free field. The two models evaluated in this study still overestimated loudness differences. These findings suggest that current models (Glasberg & Moore, 2002; ISO 532-1) overestimate loudness summation of wideband sounds.

VI. References

American National Standards Institute. (2007). Procedure for the computation of loudness of steady sounds (ANSI S3.4-2007). New York, NY: Author.

Glasberg, B. R., & Moore, B. C. (2002). A model of loudness applicable to time-varying sounds. Journal of the Audio Engineering Society, 50, 331-342.

Han, H. J., Schlauch, R. S., Yu, T. J., & Rao, A. (2015, March). Pure tone-spondee threshold relationships in functional hearing loss: A test. Poster session presented at the Annual Convention of the American Auditory Society, Scottsdale, AZ.

International Organization for Standardization. (1975). ISO 532B: Acoustics – Method for calculating loudness level. Genève, Switzerland: International Organization for Standardization.

Poulton, E. C. (1979). Models for biases in judging sensory magnitude. Psychological Bulletin, 86, 777-803.

Schlauch, R. S., Koerner, T. K., & Marshall, L. (2015). Effective identification of functional hearing loss using behavioral threshold measures. Journal of Speech, Language, and Hearing Research, 58, 453-465.

Schlittenlacher, J., Ellermeier, W., & Hashimoto, T. (2015). Spectral loudness summation: Shortcomings of current standards. The Journal of the Acoustical Society of America, 137, EL26-EL31.

Time-Dependent Loudness – ISO 532-1. (2014, Fall). HEADlines, 33 (Germany). Retrieved 18 Feb 2016 from http://www.head-acoustics.de/downloads/eng/headlines/nvh/HEADlines_33_e.pdf

Zwicker, E. (1977). Procedure for calculating loudness of temporally variable sounds. The Journal of the Acoustical Society of America, 62, 675-682.

II. Methods

Participants

Eighteen young adults (8 females and 10 males), with an average age of 28 years and self-reported normal hearing, participated in the experiment.

Test Environment, Stimuli, and Design

Multi-Sensory Perception Lab (Figure 2)

•  Laboratory measures were obtained in a 10 ft. by 13 ft., single-walled, sound-attenuating chamber.

•  Participants sat in the center of the room, facing 5 speakers that presented sounds from the front (± 20˚ azimuth).

•  Equipment used in this study included 1 D/A converter (Lynx Studio Technology, Aurora 16), 3 power amplifiers (Crown Audio, XLS 1500), 1 sound card (Lynx Studio Technology, AES16e), and 5 speakers (Anthony Gallo Acoustics, A’Diva Ti).

Stimuli

•  Speech-shaped noise and a 1-kHz tone were used.

•  All stimuli were generated with a 48-kHz sampling rate and output using a 24-bit digital-to-analog converter.

•  The duration of each stimulus presentation was 1.2 sec.
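The tone parameters above (1 kHz, 48-kHz sampling rate, 1.2-s duration) can be illustrated with a minimal synthesis sketch. The 10-ms raised-cosine ramps and the digital level of -20 dB re full scale are assumptions made for illustration only; the poster does not report onset/offset ramps, and mapping digital level to dB SPL would require an acoustic calibration that is not described here.

```python
import math

SAMPLE_RATE_HZ = 48_000  # from the poster
DURATION_S = 1.2         # from the poster
TONE_HZ = 1_000.0        # from the poster
RAMP_S = 0.01            # ASSUMPTION: 10-ms raised-cosine on/off ramps


def make_tone(level_db_re_full_scale=-20.0):
    """Return float samples in [-1, 1] for a ramped 1-kHz tone.

    level_db_re_full_scale is a hypothetical digital level, not dB SPL.
    """
    n_samples = round(SAMPLE_RATE_HZ * DURATION_S)
    n_ramp = round(SAMPLE_RATE_HZ * RAMP_S)
    peak = 10.0 ** (level_db_re_full_scale / 20.0)
    samples = []
    for n in range(n_samples):
        gain = 1.0
        if n < n_ramp:
            # Onset ramp rises smoothly from 0 toward 1.
            gain = 0.5 * (1.0 - math.cos(math.pi * n / n_ramp))
        elif n >= n_samples - n_ramp:
            # Offset ramp mirrors the onset ramp.
            gain = 0.5 * (1.0 - math.cos(math.pi * (n_samples - 1 - n) / n_ramp))
        phase = 2.0 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ
        samples.append(peak * gain * math.sin(phase))
    return samples


tone = make_tone()
```

Speech-shaped noise is omitted from the sketch because its spectral shape is not specified in the poster.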

Three within-subject conditions were presented to each participant in random order:

•  Speech-shaped noise (SSN) only

•  1-kHz tone only

•  Mixed condition: SSN and 1-kHz tone presented in random order

Procedure

Warm-Up Activity

To become familiar with the magnitude estimation task, participants first judged the lengths of five objects, ranging from 4.75 to 157 in., presented in random order.

All participants were able to use numbers correctly and participate in the experiment.

Experimental Task: Loudness magnitude estimation

Participants were presented the three conditions in random order while seated in a sound-attenuating chamber.

Each stimulus was presented to participants at six levels (40-90 dB SPL) in random order at ± 20˚ azimuth in a sound field.
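The randomization described so far (three conditions in random order, six levels per stimulus in random order) could be sketched as follows. The 10-dB spacing between 40 and 90 dB SPL and the single presentation per stimulus and level are assumptions; the poster gives only the range and the number of levels. The names `CONDITIONS` and `build_session` are hypothetical.

```python
import random

# ASSUMPTION: six levels spaced 10 dB apart across the reported 40-90 dB SPL range.
LEVELS_DB_SPL = [40, 50, 60, 70, 80, 90]

# Stimulus types contained in each of the three within-subject conditions.
CONDITIONS = {
    "SSN only": ["SSN"],
    "tone only": ["tone"],
    "mixed": ["SSN", "tone"],
}


def build_session(seed=None):
    """Return [(condition, [(stimulus, level_db), ...]), ...] in random order."""
    rng = random.Random(seed)
    condition_order = list(CONDITIONS)
    rng.shuffle(condition_order)  # conditions in random order
    session = []
    for condition in condition_order:
        # ASSUMPTION: one presentation per stimulus and level.
        trials = [
            (stimulus, level)
            for stimulus in CONDITIONS[condition]
            for level in LEVELS_DB_SPL
        ]
        rng.shuffle(trials)  # levels (and stimuli, if mixed) in random order
        session.append((condition, trials))
    return session


session = build_session(seed=1)
for condition, trials in session:
    print(condition, trials)
```

In the mixed block, noise and tone trials are interleaved within one shuffled list, which is the feature that distinguishes it from the two single-stimulus blocks.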

Participants were asked to maintain the same reference for loudness for all stimuli and were instructed to write down a number corresponding to each sound’s sensation magnitude. Any non-negative number was acceptable, including whole numbers, fractions, and decimals.
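Because participants choose their own number scale in magnitude estimation, responses are often summarized with a geometric mean rather than an arithmetic mean. The poster does not state which average was used, so the sketch below is an illustration of that common practice, not a description of the authors' analysis. The estimates are made-up values, and zero responses are dropped because their logarithm is undefined.

```python
import math


def geometric_mean(values):
    """Geometric mean of the positive entries in values."""
    positive = [v for v in values if v > 0]  # log(0) is undefined
    if not positive:
        raise ValueError("need at least one positive estimate")
    return math.exp(sum(math.log(v) for v in positive) / len(positive))


# HYPOTHETICAL magnitude estimates from four listeners for one stimulus level.
estimates = [10.0, 40.0, 2.5, 20.0]
summary = geometric_mean(estimates)
print(round(summary, 2))
```

Averaging in the log domain keeps one listener who uses very large numbers from dominating the group summary.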

III. Results

Figure 1. Perceived and predicted loudness for earphone stimuli.

Loudness of SSN and the 1-kHz tone as a function of level for each condition (Figure 3)

The mixed condition (SSN and tone in random order) produced slightly larger loudness differences between SSN and the tone than the conditions in which each stimulus was presented in its own block. This result is consistent with the biases described by Poulton (1979).

Only the results of the mixed condition were used for subsequent analyses.

Figure 3. Loudness of SSN and the 1-kHz pure tone as a function of level.

Figure 4. Perceived and predicted loudness as a function of level.

Figure 5. Boxplots show the results of bootstrapping the differences between the predicted and perceived equal-loudness levels.

Perceived and predicted loudness as a function of level

When the two stimuli were perceived to be equally loud, the level of the 1-kHz tone was between 8.1 and 11.6 dB higher than the level of SSN for the behavioral data shown in the right panel of Figure 3.
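One way to obtain such an equal-loudness level difference is to fit a straight line to log magnitude estimates as a function of level for each stimulus, then find the SSN level whose fitted loudness matches that of the tone. The sketch below assumes this procedure; the poster does not describe how the difference was actually computed, and the data are hypothetical values chosen to lie on exact lines.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def equal_loudness_difference(tone_fit, ssn_fit, tone_level_db):
    """Tone level minus the SSN level judged equally loud, in dB."""
    tone_intercept, tone_slope = tone_fit
    ssn_intercept, ssn_slope = ssn_fit
    log_loudness = tone_intercept + tone_slope * tone_level_db
    ssn_level_db = (log_loudness - ssn_intercept) / ssn_slope
    return tone_level_db - ssn_level_db


levels = [40, 50, 60, 70, 80, 90]
# HYPOTHETICAL log10 magnitude estimates; SSN is judged louder at every level.
tone_log = [0.03 * level - 1.2 for level in levels]
ssn_log = [0.03 * level - 0.9 for level in levels]

difference_db = equal_loudness_difference(
    fit_line(levels, tone_log), fit_line(levels, ssn_log), tone_level_db=70
)
print(round(difference_db, 2))
```

A positive result means the tone must be set to a higher level than the noise for the two to be judged equally loud, which is the direction of the effect reported here.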

Figure 4 shows that the dynamic loudness model (Glasberg & Moore, 2002) and the ISO model (“Time-Dependent Loudness”, 2014) predicted larger level differences than were observed in the behavioral data.

Range of differences between predicted and perceived equal-loudness levels as estimated by the “bootstrap” method

Both the ISO 532-1 model (“Time-Dependent Loudness”, 2014) and the Glasberg & Moore (2002) model overestimated the equal-loudness level difference, but the prediction from ISO 532-1 produced a smaller discrepancy.
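A bootstrap of this kind can be sketched by resampling listeners with replacement and recomputing the gap between a model's predicted difference and the mean perceived difference. The per-listener values, the predicted difference of 20 dB, the number of resamples, and the percentile rule are all hypothetical; the poster does not report these details.

```python
import random
import statistics


def bootstrap_discrepancy(perceived_db, predicted_db, n_boot=2000, seed=0):
    """Sorted bootstrap values of predicted minus mean perceived difference."""
    rng = random.Random(seed)
    n = len(perceived_db)
    results = []
    for _ in range(n_boot):
        # Resample listeners with replacement, keeping the sample size fixed.
        resample = [rng.choice(perceived_db) for _ in range(n)]
        results.append(predicted_db - statistics.fmean(resample))
    results.sort()
    return results


def percentile(sorted_values, q):
    """Simple nearest-rank style percentile for q in [0, 1]."""
    return sorted_values[round(q * (len(sorted_values) - 1))]


# HYPOTHETICAL equal-loudness level differences (dB) for 18 listeners.
perceived = [8.0, 9.5, 10.0, 11.0, 12.5, 7.5, 9.0, 10.5, 11.5,
             8.5, 9.8, 10.2, 12.0, 13.0, 9.2, 10.8, 11.2, 8.8]
PREDICTED_DB = 20.0  # HYPOTHETICAL model prediction

discrepancies = bootstrap_discrepancy(perceived, PREDICTED_DB)
low = percentile(discrepancies, 0.025)
median = percentile(discrepancies, 0.5)
high = percentile(discrepancies, 0.975)
print(low, median, high)
```

If the resulting interval excludes zero, the model's overestimate is unlikely to be explained by listener sampling variability alone.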

IV. Discussion

This study assessed the loudness predictions of two models designed for time-varying stimuli. Both models overestimated the equal-loudness level difference between speech-shaped noise and a 1-kHz tone. This study attempted to rule out possible confounding factors suggested by Han et al. (2015): 1) headphone frequency response corrections; 2) dynamic stimulus variations; and 3) a judgment bias resulting from judging stimuli in separate blocks (Poulton, 1979).

Schlittenlacher et al. (2015) reported a similar finding for static loudness models. They observed that current standardized loudness models for steady sounds, ANSI S3.4 (2007) and DIN 45631 (1991), overestimate the effects of loudness summation for broadband noise, especially in the high-frequency range (1.25–5 kHz), and that the predictions of DIN 45631 tend to be closer to the subjective evaluations than those of ANSI S3.4 (Figure 6). Given that the dynamic loudness models, ISO 532-1 and Glasberg & Moore (2002), are based on many of the same principles as the static models, ISO 532 B (1975) and ANSI S3.4 (2007), it is possible that the dynamic models also overestimate loudness differences resulting from spectral loudness summation.

Figure 6. Perceived (shown using boxes) and predicted (shown with filled points) differences between filtered pink noise and a 1-kHz tone (from Schlittenlacher et al., 2015). DIN 45631 (1991) is a refinement of ISO 532 B (1975), and both are based on the Zwicker method (1977) for static sounds. Plotted behavioral data (boxes) represent loudness matches from 20 participants.

Figure 2. Multi-Sensory Perception Lab. The participant was in the center of the booth and maintained his head at 0˚ azimuth while listening to stimuli.

Tzu-Ling Yu
Poster #64