Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University


Source: labrosa.ee.columbia.edu/Montreal2004/talks/deliang2.pdf

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

DeLiang Wang

Perception & Neurodynamics Lab
The Ohio State University


Outline of presentation

• Introduction
• Human performance
• Reverberation effects
  • On pitch
  • On onset/offset
  • On binaural cues
• Monaural enhancement of reverberant signal
• Binaural segregation of reverberant signal
• Discussion and summary


Reverberation as linear transmission system

[Figure: room impulse response over time (ms), with early reflections and late reflections marked]

$$x(t) = \int_{-\infty}^{\infty} h(\tau)\, s(t - \tau)\, d\tau$$

x(t): reverberant signal; s(t): source signal; h(τ): room impulse response function
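The convolution model above can be sketched in discrete time as follows. This is only an illustration: the impulse response here is a toy (a unit direct path plus an exponentially decaying noise tail), and `synthetic_rir`, `reverberate`, and `tail_gain` are hypothetical names, not anything from the talk.

```python
import numpy as np

def synthetic_rir(t60, fs, rng, tail_gain=0.1):
    """Toy h: unit direct path followed by an exponentially decaying noise tail."""
    n = int(t60 * fs)
    t = np.arange(n) / fs
    decay = 10.0 ** (-3.0 * t / t60)   # amplitude falls by 60 dB over t60 seconds
    h = tail_gain * rng.standard_normal(n) * decay
    h[0] = 1.0                         # direct sound
    return h

def reverberate(s, h):
    """Discrete counterpart of x(t) = integral of h(tau) s(t - tau) d tau."""
    return np.convolve(s, h)

fs = 8000
rng = np.random.default_rng(0)
s = np.zeros(fs // 4)
s[100] = 1.0                           # a single click as the source signal
h = synthetic_rir(t60=0.3, fs=fs, rng=rng)
x = reverberate(s, h)                  # the click comes out as a delayed copy of h
```

With a click as the source, the reverberant signal is simply the impulse response shifted to the click time, which is a quick sanity check on the model.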


Reverberation and speech quality

Room reverberation causes two distinct perceptual effects on speech quality

• Early reflections lead to coloration or spectral deviation, determined by the signal-to-reverberant energy ratio; they also boost loudness
• Late reflections (long-term reverberation) smear the time-frequency components of speech and are characterized by the reverberation time (T60)
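T60 is conventionally the time it takes the reverberant energy to decay by 60 dB. One common way to estimate it from an impulse response is Schroeder backward integration, sketched below. The talk does not say how its T60 values were obtained, so this is an assumption; `estimate_t60` and the fit range of -5 to -25 dB are illustrative choices, and the impulse response is a toy decaying-noise model.

```python
import numpy as np

def estimate_t60(h, fs, fit_range_db=(-5.0, -25.0)):
    """Estimate T60 from the slope of the energy decay curve (Schroeder integration)."""
    energy_remaining = np.cumsum(h[::-1] ** 2)[::-1]        # energy from time t to the end
    edc_db = 10.0 * np.log10(energy_remaining / energy_remaining[0])
    upper, lower = fit_range_db
    idx = np.where((edc_db <= upper) & (edc_db >= lower))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)         # decay rate in dB per second
    return -60.0 / slope                                    # time to fall by 60 dB

fs = 8000
true_t60 = 0.4
rng = np.random.default_rng(0)
t = np.arange(int(1.5 * true_t60 * fs)) / fs
h = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / true_t60)  # toy impulse response
t60_hat = estimate_t60(h, fs)
```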


Human performance

• Though speech perception in quiet seems robust to reverberation, speech intelligibility in noise suffers in the presence of reverberation (Plomp’76; Culling et al.’03)
  • Culling et al. showed that reverberation (T60 = 0.4 s) produces a 5 dB increase in speech reception threshold when naturally intonated speech is presented together with a competing talker
• Hearing-impaired listeners are particularly susceptible to reverberation
• The binaural advantage for speech perception in noise is diminished by reverberation
  • The Culling et al. study found no advantage at all

[Figure: from Culling et al. (2003)]


Human performance

• Darwin and Hukin (2000) compared reverberation effects on spatial, pitch, and vocal-tract size cues for sequential organization and found that
  • ITD cues are seriously impaired by reverberation
  • Pitch cues (F0 trajectory) are more resistant
  • A combination of pitch and vocal-tract size cues is very resistant to reverberation



Pitch tracking of a single utterance

[Figure: spectrograms (frequency, 0 to 5000, vs. time, 0 to 2.5 s) of a clean female utterance and the reverberant female utterance (T60 = 0.3 s), with a pitch tracking panel (pitch as time lag vs. time in sec) comparing the clean and reverberant tracks]

[Figure: the same three panels for a clean male utterance and the reverberant male utterance (T60 = 0.3 s)]

• Pitch is pretty robust to reverberation, especially for slowly changing pitch tracks and long voiced speech segments

• Noticeable artifacts: elongated pitch tracks


Pitch tracking of two utterances

[Figure: spectrogram (frequency in Hz, 0 to 5000, vs. time, 0 to 2.5 s) of a reverberant mixture (T60 = 0.3 s), with a pitch tracking panel (pitch as time lag vs. time in sec) showing one-source and two-source tracking]

Multipitch tracking using the Wu et al. algorithm (2003). Even with multiple reverberant sources, pitch tracking works reasonably well


Reverberation effects on harmonic structure

• From Darwin and Hukin (2000). The utterance is “Could you please write the word bead down now.” T60 = 0.4 s

• Primarily in the low-frequency range


Implications on pitch-based grouping

[Figure: pitch tracks (pitch as time lag vs. time in sec) for clean and reverberant speech, with histograms of selected peaks for the clean and T60 = 0.3 s conditions]

Smearing of harmonic structure is worse in the high-frequency range. The figure shows the histogram of peak positions that are nearest to the detected pitch periods for frequencies greater than 800 Hz. This smearing effect would degrade the performance of pitch-based grouping.


Reverberation effects on temporal envelope

[Figure: (a) smoothed temporal envelope of anechoic utterance; (b) smoothed temporal envelope of reverberant utterance. Axes: amplitude (dB) vs. time (s), 0 to 2 s]

• Response envelope of a gammatone filter centered near 1 kHz to the utterance “That noise problem grows more annoying each day.” (a) T60 = 0 and (b) T60 = 0.3 s

• Amplitude modulation (AM) depth is reduced, but the AM pattern is reasonably maintained
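The reduction in AM depth can be reproduced on a synthetic signal. The sketch below is an assumption-laden stand-in for the analysis on the slide: it uses 4 Hz amplitude-modulated noise instead of a gammatone-filtered utterance, a toy decaying-noise impulse response with T60 = 0.3 s, and a hypothetical depth measure (95th minus 5th percentile of the smoothed envelope in dB).

```python
import numpy as np

def smoothed_envelope_db(x, fs, win_s=0.02):
    """Short-time power in dB, smoothed with a moving-average window."""
    win = int(win_s * fs)
    power = np.convolve(x ** 2, np.ones(win) / win, mode="same")
    return 10.0 * np.log10(power + 1e-12)

def am_depth_db(env_db):
    """Hypothetical depth measure: spread between envelope peaks and troughs."""
    return float(np.percentile(env_db, 95) - np.percentile(env_db, 5))

fs = 8000
rng = np.random.default_rng(1)
t = np.arange(2 * fs) / fs
clean = (1.0 + 0.9 * np.cos(2 * np.pi * 4.0 * t)) * rng.standard_normal(t.size)

n = 3 * fs // 10                                                  # toy T60 of 0.3 s
h = rng.standard_normal(n) * 10.0 ** (-3.0 * np.arange(n) / n)
reverberant = np.convolve(clean, h)[:t.size]

seg = slice(fs // 2, 3 * fs // 2)                                 # skip the build-up, keep 4 AM cycles
depth_clean = am_depth_db(smoothed_envelope_db(clean, fs)[seg])
depth_rev = am_depth_db(smoothed_envelope_db(reverberant, fs)[seg])
```

The reverberant tail fills in the envelope troughs, so `depth_rev` comes out smaller than `depth_clean` while the 4 Hz pattern itself remains.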


Onset and offset detection

[Figure: cochleograms of (a) the anechoic utterance and (b) the reverberant utterance; frequency (Hz), 50 to 8000, vs. time (s)]

Cochleogram representation. Red/black marks indicate detected onsets/offsets. The utterance: “That noise problem grows more annoying each day.”


Reverberation effects on onset/offset detection

• Both the times and strengths of onsets and offsets are affected
  • Onset times are slightly shifted
  • Onsets of weak phones (e.g. unvoiced stops) are smeared
  • Offset times are shifted forward (delayed)
• Reverberation introduces spurious offsets
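The delayed offsets can be illustrated with a noise burst and a simple threshold detector. This is not the onset/offset detector used for the cochleogram marks; `onset_offset`, the 20 dB drop criterion, and the toy impulse response are assumptions chosen to keep the sketch short.

```python
import numpy as np

def envelope_db(x, fs, win_s=0.01):
    """Short-time power in dB from a moving-average window."""
    win = int(win_s * fs)
    power = np.convolve(x ** 2, np.ones(win) / win, mode="same")
    return 10.0 * np.log10(power + 1e-12)

def onset_offset(x, fs, drop_db=20.0):
    """First and last times at which the envelope is within drop_db of its peak."""
    env = envelope_db(x, fs)
    active = np.where(env >= env.max() - drop_db)[0]
    return active[0] / fs, active[-1] / fs

fs = 8000
rng = np.random.default_rng(0)
start, stop = fs // 5, fs // 2                      # burst from 0.2 s to 0.5 s
s = np.zeros(6 * fs // 5)
s[start:stop] = rng.standard_normal(stop - start)

n = 3 * fs // 10                                    # toy T60 of 0.3 s
h = 0.1 * rng.standard_normal(n) * 10.0 ** (-3.0 * np.arange(n) / n)
h[0] = 1.0                                          # direct path
x = np.convolve(s, h)[:s.size]

on_clean, off_clean = onset_offset(s, fs)
on_rev, off_rev = onset_offset(x, fs)
```

In this setup the onset barely moves because the direct sound arrives unchanged, while the offset is pushed later by the decaying tail.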


Reverberation effects on binaural cues: ITD

• Shinn-Cunningham and Kawakyu (2003) showed that the responses of a neural model to ITD (interaural time difference) are poor indicators of source azimuth in the presence of reverberation

• Integration over time enhances the estimation robustness


ITD estimation in time-frequency (T-F) units

[Figure: azimuth histograms (channel center frequency, 0 to 5000 Hz, vs. azimuth, -90 to 90 degrees) for a target source at 45°, anechoic and T60 = 0.3 s; and across-frequency integration panels (azimuth in degrees vs. time, 0 to 2.5 sec) for the clean and T60 = 0.3 s conditions]

ITD estimation in individual T-F units using a cross-correlation model (Roman et al.’03). The input is natural speech. The distribution of local azimuth estimates is much noisier in the reverberant condition
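A cross-correlation ITD estimate can be sketched as below. It is a broadband simplification: the model cited above works on individual T-F units after auditory filtering, which is omitted here. `estimate_itd`, the 1 ms search range, and the sign convention (positive when the right ear lags) are assumptions for the example.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Lag (in seconds) that maximizes the left/right cross-correlation."""
    max_lag = int(round(max_itd_s * fs))
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(left.size - 1), right.size)   # lag of each xcorr entry
    keep = np.abs(lags) <= max_lag                   # restrict to plausible ITDs
    return lags[keep][np.argmax(xcorr[keep])] / fs

fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(1600)
delay = 5                                            # right ear lags by 5 samples
left = source
right = np.concatenate([np.zeros(delay), source[:-delay]])
itd = estimate_itd(left, right, fs)                  # anechoic case: recovers the delay
```

Passing each ear signal through a different reverberant impulse response before calling `estimate_itd` on short frames is one way to see the estimates scatter, in line with the noisier histograms described above.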


Interaural intensity difference estimation in T-F units

[Figure: IID histograms (channel center frequency, 0 to 5000 Hz, vs. IID in dB) for a target source at 45°, anechoic and T60 = 0.3 s; and mean IID (dB) vs. channel center frequency (Hz) for one utterance, clean and reverberant]

The distribution of IID (interaural intensity difference) is also much noisier in reverberation, and the mean IID values lose characteristics
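For reference, a per-band IID can be computed as an energy ratio in dB. The sketch below uses equal-width FFT bands instead of the auditory channels shown in the figure, and the right-over-left sign convention is an assumption; `iid_per_band` is a hypothetical helper.

```python
import numpy as np

def iid_per_band(left, right, n_bands=8, eps=1e-12):
    """Right-over-left energy ratio in dB for n_bands equal-width FFT bands."""
    p_left = np.abs(np.fft.rfft(left)) ** 2
    p_right = np.abs(np.fft.rfft(right)) ** 2
    e_left = np.array([band.sum() for band in np.array_split(p_left, n_bands)])
    e_right = np.array([band.sum() for band in np.array_split(p_right, n_bands)])
    return 10.0 * np.log10((e_right + eps) / (e_left + eps))

rng = np.random.default_rng(0)
left = rng.standard_normal(4096)
right = 0.5 * left                     # right ear at half the amplitude in every band
iid = iid_per_band(left, right)        # about -6 dB in each band
```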



A two-stage enhancement algorithm (Wu’03)

Identify an inverse filter to reduce coloration distortion by maximizing kurtosis of LPC residue (Gillespie et al.’01)

Estimate and subtract the effects of long-term reverberation

[Figure: waveforms over time (ms) for clean speech (kurtosis = 12.2) and reverberant speech (kurtosis = 3.6)]
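The kurtosis criterion rests on the observation that a peaky, sparse residual has high kurtosis and that reverberation pushes it toward a Gaussian-like value. A minimal sketch of that effect follows. It does not compute an LPC residual or an inverse filter: a sparse impulse train stands in for the residual of voiced speech, the impulse response is a toy, and the kurtosis here is the non-excess form (3 for a Gaussian), which may differ from the definition behind the numbers on the slide.

```python
import numpy as np

def kurtosis(x):
    """Fourth central moment divided by squared variance (equals 3 for a Gaussian)."""
    x = x - x.mean()
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)

fs = 8000
rng = np.random.default_rng(0)
residual = np.zeros(fs)
residual[::80] = 1.0 + 0.1 * rng.standard_normal(fs // 80)   # one pulse per 10 ms

n = 3 * fs // 10                                              # toy T60 of 0.3 s
h = 0.1 * rng.standard_normal(n) * 10.0 ** (-3.0 * np.arange(n) / n)
h[0] = 1.0                                                    # direct path
reverberant = np.convolve(residual, h)[:residual.size]

k_clean = kurtosis(residual)
k_rev = kurtosis(reverberant)                                 # lower than k_clean
k_gauss = kurtosis(rng.standard_normal(200000))               # close to 3
```

An inverse filter chosen to maximize this quantity on the residual would, under these assumptions, tend to undo the smearing, which is the idea behind the first stage.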


Results of Wu’s enhancement algorithm

[Figure: panels for original speech, reverberant speech, inverse-filtered speech, and enhanced speech]


Binaural segregation of reverberant speech

Roman and Wang (2004) proposed a figure-ground segregation strategy to identify the T-F units dominated by target using spatial information, without imposing restrictions on the number, location or content of interfering sources

Basic idea

• First perform cancellation of reverberant target (with detected target location) using adaptive filtering
• Then label those T-F units that have been largely attenuated in the first stage since they are more likely to originate from the target location

[Figure: block diagram with inputs H1S+N1 and H2S+N2, an adaptive filter W, a subtraction node, two DFT matrix blocks, and a binary mask output]
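The labeling step of the basic idea can be sketched as a thresholded attenuation per T-F unit. This only illustrates the second stage: the adaptive cancellation itself is not implemented, the energies below are made-up numbers, and `label_target_units` with its 6 dB threshold is a hypothetical choice, not a value from Roman and Wang.

```python
import numpy as np

def label_target_units(mix_energy, residual_energy, min_attenuation_db=6.0, eps=1e-12):
    """Mark T-F units whose energy dropped by at least min_attenuation_db after cancellation."""
    attenuation_db = 10.0 * np.log10((mix_energy + eps) / (residual_energy + eps))
    return attenuation_db >= min_attenuation_db

# Rows are frequency channels, columns are time frames (illustrative values only).
mix_energy = np.array([[1.0, 1.0],
                       [4.0, 0.5]])
residual_energy = np.array([[0.01, 0.9],
                            [3.5, 0.02]])
mask = label_target_units(mix_energy, residual_energy)
```

Units that the target canceller removes almost entirely are labeled as target-dominated; units that pass through nearly unchanged are attributed to interference.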


Segregation results

An example with a target speaker at 0° and 4 other interfering speakers at (-135°, -45°, 45°, 135°) and T60 = 0.3 s


ASR results

The segregation output is fed to a missing data recognizer (Cooke et al.’01)

[Figure: ASR results for (a) the 5-speaker configuration and (b) nonspeech intrusion: rock music at 45°. Legend: baseline performance, estimated binary mask, ideal binary mask]


Summary and discussion

• Reverberation corrupts auditory cues
  • Pitch estimation is relatively robust, but harmonic structure is smeared, particularly at high frequencies
  • AM depth is reduced but the AM pattern is reasonably maintained
  • Onset times, and especially offset times, are shifted; onset and offset synchrony is weakened
  • Binaural cues become unreliable
• A two-stage monaural algorithm for reverberant speech enhancement
• A binaural algorithm for segregating reverberant speech
• Issues
  • What is “ground truth” pitch for a reverberant signal?
  • Dereverberation versus enhancement
  • How to deal with both segregation and reverberation monaurally?


Acknowledgment

N. Roman and G. Hu for performing some computer experiments
Funding by AFOSR/AFRL and NSF