COMPENSATION FOR NONLINEAR DISTORTION IN NOISE FOR ROBUST SPEECH RECOGNITION
Mark J. Harvilla
Ph.D. Thesis Defense, October 27, 2014

Introduction

| Topic | Symbol | Fraction of thesis work |
| --- | --- | --- |
| Dynamic range compression (DRC) and automatic speech recognition (ASR) | DRC & ASR | 11% |
| Blind amplitude normalization (BAN) | BAN | 14% |
| Blind amplitude reconstruction (BAR) | BAR | 28% |
| Robust estimation of distortion (RED) | RED | 28% |
| Artificially-matched training (AMT) | AMT | 9% |
| The Big Picture | Big Picture | 10% |

Sections: Introduction, DRC & ASR, BAN, BAR, RED, AMT, Big Picture, Conclusion

Dynamic Range Compression (DRC)
• A form of nonlinear distortion
  ➢ Nonlinear systems are common (e.g., AM/FM radio, rectifiers)
• DRC is used extensively in audio engineering, typically for one of three reasons:
  1. Adhere to the dynamic range limitations of a signal transmission system while increasing average signal power
  2. Increase perceived signal loudness
  3. Eliminate drastic changes in volume (e.g., automatic gain control)
• Because DRC is ubiquitous, speech systems such as ASR are likely to encounter compressed speech

Dynamic Range Compression (DRC)
• DRC is characterized by two parameters, ratio (R) and threshold (τ).
[Figure: output amplitude vs. input amplitude, both from −1 to 1, for thresholds τ = 0.1 and τ = 0.6 and ratios R = 1.5, R = 2.5, and R = ∞]

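The two-parameter characteristic can be sketched as a static, memoryless input-output curve. This is a minimal sketch, assuming unity gain below τ and a slope of 1/R above it, so that R = ∞ reduces to hard clipping. The function name `compress` and the use of NumPy are illustrative choices, not taken from the slides.

```python
import numpy as np


def compress(x, ratio, threshold):
    """Static DRC sketch (assumed form): slope 1 below the threshold,
    slope 1/ratio above it. ratio = np.inf gives hard clipping."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    # Portion of each sample's magnitude that lies above the threshold
    excess = np.maximum(mag - threshold, 0.0)
    out_mag = np.minimum(mag, threshold) + excess / ratio
    return np.sign(x) * out_mag


x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
print(compress(x, ratio=2.5, threshold=0.6))     # mild compression
print(compress(x, ratio=np.inf, threshold=0.6))  # clipping at +/- 0.6
```

With `ratio=1.0` the function returns its input unchanged, which matches the R = 1 curve being the identity line.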
Dynamic Range Compression (DRC)
[Figure: three panels. (1) Time (seconds, 0 to 0.03) on the horizontal axis, vertical range −1 to 1. (2) Output amplitude vs. input amplitude for R = 1, R = 1.5, R = 2.5, and R = ∞. (3) Amplitude (−1 to 1) on the horizontal axis vs. time (seconds, 0 to 0.03).]

Dynamic Range Compression (DRC)
[Figure: SNR (dB, 0 to 20) vs. τ, threshold (percentile, 0 to 100), for R = 1.5, R = 2, R = 3, R = 6, and R = ∞]

Some examples

| Threshold (τ) | Ratio (R) | Audio crest factor | Word error rate (WER) | WER after processing |
| --- | --- | --- | --- | --- |
| P100 | 1 | 17.1 dB | 6.4% | 6.4% |
| P75 | 4 | 7.7 dB | 20.3% | 6.4% |
| P75 | ∞ | 4.1 dB | 30.8% | 13.5% |
| P50 | 4 | 6.7 dB | 30.2% | 6.4% |
| P50 | ∞ | 2.2 dB | 49.5% | 23.0% |

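The crest factor column is the ratio of peak amplitude to RMS amplitude, in dB. The sketch below shows that computation on a synthetic sine wave. It assumes that a threshold such as P50 means the 50th percentile of the absolute sample amplitudes; the helper names are hypothetical.

```python
import numpy as np


def crest_factor_db(x):
    """Peak-to-RMS ratio of a waveform, in dB."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(peak / rms)


def clip_at_percentile(x, percentile):
    """Clip (R = infinity) at a threshold given as a percentile of |x|.
    The percentile convention is an assumption made for illustration."""
    tau = np.percentile(np.abs(x), percentile)
    return np.clip(x, -tau, tau), tau


t = np.arange(8000) / 8000.0
sine = np.sin(2 * np.pi * 100 * t)
clipped, tau = clip_at_percentile(sine, 50)
print(crest_factor_db(sine))     # about 3.01 dB for a full-scale sine
print(crest_factor_db(clipped))  # lower, since clipping flattens the peaks
```

The drop in crest factor with heavier compression follows the same trend as the table, although the numbers here come from a sine wave rather than speech.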
Measuring the effect of DRC on ASR
Experiment 1 (no additive noise):
[Diagram: clean speech signal → DRC with controlled parameter values (R, τ) → ASR with clean acoustic model → measure word error rate (WER)]

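For reference, WER is conventionally computed as 100 × (substitutions + deletions + insertions) / (number of reference words), using a minimum-edit-distance alignment of the recognizer output against the reference transcript. The sketch below implements that standard definition in plain Python. It is not the scoring tool used in the thesis, and it assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.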
Measuring the effect of DRC on ASR
Experiment 2 (additive, channel noise):
[Diagram: clean speech signal → DRC with controlled parameter values (R, τ) → + additive noise at controlled SNR → ASR with clean acoustic model → measure word error rate (WER)]

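Adding noise at a controlled SNR after the compressor can be sketched as follows. The noise is scaled so that the ratio of compressed-signal power to noise power matches the requested SNR. White Gaussian noise is an assumption here, since the slides do not state the noise type, and `add_noise_at_snr` is a hypothetical helper name.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise (assumed noise type) so that the SNR,
    measured against the given signal, equals snr_db."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    # Rescale the noise draw to hit the target power exactly
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)
compressed = np.clip(clean, -0.5, 0.5)  # stand-in for the DRC stage
noisy = add_noise_at_snr(compressed, snr_db=20.0, rng=rng)
```

Passing the compressed signal rather than the clean one is what makes the SNR "with respect to the compressed signal".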
Measuring the effect of DRC on ASR
Experiment 1 (no additive noise):
[Figure: word error rate (%) vs. τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, and ∞; no additive noise]

Measuring the effect of DRC on ASR
Experiment 2 (additive, channel noise):
[Figure: word error rate (%) vs. τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, and ∞; additive noise at 20-dB SNR w.r.t. compressed signal]

Measuring the effect of DRC on ASR
Experiment 2 (additive, channel noise):
[Figure: word error rate (%) vs. τ, threshold (percentile); additive noise at 15-dB SNR w.r.t. compressed signal]

Counteracting the effects of DRC
[Diagram: DRC divides into two cases, each paired with a compensation method]
• Saturating (“clipping”): blind amplitude reconstruction (BAR)
• Non-saturating (“compression”): blind amplitude normalization (BAN)
• Artificially-matched training (AMT)
• Robust estimation of nonlinear distortion function (RED)

Blind Amplitude Normalization (BAN) (Balchandran & Mammone; ICASSP 1998)
• Step 1: Obtain an estimate of the cumulative distribution function (CDF) of the observed speech and of clean, unadulterated reference speech.
[Figure: two panels, cumulative probability vs. amplitude: observed speech (R = 10, τ = P50) and clean speech]

Blind Amplitude Normalization (BAN) (Balchandran & Mammone; ICASSP 1998)
• Step 2: For a given reference signal amplitude, find the amplitude in the observed CDF with the same cumulative probability.
  ➢ Input amplitude of 0.061 maps to 0.2
[Figure: the same two CDF panels, cumulative probability vs. amplitude]

Blind Amplitude Normalization (BAN) (Balchandran & Mammone; ICASSP 1998)
• Step 3: Repeat for each input signal amplitude to obtain a full non-parametric estimate of the nonlinear mapping.
[Figure: the same two CDF panels, cumulative probability vs. amplitude]

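The three BAN steps amount to matching the empirical CDF of the observed signal to that of clean reference speech. The sketch below approximates this with quantiles on a fixed probability grid and linear interpolation between them. The grid size, the function names, and the synthetic Laplacian "speech" are assumptions for illustration; they are not the authors' implementation.

```python
import numpy as np


def estimate_inverse_mapping(observed, reference, n_points=1001):
    """Steps 1-3: pair observed and reference amplitudes that share
    the same cumulative probability."""
    probs = np.linspace(0.0, 1.0, n_points)
    obs_q = np.quantile(observed, probs)   # observed amplitude per probability
    ref_q = np.quantile(reference, probs)  # clean amplitude per probability
    return obs_q, ref_q


def apply_ban(observed, obs_q, ref_q):
    """Map each observed sample through the estimated inverse mapping."""
    return np.interp(observed, obs_q, ref_q)


# Toy data: Laplacian samples stand in for speech amplitudes
rng = np.random.default_rng(0)
clean = rng.laplace(scale=0.1, size=20000)
tau = np.percentile(np.abs(clean), 50)
mag = np.abs(clean)
observed = np.sign(clean) * (np.minimum(mag, tau)
                             + np.maximum(mag - tau, 0.0) / 10.0)
reference = rng.laplace(scale=0.1, size=20000)  # separate clean data

obs_q, ref_q = estimate_inverse_mapping(observed, reference)
repaired = apply_ban(observed, obs_q, ref_q)
print(np.mean(np.abs(observed - clean)), np.mean(np.abs(repaired - clean)))
```

The mapping is only well defined where the observed CDF is strictly increasing. When R = ∞, many samples share the same value and the inverse is ambiguous.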
How well does BAN work?
• Experiment 1 (no additive noise):
[Figure: word error rate (%) vs. τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, and ∞; before BAN]

How well does BAN work?
• Experiment 1 (no additive noise):
[Figure: word error rate (%) vs. τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, and ∞; after BAN]

How well does BAN work?
• Experiment 2 (additive, channel noise at 20-dB SNR):
[Figure: word error rate (%) vs. τ, threshold (percentile), for R = 1, 2, 4, 6, 10, 20, and ∞; before BAN]

How well does BAN work?
• Experiment 2 (additive, channel noise at 20-dB SNR):
[Figure: word error rate (%) vs. τ, threshold (percentile); after BAN]

How well does BAN work?
• Experiment 2 (additive, channel noise at 15-dB SNR):
[Figure: word error rate (%) vs. τ, threshold (percentile); before BAN]

How well does BAN work?
• Experiment 2 (additive, channel noise at 15-dB SNR):
[Figure: word error rate (%) vs. τ, threshold (percentile); after BAN]

Robust BAN (Harvilla & Stern; unpub.)
• Idea: Shift each input sample by the amount that the centroid of the sample and its neighbors changes when the nonlinearity is inverted.
[Figure: two panels, cumulative probability vs. amplitude: observed speech after low-pass filter (R = 10, τ = P50, SNR = 15 dB) and clean speech after low-pass filter]

Robust BAN (Harvilla & Stern; unpub.)
• Step 1: As before, for a given reference signal amplitude, find the amplitude in the observed CDF with the same cumulative probability.
[Figure: the same two CDF panels, cumulative probability vs. amplitude]

Robust BAN (Harvilla & Stern; unpub.)
• Step 2: The difference between the output and the input is the offset to be added to the original, noisy and compressed waveform.
  ➢ Offset = output − input = 0.2 − 0.061 = 0.139
[Figure: the same two CDF panels, cumulative probability vs. amplitude]

Robust BAN (Harvilla & Stern; unpub.)
• Step 3: Repeat for each input signal amplitude, always using the inverse mapping defined by the smoothed signals.
[Figure: the same two CDF panels, cumulative probability vs. amplitude]

Robust BAN (Harvilla & Stern; unpub.)
• Step 1: For each sample, find the centroid of the value and its surrounding 4 samples.
• Step 2: Pass the centroid value through the inverse nonlinearity estimate.
• Step 3: Find the difference (“offset”) between the output of the inverse nonlinearity and the centroid.
• Step 4: Add the offset to the original noisy and compressed sample value from Step 1.
• Step 5: Repeat for each sample in the input signal.

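The five steps can be sketched in a few lines of NumPy. This sketch makes several assumptions that the slides do not confirm: the "surrounding 4 samples" are taken as two on each side, the same 5-point moving average stands in for the low-pass filter used when estimating the CDFs, and the inverse nonlinearity is estimated by quantile matching on a fixed grid. All names are hypothetical.

```python
import numpy as np


def moving_centroid(x, width=5):
    """Mean of each sample and its neighbors (zero-padded at the edges)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


def robust_ban(observed, reference, n_points=1001, width=5):
    observed = np.asarray(observed, dtype=float)
    # Step 1: centroids of the observed samples; the reference is
    # smoothed the same way so both CDFs come from smoothed signals
    obs_smooth = moving_centroid(observed, width)
    ref_smooth = moving_centroid(np.asarray(reference, dtype=float), width)
    # Inverse nonlinearity estimate from the smoothed signals' quantiles
    probs = np.linspace(0.0, 1.0, n_points)
    obs_q = np.quantile(obs_smooth, probs)
    ref_q = np.quantile(ref_smooth, probs)
    # Step 2: pass each centroid through the inverse estimate
    inverted = np.interp(obs_smooth, obs_q, ref_q)
    # Step 3: offset between the inverted centroid and the centroid
    offset = inverted - obs_smooth
    # Steps 4-5: add the offset to every original sample
    return observed + offset
```

Because the offset is computed from the centroid rather than from the noisy sample itself, sample-level noise is carried through unchanged instead of being stretched by the steep part of the inverse mapping.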
Robust BAN (Harvilla & Stern; unpub.)
[Figure: two panels, amplitude vs. time (seconds). Top: original and DRC + noise (SNR = 15 dB). Bottom: repaired using BAN.]

Robust BAN (Harvilla & Stern; unpub.)
[Figure: two panels, amplitude vs. time (seconds). Top: original and DRC + noise (SNR = 15 dB). Bottom: repaired using Robust BAN.]

Results summary
• RBAN is more useful as R becomes large and SNR decreases:
[Figure: (RBAN − BAN) relative improvement (%), from −30 to 30, for R = 2, 4, 6, 10, and 20, at 15-dB SNR and 20-dB SNR]

Blind Amplitude Reconstruction (BAR)
• When R = ∞, BAN techniques are ineffective.
• All samples whose magnitude exceeds τ are completely lost (“clipping”).
[Figure: amplitude vs. time (seconds, 0 to 3 × 10⁻³)]

Consistent Iterative Hard Thresholding (Kitic et al.; ICASSP 2013)
• Kitic-IHT works by learning a sparse representation of the incoming clipped speech in terms of Gabor basis vectors.
• Learning is done using a modified version of the Iterative Hard Thresholding (IHT) algorithm.
• The learned sparse representation is then used to reconstruct the signal on a frame-by-frame basis.
• Kitic-IHT will be used as a baseline against which to compare the performance of novel declipping algorithms.
[Diagram: Gabor basis vectors × sparse representation, learned from clipped observation = repaired signal frame]

Constrained BAR (Harvilla & Stern; Interspeech 2014)
• Declip the signal by interpolating missing samples such that the energy in the second derivative is minimized (i.e., for smoothness).
• Ensure the interpolation matches the sign of the clipped signal and is greater than |τ| in the absolute sense.
[Figure: amplitude vs. time (seconds, 0 to 3 × 10⁻³)]

Constrained BAR (Harvilla & Stern; Interspeech 2014)
• Explaining masking matrices
[Equations: two masking matrices, one labeled “Isolates reliable samples” and one labeled “Isolates clipped samples”]

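The slide's equations did not survive extraction, so the following is only one plausible construction of such masking matrices: each is a subset of the rows of the identity matrix, chosen by whether a sample is clipped. The names `s_r` and `s_c` and the clipping test `|x| >= tau` are assumptions made for illustration.

```python
import numpy as np


def masking_matrices(clipped_mask):
    """Row-selection matrices built from the identity (assumed form)."""
    clipped_mask = np.asarray(clipped_mask, dtype=bool)
    identity = np.eye(clipped_mask.size)
    s_r = identity[~clipped_mask]  # rows that pick out reliable samples
    s_c = identity[clipped_mask]   # rows that pick out clipped samples
    return s_r, s_c


x = np.array([0.1, 0.5, 0.5, 0.2, -0.5])
tau = 0.5
s_r, s_c = masking_matrices(np.abs(x) >= tau)
print(s_r @ x)  # reliable samples
print(s_c @ x)  # clipped samples
# The transposes scatter each part back to its place in the frame
print(np.allclose(s_r.T @ (s_r @ x) + s_c.T @ (s_c @ x), x))
```

With this form, a frame can be written as the sum of its reliable part and its clipped part, so an optimization can treat only the clipped samples as unknowns.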
Constrained BAR (Harvilla & Stern; Interspeech 2014)
CBAR objective function:
[Equation: minimize over x_c, subject to a constraint]

Constrained BAR (Harvilla & Stern; Interspeech 2014)
• Because Constrained BAR (CBAR) imposes a hard constraint when minimizing the objective function, it is very slow.
• A line search algorithm is used to solve the constrained optimization separately for every frame.
• In the worst case, it is 400 times slower than real time.
• This motivates the development of a declipping algorithm that does not require a hard constraint.

Regularized BAR (Harvilla & Stern; ICASSP 2015)
• Replace CBAR’s hard constraint with regularization terms:
[Equation sequence across five slides: the CBAR objective function (minimize over x_c, subject to a constraint) is rewritten as the RBAR objective function (minimize over x_c, no constraint), followed by the frame-specific solution]
• x_c can be solved for in closed form!

Regularized BAR (Harvilla & Stern; ICASSP 2015)
• The t0 and t1 terms are target vectors.
• They “float” above the clipped segments at the target amplitude.
• They are defined as a function of the fraction of clipped samples in a frame.

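To illustrate why a regularized objective admits a closed-form solution, the sketch below minimizes the energy of the second difference of a frame plus a single quadratic penalty that pulls the clipped samples toward a target amplitude. This is not the RBAR objective from the slides, which is not recoverable here and uses two target vectors. The single penalty, the weight `lam`, and the scalar `target_amplitude` are all assumptions.

```python
import numpy as np


def second_difference_matrix(n):
    """(n-2) x n matrix whose rows compute x[i] - 2 x[i+1] + x[i+2]."""
    d2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        d2[i, i:i + 3] = [1.0, -2.0, 1.0]
    return d2


def regularized_declip(frame, clipped_mask, target_amplitude, lam=0.1):
    """Minimize ||D2 x||^2 + lam * ||x_c - t||^2 over the clipped
    samples x_c, holding the reliable samples fixed."""
    frame = np.asarray(frame, dtype=float)
    clipped_mask = np.asarray(clipped_mask, dtype=bool)
    if not clipped_mask.any():
        return frame.copy()
    d2 = second_difference_matrix(frame.size)
    a = d2[:, clipped_mask]                           # acts on unknowns
    b = d2[:, ~clipped_mask] @ frame[~clipped_mask]   # fixed contribution
    target = np.sign(frame[clipped_mask]) * target_amplitude
    # Normal equations: (A^T A + lam I) x_c = -A^T b + lam t
    lhs = a.T @ a + lam * np.eye(a.shape[1])
    rhs = -a.T @ b + lam * target
    repaired = frame.copy()
    repaired[clipped_mask] = np.linalg.solve(lhs, rhs)
    return repaired
```

Solving one linear system per frame replaces the iterative constrained search, which is the kind of speedup the closed-form remark points to. The penalty term also keeps the system well conditioned even when a frame begins or ends inside a clipped segment.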
Regularized BAR (Harvilla & Stern; ICASSP 2015)
[Figure sequence across four slides: amplitude vs. time (seconds, 0 to 3 × 10⁻³), with the target vectors t0 and t1 marked]
• The target amplitudes underestimate the true peak (future research).

Regularized BAR (Harvilla & Stern; ICASSP 2015)
• Amplitude prediction
[Figure: P95 / τ (0 to 400) vs. fraction of clipped samples (0 to 1), with curves labeled exponential and power-law]
ρ: fraction of clipped samples in frame

Processing speed
[Figure: log(TRT) [run time / input duration] vs. τ, threshold (percentile, 20 to 80), for CBAR, Kitic-IHT, and RBAR. Vertical axis ticks run from −2 [0.13] to 6 [403], with the bracketed number giving the run time / input duration ratio.]

Declipping performance
• Experiment 1 (no additive noise):
[Figure: word error rate (%) vs. τ, threshold (percentile), for no declipping, RBAR, Kitic-IHT, and CBAR]

Declipping performance
• Experiment 1 (no additive noise), relative improvements:
[Figure: two panels, relative decrease in WER (%) vs. τ, threshold (percentile). CBAR panel: relative to no declipping, relative to RBAR, and relative to Kitic-IHT. RBAR panel: relative to no declipping and relative to Kitic-IHT.]

Declipping performance
• Experiment 2 (additive noise):
[Figure: two panels, word error rate (%) vs. SNR (dB, 5 to 20), for τ = P75 and τ = P95, each comparing no declipping, RBAR, CBAR, and Kitic-IHT]
• The location of all clipped samples is assumed known.
• Kitic-IHT is more robust to additive noise (future research).

![Page 52: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/52.jpg)
Robust Estimation of Distortion (RED)
• Given a received speech signal, how does one determine whether declipping (BAR) or decompression (BAN) needs to be performed?
[Flowchart: Receive audio → Is audio exposed to DRC? (yes/no) → Is audio clipped? (yes/no) → Apply BAR or Apply BAN → Extract features]
![Page 53: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/53.jpg)
Robust Estimation of Distortion (RED)
• Given a received speech signal, how does one determine whether declipping (BAR) or decompression (BAN) needs to be performed?
• Is audio exposed to DRC? Accurately estimate the value of R (recall: if R is “very” large, speech is effectively clipped) ✗
• Is audio clipped? Search for peaks in the probability distribution of the waveform amplitudes ✔
• Apply BAR: requires estimation of which samples are clipped and must assume the possibility of noise (e.g., as in Experiment 2) ✔
![Page 54: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/54.jpg)
Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)
• Exposure to DRC significantly modifies the waveform amplitude distribution of the speech
[Figure, two panels: probability versus amplitude. Panels: uncompressed speech with noise at 15-dB SNR; DRC’ed speech (R=6, τ=0.06) + noise at 15 dB.]
![Page 55: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/55.jpg)
Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)
• Exposure to DRC significantly modifies the waveform amplitude distribution of the speech
[Figure: probability versus amplitude for DRC’ed speech (R=6, τ=0.06) + noise at 15 dB.]
Clipping detection and τ estimation algorithm:
1. Detect peaks in the distribution
2. Compute: [expression not preserved in this transcript]
3. Output indicates clipping occurrence and amplitude value of τ (0.5*( |-τ| + 0 + |τ| )) (if output is ∞, no clipping)
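The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the published algorithm: the histogram size, the prominence rule used to reject spurious peaks, and the choice to apply the expression 0.5·(|−τ| + 0 + |τ|) to the outermost detected peaks are all assumptions, and `laplacian_samples` is a hypothetical synthetic stand-in for speech amplitudes.

```python
import math
import random


def amplitude_histogram(samples, num_bins):
    """Normalised histogram of amplitudes over [-max|x|, +max|x|]."""
    limit = max(abs(s) for s in samples)
    width = 2.0 * limit / num_bins
    counts = [0] * num_bins
    for s in samples:
        counts[min(int((s + limit) / width), num_bins - 1)] += 1
    centers = [-limit + (i + 0.5) * width for i in range(num_bins)]
    return centers, [c / len(samples) for c in counts]


def prominent_peaks(prob, min_prominence):
    """Indices of local maxima whose prominence is at least min_prominence."""
    padded = [0.0] + list(prob) + [0.0]  # lets the end bins count as peaks
    peaks = []
    for i in range(1, len(padded) - 1):
        if not (padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]):
            continue
        # Lowest point on each side before a higher bin (or the edge) is reached.
        left_min, j = padded[i], i - 1
        while j >= 0 and padded[j] <= padded[i]:
            left_min = min(left_min, padded[j])
            j -= 1
        right_min, j = padded[i], i + 1
        while j < len(padded) and padded[j] <= padded[i]:
            right_min = min(right_min, padded[j])
            j += 1
        if padded[i] - max(left_min, right_min) >= min_prominence:
            peaks.append(i - 1)
    return peaks


def estimate_tau(samples, num_bins=41, rel_prominence=0.1):
    """Return an estimate of tau, or math.inf when no clipping is detected."""
    if not samples or max(abs(s) for s in samples) == 0.0:
        return math.inf
    centers, prob = amplitude_histogram(samples, num_bins)
    peaks = prominent_peaks(prob, rel_prominence * max(prob))
    if len(peaks) < 3:  # fewer than three lobes: treat as not clipped
        return math.inf
    low, high = centers[peaks[0]], centers[peaks[-1]]
    return 0.5 * (abs(low) + 0.0 + abs(high))


def laplacian_samples(n, scale, seed=0):
    """Synthetic stand-in for speech amplitudes (an assumption, not real data)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale) for _ in range(n)]
```

Because the estimate is read off bin centres, it is only accurate to about one bin width; a finer histogram trades that bias against noisier peak detection.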
![Page 56: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/56.jpg)
Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)
Clipped signal detection accuracies
[Figure, two panels (τ = P95, τ = P75): clipped signal detection accuracy (%) versus SNR (dB).]
Because the amplitude distribution merges into one lobe (thus, one peak) with decreasing SNR and τ, detection accuracy correspondingly decreases.
![Page 57: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/57.jpg)
Clipped speech detection & τ estimation (Harvilla & Stern; ICASSP 2015)
τ-estimation accuracies for R = ∞
[Figure, four panels (SNR = 20 dB, 15 dB, 10 dB, 5 dB): τ estimate versus actual τ.]
![Page 58: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/58.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
• Given the amplitude value of τ, how do we determine the location of clipped samples?
[Figure, two panels: amplitude versus time (seconds), showing the signal samples and the clipping threshold. Panels: clipped speech, no noise; clipped speech + noise at 10-dB SNR.]
![Page 59: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/59.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
• Given the amplitude value of τ, how do we determine the location of clipped samples?
• Solution: given
Ø the amplitude value of τ
Ø the percentile value of τ
Ø the variance of the additive noise ($\sigma_w^2$)
Ø the variance of the observed signal ($\sigma_y^2$)
• Model the clean speech and noise with separate Gaussians
• For each sample, classify as clipped if
$$\Pr(\text{clipped} \mid \text{observed sample}, \tau, \sigma_w^2, \sigma_y^2) > \Pr(\text{not clipped} \mid \text{observed sample}, \tau, \sigma_w^2, \sigma_y^2)$$
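One way to realise this decision rule is sketched below. The slide gives only the inputs and the Gaussian modelling choice, so the details are assumptions: the prior probability of clipping is taken as 1 − (τ percentile)/100, the clean-speech variance is approximated as σy² − σw², a clipped sample is modelled as ±τ plus noise, and an unclipped sample as Gaussian speech truncated to (−τ, τ) plus noise, integrated numerically on a grid.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def is_clipped(y, tau, tau_percentile, var_w, var_y, grid_points=401):
    """True if the 'clipped' posterior exceeds the 'not clipped' posterior for y."""
    prior_clipped = 1.0 - tau_percentile / 100.0  # assumed meaning of the percentile
    var_x = max(var_y - var_w, 1e-12)  # assumed clean-speech variance

    # Clipped: the speech sample sat at +tau or -tau before noise was added.
    lik_clipped = 0.5 * (gaussian_pdf(y, tau, var_w) + gaussian_pdf(y, -tau, var_w))

    # Not clipped: speech restricted to (-tau, tau), convolved with the noise.
    step = 2.0 * tau / (grid_points - 1)
    grid = [-tau + i * step for i in range(grid_points)]
    speech = [gaussian_pdf(x, 0.0, var_x) for x in grid]
    norm = sum(speech) * step  # mass of the truncated speech density
    lik_unclipped = sum(
        p * gaussian_pdf(y, x, var_w) for p, x in zip(speech, grid)
    ) * step / norm

    return prior_clipped * lik_clipped > (1.0 - prior_clipped) * lik_unclipped
```

The grid spacing should be small relative to the noise standard deviation; with τ = 0.07 and σw = 0.01 the default of 401 points is comfortably fine.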
![Page 60: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/60.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
[Figure: probability density versus amplitude for the clipped and not-clipped classes. Speech clipped at τ = 0.07 and added to noise at 15-dB SNR.]
![Page 61: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/61.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
[Figure: mean classification accuracy versus SNR (dB), for τ = P95, P75, P55, and P35.]
![Page 62: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/62.jpg)
Robust Estimation of Distortion (RED)
• Given a received speech signal, how does one determine whether declipping (BAR) or decompression (BAN) needs to be performed?
[Flowchart: Receive audio → Is audio exposed to DRC? (yes/no) → Is audio clipped? (yes/no) → Apply BAR or Apply BAN → Extract features]
![Page 63: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/63.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
[Block diagram, “Apply BAR” stages: voice activity detection, estimation of noise variance, estimation of τ percentile, clipped sample estimation, declipping.]
![Page 64: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/64.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
• Experiment 2 (additive noise):
[Figure, two panels (τ = P75, τ = P95): word error rate (%) versus SNR (dB), for no declipping, RBAR, CBAR, and Kitic-IHT.]
The location of all clipped samples is assumed known.
![Page 65: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/65.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
• Experiment 2 (additive noise):
[Figure, two panels (τ = P75, τ = P95): word error rate (%) versus SNR (dB), for no declipping, RBAR, CBAR, and Kitic-IHT.]
Clipping occurrence and location are detected using RED techniques.
![Page 66: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/66.jpg)
Clipped sample estimation (Harvilla & Stern; ICASSP 2015)
• Experiment 2 (additive noise):
[Figure, two panels (τ = P75, τ = P95): word error rate (%) versus SNR (dB), for no declipping, RBAR, CBAR, and Kitic-IHT, with the clipped signal detection accuracy overlaid.]
Clipping occurrence and location are detected using RED techniques.
![Page 67: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/67.jpg)
Artificially-Matched Training (AMT)
• So far, the developed techniques have sought to repair clipped, compressed, and noisy speech to “look like” clean speech:
[Diagram: noisy observations → compensation → clean models]
![Page 68: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/68.jpg)
Artificially-Matched Training (AMT)
• Ultimately, it is only important that the acoustic model and the testing data conditions match. Neither has to be “clean.”
[Diagram: noisy observations → noisy models]
![Page 69: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/69.jpg)
Artificially-Matched Training (AMT)
• Experiment 1 (no additive noise):
[Figure, clean training: word error rate (%) versus τ, threshold (percentile), for R = ∞, 20, 10, 6, 4, 2, and 1.]
![Page 70: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/70.jpg)
Artificially-Matched Training (AMT)
• Experiment 1 (no additive noise):
[Figure, DRC-matched training: word error rate (%) versus τ, threshold (percentile), for R = ∞, 20, 10, 6, 4, 2, and 1.]
![Page 71: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/71.jpg)
Artificially-Matched Training (AMT)
• One approach to achieving this in practice: Artificially-Matched Training with Acoustic Model Selection (AMT-AMS)
[Block diagram: x[n] → MFCC → ASR → WER; a regression on the DRC parameters {R, τ} selects from a bank of acoustic models {R0, τ0}, {R1, τ1}, …, {Rk-1, τk-1}.]
• Current implementation uses the following parameter sets: R = {∞}, τ = {P15, P35, P55, P75, P95}
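The selection step can be pictured as a lookup into the model bank. The sketch below assumes a nearest-neighbour rule on the estimated τ percentile, which the slide does not specify; the model identifiers and the function name are hypothetical.

```python
# Hypothetical identifiers for acoustic models trained on clipped speech
# (R = infinity) at each tau percentile listed on the slide.
MODEL_BANK = {15: "am_p15", 35: "am_p35", 55: "am_p55", 75: "am_p75", 95: "am_p95"}


def select_acoustic_model(tau_percentile_estimate, bank=MODEL_BANK):
    """Return (percentile, model id) of the bank entry closest to the estimate."""
    nearest = min(bank, key=lambda pct: abs(pct - tau_percentile_estimate))
    return nearest, bank[nearest]
```

An utterance whose τ is estimated at the 80th percentile would be decoded with the P75 model; a finer-grained bank would reduce the mismatch at the cost of training more models.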
![Page 72: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/72.jpg)
Artificially-Matched Training (AMT)
• Experiment 1 (no additive noise):
[Figure, clean training: word error rate (%) versus τ, threshold (percentile), for R = ∞, 20, 10, 6, 4, 2, and 1.]
![Page 73: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/73.jpg)
Artificially-Matched Training (AMT)
• Experiment 1 (no additive noise):
[Figure, AMT-AMS: word error rate (%) versus τ, threshold (percentile), for R = ∞, 20, 10, 6, 4, 2, and 1.]
![Page 74: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/74.jpg)
The Big Picture
• With no knowledge of the noise conditions and characteristics of the incoming speech, how well does the combination of algorithms from the thesis work in practice?
[Block diagram, compression case: x[n] → compress with probability pc → add noise w[n] with probability pn → y[n]]
Ø τ drawn uniformly in [τ0, τ1]
Ø R drawn from a Gamma distribution with parameters [kR, θR]
Ø SNR in dB drawn from N(µ, σ²)
Parameter values: pc = 0.9, τ0 = 60, τ1 = 98, pn = 0.75, µ = 20, σ² = 25, kR = 3, θR = 2
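The randomised degradation in this diagram can be sketched as below. Several details are assumptions rather than statements from the slides: τ0 and τ1 are read as percentiles of the absolute amplitude, the compressor is a static curve in the linear amplitude domain (growth above τ divided by R, so R = ∞ is hard clipping), the noise is white Gaussian, and the SNR is measured against the compressed signal. All function names are hypothetical.

```python
import math
import random


def compress(samples, tau, ratio):
    """Static compression curve: growth above tau is divided by ratio."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > tau:
            mag = tau + (mag - tau) / ratio  # ratio = inf gives hard clipping
        out.append(math.copysign(mag, s))
    return out


def abs_percentile(samples, pct):
    """Amplitude below which pct percent of the |samples| fall (nearest rank)."""
    mags = sorted(abs(s) for s in samples)
    index = min(int(pct / 100.0 * len(mags)), len(mags) - 1)
    return mags[index]


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise (an assumed noise type) at the given SNR."""
    power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(power / 10.0 ** (snr_db / 10.0))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def degrade(samples, rng, p_c=0.9, tau_range=(60.0, 98.0), p_n=0.75,
            snr_mean=20.0, snr_var=25.0, k_r=3.0, theta_r=2.0):
    """Randomly compress and/or add noise; return the signal and the draws."""
    drawn = {"tau_percentile": None, "ratio": None, "snr_db": None}
    out = list(samples)
    if rng.random() < p_c:
        drawn["tau_percentile"] = rng.uniform(*tau_range)
        drawn["ratio"] = rng.gammavariate(k_r, theta_r)  # shape, scale
        tau = abs_percentile(out, drawn["tau_percentile"])
        out = compress(out, tau, drawn["ratio"])
    if rng.random() < p_n:
        drawn["snr_db"] = rng.gauss(snr_mean, math.sqrt(snr_var))
        out = add_noise(out, drawn["snr_db"], rng)
    return out, drawn
```

Calling `degrade(x, random.Random(0))` on a list of samples `x` returns the degraded signal together with the parameters that were drawn, which is convenient for scoring detection and estimation against ground truth. A Gamma draw can occasionally fall below 1, which this curve would treat as expansion; whether such draws were kept is not stated on the slide.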
![Page 75: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/75.jpg)
The Big Picture
[Figure, compression case: word error rate (%) for none, RBAN, and BAN.]
![Page 76: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/76.jpg)
The Big Picture
• With no knowledge of the noise conditions and characteristics of the incoming speech, how well does the combination of algorithms from the thesis work in practice?
[Block diagram, clipping case: x[n] → clip with probability pc → add noise w[n] with probability pn → y[n]]
Ø τ drawn uniformly in [τ0, τ1]
Ø SNR in dB drawn from N(µ, σ²)
Parameter values: pc = 0.9, τ0 = 60, τ1 = 98, pn = 0.75, µ = 20, σ² = 25
![Page 77: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/77.jpg)
The Big Picture
[Figure, clipping case: word error rate (%) for none, RBAR, CBAR, Kitic-IHT, AMT-AMS, and AMT-AMS (RBAR).]
![Page 78: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/78.jpg)
Summary & Conclusions
• A previously unexplored problem in speech recognition, DRC, was introduced.
• Novel solutions to the two primary aspects of the problem, clipping and compression, were developed.
• Techniques for detecting the occurrence of DRC were considered.
• A comprehensive solution to DRC for speech recognition was proposed.
• DRC, especially in noise, is a very hard problem, but this thesis lays the groundwork for very promising future research.
![Page 79: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/79.jpg)
Summary & Conclusions
• Areas of future research include:
Ø Improving target amplitude estimates for RBAR [BAR]
Ø Improving the robustness of BAR methods to additive noise [BAR]
Ø Improving the robustness of clipped/compressed signal detection to low values of SNR and τ [RED, Big Picture]
Ø Development of an R-estimation algorithm [RED, Big Picture]
Ø Further investigation of the performance of AMT-AMS with an increasing granularity of acoustic model references [AMT]
![Page 80: COMPENSATION FOR NONLINEAR DISTORTION IN ...Ph.D. Thesis Defense October 27, 2014 Introduction 2 Topic Symbol Fraction of thesis work Dynamic range compression (DRC) and automatic](https://reader033.vdocuments.mx/reader033/viewer/2022050113/5f4abb834373311cbd6215b1/html5/thumbnails/80.jpg)
Thank you!
• Questions?