IWAENC 2016
Soumitro Chakrabarty, Deepth Pilakeezhu, Emanuël Habets
HEAD ORIENTATION COMPENSATION WITH VIDEO-INFORMED SINGLE CHANNEL SPEECH ENHANCEMENT
© AudioLabs, 2016
Soumitro Chakrabarty Head Orientation Compensation with Video-Informed Single Channel Speech Enhancement
Head Orientation
§ Three degrees of freedom for head movement
§ Propagation of sound waves in the horizontal plane
§ Orientation of interest: YAW
Motivation
§ Similar problems can be observed in hands-free communication systems, speech-based human-machine interfaces, etc.
Motivation
§ Human speakers radiate sound primarily to the front [1]
§ Also, the radiation pattern is frequency-dependent [1]
[1] H. K. Dunn and D. W. Farnsworth, “Exploration of pressure field around the human head during speech,” J. Acoust. Soc. Am., vol. 10, no. 1, pp. 83–83, 1938.
Sound Radiation Pattern: Frequency Dependence
Figure: simulated sound field around the head in the horizontal X–Y plane (axes in m, color scale 0–20) at (a) 100 Hz and (b) 3 kHz
§ Generated using the spherical microphone impulse response generator (SMIRgen) [2]
[2] D. P. Jarrett, E. A. P. Habets, M. R. P. Thomas, and P. A. Naylor, “Rigid sphere room impulse response simulation: Algorithm and applications,” J. Acoust. Soc. Am., vol. 132, pp. 1462, 2012.
Speaker-Microphone Setup
§ Aim: Compensate for the reduction in sound energy due to the relative orientation of the speaker with respect to the microphone, while attenuating the noise.
Figure: speaker and microphone, with head orientations 0°, 90°, 180° and 270° marked around the speaker
Problem Formulation
§ Microphone signal (STFT domain), with time-frame index n and frequency-bin index k:
$Y(n,k) = H(\theta,k)\,S(n,k) + V(n,k)$
where $H(\theta,k)$ is the orientation-dependent ATF, $\theta$ the head orientation, $S(n,k)$ the source signal and $V(n,k)$ the noise
§ In terms of attenuation, this can be formulated as
$Y(n,k) = A(\theta,k)\,X(n,k) + V(n,k)$
with the attenuation factor $A(\theta,k) = H(\theta,k)/H(0,k)$ and the desired signal $X(n,k) = H(0,k)\,S(n,k)$, i.e. the source signal as received at the reference orientation $\theta = 0^\circ$
Figure: speaker with head orientation θ (0°, 90°, 180°, 270°) emitting S(n,k); microphone receiving Y(n,k)
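A minimal NumPy sketch of the reformulation, assuming synthetic complex Gaussian spectra in place of real STFT data. The array names and sizes are illustrative only. It checks that rewriting the model in terms of the attenuation factor A(θ,k) and the desired signal X(n,k) leaves the microphone signal unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 4, 8  # number of frames and frequency bins (illustrative sizes)

def crandn(*shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

S = crandn(N, K)          # source signal S(n, k)
V = 0.1 * crandn(N, K)    # additive noise V(n, k)
H_theta = crandn(K)       # ATF H(theta, k) for the current orientation (synthetic)
H_0 = crandn(K)           # ATF H(0, k) for the reference orientation (synthetic)

Y = H_theta * S + V       # Y(n,k) = H(theta,k) S(n,k) + V(n,k); ATFs broadcast over frames

A = H_theta / H_0         # attenuation factor A(theta,k) = H(theta,k) / H(0,k)
X = H_0 * S               # desired signal X(n,k) = H(0,k) S(n,k)
Y_alt = A * X + V         # Y(n,k) = A(theta,k) X(n,k) + V(n,k)

print(np.allclose(Y, Y_alt))  # True
```

The ATFs depend only on the orientation and the bin, not on the frame, so they broadcast across the frame axis.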
Orientation Compensation Filter Derivation
§ Assuming all signal components to be mutually independent:
$\phi_Y(n,k) = |A(\theta,k)|^2\,\phi_X(n,k) + \phi_V(n,k)$
with the microphone signal PSD $\phi_Y$, the desired signal PSD $\phi_X$ and the noise PSD $\phi_V$
§ Estimate of the desired signal:
$\hat{X}(n,k) = W(\theta,k)\,Y(n,k)$
§ Using the MMSE criterion:
$W(\theta,k) = \arg\min_{W} E\{|W\,Y(n,k) - X(n,k)|^2\}$
§ Solution:
$W(\theta,k) = \dfrac{|A(\theta,k)|\,\phi_X}{|A(\theta,k)|^2\,\phi_X + \phi_V}$
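The MMSE solution can be checked numerically. This sketch rests on assumptions that the slides do not state. First, A(θ,k) is taken real and positive, so |A| = A. Under that condition the |A| in the numerator coincides with the general Wiener solution A*φX/(|A|²φX + φV). Second, X and V are zero-mean circular complex Gaussian with the given PSDs for a single (n, k) bin. The parameter values are illustrative.

```python
import numpy as np

def compensation_filter(A_abs, phi_X, phi_V):
    """W(theta, k) = |A| phi_X / (|A|^2 phi_X + phi_V), as on the slide."""
    return A_abs * phi_X / (A_abs**2 * phi_X + phi_V)

def empirical_mse(W, Y, X):
    """Sample estimate of E{|W Y - X|^2}."""
    return np.mean(np.abs(W * Y - X) ** 2)

rng = np.random.default_rng(1)
M = 200_000                        # Monte Carlo draws for a single (n, k) bin
A, phi_X, phi_V = 0.4, 1.0, 0.05   # illustrative values; A real and positive

def complex_gaussian(power, size):
    """Zero-mean circular complex Gaussian samples with the given power."""
    return np.sqrt(power / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

X = complex_gaussian(phi_X, M)     # desired signal X(n, k)
V = complex_gaussian(phi_V, M)     # noise V(n, k), independent of X
Y = A * X + V                      # microphone signal Y = A X + V

W_opt = compensation_filter(A, phi_X, phi_V)
mses = {s: empirical_mse(s * W_opt, Y, X) for s in (0.9, 1.0, 1.1)}
for s, mse in mses.items():
    print(f"W = {s:.1f} * W_opt -> MSE = {mse:.4f}")
```

Scaling W away from W_opt in either direction raises the error. At W_opt the error should be close to the theoretical minimum φXφV/(|A|²φX + φV), which is 0.05/0.21 ≈ 0.238 for these values.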
Orientation Compensation Filter: Orientation-Dependent Gain
§ Defining the orientation-dependent gain
$G(\theta,k) = |A(\theta,k)|^{-1}$
§ Solution:
$W(\theta,k) = G(\theta,k) \cdot \dfrac{\phi_X}{\phi_X + G^2(\theta,k)\,\phi_V}$
§ $G(\theta,k)$, $\phi_X$ and $\phi_V$ need to be estimated
Figure: Y(n,k) → W(θ,k) → X̂(n,k), where W(θ,k) consists of the orientation-dependent gain G(θ,k) and a noise reduction stage
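A short sketch showing that the gain form is the same filter as the |A| form, with illustrative random values per bin. Writing W as G times a second factor explains the block diagram. G·Y has speech PSD φX and noise PSD G²φV, so the second factor is a Wiener-type noise-reduction gain applied to the compensated signal.

```python
import numpy as np

def w_from_attenuation(A_abs, phi_X, phi_V):
    """MMSE form: |A| phi_X / (|A|^2 phi_X + phi_V)."""
    return A_abs * phi_X / (A_abs**2 * phi_X + phi_V)

def w_from_gain(G, phi_X, phi_V):
    """Gain form with G = |A|^{-1}: G * phi_X / (phi_X + G^2 phi_V).
    G * Y has speech PSD phi_X and noise PSD G^2 phi_V, so the second
    factor is a Wiener-type gain for the compensated signal."""
    noise_reduction = phi_X / (phi_X + G**2 * phi_V)
    return G * noise_reduction

rng = np.random.default_rng(2)
K = 16                                # illustrative number of bins
A_abs = rng.uniform(0.1, 1.0, K)      # illustrative |A(theta, k)|
phi_X = rng.uniform(0.5, 2.0, K)      # illustrative desired-signal PSD
phi_V = rng.uniform(0.01, 0.2, K)     # illustrative noise PSD
G = 1.0 / A_abs                       # orientation-dependent gain

print(np.allclose(w_from_attenuation(A_abs, phi_X, phi_V), w_from_gain(G, phi_X, phi_V)))  # True
```

Setting G = 1 reduces the gain form to the plain Wiener gain φX/(φX + φV), i.e. noise reduction with no orientation compensation.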
Head Orientation Estimation: Video-Based Method
§ Camera is co-located with the microphone
§ Video-based orientation estimation: proprietary software from Fraunhofer IIS, SHORE™, is used to obtain a single orientation estimate at each time frame n
§ Current limitation: no estimates are obtained in the range [90°, 270°]
§ In this work, we assume the orientation to be known
Figure: speaker, microphone and co-located camera, with head orientations 0°, 90°, 180° and 270°
Orientation-Dependent Gain
§ Use SMIRgen as a mouth simulator to compute a gain table
§ Head is modeled as a rigid sphere
§ Mouth is an omnidirectional point source placed on the sphere
§ At each time frame, the orientation-dependent gain is selected from the pre-computed gain table based on the current orientation estimate
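The per-frame selection step can be sketched as a table lookup. The slides do not say how an estimate between two tabulated orientations is handled. This sketch assumes nearest-neighbour selection, and it assumes that row i of the table corresponds to θ_i = i · 360/I degrees. Both `select_gain` and `select_gains` are hypothetical helpers.

```python
import numpy as np

def select_gain(gain_table, theta_deg):
    """Return the gain row G(theta_i, :) whose tabulated orientation is
    closest to theta_deg (nearest-neighbour lookup, an assumption).
    Assumes row i of the I x K table holds theta_i = i * 360 / I degrees."""
    I = gain_table.shape[0]
    step = 360.0 / I
    i = int(np.round((theta_deg % 360.0) / step)) % I   # e.g. 350 deg wraps to row 0
    return gain_table[i]

def select_gains(gain_table, thetas_deg):
    """One gain row per time frame n, given per-frame orientation estimates."""
    return np.stack([select_gain(gain_table, t) for t in thetas_deg])

# Toy table with I = 12 orientations (30 deg apart) and K = 4 bins;
# row i is filled with the value i so the selected row is easy to read off.
table = np.repeat(np.arange(12.0)[:, None], 4, axis=1)
print(select_gains(table, [0.0, 40.0, 150.0, 350.0])[:, 0])  # [0. 1. 5. 0.]
```

The modulo wrap matters because orientation is circular. An estimate of 350° is closer to the 0° entry than to the 330° entry.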
Gain Table Computation
(1) Sample the orientation range at $I$ discrete points $\theta_i$
(2) Compute the ATF at each $\theta_i$
(3) Compute the corresponding gain at each point, for each frequency bin $k$, as
$G(\theta_i,k) = \left|\dfrac{H(\theta_i,k)}{H(0,k)}\right|^{-1}$
which follows from $G(\theta,k) = |A(\theta,k)|^{-1}$
(4) The gain table is a matrix of size $I \times K$, with $K$ the number of frequency bins
Figure: orientation circle (0°, 90°, 180°, 270°) sampled at I discrete points θ_i
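Steps (1) to (4) can be sketched in NumPy. The ATFs are taken as input, so they could come from SMIRgen or from measurements. As an assumption, row 0 of the input is the reference orientation θ = 0°. The random ATFs below merely stand in for real data.

```python
import numpy as np

def compute_gain_table(H):
    """Gain table G(theta_i, k) = |H(theta_i, k) / H(0, k)|^{-1}.

    H: complex array of shape (I, K) holding the ATF for I sampled
    orientations and K frequency bins; row 0 is assumed to be the
    reference orientation theta = 0 deg. Returns a real (I, K) array."""
    H = np.asarray(H)
    A = H / H[0]            # attenuation factor A(theta_i, k); H[0] broadcasts over rows
    return 1.0 / np.abs(A)  # G = |A|^{-1}

# Illustrative use: random ATFs stand in for SMIRgen output or measured ATFs.
rng = np.random.default_rng(3)
I, K = 12, 513            # 30 deg resolution; 1024-point frames give 513 bins
H = rng.standard_normal((I, K)) + 1j * rng.standard_normal((I, K))
G_table = compute_gain_table(H)
print(G_table.shape)      # (12, 513)
```

Row 0 of the table is all ones by construction. Bins where |A| is very small produce very large gains. The slides do not say whether the gain is limited in practice.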
Gain Table
Figure: Gain table computed with SMIRgen for an anechoic environment (orientation 0°–360° on the horizontal axis, frequency up to 8 kHz on the vertical axis, gain on a color scale from 0 to 10)
§ Can be computed for reverberant environments using SMIRgen
§ Can be computed using measured ATFs
System Overview
Figure: block diagram. The orientation estimate θ drives the orientation-dependent gain computation (gain table), which provides G(θ,k). The noise and signal power estimation provides φV and φX. The orientation compensation filter combines both to map Y(n,k) to X̂(n,k).
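A hypothetical per-frame sketch of the whole chain, reusing the relations above. Several details are assumptions rather than statements from the slides:
- the gain table is read by nearest-neighbour lookup;
- the smoothing constant is set to α = 0.98;
- the desired-signal PSD uses a decision-directed-style recursion, with its instantaneous term scaled by G² to undo the attenuation.

The exact way the authors combine the decision-directed estimate [3] with the gain is not given here.

```python
import numpy as np

def orientation_compensation(Y, thetas_deg, gain_table, phi_V, alpha=0.98):
    """Hypothetical per-frame sketch of the system overview.

    Y          : (N, K) complex STFT of the microphone signal
    thetas_deg : (N,) orientation per frame (assumed known, as in this work)
    gain_table : (I, K) table; row i <-> theta_i = i * 360 / I deg (assumed)
    phi_V      : (K,) noise PSD estimate, assumed strictly positive
    alpha      : decision-directed smoothing constant (assumed value)
    Returns the (N, K) estimate X_hat of the desired signal.
    """
    N, K = Y.shape
    I = gain_table.shape[0]
    X_hat = np.zeros_like(Y)
    prev_power = np.zeros(K)          # |X_hat(n-1, k)|^2
    for n in range(N):
        # 1) Orientation-dependent gain: nearest tabulated orientation
        i = int(np.round((thetas_deg[n] % 360.0) / (360.0 / I))) % I
        G = gain_table[i]
        # 2) Desired-signal PSD, decision-directed style; the instantaneous
        #    speech-power term is scaled by G^2 to undo the attenuation
        inst = np.maximum(np.abs(Y[n]) ** 2 - phi_V, 0.0)
        phi_X = alpha * prev_power + (1.0 - alpha) * G**2 * inst
        # 3) Orientation compensation filter W = G phi_X / (phi_X + G^2 phi_V)
        W = G * phi_X / (phi_X + G**2 * phi_V)
        X_hat[n] = W * Y[n]
        prev_power = np.abs(X_hat[n]) ** 2
    return X_hat

# Noise-free sanity check: with a negligible noise PSD the filter reduces
# to applying the tabulated gain, here G = 2 in every bin.
Y = np.ones((5, 3), dtype=complex)
table = np.vstack([2.0 * np.ones(3), np.ones(3)])   # I = 2 rows: 0 deg and 180 deg
X_hat = orientation_compensation(Y, np.zeros(5), table, np.full(3, 1e-12))
print(np.allclose(X_hat, 2.0 * Y))  # True
```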
Experimental Results: Measured RIRs
§ Measurement setup
§ Room size: 4.55 m × 4.45 m × 2.55 m
§ Source-to-microphone distance: 1 m
§ $T_{60} = 0.17$ s
§ Stationary white noise with iSNR = 20 dB
§ STFT parameters: 16 kHz sampling rate, frame length of 1024 samples with 50% overlap
§ Noise PSD: estimated from silent frames
§ Desired signal PSD: decision-directed approach [3]
Figure: measurement setup with speaker (KEMAR dummy head) and microphone; orientations 0°, 90°, 180° and 270°
[3] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 2, pp. 443–445, 1985.
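The analysis parameters can be sketched with NumPy only. At 16 kHz, a 1024-sample frame is 64 ms, the 512-sample hop is 32 ms, and the bin spacing is 16000/1024 = 15.625 Hz, giving 513 bins per frame. Two details are assumptions here. The Hann window is not stated on the slides. The mask marking which frames are silent is hypothetical and could come from an oracle or a voice activity detector.

```python
import numpy as np

FS = 16_000        # sampling rate in Hz (from the slides)
FRAME = 1024       # frame length in samples (from the slides)
HOP = FRAME // 2   # 50 % overlap (from the slides)

def stft(x, frame=FRAME, hop=HOP):
    """Frame-wise real FFT with a Hann window (the window is an assumption).
    Requires len(x) >= frame. Returns an (N, frame // 2 + 1) complex array."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[n * hop : n * hop + frame] * win for n in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def noise_psd_from_silence(Y, silent):
    """Noise PSD phi_V(k): periodogram averaged over frames flagged as silent."""
    return np.mean(np.abs(Y[silent]) ** 2, axis=0)

# Illustrative use: 1 s of white noise; treating the first 10 frames as
# silent is a hypothetical stand-in for a real voice activity decision.
rng = np.random.default_rng(4)
x = rng.standard_normal(FS)
Y = stft(x)
silent = np.zeros(Y.shape[0], dtype=bool)
silent[:10] = True
phi_V = noise_psd_from_silence(Y, silent)
print(Y.shape, FS / FRAME)   # (30, 513) 15.625
```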
Experimental Results: Measured RIRs
§ Results are presented for three different gain table computations, each with a resolution of 30° (I = 12 points), plus noise reduction only:
§ NR: noise reduction only, no orientation-dependent gain applied
§ AG: anechoic gain, computed with SMIRgen
§ RGSA: spatially averaged reverberant gain, computed with SMIRgen
§ RGMES: gain computed from measured ATFs
Figure: orientation circle (0°, 90°, 180°, 270°) sampled at I discrete points
Experimental Results: Measured RIRs
Figure: (a) PESQ improvement (ΔPESQ, axis 0.2–0.9) and (b) mean log-spectral distance (mLSD, axis 0–6 dB) versus orientation (0°–360°), shown in turn for NR (noise reduction), AG (anechoic gain, SMIRgen), RGSA (spatially averaged reverberant gain, SMIRgen) and RGMES (measured ATFs)
Audio Examples: Measured RIRs
Audio examples for head orientation θ = 150°: RGMES gain vs. noise reduction only
Conclusions and Outlook
§ A single-channel speech enhancement framework that incorporates head orientation information was presented
§ The experimental results motivate further exploration of the significance of head orientation information for speech enhancement
§ Current research focuses on developing methods that learn the attenuation characteristics caused by the speaker's orientation, in order to perform the compensation
§ Future work would involve relaxing the constraints of the current system and developing a method more suitable for practical settings
Thank you for your attention
Questions?