
Page 1: Spatial Covariance Models For Under-Determined Reverberant Audio Source Separation

Spatial Covariance Models For Under-Determined Reverberant Audio Source Separation

N. Duong, E. Vincent and R. Gribonval

METISS project team, IRISA/INRIA, France, Oct. 2009

Page 2

Content

- Under-determined source separation
- Spatial covariance models
- Model parameter estimation
- Experimental evaluation
- Conclusion

Page 3

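Under-determined source separation

Use I recorded mixture signals $\mathbf{x}(t) = [x_1(t), \ldots, x_I(t)]^T$ to separate J sources $s_j(t)$, $j = 1, \ldots, J$, where $I < J$.

Convolutive mixing model: denote by $\mathbf{h}_j(\tau) = [h_{1j}(\tau), \ldots, h_{Ij}(\tau)]^T$ the vector of mixing filters from source j to the microphone array, by $\mathbf{s}^{\mathrm{img}}_j(t)$ the contribution of source j to all microphones (its spatial image), and by $\mathbf{x}(t)$ the vector of mixture signals. They are computed as:

$$\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{s}^{\mathrm{img}}_j(t), \qquad \mathbf{s}^{\mathrm{img}}_j(t) = \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t-\tau)$$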
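The convolutive mixing model can be simulated directly in the time domain. The following is a minimal NumPy sketch, assuming real-valued signals and FIR mixing filters stored as an array of shape (I, J, L); the function name `convolutive_mix` is hypothetical and not taken from the presentation.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Mix J sources into I channels with the convolutive model (hypothetical helper).

    sources: array of shape (J, T), the source signals s_j(t).
    filters: array of shape (I, J, L), the mixing filters h_ij(tau).
    Returns (images, mixture): images has shape (J, I, T) and holds the
    spatial images s_j^img(t); mixture has shape (I, T) and holds x(t).
    """
    J, T = sources.shape
    I, J_filters, _ = filters.shape
    if J != J_filters:
        raise ValueError("filters must have one column per source")
    images = np.zeros((J, I, T))
    for j in range(J):
        for i in range(I):
            # s_ij^img(t) = sum_tau h_ij(tau) s_j(t - tau), truncated to T samples
            images[j, i] = np.convolve(filters[i, j], sources[j])[:T]
    # x(t) = sum_j s_j^img(t)
    mixture = images.sum(axis=0)
    return images, mixture
```

Keeping the spatial images alongside the mixture is convenient because the later slides estimate model parameters from the true source images.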

Page 4

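BSS approaches

Sparsity assumption: only a few sources are active at each time-frequency point.

In the short-time Fourier transform (STFT) domain, the mixing model becomes (approximately, when the mixing filters are short compared with the STFT window):

$$\mathbf{x}(t) = \sum_{j} \mathbf{s}^{\mathrm{img}}_j(t) = \sum_{j} \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t-\tau) \quad \xrightarrow{\text{STFT}} \quad \mathbf{x}(n,f) = \sum_{j} \mathbf{s}^{\mathrm{img}}_j(n,f) \approx \sum_{j} \mathbf{h}_j(f)\, s_j(n,f)$$

Binary masking: only one source is active at each time-frequency point.

L1-norm minimization: at each time-frequency point (n,f),

$$\hat{\mathbf{s}}(n,f) = \arg\min_{\mathbf{s}(n,f)} \sum_{j=1}^{J} |s_j(n,f)| \quad \text{s.t.} \quad \sum_{j=1}^{J} \mathbf{h}_j(f)\, s_j(n,f) = \mathbf{x}(n,f)$$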
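The two baselines on this slide can be sketched per time-frequency point under the STFT-domain model. This is a minimal illustration, not the implementation evaluated in the experiments: the dominant-source rule used for binary masking (largest normalized projection |h_j^H x| / ||h_j||) is our assumption, and `l1_min` uses a common combinatorial approximation that only considers subsets of I active sources. For real-valued coefficients an optimal L1 solution with at most I nonzero entries always exists, but for complex STFT coefficients this is only an approximation. Both function names are hypothetical.

```python
import itertools

import numpy as np


def binary_mask(x, H):
    """Binary masking at one time-frequency point (hypothetical helper).

    x: mixture STFT coefficients x(n, f), shape (I,).
    H: mixing vectors h_j(f) stacked as columns, shape (I, J).
    Returns the estimated source images, shape (J, I): the whole mixture is
    assigned to the single dominant source, all others get zero.
    """
    # Assumed dominance rule: largest normalized projection |h_j^H x| / ||h_j||
    scores = np.abs(H.conj().T @ x) / np.linalg.norm(H, axis=0)
    images = np.zeros((H.shape[1], x.shape[0]), dtype=complex)
    images[np.argmax(scores)] = x
    return images


def l1_min(x, H):
    """Approximate L1-norm minimization at one time-frequency point.

    Tries every subset of I active sources, solves the I x I system
    H_sub s_sub = x exactly, and keeps the subset with the smallest
    sum_j |s_j|. Returns s_j(n, f) for all J sources, shape (J,), or None
    if every subset is singular.
    """
    I, J = H.shape
    best, best_cost = None, np.inf
    for subset in itertools.combinations(range(J), I):
        idx = list(subset)
        sub = H[:, idx]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue  # skip (near-)singular subsets
        s_sub = np.linalg.solve(sub, x)
        cost = np.sum(np.abs(s_sub))
        if cost < best_cost:
            best_cost = cost
            best = np.zeros(J, dtype=complex)
            best[idx] = s_sub
    return best
```

With I = 2 and J = 3 as in the experiments, `l1_min` only has three candidate pairs to check per time-frequency point, so the exhaustive search stays cheap.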

Page 5

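Beamforming model

The mixing vector $\mathbf{h}_j(f)$ is denoted as $\mathbf{a}_j(f)$ (the steering vector) and approximated from the distance between each source and each microphone [T. Gustafsson et al.], i.e., for a stereo mixture:

$$\mathbf{a}_j(f) = \begin{bmatrix} \dfrac{1}{\sqrt{4\pi}\, r_{1j}}\, e^{-2i\pi f r_{1j}/c} \\[2ex] \dfrac{1}{\sqrt{4\pi}\, r_{2j}}\, e^{-2i\pi f r_{2j}/c} \end{bmatrix}$$

where $r_{ij}$ is the distance from source j to microphone i and c is the speed of sound.

Covariance matrix of the source images:

$$\mathbf{R}_{\mathbf{s}^{\mathrm{img}}_j}(n,f) = |s_j(n,f)|^2\, \mathbf{a}_j(f)\,\mathbf{a}_j^H(f) = v_j(n,f)\, \mathbf{R}_j(f)$$

where $v_j(n,f)$ is the source variance and $\mathbf{R}_j(f) = \mathbf{a}_j(f)\,\mathbf{a}_j^H(f)$ is the rank-1 spatial covariance matrix modeling the mixing process.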
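A minimal sketch of the anechoic steering vector and the rank-1 spatial covariance it induces, following the beamforming model on this slide. It assumes distances in metres and a speed of sound c = 343 m/s (an assumed value; the slide does not give one). The function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value of c


def steering_vector(f, distances, c=SPEED_OF_SOUND):
    """Anechoic steering vector a_j(f) for one source j (hypothetical helper).

    f: frequency in Hz.
    distances: r_ij in metres, from source j to each microphone i.
    Entry i is exp(-2i*pi*f*r_ij/c) / (sqrt(4*pi) * r_ij).
    """
    r = np.asarray(distances, dtype=float)
    return np.exp(-2j * np.pi * f * r / c) / (np.sqrt(4 * np.pi) * r)


def rank1_covariance(a):
    """Rank-1 spatial covariance R_j(f) = a_j(f) a_j(f)^H."""
    return np.outer(a, a.conj())
```

For example, `rank1_covariance(steering_vector(1000.0, [1.20, 1.25]))` gives a 2 x 2 Hermitian matrix of rank 1; only the distances r_ij enter, which is why this model has so few parameters.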

Page 6

Spatial covariance models

Purpose of the paper: explore extensions of the Gaussian framework, i.e.

$$s_j(n,f) \sim \mathcal{N}\big(0, v_j(n,f)\big), \qquad \mathbf{R}_{\mathbf{s}^{\mathrm{img}}_j}(n,f) = v_j(n,f)\, \mathbf{R}_j(f),$$

that better account for reverberation.

We evaluate the potential separation performance by estimating the spatial model parameters from training data.

Source separation by Wiener filtering:

$$\hat{\mathbf{s}}^{\mathrm{img}}_j(n,f) = v_j(n,f)\, \mathbf{R}_j(f) \Big( \sum_{j'=1}^{J} v_{j'}(n,f)\, \mathbf{R}_{j'}(f) \Big)^{-1} \mathbf{x}(n,f)$$

Models for the spatial covariance matrix $\mathbf{R}_j(f)$:
- Rank-1 convolutive model
- Rank-1 anechoic model
- Full-rank direct+diffuse model
- Full-rank unconstrained model
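The multichannel Wiener filter can be written per time-frequency point in a few lines of NumPy. This sketch assumes the source variances and spatial covariance matrices are already known (as in the semi-blind or oracle settings); `wiener_images` is a hypothetical name.

```python
import numpy as np


def wiener_images(x, v, R):
    """Multichannel Wiener estimate of the source images at one TF point.

    x: mixture STFT coefficients x(n, f), shape (I,).
    v: source variances v_j(n, f), shape (J,).
    R: spatial covariance matrices R_j(f), shape (J, I, I).
    Returns the estimates v_j R_j (sum_j' v_j' R_j')^{-1} x, shape (J, I).
    """
    images_cov = v[:, None, None] * R   # v_j(n,f) R_j(f) for every source
    mix_cov = images_cov.sum(axis=0)    # model covariance of x(n,f)
    # Solve mix_cov @ y = x instead of forming the inverse explicitly
    y = np.linalg.solve(mix_cov, x)
    return images_cov @ y               # batched product, shape (J, I)
```

A useful sanity check: since the source-image covariances sum to the mixture covariance, the estimated images always add back up to x(n,f), whatever the model for R_j(f).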

Page 7

Rank-1 models

Rank-1 anechoic model:

$$\mathbf{R}_j(f) = \mathbf{a}_j(f)\, \mathbf{a}_j^H(f)$$

where $\mathbf{a}_j(f)$ is the steering vector specified in the beamforming approach.

Rank-1 convolutive model:

$$\mathbf{R}_j(f) = \mathbf{h}_j(f)\, \mathbf{h}_j^H(f)$$

where $\mathbf{h}_j(f)$ is the Fourier transform of the mixing filters $\mathbf{h}_j(\tau)$.
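For the rank-1 convolutive model, h_j(f) can be obtained as the discrete Fourier transform of the time-domain mixing filters at the STFT frequency bins. The sketch below assumes a 1024-point transform (matching the STFT length used in the experiments) and filters no longer than that; the function name is hypothetical.

```python
import numpy as np


def rank1_convolutive_covariances(filters, n_fft=1024):
    """Rank-1 convolutive spatial covariances R_j(f) = h_j(f) h_j(f)^H.

    filters: mixing filters h_ij(tau) for one source j, shape (I, L), L <= n_fft.
    Returns R with shape (n_fft // 2 + 1, I, I), one matrix per frequency bin.
    """
    # h_j(f): DFT of each filter, evaluated on the one-sided frequency grid
    h_f = np.fft.rfft(filters, n=n_fft, axis=-1).T  # shape (F, I)
    # Outer product per frequency bin: R[f] = h_f[f] h_f[f]^H
    return h_f[:, :, None] * h_f.conj()[:, None, :]
```

Each R_j(f) is rank 1 by construction, so however long and reverberant the filters are, the model can only represent a single spatial direction per frequency bin.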

Page 8

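Full-rank direct+diffuse model

Assuming that the direct part and the reverberant part are uncorrelated and that the reverberant part is diffuse:

$$\mathbf{R}_j(f) = \underbrace{\mathbf{a}_j(f)\, \mathbf{a}_j^H(f)}_{\text{covariance of direct part}} + \underbrace{\sigma_{\mathrm{rev}}^2 \begin{bmatrix} 1 & \Omega(d,f) \\ \Omega(d,f) & 1 \end{bmatrix}}_{\text{covariance of reverberant part}}$$

where $\Omega(d,f)$ and $\sigma_{\mathrm{rev}}^2$ can be specified from statistical room acoustics: $\Omega(d,f)$ depends on the microphone distance d, and $\sigma_{\mathrm{rev}}^2$ on the wall area A and the wall reflection coefficient β.

- In a rectangular room:

$$\Omega(d,f) = \frac{\sin(2\pi f d / c)}{2\pi f d / c}, \qquad \sigma_{\mathrm{rev}}^2 = \frac{4\beta^2}{A\,(1-\beta^2)}$$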
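A sketch of the direct+diffuse covariance for a stereo pair, following the formulas on this slide. Assumptions: c = 343 m/s, NumPy's normalized sinc (np.sinc(x) = sin(πx)/(πx)) to evaluate Ω(d,f), and the direct part a_j(f) supplied by the caller. Function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value of c


def diffuse_coherence(d, f, c=SPEED_OF_SOUND):
    """Omega(d, f) = sin(2 pi f d / c) / (2 pi f d / c)."""
    # np.sinc(x) = sin(pi x) / (pi x), so pass x = 2 f d / c
    return np.sinc(2 * f * d / c)


def reverb_variance(wall_area, beta):
    """sigma_rev^2 = 4 beta^2 / (A (1 - beta^2))."""
    return 4 * beta**2 / (wall_area * (1 - beta**2))


def direct_diffuse_covariance(a, d, f, wall_area, beta):
    """Full-rank direct+diffuse model for a stereo pair:
    R_j(f) = a_j(f) a_j(f)^H + sigma_rev^2 [[1, Omega], [Omega, 1]]."""
    omega = diffuse_coherence(d, f)
    diffuse = np.array([[1.0, omega], [omega, 1.0]])
    return np.outer(a, a.conj()) + reverb_variance(wall_area, beta) * diffuse
```

For the 4.45 x 3.35 x 2.5 m room of the experiments, one could take A as the total surface area 2(4.45·3.35 + 4.45·2.5 + 3.35·2.5) ≈ 68.8 m² (our assumption); any value of β plugged in is hypothetical. Adding the diffuse term makes R_j(f) full rank while only a handful of scalars (distances, d, A, β) are needed.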

Page 9

Full-rank unconstrained model

- A more general model than the previous ones, where the coefficients of $\mathbf{R}_j(f)$ are not related a priori
- Allows more flexible modeling of the mixing process, since in practice the reverberant part is rarely diffuse and is correlated with the direct part
- Expected to improve separation performance on real-world convolutive mixtures

Page 10

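Model parameter estimation

We investigate the potential separation performance achievable with each model in two contexts:

Semi-blind context: the spatial covariance matrices $\mathbf{R}_j(f)$ are estimated from the true source images, while the source variances $v_j(n,f)$ are blindly estimated from the mixture in the ML sense:

$$\min_{\{v_j(n,f)\},\,\{\mathbf{R}_j(f)\}} \; \sum_{n,f} D_{\mathrm{KL}}\Big( \hat{\mathbf{R}}_{\mathbf{x}}(n,f) \,\Big|\, \sum_{j=1}^{J} v_j(n,f)\, \mathbf{R}_j(f) \Big)$$

where $D_{\mathrm{KL}}(\cdot \,|\, \cdot)$ is the Kullback-Leibler (KL) divergence between the empirical covariance matrices $\hat{\mathbf{R}}_{\mathbf{x}}(n,f)$ of the mixture and the model-based matrices. In the semi-blind context, $\mathbf{R}_j(f)$ is held fixed, so the minimization runs over $v_j(n,f)$ only.

Oracle context: both $\mathbf{R}_j(f)$ and $v_j(n,f)$ are estimated from the true source images.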
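The semi-blind ML step can be sketched at a single time-frequency point. Assumptions: the KL divergence is taken between zero-mean complex Gaussians, D_KL(R̂ | R) = tr(R^-1 R̂) - log det(R^-1 R̂) - I; R̂_x(n,f) is already full rank (for instance obtained by locally averaging x xᴴ over neighbouring time-frequency points, since a single outer product is singular); and a generic quasi-Newton optimizer (SciPy's L-BFGS-B over log-variances) stands in for whatever algorithm the authors used. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def kl_gauss(R_emp, R_model):
    """KL divergence between N(0, R_emp) and N(0, R_model), complex case:
    tr(R_model^-1 R_emp) - log det(R_model^-1 R_emp) - I."""
    I = R_emp.shape[0]
    M = np.linalg.solve(R_model, R_emp)
    _, logdet = np.linalg.slogdet(M)  # det(M) is real and positive here
    return np.real(np.trace(M)) - logdet - I


def estimate_variances(R_emp, R, v0=None):
    """Semi-blind ML sketch: keep R_j(f) fixed and fit v_j(n, f) > 0 at one
    TF point by minimizing D_KL(R_emp | sum_j v_j R_j) over log-variances."""
    J = R.shape[0]
    theta0 = np.zeros(J) if v0 is None else np.log(v0)

    def objective(theta):
        v = np.exp(theta)                    # positivity via log-parametrization
        R_x = np.tensordot(v, R, axes=1)     # sum_j v_j R_j
        R_inv = np.linalg.inv(R_x)
        value = kl_gauss(R_emp, R_x)
        # dD/dv_j = tr(R_x^-1 R_j) - tr(R_x^-1 R_j R_x^-1 R_emp)
        grad_v = np.array([
            np.real(np.trace(R_inv @ Rj) - np.trace(R_inv @ Rj @ R_inv @ R_emp))
            for Rj in R
        ])
        return value, grad_v * v             # chain rule for theta = log v

    result = minimize(objective, theta0, jac=True, method="L-BFGS-B",
                      options={"ftol": 1e-12, "gtol": 1e-10})
    return np.exp(result.x)
```

Working with log-variances keeps every v_j(n,f) positive without needing bound constraints, and the analytic gradient avoids the noise of finite differences.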

Page 11

Experiment

Purpose:
- Compare the source separation performance of the model-based algorithms
- Criteria: SDR, SIR, SAR (signal-to-distortion, signal-to-interference and signal-to-artifacts ratios)

[Figure: room layout with sources s1, s2, s3 at distance r from microphones m1 and m2; marked dimensions 1.8 m and 1.5 m]

- Room dimensions: 4.45 x 3.35 x 2.5 m
- Source and microphone height: 1.4 m
- Microphone distance: d = 20 cm or 5 cm
- Source-to-microphone distance: 120 cm or 50 cm

Experimental setup:
- Speech length: 5 seconds
- Sampling rate: 16 kHz
- STFT with a sine window of length 1024 taps
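The STFT settings listed above can be reproduced with SciPy. This sketch assumes 50% overlap (the slide does not state the hop size) and relies on SciPy's "cosine" window, which is the half-period sine shape usually called the sine window. The helper names `analyze` and `synthesize` are hypothetical.

```python
import numpy as np
from scipy.signal import istft, stft

FS = 16000        # sampling rate from the slide (16 kHz)
N_WIN = 1024      # STFT window length from the slide
HOP = N_WIN // 2  # assumed 50% overlap (not stated on the slide)


def analyze(x):
    """STFT with a sine window; returns (frequencies, frame times, coefficients)."""
    return stft(x, fs=FS, window="cosine", nperseg=N_WIN, noverlap=N_WIN - HOP)


def synthesize(Z, n_samples):
    """Inverse STFT, trimmed back to the original signal length."""
    _, x = istft(Z, fs=FS, window="cosine", nperseg=N_WIN, noverlap=N_WIN - HOP)
    return np.real(x[:n_samples])
```

A 1024-tap window gives 513 one-sided frequency bins, which is the per-source grid on which R_j(f) is defined in the experiments.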

Page 12

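Experimental results

| Context | Covariance model | Number of spatial parameters | SDR (dB) | SIR (dB) | SAR (dB) |
|---|---|---|---|---|---|
| Semi-blind | Rank-1 anechoic | 6 | 0.9 | 1.7 | 4.9 |
| Semi-blind | Rank-1 convolutive | 3078 | 4.0 | 6.4 | 6.5 |
| Semi-blind | Full-rank direct+diffuse | 8 | 3.1 | 6.1 | 5.6 |
| Semi-blind | Full-rank unconstrained | 6156 | 5.8 | 10.3 | 7.9 |
| Semi-blind | Binary masking | 3078 | 3.3 | 10.3 | 2.9 |
| Semi-blind | L1-norm minimization | 3078 | 2.4 | 8.1 | 3.8 |
| Oracle | Rank-1 anechoic | 6 | 0.4 | 4.4 | 7.0 |
| Oracle | Rank-1 convolutive | 3078 | 4.2 | 10.2 | 5.3 |
| Oracle | Full-rank direct+diffuse | 8 | 10.2 | 17.3 | 11.4 |
| Oracle | Full-rank unconstrained | 6156 | 10.9 | 17.9 | 12.1 |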
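One reading of the parameter counts in the table, which the slide does not spell out: with J = 3 sources, I = 2 microphones and F = 513 frequency bins from a 1024-tap STFT, the counts for the rank-1 anechoic, rank-1 convolutive and full-rank unconstrained models follow from simple arithmetic. This is our interpretation only; the count of 8 for the direct+diffuse model is not derived here.

```python
J = 3                 # number of sources (s1, s2, s3 in the setup figure)
I = 2                 # number of microphones (m1, m2)
F = 1024 // 2 + 1     # one-sided frequency bins of a 1024-tap STFT: 513

rank1_anechoic = J * I                    # one distance r_ij per source-microphone pair
rank1_convolutive = J * F * I             # one I-dim vector h_j(f) per source and bin
full_rank_unconstrained = J * F * I * I   # one I x I matrix R_j(f) per source and bin

print(rank1_anechoic, rank1_convolutive, full_rank_unconstrained)  # prints: 6 3078 6156
```

Under this reading, the full-rank unconstrained model has exactly twice as many spatial parameters as the rank-1 convolutive model.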

Page 13

Conclusion

- Proposed to model the convolutive mixing process by full-rank spatial covariance matrices
- Experimental results confirm that full-rank spatial covariance matrices better account for reverberation and potentially improve separation performance compared to rank-1 matrices

Work in progress:
- Validated the performance of the proposed algorithms on real-world recordings with small source movements (demo session)
- Blind context: learning the model parameters from the recorded mixture (submitted to ICASSP 2010)

Future work:
- Consider separation of diffuse and semi-diffuse sources

Page 14

Thanks for your attention! See you again in the demo session tonight.

Your comments?