
Page 1: BLUES from Music · Microsoft PowerPoint - BLUES from Music.ppt Author: str_msp Created Date: 3/14/2006 10:58:18 AM

BLUES from Music: BLind Underdetermined Extraction of Sources from Music

Michael Syskind Pedersen, Tue Lehn-Schiøler, Jan Larsen

IMM, Technical University of Denmark

ICA2006, Charleston, SC, USA


Separating music into basic components

Michael Syskind Pedersen, IMM, Technical University of Denmark


Motivation: Why separate music?

• Music transcription
• Identifying instruments
• Identifying the vocalist


Assumptions
• A stereo recording of the music piece is available.
• The instruments are separated to some extent in time and in frequency, i.e. the instruments are sparse in the time-frequency (T-F) domain.
• The different instruments originate from spatially different directions.


Separation principle 1: T-F masking


[Figure: stereo channel 1, stereo channel 2, and the gain difference between the channels]
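A minimal Python sketch of masking by gain difference, assuming NumPy/SciPy, two synthetic tones as stand-in instruments, hypothetical panning gains of 0.9 and 0.3, and a 0 dB threshold (`gain_difference_mask` is an illustrative name, not from the slides). A T-F bin is kept when stereo channel 1 is louder than channel 2; this isolates one source only because the two tones occupy different T-F bins, which is the sparsity assumption stated earlier.

```python
import numpy as np
from scipy.signal import stft, istft

def gain_difference_mask(x1, x2, fs, threshold_db=0.0, nperseg=512):
    """Binary T-F mask keeping bins where channel 1 exceeds channel 2 by threshold_db."""
    _, _, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
    eps = 1e-12  # avoids division by zero and log(0) in silent bins
    gain_diff_db = 20 * np.log10((np.abs(X1) + eps) / (np.abs(X2) + eps))
    return gain_diff_db > threshold_db, X1, X2

fs = 8000
t = np.arange(fs) / fs                  # one second of audio
s_a = np.sin(2 * np.pi * 440 * t)       # stand-in instrument A
s_b = np.sin(2 * np.pi * 1000 * t)      # stand-in instrument B (other frequency)

# Hypothetical panning: A is louder in channel 1, B is louder in channel 2
x1 = 0.9 * s_a + 0.3 * s_b
x2 = 0.3 * s_a + 0.9 * s_b

mask, X1, X2 = gain_difference_mask(x1, x2, fs)
_, est_a = istft(mask * X1, fs=fs, nperseg=512)  # channel 1 restricted to A's bins
est_a = est_a[: len(s_a)]
```

Inverting the masked STFT of channel 1 gives an estimate of tone A; applying the same mask to channel 2 would give the other stereo channel of that estimate.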


Separation principle 2: ICA

[Figure: the sources are mixed into the observed mixed signals; separation by ICA gives the recovered source signals]

Mixing: x = As

Separation (ICA): y = Wx

What happens if a 2-by-2 separation matrix W is applied to a 2-by-N mixing system?
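One way to look at this question numerically, under assumptions not taken from the slides: three sources at hypothetical directions of 10°, 80° and 45°, a sine/cosine panning law for the gains, and a W that simply inverts the gain columns of sources 1 and 2 in place of a real ICA estimate. The product G = WA is then 2-by-3, i.e. each output is still a weighted sum of all sources.

```python
import numpy as np

def gains(theta_deg):
    """Hypothetical gain column (r1, r2) for a direction; sine/cosine panning is assumed."""
    th = np.deg2rad(theta_deg)
    return np.array([np.cos(th), np.sin(th)])

# Three sources (N = 3) from three directions: A is 2-by-3
directions = [10.0, 80.0, 45.0]
A = np.column_stack([gains(d) for d in directions])

# A 2-by-2 W standing in for an ICA solution; this one happens to
# invert the sub-mixture of sources 1 and 2.
W = np.linalg.inv(A[:, :2])

G = W @ A   # G[i, n]: gain of source n in separated output i
print(np.round(G, 2))
```

Sources 1 and 2 end up in different outputs, but the third column of G is nonzero in both rows, so source 3 leaks into both outputs. With two channels, each row of W can cancel only one direction, so for N > 2 each output contains a group of sources rather than a single one.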


ICA on stereo signals

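• We assume that the mixture can be modeled as an instantaneous mixture, i.e.

$$x = A(\theta_1, \ldots, \theta_N)\, s, \qquad A(\theta_1, \ldots, \theta_N) = \begin{bmatrix} r_1(\theta_1) & \cdots & r_1(\theta_N) \\ r_2(\theta_1) & \cdots & r_2(\theta_N) \end{bmatrix}$$

• The ratio between the two gains in each column of the mixing matrix corresponds to a certain direction.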
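To make the link between a column of A(θ) and a direction concrete, the sketch below assumes a sine/cosine panning law, r1(θ) = cos θ and r2(θ) = sin θ (an assumption, not taken from the slides), so that θ = arctan(r2/r1). Because only the ratio of the two gains matters, the direction can also be read off the mixture at an instant where a single source is active, whatever that source's amplitude. The function names are hypothetical.

```python
import numpy as np

def mixing_matrix(thetas_deg):
    """2-by-N instantaneous mixing matrix A(theta_1, ..., theta_N).
    Assumes the hypothetical panning law r1 = cos(theta), r2 = sin(theta)."""
    th = np.deg2rad(np.asarray(thetas_deg, dtype=float))
    return np.vstack([np.cos(th), np.sin(th)])

def direction_from_ratio(g1, g2):
    """Invert the assumed panning law: theta = arctan(r2 / r1), in degrees.
    A common scale factor (e.g. the source amplitude) cancels in the ratio."""
    return np.rad2deg(np.arctan(g2 / g1))

A = mixing_matrix([15.0, 45.0, 70.0])   # N = 3 sources, 2 channels
rng = np.random.default_rng(0)
s = rng.laplace(size=(3, 1000))          # sparse (super-Gaussian) toy sources
x = A @ s                                # instantaneous stereo mixture, shape (2, 1000)

# Directions recovered from the columns of A
col_dirs = direction_from_ratio(A[0], A[1])

# At an instant where only source 2 is active, the channel ratio gives its direction
s_single = np.array([0.0, -1.7, 0.0])
x_single = A @ s_single
single_dir = direction_from_ratio(x_single[0], x_single[1])
```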


Direction-dependent gain

$$r(\theta) = 20 \log |W A(\theta)|$$

When W is applied, each of the two separated channels contains a group of sources that is as independent as possible from the group in the other channel.
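Under the same kind of assumptions (sine/cosine panning, and a hypothetical W that inverts the gain columns for 20° and 60° instead of a W estimated by ICA), the direction-dependent gain can be evaluated on a grid of directions. Log base 10 is assumed for the dB scale, and a small eps keeps the logarithm finite at exact nulls.

```python
import numpy as np

def steering(theta):
    """Gain column A(theta) = [cos(theta), sin(theta)]^T (hypothetical panning law)."""
    return np.array([np.cos(theta), np.sin(theta)])

def direction_dependent_gain(W, thetas, eps=1e-12):
    """r(theta) = 20 log10 |W A(theta)| in dB; row i is the gain of separated output i.
    eps keeps the logarithm finite at exact nulls."""
    A = np.column_stack([steering(th) for th in thetas])   # 2-by-len(thetas)
    return 20 * np.log10(np.abs(W @ A) + eps)

# Hypothetical W: the inverse of the gain columns for 20 and 60 degrees
W = np.linalg.inv(np.column_stack([steering(np.deg2rad(20.0)),
                                   steering(np.deg2rad(60.0))]))

grid_deg = np.arange(0.0, 90.5, 0.5)
r = direction_dependent_gain(W, np.deg2rad(grid_deg))

null_out1 = grid_deg[np.argmin(r[0])]   # direction cancelled in output 1
null_out2 = grid_deg[np.argmin(r[1])]   # direction cancelled in output 2
```

Output 1 passes 20° at 0 dB and has a deep null at 60°, while output 2 does the opposite; sources from any other direction pass into both outputs with an intermediate gain, which is why each output holds a group of sources.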


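Combining ICA and T-F masking

[Figure: the ICA+BM separator. ICA is applied to the stereo mixture x1, x2, giving y1 and y2; the STFT of each gives Y1(t, f) and Y2(t, f); the binary masks BM1 and BM2 are applied to the STFTs X1(t, f) and X2(t, f) of the original channels, and the ISTFT gives the stereo outputs x̂1^(1), x̂2^(1) and x̂1^(2), x̂2^(2).]

$$BM_1(t,f) = \begin{cases} 1 & \text{when } |Y_1(t,f)| / |Y_2(t,f)| > c \\ 0 & \text{otherwise} \end{cases}$$

$$BM_2(t,f) = \begin{cases} 1 & \text{when } |Y_2(t,f)| / |Y_1(t,f)| > c \\ 0 & \text{otherwise} \end{cases}$$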
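A compact sketch of one ICA+BM step in Python, under several assumptions: NumPy/SciPy are available; a small hand-written symmetric FastICA with a tanh nonlinearity stands in for whichever ICA algorithm the authors used; the masks compare STFT magnitudes; and c = 1. The names `fastica_2x2` and `ica_bm_step` are hypothetical, and the toy input is two gated tones mixed by an assumed 2-by-2 matrix.

```python
import numpy as np
from scipy.signal import stft, istft

def fastica_2x2(x, n_iter=200, tol=1e-8, seed=0):
    """Minimal symmetric FastICA (tanh nonlinearity) for x of shape (2, T).
    Returns W with y = W @ x. A stand-in sketch, not a tuned ICA implementation."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(xc))
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T          # whitening matrix
    z = V @ xc
    B = np.linalg.qr(np.random.default_rng(seed).standard_normal((2, 2)))[0]
    for _ in range(n_iter):
        g = np.tanh(B @ z)
        # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w, row by row
        B_new = g @ z.T / z.shape[1] - np.diag((1.0 - g ** 2).mean(axis=1)) @ B
        u, _, vt = np.linalg.svd(B_new)
        B_new = u @ vt                                  # symmetric decorrelation
        done = np.max(np.abs(np.abs(np.diag(B_new @ B.T)) - 1.0)) < tol
        B = B_new
        if done:
            break
    return B @ V

def ica_bm_step(x, fs, c=1.0, nperseg=512):
    """One ICA+BM step: returns two stereo outputs (each shape (2, T)) and the masks."""
    W = fastica_2x2(x)
    y = W @ x

    def spec(sig):
        return stft(sig, fs=fs, nperseg=nperseg)[2]

    Y1, Y2, X1, X2 = spec(y[0]), spec(y[1]), spec(x[0]), spec(x[1])
    eps = 1e-12
    bm1 = np.abs(Y1) / (np.abs(Y2) + eps) > c
    bm2 = np.abs(Y2) / (np.abs(Y1) + eps) > c
    T = x.shape[1]
    outputs = []
    for bm in (bm1, bm2):
        # The same mask is applied to both original channels, so each output stays stereo
        left = istft(bm * X1, fs=fs, nperseg=nperseg)[1][:T]
        right = istft(bm * X2, fs=fs, nperseg=nperseg)[1][:T]
        outputs.append(np.vstack([left, right]))
    return outputs, (bm1, bm2)

# Toy demo: two gated tones (sparse in time and frequency), hypothetical 2-by-2 mixing
fs = 8000
t = np.arange(2 * fs) / fs
s = np.vstack([np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 3 * t) > 0.7),
               np.sin(2 * np.pi * 1500 * t) * (np.sin(2 * np.pi * 5 * t + 1.0) > 0.7)])
x = np.array([[0.9, 0.4], [0.3, 0.8]]) @ s
outputs, masks = ica_bm_step(x, fs)
```

Because each mask multiplies the STFTs of both original channels, every output keeps its stereo image, which is what allows the step to be applied again to each output.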


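Method applied iteratively

[Figure: tree of ICA+BM separators; the stereo mixture x1, x2 enters the first ICA+BM step, and each of its stereo outputs is passed to a further ICA+BM step]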
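The iteration itself amounts to a tree traversal. The sketch below assumes a generic two-output separator and a stopping test passed in as functions (a hypothetical interface; the slides do not state the stopping rule at this point). To keep it runnable, a toy separator that splits T-F bins by a known direction label stands in for ICA+BM.

```python
from collections import deque
import numpy as np

def separate_iteratively(x, separate, is_final, max_depth=6):
    """Apply a two-output separator repeatedly, building a binary tree breadth first.
    separate(node) -> (out1, out2) and is_final(node) -> bool are supplied by the caller
    (hypothetical interface); the leaves of the tree are returned."""
    leaves = []
    queue = deque([(x, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth or is_final(node):
            leaves.append(node)
            continue
        for child in separate(node):
            queue.append((child, depth + 1))
    return leaves

# Toy stand-in for ICA+BM (hypothetical): each T-F bin is labeled with the direction of
# the source dominating it, and one "separation" splits the active bins of a mask at
# the midpoint of their direction range.
rng = np.random.default_rng(1)
bin_dirs = rng.choice([10.0, 35.0, 60.0, 85.0], size=(64, 100))  # four sources

def toy_separate(mask):
    d = bin_dirs[mask]
    split = 0.5 * (d.min() + d.max())
    return mask & (bin_dirs <= split), mask & (bin_dirs > split)

def toy_is_final(mask):
    # Leaf when the mask is empty or all its bins come from one direction
    d = bin_dirs[mask]
    return d.size == 0 or d.max() - d.min() < 1.0

leaves = separate_iteratively(np.ones(bin_dirs.shape, dtype=bool), toy_separate, toy_is_final)
```

The four leaves are disjoint masks, one per direction; in the real method each node would instead hold a stereo signal and `separate` would be an ICA+BM step.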


Improving method
• The assumption of instantaneous mixing may not always hold.
• The assumption can be relaxed.
• The separation procedure is continued until very sparse masks are obtained.
• Masks that mainly contain the same source are afterwards merged.

[Figure: a deeper tree of ICA+BM separators]


Mask merging
If the time-domain signals obtained from two masks are correlated, the corresponding masks are merged.

The signal resulting from the merged mask is of higher quality.
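A sketch of correlation-based merging, assuming the outputs are given as equal-length time-domain signals together with their binary masks, and a hypothetical correlation threshold of 0.1. Union-find makes the grouping transitive, a design choice not stated in the slides. The toy data consist of two outputs dominated by the same source and one output with another source.

```python
import numpy as np

def merge_correlated(signals, masks, threshold=0.1):
    """Merge outputs whose time-domain signals are correlated (|rho| > threshold).
    Groups are formed transitively with union-find; masks in a group are OR-ed.
    The threshold value is a hypothetical choice."""
    n = len(signals)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    rho = np.corrcoef(np.vstack(signals))
    for i in range(n):
        for j in range(i + 1, n):
            if abs(rho[i, j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    groups = list(groups.values())
    merged_masks = [np.logical_or.reduce([masks[i] for i in g]) for g in groups]
    # For disjoint binary masks on the same STFT, the ISTFT is linear, so the signal of
    # a merged mask equals the sum of its members' signals.
    merged_signals = [np.sum([signals[i] for i in g], axis=0) for g in groups]
    return groups, merged_masks, merged_signals

rng = np.random.default_rng(0)
src_a, src_b = rng.standard_normal((2, 4000))
signals = [src_a + 0.3 * rng.standard_normal(4000),        # two outputs that both
           0.5 * src_a + 0.3 * rng.standard_normal(4000),  # mainly contain source a
           src_b]
masks = [np.zeros((3, 10), dtype=bool) for _ in range(3)]
for k, m in enumerate(masks):
    m[k] = True                      # toy disjoint masks: one frequency row each
groups, merged_masks, merged_signals = merge_correlated(signals, masks)
```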


Results
• Evaluation on real stereo music recordings, for which the stereo recording of each instrument is available before mixing.
• We compute the correlation between the obtained sources and the sources obtained with the ideal binary mask.
• Other segregated music examples are available online.
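The evaluation can be sketched as follows, assuming one channel at a time and the common ideal-binary-mask (IBM) definition in which a T-F bin belongs to an instrument when that instrument is louder than the sum of all the others; the slides do not give the exact definition, and `ibm_references`/`correlation_table` are hypothetical names. The two tones and the pretend outputs are toy data, not the recordings used in the evaluation.

```python
import numpy as np
from scipy.signal import stft, istft

def ibm_references(sources, fs, nperseg=512):
    """Reference signals obtained with the ideal binary mask (IBM), one channel.
    sources: (K, T) array, each row one instrument's recording before mixing.
    A bin is given to source k when |S_k| exceeds the magnitude of the sum of all
    other sources (an assumed, commonly used IBM definition)."""
    K, T = sources.shape
    S = np.stack([stft(src, fs=fs, nperseg=nperseg)[2] for src in sources])
    X = S.sum(axis=0)                    # STFT of the mixture (the STFT is linear)
    refs = []
    for k in range(K):
        ibm = np.abs(S[k]) > np.abs(X - S[k])
        refs.append(istft(ibm * X, fs=fs, nperseg=nperseg)[1][:T])
    return np.array(refs)

def correlation_table(outputs, refs):
    """|correlation| between each obtained output (rows) and each IBM reference (columns)."""
    return np.array([[abs(np.corrcoef(o, r)[0, 1]) for r in refs] for o in outputs])

# Toy check with two tones standing in for separately recorded instruments
fs = 8000
t = np.arange(fs) / fs
sources = np.vstack([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1500 * t)])
refs = ibm_references(sources, fs)
noise = 0.1 * np.random.default_rng(0).standard_normal(fs)
outputs = [sources[1] + noise, sources[0]]   # pretend outputs of the separation
table = correlation_table(outputs, refs)
```

The largest entry in each row indicates which IBM reference an output is closest to.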


Results

                Bass   Bass Drum   Guitar d   Guitar f   Snare Drum
Output1          72%         92%         3%         1%          17%
Output2           5%          1%        55%         4%          14%
Output3           9%          4%         9%        72%          21%
Remaining        14%          3%        32%        23%          48%
% of power       46%         27%         1%         7%           7%

• The segregated outputs are dominated by individual instruments.

• Some instruments cannot be segregated by this method because they do not come from spatially different directions.


Conclusion and future work
• We have presented an unsupervised method for the segregation of single instruments or vocal sounds from stereo music.
• Our method is based on combining ICA and T-F masking.
• The segregated signals are maintained in stereo.
• Only spatially different signals can be segregated from each other.
• The proposed framework may be improved by combining the method with single-channel separation methods.
