Source Separation Tutorial Mini-Series III: Extensions and Interpretations to Non-Negative Matrix Factorization

Nicholas Bryan and Dennis Sun
Center for Computer Research in Music and Acoustics, Stanford University
DSP Seminar, April 9th, 2013



Roadmap of Talk

1 Review

2 Further Insight

3 Supervised and Semi-Supervised Separation

4 Probabilistic Interpretation

5 Extensions

6 Evaluation

7 Future Research Directions

8 Matlab


Non-Negative Matrix Factorization

Data [V] ≈ Basis Vectors [W] Weights [H]

• A matrix factorization where everything is non-negative
• V ∈ R_+^{F×T} - original non-negative data
• W ∈ R_+^{F×K} - matrix of basis vectors, dictionary elements
• H ∈ R_+^{K×T} - matrix of activations, weights, or gains
• K < F < T (typically)
• A compressed representation of the data
• A low-rank approximation to V
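To make the "compressed representation" claim concrete, here is a minimal dimension check in Matlab (the sizes are illustrative assumptions, not values from the talk):

    % Toy dimension check for V ≈ W*H (illustrative sizes, not from the talk)
    F = 513; T = 400; K = 3;   % frequency bins, time frames, basis vectors
    W = rand(F, K);            % non-negative basis vectors (dictionary)
    H = rand(K, T);            % non-negative activations (gains)
    Vhat = W * H;              % F-by-T low-rank approximation of V
    fprintf('W and H store %d values; V has %d.\n', numel(W) + numel(H), F*T);

With K = 3, W and H together hold 513·3 + 3·400 = 2739 values versus 205200 in V.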


NMF With Spectrogram Data

V ≈ W H

NMF of Mary Had a Little Lamb with K = 3

• The basis vectors capture prototypical spectra [SB03]

• The weights capture the gain of the basis vectors

Factorization Interpretation I

The columns of V are approximated as weighted sums (mixtures) of the basis vectors:

[v_1 v_2 ... v_T] ≈ [ ∑_{j=1}^K H_{j1} w_j   ∑_{j=1}^K H_{j2} w_j   ...   ∑_{j=1}^K H_{jT} w_j ]

Factorization Interpretation II

V is approximated as a sum of rank-one matrix "layers":

[v_1 v_2 ... v_T] ≈ [w_1 w_2 ... w_K] [h_1^T; h_2^T; ...; h_K^T]

V ≈ w_1 h_1^T + w_2 h_2^T + ... + w_K h_K^T
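Both readings describe the same product, which is easy to check numerically; a minimal Matlab sketch on random data (sizes are arbitrary):

    % Interpretation I: a column of V is a weighted sum of basis vectors.
    % Interpretation II: V is a sum of rank-one layers. Both equal W*H.
    F = 8; T = 10; K = 3;
    W = rand(F, K); H = rand(K, T);
    v5 = W * H(:, 5);                   % column 5 as a mixture of the w_j
    Vlayers = zeros(F, T);
    for j = 1:K
        Vlayers = Vlayers + W(:, j) * H(j, :);   % add layer w_j h_j^T
    end
    max(max(abs(Vlayers - W * H)))      % ~1e-16: the two forms agree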

General Separation Pipeline

1 STFT
2 NMF
3 FILTER
4 ISTFT

[Block diagram: the mixture x passes through an STFT to give X; the magnitude V = |X| is factorized by NMF into W and H; masking filters split the mixture into per-source magnitudes |X_1|, |X_2|, ..., |X_S|; an ISTFT of each gives the separated signals x_1, x_2, ..., x_S.]
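A minimal Matlab sketch of the four steps, assuming the Signal Processing Toolbox stft/istft functions and the kl_nmf helper sketched after the next slide (the file name, K = 6, and the basis-vector split ks are all assumptions):

    % Pipeline sketch; 'mix.wav', K, and ks are assumptions, not the talk's data.
    [x, fs] = audioread('mix.wav');                  % hypothetical mixture file
    win = hann(1024, 'periodic');
    X = stft(x, fs, 'Window', win, 'OverlapLength', 768);   % 1) STFT
    V = abs(X);                                      % non-negative data
    [W, H] = kl_nmf(V, 6);                           % 2) NMF
    ks = 1:3;                                        % basis vectors of source 1
    mask = (W(:, ks) * H(ks, :)) ./ max(W * H, eps); % 3) FILTER (soft mask)
    x1 = istft(mask .* X, fs, 'Window', win, 'OverlapLength', 768);  % 4) ISTFT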

An Algorithm for NMF

Algorithm KL-NMF
initialize W, H
repeat
    H ← H .* (W^T (V ./ (WH))) ./ (W^T 1)
    W ← W .* ((V ./ (WH)) H^T) ./ (1 H^T)
until convergence
return W, H

Here .* and ./ denote elementwise multiplication and division, and 1 is the F×T all-ones matrix.
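The updates transcribe directly into Matlab; a sketch (the random initialization, the fixed iteration count in place of a convergence test, and the eps-guards against division by zero are implementation choices, not part of the slide):

    function [W, H] = kl_nmf(V, K, nIter)
    % KL_NMF  Multiplicative updates for KL-divergence NMF (sketch).
    if nargin < 3, nIter = 200; end
    [F, T] = size(V);
    W = rand(F, K); H = rand(K, T);   % random non-negative initialization
    O = ones(F, T);                   % the all-ones matrix written "1" above
    for iter = 1:nIter
        H = H .* (W' * (V ./ max(W * H, eps))) ./ max(W' * O, eps);
        W = W .* ((V ./ max(W * H, eps)) * H') ./ max(O * H', eps);
    end
    end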

Roadmap of Talk (next section: 2 Further Insight)

Non-Negativity

• Question: Why do we get a "parts-based" representation of sound?
• Answer: Non-negativity avoids destructive interference



Constructive and Destructive Interference

Constructive Interference

x + x = 2x

Destructive Interference

x + (−x) = 0



Non-Negative Constructive and Destructive Interference

Constructive Interference

|x| + |x| = 2|x|

Destructive Interference

|x| + | − x| = 2|x|



Non-negativity Avoids Destructive Interference

• With non-negativity, destructive interference cannot happen

• Everything must cumulatively add to explain the original data

• But . . .


Approximation I

In doing so, we violate the superposition property of sound,

x = x_1 + x_2 + ... + x_N

and actually solve

|X| ≈ |X_1| + |X_2| + ... + |X_N|

Approximation II

Alternatively, we can see this approximation via:

x = x_1 + x_2 + ... + x_N
|X| e^{jφ} = |X_1| e^{jφ_1} + |X_2| e^{jφ_2} + ... + |X_N| e^{jφ_N}
|X| e^{jφ} ≈ (|X_1| + |X_2| + ... + |X_N|) e^{jφ}
|X| ≈ |X_1| + |X_2| + ... + |X_N|
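A two-component example makes the approximation concrete. Take two unit-magnitude components in a single frequency bin. In phase (φ_1 = φ_2 = 0): |X| = |1 + 1| = 2 = |X_1| + |X_2|, so the approximation is exact. Out of phase (φ_1 = 0, φ_2 = π): |X| = |1 − 1| = 0, while |X_1| + |X_2| = 2, so the approximation fails completely. By the triangle inequality, real mixtures fall between these extremes.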

Roadmap of Talk (next section: 3 Supervised and Semi-Supervised Separation)

Unsupervised Separation I

Single, simultaneous estimation of W and H from a mixture V:

V ≈ W H

This is what we've seen so far.

Unsupervised Separation II

• Complex sounds need more than one basis vector
• Difficult to control which basis vectors explain which source
• No way to control the factorization other than F, T, and K


Supervised Separation

General idea:

1 Use isolated training data of each source within a mixture to pre-learn individual models of each source [SRS07]
2 Given a mixture, use the pre-learned models for separation


Supervised Separation I

Example: Drum + Bass

[Spectrogram: Drum and Bass Loop, frequency vs. time]

Supervised Separation II

Use isolated training data to learn a factorization for each source:

Bass Loop: V_1 ≈ W_1 H_1   [spectrogram with its basis vectors and activations]
Drum Loop: V_2 ≈ W_2 H_2   [spectrogram with its basis vectors and activations]


Supervised Separation III

Throw away the activations H_1 and H_2, keeping only the dictionaries W_1 and W_2:

V_1 ≈ W_1 H_1
V_2 ≈ W_2 H_2

Supervised Separation IV

Concatenate the basis vectors of each source to form the complete dictionary:

W = [W_1 W_2]

Supervised Separation V

Now, factorize the mixture with W fixed (only estimate H):

V ≈ W H = [W_1 W_2] [H_1; H_2]


Complete Supervised Process

1 Use isolated training data to learn a factorization (W_s, H_s) for each source s
2 Throw away the activations H_s for each source s
3 Concatenate the basis vectors of each source (W_1, W_2, ...) to form the complete dictionary W
4 Hold W fixed, and factorize the unknown mixture of sources V (only estimate H)
5 Once complete, use W and H as before to filter and separate each source (see the sketch below)
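As a sketch, the five steps in Matlab, reusing the kl_nmf helper from the algorithm slide; V1train and V2train (isolated-source magnitude spectrograms), Vmix (the mixture magnitude spectrogram), and the K values are assumed inputs, not the talk's data:

    % Supervised separation sketch; all inputs are assumptions.
    K1 = 10; K2 = 10;
    [W1, ~] = kl_nmf(V1train, K1);      % 1) learn per-source factorizations
    [W2, ~] = kl_nmf(V2train, K2);      % 2) their activations are discarded
    W = [W1, W2];                       % 3) complete dictionary
    H = rand(K1 + K2, size(Vmix, 2));   % 4) estimate H only, W held fixed
    O = ones(size(Vmix));
    for iter = 1:200
        H = H .* (W' * (Vmix ./ max(W * H, eps))) ./ max(W' * O, eps);
    end
    mask1 = (W(:, 1:K1) * H(1:K1, :)) ./ max(W * H, eps);   % 5) source-1 mask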


Sound Examples

[Spectrograms: the Drum + Bass mixture, the layer from Source 1, and the layer from Source 2]

Mixture sound and the separated drums and bass.

[Spectrograms: Mask for Source 1 and Mask for Source 2]

Masking filters used to process the mixture into the separated sources.


Question

• What if you don’t have isolated training data for each source?

• And unsupervised separation still doesn’t work?


Semi-Supervised Separation

General idea:

1 Learn supervised dictionaries for as many sources as you can [SRS07]
2 Infer the remaining unknown dictionaries from the mixture (only fix certain columns of W)


Semi-Supervised Separation I

Example: Drum + Bass

[Spectrogram: Drum and Bass Loop, frequency vs. time]

Semi-Supervised Separation II

Use isolated training data to learn a factorization for as many sources as possible (e.g., one source):

Bass Loop: V_1 ≈ W_1 H_1   [spectrogram with its basis vectors and activations]

Semi-Supervised Separation III

Throw away the activations H_1, keeping only the dictionary W_1:

V_1 ≈ W_1 H_1

Semi-Supervised Separation IV

Concatenate the known basis vectors with unknown basis vectors (initialized randomly) to form the complete dictionary:

W = [W_1 W_2]

where W_1 holds the known bass basis vectors and W_2 the unknown drum basis vectors (initialized randomly).

Semi-Supervised Separation V

Now, factorize the mixture with W_1 fixed (estimate W_2 and H):

V ≈ W H = [W_1 W_2] [H_1; H_2]


Complete Semi-Supervised Process

1 Use isolated training data to learn a factorization (W_s, H_s) for as many sources s as possible
2 Throw away the activations H_s for each known source s
3 Concatenate the known basis vectors with randomly initialized vectors for the unknown sources to construct the complete dictionary W
4 Hold fixed the columns of W which correspond to known sources, and factorize the mixture V (estimate H and any unknown columns of W)
5 Once complete, use W and H as before to filter and separate each source (see the sketch below)
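The semi-supervised variant changes only step 4: the known columns W_1 stay fixed while W_2 and H are updated. A Matlab sketch under the same assumed inputs as before:

    % Semi-supervised sketch: W1 known (e.g., bass), W2 and H estimated.
    K1 = 10; K2 = 10;
    [W1, ~] = kl_nmf(V1train, K1);           % dictionary of the known source
    W2 = rand(size(Vmix, 1), K2);            % unknown source: random init
    H = rand(K1 + K2, size(Vmix, 2));
    O = ones(size(Vmix));
    for iter = 1:200
        W = [W1, W2];
        H = H .* (W' * (Vmix ./ max(W * H, eps))) ./ max(W' * O, eps);
        R = Vmix ./ max([W1, W2] * H, eps);  % recompute after the H update
        W2 = W2 .* (R * H(K1+1:end, :)') ./ max(O * H(K1+1:end, :)', eps);
    end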


Sound Examples

Here, only the bass was supervised.

[Spectrograms: the Drum + Bass mixture, the layer from Source 1, and the layer from Source 2]

Mixture sound and the separated drums and bass.

[Spectrograms: Mask for Source 1 and Mask for Source 2]

Masking filters used to process the mixture into the separated sources.

Roadmap of Talk (next section: 4 Probabilistic Interpretation)

Probabilistic Interpretation

Some notation: z indexes basis vectors, f frequency bins, and t time frames.

The model: for each time frame t, repeat the following:

• Choose a component from p(z|t). [The z-by-t table of these probabilities is H.]
• Choose a frequency from p(f|z). [The f-by-z table of these probabilities is W.]

The spectrogram values V_ft are the counts that we obtain at the end of the day. We want to estimate p(z|t) and p(f|z).
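A minimal generative sketch of this count model in Matlab (toy sizes; the inverse-CDF draw is just one way to sample from a discrete distribution):

    % Draw N spectral "quanta" per frame: z ~ p(z|t), then f ~ p(f|z).
    F = 4; K = 2; T = 3; N = 1000;
    Wp = rand(F, K); Wp = Wp ./ sum(Wp, 1);   % p(f|z): columns sum to 1
    Hp = rand(K, T); Hp = Hp ./ sum(Hp, 1);   % p(z|t): columns sum to 1
    V = zeros(F, T);
    for t = 1:T
        for n = 1:N
            z = find(rand < cumsum(Hp(:, t)), 1);   % choose a component
            f = find(rand < cumsum(Wp(:, z)), 1);   % choose a frequency
            V(f, t) = V(f, t) + 1;                  % V_ft accumulates counts
        end
    end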


Probabilistic Interpretation

Is this realistic?

• We're assuming the spectrogram contains counts. We sample "quanta" of spectral energy at a time.
• This model is popular in topic modeling, where we assume documents are generated by first sampling a topic from p(z|d) and then a word from p(w|z).
  • probabilistic latent semantic indexing, or pLSI [Hof99]
  • latent Dirichlet allocation, or LDA [BNJ03]
• In audio, this model is called probabilistic latent component analysis, or PLCA [SRS06]


Latent Variable Model

We only observe the outcomes V_ft. But the full model involves unobserved variables Z.

[Graphical model: a plate diagram in which, for each of the T time frames, N quanta are drawn by sampling a component Z from p(z|t) and then a frequency F from p(f|z).]

The Expectation-Maximization (EM) algorithm is used to fit latent variable models. It is also used in estimating Hidden Markov Models, Gaussian mixture models, etc.


Maximum Likelihood Estimation

To fit the parameters, we choose the parameters that maximize the likelihood of the data. Let's zoom in on a single time frame:

p(v_1, ..., v_F) = [(∑_f v_f)! / (v_1! ··· v_F!)] ∏_{f=1}^F p(f|t)^{v_f}

According to the model on the previous slide, the frequency could have come from any of the latent components. We don't observe this, so we average over all of them:

p(f|t) = ∑_z p(z|t) p(f|z)

Putting it all together, we obtain:

p(v_1, ..., v_F) = [(∑_f v_f)! / (v_1! ··· v_F!)] ∏_{f=1}^F (∑_z p(z|t) p(f|z))^{v_f}


Maximum Likelihood Estimation

p(v_1, ..., v_F) = [(∑_f v_f)! / (v_1! ··· v_F!)] ∏_{f=1}^F (∑_z p(z|t) p(f|z))^{v_f}

• We want to maximize this over p(z|t) and p(f|z).
• In general, with probabilities it is easier to maximize the log than the thing itself:

log p(v_1, ..., v_F) = ∑_{f=1}^F v_f log(∑_z p(z|t) p(f|z)) + const.

• Remember from last week: the first thing you should always try is to differentiate and set equal to zero. Does this work here?


The Connection to NMF

• Last week, we talked about minimizing the KL divergence between V and WH:

D(V||WH) = −∑_{f,t} V_ft log(∑_z W_fz H_zt) + ∑_{f,t} ∑_z W_fz H_zt + const.

• Compare with maximizing the log-likelihood:

log p(v_1, ..., v_F) = ∑_{f=1}^F v_f log(∑_z p(z|t) p(f|z)) + const.

subject to ∑_z p(z|t) = 1 and ∑_f p(f|z) = 1.

• Last week, we used majorization-minimization on D(V||WH):

−log(∑_z φ_ftz (W_fz H_zt / φ_ftz)) ≤ −∑_z φ_ftz log(W_fz H_zt / φ_ftz)

• Now watch what we do with the log-likelihood....


EM Algorithm

• Suppose we observed the latent component for a frequency quantum. Then we wouldn't need to average over the components; its log-likelihood would be:

log p(z|t) p(f|z)

• But we don't know the latent component, so let's average this over our best guess of the probability of each component:

∑_z p(z|f,t) log p(z|t) p(f|z)

• In summary, we've replaced

log(∑_z p(z|t) p(f|z))   by   ∑_z p(z|f,t) log p(z|t) p(f|z)

Look familiar?


EM Algorithm

E-step: calculate

p(z|f,t) = p(z|t) p(f|z) / ∑_z p(z|t) p(f|z)

M-step: maximize

∑_{f,t} V_ft ∑_z p(z|f,t) log p(z|t) p(f|z)

Majorization: calculate

φ_ftz = W_fz H_zt / ∑_z W_fz H_zt

Minimization: minimize

−∑_{f,t} V_ft ∑_z φ_ftz log W_fz H_zt + ∑_{f,t,z} W_fz H_zt

The EM updates are exactly the multiplicative updates for NMF, up to normalization! The EM algorithm is a special case of MM, where the minorizing function is the expected conditional log-likelihood.
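The correspondence is easy to see in code. A Matlab sketch of the EM updates for PLCA, written so that each iteration is the KL-NMF multiplicative update followed by a normalization (assumes W and H have been initialized with non-negative entries and V holds the counts):

    % EM for PLCA as normalized multiplicative updates (sketch).
    % Columns of W are p(f|z); columns of H are p(z|t).
    for iter = 1:100
        R = V ./ max(W * H, eps);   % shared E-step ratio V_ft / [WH]_ft
        Wnew = W .* (R * H');       % unnormalized update for p(f|z)
        Hnew = H .* (W' * R);       % unnormalized update for p(z|t)
        W = Wnew ./ sum(Wnew, 1);   % renormalize so that sum_f p(f|z) = 1
        H = Hnew ./ sum(Hnew, 1);   % renormalize so that sum_z p(z|t) = 1
    end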


Geometric Interpretation

• We can think of the basis vectors p(f|z) as lying on a probability simplex.
• The set of possible sounds for a given source is the convex hull of the basis vectors for that source.


Geometric Interpretation

In supervised separation, we try to explain time frames of the mixture signal as combinations of the basis vectors of the different sources.

Roadmap of Talk (next section: 5 Extensions)

Extensions

• The number of parameters that need to be estimated is huge: FK + KT.
• In high-dimensional settings, it is useful to impose additional structure.
• We will look at two ways to do this: priors and regularization.


Priors

• Assume the parameters are also random, e.g., H = p(z|t) is generated from p(H|α). This is called a prior distribution.

[Graphical model: the plate diagram from before, extended so that a hyperparameter α generates H = p(z|t).]

• Estimate the posterior distribution p(H|α, V).
• Bayes' rule:

p(H|α, V) = p(H, V|α) / p(V|α) = p(H|α) p(V|H) / p(V|α)


Bayesian Inference

• Bayes' rule gives us an entire distribution over H = p(z|t).
• One option is the posterior mean: computationally intractable.
• An easier option is the posterior mode (MAP):

maximize_H log p(H|α, V) = log p(H|α) + log p(V|H) − log p(V|α),

where log p(H|α) is the log prior and log p(V|H) is the likelihood.

• We can choose priors that encode structural assumptions, like sparsity.


Regularization Viewpoint

• Another way is to add another term to the objective function:

minimize_{W,H≥0} D(V||WH) + λ Ω(H)

Ω encodes the desired structure, λ controls its strength.

• We showed earlier that D(V||WH) is the negative log-likelihood. So:

λ Ω(H) ⇔ −log p(H|α)

• Some common choices for Ω(H):
  • sparsity: ||H||_1 = ∑_{z,t} |H_zt| (see the sketch below)
  • smoothness: ∑_{z,t} (H_{z,t} − H_{z,t−1})^2
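For the sparsity penalty under the KL objective, one common heuristic from the sparse-NMF literature (an assumption here, not a result derived on these slides) keeps the multiplicative form and simply adds λ to the denominator of the H update:

    % Sparse variant of the KL-NMF H update (heuristic sketch; lambda > 0).
    lambda = 0.1;
    H = H .* (W' * (V ./ max(W * H, eps))) ./ (W' * ones(size(V)) + lambda);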


Page 112: Source Separation Tutorial Mini-Series III: Extensions and ...njb/teaching/sstutorial/part3.pdf · Source Separation Tutorial Mini-Series III: Extensions and Interpretations to Non-Negative

Roadmap of Talk

1 Review

2 Further Insight

3 Supervised and Semi-Supervised Separation

4 Probabilistic Interpretation

5 Extensions

6 Evaluation

7 Future Research Directions

8 Matlab


Evaluation Measures

• Signal-to-Interference Ratio (SIR)

• Signal-to-Artifact Ratio (SAR)

• Signal-to-Distortion Ratio (SDR)

We want all of these metrics to be as high as possible [VGF06]


Evaluation Measures

To compute these three measures, we must obtain:

• s ∈ R^{T×N} — original unmixed signals (ground truth)

• ŝ ∈ R^{T×N} — estimated separated sources

Then, we decompose these signals into

• s_target — the part of the estimate due to the actual target source

• e_interf — interference (i.e., the unwanted sources)

• e_artif — artifacts introduced by the separation algorithm


Evaluation Measures

To compute s_target, e_interf, and e_artif for the estimate ŝ_j of source j:

$$s_{\text{target}} = P_{s_j} \hat{s}_j$$

$$e_{\text{interf}} = P_{s} \hat{s}_j - P_{s_j} \hat{s}_j$$

$$e_{\text{artif}} = \hat{s}_j - P_{s} \hat{s}_j$$

where P_{s_j} and P_s are T × T projection matrices, projecting onto the target source s_j and onto the span of all true sources, respectively.


Signal-to-Interference Ratio (SIR)

A measure of the suppression of the unwanted source

$$\text{SIR} = 10 \log_{10} \frac{\|s_{\text{target}}\|^2}{\|e_{\text{interf}}\|^2}$$


Signal-to-Artifact Ratio (SAR)

A measure of the artifacts introduced by the separation process

$$\text{SAR} = 10 \log_{10} \frac{\|s_{\text{target}} + e_{\text{interf}}\|^2}{\|e_{\text{artif}}\|^2}$$


Signal-to-Distortion Ratio (SDR)

An overall measure that takes into account both the SIR and SAR

$$\text{SDR} = 10 \log_{10} \frac{\|s_{\text{target}}\|^2}{\|e_{\text{interf}} + e_{\text{artif}}\|^2}$$
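These quantities are straightforward to compute directly in the simple case. A minimal Matlab sketch, assuming time-invariant gains (the BSS Eval toolbox also handles time-varying and filtered distortions); the variable names S, shat_j, and j are illustrative, not from the slides:

% Minimal sketch of the BSS Eval decomposition for one estimated source.
% S is the T x N matrix of true sources; shat_j is the T x 1 estimate of source j.
sj      = S(:,j);
starget = sj * ((sj'*shat_j) / (sj'*sj));  % projection of shat_j onto s_j
Ps_shat = S * (S \ shat_j);                % projection onto the span of all sources
einterf = Ps_shat - starget;
eartif  = shat_j - Ps_shat;

SIR = 10*log10( sum(starget.^2) / sum(einterf.^2) );
SAR = 10*log10( sum((starget + einterf).^2) / sum(eartif.^2) );
SDR = 10*log10( sum(starget.^2) / sum((einterf + eartif).^2) );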


Selecting Hyperparameters using BSS Eval Metrics

• One problem with NMF is the need to specify the number of basis vectors K.

• Even more parameters appear if you include regularization.

• BSS eval metrics give us a way to learn the optimal settings for source separation.

• Generate synthetic mixtures, try different parameter settings, and choose the parameters that give the best BSS eval metrics (a sketch follows below).
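A hedged sketch of that search, assuming the NMF/masking/ISTFT pipeline from the Matlab section below; separateNMF and computeSDR are hypothetical helpers wrapping that pipeline and the SDR formula above:

% Hedged sketch: select K by SDR on a synthetic mixture with known sources.
Ks   = [5 10 25 50];
SDRs = zeros(size(Ks));
for i = 1:length(Ks)
    xhat    = separateNMF(xm, [Ks(i) Ks(i)]);  % hypothetical wrapper
    SDRs(i) = computeSDR(xhat(:,1), x1);       % hypothetical, per the SDR slide
end
[~, best] = max(SDRs);
K = Ks(best) * [1 1];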


BSS Eval Toolbox

A Matlab toolbox for source separation evaluation [VGF06]:

http://bass-db.gforge.inria.fr/bss_eval/
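A typical call looks something like the following (assuming the BSS Eval 3.0 interface; older releases of the toolbox expose a different decomposition/criteria API):

% se: estimated sources, s: true sources (one source per row in BSS Eval 3.0)
[SDR, SIR, SAR, perm] = bss_eval_sources(se, s);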


Roadmap of Talk

1 Review

2 Further Insight

3 Supervised and Semi-Supervised Separation

4 Probabilistic Interpretation

5 Extensions

6 Evaluation

7 Future Research Directions

8 Matlab


Research Directions

• Score-informed separation - sheet music

• Interactive separation - user-interaction

• Temporal dynamics - how sounds change over time

• Unsupervised separation - grouping basis vectors, clustering

• Phase estimation - complex NMF, STFT constraints, etc.

• Universal models - big data for general models of sources


Demos

• Universal Speech Models

• Interactive Source Separation

• Drums + Bass

• Guitar + Vocals + AutoTune

• Jackson 5 Remixed


STFT

% load the isolated sources and the mixture
x1 = wavread('bass');
x2 = wavread('drums');
[xm, fs] = wavread('drums+bass');

FFTSIZE    = 1024;
HOPSIZE    = 256;
WINDOWSIZE = 512;

% STFTs; keep the magnitudes of the non-negative frequencies only
X1 = myspectrogram(x1, FFTSIZE, fs, hann(WINDOWSIZE), -HOPSIZE);
V1 = abs(X1(1:(FFTSIZE/2+1), :));

X2 = myspectrogram(x2, FFTSIZE, fs, hann(WINDOWSIZE), -HOPSIZE);
V2 = abs(X2(1:(FFTSIZE/2+1), :));

Xm = myspectrogram(xm, FFTSIZE, fs, hann(WINDOWSIZE), -HOPSIZE);
Vm = abs(Xm(1:(FFTSIZE/2+1), :));
maxV = max(max(db(Vm)));

F = size(Vm, 1);
T = size(Vm, 2);

• https://ccrma.stanford.edu/~jos/sasp/Matlab_listing_myspectrogram_m.html

• https://ccrma.stanford.edu/~jos/sasp/Matlab_listing_invmyspectrogram_m.html

NMF

K = [25 25]; % number of basis vectors

MAXITER = 500; % total number of iterations to run

% learn a dictionary for each isolated source
[W1, H1] = nmf(V1, K(1), [], MAXITER, []);
[W2, H2] = nmf(V2, K(2), [], MAXITER, []);

% supervised separation: hold both dictionaries fixed, learn activations on the mixture
[W, H] = nmf(Vm, K, [W1 W2], MAXITER, 1:sum(K));

function [W, H] = nmf(V, K, W, MAXITER, fixedInds)
% NMF with the KL divergence and multiplicative updates.
% fixedInds lists dictionary columns held fixed (supervised case).
F = size(V,1);
T = size(V,2);

rand('seed', 0)

if isempty(W)
    W = 1 + rand(F, sum(K));
end
H = 1 + rand(sum(K), T);

inds = setdiff(1:sum(K), fixedInds);  % dictionary columns to update
ONES = ones(F, T);

for i = 1:MAXITER
    % update activations
    H = H .* (W'*(V./(W*H + eps))) ./ (W'*ONES);

    % update the (non-fixed) dictionary columns
    W(:,inds) = W(:,inds) .* ((V./(W*H + eps))*H(inds,:)') ./ (ONES*H(inds,:)');
end

% normalize each column of W to sum to 1; move the gain into H
sumW = sum(W);
W = W*diag(1./sumW);
H = diag(sumW)*H;
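Connecting back to the Regularization Viewpoint: with an ℓ1 sparsity penalty λ‖H‖₁ added to the KL objective, only the denominator of the activation update changes. A minimal sketch under that assumption (this variant is not part of the original nmf listing):

% Sparse variant of the activation update: minimizing
% D(V||WH) + lambda*sum(H(:)) adds lambda to the denominator.
lambda = 0.1;  % illustrative value; tune it, e.g., via the BSS eval procedure
H = H .* (W'*(V./(W*H + eps))) ./ (W'*ONES + lambda);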


FILTER & ISTFT

% get the mixture phase
phi = angle(Xm);

c = [0 cumsum(K)];  % column offsets into W and H for each source
                    % (starting from 0 avoids overlapping index ranges)
for i = 1:length(K)
    % create the soft masking filter for source i
    Mask = W(:, c(i)+1:c(i+1)) * H(c(i)+1:c(i+1), :) ./ (W*H + eps);

    % filter the mixture magnitude
    XmagHat = Vm .* Mask;

    % recreate the upper half of the spectrum (negative frequencies) before the istft
    XmagHat = [XmagHat; conj(XmagHat(end-1:-1:2, :))];

    % multiply with the mixture phase and invert
    XHat = XmagHat .* exp(1i*phi);
    xhat(:, i) = real(invmyspectrogram(XHat, HOPSIZE))';
end
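To audition or save the estimates, something like the following works (wavwrite is the era-appropriate counterpart of wavread; the file names are illustrative):

% listen to and save the separated estimates
soundsc(xhat(:,1), fs);
wavwrite(xhat(:,1) / max(abs(xhat(:,1))), fs, 'bass_estimate.wav');
wavwrite(xhat(:,2) / max(abs(xhat(:,2))), fs, 'drums_estimate.wav');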


References I

David M. Blei, Andrew Y. Ng, and Michael I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003), 993–1022.

T. Hofmann, Probabilistic latent semantic indexing, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA), SIGIR '99, ACM, 1999, pp. 50–57.

P. Smaragdis and J.C. Brown, Non-negative matrix factorization for polyphonic music transcription, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2003, pp. 177–180.

P. Smaragdis, B. Raj, and M. Shashanka, A probabilistic latent variable model for acoustic modeling, Advances in Neural Information Processing Systems (NIPS), Workshop on Advances in Modeling for Acoustic Processing, 2006.


References II

P. Smaragdis, B. Raj, and M. Shashanka, Supervised and semi-supervised separation of sounds from single-channel mixtures, International Conference on Independent Component Analysis and Signal Separation (Berlin, Heidelberg), Springer-Verlag, 2007, pp. 414–421.

E. Vincent, R. Gribonval, and C. Fevotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech, and Language Processing 14 (2006), no. 4, 1462–1469.