Source Separation Tutorial Mini-Series III: Extensions and Interpretations to Non-Negative Matrix Factorization

Nicholas Bryan and Dennis Sun

Center for Computer Research in Music and Acoustics, Stanford University

DSP Seminar, April 9th, 2013
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Non-Negative Matrix Factorization

Data $V$ ≈ Basis Vectors $W$ × Weights $H$

• A matrix factorization where everything is non-negative
• $V \in \mathbb{R}_+^{F \times T}$ - the original non-negative data
• $W \in \mathbb{R}_+^{F \times K}$ - matrix of basis vectors, or dictionary elements
• $H \in \mathbb{R}_+^{K \times T}$ - matrix of activations, weights, or gains
• $K < F < T$ (typically)
• A compressed representation of the data
• A low-rank approximation to $V$
NMF With Spectrogram Data
V ≈ W H
NMF of Mary Had a Little Lamb with K = 3
• The basis vectors capture prototypical spectra [SB03]
• The weights capture the gain of the basis vectors
Factorization Interpretation I

Each column of $V$ is approximated as a weighted sum (mixture) of the basis vectors:

$$\begin{bmatrix} v_1 & v_2 & \ldots & v_T \end{bmatrix} \approx \begin{bmatrix} \sum_{j=1}^{K} H_{j1} w_j & \sum_{j=1}^{K} H_{j2} w_j & \ldots & \sum_{j=1}^{K} H_{jT} w_j \end{bmatrix}$$
Factorization Interpretation II

$V$ is approximated as a sum of matrix "layers", one rank-one outer product per basis vector:

$$V \approx \begin{bmatrix} w_1 & w_2 & \ldots & w_K \end{bmatrix} \begin{bmatrix} h_1^T \\ h_2^T \\ \vdots \\ h_K^T \end{bmatrix} = w_1 h_1^T + w_2 h_2^T + \ldots + w_K h_K^T$$
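A two-line check of this identity (a sketch with arbitrary small matrices):

```matlab
W = rand(4, 2);  H = rand(2, 6);            % K = 2 layers
Vlayers = W(:,1)*H(1,:) + W(:,2)*H(2,:);    % sum of rank-one layers
norm(Vlayers - W*H)                         % ~0: identical to the product WH
```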
General Separation Pipeline

1 STFT
2 NMF
3 FILTER
4 ISTFT

[Diagram: the mixture $x$ is passed through the STFT to obtain $V = |X|$; NMF factorizes $V$ into $W$ and $H$; the FILTER stage uses $W$ and $H$ to split the mixture into $|X_1|, |X_2|, \ldots, |X_S|$; an ISTFT per source returns the separated signals $x_1, x_2, \ldots, x_S$.]
An Algorithm for NMF

Algorithm KL-NMF

initialize W, H
repeat
  $H \leftarrow H \mathbin{.\!*} \dfrac{W^T \frac{V}{WH}}{W^T \mathbf{1}}$
  $W \leftarrow W \mathbin{.\!*} \dfrac{\frac{V}{WH} H^T}{\mathbf{1} H^T}$
until convergence
return W, H

(Here .∗ and the fraction bars denote element-wise multiplication and division, and $\mathbf{1}$ is the $F \times T$ all-ones matrix.)
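To make the updates concrete, here is a minimal Matlab sketch of the algorithm above (the function name klnmf is hypothetical; a fixed iteration count stands in for a real convergence test, and eps guards the element-wise divisions):

```matlab
function [W, H] = klnmf(V, K, numIter)
% Minimal KL-NMF sketch with multiplicative updates.
% V: F-by-T non-negative matrix (e.g., a magnitude spectrogram).
% K: number of basis vectors.
[F, T] = size(V);
W = rand(F, K);                 % random non-negative initialization
H = rand(K, T);
O = ones(F, T);                 % the all-ones matrix "1" in the updates
for iter = 1:numIter
    H = H .* (W' * (V ./ (W*H + eps))) ./ (W' * O + eps);
    W = W .* ((V ./ (W*H + eps)) * H') ./ (O * H' + eps);
end
end
```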
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Non-Negativity

• Question: Why do we get a "parts-based" representation of sound?
• Answer: Non-negativity avoids destructive interference
Constructive and Destructive Interference
Constructive Interference
x + x = 2x
Destructive Interference
x + (−x) = 0
Non-Negative Constructive and Destructive Interference
Constructive Interference
|x| + |x| = 2|x|
Destructive Interference
|x| + | − x| = 2|x|
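A two-line numerical check of the contrast above (a sketch; any pair of equal-magnitude, opposite-phase components behaves the same way):

```matlab
x1 = 1;  x2 = exp(1i*pi);   % equal magnitude, opposite phase: x2 = -x1
abs(x1 + x2)                % ~0 up to floating-point error: the signals cancel
abs(x1) + abs(x2)           % 2: non-negative magnitudes can only add
```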
Non-negativity Avoids Destructive Interference
• With non-negativity, destructive interference cannot happen
• Everything must cumulatively add to explain the original data
• But . . .
Approximation I

In doing so, we violate the superposition property of sound,

$$x = x_1 + x_2 + \ldots + x_N,$$

and actually solve

$$|X| \approx |X_1| + |X_2| + \ldots + |X_N|$$
Approximation II

Alternatively, we can see this approximation via:

$$x = x_1 + x_2 + \ldots + x_N$$
$$|X| e^{j\phi} = |X_1| e^{j\phi_1} + |X_2| e^{j\phi_2} + \ldots + |X_N| e^{j\phi_N}$$
$$|X| e^{j\phi} \approx (|X_1| + |X_2| + \ldots + |X_N|) e^{j\phi}$$
$$|X| \approx |X_1| + |X_2| + \ldots + |X_N|$$
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Unsupervised Separation I

Single, simultaneous estimation of W and H from a mixture V:

V ≈ W H

What we've seen so far.
Unsupervised Separation II

• Complex sounds need more than one basis vector
• Difficult to control which basis vectors explain which source
• No way to control the factorization other than via F, T, and K
Supervised Separation

General idea:

1 Use isolated training data of each source within a mixture to pre-learn individual models of each source [SRS07]
2 Given a mixture, use the pre-learned models for separation
Supervised Separation I

Example: Drum + Bass

[Spectrogram: Drum and Bass Loop, frequency vs. time]
Supervised Separation II

Use isolated training data to learn a factorization for each source:

[Bass Loop: spectrogram (frequency vs. time) with learned basis vectors (frequency vs. basis index) and activations (basis index vs. time in seconds)]

$$V_1 \approx W_1 H_1$$

[Drum Loop: spectrogram with learned basis vectors and activations]

$$V_2 \approx W_2 H_2$$
Supervised Separation III

Throw away the activations $H_1$ and $H_2$:

[Bass Loop: keep only the basis vectors $W_1$]

$$V_1 \approx W_1 H_1$$

[Drum Loop: keep only the basis vectors $W_2$]

$$V_2 \approx W_2 H_2$$
Supervised Separation IV

Concatenate the basis vectors of each source for the complete dictionary:

$$W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}$$

[Figure: the bass basis vectors and the drum basis vectors placed side by side form the full dictionary]
Supervised Separation V

Now, factorize the mixture with $W$ fixed (only estimate $H$):

$$V \approx W H \approx \begin{bmatrix} W_1 & W_2 \end{bmatrix} \begin{bmatrix} H_1 \\ H_2 \end{bmatrix}$$
Complete Supervised Process

1 Use isolated training data to learn a factorization (Ws Hs) for each source s
2 Throw away the activations Hs for each source s
3 Concatenate the basis vectors of each source (W1, W2, ...) for the complete dictionary W
4 Hold W fixed, and factorize the unknown mixture of sources V (only estimate H; see the sketch below)
5 Once complete, use W and H as before to filter and separate each source
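In code, the only change from the earlier klnmf sketch is that the $W$ update is skipped (hypothetical function name klnmfFixedW, same hedges as before):

```matlab
function H = klnmfFixedW(V, W, numIter)
% Supervised NMF sketch: the dictionary W is held fixed; only H is estimated.
[F, T] = size(V);
H = rand(size(W, 2), T);        % one activation row per dictionary column
O = ones(F, T);
for iter = 1:numIter
    H = H .* (W' * (V ./ (W*H + eps))) ./ (W' * O + eps);
end
end
```

With $W = [\,W_1 \; W_2\,]$, the rows of the returned $H$ split into the activations $H_1$ and $H_2$ of the two sources.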
Sound Examples

[Figure: mixture spectrogram (Drum + Bass) and the separated layers from Source 1 and Source 2, frequency vs. time]

Mixture sound (left) and the separated drums and bass.

[Figure: masks for Source 1 and Source 2, frequency vs. time]

Masking filters used to process the mixture into the separated sources.
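A sketch of this masking step under the supervised setup (assuming $X$ is the complex STFT of the mixture, $W_1$ and $W_2$ are the pre-learned dictionaries, and klnmfFixedW is the sketch above):

```matlab
V = abs(X);                                   % magnitude spectrogram of the mixture
W = [W1, W2];                                 % concatenated pre-learned dictionaries
H = klnmfFixedW(V, W, 200);                   % estimate activations only
K1 = size(W1, 2);
srcIdx = {1:K1, K1+1:size(W, 2)};             % which columns belong to which source
for s = 1:2
    Vs = W(:, srcIdx{s}) * H(srcIdx{s}, :);   % layer explained by source s
    mask = Vs ./ (W*H + eps);                 % soft mask, as in the figure above
    Xs = mask .* X;                           % filtered complex STFT of source s
    % invert Xs with an ISTFT (same analysis parameters) to get the signal xs
end
```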
Question
• What if you don’t have isolated training data for each source?
• And unsupervised separation still doesn’t work?
Semi-Supervised Separation

General idea:

1 Learn supervised dictionaries for as many sources as you can [SRS07]
2 Infer the remaining unknown dictionaries from the mixture (only fix certain columns of W)
Semi-Supervised Separation I

Example: Drum + Bass

[Spectrogram: Drum and Bass Loop, frequency vs. time]
Semi-Supervised Separation II

Use isolated training data to learn a factorization for as many sources as possible (e.g., one source):

[Bass Loop: spectrogram with learned basis vectors and activations]

$$V_1 \approx W_1 H_1$$
Semi-Supervised Separation III

Throw away the activations $H_1$:

[Bass Loop: keep only the basis vectors $W_1$]

$$V_1 \approx W_1 H_1$$
Semi-Supervised Separation IV

Concatenate the known basis vectors with unknown basis vectors (initialized randomly) for the complete dictionary:

$$W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}$$

[Figure: known bass basis vectors $W_1$ alongside unknown drum basis vectors $W_2$, initialized randomly]
Semi-Supervised Separation V

Now, factorize the mixture with $W_1$ fixed (estimate $W_2$ and $H$):

$$V \approx W H \approx \begin{bmatrix} W_1 & W_2 \end{bmatrix} \begin{bmatrix} H_1 \\ H_2 \end{bmatrix}$$
Complete Semi-Supervised Process

1 Use isolated training data to learn a factorization (Ws Hs) for as many sources s as possible
2 Throw away the activations Hs for each known source s
3 Concatenate the known basis vectors with randomly initialized vectors for the unknown sources to construct the complete dictionary W
4 Hold fixed the columns of W which correspond to known sources, and factorize the mixture V (estimate H and any unknown columns of W; see the sketch below)
5 Once complete, use W and H as before to filter and separate each source
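A sketch of the modified updates (hypothetical function name klnmfSemi; only the unknown columns of $W$ are overwritten each iteration):

```matlab
function [W, H] = klnmfSemi(V, W1, K2, numIter)
% Semi-supervised NMF sketch: the pre-learned dictionary W1 stays fixed;
% K2 extra basis vectors and all of H are estimated from the mixture.
[F, T] = size(V);
K1 = size(W1, 2);
W = [W1, rand(F, K2)];          % known columns + random unknown columns
H = rand(K1 + K2, T);
O = ones(F, T);
free = K1+1 : K1+K2;            % indices of the unknown (free) columns
for iter = 1:numIter
    H = H .* (W' * (V ./ (W*H + eps))) ./ (W' * O + eps);
    Wnew = W .* ((V ./ (W*H + eps)) * H') ./ (O * H' + eps);
    W(:, free) = Wnew(:, free); % update only the unknown dictionary
end
end
```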
Sound Examples

Supervised the bass.

[Figure: mixture spectrogram (Drum + Bass) and the separated layers from Source 1 and Source 2, frequency vs. time]

Mixture sound (left) and the separated drums and bass.

[Figure: masks for Source 1 and Source 2, frequency vs. time]

Masking filters used to process the mixture into the separated sources.
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Probabilistic Interpretation

Some notation: $z$ indexes basis vectors, $f$ frequency bins, and $t$ time frames.

The model: for each time frame $t$, repeat the following:

• Choose a component from $p(z|t)$. [Arranged as a $z$-by-$t$ table, these probabilities form $H$.]
• Choose a frequency from $p(f|z)$. [Arranged as an $f$-by-$z$ table, these probabilities form $W$.]

The spectrogram entries $V_{ft}$ are the counts that we obtain at the end of the day. We want to estimate $p(z|t)$ and $p(f|z)$.
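As a sanity check on this generative story, one could simulate the counts for a single frame $t$ (a sketch; randsample is from the Statistics Toolbox, and the columns of W and H are assumed normalized so they really are $p(f|z)$ and $p(z|t)$):

```matlab
nQuanta = 5000;                                % "quanta" of spectral energy to draw
z = randsample(K, nQuanta, true, H(:, t));     % component draws from p(z|t)
f = zeros(nQuanta, 1);
for n = 1:nQuanta
    f(n) = randsample(F, 1, true, W(:, z(n))); % frequency draw from p(f|z)
end
Vt = accumarray(f, 1, [F 1]);                  % counts: a simulated column t of V
```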
Probabilistic Interpretation

Is this realistic?

• We're assuming the spectrogram contains counts. We sample "quanta" of spectral energy one at a time.
• This model is popular in topic modeling, where we assume documents are generated by first sampling a topic from p(z|d) and then a word from p(w|z):
  • probabilistic latent semantic indexing, or pLSI [Hof99]
  • latent Dirichlet allocation, or LDA [BNJ03]
• In audio, this model is called probabilistic latent component analysis, or PLCA [SRS06]
Latent Variable Model

We only observe the outcomes $V_{ft}$. But the full model involves unobserved variables $Z$.

[Plate diagram: $Z \sim p(z|t)$ and $F \sim p(f|z)$, repeated $N$ times within each of $T$ time frames]

The Expectation-Maximization (EM) algorithm is used to fit latent variable models. It is also used in estimating hidden Markov models, Gaussian mixture models, etc.
Maximum Likelihood Estimation

To fit the parameters, we choose the parameters that maximize the likelihood of the data. Let's zoom in on a single time frame:

$$p(v_1, \ldots, v_F) = \frac{(\sum_f v_f)!}{v_1! \cdots v_F!} \prod_{f=1}^{F} p(f|t)^{v_f}$$

According to the model on the previous slide, the frequency could have come from any of the latent components. We don't observe this, so we average over all of them:

$$p(f|t) = \sum_z p(z|t)\, p(f|z)$$

Putting it all together, we obtain:

$$p(v_1, \ldots, v_F) = \frac{(\sum_f v_f)!}{v_1! \cdots v_F!} \prod_{f=1}^{F} \left( \sum_z p(z|t)\, p(f|z) \right)^{v_f}$$
Maximum Likelihood Estimation

$$p(v_1, \ldots, v_F) = \frac{(\sum_f v_f)!}{v_1! \cdots v_F!} \prod_{f=1}^{F} \left( \sum_z p(z|t)\, p(f|z) \right)^{v_f}$$

• We want to maximize this over $p(z|t)$ and $p(f|z)$.
• In general, with probabilities it is easier to maximize the log than the thing itself:

$$\log p(v_1, \ldots, v_F) = \sum_{f=1}^{F} v_f \log \left( \sum_z p(z|t)\, p(f|z) \right) + \text{const.}$$

• Remember from last week: the first thing you should always try is to differentiate and set equal to zero. Does this work here?
The Connection to NMF

• Last week, we talked about minimizing the KL divergence between V and WH:

$$D(V \,\|\, WH) = -\sum_{f,t} V_{ft} \log \left( \sum_z W_{fz} H_{zt} \right) + \sum_{f,t} \sum_z W_{fz} H_{zt} + \text{const.}$$

• Compare with maximizing the log-likelihood:

$$\log p(v_1, \ldots, v_F) = \sum_{f=1}^{F} v_f \log \left( \sum_z p(z|t)\, p(f|z) \right) + \text{const.}$$

subject to $\sum_z p(z|t) = 1$ and $\sum_f p(f|z) = 1$.

• Last week, we used majorization-minimization on $D(V \,\|\, WH)$:

$$-\log \left( \sum_z \phi_{ftz} \frac{W_{fz} H_{zt}}{\phi_{ftz}} \right) \le -\sum_z \phi_{ftz} \log \frac{W_{fz} H_{zt}}{\phi_{ftz}}$$

• Now watch what we do with the log-likelihood....
EM Algorithm

• Suppose we observed the latent component for a frequency quantum. Then we wouldn't need to average over the components; its log-likelihood would be:

$$\log p(z|t)\, p(f|z)$$

• But we don't know the latent component, so let's average this over our best guess of the probability of each component:

$$\sum_z p(z|f,t) \log p(z|t)\, p(f|z)$$

• In summary, we've replaced

$$\log \left( \sum_z p(z|t)\, p(f|z) \right) \quad \text{by} \quad \sum_z p(z|f,t) \log p(z|t)\, p(f|z)$$

Look familiar?
EM Algorithm

E-step: Calculate

$$p(z|f,t) = \frac{p(z|t)\, p(f|z)}{\sum_z p(z|t)\, p(f|z)}$$

M-step: Maximize

$$\sum_{f,t} V_{ft} \sum_z p(z|f,t) \log p(z|t)\, p(f|z)$$

Majorization: Calculate

$$\phi_{ftz} = \frac{W_{fz} H_{zt}}{\sum_z W_{fz} H_{zt}}$$

Minimization: Minimize

$$-\sum_{f,t} V_{ft} \sum_z \phi_{ftz} \log W_{fz} H_{zt} + \sum_{f,t,z} W_{fz} H_{zt}$$

The EM updates are exactly the multiplicative updates for NMF, up to normalization!

The EM algorithm is a special case of MM, where the minorizing function is the expected conditional log-likelihood.
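A minimal Matlab sketch of the EM version (hypothetical function name plca; the same multiplicative updates as klnmf, plus normalization so the factors stay probability distributions; the column-wise divisions assume a Matlab version with implicit expansion):

```matlab
function [W, H] = plca(V, K, numIter)
% EM/PLCA sketch: W approximates p(f|z), H approximates p(z|t).
[F, T] = size(V);
W = rand(F, K);  W = W ./ sum(W, 1);    % columns sum to 1: p(f|z)
H = rand(K, T);  H = H ./ sum(H, 1);    % columns sum to 1: p(z|t)
for iter = 1:numIter
    R = V ./ (W*H + eps);               % the E-step enters through this ratio
    H = H .* (W' * R);  H = H ./ sum(H, 1);   % M-step for p(z|t)
    R = V ./ (W*H + eps);
    W = W .* (R * H');  W = W ./ sum(W, 1);   % M-step for p(f|z)
end
end
```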
Geometric Interpretation

• We can think of the basis vectors $p(f|z)$ as lying on a probability simplex.
• The set of possible sounds for a given source is the convex hull of the basis vectors for that source.
Geometric Interpretation
In supervised separation, we try to explain time frames of themixture signal as combinations of the basis vectors of the differentsources.
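As a concrete illustration, a minimal sketch (hypothetical toy data, using lsqnonneg from base Matlab) that explains one mixture frame as a non-negative combination of the stacked source dictionaries:

```matlab
% Columns of W1, W2 lie on the probability simplex (they sum to 1).
rng(0);
F = 10; K1 = 3; K2 = 3;
W1 = rand(F, K1); W1 = W1 ./ sum(W1, 1);   % source-1 basis vectors
W2 = rand(F, K2); W2 = W2 ./ sum(W2, 1);   % source-2 basis vectors
W  = [W1 W2];                              % stacked dictionary

v = W * [0.2 0.5 0 0 0.3 0]';   % a frame inside the hull, by construction
h = lsqnonneg(W, v);            % non-negative weights explaining the frame

% A near-zero residual means v lies in the hull; the per-source weights
% h(1:K1) and h(K1+1:end) attribute the frame's energy to each source.
disp(norm(W*h - v));
```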
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Extensions

• The number of parameters that need to be estimated is huge: FK + KT. For example, with F = 513 (a 1024-point FFT), K = 50 basis vectors, and T = 1000 frames, that is already 75,650 parameters.

• In high-dimensional settings, it is useful to impose additional structure.

• We will look at two ways to do this: priors and regularization.
Priors

• Assume the parameters are also random, e.g., H = p(z|t) is generated from p(H|α). This is called a prior distribution.

[Plate diagram: the hyperparameter α generates H = p(z|t); a latent component z ∼ p(z|t) generates an observed frequency f ∼ p(f|z), repeated over the N·T observations.]

• Estimate the posterior distribution p(H|α, V).

• Bayes' rule:

$$p(H \mid \alpha, V) = \frac{p(H, V \mid \alpha)}{p(V \mid \alpha)} = \frac{p(H \mid \alpha)\, p(V \mid H)}{p(V \mid \alpha)}$$
Bayesian Inference

• Bayes' rule gives us an entire distribution over H = p(z|t).

• One option is the posterior mean, but it is computationally intractable.

• An easier option is the posterior mode (MAP):

$$\underset{H}{\text{maximize}} \;\; \log p(H \mid \alpha, V) = \underbrace{\log p(H \mid \alpha)}_{\text{log prior}} + \underbrace{\log p(V \mid H)}_{\text{log likelihood}} - \log p(V \mid \alpha)$$

The last term does not depend on H, so it can be dropped from the maximization.

• We can choose priors that encode structural assumptions, like sparsity (see the sketch below).
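For instance, a Dirichlet prior over a frame's activations encodes sparsity when its concentration parameter is below one. A minimal sketch, assuming the Statistics Toolbox's gamrnd:

```matlab
% Draw one sample from a symmetric Dirichlet(alpha) prior over K weights.
% For alpha < 1 the mass concentrates near the simplex corners (sparse);
% for alpha > 1 the draws are dense and nearly uniform.
rng(0);
K = 10; alpha = 0.1;
g = gamrnd(alpha, 1, K, 1);    % independent Gamma(alpha, 1) draws
h = g / sum(g);                % normalizing yields a Dirichlet(alpha) sample
disp(sort(h, 'descend')');     % most of the mass sits on a few components
```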
Regularization Viewpoint

• Another way is to add a penalty term to the objective function:

$$\underset{W, H \ge 0}{\text{minimize}} \;\; D(V \,\|\, WH) + \lambda\, \Omega(H)$$

Ω encodes the desired structure; λ controls its strength.

• We showed earlier that D(V || WH) is the negative log likelihood. So:

$$\lambda\, \Omega(H) \;\Longleftrightarrow\; -\log p(H \mid \alpha)$$

• Some common choices for Ω(H):
  • sparsity: $\|H\|_1 = \sum_{z,t} |H_{zt}|$ (a sketch follows below)
  • smoothness: $\sum_{z,t} (H_{z,t} - H_{z,t-1})^2$
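For the sparsity case, a minimal sketch: a commonly used heuristic multiplicative update simply adds λ to the denominator of the H update. This is a sketch of that heuristic, not a convergence-guaranteed algorithm; normalizing W keeps the scale in H so the penalty actually bites.

```matlab
% KL-NMF with an L1 penalty lambda * sum(abs(H(:))) on the activations.
rng(0);
F = 20; T = 30; K = 5;
lambda = 0.1; MAXITER = 200;
V = rand(F, T);
W = 1 + rand(F, K); H = 1 + rand(K, T);
ONES = ones(F, T);
for i = 1:MAXITER
    % penalized activation update: lambda enters the denominator
    H = H .* (W'*(V./(W*H + eps))) ./ (W'*ONES + lambda);
    % usual dictionary update
    W = W .* ((V./(W*H + eps))*H') ./ (ONES*H');
    % resolve the scale ambiguity: push all scale into H
    sumW = sum(W);
    W = W * diag(1./sumW);
    H = diag(sumW) * H;
end
```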
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Evaluation Measures

• Signal-to-Interference Ratio (SIR)

• Signal-to-Artifact Ratio (SAR)

• Signal-to-Distortion Ratio (SDR)

We want all of these metrics to be as high as possible [VGF06].
Evaluation Measures

To compute these three measures, we must obtain:

• $s \in \mathbb{R}^{T \times N}$ — original unmixed signals (ground truth)

• $\hat{s} \in \mathbb{R}^{T \times N}$ — estimated separated sources

Then, we decompose each estimated source into

• $s_{\text{target}}$ — actual source estimate

• $e_{\text{interf}}$ — interference signal (i.e., the unwanted sources)

• $e_{\text{artif}}$ — artifacts of the separation algorithm
Evaluation Measures

To compute $s_{\text{target}}$, $e_{\text{interf}}$, and $e_{\text{artif}}$ for the estimate $\hat{s}_j$:

$$s_{\text{target}} = P_{s_j} \hat{s}_j \qquad e_{\text{interf}} = P_{\mathbf{s}} \hat{s}_j - P_{s_j} \hat{s}_j \qquad e_{\text{artif}} = \hat{s}_j - P_{\mathbf{s}} \hat{s}_j$$

where $P_{s_j}$ and $P_{\mathbf{s}}$ are $T \times T$ projection matrices (onto the true source $s_j$ and onto the span of all the true sources, respectively).
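In the single-channel case these projections are ordinary least squares; a minimal sketch on hypothetical toy signals:

```matlab
% BSS-eval-style decomposition of an estimate shat of source 1.
rng(0);
T = 1000; N = 2;
s    = randn(T, N);                              % ground-truth sources
shat = s(:,1) + 0.3*s(:,2) + 0.05*randn(T, 1);   % imperfect estimate

% projection onto the target source s_1
starget = (s(:,1)'*shat / (s(:,1)'*s(:,1))) * s(:,1);
% projection onto the span of all true sources
Ps      = s * ((s'*s) \ (s'*shat));

einterf = Ps - starget;      % explained by the other sources
eartif  = shat - Ps;         % explained by none of the sources
```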
Signal-to-Interference Ratio (SIR)

A measure of the suppression of the unwanted source:

$$\text{SIR} = 10 \log_{10} \frac{\|s_{\text{target}}\|^2}{\|e_{\text{interf}}\|^2}$$
Signal-to-Artifact Ratio (SAR)

A measure of the artifacts that have been introduced by the separation process:

$$\text{SAR} = 10 \log_{10} \frac{\|s_{\text{target}} + e_{\text{interf}}\|^2}{\|e_{\text{artif}}\|^2}$$
Signal-to-Distortion Ratio (SDR)

An overall measure that takes into account both the SIR and SAR:

$$\text{SDR} = 10 \log_{10} \frac{\|s_{\text{target}}\|^2}{\|e_{\text{interf}} + e_{\text{artif}}\|^2}$$
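Continuing the decomposition sketch above, the three ratios are then one line each:

```matlab
% Ratios in dB from the decomposition shat = starget + einterf + eartif.
SIR = 10*log10(sum(starget.^2) / sum(einterf.^2));
SAR = 10*log10(sum((starget + einterf).^2) / sum(eartif.^2));
SDR = 10*log10(sum(starget.^2) / sum((einterf + eartif).^2));
fprintf('SIR %.1f dB, SAR %.1f dB, SDR %.1f dB\n', SIR, SAR, SDR);
```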
Selecting Hyperparameters using BSS Eval Metrics

• One problem with NMF is the need to specify the number of basis vectors K.

• Regularization adds even more parameters (e.g., the penalty weight λ).

• BSS eval metrics give us a way to learn the optimal settings for source separation.

• Generate synthetic mixtures, try different parameter settings, and choose the parameters that give the best BSS eval metrics (a sketch follows below).
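A minimal sketch of such a grid search over K; separate_and_sdr is a hypothetical helper standing in for the full train/separate/evaluate pipeline built from the code in the Matlab section below:

```matlab
% Grid search over the number of basis vectors per source on a synthetic
% mixture where the ground-truth sources are known.
Ks   = [5 10 25 50];
SDRs = zeros(size(Ks));
for i = 1:length(Ks)
    % hypothetical helper: run NMF separation with Ks(i) basis vectors
    % per source, then return the resulting SDR against the ground truth
    SDRs(i) = separate_and_sdr(Ks(i));
end
[~, best] = max(SDRs);
fprintf('best K = %d (SDR = %.1f dB)\n', Ks(best), SDRs(best));
```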
BSS Eval Toolbox

A Matlab toolbox for source separation evaluation [VGF06]:
http://bass-db.gforge.inria.fr/bss_eval/
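A hedged usage sketch: to our knowledge, recent versions of the toolbox expose bss_eval_sources, which takes sources as rows (estimates first) and returns a permutation resolving the source/estimate correspondence; check the documentation of the version you download.

```matlab
% x1, x2: true sources; xhat: estimates from the Matlab section below.
% Reconstruction can change the length slightly, so trim to match.
L  = min(length(x1), size(xhat, 1));
s  = [x1(1:L)'; x2(1:L)'];           % true sources,      2 x L
se = [xhat(1:L, 1)'; xhat(1:L, 2)']; % estimated sources, 2 x L
[SDR, SIR, SAR, perm] = bss_eval_sources(se, s);
```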
Roadmap of Talk
1 Review
2 Further Insight
3 Supervised and Semi-Supervised Separation
4 Probabilistic Interpretation
5 Extensions
6 Evaluation
7 Future Research Directions
8 Matlab
Research Directions
• Score-informed separation - sheet music
• Interactive separation - user-interaction
• Temporal dynamics - how sounds change over time
• Unsupervised separation - grouping basis vectors, clustering
• Phase estimation - complex NMF, STFT constraints, etc.
• Universal models - big data for general models of sources
Demos
• Universal Speech Models
• Interactive Source Separation
  • Drums + Bass
  • Guitar + Vocals + AutoTune
  • Jackson 5 Remixed
STFT

```matlab
% Load the isolated training signals and the mixture.
x1 = wavread('bass');
x2 = wavread('drums');
[xm, fs] = wavread('drums+bass');

% STFT analysis parameters.
FFTSIZE = 1024;
HOPSIZE = 256;
WINDOWSIZE = 512;

% Magnitude spectrograms (non-negative data for NMF); keep only the
% lower half of the spectrum since the signals are real.
X1 = myspectrogram(x1, FFTSIZE, fs, hann(WINDOWSIZE), -HOPSIZE);
V1 = abs(X1(1:(FFTSIZE/2+1), :));

X2 = myspectrogram(x2, FFTSIZE, fs, hann(WINDOWSIZE), -HOPSIZE);
V2 = abs(X2(1:(FFTSIZE/2+1), :));

Xm = myspectrogram(xm, FFTSIZE, fs, hann(WINDOWSIZE), -HOPSIZE);
Vm = abs(Xm(1:(FFTSIZE/2+1), :));
maxV = max(max(db(Vm)));    % overall max in dB

[F, T] = size(Vm);
```

• https://ccrma.stanford.edu/~jos/sasp/Matlab_listing_myspectrogram_m.html
• https://ccrma.stanford.edu/~jos/sasp/Matlab_listing_invmyspectrogram_m.html
NMF

```matlab
K = [25 25];      % number of basis vectors per source
MAXITER = 500;    % total number of iterations to run

% Learn a dictionary for each isolated source...
[W1, H1] = nmf(V1, K(1), [], MAXITER, []);
[W2, H2] = nmf(V2, K(2), [], MAXITER, []);

% ...then fit the mixture, holding the learned dictionaries fixed
% (supervised separation: only the activations H are updated).
[W, H] = nmf(Vm, K, [W1 W2], MAXITER, 1:sum(K));

function [W, H] = nmf(V, K, W, MAXITER, fixedInds)
    [F, T] = size(V);
    rand('seed', 0)
    if isempty(W)
        W = 1 + rand(F, sum(K));
    end
    H = 1 + rand(sum(K), T);
    inds = setdiff(1:sum(K), fixedInds);   % dictionary columns to update
    ONES = ones(F, T);
    for i = 1:MAXITER
        % update activations
        H = H .* (W'*(V./(W*H + eps))) ./ (W'*ONES);
        % update the (non-fixed) dictionary columns
        W(:,inds) = W(:,inds) .* ((V./(W*H + eps))*H(inds,:)') ./ (ONES*H(inds,:)');
    end
    % normalize the columns of W to sum to 1, absorbing the scale into H
    sumW = sum(W);
    W = W * diag(1./sumW);
    H = diag(sumW) * H;
end
```
FILTER & ISTFT

```matlab
% get the mixture phase
phi = angle(Xm);

% boundaries of each source's dictionary columns: source i owns
% columns c(i)+1 through c(i+1)
c = [0 cumsum(K)];
for i = 1:length(K)
    % create the masking filter for source i
    Mask = W(:, c(i)+1:c(i+1)) * H(c(i)+1:c(i+1), :) ./ (W*H);

    % filter the mixture magnitude
    XmagHat = Vm .* Mask;

    % rebuild the upper half of the spectrum before the istft
    XmagHat = [XmagHat; conj(XmagHat(end-1:-1:2, :))];

    % reattach the mixture phase and invert
    XHat = XmagHat .* exp(1i*phi);
    xhat(:, i) = real(invmyspectrogram(XHat, HOPSIZE))';
end
```
References I
David M. Blei, Andrew Y. Ng, and Michael I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003), 993–1022.

T. Hofmann, Probabilistic latent semantic indexing, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA), SIGIR ’99, ACM, 1999, pp. 50–57.

P. Smaragdis and J. C. Brown, Non-negative matrix factorization for polyphonic music transcription, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct. 2003, pp. 177–180.

P. Smaragdis, B. Raj, and M. Shashanka, A probabilistic latent variable model for acoustic modeling, Advances in Neural Information Processing Systems (NIPS), Workshop on Advances in Modeling for Acoustic Processing, 2006.
References II
P. Smaragdis, B. Raj, and M. Shashanka, Supervised and semi-supervised separation of sounds from single-channel mixtures, International Conference on Independent Component Analysis and Signal Separation (Berlin, Heidelberg), Springer-Verlag, 2007, pp. 414–421.

E. Vincent, R. Gribonval, and C. Févotte, Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech, and Language Processing 14 (2006), no. 4, 1462–1469.