TRANSCRIPT
Biomedical signal processing — application of optimization methods for machine learning problems
Fabian J. Theis
Computational Modeling in Biology, Institute of Bioinformatics and Systems Biology
Helmholtz Zentrum München
http://cmb.helmholtz-muenchen.de
Grenoble, 16-Sep-2008
Data mining
cocktail-party problem
• mixture model x(t) = f(s(t))
• estimate mixing process f and sources s(t)
• often linear f = A
[Figure: sources s(t) are mixed into observations x(t); a neural network W recovers estimates of s(t)]
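To make the linear mixture concrete, here is a minimal sketch (not part of the talk; the signal shapes and mixing matrix are made up) of generating observations x(t) = A s(t) from two sources:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
s = np.vstack([np.sin(2 * np.pi * 5 * t),   # source 1: a pure tone
               rng.laplace(size=t.size)])   # source 2: a noisy voice
A = np.array([[0.8, 0.3],                   # mixing matrix: microphone gains
              [0.2, 0.9]])
x = A @ s                                   # observed microphone signals x(t) = A s(t)
```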
Outline
1 Supervised methods: Motivation 1: classification · Motivation 2: image segmentation · Statistical decision theory
2 Unsupervised methods: Clustering · k-means · Partitional clustering
3 Signal component analysis: Independent component analysis · Independent subspace analysis · Sparse component analysis · Nonlinear sparse component analysis
4 Conclusions
Motivation 1: classification
data analysis: classification
• decide between (two or multiple) classes s(t) ∈ {0, 1}
• learn by example
Neural networks
Classification: example
• observations:
  • immunological data set
  • 30 cell parameters of 37 children with pulmonary diseases
• goal:
  • interpretation using supervised and unsupervised analysis
  • disease classification into chronic bronchitis or interstitial lung disease: CB ⇔ ILD?
cooperation with D. Hartl, Pediatric Immunology, Munich
Data visualization & dimension reduction
parameter interpretation?
[Figure: self-organizing map of the immunological data with k-means clusters; map units labelled by patient class CB/ILD]
• visualization by a self-organizing map network
• topology-preserving nonlinear dimension reduction/scaling
• detect new parameter dependencies
Disease classification
dimension-reducing network:
z(i) = B_supervised A_unsup. x(i)
results:
• down-scaling to 5 hidden neurons suffices
• classification rate of > 90%
[Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003.]
Motivation 2: image segmentation
classification
• application in image processing
• ⇒ object classification
Problem: How many labelled cells lie in this section image?
Biological background: neurogenesis
• adult neurogenesis
  • new neurons emerge even in the adult human brain
  • level depends on external stimuli
  • Are there neural ancestral cells?
• goal
  • automated quantification of neurogenesis in adult mice
cooperation with Z. Kohl, Department of Neurology, University of Regensburg
Automated cell counting
directional neural network
• train a cell patch classifier ζ using a directional neural network
• scan the image using ζ to get cell positions
• speed-up via hierarchical and multiscale methods
Results
• counting comparison with 2 experts (variability ±5%) yields 90% ± 4% accuracy
• application: considerable cell proliferation in the hippocampus of epileptic mice
[Theis, Kohl, Guggenberger, Kuhn, Lang. ZANE - an algorithm for counting labelled cells in section images. Proc. MEDSIP 2004]
Statistical decision theory
setup
• input: random vector X : Ω → R^p
• output: random vector Y : Ω → R, or categorical output, possibly Y ∈ {0, 1}
• input-output relation measured by the joint density P(X, Y)
• realization by samples (training data) (x_i, y_i) for i = 1, ..., N
• often collected in an (N × p)-matrix X and a vector y ∈ R^N
Goal: prediction
• goal: learn a classifier from the training data ⇒ predict y* for a new sample x*
Linear model
$$y = \beta_0 + \sum_{j=1}^{p} x_j \beta_j$$
set $x_0 := 1$, then $y = x^\top \beta$
least squares: minimize
$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^\top \beta)^2 = (y - X\beta)^\top (y - X\beta)$$
$\Rightarrow X^\top (y - X\beta) = 0$, so
$$\beta = (X^\top X)^{-1} X^\top y$$
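A minimal numerical check of this closed-form solution (a sketch with synthetic data, not from the talk); numpy's `lstsq` solves the same normal equations in a numerically stabler way:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # x_0 := 1 absorbs beta_0
beta = np.array([0.5, 2.0, -1.0, 0.3])
y = X @ beta + 0.1 * rng.normal(size=N)

# least squares: equivalent to (X^T X)^{-1} X^T y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```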
Linear model
[Figure: linear fit on the two-class data]
decision boundary: $\{x \mid x^\top \beta = 1/2\}$
Linear model
nice, but what about more complex data?
[Figures: two harder two-class data sets]
(r = 2 and r = 10 Gaussians per class, σ = 0.2, with the r means sampled from N((1, 0), I) and N((0, 1), I), respectively)
the ‘global’ linear model is too rigid
Nearest-neighbor method
$$y(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$
where $N_k(x)$ denotes the set of the $k$ closest points $x_i$ to $x$
• local model
• needs a metric (here Euclidean)
• how to determine k?
  • smaller k ⇒ higher learning accuracy
  • larger k ⇒ smoother, higher generalizability
  • least-squares learning would yield k = 1
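A sketch of the nearest-neighbor estimator above (assuming numpy arrays; not code from the talk):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=10):
    """Average the labels of the k closest training points (Euclidean metric)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(dists)[:k]        # indices of N_k(x)
    return y_train[neighbors].mean()         # threshold at 1/2 for a {0,1} decision
```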
Nearest-neighbor method, k = 10
[Figures: k = 10 nearest-neighbor fits on the two example data sets]
decision boundary: $\{x \mid y(x) = 1/2\}$
Nearest-neighbor method, k = 1, 2, 10
[Figures: nearest-neighbor decision boundaries for k = 1, 2, 10 on the r = 10 data set]
Statistical decisions
probabilistic view: P(X, Y) = P(Y|X) P(X)
find a function f(X) predicting Y as well as possible w.r.t. the squared error loss L(Y, f(X)) = (Y − f(X))²
expected prediction error:
$$\mathrm{EPE}(f) = E(Y - f(X))^2 = \int (y - f(x))^2\, P(dx, dy) = E_X E_{Y|X}\big((Y - f(X))^2 \mid X\big)$$
pointwise minimization suffices:
$$f(x) = \operatorname{argmin}_c E_{Y|X}\big((Y - c)^2 \mid X = x\big)$$
solved by the conditional expectation (regression function):
$$f(x) = E(Y \mid X = x)$$
Statistical decisions
$$f(x) = E(Y \mid X = x)$$
can be estimated by
$$\hat f(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$
• approximate the expectation by a sample average
• approximate point conditioning by local conditioning
• note: $\hat f(x) \to E(Y \mid X = x)$ for $N, k \to \infty$, $k/N \to 0$
• but:
  • (very) finite samples
  • ‘curse’ of dimensionality
    • a fraction r of the unit cube in p dimensions is covered by a cube of edge length $e_p(r) = r^{1/p}$
    • $e_2(0.01) = 0.1$, $e_2(0.1) = 0.32$
    • $e_{10}(0.01) = 0.63$, $e_{10}(0.1) = 0.80$
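The edge-length formula is easy to verify; a one-line computation, added here only for illustration:

```python
def edge_length(r, p):
    """Edge length e_p(r) = r**(1/p) of a sub-cube covering a fraction r of the unit cube."""
    return r ** (1.0 / p)

print(edge_length(0.01, 2), edge_length(0.01, 10))  # 0.1 vs. ~0.63: neighborhoods stop being local
```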
Statistical decisions
if instead, for approximating $f(x) = E(Y \mid X = x)$, we assume the linear model $f(x) = x^\top \beta$, we get
$$\beta = E(X X^\top)^{-1} E(XY)$$
• no conditioning, global approximation
Statistical decisions for discrete Y
if $Y \in \{0, 1\}$, consider the loss function
$$L(Y, f(X)) = \begin{cases} 0 & \text{if } f(X) = Y \\ 1 & \text{otherwise} \end{cases}$$
then $\mathrm{EPE} = E_X \sum_{y \in \{0,1\}} L(y, f(X))\, P(y \mid X)$ and hence
$$Y(x) = \operatorname{argmin}_{y_0 \in \{0,1\}} \sum_{y \in \{0,1\}} L(y, y_0)\, P(y \mid X = x) = \operatorname{argmin}_{y_0 \in \{0,1\}} \big(1 - P(y_0 \mid X = x)\big)$$
which yields the Bayes classifier
$$Y(x) = \operatorname{argmax}_y P(y \mid X = x)$$
question: how to model $P(Y \mid X)$?
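One standard answer is to model class-conditional densities and apply Bayes' rule, P(y | x) ∝ p(x | y) P(y). A sketch with gaussian class densities (an assumption made here for illustration, not the slides' method):

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_classify(x, means, covs, priors):
    """Bayes classifier: argmax_y P(y | X = x) with gaussian class-conditional densities."""
    posteriors = [multivariate_normal.pdf(x, mean=m, cov=c) * p
                  for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(posteriors))
```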
Bayes classifier results
[Figures: Bayes classifier decision boundaries on the three example data sets]
Method combinations
• nonlinear models, e.g. $f(x) = \sum_{j=1}^{p} f_j(x_j)$, or basis expansion $f(x) = \sum_j h_j(x) \beta_j$ with polynomial, Fourier or sigmoidal bases (→ neural networks)
• prediction/function approximation by maximum-likelihood estimation of the parameters
• enhance generalizability by adding a regularization term $+\lambda J(f)$ to $\mathrm{RSS}(f)$ for $f$ from some function class
• generalize inner-product methods to nonlinear situations by a high-dimensional embedding $x \mapsto \Phi(x)$ and kernels $k(x, x') = \Phi(x)^\top \Phi(x')$
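For intuition on the last point, a tiny hypothetical example (degree-2 polynomial kernel in R²): the kernel evaluates the inner product of an explicit embedding Φ without ever forming it.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 embedding of x in R^2."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def k(x, xp):
    """Polynomial kernel k(x, x') = (x^T x')^2 = phi(x)^T phi(xp)."""
    return float(x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(k(x, xp), phi(x) @ phi(xp))
```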
Clustering
• explanation by example
  • goal: differentiate the hand-written digits 2 and 4
  • given a set of unknown gray-scale images of 2s and 4s, find the subset of 2s and the subset of 4s
• versus: unsupervised learning by example
• like a baby
Example data set
• here: machine learning, i.e. a statistical approach
• needs many test cases: here 1000 28×28 images each
• interpret each 28×28 image as an element of R^784
• dimension reduction via PCA to only 2 dimensions
[Figure: the digit images projected onto the first two principal components]
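A sketch of such a PCA projection via the SVD (illustration only; assumes the images are flattened into the rows of X):

```python
import numpy as np

def pca_project(X, d=2):
    """Project rows of X (e.g. 1000 x 784 digit vectors) onto the top d principal components."""
    Xc = X - X.mean(axis=0)                            # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                               # coordinates in the d leading directions
```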
k-means
• clustering:
  • data vectors (samples) x(1), x(2), ..., x(T) ∈ R^n
  • distance measure d(x, y) between samples
• algorithm: k-means
  • given number k of clusters
  • initialize centroids randomly
  • update rules: batch or sequential (online)
• cost function
  • minimize $E(c_i, C_i) := \sum_{i=1}^{k} \frac{1}{|C_i|} \sum_{x \in C_i} d(x, c_i)^2$
[Theis, Gruber. Grassmann clustering. Proc. EUSIPCO 2006]
[Animation: batch k-means — samples and centroids, partition of the samples, assignment to the nearest centroid, centroid update; sequential k-means — pick an arbitrary sample, find the nearest centroid, update it]
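A sketch of batch k-means with the Euclidean distance (illustration only; assumes no cluster runs empty):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Batch k-means: alternate cluster assignment and centroid update until convergence."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        labels = d.argmin(axis=1)                      # assignment step
        c_new = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(c_new, c):
            break
        c = c_new                                      # update step
    return labels, c
```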
Batch k-means
[Figures: batch k-means on the digit data after 1-7 iterations; done: error 4.5%]
Partitional clustering
• goal:
  • given a set A of points in a metric space (M, d)
  • find a partition of A into $B_i$, with $\bigcup_i B_i = A$, and centroids $c_i \in M$ minimizing
$$E(B_1, c_1, \ldots, B_k, c_k) := \sum_{i=1}^{k} \sum_{a \in B_i} d(a, c_i)^2 \qquad (1)$$
• $A = \{a_1, \ldots, a_T\}$ ⇒ constrained nonlinear optimization problem
• minimize
$$E(W, C) := \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it}\, d(a_t, c_i)^2 \qquad (2)$$
subject to
$$w_{it} \in \{0, 1\}, \quad \sum_{i=1}^{k} w_{it} = 1 \quad \text{for } 1 \le i \le k,\ 1 \le t \le T \qquad (3)$$
• $C := \{c_1, \ldots, c_k\}$ centroid locations, $W := (w_{it})$ partition matrix
Minimize this!
• common approach: partial optimization for W and C
  • alternately minimize over W and C while keeping the other one fixed
• ⇒ batch k-means algorithm
  • initial random choice of centroids $c_1, \ldots, c_k$
  • iterate until convergence:
    • cluster assignment: for each $a_t$ determine an index $i(t)$ such that $i(t) = \operatorname{argmin}_i d(a_t, c_i)$
    • cluster update: within each cluster $B_i := \{a_t \mid i(t) = i\}$ determine the centroid by minimizing $c_i := \operatorname{argmin}_c \sum_{a \in B_i} d(a, c)^2$
• convergence to a local minimum
Euclidean case
• special case: M := R^n with the Euclidean distance $d(x, y) := \|x - y\|$
• centroids can be calculated in closed form:
  • the centroid is given by the cluster mean
$$c_i := \frac{1}{|B_i|} \sum_{a \in B_i} a$$
  • this follows directly from
$$\sum_{a \in B_i} \|a - c_i\|^2 = \sum_{a \in B_i} \sum_{j=1}^{n} (a_j - c_{ij})^2 = \sum_{j=1}^{n} \sum_{a \in B_i} (a_j^2 - 2 a_j c_{ij} + c_{ij}^2)$$
Extensions
$$c_i := \operatorname{argmin}_c \sum_{a \in B_i} d(a, c)^p$$
• more difficult optimization problems:
  • non-Euclidean spaces, e.g. RP^n or Grassmann manifolds
  • extensions from p = 2 to e.g. p = 1 or p < …
  • p = 1 corresponds to finding the spatial median of $B_i$
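For p = 1 there is no closed form, but the spatial median can be found by the classical Weiszfeld iteration; a sketch, added here for illustration and not taken from the slides:

```python
import numpy as np

def spatial_median(A, iters=100, eps=1e-9):
    """Weiszfeld iteration for argmin_c sum_{a in B_i} ||a - c|| (the p = 1 centroid)."""
    c = A.mean(axis=0)                  # start from the p = 2 centroid, the mean
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(A - c, axis=1), eps)  # guard against division by zero
        w = 1.0 / d
        c_new = (w[:, None] * A).sum(axis=0) / w.sum()
        if np.linalg.norm(c_new - c) < eps:
            break
        c = c_new
    return c
```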
Independent component analysis
example: cocktail-party problem of the brain
[Figure: fMRI components — auditory cortex, word detection, decision, auditory cortex 2]
[Keck, Theis, Gruber, Lang, Specht, Puntonet. 3D spatial analysis of fMRI data on a word perception task. LNCS, 3195:977-984]
BSS model
• blind source separation (BSS) problem
$$x(t) = A s(t) + \varepsilon(t)$$
  • x(t): observed m-dimensional random vector
  • A: (unknown) full-rank m × n matrix
  • s(t): (unknown) n-dimensional source signals (here: n ≤ m)
  • ε(t): (unknown) white noise
• goal: given x, recover A and s!
• additional assumptions necessary:
  • stochastically independent s(t): $p_s(s_1, \ldots, s_n) = p_{s_1}(s_1) \cdots p_{s_n}(s_n)$ ⇒ independent component analysis (ICA)
  • sparse source signals $s_i(t)$ ⇒ sparse component analysis (SCA)
  • nonnegative s and A ⇒ nonnegative matrix factorization (NMF)
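A small end-to-end BSS sketch using scikit-learn's FastICA (one of many ICA implementations, not the algorithms cited on these slides; the sources and mixing matrix are made up):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # two independent, non-gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])             # unknown full-rank mixing matrix
x = s @ A.T + 0.01 * rng.normal(size=s.shape)      # observations x(t) = A s(t) + noise

ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)   # recovered sources, up to scaling L and permutation P
```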
• important questions in data analysis
  • model? (restrictions on A and s)
  • indeterminacies of the model?
  • algorithmic identification given x?
• identifiability
  • obvious indeterminacies: scaling L and permutation P

Theorem
Let the independent random vector $s \in L^2$ contain at most one gaussian component. Given two ICA solutions $As = A's'$, then $A = A'LP$.

Note: the theorem does not hold for gaussian sources s.
[Theis. A new concept for separability problems in blind source separation. Neural Computation, 2004]
ICA algorithms
• basic scheme of ICA algorithms (case m = n)
  • search for an invertible demixing matrix W that minimizes some dependence measure of Wx
• some contrasts
  • minimize the mutual information I(Wx)
  • maximize the neural network output entropy H(f(Wx))
  • extend PCA by performing nonlinear decorrelation
  • maximize the non-gaussianity of the output components (Wx)_i
  • minimize the off-diagonal error of $H_{\ln p_{Wx}}$
  • minimize the median deviation of Wx
[Theis et al. Linear geometric ICA: Fundamentals and algorithms. Neural Computation, 2003]
[Theis, Lang, Puntonet. A geometric algorithm for overcomplete linear ICA. Neurocomputing, 2004]
Optimization
• problem: minimize a cost function f(W) on Gl(n) or O(n)
• often: gradient descent, ΔW ∝ −∇f(W)
• in high dimensions: simulated annealing or genetic algorithms
• use the non-Euclidean structure of Gl(n)
  • the Euclidean gradient is not compatible with the group Gl(n)
  • define the natural gradient
$$\nabla^{\mathrm{nat}} f(W) = \nabla^{\mathrm{euc}} f(W)\, W^\top W$$
  ⇒ considerable performance increase
[Stadlthanner, Theis, Puntonet, Lang. Extended sparse nonnegative matrix factorization. LNCS, 3512:249-256]
[Squartini, Theis. New Riemannian metrics for speeding-up the convergence of over- and underdetermined ICA. In preparation]
[Theis. Gradients on matrix manifolds and their chain rule. Submitted to NIPS LR, 2005]
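A minimal sketch of one descent step with the natural gradient on Gl(n), implementing the update rule above (the cost function and its Euclidean gradient are placeholders to be supplied by the caller):

```python
import numpy as np

def natural_gradient_step(W, euc_grad, lr=0.1):
    """Descent step using the natural gradient: grad_nat f(W) = grad_euc f(W) W^T W."""
    return W - lr * euc_grad @ W.T @ W
```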
fMRI analysis
• functional magnetic resonance imaging
• noninvasive brain imaging technique ⇒ information on brain activation patterns
• activation maps help identify task-related brain regions
• BSS techniques are applicable to fMRI
[Figure: spatial-only BSS of the fMRI sequence]
Experimental setup
• experiment
  • block design protocol:
    • 5 time instants of visual stimulation
    • 5 instants of rest
  • 100 scans taking 3 s each
• data set
  • well-known design → expected activity in the visual cortex
  • here: use only a single horizontal slice
• preprocessing
  • motion correction
  • smoothing
data acquired by D. Auer, MPI of Psychiatry, Munich
Results
[Figures: (a) spatial sources s^S, components 1-4; (b) temporal sources t^S with stimulus cross-correlations cc = 0.18, 0.00, 0.05, 0.90]
• component 2 partially represents the frontal eye fields
• component 4: stimulus component, cc = 0.9 with the stimulus
[Theis, Gruber, Keck, Lang. Functional MRI analysis by a novel spatiotemporal ICA algorithm. LNCS 3696:677-682]
Independent subspace analysis
Why extend ICA?
• identifiability of ICA only holds if the data follows the generative model with independent sources
• simulation
  • apply ICA to data not fulfilling the ICA model
  • here the sources consist of a 2-d and a 1-d irreducible component
  • plot the Amari error over 100 runs
[Figure: crosstalking error of FastICA, JADE, and Extended Infomax over the 100 runs]
result: no recovery of the mixing matrix
Independent subspace analysis
• require stochastic independence only between groups of source components
• the nk-dimensional S is said to be k-independent, i.e.
$$\begin{pmatrix} S_1 \\ \vdots \\ S_k \end{pmatrix}, \ldots, \begin{pmatrix} S_{nk-k+1} \\ \vdots \\ S_{nk} \end{pmatrix} \text{ mutually independent}$$
⇒ independent subspace analysis (ISA)
• recent result: extension to arbitrary group sizes
• major advantage: general independent subspace analysis (ISA) always exists
[Theis. Uniqueness of complex and multidimensional independent component analysis. Signal Processing, 2004]
PCA
[Diagram omitted; labels: X, S, A]
ICA
[Diagram omitted; labels: X, S, L, P, A]
ISA with fixed group size
[Diagram omitted; labels: X, S, L, P, A]
General ISA
[Diagram omitted; labels: X, S, L, P, A]
ISA framework
Definition
Y is an independent component of X :⇔ there exists an invertible A with X = A(Y, Z) such that Y and Z are stochastically independent.

Definition (general ISA)
• S is irreducible if it contains no lower-dimensional independent component.
• W ∈ Gl(n) is an independent subspace analysis of X :⇔ WX = (S1, . . . , Sk) with pairwise independent, irreducible Si.

Theorem
Given a random vector X with existing covariance, an ISA of X exists and is unique except for scaling and permutation.
Algebraic ISA algorithms
• main idea: source condition matrices Ci(S) are block-diagonal
• subspace JADE (see the sketch below)
  • after whitening, assume orthogonal A
  • group-independence of S: the contracted quadricovariance matrices Cij(S) are block-diagonal
  • perform joint block diagonalization of the Cij(X) to get A⊤
• for general ISA, estimate the block structure after diagonalization

  Cij(X) = A Cij(S) A⊤
[Theis. Towards a general independent subspace analysis. NIPS 2006 accepted]
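As a concrete illustration of the quantities subspace JADE works with, here is a minimal NumPy sketch (not the slide's implementation) estimating the contracted quadricovariance matrices Cij(x); it assumes the data is already whitened and zero-mean, and the function name and shapes are choices of this sketch.

```python
import numpy as np

def quadricov_matrices(x):
    """Estimate contracted quadricovariance matrices of whitened data.

    x: array of shape (n, T), assumed zero-mean with identity covariance.
    Returns the matrices C_ij with entries Cum(x_k, x_l, x_i, x_j), which
    for whitened data equal E[x_i x_j x x^T] - d_ij I - e_i e_j^T - e_j e_i^T.
    """
    n, T = x.shape
    I = np.eye(n)
    mats = []
    for i in range(n):
        for j in range(i, n):                    # C_ij = C_ji by symmetry
            C = (x * x[i] * x[j]) @ x.T / T      # fourth-moment part
            C -= I[i, j] * I                     # subtract Gaussian contributions
            C -= np.outer(I[i], I[j]) + np.outer(I[j], I[i])
            mats.append(C)
    return mats
```

For data following the ISA model these matrices are, up to estimation error, of the form A Cij(S) A⊤ with block-diagonal Cij(S), which is exactly what the joint block diagonalization on the next slide exploits.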
Joint Block Diagonalization with unknown block sizes
Joint Block Diagonalization (JBD)
• given n × n matrices C1, . . . , CK and a partition m = (m1, . . . , mr) of n with m1 + · · · + mr = n
• goal: find an orthogonal A such that A⊤CkA is m-block-diagonal for all k
⇒ minimize (e.g. by applying iterative Givens rotations)

  $f^{\mathbf{m}}(A) := \sum_{k=1}^{K} \left\| A^\top C_k A - \operatorname{diag}_{\mathbf{m}}\!\left(A^\top C_k A\right) \right\|_F^2$

• unknown block size m ⇒ general JBD then searches for the maximal-length block structure, i.e.

  $(A, \mathbf{m}) = \operatorname{argmax}_{\mathbf{m} \,\mid\, \exists A:\, f^{\mathbf{m}}(A) = 0} |\mathbf{m}|$

• result (JBD by JD): any block-optimal JBD, i.e. any zero of $f^{\mathbf{m}}$, is a local minimum of ordinary joint diagonalization; one can therefore run plain joint diagonalization first and recover the blocks afterwards (see the sketch below)
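The "JBD by JD" result suggests a simple route: jointly diagonalize the Ck with ordinary Jacobi/Givens sweeps, then read off the block structure. Below is a minimal NumPy sketch of that route; the rotation angle is the standard Cardoso-Souloumiac formula, and the function names and coupling threshold are assumptions of this sketch, not the slide's actual code.

```python
import numpy as np

def joint_diagonalize(Cs, sweeps=100, tol=1e-8):
    """Orthogonal joint diagonalization of symmetric matrices C_1..C_K
    by iterative Givens rotations (Jacobi sweeps)."""
    Cs = [C.copy() for C in Cs]
    n = Cs[0].shape[0]
    A = np.eye(n)
    for _ in range(sweeps):
        changed = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # 2x2 subproblem over all matrices (Cardoso-Souloumiac angle)
                g = np.array([[C[p, p] - C[q, q], C[p, q] + C[q, p]] for C in Cs])
                G = g.T @ g
                ton, toff = G[0, 0] - G[1, 1], G[0, 1] + G[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) > tol:
                    changed = True
                    R = np.eye(n)
                    R[p, p] = R[q, q] = c
                    R[p, q], R[q, p] = -s, s
                    Cs = [R.T @ C @ R for C in Cs]
                    A = A @ R
        if not changed:
            break
    return A

def block_structure(Cs, A, rel_thresh=1e-3):
    """After JD, couple components whose off-diagonal coupling stays large,
    and return the connected components as the estimated blocks."""
    n = A.shape[0]
    M = sum(np.abs(A.T @ C @ A) for C in Cs)
    coupled = M > rel_thresh * M.max()
    blocks, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        comp, stack = set(), [i]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack += [v for v in range(n) if v != u and coupled[u, v]]
        seen |= comp
        blocks.append(sorted(comp))
    return blocks
```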
Example
[Figure: 40 × 40 heatmaps of the (unknown) C1; of A⊤Â without recovery of the permutation P; and of A⊤Â after recovery]
• performance of the proposed general JBD
• (unknown) block partition 40 = 1 + 2 + 2 + 3 + 3 + 5 + 6 + 6 + 6 + 6
• additive noise with an SNR of 5 dB, K = 100 matrices
• result: the estimated A equals A after permutation recovery (a toy check of the sketch above follows)
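A toy check of the JD sketch above in the same spirit as this experiment, though with smaller, hypothetical sizes (the block partition, dimensions and seed here are assumptions, not the slide's 40-dimensional setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 8, 20
blocks = [1, 2, 2, 3]                          # assumed block partition of n = 8
Ds = []
for _ in range(K):
    D = np.zeros((n, n))
    i = 0
    for b in blocks:                           # random symmetric diagonal blocks
        B = rng.standard_normal((b, b))
        D[i:i + b, i:i + b] = B + B.T
        i += b
    Ds.append(D)
A0, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal mixing
Cs = [A0 @ D @ A0.T for D in Ds]
A = joint_diagonalize(Cs)
print(block_structure(Cs, A))   # should recover blocks of sizes 1, 2, 2, 3
```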
Extraction of fetal electrocardiograms
• separate fetal ECG (FECG) recordings from the mother's ECG (MECG)
• apply Hessian-based MICA with k = 2 and 500 Hessians
[Figure: 500-sample traces; (a) ECG recordings, (b) extracted sources, (c) MECG part, (d) FECG part]
[Theis. Blind signal separation into groups of dependent signals using joint block diagonalization. Proc. ISCAS 2005]
Sparse component analysis
[Theis, Puntonet, Lang. Median-based clustering for underdetermined blind signal processing. IEEE SPL, 2005]
Model
• Sparse Component Analysis (SCA) problem
x(t) = As(t)
• observed mixtures x(t) ∈ R^m
• A: (unknown) real matrix with linearly independent columns
• s(t): (unknown) (m − 1)-sparse sources s(t) ∈ R^n, i.e. s(t) has at most (m − 1) non-zero entries
• goal: recover the unknown A and s(t) given only x(t)

Theorem
If s(t) is (m − 1)-sparse and A and s(t) are in 'general position', both A and s(t) are identifiable (except for scaling and permutation).
[Georgiev, Theis, Cichocki. Sparse component analysis and blind source separation of underdetermined mixtures. IEEE TNN, 2005]
SCA algorithm
• matrix identification by multiple hyperplane detection
  • e.g. using a Hough transform
  • robust against outliers and noise
• source recovery using sparsity and the known matrix (see the sketch after the figure)
[Figure: 3-d scatter plot of the mixtures, concentrated on hyperplanes; all axes from −1 to 1]
[Theis, Georgiev, Cichocki. Robust sparse component analysis based on a generalized Hough transform. Signal Processing 2006]
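For the second step, source recovery given A, a minimal NumPy sketch of the nearest-hyperplane idea follows; this is a plain illustration, not the median-based or Hough-based algorithms of the cited papers, and the function name is an assumption of this sketch.

```python
import numpy as np
from itertools import combinations

def sca_recover_sources(X, A):
    """Recover (m-1)-sparse sources given the mixing matrix A (m x n).

    Each sample x(t) lies near a hyperplane spanned by m-1 columns of A;
    pick the closest such hyperplane, then solve the restricted
    least-squares problem on that support.
    """
    m, n = A.shape
    T = X.shape[1]
    supports = list(combinations(range(n), m - 1))
    normals = []
    for sup in supports:               # unit normal of span{a_i : i in sup}
        _, _, Vt = np.linalg.svd(A[:, sup].T)
        normals.append(Vt[-1])
    dists = np.abs(np.array(normals) @ X)     # |normal . x| per hyperplane
    best = np.argmin(dists, axis=0)
    S = np.zeros((n, T))
    for t in range(T):
        sup = list(supports[best[t]])
        s_hat, *_ = np.linalg.lstsq(A[:, sup], X[:, t], rcond=None)
        S[sup, t] = s_hat
    return S
```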
SCA of surface electromyograms
• electromyogram (EMG): electric signal generated by a contracting muscle
• surface EMG: non-invasive; however, the recorded sources overlap

cooperation with G. García, Bioinformatic Engineering, Osaka
Results
[Figure: source and SCA recovery within 8 artificial, dependent mixtures]

• results on toy data: sparseness works as a separation criterion
• real data
  • relative sEMG enhancement of 24.6 ± 21.4% (mean over 9 subjects)
  • beats standard signal processing and ICA

[Theis, García. On the use of sparse signal decomposition in the analysis of multi-channel surface EMGs. Signal Processing, 2006]
SCA of functional MRI data
[Figure: component maps (S) and time courses (A) of 5 components; crosscorrelations with the stimulus: 1: −0.16, 2: −0.28, 3: 0.13, 4: −0.04, 5: −0.88]
• complete SCA was performed using k-means hyperplane clustering (see the sketch below)
• components 2 and 3 represent the inner ventricles; component 1 contains the frontal eye fields
• component 5 is the desired visual stimulus component — active in the visual cortex (crosscorrelation with the stimulus |cc| = 0.88; fastICA yields a similar |cc| = 0.9)
Postnonlinear SCA
Given an m-dimensional random vector x, find a representation

  x = f(As)

with unknown
• n-dimensional random vector s (sources)
• m × n matrix A (mixing matrix)
• diagonal invertible f = f1 × · · · × fm (postnonlinearities)

postnonlinear ICA: s independent; here (SCA model): s is (m − 1)-sparse
Overcomplete postnonlinear cocktail-party problem
[Diagram: cocktail-party mixture with sensor postnonlinearities f1 and f2]
Postnonlinearity identification lemma
Given an invertible 2 × 2 matrix A, define an L at 0 as

  L := A([0, ε) × {0} ∪ {0} × [0, ε)).

Lemma
If a diagonal analytic diffeomorphism h := h1 × h2 maps an L (in 'general position') at 0 again onto an L at 0, then it is a linear scaling.
Identifiability
• due to linear identifiability it is enough to show: if $f(As) = \tilde f(\tilde A \tilde s)$, then $h = \tilde f^{-1} \circ f$ is a linear scaling
• case m = 2: the images of $As$ and $\tilde A \tilde s$ are finite unions of L's, so this follows from the previous lemma
Identifiability: proof
[Diagram: proof sketch; the two mixing chains $R^3 \to R^2 \to R^2$ given by $A, f$ and $\tilde A, \tilde f$]
Algorithm
• multistage separation algorithm:
  • find separating nonlinearities g
  • estimate the mixing matrix A of the linearized model g(x)
  • estimate the sources given A and g(x)
• how can g be found algorithmically?
Postnonlinearity detection
• for simplicity assume m = 2
• geometrical preprocessing: determine two 1-dimensional submanifolds in the image of x
• find curves y(t) and z(t) in R² which are mapped onto an L by g
• simple method (sketched below):
  • choose arbitrary starting points y(t1) and z(t1) among the samples of x
  • iteratively pick the closest sample to the previous y(ti−1) resp. z(ti−1) with smaller modulus
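A minimal sketch of this curve-tracing step, assuming the samples are the columns of a (2, T) array; the greedy neighbor selection follows the description above, and the names are choices of this sketch.

```python
import numpy as np

def trace_curve(X, start):
    """Greedily trace a 1-d submanifold from a starting sample toward 0.

    X: (2, T) array of mixture samples; start: column index of y(t1).
    Repeatedly jump to the nearest unvisited sample whose norm is smaller
    than the current one, and return the visited points ordered outward
    from the origin.
    """
    pts = X.T
    norms = np.linalg.norm(pts, axis=1)
    cur, visited, path = start, {start}, [pts[start]]
    while True:
        cands = [i for i in np.flatnonzero(norms < norms[cur]) if i not in visited]
        if not cands:
            break
        d = np.linalg.norm(pts[cands] - pts[cur], axis=1)
        cur = cands[int(np.argmin(d))]
        visited.add(cur)
        path.append(pts[cur])
    return np.asarray(path)[::-1]
```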
Postnonlinearity detection
[Figure: three 100-sample source signals mapped by f ∘ A to two mixture signals; scatter plots of the geometrical preprocessing and of the mixture density]
Postnonlinearity detection
• reparametrization (y := y ∘ y1⁻¹) of the curves gives y1 = z1 = id
• hence g ∘ y = (g1, a g1) and g ∘ z = (g1, b g1)
• so g2 ∘ y2 = a g1 = (a/b) (g2 ∘ z2)
• analytical geometrical postnonlinearity detection: find an analytic 1-d diffeomorphism g with

  g ∘ y = c (g ∘ z)

  for c ≠ 0, ±1 and given curves y, z : (−1, 1) → R with y(0) = z(0) = 0
• note c = y′(0)/z′(0)
Postnonlinearity detection
• the equation g ∘ y = c (g ∘ z) can be solved in different ways:
  • calculate composite derivatives using Faà di Bruno's formula ⇒ derivatives of y and z lead to estimates of the derivatives of g
  • least-squares polynomial fit of g using the energy function

    $E = \frac{1}{2T} \sum_{i=1}^{T} \big( g(y(t_i)) - c\, g(z(t_i)) \big)^2$

  • MLP approximation of g using E from above (polynomial variant sketched below)
• fix g(0) = 0 and g′(0) = 1
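The polynomial variant becomes linear in the unknown coefficients once g(0) = 0 and g′(0) = 1 are fixed, so it reduces to a single least-squares solve. A minimal NumPy sketch under these assumptions (the degree and names are choices of this sketch, not the MLP used in the experiments):

```python
import numpy as np

def fit_separating_pnl(y, z, c, degree=7):
    """Least-squares polynomial fit of g with g(0) = 0 and g'(0) = 1.

    Minimizes E = sum_i (g(y_i) - c g(z_i))^2 over
    g(x) = x + sum_{d=2..degree} w_d x^d, which is linear in w.
    y, z: samples y(t_i), z(t_i) as 1-d arrays; c = y'(0)/z'(0).
    """
    d = np.arange(2, degree + 1)
    M = y[:, None] ** d - c * (z[:, None] ** d)   # columns: y^d - c z^d
    b = -(y - c * z)                              # from the fixed linear term x
    w, *_ = np.linalg.lstsq(M, b, rcond=None)
    return lambda x: np.asarray(x) + (np.asarray(x)[..., None] ** d) @ w
```

Applied per coordinate (with the curves y2, z2 from the geometrical preprocessing), this yields an estimate of each separating postnonlinearity g_i.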
Artificial mixtures
• artificial example (data generation sketched below)
  • postnonlinear mixture of n = 3 uniform sources (10⁵ samples) to m = 2 observations
  • postnonlinear mixing model x = f1 × f2 (As)
  • mixing matrix $A = \begin{pmatrix} 4.3 & 7.8 & 0.5 \\ 99 & 6.2 & 10 \end{pmatrix}$
  • postnonlinearities f1(x) = tanh(x) + 0.1x and f2(x) = x
• algorithm
  • MLP-based postnonlinearity detection algorithm
  • natural gradient-descent learning
  • parameters: 9 hidden neurons, learning rate η = 0.01, 10⁵ iterations
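A sketch generating data as described above; the uniform range and the way (m − 1)-sparsity is enforced are assumptions (the slide only states "uniform sources"), and the matrix entries are as recovered from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 3, 2, 10**5
S = rng.uniform(-0.5, 0.5, size=(n, T))      # uniform sources (range assumed)
active = rng.integers(0, n, size=T)          # (m-1)-sparse: one active source
mask = np.zeros((n, T), dtype=bool)
mask[active, np.arange(T)] = True
S *= mask
A = np.array([[4.3, 7.8, 0.5],
              [99., 6.2, 10.]])
Z = A @ S
X = np.vstack([np.tanh(Z[0]) + 0.1 * Z[0],   # f1
               Z[1]])                        # f2 = id
```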
PNL detection
[Figure: mixing pnls f (f1, f2) and separating pnls g (g1, g2); recovered 100-sample sources with SIRs of 26, 71 and 46 dB; density of the recovered sources]
Conclusions
• analyze statistical patterns in data sets x(t)
• method: factorization model x(t) = f(s(t))
  • supervised training of f ⇒ nearest neighbor (local), regression (global)
  • unsupervised identification (often linear) ⇒ clustering (local model), blind source separation (linear model)
• applications: biomedical data analysis, signal processing, financial markets etc.
Current application — with T. Schroder, HMGU
• unsupervised clustering of subtrees
• supervised learning of cell shapes
• parameter estimation of dynamical system for cell fate decision