
Biomedical signal processing: application of optimization methods for machine learning problems

Fabian J. Theis

Computational Modeling in Biology
Institute of Bioinformatics and Systems Biology

Helmholtz Zentrum München

http://cmb.helmholtz-muenchen.de

Grenoble, 16-Sep-2008

F. Theis Biomedical signal processing — application of optimization methods for machine learning problems


Data mining

cocktail-party problem


• mixture model x(t) = f(s(t))

• estimate mixing process f and sources s(t)

• often linear f = A

[Figure: sources s(t) → mixtures x(t) → recovered sources s(t), with unmixing by a neural network W]
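The linear case f = A of the mixture model can be illustrated with a small sketch. The two source signals, the mixing matrix `A` and all variable names are assumptions chosen for illustration; they do not come from the slides. The sketch only shows the forward model and what a perfect unmixing matrix W would do when A is known, not a blind estimation method.

```python
import numpy as np

# Two hypothetical source signals s(t), stacked as rows of S (2 x T).
t = np.linspace(0.0, 1.0, 500)
S = np.vstack([np.sin(2 * np.pi * 5 * t),             # sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t))])   # square wave

# Linear mixing process f = A (values chosen arbitrarily for illustration).
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
X = A @ S  # observed mixtures x(t) = A s(t)

# With A known, the unmixing matrix is simply W = A^{-1}.
# In the blind setting A is unknown and W has to be estimated from X alone.
W = np.linalg.inv(A)
S_rec = W @ X
```

In the cocktail-party setting only `X` is observed, so the estimation problem is to find `W` without access to `A` or `S`.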


Outline

1 Supervised methods
  • Motivation 1: classification
  • Motivation 2: image segmentation
  • Statistical decision theory

2 Unsupervised methods
  • Clustering
  • k-means
  • Partitional clustering

3 Signal component analysis
  • Independent component analysis
  • Independent subspace analysis
  • Sparse component analysis
  • Nonlinear sparse component analysis

4 Conclusions


Motivation 1: classification

data analysis: classification

• decide between (two or multiple) classes, s(t) ∈ {0, 1}

• learn by example

[Figure: classification example; the class of a new item is unknown (?)]


Neural networks


Classification: example

• observations:
  • immunological data set
  • 30 cell parameters of 37 children with pulmonary diseases

• goal
  • interpretation using supervised and unsupervised analysis
  • disease classification into chronic bronchitis or interstitial lung disease: CB ⇔ ILD ?

cooperation with D. Hartl, Pediatric Immunology, Munich


Data visualization & dimension reduction

parameter interpretation?


[Figure: self-organizing map of the data set: component planes with color scales, map units labelled by diagnosis (CB, ILD) with hit counts, and K-means clusters]

• visualization by self-organizing map network

• topology-preserving nonlinear dimension reduction/scaling

• detect new parameter dependencies


Disease classification

dimension-reducing network

$$z^{(i)} = B_{\text{supervised}} \, A_{\text{unsup.}} \, x^{(i)}$$

results:

• down-scaling to 5 hidden neurons suffices

• classification rate of > 90%

[Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003.]
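The two-stage structure z(i) = B A x(i) can be sketched as an unsupervised projection followed by a supervised linear readout. This is only an illustration under loud assumptions: the slides do not say how A and B were obtained, so PCA is used for A and a least-squares readout for B, and the data below are synthetic stand-ins, not the immunological data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the immunological data set).
N, p, h = 40, 30, 5                    # samples, features, hidden units
y = np.arange(N) % 2                   # alternating binary labels
X = rng.normal(size=(N, p)) + 2.0 * y[:, None]   # class 1 shifted by +2 per feature

# Unsupervised stage A (assumption: PCA onto the h leading directions).
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
A = Vt[:h]                             # (h x p) projection matrix
H = (X - mean) @ A.T                   # hidden representation, (N x h)

# Supervised stage B (assumption: least-squares readout with a bias column).
H1 = np.hstack([np.ones((N, 1)), H])
B, *_ = np.linalg.lstsq(H1, y.astype(float), rcond=None)

z = H1 @ B                             # z(i) = B A x(i), up to centering and bias
pred = (z > 0.5).astype(int)
accuracy = (pred == y).mean()
```

The accuracy here is a training accuracy on easily separable toy data and says nothing about the reported classification rate.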


Motivation 2: image segmentation

classification

• application in image processing

• ⇒ object classification

[Figure: object classification example; the class of a new object is unknown (?)]


Problem: How many labelled cells lie in this section image?


Biological background: neurogenesis

• adult neurogenesis
  • new neurons emerge even in the adult human brain
  • level depends on external stimuli
  • Are there neural ancestral cells?

• goal
  • automated quantification of neurogenesis in adult mice

cooperation with Z. Kohl, Department of Neurology, University of Regensburg


Automated cell counting

directional neural network

• train cell patch classifier ζ using directional neural network

• scan image using ζ to get cell positions

• speed-up via hierarchical and multiscale methods
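The scanning step can be sketched as a plain sliding window. Everything named here is hypothetical: `scan_image`, the patch size, the threshold, and in particular `zeta_stub`, which is a hand-written stand-in for the trained classifier ζ and not the directional neural network. The hierarchical and multiscale speed-ups are not shown.

```python
import numpy as np

def scan_image(image, zeta, patch=5, threshold=0.5):
    """Slide a patch x patch window over the image; return centers where zeta fires."""
    half = patch // 2
    hits = []
    for r in range(half, image.shape[0] - half):
        for c in range(half, image.shape[1] - half):
            window = image[r - half:r + half + 1, c - half:c + half + 1]
            if zeta(window) > threshold:
                hits.append((r, c))
    return hits

def zeta_stub(window):
    """Stand-in classifier: fires if the center pixel is bright and is the patch maximum."""
    center = window[window.shape[0] // 2, window.shape[1] // 2]
    return float(center == window.max() and center > 0.5)

# Toy image with two bright "cells".
image = np.zeros((20, 20))
image[5, 5] = 1.0
image[12, 14] = 1.0
positions = scan_image(image, zeta_stub)
```

An exhaustive scan evaluates the classifier at every pixel, which is what makes a coarse-to-fine strategy attractive for large section images.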


Results

• counting comparison with 2 experts (variability ±5%) yields 90% ± 4% accuracy

• application: considerable cell proliferation in hippocampus of epileptic mice

[Theis, Kohl, Guggenberger, Kuhn, Lang. ZANE - an algorithm for counting labelled cells in section images. Proc. MEDSIP 2004]


Statistical decision theory

setup

• input: random vector $X : \Omega \to \mathbb{R}^p$

• output: random vector $Y : \Omega \to \mathbb{R}$ or categorical output, possibly $Y \in \{0, 1\}$

• input-output relation measured by joint density $P(X, Y)$

• realization by samples (training data) $(x_i, y_i)$ for $i = 1, \dots, N$

• often collected in $(N \times p)$-matrix $\mathbf{X}$ and vector $\mathbf{y} \in \mathbb{R}^N$


Goal: prediction

• goal: learn classifier from training data ⇒ predict $y^*$ for new sample $x^*$

[Figure: scatter plot of the training data]


Linear model

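$$\hat{y} = \beta_0 + \sum_{j=1}^{p} x_j \beta_j$$

set $x_0 := 1$, then $\hat{y} = x^\top \beta$

least squares: minimize

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^\top \beta)^2 = (\mathbf{y} - \mathbf{X}\beta)^\top (\mathbf{y} - \mathbf{X}\beta)$$

⇒ $\mathbf{X}^\top (\mathbf{y} - \mathbf{X}\beta) = 0$, so

$$\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$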
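The closed-form least-squares solution can be checked numerically. The following is a minimal sketch on synthetic data; the sizes, the noise level and `beta_true` are assumptions made up for the example. It solves the normal equations directly and compares the result with NumPy's `lstsq`, which is usually the numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples, p inputs, generated from a known coefficient vector.
N, p = 200, 3
X_raw = rng.normal(size=(N, p))
beta_true = np.array([0.5, 1.0, -2.0, 0.3])     # [beta_0, beta_1, ..., beta_p]

X = np.hstack([np.ones((N, 1)), X_raw])          # prepend x_0 := 1 for the intercept
y = X @ beta_true + 0.01 * rng.normal(size=N)

# Normal equations: X^T X beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same solution via a least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimum the residual is orthogonal to the columns of X.
residual = y - X @ beta_hat
```

Solving the linear system avoids forming the explicit inverse of X^T X, which matters when the columns of X are nearly collinear.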


[Figure: training data with the linear decision boundary]

decision boundary $\{x \mid x^\top \beta = 1/2\}$


nice, but what about more complex data?

[Figure: two scatter plots of more complex two-class data]

(r = 2 and r = 10 Gaussians per class, σ = 0.2, with r means sampled from N((1, 0), I) and N((0, 1), I), respectively)
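Data of the kind described in the caption can be sampled as follows. This is a sketch under assumptions: the number of points per class, the equal component weights and the function name `sample_two_class_mixture` are not given in the slides. A linear least-squares classifier with threshold 1/2 is fitted at the end to show how the global model is applied to such data.

```python
import numpy as np

def sample_two_class_mixture(r, n_per_class=100, sigma=0.2, seed=0):
    """Each class is a mixture of r Gaussians N(m, sigma^2 I) with random means m."""
    rng = np.random.default_rng(seed)
    centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # class 0, class 1
    X, y = [], []
    for label, center in enumerate(centers):
        means = rng.normal(loc=center, scale=1.0, size=(r, 2))  # r means ~ N(center, I)
        comp = rng.integers(0, r, size=n_per_class)             # component of each point
        X.append(means[comp] + sigma * rng.normal(size=(n_per_class, 2)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = sample_two_class_mixture(r=10)

# Global linear model: least-squares fit, classify by thresholding at 1/2.
X1 = np.hstack([np.ones((len(X), 1)), X])
beta, *_ = np.linalg.lstsq(X1, y.astype(float), rcond=None)
train_error = np.mean((X1 @ beta > 0.5).astype(int) != y)
```

With r = 10 the classes typically interleave, so a single straight boundary tends to leave a noticeable training error.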


hm?

[Figure: the same two data sets, classified by the linear model]

‘global’, linear model is too rigid


Nearest-neighbor method

$$\hat{y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

where $N_k(x)$ denotes the $k$ closest points $x_i$ to $x$

• local model

• needs metric (here Euclidean)

• how to determine k?
  • smaller k ⇒ higher accuracy on the training data
  • larger k ⇒ smoother, higher generalizability
  • least-squares fitting on the training data would yield k = 1
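The nearest-neighbor average can be written in a few lines. This is a brute-force sketch with a hypothetical helper `knn_predict` and a made-up six-point training set; it computes all pairwise Euclidean distances and is only meant for small data.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Average the labels of the k nearest training points (Euclidean metric)."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]      # indices of the k closest points
    return y_train[idx].mean(axis=1)

# Toy training set: three points near the origin (class 0), three near (1, 1) (class 1).
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

X_query = np.array([[0.05, 0.05], [0.95, 0.95]])
y_hat = knn_predict(X_train, y_train, X_query, k=3)
labels = (y_hat > 0.5).astype(int)           # classify by thresholding at 1/2

# With k = 1 every training point is its own nearest neighbor,
# so the training labels are reproduced exactly.
fit_k1 = knn_predict(X_train, y_train, X_train, k=1)
```

The last lines illustrate why minimizing the training error alone would select k = 1: the fit is perfect on the training data but says nothing about generalization.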


Nearest-neighbor method, k = 10

[Figure: two panels showing the 10-nearest-neighbor decision boundary on the example data sets]

decision boundary $\{x \mid \hat{y}(x) = 1/2\}$


Nearest-neighbor method, k = 1, 2, 10

[Figure: three panels showing the nearest-neighbor decision boundaries for k = 1, 2, 10]


Statistical decisions

probabilistic view: $P(X, Y) = P(Y \mid X) P(X)$

find function $f(X)$ predicting $Y$ as well as possible w.r.t. squared error loss $L(Y, f(X)) = (Y - f(X))^2$

expected prediction error

$$\mathrm{EPE}(f) = E\,(Y - f(X))^2 = \int (y - f(x))^2 \, P(dx, dy) = E_X E_{Y|X}\big((Y - f(X))^2 \mid X\big)$$

pointwise minimization suffices

$$f(x) = \operatorname{argmin}_c E_{Y|X}\big((Y - c)^2 \mid X = x\big)$$

solved at conditional expectation (regression function)

$$f(x) = E(Y \mid X = x)$$
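The step from the pointwise minimization to the conditional expectation can be spelled out with the usual bias-variance style expansion. The shorthand $\mu(x)$ is introduced here only for this derivation.

```latex
\begin{aligned}
\mu(x) &:= E(Y \mid X = x), \\
E_{Y|X}\big((Y - c)^2 \mid X = x\big)
  &= E_{Y|X}\big((Y - \mu(x))^2 \mid X = x\big)
   + 2\,(\mu(x) - c)\,\underbrace{E_{Y|X}\big(Y - \mu(x) \mid X = x\big)}_{=\,0}
   + (\mu(x) - c)^2 \\
  &= \operatorname{Var}(Y \mid X = x) + (\mu(x) - c)^2 .
\end{aligned}
```

The first term does not depend on $c$, and the second is minimized, with value zero, at $c = \mu(x)$.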


$$f(x) = E(Y \mid X = x)$$

can be estimated by

$$\hat{f}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$$

• approximate expectation via sample averages

• relax conditioning at a point to conditioning on a local neighborhood

• note: $\hat{f}(x) \to E(Y \mid X = x)$ for $N, k \to \infty$, $k/N \to 0$

• but:
  • (very) finite samples
  • ‘curse’ of dimensionality
    • to cover a fraction r of the unit cube in p dimensions, a subcube needs edge length $e_p(r) = r^{1/p}$
    • $e_2(0.01) = 0.1$, $e_2(0.1) = 0.32$
    • $e_{10}(0.01) = 0.63$, $e_{10}(0.1) = 0.79$
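The edge-length formula is easy to evaluate directly. The helper name `edge_length` is hypothetical; the loop prints the values to three decimals for the same p and r used in the bullets.

```python
def edge_length(r, p):
    """Edge length of a subcube covering a fraction r of the unit cube in p dimensions."""
    return r ** (1.0 / p)

for p in (2, 10):
    for r in (0.01, 0.1):
        print(f"e_{p}({r}) = {edge_length(r, p):.3f}")
```

In 10 dimensions, capturing just 1% of uniformly distributed data already requires a neighborhood spanning about 63% of each coordinate range, so such "local" neighborhoods are no longer local.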


if, instead of approximating $f(x) = E(Y \mid X = x)$, we assume the linear model $f(x) = x^\top \beta$, we get

$$\beta = E(XX^\top)^{-1} E(XY)$$

• no conditioning, global approximation
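The expression for β follows from setting the gradient of the expected prediction error to zero. The derivation below assumes that $E(XX^\top)$ is invertible; the last line connects the population solution to the sample least-squares estimate by replacing expectations with averages over the training data.

```latex
\begin{aligned}
\frac{\partial}{\partial \beta}\, E\,(Y - X^\top \beta)^2
  &= -2\, E\big(X\,(Y - X^\top \beta)\big) = 0 \\
\Longrightarrow\quad E(XX^\top)\,\beta &= E(XY)
  \quad\Longrightarrow\quad \beta = E(XX^\top)^{-1} E(XY), \\
E(XX^\top) \approx \tfrac{1}{N}\,\mathbf{X}^\top \mathbf{X}, \quad
E(XY) \approx \tfrac{1}{N}\,\mathbf{X}^\top \mathbf{y}
  \quad&\Longrightarrow\quad \hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}.
\end{aligned}
```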


Statistical decisions for discrete Y

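if $Y \in \{0, 1\}$, consider loss function

$$L(Y, f(X)) = \begin{cases} 0 & \text{if } f(X) = Y \\ 1 & \text{otherwise} \end{cases}$$

then $\mathrm{EPE} = E_X \sum_{y \in \{0,1\}} L(y, f(X)) \, P(y \mid X)$ and hence

$$\hat{Y}(x) = \operatorname{argmin}_{y_0 \in \{0,1\}} \sum_{y \in \{0,1\}} L(y, y_0) \, P(y \mid X = x) = \operatorname{argmin}_{y_0 \in \{0,1\}} \big(1 - P(y_0 \mid X = x)\big)$$

which yields the Bayes classifier

$$\hat{Y}(x) = \operatorname{argmax}_y P(y \mid X = x)$$

question: how to model $P(Y \mid X)$?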
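When the class-conditional densities are known, the Bayes classifier can be evaluated exactly. The sketch below assumes each class is an equal-weight mixture of isotropic Gaussians with known means and equal class priors; the means, the query points and the function names are made up for illustration. In practice P(Y | X) is unknown and has to be modeled, which is the question raised above.

```python
import numpy as np

def gaussian_pdf(x, mean, sigma):
    """Isotropic 2-D Gaussian density N(mean, sigma^2 I) evaluated at the rows of x."""
    d2 = ((x - mean) ** 2).sum(axis=1)
    return np.exp(-0.5 * d2 / sigma**2) / (2.0 * np.pi * sigma**2)

def bayes_classify(x, means0, means1, sigma=0.2, prior1=0.5):
    """Return argmax_y P(y | x) and the posterior P(y = 1 | x)."""
    # Class-conditional densities: equal-weight Gaussian mixtures.
    p_x_given_0 = np.mean([gaussian_pdf(x, m, sigma) for m in means0], axis=0)
    p_x_given_1 = np.mean([gaussian_pdf(x, m, sigma) for m in means1], axis=0)
    # Bayes' rule: posterior is proportional to prior times likelihood.
    joint0 = (1.0 - prior1) * p_x_given_0
    joint1 = prior1 * p_x_given_1
    post1 = joint1 / (joint0 + joint1)
    return (post1 > 0.5).astype(int), post1

# Hypothetical component means; class 1 is the mirror image of class 0.
means0 = np.array([[1.0, 0.0], [1.5, -0.5]])
means1 = np.array([[0.0, 1.0], [-0.5, 1.5]])

x = np.array([[1.0, 0.1], [0.1, 1.0], [0.5, 0.5]])
labels, post1 = bayes_classify(x, means0, means1)
```

The third query point lies on the symmetry axis between the two classes, so its posterior is 1/2 and it sits on the decision boundary. Far from all means the densities can underflow to zero; a log-domain implementation would be more robust.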


Bayes classifier results

[Figure: three plots of Bayes classifier results]


Method combinations

• nonlinear models, e.g. $f(x) = \sum_{j=1}^{p} f_j(x_j)$, or basis expansion $f(x) = \sum_j h_j(x)\beta_j$ with polynomial, Fourier or sigmoidal bases (→ neural networks)

• prediction/function approximation by maximum-likelihood estimation of parameters

• enhance generalizability by adding a regularization term $+\lambda J(f)$ to $\mathrm{RSS}(f)$ for $f$ from some function class

• generalize inner-product methods to nonlinear situations by high-dimensional embedding $x \mapsto \Phi(x)$ and kernels $k(x, x') = \Phi(x)^\top \Phi(x')$
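Two of these ingredients, basis expansion and regularization, combine naturally. A minimal sketch, assuming a polynomial basis $h_j(x) = x^j$ and the penalty $J(f) = \|\beta\|^2$ (ridge regression); both choices, the function names and the synthetic data are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Design matrix H with H[t, j] = x_t ** j for j = 0..degree."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def ridge_fit(H, y, lam):
    """Minimize RSS + lam * ||beta||^2 in closed form."""
    n_basis = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n_basis), H.T @ y)

# Synthetic data (assumption): noisy samples of a sine curve
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

H = polynomial_basis(x, degree=9)
beta_ols = ridge_fit(H, y, lam=0.0)      # no regularization
beta_ridge = ridge_fit(H, y, lam=1e-2)   # regularized coefficients
```

The regularized coefficients have smaller norm at the price of a slightly larger residual sum of squares, which is the trade-off controlled by $\lambda$.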


Outline

1 Supervised methods
  Motivation 1: classification
  Motivation 2: image segmentation
  Statistical decision theory

2 Unsupervised methods
  Clustering
  k-means
  Partitional clustering

3 Signal component analysis
  Independent component analysis
  Independent subspace analysis
  Sparse component analysis
  Nonlinear sparse component analysis

4 Conclusions


Clustering

• explanation by example
  • goal: differentiate hand-written digits 2 and 4
  • given a set of unknown gray-scale images of 2s and 4s, find the subset of 2s and the subset of 4s

• versus
  • unsupervised learning by example
  • like a baby: [figure]


Example data set

• here: machine learning, i.e. statistical approach

• needs many test cases: here 1000 28x28 images each

• interpret each 28x28 image as an element of $\mathbb{R}^{784}$

• dimension reduction via PCA to only 2 dimensions

[Figure: the data set after PCA projection to 2 dimensions]
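The preprocessing described here (flatten each image to a vector, then reduce to two dimensions by PCA) can be sketched as follows. This is a minimal illustration under assumptions: the images are random stand-ins rather than the digit data, and `pca_project` is a hypothetical helper that computes PCA through a singular value decomposition.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
images = rng.random((200, 28, 28))       # synthetic stand-in for digit images
X = images.reshape(len(images), -1)      # each image becomes a vector in R^784
Z = pca_project(X, n_components=2)       # 2-D coordinates used for clustering
```

The rows of `Z` are the points that a clustering algorithm would then operate on.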


k-means

• clustering:
  • data vectors (samples) $x(1), x(2), \ldots, x(T) \in \mathbb{R}^n$
  • distance measure $d(x, y)$ between samples

• algorithm: k-means
  • given number $k$ of clusters
  • initialize centroids randomly
  • update rules: batch or sequential (online)

• cost function
  • minimize $E(c_i, C_i) := \sum_{i=1}^{k} \frac{1}{|C_i|} \sum_{x \in C_i} d(x, c_i)^2$

[Figure sequence: samples and centroids. Batch k-means: partition, assignment. Sequential k-means: arbitrary sample, nearest centroid, update.]

[Theis, Gruber. Grassmann clustering. Proc. EUSIPCO 2006]
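The sequential (online) update rule is only shown pictorially on the slides: pick an arbitrary sample, find the nearest centroid, move that centroid. A minimal sketch of one common variant follows, assuming Euclidean distance and a step size of 1/count per centroid (a running mean); the step-size rule, the function name and the synthetic data are assumptions, not taken from the source.

```python
import numpy as np

def sequential_kmeans(X, k, n_steps=5000, seed=0):
    """Online k-means: one randomly drawn sample updates one centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_steps):
        x = X[rng.integers(len(X))]                            # arbitrary sample
        i = np.argmin(np.linalg.norm(centroids - x, axis=1))   # nearest centroid
        counts[i] += 1
        centroids[i] += (x - centroids[i]) / counts[i]         # move toward x
    return centroids

# Synthetic data (assumption): two Gaussian blobs in the plane
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(5.0, 0.5, size=(100, 2))])
centroids = sequential_kmeans(X, k=2)
```

With this step size each centroid is the mean of the samples assigned to it so far; a constant step size would instead track slowly changing data.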


Batch k-means

[Figure sequence: k-means after 1, 2, ..., 7 iterations; done: error 4.5%]


Partitional clustering

• goal:
  • given a set $A$ of points in a metric space $(M, d)$
  • find a partition of $A$ into $B_i$, $\bigcup_i B_i = A$, and centroids $c_i \in M$ minimizing

$$E(B_1, c_1, \ldots, B_k, c_k) := \sum_{i=1}^{k} \sum_{a \in B_i} d(a, c_i)^2. \tag{1}$$

• $A = \{a_1, \ldots, a_T\}$ ⇒ constrained non-linear optimization problem
  • minimize

$$E(W, C) := \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it}\, d(a_t, c_i)^2 \tag{2}$$

subject to

$$w_{it} \in \{0, 1\}, \quad \sum_{i=1}^{k} w_{it} = 1 \quad \text{for } 1 \le i \le k,\ 1 \le t \le T. \tag{3}$$

• $C := \{c_1, \ldots, c_k\}$ centroid locations, $W := (w_{it})$ partition matrix
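To make the partition-matrix formulation concrete, the cost (2) and the constraint (3) can be evaluated directly. A minimal sketch, assuming the Euclidean distance for $d$; the helper names and the three-point example are hypothetical.

```python
import numpy as np

def cost_from_partition_matrix(A, C, W):
    """E(W, C) = sum_i sum_t w_it * d(a_t, c_i)^2 with Euclidean d."""
    # sq_dist[i, t] = ||a_t - c_i||^2
    sq_dist = ((C[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return float((W * sq_dist).sum())

def is_valid_partition(W):
    """Constraint (3): entries in {0, 1}, every column sums to 1."""
    binary = np.isin(W, [0, 1]).all()
    return bool(binary and (W.sum(axis=0) == 1).all())

A = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]])   # T = 3 points
C = np.array([[0.0, 1.0], [4.0, 0.0]])               # k = 2 centroids
W = np.array([[1, 1, 0],
              [0, 0, 1]])                            # w_it = 1: point t in cluster i
E = cost_from_partition_matrix(A, C, W)
```

Each column of `W` selects exactly one cluster for the corresponding point, so the double sum in (2) reduces to the per-cluster sums in (1).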


Minimize this!

• common approach: partial optimization for $W$ and $C$
  • alternate minimization of either $W$ or $C$ while keeping the other one fixed

• ⇒ batch k-means algorithm
  • initial random choice of centroids $c_1, \ldots, c_k$
  • iterate until convergence:
    • cluster assignment: for each $a_t$ determine an index $i(t)$ such that

$$i(t) = \operatorname{argmin}_i d(a_t, c_i)$$

    • cluster update: within each cluster $B_i := \{a_t \mid i(t) = i\}$ determine the centroid $c_i$ by minimizing

$$c_i := \operatorname{argmin}_c \sum_{a \in B_i} d(a, c)^2$$

• convergence to local minimum (??)
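The alternating scheme above can be sketched in a few lines. This is an illustrative implementation under assumptions: Euclidean distance (so the cluster update is the cluster mean), centroids initialized at randomly chosen data points, and an empty cluster simply keeping its previous centroid. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def batch_kmeans(A, k, n_iter=100, seed=0):
    """Alternate cluster assignment and cluster update until C stops moving."""
    rng = np.random.default_rng(seed)
    C = A[rng.choice(len(A), size=k, replace=False)].astype(float)
    costs = []
    for _ in range(n_iter):
        # cluster assignment: i(t) = argmin_i d(a_t, c_i)
        sq_dist = ((A[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        costs.append(float(sq_dist[np.arange(len(A)), labels].sum()))
        # cluster update: for Euclidean d the minimizer is the cluster mean
        new_C = C.copy()
        for i in range(k):
            members = A[labels == i]
            if len(members) > 0:        # empty cluster keeps its old centroid
                new_C[i] = members.mean(axis=0)
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, labels, costs

# Synthetic data (assumption): two Gaussian blobs in the plane
rng = np.random.default_rng(1)
A = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(4.0, 0.5, size=(100, 2))])
C, labels, costs = batch_kmeans(A, k=2)
```

Neither step can increase the cost, so the recorded `costs` are non-increasing. This guarantees that the iteration settles, though not that it reaches the global minimum.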


Euclidean case

• special case: $M := \mathbb{R}^n$ and the Euclidean distance $d(x, y) := \|x - y\|$

• centroids can be calculated in closed form:
  • centroid is given by the cluster mean

$$c_i := \frac{1}{|B_i|} \sum_{a \in B_i} a$$

  • this follows directly from

$$\sum_{a \in B_i} \|a - c_i\|^2 = \sum_{a \in B_i} \sum_{j=1}^{n} (a_j - c_{ij})^2 = \sum_{j=1}^{n} \sum_{a \in B_i} \left(a_j^2 - 2 a_j c_{ij} + c_{ij}^2\right)$$
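The remaining step is left implicit on the slide. One way to complete it, as a sketch: the sum decouples over the coordinates $j$, so each $c_{ij}$ can be minimized separately by setting the derivative to zero.

$$\frac{\partial}{\partial c_{ij}} \sum_{a \in B_i} \left(a_j^2 - 2 a_j c_{ij} + c_{ij}^2\right) = \sum_{a \in B_i} \left(-2 a_j + 2 c_{ij}\right) = 0 \quad\Longleftrightarrow\quad c_{ij} = \frac{1}{|B_i|} \sum_{a \in B_i} a_j$$

The second derivative is $2|B_i| > 0$, so this stationary point is the minimum, and collecting the coordinates gives the cluster mean.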


Extensions

$$c_i := \operatorname{argmin}_c \sum_{a \in B_i} d(a, c)^p$$

• more difficult optimization problems:
  • non-Euclidean spaces, e.g. $\mathbb{RP}^n$ or Grassmann manifolds
  • extensions from $p = 2$ to e.g. $p = 1$ or $p <$
  • $p = 1$ corresponds to finding the spatial median of $B_i$
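For $p = 1$ in Euclidean space there is no closed-form centroid, which is one reason the problem is harder. A standard iterative scheme is Weiszfeld's algorithm; the slides do not name a method, so the sketch below is an assumption about how the spatial median could be computed, with a hypothetical function name and example points.

```python
import numpy as np

def spatial_median(B, n_iter=200, tol=1e-10):
    """Weiszfeld iteration for argmin_c sum_a ||a - c|| (the p = 1 case)."""
    c = B.mean(axis=0)                       # start from the p = 2 solution
    for _ in range(n_iter):
        dist = np.linalg.norm(B - c, axis=1)
        dist = np.maximum(dist, 1e-12)       # guard against division by zero
        weights = 1.0 / dist
        c_new = (weights[:, None] * B).sum(axis=0) / weights.sum()
        if np.linalg.norm(c_new - c) < tol:
            return c_new
        c = c_new
    return c

# Three nearby points and one outlier (illustrative data)
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
median = spatial_median(B)
mean = B.mean(axis=0)
```

The outlier pulls `mean` far from the bulk of the points, while `median` stays close to them, which is the usual motivation for moving from $p = 2$ to $p = 1$.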



Independent component analysis
example: Cocktail party problem of the brain

[Figure: panels labeled auditory cortex, word detection, decision, auditory cortex 2]

[Keck, Theis, Gruber, Lang, Specht, Puntonet. 3D spatial analysis of fMRI data on a word perception task. LNCS, 3195:977-984]


BSS model

• Blind source separation (BSS) problem

$$x(t) = A s(t) + \varepsilon(t)$$

  • $x(t)$: observed $m$-dimensional random vector
  • $A$: (unknown) full-rank $m \times n$ matrix
  • $s(t)$: (unknown) $n$-dimensional source signals (here: $n \le m$)
  • $\varepsilon(t)$: (unknown) white noise

• goal: given $x$, recover $A$ and $s$!

• additional assumptions necessary
  • stochastically independent $s(t)$: $p_s(s_1, \ldots, s_n) = p_{s_1}(s_1) \cdots p_{s_n}(s_n)$ ⇒ independent component analysis (ICA)
  • sparse source signals $s_i(t)$ ⇒ sparse component analysis (SCA)
  • nonnegative $s$ and $A$ ⇒ nonnegative matrix factorization (NMF)
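The generative model can be simulated directly, which helps to see what is observed and what is hidden. A minimal sketch with made-up dimensions, a made-up mixing matrix and Laplace-distributed sources (all assumptions for illustration). The last line uses the true `A`, which a real BSS algorithm does not have, only to show that the model is invertible when $n \le m$ and `A` has full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 3, 1000                       # sources, sensors, samples (illustrative)

s = rng.laplace(size=(n, T))               # independent, non-Gaussian sources
A = np.array([[1.0,  0.5],
              [0.5,  1.0],
              [1.0, -1.0]])                # hypothetical full-rank m x n mixing matrix
eps = 0.01 * rng.standard_normal((m, T))   # white noise
x = A @ s + eps                            # observed mixtures

# With A known, least squares recovers s up to the noise level.
s_hat = np.linalg.pinv(A) @ x
```

The blind problem is to obtain something like `s_hat` from `x` alone, which is why the additional assumptions listed above are needed.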


• important questions in data analysis
  • model? (restrictions to $A$ and $s$)
  • indeterminacies of the model?
  • algorithmic identification given $x$?

• identifiability
  • obvious indeterminacies: scaling $L$ and permutation $P$

Theorem

Let the independent random vector $s \in L^2$ contain at most one Gaussian component. Given two ICA solutions $As = A's'$, then $A = A'LP$.

Note: the theorem does not hold for Gaussian sources $s$.

[Theis. A new concept for separability problems in blind source separation. Neural Computation, 2004]

F. Theis Biomedical signal processing — application of optimization methods for machine learning problems

Page 90: Biomedical signal processing --- application of ... · image? F. Theis Biomedical signal processing — application of optimization methods for machine learning problems. Supervised

Supervised methodsUnsupervised methods

Signal component analysisConclusions

Independent component analysisIndependent subspace analysisSparse component analysisNonlinear sparse component analysis

• important questions in data analysis• model? (restrictions to A and s)• indeterminacies of the model?• algorithmic identification given x?

• identifiability• obvious indeterminacies: scaling L and permutation P

Theorem

Let the independent random vector s ∈ L2 contain at most one gaussiancomponent. Given two ICA solutions As = A′s′, then A = A′LP.

Note: theorem does not hold for gaussiansources s.

[Theis. A new concept for separability problems in blind source separation. Neural Computation, 2004]

F. Theis Biomedical signal processing — application of optimization methods for machine learning problems

Page 91: Biomedical signal processing --- application of ... · image? F. Theis Biomedical signal processing — application of optimization methods for machine learning problems. Supervised

Supervised methodsUnsupervised methods

Signal component analysisConclusions

Independent component analysisIndependent subspace analysisSparse component analysisNonlinear sparse component analysis

• important questions in data analysis• model? (restrictions to A and s)• indeterminacies of the model?• algorithmic identification given x?

• identifiability• obvious indeterminacies: scaling L and permutation P

Theorem

Let the independent random vector s ∈ L2 contain at most one gaussiancomponent. Given two ICA solutions As = A′s′, then A = A′LP.

Note: theorem does not hold for gaussiansources s.

[Theis. A new concept for separability problems in blind source separation. Neural Computation, 2004]

F. Theis Biomedical signal processing — application of optimization methods for machine learning problems

Page 92: Biomedical signal processing --- application of ... · image? F. Theis Biomedical signal processing — application of optimization methods for machine learning problems. Supervised

Supervised methodsUnsupervised methods

Signal component analysisConclusions

Independent component analysisIndependent subspace analysisSparse component analysisNonlinear sparse component analysis

• important questions in data analysis
  • model? (restrictions to A and s)
  • indeterminacies of the model?
  • algorithmic identification given x?

• identifiability
  • obvious indeterminacies: scaling L and permutation P

Theorem

Let the independent random vector s ∈ L² contain at most one Gaussian component. If As = A′s′ are two ICA solutions, then A = A′LP.

[Figure: surface plot of a two-dimensional density; both horizontal axes run from −2 to 2, the vertical axis from 0 to 0.16]

Note: theorem does not hold for Gaussian sources s.

[Theis. A new concept for separability problems in blind source separation. Neural Computation, 2004]
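The scaling and permutation indeterminacies can be checked numerically. In the sketch below the matrices L and P and the source distribution are arbitrary illustrative choices: rescaling and reordering the sources while compensating in the mixing matrix leaves the observations unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 3, 500
A = rng.standard_normal((n, n))          # mixing matrix
s = rng.uniform(-1.0, 1.0, size=(n, T))  # independent, non-Gaussian sources

L = np.diag([2.0, -0.5, 3.0])            # invertible scaling (assumed example)
P = np.eye(n)[[2, 0, 1]]                 # permutation matrix (assumed example)

# Second ICA solution: rescaled and reordered sources stay independent.
s_prime = L @ P @ s
A_prime = A @ np.linalg.inv(L @ P)

same_mixtures = np.allclose(A @ s, A_prime @ s_prime)   # As = A's'
relation_holds = np.allclose(A, A_prime @ L @ P)        # A = A'LP
```

Both flags are true by construction, so the observations alone cannot distinguish (A, s) from (A′, s′).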


ICA algorithms

• basic scheme of ICA algorithms (case m = n)
  • search for invertible demixing matrix W that minimizes some dependence measure of Wx

• some contrasts
  • minimize mutual information I(Wx) (?)
  • maximize neural network output entropy H(f(Wx)) (?)
  • extend PCA by performing nonlinear decorrelation (?)
  • maximize non-Gaussianity of output components (Wx)_i (?)
  • minimize off-diagonal error of the Hessian H_{ln p_{Wx}} of the log-density
  • minimize median deviation of Wx

[Theis et al. Linear geometric ICA: Fundamentals and algorithms. Neural Computation, 2003]

[Theis, Lang, Puntonet. A geometric algorithm for overcomplete linear ICA. Neurocomputing, 2004]
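As a minimal illustration of the "maximize non-Gaussianity" contrast, the sketch below whitens a two-dimensional mixture and then searches the remaining rotation angle on a grid, using the summed absolute kurtosis as the contrast. The uniform sources, the mixing matrix and the grid search are assumptions made for this example; it is not one of the algorithms cited on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))  # independent, unit variance
A = np.array([[1.0, 0.6], [0.4, 1.0]])                 # assumed mixing matrix
x = A @ s

# Whitening: after this step only a rotation remains to be found.
x = x - x.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(x))
V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
z = V @ x

def rotation(theta):
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[c, -si], [si, c]])

def abs_kurtosis(y):
    # For (approximately) unit-variance rows: kurtosis = E[y^4] - 3.
    return np.abs((y ** 4).mean(axis=1) - 3.0).sum()

# Grid search over the rotation angle for the most non-Gaussian outputs.
thetas = np.linspace(0.0, np.pi / 2, 500, endpoint=False)
best_theta = max(thetas, key=lambda t: abs_kurtosis(rotation(t) @ z))

W = rotation(best_theta) @ V   # estimated demixing matrix
G = W @ A                      # close to a scaled permutation if separation worked
```

In higher dimensions the grid search is replaced by an optimization over W, which is the topic of the next slide.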


Optimization

• problem: minimize cost function f(W) on Gl(n) or O(n)

• often: gradient descent: ∆W ∝ −∇f(W)

• in high dimensions: simulated annealing or genetic algorithms

• use non-Euclidean structure of Gl(n)
  • Euclidean gradient not compatible with group Gl(n)
  • define natural gradient

    $\nabla_{\mathrm{nat}} f(W) = \nabla_{\mathrm{euc}} f(W)\, W^\top W$

  ⇒ considerable performance increase

[Stadlthanner, Theis, Puntonet, Lang. Extended sparse nonnegative matrix factorization. LNCS, 3512:249-256]

[Squartini, Theis. New Riemannian metrics for speeding-up the convergence of over- and underdetermined ICA. In preparation]

[Theis. Gradients on matrix manifolds and their chain rule. Submitted to NIPS LR, 2005]
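The natural-gradient formula can be tried on a concrete cost. The sketch below assumes the maximum-likelihood-type cost f(W) = −log|det W| + Σ_i E[log cosh (Wx)_i], which is a common choice for super-Gaussian sources but is not taken from the slides; the step size, iteration count and data are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2, 10000
s = rng.laplace(scale=1 / np.sqrt(2), size=(n, T))  # unit-variance super-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])              # assumed mixing matrix
x = A @ s

W = np.eye(n)
eta = 0.1
for _ in range(500):
    y = W @ x
    # Euclidean gradient of f(W) = -log|det W| + sum_i E[log cosh(y_i)]
    grad_euc = -np.linalg.inv(W).T + np.tanh(y) @ x.T / T
    # Natural gradient: grad_euc W^T W = (E[tanh(y) y^T] - I) W
    grad_nat = grad_euc @ W.T @ W
    W -= eta * grad_nat

G = W @ A   # close to a scaled permutation if separation worked
```

Multiplying by WᵀW cancels the matrix inverse in the Euclidean gradient, so the effective update (E[tanh(y) yᵀ] − I) W needs no inversion and behaves the same way regardless of how ill-conditioned the mixing matrix is.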


fMRI analysis

spatial-only BSS

• functional magnetic resonance imaging (fMRI)

• noninvasive brain imaging technique ⇒ information on brain activation patterns

• activation maps help identify task-related brain regions

• BSS techniques for fMRI possible, see (?).


Experimental setup

• experiment
  • block design protocol:
    • 5 time instants of visual stimulation
    • 5 instants of rest
  • 100 scans taking 3 s each

• data set
  • well-known design → expected activity in visual cortex
  • here: use only a single horizontal slice

• preprocessing
  • motion correction
  • smoothing

data acquired by D. Auer, MPI of Psychiatry, Munich


Results

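[Figure, panel (a): spatial sources sS, components 1 to 4]

[Figure, panel (b): temporal sources tS, components 1 to 4, with correlation coefficients cc = 0.18 (component 1), 0.00 (component 2), 0.05 (component 3), 0.90 (component 4)]

• component 2 partially represents the frontal eye fields
• component 4: stimulus component, cc = 0.9 with stimulus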

[Theis, Gruber, Keck, Lang. Functional MRI analysis by a novel spatiotemporal ICA algorithm. LNCS 3696:677-682]
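The correlation coefficients (cc) quoted above compare each temporal source with the stimulus protocol. A hedged sketch of that comparison for a block design of 5 scans stimulation and 5 scans rest over 100 scans follows; whether the protocol starts with stimulation or rest is an assumption here, and the component time course is synthetic, not the recorded data.

```python
import numpy as np

n_scans, block = 100, 5

# Boxcar regressor: 1 during stimulation, 0 during rest.
# Assumption: the protocol starts with a stimulation block.
stimulus = ((np.arange(n_scans) // block) % 2 == 0).astype(float)

def cc(a, b):
    """Pearson correlation coefficient between two time courses."""
    return np.corrcoef(a, b)[0, 1]

# Synthetic stand-in for a task-related temporal source.
rng = np.random.default_rng(0)
time_course = stimulus + 0.3 * rng.standard_normal(n_scans)
r = cc(time_course, stimulus)
```

A component whose `cc` with the regressor is close to 1, like component 4 above, is a candidate for the task-related activity.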


Independent subspace analysis


Why extend ICA?

• identifiability of ICA only holds if data follows generative model with independent sources

• simulation
  • apply ICA to data not fulfilling the ICA model
  • here sources consist of a 2-d and a 1-d irreducible component
  • plot Amari-error over 100 runs

[Figure: crosstalking error (axis from 0 to 4) of FastICA, JADE and Extended Infomax]

result: no recovery of mixing matrix
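The Amari (crosstalking) error measures how far G = WA is from a scaled permutation, where W is the estimated demixing matrix and A the true mixing matrix. The sketch below uses one common unnormalized form; the exact normalization behind the plot is not stated in the slides.

```python
import numpy as np

def amari_error(G):
    """Crosstalking error of G = W A; zero iff G is a scaled permutation."""
    G = np.abs(np.asarray(G, dtype=float))
    # Each row and each column should contain exactly one dominant entry.
    rows = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return rows.sum() + cols.sum()

perfect = amari_error([[0.0, 2.0, 0.0],
                       [0.0, 0.0, -3.0],
                       [0.5, 0.0, 0.0]])   # scaled permutation
mixed = amari_error(np.ones((2, 2)))       # complete crosstalk
```

Because the error ignores scaling and permutation, it respects exactly the indeterminacies that ICA cannot resolve.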


Independent subspace analysis

• require stochastic independence only between groups of source components

• nk-dimensional S is to be k-independent, i.e.

  $\begin{pmatrix} S_1 \\ \vdots \\ S_k \end{pmatrix}, \ldots, \begin{pmatrix} S_{nk-k+1} \\ \vdots \\ S_{nk} \end{pmatrix}$

  mutually independent ⇒ independent subspace analysis (ISA)

• recent result: extension to arbitrary group-size

• major advantage: general independent subspace analysis (ISA) always exists

[Theis. Uniqueness of complex and multidimensional independent component analysis. Signal Processing, 2004]
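To make group-wise independence concrete, the sketch below builds sources consisting of one two-dimensional group and one one-dimensional group. The specific distributions (a point uniformly distributed on the unit circle, and a Laplacian) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10000

phi = rng.uniform(0.0, 2.0 * np.pi, T)
group_2d = np.vstack([np.cos(phi), np.sin(phi)])  # uncorrelated, yet dependent
group_1d = rng.laplace(size=(1, T))               # independent of the 2-d group
S = np.vstack([group_2d, group_1d])               # sources with group sizes (2, 1)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

linear = corr(S[0], S[1])             # near 0: decorrelation does not see the dependence
within = corr(S[0] ** 2, S[1] ** 2)   # -1: strong dependence inside the group
across = corr(S[0] ** 2, S[2] ** 2)   # near 0: no dependence across groups
```

The first two components cannot be made independent by any linear map, so ordinary ICA applied to mixtures of `S` has no valid solution, whereas an ISA with groups of size 2 and 1 does.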


PCA

[Figure: schematic with labels X, S and A]


ICA

[Figure: schematic with labels X, S, L, P and A]


ISA with fixed group size

[Figure: schematic with labels X, S, L, P and A]


General ISA

[Figure: schematic with labels X, S, L, P and A]


ISA framework

Definition

Y is an independent component of X :⇔ there is an invertible A with X = A(Y, Z) such that Y and Z are stochastically independent.

Definition (general ISA)

• S is irreducible if it contains no lower-dimensional independent component

• W ∈ Gl(n) is an independent subspace analysis of X :⇔ WX = (S_1, …, S_k) with pairwise independent, irreducible S_i

Theorem

Given a random vector X with existing covariance, an ISA of X exists and is unique except for scaling and permutation.


Algebraic ISA algorithms

• main idea: source condition matrices C_i(S) are block-diagonal

• subspace JADE
  • after whitening assume orthogonal A
  • group-independence of S: contracted quadricovariance matrices C_ij(S) are block-diagonal
  • perform joint block diagonalization of C_ij(X) to get Aᵀ

• for general ISA, estimate block-structure after diagonalization

[Figure: schematic of $C_{ij}(S) = A^\top C_{ij}(X)\, A$ with block-diagonal C_ij(S)]

[Theis. Towards a general independent subspace analysis. NIPS 2006 accepted]


Joint Block Diagonalization with unknown block-sizes

Joint Block Diagonalization (JBD)

• given n × n matrices C_1, …, C_K and a partition m = (m_1, …, m_r) of n, i.e. m_1 + ⋯ + m_r = n

• goal: find orthogonal A such that ∀k: AᵀC_kA is m-block-diagonal

⇒ minimize (e.g. by applying iterative Givens rotations)

$f^m(A) := \sum_{k=1}^{K} \left\| A^\top C_k A - \operatorname{diagM}^m(A^\top C_k A) \right\|_F^2$

unknown block size m ⇒ general JBD then searches for maximal-length block structure, i.e.

$(A, m) = \operatorname{argmax}_{m \,\mid\, \exists A:\, f^m(A) = 0} |m|$

result: JBD by JD: any block-optimal JBD, i.e. a zero of f^m, is a local minimum of ordinary joint diagonalization.
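The cost f^m(A) is straightforward to evaluate. The sketch below only computes the cost and checks it on synthetic matrices; it does not implement the Givens-rotation minimization, and the helper names `block_mask` and `jbd_cost` as well as the test data are assumptions made for this example.

```python
import numpy as np

def block_mask(m):
    """Boolean n x n mask that is True inside the diagonal blocks of sizes m."""
    n = sum(m)
    mask = np.zeros((n, n), dtype=bool)
    start = 0
    for size in m:
        mask[start:start + size, start:start + size] = True
        start += size
    return mask

def jbd_cost(A, Cs, m):
    """f^m(A) = sum_k || A^T C_k A - diagM^m(A^T C_k A) ||_F^2."""
    off = ~block_mask(m)   # entries outside the diagonal blocks
    return sum(np.sum((A.T @ C @ A)[off] ** 2) for C in Cs)

rng = np.random.default_rng(0)
m = (1, 2, 3)
n = sum(m)
A_true, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal matrix
blocks = [rng.standard_normal((n, n)) * block_mask(m) for _ in range(5)]
Cs = [A_true @ B @ A_true.T for B in blocks]           # A_true^T C_k A_true = B_k

cost_true = jbd_cost(A_true, Cs, m)     # zero up to rounding
cost_id = jbd_cost(np.eye(n), Cs, m)    # clearly positive
```

Ordinary joint diagonalization is the special case m = (1, …, 1), which is why a solver for it can serve as a starting point when the block sizes are unknown.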


Example

[Figure: three 40 × 40 matrix plots: (unknown) C1; ÂᵀA without recovered permutation P; ÂᵀA]

• performance of the proposed general JBD
  • (unknown) block-partition 40 = 1 + 2 + 2 + 3 + 3 + 5 + 6 + 6 + 6 + 6
  • additive noise with SNR of 5 dB, K = 100 matrices
  • result: estimate Â equals A after permutation recovery


Extraction of fetal electrocardiograms

• separate fetal ECG (FECG) recordings from the mother’s ECG (MECG)

• apply Hessian-based MICA with k = 2 and 500 Hessians


[Figure: four panels, each showing three traces over samples 0 to 500: (a) ECG recordings, (b) extracted sources, (c) MECG part, (d) FECG part]

[Theis. Blind signal separation into groups of dependent signals using joint block diagonalization. Proc. ISCAS 2005]


Sparse component analysis

sparse

[Theis, Puntonet, Lang. Median-based clustering for underdetermined blind signal processing. IEEE SPL, 2005]


Model

• Sparse Component Analysis (SCA) problem

x(t) = As(t)

• observed mixtures x(t) ∈ ℝ^m
• A (unknown) real matrix with linearly independent columns
• s(t) (unknown) (m − 1)-sparse sources s(t) ∈ ℝ^n, i.e. s(t) has at most (m − 1) non-zeros

• goal: recover unknown A and s(t) given only x(t)

Theorem

If s(t) is (m − 1)-sparse and A and s(t) are in 'general position', both A and s(t) are identifiable (except for scaling and permutation).

[Georgiev, Theis, Cichocki. Sparse component analysis and blind source separation of underdetermined mixtures. IEEE TNN, 2005]
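The geometric content of the model is that every mixture x(t) lies on a hyperplane spanned by m − 1 columns of A. The sketch below checks this on synthetic data with m = 3 sensors and n = 4 sources, reading 'general position' as "any m columns of A are linearly independent"; that reading and all numerical values are assumptions of this example.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
m, n, T = 3, 4, 200
A = rng.standard_normal((m, n))   # random matrix, in general position with probability 1

# (m-1)-sparse sources: at most m-1 non-zero entries per time point.
s = np.zeros((n, T))
for t in range(T):
    active = rng.choice(n, size=m - 1, replace=False)
    s[active, t] = rng.standard_normal(m - 1)
x = A @ s

def on_some_hyperplane(xt, tol=1e-9):
    """True if xt lies in the span of some m-1 columns of A."""
    return any(
        abs(np.linalg.det(np.column_stack([A[:, list(idx)], xt]))) < tol
        for idx in combinations(range(n), m - 1)
    )

all_on_hyperplanes = all(on_some_hyperplane(x[:, t]) for t in range(T))
```

With n = 4 and m = 3 there are six candidate hyperplanes, and the data fall entirely onto them; detecting these hyperplanes is what identifies A.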


SCA algorithm

• matrix identification by multiple hyperplane detection
  • e.g. using Hough transform
  • robust against outliers and noise

• source recovery using sparsity and known matrix

[Figure: 3-D scatter plot, each axis from −1 to 1]

[Theis, Georgiev, Cichocki. Robust sparse component analysis based on a generalized Hough transform. Signal Processing 2006]
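The second step, source recovery with a known matrix, can be sketched as an exhaustive search over column subsets. This is one simple way to exploit the sparsity and is not claimed to be the authors' procedure; the function name `recover_sources` and the noise-free setting are assumptions.

```python
from itertools import combinations
import numpy as np

def recover_sources(A, X):
    """For each column x of X, find the (m-1)-sparse s with A s closest to x."""
    m, n = A.shape
    S = np.zeros((n, X.shape[1]))
    best = np.full(X.shape[1], np.inf)
    for idx in combinations(range(n), m - 1):
        cols = list(idx)
        # Least-squares coefficients of every sample on this column subset.
        coef, *_ = np.linalg.lstsq(A[:, cols], X, rcond=None)
        resid = np.linalg.norm(X - A[:, cols] @ coef, axis=0)
        better = resid < best
        # Overwrite the samples that this subset explains better so far.
        S[:, better] = 0.0
        S[np.ix_(cols, np.flatnonzero(better))] = coef[:, better]
        best[better] = resid[better]
    return S
```

The loop visits all C(n, m − 1) subsets, so the sketch is only practical for small n and m.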


SCA of surface electromyograms

• electromyogram (EMG): electric signal generated by a contracting muscle

• surface EMG: non-invasive, but the sources overlap

cooperation with G. García, Bioinformatic Engineering, Osaka


Results

source and SCA recovery within 8 artificial, dependent mixtures

• results on toy data: sparseness works as separation criterion

• real data
  • relative sEMG enhancement 24.6 ± 21.4% (mean over 9 subjects)
  • beats standard signal processing and ICA

[Theis, García. On the use of sparse signal decomposition in the analysis of multi-channel surface EMGs. Signal Processing, 2006]


SCA of functional MRI data

[Figure: component maps (S) 1 to 5 and time courses (A), with crosscorrelation to the stimulus: 1 cc: −0.16, 2 cc: −0.28, 3 cc: 0.13, 4 cc: −0.04, 5 cc: −0.88]

• complete SCA was performed using k-means hyperplane clustering

• components 2 and 3 represent inner ventricles, component 1 contains the frontal eye fields

• component 5 is the desired visual stimulus component, active in the visual cortex (crosscorrelation with stimulus |cc| = 0.88; fastICA yields a similar |cc| = 0.9)
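The |cc| values compare a component's time course with the stimulus. A minimal sketch, assuming a zero-lag Pearson correlation (the slide does not say whether lags were considered) and a hypothetical block-design stimulus:

```python
import numpy as np

def abs_correlation(time_course, stimulus):
    """Absolute zero-lag Pearson correlation between two 1-D signals."""
    a = np.asarray(time_course, dtype=float)
    b = np.asarray(stimulus, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical block-design stimulus and a sign-flipped, noisy time course.
stimulus = np.tile(np.r_[np.zeros(10), np.ones(10)], 5)
rng = np.random.default_rng(0)
time_course = -2.0 * stimulus + 0.3 * rng.standard_normal(stimulus.size)
print(abs_correlation(time_course, stimulus))
```

Taking the absolute value reflects the scaling indeterminacy of the model: a recovered component may come out with flipped sign, as for component 5 with cc = −0.88.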


Postnonlinear SCA

Given m-dimensional random vector x, find representation

x = f(As)

with unknown

• n-dim. random vector s (sources)

• m × n-matrix A (mixing matrix)

• diagonal invertible $f = f_1 \times \dots \times f_m$ (postnonlinearities)

postnonlinear ICA ⇒ s independent (see (?))

here: SCA model ⇒ s is (m − 1)-sparse
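Written out per channel, "diagonal" means that each nonlinearity acts on one linear mixture only. The indices i, j and the entries a_ij are notation introduced here for illustration:

```latex
x_i = f_i\Bigl(\sum_{j=1}^{n} a_{ij}\, s_j\Bigr), \qquad i = 1, \dots, m .
```

The linear SCA model is the special case in which every f_i is the identity.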


Overcomplete postnonlinear cocktail-party problem

[Figure: illustration with the postnonlinearities f1 and f2 labelled]


Postnonlinearity identification lemma

Given an invertible 2 × 2-matrix A, define L at 0 as

$L := A\bigl([0, \varepsilon) \times \{0\} \,\cup\, \{0\} \times [0, \varepsilon)\bigr).$

Lemma

If a diagonal analytic diffeomorphism $h := h_1 \times h_2$ maps an L (in 'general position') at 0 again onto an L at 0, then it is a linear scaling.

[Figure: h mapping an L onto an L]
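A numerical illustration of the lemma (not a proof): a componentwise linear scaling keeps both arms of an L straight, while a genuinely nonlinear diagonal map bends them. The matrix, the choice of tanh, and the helper names `is_straight` and `image_of_L` are assumptions made for this sketch.

```python
import numpy as np

def is_straight(points, tol=1e-9):
    """True if all 2-D points lie on one line through the origin."""
    s = np.linalg.svd(points, compute_uv=False)
    return s[1] <= tol * s[0]

def image_of_L(A, h1, h2, eps=1.0, num=200):
    """Images under h = h1 x h2 of the two arms t * a_j, 0 <= t < eps, of an L."""
    t = np.linspace(0.0, eps, num, endpoint=False)
    arms = []
    for j in range(2):
        arm = np.outer(t, A[:, j])            # points t * a_j, shape (num, 2)
        arms.append(np.column_stack([h1(arm[:, 0]), h2(arm[:, 1])]))
    return arms

A = np.array([[1.0, 2.0], [3.0, 1.0]])        # hypothetical invertible matrix
scaled = image_of_L(A, lambda u: 2.0 * u, lambda u: 0.5 * u)
bent = image_of_L(A, np.tanh, lambda u: u)
print([bool(is_straight(p)) for p in scaled])  # arms under a linear scaling
print([bool(is_straight(p)) for p in bent])    # arms under tanh x id
```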


Identifiability

• due to linear identifiability, it is enough to show that if $f(As) = \hat f(\hat A \hat s)$ then $h = \hat f^{-1} \circ f$ is a linear scaling

• case m = 2: the images of $As$ and $\hat A \hat s$ are finite unions of L's, so this follows from the previous lemma

[Figure: two sketches of the map h]


Identifiability: proof

[Figure: diagram of the maps A and f between R^3 and R^2]


Algorithm

• multistage separation algorithm:
  • find separating nonlinearities g
  • estimate mixing matrix A of linearized model g(x)
  • estimate sources given A and g(x)

• how can g be found algorithmically?


Postnonlinearity detection

• for simplicity assume m = 2.

• geometrical preprocessing: determine two 1-dimensional submanifolds in the image of x

• find curves y(t) and z(t) in $\mathbb{R}^2$ which are mapped onto an L by g.

• simple method:
  • choose arbitrary starting points $y(t_1)$ and $z(t_1)$ among samples of x
  • iteratively pick closest sample to previous $y(t_{i-1})$ resp. $z(t_{i-1})$ with smaller modulus
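The simple method can be sketched as a greedy walk towards the origin. The function name `trace_curve`, the row-wise sample layout, and the stopping rule (stop when no sample has smaller modulus) are assumptions of this sketch.

```python
import numpy as np

def trace_curve(X, start):
    """Greedy walk through the samples (rows of X), always to the closest
    sample with strictly smaller modulus; returns the visited row indices."""
    norms = np.linalg.norm(X, axis=1)
    path = [start]
    current = start
    while True:
        # Candidates: samples with strictly smaller modulus than the current one.
        candidates = np.flatnonzero(norms < norms[current])
        if candidates.size == 0:
            break
        dists = np.linalg.norm(X[candidates] - X[current], axis=1)
        current = candidates[np.argmin(dists)]
        path.append(current)
    return np.array(path)
```

Because the modulus strictly decreases at every step, the walk always terminates. Near the origin, where the two curves meet, a greedy walk like this may hop from one curve to the other, so in practice the innermost samples would need extra care.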


Postnonlinearity detection

[Figure: three source signals (100 samples each, range −0.5 to 0.5) are mapped by f ∘ A to two mixture signals (ranges −1.5 to 1.5 and −5 to 5); scatter panels: geometrical preprocessing, mixture density]


Postnonlinearity detection

• reparametrization ($y := y \circ y_1^{-1}$) of the curves gives $y_1 = z_1 = \mathrm{id}$.

• hence $g \circ y = (g_1, a g_1)$ and $g \circ z = (g_1, b g_1)$

• $g_2 \circ y_2 = a g_1 = \frac{a}{b}\, g_2 \circ z_2$

• analytical geometrical postnonlinearity detection: find analytical 1d diffeomorphism g with

$g \circ y = c\, g \circ z$

for $c \neq 0, \pm 1$ and given curves $y, z : (-1, 1) \to \mathbb{R}$ with $y(0) = z(0) = 0$.

• note $c = y'(0)/z'(0)$
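The value of c follows by differentiating the functional equation at 0. A short derivation, using only y(0) = z(0) = 0 and the fact that g'(0) ≠ 0 for a diffeomorphism; the condition z'(0) ≠ 0 is an added assumption:

```latex
\begin{aligned}
g(y(t)) &= c\, g(z(t)) \\
g'(y(t))\, y'(t) &= c\, g'(z(t))\, z'(t) && \text{(chain rule)} \\
g'(0)\, y'(0) &= c\, g'(0)\, z'(0) && \text{(at } t = 0\text{)} \\
c &= \frac{y'(0)}{z'(0)} && \text{(assuming } z'(0) \neq 0\text{)}
\end{aligned}
```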


Postnonlinearity detection

• equation $g \circ y = c\, g \circ z$ can be solved in different ways:
  • calculate composite derivatives using Faà di Bruno's formula ⇒ derivatives of y and z lead to estimation of derivatives of g
  • least-squares polynomial fit of g using energy function

    $E = \frac{1}{2T} \sum_{i=1}^{T} \bigl(g(y(t_i)) - c\, g(z(t_i))\bigr)^2$

  • MLP approximation of g using E from above

• fix $g(0) = 0$ and $g'(0) = 1$.
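The least-squares polynomial variant can be sketched as follows. With the normalization g(0) = 0 and g'(0) = 1 built into the ansatz g(u) = u + a_2 u^2 + ... + a_K u^K, the energy E is quadratic in the coefficients, so the fit is a linear least-squares problem. The function names, the monomial basis, the degree, and the synthetic test curves are assumptions of this sketch; c is taken as known.

```python
import numpy as np

def fit_g(y, z, c, degree=3):
    """Fit g(u) = u + a_2 u^2 + ... + a_K u^K minimizing E; returns (a_2..a_K)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    powers = np.arange(2, degree + 1)
    # Residual g(y_i) - c g(z_i) = (y_i - c z_i) + sum_k a_k (y_i^k - c z_i^k),
    # so minimizing E is an ordinary linear least-squares problem in a.
    M = y[:, None] ** powers - c * z[:, None] ** powers
    coeffs, *_ = np.linalg.lstsq(M, -(y - c * z), rcond=None)
    return coeffs

def energy(coeffs, y, z, c):
    """E = 1/(2T) * sum_i (g(y(t_i)) - c g(z(t_i)))^2 for the fitted polynomial."""
    powers = np.arange(2, len(coeffs) + 2)
    g = lambda u: u + (u[:, None] ** powers) @ coeffs
    return 0.5 * np.mean((g(y) - c * g(z)) ** 2)

# Synthetic check with a hypothetical true g(u) = u + 0.3 u^2 and c = 0.5.
c = 0.5
z = np.linspace(-0.5, 0.5, 101)
v = c * (z + 0.3 * z ** 2)                      # c * g(z)
y = (-1.0 + np.sqrt(1.0 + 1.2 * v)) / 0.6       # y = g^{-1}(c g(z))
coeffs = fit_g(y, z, c)
print(coeffs, energy(coeffs, y, z, c))
```

On real data the curves y and z are only sampled approximately, so the recovered coefficients depend on the chosen degree and on the noise level.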


Artificial mixtures

• artificial example
  • postnonlinear mixture of n = 3 uniform sources ($10^5$ samples) to m = 2 observations
  • postnonlinear mixing model $x = f_1 \times f_2(As)$
  • mixing matrix $A = \begin{pmatrix} 4.3 & 7.8 & 0.59 \\ 9 & 6.2 & 10 \end{pmatrix}$
  • postnonlinearities $f_1(x) = \tanh(x) + 0.1x$ and $f_2(x) = x$

• algorithm
  • MLP based postnonlinearity detection algorithm
  • natural gradient-descent learning
  • parameters: 9 hidden neurons, learning rate of η = 0.01 and $10^5$ iterations
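The artificial mixture can be reproduced roughly as below. Several details are assumptions: the matrix entries are read from the slide assuming a 2 × 3 layout, the uniform range [−0.5, 0.5] is inferred from the plotted source signals, and exactly one source is taken to be active per sample (1-sparse, since m = 2). The name `make_pnl_mixture` is hypothetical.

```python
import numpy as np

def f1(u):
    return np.tanh(u) + 0.1 * u

def f2(u):
    return u

def make_pnl_mixture(T, rng):
    """x = (f1 x f2)(A s) with n = 3 one-sparse uniform sources and m = 2."""
    A = np.array([[4.3, 7.8, 0.59],
                  [9.0, 6.2, 10.0]])   # as read from the slide (2 x 3 layout assumed)
    S = np.zeros((3, T))
    active = rng.integers(0, 3, size=T)              # one active source per sample
    S[active, np.arange(T)] = rng.uniform(-0.5, 0.5, size=T)   # range is an assumption
    Z = A @ S                                        # linear mixtures
    X = np.vstack([f1(Z[0]), f2(Z[1])])              # componentwise postnonlinearities
    return X, S, A

X, S, A = make_pnl_mixture(1000, np.random.default_rng(0))
```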


PNL detection

[Figure: mixing pnls f (panels f1, f2) and separating pnls g (panels g1, g2); three recovered source signals (100 samples each, range −5 to 5); density of recovered sources]

SIRs: 26, 71 and 46 dB
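The slides do not define the signal-to-interference ratio (SIR). One common definition, assumed here, compares the power of a source with the power of the error that remains after optimally rescaling its estimate; sources and estimates must already be matched up, because the model leaves permutation and scaling open. The name `sir_db` is hypothetical.

```python
import numpy as np

def sir_db(source, estimate):
    """Signal-to-interference ratio in dB after least-squares rescaling."""
    s = np.asarray(source, dtype=float)
    e = np.asarray(estimate, dtype=float)
    alpha = (s @ e) / (e @ e)          # optimal scale, also fixes the sign
    err = s - alpha * e
    err_power = err @ err
    if err_power == 0.0:
        return np.inf                  # perfect recovery up to scaling
    return 10.0 * np.log10((s @ s) / err_power)
```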


Conclusions

• analyze statistical patterns in data sets x(t)

• method: factorization model x(t) = f(s(t))

• supervised training of f ⇒ nearest neighbor (local), regression (global)

• unsupervised identification (often linear) ⇒ clustering (local model), blind source separation (linear model)

• applications: biomedical data analysis, signal processing, financial markets etc.


Current application — with T. Schroder, HMGU

• unsupervised clustering of subtrees

• supervised learning of cell shapes

• parameter estimation of dynamical system for cell fate decision
