Michel Verleysen Altran 18/11/2002 - 1
Artificial Neural Networks: general overview and specific signal processing model
November 2002
Michel Verleysen Altran 18/11/2002 - 2
Content
- Part I: Introduction
  - why "artificial neural networks" (ANN)?
  - what are ANNs useful for?
  - learning – generalization – overfitting
- Part II: From linear to non-linear learning models
  - Adaline (linear model)
  - Multi-Layer Perceptron (MLP) (non-linear model)
  - stochastic gradient learning
  - learning, validation and test
- Part III: Blind Source Separation (BSS)
  - BSS model
  - statistical independence
  - some realizations
Michel Verleysen Altran 18/11/2002 - 3
Content
- Part I: Introduction
  - why "artificial neural networks" (ANN)?
  - what are ANNs useful for?
  - learning – generalization – overfitting
- Part II: From linear to non-linear learning models
  - Adaline (linear model)
  - Multi-Layer Perceptron (MLP) (non-linear model)
  - stochastic gradient learning
  - learning, validation and test
- Part III: Blind Source Separation (BSS)
  - BSS model
  - statistical independence
  - some realizations
Michel Verleysen Altran 18/11/2002 - 4
Why ANNs?
Von Neumann's computer      | (Human) brain
----------------------------|--------------------------
determinism                 | fuzzy behaviour
sequence of instructions    | parallelism
high speed                  | slow speed
repetitive tasks            | adaptation to situations
programming                 | learning
uniqueness of solutions     | different solutions
ex: matrix product          | ex: face recognition
Michel Verleysen Altran 18/11/2002 - 5
Artificial Neural Networks
- Artificial neural networks ARE NOT:
  - an attempt to understand biological systems
  - an attempt to reproduce the behaviour of a biological system
- Artificial neural networks ARE:
  - a set of tools aimed at solving "perceptive" problems ("perceptive" as opposed to "rule-based")

At least in this framework, i.e. "Neural Computation"…
Michel Verleysen Altran 18/11/2002 - 6
Why “Artificial neural networks” ?
- Structure: many computation units in parallel (McCulloch & Pitts 1943)
- Learning rule (adaptation of parameters): similar to Hebb's rule (1949)
- And that's all…

[Diagram: network mapping inputs to outputs]

$y_k(\mathbf{x}) = h\!\left(\sum_{j=0}^{M} w_{kj}^{(2)}\, g\!\left(\sum_{i=0}^{D} w_{ji}^{(1)}\, x_i\right)\right)$, i.e. $\mathbf{y}(\mathbf{x}) = h\!\left(\mathbf{W}^{(2)}\, g\!\left(\mathbf{W}^{(1)}\,\mathbf{x}\right)\right)$
Michel Verleysen Altran 18/11/2002 - 7
Learning
- Give - many - examples (input-output pairs, or training samples)
- Compute the parameters to fit the examples
- Test the result!

[Diagram: network mapping inputs to outputs]
Michel Verleysen Altran 18/11/2002 - 8
Why Neural Computation ?
- Hypothetical problem of optical character recognition (OCR)
- If: image of 256 x 256 pixels, 8-bit pixel values (256 gray levels)
  then: 256^65536 ≈ 10^158000 different images
- Necessity to work with features!
Michel Verleysen Altran 18/11/2002 - 9
Feature Extraction
- Example of a feature in OCR: the height/width ratio (x1) of the character
- Histogram of the feature value

[Figure: histograms of x1 for the "a" class and the "b" class]
Michel Verleysen Altran 18/11/2002 - 10
Multi-dimensional Features
- Necessity to classify according to several, but not too many, features
- Several: because of the overlap in the histogram with a single feature
- Not too many: because classification methods are easier in low-dimensional spaces
- ANN: self-construction of features!

[Figure: classes C1 and C2 in the (x1, x2) feature plane]
Michel Verleysen Altran 18/11/2002 - 11
What are ANNs useful for ?
- regression
- classification
- adaptive filtering
- projection

[Plot: regression example]
Michel Verleysen Altran 18/11/2002 - 12
What are ANNs useful for ?
- regression
- classification
- adaptive filtering
- projection

[Plot: classification example]
Michel Verleysen Altran 18/11/2002 - 13
Classification - Regression
- regression: features → output value (continuous variable)
- classification: features → class label (discrete variable)

[Figures: regression surface y(x1, x2); classes C1 and C2 in the (x1, x2) plane]
Michel Verleysen Altran 18/11/2002 - 14
What are ANNs useful for ?
- regression
- classification
- adaptive filtering
- projection

[Plots: two signals illustrating adaptive filtering]
Michel Verleysen Altran 18/11/2002 - 15
What are ANNs useful for ?
- regression
- classification
- adaptive filtering
- projection
Michel Verleysen Altran 18/11/2002 - 16
Principle
[Diagram: inputs → non-linear parametric regression model → outputs; outputs compared ("−") with the desired outputs (error criterion) → parameter adaptation]
Michel Verleysen Altran 18/11/2002 - 17
Principle
- Model: restrictions on the possible relation

[Same diagram as above]
Michel Verleysen Altran 18/11/2002 - 18
Principle
- Parametric: learning parameters to adjust (they contain the information)

[Same diagram as above]
Michel Verleysen Altran 18/11/2002 - 19
Principle
- Non-linear: more powerful than linear

[Same diagram as above]
Michel Verleysen Altran 18/11/2002 - 20
Principle
Universal approximator!

[Same diagram as above]
Michel Verleysen Altran 18/11/2002 - 21
Error criterion
- regression
- classification
- adaptive filtering
- projection

[Diagram: inputs → model → outputs, compared with the desired outputs; regression plot with residuals]

Criterion (regression): $\sum_i \delta_i^2$, the sum of the squared deviations between outputs and desired outputs
Michel Verleysen Altran 18/11/2002 - 22
Error criterion
- regression
- classification
- adaptive filtering
- projection

[Diagram: inputs → model → outputs, compared with the desired outputs; classification plot]

Criterion (classification): number of wrong classifications
Michel Verleysen Altran 18/11/2002 - 23
Error criterion
- regression
- classification
- adaptive filtering
- projection

[Diagram: inputs → model → outputs; plots of four signals]

Criterion: independence between outputs
Michel Verleysen Altran 18/11/2002 - 24
Error criterion
p regressionp classificationp adaptive filtering
p projection
outputs
_
inputs
desiredoutputs
Independencebetweenoutputs
-
13
Michel Verleysen Altran 18/11/2002 - 25
Polynomial Curve Fitting
- Strong similarity between polynomial curve fitting and regression with neural networks
- polynomials: $y = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j\, x^j$
- neural networks: $y = f(\mathbf{x}, \mathbf{w})$
- Error criterion: $E = \sum_{p=1}^{P}\left(y^p - t^p\right)^2$, where $t^p$ is the desired output (target)
- Polynomials: E is quadratic in w ⇒ min(E) is the solution of a set of linear equations
  Neural networks: E is non-linear in w ⇒ local minima
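Since the slides give no code, here is a minimal NumPy sketch of the polynomial case, illustrating that E is quadratic in w and is minimised by solving a linear least-squares problem (the data and the order M are illustrative assumptions):

```python
import numpy as np

# Noisy training pairs (x^p, t^p); the sine teacher is only an illustration
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.5, 20)
t = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Polynomial model y = sum_j w_j x^j: E quadratic in w -> linear least squares
M = 3                                          # polynomial order
Phi = np.vander(x, M + 1, increasing=True)     # columns 1, x, x^2, ..., x^M
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

y = Phi @ w                                    # fitted outputs
E = np.sum((y - t) ** 2)                       # E = sum_p (y^p - t^p)^2
print(w, E)
```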
Michel Verleysen Altran 18/11/2002 - 26
Learning - generalization
- Learning: process of
  - finding parameters w which
  - minimize the error function E
  - given a set of input-output pairs $(\mathbf{x}^p, t^p)$

[Plots: training pairs (x, t) and the curve fitted through them, parameterized by w]
Michel Verleysen Altran 18/11/2002 - 27
Learning - generalization

- Generalization: process of
  - assigning an output value y to
  - a new input x
  - given the parameter vector w

[Plots: the learned curve, parameterized by w, used to assign an output to a new input x]
Michel Verleysen Altran 18/11/2002 - 28
Overfitting
- Compromise between
  - model complexity and
  - number of learning pairs
- Neural networks are
  - universal approximation models with
  - a (relatively) low complexity
- But overfitting must be a serious concern!

[Plots: a smooth fit vs. an overfitted curve through the same data points]
Michel Verleysen Altran 18/11/2002 - 29
Overfitting in classification
[Figures: classes C1 and C2 in the (x1, x2) plane, separated by decision boundaries of increasing complexity]
Michel Verleysen Altran 18/11/2002 - 30
The Curse of Dimensionality
- Histograms: the number of boxes grows exponentially with the dimension (see the sketch below)
- More features ⇒ reduced performance
- If the amount of data is limited ⇒ the dimension must be kept low
- Concept of intrinsic dimensionality of the data
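A two-line numerical illustration of this exponential growth, assuming 10 bins per feature:

```python
# number of histogram boxes with 10 bins per feature: 10**d
for d in (1, 2, 5, 10, 20):
    print(d, 10 ** d)
```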
Michel Verleysen Altran 18/11/2002 - 31
Content
- Part I: Introduction
  - why "artificial neural networks" (ANN)?
  - what are ANNs useful for?
  - learning – generalization – overfitting
- Part II: From linear to non-linear learning models
  - Adaline (linear model)
  - Multi-Layer Perceptron (MLP) (non-linear model)
  - stochastic gradient learning
  - learning, validation and test
- Part III: Blind Source Separation (BSS)
  - BSS model
  - statistical independence
  - some realizations
Michel Verleysen Altran 18/11/2002 - 32
From linear to non-linear regression

- Linear regression
  ↑ simpler
  ↑ easy "learning" (direct or adaptive)
  ↓ not sufficient for most problems
- Non-linear regression
  ↓ more complex
  = adaptive learning only
  ↑ may be adapted to most problems
Michel Verleysen Altran 18/11/2002 - 33
Adaline
- Adaline: ADAptive LINear Element
- = linear classification, linear discriminant functions, ...

$y = \mathbf{w}^T\mathbf{x}$, with $\mathbf{x} = (1, x_1, \dots, x_D)^T$ and $\mathbf{w} = (w_0, w_1, \dots, w_D)^T$

[Diagram: inputs 1 (bias), x1, x2, …, xD weighted by w0, w1, w2, …, wD and summed to give the output y]
Michel Verleysen Altran 18/11/2002 - 34
Adaline: learning
- Adaline: one linear output
- the patterns (learning vectors) should satisfy $t^p = \sum_{i=1}^{D} w_i\, x_i^p = \mathbf{w}^T\mathbf{x}^p$
- but P > D
  - P patterns
  - D parameters (degrees of freedom)
- ⇒ no exact solution
- ⇒ optimisation (of the parameters w) according to a criterion:

$E = \sum_{p=1}^{P}\left(y^p - t^p\right)^2 = \sum_{p=1}^{P}\left(\mathbf{w}^T\mathbf{x}^p - t^p\right)^2$
Michel Verleysen Altran 18/11/2002 - 35
Adaline: pseudo-inverse
- Error criterion: $E = \sum_{p=1}^{P}\left(y^p - t^p\right)^2 = \sum_{p=1}^{P}\left(\mathbf{w}^T\mathbf{x}^p - t^p\right)^2$
- Gather the inputs $\mathbf{x}^p$ in a matrix and the targets $t^p$ in a vector:
  $\mathbf{X} = (\mathbf{x}^1\ \mathbf{x}^2\ \dots\ \mathbf{x}^P)$ (element $x_i^p$ in row i, column p), $\quad \mathbf{t}^T = (t^1\ t^2\ \dots\ t^P)$
- Error criterion: $E = \left\|\mathbf{t}^T - \mathbf{w}^T\mathbf{X}\right\|^2$
Michel Verleysen Altran 18/11/2002 - 36
Adaline: pseudo-inverse
- Error criterion: $E = \left\|\mathbf{t}^T - \mathbf{w}^T\mathbf{X}\right\|^2$
- Gradient of the error (function to minimize) with respect to the weights (free parameters):

$\frac{\partial E}{\partial \mathbf{w}^T} \equiv \left(\frac{\partial E}{\partial w_1},\ \frac{\partial E}{\partial w_2},\ \dots,\ \frac{\partial E}{\partial w_D}\right)$

$\frac{\partial E}{\partial w_j} = \frac{\partial}{\partial w_j}\left\|\mathbf{t}^T - \mathbf{w}^T\mathbf{X}\right\|^2 = \frac{\partial}{\partial w_j}\left(\mathbf{t}^T - \mathbf{w}^T\mathbf{X}\right)\left(\mathbf{t} - \mathbf{X}^T\mathbf{w}\right) = 2\left(\mathbf{w}^T\mathbf{X} - \mathbf{t}^T\right)\mathbf{x}_j^T$

where $\mathbf{x}_j = (x_j^1\ x_j^2\ \dots\ x_j^P)$
Michel Verleysen Altran 18/11/2002 - 37
Adaline: pseudo-inverse
- Criterion: $E = \left\|\mathbf{t}^T - \mathbf{w}^T\mathbf{X}\right\|^2$
- Derivative of the criterion: $\frac{\partial E}{\partial \mathbf{w}^T} = 2\left(\mathbf{w}^T\mathbf{X} - \mathbf{t}^T\right)\mathbf{X}^T$
- Minimum of the error: $\frac{\partial E}{\partial \mathbf{w}^T} = 0 \;\Rightarrow\; \mathbf{w} = \left(\mathbf{X}\mathbf{X}^T\right)^{-1}\mathbf{X}\,\mathbf{t}$
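A minimal NumPy sketch of this pseudo-inverse solution, with the patterns stored column-wise in X as on the slides (the data and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
D, P = 3, 50
X = rng.standard_normal((D, P))                    # D x P: one pattern x^p per column
w_true = np.array([0.5, -1.0, 2.0])
t = w_true @ X + 0.01 * rng.standard_normal(P)     # targets t^p

# w = (X X^T)^(-1) X t
w = np.linalg.solve(X @ X.T, X @ t)
# numerically safer equivalent: w = np.linalg.pinv(X.T) @ t
E = np.sum((w @ X - t) ** 2)
print(w, E)
```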
Michel Verleysen Altran 18/11/2002 - 38
Adaline: iterative learning
- The pseudo-inverse requires:
  - all input-output pairs (x^p, t^p)
  - a matrix inversion (often ill-conditioned)
- necessity for iterative methods without matrix inversion
- ⇒ gradient descent!
Michel Verleysen Altran 18/11/2002 - 39
Adaline: gradient descent
- function to minimise: E; parameters: w

$\mathbf{w}(t+1) = \mathbf{w}(t) - \alpha\left.\frac{\partial E}{\partial \mathbf{w}}\right|_{\mathbf{w}(t)} \;\Rightarrow\; \mathbf{w}(t+1) = \mathbf{w}(t) + 2\alpha\,\mathbf{X}\left(\mathbf{t} - \mathbf{X}^T\mathbf{w}\right)$

- Stochastic gradient: use one input-output pair at a time!

$E = \sum_{p=1}^{P}\left(\mathbf{w}^T\mathbf{x}^p - t^p\right)^2 = \sum_{p=1}^{P} E_p$

$\mathbf{w}(t+1) = \mathbf{w}(t) - \alpha\left.\frac{\partial E_p}{\partial \mathbf{w}}\right|_{\mathbf{w}(t)} \;\Rightarrow\; \mathbf{w}(t+1) = \mathbf{w}(t) + 2\alpha\left(t^k - \mathbf{w}^T\mathbf{x}^k\right)\mathbf{x}^k$

- pseudo-inverse, gradient descent and stochastic gradient descent: same error criterion → same solution!
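A sketch of the stochastic (LMS-style) update w ← w + 2α(t^k − w^T x^k) x^k; the learning rate and the data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
D, P = 3, 200
X = rng.standard_normal((D, P))                   # patterns x^p in columns
t = np.array([0.5, -1.0, 2.0]) @ X                # targets from a linear teacher

w = np.zeros(D)
alpha = 0.01                                      # learning rate (assumed value)
for epoch in range(50):
    for k in rng.permutation(P):                  # one input-output pair at a time
        xk, tk = X[:, k], t[k]
        w += 2 * alpha * (tk - w @ xk) * xk       # stochastic gradient step
print(w)   # approaches the pseudo-inverse solution
```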
Michel Verleysen Altran 18/11/2002 - 40
Multi-layer perceptron (MLP)
- Convention: 2 layers of weights (in the literature: sometimes counted as 3 layers of units or neurons)
- g and h can be threshold (sign) units or continuous ones
- h can be linear, but not g (otherwise the network reduces to one layer)

[Diagram: inputs x0 (bias), x1, …, xD → hidden units z0 (bias), z1, …, zM through the 1st-layer weights $w_{ji}^{(1)}$ → outputs y1, …, yC through the 2nd-layer weights $w_{kj}^{(2)}$]

$y_k(\mathbf{x}) = h\!\left(\sum_{j=0}^{M} w_{kj}^{(2)}\, g\!\left(\sum_{i=0}^{D} w_{ji}^{(1)}\, x_i\right)\right)$, i.e. $\mathbf{y}(\mathbf{x}) = h\!\left(\mathbf{W}^{(2)}\, g\!\left(\mathbf{W}^{(1)}\,\mathbf{x}\right)\right)$
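A sketch of this two-layer forward pass; tanh for g and the identity for h are assumed (common) choices, and the biases are handled through the constant inputs x0 = z0 = 1:

```python
import numpy as np

def mlp_forward(x, W1, W2, g=np.tanh, h=lambda a: a):
    """y_k = h( sum_j w2_kj * g( sum_i w1_ji * x_i ) ), biases via a constant input."""
    x = np.concatenate(([1.0], x))        # x_0 = 1 (bias)
    z = g(W1 @ x)                         # M hidden activations
    z = np.concatenate(([1.0], z))        # z_0 = 1 (bias)
    return h(W2 @ z)                      # C outputs

rng = np.random.default_rng(0)
D, M, C = 4, 6, 2
W1 = 0.1 * rng.standard_normal((M, D + 1))   # 1st-layer weights w^(1)_ji
W2 = 0.1 * rng.standard_normal((C, M + 1))   # 2nd-layer weights w^(2)_kj
print(mlp_forward(rng.standard_normal(D), W1, W2))
```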
Michel Verleysen Altran 18/11/2002 - 41
Error back-propagation 1/
- Online (stochastic) algorithm considered here
- Some notations:

$a_i^{(l)} = \sum_j w_{ij}^{(l)}\, z_j^{(l-1)}$

[Diagram: unit i in layer l receives $z_j^{(l-1)}$ through the weight $w_{ij}^{(l)}$, computes the activation $a_i^{(l)}$ and outputs $z_i^{(l)} = g(a_i^{(l)})$; inputs x0 (bias), x1, …, xD, hidden units z0 (bias), z1, …, zM, outputs y1, …, yC]
Michel Verleysen Altran 18/11/2002 - 42
Error back-propagation 2/
- Gradient descent: the value of $\frac{\partial E}{\partial w_{ij}^{(l)}}$ must be evaluated

$a_i^{(l)} = \sum_j w_{ij}^{(l)}\, z_j^{(l-1)}$

$\frac{\partial E}{\partial w_{ij}^{(l)}} = \frac{\partial E}{\partial a_i^{(l)}}\,\frac{\partial a_i^{(l)}}{\partial w_{ij}^{(l)}} = \delta_i^{(l)}\, z_j^{(l-1)}$, with $\delta_i^{(l)} \equiv \frac{\partial E}{\partial a_i^{(l)}}$ still to evaluate
Michel Verleysen Altran 18/11/2002 - 43
Error back-propagation 3/
- For output units:

$\delta_i^{(l)} = \frac{\partial E}{\partial a_i^{(l)}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial a_i^{(l)}} = \frac{\partial E}{\partial y_i}\, g'(a_i^{(l)})$

where $\partial E / \partial y_i$ is the derivative of the error criterion (known) and $g'(a_i^{(l)})$ is known.
Michel Verleysen Altran 18/11/2002 - 44
Error back-propagation 4/
- For hidden units:

$\delta_i^{(l)} = \frac{\partial E}{\partial a_i^{(l)}} = \sum_k \frac{\partial E}{\partial a_k^{(l+1)}}\,\frac{\partial a_k^{(l+1)}}{\partial a_i^{(l)}} = \sum_k \delta_k^{(l+1)}\,\frac{\partial}{\partial a_i^{(l)}}\sum_j w_{kj}^{(l+1)}\, z_j^{(l)} = \sum_k \delta_k^{(l+1)}\, w_{ki}^{(l+1)}\, g'(a_i^{(l)}) = g'(a_i^{(l)})\sum_k w_{ki}^{(l+1)}\,\delta_k^{(l+1)}$

The error term (δ) of a hidden unit is expressed as a combination of the error terms in the next layer.

[Diagram: unit i in layer l feeding the units k of layer l+1 through the weights $w_{ki}^{(l+1)}$]
Michel Verleysen Altran 18/11/2002 - 45
Error back-propagation 5/
- Apply an input vector $\mathbf{x}^k$ and propagate it through the network to evaluate all activations $a_i^{(l)}$ and neuron outputs $z_i^{(l)}$
- Evaluate the error terms $\delta_i^{(o)}$ in the output layer
- Back-propagate the error terms $\delta_i^{(l)}$ to find the error terms $\delta_i^{(l-1)}$
- Evaluate all derivatives $\frac{\partial E}{\partial w_{ij}^{(l)}} = \delta_i^{(l)}\, z_j^{(l-1)}$
- Adjust the weights according to the derivatives and a gradient descent scheme (see the sketch below)
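A sketch of one stochastic back-propagation step for such a two-layer network, assuming a squared-error criterion, tanh hidden units and linear outputs (all names and choices are illustrative):

```python
import numpy as np

def backprop_step(x, target, W1, W2, alpha=0.01):
    # 1. forward pass: activations and outputs
    x = np.concatenate(([1.0], x))
    a1 = W1 @ x                                  # hidden activations
    z = np.concatenate(([1.0], np.tanh(a1)))     # hidden outputs, with bias z_0 = 1
    y = W2 @ z                                   # linear output units
    # 2. error terms in the output layer: E = (y - t)^2  =>  delta = 2(y - t)
    delta2 = 2.0 * (y - target)
    # 3. back-propagation: delta1_j = g'(a1_j) * sum_k w2_kj * delta2_k
    delta1 = (1.0 - np.tanh(a1) ** 2) * (W2[:, 1:].T @ delta2)
    # 4./5. derivatives dE/dw = delta * z and gradient descent step
    W2 -= alpha * np.outer(delta2, z)
    W1 -= alpha * np.outer(delta1, x)
    return W1, W2
```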
Michel Verleysen Altran 18/11/2002 - 46
Back-propagation : weight adjustment 1/
- Now we have the derivatives $\frac{\partial E}{\partial w_{ij}^{(l)}}$: how to adjust the weights according to these derivatives?

$\delta\mathbf{w} \equiv \mathbf{w}(t+1) - \mathbf{w}(t)$
Michel Verleysen Altran 18/11/2002 - 47
Back-propagation : weight adjustment 2/
- First-order methods:

$E(\mathbf{w}(t+1)) = E(\mathbf{w}(t)) + \left.\frac{\partial E}{\partial \mathbf{w}}\right|_{\mathbf{w}(t)}^{T}\left(\mathbf{w}(t+1) - \mathbf{w}(t)\right) = E(\mathbf{w}(t)) + \left.\frac{\partial E}{\partial \mathbf{w}}\right|_{\mathbf{w}(t)}^{T}\delta\mathbf{w}$

- For a δw of a given magnitude, the largest δE is obtained when the gradient and the weight update vectors are parallel
- Adaptation rule:

$\mathbf{w}(t+1) = \mathbf{w}(t) - \alpha\left.\frac{\partial E}{\partial \mathbf{w}}\right|_{\mathbf{w}(t)}$
Michel Verleysen Altran 18/11/2002 - 48
Back-propagation : weight adjustment 4/
- Improvements: momentum

$\mathbf{w}(t+1) = \mathbf{w}(t) - \alpha\left.\frac{\partial E}{\partial \mathbf{w}}\right|_{\mathbf{w}(t)} + \beta\left(\mathbf{w}(t) - \mathbf{w}(t-1)\right)$

- to avoid sharp changes in the gradient direction (stochastic scheme, outliers, etc.)
- β ≈ 0.9
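A sketch of this momentum update for a generic weight vector and gradient function (the quadratic example is only an illustration):

```python
import numpy as np

def gd_momentum(grad, w, alpha=0.01, beta=0.9, steps=100):
    """w(t+1) = w(t) - alpha * dE/dw + beta * (w(t) - w(t-1))."""
    w_prev = w.copy()
    for _ in range(steps):
        w_next = w - alpha * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# example: minimise the simple quadratic E(w) = ||w||^2, whose gradient is 2w
print(gd_momentum(lambda w: 2 * w, np.array([1.0, -2.0])))
```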
Michel Verleysen Altran 18/11/2002 - 49
Back-propagation : weight adjustment 5/
- Improvements: adaptive learning rate α
  - in classification problems: if some classes are under-represented
  - maximum safe step size
- "risk-taking" algorithms: individual learning rates (this is more or less the "delta-bar-delta" rule)

$w_{ij}(t+1) = w_{ij}(t) - \alpha_{ij}(t)\left.\frac{\partial E}{\partial w_{ij}}\right|_{t}$

$\delta w_{ij}(t+1)\,\delta w_{ij}(t) > 0 \;\Rightarrow\; \alpha_{ij}(t+1) = \alpha_{ij}(t) + \kappa$, otherwise $\alpha_{ij}(t+1) = \alpha_{ij}(t) - \kappa$
Michel Verleysen Altran 18/11/2002 - 51
Validation and test
- Basic principle: never test on learning data

[Diagram: the data are split into a learning set and a test set]
Michel Verleysen Altran 18/11/2002 - 52
Validation and test
- Basic principle: never test on learning data
- Corollary: never compare models on learning data

[Diagram: the data are split into a learning set (N1), a validation set (N2) and a test set (N3)]
Michel Verleysen Altran 18/11/2002 - 53
Validation and test
- Basic principle: never test on learning data
- Corollary: never compare models on learning data
- Learning set: used to learn each model
- Validation set: used to compare models and select one of them
- Test set: used to estimate the error made by the selected model

[Diagram: learning set (N1), validation set (N2), test set (N3)]
Michel Verleysen Altran 18/11/2002 - 54
Validation and test
- Each model depends on the (sub)set of data used
  → several learning runs for each model
  → validation errors are averaged
- Average validation errors are compared (between models)
- The best model is selected
- Its error is estimated on the test set
- To select subsets of data: several ways (see the sketch below)
  - (k-fold) cross-validation
  - bootstrap
- Particularly important with small sets of high-dimensional data!
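A minimal sketch of k-fold cross-validation; polynomial models of different orders stand in for the candidate models N1, N2, N3 (an illustrative assumption):

```python
import numpy as np

def kfold_cv_error(x, t, order, k=5):
    """Average validation error of a polynomial model over k folds."""
    idx = np.random.default_rng(0).permutation(x.size)
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.polyfit(x[train], t[train], order)            # learn on the learning set
        errors.append(np.mean((np.polyval(w, x[val]) - t[val]) ** 2))
    return np.mean(errors)                                   # averaged validation error

rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 40)
t = np.sin(x) + 0.2 * rng.standard_normal(x.size)
# compare models on validation error, never on learning error
print({m: round(kfold_cv_error(x, t, m), 4) for m in (1, 3, 9)})
```

A held-out test set (not shown here) would then estimate the error of the selected model.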
Michel Verleysen Altran 18/11/2002 - 55
Content
- Part I: Introduction
  - why "artificial neural networks" (ANN)?
  - what are ANNs useful for?
  - learning – generalization – overfitting
- Part II: From linear to non-linear learning models
  - Adaline (linear model)
  - Multi-Layer Perceptron (MLP) (non-linear model)
  - stochastic gradient learning
  - learning, validation and test
- Part III: Blind Source Separation (BSS)
  - BSS model
  - statistical independence
  - some realizations
Michel Verleysen Altran 18/11/2002 - 56
Cocktail party
Michel Verleysen Altran 18/11/2002 - 57
Blind Source Separation (BSS) or Independent Component Analysis (ICA)

- The sources are unknown, but independent
- The mixture is unknown
- The observations are known
Michel Verleysen Altran 18/11/2002 - 58
ICA principle
[Plots and diagram: unknown sources s1(t), s2(t) → unknown mixture → observed signals x1(t), x2(t) (known) → ICA → estimation of the sources y1(t), y2(t)]

1° a measure of independence
2° an algorithm
Michel Verleysen Altran 18/11/2002 - 59
Hypotheses on the model
- Mixture is linear:

$x_1(t) = a_{11}\, s_1(t) + a_{12}\, s_2(t)$
$x_2(t) = a_{21}\, s_1(t) + a_{22}\, s_2(t)$

- Other possible mixtures:
  - linear convolutive
  - non-linear
  - others (information on sources, etc.)

[Diagram: unknown sources s1(t), s2(t) → unknown mixture → observed signals x1(t), x2(t) → ICA → estimation of the sources y1(t), y2(t)]
Michel Verleysen Altran 18/11/2002 - 60
Definition of the model
- time series s_j(t) → random variables s_j
- signals are zero-mean

Here (hypotheses!):
- linear mixtures: $\mathbf{x} = \mathbf{A}\mathbf{s}$
- # observations x_i = # sources s_j ⇒ A is a square matrix

- If A were known: $\mathbf{s} = \mathbf{A}^{-1}\mathbf{x}$ …
- But A is unknown: $\hat{\mathbf{s}} = \mathbf{y} = \mathbf{W}\mathbf{x}$
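A small sketch of this model: two zero-mean sources mixed by a square matrix A (the sources and A are illustrative choices); the observations x are all that an ICA algorithm gets to see:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
s = np.vstack([np.sign(rng.standard_normal(n)),               # source 1: binary
               rng.uniform(-np.sqrt(3), np.sqrt(3), n)])      # source 2: uniform
s -= s.mean(axis=1, keepdims=True)                            # zero-mean signals

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])      # square mixing matrix, unknown to the algorithm
x = A @ s                       # observations

# if A were known: s = A^{-1} x ; since it is not, ICA estimates y = W x
print(np.allclose(np.linalg.inv(A) @ x, s))
```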
Michel Verleysen Altran 18/11/2002 - 61
Applications of ICA
- cocktail party
- biomedical signals
  - artifacts in EEG
  - ECG recordings from a pregnant woman
- extraction of features in natural images
- extraction of fault-related machine vibrations
- improving the prediction of a set of time series
- communications
- etc.
Michel Verleysen Altran 18/11/2002 - 62
Statistical independence
- Toss of 2 coins:

coin 1: P(A = heads) = P(A = tails) = 1/2
coin 2: P(B = heads) = P(B = tails) = 1/2
each joint outcome: P(A, B) = 1/4

$P(A, B) = P(A)\,P(B)$
Michel Verleysen Altran 18/11/2002 - 63
Statistical independence
- continuous variables: $p(a, b) = p_1(a)\,p_2(b)$

[Figure: marginal densities p1(a) and p2(b)]
Michel Verleysen Altran 18/11/2002 - 64
Statistical independence
- For continuous variables: $p(a, b) = p_1(a)\,p_2(b)$
- Property: $E\{h_1(a)\,h_2(b)\} = E\{h_1(a)\}\,E\{h_2(b)\}$
- Independence ≠ uncorrelatedness
  - uncorrelatedness: $E\{ab\} = E\{a\}\,E\{b\}$
  - example: four points with $p(a, b) = 1/4$ each, for which $E\{ab\} = E\{a\}\,E\{b\}$ but $E\{a^2 b^2\} = 0 \neq E\{a^2\}\,E\{b^2\}$
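A numerical check of such an example, using the four equiprobable points (0, ±1) and (±1, 0) (an assumed instance of the p(a, b) = 1/4 example):

```python
import numpy as np

# four points with p(a, b) = 1/4 each
pts = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]], dtype=float)
a, b = pts[:, 0], pts[:, 1]

print(np.mean(a * b), np.mean(a) * np.mean(b))               # 0.0  0.0   -> uncorrelated
print(np.mean(a**2 * b**2), np.mean(a**2) * np.mean(b**2))   # 0.0  0.25  -> not independent
```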
Michel Verleysen Altran 18/11/2002 - 65
Illustration of ICA
Mixing matrix A

[Figures: scatter plot of the sources (s1, s2) and of the mixtures (x1, x2) obtained with a 2×2 mixing matrix A]
Michel Verleysen Altran 18/11/2002 - 66
ICA ↔ PCA
[Figures: the same data in the (x1, x2) plane with the axes (y1, y2) found by PCA and the axes (y1, y2) found by ICA]
Michel Verleysen Altran 18/11/2002 - 67
Ambiguities in ICA
- Amplitudes (variances, energies) of the sources:

$\mathbf{x} = \mathbf{A}\mathbf{s} = \left(\tfrac{1}{k}\mathbf{A}\right)\left(k\,\mathbf{s}\right)$ (the same for individual amplitudes)

→ we assume $E\{s_i^2\} = 1$ (ambiguity on the sign remains)

- Order of the sources:

$\mathbf{x} = \mathbf{A}\mathbf{s} = \left(\mathbf{A}\mathbf{P}^{-1}\right)\left(\mathbf{P}\mathbf{s}\right)$ (P: permutation matrix)
Michel Verleysen Altran 18/11/2002 - 68
Why Gaussian variables are forbidden ?
- $s_i$ Gaussian, with $\mathbf{A}$ orthogonal ⇒ $x_i$ Gaussian:

$p(x_1, x_2) = \frac{1}{2\pi}\exp\!\left(-\frac{x_1^2 + x_2^2}{2}\right)$

[Figure: the joint density of (x1, x2) is rotationally symmetric, so no mixing direction can be identified]
Michel Verleysen Altran 18/11/2002 - 69
"Nongaussian is independent"

- Central limit theorem: a sum of independent variables → Gaussian
- $y_i = \mathbf{w}_i^T\mathbf{x} = \mathbf{w}_i^T\mathbf{A}\mathbf{s}$ is a combination of the sources
  → $y_i$ is "more Gaussian" than the $s_i$; $y_i$ is "least Gaussian" when $y_i = s_j$
  → find $\mathbf{w}_i$ which maximizes the nongaussianity of $\mathbf{w}_i^T\mathbf{x}$

[Diagram: sources s → observations x = A s → estimations y = W x; equivalently y = Z s with Z = W A]
Michel Verleysen Altran 18/11/2002 - 70
A classical measure of nongaussianity: kurtosis

$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$, with $\mathrm{kurt}(y) = 0$ for a Gaussian y

- find w by gradient descent on |kurtosis|
- Problem: sensitivity to outliers!
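A sketch of kurtosis estimation and of a crude gradient search on |kurt(wᵀx)| for whitened data; the step size and the simplified gradient (the variance term is handled by renormalising w at each step) are assumptions of this sketch:

```python
import numpy as np

def kurt(y):
    """kurt(y) = E{y^4} - 3 (E{y^2})^2 ; zero for a Gaussian y."""
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

def one_unit_kurtosis(x, steps=200, alpha=0.1, seed=0):
    """x: whitened observations, one signal per row. Returns one projection vector w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        y = w @ x
        grad = 4 * (x * y**3).mean(axis=1)        # gradient of E{y^4}; the -3 term is
        w += alpha * np.sign(kurt(y)) * grad      # constant for unit-variance y
        w /= np.linalg.norm(w)                    # keep w on the unit sphere
    return w
```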
Michel Verleysen Altran 18/11/2002 - 71
Another measure of nongaussianity: negentropy

- Entropy: $H(Y) = -\sum_i P(Y = a_i)\,\log P(Y = a_i)$
- Differential entropy: $H(y) = -\int f(y)\,\log f(y)\,\mathrm{d}y$
- Entropy (Gaussian) > entropy (non-Gaussian), for equal variances
- Problem: computationally difficult (needs densities)
Michel Verleysen Altran 18/11/2002 - 72
Mutual information
- Natural measure of independence:

$I(y_1, y_2, \dots, y_n) = \sum_{i=1}^{n} H(y_i) - H(\mathbf{y})$

- Always positive, 0 if the $y_i$ are independent
- Minimisation of I: roughly equivalent to maximisation of negentropy
Michel Verleysen Altran 18/11/2002 - 73
Summary of contrast functions
- one-unit: kurtosis, negentropy, high-order cumulants
- multi-unit: mutual information, non-linear cross-correlations, maximum likelihood, non-linear PCA, high-order cumulants
Michel Verleysen Altran 18/11/2002 - 74
Preprocessing
1. Centering
2. Whitening (PCA) (→ signals uncorrelated + unit variance)

mixture: $\mathbf{x} = \mathbf{A}\mathbf{s}$ ; after whitening: $\tilde{\mathbf{x}} = \tilde{\mathbf{A}}\mathbf{s}$

- $\tilde{\mathbf{A}}$ is orthogonal → n(n−1)/2 parameters instead of n²

[Figures: scatter plots of the sources (s1, s2), the mixtures (x1, x2) and the whitened mixtures]
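A sketch of these two preprocessing steps, centering and PCA whitening, based on the eigendecomposition of the covariance matrix (the function name is illustrative):

```python
import numpy as np

def whiten(x):
    """Center, then whiten: returns x_tilde with identity covariance, plus the whitening matrix."""
    x = x - x.mean(axis=1, keepdims=True)         # 1. centering
    cov = (x @ x.T) / x.shape[1]
    d, E = np.linalg.eigh(cov)                    # 2. PCA: eigendecomposition of the covariance
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T       # whitening matrix
    return V @ x, V

# on mixtures x = A s, the remaining matrix A_tilde = V A is (approximately) orthogonal
```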
Michel Verleysen Altran 18/11/2002 - 75
Algorithms for ICA

- Adaptive algorithms: $\mathbf{W}(t+1) = \mathbf{W}(t) + \Delta\mathbf{W}$
  - pioneering work - Hérault-Jutten: $\Delta W_{ij} \propto g_1(y_i)\,g_2(y_j)$ for $i \neq j$
  - non-linear decorrelation: $\Delta\mathbf{W} \propto \left(\mathbf{I} - g_1(\mathbf{y})\,g_2(\mathbf{y})^T\right)\mathbf{W}$
  - maximisation of entropy: $\Delta\mathbf{W} \propto \left[\mathbf{W}^T\right]^{-1} - 2\tanh(\mathbf{W}\mathbf{x})\,\mathbf{x}^T$
  - one-unit rules: $\Delta\mathbf{w} \propto \mathbf{x}\,g(\mathbf{w}^T\mathbf{x})$
- Batch algorithms
  - FastICA (fixed-point): $\mathbf{w}(k) = E\{\mathbf{x}\,g(\mathbf{w}(k-1)^T\mathbf{x})\} - E\{g'(\mathbf{w}(k-1)^T\mathbf{x})\}\,\mathbf{w}(k-1)$
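A sketch of the one-unit FastICA fixed-point iteration on whitened data, with g = tanh as an assumed nonlinearity:

```python
import numpy as np

def fastica_one_unit(x, max_iter=100, tol=1e-6, seed=0):
    """x: whitened observations, one signal per row. Returns one separating vector w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = w @ x
        # w <- E{x g(w^T x)} - E{g'(w^T x)} w, then renormalise
        w_new = (x * np.tanh(y)).mean(axis=1) - (1 - np.tanh(y) ** 2).mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1) < tol:          # convergence up to the sign ambiguity
            return w_new
        w = w_new
    return w
```

Several components would be extracted by repeating this with a deflation or symmetric orthogonalisation step (not shown).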
Michel Verleysen Altran 18/11/2002 - 76
Extensions of the ICA problem
- Basic model: linear x = A s
- Extensions to the linear model:
  - # observations > # sources
  - # sources > # observations
  - noisy observations
  - ill-conditioned mixtures
- Other extensions:
  - convolutive mixtures
  - non-linear mixtures
Michel Verleysen Altran 18/11/2002 - 77
Some realizations…
- ECG from pregnant women

[Plots: sensor signals (observations) and estimated source signals (estimations)]
J.-F. Cardoso, Multidimensional independent component analysis, Proc. of ICASSP’98, Seattle (USA), pp. 1941-1944.
Michel Verleysen Altran 18/11/2002 - 78
Some realizations…
- extraction of artefacts from MEG
R. Vigario, V. Jousmäki, M. Hämäläinen, R. Hari, E. Oja, Independent component analysis for identification of artifacts in magnetoencephalographic recordings, NIPS'97, pp. 229-235, MIT Press, 1998.
Michel Verleysen Altran 18/11/2002 - 79
Some realizations…
- Auditory evoked fields
- 122-channel MEG data
R. Vigario, J. Särelä, V. Jousmäki, E. Oja, Independent component analysis in decomposition of auditory and somatosensory evoked fields, Proc. of ICA'99, Aussois (France), January 1999, pp. 167-172.
Michel Verleysen Altran 18/11/2002 - 80
Some realizations…
A.J. Bell, T.J. Sejnowski, Edges are the 'Independent Components' of natural scenes, NIPS'96, pp. 831-836, MIT Press, 1996.
Michel Verleysen Altran 18/11/2002 - 81
Some realizations…
- Hands-free phone in a car
N. Charkani El Hassani, Séparation auto-adaptative de sources pour des mélanges convolutifs – Application à la téléphonie mains-libres dans les voitures, thèse de doctorat, INP Grenoble, 1996.
Michel Verleysen Altran 18/11/2002 - 82
Some realizations…
- Multiple RF tags
Y. Deville, J. Damour, N. Charkani, Improved multi-tag radio-frequency identification systems based on new source separation neural networks, Proc. of ICA'99, Aussois (France), January 1999, pp. 449-454.
Michel Verleysen Altran 18/11/2002 - 83
Some realizations…
- Financial time series

[Figure: ICA of stock returns and reconstruction from 4 ICs]
A.D. Back, A.S. Weigend, A First Application of Independent Component Analysis to Extracting Structure from Stock Returns, International Journal of Neural Systems, Vol. 8, No.5 (October, 1997)
Michel Verleysen Altran 18/11/2002 - 84
Other applications…
- Proceedings of the IEEE – special issue, October 1998
  - receivers (QAM signals)
  - equalization
  - CDMA transmission
  - television receivers
  - seismic pattern identification
- ICA'99 and ICA 2000 conferences
  - machine vibration analysis
  - astronomical images
  - …
Michel Verleysen Altran 18/11/2002 - 85
Cocktail party
- Speech – music separation (observations → estimations)
- Speech – speech separation (observations → estimations)
T.-W. Lee, Institute for Neural Computation, University of California (San Diego), http://www.cnl.salk.edu/~tewon/Blind/blind_audio.html
Michel Verleysen Altran 18/11/2002 - 86
Some realizations…
- extraction of artefacts from EEG
R. Vigario, V. Jousmäki, M. Hämäläinen, R. Hari, E. Oja, Independent component analysis for identification of artifacts in magnetoencephalographic recordings, NIPS'97, pp. 229-235, MIT Press, 1998.
Michel Verleysen Altran 18/11/2002 - 87
Some realizations…
- extraction of artefacts from EEG
R. Vigario, V. Jousmäki, M. Hämäläinen, R. Hari, E. Oja, Independent component analysis for identification of artifacts in magnetoencephalographic recordings, NIPS'97, pp. 229-235, MIT Press, 1998.