On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)
Huicheng Zheng, Christophe Laurent and Grégoire Lefebvre
13th September 2006
Thanks to the MUSCLE Internal Fellowship (http://www.muscle-noe.org).
ICANN’06
Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions
Motivation of ASSOM
• Learning “invariance classes” with subspace learning and SOM [Kohonen, T., et al., 1997]
  – For example: spatial-translation invariance classes such as rectangles, circles, triangles, …
Applications of ASSOM
• Invariant feature formation [Kohonen, T., et al., 1997]
• Speech processing [Hase, H., et al., 1996]
• Texture segmentation [Ruiz del Solar, J., 1998]
• Image retrieval [De Ridder, D., et al., 2000]
• Image classification [Zhang, B., et al., 1999]
ASSOM Modules Representing Subspaces
[Figure: the module arrays in ASSOM, shown with rectangular and hexagonal topologies.]
[Figure: a module representing the subspace $L^{(j)}$: the input $\mathbf{x}$ is projected onto the basis vectors $\mathbf{b}_1, \ldots, \mathbf{b}_H$ (inner products $\mathbf{b}_1^{\mathsf T}\mathbf{x}, \ldots, \mathbf{b}_H^{\mathsf T}\mathbf{x}$), and the module outputs the squared projection norm $\|\hat{\mathbf{x}}_{L^{(j)}}\|^2$.]
Competition and Adaptation
• Repeatedly:
  – Competition: the winner

    $c = \arg\max_j \|\hat{\mathbf{x}}_{L^{(j)}}\|^2$

  – Adaptation: for the winner $c$ and the modules $i$ in its neighborhood,

    $\mathbf{b}_h^{(i)} = \mathbf{P}_c^{(i)}(\mathbf{x}, t) \, \mathbf{b}_h^{(i)\prime}$

    where $\mathbf{b}_h^{(i)\prime}$ is the basis vector before updating and the rotation operator is the N×N matrix

    $\mathbf{P}_c^{(i)}(\mathbf{x}, t) = \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \, \dfrac{\mathbf{x}\mathbf{x}^{\mathsf T}}{\|\hat{\mathbf{x}}_{L^{(i)}}\| \, \|\mathbf{x}\|}$

  – Orthonormalize the basis vectors.
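For concreteness, a minimal numpy sketch of one competition/adaptation step follows; the data layout, helper names and parameters (`B`, `lam`, `h`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch: B is a list of M x N arrays, one per module,
# whose rows are the (orthonormal) basis vectors of that module.

def winner(B, x):
    # Competition: c = argmax_j ||x_hat_L(j)||^2. With orthonormal rows,
    # the squared projection norm is sum_h (b_h^T x)^2.
    return int(np.argmax([np.sum((Bi @ x) ** 2) for Bi in B]))

def rotate(Bi, x, lam, h, eps=1e-12):
    # Traditional rule: b_h <- P b_h, with the N x N rotation operator
    # P = I + lam * h * (x x^T) / (||x_hat_L(i)|| * ||x||).
    x_hat_norm = np.linalg.norm(Bi @ x)
    P = np.eye(len(x)) + lam * h * np.outer(x, x) / (x_hat_norm * np.linalg.norm(x) + eps)
    Bi = Bi @ P                  # P is symmetric, so this applies P to each row
    Q, _ = np.linalg.qr(Bi.T)    # orthonormalize the basis vectors
    return Q.T
```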
Transformation Invariance
• Episodes correspond to signal subspaces.
• Example:
  – One episode, S, consists of 8 vectors, each translated in time with respect to the others.
Episode Learning
• Episode winner:

  $c = \arg\max_j \sum_{s \in S} \|\hat{\mathbf{x}}_{L^{(j)}}(s)\|^2$

• Adaptation: for each sample $\mathbf{x}(s)$ in the episode $X = \{\mathbf{x}(s), s \in S\}$:
  – Rotate the basis vectors:

    $\mathbf{b}_h^{(i)} = \mathbf{P}_c^{(i)}(\mathbf{x}(s), t) \, \mathbf{b}_h^{(i)\prime}$

  – Orthonormalize the basis vectors.
Deficiency of the Traditional Learning Rule
• The rotation operator $\mathbf{P}_c^{(i)}(\mathbf{x}(s), t)$ is an N×N matrix.
  – N: input vector dimension
• Approximately, NOP (number of operations) ∝ MN², since each of the M basis vectors must be multiplied by an N×N matrix.
  – M: subspace dimension
Efforts in the Literature
• Adaptive Subspace Map (ASM) [De Ridder, D., et al., 2000]:
  – Drops topological ordering
  – Performs batch-mode updating with PCA
  – Essentially not ASSOM.
• Replace the basis updating rule [McGlinchey, S.J., Fyfe, C., 1998]:
  – NOP ∝ M²N
Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions
Minimization of the ASSOM Objective Function

The ASSOM objective function:

$E = \int \sum_i h_c^{(i)} \sum_{s \in S} \frac{\|\tilde{\mathbf{x}}_{L^{(i)}}(s)\|^2}{\|\mathbf{x}(s)\|^2} \, P(X) \, dX$

where $\tilde{\mathbf{x}}_{L^{(i)}} = \mathbf{x} - \hat{\mathbf{x}}_{L^{(i)}}$ (the projection error) and $P(X)$ is the probability density function of $X$.

Solution: stochastic gradient descent:

$\mathbf{b}_h^{(i)} = \left[ \mathbf{I} + 2\lambda(t) \, h_c^{(i)}(t) \sum_{s \in S} \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\mathbf{x}(s)\|^2} \right] \mathbf{b}_h^{(i)\prime}$

$\lambda(t)$: learning rate function.
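As a sanity check that is not spelled out on the slide, the gradient behind this update can be written out in one line:

```latex
% With orthonormal basis vectors, the projection error satisfies
\[
  \|\tilde{\mathbf{x}}_{L^{(i)}}(s)\|^{2}
    = \|\mathbf{x}(s)\|^{2} - \sum_{h}\bigl(\mathbf{b}_h^{(i)\mathsf{T}}\mathbf{x}(s)\bigr)^{2},
  \qquad
  \frac{\partial\,\|\tilde{\mathbf{x}}_{L^{(i)}}(s)\|^{2}}{\partial\,\mathbf{b}_h^{(i)}}
    = -2\,\bigl(\mathbf{b}_h^{(i)\mathsf{T}}\mathbf{x}(s)\bigr)\,\mathbf{x}(s),
\]
% so one stochastic-gradient step on E with rate lambda(t) gives
\[
  \mathbf{b}_h^{(i)} \leftarrow \mathbf{b}_h^{(i)}
    + 2\lambda(t)\,h_c^{(i)}(t)\sum_{s\in S}
      \frac{\mathbf{x}(s)\,\mathbf{x}(s)^{\mathsf{T}}}{\|\mathbf{x}(s)\|^{2}}\,
      \mathbf{b}_h^{(i)},
\]
% which is exactly the bracketed operator above applied to b_h^(i)'.
```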
Minimization of the ASSOM Objective Function

When $\lambda(t)$ is small, applying the per-sample rotations in sequence approximates the summed gradient step:

$\prod_{s \in S} \left[ \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\mathbf{x}(s)\|^2} \right] \approx \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \sum_{s \in S} \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\mathbf{x}(s)\|^2}$

In practice, better stability has been observed with the modified form proposed in [Kohonen, T., et al., 1997]:

$\mathbf{M}_c^{(i)}(t) = \prod_{s \in S} \left[ \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|} \right]$
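The approximation is the usual first-order expansion of the product, shown here for completeness:

```latex
% Writing A(s) = h_c^(i)(t) x(s) x(s)^T / ||x(s)||^2, a first-order
% expansion in lambda(t) gives
\[
  \prod_{s\in S}\bigl(\mathbf{I} + \lambda(t)\,\mathbf{A}(s)\bigr)
    = \mathbf{I} + \lambda(t)\sum_{s\in S}\mathbf{A}(s)
      + O\bigl(\lambda(t)^{2}\bigr),
\]
% so the sequential per-sample operators agree with the summed gradient
% step up to second-order terms in lambda(t).
```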
Minimization of the ASSOM Objective Function

• $\mathbf{M}_c^{(i)}(t)$ corresponds to a modified objective function:

$E_m = -\int \sum_i h_c^{(i)} \sum_{s \in S} \frac{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|}{\|\mathbf{x}(s)\|} \, P(X) \, dX$

• Solution to $E_m$:

$\mathbf{B}_c^{(i)}(t) = \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \sum_{s \in S} \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$

• When $\lambda(t)$ is small: $\mathbf{M}_c^{(i)}(t) \approx \mathbf{B}_c^{(i)}(t)$
Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions
Insight on the Basis Vector Rotation
• Recall the traditional learning rule:

$\mathbf{b}_h^{(i)} = \mathbf{P}_c^{(i)}(\mathbf{x}(s), t) \, \mathbf{b}_h^{(i)\prime}$

$\mathbf{P}_c^{(i)}(\mathbf{x}(s), t) = \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$
Insight on the Basis Vector Rotation

Expanding the rotation gives

$\mathbf{b}_h^{(i)} = \mathbf{b}_h^{(i)\prime} + \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s) \, \mathbf{x}(s)^{\mathsf T} \mathbf{b}_h^{(i)\prime}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$

where $\mathbf{x}(s)^{\mathsf T} \mathbf{b}_h^{(i)\prime}$ is a scalar projection, so that

$\mathbf{b}_h^{(i)} = \mathbf{b}_h^{(i)\prime} + \alpha_{c,h}^{(i)}(s,t) \, \mathbf{x}(s)$

with the scalar

$\alpha_{c,h}^{(i)}(s,t) = \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s)^{\mathsf T} \mathbf{b}_h^{(i)\prime}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$

• For fast computing, calculate $\alpha_{c,h}^{(i)}(s,t)$ first, then scale $\mathbf{x}(s)$ by $\alpha_{c,h}^{(i)}(s,t)$ to get $\mathbf{b}_h^{(i)}$.
• NOP ∝ MN
• Referred to as FL-ASSOM (Fast-Learning ASSOM); a sketch follows below.
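Continuing the numpy sketch under the same illustrative assumptions, the FL-ASSOM update computes one scalar per basis vector and adds a scaled copy of $\mathbf{x}(s)$, so no N×N matrix is ever formed:

```python
def rotate_fast(Bi, x, lam, h, eps=1e-12):
    # FL-ASSOM: b_h <- b_h' + alpha_h * x, with
    # alpha_h = lam * h * (x^T b_h') / (||x_hat_L(i)|| * ||x||).
    x_hat_norm = np.linalg.norm(Bi @ x)
    alphas = lam * h * (Bi @ x) / (x_hat_norm * np.linalg.norm(x) + eps)
    Bi = Bi + np.outer(alphas, x)   # M*N operations instead of M*N^2
    Q, _ = np.linalg.qr(Bi.T)       # orthonormalize
    return Q.T
```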
Insight on the Basis Vector Rotation
[Figure: geometric illustration of the update $\mathbf{b}_h^{(i)} = \mathbf{b}_h^{(i)\prime} + \alpha_{c,h}^{(i)}(s,t) \, \mathbf{x}(s)$: the new basis vector is the old one plus a scaled copy of $\mathbf{x}(s)$.]
Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions
Batch-mode Fast Learning (BFL-ASSOM)
• Motivation: re-use the norms $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ previously calculated during module competition in the scalar

$\alpha_{c,h}^{(i)}(s,t) = \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s)^{\mathsf T} \mathbf{b}_h^{(i)\prime}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$

• In the basic ASSOM, $L^{(i)}$ keeps changing as each component vector $\mathbf{x}(s)$ is received, so $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ has to be re-calculated for each $\mathbf{x}(s)$.
Batch-mode Rotation
• Use the solution to the modified objective function $E_m$:

$\mathbf{B}_c^{(i)}(t) = \mathbf{I} + \lambda(t) \, h_c^{(i)}(t) \sum_{s \in S} \frac{\mathbf{x}(s)\mathbf{x}(s)^{\mathsf T}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$

• The subspace remains the same for all component vectors in the episode, so the $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ calculated during module competition can now be re-used.
Batch-mode Fast Learning

$\mathbf{b}_h^{(i)} = \mathbf{b}_h^{(i)\prime} + \sum_{s \in S} \alpha_{c,h}^{(i)}(s,t) \, \mathbf{x}(s)$

where $\alpha_{c,h}^{(i)}(s,t)$ is the scalar defined by

$\alpha_{c,h}^{(i)}(s,t) = \lambda(t) \, h_c^{(i)}(t) \, \frac{\mathbf{x}(s)^{\mathsf T} \mathbf{b}_h^{(i)\prime}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\| \, \|\mathbf{x}(s)\|}$

• The correction to $\mathbf{b}_h^{(i)}$ is a linear combination of the component vectors $\mathbf{x}(s)$ in the episode.
• For each episode, one orthonormalization of the basis vectors is enough; a sketch follows below.
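A sketch of the batch-mode update for one module, continuing the numpy example; the norms $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ are assumed to have been stored during competition and are passed in rather than recomputed:

```python
def rotate_batch(Bi, X, x_hat_norms, lam, h, eps=1e-12):
    # BFL-ASSOM: accumulate the correction sum_s alpha_h(s) * x(s) using the
    # pre-episode basis Bi throughout, then orthonormalize once per episode.
    delta = np.zeros_like(Bi)
    for x, xh in zip(X, x_hat_norms):
        alphas = lam * h * (Bi @ x) / (xh * np.linalg.norm(x) + eps)
        delta += np.outer(alphas, x)
    Q, _ = np.linalg.qr((Bi + delta).T)   # single orthonormalization
    return Q.T
```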
Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions
Experimental Demonstration
• Emergence of translation-invariant filters:
  – Episodes are drawn from a colored noise image.
  – Vectors in episodes are subject to translation.
[Figure: a white noise image and the colored noise image derived from it; an example episode (magnified).]
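As a rough illustration of how such translated episodes can be generated, here is a 1-D analogue under assumed parameters (window size, number of vectors, shift range), not the authors' exact setup:

```python
rng = np.random.default_rng(0)

# Crude colored noise: low-pass filtered white noise.
signal = np.convolve(rng.standard_normal(10_000), np.ones(16) / 16, mode="same")

def draw_episode(signal, dim=64, n_vectors=8, max_shift=8):
    # Each vector is the same window, translated by a random shift.
    start = int(rng.integers(max_shift, len(signal) - dim - max_shift))
    shifts = rng.integers(-max_shift, max_shift + 1, size=n_vectors)
    return np.stack([signal[start + d : start + d + dim] for d in shifts])
```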
Resulting Filters
[Figure: the basis vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ learned by FL-ASSOM and by BFL-ASSOM.]
[Figure: decrease of the average projection error e with learning step t (up to 30×10³ steps) for FL-ASSOM and BFL-ASSOM.]
Timing Results
[Table: vector updating (VU) and whole learning (WL) times, in seconds, for 1,000 training steps at various M and N.]
M: subspace dimension
N: input vector dimension
VU: vector updating time
WL: whole learning time
Timing Results
[Figure: change of vector updating time VU (s) with input dimension N ∈ {50, 100, 200, 400} for ASSOM, FL-ASSOM and BFL-ASSOM.]
[Figure: change of vector updating time VU (s) with subspace dimension M ∈ {2, 3, 4} for ASSOM, FL-ASSOM and BFL-ASSOM.]
The vertical scales of FL-ASSOM and BFL-ASSOM have been magnified 10 times for clarity.
Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions
Conclusions
• The basic ASSOM algorithm corresponds to a modified objective function.
• Updating the basis vectors in the basic ASSOM corresponds to a scaling of the component vectors in the input episode.
• In batch-mode updating, the correction to the basis vectors is a linear combination of the component vectors in the input episode.
• Basis learning can be dramatically accelerated by exploiting these observations.
References
• De Ridder, D., et al., 2000: The adaptive subspace map for image description and image database retrieval. SSPR&SPR 2000.
• Hase, H., et al., 1996: Speech signal processing using Adaptive Subspace SOM (ASSOM). Technical Report NC95-140, The Institute of Electronics, Information and Communication Engineers, Tottori University, Koyama, Japan.
• Kohonen, T., et al., 1997: Self-organized formation of various invariant-feature filters in the adaptive-subspace SOM. Neural Computation 9(6).
• McGlinchey, S.J., Fyfe, C., 1998: Fast formation of invariant feature maps. EUSIPCO’98.
• Ruiz del Solar, J., 1998: Texsom: texture segmentation using Self-Organizing Maps. Neurocomputing 21(1–3).
• Zhang, B., et al., 1999: Handwritten digit recognition by adaptive-subspace self-organizing map (ASSOM). IEEE Trans. on Neural Networks 10(4).
Thanks and questions?