Counter Propagation Network
TRANSCRIPT
Counter Propagation Network (CPN) (§ 5.3)
• Basic idea of CPN
– Purpose: fast and coarse approximation of a vector mapping y = φ(x)
• not to map any given x to its φ(x) with a given precision;
• input vectors x are divided into clusters/classes;
• each cluster of x has one output y, which is (hopefully) the average of φ(x) for all x in that class.
– Architecture: simple case: FORWARD-ONLY CPN
[Figure: forward-only CPN. Input layer x_1 ... x_n; hidden (class) layer z_1 ... z_p; output layer y_1 ... y_m. Weights w_{k,i} from input to hidden (class); weights v_{j,k} from hidden (class) to output.]
– Learning in two phases, given training samples (x, d), where d = φ(x) is the desired precise mapping.
– Phase 1: the weights w_k coming into the hidden nodes are trained by competitive learning to become the representative vector of a cluster of input vectors x (use only x, the input part of (x, d)):
1. For a chosen x, feedforward to determine the winning z_{k*}.
2. Update the winner's incoming weights (η is the learning rate):
   w_{k*,i}(new) = w_{k*,i}(old) + η (x_i − w_{k*,i}(old))
3. Reduce η, then repeat steps 1 and 2 until the stop condition is met.
– Phase 2: the weights v_k going out of the hidden nodes are trained by the delta rule to be an average output of φ(x), where x is an input vector that causes z_{k*} to win (use both x and d):
1. For a chosen x, feedforward to determine the winning z_{k*}.
2. (optional) w_{k*,i}(new) = w_{k*,i}(old) + η (x_i − w_{k*,i}(old))
3. v_{j,k*}(new) = v_{j,k*}(old) + η (d_j − v_{j,k*}(old))
4. Repeat steps 1 – 3 until the stop condition is met.
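To make the two-phase procedure concrete, here is a minimal NumPy sketch. The class name CPN, the method names, and the decaying learning-rate schedule are illustrative assumptions, not from the slides; the winner is picked by minimum Euclidean distance, one common choice for the competitive layer.

```python
import numpy as np

class CPN:
    """A minimal forward-only CPN sketch (names and schedule are assumptions)."""

    def __init__(self, n_in, n_cluster, n_out, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        self.W = self.rng.normal(size=(n_cluster, n_in))  # input -> hidden (class)
        self.V = np.zeros((n_out, n_cluster))             # hidden (class) -> output

    def winner(self, x):
        # Competitive layer: the hidden node whose weight vector is closest to x wins.
        return np.argmin(np.linalg.norm(self.W - x, axis=1))

    def train_phase1(self, X, eta=0.5, epochs=20, decay=0.9):
        # Unsupervised: move the winner's incoming weights toward x.
        for _ in range(epochs):
            for x in X:
                k = self.winner(x)
                self.W[k] += eta * (x - self.W[k])
            eta *= decay  # "reduce eta, then repeat"

    def train_phase2(self, X, D, eta=0.5, epochs=20, decay=0.9):
        # Supervised (delta rule): move the winner's outgoing weights toward d.
        for _ in range(epochs):
            for x, d in zip(X, D):
                k = self.winner(x)
                self.V[:, k] += eta * (d - self.V[:, k])
            eta *= decay

    def predict(self, x):
        z = np.zeros(self.W.shape[0])
        z[self.winner(x)] = 1.0   # winner-take-all hidden activation
        return self.V @ z         # output is the winner's outgoing weights v_{j,k*}
```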
Notes
• A combination of both unsupervised learning (for w_k in phase 1) and supervised learning (for v_k in phase 2).
• After phase 1, clusters are formed among the sample inputs x; each w_k is a representative (average) of one cluster.
• After phase 2, each cluster k maps to an output vector y, which is the average of {φ(x) : x ∈ cluster k}.
• View phase 2 learning as following the delta rule:
  Δv_{k*,j} = η (d_j − v_{k*,j}) = η (d_j − v_{k*,j} z_{k*}) z_{k*}, because z_{k*} = 1.
  With E = (d_j − v_{k*,j} z_{k*})², we have ∂E/∂v_{k*,j} = −2 (d_j − v_{k*,j} z_{k*}) z_{k*}, so the update is a gradient-descent step on E (the constant 2 is absorbed into η).
• It can be shown that, when t → ∞, w_k(t) → ⟨x⟩ and v_k(t) → ⟨φ(x)⟩, where ⟨x⟩ is the mean of all training samples that make z_{k*} win.
  We show this only for w_k (the proof for v_k is similar). The weight update rule can be rewritten as
  w_{k*,i}(t+1) = w_{k*,i}(t) + η (x_i(t+1) − w_{k*,i}(t)) = (1 − η) w_{k*,i}(t) + η x_i(t+1).
  Unrolling the recursion,
  w_{k*,i}(t+1) = η x_i(t+1) + η (1 − η) x_i(t) + η (1 − η)² x_i(t−1) + ... + (1 − η)^{t+1} w_{k*,i}(0).
  If the x are drawn randomly from the training set, then
  E[w_{k*,i}(t+1)] = η [1 + (1 − η) + ... + (1 − η)^t] E[x_i] + (1 − η)^{t+1} w_{k*,i}(0)
  = [1 − (1 − η)^{t+1}] E[x_i] + (1 − η)^{t+1} w_{k*,i}(0) → E[x_i] as t → ∞ (for 0 < η < 1).
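This limit can also be checked numerically by iterating the rewritten update on randomly drawn samples; the sketch below (all values illustrative) shows w drifting to the sample mean, since the update is an exponentially weighted running average.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=10_000)  # one cluster, mean 3

w, eta = 0.0, 0.1
for x in samples:
    w = (1 - eta) * w + eta * x   # the rewritten update rule

print(w)  # close to 3 = E[x]
```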
• After training, the network works like the look-up of a math table.
– For any input x, find the region where x falls (represented by the winning z node);
– use the region as the index to look up the table for the function value (see the usage sketch below).
– CPN works in multi-dimensional input space.
– The more cluster nodes (z), the more accurate the mapping.
– Training is much faster than BP.
– May have the linear separability problem.
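Continuing the CPN sketch above, a possible usage on a 1-D mapping illustrates the table look-up behavior (the target function and all parameters are assumed for illustration):

```python
import numpy as np

# Approximate y = sin(x) on [0, pi] with the CPN sketch above.
X = np.linspace(0, np.pi, 200).reshape(-1, 1)
D = np.sin(X)

net = CPN(n_in=1, n_cluster=10, n_out=1, rng=np.random.default_rng(1))
net.train_phase1(X)
net.train_phase2(X, D)

print(net.predict(np.array([1.0])))  # piecewise-constant estimate of sin(1.0)
# More cluster nodes -> finer regions -> a more accurate table look-up.
```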
Full CPN
• If both the function y = φ(x) and its inverse x = φ⁻¹(y) exist, we can establish a bi-directional approximation.
• Two pairs of weight matrices:
  W (x to z) and V (z to y) for the approximate map from x to y = φ(x);
  U (y to z) and T (z to x) for the approximate map from y to x = φ⁻¹(y).
• When a training sample (x, y) is applied (x on X and y on Y), the two sides can jointly determine the winner z_{k*}, or determine z_{k*(x)} and z_{k*(y)} separately.
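A full CPN can be outlined as two forward-only halves sharing one cluster layer. The sketch below is a rough, assumption-laden outline (class and method names are mine; it trains both phases in one pass for brevity and uses joint winner selection):

```python
import numpy as np

class FullCPN:
    """Rough full-CPN outline: bidirectional approximation via a shared cluster layer."""

    def __init__(self, n_x, n_cluster, n_y, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.W = rng.normal(size=(n_cluster, n_x))  # x -> z
        self.U = rng.normal(size=(n_cluster, n_y))  # y -> z
        self.V = np.zeros((n_y, n_cluster))         # z -> y
        self.T = np.zeros((n_x, n_cluster))         # z -> x

    def winner(self, x, y):
        # Joint winner: x and y together select z_k* (they may also be used separately).
        d = np.linalg.norm(self.W - x, axis=1) + np.linalg.norm(self.U - y, axis=1)
        return np.argmin(d)

    def train(self, X, Y, eta=0.5, epochs=20, decay=0.9):
        for _ in range(epochs):
            for x, y in zip(X, Y):
                k = self.winner(x, y)
                self.W[k] += eta * (x - self.W[k])        # competitive side, x
                self.U[k] += eta * (y - self.U[k])        # competitive side, y
                self.V[:, k] += eta * (y - self.V[:, k])  # z -> y (approximate phi)
                self.T[:, k] += eta * (x - self.T[:, k])  # z -> x (approximate phi^-1)
            eta *= decay

    def x_to_y(self, x):
        return self.V[:, np.argmin(np.linalg.norm(self.W - x, axis=1))]

    def y_to_x(self, y):
        return self.T[:, np.argmin(np.linalg.norm(self.U - y, axis=1))]
```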
Adaptive Resonance Theory (ART) (§ 5.4)
• ART1: for binary patterns; ART2: for continuous patterns
• Motivations: previous methods have the following problems:
1. The number of class nodes is pre-determined and fixed.
– Under- and over-classification may result from training.
– Some nodes may end up with empty classes.
– There is no control over the degree of similarity of the inputs grouped in one class.
2. Training is non-incremental:
– it assumes a fixed set of samples;
– adding new samples often requires re-training the network on the enlarged training set until a new stable state is reached.
• Ideas of the ART model:
– Suppose the input samples have been appropriately classified into k clusters (say, by some fashion of competitive learning).
– Each weight vector w_j is a representative (average) of all samples in that cluster.
– When a new input vector x arrives:
1. Find the winner j* among all k cluster nodes.
2. Compare w_{j*} with x:
   if they are sufficiently similar (x resonates with class j*), then update w_{j*} based on |x − w_{j*}|;
   else, find/create a free class node and make x its first member.
• To achieve these, we need:
– a mechanism for testing and determining the (dis)similarity between x and w_{j*};
– a control for finding/creating new class nodes;
– all operations implemented by units of local computation.
• Only the basic ideas are presented here:
– simplified from the original ART model;
– some of the control mechanisms, realized in the original by various specialized neurons, are done here by logic statements in the algorithm.
ART1 Architecture
– x: input (input vectors)
– y: output (classes)
– b_{ij}: bottom-up weights from x_i to y_j (real values)
– t_{ji}: top-down weights from y_j to x_i (binary/bipolar)
– ρ: vigilance parameter for similarity comparison (0 < ρ ≤ 1)
Working of ART1
• Three phases after each input vector x is applied.
• Recognition phase: determine the winner cluster for x.
– Uses the bottom-up weights b.
– The winner j* has the maximal y_{j*} = b_{j*} · x.
– x is tentatively classified to cluster j*.
– The winner may still be far away from x (e.g., |t_{j*} − x| is unacceptably large).
Working of ART1 (3 phases)
• Comparison phase:
– Compute the similarity vector using the top-down weights t:
  s* = (s_1*, ..., s_n*), where s_l* = 1 if both t_{j*,l} and x_l are 1, and 0 otherwise.
– If (# of 1's in s*)/(# of 1's in x) > ρ, accept the classification and update b_{j*} and t_{j*}.
– Else: remove j* from further consideration, and look for another potential winner or create a new node with x as its first pattern.
• Weight update/adaptation phase
– Initial weights (no bias):
  bottom-up: b_{l,j}(0) = 1/(1 + n); top-down: t_{j,l}(0) = 1.
– When a resonance occurs with j*, update b_{j*} and t_{j*}:
  b_{l,j*}(new) = t_{l,j*}(old) x_l / (0.5 + Σ_{i=1}^{n} t_{i,j*}(old) x_i) = s_l* / (0.5 + Σ_{i=1}^{n} s_i*)
  t_{j*,l}(new) = t_{j*,l}(old) x_l = s_l*
  (b_{j*}(new) is a normalized s*: b_{l,j*}(new) = 0 iff s_l* = 0.)
– If k sample patterns have been clustered to node j, then
  t_j(new) = t_j(0) ∧ x(1) ∧ x(2) ∧ ... ∧ x(k) = x(1) ∧ x(2) ∧ ... ∧ x(k),
  the pattern whose 1's are common to all these k samples.
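Putting the three phases together, a minimal sketch of ART1 follows, with the control mechanisms written as logic statements as the simplified model suggests. The function name art1 and the list-based bookkeeping are my own; committing a new node with t = x is equivalent to starting from the all-ones t(0) and resonating once. Binary, non-zero input patterns are assumed.

```python
import numpy as np

def art1(patterns, rho):
    """Cluster binary patterns with ART1-style logic (a sketch, names assumed)."""
    b, t, labels = [], [], []   # bottom-up / top-down weights per committed node
    for x in patterns:
        candidates = list(range(len(b)))
        while True:
            if not candidates:                  # no acceptable node: commit a new one
                t.append(x.copy())              # t(0) = 1s, then t AND x = x
                b.append(x / (0.5 + x.sum()))
                labels.append(len(b) - 1)
                break
            # Recognition phase: winner = max bottom-up activation b_j . x
            j = max(candidates, key=lambda j: b[j] @ x)
            # Comparison phase: s* = t_j AND x, vigilance test |s*| / |x| > rho
            s = t[j] * x
            if s.sum() / x.sum() > rho:
                t[j] = s                        # resonance: update node j
                b[j] = s / (0.5 + s.sum())
                labels.append(j)
                break
            candidates.remove(j)                # reset: inhibit j, continue the search
    return labels, b, t
```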
• Example
  Input patterns:
  x(1) = (1, 1, 0, 0, 0, 0, 1)
  x(2) = (0, 0, 1, 1, 1, 1, 0)
  x(3) = (1, 0, 1, 1, 1, 1, 0)
  x(4) = (0, 0, 0, 1, 1, 1, 0)
  x(5) = (1, 1, 0, 1, 1, 1, 0)
  ρ = 0.7, n = 7; initially t_{j,l}(0) = 1 and b_{l,j}(0) = 1/8.
  For input x(1): Node 1 wins.
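Feeding these five patterns (with ρ = 0.7) to the art1 sketch above should reproduce the search behavior described here, with x(1) claiming the first committed node:

```python
import numpy as np

X = np.array([
    [1, 1, 0, 0, 0, 0, 1],   # x(1)
    [0, 0, 1, 1, 1, 1, 0],   # x(2)
    [1, 0, 1, 1, 1, 1, 0],   # x(3)
    [0, 0, 0, 1, 1, 1, 0],   # x(4)
    [1, 1, 0, 1, 1, 1, 0],   # x(5)
], dtype=float)

labels, b, t = art1(X, rho=0.7)
print(labels)   # cluster assignment of each pattern, in presentation order
```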
Notes
1. Classification is a search process.
2. No two classes have the same b and t.
3. Outliers that do not belong to any cluster will be assigned separate nodes.
4. Different orderings of sample input presentation may result in different classifications.
5. Increasing ρ increases the number of classes learned and decreases the average class size.
6. Classification may shift during the search, but will reach stability eventually.
7. There are different versions of ART1 with minor variations.
8. ART2 is the same in spirit but differs in the details.
ART1 Architecture
[Figure: external input s = (s_1, ..., s_n) feeds the input layer F1(a); the interface layer F1(b) holds x = (x_1, ..., x_n); the cluster layer F2 holds y = (y_1, ..., y_m); control units R, G1, and G2 connect to the layers with excitatory (+) and inhibitory (−) links.]
– F1(a): input units
– F1(b): interface units
– F2: cluster units
– F1(a) to F1(b): pair-wise connection; between F1(b) and F2: full connection
– b_{ij}: bottom-up weights from x_i to y_j (real values)
– t_{ji}: top-down weights from y_j to x_i, representing class j (binary/bipolar)
– R, G1, G2: control units
• F2 cluster units: competitive; receive the input vector x through the weights b to determine the winner j.
• F1(a) input units: placeholders for the external inputs.
• F1(b) interface units:
– pass s to x as the input vector for classification by F2;
– compare x and t_j (the projection from the winner y_j);
– controlled by gain control unit G1.
• The three phases must be sequenced by the control units G1, G2, and R (see the sketch after this list):
– Nodes in both F1(b) and F2 obey the 2/3 rule (output 1 if two of their three inputs are 1).
– Input to F1(b): s_i, G1, and t_{ji}; input to F2: the bottom-up signals via b_{ij}, G2, and R.
– G1 = 1 if ||s|| ≠ 0 and ||y|| = 0; 0 otherwise.
  With G1 = 1, F1(b) is open to receive s; with G1 = 0, F1(b) is open for t_J from the winner J.
– G2 = 1 if ||s|| ≠ 0; 0 otherwise. G2 = 1 signals the start of a new classification for a new input.
– R = 0 if ||x||/||s|| > ρ; 1 otherwise (ρ: vigilance parameter).
  R = 0: resonance occurs; update b_J and t_J.
  R = 1: x fails the similarity test; inhibit J from further computation.
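Expressed as the logic statements the simplified algorithm uses in place of these control neurons, the signals could read as follows (a sketch; the function names are mine, and sums count the 1's in a binary vector):

```python
def G1(s, y):
    # Open F1(b) to the input s only while no F2 node is active.
    return 1 if s.sum() != 0 and y.sum() == 0 else 0

def G2(s):
    # A nonzero input signals the start of a new classification.
    return 1 if s.sum() != 0 else 0

def R(x, s, rho):
    # 0: resonance (similarity test passed); 1: reset, inhibit the winner J.
    return 0 if x.sum() / s.sum() > rho else 1
```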