
Page 1: Unsupervised Learning

Unsupervised Learning

G.Anuradha

Page 2: Unsupervised Learning

Contents

• Introduction

• Competitive Learning networks

• Kohonen self-organizing networks

• Learning vector quantization

• Hebbian learning

Page 3: Unsupervised Learning


The main property of a neural network is an ability to learn from its environment, and to improve its performance through learning. So far we have considered supervised or active learning: learning with an external “teacher” or a supervisor who presents a training set to the network. But another type of learning also exists: unsupervised learning.

Introduction

Page 4: Unsupervised Learning


In contrast to supervised learning, unsupervised or self-organised learning does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories. Unsupervised learning tends to follow the neuro-biological organisation of the brain.

Unsupervised learning algorithms aim to learn rapidly and can be used in real-time.

Page 5: Unsupervised Learning


In 1949, Donald Hebb proposed one of the key ideas in biological learning, commonly known as Hebb’s Law. Hebb’s Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.

Hebbian learning

Page 6: Unsupervised Learning


Hebb’s Law can be represented in the form of two rules:

1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.

2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.

Hebb’s Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.
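As a minimal illustration of these two rules, the sketch below applies an outer-product Hebbian update; with signed activations (+1 for an active neuron, −1 for an inactive one) the same update strengthens synchronously active pairs and weakens asynchronously active ones. The function name, the learning rate, and the use of signed activations are illustrative assumptions, not part of the slides.

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.1):
    """One Hebbian update: delta w_ij = eta * x_i * y_j.

    w : (n_pre, n_post) weight matrix
    x : (n_pre,)  presynaptic activations (+1 active, -1 inactive)
    y : (n_post,) postsynaptic activations (+1 active, -1 inactive)
    """
    return w + eta * np.outer(x, y)   # sign of x_i * y_j decides increase or decrease

w = np.zeros((2, 1))
x = np.array([+1.0, -1.0])    # input neuron 0 active, input neuron 1 inactive
y = np.array([+1.0])          # output neuron active
w = hebbian_update(w, x, y)
print(w)  # weight from the synchronously active input grows; the other shrinks
```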

Page 7: Unsupervised Learning


In competitive learning, neurons compete among themselves to be activated.

While in Hebbian learning, several output neurons can be activated simultaneously, in competitive learning only a single output neuron is active at any time.

The output neuron that wins the “competition” is called the winner-takes-all neuron.

Competitive learning

Page 8: Unsupervised Learning


The basic idea of competitive learning was introduced in the early 1970s.

In the late 1980s, Teuvo Kohonen introduced a special class of artificial neural networks called self-organizing feature maps. These maps are based on competitive learning.

Page 9: Unsupervised Learning


Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses. The cortex includes areas that are responsible for different human activities (motor, visual, auditory, somatosensory, etc.) and associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organizing computational map in the human brain.

What is a self-organizing feature map?

Page 10: Unsupervised Learning

Kohonen Self-Organizing Feature Maps

• Feature mapping converts a wide, high-dimensional pattern space into a typical, lower-dimensional feature space

• Apart from reducing the dimensionality, it has to preserve the neighborhood relations of the input patterns

Page 11: Unsupervised Learning


Feature-mapping Kohonen model

[Figure: two network diagrams, (a) and (b), each showing an input layer connected to a Kohonen layer]

Page 12: Unsupervised Learning


The Kohonen network

The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer.

Training in the Kohonen network begins with the winner’s neighborhood of a fairly large size. Then, as training proceeds, the neighborhood size gradually decreases.

Page 13: Unsupervised Learning

Model: Horizontal & Vertical linesRumelhart & Zipser, 1985

• Problem – identify vertical or horizontal signals

• Inputs are 6 x 6 arrays

• Intermediate layer with 8 units

• Output layer with 2 units

• Cannot work with one layer

Page 14: Unsupervised Learning

Rumelhart & Zipser, Cntd

H V

Page 15: Unsupervised Learning

Geometrical Interpretation

• So far the ordering of the output units themselves was not necessarily informative

• The location of the winning unit can give us information regarding similarities in the data

• We are looking for an input-output mapping that preserves the topological properties of the inputs (feature mapping)

• Given any two spaces, it is not guaranteed that such a mapping exists!

Page 16: Unsupervised Learning


In the Kohonen network, a neuron learns by shifting its weights from inactive connections to active ones. Only the winning neuron and its neighbourhood are allowed to learn. If a neuron does not respond to a given input pattern, then learning cannot occur in that particular neuron.

The competitive learning rule defines the change Δwij applied to synaptic weight wij as

$$\Delta w_{ij} = \begin{cases} \eta\,(x_i - w_{ij}), & \text{if neuron } j \text{ wins the competition} \\ 0, & \text{if neuron } j \text{ loses the competition} \end{cases}$$

where xi is the input signal and η is the learning rate parameter.
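A minimal sketch of this winner-takes-all update in Python/NumPy (the function name, array shapes, and default learning rate are illustrative assumptions):

```python
import numpy as np

def competitive_update(W, x, eta=0.1):
    """One step of the winner-takes-all competitive learning rule.

    W : (m, n) weight matrix, one row per output neuron
    x : (n,)   input vector
    Only the winning neuron (the one closest to x) is updated;
    losing neurons keep their weights unchanged.
    """
    distances = np.linalg.norm(W - x, axis=1)   # distance from x to each weight vector
    j = int(np.argmin(distances))               # winner-takes-all neuron
    W[j] += eta * (x - W[j])                    # delta w_ij = eta * (x_i - w_ij)
    return j, W
```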

Page 17: Unsupervised Learning


The overall effect of the competitive learning rule resides in moving the synaptic weight vector Wj of the winning neuron j towards the input pattern X. The matching criterion is equivalent to the minimum Euclidean distance between vectors.

The Euclidean distance between a pair of n-by-1 vectors X and Wj is defined by

$$d = \|X - W_j\| = \left[\sum_{i=1}^{n} (x_i - w_{ij})^2\right]^{1/2}$$

where xi and wij are the ith elements of the vectors X and Wj, respectively.

Page 18: Unsupervised Learning


To identify the winning neuron, jX, that best matches the input vector X, we may apply the following condition:

$$\|X - W_{j_X}\| = \min_j \|X - W_j\|, \qquad j = 1, 2, \ldots, m$$

where m is the number of neurons in the Kohonen layer.

Page 19: Unsupervised Learning


Suppose, for instance, that the 2-dimensional input vector X is presented to the three-neuron Kohonen network, where

$$X = \begin{bmatrix} 0.52 \\ 0.12 \end{bmatrix}$$

The initial weight vectors, Wj, are given by

$$W_1 = \begin{bmatrix} 0.27 \\ 0.81 \end{bmatrix}, \qquad W_2 = \begin{bmatrix} 0.42 \\ 0.70 \end{bmatrix}, \qquad W_3 = \begin{bmatrix} 0.43 \\ 0.21 \end{bmatrix}$$

Page 20: Unsupervised Learning


We find the winning (best-matching) neuron jX using the minimum-distance Euclidean criterion:

$$d_1 = \sqrt{(x_1 - w_{11})^2 + (x_2 - w_{21})^2} = \sqrt{(0.52 - 0.27)^2 + (0.12 - 0.81)^2} = 0.73$$

$$d_2 = \sqrt{(x_1 - w_{12})^2 + (x_2 - w_{22})^2} = \sqrt{(0.52 - 0.42)^2 + (0.12 - 0.70)^2} = 0.59$$

$$d_3 = \sqrt{(x_1 - w_{13})^2 + (x_2 - w_{23})^2} = \sqrt{(0.52 - 0.43)^2 + (0.12 - 0.21)^2} = 0.13$$

Neuron 3 is the winner, and its weight vector W3 is updated according to the competitive learning rule:

$$\Delta w_{13} = \eta\,(x_1 - w_{13}) = 0.1\,(0.52 - 0.43) = 0.01$$

$$\Delta w_{23} = \eta\,(x_2 - w_{23}) = 0.1\,(0.12 - 0.21) = -0.01$$

Page 21: Unsupervised Learning


The updated weight vector W3 at iteration (p + 1) is determined as:

$$W_3(p+1) = W_3(p) + \Delta W_3(p) = \begin{bmatrix} 0.43 \\ 0.21 \end{bmatrix} + \begin{bmatrix} 0.01 \\ -0.01 \end{bmatrix} = \begin{bmatrix} 0.44 \\ 0.20 \end{bmatrix}$$

The weight vector W3 of the winning neuron 3 becomes closer to the input vector X with each iteration.
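These numbers can be reproduced with a few lines of NumPy (a sketch of the worked example above; printed values are rounded to two decimal places):

```python
import numpy as np

X = np.array([0.52, 0.12])
W = np.array([[0.27, 0.81],    # W1
              [0.42, 0.70],    # W2
              [0.43, 0.21]])   # W3
eta = 0.1

d = np.linalg.norm(W - X, axis=1)      # approx. [0.73, 0.59, 0.13]
winner = int(np.argmin(d))             # index 2, i.e. neuron 3
W[winner] += eta * (X - W[winner])     # W3 -> approx. [0.44, 0.20]
print(d.round(2), winner + 1, W[winner].round(2))
```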

Page 22: Unsupervised Learning

Measures of similarity

• Distance

• Normalized scalar product

Page 23: Unsupervised Learning

Kohonen Self-Organizing Networks

Page 24: Unsupervised Learning
Page 25: Unsupervised Learning

• Also known as Kohonen feature maps or topology-preserving maps

• The learning procedure of Kohonen feature maps is similar to that of competitive learning networks.

• A similarity (dissimilarity) measure is selected, and the winning unit is considered to be the one with the largest (smallest) activation

• The weights of the winning neuron, as well as those of the units in the neighborhood around the winner, are adjusted.

• Neighborhood size decreases slowly with every iteration.

Page 26: Unsupervised Learning

Training of a Kohonen self-organizing network

1. Select the winning output unit as the one with the largest similarity measure (or smallest distance) between its weight vector wi and the input x. With the Euclidean distance, the winning unit c satisfies ||x − wc|| = min ||x − wi||, where the index c refers to the winning unit.

2. Let NBc denote the set of indices corresponding to a neighborhood around winner c. The weights of the winner and its neighboring units are updated by Δwi = η(x − wi), i ∈ NBc.

Page 27: Unsupervised Learning

Flowchart of the training procedure:

1. Start: initialize the weights, the learning rate and the topological neighborhood parameters.

2. For each input vector X:
   – For j = 1 to m, compute D(j) = Σi (xi − wij)².
   – The winning unit index J is the one for which D(J) is minimum.
   – Update the weights of the winning unit (and its neighborhood).

3. Reduce the learning rate.

4. Reduce the radius of the network.

5. Test the stopping condition for epoch (t + 1); if the learning rate has reduced to a negligible value, stop, otherwise continue with the next epoch.
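A runnable sketch of this training loop for a one-dimensional array of output units (the linear-index neighbourhood and the halving schedules for the learning rate and radius are illustrative assumptions):

```python
import numpy as np

def train_som(X, m, eta=0.5, radius=1, epochs=10, seed=0):
    """Train a 1-D Kohonen layer of m units on the patterns in X (shape: samples x n)."""
    rng = np.random.default_rng(seed)
    W = rng.random((m, X.shape[1]))              # initialize the weights
    for _ in range(epochs):
        for x in X:                              # for each input vector
            D = ((x - W) ** 2).sum(axis=1)       # D(j) = sum_i (x_i - w_ij)^2
            J = int(np.argmin(D))                # winning unit: D(J) is minimum
            lo, hi = max(0, J - radius), min(m, J + radius + 1)
            W[lo:hi] += eta * (x - W[lo:hi])     # update winner and its neighbourhood
        eta *= 0.5                               # reduce the learning rate
        radius = max(0, radius - 1)              # reduce the radius of the network
    return W
```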

Page 28: Unsupervised Learning

Problem

• Construct a Kohonen self-organizing map to cluster the four given vectors [0 0 1 1], [1 0 0 0], [0 1 1 0], [0 0 0 1]. The number of clusters to be formed is two. The initial learning rate is 0.5.

• Initial weights (rows correspond to the four input components, columns to the two cluster units):

  0.2  0.9
  0.4  0.7
  0.6  0.5
  0.8  0.3

Page 29: Unsupervised Learning

Solution to the problem

Input vector    Winner    Updated weights

[0 0 1 1]       D(1)      [0.1 0.2 0.8 0.9]

[1 0 0 0]       D(2)      [0.95 0.35 0.25 0.15]

[0 1 1 0]       D(1)      [0.05 0.6 0.9 0.45]

[0 0 0 1]       D(1)      [0.025 0.3 0.45 0.725]
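A short script reproducing this table (a sketch; the weight layout, with one column per cluster unit, follows the reconstruction above):

```python
import numpy as np

X = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]], dtype=float)
W = np.array([[0.2, 0.9],
              [0.4, 0.7],
              [0.6, 0.5],
              [0.8, 0.3]])                        # one column per cluster unit
eta = 0.5

for x in X:
    D = ((x[:, None] - W) ** 2).sum(axis=0)       # squared distance to each unit
    J = int(np.argmin(D))                         # winning cluster unit
    W[:, J] += eta * (x - W[:, J])                # update the winner's column
    print(x, "winner D(%d)" % (J + 1), W[:, J].round(3))
```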

Page 30: Unsupervised Learning

Inference

[Figure: initial random weights of the network]

Page 31: Unsupervised Learning

Network after 100 iterations

[Figure: weight vectors plotted in the (W(1,j), W(2,j)) plane after 100 iterations]

Page 32: Unsupervised Learning

Network after 1000 iterations

[Figure: weight vectors plotted in the (W(1,j), W(2,j)) plane after 1000 iterations]

Page 33: Unsupervised Learning

Network after 10,000 iterations

[Figure: weight vectors plotted in the (W(1,j), W(2,j)) plane after 10,000 iterations]

Page 34: Unsupervised Learning
Page 35: Unsupervised Learning
Page 36: Unsupervised Learning

Training occurs in several steps and over many iterations:

1. Each node’s weights are initialized.

2. A vector is chosen at random from the set of training data and presented to the lattice.

3. Every node is examined to calculate which one’s weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).

4. The radius of the neighbourhood of the BMU is now calculated. This is a value that starts large, typically set to the ‘radius’ of the lattice, but diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU’s neighbourhood.

5. Each neighbouring node’s (the nodes found in step 4) weights are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights get altered.

6. Repeat from step 2 for N iterations.

Page 37: Unsupervised Learning

Determining Neighbourhood

Page 38: Unsupervised Learning
Page 39: Unsupervised Learning

• Instead of explicitly defining the neighborhood of a winning unit, a neighborhood function around the winning unit can be used.

• A Gaussian function can be used as the neighborhood function:

$$\Omega_c(i) = \exp\left(-\frac{\|p_i - p_c\|^2}{2\sigma^2}\right)$$

where pi and pc are the positions of the output units i and c, respectively, and σ reflects the scope of the neighborhood.

The update formula using the neighborhood function is given by

$$\Delta w_i = \eta\,\Omega_c(i)\,(x - w_i)$$
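A small sketch of this soft-neighbourhood update for output units laid out on a grid (the function name, grid coordinates, and parameter values are illustrative assumptions):

```python
import numpy as np

def gaussian_neighborhood_update(W, positions, x, eta=0.5, sigma=1.0):
    """Update all output units, weighted by a Gaussian around the winner.

    W         : (m, n) weight matrix, one row per output unit
    positions : (m, 2) grid coordinates p_i of the output units
    x         : (n,)   input vector
    """
    c = int(np.argmin(np.linalg.norm(W - x, axis=1)))        # winning unit c
    dist2 = ((positions - positions[c]) ** 2).sum(axis=1)    # ||p_i - p_c||^2
    omega = np.exp(-dist2 / (2 * sigma ** 2))                # Omega_c(i)
    W += eta * omega[:, None] * (x - W)                      # delta w_i = eta * Omega_c(i) * (x - w_i)
    return W
```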

Page 40: Unsupervised Learning

Applications of SOM: World Poverty Map

Page 41: Unsupervised Learning

What happens in a dot product method?

• In the case of the dot product method, finding the minimum of ||x − wi|| is nothing but finding the maximum among the m scalar products wi · x, provided the weight vectors are normalized to unit length
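A quick numerical check of this equivalence, assuming unit-length weight vectors (the random data here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))
W /= np.linalg.norm(W, axis=1, keepdims=True)                # unit-length weight vectors
x = rng.normal(size=3)

by_distance = int(np.argmin(np.linalg.norm(x - W, axis=1)))  # minimum of ||x - wi||
by_dot      = int(np.argmax(W @ x))                          # maximum scalar product wi . x
assert by_distance == by_dot
```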

Page 42: Unsupervised Learning

• A competitive learning network performs an on-line clustering process on the input patterns

• When the process is complete, the input data are divided into disjoint clusters

• With the Euclidean distance, the update formula is actually an on-line gradient descent step that minimizes the objective function (the sum of squared distances between each input and its nearest cluster center)

Page 43: Unsupervised Learning

Competitive learning with unit-length vectors

The dots represent the input vectors and the crosses denote the weight vectors for the four output units

As the learning continues, the four weight vectors rotate toward the centers of the four input clusters

Page 44: Unsupervised Learning

Limitations of competitive learning

• Weights are initialized to random values, which might be far from any input vector, so such a weight vector never gets updated

– This can be prevented by initializing the weights to samples from the input data itself, thereby ensuring that all weights get updated when all the input patterns are presented

– Alternatively, the weights of the winning as well as the losing neurons can be updated, using a significantly smaller learning rate for the losers. This is called leaky learning.

– Note: changing η over time is generally desired. A large initial value of η explores the data space widely; later, progressively smaller values refine the weights. This is similar to the cooling schedule in simulated annealing.

Page 45: Unsupervised Learning

Limitations of competitive learning

• Lacks the capability to add new clusters when deemed necessary

• If η is constant, there is no stability of the clusters

• If η decreases with time, it may become too small to update the cluster centers

• This is called the stability-plasticity dilemma (addressed by adaptive resonance theory (ART))

Page 46: Unsupervised Learning

• If the output units are arranged in the form of a vector or a matrix, then the weights of the winner as well as its neighbouring losers can be updated (Kohonen feature maps)

• After learning, the input space is divided into a number of disjoint clusters. The cluster centers are known as templates or a codebook

• For any input pattern presented, we can use the nearest codebook vector to represent it (vector quantization)

• Vector quantization is used for data compression in IP and communication systems.

Page 47: Unsupervised Learning

Learning Vector Quantization (LVQ)

Page 48: Unsupervised Learning

LVQ

• Recall that a Kohonen SOM is a clustering technique, which can be used to provide insight into the nature of data. We can transform this unsupervised neural network into a supervised LVQ neural network.

• The network architecture is just like a SOM, but without a topological structure.

• Each output neuron represents a known category (e.g. apple, pear, orange).

• Input vector: x = (x1, x2, ..., xn)

• Weight vector for the jth output neuron: wj = (w1j, w2j, ..., wnj)

• Cj = category represented by the jth neuron; this is pre-assigned.

• T = correct category for the input

• Define the (squared) Euclidean distance between the input vector and the weight vector of the jth neuron as: D(j) = Σi (xi − wij)²

Page 49: Unsupervised Learning

• It is an adaptive data classification method based on training data with desired class information

• It is actually a supervised training method but employs unsupervised data-clustering techniques to preprocess the data set and obtain cluster centers

• Resembles a competitive learning network except that each output unit is associated with a class.

Page 50: Unsupervised Learning

Network representation of LVQ

Page 51: Unsupervised Learning

Possible data distributions and decision boundaries

Page 52: Unsupervised Learning

LVQ learning algorithm

• Step 1: Initialize the cluster centers by a clustering method

• Step 2: Label each cluster by the voting method

• Step 3: Randomly select a training input vector x and find k such that ||x − wk|| is a minimum

• Step 4: If x and wk belong to the same class, update wk by

$$\Delta w_k = \eta\,(x - w_k)$$

otherwise, update wk by

$$\Delta w_k = -\eta\,(x - w_k)$$
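A compact sketch of these four steps (the epoch loop, the halving learning-rate schedule, and the function name are illustrative assumptions; the initialization and labelling of the codebook vectors, steps 1 and 2, are taken as given inputs):

```python
import numpy as np

def train_lvq(X, T, W, C, eta=0.1, epochs=20):
    """LVQ training.

    X : (samples, n) training vectors
    T : (samples,)   correct class of each training vector
    W : (m, n)       initial codebook (weight) vectors, e.g. from a clustering method
    C : (m,)         pre-assigned class label of each output unit
    """
    for _ in range(epochs):
        for x, t in zip(X, T):
            k = int(np.argmin(np.linalg.norm(x - W, axis=1)))   # ||x - wk|| is a minimum
            if C[k] == t:
                W[k] += eta * (x - W[k])     # same class: move wk toward x
            else:
                W[k] -= eta * (x - W[k])     # different class: move wk away from x
        eta *= 0.5                           # reduce the learning rate each epoch
    return W
```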

Page 53: Unsupervised Learning

• The parameters used for the training process of an LVQ include the following:

– x = training vector (x1, x2, ..., xn)

– T = category or class for the training vector x

– wj = weight vector for the jth output unit (w1j, ..., wij, ..., wnj)

– cj = cluster or class or category associated with the jth output unit

– The (squared) Euclidean distance of the jth output unit is D(j) = Σi (xi − wij)²

Page 54: Unsupervised Learning

Flowchart of LVQ training:

1. Start: initialize the weights and the learning rate.

2. For each input vector x with target category T:
   – Calculate the winner as the unit with the minimum D(j).
   – If T = Cj, update wj(new) = wj(old) + η[x − wj(old)].
   – Otherwise, update wj(new) = wj(old) − η[x − wj(old)].

3. Reduce the learning rate, e.g. η(t + 1) = 0.5 η(t).

4. If η has reduced to a negligible value, stop; otherwise continue with the next input presentation.

Page 55: Unsupervised Learning

Problem

• Construct and test an LVQ net with five vectors assigned to two classes. The given vectors, along with their classes, are shown in the table below.

Vector Class

[0 0 1 1] 1

[1 0 0 0] 2

[0 0 0 1] 2

[1 1 0 0] 1

[0 1 1 0] 1
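A sketch of how this net could be set up and trained. The slide does not specify the initial reference vectors, so this example assumes the common convention of taking the first vector of each class as that class's initial weight vector and training on the remaining three, with an assumed learning rate of 0.1:

```python
import numpy as np

X = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0]], dtype=float)
T = np.array([1, 2, 2, 1, 1])

# Assumption: use the first vector of each class as that class's initial reference vector
W = np.array([X[0], X[1]])          # unit 1 represents class 1, unit 2 represents class 2
C = np.array([1, 2])
eta = 0.1

for x, t in zip(X[2:], T[2:]):      # train on the remaining three vectors
    D = ((x - W) ** 2).sum(axis=1)  # D(j) = sum_i (x_i - w_ij)^2
    k = int(np.argmin(D))
    if C[k] == t:
        W[k] += eta * (x - W[k])    # same class: move toward x
    else:
        W[k] -= eta * (x - W[k])    # different class: move away from x
    print(x, "class", t, "winner unit", k + 1, W[k].round(3))
```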

Page 56: Unsupervised Learning