ANN Combined

Artificial Neural Networks - Lec 1 & 2 - Dr. Aditya Abhyankar


TRANSCRIPT

  • Artificial Neural Networks - Lec 1 & 2

    Dr. Aditya Abhyankar

    1ANN - Dr. Abhyankar - Lecture 1&2

  • Major Topics to be Covered
    - ANN basics, neurons, learning algorithms
    - Perceptron learning, and pattern classification
    - Multi-Layer Perceptron (MLP), back-propagation learning, and applications
    - Pattern classification, Support Vector Machine (SVM)
    - Clustering, Self-Organizing Map
    - Radial Basis Network
    - Time series prediction, system identification, expert systems
    - Fuzzy Set Theory and Fuzzy Logic Control
    - Genetic Algorithm and Evolutionary Computing
    - Learning Vector Quantization
    - Mixture of Experts network
    - Recurrent network

    2ANN - Dr. Abhyankar - Lecture 1&2

  • Class Philosophy
    - Questions starting with WHY !!
    - No formalities
    - No attendance gimmicks
    - More inquisitive

    3ANN - Dr. Abhyankar - Lecture 1&2

  • Applications
    - General models of ANN applications: pattern classification; control, time series modeling, estimation; optimization
    - Real-world application examples

    4ANN - Dr. Abhyankar - Lecture 1&2

  • Applications
    - Many memoryless ANN paradigms (e.g., MLP) are modeled mathematically as a nonlinear mapping between the inputs (feature vectors) and outputs.
      Discrete output values: classification problem
      Continuous output values: approximation problem
    - ANNs with feedback can be used to model dynamic systems

    5ANN - Dr. Abhyankar - Lecture 1&2

  • Pattern Classification Applications
    - Speech recognition and speech synthesis
    - Classification of radar/sonar signals
    - Remote sensing and image classification
    - Handwritten character/digit recognition
    - ECG/EEG/EMG filtering/classification
    - Credit card application screening
    - Data mining, information retrieval

    6ANN - Dr. Abhyankar - Lecture 1&2

  • Control, Time Series, Estimation
    - Machine control / robot manipulation
    - Financial / scientific / engineering time series forecasting
    - Inverse modeling of the vocal tract

    7ANN - Dr. Abhyankar - Lecture 1&2

  • Optimization
    - Traveling salesperson
    - Multiprocessor scheduling and task assignment
    - VLSI placement and routing

    8ANN - Dr. Abhyankar - Lecture 1&2

  • Real World Applications
    - S&P 500 index prediction
    - Real estate appraisal
    - Credit scoring
    - Geochemical modeling
    - Hospital patient stay-length prediction
    - Breast cancer cell image classification
    - Jury summoning prediction
    - Precision direct mailing
    - Natural gas price prediction

    9ANN - Dr. Abhyankar - Lecture 1&2

  • ANN !!!
    - An artificial neural network (ANN) is a massively parallel distributed computing system (algorithm, device, or other) that has a natural propensity for storing experiential knowledge and making it available for use.
    - It resembles the brain in two aspects:
      1) Knowledge is acquired by the network through a learning process.
      2) Interneuron connection strengths known as synaptic weights are used to store the knowledge.
    Aleksander & Morton (1990), Haykin (1994)

    10ANN - Dr. Abhyankar - Lecture 1&2

  • Biological System

    11ANN - Dr. Abhyankar - Lecture 1&2

  • Biological System

    12ANN - Dr. Abhyankar - Lecture 1&2

  • ANN Assumptions
    - Information processing happens at many simple elements called neurons
    - Signals are passed between neurons over the connection links
    - Each connection link has an associated weight
    - Each neuron applies an activation function

    13ANN - Dr. Abhyankar - Lecture 1&2

  • ANN Characteristics
    - Pattern of connections between neurons, called the architecture
    - Method of determining the weights on the connections, called the training algorithm
    - Mathematical model for assigning the output, called the activation function

    14ANN - Dr. Abhyankar - Lecture 1&2

  • Neuron Model
    - McCulloch-Pitts (simplistic) neuron model
    - The net function of a neuron is a weighted sum of its input signals plus a bias term.

    15ANN - Dr. Abhyankar - Lecture 1&2
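A minimal Python sketch of the net function just described (weighted sum of the input signals plus a bias, followed by a hard threshold); the function names and sample values are illustrative, not taken from the slides.

```python
# Minimal McCulloch-Pitts-style neuron: weighted sum of inputs plus bias,
# followed by a hard-threshold activation. Names and values are illustrative.

def net_input(x, w, b):
    """Weighted sum of the input signals plus the bias: u = sum_i w_i * x_i + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def step(u, threshold=0.0):
    """Hard-threshold activation: fire (1) only if the net input reaches the threshold."""
    return 1 if u >= threshold else 0

if __name__ == "__main__":
    x = [1, 0, 1]          # input signals
    w = [0.5, -0.3, 0.8]   # connection weights
    b = -0.4               # bias term
    u = net_input(x, w, b)
    print(u, step(u))      # net input and thresholded output
```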

  • Neuron Model
    - The net function is a linear or nonlinear mapping from the input data space to an intermediate feature space
    - The most common form is a hyper-plane

    16ANN - Dr. Abhyankar - Lecture 1&2

  • Other Net Forms
    - Higher-order net function: the net function is a linear combination of higher-order polynomial terms. For example, a 2nd-order net function has the form:

      $u_i = \sum_{j,k=1}^{N} w_{ijk}\, y_j\, y_k + \theta_i$

    - Delta (sigma-pi) net function: instead of a summation, the product of all weighted synaptic inputs is computed:

      $u_i = \prod_{j=0}^{N} w_{ij}\, y_j$

    17ANN - Dr. Abhyankar - Lecture 1&2
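Hedged sketches of the two alternative net forms above: a 2nd-order net summing weighted pairwise products, and a sigma-pi ("delta") net multiplying the weighted inputs instead of summing them. All names and sample values are illustrative.

```python
import itertools

def quadratic_net(x, w2, w1, theta):
    """2nd-order net: u = sum_{j,k} w2[j][k]*x[j]*x[k] + sum_j w1[j]*x[j] + theta."""
    u = theta + sum(wj * xj for wj, xj in zip(w1, x))
    for j, k in itertools.product(range(len(x)), repeat=2):
        u += w2[j][k] * x[j] * x[k]
    return u

def sigma_pi_net(x, w):
    """Sigma-pi ('delta') net: u = product over j of w[j]*x[j]."""
    u = 1.0
    for wj, xj in zip(w, x):
        u *= wj * xj
    return u

if __name__ == "__main__":
    x = [1.0, 2.0]
    print(quadratic_net(x, [[0.1, 0.2], [0.2, 0.3]], [0.5, -0.5], 0.1))
    print(sigma_pi_net(x, [0.5, -0.5]))
```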

  • Neuron Activation Function

    18ANN - Dr. Abhyankar - Lecture 1&2

  • Neuron Activation Function

    19ANN - Dr. Abhyankar - Lecture 1&2

  • Neuron Activation Function

    20ANN - Dr. Abhyankar - Lecture 1&2

  • ANN Configuration
    - Uni-directional communication links are represented by directed arcs. The ANN structure can thus be described by a directed graph.
    - Fully connected: a cyclic graph with feedback. There are N x N connections for N neurons.

    21ANN - Dr. Abhyankar - Lecture 1&2

  • ANN Configuration
    - Feed-forward, layered connection: an acyclic directed graph, with no loop or cycle.

    22ANN - Dr. Abhyankar - Lecture 1&2

  • ANN Configuration

    23ANN - Dr. Abhyankar - Lecture 1&2

  • Feed-back Dynamic System
    - Without delay, feedback causes a causality problem: an unknown variable depends on an unknown variable!  a2 = g(a1) = g(g(a2)) = ...
    - To break the cycle, at least one delay element must be inserted into the feedback loop.
    - This effectively creates a nonlinear dynamic system (a sequential machine).

    24ANN - Dr. Abhyankar - Lecture 1&2
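A small sketch of the point on this slide: inserting one delay element makes the feedback loop computable step by step, i.e. a sequential machine. The activation g and the weights used here are illustrative assumptions.

```python
# Feedback with a unit delay: the current output depends on the *previous*
# output, so the loop can be evaluated step by step (a sequential machine).
import math

def g(a):
    # illustrative squashing activation
    return math.tanh(a)

def run_feedback_loop(x_seq, w_in=1.0, w_fb=0.5, a_prev=0.0):
    """a(t) = g(w_in * x(t) + w_fb * a(t-1)); a(t-1) is held in the delay element."""
    outputs = []
    for x in x_seq:
        a = g(w_in * x + w_fb * a_prev)
        outputs.append(a)
        a_prev = a          # stored in the delay element for the next step
    return outputs

if __name__ == "__main__":
    print(run_feedback_loop([1, 0, 0, 1, 1]))
```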

  • Models of Neuron
    - McCulloch-Pitts model (MP)
    - Rosenblatt's Perceptron model
    - Adaline model

    25ANN - Dr. Abhyankar - Lecture 1&2

  • MP Model
    - McCulloch-Pitts (simplistic) neuron model
    - The net function of a neuron is a weighted sum of its input signals plus a bias term.

    26ANN - Dr. Abhyankar - Lecture 1&2

  • MP Model Limitations
    - Weights fixed
    - Incapable of learning
    - Original model allows ONLY: binary output steps; operations at discrete time steps

    27ANN - Dr. Abhyankar - Lecture 1&2

  • Perceptron
    28ANN - Dr. Abhyankar - Lecture 1&2

  • Perceptron
    Activation:    $x = \sum_{i=1}^{M} a_i w_i$
    Output:        $s = f(x)$
    Error:         $\delta = b - s$
    Weight change: $\Delta w_i = \alpha\,\delta\,a_i$
    29ANN - Dr. Abhyankar - Lecture 1&2
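A sketch of the four quantities listed above for a single perceptron update; the bipolar step output and the learning-rate symbol alpha are assumptions made for illustration.

```python
def perceptron_step(a, w, target_b, alpha=1.0):
    """One perceptron update: activation, output, error, weight change."""
    x = sum(ai * wi for ai, wi in zip(a, w))                    # activation x = sum a_i w_i
    s = 1 if x >= 0 else -1                                     # output s = f(x) (bipolar step assumed)
    error = target_b - s                                        # error = b - s
    new_w = [wi + alpha * error * ai for ai, wi in zip(a, w)]   # weight change = alpha * error * a_i
    return x, s, error, new_w

if __name__ == "__main__":
    print(perceptron_step(a=[1, -1], w=[0.2, 0.4], target_b=1))
```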

  • Perceptron - Advantages
    - Perceptron learning law gives a step-by-step process for adjusting the weights
    - Perceptron convergence theorem
    30ANN - Dr. Abhyankar - Lecture 1&2

  • Widrow's Adaline
    31ANN - Dr. Abhyankar - Lecture 1&2

  • Adaline (ADAptive Linear Element)
    Activation:    $x = \sum_{i=1}^{M} a_i w_i$
    Output:        $s = f(x) = x$
    Error:         $\delta = b - s = b - x$
    Weight change: $\Delta w_i = \alpha\,\delta\,a_i$
    32ANN - Dr. Abhyankar - Lecture 1&2

  • Widrow's Adaline
    - Analog activation value x is compared with the target output b, OR
    - Output is a linear function of x
    - LMS learning law
    - Gradient descent algorithm
    33ANN - Dr. Abhyankar - Lecture 1&2

  • Heat and Cold Example

    34ANN - Dr. Abhyankar - Lecture 1&2

  • Hebb Rule
    35ANN - Dr. Abhyankar - Lecture 1&2

  • Heat and Cold Example

    36ANN - Dr. Abhyankar - Lecture 1&2

  • Artificial Neural Networks - Lec 3 & 4

    Dr. Aditya Abhyankar

  • ANN !!!
    - An artificial neural network (ANN) is a massively parallel distributed computing system (algorithm, device, or other) that has a natural propensity for storing experiential knowledge and making it available for use.
    - It resembles the brain in two aspects:
      1) Knowledge is acquired by the network through a learning process.
      2) Interneuron connection strengths known as synaptic weights are used to store the knowledge.
    Aleksander & Morton (1990), Haykin (1994)

    2July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Biological System

    3July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Biological System

    4July 20, 2010ANN - Dr. Abhyankar - Lec 3 & 4

  • Features - Biological NN
    - Robustness and fault tolerance
    - Flexibility: on-the-fly learning, adjustment of weights
    - Adaptability: ability to deal with a variety of data situations (fuzzy, probabilistic, noisy, etc.)
    - Efficiency: parallel and distributed computing

    5July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Ne on ModelNe on ModelNeuron ModelNeuron Modely McCulloch-Pitts (Simplistic) Neuron Model

    The network function of a neuron is ay The network function of a neuron is aweighted sum of its input signals plus a biasterm.

    6July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Performance Comparison

    Parameter         | BNN                                                                      | ANN
    Speed             | Slow (a few ms per execution)                                            | Fast (a few ns per execution)
    Processing        | Massively parallel                                                       | Mostly sequential
    Size & Complexity | Neurons ~10^11, interconnections ~10^15; contribution from dendrites and synapses allows complex pattern recognition tasks | Difficult to perform complex pattern recognition tasks

    7July 20, 2010ANN - Dr. Abhyankar -Lec 3 & 4

  • Performance Comparison

    Parameter       | BNN                                       | ANN
    Storage         | Adaptable (strengths of interconnections) | Strictly replaceable (memory mapping)
    Fault Tolerance | Good FT, distributed information          | Poor FT, corrupted memories, un-retrievable data

    8July 20, 2010ANN - Dr. Abhyankar -Lec 3 & 4

  • ANN Terminology
    - ANN: a highly simplified model of the BNN
    - ANN has interconnected processing units
    - Summing part receives N inputs, weights each value and computes the weighted sum
    - Weighted sum = activation value
    - Positive weight = excitatory input
    - Negative weight = inhibitory input

    9July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Neuron ModelNeuron ModelNeuron ModelNeuron Modely The net function is a linear or nonlinear

    mapping from the input data space to an pp g p pintermediate feature spacey The most common form is a hyper-plane

    10July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Other Things We Saw
    - Other net forms
    - Various activation functions
    - ANN configurations
    - Dynamic systems
    - ANN assumptions
    - ANN characteristics

    11July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Models of Neuron - 1 layer
    - McCulloch-Pitts model (MP)
    - Rosenblatt's Perceptron model
    - Adaline model

    12July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP Model
    - McCulloch-Pitts (simplistic) neuron model
    [Figure: neuron i with inputs x_1 ... x_j ... x_N, weights w_i1 ... w_ij ... w_iN, bias b_i, net input y_in and output y_i]
    - The net function of a neuron is a weighted sum of its input signals plus a bias term.

    13July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP Model
    - The net function is a linear or nonlinear mapping from the input data space to an intermediate feature space
    - The most common form is a hyper-plane:

      $y_{in,i} = \sum_{j=1}^{N} w_{ij} x_j + b_i = \mathbf{w}_i^T \mathbf{x} + b_i, \qquad y_i = f(y_{in,i})$

    14July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP Model LimitationsMP Model LimitationsMP Model LimitationsMP Model Limitations

    y Weights fixedWeights fixedy Incapable of learningy Original model allows ONLY:g

    Binary output steps Operations at discrete time steps

    15July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Perceptron

    16July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Perceptron
    Activation:    $x = \sum_{i=1}^{M} a_i w_i$
    Output:        $s = f(x)$
    Error:         $\delta = b - s$
    Weight change: $\Delta w_i = \alpha\,\delta\,a_i$

    17July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • PerceptronPerceptron -- AdvantagesAdvantages

    y Perceptron learning law gives step-y Perceptron learning law gives stepby-step process for adjusting weightsy Perceptron convergence theoremPerceptron convergence theorem

    18July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Widrow's Adaline

    19July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Adaline (ADAptive Linear Element)
    Activation:    $x = \sum_{i=1}^{M} a_i w_i$
    Output:        $s = f(x) = x$
    Error:         $\delta = b - s = b - x$
    Weight change: $\Delta w_i = \alpha\,\delta\,a_i$
    20July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • WidrowsWidrows AdalineAdalineWidrowsWidrows AdalineAdaline

    y Analog activation value x compared with y Analog activation value x compared with target output b

    OROy Output is linear function of xy LMS learning lawgy Gradient descent algorithm

    21July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • AND Example using MP model

    x1  x2  y
    0   0   0
    0   1   0
    1   0   0
    1   1   1

    w1 = ?, w2 = ?, TH = ?

    22July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • AND Example using MP model

    x1  x2  y
    0   0   0
    0   1   0
    1   0   0
    1   1   1

    w1 = 1, w2 = 1, TH = 2

    23July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4
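A quick check of the solution on this slide (w1 = w2 = 1, TH = 2): a single MP neuron reproduces the AND truth table.

```python
def mp_neuron(x1, x2, w1=1, w2=1, threshold=2):
    """McCulloch-Pitts unit: output 1 iff the weighted sum reaches the threshold."""
    return 1 if w1 * x1 + w2 * x2 >= threshold else 0

if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, mp_neuron(x1, x2))   # matches the AND truth table
```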

  • AND Example using MP model

    x1  x2  y
    0   0   0
    0   1   0
    1   0   0
    1   1   1

    w1 = 1, w2 = 1, TH = 2  (the threshold condition gives the equation of a line)
    Why is one neuron sufficient ???

    24July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • AND Example using MP model
    [Plot: AND truth table points in the x1-x2 plane; the single y = 1 point (1,1) is separable from the y = 0 points by one line]

    x1  x2  y
    0   0   0
    0   1   0
    1   0   0
    1   1   1

    25July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • OR Example using MP model
    [Plot: OR truth table points in the x1-x2 plane; linearly separable]

    x1  x2  y
    0   0   0
    0   1   1
    1   0   1
    1   1   1

    26July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • OR Example using MP model

    x1  x2  y
    0   0   0
    0   1   1
    1   0   1
    1   1   1

    w1 = 2, w2 = 2, TH = 2

    27July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • AND-NOT Example using MP model

    x1  x2  y
    0   0   0
    0   1   0
    1   0   1
    1   1   0

    w1 = 2, w2 = -1, TH = 2

    28July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • X-OR Example using MP model

    x1  x2  y
    0   0   0
    0   1   1
    1   0   1
    1   1   0

    w1 = ?, w2 = ?, TH = ?

    29July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • X-OR Example using MP model
    [Plot: XOR truth table points in the x1-x2 plane]

    x1  x2  y
    0   0   0
    0   1   1
    1   0   1
    1   1   0

    NOT Linearly Separable !!!

    30July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • X-OR Example using MP model

    x1  x2  y
    0   0   0
    0   1   1
    1   0   1
    1   1   0

    $x_1\ \mathrm{XOR}\ x_2 = \{x_1\ \mathrm{ANDNOT}\ x_2\}\ \mathrm{OR}\ \{x_2\ \mathrm{ANDNOT}\ x_1\}$
    [Network: hidden unit z1 = x1 ANDNOT x2 (w11 = 2, w21 = -1, TH = 2), hidden unit z2 = x2 ANDNOT x1 (w12 = -1, w22 = 2, TH = 2), output unit y = z1 OR z2 (w31 = 2, w32 = 2, TH = 2)]

    31July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4
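A sketch of the decomposition x1 XOR x2 = {x1 ANDNOT x2} OR {x2 ANDNOT x1} built from three MP units, using the weights reconstructed above (2 and -1 with threshold 2 for each ANDNOT unit, 2 and 2 with threshold 2 for the OR unit).

```python
def mp_unit(inputs, weights, threshold=2):
    """McCulloch-Pitts unit with a fixed threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def xor_net(x1, x2):
    z1 = mp_unit((x1, x2), (2, -1))   # z1 = x1 ANDNOT x2
    z2 = mp_unit((x1, x2), (-1, 2))   # z2 = x2 ANDNOT x1
    return mp_unit((z1, z2), (2, 2))  # y  = z1 OR z2

if __name__ == "__main__":
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_net(x1, x2))   # 0, 1, 1, 0: XOR needs the two-layer net
```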

  • Heat and Cold Example
    [Figure: actual inputs x1 (HOT), x2 (COLD) mapped to perceived outputs y1 (HOT), y2 (COLD)]
    Actual Input -> Perceived Output

    32July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example

    $y_2(t) = x_2(t-1)\ \mathrm{AND}\ x_2(t-2)$
    $y_1(t) = \{x_1(t-1)\}\ \mathrm{OR}\ \{x_2(t-3)\ \mathrm{ANDNOT}\ x_2(t-2)\}$

    33July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4
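A hedged simulation of the two perception equations above, coded directly as logic with time delays (rather than as the weighted network drawn on the following slides): heat is perceived if heat was applied one step earlier, or if cold was applied three steps earlier and removed two steps earlier; cold is perceived only if the cold stimulus persisted for two steps.

```python
def perceive(x1_hist, x2_hist, t):
    """x1 = heat stimulus, x2 = cold stimulus; histories are 0/1 sequences indexed by time."""
    def x(hist, i):
        return hist[i] if 0 <= i < len(hist) else 0
    y1 = x(x1_hist, t - 1) or (x(x2_hist, t - 3) and not x(x2_hist, t - 2))  # perceived heat
    y2 = x(x2_hist, t - 1) and x(x2_hist, t - 2)                             # perceived cold
    return int(y1), int(y2)

if __name__ == "__main__":
    # Case study 1 style input: a brief cold stimulus at t = 0, then removed.
    x1 = [0, 0, 0, 0, 0]
    x2 = [1, 0, 0, 0, 0]
    for t in range(5):
        print(t, perceive(x1, x2, t))   # heat is perceived a couple of steps later
```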

  • Heat and Cold Example
    [Figure: McCulloch-Pitts network with delay units implementing the two perception equations above]
    Actual Input -> Perceived Output

    34July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 1: Cold stimulus applied for a small duration [network state at t=0]
    35July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 1: Cold stimulus applied for a small duration [network state at t=1]
    36July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 1: Cold stimulus applied for a small duration [network state at t=2]
    37July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 1: Cold stimulus applied for a small duration [network state at t=3]
    38July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 2: Hot stimulus applied for one time step [network state at t=0]
    39July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 2: Hot stimulus applied for one time step [network state at t=1]
    40July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 3: Cold stimulus applied for a longer duration [network state at t=0]
    41July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 3: Cold stimulus applied for a longer duration [network state at t=1]
    42July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Heat and Cold Example
    Case Study 3: Cold stimulus applied for a longer duration [network state at t=2]
    43July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP ModelMP ModelMP ModelMP Modely McCulloch-Pitts (Simplistic) Neuron Model

    x1xxM

    1iwijw

    iny yjx

    xMij

    iNw y

    Th t k f ti f i

    Nxib

    y The network function of a neuron is aweighted sum of its input signals plus a biasterm.term.

    44July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP Model
    - The net function is a linear or nonlinear mapping from the input data space to an intermediate feature space
    - The most common form is a hyper-plane:

      $y_{in,i} = \sum_{j=1}^{N} w_{ij} x_j + b_i = \mathbf{w}_i^T \mathbf{x} + b_i, \qquad y_i = f(y_{in,i})$

    45July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Significance of bias
    [Figure: two equivalent neurons - one with an explicit bias term b_i, one with the bias realized as the weight on a constant input x_0 = 1]

    $b + x_1 w_1 + x_2 w_2 = 0 \quad\Rightarrow\quad x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{b}{w_2}$

    46July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP Model LimitationsMP Model LimitationsMP Model LimitationsMP Model Limitations

    y Weights fixedWeights fixedy Incapable of learningy Original model allows ONLY:g

    Binary output steps Operations at discrete time steps

    47July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Hebb Rule (Hebb Net)
    Step 0: Initialize all weights to zero: $w_i = 0,\ i = 1 \ldots n$
    Step 1: For all input training vector and target output pairs, do steps 2-4
    Step 2: Set activations for inputs: $x_i = s_i$
    Step 3: Set activation for output: $y = t$
    Step 4: Adjust weights & bias: $w_i(\mathrm{new}) = w_i(\mathrm{old}) + x_i\,y$, $\ b(\mathrm{new}) = b(\mathrm{old}) + y$
    48July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4
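A runnable sketch of the Hebb-rule procedure in steps 0-4 above; the bipolar AND data used in the example is one of the cases listed later in the lecture.

```python
def hebb_train(samples):
    """Hebb net training: samples is a list of (input_vector, target) pairs (bipolar values)."""
    n = len(samples[0][0])
    w = [0.0] * n          # step 0: initialize all weights to zero
    b = 0.0
    for x, t in samples:   # step 1: loop over training pairs
        y = t              # steps 2-3: set activations for inputs and output
        w = [wi + xi * y for wi, xi in zip(w, x)]   # step 4: w_i(new) = w_i(old) + x_i * y
        b = b + y                                   #         b(new)  = b(old)  + y
    return w, b

if __name__ == "__main__":
    # Bipolar AND: the learned weights give a correct discriminating hyper-plane.
    AND = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
    print(hebb_train(AND))   # -> ([2.0, 2.0], -2.0)
```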

  • Artificial Neural Networks - Lec 5 & 6

    Dr. Aditya Abhyankar

  • Recap
    - ANN - definition
    - Resemblance with BNN
    - BNN - ANN comparison
    - ANN terminology
    - Neuron models
    - Learning !!

  • ANN TerminologyANN TerminologyANN highly simplified model of BNNy ANN highly simplified model of BNNy ANN has interconnected processing unitsy Summing part receives N inputs weights y Summing part receives N inputs, weights

    each value and computes weighted sumy Weighted sum activation valueWeighted sum activation valuey Positive weight excitatory inputy Negative weight inhibitory inputg g y p

  • Ne on ModelNe on ModelNeuron ModelNeuron Modely McCulloch-Pitts (Simplistic) Neuron Model

    The network function of a neuron is ay The network function of a neuron is aweighted sum of its input signals plus a biasterm.

    4July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Neuron ModelNeuron ModelNeuron ModelNeuron Modely The net function is a linear or nonlinear

    mapping from the input data space to an pp g p pintermediate feature spacey The most common form is a hyper-plane

    5July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Models of Models of Neuron Neuron 1 layer1 layery McCulloch-Pitts Model (MP)y Rosenblatts Perceptron Modely Adaline Model

    6July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP ModelMP ModelMP ModelMP Modely McCulloch-Pitts (Simplistic) Neuron Model

    x1xxM

    1iwijw

    iny yjx

    xMij

    iNw y

    Th t k f ti f i

    Nxib

    y The network function of a neuron is aweighted sum of its input signals plus a biasterm.term.

    7July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • MP Model LimitationsMP Model LimitationsMP Model LimitationsMP Model Limitations

    y Weights fixedWeights fixedy Incapable of learningy Original model allows ONLY:g

    Binary output steps Operations at discrete time steps

  • Perceptron
    [Figure: sensory units A_1 ... A_M with activations a_1 ... a_M and weights w_1 ... w_M feeding a summing unit that computes x; the output unit gives s = f(x)]
    Sensory Unit -> Association Unit -> Summing Unit -> Output Unit

  • PerceptronPerceptron

    M

    1

    M

    i ii

    x w a = Activation1i=( )s f x=Output

    b s = Errori iw a =Weight Change

  • PerceptronPerceptron -- AdvantagesAdvantages

    y Perceptron learning law gives step-y Perceptron learning law gives stepby-step process for adjusting weightsy Perceptron convergence theoremPerceptron convergence theorem

  • Widrow's Adaline
    [Figure: same structure as the perceptron, but with a linear output unit: s = f(x) = x]
    Sensory Unit -> Association Unit -> Summing Unit -> Output Unit

  • AdalineAdaline ((ADAptiveADAptive Linear Element)Linear Element)M

    1

    M

    i ii

    x w a = Activation1i=

    ( )s f x x= =Outputb s b x = = Error

    i iw a =Weight Change

  • Wid Ad liWid Ad liWidrows AdalineWidrows Adaline

    y Analog activation value x compared with y Analog activation value x compared with target output b

    OROy Output is linear function of xy LMS learning lawgy Gradient descent algorithm

  • Heat and Cold Heat and Cold ExampleExampleHeat and Cold Heat and Cold ExampleExample

    HOT HOT2

    1x 1y2

    -1 NN

    COLD COLD22 1

    2x 2y2

    1

    1N N

    Actual Input Perceived Output

    15July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • HebbHebb Rule (Rule (HebbHebb Net)Net)HebbHebb Rule (Rule (HebbHebb Net)Net)

    Initialize all weights to zero0 0, 1iw i n= = for {all input training vectors and target

    t t i d t 2 4 }1 output pairs do steps 2 4 }1

    S t A ti ti f I t2 i ix s= y t=Set Activation for Inputs2

    Set Activation for Outputs3

    i i y t

    ( ) ( )i i iw n w o x y= +Adjust weights & bias4

    ( ) ( )i i i y

    ( ) ( )b n b o y= +16July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Examples
    - AND logic
    - OR logic
    - AND-NOT logic
    - 3-D example where Hebb fails
    - X-OR !!

  • Concepts - Hebbian Learning
    - A discriminating hyper-plane is constituted by the combination of the summing unit and the output unit.
    - 0 targets are difficult to learn!
    - Bi-polar notion preferred
    - Hebb Rule doesn't give a direction of learning!
    - Concept of a bias!!

  • Perceptron Learning
    - Step 0: Initialize weights and bias (for simplicity, set w's and b to 0). Set learning rate $0 < \alpha \le 1$
    - Step 1: While stopping condition is false
    - Step 2: For each training pair s : t
    - Step 3: Set activations of the input units: $x_i = s_i$

  • Perceptron Learning
    - Step 4: Compute the response of the output unit:
      $y_{in} = b + \sum_i x_i w_i$
      $y = 1$ if $y_{in} > \theta$;  $y = 0$ if $-\theta \le y_{in} \le \theta$;  $y = -1$ if $y_{in} < -\theta$

  • Perceptron Learning
    - Step 5: Update weights and bias if an error occurred for the given pattern, i.e. if $y \ne t$:
      $w_i(\mathrm{new}) = w_i(\mathrm{old}) + \alpha\,t\,x_i$
      $b(\mathrm{new}) = b(\mathrm{old}) + \alpha\,t$
    - Step 6: If there is no weight change for the entire epoch, stop!
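A runnable sketch of steps 0-6 of the perceptron learning rule above, with the three-valued output using the dead band theta and the stop-when-no-weight-change condition; theta = 0.2 and alpha = 1 here simply mirror the worked problems that follow.

```python
def perceptron_train(samples, alpha=1.0, theta=0.2, max_epochs=100):
    """Steps 0-6 of the perceptron learning rule (bipolar targets assumed)."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                       # step 0: weights and bias start at 0
    for epoch in range(max_epochs):             # step 1: loop until the stopping condition
        changed = False
        for s, t in samples:                    # step 2: for each training pair s : t
            x = list(s)                         # step 3: set activations of the input units
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))          # step 4: net input
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)  # three-valued output
            if y != t:                          # step 5: update only when an error occurred
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                         # step 6: no weight change in the epoch -> stop
            return w, b, epoch + 1
    return w, b, max_epochs

if __name__ == "__main__":
    AND = [([1, 1], 1), ([1, 0], -1), ([0, 1], -1), ([0, 0], -1)]  # binary inputs, bipolar targets
    print(perceptron_train(AND))
```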

  • Problem!
    - AND function, bipolar i/ps, bipolar targets, $\alpha = 1$, $b = 0$, $w_1 = w_2 = 0$

  • Problem!
    - AND function, binary i/ps, bipolar targets, $\alpha = 1$, initial weights and bias 0, $\theta = 0.2$

  • Problem!
    - AND function, binary i/ps, bipolar targets, $\alpha = 1$, initial weights and bias 0, $\theta = 0.2$
    Results for the third epoch

  • Problem!
    - AND function, binary i/ps, bipolar targets, $\alpha = 1$, initial weights and bias 0, $\theta = 0.2$
    Results of the tenth epoch

  • Artificial Neural Networks - Lec 7 & 8
    Dr. Aditya Abhyankar

    1ANN - Dr. Abhyankar - Lec 7 & 8

  • Today
    - Conceptual drill from last time
    - PLR-CT
    - Delta Rule
    - Adaline learning

    2ANN - Dr. Abhyankar - Lec 7 & 8

  • Concepts Concepts HebbianHebbian LearningLearningConcepts Concepts HebbianHebbian LearningLearning

    y A discriminating hyper plane is constituted y A discriminating hyper plane is constituted by the combination of summing unit and output unit.y 0 targets are difficult to learn!y Bi-polar notion preferred

    Hebb R le doesnt gi e di ection of y Hebb Rule doesnt give direction of learning!y Concept of a bias!!Concept of a bias!!

    3ANN - Dr. Abhyankar - Lec 7 & 8

  • HebbHebb Rule (Rule (HebbHebb Net)Net)HebbHebb Rule (Rule (HebbHebb Net)Net)

    Initialize all weights to zero0 0, 1iw i n= = for {all input training vectors and target

    t t i d t 2 4 }1 output pairs do steps 2 4 }1

    S t A ti ti f I t2 i ix s= y t=Set Activation for Inputs2

    Set Activation for Outputs3

    i i y t

    ( ) ( )i i iw n w o x y= +Adjust weights & bias4

    ( ) ( )i i i y

    ( ) ( )b n b o y= +4ANN - Dr. Abhyankar - Lec 7 & 8

  • ExamplesExamples

    AND logicy AND logicy OR logicy AND-NOT logicy AND-NOT logicy 3-D example where Hebb failsy X-OR !!y X OR !!

    5ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using Hebbian

    x1  x2   y
     1   1  -1
     1  -1   1
    -1   1   1
    -1  -1  -1

    w1 = ?, w2 = ?, TH = 0

    6ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model
    [Plot: the four bipolar XOR patterns in the x1-x2 plane]
    NOT Linearly Separable !!!

    7ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model

    x1  x2   y
     1   1  -1
     1  -1   1
    -1   1   1
    -1  -1  -1

    $x_1\ \mathrm{XOR}\ x_2 = \{x_1\ \mathrm{ANDNOT}\ x_2\}\ \mathrm{OR}\ \{x_2\ \mathrm{ANDNOT}\ x_1\}$
    [Network: hidden units z1, z2 and output unit y, with weights w11, w21, b1, w12, w22, b2, w31, w32, b3 still to be determined]

    8ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model - {x1 ANDNOT x2}
    Hebb-rule training table (columns: x1, x2, t | Δw1, Δw2, Δb | w1, w2, b), starting from w1 = w2 = b = 0.
    Final weights after one pass through the four bipolar patterns: w1 = 2, w2 = -2, b = -2.

    9ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model - {x1 ANDNOT x2}
    [Plot: decision boundary x2 = x1 - 1]

    10ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model - {x2 ANDNOT x1}
    Hebb-rule training table (columns: x1, x2, t | Δw1, Δw2, Δb | w1, w2, b), starting from w1 = w2 = b = 0.
    Final weights after one pass through the four bipolar patterns: w1 = -2, w2 = 2, b = -2.

    11ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model - {x2 ANDNOT x1}
    [Plot: decision boundary x2 = x1 + 1]

    12ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model
    [Plot: the two decision boundaries x2 = x1 - 1 (x1 ANDNOT x2) and x2 = x1 + 1 (x2 ANDNOT x1) shown together with the bipolar XOR targets]

    13ANN - Dr. Abhyankar - Lec 7 & 8

  • XX--OR Example using MP modelOR Example using MP modelXX OR Example using MP modelOR Example using MP model

    2x2

    1x1x

    14ANN - Dr. Abhyankar - Lec 7 & 8

  • XX--OR Example using MP modelOR Example using MP modelXX OR Example using MP modelOR Example using MP model1 2

    1 1 11 1 1

    x x y 1 ?b =

    1x 11 ?w = 31 ?w =1 1 1

    1 1 11 1 1

    y12 ?w = 21 ?w =

    2xy

    22 ?w =

    32 ?w =3 ?b =22

    1 2 1 2 2 1( ) { ( ) } { ( ) }x XOR x x ANDNOT x OR x ANDNOT x=2 ?b =

    15ANN - Dr. Abhyankar - Lec 7 & 8

  • X-OR Example using MP model
    $x_1\ \mathrm{XOR}\ x_2 = \{x_1\ \mathrm{ANDNOT}\ x_2\}\ \mathrm{OR}\ \{x_2\ \mathrm{ANDNOT}\ x_1\}$
    Hebb-rule training table for the output (OR) unit (columns: inputs, t | weight changes | weights).

    16ANN - Dr. Abhyankar - Lec 7 & 8

  • XX--OR Example using MP modelOR Example using MP modelXX OR Example using MP modelOR Example using MP model1 2{ ( ) }x ANDNOT x

    2x2x

    2 1 1x x= x1x

    17ANN - Dr. Abhyankar - Lec 7 & 8

  • XX--OR Example using MP modelOR Example using MP modelXX OR Example using MP modelOR Example using MP model1 2{ ( ) }x ANDNOT x

    2x2x

    2 1 1x x= x1x

    18ANN - Dr. Abhyankar - Lec 7 & 8

  • Pe cept on Lea ningPe cept on Lea ningPerceptron LearningPerceptron Learning

    y Step 0: Initialize weights and bias (for y Step 0: Initialize weights and bias (for simplicity set ws and bs to 0). Set learning rate 0 1< gy Step 1: While stopping condition is falsey Step 2: For each training pair :s ty Step 3: Set activation to Input

    i ix s=

  • Pe cept on Lea ningPe cept on Lea ningPerceptron LearningPerceptron Learningy Step 4: Compute response of o/p unitp p p p

    in i iy b x w= +i

    1 iny >0in

    in

    yy y

    = 1 iny

  • Pe cept on Lea ningPe cept on Lea ningPerceptron LearningPerceptron Learning

    y Step 5: Update error and bias if error p poccurred for the given pattern, if y t

    ( ) ( )w n w o tx+( ) ( )i i iw n w o tx= +( ) ( )b n b o t= +

    y Step 6: For no weight change (for the

    ( ) ( )b n b o t= +p g g (

    entire epoch) stop!

  • Problem!
    - AND function, bipolar i/ps, bipolar targets, $\alpha = 1$, $b = 0$, $w_1 = w_2 = 0$
    - Let's solve this using the advanced notion of bias!

  • Significance of bias
    [Figure: two equivalent neurons - one with an explicit bias term b_i, one with the bias realized as the weight on a constant input x_0 = 1]

    $b + x_1 w_1 + x_2 w_2 = 0 \quad\Rightarrow\quad x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{b}{w_2}$

    23July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Significance of bias
    [Figure: the bias b_i represented as the weight on an extra constant input x_0 = 1]
    24July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Problem!Problem!

    y AND function, binary i/ps, bipolar targets, 1 0 0 2b 1, 0, 0.2b = = =

  • Problem!Problem!

    y AND function, binary i/ps, bipolar targets, 1, 0, 0.2b = = =

    Results for third epoch

  • Problem!Problem!

    y AND function, binary i/ps, bipolar targets, 1, 0, 0.2b = = =

    Results of tenth epoch

  • Perceptron Convergence Theorem
    If there is a weight vector w* such that f(x(p)·w*) = t(p) for all p, then for any starting vector w, the PLR will converge to a weight vector that gives the correct response to all training patterns, in a finite number of steps.

    28ANN - Dr. Abhyankar - Lec 7 & 8

  • Perceptron Convergence Theorem

    $w(k)\cdot w^* \ge w(0)\cdot w^* + k\,m, \qquad m = \min_p\{x(p)\cdot w^*\}$

    $\|w(k)\|^2 \ge \dfrac{(w(k)\cdot w^*)^2}{\|w^*\|^2} \ge \dfrac{(w(0)\cdot w^* + k\,m)^2}{\|w^*\|^2}$

    29ANN - Dr. Abhyankar - Lec 7 & 8

  • Perceptron Convergence Theorem

    $\|w(k)\|^2 \le \|w(0)\|^2 + k\,M^2, \qquad M = \max_p\{\|x(p)\|\}$

    $\dfrac{(w(0)\cdot w^* + k\,m)^2}{\|w^*\|^2} \le \|w(k)\|^2 \le \|w(0)\|^2 + k\,M^2$

    Since the lower bound grows as $k^2$ and the upper bound only as $k$, the number of updates $k$ must be finite.

    30ANN - Dr. Abhyankar - Lec 7 & 8

  • Adaline - Delta Learning Rule
    - Step 0: Initialize weights and bias (for simplicity, set w's and b to 0). Set learning rate $0 < \alpha \le 1$
    - Step 1: While stopping condition is false
    - Step 2: For each training pair s : t
    - Step 3: Set activations of the input units: $x_i = s_i$

  • Adaline - Delta Learning Rule
    - Step 4: Compute the response of the output unit:
      $y_{in} = b + \sum_i x_i w_i, \qquad y = y_{in}$

  • Adaline - Delta Learning Rule
    - Step 5: Update weights and bias if an error occurred for the given pattern, i.e. if $y \ne t$:
      $w_i(\mathrm{new}) = w_i(\mathrm{old}) + \alpha\,(t - y)\,x_i$
      $b(\mathrm{new}) = b(\mathrm{old}) + \alpha\,(t - y)$
    - Step 6: If the largest weight change is smaller than a specified tolerance, stop!

  • Problem!
    - AND function, bipolar i/ps, bipolar targets, $\alpha = 1$, $b = 0$, $w_1 = w_2 = 0$

  • Artificial Neural Networks - Lec 9
    Dr. Aditya Abhyankar

    1ANN - Dr. Abhyankar - Lec 7 & 8

  • Last Time
    - Conceptual drill!!
    - PLR-CT
    - Delta Rule
    - Adaline learning

    2ANN - Dr. Abhyankar - Lec 7 & 8

  • Today
    - Delta Rule
    - Adaline learning
    - MATLAB Demo
    - Madaline philosophy
    - Extended Delta Rule - BP

    3ANN - Dr. Abhyankar - Lec 7 & 8

  • Delta Rule
    - The Delta Rule changes the weights on the neuron connections to minimize the difference between y_in and t
    - Aims at reducing the error across all the training patterns (exemplars)
    - The squared error for a particular training pattern can be given as
      $E = (t - y_{in})^2$

    4ANN - Dr. Abhyankar - Lec 7 & 8

  • Delta Rule
    - E is a function of all the weights
    - The gradient of E is a vector consisting of the partial derivatives of E with respect to each weight
    - The gradient gives the direction of most rapid increase in E; we wish to minimize E!
    - Hence calculate: $\dfrac{\partial E}{\partial w_I}$

    5ANN - Dr. Abhyankar - Lec 7 & 8

  • Delta Rule

    $\dfrac{\partial E}{\partial w_I} = \dfrac{\partial}{\partial w_I}(t - y_{in})^2 = -2\,(t - y_{in})\,\dfrac{\partial y_{in}}{\partial w_I} = -2\,(t - y_{in})\,x_I$

    - The local error will be reduced most rapidly by adjusting the weights as per the delta rule:
      $\Delta w_I = \alpha\,(t - y_{in})\,x_I$

    6ANN - Dr. Abhyankar - Lec 7 & 8
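A sketch of the delta-rule update that follows from the gradient above: since dE/dw_I = -2(t - y_in)x_I, moving against the gradient gives Δw_I = α(t - y_in)x_I. The loop over the bipolar AND patterns is illustrative.

```python
def delta_rule_step(x, w, b, t, alpha=0.1):
    """One delta-rule (LMS) update: move the weights down the gradient of E = (t - y_in)**2."""
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))       # linear (Adaline) response
    err = t - y_in
    w = [wi + alpha * err * xi for wi, xi in zip(w, x)]   # delta w_I = alpha * (t - y_in) * x_I
    b = b + alpha * err
    return w, b, err ** 2                                 # squared error before the update

if __name__ == "__main__":
    w, b = [0.1, 0.1], 0.1
    data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]   # bipolar AND
    for x, t in data * 20:
        w, b, e = delta_rule_step(x, w, b, t)
    print(w, b)   # approaches the least-squares solution w1 = w2 = 0.5, b = -0.5
```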

  • Ad liAd li D lt L i R lD lt L i R lAdalineAdaline Delta Learning RuleDelta Learning Rule

    y Step 0: Initialize weights and bias (for y Step 0: Initialize weights and bias (for simplicity set ws and bs to small value). Set learning rate 0 1< gy Step 1: While stopping condition is falsey Step 2: For each training pair :s ty Step 3: Set activation to Input

    i ix s=

  • Ad liAd li D lt L i R lD lt L i R lAdalineAdaline Delta Learning RuleDelta Learning Rule

    y Step 4: Compute response of o/p unitp p p p

    i i iy b x w= +in i ii

    y b x w+iny y=

  • Ad liAd li D lt L i R lD lt L i R lAdalineAdaline Delta Learning RuleDelta Learning Rule

    y Step 5: Update error and bias if error p poccurred for the given pattern, if y t

    ( ) ( ) ( )w n w o t y x+( ) ( ) ( )i i iw n w o t y x= + ( ) ( ) ( )b n b o t y= +

    y Step 6: If largest weight change is smaller

    ( ) ( ) ( )b n b o t y= + y Step 6: If largest weight change is smaller

    than specified tolerance then stop!

  • Problem!
    - AND function, bipolar i/ps, bipolar targets, $\alpha = 0.1$, $b = w_1 = w_2 = 0.1$

  • Madaline Architecture
    [Figure: inputs x1, x2 feed two Adaline units Z1, Z2 (weights w11, w21, w12, w22; biases b1, b2); their outputs feed the output unit Y through weights v1, v2 with bias b3]

    11July 20, 2010 ANN - Dr. Abhyankar - Lec 3 & 4

  • Madaline Learning (MRI)

  • Madaline Learning (MRI)

  • Madaline Learning (MRI)

  • Madaline Learning (MRI)

  • Problem!
    - X-OR function, bipolar i/ps, bipolar targets
    - Initial values as listed on the slide: 0.5, 0.3, 0.05, 0.2 (b1, w11, w21, ...); 0.5, 0.1, 0.2 (b2, w12, w22); b3 = v1 = v2 = 0.5

  • Artificial Neural Networks - 10

    Dr. Aditya Abhyankar

  • Back-Propagation (BP)

    - Aims at balancing memorization and generalization
    - Stage 1: Feed-forward of the input training pattern
    - Stage 2: Calculation and back-propagation of the associated error
    - Stage 3: Adjustment of the weights

  • Architecture

    [Figure: network with input, hidden, and output layers]

  • Nomenclature

  • Nomenclature

  • Activation Function

    Characteristics: continuous; differentiable; monotonically non-decreasing; easily differentiable

  • Binary Sigmoid Function - range (0,1)

  • Bipolar Sigmoid Function - range (-1,1)
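Hedged implementations of the two activation functions named on these slides, the binary sigmoid with range (0,1) and the bipolar sigmoid with range (-1,1), together with the derivatives that back-propagation needs later.

```python
import math

def binary_sigmoid(x):
    """f(x) = 1 / (1 + exp(-x)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_sigmoid_deriv(x):
    f = binary_sigmoid(x)
    return f * (1.0 - f)                 # f'(x) = f(x) * (1 - f(x))

def bipolar_sigmoid(x):
    """f(x) = 2 / (1 + exp(-x)) - 1, range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)   # f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))

if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, binary_sigmoid(x), bipolar_sigmoid(x))
```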

  • Algorithm: Training

  • Algorithm

  • Algorithm

  • Algorithm

  • Algorithm

  • Application

  • Example

    - X-OR problem (not linearly separable) using a 2-4-1 backprop net
    - Initial weights to the hidden layer
    - Initial weights to the o/p layer
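A compact, hedged sketch of the three back-propagation stages (feed-forward, back-propagated error, weight adjustment) on a 2-4-1 network for the XOR problem. The random initial weights, learning rate, and epoch count are illustrative assumptions, not the specific numbers used in the worked example on the following slides; convergence may need more epochs with a different seed.

```python
import math, random

def f(x):
    # binary sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(epochs=5000, alpha=0.5, hidden=4, seed=0):
    rng = random.Random(seed)
    # v[j]: weights into hidden unit j (2 inputs + bias); w: weights into the output (hidden + bias)
    v = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(hidden)]
    w = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            # Stage 1: feed-forward of the input training pattern
            z = [f(vj[0] * x[0] + vj[1] * x[1] + vj[2]) for vj in v]
            y = f(sum(wj * zj for wj, zj in zip(w, z)) + w[-1])
            # Stage 2: calculation and back-propagation of the associated error
            dk = (t - y) * y * (1 - y)                                       # output error term
            dj = [dk * w[j] * z[j] * (1 - z[j]) for j in range(hidden)]      # hidden error terms
            # Stage 3: adjustment of the weights
            for j in range(hidden):
                w[j] += alpha * dk * z[j]
                v[j][0] += alpha * dj[j] * x[0]
                v[j][1] += alpha * dj[j] * x[1]
                v[j][2] += alpha * dj[j]                                     # hidden bias
            w[-1] += alpha * dk                                              # output bias
    return v, w

if __name__ == "__main__":
    v, w = train_xor()
    for x, t in [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]:
        z = [f(vj[0] * x[0] + vj[1] * x[1] + vj[2]) for vj in v]
        y = f(sum(wj * zj for wj, zj in zip(w, z)) + w[-1])
        print(x, t, round(y, 3))
```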

  • Solution
    [Figure: 2-4-1 network - input units x1, x2; hidden units z1, z2, z3, z4; output unit y1]

  • Solution
    [Figure: the same network with the layers labeled Input Layer, Hidden Layer, Output Layer]

  • Solution
    [Figure: hidden-layer weights v11 ... v14 added]

  • Solution
    [Figure: remaining hidden-layer weights v21 ... v24 added]

  • Solution
    [Figure: output-layer weights w11, w21, w31, w41 added]

  • Solution
    [Figure: bias weights v01 ... v04 and w01 added]

  • Example

    - X-OR problem (not linearly separable) using a 2-4-1 backprop net
    - Initial weights to the hidden layer
    - Initial weights to the o/p layer

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    = 1y Layer11w 21w 31w 41w

    1

    010.39790.3581

    1z 2z 3z 4z13v 14

    v21v 22v

    11v12v 13

    v21 22v23v 24v

    01v02v

    03v

    1x 2x103v

    04v

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    = 1y Layer11w 21w 31w 41w

    1

    010.39790.3581

    1z 2z 3z 4z13v 14

    v21v 22v

    11v12v 13

    v21 22v23v 24v

    01v02v

    03v

    1x 2x103v

    04v

    [ ]0 0.3378 0.2771 0.2859 0.3329v =

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    = 1y Layer11w 21w 31w 41w

    1

    010.39790.3581

    1z 2z 3z 4z13v 14

    v21v 22v

    11v12v 13

    v21 22v23v 24v

    01v02v

    03v

    1x 2x103v

    04v[ ]0 0 3378 0 2771 0 2859 0 3329v = [ ]0 0.3378 0.2771 0.2859 0.3329v 0.197 0.3191 0.1448 0.3394

    0.3099 0.1904 0.0347 0.4861v

    =

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    = 1y Layer1

    010.39790.3581

    3zin = i

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 1691i

    2

    0.7866zin =

    3

    0.1064zin

    4

    0.4796zin =

    1 0 10 1 1

    1 0.1691zin =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    = 1y Layer1

    010.39790.3581

    3zin = i

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 1691i

    2

    0.7866zin =

    3

    0.1064zin

    4

    0.4796zin =

    1 0 10 1 1

    1 0.1691zin =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    = 1y Layer1

    010.39790.3581

    3z =

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =

    1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin =

    1y Layer1

    010.39790.3581

    3z =

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =

    1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step1

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =

    1y Layer1

    010.39790.3581

    3z =

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =

    1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =

    1y Layer1

    010.39790.3581

    3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =

    1y Layer1

    010.39790.3581

    3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    010.39790.3581

    3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    010.39790.3581

    3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 1 0 10 1 1

    1 0.5422z =

    1x 2x10 0 0

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    010.39790.3581

    3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 1 0 10 1 1

    1 0.5422z =

    -0.0567 1x 2x1

    0 0 00.05670.03360.0459

    in =

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

    -0.0413

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    010.39790.3581

    3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 0x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 1 0 10 1 1

    1 0.5422z =

    -0.0567 -0.0141 1x 2x1

    0 0 00.05670.03360.0459

    in =

    0.00720.01140 0097

    j =

    [ ]0 0.3378 0.2771 0.2859 0.3329v = 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861

    v =

    -0.0413 -0.0097

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    010.39790.3581

    1 21 1 0x x t

    3z =0.1153k = -0.0012

    -0.0016

    1z 2z 3z 4z

    1 1 01 0 10 1 10 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 0 0 0

    1 0.5422z =-0.05670.0336

    i

    1x 2x1 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861v = 0.0459-0.0413

    in =

    [ ]0 0.3378 0.2771 0.2859 0.3329v = -0.2815 0.1444 0.2287 -0.1950-0.2815 0.1444 0.2287 -0.1950

    v = * 1.0e-003

  • Solution Training: step2

    y Output01 0.1401w = 0.49190.2913

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    010.39790.3581

    1 21 1 0x x t

    3z =0.1153k = -0.0012

    -0.0016

    1z 2z 3z 4z

    1 1 01 0 10 1 10 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 0 0 0

    1 0.5422z =-0.05670.0336

    i

    1x 2x1 0.197 0.3191 0.1448 0.33940.3099 0.1904 0.0347 0.4861v = 0.0459-0.0413

    in = [ ]0 3378 0 2771 0 2859 0 3329[ ]0 -0.2815 0.1444 0.2287 -0.1950v =

    -0.2815 0.1444 0.2287 -0.1950-0.2815 0.1444 0.2287 -0.1950

    v = [ ]0 0.3378 0.2771 0.2859 0.3329v =

    3*10 3*10

  • Solution Training: step3

    y Output01 0.1424w = 0.4907-0.2929

    w

    =

    1 0.1462yin = 1 0.4635y =01 0.0023w =

    1y Layer1

    01-0.39910.3572 1 2x x t3z =

    0.1153k = -0.0012-0.0016

    1z 2z 3z 4z1 2

    1 1 01 0 1

    x x t

    0 5422

    2

    0.6871z =

    3

    0.5266z

    4

    0.3823z =-0.0012

    -0.0009

    w = 0 1 10 0 0

    1 0.5422z =-0.05670.0336

    i

    1x 2x1[ ]-0 2815 0 1444 0 2287 -0 1950v =

    0.0459-0.0413

    in = -0.2815 0.1444 0.2287 -0.1950-0.2815 0.1444 0.2287 -0.1950

    v = * 1.0e-003 [ ]0 0.2815 0.1444 0.2287 0.1950v =

    0.1967 0.3192 -0.1446 0.33920.3096 0.1905 -0.0345 -0.4863

    v = [ ]0 -0.3381 0.2772 0.2861 -0.3331v =

  • Solution Training: step3

    y Output01 0.1393w = 0.4923-0.2908

    w

    = 1y Layer1

    01-0.39750.3584 1 2x x t

    1z 2z 3z 4z1 2

    1 1 01 0 1

    x x t

    0 1 10 0 0

    1x 2x10.1970 0.3191 -0.1448 0.33940.3100 0.1904 -0.0347 -0.4861

    v = [ ]0 -0.3377 0.2770 0.2858 -0.3328v =

  • Application

  • Artificial Neural Networks - Lec 11 & 12
    Dr. Aditya Abhyankar

  • Back-Propagation (BP)
    - Aims at balancing memorization and generalization
    - Stage 1: Feed-forward of the input training pattern
    - Stage 2: Calculation and back-propagation of the associated error
    - Stage 3: Adjustment of the weights

  • Architecture
    [Figure: network with input, hidden, and output layers]

  • Nomenclature
    x = input training vector $(x_1, \ldots, x_i, \ldots, x_n)$
    t = output target vector $(t_1, \ldots, t_k, \ldots, t_m)$
    $\delta_k$ = portion of the error-correction weight adjustment for $w_{jk}$, for the error at output unit $Y_k$, to be propagated back
    $\delta_j$ = portion of the error-correction weight adjustment for $v_{ij}$, for the error at hidden unit $Z_j$, to be propagated back
    $\alpha$ = learning rate
    $X_i$ = input unit i

  • Nomenclature

  • Activation Function
    - Characteristics: continuous; differentiable; monotonically non-decreasing; easily differentiable

  • Binary Sigmoid Function - range (0,1)

  • Bipolar Sigmoid Function - range (-1,1)

  • Algorithm: Training

  • Algorithm

  • Algorithm

  • Algorithm

  • Algorithm

  • Application

  • Application

  • Example
    - X-OR problem (not linearly separable) using a 2-4-1 backprop net
    - Initial weights to the hidden layer
    - Initial weights to the o/p layer

  • SolutionSolutionSolutionSolution

    1y

    1z 2z 3z 4z

    1x 2x

  • SolutionSolutionSolutionSolution

    y Output1y Layer

    1z 2z 3z 4zHiddenLayery

    1x 2x InputLayer

  • SolutionSolutionSolutionSolution

    y Output1y Layer

    1z 2z 3z 4zHiddenLayer

    13v 14vy

    11v12v 13

    v

    1x 2x InputLayer

  • SolutionSolutionSolutionSolution

    y Output1y Layer

    1z 2z 3z 4zHiddenLayer

    13v 14v

    21v 22vy11v

    12v 13v21 22v

    23v 24v

    1x 2x InputLayer

  • SolutionSolutionSolutionSolution

    y Output1y Layer11w 21w 31w 41w

    1z 2z 3z 4zHiddenLayer

    13v 14v

    21v 22vy11v

    12v 13v21 22v

    23v 24v

    1x 2x InputLayer

  • SolutionSolutionSolutionSolution

    y Output01w 1y Layer11w 21w 31w 41w

    101w

    1z 2z 3z 4zHiddenLayer

    13v 14v

    21v 22vy11v

    12v 13v21 22v

    23v 24v01v

    02v03v

    1x 2x InputLayer1

    03v

    04v

  • y X-or Problem (linearly not separable) o ob ( a y o s pa ab )using 2-4-1 backprop Nety Initial Weights to

    hidden layer Initial weights to o/p layero/p layer

    ExampleExample

  • Solution Training: step 1 (feedforward of the first pattern, x = (1,1), t = 0)

    Initial weights:
        Output layer:  w01 = -0.1401,   w = [ 0.4919  -0.2913  -0.3979   0.3581 ]
        Hidden layer:  v  = [ 0.1970  0.3191  -0.1448   0.3394
                              0.3099  0.1904  -0.0347  -0.4861 ]
                       v0 = [ -0.3378  0.2771  0.2859  -0.3329 ]
    Training patterns (x1, x2, t): (1,1,0), (1,0,1), (0,1,1), (0,0,0)

    Net inputs to the hidden units:   z_in = [ 0.1691  0.7866  0.1064  -0.4796 ]
    Hidden activations:               z    = [ 0.5422  0.6871  0.5266   0.3823 ]
    Net input to the output unit:     y_in = -0.1462
    Network output:                   y    = 0.4635
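A short numeric check of the feedforward pass, using the initial weights above (the sign pattern follows the updated matrices printed at step 3). Running it reproduces z_in, z, y_in and y to the four decimals shown on the slides.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

v  = np.array([[0.1970, 0.3191, -0.1448,  0.3394],
               [0.3099, 0.1904, -0.0347, -0.4861]])   # input-to-hidden weights
v0 = np.array([-0.3378, 0.2771, 0.2859, -0.3329])     # hidden biases
w  = np.array([0.4919, -0.2913, -0.3979, 0.3581])     # hidden-to-output weights
w01 = -0.1401                                         # output bias

x, t = np.array([1.0, 1.0]), 0.0                      # first XOR pattern

z_in = v0 + x @ v        # ≈ [ 0.1691, 0.7866, 0.1064, -0.4796]
z = sigmoid(z_in)        # ≈ [ 0.5422, 0.6871, 0.5266,  0.3823]
y_in = w01 + z @ w       # ≈ -0.1462
y = sigmoid(y_in)        # ≈  0.4635
print(np.round(z_in, 4), np.round(z, 4), round(float(y_in), 4), round(float(y), 4))
```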

  • Solution Training: step 2 (backpropagation of the associated error)

    Error term at the output unit:        δ1 = (t - y) f'(y_in) = -0.1153
    Output-layer increments:              Δw = [ -0.0012  -0.0016  -0.0012  -0.0009 ],   Δw01 = -0.0023
    Error propagated to the hidden units: δ_in = [ -0.0567  0.0336  0.0459  -0.0413 ]
    Error terms at the hidden units:      δ_j  = [ -0.0141  0.0072  0.0114  -0.0097 ]
    Hidden-layer increments:
        Δv  = 1.0e-3 * [ -0.2815  0.1444  0.2287  -0.1950
                         -0.2815  0.1444  0.2287  -0.1950 ]
        Δv0 = 1.0e-3 * [ -0.2815  0.1444  0.2287  -0.1950 ]
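Continuing the check with the backward pass on the same pattern. The learning rate is not printed on the slides; alpha = 0.02 is an assumption that happens to reproduce the increments above.

```python
import numpy as np

alpha = 0.02   # assumed learning rate (not stated on the slides); it reproduces the printed increments

# values carried over from the feedforward pass (step 1)
z = np.array([0.5422, 0.6871, 0.5266, 0.3823])
y, t = 0.4635, 0.0
w = np.array([0.4919, -0.2913, -0.3979, 0.3581])
x = np.array([1.0, 1.0])

delta_k = (t - y) * y * (1 - y)        # ≈ -0.1153, error term at the output unit
dw  = alpha * delta_k * z              # ≈ [-0.0012, -0.0016, -0.0012, -0.0009]
dw01 = alpha * delta_k                 # ≈ -0.0023
delta_in = delta_k * w                 # ≈ [-0.0567, 0.0336, 0.0459, -0.0413]
delta_j = delta_in * z * (1 - z)       # ≈ [-0.0141, 0.0072, 0.0114, -0.0097]
dv  = alpha * np.outer(x, delta_j)     # both rows ≈ 1e-3 * [-0.2815, 0.1444, 0.2287, -0.1950]
dv0 = alpha * delta_j
print(np.round(dw, 4), round(dw01, 4), np.round(delta_j, 4))
```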

  • Solution Training: step 3 (weight adjustment)

    After applying the step-2 increments:
        w01 = -0.1424,   w = [ 0.4907  -0.2929  -0.3991   0.3572 ]
        v  = [ 0.1967  0.3192  -0.1446   0.3392
               0.3096  0.1905  -0.0345  -0.4863 ]
        v0 = [ -0.3381  0.2772  0.2861  -0.3331 ]

  • Solution Training: step 3 (continued)

    After presenting the next training pattern(s):
        w01 = -0.1393,   w = [ 0.4923  -0.2908  -0.3975   0.3584 ]
        v  = [ 0.1970  0.3191  -0.1448   0.3394
               0.3100  0.1904  -0.0347  -0.4861 ]
        v0 = [ -0.3377  0.2770  0.2858  -0.3328 ]

  • Application

  • Perceptron Convergence Theorem

    If there is a weight vector w* such that f(x(p) · w*) = t(p) for all p, then for any starting vector w, the perceptron learning rule (PLR) will converge to a weight vector that gives the correct response for all training patterns, in a finite number of steps.


  • Perceptron Convergence Theorem

    w(k) · w* ≥ w(0) · w* + k m,   where m = min_p { x(p) · w* }

    By the Cauchy-Schwarz inequality,
        ||w(k)||² ≥ (w(k) · w*)² / ||w*||²
    so
        ||w(k)||² ≥ (w(0) · w* + k m)² / ||w*||²


  • Perceptron Convergence Theorem

    ||w(k)||² ≤ ||w(0)||² + k M,   where M = max_p { ||x(p)||² }

    Combining the two bounds,
        (w(0) · w* + k m)² / ||w*||² ≤ ||w(k)||² ≤ ||w(0)||² + k M

    The left-hand side grows like k² while the right-hand side grows like k, so the number of weight updates k is bounded: learning stops after finitely many corrections.

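The combined bound caps the number of corrections the perceptron learning rule can make on a separable set. As an illustration only (not part of the lecture), a small sketch that trains a perceptron on the bipolar AND function and counts the updates; all names and data are illustrative.

```python
import numpy as np

def train_perceptron(X, t, max_epochs=100):
    """Perceptron learning rule on bipolar targets; returns the weights and the update count."""
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(max_epochs):
        changed = False
        for x, target in zip(X, t):
            if np.sign(x @ w) != target:   # misclassified (sign(0) counts as wrong)
                w += target * x            # PLR update
                updates += 1
                changed = True
        if not changed:
            break                          # all patterns correct -> converged
    return w, updates

# AND function with a bias input appended, bipolar targets: linearly separable
X = np.array([[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, 1]], dtype=float)
t = np.array([1, -1, -1, -1])
w, n = train_perceptron(X, t)
print(w, n)   # a separating weight vector and the (finite) number of updates
```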

  • Delta Rule

    y E is a function of all the weights
    y The gradient of E is the vector of partial derivatives of E with respect to each weight
    y The gradient gives the direction of most rapid increase in E; we wish to minimize E
    y Hence calculate ∂E/∂w_I


  • Delta Rule

    E = (t - y_in)²

    ∂E/∂w_I = ∂(t - y_in)² / ∂w_I = -2 (t - y_in) x_I

    y The local error will be reduced most rapidly by adjusting the weights according to the delta rule:
        Δw_I = α (t - y_in) x_I

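A minimal sketch of the delta rule applied to a single linear unit, implementing the update Δw_I = α (t - y_in) x_I derived above; the data and step size are illustrative.

```python
import numpy as np

def delta_rule_step(w, x, t, alpha):
    """One delta-rule update for a single linear unit: w_I <- w_I + alpha * (t - y_in) * x_I."""
    y_in = x @ w                        # net input of the linear unit
    return w + alpha * (t - y_in) * x   # move against the gradient of (t - y_in)^2

# fit y = 2*x1 - 1, with the bias folded in as a constant input x2 = 1 (illustrative data)
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
t = np.array([-1.0, 1.0, 3.0, 5.0])
w = np.zeros(2)
for _ in range(2000):
    for x, target in zip(X, t):
        w = delta_rule_step(w, x, target, alpha=0.05)
print(np.round(w, 3))   # approaches [2., -1.]
```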

  • Artificial Neural Networks 13

    Dr. Aditya Abhyankar



    Learning: the process of forming associations between related patterns

    Heteroassociative NNs, Autoassociative NNs, Hopfield Net

  • A linear vector space X set