artificial intelligence ML u4
TRANSCRIPT
8/12/2019

Machine Learning
Introduction
What is learning? Learning is any process by which a system improves performance from experience.
- Herbert Simon
Learning is constructing or modifying representations of what is being experienced.
- Ryszard Michalski
Why learn?
Build software agents that can adapt to their users, to other software agents, or to changing environments:
Personalized news or mail filter
Personalized tutoring
Mars robot
Discover new things or structure that were previously unknown to humans:
Examples: data mining, scientific discovery
ML as a subfield of AI is concerned with the design and development of algorithms and techniques that allow computers to learn. Simulation of intelligence requires features such as knowledge acquisition, inference, and updating or refinement of the knowledge base. Thus we can sum up by saying that learning is an important aspect of intelligence.
Types of Learning Methodologies
Inductive learning: Required rules and patterns are extracted from massive data sets.
Deductive learning: Deducing new knowledge from already existing knowledge.
Applications
Assign an object/event to one of a given finite set of categories:
Medical diagnosis
Credit card applications or transactions
Fraud detection in e-commerce
Spam filtering in email
Recommending books, movies, music
Financial investments
Game playing
Handwritten letters
Machine-Learning Systems
Components of a Learning System
1. Learning component: To make changes or improvements to the system depending on its performance.
2. Performance element: It performs the task of choosing the actions that need to be taken.
3. Critic: The job of the critic is to inform the learning component regarding its performance.
4. Problem generator: It suggests problems or actions that would lead to the generation of new examples or experiences.
5. Sensors and effectors: Both these components are external to the system.
A general model of learning agents
[Figure: a general model of a learning agent]
Major paradigms of machine learning
Rote learning: Learning by memorization. e.g., caching. Steps: Organization, Generalization, Stability of Knowledge.
Learning by taking advice: Taking high-level and abstract advice and then converting it into rules. e.g., expert systems. Steps: Request, Interpret, Operationalize, Integrate, Evaluate.
Learning by parameter adjustment: Steps: Initially start with some estimate of the correct weight settings. Then modify the weights in the program on the basis of accumulated experience. Increase or decrease the weights of features that appear to be good or bad predictors, respectively.
Learning by macro-operators: Similar to rote learning, except we avoid expensive re-computation by using macro-operators that are learnt for subsequent use.
Learning by analogy: Determine a correspondence between two different representations. Identified as CASE-BASED REASONING.
Supervised and unsupervised learning
Supervised learning: Use specific examples to reach general conclusions or extract general rules.
Classification (concept learning)
Regression
Unsupervised learning (clustering): Unsupervised identification of natural groups in data.
1) Neural Network Based Learning
It is a system loosely modeled on the human brain.
The basic computational element (model neuron) is often called a node or unit. It receives input from some other units, or perhaps from an external source. Each input has an associated weight w, which can be modified by the learning methods.
2) Supervised concept learning
Given a training set of positive and negative examples of a concept, construct a description that will accurately classify whether future examples are positive or negative.
That is, learn some good estimate of a function f given a training set {(x1, y1), (x2, y2), ..., (xn, yn)} where each yi is either + (positive) or - (negative), or a probability distribution over {+, -}.
3) Probably Approximately Correct (PAC) Learning
In the PAC model, we specify two small parameters, ε and δ, and require that with probability at least (1 - δ) a system learns a concept with error at most ε.
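The slide only states the (ε, δ) guarantee; as a concrete illustration, here is the classical sample-size bound for a consistent learner over a finite hypothesis space H (this bound is standard but not given on the slide): m ≥ (1/ε)(ln|H| + ln(1/δ)).

```python
import math

def pac_sample_bound(h_size, eps, delta):
    """Classical PAC bound for a consistent learner over a finite
    hypothesis space H: with m >= (1/eps)(ln|H| + ln(1/delta)) examples,
    any consistent hypothesis has error <= eps with prob >= 1 - delta."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# e.g. |H| = 2**10 hypotheses, 5% error, 95% confidence
m = pac_sample_bound(2 ** 10, eps=0.05, delta=0.05)  # 199 examples suffice
```

Note how the bound grows only logarithmically in |H| and 1/δ, but linearly in 1/ε.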
4) Reinforcement Learning
Decision making (robot, chess machine).
Basic kinds: Utility Function
Action-Value Function
The inductive learning problem
Extrapolate from a given set of examples to make accurate predictions about future examples.
Supervised versus unsupervised learning:
Learn an unknown function f(X) = Y, where X is an input example and Y is the desired output.
Supervised learning implies we are given a training set of (X, Y) pairs by a teacher.
Unsupervised learning means we are only given the Xs and some (ultimate) feedback function on our performance.
Concept learning or classification:
Given a set of examples of some concept/class/category, determine if a given example is an instance of the concept or not.
If it is an instance, we call it a positive example.
If it is not, it is called a negative example.
Or we can make a probabilistic prediction (e.g., using a Bayes net).
Inductive learning framework
Raw input data from sensors are typically preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples.
Each x is a list of (attribute, value) pairs. For example,
X = [Person:Sue, EyeColor:Brown, Age:Young, Sex:Female]
The number of attributes is fixed.
Each attribute has a fixed, finite number of possible values (or could be continuous).
Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.
Learning decision trees
Goal: Build a decision tree to classify examples as positive or negative instances of a concept using supervised learning from a training set.
A decision tree is a tree where
each non-leaf node has associated with it an attribute (feature)
each leaf node has associated with it a classification (+ or -)
each arc has associated with it one of the possible values of the attribute at the node from which the arc is directed
Generalization: allow for >2 classes, e.g., {sell, hold, buy}
[Figure: a decision tree splitting first on Color (red/green/blue), then on Shape (round/square) and Size (big/small), down to + and - leaves]
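A decision tree of this kind is easy to walk in code. The tree below is a minimal sketch of the structure described above, not an exact reproduction of the slide's figure (the branch labels are hypothetical):

```python
# Non-leaf nodes are (attribute, {value: subtree}) pairs; leaves are "+"/"-".
tree = ("Color", {
    "red":   ("Size", {"big": "-", "small": "+"}),
    "green": ("Shape", {"round": "+",
                        "square": ("Size", {"big": "-", "small": "+"})}),
    "blue":  "+",
})

def classify(tree, example):
    """Walk from the root: at each non-leaf node, test that node's
    attribute and follow the arc labelled with the example's value."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree  # a leaf classification: "+" or "-"

label = classify(tree, {"Color": "red", "Size": "small"})  # "+"
```

The loop stops as soon as it reaches a leaf, so attributes not on the followed path (e.g., Shape for a red example) are never consulted.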
Decision tree-induced partition example
[Figure: the same decision tree (Color, then Shape and Size) shown together with the partition of the feature space it induces]
Inductive learning and bias
Suppose that we want to learn a function f(x) = y and we are given some sample (x, y) pairs, as in figure (a).
There are several hypotheses we could make about this function, e.g., (b), (c) and (d).
A preference for one over the others reveals the bias of our learning technique, e.g.:
prefer piecewise functions
prefer a smooth function
prefer a simple function and treat outliers as noise
Choosing the best attribute
The key problem is choosing which attribute to split a given set of examples on.
Some possibilities are:
Random: Select any attribute at random.
Least-Values: Choose the attribute with the smallest number of possible values.
Most-Values: Choose the attribute with the largest number of possible values.
Max-Gain: Choose the attribute that has the largest expected information gain, i.e., the attribute that will result in the smallest expected size of the subtrees rooted at its children.
The ID3 algorithm uses the Max-Gain method of selecting the best attribute.
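The Max-Gain criterion can be sketched directly from the entropy definitions (the toy data below is hypothetical, chosen so one attribute is perfectly informative and the other is useless):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum p * log2(p) over the class proportions in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, labels):
    """Gain(S, A) = H(S) - sum_v (|S_v|/|S|) * H(S_v): the expected
    drop in entropy from splitting S on attribute A."""
    n = len(examples)
    gain = entropy(labels)
    for value in set(e[attribute] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical toy data: Size perfectly predicts the label, Color does not.
examples = [{"Size": "big", "Color": "red"}, {"Size": "big", "Color": "blue"},
            {"Size": "small", "Color": "red"}, {"Size": "small", "Color": "blue"}]
labels = ["+", "+", "-", "-"]
# Max-Gain picks Size: gain 1.0 bit, versus 0.0 for Color.
```

ID3 applies this choice recursively: split on the best attribute, then repeat on each child subset.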
Deductive learning
Working on already existing facts and knowledge and simply deducing new knowledge from the existing one:
If A (assertion) then B (conclusion).
1) Probability-based learning (Bayesian Learning)
2) Adaptive dynamic learning
Clustering Algorithms
Exclusive clustering: K-means
Overlapping clustering: Fuzzy C-means (FCM)
Hierarchical clustering
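The exclusive-clustering idea behind K-means can be sketched in a few lines; this toy version assumes 1-D points and fixed initial centers (both hypothetical), whereas real use would rely on a library implementation:

```python
def kmeans(points, centers, iterations=10):
    """Alternate two steps: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

centers = kmeans([1.0, 1.2, 0.8, 9.0, 9.2, 8.8], centers=[0.0, 5.0])
# the two centers converge near the group means, 1.0 and 9.0
```

Because assignment is all-or-nothing, each point belongs to exactly one cluster; Fuzzy C-means instead gives each point a graded membership in every cluster.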
Support Vector Machines
The classifier is a separating hyperplane.
The most important training points are the support vectors; they define the hyperplane.
Quadratic optimization algorithms can identify which training points xi are support vectors, i.e., those with non-zero Lagrangian multipliers αi.
Both in the dual formulation of the problem and in the solution, training points appear only inside inner products:
Find α1 .. αN such that
Q(α) = Σi αi - ½ Σi Σj αi αj yi yj (xi · xj) is maximized and
(1) Σi αi yi = 0
(2) 0 ≤ αi ≤ C for all i
f(x) = Σi αi yi (xi · x) + b
Linear Classifiers
Copyright 2001, 2003, Andrew W. Moore
[Figure: 2-D datapoints, with markers denoting +1 and -1]
f(x, w, b) = sign(w · x - b)
How would you classify this data?
Linear Classifiers
[Figure: several candidate separating lines through the same data]
f(x, w, b) = sign(w · x - b)
Any of these would be fine.. ..but which is best?
Classifier Margin
f(x, w, b) = sign(w · x - b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Maximum Margin
f(x, w, b) = sign(w · x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM).
Support vectors are those datapoints that the margin pushes up against.
Linear SVM
Estimate the Margin
What is the distance expression for a point x to the line wx + b = 0?
d(x) = |x · w + b| / ||w||2 = |x · w + b| / sqrt(Σi=1..d wi²)
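The distance formula above can be checked numerically; the line and point below are illustrative examples, not from the slides:

```python
import math

def distance_to_hyperplane(x, w, b):
    """d(x) = |x.w + b| / sqrt(sum_i w_i^2)."""
    dot = sum(xi * wi for xi, wi in zip(x, w))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

# For the line 3x + 4y - 10 = 0 (w = (3, 4), b = -10), the origin
# lies at distance |-10| / 5 = 2.
d = distance_to_hyperplane((0.0, 0.0), (3.0, 4.0), -10.0)  # 2.0
```

Dividing by ||w|| is what makes the distance independent of the scale of (w, b), which is exactly the freedom the margin-maximization derivation later fixes.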
Estimate the Margin
What is the expression for the margin?
margin = min over x in D of d(x) = min over x in D of |x · w + b| / sqrt(Σi=1..d wi²)
Maximize Margin
(w*, b*) = argmax over (w, b) of margin(w, b)
         = argmax over (w, b) of min over xi in D of d(xi)
         = argmax over (w, b) of min over xi in D of |xi · w + b| / sqrt(Σi=1..d wi²)
Maximize Margin
argmax over (w, b) of min over xi in D of |xi · w + b| / sqrt(Σi=1..d wi²)
subject to ∀i: yi (xi · w + b) > 0 (every training point classified correctly)
Maximize Margin
Strategy: fix the scale of (w, b) by requiring min over xi in D of yi (xi · w + b) = 1. Then
argmax over (w, b) of min over xi in D of |xi · w + b| / sqrt(Σi=1..d wi²), subject to ∀i: yi (xi · w + b) > 0
becomes
argmin over (w, b) of Σi=1..d wi², subject to ∀i: yi (xi · w + b) ≥ 1
Maximum Margin Linear Classifier
How to solve such a convex optimization problem?
(w*, b*) = argmin over (w, b) of Σk wk²
subject to
y1 (w · x1 + b) ≥ 1
y2 (w · x2 + b) ≥ 1
....
yN (w · xN + b) ≥ 1
Lagrange Multiplier Method
The new objective function is called the Lagrangian for the optimization problem:
Lp = ½ ||w||² - Σi λi (yi (w · xi + b) - 1)   ---(1)
λi: Lagrange multiplier
Partially differentiating Lp w.r.t. 'w' and 'b' we get:
w = Σi λi yi xi and Σi λi yi = 0   ---(2)
Because the Lagrange multipliers are constrained to be non-negative, additional conditions apply.
Linear SVM:
It can be handled only when
λi ≥ 0,
λi [yi (w · xi + b) - 1] = 0.
These are known as the Karush-Kuhn-Tucker (KKT) conditions.
From the above equation 'b' can be calculated.
Substituting the values from eqn. (2) in eqn. (1), we get the dual formulation.
Linear SVM:
[Slides 41-45: figures only, covering the Linear SVM solution and "Support Vector Machine (SVM) for ..."]
SVM Kernel Functions
K(a, b) = (a · b + 1)^d is an example of an SVM kernel function.
Beyond polynomials there are other very high-dimensional basis functions that can be made practical by finding the right kernel function.
Radial-basis-style kernel function:
Kernel Tricks
Replacing the dot product with a kernel function K(a, b).
Could K(a, b) = (a - b)³ be a kernel function?
Could K(a, b) = (a - b)⁴ - (a + b)² be a kernel function?
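The kernel trick is easy to verify concretely: for 2-D inputs, the polynomial kernel K(a, b) = (a·b + 1)² equals an ordinary dot product φ(a)·φ(b) in a 6-D feature space. The explicit map φ below is a standard worked example, not something the slides spell out:

```python
import math

def poly_kernel(a, b):
    """K(a, b) = (a.b + 1)^2 for 2-D inputs."""
    return (a[0] * b[0] + a[1] * b[1] + 1) ** 2

def phi(x):
    """Explicit feature map with phi(a).phi(b) == K(a, b):
    phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)."""
    r2 = math.sqrt(2)
    return (1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1])

a, b = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(a, b)                              # 25.0
rhs = sum(u * v for u, v in zip(phi(a), phi(b)))    # same value
```

The kernel computes the 6-D inner product without ever building φ, which is what makes very high-dimensional basis functions practical.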
Contd..
[Slides 49-54: figures of an artificial neuron, the bias of a neuron, and activation functions]
Step function
Ramp function
Sigmoid function
http://en.wikipedia.org/wiki/Image:Logistic-curve.png
The Gaussian function is the probability function of the normal distribution. Sometimes also called the frequency curve.
Artificial Neural Networks
[Slides 59-60: figures only]
Perceptron: Neuron Model
(Special form of single-layer feed-forward network)
The perceptron, first proposed by Rosenblatt (1958), is a simple neuron that is used to classify its input into one of two categories.
A perceptron uses a step function that returns +1 if the weighted sum of its input is ≥ 0, and -1 otherwise.
[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn and bias b feed a sum v, followed by the step activation φ(v) producing output y]
Learning Process for Perceptron
Initially assign random weights to inputs between -0.5 and +0.5.
Training data is presented to the perceptron and its output is observed.
If the output is incorrect, the weights are adjusted accordingly using the following formula:
wi ← wi + (a × xi × e), where 'e' is the error produced and 'a' (-1 < a < 1) is the learning rate.
Example: Perceptron to learn OR function
Initially consider w1 = -0.2 and w2 = 0.4.
For training data, say, x1 = 0 and x2 = 0, the target output is 0. Compute y = Step(w1*x1 + w2*x2) = 0. The output is correct, so the weights are not changed.
For training data x1 = 0 and x2 = 1, the target output is 1. Compute y = Step(w1*x1 + w2*x2) = Step(0.4) = 1. The output is correct, so the weights are not changed.
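The whole OR example can be run end to end. This sketch assumes a 0/1 step that is strict at zero (so the all-zero input maps to 0, matching the worked example) and a learning rate a = 1, since the slide's exact 'a' is not fully legible:

```python
def step(v):
    # strict at 0 so that Step(0) = 0, as in the slide's worked example
    return 1 if v > 0 else 0

def train_perceptron(data, w, a=1.0, epochs=10):
    """Apply w_i <- w_i + a * x_i * e with e = target - output."""
    for _ in range(epochs):
        for x, target in data:
            y = step(sum(wi * xi for wi, xi in zip(w, x)))
            e = target - y
            w = [wi + a * xi * e for wi, xi in zip(w, x)]
    return w

OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(OR, w=[-0.2, 0.4])
# after training, the perceptron reproduces OR on all four inputs
```

Only the (1, 0) case is initially misclassified (sum = -0.2), so a single corrective update to w1 is enough for convergence.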
Perceptron: Limitations
The perceptron can only model linearly separable functions: those functions which can be drawn in a 2-dim graph such that a single straight line separates the values into two parts.
Boolean functions such as AND and OR are linearly separable.
XOR
These two classes (true and false) cannot be separated using a line. Hence XOR is non-linearly separable.
X1   X2   X1 XOR X2
0    0    false
0    1    true
1    0    true
1    1    false
[Figure: plotting the four points shows that the true and false classes cannot be split by one straight line]
Multi-layer feed-forward (FF) networks
[Figure: a two-layer feed-forward network; the two hidden nodes use weight vectors (1, -1) and (-1, 1), and the output node is used to combine the outputs of the two hidden nodes. Layout: input nodes -> hidden layer -> output layer -> output]
Training Algorithm: Backpropagation
The Backpropagation algorithm learns in the same way as a single perceptron.
It searches for weight values that minimize the total error of the network over the set of training examples (training set).
Backpropagation consists of the repeated application of the following two passes:
Forward pass: In this step, the network is activated on one example and the error of each neuron at the output layer is computed.
Backward pass: In this step the network error is used for updating the weights. The error is propagated backwards from the output layer through the network layer by layer. This is done by recursively computing the local gradient of each neuron.
Contd..
Consider a network of three layers. Let us use i to represent nodes in the input layer, j to represent nodes in the hidden layer, and k to represent nodes in the output layer.
wij refers to the weight of the connection between a node in the input layer and a node in the hidden layer.
The following equation is used to derive the output value Yj of node j:
Yj = 1 / (1 + e^(-Xj))
where Xj = Σ xi · wij - θj, 1 ≤ i ≤ n; n is the number of inputs to node j, and θj is the threshold for node j.
Weight Update Rule
The Backprop weight update rule is based on the gradient descent method:
It takes a step in the direction yielding the maximum decrease of the network error E.
This direction is the opposite of the gradient of E:
wij = wij + Δwij, where Δwij = -η ∂E/∂wij
Iteration of the Backprop algorithm is usually terminated when the sum of squares of errors of the output values for all training data in an epoch is less than some threshold, such as 0.01.
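The update rule Δw = -η ∂E/∂w can be demonstrated on a single sigmoid unit with squared error E = ½(y - t)², using the Yj = 1/(1 + e^(-Xj)) activation defined earlier. The inputs, threshold, and learning rate below are illustrative, not from the slides:

```python
import math

def forward(w, x, theta):
    """Sigmoid unit: Yj = 1 / (1 + exp(-(sum_i x_i w_i - theta)))."""
    xj = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-xj))

def backprop_step(w, x, theta, t, eta=0.5):
    y = forward(w, x, theta)
    # for sigmoid + squared error: dE/dw_i = (y - t) * y * (1 - y) * x_i
    return [wi - eta * (y - t) * y * (1.0 - y) * xi for wi, xi in zip(w, x)]

w, x, theta, t = [0.3, -0.1], [1.0, 2.0], 0.2, 1.0
for _ in range(200):
    w = backprop_step(w, x, theta, t)
# the squared error 0.5 * (forward(w, x, theta) - t)**2 shrinks toward 0
```

The factor y(1 - y) is the sigmoid's own derivative; in a multi-layer network the backward pass chains such local gradients layer by layer.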
Stopping criteria
Total mean squared error change: Backprop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.1, 0.01]).
Generalization-based criterion: After each epoch, the ...
Radial Basis Function (RBF) Networks
A neural network is called a Radial Basis Function (RBF) network if its output depends on the distance of the input from a given stored vector.
The RBF neural network has an input layer, a hidden layer and an output layer.
In such RBF networks, the hidden layer uses neurons with RBFs as activation functions.
The outputs of all these hidden neurons are combined linearly at the output node.
These networks have a wide variety of applications such as function approximation, time series prediction, control and regression, and pattern classification tasks for performing complex (non-linear) mappings.
RBF Architecture
One hidden layer with RBF activation functions.
Output layer with linear activation function.
y = w1 φ(||x - t1||) + ... + wm φ(||x - tm||)
where ||x - t|| = distance of x = (x1, ..., xn) from the center t
[Figure: inputs x1..xn feed m RBF units; their outputs are weighted by w1..wm and summed to give y]
Cont...
Here we require weights wi from the hidden layer to the output layer only.
The weights wi can be determined with the help of any of the standard iterative methods described earlier for neural networks.
However, since the approximating function given below is linear w.r.t. wi, it can be directly calculated using the matrix methods of linear least squares, without having to explicitly determine wi iteratively.
It should be noted that the approximate function f(X) is differentiable with respect to wi.
Y = f(X) = Σi=1..N wi φi(||X - ti||)
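Because f(X) is linear in the weights wi, a single least-squares solve recovers them. This sketch uses Gaussian basis functions and hypothetical centers and targets (none of these specifics come from the slides):

```python
import numpy as np

def design_matrix(X, centers, width=1.0):
    """Phi[p, i] = exp(-||X_p - t_i||^2 / (2 * width^2)): one Gaussian
    RBF response per (example, center) pair."""
    d = X[:, None, :] - centers[None, :, :]          # shape (P, m, n)
    return np.exp(-np.sum(d * d, axis=2) / (2 * width ** 2))

# Hypothetical 1-D task built so the true weights are exactly [2, 3].
X = np.linspace(-2, 2, 20).reshape(-1, 1)
centers = np.array([[-1.0], [1.0]])
Phi = design_matrix(X, centers)
w_true = np.array([2.0, 3.0])
y = Phi @ w_true

# One linear least-squares solve, no iterative training needed.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
# w recovers [2.0, 3.0] up to numerical precision
```

This is exactly the contrast the slide draws: the FF network's weights need iterative gradient descent, while the RBF output weights fall out of one matrix computation.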
Comparison: RBF NN vs FF NN
1. Both are non-linear layered feed-forward networks.
2. The hidden layer of an RBF NN is non-linear and its output layer is linear; the hidden and output layers of an FF NN are usually non-linear.
3. An RBF NN has one single hidden layer; an FF NN may have more hidden layers.
4. In an RBF NN, the neuron model of the hidden neurons is different from the one of the output nodes; in an FF NN, hidden and output neurons share a common neuron model.
5. The activation function of each hidden neuron in an RBF NN computes the Euclidean distance between the input vector and the center of that unit; the activation function of each hidden neuron in an FF NN computes the inner product of the input vector and the synaptic weight vector of that neuron.
NN DESIGN ISSUES
Data representation
Data representation depends on the problem. In general ...
Network Topology
The number of layers and neurons depends on the specific task.
In practice this issue is solved by trial and error.
Two types of adaptive algorithms can be used:
start from a large network and successively remove some neurons and links until network performance degrades.
begin with a small network and introduce new neurons until performance is satisfactory.
Initialization of weights
In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.
If some inputs are much larger than others, random initialization may bias the network to give much more importance to larger inputs. In such a case, weights can be initialized as follows:
wij = 1 / (Σi=1..N |xi|)   for weights from the input to the first layer
wjk = 1 / (Σi=1..N |φ(Σ xi wij)|)   for weights from the first to the second layer
Choice of learning rate η
The right value of η depends on the application.
Values between 0.1 and 0.9 have been used in many applications.
Other heuristics adapt η during the training as described in previous slides.
Training
Rule of thumb: the number of training examples should be at least five to ten times the number of weights of the network.
Other rule:
N > |W| / (1 - a)
where |W| = number of weights and a = expected accuracy on the test set.
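The N > |W| / (1 - a) rule is just arithmetic, and a quick check makes its behavior concrete (the numbers below are illustrative):

```python
def min_training_examples(num_weights, accuracy):
    """N > |W| / (1 - a): required training-set size for |W| weights
    and expected test-set accuracy a."""
    return num_weights / (1.0 - accuracy)

# 100 weights at 90% expected accuracy -> about 1000 examples;
# pushing accuracy to 99% multiplies the requirement by ten.
n = min_training_examples(num_weights=100, accuracy=0.9)
```

Note that the requirement blows up as the expected accuracy approaches 1, which is why very accurate networks demand disproportionately large training sets.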
Recurrent Networks
[Slides 90-91: figures only]
Hopfield Networks
[Slides 92-93: figures only]
Activation Algorithm
An active unit is represented by 1 and an inactive unit by 0.
Repeat
Choose any unit randomly. The chosen unit may be active or inactive.
For the chosen unit, compute the sum of the weights on the connections to the active neighbours only, if any.
If the sum > 0 (the threshold is assumed to be 0), then the chosen unit becomes active, otherwise it becomes inactive.
If the chosen unit has no active neighbours then ignore it, and its status remains the same.
Until the network reaches a stable state.
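The steps above can be run on a small example. This sketch uses a hypothetical 3-unit network with symmetric weights, and cycles through the units deterministically instead of choosing them randomly, purely for reproducibility:

```python
# Hypothetical symmetric weights between the three units 0, 1, 2.
W = {(0, 1): 1, (1, 0): 1, (0, 2): -2, (2, 0): -2, (1, 2): 1, (2, 1): 1}
state = [1, 0, 1]                     # 1 = active, 0 = inactive

def update(state, unit):
    """One step of the activation algorithm for the chosen unit."""
    active_neighbours = [j for j in range(len(state))
                         if j != unit and state[j] == 1]
    if not active_neighbours:         # no active neighbours: leave as-is
        return state
    s = sum(W[(unit, j)] for j in active_neighbours)
    state = state.copy()
    state[unit] = 1 if s > 0 else 0   # threshold assumed to be 0
    return state

for _ in range(6):                    # sweep until stable
    for u in range(3):
        state = update(state, u)
# the network settles into a stable state, here [0, 1, 1]
```

Stability means no single-unit update changes the state any more, which is exactly the loop's termination condition on the slide.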
[Worked example (figure): a unit is selected from the current state; the sum of the weights of the active neighbours of the selected unit is calculated, and the unit is activated if the sum > 0 and deactivated otherwise, giving the corresponding new state.]
[Figure: a sequence of network states converging]
Stable Network
Example
Let us now consider a Hopfield network with four units and three training input vectors that are to be learned by the network.
Consider three input examples, namely, X1, X2, and X3, defined as follows:
[Figure: the vectors X1, X2, X3]
W = X1 (X1)T + X2 (X2)T + X3 (X3)T - 3I
[Figure: the resulting 4x4 weight matrix, and the stable positions of the network, which are at Hamming distance 1]
Finally, with the obtained weights and stable states (X1 and X3), we can stabilize any new (partial) pattern to one of those.
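Since the slide's exact vectors are not legible here, the storage-and-recall procedure can still be sketched with hypothetical patterns, using the same Hebbian rule W = Σp Xp Xpᵀ - pI (which zeroes the self-connections) and the bipolar ±1 convention:

```python
import numpy as np

# Two hypothetical bipolar patterns for a 4-unit network.
patterns = [np.array([1, -1, 1, -1]), np.array([1, 1, -1, -1])]
# Hebbian storage: W = sum_p x_p x_p^T - p*I (self-connections zeroed).
W = sum(np.outer(x, x) for x in patterns) - len(patterns) * np.eye(4)

def recall(state, sweeps=8):
    """Asynchronous updates: each unit takes the sign of its net input."""
    s = state.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            net = W[i] @ s
            if net != 0:
                s[i] = 1 if net > 0 else -1
    return s

noisy = np.array([-1, -1, 1, -1])    # the first pattern with one unit flipped
# recall(noisy) restores the stored pattern [1, -1, 1, -1]
```

This is the behavior the slide describes: a new partial or corrupted pattern is driven to the nearest stored stable state.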