Ch. 2: Linear Discriminants
Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009
Based on slides from Stephen Marsland, from Romain Thibaux (regression slides), and Moshe Sipper
Longin Jan Latecki, Temple University
161.326 Stephen Marsland
McCulloch and Pitts Neurons
[Diagram: inputs x_1, x_2, …, x_m with weights w_1, w_2, …, w_m feeding a summing unit h with output o]
Greatly simplified biological neurons: sum the inputs; if the total exceeds some threshold, the neuron fires, otherwise it does not.
McCulloch and Pitts Neurons
o = 1 if h = Σ_j w_j x_j > θ, and o = 0 otherwise, for some threshold θ
The weight w_j can be positive or negative (excitatory or inhibitory). The model uses only a linear sum of inputs, and a simple output instead of a pulse (spike train).
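The threshold behaviour above can be sketched in a few lines (names and example values are illustrative, not from the slides):

```python
# A minimal McCulloch-Pitts neuron: sum the weighted inputs, then fire
# only if the sum exceeds the threshold.
def mp_neuron(x, w, theta):
    h = sum(wi * xi for wi, xi in zip(w, x))  # linear sum of inputs
    return 1 if h > theta else 0              # fire / do not fire

print(mp_neuron([1, 0, 1], [0.5, 0.5, 0.5], 0.9))  # 0.5 + 0.5 = 1.0 > 0.9 -> 1
```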
Neural Networks
Can put lots of McCulloch & Pitts neurons together and connect them up in any way we like. In fact, assemblies of the neurons are capable of universal computation: they can perform any computation that a normal computer can. We just have to solve for all the weights w_ij.
Training Neurons
Adapting the weights is learning. How does the network know it is right? How do we adapt the weights to make the network right more often? We need a training set with target outputs, and a learning rule.
2.2 The Perceptron
Definition from Wikipedia: the perceptron is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value):

f(x) = 1 if w·x + b > 0, and f(x) = 0 otherwise.
In order not to write b explicitly, we extend the input vector x by one more dimension that is always set to −1, e.g., x = (x_0, x_1, …, x_7) with x_0 = −1, and extend the weight vector to w = (w_0, w_1, …, w_7). Adjusting w_0 then corresponds to adjusting b.
The perceptron is considered the simplest kind of feed-forward neural network.
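The bias-folding trick can be sketched directly: extend the input with x_0 = −1 so that w_0 plays the role of b. The particular weights below happen to realize AND (an illustrative choice, not from the text):

```python
# Perceptron decision with the bias folded into the weight vector.
def perceptron(x, w):
    s = sum(wi * xi for wi, xi in zip(w, (-1,) + tuple(x)))
    return 1 if s > 0 else 0

w = (1.5, 1.0, 1.0)  # w_0 = 1.5 acts as the threshold: fire when x1 + x2 > 1.5
print([perceptron(x, w) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # -> [0, 0, 0, 1]
```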
Bias Replaces Threshold
[Diagram: perceptron with an extra input fixed at −1 whose weight replaces the threshold]
Perceptron Decision = Recall
Outputs are: y_j = g(Σ_{i=0}^{m} w_{ij} x_i), where g(h) = 1 if h > 0 and 0 otherwise. For example, y = (y_1, …, y_5) = (1, 0, 0, 1, 1) is a possible output. We may have a different function g in the place of sign, as in (2.4) in the book.
Perceptron Learning = Updating the Weights
We want to change the values of the weights. Aim: minimise the error at the output. If E = t − y, we want E to be 0. Use:

w_ij ← w_ij + η (t_j − y_j) x_i

with learning rate η, error (t_j − y_j), and input x_i.
Example 1: The Logical OR

X_1 X_2 t
0 0 0
0 1 1
1 0 1
1 1 1
Initial values: w_0(0) = −0.05, w_1(0) = −0.02, w_2(0) = 0.02, and η = 0.25. Take the first row of our training table (with bias input x_0 = −1):
y_1 = sign(−0.05×−1 + −0.02×0 + 0.02×0) = sign(0.05) = 1
w_0(1) = −0.05 + 0.25×(0−1)×(−1) = 0.2
w_1(1) = −0.02 + 0.25×(0−1)×0 = −0.02
w_2(1) = 0.02 + 0.25×(0−1)×0 = 0.02
We continue with the new weights and the second row, and so on; we make several passes over the training data.
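Carrying the example through several passes can be sketched as follows; the first update matches the numbers above, and the epoch count of 10 is an illustrative choice:

```python
# Perceptron training on the OR table, with the bias input x_0 = -1.
eta = 0.25
w = [-0.05, -0.02, 0.02]                      # w_0, w_1, w_2 as above
data = [((-1, 0, 0), 0), ((-1, 0, 1), 1),
        ((-1, 1, 0), 1), ((-1, 1, 1), 1)]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

for _ in range(10):                           # several passes over the data
    for x, t in data:
        y = predict(w, x)
        w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]

print([predict(w, x) for x, _ in data])       # -> [0, 1, 1, 1]
```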
Decision boundary for OR perceptron
Perceptron Learning Applet
• http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html
Example 2: Obstacle Avoidance with the Perceptron
[Diagram: robot with left and right sensors LS, RS connected by weights w1–w4 to left and right motors LM, RM]
Learning rate η = 0.3; bias weight = −0.01
Obstacle Avoidance with the Perceptron
LS RS → LM RM (sensor inputs → motor targets; X = don’t care)
0 0 1 1
0 1 -1 1
1 0 1 -1
1 1 X X
First training row (LS = 0, RS = 0; targets LM = 1, RM = 1):
w1 = 0 + 0.3 × (1−1) × 0 = 0
w2 = 0 + 0.3 × (1−1) × 0 = 0
And the same for w3, w4.
Second training row (LS = 0, RS = 1; targets LM = −1, RM = 1):
w1 = 0 + 0.3 × (−1−1) × 0 = 0
w2 = 0 + 0.3 × (1−1) × 0 = 0
w3 = 0 + 0.3 × (−1−1) × 1 = −0.6
w4 = 0 + 0.3 × (1−1) × 1 = 0
Third training row (LS = 1, RS = 0; targets LM = 1, RM = −1):
w1 = 0 + 0.3 × (1−1) × 1 = 0
w2 = 0 + 0.3 × (−1−1) × 1 = −0.6
w3 = −0.6 + 0.3 × (1−1) × 0 = −0.6
w4 = 0 + 0.3 × (−1−1) × 0 = 0
[Diagram: final network after training; sensors LS, RS to motors LM, RM with weights −0.01, −0.01, −0.60, −0.60]
2.3 Linear Separability
Outputs are determined by the inner product w·x, where

w·x = ||w|| ||x|| cos θ

and θ is the angle between vectors x and w.
Geometry of Linear Separability

The equation of a line is w_0 + w_1·x + w_2·y = 0; a point (x, y) satisfying it lies on the line. This equation is equivalent to

w·x = (w_0, w_1, w_2)·(1, x, y) = 0

If w·x > 0, then the angle between w and x is less than 90 degrees, which means that x lies on the side of the line that w points to. Each output node of the perceptron tries to separate the training data into two classes (fire or no-fire) with a linear decision boundary, i.e., a straight line in 2D, a plane in 3D, and a hyperplane in higher dimensions.
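The sign-of-the-dot-product picture can be checked numerically; the line and test points below are illustrative choices:

```python
# The line x + y = 1 written in the slide's form w.(1, x, y) = 0, with
# w = (-1, 1, 1). Points give a positive dot product on the side that the
# normal direction of w points to, negative on the other side.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

w = (-1.0, 1.0, 1.0)

above = dot(w, (1, 1.0, 1.0))   # point (1, 1): 1 + 1 > 1, so positive side
below = dot(w, (1, 0.0, 0.0))   # point (0, 0): 0 + 0 < 1, so negative side
print(above > 0, below < 0)     # -> True True
```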
Linear Separability: The Binary AND Function
Gradient Descent Learning Rule
• Consider a linear unit without threshold and continuous output y (not just −1, 1):
  y = w_0 + w_1 x_1 + … + w_n x_n
• Train the w_i’s so that they minimize the squared error
  E[w] = ½ Σ_{d∈D} (t_d − y_d)²
  where D is the set of training examples
Supervised Learning
• Training and test data sets
• Training set: input & target
Gradient Descent
D = {⟨(1,1),1⟩, ⟨(−1,−1),1⟩, ⟨(1,−1),−1⟩, ⟨(−1,1),−1⟩}

Gradient: ∇E[w] = [∂E/∂w_0, …, ∂E/∂w_n]

(w_1, w_2) → (w_1 + Δw_1, w_2 + Δw_2), where Δw = −η ∇E[w]

Δw_i = −η ∂E/∂w_i

∂/∂w_i ½ Σ_d (t_d − y_d)² = Σ_d ∂/∂w_i ½ (t_d − Σ_i w_i x_{id})² = Σ_d (t_d − y_d)(−x_{id})
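The derived gradient can be verified against a numerical finite difference on the training set D given above (the bias weight is omitted for brevity; the probe point w and step size are illustrative):

```python
# Analytic gradient dE/dw_i = sum_d (t_d - y_d)(-x_di) vs. finite differences.
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
T = [1.0, 1.0, -1.0, -1.0]

def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def E(w):
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in zip(X, T))

def grad(w):
    return [sum((t - output(w, x)) * (-x[i]) for x, t in zip(X, T))
            for i in range(len(w))]

w, h = [0.3, -0.2], 1e-6
for i in range(2):
    wp = list(w); wp[i] += h
    wm = list(w); wm[i] -= h
    numeric = (E(wp) - E(wm)) / (2 * h)
    assert abs(grad(w)[i] - numeric) < 1e-4   # analytic and numeric agree
print(grad(w))
```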
Gradient Descent
[Plot: error surface]
Δw_i = −η ∂E/∂w_i
Incremental Stochastic Gradient Descent
• Batch mode: gradient descent over the entire data D
  w = w − η ∇E_D[w]
  E_D[w] = ½ Σ_d (t_d − y_d)²
• Incremental mode: gradient descent over individual training examples d
  w = w − η ∇E_d[w]
  E_d[w] = ½ (t_d − y_d)²
Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is made small enough.
Gradient Descent Perceptron Learning
Gradient-Descent(training_examples, η)
Each training example is a pair of the form ⟨(x_1, …, x_n), t⟩, where (x_1, …, x_n) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1).
• Initialize each w_i to some small random value
• Until the termination condition is met, Do
  – For each ⟨(x_1, …, x_n), t⟩ in training_examples, Do
    • Input the instance (x_1, …, x_n) to the linear unit and compute the output y
    • For each linear unit weight w_i, Do
      – Δw_i = η (t − y) x_i
    • For each linear unit weight w_i, Do
      – w_i = w_i + Δw_i
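The procedure above can be sketched directly on toy data generated from t = 2·x1 − 1; the data, learning rate, and epoch count are illustrative choices, not from the slides:

```python
# Incremental (delta-rule) gradient descent for a linear unit, no threshold.
eta = 0.05
w = [0.0, 0.0]                                   # w[0] multiplies the constant input

data = [((1.0, x1), 2 * x1 - 1) for x1 in (0.0, 0.5, 1.0, 1.5, 2.0)]

for _ in range(2000):                            # "until the termination condition"
    for x, t in data:
        y = sum(wi * xi for wi, xi in zip(w, x)) # compute the output y
        dw = [eta * (t - y) * xi for xi in x]    # delta w_i = eta (t - y) x_i
        w = [wi + dwi for wi, dwi in zip(w, dw)] # w_i = w_i + delta w_i

print([round(wi, 3) for wi in w])                # close to [-1.0, 2.0]
```

Since the toy targets are noiseless and linear, the weights converge to the generating coefficients.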
Linear Separability: The Exclusive Or (XOR) Function

A B Out
0 0 0
0 1 1
1 0 1
1 1 0
Limitations of the Perceptron
Limitations of the Perceptron
? We would need W1 > 0, W2 > 0, and yet W1 + W2 < 0: no single set of weights satisfies all three, so no line separates XOR.
Limitations of the Perceptron?
[Diagram: three inputs In1, In2, In3 feeding one perceptron]
A B C Out
0 0 1 0
0 1 0 1
1 0 0 1
1 1 0 0
2.4 Linear regression
[3D scatter plot: temperature as a function of two input variables]
Given examples (x_i, y_i); predict y for a new point x.
Linear regression
[Plots: fitted line in 2D and fitted temperature plane in 3D, with the prediction marked on each]
Ordinary Least Squares (OLS)
[Plot: observations, fitted line, and residuals]
Error or “residual”: the difference between the observation and the prediction.
Sum squared error: E = Σ_i (y_i − w·x_i)²
Minimize the sum squared error E = Σ_i (y_i − w·x_i)². Setting the derivative with respect to each component of w to zero gives a linear equation, and over all components a linear system.
Alternative derivation: with X the n × d matrix of inputs, minimizing ||Xw − y||² leads to the normal equations (XᵀX) w = Xᵀy. Solve the system (it’s better not to invert the matrix).
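The "solve, don't invert" advice can be sketched with a linear solve of the normal equations; the toy data and generating coefficients are illustrative:

```python
# Solving (X^T X) w = X^T y directly, without forming the inverse.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.uniform(0, 10, size=20)])  # bias column + input
y = X @ np.array([1.0, 3.0])                                     # noiseless targets

w = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(w, 6))   # recovers the generating coefficients [1. 3.]
```

`np.linalg.solve` is both cheaper and numerically better conditioned than computing `inv(X.T @ X)` explicitly.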
Beyond lines and planes
Everything is the same with basis functions, e.g. replacing x by (1, x, x²): the model is still linear in w.
[Plot: quadratic fit to data]
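A sketch of that idea: fit a quadratic with the same linear-regression machinery by building the basis (1, x, x²). The data and coefficients are illustrative:

```python
# Nonlinear in x, still linear in w: least squares on a quadratic basis.
import numpy as np

x = np.linspace(-2.0, 2.0, 30)
y = 2.0 - x + 0.5 * x**2                    # generated from known coefficients

Phi = np.column_stack([np.ones_like(x), x, x**2])
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(np.round(w, 6))   # recovers [ 2.  -1.   0.5]
```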
Geometric interpretation
[Matlab demo]
Ordinary Least Squares [summary]

Given examples (x_i, y_i), i = 1, …, n, let X be the n × d matrix whose rows are the x_i, and y the vector of targets. Minimize ||Xw − y||² by solving

w = (XᵀX)⁻¹ Xᵀ y

Predict ŷ = w·x for a new point x.
Probabilistic interpretation
[Plot: data with the fitted line and Gaussian noise around it]
Likelihood: with Gaussian noise, maximizing the likelihood of the data is equivalent to minimizing the sum squared error.
Summary
• Perceptron and regression optimize the same target function.
• In both cases we compute the gradient (the vector of partial derivatives).
• In the case of regression, we set the gradient to zero and solve for the vector w. The solution is a closed formula for w at which the target function attains its global minimum.
• In the case of the perceptron, we iteratively move toward the minimum by going in the direction of the negative gradient. We do this incrementally, making small steps for each data point.
Homework 1
• (Ch. 2.3.3) Implement the perceptron in Matlab and test it on the Pima Indians Diabetes dataset from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
• (Ch. 2.4.1) Implement linear regression in Matlab and apply it to the auto-mpg dataset.
From Ch. 3: Testing
How do we evaluate our trained network? We can’t just compute the error on the training data: that is unfair, and it can’t reveal overfitting. So keep a separate test set and, after training, evaluate on it. But how do we then check for overfitting? We can’t use the training or test sets for that.
Validation
Keep a third set of data for this. Train the network on the training data; periodically, stop and evaluate on the validation set. After training has finished, test on the test set. This is getting expensive on data!
Hold Out Cross Validation
[Diagram: inputs and targets partitioned into training and validation blocks]
Hold Out Cross Validation
Partition the training data into K subsets. Train on K−1 of the subsets, validate on the Kth. Repeat with a new network, leaving out a different subset. Choose the network that has the best validation error. We have traded data for computation. Extreme version: leave-one-out.
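The index bookkeeping for the K-fold scheme can be sketched as follows; model training itself is elided, and the fold assignment (every Kth example) is an illustrative choice:

```python
# K-fold splits: each example appears in exactly one validation fold.
def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

splits = list(k_fold_indices(6, 3))
print([val for _, val in splits])   # -> [[0, 3], [1, 4], [2, 5]]
```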
Early Stopping
When should we stop training? We could set a minimum training error (danger of overfitting), or a fixed number of epochs (danger of underfitting or overfitting). Better: use the validation set, measuring the error on it during training.
Early Stopping
[Plot: training and validation error against number of epochs; the time to stop training is where the validation error starts to rise]
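The stopping criterion in the plot can be sketched as a patience rule: stop once the validation error has not improved for a few evaluations. The error sequences and the patience value are illustrative:

```python
# Return the epoch with the lowest validation error, stopping the scan once
# `patience` consecutive evaluations fail to improve on the best so far.
def early_stop_epoch(val_errors, patience=2):
    best, best_i, since = float("inf"), 0, 0
    for i, e in enumerate(val_errors):
        if e < best:
            best, best_i, since = e, i, 0
        else:
            since += 1
            if since >= patience:
                break
    return best_i

print(early_stop_epoch([1.0, 0.8, 0.7, 0.75, 0.9]))  # -> 2
```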