
Machine Learning (CSE 446): Perceptron Convergence

Sham M Kakade © 2018

University of Washington
cse446-staff@cs.washington.edu


Review


Happy Medium?

Decision trees (that aren’t too deep): use relatively few features to classify.

K-nearest neighbors: all features weighted equally.

Today: use all features, but weight them.

For today’s lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ R^d.


Inspiration from Neurons
(Image from Wikimedia Commons.)

Input signals come in through dendrites, output signal passes out through the axon.


Perceptron Learning Algorithm

Data: D = ⟨(x_n, y_n)⟩_{n=1}^N, number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, . . . , E} do
    for n ∈ {1, . . . , N}, in random order do
        # predict
        y = sign(w · x_n + b);
        if y ≠ y_n then
            # update
            w ← w + y_n · x_n;
            b ← b + y_n;
        end
    end
end
return w, b

Algorithm 1: PerceptronTrain
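For concreteness, here is a minimal NumPy sketch of Algorithm 1 (a reading aid, not the course's reference implementation; the function name, the use of NumPy, and the random seed are my own choices):

    import numpy as np

    def perceptron_train(X, y, num_epochs, seed=0):
        """X: (N, d) array of inputs; y: (N,) array of labels in {-1, +1}."""
        rng = np.random.default_rng(seed)
        N, d = X.shape
        w = np.zeros(d)   # initialize: w = 0
        b = 0.0           # initialize: b = 0
        for _ in range(num_epochs):
            for n in rng.permutation(N):       # visit examples in random order
                y_hat = np.sign(w @ X[n] + b)  # predict
                if y_hat != y[n]:              # mistake: update
                    w = w + y[n] * X[n]
                    b = b + y[n]
        return w, b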


Linear Decision Boundary

w·x + b = 0

activation = w·x + b
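A tiny numeric illustration (the numbers here are made up): the sign of the activation gives the predicted label, and points with activation exactly 0 lie on the decision boundary.

    import numpy as np

    w = np.array([2.0, -1.0])            # hypothetical weights
    b = 0.5                              # hypothetical bias
    x = np.array([1.0, 3.0])             # a query point
    activation = w @ x + b               # 2.0 - 3.0 + 0.5 = -0.5
    y_hat = 1 if activation > 0 else -1  # predicts -1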


Interpretation of Weight Values

What does it mean when...

- w_1 = 100?
- w_2 = −1?
- w_3 = 0?

What if ‖w‖ is “large”?


Today


What would we like to do?

- Optimization problem: find a classifier that minimizes the classification loss.

- The perceptron algorithm can be viewed as trying to do this...

- Problem: in general, this is an NP-hard problem.

- Let’s still try to understand it...

This is the general approach of loss function minimization: find parameters that make our training error “small” (and that also generalize). A sketch of the objective is given below.
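As a concrete sketch (this display is mine, not the slide's), the classification loss in question is the 0/1 training error:

    minimize over (w, b):   (1/N) · Σ_{n=1}^{N} 1[ y_n ≠ sign(w · x_n + b) ]

Minimizing this 0/1 loss exactly over all linear classifiers is what is NP-hard in general; the perceptron updates can be read as a heuristic attack on it.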


When does the perceptron not converge?


Linear Separability

A dataset D = ⟨(x_n, y_n)⟩_{n=1}^N is linearly separable if there exists some linear classifier (defined by w, b) such that, for all n, y_n = sign(w · x_n + b).

If the data are separable, then (without loss of generality) we can rescale so that:

- “margin of 1”: we can assume that for all (x, y),

      y (w∗ · x) ≥ 1

  (let w∗ be the smallest-norm vector with margin 1; a brief rescaling sketch is given below).

- CIML instead assumes ‖w∗‖ is unit length and scales the “1” above.
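Why the rescaling is without loss of generality (a sketch filling in the slide's claim; the symbol γ is mine): if some w separates the data with margin γ = min_n y_n (w · x_n) > 0, then w∗ = w / γ satisfies

    y_n (w∗ · x_n) = y_n (w · x_n) / γ ≥ 1   for all n,   with   ‖w∗‖ = ‖w‖ / γ.

So the quantity ‖w∗‖² appearing in the convergence theorem below is the familiar (‖w‖ / γ)² bound.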


Perceptron Convergence
Due to Rosenblatt (1958).

Theorem: Suppose the data are scaled so that ‖x_i‖₂ ≤ 1. Assume D is linearly separable, and let w∗ be a separator with “margin 1”. Then the perceptron algorithm makes at most ‖w∗‖² mistakes, and so converges within at most ‖w∗‖² epochs.

- Let w_t be the parameter at “iteration” t; w_0 = 0.

- “A Mistake Lemma”: at iteration t,

  If we do not make a mistake, ‖w_{t+1} − w∗‖² = ‖w_t − w∗‖².
  If we do make a mistake, ‖w_{t+1} − w∗‖² ≤ ‖w_t − w∗‖² − 1.

- The theorem directly follows from this lemma. Why? (A short argument is given below.)
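Filling in the “Why?” (my phrasing, not the slide's): since w_0 = 0, the starting squared distance is ‖w_0 − w∗‖² = ‖w∗‖². By the lemma it never increases, drops by at least 1 on every mistake, and can never go below 0:

    0 ≤ ‖w_T − w∗‖² ≤ ‖w_0 − w∗‖² − (# mistakes) = ‖w∗‖² − (# mistakes),

so the total number of mistakes is at most ‖w∗‖². Once an entire epoch passes with no mistake, the parameters stop changing, so the algorithm has converged.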


Proof of the “Mistake Lemma”
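The slide leaves this as scratch space; here is a sketch under the slide's assumptions (‖x‖ ≤ 1, margin y (w∗ · x) ≥ 1, and the bias ignored or folded into w via a constant feature).

No mistake at iteration t: no update happens, so w_{t+1} = w_t and ‖w_{t+1} − w∗‖² = ‖w_t − w∗‖².

Mistake at iteration t on example (x, y): the update is w_{t+1} = w_t + y · x, so

    ‖w_{t+1} − w∗‖² = ‖(w_t − w∗) + y · x‖²
                    = ‖w_t − w∗‖² + 2 y (x · (w_t − w∗)) + ‖x‖²
                    = ‖w_t − w∗‖² + 2 y (w_t · x) − 2 y (w∗ · x) + ‖x‖²
                    ≤ ‖w_t − w∗‖² + 0 − 2 + 1
                    = ‖w_t − w∗‖² − 1,

using y (w_t · x) ≤ 0 (a mistake was made), y (w∗ · x) ≥ 1 (the margin), and ‖x‖² ≤ 1.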


Voted Perceptron

- Suppose w_1, w_4, w_10, w_11, . . . are the parameters right after we updated (i.e. right after we made a mistake).

- Idea: instead of using the final w_t to classify, we classify with a majority vote using w_1, w_4, w_10, w_11, . . .

- Why?

See CIML for details: implementation and variants.

12 / 13

Voted Perceptron

Let w^(e,n) and b^(e,n) be the parameters after updating based on the n-th example on epoch e.

    y = sign( Σ_{e=1}^{E} Σ_{n=1}^{N} sign( w^(e,n) · x + b^(e,n) ) )
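A minimal NumPy sketch of this voting rule (a reading aid under my own naming; `snapshots` is assumed to hold one (w, b) pair per training step, i.e. the w^(e,n), b^(e,n) recorded for every e and n):

    import numpy as np

    def voted_perceptron_predict(snapshots, x):
        """snapshots: list of (w, b) pairs, one per training step (e, n); x: (d,) input."""
        votes = sum(np.sign(w @ x + b) for w, b in snapshots)
        return 1 if votes > 0 else -1

Because the parameters only change when a mistake is made, a (w, b) pair that survives many consecutive examples appears in many terms of the sum, so longer-surviving classifiers implicitly carry more votes; CIML discusses more memory-efficient implementations and variants.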


References I

Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.
