
Page 1

Machine Learning (CSE 446): Perceptron Convergence

Sham M Kakade
© 2018, University of Washington
[email protected]

1 / 13

Page 2

Review

1 / 13

Page 3

Happy Medium?

Decision trees (that aren’t too deep): use relatively few features to classify.

K-nearest neighbors: all features weighted equally.

Today: use all features, but weight them.

For today’s lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ ℝ^d.

2 / 13
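If a dataset comes with {0, 1} labels, a one-line remap converts it to the {−1, +1} convention used here (a sketch; the array `y01` is hypothetical):

```python
import numpy as np

# hypothetical {0, 1} labels
y01 = np.array([0, 1, 1, 0])

# map 0 -> -1 and 1 -> +1
y = 2 * y01 - 1
```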

Page 4

Inspiration from Neurons
(Image from Wikimedia Commons.)

Input signals come in through dendrites, output signal passes out through the axon.

2 / 13

Page 5

Perceptron Learning Algorithm

Data: D = ⟨(x_n, y_n)⟩_{n=1}^N, number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, . . . , E} do
    for n ∈ {1, . . . , N}, in random order do
        # predict
        ŷ = sign(w · x_n + b);
        if ŷ ≠ y_n then
            # update
            w ← w + y_n · x_n;
            b ← b + y_n;
        end
    end
end
return w, b

Algorithm 1: PerceptronTrain

3 / 13
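Algorithm 1 can be sketched in NumPy as follows (a minimal sketch; the function name and the seed handling are choices, not part of the slide):

```python
import numpy as np

def perceptron_train(X, y, epochs, seed=0):
    """Train a perceptron. X: (N, d) array; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        # visit the examples in random order, as in Algorithm 1
        for n in rng.permutation(N):
            y_hat = 1.0 if w @ X[n] + b > 0 else -1.0
            if y_hat != y[n]:
                # update only on a mistake
                w += y[n] * X[n]
                b += y[n]
    return w, b
```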

Page 6

Linear Decision Boundary

w·x + b = 0

activation = w·x + b

4 / 13


Page 8

Interpretation of Weight Values

What does it mean when . . .

▶ w_1 = 100?

▶ w_2 = −1?

▶ w_3 = 0?

What if ‖w‖ is “large”?

5 / 13

Page 9

Today

5 / 13

Page 10

What would we like to do?

▶ Optimization problem: find a classifier which minimizes the classification loss.

▶ The perceptron algorithm can be viewed as trying to do this...

▶ Problem: (in general) this is an NP-hard problem.

▶ Let’s still try to understand it...

This is the general approach of loss function minimization: find parameters which make our training error “small” (and which also generalize).

6 / 13

Page 11

When does the perceptron not converge?

7 / 13
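One answer can be checked empirically: on data that is not linearly separable, the updates never stop. A sketch with the classic XOR pattern (the dataset here is illustrative):

```python
import numpy as np

# the classic XOR pattern: no line separates the two classes
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([-1., -1., 1., 1.])

w, b = np.zeros(2), 0.0
mistakes_per_epoch = []
for epoch in range(20):
    mistakes = 0
    for n in range(len(X)):
        y_hat = 1.0 if w @ X[n] + b > 0 else -1.0
        if y_hat != y[n]:
            w += y[n] * X[n]
            b += y[n]
            mistakes += 1
    mistakes_per_epoch.append(mistakes)
```

Because no (w, b) classifies all four points correctly, an epoch can never pass mistake-free (a clean pass would mean the current weights separate the data), so the loop keeps making at least one update every epoch, forever.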

Page 12

Linear Separability

A dataset D = ⟨(x_n, y_n)⟩_{n=1}^N is linearly separable if there exists some linear classifier (defined by w, b) such that, for all n, y_n = sign(w · x_n + b).

If the data are separable, we can (without loss of generality) rescale so that:

▶ “margin 1”: we can assume, for all (x, y),

  y (w* · x) ≥ 1

  (let w* be the smallest-norm vector with margin 1).

▶ CIML: assumes ‖w*‖ is unit length and instead scales the “1” above.

8 / 13
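The rescaling step can be illustrated numerically (a sketch; the data and the separator `w` are made up):

```python
import numpy as np

# hypothetical separable data on the line, labels in {-1, +1}
X = np.array([[0.5], [2.0], [-0.5], [-3.0]])
y = np.array([1., 1., -1., -1.])

w = np.array([1.0])              # any separating direction
margins = y * (X @ w)            # all positive iff w separates the data
w_star = w / margins.min()       # rescale so the smallest margin is 1
```

Here `margins.min()` is 0.5, so w* = [2.0], and every point now satisfies y (w*·x) ≥ 1, with equality at the closest points.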

Page 13

Perceptron Convergence
Due to Rosenblatt (1958).

Theorem: Suppose the data are scaled so that ‖x_i‖₂ ≤ 1. Assume D is linearly separable, and let w* be a separator with “margin 1”. Then the perceptron algorithm will converge in at most ‖w*‖² epochs.

▶ Let w_t be the parameter at “iteration” t; w_0 = 0.

▶ “A Mistake Lemma”: at iteration t,

  if we do not make a mistake, ‖w_{t+1} − w*‖² = ‖w_t − w*‖²;

  if we do make a mistake, ‖w_{t+1} − w*‖² ≤ ‖w_t − w*‖² − 1.

▶ The theorem follows directly from this lemma. Why?

9 / 13
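The bound can be sanity-checked numerically (a sketch with made-up data and no bias term, matching the margin condition y (w*·x) ≥ 1; the theorem guarantees the loop terminates and the mistake count stays below ‖w*‖²):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: label = sign of the first coordinate, with a gap
# around the boundary; points are then scaled so that ||x||_2 <= 1
X = rng.uniform(-1, 1, size=(50, 2))
X = X[np.abs(X[:, 0]) > 0.2]
X = X / np.linalg.norm(X, axis=1).max()
y = np.sign(X[:, 0])

# w* scaled so that the smallest margin is exactly 1 (no bias term)
w_star = np.array([1.0, 0.0])
w_star = w_star / (y * (X @ w_star)).min()

# run the (bias-free) perceptron to convergence, counting mistakes
w = np.zeros(2)
mistakes = 0
converged = False
while not converged:
    converged = True
    for n in range(len(X)):
        if y[n] * (w @ X[n]) <= 0:   # mistake (ties count as mistakes)
            w += y[n] * X[n]
            mistakes += 1
            converged = False
```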

Page 14

Proof of the “Mistake Lemma”

10 / 13
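One way to fill in the scratch space: suppose we make a mistake on example (x_n, y_n) at iteration t, so w_{t+1} = w_t + y_n x_n. Expanding the squared distance to w* (ignoring the bias, as in the margin condition):

```latex
\begin{aligned}
\|w_{t+1} - w^*\|^2
  &= \|(w_t - w^*) + y_n x_n\|^2 \\
  &= \|w_t - w^*\|^2 + 2 y_n\, x_n \cdot (w_t - w^*) + y_n^2 \|x_n\|^2 \\
  &= \|w_t - w^*\|^2
     + \underbrace{2 y_n (x_n \cdot w_t)}_{\le\, 0 \text{ (mistake)}}
     - \underbrace{2 y_n (x_n \cdot w^*)}_{\ge\, 2 \text{ (margin 1)}}
     + \underbrace{\|x_n\|^2}_{\le\, 1} \\
  &\le \|w_t - w^*\|^2 + 0 - 2 + 1 \;=\; \|w_t - w^*\|^2 - 1 .
\end{aligned}
```

If no mistake is made there is no update, so the distance is unchanged. Since ‖w_0 − w*‖² = ‖w*‖² and a squared distance cannot go below 0, at most ‖w*‖² mistakes can ever occur, which is where the theorem's bound comes from.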

Page 15

Proof of the “Mistake Lemma” (more scratch space)

11 / 13


Page 17

Voted Perceptron

▶ Suppose w_1, w_4, w_10, w_11, . . . are the parameters right after we updated (e.g. after we made a mistake).

▶ Idea: instead of using the final w_t to classify, we classify with a majority vote using w_1, w_4, w_10, w_11, . . .

▶ Why?

See CIML for details: implementation and variants.

12 / 13

Page 18

Voted Perceptron

Let w^(e,n) and b^(e,n) be the parameters after updating based on the nth example of epoch e.

ŷ = sign( ∑_{e=1}^{E} ∑_{n=1}^{N} sign( w^(e,n) · x + b^(e,n) ) )

12 / 13
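The voting rule above can be sketched as follows (an assumed implementation, not the slide's own; it uses the standard compression of storing each distinct (w, b) with the number of examples it survived, which yields the same vote as the double sum over (e, n)):

```python
import numpy as np

def voted_perceptron_train(X, y, epochs, seed=0):
    """Collect the distinct (w, b) snapshots with survival counts."""
    rng = np.random.default_rng(seed)
    w, b, c = np.zeros(X.shape[1]), 0.0, 0
    survivors = []
    for _ in range(epochs):
        for n in rng.permutation(len(X)):
            y_hat = 1.0 if w @ X[n] + b > 0 else -1.0
            if y_hat != y[n]:
                # retire the current snapshot along with its vote count
                survivors.append((w.copy(), b, c))
                w, b, c = w + y[n] * X[n], b + y[n], 1
            else:
                c += 1
    survivors.append((w.copy(), b, c))
    return survivors

def voted_predict(survivors, x):
    # each snapshot votes, weighted by how many examples it survived
    vote = sum(c * np.sign(w @ x + b) for w, b, c in survivors)
    return 1.0 if vote > 0 else -1.0
```

Storing counts instead of one snapshot per (e, n) pair keeps memory proportional to the number of mistakes rather than to E · N.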


Page 25

References I

Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408, 1958.

13 / 13