
Upload: melvyn-turner

Post on 23-Dec-2015


Page 1: 1 Online Learning Algorithms. 2 Outline Online learning Framework Design principles of online learning algorithms (additive updates)  Perceptron, Passive-Aggressive

Online Learning Algorithms


Outline

• Online learning framework

• Design principles of online learning algorithms (additive updates)
  Perceptron, Passive-Aggressive, and Confidence-Weighted classification
  Classification: binary, multi-class, and structured prediction
  Hypothesis averaging and regularization

• Multiplicative updates
  Weighted Majority, Winnow, and connections to Gradient Descent (GD) and Exponentiated Gradient Descent (EGD)


Formal setting – Classification

• Instances: images, sentences

• Labels: parse trees, names

• Prediction rule: linear prediction rule

• Loss: number of mistakes


Predictions

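• Continuous predictions: $\hat{y} \in \mathbb{R}$
  Label: $\mathrm{sign}(\hat{y})$
  Confidence: $|\hat{y}|$

• Linear classifiers: $\hat{y} = w \cdot x$
  Prediction: $\mathrm{sign}(w \cdot x)$
  Confidence: $|w \cdot x|$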
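A minimal Python sketch of the linear prediction rule, assuming labels in {-1, +1} and weights stored as plain lists. The names `dot` and `predict` and the tie-break at a score of zero are illustrative assumptions, not taken from the slides.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w, x):
    """Return (label, confidence) for a linear classifier with weights w."""
    score = dot(w, x)
    label = 1 if score >= 0 else -1  # a score of 0 is mapped to +1 (assumption)
    return label, abs(score)         # sign gives the label, magnitude the confidence
```

For example, `predict([1.0, -2.0], [1.0, 2.0])` has score -3, so the label is -1 with confidence 3.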


Loss Functions

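• Natural loss:
  Zero-one loss: 1 if the predicted label differs from $y$, 0 otherwise

• Real-valued-prediction losses:
  Hinge loss: $\max\{0, 1 - y (w \cdot x)\}$
  Exponential loss (Boosting): $\exp(-y (w \cdot x))$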
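The three losses can be sketched as functions of the true label `y` in {-1, +1} and the real-valued score `w . x`. The function names are illustrative, and treating a score of exactly zero as a mistake is an assumption.

```python
import math


def zero_one_loss(y, score):
    """1 if the sign of the score disagrees with y, else 0 (score 0 counts as a mistake)."""
    return 0 if y * score > 0 else 1


def hinge_loss(y, score):
    """Zero only when the example is classified correctly with margin at least 1."""
    return max(0.0, 1.0 - y * score)


def exponential_loss(y, score):
    """Smooth loss used in boosting; decays as the margin grows."""
    return math.exp(-y * score)
```

Both real-valued losses upper-bound the zero-one loss, which is why they can stand in for the number of mistakes.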


Loss Functions

[Figure: plot of the Zero-One Loss and the Hinge Loss]


Online Framework

• Initialize classifier

• Algorithm works in rounds

• On round t the online algorithm:
  Receives an input instance
  Outputs a prediction
  Receives a feedback label
  Computes loss
  Updates the prediction rule

• Goal: suffer small cumulative loss
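The round structure can be sketched as a generic loop. The callables `predict`, `update`, and `loss` are hypothetical placeholders supplied by the caller; they are not defined in the slides.

```python
def run_online(examples, predict, update, loss, w):
    """Run the online protocol over a sequence of (x, y) pairs.

    Returns the final weights and the cumulative loss.
    """
    total = 0.0
    for x, y in examples:        # receive an input instance
        y_hat = predict(w, x)    # output a prediction
        total += loss(y, y_hat)  # feedback label is revealed, loss is computed
        w = update(w, x, y)      # update the prediction rule
    return w, total
```

Any of the update rules discussed later can be plugged in as `update`, as long as it returns the new weights.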


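Margin

• Margin of an example $(x, y)$ with respect to the classifier $w$: $y (w \cdot x)$

• Note: the margin is positive iff the example is classified correctly

• The set is separable iff there exists $u$ such that $y_i (u \cdot x_i) > 0$ for every example $(x_i, y_i)$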


Geometrical Interpretation

[Figure: examples labeled Margin >> 0, Margin > 0, Margin < 0, and Margin << 0]


Hinge Loss


Why Online Learning?

• Fast
• Memory efficient: processes one example at a time
• Simple to implement
• Formal guarantees: mistake bounds
• Online-to-batch conversions
• No statistical assumptions
• Adaptive


Update Rules

• Online algorithms are based on an update rule, which defines the next prediction rule from the current one (and possibly other information)

• Linear classifiers: find $w_{t+1}$ from $w_t$ based on the input $(x_t, y_t)$

• Some update rules:
  Perceptron (Rosenblatt)
  ALMA (Gentile)
  ROMMA (Li & Long)
  NORMA (Kivinen et al.)
  MIRA (Crammer & Singer)
  EG (Littlestone and Warmuth)
  Bregman-based (Warmuth)
  CWL (Dredze et al.)


Design Principles of Algorithms

• If the learner suffers non-zero loss at any round, then we want to balance two goals:
  Corrective: change the weights enough that we don't make this error again (1)
  Conservative: don't change the weights too much (2)

• How should "too much" be defined?


Design Principles of Algorithms

• If we use Euclidean distance to measure the change between old and new weights:
  Enforcing (1) and minimizing (2)
  e.g., Perceptron for squared loss (Widrow-Hoff or Least Mean Squares)

• Passive-Aggressive algorithms do exactly the same, except that (1) is much stronger: we want to make a correct classification with a margin of at least 1

• Confidence-Weighted classifiers:
  maintain a distribution over weight vectors
  (1) is the same as in Passive-Aggressive, with a probabilistic notion of margin
  Change is measured by the KL divergence between two distributions


Design Principles of Algorithms

• If we assume all weights are positive, we can use the (unnormalized) KL divergence to measure the change:
  Multiplicative update or EG algorithm (Kivinen and Warmuth)
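A minimal sketch of a multiplicative step in the exponentiated-gradient style, assuming the caller supplies the loss gradient. The parameter names `learning_rate` and `normalize` are illustrative; with `normalize=False` the weights stay positive but need not sum to one, which corresponds to the unnormalized setting.

```python
import math


def eg_update(w, grad, learning_rate=0.1, normalize=True):
    """Multiplicative update: each positive weight is scaled by exp(-learning_rate * gradient)."""
    new_w = [wi * math.exp(-learning_rate * gi) for wi, gi in zip(w, grad)]
    if normalize:
        z = sum(new_w)                    # renormalize so the weights sum to 1
        new_w = [wi / z for wi in new_w]
    return new_w
```

Unlike an additive update, a weight never changes sign here: it can only shrink toward zero or grow.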


The Perceptron Algorithm

• If no mistake: do nothing

• If mistake: update $w_{t+1} = w_t + y_t x_t$

• Margin after update: $y_t (w_{t+1} \cdot x_t) = y_t (w_t \cdot x_t) + \|x_t\|^2$
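A short sketch of the Perceptron step, assuming labels in {-1, +1} and counting a margin of exactly zero as a mistake (an assumption). The function names are illustrative.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_update(w, x, y):
    """Return the weights after seeing (x, y); they change only on a mistake."""
    if y * dot(w, x) > 0:                         # no mistake: do nothing
        return w
    return [wi + y * xi for wi, xi in zip(w, x)]  # mistake: add y * x to the weights
```

After a mistake the margin on the same example grows by the squared norm of `x`, which may or may not be enough to classify it correctly next time.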

Passive-Aggressive Algorithms


Passive-Aggressive: Motivation

• Perceptron: no guarantee on the margin after the update

• PA: enforce a minimal non-zero margin after the update

• In particular:
  If the margin is large enough (at least 1), do nothing
  If the margin is less than 1, update so that the margin after the update is exactly 1


Aggressive Update Step

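• Set $w_{t+1}$ to be the solution of the following optimization problem:
  $w_{t+1} = \arg\min_{w} \tfrac{1}{2} \|w - w_t\|^2$   (2)
  subject to $y_t (w \cdot x_t) \ge 1$   (1)

• Closed-form update:
  $w_{t+1} = w_t + \tau_t y_t x_t$, where $\tau_t = \max\{0, 1 - y_t (w_t \cdot x_t)\} / \|x_t\|^2$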
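A minimal sketch of the Passive-Aggressive step for binary labels in {-1, +1}, assuming a non-zero instance `x`. The step size is the hinge loss divided by the squared norm of `x`, so the updated weights reach a margin of exactly 1 on the current example. Function names are illustrative.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pa_update(w, x, y):
    """Passive when the margin is at least 1, aggressive otherwise."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss on the current example
    if loss == 0.0:
        return w                          # passive step: keep the weights
    tau = loss / dot(x, x)                # smallest step that reaches margin 1
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```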


Passive-Aggressive Update


Unrealizable Case

Confidence-Weighted Classification


Confidence-Weighted Classification: Motivation

• Many positive reviews with the word best ($W_{best}$)

• Later, a negative review: "boring book – best if you want to sleep in seconds"

• A linear update will reduce both $W_{best}$ and $W_{boring}$

• But best appeared more often than boring

• How to adjust the weights $W_{boring}$ and $W_{best}$ at different rates?


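Update Rules

• The weight vector is a linear combination of examples: $w = \sum_t \alpha_t y_t x_t$

• Two rate schedules (among others):
  Perceptron algorithm, conservative: $\alpha_t = 1$ on a mistake, $0$ otherwise
  Passive-Aggressive: $\alpha_t = \max\{0, 1 - y_t (w_t \cdot x_t)\} / \|x_t\|^2$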


Distributions in Version Space

[Figure: an example and the mean weight-vector]


Margin as a Random Variable

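• With $w \sim \mathcal{N}(\mu, \Sigma)$, the signed margin $M = y (w \cdot x)$ is a Gaussian-distributed variable: $M \sim \mathcal{N}(y (\mu \cdot x),\ x^\top \Sigma x)$

• Thus: the probability of a correct classification is $\Pr[M \ge 0]$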
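A small sketch of the margin as a random variable, assuming the weight vector is Gaussian with mean `mu` and covariance `sigma` (a list of rows) and that the margin variance is positive. The function names are illustrative; the probability uses the standard normal CDF written with `math.erf`.

```python
import math


def margin_distribution(mu, sigma, x, y):
    """Mean and variance of M = y * (w . x) when w ~ N(mu, sigma)."""
    mean = y * sum(m * xi for m, xi in zip(mu, x))
    sigma_x = [sum(sij * xj for sij, xj in zip(row, x)) for row in sigma]
    var = sum(xi * sxi for xi, sxi in zip(x, sigma_x))  # x^T sigma x
    return mean, var


def prob_correct(mu, sigma, x, y):
    """Pr[M >= 0], the probability that a sampled weight vector classifies (x, y) correctly."""
    mean, var = margin_distribution(mu, sigma, x, y)
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))
```

A small variance along `x` means the classifier is confident about the features of this example, so the same mean margin gives a higher probability of being correct.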


PA-like Update

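• PA: $\min_{w} \tfrac{1}{2} \|w - w_t\|^2$ subject to $y_t (w \cdot x_t) \ge 1$

• New update: $\min_{\mu, \Sigma} \mathrm{KL}\big(\mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(\mu_t, \Sigma_t)\big)$ subject to $\Pr_{w \sim \mathcal{N}(\mu, \Sigma)}[y_t (w \cdot x_t) \ge 0] \ge \eta$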


Weight Vector (Version) Space

[Figure: place most of the probability mass in this region]


Passive Step

Nothing to do: most weight vectors already classify the example correctly


Aggressive Step

• Project the current Gaussian distribution onto the half-space

• The covariance is shrunk in the direction of the new example

• The mean is moved past the mistake line (large margin)

Extensions: Multi-class and Structured Prediction


Multiclass Representation I

• k prototypes
• New instance
• Compute the score $w_r \cdot x$ for each class r

• Prediction: the class achieving the highest score

Class r    Score
1          -1.08
2           1.66
3           0.37
4          -2.09
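A minimal sketch of prediction with one prototype per class, assuming the score of class r is the inner product of its prototype with the instance. Classes are numbered from 1 to match the table; the function names are illustrative.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_class(prototypes, x):
    """Return (predicted class, scores); the class with the highest score wins."""
    scores = [dot(w_r, x) for w_r in prototypes]
    best = max(range(len(scores)), key=lambda r: scores[r])
    return best + 1, scores  # +1 because classes are numbered from 1
```

With the four scores in the table, the prediction is class 2, whose score of 1.66 is the highest.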


Multiclass Representation II

• Map all input and labels into a joint vector space

• Score labels by projecting the corresponding feature vector

Example: F(Estimated volume was a light 2.4 million ounces . , B I O B I I I I O) = (0 1 1 0 …)


Multiclass Representation II

• Predict label with highest score (Inference)

• Naïve search is expensive if the set of possible labels is large
  No. of labelings = $3^{\text{No. of words}}$
  Example: "Estimated volume was a light 2.4 million ounces ." with tags B I O B I I I I O

• Efficient Viterbi decoding for sequences!
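A minimal Viterbi sketch, assuming the score of a tag sequence decomposes into per-position emission scores plus scores for adjacent tag pairs. The data layout (`emission` as a list of dicts, `transition` as a dict keyed by tag pairs) is a hypothetical choice for illustration; the slides do not specify one.

```python
def viterbi(emission, transition, tags):
    """Return the highest-scoring tag sequence and its score.

    emission[t][tag]        score of `tag` at position t
    transition[(prev, cur)] score of tag `cur` following tag `prev`
    """
    # best[t][tag]: best score of any labeling of positions 0..t that ends in `tag`
    best = [{tag: emission[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag]: best previous tag when position t is labeled `tag`
    for t in range(1, len(emission)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transition[(p, cur)])
            scores[cur] = best[-1][prev] + transition[(prev, cur)] + emission[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Recover the sequence by following back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]
```

The loop does work proportional to the number of words times the square of the number of tags, instead of enumerating every labeling.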


Two Representations

• Weight-vector per class (Representation I)
  Intuitive
  Improved algorithms

• Single weight-vector (Representation II)
  Generalizes Representation I
  Allows complex interactions between input and output

Example: F(x, 4) = (0 0 0 x 0)
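A small sketch of why the single weight-vector view generalizes the per-class view, assuming the joint feature map copies `x` into the block belonging to class r and leaves zeros elsewhere. The function names are illustrative.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def joint_features(x, r, k):
    """F(x, r): x placed in the r-th of k blocks (classes numbered from 1), zeros elsewhere."""
    d = len(x)
    f = [0.0] * (k * d)
    f[(r - 1) * d : r * d] = x
    return f


def stack(prototypes):
    """Concatenate the per-class weight vectors into one long weight vector."""
    return [wi for w_r in prototypes for wi in w_r]
```

With this map, `dot(stack(prototypes), joint_features(x, r, k))` equals `dot(prototypes[r - 1], x)`, so scoring with one stacked vector reproduces the per-class scores.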


Margin for Multi Class

• Binary: $y (w \cdot x)$

• Multi-class: $w \cdot F(x, y) - \max_{r \neq y} w \cdot F(x, r)$


Margin for Multi Class

• But different mistakes have different costs (given by the loss function), so use them!

• Margin scaled by loss function:


Perceptron Multiclass Online Algorithm

• Initialize
• For each round t:
  Receive an input instance
  Output a prediction
  Receive a feedback label
  Compute loss
  Update the prediction rule
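One common way to instantiate these steps is sketched below, assuming one weight vector per class, zero initialization, and classes numbered from 0 (all assumptions). On a mistake, the instance is added to the correct class and subtracted from the predicted class.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def multiclass_perceptron(examples, k, d):
    """Run one pass over (x, y) pairs with y in 0..k-1; return (weights, number of mistakes)."""
    W = [[0.0] * d for _ in range(k)]  # initialize: one zero vector per class
    mistakes = 0
    for x, y in examples:                                # receive an instance
        scores = [dot(w_r, x) for w_r in W]
        y_hat = max(range(k), key=lambda r: scores[r])   # output a prediction
        if y_hat != y:                                   # feedback label shows a mistake
            mistakes += 1
            W[y] = [wi + xi for wi, xi in zip(W[y], x)]          # promote the correct class
            W[y_hat] = [wi - xi for wi, xi in zip(W[y_hat], x)]  # demote the predicted class
    return W, mistakes
```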


PA Multiclass Online Algorithm

• Initialize
• For each round t:
  Receive an input instance
  Output a prediction
  Receive a feedback label
  Compute loss
  Update the prediction rule
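A sketch of one multiclass Passive-Aggressive step under stated assumptions: one weight vector per class, classes numbered from 0, at least two classes, a non-zero instance, and a margin measured against the highest-scoring wrong class. This is one standard variant, not necessarily the exact form on the slide.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def multiclass_pa_update(W, x, y):
    """Return new per-class weights with margin at least 1 over the strongest rival class."""
    scores = [dot(w_r, x) for w_r in W]
    rival = max((r for r in range(len(W)) if r != y), key=lambda r: scores[r])
    loss = max(0.0, 1.0 - (scores[y] - scores[rival]))  # multiclass hinge loss
    if loss == 0.0:
        return W                                        # passive step
    tau = loss / (2.0 * dot(x, x))  # two blocks change, hence the factor 2
    new_W = [list(w_r) for w_r in W]
    new_W[y] = [wi + tau * xi for wi, xi in zip(W[y], x)]
    new_W[rival] = [wi - tau * xi for wi, xi in zip(W[rival], x)]
    return new_W
```

After an aggressive step the margin between the correct class and that rival is exactly 1; other classes are left untouched.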


Regularization

• Key idea: if an online algorithm works well on a sequence of i.i.d. examples, then an ensemble of online hypotheses should generalize well.

• Popular choices:
  the averaged hypothesis
  the majority vote
  use a validation set to make a choice
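The averaged hypothesis can be sketched with the binary Perceptron as the underlying online learner (an illustrative choice). The sketch assumes labels in {-1, +1}, a non-empty example sequence, and that the hypothesis held after each round is what gets averaged.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def averaged_perceptron(examples, d):
    """Return the average of the weight vectors held after each round."""
    w = [0.0] * d
    w_sum = [0.0] * d
    for x, y in examples:
        if y * dot(w, x) <= 0:                         # mistake: Perceptron update
            w = [wi + y * xi for wi, xi in zip(w, x)]
        w_sum = [si + wi for si, wi in zip(w_sum, w)]  # accumulate this round's hypothesis
    n = len(examples)
    return [si / n for si in w_sum]
```

Hypotheses that survive many rounds without a mistake contribute more to the average, which is what makes the averaged vector more stable than the final one.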