Online Passive-Aggressive Algorithms
Jean-Baptiste Behuet, 28/11/2007
Tutor: Eneldo Loza Mencía
Seminar aus Maschinellem Lernen WS07/08



Overview

● Online algorithms
● Online Binary Classification Problem
– Perceptron Algorithm
– 3 versions of the Passive-Aggressive Algorithm
– Loss bounds, Comparison with the Perceptron
● Other learning problems
● Experiments
● Conclusion


Online Algorithms

● Sequence of rounds t:
– Instance $x_t$ as input
– Predicts $\hat{y}_t$ as output
– Receives correct output $y_t$
– Updates prediction mechanism
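The round structure above can be sketched as a generic loop. This is only an illustration: `run_online`, `majority_predict` and `majority_update` are hypothetical names that do not appear in the slides, and the toy "majority label" learner is an assumption chosen to keep the example tiny.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_online(
    stream: Iterable[Tuple[Any, Any]],
    predict: Callable[[Any, Any], Any],
    update: Callable[[Any, Any, Any, Any], Any],
    state: Any,
) -> Tuple[Any, List[Any]]:
    """Run the online protocol over a stream of (x_t, y_t) pairs."""
    predictions = []
    for x_t, y_t in stream:
        y_hat = predict(state, x_t)             # predict before the label is revealed
        predictions.append(y_hat)
        state = update(state, x_t, y_t, y_hat)  # then adapt the prediction mechanism
    return state, predictions


# Toy learner, purely illustrative: predict the label seen most often so far.
def majority_predict(counts, x):
    return 1 if counts[1] >= counts[-1] else -1


def majority_update(counts, x, y, y_hat):
    new_counts = dict(counts)
    new_counts[y] += 1
    return new_counts


final_counts, preds = run_online(
    [(None, 1), (None, -1), (None, -1), (None, -1)],
    majority_predict,
    majority_update,
    {1: 0, -1: 0},
)
```

The important point is the ordering inside the loop: the prediction for round t is made with the state learned from rounds 1 to t-1 only.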


Online Binary Classification: Perceptron Algorithm

● Round t:
– instance $x_t \in \mathbb{R}^n$ with label $y_t \in \{-1, 1\}$
– classification function based on weight vector $w_t \in \mathbb{R}^n$
→ defines a hyperplane separating the 2 classes
– prediction: $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t + b)$
– signed margin: $y_t (w_t \cdot x_t + b)$
– correct if margin > 0
● Goal: incrementally learn $w_t$ (incrementally modify the hyperplane)


Online Binary Classification: Perceptron Algorithm (2)

● Start with a random hyperplane (random $w_0$)
● At each round t of the algorithm:
– receives $x_t$ and predicts $\hat{y}_t = \mathrm{sign}(w_t \cdot x_t + b_t)$
– receives correct $y_t$ and updates the hyperplane
● Update minimizes the distance of misclassified examples to the boundary:
$w_{t+1} = w_t + \rho (y_t - \hat{y}_t) x_t$ with $\rho > 0$ the learning rate
(the hyperplane is updated only when an error occurs, since $y_t - \hat{y}_t = 0$ otherwise)
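A minimal sketch of one Perceptron round, using the update w_{t+1} = w_t + ρ (y_t − ŷ_t) x_t. Several details are assumptions not stated on the slide: the bias is updated as if it were the weight of a constant feature 1, sign(0) is mapped to +1, and the weights start at zero instead of at random values so that the example is reproducible. The function names are illustrative.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(z: float) -> int:
    # Ties (z == 0) are mapped to +1 here; the slides do not specify this case.
    return 1 if z >= 0 else -1


def perceptron_step(
    w: List[float], b: float, x: List[float], y: int, rho: float = 0.5
) -> Tuple[List[float], float, int]:
    """One round: predict, then apply w <- w + rho * (y - y_hat) * x."""
    y_hat = sign(dot(w, x) + b)
    step = rho * (y - y_hat)  # 0 if the prediction is correct, +/- 2*rho on a mistake
    w_new = [wi + step * xi for wi, xi in zip(w, x)]
    b_new = b + step          # assumption: bias treated as weight of a constant feature
    return w_new, b_new, y_hat


w, b = [0.0, 0.0], 0.0
w, b, first_pred = perceptron_step(w, b, [1.0, 2.0], -1)     # mistake: weights move
w2, b2, second_pred = perceptron_step(w, b, [1.0, 2.0], -1)  # correct: no change
```

With ρ = 0.5 a mistake moves the weights by exactly ±x_t, which is the classic Perceptron rule.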


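Passive-Aggressive Algorithm for binary classification

● Want: margin ≥ 1 as often as possible (not only correctly classified examples)
● Hinge-loss function:
$\ell(w; (x, y)) = 0$ if $y (w \cdot x) \ge 1$, and $1 - y (w \cdot x)$ otherwise
● Loss suffered at round t: $\ell_t = \ell(w_t; (x_t, y_t))$
● Number of prediction mistakes ≤ $\sum_t \ell_t^2$ (a mistake means margin ≤ 0, hence $\ell_t \ge 1$)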
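A small sketch of the hinge loss and of why the number of mistakes is bounded by the sum of squared losses. It assumes the standard margin-1 hinge loss, max(0, 1 − y (w·x)), without a bias term. For simplicity the check below uses one fixed weight vector rather than the sequence w_t; the per-example argument is the same (a mistake implies a loss of at least 1). The helper names are illustrative.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def hinge_loss(w: List[float], x: List[float], y: int) -> float:
    """0 when the signed margin y * (w . x) is at least 1, else 1 - margin."""
    return max(0.0, 1.0 - y * dot(w, x))


def mistakes_and_squared_loss(
    w: List[float], examples: List[Tuple[List[float], int]]
) -> Tuple[int, float]:
    mistakes, total = 0, 0.0
    for x, y in examples:
        if y * dot(w, x) <= 0:  # prediction mistake: margin not positive
            mistakes += 1
        total += hinge_loss(w, x, y) ** 2
    return mistakes, total


w = [1.0, 0.0]
examples = [
    ([2.0, 0.0], 1),   # margin 2   -> loss 0 (correct, confident)
    ([0.5, 0.0], 1),   # margin 0.5 -> loss 0.5 (correct, but margin < 1)
    ([1.0, 0.0], -1),  # margin -1  -> loss 2 (mistake)
]
mistakes, total = mistakes_and_squared_loss(w, examples)
```

The second example shows what distinguishes the hinge loss from the 0/1 error: a correctly classified example still suffers loss when its margin is below 1.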


Passive-Aggressive Algorithm for binary classification (2)

● Initialization: $w_1 = (0, \dots, 0)$
● Update:
– $w_{t+1}$ is the solution of the constrained optimization problem:
$w_{t+1} = \arg\min_{w \in \mathbb{R}^n} \frac{1}{2} \|w - w_t\|^2$ subject to $\ell(w; (x_t, y_t)) = 0$
– $w_{t+1}$ has the form:
$w_{t+1} = w_t + \tau_t y_t x_t$ with $\tau_t = \ell_t / \|x_t\|^2$
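A minimal sketch of the PA update for binary classification, assuming the closed form w_{t+1} = w_t + τ_t y_t x_t with τ_t = ℓ_t / ‖x_t‖² from the Passive-Aggressive paper by Crammer et al. The function name `pa_update` and the guard for a zero instance are illustrative choices.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def pa_update(w: List[float], x: List[float], y: int) -> Tuple[List[float], float]:
    """Return (w_next, loss suffered by w on (x, y))."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss of the current weights
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w), loss              # passive: margin already >= 1
    tau = loss / sq_norm                  # smallest step that reaches margin 1
    return [wi + tau * y * xi for wi, xi in zip(w, x)], loss


w0 = [0.0, 0.0]                           # initialization w_1 = (0, ..., 0)
w1, loss1 = pa_update(w0, [3.0, 4.0], 1)  # aggressive step
w2, loss2 = pa_update(w1, [3.0, 4.0], 1)  # same example again: (almost) no loss left
```

After the first step the margin on the example is exactly 1 up to rounding, which is what the constraint of the optimization problem demands.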


Passive-Aggressive Algorithm for binary classification (3)

● Trade-off:
– $w_{t+1}$ required to have no loss on the current example
– $w_{t+1}$ as close as possible to $w_t$
● "Passive-Aggressive":
– "passive" when $\ell_t = 0$ (then $w_{t+1} = w_t$)
– "aggressive" otherwise: $w_{t+1}$ is forced to satisfy the constraint $\ell(w_{t+1}; (x_t, y_t)) = 0$ on the current example


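Two variations of the PA algorithm

● Problem of aggressiveness in case of noise
● PA-I:
$w_{t+1} = \arg\min_{w} \frac{1}{2} \|w - w_t\|^2 + C \xi$ subject to $\ell(w; (x_t, y_t)) \le \xi$ and $\xi \ge 0$
● PA-II:
$w_{t+1} = \arg\min_{w} \frac{1}{2} \|w - w_t\|^2 + C \xi^2$ subject to $\ell(w; (x_t, y_t)) \le \xi$
with C the aggressiveness parameter
● same update form as for PA: $w_{t+1} = w_t + \tau_t y_t x_t$, with
$\tau_t = \min\{C, \ell_t / \|x_t\|^2\}$ for PA-I and $\tau_t = \ell_t / (\|x_t\|^2 + \frac{1}{2C})$ for PA-II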
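The three variants differ only in the step size τ_t. The sketch below assumes the step sizes given in the Passive-Aggressive paper by Crammer et al.: ℓ/‖x‖² for PA, min(C, ℓ/‖x‖²) for PA-I, and ℓ/(‖x‖² + 1/(2C)) for PA-II. The function name `step_size` and the string labels are illustrative.

```python
def step_size(loss: float, sq_norm: float, variant: str = "PA", C: float = 1.0) -> float:
    """Step size tau_t for a hinge loss `loss` and squared instance norm `sq_norm`."""
    if variant == "PA":
        return loss / sq_norm                    # always reach margin 1
    if variant == "PA-I":
        return min(C, loss / sq_norm)            # same step, but capped at C
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))  # smoothly damped step
    raise ValueError(f"unknown variant: {variant}")


# A badly misclassified (possibly noisy) example: loss 2, unit-norm instance.
tau_pa = step_size(2.0, 1.0, "PA")
tau_pa1 = step_size(2.0, 1.0, "PA-I", C=0.5)
tau_pa2 = step_size(2.0, 1.0, "PA-II", C=0.5)
```

A small C limits how far a single example can move the weights, which is why PA-I and PA-II are less sensitive to label noise than the original PA.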


Relative loss bounds

● Number of prediction mistakes ≤ $\sum_t \ell_t^2$
● Comparison of the loss attained by PA with the loss $\ell_t^* = \ell(u; (x_t, y_t))$ attained by a fixed classifier sign(u·x)
● For the original PA algorithm, if $\forall t$: $\|x_t\| \le R$ and $\ell_t^* = 0$:
$\sum_t \ell_t^2 \le R^2 \|u\|^2$
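The separable-case bound can be checked numerically on a toy sequence. The sketch assumes the bound Σ ℓ_t² ≤ R² ‖u‖² from the Passive-Aggressive paper by Crammer et al., which holds when the fixed classifier u has zero hinge loss on every example and ‖x_t‖ ≤ R. The data set and the vector u below are made up for illustration.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def hinge(w: List[float], x: List[float], y: int) -> float:
    return max(0.0, 1.0 - y * dot(w, x))


def run_pa(examples: List[Tuple[List[float], int]]) -> Tuple[List[float], float]:
    """Run the original PA algorithm once over the sequence; return (w, sum of squared losses)."""
    w = [0.0] * len(examples[0][0])
    cumulative = 0.0
    for x, y in examples:
        loss = hinge(w, x, y)
        cumulative += loss ** 2
        if loss > 0.0:
            tau = loss / dot(x, x)
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w, cumulative


u = [1.0, 1.0]  # fixed comparison classifier (illustrative choice)
examples = [
    ([1.0, 1.0], 1), ([-1.0, -1.0], -1), ([2.0, -1.0], 1),
    ([-2.0, 1.0], -1), ([0.0, 1.0], 1), ([0.5, -2.0], -1),
]
R_sq = max(dot(x, x) for x, _ in examples)  # R^2 = max squared instance norm
bound = R_sq * dot(u, u)                    # R^2 * ||u||^2
w_final, cumulative = run_pa(examples)
```

On this sequence the cumulative squared loss stays far below the bound; the bound is a worst-case guarantee, not an estimate.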


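Relative loss bounds (2)

With $\ell_t^* = \ell(u; (x_t, y_t))$ the loss of the fixed classifier u:
● For the original PA algorithm, if $\forall t$: $\|x_t\| = 1$, then $\forall u \in \mathbb{R}^n$:
$\sum_t \ell_t^2 \le \left( \|u\| + 2 \sqrt{\sum_t (\ell_t^*)^2} \right)^2$
● For PA-I, if $\forall t$: $\|x_t\| \le R$, then $\forall u \in \mathbb{R}^n$:
number of prediction mistakes ≤ $\max\{R^2, 1/C\} \left( \|u\|^2 + 2C \sum_t \ell_t^* \right)$
● For PA-II, if $\forall t$: $\|x_t\|^2 \le R^2$, then $\forall u \in \mathbb{R}^n$:
$\sum_t \ell_t^2 \le \left( R^2 + \frac{1}{2C} \right) \left( \|u\|^2 + 2C \sum_t (\ell_t^*)^2 \right)$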


Comparison with the Perceptron Algorithm

● Bounds are comparable both in separable (PA) and non-separable (PA-I, PA-II) cases


Generalization to the non-linear case: Principle

● map the data space into a feature space where the data is now linearly separable
● feature map Φ: χ → H
● replace (w·x) by a Mercer kernel K(w,x) (non-linear function)
● K(w,x) is the inner product of the vectors Φ(w) and Φ(x)
● algorithm learns $w'_t$ (weight vector in feature space H) and predicts $\hat{y}_t = \mathrm{sign}({w'_t}^T \cdot \Phi(x_t))$
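A minimal sketch of a kernelized PA classifier. It relies on the fact that every PA update adds a multiple of Φ(x_t) to the weight vector, so w' can be stored implicitly as a list of coefficients α_i = τ_i y_i and instances x_i, with w'·Φ(x) = Σ α_i K(x_i, x). The class name `KernelPA`, the degree-2 polynomial kernel and the XOR-style data are illustrative assumptions, not taken from the slides.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(a: Vector, b: Vector) -> float:
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return dot(a, b) ** 2


class KernelPA:
    def __init__(self, kernel: Callable[[Vector, Vector], float]) -> None:
        self.kernel = kernel
        self.support: List[Tuple[float, Vector]] = []  # (alpha_i, x_i) pairs

    def score(self, x: Vector) -> float:
        # w' . Phi(x) written with kernel evaluations only
        return sum(alpha * self.kernel(x_i, x) for alpha, x_i in self.support)

    def predict(self, x: Vector) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: Vector, y: int) -> float:
        loss = max(0.0, 1.0 - y * self.score(x))
        k_xx = self.kernel(x, x)          # squared norm of Phi(x)
        if loss > 0.0 and k_xx > 0.0:
            self.support.append((loss / k_xx * y, x))  # store tau_t * y_t
        return loss


# XOR-like labels: not linearly separable in the input space.
xor_data = [([1.0, 1.0], 1), ([-1.0, -1.0], 1), ([1.0, -1.0], -1), ([-1.0, 1.0], -1)]
model = KernelPA(poly2_kernel)
for x, y in xor_data:
    model.update(x, y)
predictions = [model.predict(x) for x, _ in xor_data]
```

The support list grows with every aggressive round, which is the source of the memory requirements that arise when Mercer kernels are used.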


Other problems

● Regression
● Uniclass prediction
● Multiclass problems


Regression

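● instance $x_t \in \mathbb{R}^n$ → prediction $\hat{y}_t = w_t \cdot x_t$
● main difference with the binary problem: $y_t \notin \{-1, 1\}$, $y_t \in \mathbb{R}$
● ε-insensitive hinge loss function:
$\ell_\varepsilon(w; (x, y)) = 0$ if $|w \cdot x - y| \le \varepsilon$, and $|w \cdot x - y| - \varepsilon$ otherwise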


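Regression: PA algorithms

● Initialization: $w_1 = (0, \dots, 0)$
● Update:
– $w_{t+1}$ is the solution of the constrained optimization problem:
$w_{t+1} = \arg\min_{w \in \mathbb{R}^n} \frac{1}{2} \|w - w_t\|^2$ subject to $\ell_\varepsilon(w; (x_t, y_t)) = 0$
– $w_{t+1}$ has the form:
$w_{t+1} = w_t + \mathrm{sign}(y_t - \hat{y}_t) \tau_t x_t$ with $\tau_t = \ell_t / \|x_t\|^2$
● Same loss bounds as for binary classification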
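A minimal sketch of the PA regression update. It assumes the ε-insensitive loss max(0, |w·x − y| − ε) and the closed form w_{t+1} = w_t + sign(y_t − ŷ_t) τ_t x_t with τ_t = ℓ_t / ‖x_t‖², as in the Passive-Aggressive paper by Crammer et al. The function name and the value of ε are illustrative.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def pa_regression_update(
    w: List[float], x: List[float], y: float, eps: float = 0.5
) -> Tuple[List[float], float]:
    """Return (w_next, epsilon-insensitive loss of w on (x, y))."""
    y_hat = dot(w, x)
    loss = max(0.0, abs(y_hat - y) - eps)  # errors up to eps are tolerated
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w), loss               # passive round
    tau = loss / sq_norm
    direction = 1.0 if y > y_hat else -1.0  # sign(y - y_hat)
    return [wi + direction * tau * xi for wi, xi in zip(w, x)], loss


w0 = [0.0, 0.0]
w1, loss1 = pa_regression_update(w0, [1.0, 2.0], 3.0)  # prediction 0, target 3
w2, loss2 = pa_regression_update(w1, [1.0, 2.0], 3.0)  # now exactly eps away
```

After the update the prediction lands on the edge of the ε-tube around the target (here 2.5 for a target of 3.0), not on the target itself, which is the smallest change that removes the loss.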


Uniclass prediction

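● Principle of a round:
– no input $x_t$
– predicts the next element of the sequence to be $w_t$
– receives $y_t$ and suffers loss:
$\ell_\varepsilon(w_t; y_t) = 0$ if $\|w_t - y_t\| \le \varepsilon$, and $\|w_t - y_t\| - \varepsilon$ otherwise
● Equivalent:
– find the center of a ball of radius ε → elements are within a radius of ε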


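Uniclass prediction: PA algorithms

● Update: $w_{t+1}$ is the solution of the optimization problem:
$w_{t+1} = \arg\min_{w \in \mathbb{R}^n} \frac{1}{2} \|w - w_t\|^2$ subject to $\ell_\varepsilon(w; y_t) = 0$
● $w_{t+1}$ has the form:
$w_{t+1} = w_t + \tau_t \frac{y_t - w_t}{\|y_t - w_t\|}$ with $\tau_t = \ell_t$ (PA), $\tau_t = \min\{C, \ell_t\}$ (PA-I), $\tau_t = \ell_t / (1 + \frac{1}{2C})$ (PA-II)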
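A minimal sketch of the uniclass PA update, assuming the loss max(0, ‖w − y‖ − ε) and the step τ_t = ℓ_t of the original PA variant: the center w moves straight toward y_t until y_t lies on the surface of the ball of radius ε. The function name and the numbers are illustrative.

```python
import math
from typing import List, Tuple


def uniclass_update(w: List[float], y: List[float], eps: float) -> Tuple[List[float], float]:
    """Return (w_next, loss), where loss = max(0, ||w - y|| - eps)."""
    dist = math.sqrt(sum((yi - wi) ** 2 for wi, yi in zip(w, y)))
    loss = max(0.0, dist - eps)
    if loss == 0.0:
        return list(w), loss  # y already lies inside the ball around w
    tau = loss                # original PA step size for the uniclass problem
    # Move a distance tau along the unit vector pointing from w to y.
    return [wi + tau * (yi - wi) / dist for wi, yi in zip(w, y)], loss


w0 = [0.0, 0.0]
w1, loss1 = uniclass_update(w0, [3.0, 4.0], eps=1.0)  # distance 5, so loss 4
w2, loss2 = uniclass_update(w1, [3.0, 4.0], eps=1.0)  # now at distance eps
```

Here the center moves from the origin to about (2.4, 3.2), which is at distance 1 from the observed point.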


Multiclass multilabel classification

● Principle:
– set of all possible labels: $Y = \{1, \dots, k\}$
– receives instance $x_t$ (associated with relevant labels)
– outputs a score for each of the k labels
  ● prediction vector $\in \mathbb{R}^k$
– receives the set of "relevant" labels $Y_t$ for $x_t$
  ● "relevant" labels must be ranked higher than "irrelevant" ones
– updates the prediction mechanism


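Multiclass multilabel: Problem settings

● feature vector: $\Phi(x, y) = (\Phi_1(x, y), \dots, \Phi_d(x, y))$ (set of features: $\Phi_1, \dots, \Phi_d$)
● Prediction vector: $w_t \in \mathbb{R}^d$
● Margin of the example $(x_t, Y_t)$:
$\gamma(w_t; (x_t, Y_t)) = \min_{r \in Y_t} w_t \cdot \Phi(x_t, r) - \max_{s \notin Y_t} w_t \cdot \Phi(x_t, s)$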


Multiclass multilabel: Problem settings (2)

● Margin: difference between
– score of the lowest ranked relevant label
– score of the highest ranked irrelevant label
● Hinge-loss function:
$\ell(w; (x, Y)) = 0$ if $\gamma(w; (x, Y)) \ge 1$, and $1 - \gamma(w; (x, Y))$ otherwise


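Multiclass multilabel: PA algorithms

With $r_t$ the lowest ranked relevant label and $s_t$ the highest ranked irrelevant label under $w_t$:
● $w_{t+1}$ is the solution of the optimization problem:
$w_{t+1} = \arg\min_{w \in \mathbb{R}^d} \frac{1}{2} \|w - w_t\|^2$ subject to $w \cdot \Phi(x_t, r_t) - w \cdot \Phi(x_t, s_t) \ge 1$
● $w_{t+1}$ has the form:
$w_{t+1} = w_t + \tau_t (\Phi(x_t, r_t) - \Phi(x_t, s_t))$ with $\tau_t = \ell_t / \|\Phi(x_t, r_t) - \Phi(x_t, s_t)\|^2$
● Equivalence: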
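A minimal sketch of the multiclass PA update, under one specific assumption about the feature map: Φ(x, y) copies x into the block of coordinates that belongs to label y, so the single vector w is equivalent to one weight vector per label and ‖Φ(x, r) − Φ(x, s)‖² = 2‖x‖². This construction, the 0-based label indices and the function name are illustrative choices; `relevant` must be a non-empty proper subset of the labels.

```python
from typing import List, Set, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def multiclass_pa_update(
    W: List[Vector], x: Vector, relevant: Set[int]
) -> Tuple[List[Vector], int, int]:
    """W[i] is the weight vector of label i. Returns (W_next, r_t, s_t)."""
    scores = [dot(w_i, x) for w_i in W]
    irrelevant = [i for i in range(len(W)) if i not in relevant]
    r = min(sorted(relevant), key=lambda i: scores[i])  # lowest ranked relevant label
    s = max(irrelevant, key=lambda i: scores[i])        # highest ranked irrelevant label
    loss = max(0.0, 1.0 - (scores[r] - scores[s]))      # hinge loss on the margin
    sq_norm = 2.0 * dot(x, x)  # ||Phi(x, r) - Phi(x, s)||^2 for the block feature map
    W_next = [list(w_i) for w_i in W]
    if loss > 0.0 and sq_norm > 0.0:
        tau = loss / sq_norm
        W_next[r] = [wi + tau * xi for wi, xi in zip(W[r], x)]  # push relevant label up
        W_next[s] = [wi - tau * xi for wi, xi in zip(W[s], x)]  # push irrelevant label down
    return W_next, r, s


W0 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # k = 3 labels, n = 2 features
x = [1.0, 1.0]
W1, r_t, s_t = multiclass_pa_update(W0, x, relevant={0})
pair_margin = dot(W1[r_t], x) - dot(W1[s_t], x)
```

Only the pair (r_t, s_t) is constrained, so another irrelevant label can still end up within margin 1 of the relevant one after the update (label 2 in this example); later rounds correct that.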


Experiments

1. Robustness to noise
2. Effect of the aggressiveness parameter C
3. Multiclass problems: comparison with other online algorithms


Experiment 1: Robustness to noise

● Binary classification, 4000 generated examples (results averaged over 10 repetitions)
● Instance noise and label noise
● Find optimal fixed linear classifier (brute force)
● C = 0.001
→ Low noise level: the 3 algorithms make a similar number of errors
→ High noise level: PA-I and PA-II outperform PA


Experiment 2: Effect of C

C: "aggressiveness parameter"

● Rule: when there is noise in the data, C should be small
● Results meet the theoretical loss bounds


Experiment 2: Effect of C (2)

● Evolution of the error rate with the number of examples


Experiment 3: Multiclass problems

● Use standard multiclass datasets: USPS, MNIST
● Comparison of the multiclass PA algorithms with:
– multiclass versions of the Perceptron algorithm
– MIRA (Margin Infused Relaxed Algorithm)
● PA-I and MIRA are comparable
● but MIRA solves a complex optimization problem for each update, whereas the PA update is a simple expression


Conclusion

● Further research:
– extension to other problems
– conversion to batch algorithms
– PA with bounded memory constraints (memory requirements imposed when using Mercer kernels)


References

● Crammer, Koby; Dekel, Ofer; Keshet, Joseph; Shalev-Shwartz, Shai; Singer, Yoram. "Online Passive-Aggressive Algorithms". Jerusalem, 2006 <http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf> (20 Oct. 2007)
● Schiele, Bernt. "Maschinelles Lernen - Statistische Verfahren". Darmstadt: Technische Universität Darmstadt, 18 May 2007 <http://www.mis.informatik.tu-darmstadt.de/Education/Courses/ml/slides/ml-2007-0518-svm2-v1.pdf> (16 Nov. 2007)
● Rojas, Raul. "Perceptron Learning". Neural Networks - A Systematic Introduction. Springer-Verlag, Berlin, 1996 <http://page.mi.fu-berlin.de/rojas/neural/chapter/K4.pdf> (22 Nov. 2007)
● Rodriguez, Carlos C. "The Kernel Trick". October 25, 2004 <omega.albany.edu:8008/machine-learning-dir/notes-dir/ker1/ker1.pdf> (26 Nov. 2007)


Thank you for your attention

Questions?