online passive-aggressive algorithms

38
Online Passive-Aggressive Algorithms Tirgul 11

Upload: others

Post on 07-Jan-2022

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms

Tirgul 11

Page 2: Online Passive-Aggressive Algorithms

Multi-Label Classification

2

Page 3: Online Passive-Aggressive Algorithms

Multilabel Problem: Example

โ€ข Mapping Apps to smart folders:

3

Assign an installed app to one or more folders Candy Crush

Saga

Page 4: Online Passive-Aggressive Algorithms

Goal

โ€ข Given ๐ด, an installed app:

โ€ข Assign it to ๐’š โŠ† ๐’ด the set of relevant folders:

โ€ข Where ๐’ด is the set of all folders:

4

Farm Story

SocialGames

GamesSocial Music Photography Shopping

Page 5: Online Passive-Aggressive Algorithms

Multilabel Classification

โ€ข A variant of the classification problem:โ€ข Multiple target labels must by assigned to each instance.

โ€ข Setting:โ€ข There are ๐‘˜ different possible labels: ๐’ด = 1,โ€ฆ , ๐‘˜ .

โ€ข Every instance ๐’™๐‘– is associated with a set of relevant labels ๐’š๐‘–.

โ€ข Special case: โ€ข There is a single relevant label for each instance: multiclass (single-label) classification.

5

Page 6: Online Passive-Aggressive Algorithms

Multilabel Classification

โ€ข Other usage:โ€ข Text categorization:

โ€ข ๐’™๐‘– represents a document.

โ€ข ๐’š๐‘– is the set of topics which are relevant to the documentโ€ข Chosen from a predefined collection of topics.

โ€ข E.g.: A text might be about any of religion, politics, finance or education at the same time or none of these.

6

Page 7: Online Passive-Aggressive Algorithms

Task

7

Assign userโ€™s installed applications to one ore

more smart folders in an automatic way

Page 8: Online Passive-Aggressive Algorithms

Task

โ€ข Training set of examples:

โ€ข Each example (๐ด๐‘– , ๐’š๐‘–) contains an application and a set of one or more smart folders:

8

Farm Story SocialGames

Page 9: Online Passive-Aggressive Algorithms

Model

9

Find a function ๐’š = ๐‘“๐’˜ ๐ด with parameters ๐’˜

that assigns a set of folders ๐’š to an installed

app ๐ด.

Page 10: Online Passive-Aggressive Algorithms

Model and Inference

10

weight vectorsfeature maps

๐‘ฆ = ๐‘“๐‘ค ๐ด = ๐‘Ž๐‘Ÿ๐‘”๐‘ ๐‘œ๐‘Ÿ๐‘ก๐’š๐’˜๐œ™(๐ด, ๐’š)

Page 11: Online Passive-Aggressive Algorithms

Feature maps

TF-IDF of title

Google Playโ€™s category

TF-IDF of description

Representation of

related apps

Page 12: Online Passive-Aggressive Algorithms

Inference

โ€ข Algorithmโ€™s output upon receiving an instance ๐’™๐‘–:โ€ข A score for each of the ๐‘˜ labels in ๐’ด.

12

GamesSocial Music Photography Shopping

๐’ด =

0.6, 0.3, 0.7, 0.1, 0.01

๐’™๐‘– =Farm Story

๐‘ฆ๐‘– = ๐‘Ž๐‘Ÿ๐‘”๐‘ ๐‘œ๐‘Ÿ๐‘ก๐’š๐’˜๐œ™(๐ด๐‘– , ๐’š)

Page 13: Online Passive-Aggressive Algorithms

Inference

โ€ข The algorithmโ€™s prediction is a vector in โ„๐‘˜ where each element in the vector corresponds to the score assigned to the respective label.

โ€ข This form of prediction is often referred to as label ranking.

13

GamesSocial Music Photography Shopping

๐’ด =

0.6, 0.3, 0.7, 0.1, 0.01

๐‘ฆ = ๐‘Ž๐‘Ÿ๐‘”๐‘ ๐‘œ๐‘Ÿ๐‘ก๐’š๐’˜๐œ™(๐ด, ๐’š)

โˆˆ โ„๐‘˜

Page 14: Online Passive-Aggressive Algorithms

Inference

โ€ข For a pair of labels ๐‘Ÿ, ๐‘  โˆˆ ๐’ด:โ€ข If score ๐‘Ÿ > ๐‘ ๐‘๐‘œ๐‘Ÿ๐‘’(๐‘ ):

โ€ข Label ๐‘Ÿ is ranked higher than label ๐‘ .

โ€ข Goal of Algorithm:โ€ข Rank every relevant label above every

irrelevant label.

14

Social

Music

Games

Photography

Shopping

0.6

0.3

Page 15: Online Passive-Aggressive Algorithms

The Margin

โ€ข Example: ๐ด๐‘– , ๐’š๐‘– = , .

โ€ข After making predictions ๐’š๐’Š, the algorithm receives the correct set ๐’š๐‘– .

1. Find the least probable correct folder:

2. Find the most probable wrong folder:

15

Games

Music

Social

Photography

Shopping

Games Social

Games

Music

max

Page 16: Online Passive-Aggressive Algorithms

Iterate over examples

โ€ข Example: ๐ด๐‘– , ๐’š๐‘– = , .

โ€ข After making predictions ๐’š๐’Š, the algorithm receives the correct set ๐’š๐‘– .

โ€ข Update:

16

Games

Music

Social

Photography

Shopping

Games Social

max

Page 17: Online Passive-Aggressive Algorithms

The Margin

โ€ข We define the margin attained by the algorithm on round ๐‘– for example ๐’™๐‘– , ๐’š๐‘– ,

17

๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– = min๐‘Ÿโˆˆ๐‘ฆ๐‘–

๐’˜๐‘– ๐œ™ ๐’™๐‘– , ๐‘Ÿ โˆ’ max๐‘  โˆ‰๐‘ฆ๐‘–

๐’˜๐‘–๐œ™ ๐’™๐‘– , ๐‘  .Games

Music

Social

Photography

Shopping

Page 18: Online Passive-Aggressive Algorithms

The Margin

โ€ข The margin is positive if all relevant labels are ranked higher than all irrelevant labels.

โ€ข We are not satisfied with only a positive margin; we require the margin of every prediction to be at least 1.

18

๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– = min๐‘Ÿโˆˆ๐‘ฆ๐‘–

๐’˜๐‘– ๐œ™ ๐’™๐‘– , ๐‘Ÿ โˆ’ max๐‘  โˆ‰๐‘ฆ๐‘–

๐’˜๐‘–๐œ™ ๐’™๐‘– , ๐‘  .

Page 19: Online Passive-Aggressive Algorithms

The Loss

โ€ข Define a hinge loss:

19

๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– = min๐‘Ÿโˆˆ๐‘ฆ๐‘–

๐’˜๐‘– ๐œ™ ๐’™๐‘– , ๐‘Ÿ โˆ’ max๐‘  โˆ‰๐‘ฆ๐‘–

๐’˜๐‘–๐œ™ ๐’™๐‘– , ๐‘  .

โ„“ ๐‘ค๐‘– , ๐‘ฅ๐‘– , ๐‘ฆ๐‘– = 0 ๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– โ‰ฅ 1

1 โˆ’ ๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– ๐‘œ๐‘กโ„Ž๐‘’๐‘Ÿ๐‘ค๐‘–๐‘ ๐‘’

๐• min๐‘Ÿโˆˆ๐‘ฆ๐‘–

๐’˜๐‘– ๐œ™ ๐’™๐‘–,๐‘Ÿ โˆ’max๐‘  โˆ‰๐‘ฆ๐‘–

๐’˜๐‘–๐œ™ ๐‘ฅ๐‘–,๐‘  <0

Page 20: Online Passive-Aggressive Algorithms

The Loss

โ€ข Could also be written as follows:

โ€ข where ๐‘Ž + = max(0, ๐‘Ž)

20

โ„“ ๐‘ค๐‘– , ๐‘ฅ๐‘– , ๐‘ฆ๐‘– = 1 โˆ’ ๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– +

โ„“ ๐‘ค๐‘– , ๐‘ฅ๐‘– , ๐‘ฆ๐‘– = 0 ๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– โ‰ฅ 1

1 โˆ’ ๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– ๐‘œ๐‘กโ„Ž๐‘’๐‘Ÿ๐‘ค๐‘–๐‘ ๐‘’

Page 21: Online Passive-Aggressive Algorithms

Learning

โ€ข First approach:

โ€ข Goal :

21

Page 22: Online Passive-Aggressive Algorithms

Multilabel PA Optimization Problem

โ€ข An alternative approach:

22

๐‘Ÿ๐‘– = ๐‘Ž๐‘Ÿ๐‘”๐‘š๐‘–๐‘›๐‘Ÿโˆˆ๐‘ฆ๐‘–๐’˜๐‘–๐œ™(๐’™๐‘– , ๐‘Ÿ) ๐‘ ๐‘– = ๐‘Ž๐‘Ÿ๐‘”๐‘š๐‘Ž๐‘ฅ๐‘ โˆ‰๐‘ฆ๐‘–๐’˜๐‘–๐œ™(๐’™๐‘– , ๐‘ )

๐’˜๐‘–+1 = ๐‘Ž๐‘Ÿ๐‘”๐‘š๐‘–๐‘›๐‘ค1

2๐’˜ โˆ’๐’˜๐‘–

2

๐‘ . ๐‘ก. โ„“ ๐’˜, ๐’™๐‘– , ๐’š๐‘– = 1 โˆ’ ๐›พ ๐’˜, ๐’™๐‘–, ๐‘ฆ๐‘–+= 0

i.e., the margin is greater than 1.

๐›พ ๐’˜๐‘– , ๐’™๐‘– , ๐‘ฆ๐‘– = ๐’˜๐‘– ๐œ™ ๐’™๐‘– , ๐‘Ÿ๐‘– โˆ’๐’˜๐‘–๐œ™ ๐’™๐‘– , ๐‘ ๐‘–

Page 23: Online Passive-Aggressive Algorithms

Passive Aggressive (PA)

โ€ข The algorithm is passive whenever the hinge-loss is zero:โ€ข โ„“๐‘– = 0๐‘ค๐‘–+1 = ๐‘ค๐‘–

โ€ข When the loss is positive, the algorithm aggressively forces ๐‘ค๐‘–+1 to satisfy the constraint โ„“ ๐’˜๐‘– , ๐’™๐‘– , ๐’š๐‘– = 0, regardless of the step size required.

23

Page 24: Online Passive-Aggressive Algorithms

Passive Aggressive

โ€ข Solving with Lagrange Multipliers, we get the following update rule:

24

๐‘ค๐‘–+1 = ๐‘ค๐‘– + ๐œ๐‘– ๐œ™ ๐‘ฅ๐‘– , ๐‘Ÿ๐‘– โˆ’ ๐œ™ ๐‘ฅ๐‘– , ๐‘ ๐‘–

๐œ๐‘– =โ„“๐‘–

๐œ™ ๐‘ฅ๐‘– , ๐‘Ÿ๐‘– โˆ’ ๐œ™ ๐‘ฅ๐‘– , ๐‘ ๐‘–2

Page 25: Online Passive-Aggressive Algorithms

Passive Aggressive

โ€ข The updated vector ๐‘ค๐‘–+1 will classify example ๐‘ฅ๐‘– with โ„“ ๐’˜, ๐’™๐‘– , ๐’š๐‘– =0.

25

Page 26: Online Passive-Aggressive Algorithms

Ranking

26

Page 27: Online Passive-Aggressive Algorithms

Ranking Problem: Example

โ€ข A Prediction Bar:

27

Predict the apps the user is most likely to use at any given time and location.

Page 28: Online Passive-Aggressive Algorithms

the most likely app

the 4th likely app

...

assume the user is at a context

...

at the office

12:47

today is Wednesday

Page 29: Online Passive-Aggressive Algorithms

Goal

29

Find a function ๐‘“ that gets as input the

current context ๐’™ and predicts the apps the

user is likely to click at a given context.

Page 30: Online Passive-Aggressive Algorithms

Features

time of day

day of week

location

Page 31: Online Passive-Aggressive Algorithms

Discriminative Model

โ€ข Prediction:

31

Parameters ๐’˜๐ด โˆˆ โ„๐‘‘

of app A Context๐’™ โˆˆ โ„๐‘‘

...

Page 32: Online Passive-Aggressive Algorithms

New Evaluation

โ€ข The performance of the prediction system is measured by Receiver operating Characteristics (ROC) curve.

true positive rate=

app was in the prediction bar and was clicked

total contexts where was clicked

false positive rate=

app was in the prediction bar and wasnโ€™t clicked

total contexts where wasnโ€™t clicked

Page 33: Online Passive-Aggressive Algorithms

Maximizing AUC

โ€ข By definition of the AUC (Bamber, 1975; Hanley and McNeil,1982):

33

๐ด๐‘ˆ๐ถ

Page 34: Online Passive-Aggressive Algorithms

Pairwise Dataset

โ€ข For an application define two sets of context:

34Waze Waze

Page 35: Online Passive-Aggressive Algorithms

Maximizing AUC

โ€ข By definition of the AUC (Bamber, 1975; Hanley and McNeil,1982):

35

Page 36: Online Passive-Aggressive Algorithms
Page 37: Online Passive-Aggressive Algorithms

Solve using PA

โ€ข Set the next ๐‘ค๐‘– to be the minimizer of the following optimization problem

37

min๐‘คโˆˆโ„๐‘‘,๐œ‰โ‰ฅ0

1

2๐‘ค โˆ’ ๐‘ค๐‘–โˆ’1

2

๐‘ . ๐‘ก. ๐‘“๐‘ค ๐‘ฅ๐‘–+ , ๐ด๐‘– โˆ’ ๐‘“๐‘ค ๐‘ฅ๐‘–

โˆ’, ๐ด๐‘– โ‰ฅ 1

Page 38: Online Passive-Aggressive Algorithms

Implementation

โ€ข Online algorithm to solve the optimization problem efficiency on huge data.

โ€ข Theoretical guarantees of convergence.

38