online passive-aggressive algorithms
TRANSCRIPT
Online Passive-Aggressive Algorithms
Tirgul 11
Multi-Label Classification
2
Multilabel Problem: Example
โข Mapping Apps to smart folders:
3
Assign an installed app to one or more folders Candy Crush
Saga
Goal
โข Given ๐ด, an installed app:
โข Assign it to ๐ โ ๐ด the set of relevant folders:
โข Where ๐ด is the set of all folders:
4
Farm Story
SocialGames
GamesSocial Music Photography Shopping
Multilabel Classification
โข A variant of the classification problem:โข Multiple target labels must by assigned to each instance.
โข Setting:โข There are ๐ different possible labels: ๐ด = 1,โฆ , ๐ .
โข Every instance ๐๐ is associated with a set of relevant labels ๐๐.
โข Special case: โข There is a single relevant label for each instance: multiclass (single-label) classification.
5
Multilabel Classification
โข Other usage:โข Text categorization:
โข ๐๐ represents a document.
โข ๐๐ is the set of topics which are relevant to the documentโข Chosen from a predefined collection of topics.
โข E.g.: A text might be about any of religion, politics, finance or education at the same time or none of these.
6
Task
7
Assign userโs installed applications to one ore
more smart folders in an automatic way
Task
โข Training set of examples:
โข Each example (๐ด๐ , ๐๐) contains an application and a set of one or more smart folders:
8
Farm Story SocialGames
Model
9
Find a function ๐ = ๐๐ ๐ด with parameters ๐
that assigns a set of folders ๐ to an installed
app ๐ด.
Model and Inference
10
weight vectorsfeature maps
๐ฆ = ๐๐ค ๐ด = ๐๐๐๐ ๐๐๐ก๐๐๐(๐ด, ๐)
Feature maps
TF-IDF of title
Google Playโs category
TF-IDF of description
Representation of
related apps
Inference
โข Algorithmโs output upon receiving an instance ๐๐:โข A score for each of the ๐ labels in ๐ด.
12
GamesSocial Music Photography Shopping
๐ด =
0.6, 0.3, 0.7, 0.1, 0.01
๐๐ =Farm Story
๐ฆ๐ = ๐๐๐๐ ๐๐๐ก๐๐๐(๐ด๐ , ๐)
Inference
โข The algorithmโs prediction is a vector in โ๐ where each element in the vector corresponds to the score assigned to the respective label.
โข This form of prediction is often referred to as label ranking.
13
GamesSocial Music Photography Shopping
๐ด =
0.6, 0.3, 0.7, 0.1, 0.01
๐ฆ = ๐๐๐๐ ๐๐๐ก๐๐๐(๐ด, ๐)
โ โ๐
Inference
โข For a pair of labels ๐, ๐ โ ๐ด:โข If score ๐ > ๐ ๐๐๐๐(๐ ):
โข Label ๐ is ranked higher than label ๐ .
โข Goal of Algorithm:โข Rank every relevant label above every
irrelevant label.
14
Social
Music
Games
Photography
Shopping
0.6
0.3
The Margin
โข Example: ๐ด๐ , ๐๐ = , .
โข After making predictions ๐๐, the algorithm receives the correct set ๐๐ .
1. Find the least probable correct folder:
2. Find the most probable wrong folder:
15
Games
Music
Social
Photography
Shopping
Games Social
Games
Music
max
Iterate over examples
โข Example: ๐ด๐ , ๐๐ = , .
โข After making predictions ๐๐, the algorithm receives the correct set ๐๐ .
โข Update:
16
Games
Music
Social
Photography
Shopping
Games Social
max
The Margin
โข We define the margin attained by the algorithm on round ๐ for example ๐๐ , ๐๐ ,
17
๐พ ๐๐ , ๐๐ , ๐ฆ๐ = min๐โ๐ฆ๐
๐๐ ๐ ๐๐ , ๐ โ max๐ โ๐ฆ๐
๐๐๐ ๐๐ , ๐ .Games
Music
Social
Photography
Shopping
The Margin
โข The margin is positive if all relevant labels are ranked higher than all irrelevant labels.
โข We are not satisfied with only a positive margin; we require the margin of every prediction to be at least 1.
18
๐พ ๐๐ , ๐๐ , ๐ฆ๐ = min๐โ๐ฆ๐
๐๐ ๐ ๐๐ , ๐ โ max๐ โ๐ฆ๐
๐๐๐ ๐๐ , ๐ .
The Loss
โข Define a hinge loss:
19
๐พ ๐๐ , ๐๐ , ๐ฆ๐ = min๐โ๐ฆ๐
๐๐ ๐ ๐๐ , ๐ โ max๐ โ๐ฆ๐
๐๐๐ ๐๐ , ๐ .
โ ๐ค๐ , ๐ฅ๐ , ๐ฆ๐ = 0 ๐พ ๐๐ , ๐๐ , ๐ฆ๐ โฅ 1
1 โ ๐พ ๐๐ , ๐๐ , ๐ฆ๐ ๐๐กโ๐๐๐ค๐๐ ๐
๐ min๐โ๐ฆ๐
๐๐ ๐ ๐๐,๐ โmax๐ โ๐ฆ๐
๐๐๐ ๐ฅ๐,๐ <0
The Loss
โข Could also be written as follows:
โข where ๐ + = max(0, ๐)
20
โ ๐ค๐ , ๐ฅ๐ , ๐ฆ๐ = 1 โ ๐พ ๐๐ , ๐๐ , ๐ฆ๐ +
โ ๐ค๐ , ๐ฅ๐ , ๐ฆ๐ = 0 ๐พ ๐๐ , ๐๐ , ๐ฆ๐ โฅ 1
1 โ ๐พ ๐๐ , ๐๐ , ๐ฆ๐ ๐๐กโ๐๐๐ค๐๐ ๐
Learning
โข First approach:
โข Goal :
21
Multilabel PA Optimization Problem
โข An alternative approach:
22
๐๐ = ๐๐๐๐๐๐๐โ๐ฆ๐๐๐๐(๐๐ , ๐) ๐ ๐ = ๐๐๐๐๐๐ฅ๐ โ๐ฆ๐๐๐๐(๐๐ , ๐ )
๐๐+1 = ๐๐๐๐๐๐๐ค1
2๐ โ๐๐
2
๐ . ๐ก. โ ๐, ๐๐ , ๐๐ = 1 โ ๐พ ๐, ๐๐, ๐ฆ๐+= 0
i.e., the margin is greater than 1.
๐พ ๐๐ , ๐๐ , ๐ฆ๐ = ๐๐ ๐ ๐๐ , ๐๐ โ๐๐๐ ๐๐ , ๐ ๐
Passive Aggressive (PA)
โข The algorithm is passive whenever the hinge-loss is zero:โข โ๐ = 0๐ค๐+1 = ๐ค๐
โข When the loss is positive, the algorithm aggressively forces ๐ค๐+1 to satisfy the constraint โ ๐๐ , ๐๐ , ๐๐ = 0, regardless of the step size required.
23
Passive Aggressive
โข Solving with Lagrange Multipliers, we get the following update rule:
24
๐ค๐+1 = ๐ค๐ + ๐๐ ๐ ๐ฅ๐ , ๐๐ โ ๐ ๐ฅ๐ , ๐ ๐
๐๐ =โ๐
๐ ๐ฅ๐ , ๐๐ โ ๐ ๐ฅ๐ , ๐ ๐2
Passive Aggressive
โข The updated vector ๐ค๐+1 will classify example ๐ฅ๐ with โ ๐, ๐๐ , ๐๐ =0.
25
Ranking
26
Ranking Problem: Example
โข A Prediction Bar:
27
Predict the apps the user is most likely to use at any given time and location.
the most likely app
the 4th likely app
...
assume the user is at a context
...
at the office
12:47
today is Wednesday
Goal
29
Find a function ๐ that gets as input the
current context ๐ and predicts the apps the
user is likely to click at a given context.
Features
time of day
day of week
location
Discriminative Model
โข Prediction:
31
Parameters ๐๐ด โ โ๐
of app A Context๐ โ โ๐
...
New Evaluation
โข The performance of the prediction system is measured by Receiver operating Characteristics (ROC) curve.
true positive rate=
app was in the prediction bar and was clicked
total contexts where was clicked
false positive rate=
app was in the prediction bar and wasnโt clicked
total contexts where wasnโt clicked
Maximizing AUC
โข By definition of the AUC (Bamber, 1975; Hanley and McNeil,1982):
33
๐ด๐๐ถ
Pairwise Dataset
โข For an application define two sets of context:
34Waze Waze
Maximizing AUC
โข By definition of the AUC (Bamber, 1975; Hanley and McNeil,1982):
35
Solve using PA
โข Set the next ๐ค๐ to be the minimizer of the following optimization problem
37
min๐คโโ๐,๐โฅ0
1
2๐ค โ ๐ค๐โ1
2
๐ . ๐ก. ๐๐ค ๐ฅ๐+ , ๐ด๐ โ ๐๐ค ๐ฅ๐
โ, ๐ด๐ โฅ 1
Implementation
โข Online algorithm to solve the optimization problem efficiency on huge data.
โข Theoretical guarantees of convergence.
38