Ranking with High-Order and Missing Information

M. Pawan Kumar, Ecole Centrale Paris

Aseem Behl Puneet Dokania Pritish Mohapatra C. V. Jawahar

“Jumping” Classification

PASCAL VOC

Features

Processing

Training

Classifier

Think of a classifier !!!

“Jumping” Classification ✗

Think of a classifier !!! ✗

“Jumping” Ranking

Ranking vs. Classification

[Figure: six images at Rank 1 to Rank 6, shown for several rankings. Values reported on the slide: Average Precision = 1, 0.92, 0.81; Accuracy = 1, 0.67.]

Ranking vs. Classification

Ranking is not the same as classification

Average precision is not the same as accuracy

Should we use 0-1 loss based classifiers?

Or should we use AP loss based rankers?
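The gap between the two measures can be checked on a toy example. The sketch below assumes six samples (three positive, three negative) and a classifier that labels the top three ranks as positive. The helper names `average_precision` and `top_k_accuracy` and the three rankings are illustrative assumptions, not taken from the slides.

```python
def average_precision(ranked_labels):
    """AP of a ranking given as 1 (positive) / 0 (negative), best rank first."""
    hits = 0
    precisions = []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive
    return sum(precisions) / len(precisions)


def top_k_accuracy(ranked_labels, k):
    """Accuracy when the top-k ranked samples are predicted positive."""
    predictions = [1] * k + [0] * (len(ranked_labels) - k)
    correct = sum(p == t for p, t in zip(predictions, ranked_labels))
    return correct / len(ranked_labels)


# Hypothetical rankings of three positives and three negatives.
rankings = {
    "perfect": [1, 1, 1, 0, 0, 0],
    "negative at rank 3": [1, 1, 0, 1, 0, 0],
    "negative at rank 2": [1, 0, 1, 1, 0, 0],
}
for name, labels in rankings.items():
    ap = average_precision(labels)
    acc = top_k_accuracy(labels, 3)
    print(f"{name}: AP = {ap:.2f}, accuracy = {acc:.2f}")
```

The last two rankings have the same accuracy but different AP, because AP penalizes a mistake more heavily the nearer it is to the top of the list.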

Outline

• Optimizing Average Precision (AP-SVM)

• High-Order Information

• Missing Information

Yue, Finley, Radlinski and Joachims, SIGIR 2007

Problem Formulation: Single Input X

$\Phi(x_i)$ for all $i \in P$

$\Phi(x_k)$ for all $k \in N$

Problem Formulation: Single Output R

$R_{ik} = +1$ if $i$ is better ranked than $k$

$R_{ik} = -1$ if $k$ is better ranked than $i$

Problem Formulation: Scoring Function

$s_i(w) = w^\top \Phi(x_i)$ for all $i \in P$

$s_k(w) = w^\top \Phi(x_k)$ for all $k \in N$

$S(X,R;w) = \sum_{i \in P} \sum_{k \in N} R_{ik} \, (s_i(w) - s_k(w))$

Ranking at Test-Time

$R(w) = \arg\max_R S(X,R;w)$

Sort samples according to individual scores $s_i(w)$

x1 x2 x3 x4 x5 x6 x7 x8
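A minimal sketch of the test-time rule, assuming five samples with made-up scores. It sorts by score and then confirms by exhaustive search that no other ordering gives a larger joint score. The names `joint_score`, `scores`, `positives` and `negatives` are hypothetical, and the exhaustive check is only feasible for tiny inputs.

```python
from itertools import permutations


def joint_score(order, scores, positives, negatives):
    """S(X,R;w): sum over positive i and negative k of R_ik * (s_i - s_k)."""
    position = {sample: rank for rank, sample in enumerate(order)}
    total = 0.0
    for i in positives:
        for k in negatives:
            r_ik = 1 if position[i] < position[k] else -1
            total += r_ik * (scores[i] - scores[k])
    return total


# Hypothetical scores s_j(w) = w^T Phi(x_j) for five samples.
scores = [0.9, -0.3, 0.4, 0.1, -0.8]
positives = [0, 2]     # indices of the samples in P
negatives = [1, 3, 4]  # indices of the samples in N

# Test-time prediction: sort by score, highest first.
sorted_order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
sorted_value = joint_score(sorted_order, scores, positives, negatives)

# Exhaustive maximum of S over all orderings, used only as a check.
best_value = max(
    joint_score(order, scores, positives, negatives)
    for order in permutations(range(len(scores)))
)
print(sorted_order, sorted_value, best_value)
```

Sorting works because each term $R_{ik}(s_i(w) - s_k(w))$ is largest when the higher-scoring sample of the pair is ranked first, and a single sorted order satisfies this for every pair at once.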

Learning Formulation: Loss Function

$\Delta(R^*,R(w)) = 1 - \text{AP of ranking } R(w)$

Non-convex

Parameter cannot be regularized

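Learning Formulation: Upper Bound of Loss Function

\begin{align*}
\Delta(R^*,R(w)) &= S(X,R(w);w) + \Delta(R^*,R(w)) - S(X,R(w);w) \\
&\le S(X,R(w);w) + \Delta(R^*,R(w)) - S(X,R^*;w) \\
&\le \max_R \left[ S(X,R;w) + \Delta(R^*,R) \right] - S(X,R^*;w)
\end{align*}

The first inequality holds because $R(w)$ maximizes $S(X,R;w)$, so $S(X,R(w);w) \ge S(X,R^*;w)$.

Convex

Parameter can be regularized

$\min_w \|w\|^2 + C\xi$

subject to $S(X,R;w) + \Delta(R^*,R) - S(X,R^*;w) \le \xi$ for all $R$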
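The bound can be checked numerically on a tiny problem. The sketch below assumes five samples with made-up, deliberately imperfect scores and finds the most violated ranking by exhaustive search. It is a sanity check under those assumptions, not the efficient cutting-plane routine, and the names `ap_loss` and `joint_score` are hypothetical.

```python
from itertools import permutations


def ap_loss(order, positives):
    """Delta = 1 - AP of the ranking `order` (best rank first)."""
    hits, precision_sum = 0, 0.0
    for rank, sample in enumerate(order, start=1):
        if sample in positives:
            hits += 1
            precision_sum += hits / rank
    return 1.0 - precision_sum / len(positives)


def joint_score(order, scores, positives, negatives):
    """S(X,R;w): sum over positive i and negative k of R_ik * (s_i - s_k)."""
    position = {sample: rank for rank, sample in enumerate(order)}
    return sum(
        (1 if position[i] < position[k] else -1) * (scores[i] - scores[k])
        for i in positives
        for k in negatives
    )


scores = [0.2, 0.5, -0.1, 0.7, 0.0]  # hypothetical s_j(w); one negative scores highest
positives = [0, 1]
negatives = [2, 3, 4]
n = len(scores)

predicted = sorted(range(n), key=lambda j: scores[j], reverse=True)  # R(w)
ground_truth = positives + negatives  # one ranking R* with AP = 1

loss = ap_loss(predicted, positives)
# Smallest slack xi that satisfies the constraint for every ranking R.
slack = max(
    joint_score(r, scores, positives, negatives) + ap_loss(r, positives)
    for r in permutations(range(n))
) - joint_score(ground_truth, scores, positives, negatives)
print(f"AP loss = {loss:.4f}, upper bound = {slack:.4f}")
```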

Optimization for Learning: Cutting Plane Computation

$\max_R S(X,R;w) + \Delta(R^*,R)$

x1 x2 x3 x4 x5 x6 x7 x8

Sort positive samples according to scores $s_i(w)$

Sort negative samples according to scores $s_k(w)$

Find best rank of each negative sample independently

Optimization for Learning: Cutting Plane Computation

[Figure: training time for 0-1 loss and AP loss. Annotations on the AP bars: “5x slower” and “Slightly faster”.]

Mohapatra, Jawahar and Kumar, NIPS 2014

Experiments: PASCAL VOC 2011

Jumping

Phoning

Playing Instrument

Reading

Riding Bike

Riding Horse

Running

Taking Photo

Using Computer

Walking

Images Classes

10 ranking tasks

Cross-validation

Poselets Features

AP-SVM vs. SVM: PASCAL VOC ‘test’ Dataset

Difference in AP

Better in 8 classes, tied in 2 classes

AP-SVM vs. SVM: Folds of PASCAL VOC ‘trainval’ Dataset

Difference in AP

AP-SVM is statistically better in 3 classes

SVM is statistically better in 0 classes

Outline

• Optimizing Average Precision

• High-Order Information (HOAP-SVM)

• Missing Information

Dokania, Behl, Jawahar and Kumar, ECCV 2014

High-Order Information

• People perform similar actions

• People strike similar poses

• Objects are of same/similar sizes

• “Friends” have similar habits

• How can we use them for classification? For ranking?

Problem Formulation

Input $x = \{x_1, x_2, x_3\}$

Output $y \in \{-1,+1\}^3$

$\Psi(x,y) = \begin{pmatrix} \Psi_1(x,y) \\ \Psi_2(x,y) \end{pmatrix}$

$\Psi_1(x,y)$: Unary Features

$\Psi_2(x,y)$: Pairwise Features

Learning Formulation

Output $y \in \{-1,+1\}^3$

$\Delta(y^*,y)$ = Fraction of incorrectly classified persons

Optimization for Learning

Output $y \in \{-1,+1\}^3$

$\max_y w^\top \Psi(x,y) + \Delta(y^*,y)$

Graph Cuts (if supermodular)

LP Relaxation, or exhaustive search

Classification

Output $y \in \{-1,+1\}^3$

$\max_y w^\top \Psi(x,y)$

Graph Cuts (if supermodular)

LP Relaxation, or exhaustive search
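For three persons the exhaustive-search option is easy to write out. The sketch below assumes a hypothetical model in which the unary term is $y_i$ times a per-person score and the pairwise term adds a fixed bonus whenever two neighbouring persons share a label. The numbers, the edge list and the function names are illustrative assumptions.

```python
from itertools import product


def joint_feature_score(y, unary, pairwise_weight, edges):
    """w^T Psi(x, y) for the assumed model: unary terms plus agreement bonuses."""
    unary_term = sum(y_i * u_i for y_i, u_i in zip(y, unary))
    pairwise_term = sum(pairwise_weight for a, b in edges if y[a] == y[b])
    return unary_term + pairwise_term


def hamming_loss(y_true, y):
    """Fraction of incorrectly classified persons."""
    return sum(a != b for a, b in zip(y_true, y)) / len(y_true)


unary = [1.0, -0.2, 0.6]   # hypothetical unary scores for three persons
edges = [(0, 1), (1, 2)]   # hypothetical neighbourhood structure
pairwise_weight = 0.5      # bonus when neighbours take the same label
y_true = (1, 1, 1)

labelings = list(product((-1, 1), repeat=3))  # all 2^3 outputs

# Classification: max_y w^T Psi(x, y)
prediction = max(
    labelings, key=lambda y: joint_feature_score(y, unary, pairwise_weight, edges)
)

# Loss-augmented inference for learning: max_y w^T Psi(x, y) + Delta(y*, y)
most_violated = max(
    labelings,
    key=lambda y: joint_feature_score(y, unary, pairwise_weight, edges)
    + hamming_loss(y_true, y),
)

# Unary terms alone, for comparison.
unary_only = tuple(1 if u > 0 else -1 for u in unary)
print(prediction, most_violated, unary_only)
```

With these numbers the unary scores alone would label the second person negative, while the pairwise bonus pulls that person to the positive label shared by both neighbours.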

Ranking?

Output $y \in \{-1,+1\}^3$

Use difference of max-marginals

Max-Marginal for Positive Class

Output $y \in \{-1,+1\}^3$

$\mathrm{mm}^+(i;w) = \max_{y,\, y_i=+1} w^\top \Psi(x,y)$

Best possible score when person $i$ is positive

Convex in $w$ (a maximum of functions that are linear in $w$)

Max-Marginal for Negative Class

Output $y \in \{-1,+1\}^3$

$\mathrm{mm}^-(i;w) = \max_{y,\, y_i=-1} w^\top \Psi(x,y)$

Best possible score when person $i$ is negative

Convex in $w$ (a maximum of functions that are linear in $w$)

Ranking

Output $y \in \{-1,+1\}^3$

$s_i(w) = \mathrm{mm}^+(i;w) - \mathrm{mm}^-(i;w)$

Difference-of-Convex in $w$

Use difference of max-marginals: HOB-SVM
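A minimal sketch of ranking by the difference of max-marginals, computed here by exhaustive search over the eight labelings of three persons. The model (per-person unary scores plus a bonus for neighbours that share a label) and all numbers are hypothetical, and the slides use dynamic graph cuts rather than enumeration.

```python
from itertools import product


def joint_feature_score(y, unary, pairwise_weight, edges):
    """w^T Psi(x, y) for the assumed model: unary terms plus agreement bonuses."""
    unary_term = sum(y_i * u_i for y_i, u_i in zip(y, unary))
    pairwise_term = sum(pairwise_weight for a, b in edges if y[a] == y[b])
    return unary_term + pairwise_term


def max_marginal(i, label, unary, pairwise_weight, edges):
    """Best joint score over all labelings in which person i takes `label`."""
    return max(
        joint_feature_score(y, unary, pairwise_weight, edges)
        for y in product((-1, 1), repeat=len(unary))
        if y[i] == label
    )


unary = [1.0, -0.2, 0.6]   # hypothetical unary scores
edges = [(0, 1), (1, 2)]   # hypothetical neighbourhood structure
pairwise_weight = 0.5

# s_i(w) = mm+(i; w) - mm-(i; w)
sample_scores = [
    max_marginal(i, +1, unary, pairwise_weight, edges)
    - max_marginal(i, -1, unary, pairwise_weight, edges)
    for i in range(len(unary))
]
ranking = sorted(range(len(unary)), key=lambda i: sample_scores[i], reverse=True)
print(sample_scores, ranking)
```

Although the second person has a negative unary score, their max-marginal difference is positive here, because agreeing with two confidently positive neighbours outweighs it.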

Ranking

$s_i(w) = \mathrm{mm}^+(i;w) - \mathrm{mm}^-(i;w)$

Why not optimize AP directly?

High Order AP-SVM

HOAP-SVM

Problem Formulation: Single Input X

$\Phi(x_i)$ for all $i \in P$

$\Phi(x_k)$ for all $k \in N$

Problem Formulation: Single Output R

$R_{ik} = +1$ if $i$ is better ranked than $k$

$R_{ik} = -1$ if $k$ is better ranked than $i$

Problem Formulation: Scoring Function

$s_i(w) = \mathrm{mm}^+(i;w) - \mathrm{mm}^-(i;w)$ for all $i \in P$

$s_k(w) = \mathrm{mm}^+(k;w) - \mathrm{mm}^-(k;w)$ for all $k \in N$

$S(X,R;w) = \sum_{i \in P} \sum_{k \in N} R_{ik} \, (s_i(w) - s_k(w))$

Ranking at Test-Time

$R(w) = \arg\max_R S(X,R;w)$

Sort samples according to individual scores $s_i(w)$

x1 x2 x3 x4 x5 x6 x7 x8

Learning Formulation: Loss Function

$\Delta(R^*,R(w)) = 1 - \text{AP of ranking } R(w)$

$\min_w \|w\|^2 + C\xi$

subject to $S(X,R;w) + \Delta(R^*,R) - S(X,R^*;w) \le \xi$ for all $R$

Optimization for Learning

Difference-of-convex program

Very efficient CCCP

Linearization step by Dynamic Graph Cuts (Kohli and Torr, ECCV 2006)

Update step equivalent to AP-SVM
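To make the two CCCP steps concrete, here is a sketch on a one-dimensional toy difference-of-convex function, $f(w) = u(w) - v(w)$ with $u(w) = w^4$ and $v(w) = 2w^2$. This toy objective is an assumption chosen because its update has a closed form. It is not the HOAP-SVM objective, where the linearization uses dynamic graph cuts and the update is an AP-SVM problem.

```python
import math


def objective(w):
    """Toy difference-of-convex objective f(w) = w^4 - 2 w^2."""
    return w ** 4 - 2.0 * w ** 2


def cccp(w0, iterations=50):
    """Concave-convex procedure on the toy objective, starting from w0."""
    w = w0
    history = [w]
    for _ in range(iterations):
        # Linearization step: replace v(w) = 2 w^2 by its tangent at the
        # current iterate; only the slope v'(w) = 4 w matters.
        slope = 4.0 * w
        # Update step: minimize the convex surrogate w^4 - slope * w.
        # Setting its derivative to zero gives 4 w^3 = slope.
        w = math.copysign(abs(slope / 4.0) ** (1.0 / 3.0), slope)
        history.append(w)
    return w, history


w_final, history = cccp(0.5)
print(w_final, objective(w_final))
```

Each iteration minimizes a convex upper bound on $f$ that touches it at the current iterate, so the objective never increases. From the starting point 0.5 the iterates approach the local minimum at $w = 1$.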

Experiments: PASCAL VOC 2011

HOB-SVM vs. AP-SVM: PASCAL VOC ‘test’ Dataset

Difference in AP

Better in 4, worse in 3 and tied in 3 classes

HOB-SVM vs. AP-SVM: Folds of PASCAL VOC ‘trainval’ Dataset

Difference in AP

HOB-SVM is statistically better in 0 classes


HOAP-SVM vs. AP-SVM: PASCAL VOC ‘test’ Dataset

Difference in AP

Better in 7, worse in 2 and tied in 1 class

HOAP-SVM vs. AP-SVM: Folds of PASCAL VOC ‘trainval’ Dataset

Difference in AP

HOAP-SVM is statistically better in 4 classes

Outline

• Optimizing Average Precision

• High-Order Information

• Missing Information (Latent-AP-SVM)

Behl, Jawahar and Kumar, CVPR 2014

Fully Supervised Learning

Weakly Supervised Learning

Rank images by relevance to ‘jumping’

Two Approaches

• Use Latent Structured SVM with AP loss
  – Unintuitive Prediction
  – Loose Upper Bound on Loss
  – NP-hard Optimization for Cutting Planes

• Carefully design a Latent-AP-SVM
  – Intuitive Prediction
  – Tight Upper Bound on Loss
  – Optimal Efficient Cutting Plane Computation

Results

Questions?

Code + Data Available
