Optimizing Average Precision using Weakly Supervised Data
Aseem Behl¹, C.V. Jawahar¹ and M. Pawan Kumar²
¹IIIT Hyderabad, India; ²Ecole Centrale Paris & INRIA Saclay, France
Aim
To estimate accurate model parameters by optimizing average precision with weakly supervised data
Disadvantages of LSSVM:
• Prediction: LSSVM uses an unintuitive prediction rule
• Learning: LSSVM optimizes a loose upper bound on the AP-loss
• Learning: LSSVM compares scores between two different sets of additional annotations
• Optimization: exact loss-augmented inference is computationally inefficient
CCCP Algorithm – guarantees a locally optimal solution
Latent AP-SVM
Step 1: Find the best hi for each sample
Step 2: Sort samples according to their best scores
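The two prediction steps above can be sketched as follows (a minimal pure-Python sketch; the precomputed score matrix and the function name are assumptions, not the authors' code):

```python
def latent_apsvm_predict(scores):
    """Two-step latent AP-SVM prediction sketch.

    scores[i][h] holds w^T psi(x_i, h) for sample i and latent option h
    (hypothetical precomputed values)."""
    # Step 1: pick the best latent variable h_i for each sample
    best_h = [max(range(len(row)), key=lambda h: row[h]) for row in scores]
    best_scores = [row[h] for row, h in zip(scores, best_h)]
    # Step 2: rank samples by their best scores, highest first
    ranking = sorted(range(len(scores)), key=lambda i: -best_scores[i])
    return best_h, ranking
```

Because the latent choice for each sample is made independently before ranking, prediction costs only a max per sample plus one sort.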
Results
Action Classification
• 5-fold cross-validation (paired t-test performed):
  • improved performance on 6/10 classes over LSVM and 7/10 classes over LSSVM
  • overall improvement: 5% over LSVM, 4% over LSSVM
• Performance on the test set:
  • improved performance on all classes over LSVM and 8/10 classes over LSSVM
  • overall improvement: 5.1% over LSVM, 3.7% over LSSVM
Notation
x: input; h: additional unknown information; y: output label (e.g. y = “Using Computer”)
X: input samples {x_i, i = 1,…,n}
Y: ranking matrix, s.t. Y_ij = +1 if x_i is ranked higher than x_j, 0 if x_i and x_j are ranked the same, −1 if x_i is ranked lower than x_j
H_P: additional unknown information for the positives {h_i, i ∈ P}
H_N: additional information for the negatives {h_j, j ∈ N}
∆(Y, Y∗): AP-loss = 1 − AP(Y, Y∗)
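A small sketch of how the ranking matrix Y can be built from per-sample scores (a pure-Python illustration; the function name is an assumption):

```python
def ranking_matrix(scores):
    """Y_ij = +1 if x_i is ranked higher than x_j, 0 if tied, -1 otherwise."""
    def sign(d):
        # maps positive/zero/negative differences to +1/0/-1
        return (d > 0) - (d < 0)
    return [[sign(si - sj) for sj in scores] for si in scores]
```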
Latent Structural SVM (LSSVM)
Prediction:
Learning: compares scores between the same set of additional annotations
• Constraints of latent AP-SVM are a subset of LSSVM constraints
• Optimal solution of latent AP-SVM has a lower objective than LSSVM solution
• Latent AP-SVM provides a valid upper-bound on the AP-loss
1. Initialize the parameters w0
2. Repeat until convergence:
3. Impute the additional annotations for the positives
4. Update the parameters using the cutting-plane algorithm
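The CCCP loop above can be sketched as follows (the imputation and cutting-plane solvers are passed in as callbacks; all names here are assumptions, not the authors' implementation):

```python
def cccp_train(w0, impute_positives, update_w, max_iters=50, tol=1e-4):
    """CCCP sketch: alternate imputation of H_P with a convex parameter update.

    impute_positives(w): imputes the additional annotations H_P for the
        positives given the current parameters (step 3).
    update_w(H_P): solves the resulting convex problem, e.g. via the
        cutting-plane algorithm (step 4)."""
    w = w0
    for _ in range(max_iters):
        H_P = impute_positives(w)
        w_new = update_w(H_P)
        # declare convergence when the parameters stop changing
        if sum((a - b) ** 2 for a, b in zip(w_new, w)) ** 0.5 < tol:
            return w_new
        w = w_new
    return w
```

Each iteration can only decrease the (non-convex) objective, which is why CCCP is guaranteed to reach a local optimum.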
Code and data available at: http://cvit.iiit.ac.in/projects/lapsvm/
Travel grant provided by Microsoft Research India.
Dataset: PASCAL VOC 2011 action classification dataset
• 4846 images depicting 10 action classes
• 2424 ‘trainval’ images and 2422 ‘test’ images
Problem formulation: x: image of a person performing an action; h: bounding box of the person; y: action class
Features: 2400 activation scores of action-specific poselets & 4 object activation scores
Optimization
Hopt = argmax_H w^T Ψ(X, Y, H)
Yopt = argmax_Y w^T Ψ(X, Y, Hopt)
(Yopt, Hopt) = argmax_{Y,H} w^T Ψ(X, Y, H)
min_w ½||w||² + Cξ
s.t. ∀Y, H: max_Ĥ {w^T Ψ(X, Y*, Ĥ)} − w^T Ψ(X, Y, H) ≥ Δ(Y, Y*) − ξ
min_w ½||w||² + Cξ
s.t. ∀Y, H_N: max_{H_P} {w^T Ψ(X, Y*, {H_P, H_N}) − w^T Ψ(X, Y, {H_P, H_N})} ≥ Δ(Y, Y*) − ξ
AP-SVM
• AP-SVM optimizes the correct AP-loss function as opposed to 0/1 loss.
• AP-loss depends on the ranking of the samples
Example (figure): two rankings of the same samples with equal 0-1 loss (0.40) but different AP-loss (0.24 vs 0.36)
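As a toy illustration of how AP-loss is computed from a ranking (the data below is made up, not the figure's):

```python
def average_precision(ranked_labels):
    """AP of a ranking: mean of precision@k over the positions k of positives."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

# Labels listed in ranked order (1 = positive, 0 = negative), toy example
ap = average_precision([1, 0, 1, 0])
ap_loss = 1.0 - ap  # Delta(Y, Y*) in the poster's notation
```

Swapping the second positive down the ranking lowers AP while leaving the 0-1 loss unchanged, which is the contrast the figure illustrates.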
• AP is the most commonly used accuracy measure for binary classification
Learning:
min_w ½||w||² + Cξ
s.t. ∀Y: w^T Ψ(X, Y*) − w^T Ψ(X, Y) ≥ Δ(Y, Y*) − ξ
Prediction: Yopt = argmax_Y w^T Ψ(X, Y)
Optimizing the correct loss function is important for weakly supervised learning
We also get improved results on the IIIT 5K-WORD dataset
and PASCAL VOC 2007 object detection dataset
• Independently choose the additional annotations H_P; complexity: O(n_P·|H|)
• Maximize over H_N and Y independently; complexity: O(n_P·n_N)
Latent AP-SVM provides a tighter upper-bound on the AP Loss
AP(Y, Y∗): AP of the ranking Y with respect to the ground truth Y∗
• 0-1 loss depends only on the number of incorrectly classified samples