Large-Scale Object Recognition with Weak Supervision
Weiqiang Ren, Chong Wang, Yanhua Cheng,
Kaiqi Huang, Tieniu Tan
{wqren,cwang,yhcheng,kqhuang,tnt}@nlpr.ia.ac.cn
Task 2: Classification + Localization
Task 2b: Classification + localization with additional training data (ordered by classification error)
1. Only classification labels are used
2. Full image used as the object location
Outline
• Motivation
• Method
• Results
Motivation
Knowing where to look makes recognizing objects easier!
However, in the classification-only task, no annotations of object location are available.
Weakly Supervised Localization
Why Weakly Supervised Localization (WSL)?
Current WSL Results on VOC07
[Bar chart, y-axis 0 to 40: 13.9, 15.0, 22.4, 22.7, 26.2, 26.4, 31.6, and 33.7 (DPM 5.0)]
13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
15.0: Object-centric spatial pooling for image classification, ECCV 2012
22.4: Multi-fold MIL training for weakly supervised object localization, CVPR 2014
22.7: On learning to localize objects with minimal supervision, ICML 2014
26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
31.6: Weakly supervised object localization with latent category learning, ECCV 2014 Sep 11, Poster Session 4A, #34
26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI
VOC 2007 Results
Ours 31.6
DPM 5.0 33.7
Weakly Supervised Object Localization with Latent Category Learning
ECCV 2014
VOC 2007 Results
Ours 26.2
DPM 5.0 33.7
Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision
Submitted to TPAMI
Our Work
For efficiency in large-scale tasks, we use the second approach (Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision).
Method
Framework
[Framework diagram: Input Images → Conv Layers → FC Layers → Cls Prediction and Det Prediction → Rescoring, with four numbered stages]
1st: CNN Architecture
Chatfield et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets
2nd: MILinear SVM
MILinear: Region Proposal
• Good region proposal algorithms have
– High recall
– High overlap
– A small number of proposals
– Low computation cost
• MCG pretrained on VOC 2012 (additional data)
– Training: 128 windows/image
– Testing: 256 windows/image
– Compared to Selective Search (~2000 windows/image)
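"High recall" and "high overlap" are usually measured with intersection-over-union (IoU) between a proposal and a ground-truth box, and with the fraction of ground-truth boxes that at least one of the top-k proposals covers at IoU ≥ 0.5. The NumPy sketch below shows that computation, e.g. for k = 128 or 256 windows per image; the function names and the 0.5 threshold are illustrative assumptions, not the MCG evaluation code.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (n, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Zero-width or zero-height intersections contribute no area.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def proposal_recall(gt_boxes, proposals, top_k, thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one of the
    first top_k proposals with IoU >= thresh (hypothetical helper)."""
    props = proposals[:top_k]
    hits = [iou(gt, props).max() >= thresh for gt in gt_boxes]
    return float(np.mean(hits))
```

Fewer, better-placed windows keep recall high while cutting the cost of scoring every window, which is the trade-off behind using 128 to 256 MCG windows instead of ~2000 Selective Search windows.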
MILinear: Feature Representations
• Low-Level Features
– SIFT, LBP, HOG
– Shape context, Gabor, …
• Mid-Level Features
– Bag of Visual Words (BoVW)
• Deep Hierarchical Features
– Convolutional Networks
– Deep Auto-Encoders
– Deep Belief Nets
MILinear: Positive Window Mining
• Clustering
– KMeans
• Topic Model
– pLSA, LDA, gLDA
• CRF
• Multiple Instance Learning
– DD, EMDD, APR
– MI-NN
– MI-SVM, mi-SVM
– MILBoost
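The slides do not write out how MILinear mines positive windows. As an assumption, the standard MI-SVM formulation with a squared hinge loss is sketched here: each image $i$ is a bag of candidate windows $\mathbf{x}_{ij}$ with image label $Y_i \in \{+1, -1\}$ for a given class, a positive image contributes only its highest-scoring window $j_i^{*}$, and every window of a negative image is treated as a negative.

```latex
\begin{aligned}
j_i^{*} &= \arg\max_{j}\, \mathbf{w}^{\top}\mathbf{x}_{ij} \quad \text{for } Y_i = +1, \\
\min_{\mathbf{w}}\; & \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  + C \sum_{i:\,Y_i=+1} \ell\big(\mathbf{w}^{\top}\mathbf{x}_{i j_i^{*}}\big)
  + C \sum_{i:\,Y_i=-1} \sum_{j} \ell\big(-\mathbf{w}^{\top}\mathbf{x}_{ij}\big), \\
\ell(z) &= \max(0,\, 1 - z)^{2}.
\end{aligned}
```

Training alternates between re-selecting $j_i^{*}$ with the current $\mathbf{w}$ and retraining $\mathbf{w}$ with the selected windows fixed; with the selection fixed, each retraining step is an ordinary linear SVM problem.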
MILinear: Objective Function and Optimization
• Multiple-instance linear SVM
• Optimization: trust region Newton
– A Newton-type method
– Works in the primal
– Faster convergence
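The slides name the optimizer (trust region Newton, working in the primal) but give no code. The sketch below makes assumptions explicit: a squared-hinge (L2-loss) linear SVM is minimized in the primal with SciPy's `trust-ncg` method, a trust-region Newton-CG solver standing in for the authors' solver, and is wrapped in an MI-SVM-style loop that alternates positive-window mining and retraining. The helper names, the mean-window initialization and the fixed iteration count are hypothetical; a bias can be learned by appending a constant 1 feature to every window.

```python
import numpy as np
from scipy.optimize import minimize

def _l2svm_fun(w, X, y, C):
    """Primal objective: 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i w.x_i)^2."""
    m = np.maximum(0.0, 1.0 - y * (X @ w))
    return 0.5 * w @ w + C * np.sum(m ** 2)

def _l2svm_grad(w, X, y, C):
    # Inactive examples have m_i = 0, so they drop out of the sum automatically.
    m = np.maximum(0.0, 1.0 - y * (X @ w))
    return w - 2.0 * C * (X.T @ (y * m))

def _l2svm_hessp(w, p, X, y, C):
    # Generalized Hessian-vector product: p + 2C * X_I^T X_I p over the active set I.
    active = (1.0 - y * (X @ w)) > 0
    XI = X[active]
    return p + 2.0 * C * (XI.T @ (XI @ p))

def train_linear_svm(X, y, C=1.0, w0=None):
    """Squared-hinge linear SVM solved in the primal with a trust-region Newton-CG method."""
    w0 = np.zeros(X.shape[1]) if w0 is None else w0
    res = minimize(_l2svm_fun, w0, args=(X, y, C), method="trust-ncg",
                   jac=_l2svm_grad, hessp=_l2svm_hessp)
    return res.x

def train_milinear(pos_bags, neg_windows, C=1.0, n_iter=5):
    """Hypothetical MI-SVM-style loop: mine one window per positive image, retrain, repeat.

    pos_bags: list of (n_windows, d) arrays, one per positive image.
    neg_windows: (n_neg, d) array with all windows from negative images.
    """
    # Initialize each positive image with the mean of its windows (an assumption).
    pos = np.stack([bag.mean(axis=0) for bag in pos_bags])
    w = np.zeros(neg_windows.shape[1])
    chosen = None
    for _ in range(n_iter):
        X = np.vstack([pos, neg_windows])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg_windows))])
        w = train_linear_svm(X, y, C, w0=w)  # warm-start from the previous w
        # Positive window mining: keep the highest-scoring window in each positive bag.
        chosen = np.array([int(np.argmax(bag @ w)) for bag in pos_bags])
        pos = np.stack([bag[j] for bag, j in zip(pos_bags, chosen)])
    return w, chosen
```

Warm-starting each retraining from the previous `w` is one reason primal Newton-type solvers suit this alternation: after the first round, the selected windows usually change little, so each solve starts close to its optimum.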
MILinear: Optimization Efficiency
…
3rd: Detection Rescoring
• Rescoring with softmax
[Diagram: window scores for 128 boxes × 1000 classes → max over boxes → 1000-dim vector → train softmax → 1000-dim output]
Softmax considers all categories simultaneously in each minibatch of the optimization, which suppresses the responses of other, similar-looking object categories.
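A minimal NumPy sketch of the rescoring step as the diagram describes it: take each image's per-box MILinear scores (n_boxes × K classes, e.g. 128 × 1000), max-pool over boxes into a K-dim vector, then train a softmax classifier (multinomial logistic regression) on those vectors with minibatch gradient descent, so every class competes in each update. The full K × K weight matrix, learning rate, epoch count and function names are assumptions, not the authors' settings.

```python
import numpy as np

def max_pool_boxes(box_scores):
    """(n_images, n_boxes, K) window scores -> (n_images, K) image-level scores."""
    return box_scores.max(axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_rescorer(S, labels, lr=0.1, epochs=200, batch=32, seed=0):
    """Fit logits = S @ W + b with cross-entropy loss and minibatch gradient descent."""
    rng = np.random.default_rng(seed)
    n, K = S.shape
    W = np.zeros((K, K))
    b = np.zeros(K)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            P = softmax(S[idx] @ W + b)
            # Gradient of cross-entropy w.r.t. logits: predicted probs minus one-hot labels.
            P[np.arange(len(idx)), labels[idx]] -= 1.0
            W -= lr * S[idx].T @ P / len(idx)
            b -= lr * P.mean(axis=0)
    return W, b

def rescore(box_scores, W, b):
    """Rescored class probabilities for each image."""
    return softmax(max_pool_boxes(box_scores) @ W + b)
```

Because the softmax normalizes over all K classes, raising the true class's logit in one update lowers the probability of every competing class, including look-alike categories whose windows also fired.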
4th: Classification Rescoring
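• Linear Combination
$S = \lambda_{cls} S_{cls} + (1 - \lambda_{cls}) S_{WSL}$
[Diagram: the 1000-dim score vectors S_cls and S_WSL combined into a 1000-dim score vector S]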
Interestingly, the other score-combination strategies we tried did not seem to work!
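The classification rescoring is a per-class linear combination, S = λ·S_cls + (1 − λ)·S_WSL. The slides do not say how λ was set or whether the two score vectors are normalized first; the sketch below assumes both are comparable per-class scores (e.g. probabilities) and picks λ by grid search on held-out top-5 error. All function names are hypothetical.

```python
import numpy as np

def combine_scores(s_cls, s_wsl, lam):
    """S = lam * S_cls + (1 - lam) * S_WSL, elementwise over the class scores."""
    return lam * s_cls + (1.0 - lam) * s_wsl

def top5_error(scores, labels):
    """Fraction of images whose true label is not among the 5 highest scores."""
    top5 = np.argsort(-scores, axis=1)[:, :5]
    return float(np.mean([label not in row for row, label in zip(top5, labels)]))

def pick_lambda(s_cls, s_wsl, labels, grid=None):
    """Grid-search lam on a held-out set (hypothetical tuning procedure)."""
    grid = np.linspace(0.0, 1.0, 11) if grid is None else grid
    errors = [top5_error(combine_scores(s_cls, s_wsl, lam), labels) for lam in grid]
    return float(grid[int(np.argmin(errors))])
```

The combination can only help when the two score sources make different mistakes; if S_cls and S_WSL err on the same images, no value of λ recovers them.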
Results
1st: Classification without WSL
Method: Top-5 Error (%)
Baseline with one CNN: 13.7
Average with four CNNs: 12.5
2nd: MILinear on ImageNet 2014
Method: Localization Error (%)
Baseline (Full Image): 61.96
MILinear: 40.96
Winner: 25.3
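The localization error above follows the ILSVRC classification + localization rule: an image counts as correct only if one of its (up to) five guesses has the right class and a box overlapping a ground-truth box of that class with IoU ≥ 0.5. The sketch below is a simplified, hypothetical version of that rule, not the official devkit; the "Full Image" baseline corresponds to returning the whole image as the box for every guess.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def localization_error(predictions, ground_truth, thresh=0.5):
    """predictions: per image, a list of up to 5 (class_id, box) guesses.
    ground_truth: per image, a list of (class_id, box) objects.
    An image is correct if some guess has the right class AND IoU >= thresh
    with a ground-truth box of that class."""
    errors = 0
    for guesses, objects in zip(predictions, ground_truth):
        hit = any(c == gc and box_iou(b, gb) >= thresh
                  for c, b in guesses[:5] for gc, gb in objects)
        errors += not hit
    return errors / len(predictions)
```

This is why the full-image baseline scores poorly even with a good classifier: a correct class label still counts as an error whenever the object covers less than half of the image.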
2nd: MILinear on VOC 2007
2nd: MILinear on ILSVRC 2013 detection
mAP: 9.63% vs. 8.99% (DPM 5.0)
2nd: MILinear for Classification
Method: Top-5 Error (%)
MILinear: 17.1
3rd: WSL Rescoring (Softmax)
Method: Top-5 Error (%)
Baseline with one CNN: 13.7
Average with four CNNs: 12.5
MILinear: 17.1
MILinear + Rescore: 13.5
Softmax-based rescoring successfully suppresses the predictions of other, similar-looking object categories!
4th: Cls and WSL Combination: $S = \lambda_{cls} S_{cls} + (1 - \lambda_{cls}) S_{WSL}$
Method: Top-5 Error (%)
Baseline with one CNN model: 13.7
Average with four CNN models: 12.5
MILinear: 17.1
MILinear + Rescore: 13.5
Cls (12.5) + MILinear (13.5): 11.5
WSL and Cls can be complementary to each other!
Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge.
Conclusion
• In our experiments, WSL consistently helped classification
• WSL has large potential: weakly labeled data is cheap
Thank You!