Loss-based Learning with Latent Variables
M. Pawan Kumar
École Centrale Paris
École des Ponts ParisTech
INRIA Saclay, Île-de-France
Joint work with Ben Packer, Daphne Koller, Kevin Miller and Danny Goodman
Image Classification
Images xi Labels yi
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Boxes hi
y = “Deer”
Image x
Image Classification
0.00 0.00 0.00
0.00 0.75 0.00
0.00 0.00 0.00
Feature Ψ(x,y,h) (e.g. HOG)
Score f : Ψ(x,y,h) → (-∞, +∞)
f (Ψ(x,y1,h))
Image Classification
0.00 0.23 0.00
0.00 0.00 0.01
0.01 0.00 0.00
f (Ψ(x,y2,h))
0.00 0.00 0.00
0.00 0.75 0.00
0.00 0.00 0.00
f (Ψ(x,y1,h))
Feature Ψ(x,y,h) (e.g. HOG)
Score f : Ψ(x,y,h) → (-∞, +∞)
Prediction y(f)
Learn f
Loss-based Learning
User-defined loss function Δ(y, y(f))
f* = argmin_f Σi Δ(yi, yi(f))
Minimize loss between predicted and ground-truth output
No restriction on the loss function
General framework (object detection, segmentation, …)
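To make the framework concrete, here is a minimal Python sketch of the objective Σi Δ(yi, yi(f)); `empirical_loss` and `zero_one` are hypothetical names, and the 0/1 loss is only one choice, since the framework places no restriction on Δ.

```python
def empirical_loss(y_true, y_pred, delta):
    """Empirical risk of a predictor: sum_i Delta(y_i, y_i(f))."""
    return sum(delta(yt, yp) for yt, yp in zip(y_true, y_pred))

# The 0/1 loss used in the experiments later; any Delta would do.
zero_one = lambda y, y_hat: 0.0 if y == y_hat else 1.0
print(empirical_loss(["Deer", "Bison"], ["Deer", "Llama"], zero_one))  # 1.0
```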
Outline
• Latent SVM
• Max-Margin Min-Entropy Models
• Dissimilarity Coefficient Learning
Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
Image Classification
Images xi Labels yi
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Boxes hi
y = “Deer”
Image x
Latent SVM
Scoring function
wTΨ(x,y,h)
Prediction
y(w), h(w) = argmax_{y,h} wTΨ(x,y,h)
(w: parameters; Ψ: features)
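A minimal sketch of this prediction rule, assuming a hypothetical joint feature map `psi` (e.g. HOG features extracted from box h) and finite candidate sets of labels and boxes; real detectors enumerate boxes far more efficiently.

```python
import numpy as np

def lsvm_predict(w, psi, x, labels, boxes):
    """Latent SVM prediction: (y(w), h(w)) = argmax over (y, h) of wT psi(x, y, h)."""
    best_score, best_pair = -np.inf, None
    for y in labels:               # e.g. the six mammal classes
        for h in boxes:            # candidate bounding boxes in image x
            score = w @ psi(x, y, h)
            if score > best_score:
                best_score, best_pair = score, (y, h)
    return best_pair
```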
Learning Latent SVM
Training data {(xi,yi), i = 1,2,…,n}
w* = argmin_w Σi Δ(yi, yi(w))
Highly non-convex in w
Cannot regularize w to prevent overfitting
Learning Latent SVM
Training data {(xi, yi), i = 1, 2, …, n}

Δ(yi, yi(w)) = wTΨ(xi, yi(w), hi(w)) + Δ(yi, yi(w)) − wTΨ(xi, yi(w), hi(w))
≤ wTΨ(xi, yi(w), hi(w)) + Δ(yi, yi(w)) − max_hi wTΨ(xi, yi, hi)
≤ max_{y,h} {wTΨ(xi, y, h) + Δ(yi, y)} − max_hi wTΨ(xi, yi, hi)
Learning Latent SVM
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi ξi
wTΨ(xi, y, h) + Δ(yi, y) − max_hi wTΨ(xi, yi, hi) ≤ ξi, for all y, h
Difference-of-convex program in w
Local minimum or saddle point solution (CCCP)
Self-Paced Learning, NIPS 2010
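A sketch of one CCCP outer iteration for this difference-of-convex program, under simplifying assumptions: hypothetical `psi` and `delta` callbacks, finite candidate sets, and a plain subgradient solver for the convex inner problem (production systems use cutting planes or dual solvers instead).

```python
import numpy as np

def lsvm_cccp_step(w, data, psi, delta, labels, boxes, C, lr=1e-3, iters=100):
    """One CCCP iteration: impute latent variables, then solve the convex upper bound."""
    # Concave part linearised: impute h_i = argmax_h wT psi(x_i, y_i, h).
    imputed = [max(boxes, key=lambda h: w @ psi(x, y, h)) for x, y in data]
    # Convex part: subgradient descent on ||w||^2 + C * sum_i hinge_i.
    for _ in range(iters):
        grad = 2.0 * w
        for (x, y), h_star in zip(data, imputed):
            # Loss-augmented inference: argmax_{y',h'} wT psi + Delta(y_i, y').
            y_hat, h_hat = max(((yp, hp) for yp in labels for hp in boxes),
                               key=lambda p: w @ psi(x, p[0], p[1]) + delta(y, p[0]))
            slack = (w @ psi(x, y_hat, h_hat) + delta(y, y_hat)
                     - w @ psi(x, y, h_star))
            if slack > 0:
                grad += C * (psi(x, y_hat, h_hat) - psi(x, y, h_star))
        w = w - lr * grad
    return w
```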
Recap
min_w ||w||² + C Σi ξi
wTΨ(xi, y, h) + Δ(yi, y) − max_hi wTΨ(xi, yi, hi) ≤ ξi, for all y, h
Scoring function
wTΨ(x,y,h)
Prediction
y(w), h(w) = argmax_{y,h} wTΨ(x,y,h)
Learning
Image Classification
Images xi Labels yi
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Boxes hi
y = “Deer”
Image x
Image Classification
0.00 0.00 0.25
0.00 0.25 0.00
0.25 0.00 0.00

Score wTΨ(x,y,h) → (-∞, +∞)
wTΨ(x,y1,h)
Image Classification
0.00 0.24 0.00
0.00 0.00 0.00
0.01 0.00 0.00
wTΨ(x,y2,h)
0.00 0.00 0.25
0.00 0.25 0.00
0.25 0.00 0.00
wTΨ(x,y1,h)
Score wTΨ(x,y,h) → (-∞, +∞)
Only the maximum score is used
No other useful cue?
Uncertainty in h
• Latent SVM
• Max-Margin Min-Entropy (M3E) Models
• Dissimilarity Coefficient Learning
Outline
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012
M3E
Scoring function
Pw(y,h|x) = exp(wTΨ(x,y,h))/Z(x)
Prediction
y(w) = argmin_y Hα(Pw(h|y,x)) − log Pw(y|x)
(Z(x): partition function; Pw(y|x): marginalized probability; Hα: Rényi entropy)
Rényi Entropy of Generalized Distribution Gα(y;x,w)
Rényi Entropy
Gα(y;x,w) = 1/(1−α) · log [ Σh Pw(y,h|x)^α / Σh Pw(y,h|x) ]
α = 1. Shannon Entropy of Generalized Distribution
− [Σh Pw(y,h|x) log Pw(y,h|x)] / [Σh Pw(y,h|x)]
Rényi Entropy
Gα(y;x,w) = 1/(1−α) · log [ Σh Pw(y,h|x)^α / Σh Pw(y,h|x) ]
α → ∞. Minimum Entropy of the Generalized Distribution:
− max_h log Pw(y,h|x)
Rényi Entropy
Gα(y;x,w) = 1/(1−α) · log [ Σh Pw(y,h|x)^α / Σh Pw(y,h|x) ]
α → ∞. Minimum Entropy of the Generalized Distribution:
− max_h wTΨ(x,y,h) (up to the constant log Z(x), which is shared by every y)
Same prediction as latent SVM
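A sketch of Gα computed from the unnormalised log-weights wTΨ(x,y,h); the partition function Z(x) only shifts Gα by a constant shared by every label y, so it can be dropped when taking argmin_y. The α = 1 and α = ∞ branches are the two limits shown above; `renyi_G` is a hypothetical name.

```python
import numpy as np
from scipy.special import logsumexp

def renyi_G(scores, alpha):
    """G_alpha(y; x, w) up to the constant log Z(x), from scores_h = wT Psi(x, y, h)."""
    if np.isinf(alpha):                    # minimum entropy: -max_h wT Psi(x, y, h)
        return -scores.max()
    if alpha == 1.0:                       # Shannon limit of the generalized entropy
        log_q = scores - logsumexp(scores) # log P_w(h | y, x)
        return -np.sum(np.exp(log_q) * scores)
    return (logsumexp(alpha * scores) - logsumexp(scores)) / (1.0 - alpha)

# Prediction: y(w) = argmin_y G_alpha(y; x, w) over the candidate labels.
scores_y1 = np.array([2.0, 1.0, -1.0])    # wT Psi(x, y1, h) for each box h
print(renyi_G(scores_y1, np.inf))         # -2.0, i.e. -max_h score
```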
Learning M3E
Training data {(xi,yi), i = 1,2,…,n}
w* = argmin_w Σi Δ(yi, yi(w))
Highly non-convex in w
Cannot regularize w to prevent overfitting
Learning M3E
Training data {(xi, yi), i = 1, 2, …, n}

Δ(yi, yi(w)) = Gα(yi(w); xi, w) + Δ(yi, yi(w)) − Gα(yi(w); xi, w)
≤ Gα(yi; xi, w) + Δ(yi, yi(w)) − Gα(yi(w); xi, w)
≤ Gα(yi; xi, w) + max_y {Δ(yi, y) − Gα(y; xi, w)}
Learning M3E
Training data {(xi,yi), i = 1,2,…,n}
min_w ||w||² + C Σi ξi
Gα(yi; xi, w) + Δ(yi, y) − Gα(y; xi, w) ≤ ξi, for all y
When α tends to infinity, M3E = Latent SVM
Other values of α can give better results
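A small sketch of the per-sample slack implied by these constraints, assuming `G` is a hypothetical dict mapping each candidate label to Gα(y; xi, w) (e.g. computed with the `renyi_G` sketch above) and `delta` is the user-defined loss.

```python
def m3e_slack(G, y_true, delta):
    """xi_i = max_y [ G_alpha(y_i) + Delta(y_i, y) - G_alpha(y) ].
    Including y = y_i in the max makes the slack automatically non-negative."""
    return max(G[y_true] + delta(y_true, y) - G[y] for y in G)
```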
Image Classification
271 images, 6 classes
90/10 train/test split
5 folds
Mammals Dataset
0/1 loss
Image Classification
HOG-Based Model. Dalal and Triggs, 2005
Motif Finding
~ 40,000 sequences
50/50 train/test split
5 Proteins, 5 folds
UniProbe Dataset
Binding vs. Not-Binding
Motif Finding
Motif + Markov Background Model. Yu and Joachims, 2009
Recap
Scoring function
Prediction
Learning
Pw(y,h|x) = exp(wTΨ(x,y,h))/Z(x)
y(w) = argminy Gα(y;x,w)
min_w ||w||² + C Σi ξi
Gα(yi; xi, w) + Δ(yi, y) − Gα(y; xi, w) ≤ ξi, for all y
Outline
• Latent SVM
• Max-Margin Min-Entropy Models
• Dissimilarity Coefficient Learning
Kumar, Packer and Koller, ICML 2012
Object Detection
Images xi Labels yi
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Boxes hi
y = “Deer”
Image x
Minimizing General Loss
min_w Σi Δ(yi, hi, yi(w), hi(w))   (supervised samples)
  + Σi Δ′(yi, yi(w), hi(w))   (weakly supervised samples)
Unknown latent variable values for the weakly supervised samples
Minimizing General Loss
min_w Σi Σhi Pw(hi|yi,xi) Δ(yi, hi, yi(w), hi(w))
A single distribution to achieve two objectives
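A one-line sketch of this expected loss for a single sample over a finite candidate set of boxes; `delta_row` and `p_h` are hypothetical arrays indexed by candidate box. The same Pw must both predict well and express the uncertainty in h, which is the tension the next slides resolve.

```python
import numpy as np

def expected_loss(delta_row, p_h):
    """E_{h ~ P_w(h|y_i,x_i)}[ Delta(y_i, h, y_i(w), h_i(w)) ].
    delta_row[k]: loss if the true box is candidate h_k; p_h[k]: posterior mass on h_k."""
    return float(np.dot(delta_row, p_h))
```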
Problem
Model Uncertainty in Latent Variables
Model Accuracy of Latent Variable Predictions
Solution
Model Uncertainty in Latent Variables
Model Accuracy of Latent Variable Predictions
Use two different distributions for the two different tasks
Solution
Model Accuracy of Latent Variable Predictions
Use two different distributions for the two different tasks
[Figure: distribution Pθ(hi|yi,xi) over the latent variables hi]
Solution
Use two different distributions for the two different tasks
[Figure: Pw(yi,hi|xi) and Pθ(hi|yi,xi) over hi; ground truth (yi,hi) vs. prediction (yi(w),hi(w))]
The Ideal Case
No latent variable uncertainty, correct prediction
[Figure: Pθ(hi|yi,xi) concentrated on hi; the prediction (yi, hi(w)) from Pw(yi,hi|xi) matches the ground truth]
In Practice
Restrictions in the representation power of models
[Figure: Pθ(hi|yi,xi) spread over hi; the prediction (yi(w), hi(w)) can deviate from the ground truth (yi, hi)]
Our Framework
Minimize the dissimilarity between the two distributions
[Figure: Pw(yi,hi|xi) and Pθ(hi|yi,xi) over hi; ground truth (yi,hi) vs. prediction (yi(w),hi(w))]
User-defined dissimilarity measure
Our Framework
Minimize Rao’s Dissimilarity Coefficient
[Figure: Pw(yi,hi|xi) and Pθ(hi|yi,xi) over hi; ground truth (yi,hi) vs. prediction (yi(w),hi(w))]
Hi(w,θ) = Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)
Our Framework
Minimize Rao’s Dissimilarity Coefficient
[Figure: Pw(yi,hi|xi) and Pθ(hi|yi,xi) over hi; ground truth (yi,hi) vs. prediction (yi(w),hi(w))]
Hi(w,θ) − β Σh,h′ Δ(yi,h,yi,h′) Pθ(h|yi,xi) Pθ(h′|yi,xi)
(the second sum is Hi(θ,θ))
Our Framework
Minimize Rao’s Dissimilarity Coefficient
[Figure: Pw(yi,hi|xi) and Pθ(hi|yi,xi) over hi; ground truth (yi,hi) vs. prediction (yi(w),hi(w))]
Hi(w,θ) − β Hi(θ,θ) − (1−β) Δ(yi(w),hi(w),yi(w),hi(w))
The last term vanishes, since the loss of the prediction against itself is zero.
Our Framework
Minimize Rao’s Dissimilarity Coefficient
[Figure: Pw(yi,hi|xi) and Pθ(hi|yi,xi) over hi; ground truth (yi,hi) vs. prediction (yi(w),hi(w))]
min_{w,θ} Σi Hi(w,θ) − β Hi(θ,θ)
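A sketch of the per-sample objective under the same finite-candidate assumption; `delta_to_pred[k]` is Δ(yi, h_k, yi(w), hi(w)), `delta_hh[k, l]` is Δ(yi, h_k, yi, h_l), and `p_theta` is Pθ(·|yi, xi) (all hypothetical names).

```python
import numpy as np

def dissimilarity(delta_to_pred, delta_hh, p_theta, beta):
    """Rao's dissimilarity coefficient for one sample: Hi(w, theta) - beta * Hi(theta, theta).
    The -(1 - beta) * Delta(prediction, prediction) term is zero and is omitted."""
    H_w_theta = float(np.dot(delta_to_pred, p_theta))       # Hi(w, theta)
    H_theta_theta = float(p_theta @ delta_hh @ p_theta)     # Hi(theta, theta)
    return H_w_theta - beta * H_theta_theta
```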
Optimization
minw,θ Σi Hi(w,θ) - β Hi(θ,θ)
Initialize the parameters to w0 and θ0
Repeat until convergence:
  Fix w and optimize θ
  Fix θ and optimize w
End
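A skeleton of this block-coordinate scheme, with the two sub-problem solvers passed in as callbacks (hypothetical signatures); convergence is declared when neither parameter vector moves.

```python
import numpy as np

def learn_disc(w0, theta0, update_theta, update_w, max_iters=50, tol=1e-4):
    """Alternate: fix w, optimise theta; fix theta, optimise w; stop at a fixed point."""
    w, theta = np.asarray(w0, float), np.asarray(theta0, float)
    for _ in range(max_iters):
        theta_new = update_theta(w, theta)   # e.g. stochastic subgradient descent
        w_new = update_w(w, theta_new)       # e.g. a CCCP-style latent-SVM solve
        moved = max(np.abs(w_new - w).max(), np.abs(theta_new - theta).max())
        w, theta = w_new, theta_new
        if moved < tol:
            break
    return w, theta
```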
Optimization of θ
min_θ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) − β Hi(θ,θ)
Case I: yi(w) = yi
[Figure: Pθ(hi|yi,xi) over hi, with the predicted box hi(w) marked]
Optimization of θ
min_θ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) − β Hi(θ,θ)
Case II: yi(w) ≠ yi
[Figure: Pθ(hi|yi,xi) over hi]
Stochastic subgradient descent
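A sketch of the per-sample (sub)gradient in θ, under the extra assumption (not in the slides) that Pθ(h|yi,xi) is log-linear in latent-variable features φ_h and that Δ(yi,·,yi,·) is symmetric; it uses the exponential-family identity d/dθ E_θ[f(h)] = E_θ[f(h)(φ_h − E_θ[φ])]. A stochastic scheme samples one i per step and updates θ ← θ − η · theta_gradient(...).

```python
import numpy as np

def theta_gradient(theta, phi, delta_to_pred, delta_hh, beta):
    """Gradient of Hi(w, theta) - beta * Hi(theta, theta) for one sample,
    with P_theta(h) proportional to exp(theta . phi[h]) over candidate boxes h."""
    s = phi @ theta
    p = np.exp(s - s.max()); p /= p.sum()      # P_theta(h | y_i, x_i)
    centered = phi - p @ phi                   # phi_h - E_theta[phi]
    # d/dtheta of sum_h Delta_h p_h
    g1 = (p * delta_to_pred) @ centered
    # d/dtheta of sum_{h,h'} Delta_{hh'} p_h p_h'  (symmetric Delta => factor 2)
    g2 = 2.0 * (p * (delta_hh @ p)) @ centered
    return g1 - beta * g2
```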
Optimization of w
min_w Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)
Expected loss, which models the uncertainty in h
Form of optimization similar to Latent SVM
If Δ is independent of h, the problem reduces to latent SVM
Concave-Convex Procedure (CCCP)
Object Detection
Bison
Deer
Elephant
Giraffe
Llama
Rhino
Input x
Output y = “Deer”
Latent Variable h
Mammals Dataset
60/40 Train/Test Split
5 Folds
Training data: input xi, output yi
Results – 0/1 Loss
[Bar chart: average test loss on Folds 1–5, LSVM vs. our method]
Statistically Significant
Results – Overlap Loss
[Bar chart: average test loss on Folds 1–5, LSVM vs. our method]
Action Detection
Input x
Output y = “Using Computer”
Latent Variable h
PASCAL VOC 2011
60/40 Train/Test Split
5 Folds
Jumping
Phoning
Playing Instrument
Reading
Riding Bike
Riding Horse
Running
Taking Photo
Using Computer
Walking
Training data: input xi, output yi
Results – 0/1 Loss
[Bar chart: average test loss on Folds 1–5, LSVM vs. our method]
Statistically Significant
Results – Overlap Loss
[Bar chart: average test loss on Folds 1–5, LSVM vs. our method]
Statistically Significant
Conclusions
• Latent SVM ignores latent variable uncertainty
• M3E for latent-variable-independent loss
• DISC for latent-variable-dependent loss
• Strict generalizations of Latent SVM
• Code available online
Questions?
http://cvc.centrale-ponts.fr/personnel/pawan