Crash Course on Machine Learning, Part IV
Several slides from Derek Hoiem and Ben Taskar
What you need to know
• Dual SVM formulation
  – How it’s derived
• The kernel trick
• Derive polynomial kernel
• Common kernels
• Kernelized logistic regression
• SVMs vs kernel regression
• SVMs vs logistic regression
Example: Dalal-Triggs pedestrian detector
1. Extract fixed-sized (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of gradient) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
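Step 4, non-maxima suppression, can be sketched as a greedy procedure: keep the highest-scoring window, discard any remaining window that overlaps it too much, and repeat. A minimal sketch in Python, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, overlap_thresh=0.5):
    """Greedy non-maxima suppression over dicts with 'box' and 'score':
    keep the best-scoring window, drop overlapping lower scorers, repeat."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for d in detections:
        if all(iou(d["box"], k["box"]) < overlap_thresh for k in kept):
            kept.append(d)
    return kept
```

The overlap threshold of 0.5 is a common default, not a value from the paper.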
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
• Tested with
  – RGB
  – LAB
  – Grayscale
RGB and LAB give slightly better performance than grayscale.
[Figure: gradient masks compared: uncentered, centered, cubic-corrected, diagonal, Sobel. The simple centered [-1, 0, 1] mask outperforms the others.]
• Histogram of gradient orientations
  – Votes weighted by magnitude
  – Bilinear interpolation between cells
Orientation: 9 bins (for unsigned angles)
Histograms in 8x8 pixel cells
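The magnitude-weighted voting can be sketched as below; this shows only the linear interpolation between orientation bins (the full descriptor also interpolates spatially between neighboring cells):

```python
import math

def cell_histogram(magnitudes, angles, n_bins=9):
    """Orientation histogram for one 8x8 cell: each pixel votes with its
    gradient magnitude, split linearly between the two nearest bin centers.
    Angles are unsigned, in [0, 180) degrees."""
    bin_width = 180.0 / n_bins      # 20 degrees per bin for 9 bins
    hist = [0.0] * n_bins
    for mag, ang in zip(magnitudes, angles):
        pos = ang / bin_width - 0.5  # fractional position between bin centers
        lo = math.floor(pos)
        frac = pos - lo
        hist[lo % n_bins] += mag * (1 - frac)    # split the vote between
        hist[(lo + 1) % n_bins] += mag * frac    # the two nearest bins
    return hist
```

The modulo wrap-around handles angles near 0 and 180, which are adjacent for unsigned gradients.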
Normalize with respect to surrounding cells
# features = 15 x 7 (# cells) x 9 (# orientations) x 4 (# normalizations by neighboring cells) = 3780
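The count works out because the 64x128 window holds 7x15 overlapping 2x2-cell block positions (one fewer than the cell count in each direction, since blocks slide one cell at a time), and each cell is re-normalized once per block it belongs to:

```python
# 64x128 window with 8x8-pixel cells -> 8x16 cells.
# Overlapping 2x2-cell blocks slide one cell at a time: 7x15 positions.
blocks_x = 64 // 8 - 1    # 7
blocks_y = 128 // 8 - 1   # 15
n_features = blocks_y * blocks_x * 9 * 4   # 9 orientation bins, 4 normalizations
print(n_features)  # -> 3780
```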
[Figure: learned pedestrian template: positive weights (pos w) and negative weights (neg w) of the linear SVM]
Detection examples
Viola-Jones sliding window detector
Fast detection through two mechanisms:
• Quickly eliminate unlikely windows
• Use features that are fast to compute
Viola and Jones. Rapid Object Detection using a Boosted Cascade of Simple Features (2001).
Cascade for Fast Detection
[Figure: cascade of classifiers. Examples enter Stage 1 (H1(x) > t1?); “No” → Reject, “Yes” → Stage 2 (H2(x) > t2?), …, Stage N (HN(x) > tN?); “Yes” at the final stage → Pass, “No” at any stage → Reject.]
• Choose threshold for low false negative rate
• Fast classifiers early in cascade
• Slow classifiers later, but most examples don’t get there
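The cascade’s control flow can be sketched as follows; the lambda stage scorers here are hypothetical stand-ins for the real boosted classifiers:

```python
def cascade_classify(x, stages, thresholds):
    """Attentional cascade: a window is rejected as soon as any stage's
    score falls at or below its threshold, so most windows exit after
    the first cheap stages and only promising ones reach the slow ones."""
    for H, t in zip(stages, thresholds):
        if H(x) <= t:
            return False   # reject early
    return True            # passed every stage

# Toy usage with two made-up stage scorers:
stages = [lambda x: x, lambda x: x * 0.5]
thresholds = [1.0, 2.0]
```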
Features that are fast to compute
• “Haar-like features”
  – Differences of sums of intensity (rectangle weights -1 and +1)
  – Thousands, computed at various positions and scales within detection window
Two-rectangle features, three-rectangle features, etc.
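These rectangle sums are what makes the features fast: with an integral image (summed-area table), any box sum costs four lookups, so a two-rectangle feature costs a handful of additions regardless of its size. A minimal sketch:

```python
def integral_image(img):
    """Summed-area table with an extra zero row/column:
    ii[y][x] = sum of img over rows < y and cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def two_rect_feature(ii, y0, x0, h, w):
    """Horizontal two-rectangle Haar-like feature: right half minus left half."""
    left = box_sum(ii, y0, x0, y0 + h, x0 + w // 2)
    right = box_sum(ii, y0, x0 + w // 2, y0 + h, x0 + w)
    return right - left
```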
Feature selection with Adaboost
• Create a large pool of features (180K)
• Select features that are discriminative and work well together
  – “Weak learner” = feature + threshold
  – Choose weak learner that minimizes error on the weighted training set
  – Reweight
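One round of this loop can be sketched as below: `best_stump` scans thresholds and polarities for a single feature (AdaBoost repeats this over the whole 180K pool and keeps the single best stump per round), and `reweight` shows a Viola-Jones-style update where correctly classified examples are shrunk by an assumed factor beta < 1 and weights are renormalized:

```python
def best_stump(feature_vals, labels, weights):
    """Best (threshold, polarity) for one feature on the weighted set.
    Labels are +1/-1; prediction is +1 when polarity*f < polarity*t."""
    best = (float("inf"), None, None)
    for t in sorted(set(feature_vals)):
        for polarity in (1, -1):
            err = sum(w for f, y, w in zip(feature_vals, labels, weights)
                      if (1 if polarity * f < polarity * t else -1) != y)
            if err < best[0]:
                best = (err, t, polarity)
    return best  # (weighted error, threshold, polarity)

def reweight(weights, preds, labels, beta):
    """Down-weight correctly classified examples, then renormalize."""
    new = [w * (beta if p == y else 1.0)
           for w, p, y in zip(weights, preds, labels)]
    z = sum(new)
    return [w / z for w in new]
```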
Top 2 selected features
Viola-Jones Results
MIT + CMU face dataset
Speed = 15 FPS (in 2001)
What about pose estimation?
What about interactions?
3D modeling
Object context
From Divvala et al. CVPR 2009
Integration
• Feature level
• Margin Based
  – Max margin Structure Learning
• Probabilistic
  – Graphical Models
Feature Passing
• Compute features from one estimated scene property to help estimate another
[Figure: Image → X Estimate and Y Estimate; X Features computed from the X Estimate feed the Y Estimate, and Y Features feed the X Estimate]
Feature passing: example
[Figure: object window with the regions directly above and below it]
Use features computed from “geometric context” confidence images to improve object detection
Hoiem et al. ICCV 2005
Features: average confidence within each window
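A sketch of such window features; the inside/above/below split follows the figure, but the exact regions used in the paper may differ:

```python
def window_confidence_features(conf_map, box):
    """Average geometric-context confidence inside a window and in
    window-height strips just above and below it.
    conf_map is a 2-D list of values in [0, 1]; box is (y0, x0, y1, x1)."""
    def mean(y0, x0, y1, x1):
        vals = [conf_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        return sum(vals) / len(vals) if vals else 0.0
    y0, x0, y1, x1 = box
    h = y1 - y0
    n_rows = len(conf_map)
    return {
        "inside": mean(y0, x0, y1, x1),
        "above": mean(max(0, y0 - h), x0, y0, x1),
        "below": mean(y1, x0, min(n_rows, y1 + h), x1),
    }
```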
Scene Understanding
[Figure: scene output encoded as a 0/1 indicator vector]
Recognition using Visual Phrases, CVPR 2011
Feature Design
[Figure: spatial relations used as features: above, beside, below]
Recognition using Visual Phrases, CVPR 2011
Feature Passing
• Pros and cons
  – Simple training and inference
  – Very flexible in modeling interactions
  – Not modular: if we get a new method for first estimates, we may need to retrain
Integration
• Feature Passing
• Margin Based
  – Max margin Structure Learning
• Probabilistic
  – Graphical Models
Structured Prediction
• Prediction of complex outputs
  – Structured outputs: multivariate, correlated, constrained
• Novel, general way to solve many learning problems
Structure
[Figure: structured output encoded as a 0/1 indicator vector]
Recognition using Visual Phrases, CVPR 2011
Handwriting Recognition
brace
Sequential structure
Object Segmentation
Spatial structure
Scene Parsing
Recursive structure
Bipartite Matching
What is the anticipated cost of collecting fees under the new proposal?
En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
[Figure: word alignment between the English sentence and its French translation]
Combinatorial structure
Local Prediction
• Classify using local information
• Ignores correlations & constraints!
[Figure: independent per-letter predictions of “brace” and per-pixel labels: building, tree, shrub, ground]
Structured Prediction
• Use local information
• Exploit correlations
[Figure: jointly predicted letters of “brace” and per-pixel labels: building, tree, shrub, ground]
Structured Models
Mild assumptions:
• The scoring function is a linear combination: s(x, y) = w·f(x, y)
• f decomposes as a sum of part scores: f(x, y) = Σp f(x, yp)
• Prediction maximizes the score over the space of feasible outputs: h(x) = argmax over y ∈ Y(x) of w·f(x, y)
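Under these assumptions, prediction is just an argmax of a linear score over the feasible set. A toy sketch with a brute-force argmax and made-up per-position emission features (real structured predictors replace the enumeration with combinatorial optimization):

```python
def score(w, feats):
    """Linear score w . f(x, y): weighted sum over active features."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def predict(w, x, feasible_outputs, feature_fn):
    """h(x) = argmax over feasible y of w . f(x, y); brute force here."""
    return max(feasible_outputs, key=lambda y: score(w, feature_fn(x, y)))

# Hypothetical feature map: count (input part, output part) pairs,
# so f decomposes as a sum over sequence positions.
def feature_fn(x, y):
    f = {}
    for xi, yi in zip(x, y):
        f[(xi, yi)] = f.get((xi, yi), 0) + 1
    return f

w = {("img_b", "b"): 1.0, ("img_r", "r"): 1.0, ("img_a", "a"): 1.0}
outputs = [("b", "r", "a"), ("b", "r", "o"), ("a", "a", "a")]
```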
Supervised Structured Prediction
Learning: from data {(xi, yi)}, estimate w
• Candidate objectives: Likelihood (can be intractable), Margin, Local (ignores structure)
Prediction: y* = argmax over y ∈ Y(x) of w·f(x, y)
• Example: weighted matching; generally combinatorial optimization
Local Estimation
• Treat edges as independent decisions
• Estimate w locally, use globally
  – E.g., naïve Bayes, SVM, logistic regression
  – Cf. [Matusov et al., 03] for matchings
  – Simple and cheap
  – Not well-calibrated for the matching model
  – Ignores correlations & constraints
Conditional Likelihood Estimation
• Estimate w jointly by maximizing Σi log Pw(yi | xi), with Pw(y | x) = exp(w·f(x, y)) / Zw(x)
• The denominator Zw(x) is #P-complete to compute [Valiant 79, Jerrum & Sinclair 93]
• Tractable model, intractable learning
• Need a tractable learning method → margin-based estimation
Structured large margin estimation
• We want the true output to outscore every alternative: w·f(x, “brace”) > w·f(x, y) for all y ≠ “brace”
• Equivalently, one constraint per alternative: w·f(x, “brace”) > w·f(x, “aaaaa”), w·f(x, “brace”) > w·f(x, “aaaab”), …, w·f(x, “brace”) > w·f(x, “zzzzz”): a lot of constraints!
Structured Loss
Loss = # of letter mistakes (Hamming distance to the true word “brace”):
  “bcare” → 2, “brore” → 2, “broce” → 1, “brace” → 0
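The Hamming loss above is just a per-position mismatch count:

```python
def hamming_loss(y_true, y_pred):
    """Structured (Hamming) loss: number of positions where the
    predicted output disagrees with the true one."""
    return sum(a != b for a, b in zip(y_true, y_pred))
```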
Large margin estimation
• Given training examples (xi, yi), we want the true output to score highest: w·f(xi, yi) > w·f(xi, y) for all y ≠ yi
• Maximize margin γ: w·f(xi, yi) ≥ w·f(xi, y) + γ for all y ≠ yi
• Mistake weighted margin: w·f(xi, yi) ≥ w·f(xi, y) + γ ℓi(y), where ℓi(y) = # of mistakes in y
*Collins 02, Altun et al 03, Taskar 03
Large margin estimation
• Eliminate γ by scaling: min ½‖w‖² s.t. w·f(xi, yi) ≥ w·f(xi, y) + ℓi(y) for all y
• Add slacks ξi for the inseparable case (hinge loss): w·f(xi, yi) ≥ w·f(xi, y) + ℓi(y) − ξi
Large margin estimation
• Brute force enumeration: one constraint per alternative output y, exponentially many in total
• Min-max formulation: replace them with a single constraint per example, w·f(xi, yi) ≥ max over y of [w·f(xi, y) + ℓi(y)]
  – ‘Plug-in’ linear program for inference
Min-max formulation
Key step: the loss-augmented inference problem, max over y of [w·f(xi, y) + ℓi(y)] with a structured (Hamming) loss, is a discrete optimization; replacing it with its LP relaxation turns it into a continuous optimization.
Matching Inference LP
max Σjk sjk zjk subject to the degree constraints: Σj zjk = 1 for each target word k, Σk zjk = 1 for each source word j, zjk ≥ 0
[Figure: candidate alignment edges between the English sentence “What is the anticipated cost of collecting fees under the new proposal?” and its French translation, with j indexing English words and k French words]
Need a Hamming-like loss that decomposes over the edges.
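For intuition, a brute-force version of the matching problem, enumerating permutations instead of solving the LP (feasible only for tiny n; on bipartite matching the LP relaxation has integral optima, so it recovers the same matching at scale):

```python
from itertools import permutations

def best_matching(scores):
    """Maximum-weight bipartite matching by brute force:
    scores[j][k] is the score s_jk of matching source word j
    to target word k; returns (assignment, total score)."""
    n = len(scores)
    best_perm, best_val = None, float("-inf")
    for perm in permutations(range(n)):
        val = sum(scores[j][perm[j]] for j in range(n))
        if val > best_val:
            best_perm, best_val = perm, val
    return list(best_perm), best_val
```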
LP Duality
• Linear programming duality
  – Variables ↔ constraints
  – Constraints ↔ variables
• Optimal values are the same
  – When both feasible regions are bounded
Min-max Formulation
• Apply LP duality to the inner inference LP: the inner max becomes a min, collapsing the min-max problem into a single joint convex program
Min-max formulation summary
*Taskar et al 04