Crash Course on Machine Learning, Part IV
Several slides from Derek Hoiem and Ben Taskar
What you need to know
• Dual SVM formulation
  – How it’s derived
• The kernel trick
• Derive polynomial kernel
• Common kernels
• Kernelized logistic regression
• SVMs vs kernel regression
• SVMs vs logistic regression
Example: Dalal-Triggs pedestrian detector
1. Extract fixed-sized (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of gradient) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
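Step 4, non-maxima suppression, can be sketched as a greedy procedure: keep the highest-scoring window, discard any remaining window that overlaps it too much, and repeat. A minimal sketch in Python, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(detections, overlap_thresh=0.5):
    """Greedy non-maxima suppression over dicts with 'box' and 'score':
    keep the best-scoring window, drop overlapping lower scorers, repeat."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for d in detections:
        if all(iou(d["box"], k["box"]) < overlap_thresh for k in kept):
            kept.append(d)
    return kept
```

The overlap threshold of 0.5 is a common default, not a value from the paper.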
Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
• Tested with
  – RGB
  – LAB
  – Grayscale
RGB and LAB give slightly better performance than grayscale.
[Figure: gradient masks compared: uncentered, centered, cubic-corrected, diagonal, Sobel. The simple centered [-1, 0, 1] mask outperforms the others.]
• Histogram of gradient orientations
  – Votes weighted by magnitude
  – Bilinear interpolation between cells
Orientation: 9 bins (for unsigned angles)
Histograms in 8x8 pixel cells
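The magnitude-weighted voting can be sketched as below; this shows only the linear interpolation between orientation bins (the full descriptor also interpolates spatially between neighboring cells):

```python
import math

def cell_histogram(magnitudes, angles, n_bins=9):
    """Orientation histogram for one 8x8 cell: each pixel votes with its
    gradient magnitude, split linearly between the two nearest bin centers.
    Angles are unsigned, in [0, 180) degrees."""
    bin_width = 180.0 / n_bins      # 20 degrees per bin for 9 bins
    hist = [0.0] * n_bins
    for mag, ang in zip(magnitudes, angles):
        pos = ang / bin_width - 0.5  # fractional position between bin centers
        lo = math.floor(pos)
        frac = pos - lo
        hist[lo % n_bins] += mag * (1 - frac)    # split the vote between
        hist[(lo + 1) % n_bins] += mag * frac    # the two nearest bins
    return hist
```

The modulo wrap-around handles angles near 0 and 180, which are adjacent for unsigned gradients.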
Normalize with respect to surrounding cells
# features = 15 x 7 (# cells) x 9 (# orientations) x 4 (# normalizations by neighboring cells) = 3780
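The count works out because the 64x128 window holds 7x15 overlapping 2x2-cell block positions (one fewer than the cell count in each direction, since blocks slide one cell at a time), and each cell is re-normalized once per block it belongs to:

```python
# 64x128 window with 8x8-pixel cells -> 8x16 cells.
# Overlapping 2x2-cell blocks slide one cell at a time: 7x15 positions.
blocks_x = 64 // 8 - 1    # 7
blocks_y = 128 // 8 - 1   # 15
n_features = blocks_y * blocks_x * 9 * 4   # 9 orientation bins, 4 normalizations
print(n_features)  # -> 3780
```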
[Figure: learned pedestrian template: positive weights (pos w) and negative weights (neg w) of the linear SVM]
Detection examples
Viola-Jones sliding window detector
Fast detection through two mechanisms:
• Quickly eliminate unlikely windows
• Use features that are fast to compute
Viola and Jones. Rapid Object Detection using a Boosted Cascade of Simple Features (2001).
Cascade for Fast Detection
[Figure: cascade of classifiers. Examples enter Stage 1 (H1(x) > t1?); “No” → Reject, “Yes” → Stage 2 (H2(x) > t2?), …, Stage N (HN(x) > tN?); “Yes” at the final stage → Pass, “No” at any stage → Reject.]
• Choose threshold for low false negative rate
• Fast classifiers early in cascade
• Slow classifiers later, but most examples don’t get there
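The cascade’s control flow can be sketched as follows; the lambda stage scorers here are hypothetical stand-ins for the real boosted classifiers:

```python
def cascade_classify(x, stages, thresholds):
    """Attentional cascade: a window is rejected as soon as any stage's
    score falls at or below its threshold, so most windows exit after
    the first cheap stages and only promising ones reach the slow ones."""
    for H, t in zip(stages, thresholds):
        if H(x) <= t:
            return False   # reject early
    return True            # passed every stage

# Toy usage with two made-up stage scorers:
stages = [lambda x: x, lambda x: x * 0.5]
thresholds = [1.0, 2.0]
```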
Features that are fast to compute
• “Haar-like features”
  – Differences of sums of intensity (rectangle weights -1 and +1)
  – Thousands, computed at various positions and scales within detection window
Two-rectangle features, three-rectangle features, etc.
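These rectangle sums are what makes the features fast: with an integral image (summed-area table), any box sum costs four lookups, so a two-rectangle feature costs a handful of additions regardless of its size. A minimal sketch:

```python
def integral_image(img):
    """Summed-area table with an extra zero row/column:
    ii[y][x] = sum of img over rows < y and cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def two_rect_feature(ii, y0, x0, h, w):
    """Horizontal two-rectangle Haar-like feature: right half minus left half."""
    left = box_sum(ii, y0, x0, y0 + h, x0 + w // 2)
    right = box_sum(ii, y0, x0 + w // 2, y0 + h, x0 + w)
    return right - left
```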
Feature selection with Adaboost
• Create a large pool of features (180K)
• Select features that are discriminative and work well together
  – “Weak learner” = feature + threshold
  – Choose weak learner that minimizes error on the weighted training set
  – Reweight
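One round of this loop can be sketched as below: `best_stump` scans thresholds and polarities for a single feature (AdaBoost repeats this over the whole 180K pool and keeps the single best stump per round), and `reweight` shows a Viola-Jones-style update where correctly classified examples are shrunk by an assumed factor beta < 1 and weights are renormalized:

```python
def best_stump(feature_vals, labels, weights):
    """Best (threshold, polarity) for one feature on the weighted set.
    Labels are +1/-1; prediction is +1 when polarity*f < polarity*t."""
    best = (float("inf"), None, None)
    for t in sorted(set(feature_vals)):
        for polarity in (1, -1):
            err = sum(w for f, y, w in zip(feature_vals, labels, weights)
                      if (1 if polarity * f < polarity * t else -1) != y)
            if err < best[0]:
                best = (err, t, polarity)
    return best  # (weighted error, threshold, polarity)

def reweight(weights, preds, labels, beta):
    """Down-weight correctly classified examples, then renormalize."""
    new = [w * (beta if p == y else 1.0)
           for w, p, y in zip(weights, preds, labels)]
    z = sum(new)
    return [w / z for w in new]
```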
Top 2 selected features
Viola-Jones Results
MIT + CMU face dataset
Speed = 15 FPS (in 2001)
What about pose estimation?
What about interactions?
3D modeling
Object context
From Divvala et al. CVPR 2009
Integration
• Feature level
• Margin Based
  – Max margin Structure Learning
• Probabilistic
  – Graphical Models
Feature Passing
• Compute features from one estimated scene property to help estimate another
[Figure: Image → X Estimate and Y Estimate; X Features computed from the X Estimate feed the Y Estimate, and Y Features feed the X Estimate]
Feature passing: example
[Figure: object window with the regions directly above and below it]
Use features computed from “geometric context” confidence images to improve object detection
Hoiem et al. ICCV 2005
Features: average confidence within each window
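A sketch of such window features; the inside/above/below split follows the figure, but the exact regions used in the paper may differ:

```python
def window_confidence_features(conf_map, box):
    """Average geometric-context confidence inside a window and in
    window-height strips just above and below it.
    conf_map is a 2-D list of values in [0, 1]; box is (y0, x0, y1, x1)."""
    def mean(y0, x0, y1, x1):
        vals = [conf_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        return sum(vals) / len(vals) if vals else 0.0
    y0, x0, y1, x1 = box
    h = y1 - y0
    n_rows = len(conf_map)
    return {
        "inside": mean(y0, x0, y1, x1),
        "above": mean(max(0, y0 - h), x0, y0, x1),
        "below": mean(y1, x0, min(n_rows, y1 + h), x1),
    }
```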
Scene Understanding
[Figure: scene output encoded as a 0/1 indicator vector]
Recognition using Visual Phrases, CVPR 2011
Feature Design
[Figure: spatial relations used as features: above, beside, below]
Recognition using Visual Phrases, CVPR 2011
Feature Passing
• Pros and cons
  – Simple training and inference
  – Very flexible in modeling interactions
  – Not modular: if we get a new method for first estimates, we may need to retrain
Integration
• Feature Passing
• Margin Based
  – Max margin Structure Learning
• Probabilistic
  – Graphical Models
Structured Prediction
• Prediction of complex outputs
  – Structured outputs: multivariate, correlated, constrained
• Novel, general way to solve many learning problems
Structure
[Figure: structured output encoded as a 0/1 indicator vector]
Recognition using Visual Phrases, CVPR 2011
Handwriting Recognition
brace
Sequential structure
Object Segmentation
Spatial structure
Scene Parsing
Recursive structure
Bipartite Matching
What is the anticipated cost of collecting fees under the new proposal?
En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
[Figure: word alignment between the English sentence and its French translation]
Combinatorial structure
Local Prediction
• Classify using local information
• Ignores correlations & constraints!
[Figure: independent per-letter predictions of “brace” and per-pixel labels: building, tree, shrub, ground]
Structured Prediction
• Use local information
• Exploit correlations
[Figure: jointly predicted letters of “brace” and per-pixel labels: building, tree, shrub, ground]
Structured Models
Mild assumptions:
• The scoring function is a linear combination: s(x, y) = w·f(x, y)
• f decomposes as a sum of part scores: f(x, y) = Σp f(x, yp)
• Prediction maximizes the score over the space of feasible outputs: h(x) = argmax over y ∈ Y(x) of w·f(x, y)
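Under these assumptions, prediction is just an argmax of a linear score over the feasible set. A toy sketch with a brute-force argmax and made-up per-position emission features (real structured predictors replace the enumeration with combinatorial optimization):

```python
def score(w, feats):
    """Linear score w . f(x, y): weighted sum over active features."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def predict(w, x, feasible_outputs, feature_fn):
    """h(x) = argmax over feasible y of w . f(x, y); brute force here."""
    return max(feasible_outputs, key=lambda y: score(w, feature_fn(x, y)))

# Hypothetical feature map: count (input part, output part) pairs,
# so f decomposes as a sum over sequence positions.
def feature_fn(x, y):
    f = {}
    for xi, yi in zip(x, y):
        f[(xi, yi)] = f.get((xi, yi), 0) + 1
    return f

w = {("img_b", "b"): 1.0, ("img_r", "r"): 1.0, ("img_a", "a"): 1.0}
outputs = [("b", "r", "a"), ("b", "r", "o"), ("a", "a", "a")]
```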
Supervised Structured Prediction
Learning: from data {(xi, yi)}, estimate w
• Candidate objectives: Likelihood (can be intractable), Margin, Local (ignores structure)
Prediction: y* = argmax over y ∈ Y(x) of w·f(x, y)
• Example: weighted matching; generally combinatorial optimization
Local Estimation
• Treat edges as independent decisions
• Estimate w locally, use globally
  – E.g., naïve Bayes, SVM, logistic regression
  – Cf. [Matusov et al., 03] for matchings
  – Simple and cheap
  – Not well-calibrated for the matching model
  – Ignores correlations & constraints
Conditional Likelihood Estimation
• Estimate w jointly by maximizing Σi log Pw(yi | xi), with Pw(y | x) = exp(w·f(x, y)) / Zw(x)
• The denominator Zw(x) is #P-complete to compute [Valiant 79, Jerrum & Sinclair 93]
• Tractable model, intractable learning
• Need a tractable learning method → margin-based estimation
Structured large margin estimation
• We want the true output to outscore every alternative: w·f(x, “brace”) > w·f(x, y) for all y ≠ “brace”
• Equivalently, one constraint per alternative: w·f(x, “brace”) > w·f(x, “aaaaa”), w·f(x, “brace”) > w·f(x, “aaaab”), …, w·f(x, “brace”) > w·f(x, “zzzzz”): a lot of constraints!
Structured Loss
Loss = # of letter mistakes (Hamming distance to the true word “brace”):
  “bcare” → 2, “brore” → 2, “broce” → 1, “brace” → 0
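The Hamming loss above is just a per-position mismatch count:

```python
def hamming_loss(y_true, y_pred):
    """Structured (Hamming) loss: number of positions where the
    predicted output disagrees with the true one."""
    return sum(a != b for a, b in zip(y_true, y_pred))
```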
Large margin estimation
• Given training examples (xi, yi), we want the true output to score highest: w·f(xi, yi) > w·f(xi, y) for all y ≠ yi
• Maximize margin γ: w·f(xi, yi) ≥ w·f(xi, y) + γ for all y ≠ yi
• Mistake weighted margin: w·f(xi, yi) ≥ w·f(xi, y) + γ ℓi(y), where ℓi(y) = # of mistakes in y
*Collins 02, Altun et al 03, Taskar 03
Large margin estimation
• Eliminate γ by scaling: min ½‖w‖² s.t. w·f(xi, yi) ≥ w·f(xi, y) + ℓi(y) for all y
• Add slacks ξi for the inseparable case (hinge loss): w·f(xi, yi) ≥ w·f(xi, y) + ℓi(y) − ξi
Large margin estimation
• Brute force enumeration: one constraint per alternative output y, exponentially many in total
• Min-max formulation: replace them with a single constraint per example, w·f(xi, yi) ≥ max over y of [w·f(xi, y) + ℓi(y)]
  – ‘Plug-in’ linear program for inference
Min-max formulation
Key step: the loss-augmented inference problem, max over y of [w·f(xi, y) + ℓi(y)] with a structured (Hamming) loss, is a discrete optimization; replacing it with its LP relaxation turns it into a continuous optimization.
Matching Inference LP
max Σjk sjk zjk subject to the degree constraints: Σj zjk = 1 for each target word k, Σk zjk = 1 for each source word j, zjk ≥ 0
[Figure: candidate alignment edges between the English sentence “What is the anticipated cost of collecting fees under the new proposal?” and its French translation, with j indexing English words and k French words]
Need a Hamming-like loss that decomposes over the edges.
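For intuition, a brute-force version of the matching problem, enumerating permutations instead of solving the LP (feasible only for tiny n; on bipartite matching the LP relaxation has integral optima, so it recovers the same matching at scale):

```python
from itertools import permutations

def best_matching(scores):
    """Maximum-weight bipartite matching by brute force:
    scores[j][k] is the score s_jk of matching source word j
    to target word k; returns (assignment, total score)."""
    n = len(scores)
    best_perm, best_val = None, float("-inf")
    for perm in permutations(range(n)):
        val = sum(scores[j][perm[j]] for j in range(n))
        if val > best_val:
            best_perm, best_val = perm, val
    return list(best_perm), best_val
```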
LP Duality
• Linear programming duality
  – Variables ↔ constraints
  – Constraints ↔ variables
• Optimal values are the same
  – When both feasible regions are bounded
Min-max Formulation
• Apply LP duality to the inner inference LP: the inner max becomes a min, collapsing the min-max problem into a single joint convex program
Min-max formulation summary
*Taskar et al 04