Crash Course on Machine Learning Part IV

Several slides from Derek Hoiem and Ben Taskar

Page 1: Crash Course on Machine Learning Part IV

Crash Course on Machine Learning, Part IV

Several slides from Derek Hoiem and Ben Taskar

Page 2: Crash Course on Machine Learning Part IV

What you need to know

• Dual SVM formulation
  – How it's derived
• The kernel trick
• Derive polynomial kernel
• Common kernels
• Kernelized logistic regression
• SVMs vs. kernel regression
• SVMs vs. logistic regression

Page 3: Crash Course on Machine Learning Part IV

Example: Dalal-Triggs pedestrian detector

1. Extract a fixed-size (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of oriented gradients) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maximum suppression to remove overlapping detections with lower scores

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
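To make the pipeline concrete, here is a minimal single-scale sketch, assuming scikit-image's hog and a scikit-learn LinearSVC; the trained classifier clf and the final non-maximum suppression step are assumptions, not part of the slides.

```python
# Minimal Dalal-Triggs-style detection sketch (single scale shown).
# Assumes `clf` is a LinearSVC already trained on 64x128 HOG windows.
import numpy as np
from skimage.feature import hog

def hog_descriptor(window):
    # 128x64 (rows x cols) grayscale window -> 3780-D HOG feature vector
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

def detect(image, clf, step=8, thresh=0.0):
    H, W = image.shape
    hits = []
    for r in range(0, H - 128 + 1, step):       # every window position...
        for c in range(0, W - 64 + 1, step):
            f = hog_descriptor(image[r:r + 128, c:c + 64])
            score = clf.decision_function([f])[0]
            if score > thresh:
                hits.append((r, c, score))
    return hits  # step 4 (non-maximum suppression) would prune these
```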

Page 4: Crash Course on Machine Learning Part IV

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Page 5: Crash Course on Machine Learning Part IV

• Tested with
  – RGB
  – LAB
  – Grayscale

RGB and LAB give slightly better performance than grayscale.

Page 6: Crash Course on Machine Learning Part IV

Gradient filters tested: uncentered, centered, cubic-corrected, diagonal, Sobel. The simple centered [-1 0 1] mask outperforms the alternatives.

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Page 7: Crash Course on Machine Learning Part IV

• Histogram of gradient orientations
  – Votes weighted by gradient magnitude
  – Bilinear interpolation between cells
• Orientation: 9 bins (for unsigned angles)
• Histograms in 8x8 pixel cells

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
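A minimal numpy sketch of the per-cell histogram just described: 9 unsigned orientation bins over an 8x8 cell, with votes weighted by gradient magnitude (the bilinear interpolation between cells and bins is omitted for brevity).

```python
# One HOG cell histogram: 9 unsigned bins, magnitude-weighted votes.
import numpy as np

def cell_histogram(cell):                     # cell: 8x8 grayscale patch
    gy, gx = np.gradient(cell.astype(float))  # finite-difference gradients
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned angle in [0, 180)
    hist = np.zeros(9)
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // 20) % 9] += m           # 20-degree bins, weighted by magnitude
    return hist
```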

Page 8: Crash Course on Machine Learning Part IV

Normalize with respect to surrounding cells

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Page 9: Crash Course on Machine Learning Part IV

[Figure: the window's cell histograms concatenated into one feature vector X]

# features = 15 x 7 (# cells) x 9 (# orientations) x 4 (# normalizations by neighboring cells) = 3780

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
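The count can be sanity-checked against scikit-image's hog (an outside library, not part of the slides): in its accounting, the same 3780 arises as 15x7 block positions x (2x2 cells per block) x 9 orientations.

```python
# Verify the 3780-D descriptor size for a 64x128 window.
import numpy as np
from skimage.feature import hog

window = np.zeros((128, 64))   # 64x128 detection window (rows x cols)
f = hog(window, orientations=9, pixels_per_cell=(8, 8),
        cells_per_block=(2, 2), block_norm='L2-Hys')
print(f.shape)                 # (3780,) = 15 x 7 x (2 x 2) x 9
```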

Page 10: Crash Course on Machine Learning Part IV

[Figure: visualization of the positive (pos w) and negative (neg w) components of the learned SVM weight vector]

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Page 11: Crash Course on Machine Learning Part IV

[Figure: learned HOG template applied to a test window, detected as "pedestrian"]

Slides by Pete Barnum. Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005

Page 12: Crash Course on Machine Learning Part IV

Detection examples

Page 13: Crash Course on Machine Learning Part IV

Viola-Jones sliding window detector

Fast detection through two mechanisms:
• Quickly eliminate unlikely windows
• Use features that are fast to compute

Viola and Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, 2001

Page 14: Crash Course on Machine Learning Part IV

Cascade for Fast Detection

[Diagram: Examples → Stage 1: H1(x) > t1? — No → Reject; Yes → Stage 2: H2(x) > t2? — ... → Stage N: HN(x) > tN? — Yes → Pass]

• Choose each threshold for a low false negative rate
• Fast classifiers early in the cascade
• Slow classifiers later, but most examples don't get there (control flow sketched below)
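A minimal sketch of the cascade's control flow; `stages` is a hypothetical list of (scorer, threshold) pairs, not the paper's API.

```python
# Attentional cascade: a window must clear every stage to be accepted,
# and most windows are rejected by the first, cheapest stages.
def cascade_classify(x, stages):
    for H, t in stages:          # stages ordered fast -> slow
        if H(x) <= t:            # thresholds tuned for low false-negative rate
            return False         # reject immediately; later stages never run
    return True                  # passed all N stages
```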

Page 15: Crash Course on Machine Learning Part IV

Features that are fast to compute

• "Haar-like features"
  – Differences of sums of intensity
  – Thousands, computed at various positions and scales within the detection window

[Figure: two-rectangle features (a -1 region beside a +1 region), three-rectangle features, etc.]
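These features are fast because any rectangle sum costs a few lookups in an integral image. A sketch; the helper names and the left/right -1/+1 layout follow the figure, but are mine, not Viola-Jones code.

```python
# Integral image: rectangle sums in O(1), so a Haar-like feature costs
# a handful of lookups regardless of its size.
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image ii."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r0, c0, h, w):
    """Right half (+1) minus left half (-1), as in the slide's template."""
    left = rect_sum(ii, r0, c0, r0 + h, c0 + w // 2)
    right = rect_sum(ii, r0, c0 + w // 2, r0 + h, c0 + w)
    return right - left
```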

Page 16: Crash Course on Machine Learning Part IV

Feature selection with AdaBoost

• Create a large pool of features (180K)
• Select features that are discriminative and work well together
  – "Weak learner" = feature + threshold
  – Choose the weak learner that minimizes error on the weighted training set
  – Reweight (one round is sketched below)
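A sketch of a single boosting round over a pool of decision stumps; F, y, and w are hypothetical inputs (per-example feature values, ±1 labels, example weights), and the exhaustive threshold search is written for clarity, not speed.

```python
# One AdaBoost round: pick the (feature, threshold, polarity) stump with
# lowest weighted error, then reweight the examples.
import numpy as np

def adaboost_round(F, y, w):
    best = None
    for j in range(F.shape[1]):              # each candidate feature...
        for t in np.unique(F[:, j]):         # ...each candidate threshold
            for s in (1, -1):                # ...each polarity
                pred = np.where(F[:, j] > t, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s, pred)
    err, j, t, s, pred = best
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w = w * np.exp(-alpha * y * pred)        # upweight the mistakes
    return (j, t, s, alpha), w / w.sum()
```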

Page 17: Crash Course on Machine Learning Part IV

Top 2 selected features

Page 18: Crash Course on Machine Learning Part IV

Viola-Jones Results

MIT + CMU face dataset

Speed = 15 FPS (in 2001)

Page 19: Crash Course on Machine Learning Part IV

What about pose estimation?

Page 20: Crash Course on Machine Learning Part IV

What about interactions?

Page 21: Crash Course on Machine Learning Part IV

3D modeling

Page 22: Crash Course on Machine Learning Part IV

Object context

From Divvala et al. CVPR 2009

Page 23: Crash Course on Machine Learning Part IV

Integration

• Feature level
• Margin based
  – Max-margin structure learning
• Probabilistic
  – Graphical models

Page 24: Crash Course on Machine Learning Part IV

Integration

• Feature level
• Margin based
  – Max-margin structure learning
• Probabilistic
  – Graphical models

Page 25: Crash Course on Machine Learning Part IV

Feature Passing

• Compute features from one estimated scene property to help estimate another

[Diagram: Image → X features → X estimate and Y features → Y estimate, with each estimate also contributing features to the other]

Page 26: Crash Course on Machine Learning Part IV

Feature passing: example

Use features computed from "geometric context" confidence images to improve object detection.

Features: average confidence within each window (the object window and the regions above and below it); a sketch follows.

Hoiem et al. ICCV 2005
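A sketch of that window feature: the mean of a confidence map inside a window, in O(1) per window via an integral image. The function name and interface are mine, not the paper's.

```python
# Mean "geometric context" confidence inside a window [r0:r1, c0:c1].
import numpy as np

def mean_confidence(conf, r0, c0, r1, c1):
    ii = conf.cumsum(axis=0).cumsum(axis=1)   # integral image of confidences
    pad = np.pad(ii, ((1, 0), (1, 0)))        # zero border simplifies indexing
    s = pad[r1, c1] - pad[r0, c1] - pad[r1, c0] + pad[r0, c0]
    return s / ((r1 - r0) * (c1 - c0))        # average confidence in the window
```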

Page 27: Crash Course on Machine Learning Part IV

Scene Understanding

[Figure: an image annotated with a 0/1 indicator vector over candidate labels]

Recognition Using Visual Phrases, CVPR 2011

Page 28: Crash Course on Machine Learning Part IV

Feature Design

[Figure: spatial relations between detections: Above, Beside, Below]

Recognition Using Visual Phrases, CVPR 2011

Page 29: Crash Course on Machine Learning Part IV

Feature Passing

• Pros and cons
  – Simple training and inference
  – Very flexible in modeling interactions
  – Not modular: if we get a new method for the first estimates, we may need to retrain

Page 30: Crash Course on Machine Learning Part IV

Integration

• Feature passing
• Margin based
  – Max-margin structure learning
• Probabilistic
  – Graphical models

Page 31: Crash Course on Machine Learning Part IV

Structured Prediction

• Prediction of complex outputs
  – Structured outputs: multivariate, correlated, constrained
• A novel, general way to solve many learning problems

Page 32: Crash Course on Machine Learning Part IV

Structure

[Figure: an image annotated with a 0/1 indicator vector over candidate labels]

Recognition Using Visual Phrases, CVPR 2011

Page 33: Crash Course on Machine Learning Part IV

Handwriting Recognition

[Figure: handwritten image x → label sequence y = "brace"]

Sequential structure

Page 34: Crash Course on Machine Learning Part IV

Object Segmentation

[Figure: image x → per-pixel segmentation labels y]

Spatial structure

Page 35: Crash Course on Machine Learning Part IV

Scene Parsing

Recursive structure

Page 36: Crash Course on Machine Learning Part IV

Bipartite Matching

x: "What is the anticipated cost of collecting fees under the new proposal?"
y: "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?"

[Figure: word-by-word alignment between the English sentence and its French translation]

Combinatorial structure

Page 37: Crash Course on Machine Learning Part IV

Local Prediction

Classify using local information. Ignores correlations & constraints!

[Figure: each letter of "brace" classified independently]

Page 38: Crash Course on Machine Learning Part IV

Local Prediction

[Figure: per-pixel labels (building, tree, shrub, ground) predicted independently]

Page 39: Crash Course on Machine Learning Part IV

Structured Prediction

• Use local information
• Exploit correlations

[Figure: the letters of "brace" predicted jointly]

Page 40: Crash Course on Machine Learning Part IV

Structured Prediction

[Figure: per-pixel labels (building, tree, shrub, ground) predicted jointly]

Page 41: Crash Course on Machine Learning Part IV

Structured Models

Mild assumptions:

• The scoring function is a linear combination: $s(x, y) = \mathbf{w}^\top \mathbf{f}(x, y)$
• The score decomposes into a sum of part scores: $\mathbf{w}^\top \mathbf{f}(x, y) = \sum_p \mathbf{w}^\top \mathbf{f}_p(x, y_p)$
• Prediction searches the space of feasible outputs $\mathcal{Y}(x)$: $\hat{y} = \arg\max_{y \in \mathcal{Y}(x)} \mathbf{w}^\top \mathbf{f}(x, y)$
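When the parts are adjacent-position pairs, as in the handwriting example's sequential structure, this argmax is dynamic programming rather than brute-force search over all outputs. A minimal Viterbi sketch over hypothetical per-position and transition score tables (not the deck's learned weights):

```python
# Viterbi: argmax over K^L sequences in O(L * K^2) when the score
# decomposes into unary (per-position) and pairwise (transition) parts.
import numpy as np

def viterbi(unary, pairwise):
    """unary: (L, K) per-position scores; pairwise: (K, K) transition scores."""
    L, K = unary.shape
    score, back = unary[0].copy(), np.zeros((L, K), dtype=int)
    for i in range(1, L):
        cand = score[:, None] + pairwise      # (prev label, current label)
        back[i] = cand.argmax(axis=0)         # best predecessor per label
        score = cand.max(axis=0) + unary[i]
    y = [int(score.argmax())]
    for i in range(L - 1, 0, -1):             # trace back the best path
        y.append(int(back[i][y[-1]]))
    return y[::-1]                            # highest-scoring feasible output
```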

Page 42: Crash Course on Machine Learning Part IV

Supervised Structured Prediction

Learning: from data, estimate w in the model $\mathbf{w}^\top \mathbf{f}(x, y)$, via
• Local (ignores structure)
• Likelihood (can be intractable)
• Margin

Prediction: $\arg\max_{y \in \mathcal{Y}(x)} \mathbf{w}^\top \mathbf{f}(x, y)$ — generally combinatorial optimization (example: weighted matching)

Page 43: Crash Course on Machine Learning Part IV

Local Estimation

• Treat edges as independent decisions
• Estimate w locally, use globally
  – E.g., naive Bayes, SVM, logistic regression
  – Cf. [Matusov et al., 03] for matchings
• Simple and cheap
• Not well-calibrated for the matching model
• Ignores correlations & constraints

Page 44: Crash Course on Machine Learning Part IV

Conditional Likelihood Estimation

• Estimate w jointly: $P_{\mathbf{w}}(y \mid x) = \dfrac{\exp(\mathbf{w}^\top \mathbf{f}(x, y))}{\sum_{y' \in \mathcal{Y}(x)} \exp(\mathbf{w}^\top \mathbf{f}(x, y'))}$
• The denominator (partition function) is #P-complete for matchings [Valiant 79, Jerrum & Sinclair 93]
• Tractable model, intractable learning
• Need a tractable learning method → margin-based estimation

Page 45: Crash Course on Machine Learning Part IV

Structured large margin estimation

• We want: $\mathbf{w}^\top \mathbf{f}(x, \text{"brace"}) > \mathbf{w}^\top \mathbf{f}(x, y)$ for every other output $y$ ("aaaaa", "aaaab", ..., "zzzzz")
• Equivalently: $\mathbf{w}^\top \mathbf{f}(x, \text{"brace"}) > \max_{y \neq \text{"brace"}} \mathbf{w}^\top \mathbf{f}(x, y)$ — and for outputs that are very wrong, the gap should be bigger, a lot!

Page 46: Crash Course on Machine Learning Part IV

Structured Loss

Hamming distance from the true word "b r a c e":
• b c a r e → 2
• b r o r e → 2
• b r o c e → 1
• b r a c e → 0
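A quick check of those numbers, in a couple of lines of Python:

```python
# Structured (Hamming) loss: count positions where the prediction differs.
def hamming(y_true, y_pred):
    return sum(a != b for a, b in zip(y_true, y_pred))

for y in ("bcare", "brore", "broce", "brace"):
    print(y, hamming("brace", y))   # 2, 2, 1, 0
```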

Page 47: Crash Course on Machine Learning Part IV

Large margin estimation

• Given training examples $(x_i, y_i)$, we want: $\mathbf{w}^\top \mathbf{f}(x_i, y_i) \ge \mathbf{w}^\top \mathbf{f}(x_i, y) + \gamma$ for all $y \neq y_i$; maximize the margin $\gamma$
• Mistake-weighted margin: require $\gamma\,\ell(y_i, y)$, where $\ell(y_i, y)$ is the number of mistakes in $y$

*Collins 02, Altun et al. 03, Taskar 03

Page 48: Crash Course on Machine Learning Part IV

Large margin estimation

• Eliminate the margin variable $\gamma$ by fixing the scale of $\mathbf{w}$
• Add slacks $\xi_i$ for the inseparable case (hinge loss):

$\min_{\mathbf{w}, \xi} \; \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad \mathbf{w}^\top \mathbf{f}(x_i, y_i) \ge \mathbf{w}^\top \mathbf{f}(x_i, y) + \ell(y_i, y) - \xi_i \quad \forall i, y$
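The deck develops this QP via the min-max formulation on the next slides. As a simpler point of comparison, the structured perceptron of Collins 02 (cited on the previous slide) drives the same kind of constraints with plain additive updates; features and argmax_y here are assumed helpers, not the deck's notation.

```python
# Structured perceptron sketch: predict with the current w, then move w
# toward the gold features and away from the predicted ones.
import numpy as np

def structured_perceptron(data, features, argmax_y, dim, epochs=10):
    """data: [(x, y_gold)]; argmax_y(w, x) runs inference under w."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = argmax_y(w, x)            # best output under current w
            if y_hat != y_gold:               # mistake: update toward gold
                w += features(x, y_gold) - features(x, y_hat)
    return w
```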

Page 49: Crash Course on Machine Learning Part IV

Large margin estimation

• Brute-force enumeration: one constraint per alternative output $y$, which is exponentially many
• Min-max formulation: replace them with a single constraint against the highest-scoring alternative
  – 'Plug in' a linear program for inference

Page 50: Crash Course on Machine Learning Part IV

Min-max formulation

Key step: the exponentially many constraints hold iff the single worst case does,

$\mathbf{w}^\top \mathbf{f}(x_i, y_i) + \xi_i \ge \max_{y}\,\left[\mathbf{w}^\top \mathbf{f}(x_i, y) + \ell(y_i, y)\right]$

with structured (Hamming) loss $\ell$. The inner max is inference, a discrete optimization; replacing it with its LP relaxation makes it a continuous optimization.

Page 51: Crash Course on Machine Learning Part IV

Matching Inference LP

$\max_{z \ge 0} \; \sum_{j,k} s_{jk}\, z_{jk} \quad \text{s.t.} \quad \sum_k z_{jk} \le 1 \;\;\forall j, \quad \sum_j z_{jk} \le 1 \;\;\forall k \quad \text{(degree constraints)}$

[Figure: candidate alignment edges $z_{jk}$ between the English words ("What is the anticipated cost of collecting fees under the new proposal?") and the French words ("En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?")]

Need a Hamming-like loss.
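For bipartite matching this LP has integral optima, so an off-the-shelf combinatorial solver recovers the same argmax. A sketch using scipy's Hungarian-method solver (an assumption; the slides don't prescribe a solver, and this variant forces a complete one-to-one matching rather than allowing unaligned words):

```python
# Matching inference via the Hungarian method on a score matrix s_jk.
import numpy as np
from scipy.optimize import linear_sum_assignment

scores = np.random.rand(6, 6)                 # s_jk for 6 source, 6 target words
rows, cols = linear_sum_assignment(-scores)   # negate: the solver minimizes cost
print(list(zip(rows, cols)))                  # one-to-one alignment j -> k
```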

Page 52: Crash Course on Machine Learning Part IV

LP Duality

• Linear programming duality
  – Variables ↔ constraints
  – Constraints ↔ variables
• Optimal values are the same
  – When both feasible regions are bounded
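In symbols, the standard primal-dual pair behind these bullets:

```latex
\max_{z}\; c^\top z \ \ \text{s.t.}\ \ Az \le b,\ z \ge 0
\qquad\longleftrightarrow\qquad
\min_{\lambda}\; b^\top \lambda \ \ \text{s.t.}\ \ A^\top \lambda \ge c,\ \lambda \ge 0
```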

Page 53: Crash Course on Machine Learning Part IV

Min-max Formulation

By LP duality, the inner inference LP (a max) can be replaced by its dual (a min), turning the min-max problem into a single joint minimization over w, the slacks, and the LP dual variables.

Page 54: Crash Course on Machine Learning Part IV

Min-max formulation summary

*Taskar et al. 04