Recognition II Ali Farhadi


Page 1:

Recognition II

Ali Farhadi

Page 2:

We have talked about

• Nearest Neighbor
• Naïve Bayes
• Logistic Regression
• Boosting

Page 3:

Support Vector Machines

Page 4:

Linear classifiers – Which line is better?

[Figure: data points, with example i highlighted, and several candidate dividing lines.]

w.x = Σj w(j) x(j)
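As a small illustration of this linear score, here is a minimal sketch (the weights, bias, and points are hypothetical, not from the slides) that computes w.x + b and reads off which side of the dividing line each point falls on:

```python
import numpy as np

# Hypothetical 2-D example: weights w, bias b, and three data points.
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[3.0, 1.0],    # w.x + b = 1.5  -> positive side
              [0.0, 0.25],   # w.x + b = 0.0  -> on the dividing line
              [-1.0, 2.0]])  # w.x + b = -4.5 -> negative side

scores = X @ w + b        # w.x = sum_j w(j) x(j), plus the bias b
sides = np.sign(scores)   # which side of the line w.x + b = 0
print(scores, sides)
```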

Page 5:

Pick the one with the largest margin!

w.x = Σj w(j) x(j)

[Figure: the dividing line w.x + b = 0, with w.x + b > 0 (and >> 0) on one side, w.x + b < 0 (and << 0) on the other, and margin γ on each side of the line.]

Margin γ: the height of the w.x + b plane at each point; it increases with distance from the dividing line.

Max Margin: two equivalent forms, (1) and (2) [equations shown on the slide].
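The two forms themselves appear only as images on the slide, so they are not preserved in this transcript. The first is presumably the direct margin maximization over γ, w, b (the quantity the next slide asks whether we really want to maximize); a standard way to write it is below, and the second form, obtained by fixing canonical hyperplanes at ±1, is derived on pages 8-9.

```latex
(1)\qquad \max_{\gamma,\,w,\,b}\ \gamma
\qquad\text{s.t.}\qquad y_j\,\bigl(w \cdot x_j + b\bigr) \;\ge\; \gamma \quad \forall j
```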

Page 6:

How many possible solutions?

w.x + b = 0

Any other ways of writing the same dividing line?
• w.x + b = 0
• 2w.x + 2b = 0
• 1000w.x + 1000b = 0
• ...
• Any constant scaling has the same intersection with the z = 0 plane, so it gives the same dividing line!

Do we really want to max γ,w,b?

Page 7:

Review: Normal to a plane

w.x + b = 0

Key Terms

-- projection of xj onto w

-- unit vector normal to the plane (in the direction of w)
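The formulas for these key terms are images on the slide; in standard notation (my reconstruction, consistent with the plane w.x + b = 0) they are:

```latex
\frac{w}{\|w\|} \quad\text{-- unit vector normal to the plane (in the direction of } w\text{)},
\qquad
\frac{w \cdot x_j}{\|w\|} \quad\text{-- signed length of the projection of } x_j \text{ onto } w.
```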

Page 8:

Idea: constrained margin

w.x + b = +1
w.x + b = 0
w.x + b = -1

Assume: x+ on the positive (+1) line, x- on the negative (-1) line.

Generally: margin γ [equation shown on the slide]

Final result: can maximize the constrained margin by minimizing ||w||²!!!
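The "Generally" equation is an image on the slide; the standard derivation it summarizes, under the stated assumption that x+ sits on the +1 line and x- on the -1 line with x+ = x- + 2γ w/||w||, is:

```latex
w \cdot x^{+} + b = +1,\qquad w \cdot x^{-} + b = -1,\qquad x^{+} = x^{-} + 2\gamma\,\frac{w}{\|w\|}
\;\;\Longrightarrow\;\; 2\gamma\,\frac{w \cdot w}{\|w\|} = 2
\;\;\Longrightarrow\;\; \gamma = \frac{1}{\|w\|}
```

so maximizing γ is exactly minimizing ||w|| (equivalently ||w||²), as stated above.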

Page 9:

Max margin using canonical hyperplanes

The assumption of canonical hyperplanes (at +1 and -1) changes the objective and the constraints!

[Figure: canonical hyperplanes w.x + b = +1 and w.x + b = -1 on either side of the dividing line w.x + b = 0, with points x+ and x- and margin γ.]
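With the canonical hyperplanes fixed at ±1, the resulting optimization (a standard form; the slide's own equation is not preserved in the transcript) is:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\,\|w\|^{2}
\qquad\text{s.t.}\qquad y_j\,\bigl(w \cdot x_j + b\bigr) \;\ge\; 1 \quad \forall j
```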

Page 10:

Support vector machines (SVMs)

• Solve efficiently by quadratic programming (QP)
  – Well-studied solution algorithms
  – Not simple gradient ascent, but close

• Hyperplane defined by support vectors
  – Could use them as a lower-dimensional basis to write down the line, although we haven't seen how yet
  – More on this later

[Figure: canonical lines w.x + b = +1, w.x + b = 0, w.x + b = -1, with the margin of width 2/||w|| between them.]

Support Vectors:
• data points on the canonical lines

Non-support Vectors:
• everything else
• moving them will not change w
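A minimal sketch, assuming scikit-learn is available (the library choice and toy data are mine, not the slides'), showing that an off-the-shelf QP-based solver exposes exactly the support vectors described above:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two separable clusters in 2-D.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e3)   # large C: (nearly) hard margin, solved internally as a QP
clf.fit(X, y)

# The hyperplane is defined by the support vectors (points on the canonical lines).
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
```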

Page 11:

What if the data is not linearly separable?

Add More Features!!!

What about overfitting?

Page 12:

What if the data is still not linearly separable?

• First Idea: Jointly minimize w.w and the number of training mistakes
  – How to trade off the two criteria?
  – Pick C on a development / cross-validation set

• Trade off #(mistakes) and w.w
  – 0/1 loss
  – Not a QP anymore
  – Also doesn't distinguish near misses from really bad mistakes
  – NP-hard to find the optimal solution!!!

Objective: minimize w.w + C #(mistakes)

Page 13:

Slack variables – Hinge loss

For each data point:
• If margin ≥ 1, don't care
• If margin < 1, pay linear penalty

[Figure: lines w.x + b = +1, w.x + b = 0, w.x + b = -1, with slack ξj marked for the points that violate the margin.]

minimize w.w + C Σj ξj
subject to yj (w.xj + b) ≥ 1 - ξj and ξj ≥ 0 for all j

Slack Penalty C > 0:
• C = ∞: have to separate the data!
• C = 0: ignore the data entirely!
• Select on a dev. set, etc.
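A minimal sketch of the slack-variable picture (the w, b, C, and data below are hypothetical, not the slides' numbers): each point's slack is how far it falls short of margin 1, and the objective trades w.w against C times the total slack.

```python
import numpy as np

w, b, C = np.array([1.0, -1.0]), 0.0, 10.0
X = np.array([[2.0, -1.0], [0.5, 0.0], [-1.0, 1.0]])
y = np.array([+1, +1, -1])

margins = y * (X @ w + b)            # y_j (w.x_j + b)
xi = np.maximum(0.0, 1.0 - margins)  # slack: 0 if margin >= 1, linear penalty otherwise
objective = 0.5 * (w @ w) + C * xi.sum()
print(margins, xi, objective)
```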

Page 14:

Side Note: Different Losses

Logistic regression: [loss shown on slide]
Boosting: [loss shown on slide]
SVM (hinge loss): [loss shown on slide]
0-1 Loss: [shown on slide]

All our new losses approximate 0/1 loss!
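The loss formulas on this slide are images; the standard versions of each (my reconstruction, written as functions of the margin m = y(w.x + b)) are sketched below so they can be compared directly:

```python
import numpy as np

m = np.linspace(-2, 2, 9)                 # margin values y * (w.x + b)

zero_one    = (m <= 0).astype(float)      # 0-1 loss: 1 on a mistake, 0 otherwise
hinge       = np.maximum(0.0, 1.0 - m)    # SVM (hinge) loss
logistic    = np.log(1.0 + np.exp(-m))    # logistic regression loss
exponential = np.exp(-m)                  # boosting (exponential) loss

# The smooth losses all approximate the 0-1 loss, which is the slide's point.
for name, L in [("0-1", zero_one), ("hinge", hinge),
                ("logistic", logistic), ("exp", exponential)]:
    print(name, np.round(L, 2))
```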

Page 15:

What about multiple classes?

Page 16:

One against All

Learn 3 classifiers:
• + vs {0,-}, weights w+
• - vs {0,+}, weights w-
• 0 vs {+,-}, weights w0

Output for x: y = argmaxi wi.x

[Figure: the three weight vectors w+, w-, w0.]

Any other way? Any problems?
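A minimal one-against-all sketch (the toy data and scikit-learn's LinearSVC as the base learner are my choices, not the slides'): train one classifier per class against the rest, then predict by argmax of the scores w_i.x.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical 3-class toy data ('+', '-', '0' in the slide become 0, 1, 2 here).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + c for c in ([3, 0], [-3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 20)

# One binary classifier per class: class i vs. the rest, with weights w_i.
classifiers = [LinearSVC(C=1.0).fit(X, (y == i).astype(int)) for i in range(3)]

def predict(x):
    scores = [clf.decision_function(x.reshape(1, -1))[0] for clf in classifiers]
    return int(np.argmax(scores))        # y = argmax_i w_i.x (+ b_i)

print(predict(np.array([2.5, 0.2])))     # expected: class 0
```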

Page 17:

Learn 1 classifier: Multiclass SVM

Simultaneously learn 3 sets of weights:
• How do we guarantee the correct labels?
• Need new constraints!

For j possible classes: [constraints shown on slide]

[Figure: weight vectors w+, w-, w0.]
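The new constraints are an image on the slide; the standard multiclass-SVM constraints they describe (one weight vector per class, with the correct class scored above every other class by a margin of 1) are, in my reconstruction:

```latex
w_{y_j} \cdot x_j + b_{y_j} \;\ge\; w_{y'} \cdot x_j + b_{y'} + 1
\qquad \forall j,\ \ \forall\, y' \ne y_j
```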

Page 18:

Learn 1 classifier: Multiclass SVM

Also, can introduce slack variables, as before:
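A standard way to write the slack version (again my reconstruction, not the slide's exact notation):

```latex
\min_{\{w_y,\,b_y\},\,\xi}\ \ \tfrac{1}{2}\sum_{y}\|w_y\|^{2} \;+\; C\sum_{j}\xi_j
\qquad\text{s.t.}\qquad
w_{y_j}\cdot x_j + b_{y_j} \;\ge\; w_{y'}\cdot x_j + b_{y'} + 1 - \xi_j,
\quad \xi_j \ge 0,\quad \forall j,\ \forall\, y' \ne y_j
```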

Page 19:

What if the data is not linearly separable?

Add More Features!!!

Page 20:

Page 21:

Example: Dalal-Triggs pedestrian detector

1. Extract a fixed-size (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of oriented gradients) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maxima suppression to remove overlapping detections with lower scores

Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
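A minimal sliding-window sketch of steps 1-3 (scikit-image's hog, a pre-trained linear SVM (w, b), and the 8-pixel stride are my assumptions; the window, cell, and block sizes follow the slides):

```python
import numpy as np
from skimage.feature import hog

WIN_H, WIN_W, STEP = 128, 64, 8   # fixed-size 64x128 window; 8-pixel stride is my choice

def score_windows(gray_image, w, b):
    """Slide a 64x128 window, compute HOG per window, score with a linear SVM (w, b)."""
    detections = []
    H, W = gray_image.shape
    for top in range(0, H - WIN_H + 1, STEP):
        for left in range(0, W - WIN_W + 1, STEP):
            window = gray_image[top:top + WIN_H, left:left + WIN_W]
            feat = hog(window, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2), block_norm='L2-Hys')  # 3780-dim vector
            score = float(w @ feat + b)          # linear SVM score
            if score > 0:                        # threshold; NMS (step 4) would follow
                detections.append((top, left, score))
    return detections
```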

Page 22:

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Page 23:

• Tested with
  – RGB
  – LAB
  – Grayscale

• RGB and LAB give slightly better performance than grayscale

Page 24:

Gradient masks tested: uncentered, centered, cubic-corrected, diagonal, Sobel

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

The simple centered [-1, 0, 1] mask outperforms the alternatives.

Page 25:

• Histogram of gradient orientations
  – Votes weighted by magnitude
  – Bilinear interpolation between cells

Orientation: 9 bins (for unsigned angles)

Histograms in 8x8 pixel cells

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Page 26:

Normalize with respect to surrounding cells

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Page 27:

[Figure: the window's HOG feature vector x.]

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

# features = 15 x 7 x 9 x 4 = 3780
  15 x 7: # cells
  9: # orientations
  4: # normalizations by neighboring cells
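The arithmetic on the slide, spelled out (reading 15 x 7 as the number of overlapping 2x2-cell block positions in a 64x128 window with 8x8 cells is my interpretation, consistent with the earlier slides):

```python
cells_high, cells_wide = 128 // 8, 64 // 8                  # 16 x 8 cells in the 64x128 window
blocks_high, blocks_wide = cells_high - 1, cells_wide - 1   # 15 x 7 overlapping 2x2-cell blocks
orientations = 9                                            # orientation bins (unsigned angles)
normalizations = 4                                          # each cell normalized in 4 neighboring blocks
print(blocks_high * blocks_wide * orientations * normalizations)   # 3780
```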

Page 28:

Training set

Page 29:

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

[Figure: visualization of the positive weights (pos w) and negative weights (neg w) of the learned detector.]

Page 30:

[Figure: the learned template, labeled "pedestrian".]

Slides by Pete Barnum Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

Page 31:

Detection examples

Page 32:

Page 33:

Each window is separately classified

Page 34:

Page 35:

Statistical Template
• Object model = sum of scores of features at fixed positions

+3 +2 -2 -1 -2.5 = -0.5 → is -0.5 > 7.5? No → Non-object

+4 +1 +0.5 +3 +0.5 = 9 → is 9 > 7.5? Yes → Object
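A minimal sketch of the statistical-template scoring rule (the feature scores and the 7.5 threshold are the slide's example numbers; the helper function itself is hypothetical):

```python
def classify_window(feature_scores, threshold=7.5):
    """Statistical template: sum the scores of features at fixed positions, then threshold."""
    total = sum(feature_scores)
    return ("Object" if total > threshold else "Non-object"), total

print(classify_window([+3, +2, -2, -1, -2.5]))    # (-0.5) -> Non-object
print(classify_window([+4, +1, +0.5, +3, +0.5]))  # (9.0)  -> Object
```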

Page 36:

Part Based Models

• Before that:

– Clustering