10-701/15-781 Machine Learning, Lecture 11: Support Vector Machines (aarti/Class/10701/slides/Lecture11.pdf)
TRANSCRIPT
Announcements - Homework
• Homework 1 is graded, please collect at end of lecture
• Homework 2 due today
• Homework 3 out soon (watch email)
• Question 1: midterm review
HW1 score distribution
[Histogram: HW1 total score distribution; bins 0–10 through 100–110 on the x-axis, counts from 0 to 40 on the y-axis]
Announcements - Midterm
• When: Wednesday, 10/20
• Where: In Class
• What: You, your pencil, your textbook, your notes, course slides, your calculator, your good mood :)
• What NOT: No computers, iPhones, or anything else that has an internet connection.
• Material: Everything from the beginning of the semester up to and including SVMs and the kernel trick
Recitation Tomorrow!
• Boosting, SVM (convex optimization), and midterm review!
• Strongly recommended!!
• Place: NSH 3305 (Note: change from last time)
• Time: 5-6 pm
Rob
Support Vector Machines
Aarti Singh
Machine Learning 10-701/15-781, Oct 13, 2010
At Pittsburgh G-20 summit …
Linear classifiers – which line is better?
Pick the one with the largest margin!
Parameterizing the decision boundary
[Figure: linear decision boundary w.x + b = 0, with w.x + b > 0 on one side and w.x + b < 0 on the other]
Data: example i (i = 1, 2, …, n) is a pair (x_i, y_i), with label y_i ∈ {-1, +1}
w.x = Σ_j w(j) x(j)
Maximizing the margin
margin = γ = 2a/‖w‖
Distance of closest examples from the line/hyperplane
[Figure: boundary w.x + b = 0 separating w.x + b > 0 from w.x + b < 0, with gap γ on each side]
Maximizing the margin
max_{w,b} γ = 2a/‖w‖
s.t. (w.x_j + b) y_j ≥ a  ∀ j
margin = γ = 2a/‖w‖ – distance of closest examples from the line/hyperplane
Note: ‘a’ is arbitrary (can normalize equations by a)
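Why 2a/‖w‖? A short reconstruction (the slide shows this geometrically rather than line by line): a point on the margin satisfies w.x + b = ±a, and the distance of any point from the hyperplane w.x + b = 0 is |w.x + b|/‖w‖, so the gap between the two margin lines is

```latex
\gamma \;=\; \frac{a}{\lVert w\rVert} + \frac{a}{\lVert w\rVert} \;=\; \frac{2a}{\lVert w\rVert}.
```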
Support Vector Machines
min_{w,b} w.w
s.t. (w.x_j + b) y_j ≥ 1  ∀ j
Solve efficiently by quadratic programming (QP)
– Well-studied solution algorithms
Linear hyperplane defined by “support vectors”
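For concreteness, here is a minimal sketch (mine, not from the slides) of this hard-margin QP using the cvxopt solver, with optimization variable z = [w; b]:

```python
# Hard-margin SVM primal as a QP:  min_{w,b} w.w  s.t. (w.x_j + b) y_j >= 1.
# cvxopt's solvers.qp minimizes (1/2) z'Pz + q'z subject to Gz <= h.
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    n, d = X.shape
    z_dim = d + 1                       # z = [w (d entries); b]
    P = np.zeros((z_dim, z_dim))
    P[:d, :d] = 2.0 * np.eye(d)         # (1/2) z'Pz = w.w  (b is unpenalized;
                                        # add a tiny ridge if the solver complains)
    q = np.zeros(z_dim)
    # (w.x_j + b) y_j >= 1  rewritten as  -y_j * [x_j, 1] . z <= -1
    G = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    h = -np.ones(n)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:d], z[d]                  # w, b

# Toy separable data: two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = hard_margin_svm(X, y)            # support vectors satisfy (w.x_j + b) y_j = 1
```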
Support Vectors
Linear hyperplane defined by “support vectors”
– Moving other points a little doesn’t affect the decision boundary
– Only need to store the support vectors to predict labels of new points
How many support vectors in the linearly separable case? ≤ m+1 (where m is the input dimension)
What if data is not linearly separable?
Use features of features of features of features…, e.g. x_1^2, x_2^2, x_1 x_2, …, exp(x_1)
But run the risk of overfitting!
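As a quick illustration (my toy example, not the slide's figure), a quadratic feature map can make circularly separated data linearly separable:

```python
# Sketch: mapping 2-D inputs to quadratic features (x1^2, x2^2, x1*x2).
# Points inside vs. outside the unit circle are not linearly separable in
# (x1, x2), but in feature space the plane z1 + z2 = 1 separates them.
import numpy as np

def quad_features(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1**2, x2**2, x1 * x2])

X = np.array([[0.1, 0.2], [0.3, -0.1],   # inside the unit circle
              [2.0, 0.5], [-1.5, 1.0]])  # outside the unit circle
Z = quad_features(X)
print(Z[:, 0] + Z[:, 1] > 1.0)           # [False False  True  True]
```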
What if data is still not linearly separable?
min_{w,b} w.w + C (#mistakes)
s.t. (w.x_j + b) y_j ≥ 1  ∀ j
Allow “error” in classification
Maximize margin and minimize # mistakes on training data
C - tradeoff parameter
Not QP
0/1 loss (doesn’t distinguish between a near miss and a bad mistake)
What if data is still not linearly separable?
min_{w,b} w.w + C Σ_j ξ_j
s.t. (w.x_j + b) y_j ≥ 1 − ξ_j  ∀ j
ξ_j ≥ 0  ∀ j
Allow “error” in classification
ξ_j – “slack” variables (> 1 if x_j is misclassified)
Pay a linear penalty for each mistake
C – tradeoff parameter (chosen by cross-validation)
Still QP
Soft margin approach
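Extending the earlier cvxopt sketch to this soft-margin QP (again a sketch under the same assumptions), the optimization variable becomes z = [w; b; ξ]:

```python
# Soft-margin SVM primal:  min_{w,b,xi} w.w + C * sum_j xi_j
#   s.t. (w.x_j + b) y_j >= 1 - xi_j,  xi_j >= 0.
import numpy as np
from cvxopt import matrix, solvers

def soft_margin_svm(X, y, C=1.0):
    n, d = X.shape
    P = np.zeros((d + 1 + n, d + 1 + n))
    P[:d, :d] = 2.0 * np.eye(d)                  # (1/2) z'Pz = w.w
    q = np.hstack([np.zeros(d + 1), C * np.ones(n)])  # linear term C * sum(xi)
    # -y_j [x_j, 1] [w; b] - xi_j <= -1
    G_top = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
    # -xi_j <= 0
    G_bot = np.hstack([np.zeros((n, d + 1)), -np.eye(n)])
    G = np.vstack([G_top, G_bot])
    h = np.hstack([-np.ones(n), np.zeros(n)])
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:d], z[d]                           # w, b
```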
Slack variables – Hinge loss
min_{w,b} w.w + C Σ_j ξ_j
s.t. (w.x_j + b) y_j ≥ 1 − ξ_j  ∀ j
ξ_j ≥ 0  ∀ j
Complexity penalization: w.w
Hinge loss: ξ_j = max(0, 1 − (w.x_j + b) y_j)
[Plot: hinge loss vs. 0-1 loss as a function of the margin (w.x + b) y]
SVM vs. Logistic Regression
SVM: hinge loss
Logistic regression: log loss (negative log conditional likelihood)
[Plot: 0-1 loss, hinge loss, and log loss as functions of the margin (w.x + b) y]
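A small sketch (illustrative names, not from the slides) comparing the three losses on this slide as functions of the margin m = (w.x + b) y:

```python
import numpy as np

def zero_one(m):
    return (m <= 0).astype(float)        # 1 on a mistake, 0 otherwise

def hinge(m):
    return np.maximum(0.0, 1.0 - m)      # SVM loss

def log_loss(m):
    return np.log(1.0 + np.exp(-m))      # logistic regression loss

m = np.linspace(-2.0, 2.0, 5)
for name, f in [("0-1", zero_one), ("hinge", hinge), ("log", log_loss)]:
    print(f"{name:>5}:", np.round(f(m), 2))
```

Both hinge and log loss are convex surrogates for the 0-1 loss; note that log loss stays strictly positive even for confidently correct predictions, while hinge loss is exactly zero once the margin exceeds 1.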
What about multiple classes?
One against all
Learn 3 classifiers separately: Class k vs. rest
(w_k, b_k), k = 1, 2, 3
y = argmax_k w_k.x + b_k
But the w_k may not be on the same scale. Note: (a·w).x + (a·b) is also a solution, so scores from separately trained classifiers aren’t directly comparable.
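A minimal sketch of the one-against-all prediction rule (names here are illustrative):

```python
# One-vs-all prediction: given per-class (w_k, b_k), predict
# y = argmax_k  w_k.x + b_k.
import numpy as np

def ova_predict(W, b, X):
    """W: (K, d) stacked class weights, b: (K,) intercepts, X: (n, d) inputs."""
    scores = X @ W.T + b        # (n, K) matrix of scores w_k.x + b_k
    return np.argmax(scores, axis=1)
```

Because each (w_k, b_k) can be rescaled without changing its own decision boundary, these scores need not be calibrated against one another, which is exactly the issue the slide points out.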
Learn 1 classifier: Multi-class SVM
Simultaneously learn 3 sets of weights
y = argmax_k w(k).x + b(k)
Margin - gap between correct class and nearest other class
Joint optimization: the w(k) now have the same scale.
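The slide's full program isn't in the transcript; one standard joint formulation consistent with the "gap between correct class and nearest other class" margin (stated here as a reconstruction, Weston–Watkins style) is:

```latex
\min_{\{w^{(k)},\, b^{(k)}\},\, \xi}\;\; \sum_k w^{(k)}\!\cdot w^{(k)} \;+\; C\sum_j \xi_j
\quad\text{s.t.}\quad
w^{(y_j)}\!\cdot x_j + b^{(y_j)} \;\ge\; w^{(k)}\!\cdot x_j + b^{(k)} + 1 - \xi_j
\;\;\forall j,\; \forall k \ne y_j,\qquad \xi_j \ge 0.
```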
What you need to know
• Maximizing margin
• Derivation of SVM formulation
• Slack variables and hinge loss
• Relationship between SVMs and logistic regression
– 0/1 loss
– Hinge loss
– Log loss
• Tackling multiple classes
– One against all
– Multiclass SVMs
SVMs reminder
min_{w,b} w.w + C Σ_j ξ_j
s.t. (w.x_j + b) y_j ≥ 1 − ξ_j  ∀ j
ξ_j ≥ 0  ∀ j
(w.w is the regularization term; C Σ_j ξ_j is the hinge-loss term)
Soft margin approach
Today’s Lecture
• Learn one of the most interesting and exciting recent advancements in machine learning
– The “kernel trick”
– High dimensional feature spaces at no extra cost!
• But first, a detour
– Constrained optimization!
Constrained Optimization
Lagrange Multiplier – Dual Variables
Moving the constraint to the objective function – Lagrangian:
Solve:
Constraint is tight when a > 0
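The slide's equations were lost in extraction; a hedged reconstruction of the kind of single-constraint example this lecture uses (minimizing a parabola subject to x ≥ b, with dual variable a) is:

```latex
\min_x\; x^2 \;\;\text{s.t.}\;\; x \ge b
\qquad\Longrightarrow\qquad
L(x, a) \;=\; x^2 - a\,(x - b),\quad a \ge 0;
\qquad \frac{\partial L}{\partial x} = 0 \;\Rightarrow\; x = \tfrac{a}{2}.
```

In this example, if b ≤ 0 the unconstrained minimum x = 0 is feasible, so a = 0; if b > 0 the constraint binds, giving x = b and a = 2b > 0, matching "constraint is tight when a > 0".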
Duality
Primal problem: Dual problem:
Weak duality: d* ≤ p* for all feasible points
Strong duality: d* = p* (holds under KKT conditions)
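The primal/dual pair itself is missing from the transcript; in generic Lagrangian form (an assumption consistent with the example above):

```latex
p^* \;=\; \min_x \max_{a \ge 0} L(x, a) \quad\text{(primal)},
\qquad
d^* \;=\; \max_{a \ge 0} \min_x L(x, a) \quad\text{(dual)},
\qquad d^* \le p^*.
```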
Lagrange Multiplier – Dual Variables
Solving: if b is negative the constraint is inactive (a = 0, x = 0); if b is positive the constraint binds (x = b).
When a > 0, the constraint is tight.