Machine Learning: Chenhao Tan
University of Colorado Boulder
Lecture 10
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen
Machine Learning: Chenhao Tan | Boulder | 1 of 52
Roadmap
• Last time: linear SVM formulation when data is linearly separable
• This time:
  ◦ Introduce duality
  ◦ Make linear SVM work when data is not linearly separable
  ◦ Introduce an efficient algorithm for finding weights
• Next time: Kernel trick
Overview
Duality
Slack variables
Sequential Minimal Optimization
Recap
Duality
Binary classification
Given: training examples $S_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{m}$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$.

Goal: find a hypothesis function $h: \mathcal{X} \to \mathcal{Y}$.

Linear SVM: learn a linear decision rule of the form $w \cdot x + b$.
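As a concrete sketch (the function name is mine), the learned rule classifies a point by the sign of $w \cdot x + b$:

```python
import numpy as np

def predict(w, b, X):
    """Linear SVM decision rule: sign(w . x + b) for each row of X."""
    return np.sign(X @ w + b)
```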
Optimizing the objective function
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \qquad (1)$$
$$\text{subject to } y_i(w \cdot x_i + b) \ge 1,\quad i \in [1, m]$$
Optimizing Constrained Functions
The Method of Lagrange Multipliers
Constrained problem (primal problem):
$$\min_x\ f(x) \quad \text{s.t. } g_i(x) \ge 0,\ i \in [1, n]$$

Lagrangian with multipliers $\alpha$:
$$\mathcal{L}(x, \alpha) = f(x) - \sum_{i=1}^{n} \alpha_i g_i(x), \quad \alpha_i \ge 0,\ i \in [1, n]$$
Lagrange Multiplier
Let $p^*$ be the optimal value of the primal problem. We claim that
$$p^* = \min_x \max_{\alpha} \mathcal{L}(x, \alpha) = \min_x \max_{\alpha}\ f(x) - \sum_{i=1}^{n} \alpha_i g_i(x)$$
This is because
$$\max_{\alpha \ge 0}\ -\alpha y = \begin{cases} 0 & y \ge 0 \\ +\infty & \text{otherwise} \end{cases}$$
What happens if we reverse min and max?
$$\max_{\alpha} \min_x \mathcal{L}(x, \alpha)\ \le\ \min_x \max_{\alpha} \mathcal{L}(x, \alpha)$$
The left leads to the dual problem.
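A tiny numeric check of this weak-duality inequality, using a made-up table of $\mathcal{L}$ values over discrete choices of $x$ (rows) and $\alpha$ (columns):

```python
import numpy as np

# Made-up values of L(x, alpha): rows index x, columns index alpha.
L = np.array([[3.0, 1.0],
              [0.0, 2.0]])

max_min = L.min(axis=0).max()  # max over alpha of (min over x)
min_max = L.max(axis=1).min()  # min over x of (max over alpha)
assert max_min <= min_max      # weak duality holds, here with a strict gap
```

Here max-min is 1.0 and min-max is 2.0, so the inequality can be strict in general; for the SVM problem the two sides coincide.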
Primal vs. Dual
Primal problem:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1,\ i \in [1, m]$$
Derive the dual by replacing $w, b$ with the stationarity conditions.
Dual problem:
$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_j \cdot x_i)$$
$$\text{s.t. } \alpha_i \ge 0,\ i \in [1, m], \qquad \sum_i \alpha_i y_i = 0$$
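As an illustrative sketch (function names are mine), the dual objective and the recovery of $w$ from the stationarity condition translate directly into NumPy:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """sum_i a_i - 1/2 sum_ij a_i a_j y_i y_j (x_i . x_j), using the Gram matrix X X^T."""
    u = alpha * y
    return alpha.sum() - 0.5 * u @ (X @ X.T) @ u

def primal_w(alpha, X, y):
    """Recover w = sum_i a_i y_i x_i (the stationarity condition)."""
    return (alpha * y) @ X
```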
Karush-Kuhn-Tucker (KKT) conditions
Primal and dual feasibility:
$$y_i(w \cdot x_i + b) \ge 1, \quad \alpha_i \ge 0 \qquad (2)$$
Stationarity:
$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \quad \sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (3)$$
Complementary slackness:
$$\alpha_i = 0\ \vee\ y_i(w \cdot x_i + b) = 1 \qquad (4)$$
Slack variables
Old objective function
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \qquad (5)$$
$$\text{subject to } y_i(w \cdot x_i + b) \ge 1,\ i \in [1, m]$$
Can SVMs Work Here?
$$y_i(w \cdot x_i + b) \ge 1 \qquad (6)$$
Trick: Allow for a few bad apples
Relaxing the constraint
$$y_i(w \cdot x_i + b) \ge 1 - \xi_i$$

• $\xi_i = 0$: at least one margin on the correct side of the decision boundary
• $\xi_i = 1/2$: at least one-half margin on the correct side of the decision boundary
• $\xi_i = 2$: at most one margin on the wrong side of the decision boundary
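The slack a point needs under a candidate $(w, b)$ is just the hinge shortfall $\xi_i = \max(0,\ 1 - y_i(w \cdot x_i + b))$; a minimal sketch (the function name is mine):

```python
import numpy as np

def slacks(w, b, X, y):
    """Minimal feasible slack per point: xi_i = max(0, 1 - y_i (w . x_i + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))
```

With $w = (1, 0)$ and $b = 0$, the positive points $(2, 0)$, $(0.5, 0)$, and $(-1, 0)$ get $\xi = 0$, $0.5$, and $2$, matching the three cases above.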
New objective function
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i^p \qquad (8)$$
$$\text{subject to } y_i(w \cdot x_i + b) \ge 1 - \xi_i\ \wedge\ \xi_i \ge 0,\ i \in [1, m]$$

• Standard margin: the $\frac{1}{2}\|w\|^2$ term
• How wrong a point is: the slack variables $\xi_i$
• Tradeoff between margin and slack variables: $C$
• How bad wrongness scales: the exponent $p$
Aside: Loss Functions
• Losses measure how bad a mistake is
• Important for slack as well
[Figure: 0/1 loss, linear hinge, and quadratic hinge as functions of the margin]
We'll focus on the linear hinge loss, i.e., set $p = 1$.
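For concreteness, the three losses as functions of the signed margin $z = y\,f(x)$ (a sketch; function names are mine):

```python
import numpy as np

def zero_one(z):
    """0/1 loss: 1 if the point is misclassified (z <= 0), else 0."""
    return (z <= 0).astype(float)

def linear_hinge(z):
    """Linear hinge (p = 1): max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def quadratic_hinge(z):
    """Quadratic hinge (p = 2): max(0, 1 - z)^2."""
    return np.maximum(0.0, 1.0 - z) ** 2
```

Both hinges upper-bound the 0/1 loss, which is what makes them usable convex surrogates for it.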
What is the role of C?
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \qquad (9)$$
$$\text{subject to } y_i(w \cdot x_i + b) \ge 1 - \xi_i\ \wedge\ \xi_i \ge 0,\ i \in [1, m]$$

A. $C \uparrow\ \Rightarrow$ low bias, low variance
B. $C \uparrow\ \Rightarrow$ low bias, high variance
C. $C \uparrow\ \Rightarrow$ high bias, low variance
D. $C \uparrow\ \Rightarrow$ high bias, high variance
New Lagrangian
$$\mathcal{L}(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \qquad (10)$$
$$-\ \sum_{i=1}^{m} \alpha_i \left[ y_i(w \cdot x_i + b) - 1 + \xi_i \right] \qquad (11)$$
$$-\ \sum_{i=1}^{m} \beta_i \xi_i \qquad (12)$$

Taking the gradients ($\nabla_w \mathcal{L}$, $\nabla_b \mathcal{L}$, $\nabla_{\xi_i} \mathcal{L}$) and solving for zero gives us
$$w = \sum_{i=1}^{m} \alpha_i y_i x_i \qquad (13)$$
$$\sum_{i=1}^{m} \alpha_i y_i = 0 \qquad (14) \qquad \alpha_i + \beta_i = C \qquad (15)$$
Simplifying dual objective
Substituting the stationarity conditions
$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \alpha_i + \beta_i = C$$
into
$$\mathcal{L}(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i - \sum_{i=1}^{m} \alpha_i \left[ y_i(w \cdot x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{m} \beta_i \xi_i$$
eliminates $w$, $b$, and $\xi$.
Dual Problem
$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_j \cdot x_i)$$
$$\text{s.t. } 0 \le \alpha_i \le C,\ i \in [1, m], \qquad \sum_i \alpha_i y_i = 0$$
Karush-Kuhn-Tucker (KKT) conditions
Primal and dual feasibility:
$$y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad 0 \le \alpha_i \le C, \quad \beta_i \ge 0 \qquad (16)$$
Stationarity:
$$w = \sum_{i=1}^{m} \alpha_i y_i x_i, \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \quad \alpha_i + \beta_i = C \qquad (17)$$
Complementary slackness:
$$\alpha_i \left[ y_i(w \cdot x_i + b) - 1 + \xi_i \right] = 0, \quad \beta_i \xi_i = 0 \qquad (18)$$
More on Complementary Slackness
$$\alpha_i \left[ y_i(w \cdot x_i + b) - 1 + \xi_i \right] = 0, \quad \beta_i \xi_i = 0 \qquad (19)$$

• $x_i$ satisfies the margin, $y_i(w \cdot x_i + b) > 1 \Rightarrow \alpha_i = 0$
• $x_i$ does not satisfy the margin, $y_i(w \cdot x_i + b) < 1 \Rightarrow \alpha_i = C$
• $x_i$ is on the margin, $y_i(w \cdot x_i + b) = 1 \Rightarrow 0 \le \alpha_i \le C$
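The three cases can be codified as a small lookup (a sketch; the function name is mine):

```python
def expected_alpha(margins, C, tol=1e-8):
    """Map each margin y_i (w . x_i + b) to the alpha_i it forces:
    0 (strictly outside the margin), C (violating it), or None (on it: any value in [0, C])."""
    out = []
    for m in margins:
        if m > 1 + tol:
            out.append(0.0)   # slack inactive, not a support vector
        elif m < 1 - tol:
            out.append(C)     # margin violated, alpha at the box bound
        else:
            out.append(None)  # on the margin, 0 <= alpha_i <= C
    return out
```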
Sequential Minimal Optimization
Trivia:
• Invented by John Platt in 1998 at Microsoft Research
• Called "Minimal" because it solves very small sub-problems
Brief Interlude: Coordinate Ascent
$$\max_{\alpha}\ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_j \cdot x_i)$$
$$\text{s.t. } 0 \le \alpha_i \le C,\ i \in [1, m], \qquad \sum_i \alpha_i y_i = 0$$

Coordinate ascent: loop over each training example and change $\alpha_i$ to maximize the above function.

Although coordinate ascent works well for many problems, here we have the constraint $\sum_i \alpha_i y_i = 0$, so no single $\alpha_i$ can be changed on its own.
Outline for SVM Optimization (SMO)
1. Select two examples $i$, $j$
2. Update $\alpha_j$, $\alpha_i$ to maximize the dual objective
Because $\sum_i \alpha_i y_i = 0$ must be preserved, the pair update keeps
$$y_i \alpha_i + y_j \alpha_j = y_i \alpha_i^{\text{old}} + y_j \alpha_j^{\text{old}} = \gamma$$
Step 2: Optimize αj
1. Compute upper ($H$) and lower ($L$) bounds that ensure $0 \le \alpha_j \le C$.

If $y_i \ne y_j$:
$$L = \max(0, \alpha_j - \alpha_i) \qquad (23)$$
$$H = \min(C, C + \alpha_j - \alpha_i) \qquad (24)$$

If $y_i = y_j$:
$$L = \max(0, \alpha_i + \alpha_j - C) \qquad (25)$$
$$H = \min(C, \alpha_j + \alpha_i) \qquad (26)$$
The two cases arise because the update to $\alpha_i$ depends on the product $y_i y_j$: the sign matters.
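Equations (23)-(26) translate directly into code (a sketch; the function name is mine):

```python
def clip_bounds(alpha_i, alpha_j, y_i, y_j, C):
    """[L, H] box keeping 0 <= alpha_j <= C while y_i*alpha_i + y_j*alpha_j stays fixed."""
    if y_i != y_j:
        return max(0.0, alpha_j - alpha_i), min(C, C + alpha_j - alpha_i)  # Eqs. (23)-(24)
    return max(0.0, alpha_i + alpha_j - C), min(C, alpha_i + alpha_j)      # Eqs. (25)-(26)
```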
Step 2: Optimize αj

Compute the errors for i and j:
Ek ≡ f(xk) − yk (27)

Compute the step size:
η = 2 xi · xj − xi · xi − xj · xj (28)

Then the new value for αj is:
αj* = αj^(old) − yj(Ei − Ej)/η (29)
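Equation (29), plus the clipping into [L, H] from the bounds above, can be sketched as (names mine):

```python
def alpha_j_step(alpha_j, y_j, E_i, E_j, eta, L, H):
    """Step alpha_j against the error difference (eq. 29), then clip to [L, H]."""
    if eta >= 0:
        return alpha_j              # degenerate direction; SMO skips this pair
    unclipped = alpha_j - y_j * (E_i - E_j) / eta
    return max(L, min(H, unclipped))
```

On the running example from the later slides (E_i = −1, E_j = +1, η = −13, y_j = −1) this gives 2/13.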
Step 3: Optimize αi

Set αi:
αi* = αi^(old) + yi yj (αj^(old) − αj*) (30)

This balances out the move that we made for αj.
Overall algorithm

Repeat until the KKT conditions are met:
• Iterate over i = {1, . . . , m}
• Choose j randomly from the m − 1 other options
• Update αi, αj

Find w, b based on the stationarity conditions.
Iterations / Details

• What if i doesn't violate the KKT conditions? Skip it!
• What if η ≥ 0? Skip it! (This should not happen, except from numerical instability.)
• When do we stop? When we make a full pass through the α's without changing anything.
SMO Algorithm

Positive points (indices 0–2): (−2, 2), (0, 4), (2, 1)
Negative points (indices 3–5): (−2, −3), (0, −1), (2, −3)

[Figure: the six points plotted in the plane, labeled 0–5]

• Initially, all alphas are zero: α = ⟨0, 0, 0, 0, 0, 0⟩
• The intercept b is also zero
• Capacity C = π
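The toy problem can be encoded directly. The index-to-point assignment (0–2 positive, 3–5 negative, in listing order) is inferred from the worked numbers on the following slides:

```python
import math

# The slide's toy problem
X = [(-2, 2), (0, 4), (2, 1), (-2, -3), (0, -1), (2, -3)]
y = [+1, +1, +1, -1, -1, -1]
C = math.pi            # the slide's (deliberately odd) capacity
alphas = [0.0] * 6     # all alphas start at zero
b = 0.0                # intercept starts at zero

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f(x):
    """Dual-form decision function f(x) = sum_i alpha_i y_i <x_i, x> + b."""
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alphas, y, X)) + b

# with all alphas zero, f is identically zero, so every point starts as an error
print([f(x) for x in X])    # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```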
SMO Optimization for i = 0, j = 4: Predictions and Step

• Prediction: f(x0) = 0
• Prediction: f(x4) = 0
• Error: E0 = −1
• Error: E4 = +1

η = 2⟨x0, x4⟩ − ⟨x0, x0⟩ − ⟨x4, x4⟩ = 2 · (−2) − 8 − 1 = −13
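A quick check of these numbers (a sketch; the all-zero starting alphas make both predictions zero):

```python
x0, y0 = (-2, 2), +1    # i = 0, a positive point
x4, y4 = (0, -1), -1    # j = 4, a negative point

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# all alphas are still zero, so f(x0) = f(x4) = 0
E0 = 0 - y0             # f(x0) - y0 = -1
E4 = 0 - y4             # f(x4) - y4 = +1

eta = 2 * dot(x0, x4) - dot(x0, x0) - dot(x4, x4)
print(E0, E4, eta)      # -1 1 -13
```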
SMO Optimization for i = 0, j = 4: Bounds

• Lower and upper bounds for αj:
L = max(0, αj − αi) = 0 (31)
H = min(C, C + αj − αi) = π (32)
SMO Optimization for i = 0, j = 4: α update

New value for αj:
αj* = αj − yj(Ei − Ej)/η = −2/η = 2/13 (33)

New value for αi:
αi* = αi + yi yj (αj^(old) − αj*) = αj* = 2/13 (34)
[Figure: Margin]
Find weight vector and bias

• Weight vector:
w = ∑_i αi yi xi = (2/13) · (−2, 2) − (2/13) · (0, −1) = (−4/13, 6/13) (35)

• Bias:
b = b^(old) − Ei − yi(αi* − αi^(old)) xi · xi − yj(αj* − αj^(old)) xi · xj (36)
  = 1 − (2/13) · 8 + (2/13) · (−2) = −0.54 (37)
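Verifying (35)–(37) with exact arithmetic (a sketch using `fractions`; −7/13 ≈ −0.54):

```python
from fractions import Fraction

a = Fraction(2, 13)       # alpha_0 = alpha_4 after the first update
x0, y0 = (-2, 2), +1
x4, y4 = (0, -1), -1

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# eq. (35): only alpha_0 and alpha_4 are nonzero
w = tuple(a * y0 * p + a * y4 * q for p, q in zip(x0, x4))

# eqs. (36)-(37) with b_old = 0 and E_i = E_0 = -1
E_i = -1
b = 0 - E_i - y0 * a * dot(x0, x0) - y4 * a * dot(x0, x4)
print(w, float(b))        # w = (-4/13, 6/13), b = -7/13, about -0.54
```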
SMO Optimization for i = 2, j = 4

Let's skip the boring stuff:
• E2 = −1.69
• E4 = 0.00
• η = −8
• α4 = αj^(old) − yj(Ei − Ej)/η = 0.15 + (−1.69)/(−8) ≈ 0.37
• α2 = αi^(old) + yi yj (αj^(old) − αj*) = 0 − (0.15 − 0.37) ≈ 0.21
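The "boring stuff" can be reproduced exactly (a sketch; the alphas and bias come from the previous update):

```python
from fractions import Fraction

X = [(-2, 2), (0, 4), (2, 1), (-2, -3), (0, -1), (2, -3)]
y = [+1, +1, +1, -1, -1, -1]
alphas = [Fraction(2, 13), 0, 0, 0, Fraction(2, 13), 0]
b = Fraction(-7, 13)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f(x):
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alphas, y, X)) + b

i, j = 2, 4
E_i, E_j = f(X[i]) - y[i], f(X[j]) - y[j]        # -22/13 (about -1.69) and 0
eta = 2 * dot(X[i], X[j]) - dot(X[i], X[i]) - dot(X[j], X[j])   # -8

a_j = alphas[j] - y[j] * (E_i - E_j) / eta       # 19/52, about 0.37
a_i = alphas[i] + y[i] * y[j] * (alphas[j] - a_j)  # 11/52, about 0.21
print(float(E_i), eta, float(a_j), float(a_i))
```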
[Figure: Margin]
Weight vector and bias

• Bias b = −0.12
• Weight vector:
w = ∑_i αi yi xi = (0.12, 0.88) (38)
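Checking (38) and the new bias with exact arithmetic (a sketch; the alpha values come from the previous slide):

```python
from fractions import Fraction

X = [(-2, 2), (0, 4), (2, 1), (-2, -3), (0, -1), (2, -3)]
y = [+1, +1, +1, -1, -1, -1]
# alphas after the i = 2, j = 4 update: alpha_0 = 2/13, alpha_2 = 11/52, alpha_4 = 19/52
alphas = [Fraction(2, 13), 0, Fraction(11, 52), 0, Fraction(19, 52), 0]

# eq. (38): w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alphas, y, X)) for d in range(2)]
print([float(c) for c in w])    # about [0.115, 0.885], i.e. (0.12, 0.88) rounded

# bias via eq. (36), with b_old = -7/13, E_i = -22/13, and both alphas moving by 11/52
b = Fraction(-7, 13) + Fraction(22, 13) - Fraction(11, 52) * 5 - Fraction(11, 52)
print(float(b))                 # -3/26, which rounds to -0.12
```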
Another Iteration (i = 0, j = 2)
SMO Algorithm

• A convenient approach for solving the vanilla, slack, and kernel formulations
• Convex problem
• Scalable to large datasets (implemented in scikit-learn)
• What we didn't do:
◦ Check the KKT conditions
◦ Randomly choose indices
Recap
Outline
Duality
Slack variables
Sequential Minimal Optimization
Recap
Recap

• Duality
• Slack variables
• SMO: optimize the objective function for two data points at a time
• Convex problem: will converge
• Relatively fast
• Gives good performance
Wrapup
• Adding slack variables doesn't break the SVM problem
• Very popular algorithm:
◦ SVMLight (many options)
◦ LIBSVM / LIBLINEAR (very fast)
◦ Weka (friendly)
◦ PyML (Python focused, from Colorado)
• Next up: the kernel trick