Machine Learning Basics Lecture 5: SVM II
Princeton University COS 495
Instructor: Yingyu Liang
Review: SVM objective
SVM: objective
• Let $y_i \in \{+1, -1\}$ and $f_{w,b}(x) = w^T x + b$. Margin:
$$\gamma = \min_i \frac{y_i f_{w,b}(x_i)}{\|w\|}$$
• Support Vector Machine:
$$\max_{w,b} \gamma = \max_{w,b} \min_i \frac{y_i f_{w,b}(x_i)}{\|w\|}$$
SVM: optimization
• Optimization (Quadratic Programming):
$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \; \forall i$$
• Solved by the Lagrange multiplier method:
$$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$$
where $\boldsymbol{\alpha}$ is the vector of Lagrange multipliers
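For concreteness, here is a minimal sketch that solves this hard-margin QP directly on toy separable data. The dataset and the choice of the cvxpy modeling library are illustrative assumptions, not part of the lecture:

```python
import numpy as np
import cvxpy as cp  # illustrative choice of QP modeling library

# Toy separable data (hypothetical): two positives, two negatives.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
# min_{w,b} (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1 for all i
constraints = [cp.multiply(y, X @ w + b) >= 1]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()
print(w.value, b.value)
```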
Lagrange multiplier
Lagrangian
• Consider the optimization problem:
$$\min_w f(w) \quad \text{s.t.} \quad h_i(w) = 0, \; \forall \, 1 \le i \le l$$
• Lagrangian:
$$\mathcal{L}(w, \boldsymbol{\beta}) = f(w) + \sum_i \beta_i h_i(w)$$
where the $\beta_i$'s are called Lagrange multipliers
• Solved by setting the derivatives of the Lagrangian to 0:
$$\frac{\partial \mathcal{L}}{\partial w_i} = 0; \qquad \frac{\partial \mathcal{L}}{\partial \beta_i} = 0$$
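As a quick illustration of this recipe, a sketch on a hypothetical problem (minimize $w_1^2 + w_2^2$ subject to $w_1 + w_2 = 1$), solved symbolically with sympy:

```python
import sympy as sp

# Hypothetical example: minimize f(w) = w1^2 + w2^2
# subject to h(w) = w1 + w2 - 1 = 0.
w1, w2, beta = sp.symbols('w1 w2 beta', real=True)
L = w1**2 + w2**2 + beta * (w1 + w2 - 1)  # the Lagrangian

# Set every partial derivative of L to zero and solve the system.
solution = sp.solve([sp.diff(L, v) for v in (w1, w2, beta)], [w1, w2, beta])
print(solution)  # {w1: 1/2, w2: 1/2, beta: -1}
```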
Generalized Lagrangian
• Consider the optimization problem:
$$\min_w f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \; \forall \, 1 \le i \le k; \qquad h_j(w) = 0, \; \forall \, 1 \le j \le l$$
• Generalized Lagrangian:
$$\mathcal{L}(w, \boldsymbol{\alpha}, \boldsymbol{\beta}) = f(w) + \sum_i \alpha_i g_i(w) + \sum_j \beta_j h_j(w)$$
where the $\alpha_i$'s and $\beta_j$'s are called Lagrange multipliers
Generalized Lagrangian
• Consider the quantity:
$$\theta_P(w) := \max_{\boldsymbol{\alpha}, \boldsymbol{\beta}: \alpha_i \ge 0} \mathcal{L}(w, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
• Why?
$$\theta_P(w) = \begin{cases} f(w), & \text{if } w \text{ satisfies all the constraints} \\ +\infty, & \text{if } w \text{ does not satisfy the constraints} \end{cases}$$
• So minimizing $f(w)$ is the same as minimizing $\theta_P(w)$:
$$\min_w f(w) = \min_w \theta_P(w) = \min_w \max_{\boldsymbol{\alpha}, \boldsymbol{\beta}: \alpha_i \ge 0} \mathcal{L}(w, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
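To see the two cases concretely, a tiny sketch on a hypothetical 1-D problem (minimize $w^2$ subject to $w \ge 1$): for feasible $w$ the inner max is attained at $\alpha = 0$; for infeasible $w$ it grows without bound, which the code mimics with a large finite $\alpha$.

```python
# Hypothetical 1-D problem: minimize f(w) = w^2 subject to g(w) = 1 - w <= 0.
f = lambda w: w ** 2
g = lambda w: 1.0 - w

def theta_P(w, big_alpha=1e6):
    # max over alpha >= 0 of f(w) + alpha * g(w):
    # attained at alpha = 0 when g(w) <= 0, unbounded (+inf) otherwise.
    return f(w) if g(w) <= 0 else f(w) + big_alpha * g(w)

print(theta_P(2.0))  # feasible w: theta_P equals f(2.0) = 4.0
print(theta_P(0.5))  # infeasible w: huge value standing in for +infinity
```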
Lagrange duality
• The primal problem:
$$p^* := \min_w f(w) = \min_w \max_{\boldsymbol{\alpha}, \boldsymbol{\beta}: \alpha_i \ge 0} \mathcal{L}(w, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
• The dual problem:
$$d^* := \max_{\boldsymbol{\alpha}, \boldsymbol{\beta}: \alpha_i \ge 0} \min_w \mathcal{L}(w, \boldsymbol{\alpha}, \boldsymbol{\beta})$$
• Always true (weak duality): $d^* \le p^*$
• Interesting case: when do we have $d^* = p^*$?
Lagrange duality
• Theorem: under proper conditions, there exist $w^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*$ such that
$$d^* = \mathcal{L}(w^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*) = p^*$$
• Moreover, $w^*, \boldsymbol{\alpha}^*, \boldsymbol{\beta}^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions:
$$\frac{\partial \mathcal{L}}{\partial w_i} = 0 \qquad \text{(stationarity)}$$
$$\alpha_i g_i(w) = 0 \qquad \text{(dual complementarity)}$$
$$g_i(w) \le 0, \quad h_j(w) = 0 \qquad \text{(primal constraints)}$$
$$\alpha_i \ge 0 \qquad \text{(dual constraints)}$$
Lagrange duality
• What are the proper conditions?
• One set of conditions (Slater's conditions):
  • $f$ and the $g_i$ are convex, the $h_j$ are affine
  • There exists a $w$ satisfying all $g_i(w) < 0$ strictly
• Other sets of conditions exist; see the Karush-Kuhn-Tucker conditions article on Wikipedia
SVM: optimization
SVM: optimization
• Optimization (Quadratic Programming):
$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \; \forall i$$
• Generalized Lagrangian:
$$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$$
where $\boldsymbol{\alpha}$ is the vector of Lagrange multipliers
SVM: optimization
• KKT conditions:
$$\frac{\partial \mathcal{L}}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \qquad (1)$$
$$\frac{\partial \mathcal{L}}{\partial b} = 0 \;\Rightarrow\; 0 = \sum_i \alpha_i y_i \qquad (2)$$
• Plug into $\mathcal{L}$:
$$\mathcal{L}(w, b, \boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2} \sum_{ij} \alpha_i \alpha_j y_i y_j x_i^T x_j \qquad (3)$$
combined with $0 = \sum_i \alpha_i y_i$, $\alpha_i \ge 0$
SVM: optimization
• Reduces to the dual problem:
$$\max_{\boldsymbol{\alpha}} \; \sum_i \alpha_i - \frac{1}{2} \sum_{ij} \alpha_i \alpha_j y_i y_j x_i^T x_j$$
$$\text{s.t.} \quad \sum_i \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
• Since $w = \sum_i \alpha_i y_i x_i$, we have $w^T x + b = \sum_i \alpha_i y_i x_i^T x + b$
• Both the dual objective and the final classifier depend on the data only through inner products
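A minimal sketch of solving this dual numerically on hypothetical toy data, using scipy's SLSQP solver (any QP solver would do); it then recovers $w$ from (1) and $b$ from a support vector:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (hypothetical).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)
Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j x_i^T x_j

# Maximize sum_i alpha_i - (1/2) alpha^T Q alpha  <=>  minimize its negative.
objective = lambda a: 0.5 * a @ Q @ a - a.sum()
res = minimize(objective, np.zeros(n), method='SLSQP',
               bounds=[(0, None)] * n,                            # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum_i alpha_i y_i = 0

alpha = res.x
w = (alpha * y) @ X               # w = sum_i alpha_i y_i x_i, equation (1)
sv = alpha > 1e-6                 # support vectors have alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)    # from y_i (w^T x_i + b) = 1 at support vectors
print(w, b)
```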
Kernel methods
Features
[Figure: feature extraction — an input image $x$ is mapped to a feature vector $\phi(x)$, e.g., a color histogram over red, green, and blue channels]
Features
• A proper feature mapping can turn a non-linear problem into a linear one
• Using SVM on the feature space $\{\phi(x_i)\}$: only need $\phi(x_i)^T \phi(x_j)$
• Conclusion: no need to design $\phi(\cdot)$; only need to design the kernel
$$k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$$
Polynomial kernels
• Fix a degree $d$ and constant $c$:
$$k(x, x') = (x^T x' + c)^d$$
• What is $\phi(x)$?
• Expand the expression to read off $\phi(x)$
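To make the expansion concrete, a small sketch for $d = 2$ in two dimensions (the same feature map appears in the network figure at the end of the lecture); the two printed values agree:

```python
import numpy as np

# For d = 2 in two dimensions, expanding (x^T x' + c)^2 yields the feature map
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2c) x1, sqrt(2c) x2, c).
c = 1.0

def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ xp + c) ** 2)     # kernel value: 4.0
print(phi(x) @ phi(xp))      # same value via the explicit feature map: 4.0
```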
Polynomial kernels
Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar
Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar
Gaussian kernels
• Fix a bandwidth $\sigma$:
$$k(x, x') = \exp(-\|x - x'\|^2 / 2\sigma^2)$$
• Also called the radial basis function (RBF) kernel
• What is $\phi(x)$? Consider the un-normalized version
$$k'(x, x') = \exp(x^T x' / \sigma^2)$$
• Power series expansion:
$$k'(x, x') = \sum_{i=0}^{+\infty} \frac{(x^T x')^i}{\sigma^{2i} \, i!}$$
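A minimal sketch of computing the RBF Gram matrix on hypothetical data (the function name and default bandwidth are illustrative), vectorized via $\|x - x'\|^2 = \|x\|^2 + \|x'\|^2 - 2 x^T x'$:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian (RBF) Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.random.randn(5, 3)             # hypothetical data: 5 points in R^3
K = rbf_kernel(X)
print(K.shape, np.allclose(K, K.T))   # (5, 5) True -- symmetric Gram matrix
```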
Mercer's condition for kernels
• Theorem: $k(x, x')$ has an expansion
$$k(x, x') = \sum_{i=1}^{+\infty} a_i \phi_i(x) \phi_i(x')$$
if and only if for any function $c(x)$,
$$\int \int c(x) \, c(x') \, k(x, x') \, dx \, dx' \ge 0$$
(omitting some technical conditions on $k$ and $c$)
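The integral condition is hard to check directly, but its finite-sample analogue is easy: on any sample, $\sum_{ij} c_i c_j k(x_i, x_j) \ge 0$, i.e., the Gram matrix must be positive semidefinite. A minimal sketch of that check on hypothetical data:

```python
import numpy as np

def is_psd(K, tol=1e-6):
    # Discrete analogue of Mercer's condition: all eigenvalues of the
    # (symmetric) Gram matrix are nonnegative, up to numerical tolerance.
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

X = np.random.randn(20, 3)                               # hypothetical sample
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)      # pairwise ||x - x'||^2
K = np.exp(-d2 / 2)                                      # RBF Gram matrix, sigma = 1
print(is_psd(K))                                         # True: RBF passes the check
```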
Constructing new kernels
• Kernels are closed under positive scaling, sum, product, pointwise limit, and composition with a power series $\sum_{i=0}^{+\infty} a_i k^i(x, x')$ (with coefficients $a_i \ge 0$)
• Example: if $k_1(x, x')$ and $k_2(x, x')$ are kernels, then so is
$$k(x, x') = 2 k_1(x, x') + 3 k_2(x, x')$$
• Example: if $k_1(x, x')$ is a kernel, then so is
$$k(x, x') = \exp(k_1(x, x'))$$
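A quick numeric sanity check of both examples, reusing the eigenvalue test from the Mercer slide (the data are hypothetical):

```python
import numpy as np

# Build new Gram matrices from old ones and confirm they remain
# positive semidefinite.
X = np.random.randn(15, 2)       # hypothetical data
K1 = X @ X.T                     # linear kernel
K2 = (X @ X.T + 1.0) ** 2        # degree-2 polynomial kernel

K_sum = 2 * K1 + 3 * K2          # positive combination: still a kernel
K_exp = np.exp(K1)               # elementwise exp = power series in K1

for K in (K_sum, K_exp):
    print(np.all(np.linalg.eigvalsh(K) >= -1e-6))   # True, True
```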
Kernels vs. neural networks
Features
[Figure: extract features $\phi(x)$ from the input $x$ (e.g., a color histogram over red, green, and blue), then build the hypothesis $y = w^T \phi(x)$]
Features: part of the model
• Build the hypothesis $y = w^T \phi(x)$: the model is linear in $w$ but, through $\phi$, nonlinear in the input $x$
Polynomial kernels
Figure from Foundations of Machine Learning, by M. Mohri, A. Rostamizadeh, and A. Talwalkar
Polynomial kernel SVM as a two-layer neural network
[Figure: network diagram — inputs $x_1, x_2$ feed a fixed first layer computing $\phi(x) = (x_1^2, \; x_2^2, \; \sqrt{2}\,x_1 x_2, \; \sqrt{2c}\,x_1, \; \sqrt{2c}\,x_2, \; c)$; the output unit computes $y = \mathrm{sign}(w^T \phi(x) + b)$]
• The first layer is fixed. If the first layer is also learned, this becomes a two-layer neural network