Arvutiteaduse instituut - Support Vector Machines
May 11, 2015

TRANSCRIPT

So far…

- Supervised machine learning
  - Linear models: least squares regression; Fisher's discriminant, perceptron, logistic model
  - Non-linear models: neural networks, decision trees, association rules
- Unsupervised machine learning
  - Clustering/EM, PCA
- Generic scaffolding
  - Probabilistic modeling, ML/MAP estimation
  - Performance evaluation, statistical learning theory
  - Linear algebra, optimization methods

Coming up next

- Supervised machine learning
  - Linear models: least squares regression, SVM; Fisher's discriminant, perceptron, logistic regression, SVM
  - Non-linear models: neural networks, decision trees, association rules; SVM, Kernel-XXX
- Unsupervised machine learning
  - Clustering/EM, PCA, Kernel-XXX
- Generic scaffolding
  - Probabilistic modeling, ML/MAP estimation
  - Performance evaluation, statistical learning theory
  - Linear algebra, optimization methods
- Kernels

First things first

SVM in R (𝑦 ∈ {−1, 1}):

library('e1071')
m = svm(X, y, kernel = 'linear')  # pass y as a factor (or set type = 'C-classification') for classification
predict(m, newX)

Quiz

- This line is called …
- This vector is …
- Those lines are …
- 𝑓(𝒙) = ?
- 𝒙₁ = ? 𝑦₁ = ?
- Functional margin of 𝒙₁?
- Geometric margin of 𝒙₁?
- Distance to origin?

Quiz

- Separating hyperplane
- Normal 𝒘
- Isolines (level lines)
- 𝑓(𝒙) = 𝒘ᵀ𝒙 + 𝑏
- 𝒙₁ = (2, 6); 𝑦₁ = −1
- Functional margin: 𝑦₁ ⋅ 𝑓(𝒙₁) ≈ 2
- Geometric margin: 𝑓(𝒙₁)/‖𝒘‖ ≈ 3√2
- Distance to origin: 𝑑 = 𝑏/‖𝒘‖

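These margin formulas can be checked mechanically. Below is a minimal Python sketch with a made-up 𝒘 and 𝑏 (the slide's actual classifier comes from a figure that is not reproduced in this transcript, so the numbers differ from the slide's):

```python
import math

# Hypothetical classifier; chosen so that x1 = (2, 6), y1 = -1 is
# classified correctly. These values are NOT the slide's.
w = [1.0, -1.0]
b = 1.0

def f(x):
    """Decision value f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def functional_margin(x, y):
    """y * f(x): positive iff x is classified correctly."""
    return y * f(x)

def geometric_margin(x, y):
    """Functional margin divided by ||w||: signed distance to the hyperplane."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    return functional_margin(x, y) / norm_w

x1, y1 = (2.0, 6.0), -1
print(functional_margin(x1, y1))  # 3.0
print(geometric_margin(x1, y1))   # ≈ 2.1213 (= 3/√2)
```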
Quiz

Suppose we scale 𝒘 and 𝑏 by some constant. Will it:

- Affect the separating hyperplane? How?
- Affect the functional margins? How?
- Affect the geometric margins? How?

Quiz

Example: 𝒘 → 2𝒘, 𝑏 = 0

Quiz

Suppose we scale 𝒘 and 𝑏 by some constant. Will it:

- Affect the separating hyperplane? How?
  No: 𝒘ᵀ𝒙 + 𝑏 = 0 ⇔ 2𝒘ᵀ𝒙 + 2𝑏 = 0
- Affect the functional margins? How?
  Yes: (2𝒘ᵀ𝒙 + 2𝑏)𝑦 = 2 ⋅ (𝒘ᵀ𝒙 + 𝑏)𝑦
- Affect the geometric margins? How?
  No: (2𝒘ᵀ𝒙 + 2𝑏)/‖2𝒘‖ = (𝒘ᵀ𝒙 + 𝑏)/‖𝒘‖

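The three answers can also be verified numerically. A small Python sketch with hypothetical values for 𝒘, 𝑏 and one labeled point:

```python
import math

# Made-up example values; any w, b and labeled point would do.
w, b = [1.0, -1.0], 1.0
x, y = [2.0, 6.0], -1
c = 2.0  # scaling constant

def f(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def norm(w):
    return math.sqrt(sum(wj * wj for wj in w))

w2, b2 = [c * wj for wj in w], c * b  # scaled classifier

# The functional margin scales by c ...
print(y * f(w, b, x), y * f(w2, b2, x))  # 3.0 6.0
# ... but the geometric margin does not change:
print(y * f(w, b, x) / norm(w), y * f(w2, b2, x) / norm(w2))  # equal
```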
Which classifier is best?
Maximal margin classifier
Why maximal margin?

- Well-defined, single stable solution
- Noise-tolerant
- Small parameterization
- (Fairly) efficient algorithms exist for finding it

Maximal margin: Separable case

Isolines: 𝑓(𝒙) = 1 and 𝑓(𝒙) = −1

∀𝑖: 𝑓(𝒙ᵢ)𝑦ᵢ ≥ 1

The (geometric) distance to the isoline 𝑓(𝒙) = 1 is:

𝑑 = 𝑓(𝒙)/‖𝒘‖ = 1/‖𝒘‖

Maximal margin: Separable case

Among all linear classifiers (𝒘, 𝑏) which keep all points at a functional margin of 1 or more, we look for the one with the largest distance 𝑑 to the corresponding isolines, i.e. the largest geometric margin.

As 𝑑 = 1/‖𝒘‖, this is equivalent to finding the classifier with minimal ‖𝒘‖ … which is equivalent to finding the classifier with minimal ‖𝒘‖².

Compare

"Generic" linear classification (separable case):
Find (𝒘, 𝑏) such that all points are classified correctly, i.e. 𝑓(𝒙ᵢ)𝑦ᵢ > 0.

Maximal margin classification (separable case):
Find (𝒘, 𝑏) such that all points are classified correctly with a fixed functional margin, i.e. 𝑓(𝒙ᵢ)𝑦ᵢ ≥ 1, and ‖𝒘‖² is minimal.

Remember

SVM optimization problem (separable case):

min_{𝒘,𝑏} ½‖𝒘‖²
so that (𝒘ᵀ𝒙ᵢ + 𝑏)𝑦ᵢ ≥ 1

General case ("soft margin")

The same, but we also penalize all margin violations.

SVM optimization problem:

min_{𝒘,𝑏} ½‖𝒘‖² + 𝐶 ∑ᵢ 𝜉ᵢ

where 𝜉ᵢ = (1 − 𝑓(𝒙ᵢ)𝑦ᵢ)₊ = (1 − 𝑚ᵢ)₊ = hinge(𝑚ᵢ)

Hinge loss

hinge(𝑚ᵢ) = (1 − 𝑚ᵢ)₊

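A direct Python transcription of the hinge loss, for concreteness:

```python
def hinge(m):
    """Hinge loss (1 - m)_+ : zero for margins >= 1, linear below."""
    return max(0.0, 1.0 - m)

print(hinge(2.0))   # 0.0 : comfortably on the right side of the margin
print(hinge(1.0))   # 0.0 : exactly on the margin
print(hinge(0.0))   # 1.0 : on the separating hyperplane
print(hinge(-1.0))  # 2.0 : misclassified, penalty grows linearly
```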
Classification loss functions

- "Generic" classification: min_{𝒘,𝑏} ∑ᵢ [𝑚ᵢ < 0]
- Perceptron: min_{𝒘,𝑏} ∑ᵢ (−𝑚ᵢ)₊
- Least squares classification*: min_{𝒘,𝑏} ∑ᵢ (𝑚ᵢ − 1)²
- Boosting: min_{𝒘,𝑏} ∑ᵢ exp(−𝑚ᵢ)
- Logistic regression: min_{𝒘,𝑏} ∑ᵢ log(1 + 𝑒^(−𝑚ᵢ))
- Regularized logistic regression: min_{𝒘,𝑏} ∑ᵢ log(1 + 𝑒^(−𝑚ᵢ)) + 𝜆 ⋅ ½‖𝒘‖²
- SVM: min_{𝒘,𝑏} ∑ᵢ (1 − 𝑚ᵢ)₊ + 1/(2𝐶) ⋅ ‖𝒘‖²
- L2-SVM: min_{𝒘,𝑏} ∑ᵢ (1 − 𝑚ᵢ)₊² + 1/(2𝐶) ⋅ ‖𝒘‖²
- L1-regularized L2-SVM: min_{𝒘,𝑏} ∑ᵢ (1 − 𝑚ᵢ)₊² + 1/(2𝐶) ⋅ ‖𝒘‖₁
- … etc

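To see how these surrogates compare, here is a small Python sketch evaluating each loss as a function of the margin 𝑚ᵢ (regularization terms omitted; the function names are ad hoc):

```python
import math

def zero_one(m):   return 1.0 if m < 0 else 0.0        # "generic" classification
def perceptron(m): return max(0.0, -m)                 # (-m)_+
def squares(m):    return (m - 1.0) ** 2               # least squares classification
def boosting(m):   return math.exp(-m)                 # exponential loss
def logistic(m):   return math.log(1.0 + math.exp(-m)) # logistic loss
def hinge(m):      return max(0.0, 1.0 - m)            # SVM
def hinge2(m):     return max(0.0, 1.0 - m) ** 2       # L2-SVM

# All surrogates agree on the direction: the larger the margin, the smaller
# the loss (except least squares, which also penalizes margins above 1).
for m in (-2.0, 0.0, 0.5, 1.0, 2.0):
    row = [round(f(m), 3) for f in (zero_one, perceptron, squares,
                                    boosting, logistic, hinge, hinge2)]
    print(m, row)
```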
In general

min_{𝒘,𝑏} ∑ᵢ 𝜙(𝑚ᵢ) + 𝜆 ⋅ Ω(𝒘)

Model fit: ∑ᵢ 𝜙(𝑚ᵢ). Model complexity: 𝜆 ⋅ Ω(𝒘).

Compare to MAP estimation

max_{Model} ∑ᵢ log 𝑃(𝑥ᵢ|Model) + log 𝑃(Model)

i.e.

max_{Model} log 𝑃(Data|Model) + log 𝑃(Model)

Likelihood: log 𝑃(Data|Model). Model prior: log 𝑃(Model).

Solving the SVM

min_{𝒘,𝑏} ½‖𝒘‖² + 𝐶 ∑ᵢ (1 − 𝑓(𝒙ᵢ)𝑦ᵢ)₊

Solving the SVM

Rewriting the hinge terms with slack variables 𝜉ᵢ:

min_{𝒘,𝑏,𝜉} ½‖𝒘‖² + 𝐶 ∑ᵢ 𝜉ᵢ
such that 𝑓(𝒙ᵢ)𝑦ᵢ − (1 − 𝜉ᵢ) ≥ 0, 𝜉ᵢ ≥ 0

A quadratic function with linear constraints! This is quadratic programming:

Minimize 𝑓(𝒙) = ½𝒙ᵀ𝑸𝒙 + 𝒄ᵀ𝒙
subject to 𝑨𝒙 ≥ 𝒃, 𝑪𝒙 = 𝒅

> library(quadprog)
> solve.QP(Q, -c, A, b, meq)  # meq = number of equality constraints

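quadprog needs 𝑸, 𝑨 and 𝒃 assembled for this specific problem. As an illustration of the same primal objective, here is a self-contained Python sketch that minimizes ½‖𝒘‖² + 𝐶 ∑ᵢ (1 − 𝑓(𝒙ᵢ)𝑦ᵢ)₊ by plain subgradient descent on made-up toy data; this is an illustrative alternative, not the course's QP method:

```python
# Toy soft-margin SVM primal, solved by subgradient descent (a sketch).
X = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]  # made-up separable data
Y = [1, 1, -1, -1]
C, lr, steps = 1.0, 0.1, 2000

w, b = [0.0, 0.0], 0.0
for _ in range(steps):
    gw, gb = [w[0], w[1]], 0.0  # gradient of the 1/2 ||w||^2 term
    for (x1, x2), y in zip(X, Y):
        if y * (w[0] * x1 + w[1] * x2 + b) < 1:  # margin violated: hinge is active
            gw[0] -= C * y * x1
            gw[1] -= C * y * x2
            gb    -= C * y
    w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
    b -= lr * gb

margins = [y * (w[0] * x1 + w[1] * x2 + b) for (x1, x2), y in zip(X, Y)]
print(all(m > 0 for m in margins))  # True: the toy set is separated
```

With a fixed step size the iterates oscillate around the optimum; a decreasing step size would be needed for actual convergence.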
A popular trick in optimization:

min_𝑥 𝑓(𝑥), s.t. 𝑔(𝑥) ≥ 0

is equivalent to:

min_𝑥 max_{𝛼≥0} 𝑓(𝑥) − 𝛼𝑔(𝑥)

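A small numeric illustration of why the trick works, with a made-up 𝑓 and 𝑔:

```python
import math

# f(x) = x^2, constraint g(x) = x - 1 >= 0; constrained minimum is at x = 1.
f = lambda x: x * x
g = lambda x: x - 1.0

def inner_max(x):
    """max over alpha >= 0 of f(x) - alpha * g(x).
    If g(x) >= 0 the best alpha is 0 (value f(x));
    if g(x) < 0 the value grows without bound."""
    return f(x) if g(x) >= 0 else math.inf

# Minimizing inner_max over a grid reproduces the constrained minimum x = 1.
grid = [i / 100.0 for i in range(-300, 301)]
best = min(grid, key=inner_max)
print(best)  # 1.0
```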
Solving the SVM: Dual

min_{𝒘,𝑏,𝜉} ½‖𝒘‖² + 𝐶 ∑ᵢ 𝜉ᵢ such that 𝑓(𝒙ᵢ)𝑦ᵢ − (1 − 𝜉ᵢ) ≥ 0, 𝜉ᵢ ≥ 0

is equivalent to:

min_{𝒘,𝑏,𝜉} max_{𝜶≥0,𝜷≥0} ½‖𝒘‖² + 𝐶 ∑ᵢ 𝜉ᵢ − ∑ᵢ 𝛼ᵢ(𝑓(𝒙ᵢ)𝑦ᵢ − (1 − 𝜉ᵢ)) − ∑ᵢ 𝛽ᵢ𝜉ᵢ

Grouping the 𝜉ᵢ terms:

min_{𝒘,𝑏,𝜉} max_{𝜶≥0,𝜷≥0} ½‖𝒘‖² + ∑ᵢ 𝜉ᵢ(𝐶 − 𝛼ᵢ − 𝛽ᵢ) − ∑ᵢ 𝛼ᵢ(𝑓(𝒙ᵢ)𝑦ᵢ − 1)

At the optimum 𝐶 − 𝛼ᵢ − 𝛽ᵢ = 0; together with 𝛼ᵢ ≥ 0, 𝛽ᵢ ≥ 0 this gives 0 ≤ 𝛼ᵢ ≤ 𝐶.

Solving the SVM: Dual

min_{𝒘,𝑏} max_{𝜶} ½‖𝒘‖² − ∑ᵢ 𝛼ᵢ(𝑓(𝒙ᵢ)𝑦ᵢ − 1), with 0 ≤ 𝛼ᵢ ≤ 𝐶

Sparsity: 𝛼ᵢ is nonzero only for those points which have 𝑓(𝒙ᵢ)𝑦ᵢ − 1 ≤ 0.

Now swap the min and the max (this can be done here in particular because everything is nice and convex).

Solving the SVM: Dual

max_{𝜶} min_{𝒘,𝑏} ½‖𝒘‖² − ∑ᵢ 𝛼ᵢ(𝑓(𝒙ᵢ)𝑦ᵢ − 1), with 0 ≤ 𝛼ᵢ ≤ 𝐶

Next solve the inner (unconstrained) min as usual:

∇𝒘 = 𝒘 − ∑ᵢ 𝛼ᵢ𝑦ᵢ𝒙ᵢ = 0
∇𝑏 = −∑ᵢ 𝛼ᵢ𝑦ᵢ = 0

Express 𝒘 and substitute:

𝒘 = ∑ᵢ 𝛼ᵢ𝑦ᵢ𝒙ᵢ  (the dual representation)
∑ᵢ 𝛼ᵢ𝑦ᵢ = 0  ("balance")

Solving the SVM: Dual

After the substitution:

max_{𝜶} ∑ᵢ 𝛼ᵢ − ½ ∑ᵢ,ⱼ 𝛼ᵢ𝛼ⱼ𝑦ᵢ𝑦ⱼ𝒙ᵢᵀ𝒙ⱼ
with 0 ≤ 𝛼ᵢ ≤ 𝐶, ∑ᵢ 𝛼ᵢ𝑦ᵢ = 0

Solving the SVM: Dual

In matrix form:

max_{𝜶} 𝟏ᵀ𝜶 − ½ 𝜶ᵀ(𝑲 ∘ 𝒀)𝜶
with 0 ≤ 𝜶 ≤ 𝐶, 𝒚ᵀ𝜶 = 0, where 𝐾ᵢⱼ = 𝒙ᵢᵀ𝒙ⱼ, 𝑌ᵢⱼ = 𝑦ᵢ𝑦ⱼ

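In this matrix form the dual objective is easy to evaluate. A self-contained Python sketch on a tiny made-up problem (the 𝛼 values are hypothetical, not an optimal solution):

```python
# Evaluating the dual objective 1^T a - 1/2 a^T (K o Y) a for a tiny
# made-up problem, without any linear-algebra library.
X = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0)]
y = [1, 1, -1]
alpha = [0.1, 0.0, 0.1]  # hypothetical dual variables, satisfying y^T alpha = 0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n = len(X)
K  = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]          # K_ij = x_i^T x_j
KY = [[K[i][j] * y[i] * y[j] for j in range(n)] for i in range(n)]    # K o Y

dual = sum(alpha) - 0.5 * sum(alpha[i] * KY[i][j] * alpha[j]
                              for i in range(n) for j in range(n))
print(round(dual, 4))  # 0.1
```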
Solving the SVM: Dual

As a quadratic program:

min_{𝜶} ½ 𝜶ᵀ(𝑲 ∘ 𝒀)𝜶 − 𝟏ᵀ𝜶
with 𝜶 ≥ 0, −𝜶 ≥ −𝐶, 𝒚ᵀ𝜶 = 0

Then find 𝑏 from the condition*: 𝑓(𝒙ᵢ)𝑦ᵢ = 1 if 0 < 𝛼ᵢ < 𝐶

*see homework; it's actually not that easy!

Support vectors

[Figure: training points annotated with their 𝛼ᵢ values: 0, 0.5, 1 and 𝐶]

∑ᵢ 𝛼ᵢ𝑦ᵢ = 0, 0 ≤ 𝛼ᵢ ≤ 𝐶

Sparsity

The dual solution is often very sparse, which allows the optimization to be performed efficiently (the "working set" approach).

Kernels

𝑓(𝒙) = 𝒘ᵀ𝒙 + 𝑏
𝒘 = ∑ᵢ 𝛼ᵢ𝑦ᵢ𝒙ᵢ
𝑓(𝒙) = ∑ᵢ 𝛼ᵢ𝑦ᵢ𝒙ᵢᵀ𝒙 + 𝑏
𝑓(𝒙) = ∑ᵢ 𝛼ᵢ𝑦ᵢ𝐾(𝒙ᵢ, 𝒙) + 𝑏  (𝐾 is the kernel function)

Examples:

𝑓(𝑥) = 𝑤₁𝑥 + 𝑤₂𝑥² + 𝑏
𝑓(𝒙) = ∑ᵢ 𝛼ᵢ𝑦ᵢ exp(−‖𝒙ᵢ − 𝒙‖²) + 𝑏

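The kernelized decision function can be sketched directly in Python; all numbers below (support vectors, 𝛼ᵢ, 𝑏) are made up for illustration:

```python
import math

# Sketch of the kernelized decision function
#   f(x) = sum_i alpha_i y_i K(x_i, x) + b
# with the RBF kernel from the slide.
def rbf(u, v):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))

sv    = [(0.0, 0.0), (2.0, 2.0)]  # hypothetical "support vectors"
y     = [-1, 1]
alpha = [1.0, 1.0]
b = 0.0

def f(x):
    return sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y, sv)) + b

print(f((0.0, 0.0)) < 0)  # True: near the negative support vector
print(f((2.0, 2.0)) > 0)  # True: near the positive support vector
```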
Quiz

- SVM is a __________ linear classifier.
- Margin maximization can be achieved via minimization of ______________.
- SVM uses _____ loss and _______ regularization.
- Besides hinge loss I also know ____ loss and ___ loss.
- SVM in both primal and dual form is solved using ________ programming.

Quiz

- In the primal formulation we solve for the parameter vector ___. In the dual formulation we solve for ___ instead.
- The _____ form of SVM is typically sparse.
- Support vectors are those training points for which _______.
- The relation between primal and dual variables is: ___ = ∑ᵢ ______.
- A kernel is a generalization of the _____ product.