TRANSCRIPT
Slide 1
Support Vector Machines
Joachim Mathiesen, Niels Bohr Institute
Slide 2
(Over)simplified history
1960-1970s
Predominantly linear decision
boundaries/classifiers
1980s
Boom in neural networks and decision
trees
1990-2000s
Kernel machines/methods
outperformed neural networks
2010s
Revival of neural networks, and boosted
decision trees.
Slide 3
Classification
Slide 4
Generalized Linear Model (Logistic Regression) 1st Order Terms
Slide 5
Generalized Linear Model (Logistic Regression) 4th Order Terms
Slide 6
Random Forest
Slide 7
Support Vector Machines
Slide 8
Support Vector Machines
Efficient separation of non-linear regions based on kernel methods – we only have to know the dot product between data points.
No problems with convergence and no trapping in local minima – “simple” quadratic optimization
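The dot-product point above can be sketched numerically. This is not from the slides; it uses plain NumPy with a degree-2 polynomial feature map chosen purely for illustration, to show that a kernel on the raw inputs equals a dot product in the mapped space:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-d input (illustration only)."""
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)**2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

lhs = np.dot(phi(x), phi(z))   # dot product in the mapped feature space
rhs = poly_kernel(x, z)        # kernel evaluated directly on the raw inputs
print(np.isclose(lhs, rhs))   # prints True: the two agree
```

The kernel never constructs `phi` explicitly, which is why only the dot product between data points is needed.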
Slide 9
Classification SVM
Slide 10
“A complex pattern-classification problem, cast
in a high-dimensional space nonlinearly, is more
likely to be linearly separable than in a low-
dimensional space, provided that the space is
not densely populated.”
— Cover, T. M., "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition" (1965)
Cover’s theorem
Slide 11
wikipedia.org
By mapping to a simplex, it is apparent that "every partition of the samples into two sets is separable by a linear separator".
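A minimal sketch of Cover's idea (not from the slides): points that are not linearly separable in one dimension become separable after a nonlinear map to a higher-dimensional space. The data and the map x → (x, x²) are my own illustrative choices:

```python
import numpy as np

# Class 1 at the extremes, class 0 in the middle: no single threshold on x
# separates them in 1-d.
x = np.array([-2.0, -1.5, -0.2, 0.1, 1.4, 2.1])
y = np.array([1, 1, 0, 0, 1, 1])

features = np.column_stack([x, x ** 2])   # nonlinear map to 2-d
threshold = 1.0                           # separating line: x**2 = 1
pred = (features[:, 1] > threshold).astype(int)
print((pred == y).all())  # prints True: linearly separable after the map
```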
Slide 12
Basic SVM
The support vector machine finds an optimal separation of the points, whereas for a linear classifier, y = a + bx, infinitely many parameter pairs (a, b) would give a working decision boundary.
SVM maximizes the margin between the two classes.
Slide 13
Basic SVM Example: Radial Kernel
Beware of overfitting!
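As an illustration of a radial-kernel SVM, here is a sketch using scikit-learn (the slides do not prescribe a library) on a toy two-moons data set that no linear boundary can separate. Note that a high training accuracy alone says nothing about overfitting:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy non-linearly-separable data (my choice, not from the slides).
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Radial (RBF) kernel SVM with default-style settings.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy; validate on held-out data too
```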
Slide 14
SVM
For a data set 𝑥1, 𝑥2, … , 𝑥𝑁 with target values {𝑦1, 𝑦2, … , 𝑦𝑁}, we aim to minimize

(1/2) ‖𝑤‖²

subject to 𝑦𝑖 (𝑤 ⋅ 𝑥𝑖 + 𝑏) ≥ 1 for all 𝑖.

This is not optimal if you have overlapping points belonging to different classes near the decision boundary.
Slide 15
Slack variables in SVM
In order not to be too sensitive to fuzziness close to the separation boundary, we introduce slack variables 𝜉𝑖 that allow for some misclassification. For a data set 𝑥1, 𝑥2, … , 𝑥𝑁 with target values {𝑦1, 𝑦2, … , 𝑦𝑁}, we now aim to minimize

(1/2) ‖𝑤‖² + 𝐶 Σ𝑖 𝜉𝑖

subject to 𝑦𝑖 (𝑤 ⋅ 𝑥𝑖 + 𝑏) ≥ 1 − 𝜉𝑖 and 𝜉𝑖 ≥ 0.

The cost 𝐶 is the penalty you pay for points that are not classified correctly.
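The effect of the cost 𝐶 can be sketched as follows (scikit-learn and the blob data are my own illustrative choices): a small 𝐶 tolerates points inside or beyond the margin, so many points become support vectors; a large 𝐶 penalises slack heavily and typically leaves fewer support vectors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters (illustrative data, not from the slides).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

counts = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    counts[C] = int(clf.n_support_.sum())  # total number of support vectors
    print(C, counts[C])
```

With the small 𝐶 the margin is wide and soft; with the large 𝐶 it approaches the hard-margin formulation above.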
Slide 16
𝜖-regression
Slide 17
𝜖-regression
Slide 18
𝜖-regression
The 𝜖-insensitive loss function, where predictions have to be within a distance 𝜖 of the true value.
Slide 19
𝜖-regression
Regression then works similarly to classification with slack variables. For a data set 𝑥1, 𝑥2, … , 𝑥𝑁 with target values {𝑣1, 𝑣2, … , 𝑣𝑁}, we now aim to minimize

(1/2) ‖𝑤‖² + 𝐶 Σ𝑖 (𝜉𝑖 + 𝜉𝑖∗)

subject to

𝑣𝑖 − 𝑤 ⋅ 𝑥𝑖 − 𝑏 ≤ 𝜖 + 𝜉𝑖
−𝑣𝑖 + 𝑤 ⋅ 𝑥𝑖 + 𝑏 ≤ 𝜖 + 𝜉𝑖∗
𝜉𝑖 ≥ 0, 𝜉𝑖∗ ≥ 0
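A sketch of 𝜖-regression in practice (scikit-learn's `SVR`; the data and parameter values are illustrative assumptions, not from the slides). Points whose residual stays inside the 𝜖 tube incur no loss and do not become support vectors:

```python
import numpy as np
from sklearn.svm import SVR

# Noisy 1-d regression data (illustrative).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# epsilon sets the width of the insensitive tube around the prediction.
reg = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X, y)
n_sv = reg.support_.shape[0]
print(n_sv, "of", len(X), "points are support vectors")
```

Only the points outside the tube carry slack, so the model is sparse in the training data.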
Slide 20
SVMs are great general-purpose models … but offer limited scope for model interpretation.
Slide 21
Main advantages of SVM:
Efficient separation of non-linear regions (based on basic dot-products).
The model follows from a quadratic optimization problem, i.e. there is no risk of ending in a local minimum, in contrast to, for example, neural networks.
Slide 22
Slide 23
Data science competitions – a crowdsourcing platform where companies pose problems and offer prizes for the best predictive model on uploaded data.
Slide 24
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Slide 25
SVM as a model for house prices in areas with
1000 < Zip Code < 2500
Basic estimate of square-meter price as a function of coordinates.
Slide 26
Radial Kernel
Slide 27
Polynomial Kernel
Slide 28
Changing gamma in the radial kernel changes performance
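The gamma effect can be sketched like this (scikit-learn on toy two-moons data; the values 0.5 and 500 are my own illustrative choices). A very large gamma makes the radial kernel so narrow that the model memorises the training points and generalises poorly:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

scores = {}
for gamma in (0.5, 500.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    scores[gamma] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(gamma, scores[gamma])  # (training accuracy, validation accuracy)
```

The large-gamma model scores near-perfectly on the training set while its validation accuracy drops, which is exactly the overfitting the earlier slide warns about.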
Slide 29
Slide 30
Lab exercise:
• Build a model for house/apartment prices at Østerbro.
• Filter data:
  I. Zip code = 2100
  II. Remove entries with sales price lower than taxation value
  III. Only use entries with square-meter price in the range 10 kkr to 100 kkr
  IV. Remove entries without UTM coordinates
• Split data into training and validation sets
• Estimate error
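The exercise pipeline can be sketched as below. The real exercise uses Østerbro sales records with UTM coordinates, which are not included here, so this sketch runs on synthetic stand-in data; the price model, column layout, and SVR parameters are all hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the housing data (NOT the real Østerbro records).
rng = np.random.default_rng(42)
coords = rng.uniform(0, 1, size=(500, 2))          # stand-in coordinates
sqm_price = (30 + 20 * np.sin(3 * coords[:, 0])
             + rng.normal(scale=2.0, size=500))    # stand-in kkr per m2

# Filter step, mirroring the slide: keep prices in a plausible range.
mask = (sqm_price > 10) & (sqm_price < 100)
X, y = coords[mask], sqm_price[mask]

# Split into training and validation sets, fit, and estimate the error.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=42)
model = SVR(kernel="rbf", C=100.0, epsilon=1.0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_val, model.predict(X_val))
print("validation MAE (kkr per m2):", mae)
```

The held-out mean absolute error is one simple way to "estimate error" as the last bullet asks; cross-validation would give a more robust estimate.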