
Slide 1

Support Vector Machines

Joachim Mathiesen, Niels Bohr Institute

Slide 2

(Over)simplified history

• 1960-1970s: Predominantly linear decision boundaries/classifiers
• 1980s: Boom in neural networks and decision trees
• 1990-2000s: Kernel machines/methods outperformed neural networks
• 2010s: Revival of neural networks, and boosted decision trees

Slide 3

Classification

Slide 4

Generalized Linear Model (Logistic Regression) 1st Order Terms

Slide 5

Generalized Linear Model (Logistic Regression) 4th Order Terms
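
Not from the slides, but a minimal sketch of the comparison between the two previous slides: logistic regression with 1st- and 4th-order feature expansions on a synthetic two-class set (the moons data and all parameter values here are illustrative assumptions).

```python
# Logistic regression with 1st- vs 4th-order polynomial feature terms
# on a synthetic two-class data set.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for degree in (1, 4):
    # PolynomialFeatures expands (x1, x2) into all monomials up to `degree`,
    # so degree=1 gives a linear boundary and degree=4 a curved one.
    model = make_pipeline(PolynomialFeatures(degree),
                          LogisticRegression(max_iter=1000))
    model.fit(X, y)
    print(f"degree {degree}: training accuracy = {model.score(X, y):.2f}")
```

With 4th-order terms the decision boundary can bend around the classes, mirroring the jump from slide 4 to slide 5.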

Slide 6

Random Forest

Slide 7

Support Vector Machines

Slide 8

Support Vector Machines

Efficient separation of non-linear regions based on kernel methods – we only have to know the dot product between data points.

No problems with convergence and no trapping in local minima – “simple” quadratic optimization
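
A minimal sketch of the dot-product point, assuming scikit-learn (the circles data and the gamma value are illustrative): SVC accepts a precomputed Gram matrix, so the classifier only ever sees pairwise kernel values, never the raw coordinates.

```python
# Kernel SVM fed only with dot-product information: a precomputed Gram matrix.
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.1, random_state=0)

# Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) is all the SVM ever sees.
K = rbf_kernel(X, X, gamma=1.0)
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```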

Slide 9

Classification SVM

Slide 10

“A complex pattern-classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in a low-dimensional space, provided that the space is not densely populated.”

— Cover, T. M., “Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition” (1965)

Cover’s theorem

Slide 11

[Figure: wikipedia.org]

By mapping to a simplex, it is apparent that “every partition of the samples into two sets is separable by a linear separator”.
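
To make the idea concrete, here is a small sketch (the XOR toy data and the particular product-feature lift are illustrative choices, not from the slides): four XOR-labeled points admit no separating line in the plane, but become linearly separable after a nonlinear map to three dimensions.

```python
# Cover's idea on the XOR problem: inseparable in 2D, separable after a lift.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # XOR labels

# Nonlinear map (x1, x2) -> (x1, x2, x1*x2) adds a single product feature.
X_lifted = np.column_stack([X, X[:, 0] * X[:, 1]])

lin2d = SVC(kernel="linear", C=1e6).fit(X, y)
lin3d = SVC(kernel="linear", C=1e6).fit(X_lifted, y)
print("accuracy in 2D:", lin2d.score(X, y))                  # below 1.0
print("accuracy after the lift:", lin3d.score(X_lifted, y))  # 1.0
```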

Slide 12

Basic SVM

The support vector machine finds an optimal separation of the points, whereas a generic linear classifier y = a + bx has infinitely many choices of the parameters a and b that give a working decision boundary.

The SVM maximizes the margin between the two classes.
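
A minimal sketch of margin maximization, assuming scikit-learn (the blob data and the large C, which approximates a hard margin, are illustrative): the fitted boundary is pinned down entirely by the support vectors.

```python
# Fit a linear SVM and inspect the support vectors defining the margin.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin

print("weights w:", clf.coef_[0], " bias b:", clf.intercept_[0])
print("support vectors:")
print(clf.support_vectors_)  # only these points determine the boundary
```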

Slide 13

Basic SVM Example: Radial Kernel

Beware of overfitting!
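
A sketch of what the warning means in practice (synthetic data; the gamma values are arbitrary illustrative choices): as gamma grows, the radial kernel fits the training set ever more tightly while the validation score drops.

```python
# Overfitting with the radial (RBF) kernel: train vs. validation accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=1)

for gamma in (0.5, 5, 500):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma}: train {clf.score(X_tr, y_tr):.2f}, "
          f"validation {clf.score(X_va, y_va):.2f}")
```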

Slide 14

SVM

For a data set $\{x_1, x_2, \ldots, x_N\}$ with target values $\{y_1, y_2, \ldots, y_N\}$, we aim to minimize

$$\frac{1}{2}\lVert w \rVert^2$$

subject to

$$y_i\,(w \cdot x_i + b) \ge 1.$$

This is not optimal if you have overlapping points belonging to different classes near the decision boundary.

Slide 15

Slack variables in SVM

In order not to be too sensitive to fuzziness close to the separation boundary, we introduce slack variables $\xi_i$ that allow for some misclassification. For a data set $\{x_1, x_2, \ldots, x_N\}$ with target values $\{y_1, y_2, \ldots, y_N\}$, we now aim to minimize

$$\frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i$$

subject to

$$y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0.$$

The cost C is the penalty you pay for points that are not classified correctly.
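
A minimal sketch of the cost parameter in practice (synthetic overlapping blobs; the C values are illustrative): a small C tolerates more slack, which typically shows up as more support vectors.

```python
# Effect of the cost C on a soft-margin linear SVM.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_vectors_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```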

Slide 16

ε-regression

Slide 17

ε-regression

Slide 18

ε-regression

The ε-insensitive loss function, where predictions have to be within a distance ε of the true value.

Slide 19

ε-regression

Regression then works similarly to classification with slack variables. For a data set $\{x_1, x_2, \ldots, x_N\}$ with target values $\{v_1, v_2, \ldots, v_N\}$, we now aim to minimize

$$\frac{1}{2}\lVert w \rVert^2 + C \sum_i (\xi_i + \xi_i^*)$$

subject to

$$v_i - w \cdot x_i - b \le \epsilon + \xi_i, \qquad -v_i + w \cdot x_i + b \le \epsilon + \xi_i^*, \qquad \xi_i \ge 0, \; \xi_i^* \ge 0.$$
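
A minimal sketch of ε-regression with scikit-learn's SVR (the sine toy data and the value of C are illustrative assumptions): points inside the ε-tube incur no loss, so widening the tube leaves fewer support vectors.

```python
# Epsilon-regression: widening the insensitive tube reduces support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
v = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

for eps in (0.01, 0.1, 0.5):
    reg = SVR(kernel="rbf", C=10, epsilon=eps).fit(X, v)
    print(f"epsilon={eps}: {len(reg.support_)} support vectors")
```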

Slide 20

SVMs are great general-purpose models … with limited possibility for model interpretation.

Slide 21

Main advantages of SVM:

Efficient separation of non-linear regions (based on basic dot-products).

The model follows from a quadratic optimization problem, i.e. there is no risk of ending up in a local minimum, in contrast to, for example, neural networks.

Slide 22

Slide 23

Data science competitions – a crowdsourcing platform where companies pose problems and offer prizes for the best predictive model on uploaded data.

Slide 24

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

Slide 25

SVM as a model for house prices in areas with 1000 < Zip Code < 2500.

Basic estimate of the square meter price as a function of coordinates.

Slide 26

Radial Kernel

Slide 27

Polynomial Kernel

Slide 28

Changing gamma in the radial kernel changes performance.
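
A standard way to pick gamma (and C) is cross-validated grid search; below is a minimal sketch on synthetic data, with the parameter grid as an illustrative assumption.

```python
# Choose gamma and C for the radial kernel by cross-validated grid search.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"gamma": [0.01, 0.1, 1, 10, 100], "C": [0.1, 1, 10]},
    cv=5,
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
print("cross-validated accuracy:", round(grid.best_score_, 3))
```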

Slide 29

Slide 30

Lab exercise:

• Build a model for house/apartment prices at Østerbro.
• Filter the data:
  I. Zip code = 2100
  II. Remove entries with a sales price lower than the taxation value.
  III. Only use entries with a square meter price in the range 10 kkr to 100 kkr.
  IV. Remove entries without UTM coordinates.
• Split the data into training and validation sets.
• Estimate the error (a sketch of the full pipeline follows below).
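
A hypothetical sketch of the pipeline referenced above. The file name and column names ('zip_code', 'sales_price', 'taxation_value', 'sqm_price', 'utm_x', 'utm_y') are assumptions, as are the SVR hyperparameters; adapt them to the actual data set.

```python
# Lab pipeline sketch: filter, split, fit an SVR, estimate the error.
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

df = pd.read_csv("house_sales.csv")  # assumed file name

# Filter: Østerbro (zip 2100), sales price >= taxation value,
# sqm price in 10-100 kkr, UTM coordinates present.
df = df[(df["zip_code"] == 2100)
        & (df["sales_price"] >= df["taxation_value"])
        & df["sqm_price"].between(10_000, 100_000)
        & df["utm_x"].notna() & df["utm_y"].notna()]

X = df[["utm_x", "utm_y"]]
y = df["sqm_price"]
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# Hyperparameters below are placeholders; tune them, e.g. by grid search.
model = SVR(kernel="rbf", C=100, epsilon=1000).fit(X_tr, y_tr)
print("validation MAE (kr/sqm):",
      mean_absolute_error(y_va, model.predict(X_va)))
```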