UVA CS 4501: Machine Learning Lecture 11: Support Vector Machine (Basics) Dr. Yanjun Qi University of Virginia Department of Computer Science


Page 1:

UVA CS 4501: Machine Learning

Lecture 11: Support Vector Machine (Basics)

Dr. Yanjun Qi

University of Virginia, Department of Computer Science

Page 2:

Where are we? → Five major sections of this course

- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models

10/18/18 2

Dr. Yanjun Qi / UVA

Page 3:

Today

- Support Vector Machine (SVM)
  - History of SVM
  - Large Margin Linear Classifier
  - Define Margin (M) in terms of model parameters
  - Optimization to learn model parameters (w, b)
  - Linearly non-separable case
  - Optimization with dual form
  - Nonlinear decision boundary
  - Multiclass SVM

Page 4:

History of SVM

- SVM is inspired by statistical learning theory [3].
- SVM was first introduced in 1992 [1].
- SVM became popular because of its success in handwritten digit recognition (1994): a 1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet-4.
- See Section 5.11 in [2] or the discussion in [3] for details.
- Regarded as an important example of "kernel methods", arguably the hottest area in machine learning 20 years ago.

[1] B.E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, Pittsburgh, 1992.

[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82, 1994.

[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.

Theoretically sound / Impactful

Page 5:

Handwritten digit recognition

Page 6:

Applications of SVMs

- Computer Vision
- Text Categorization
- Ranking (e.g., Google searches)
- Handwritten Character Recognition
- Time series analysis
- Bioinformatics
- ...

→ Lots of very successful applications!

Page 7:

A Dataset for Binary Classification

- Data / points / instances / examples / samples / records: [rows]
- Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
- Target / outcome / response / label / dependent variable: the special column to be predicted [last column]

Output as binary class: only two possibilities

Page 8:

Today

- Support Vector Machine (SVM)
  - History of SVM
  - Large Margin Linear Classifier
  - Define Margin (M) in terms of model parameters
  - Optimization to learn model parameters (w, b)
  - Linearly non-separable case
  - Optimization with dual form
  - Nonlinear decision boundary
  - Multiclass SVM

Page 9:

Linear Classifiers (f: x → y_est; legend: +1 vs. -1)

How would you classify this data?

Credit: Prof. Moore

Page 10:

Linear Classifiers (f: x → y_est)

How would you classify this data?

Page 11:

Linear Classifiers (f: x → y_est)

How would you classify this data?

Page 12:

Linear Classifiers (f: x → y_est)

How would you classify this data?

Page 13:

Linear Classifiers (f: x → y_est)

Any of these would be fine... but which is best?

Page 14:

Classifier Margin (f: x → y_est)

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.

Page 15:

Maximum Margin (f: x → y_est)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).

Linear SVM

Page 16:

Maximum Margin (f: x → y_est)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).

Support vectors are those data points that the margin pushes up against.

Linear SVM

Page 17:

Maximum Margin (f: x → y_est)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).

Support vectors are those data points that the margin pushes up against.

Linear SVM

f(x, w, b) = sign(wT x + b)
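The decision rule above is easy to sketch in code. A minimal NumPy illustration; the weights, bias, and points below are made-up values for demonstration, not learned parameters:

```python
import numpy as np

def svm_predict(w, b, X):
    """Linear SVM decision rule: f(x, w, b) = sign(w^T x + b)."""
    return np.sign(X @ w + b)

# Hand-picked (not learned) parameters and toy points
w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0],    # w^T x + b =  2 -> class +1
              [0.0, 2.0]])   # w^T x + b = -2 -> class -1
print(svm_predict(w, b, X))  # [ 1. -1.]
```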


Page 18:

Max margin classifiers

- Instead of fitting all points, focus on boundary points.
- Learn a boundary that leads to the largest margin from both sets of points.

From all the possible boundary lines, this leads to the largest margin on both sides.

Page 19:

Max margin classifiers

- Instead of fitting all points, focus on boundary points.
- Learn a boundary that leads to the largest margin from points on both sides.

Why MAX margin?

- Intuitive, 'makes sense'.
- Some theoretical support (using VC dimension).
- Works well in practice.

Page 20:

Max margin classifiers

- Instead of fitting all points, focus on boundary points.
- Learn a boundary that leads to the largest margin from points on both sides.

That is why they are also known as linear support vector machines (SVMs): these are the vectors supporting the boundary.

Page 21:

How to represent a Linear Decision Boundary?

Page 22:

Review: Affine Hyperplanes

- https://en.wikipedia.org/wiki/Hyperplane
- Any hyperplane can be given in coordinates as the solution set of a single linear (algebraic) equation of degree 1.
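As a quick numeric check of this definition, one degree-1 equation in two unknowns pins down a line in the plane. A small NumPy sketch; the particular w and b are arbitrary choices for illustration:

```python
import numpy as np

# The hyperplane {x : w^T x + b = 0} in R^2 with w = [1, 1], b = -1
# is exactly the line x1 + x2 = 1: a single degree-1 equation.
w, b = np.array([1.0, 1.0]), -1.0

for x in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):          # points on the line
    assert np.isclose(w @ np.array(x) + b, 0.0)

assert not np.isclose(w @ np.array([1.0, 1.0]) + b, 0.0)  # a point off the line
```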

Q: How does this connect to linear regression?

Page 23:

Page 24:

Max-margin & Decision Boundary

(figure: Class -1 and Class 1 on either side of the boundary)

w is a p-dim vector; b is a scalar.

Page 25:

Max-margin & Decision Boundary

- The decision boundary should be as far away from the data of both classes as possible.

w is a p-dim vector; b is a scalar.

Page 26:

Specifying a max margin classifier

(figure: boundary wT x + b = 0, class +1 plane wT x + b = +1, class -1 plane wT x + b = -1; predict class +1 on one side, class -1 on the other)

Classify as +1 if wT x + b >= 1; classify as -1 if wT x + b <= -1; undefined if -1 < wT x + b < 1.
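The three regions of this rule translate directly to code. A small sketch with hypothetical, hand-picked parameters:

```python
import numpy as np

def margin_region(w, b, x):
    """Return +1 or -1 outside the margin, None inside it (undefined)."""
    s = w @ x + b
    if s >= 1:
        return +1
    if s <= -1:
        return -1
    return None  # -1 < w^T x + b < 1: inside the margin

w, b = np.array([2.0, 0.0]), 0.0                   # hand-picked, not learned
print(margin_region(w, b, np.array([1.0, 0.0])))   # 1    (w^T x + b = 2)
print(margin_region(w, b, np.array([-1.0, 0.0])))  # -1   (w^T x + b = -2)
print(margin_region(w, b, np.array([0.1, 0.0])))   # None (w^T x + b = 0.2)
```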


Page 27:

Specifying a max margin classifier

Classify as +1 if wT x + b >= 1; classify as -1 if wT x + b <= -1; undefined if -1 < wT x + b < 1.

Is the linear separation assumption realistic?

We will deal with this shortly, but let's assume it for now.

Page 28:

Today

- Supervised Classification
- Support Vector Machine (SVM)
  - History of SVM
  - Large Margin Linear Classifier
  - Define Margin (M) in terms of model parameters
  - Optimization to learn model parameters (w, b)
  - Linearly non-separable case
  - Optimization with dual form
  - Nonlinear decision boundary
  - Multiclass SVM

Page 29:

Maximizing the margin

(figure: planes wT x + b = +1, 0, -1 with margin width M)

- Let's define the width of the margin by M.
- How can we encode our goal of maximizing M in terms of our parameters (w and b)?
- Let's start with a few observations.

Concrete derivations of M: see Extra slides.

Page 30:

Page 31:

Finding the optimal parameters

(figure: planes wT x + b = +1, 0, -1; margin M between support points x+ and x-)

M = 2 / sqrt(wT w)

We can now search for the optimal parameters by finding a solution that:

1. Correctly classifies all points.
2. Maximizes the margin (or equivalently minimizes wT w).

Several optimization methods can be used: gradient descent, or SMO (see extra slides).
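Point 2 rests on the relation M = 2 / sqrt(wT w): shrinking wT w widens the margin. A one-function numeric check (the example weight vectors are arbitrary):

```python
import numpy as np

def margin_width(w):
    """Margin between the planes w^T x + b = +1 and -1: M = 2 / sqrt(w^T w)."""
    return 2.0 / np.sqrt(w @ w)

assert np.isclose(margin_width(np.array([2.0, 0.0])), 1.0)  # ||w|| = 2 -> M = 1
assert np.isclose(margin_width(np.array([0.5, 0.0])), 4.0)  # smaller ||w|| -> wider M
# Minimizing wT w is the same as maximizing M:
assert margin_width(np.array([0.1, 0.0])) > margin_width(np.array([1.0, 0.0]))
```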


Page 32:

Today

- Support Vector Machine (SVM)
  - History of SVM
  - Large Margin Linear Classifier
  - Define Margin (M) in terms of model parameters
  - Optimization to learn model parameters (w, b)
  - Linearly non-separable case
  - Optimization with dual form
  - Nonlinear decision boundary
  - Practical Guide

Page 33:

Optimization Step, i.e., learning optimal parameters for SVM

(figure: planes wT x + b = +1, 0, -1; margin M between x+ and x-)

M = 2 / sqrt(wT w)

Page 34:

Optimization Step, i.e., learning optimal parameters for SVM

M = 2 / sqrt(wT w)

Minimize (wT w)/2 subject to the following constraints:

Page 35:

Optimization Step, i.e., learning optimal parameters for SVM

M = 2 / sqrt(wT w)

Minimize (wT w)/2 subject to the following constraints:

- For all x in class +1: wT x + b >= 1
- For all x in class -1: wT x + b <= -1

A total of n constraints if we have n training samples.

Page 36:

Optimization Reformulation

Minimize (wT w)/2 subject to the following constraints:

- For all x in class +1: wT x + b >= 1
- For all x in class -1: wT x + b <= -1

A total of n constraints if we have n input samples.

Page 37:

Optimization Reformulation

Minimize (wT w)/2 subject to the following constraints:

- For all x in class +1: wT x + b >= 1
- For all x in class -1: wT x + b <= -1

A total of n constraints if we have n input samples.

argmin_{w,b} Σ_{i=1..p} w_i²   subject to   ∀ x_i ∈ D_train : y_i (x_i · w + b) ≥ 1

Page 38:

Optimization Reformulation

Minimize (wT w)/2 subject to the following constraints:

- For all x in class +1: wT x + b >= 1
- For all x in class -1: wT x + b <= -1

A total of n constraints if we have n input samples.

argmin_{w,b} Σ_{i=1..p} w_i²   subject to   ∀ x_i ∈ D_train : y_i (wT x_i + b) ≥ 1

This is quadratic programming: a quadratic objective with linear constraints.
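In practice this QP is handed to a solver (or to SMO, see the extra slides). Purely as an illustration, plain subgradient descent on the hinge-loss surrogate of the same objective (a soft-margin approximation, not the exact constrained QP) separates a toy dataset; the data, learning rate, and C below are arbitrary choices:

```python
import numpy as np

# Hinge-loss surrogate of the margin QP (illustrative sketch, not a QP solver):
#   minimize ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, C, lr = np.zeros(2), 0.0, 10.0, 0.01

for _ in range(2000):
    viol = y * (X @ w + b) < 1                      # constraint violators
    gw = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    gb = -C * y[viol].sum()
    w, b = w - lr * gw, b - lr * gb

assert np.all(np.sign(X @ w + b) == y)  # every training point on its own side
```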

Page 39:

What Next?

- Support Vector Machine (SVM)
  - History of SVM
  - Large Margin Linear Classifier
  - Define Margin (M) in terms of model parameters
  - Optimization to learn model parameters (w, b)
  - Linearly non-separable case (soft SVM)
  - Optimization with dual form
  - Nonlinear decision boundary
  - Practical Guide

Page 40:

Support Vector Machine

- Task: classification
- Representation: kernel trick, K(x, z) := Φ(x)^T Φ(z)
- Score function: margin + hinge loss
- Search/Optimization: QP with dual form
- Models, Parameters: dual weights, w = Σ_i α_i y_i x_i

Primal (soft margin):
argmin_{w,b} Σ_{i=1..p} w_i² + C Σ_{i=1..n} ε_i   subject to   ∀ x_i ∈ D_train : y_i (x_i · w + b) ≥ 1 − ε_i

Dual:
max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j   subject to   Σ_i α_i y_i = 0,  α_i ≥ 0 ∀ i

Page 41:

References

- Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides.
- The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman.
- Prof. Andrew Moore @ CMU's slides.
- Tutorial slides from Dr. Tie-Yan Liu, MSR Asia.
- A Practical Guide to Support Vector Classification, by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010.
- Tutorial slides from Stanford "Convex Optimization I", by Boyd & Vandenberghe.

Page 42:

EXTRA

Page 43:

The gradient points in the direction of the greatest rate of increase of the function, and its magnitude is the slope of the graph in that direction.

Page 44:

How to define the width of the margin by M (EXTRA)

Classify as +1 if wT x + b >= 1; classify as -1 if wT x + b <= -1; undefined if -1 < wT x + b < 1.

- Let's define the width of the margin by M.
- How can we encode our goal of maximizing M in terms of our parameters (w and b)?
- Let's start with a few observations.

Page 45:

Margin M

Page 46:

- wT x+ + b = +1
- wT x- + b = -1
- M = |x+ - x-| = ?

Page 47:

Maximizing the margin: observation 1

- Observation 1: the vector w is orthogonal to the +1 plane.
- Why?

Corollary: the vector w is orthogonal to the -1 plane.

Page 48:

Maximizing the margin: observation 1

- Observation 1: the vector w is orthogonal to the +1 plane.
- Why? Let u and v be two points on the +1 plane; then for the vector defined by u and v we have wT(u - v) = 0.

Corollary: the vector w is orthogonal to the -1 plane.

Page 49:

Maximizing the margin: observation 1

- Observation 1: the vector w is orthogonal to the +1 plane.
- Why? Let u and v be two points on the +1 plane; then for the vector defined by u and v we have wT(u - v) = 0.

Corollary: the vector w is orthogonal to the -1 plane.

Page 50:

Review: Vector Product, Orthogonality, and Norm

For two vectors x and y, xT y is called the (inner) vector product.

The square root of the product of a vector with itself, sqrt(xT x), is called the 2-norm (|x|_2), and can also be written as |x|.

x and y are called orthogonal if xT y = 0.
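These definitions translate directly to NumPy; the example vectors are arbitrary:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

assert x @ y == 0.0                        # inner product xT y = 0 -> orthogonal
assert np.isclose(np.sqrt(x @ x), 5.0)     # 2-norm: |x| = sqrt(xT x)
assert np.isclose(np.linalg.norm(x), 5.0)  # same value via the library norm
```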


Page 51:

Observation 1 → Review: Orthogonality

Page 52:

Maximizing the margin: observation 1

- Observation 1: the vector w is orthogonal to the +1 plane.

(figure: Class 1 and Class 2 separated by the margin M)

Page 53:

Maximizing the margin: observation 2

- Observation 1: the vector w is orthogonal to the +1 and -1 planes.
- Observation 2: if x+ is a point on the +1 plane and x- is the closest point to x+ on the -1 plane, then x+ = λw + x-.

Since w is orthogonal to both planes, we need to 'travel' some distance along w to get from x+ to x-.

Page 54:

Putting it together

- wT x+ + b = +1
- wT x- + b = -1
- M = |x+ - x-| = ?
- x+ = λw + x-

Page 55:

Putting it together

- wT x+ + b = +1
- wT x- + b = -1
- x+ = λw + x-
- |x+ - x-| = M

We can now define M in terms of w and b.

Page 56:

Page 57:

Putting it together

- wT x+ + b = +1
- wT x- + b = -1
- x+ = λw + x-
- |x+ - x-| = M

We can now define M in terms of w and b:

wT x+ + b = +1
=> wT (λw + x-) + b = +1
=> wT x- + b + λ wT w = +1
=> -1 + λ wT w = +1
=> λ = 2 / (wT w)

Page 58:

Putting it together

- wT x+ + b = +1
- wT x- + b = -1
- x+ = λw + x-
- |x+ - x-| = M
- λ = 2 / (wT w)

We can now define M in terms of w and b:

M = |x+ - x-| = |λw| = λ|w| = λ sqrt(wT w)
=> M = 2 sqrt(wT w) / (wT w) = 2 / sqrt(wT w)
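The whole derivation can be verified numerically: pick any w and b (the values below are made up), take a point x- on the -1 plane, step λ = 2/(wT w) along w, and check that you land on the +1 plane at distance 2/||w||:

```python
import numpy as np

w = np.array([3.0, 4.0])       # arbitrary weights; ||w|| = 5
b = 2.0
x_minus = (-3.0 / 25.0) * w    # w^T x_minus = -3, so w^T x_minus + b = -1

lam = 2.0 / (w @ w)            # lambda = 2 / (wT w), from the derivation
x_plus = lam * w + x_minus     # x+ = lambda * w + x-

assert np.isclose(w @ x_minus + b, -1.0)             # x- lies on the -1 plane
assert np.isclose(w @ x_plus + b, +1.0)              # x+ lands on the +1 plane
assert np.isclose(np.linalg.norm(x_plus - x_minus),  # M = |x+ - x-| = 2 / ||w||
                  2.0 / np.linalg.norm(w))
```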
