
Page 1: Non-Parametric Methods and Support Vector Machines

Non-Parametric Methods and Support Vector Machines

Shan-Hung Wu ([email protected])

Department of Computer Science, National Tsing Hua University, Taiwan

Machine Learning

Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 1 / 42

Page 2: Outline

Outline

1. Non-Parametric Methods
   - K-NN
   - Parzen Windows
   - Local Models

2. Support Vector Machines
   - SVC
   - Slacks
   - Nonlinear SVC
   - Dual Problem
   - Kernel Trick


Page 5: K-NN Methods I

K-NN Methods I

The K-nearest neighbor (K-NN) methods are a straightforward, but fundamentally different, way to predict the label of a data point x:

1. Choose the number K and a distance metric
2. Find the K nearest neighbors of the given point x
3. Predict the label of x by the majority vote (in classification) or the average (in regression) of the neighbors' labels

Distance metric? E.g., the Euclidean distance d(x^{(i)}, x) = \|x^{(i)} - x\|
Training algorithm? Simply "remember" the training set X in storage

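The three K-NN steps above fit in a few lines of plain Python. This is a minimal sketch for illustration; the toy dataset and the function name `knn_predict` are invented for the example, and no indexing or approximation is used.

```python
import math
from collections import Counter

def knn_predict(X, y, x, K=3):
    """Classify x by majority vote among its K nearest neighbors in X.

    X: list of memorized training feature vectors
    y: list of class labels
    x: query point
    """
    # Steps 1-2: rank training points by Euclidean distance to x
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))
    # Step 3: majority vote among the K closest labels
    votes = Counter(y[i] for i in nearest[:K])
    return votes.most_common(1)[0][0]

# Toy 2-D dataset: class +1 near the origin, class -1 far away
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = [+1, +1, +1, -1, -1, -1]

print(knn_predict(X, y, (0.5, 0.5), K=3))  # -> 1
print(knn_predict(X, y, (5.5, 5.5), K=3))  # -> -1
```

Note that "training" is just keeping X and y in memory; all the work happens at prediction time, which is exactly the lazy, memory-based behavior described on the next slides.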

Page 12: K-NN Methods II

K-NN Methods II

The resulting decision boundary can be very complex
K is a hyperparameter controlling the model complexity

Page 13: Non-Parametric Methods

Non-Parametric Methods

The K-NN method is a special case of non-parametric (or memory-based) methods
- Non-parametric in the sense that f is not described by only a few parameters
- Memory-based in that all data (rather than just the parameters) need to be memorized during the training process

K-NN is also a lazy method, since the prediction function f is constructed only at prediction time
This motivates the development of other local models


Page 18: Pros & Cons

Pros & Cons

Pros:
- Almost no assumption on f other than smoothness
- High capacity/complexity
- High accuracy given a large training set
- Supports online training (by simply memorizing new examples)
- Readily extensible to multi-class and regression problems

Cons:
- Storage demanding
- Sensitive to outliers
- Sensitive to irrelevant data features (vs. decision trees)
- Needs to deal with missing data (e.g., via special distances)
- Computationally expensive: O(ND) time for making each prediction
  - Can be sped up with an index and/or approximation



Page 27: Parzen Windows and Kernels

Parzen Windows and Kernels

Binary K-NN classifier:

f(x) = \mathrm{sign}\left( \sum_{i : x^{(i)} \in \mathrm{KNN}(x)} y^{(i)} \right)

The "radius" of the voting neighborhood depends on the input x
We can instead use a Parzen window with a fixed radius R:

f(x) = \mathrm{sign}\left( \sum_i y^{(i)} \, \mathbf{1}(\|x^{(i)} - x\| \le R) \right)

Parzen windows can also replace the hard window boundary with a soft one:

f(x) = \mathrm{sign}\left( \sum_i y^{(i)} \, k(x^{(i)}, x) \right)

where k(x^{(i)}, x) is a radial basis function (RBF) kernel whose value decreases as the distance from x grows

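The hard fixed-radius vote and its soft, kernel-weighted variant can be sketched side by side in Python. This is a toy illustration; the dataset, the radius R, and the gamma value are made up, and `parzen_hard`/`parzen_soft` are hypothetical names.

```python
import math

def sign(v):
    return 1 if v >= 0 else -1

def parzen_hard(X, y, x, R=2.0):
    """Hard window: only training points within radius R of x get a vote."""
    s = sum(yi for xi, yi in zip(X, y) if math.dist(xi, x) <= R)
    return sign(s)

def parzen_soft(X, y, x, gamma=0.5):
    """Soft window: every point votes, weighted by a Gaussian RBF kernel."""
    s = sum(yi * math.exp(-gamma * math.dist(xi, x) ** 2)
            for xi, yi in zip(X, y))
    return sign(s)

X = [(0, 0), (1, 0), (5, 5), (6, 5)]
y = [+1, +1, -1, -1]
print(parzen_hard(X, y, (0.5, 0.0)))  # -> 1
print(parzen_soft(X, y, (0.5, 0.0)))  # -> 1
```

In the soft version no point is ever excluded outright; distant points simply contribute votes with exponentially small weight.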

Page 31: Common RBF Kernels

Common RBF Kernels

How to act like a soft K-NN?
Gaussian RBF kernel:

k(x^{(i)}, x) = \mathcal{N}(x^{(i)} - x;\, 0,\, \sigma^2 I)

or simply

k(x^{(i)}, x) = \exp\left(-\gamma \|x^{(i)} - x\|^2\right)

\gamma \ge 0 (or \sigma^2) is a hyperparameter controlling the smoothness of f

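The effect of gamma is easy to see numerically: for a neighbor at a fixed distance, a larger gamma shrinks its kernel weight toward zero, making f spikier (less smooth). A small sketch, with the distance and gamma values chosen arbitrarily:

```python
import math

def gaussian_rbf(xi, x, gamma):
    """k(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    return math.exp(-gamma * math.dist(xi, x) ** 2)

# A neighbor at distance 2 from the query point:
# larger gamma -> faster decay -> that neighbor barely votes
for gamma in (0.1, 1.0, 10.0):
    print(gamma, gaussian_rbf((0.0,), (2.0,), gamma))
```

With gamma = 0.1 the weight is exp(-0.4) ≈ 0.67; with gamma = 10 it is exp(-40), essentially zero, so only immediate neighbors influence the prediction.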


Page 35: Locally Weighted Linear Regression

Locally Weighted Linear Regression

In addition to majority voting and averaging, we can define local models for lazy predictions
E.g., in (eager) linear regression, we find w \in \mathbb{R}^{D+1} that minimizes the SSE:

\arg\min_w \sum_i \left(y^{(i)} - w^\top x^{(i)}\right)^2

Local model: find the w that minimizes the SSE local to the point x we want to predict:

\arg\min_w \sum_i k(x^{(i)}, x) \left(y^{(i)} - w^\top x^{(i)}\right)^2

where k(\cdot, \cdot) \in \mathbb{R} is an RBF kernel

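For scalar inputs with a bias term, the kernel-weighted SSE above is minimized by solving a 2x2 weighted normal-equation system, which fits in plain Python. A minimal sketch under those assumptions; the function name `lwlr_predict` and the toy data are made up:

```python
import math

def lwlr_predict(xs, ys, x0, gamma=1.0):
    """Locally weighted linear regression for scalar inputs.

    Fits w = (w0, w1) minimizing sum_i k(x_i, x0) * (y_i - w0 - w1*x_i)^2
    via the weighted normal equations, then predicts at x0.
    """
    k = [math.exp(-gamma * (xi - x0) ** 2) for xi in xs]
    # Weighted sums for the 2x2 system [s0 s1; s1 s2] [w0; w1] = [t0; t1]
    s0 = sum(k)
    s1 = sum(ki * xi for ki, xi in zip(k, xs))
    s2 = sum(ki * xi * xi for ki, xi in zip(k, xs))
    t0 = sum(ki * yi for ki, yi in zip(k, ys))
    t1 = sum(ki * xi * yi for ki, xi, yi in zip(k, xs, ys))
    det = s0 * s2 - s1 * s1
    w0 = (s2 * t0 - s1 * t1) / det   # Cramer's rule
    w1 = (s0 * t1 - s1 * t0) / det
    return w0 + w1 * x0

# Toy data lying exactly on y = 2x + 1; the local fit recovers the line
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
print(lwlr_predict(xs, ys, 2.5))  # -> 6.0 (up to float rounding)
```

Note the lazy character: a fresh w is fit for every query point x0, since the kernel weights depend on x0.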


Page 38: Kernel Machines

Kernel Machines

Kernel machines:

f(x) = \sum_{i=1}^{N} c_i \, k(x^{(i)}, x) + c_0

For example:
- Parzen windows: c_i = y^{(i)} and c_0 = 0
- Locally weighted linear regression: c_i = (y^{(i)} - w^\top x^{(i)})^2 and c_0 = 0

The variable c \in \mathbb{R}^N can be learned in either an eager or a lazy manner
Pros: complex, but highly accurate if regularized well

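The kernel-machine form f(x) = sum_i c_i k(x^(i), x) + c_0 is just a weighted kernel sum, so the Parzen-window special case (c_i = y^(i), c_0 = 0) drops out directly. A small sketch, with a Gaussian RBF kernel and an invented toy dataset:

```python
import math

def kernel_machine(c, c0, X, x, gamma=1.0):
    """f(x) = sum_i c_i * k(x_i, x) + c0 with a Gaussian RBF kernel."""
    return c0 + sum(
        ci * math.exp(-gamma * math.dist(xi, x) ** 2)
        for ci, xi in zip(c, X)
    )

# Parzen windows as a kernel machine: c_i = y_i, c_0 = 0
X = [(0, 0), (1, 0), (5, 5)]
y = [+1, +1, -1]
f = kernel_machine(y, 0.0, X, (0.5, 0.0))
print(1 if f >= 0 else -1)  # -> 1
```

Every training point contributes one term to the sum, which is exactly why prediction costs grow with N and why making c sparse (the next slide) matters.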

Page 40: Sparse Kernel Machines

Sparse Kernel Machines

To make a prediction, we need to store all training examples
This may be infeasible due to:
- a large dataset (N)
- time limits
- space limits

Can we make c sparse?
I.e., make c_i \ne 0 for only a small fraction of the examples, called the support vectors
How?



Page 43: Separating Hyperplane I

Separating Hyperplane I

Model: \mathbb{F} = \{ f : f(x; w, b) = w^\top x + b \}
- a collection of hyperplanes

Prediction: \hat{y} = \mathrm{sign}(f(x))

Training: find w and b such that

w^\top x^{(i)} + b \ge 0, if y^{(i)} = 1
w^\top x^{(i)} + b \le 0, if y^{(i)} = -1

or simply

y^{(i)}\left(w^\top x^{(i)} + b\right) \ge 0

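The combined condition y^(i)(w^T x^(i) + b) >= 0 is straightforward to check directly. A minimal sketch; the hyperplane parameters and toy dataset are invented for the example:

```python
def separates(w, b, X, y):
    """True iff every (x_i, y_i) satisfies y_i * (w.x_i + b) >= 0,
    i.e. the hyperplane w.x + b = 0 puts each point on its correct side."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0
        for xi, yi in zip(X, y)
    )

X = [(0, 0), (1, 0), (5, 5), (6, 5)]
y = [+1, +1, -1, -1]
print(separates((-1.0, -1.0), 3.0, X, y))  # -> True
print(separates((1.0, 0.0), 0.0, X, y))    # -> False
```

Many different (w, b) pairs pass this check for linearly separable data, which is exactly the ambiguity the next slide raises.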

Page 46: Separating Hyperplane II

Separating Hyperplane II

There are many feasible w's and b's when the classes are linearly separable
Which hyperplane is the best?

Page 47: Support Vector Classification

Support Vector Classification

The support vector classifier (SVC) picks the hyperplane with the largest margin:

y^{(i)}\left(w^\top x^{(i)} + b\right) \ge a \text{ for all } i

Margin: 2a / \|w\| [Homework]

Without loss of generality, we let a = 1 and solve the problem:

\arg\min_{w,b} \frac{1}{2}\|w\|^2
\text{subject to } y^{(i)}\left(w^\top x^{(i)} + b\right) \ge 1, \forall i



Page 50: Overlapping Classes

Overlapping Classes

In practice, the classes may overlap
- due to, e.g., noise or outliers

The problem

\arg\min_{w,b} \frac{1}{2}\|w\|^2
\text{subject to } y^{(i)}\left(w^\top x^{(i)} + b\right) \ge 1, \forall i

has no solution in this case. How can we fix this?


Page 52: Slacks

Slacks

The SVC tolerates slacks \xi_i: allowances for points that fall outside of the regions they ought to be in
Problem:

\arg\min_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\text{subject to } y^{(i)}\left(w^\top x^{(i)} + b\right) \ge 1 - \xi_i \text{ and } \xi_i \ge 0, \forall i

This favors a large margin but also fewer slacks
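At the optimum each slack equals the hinge loss, xi_i = max(0, 1 - y_i(w.x_i + b)), so the soft-margin objective can be minimized with plain subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i(w.x_i + b)). The rough sketch below takes that route instead of the standard QP solver; the toy data, learning rate, and epoch count are arbitrary choices for illustration:

```python
def svm_subgradient(X, y, C=1.0, lr=0.01, epochs=2000):
    """Batch subgradient descent on the hinge-loss form of the
    soft-margin SVC objective. Returns (w, b)."""
    D = len(X[0])
    w, b = [0.0] * D, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0          # gradient of the 0.5*||w||^2 term is w
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:             # active slack: hinge subgradient
                for j in range(D):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Linearly separable toy data; the learned hyperplane classifies it correctly
X = [(0, 0), (1, 0), (5, 5), (6, 5)]
y = [+1, +1, -1, -1]
w, b = svm_subgradient(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
         for xi in X]
print(preds == y)
```

Raising C makes violated margins cost more relative to the regularizer, so the solution tolerates fewer slacks at the price of a smaller margin, which is the tradeoff the next slide discusses.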

Hyperparameter C

argmin_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i

The hyperparameter C controls the trade-off between maximizing the margin and minimizing the total slack. It also provides a geometric interpretation of weight decay.


Nonlinearly Separable Classes

In practice, classes may be nonlinearly separable. SVC (even with slacks) then gives "bad" hyperplanes due to underfitting. How do we make it nonlinear?

Feature Augmentation

Recall that in polynomial regression, we augment the data features to make a linear regressor nonlinear. Likewise, we can define a function \Phi(\cdot) that maps each data point to a high-dimensional space:

argmin_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
subject to y^{(i)}(w^\top \Phi(x^{(i)}) + b) \ge 1 - \xi_i and \xi_i \ge 0, \forall i
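As a toy illustration of why such a map helps (the data and the map \Phi(x) = (x, x^2) are hypothetical, not from the slides): 1-D points labeled by whether |x| \ge 1 are not separable by any threshold on x, but after mapping to (x, x^2) the linear rule x^2 - 1 separates them perfectly.

```python
import numpy as np

X = np.array([-2.0, -0.5, 0.5, 2.0])
y = np.array([1, -1, -1, 1])  # +1 iff |x| >= 1: not linearly separable in x

# Hypothetical feature map Phi(x) = (x, x^2): now a hyperplane works.
Phi = np.stack([X, X**2], axis=1)

w = np.array([0.0, 1.0])  # decision function f(x) = x^2 - 1
b = -1.0
pred = np.sign(Phi @ w + b)
print(pred)  # matches y
```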


Time Complexity

Nonlinear SVC:

argmin_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
subject to y^{(i)}(w^\top \Phi(x^{(i)}) + b) \ge 1 - \xi_i and \xi_i \ge 0, \forall i

The higher the augmented feature dimension, the more variables in w to solve for. Can we solve for w in time independent of the mapped dimension?

Dual Problem

Primal problem:

argmin_{w,b,\xi} \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
subject to y^{(i)}(w^\top \Phi(x^{(i)}) + b) \ge 1 - \xi_i and \xi_i \ge 0, \forall i

Dual problem:

argmax_{\alpha,\beta} \min_{w,b,\xi} L(w,b,\xi,\alpha,\beta)
subject to \alpha \ge 0, \beta \ge 0

where

L(w,b,\xi,\alpha,\beta) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i + \sum_i \alpha_i (1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i) + \sum_i \beta_i (-\xi_i)

The primal problem is convex, so strong duality holds.

Solving Dual Problem I

L(w,b,\xi,\alpha,\beta) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i + \sum_i \alpha_i (1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i) + \sum_i \beta_i (-\xi_i)

The inner problem

\min_{w,b,\xi} L(w,b,\xi,\alpha,\beta)

is convex in w, b, and \xi. Let's solve it analytically:

\partial L / \partial w = w - \sum_i \alpha_i y^{(i)} \Phi(x^{(i)}) = 0 \Rightarrow w = \sum_i \alpha_i y^{(i)} \Phi(x^{(i)})
\partial L / \partial b = -\sum_i \alpha_i y^{(i)} = 0 \Rightarrow \sum_i \alpha_i y^{(i)} = 0
\partial L / \partial \xi_i = C - \alpha_i - \beta_i = 0 \Rightarrow \beta_i = C - \alpha_i

Solving Dual Problem II

L(w,b,\xi,\alpha,\beta) = \frac{1}{2}\|w\|^2 + C \sum_i \xi_i + \sum_i \alpha_i (1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i) + \sum_i \beta_i (-\xi_i)

Substituting w = \sum_i \alpha_i y^{(i)} \Phi(x^{(i)}) and \beta_i = C - \alpha_i into L(w,b,\xi,\alpha,\beta):

L(w,b,\xi,\alpha,\beta) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y^{(i)} y^{(j)} \Phi(x^{(i)})^\top \Phi(x^{(j)}) - b \sum_i \alpha_i y^{(i)},

so

\min_{w,b,\xi} L(w,b,\xi,\alpha,\beta) =
  \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y^{(i)} y^{(j)} \Phi(x^{(i)})^\top \Phi(x^{(j)}),  if \sum_i \alpha_i y^{(i)} = 0;
  -\infty,  otherwise

Outer maximization problem:

argmax_{\alpha} 1^\top \alpha - \frac{1}{2} \alpha^\top K \alpha
subject to 0 \le \alpha \le C1 and y^\top \alpha = 0

where K_{i,j} = y^{(i)} y^{(j)} \Phi(x^{(i)})^\top \Phi(x^{(j)}), and \beta_i = C - \alpha_i \ge 0 implies \alpha_i \le C.
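A quick numpy sketch of building the matrix K_{i,j} = y^{(i)} y^{(j)} \Phi(x^{(i)})^\top \Phi(x^{(j)}) (with the identity map \Phi(x) = x and illustrative random data): K is symmetric positive semi-definite, which is what makes the outer problem a convex quadratic program in \alpha.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # illustrative data
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])

G = X @ X.T                            # Gram matrix Phi(x_i)^T Phi(x_j), Phi = identity
K = np.outer(y, y) * G                 # K_ij = y_i y_j Phi(x_i)^T Phi(x_j)

# K = diag(y) G diag(y) is symmetric PSD, so (1/2) a^T K a - 1^T a is convex in a.
eigvals = np.linalg.eigvalsh(K)
print(np.allclose(K, K.T), eigvals.min() >= -1e-9)  # True True
```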

Solving Dual Problem II

Dual minimization problem of SVC:

argmin_{\alpha} \frac{1}{2} \alpha^\top K \alpha - 1^\top \alpha
subject to 0 \le \alpha \le C1 and y^\top \alpha = 0

Number of variables to solve for? N, instead of the augmented feature dimension. In practice, this problem is solved by specialized solvers such as sequential minimal optimization (SMO) [3], as K is usually ill-conditioned.

Making Predictions

Prediction: y = sign(f(x)) = sign(w^\top \Phi(x) + b)

We have w = \sum_i \alpha_i y^{(i)} \Phi(x^{(i)}). How do we obtain b?

By the complementary slackness of the KKT conditions, we have

\alpha_i (1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i) = 0 and \beta_i (-\xi_i) = 0

For any x^{(i)} with 0 < \alpha_i < C, we have

\beta_i = C - \alpha_i > 0 \Rightarrow \xi_i = 0,
1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i = 0 \Rightarrow b = y^{(i)} - w^\top \Phi(x^{(i)})

In practice, we usually take the average over all x^{(i)}'s with 0 < \alpha_i < C to avoid numeric error.
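A minimal sanity check on a hypothetical two-point problem (linear kernel, identity \Phi): with x^{(1)} = (1, 0), y^{(1)} = +1 and x^{(2)} = (-1, 0), y^{(2)} = -1, the dual constraints force \alpha_1 = \alpha_2 = 1/2, so w = \sum_i \alpha_i y^{(i)} x^{(i)} = (1, 0), and averaging b = y^{(i)} - w^\top x^{(i)} over the free SVs gives b = 0.

```python
import numpy as np

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])   # dual solution of this toy problem
C = 1.0

# w = sum_i alpha_i y_i x_i  (identity feature map)
w = (alpha * y) @ X

# Average b = y_i - w.x_i over the free SVs (0 < alpha_i < C).
free = (alpha > 0) & (alpha < C)
b = np.mean(y[free] - X[free] @ w)

pred = np.sign(X @ w + b)
print(w, b, pred)  # [1. 0.] 0.0 [ 1. -1.]
```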


Kernel as Inner Product

We need to evaluate \Phi(x^{(i)})^\top \Phi(x^{(j)}) when:
- Solving the dual problem of SVC, where K_{i,j} = y^{(i)} y^{(j)} \Phi(x^{(i)})^\top \Phi(x^{(j)})
- Making a prediction, where f(x) = w^\top \Phi(x) + b = \sum_i \alpha_i y^{(i)} \Phi(x^{(i)})^\top \Phi(x) + b

Time complexity? If we choose \Phi carefully, we can evaluate \Phi(x^{(i)})^\top \Phi(x) = k(x^{(i)}, x) efficiently.

Polynomial kernel: k(a,b) = (a^\top b / \alpha + \beta)^\gamma. E.g., let \alpha = 1, \beta = 1, \gamma = 2, and a \in R^2; then

\Phi(a) = [1, \sqrt{2} a_1, \sqrt{2} a_2, a_1^2, a_2^2, \sqrt{2} a_1 a_2]^\top \in R^6

Gaussian RBF kernel: k(a,b) = \exp(-\gamma \|a - b\|^2), \gamma \ge 0. Since

k(a,b) = \exp(-\gamma \|a\|^2 + 2\gamma a^\top b - \gamma \|b\|^2) = \exp(-\gamma \|a\|^2 - \gamma \|b\|^2) (1 + \frac{2\gamma a^\top b}{1!} + \frac{(2\gamma a^\top b)^2}{2!} + \cdots),

letting a \in R^2 gives

\Phi(a) = \exp(-\gamma \|a\|^2) [1, \sqrt{2\gamma}\, a_1, \sqrt{2\gamma}\, a_2, \sqrt{2}\,\gamma\, a_1^2, \sqrt{2}\,\gamma\, a_2^2, 2\gamma\, a_1 a_2, \cdots]^\top \in R^\infty
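A short sketch verifying the degree-2 polynomial case above (\alpha = \beta = 1, \gamma = 2): the explicit 6-D map \Phi(a) = [1, \sqrt{2}a_1, \sqrt{2}a_2, a_1^2, a_2^2, \sqrt{2}a_1 a_2]^\top satisfies \Phi(a)^\top \Phi(b) = (a^\top b + 1)^2, so the kernel computes the inner product without ever forming \Phi. The test vectors are arbitrary.

```python
import numpy as np

def phi(a):
    """Explicit feature map for the degree-2 polynomial kernel (alpha=beta=1, gamma=2)."""
    a1, a2 = a
    return np.array([1.0, np.sqrt(2)*a1, np.sqrt(2)*a2,
                     a1**2, a2**2, np.sqrt(2)*a1*a2])

def poly_kernel(a, b):
    # k(a, b) = (a.b + 1)^2, evaluated in O(D) time
    return (np.dot(a, b) + 1.0) ** 2

a = np.array([0.3, -1.2])
b = np.array([2.0, 0.7])
print(phi(a) @ phi(b), poly_kernel(a, b))  # both equal (a.b + 1)^2
```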

Kernel Trick

If we choose a \Phi induced by the polynomial or Gaussian RBF kernel, then

K_{i,j} = y^{(i)} y^{(j)} k(x^{(i)}, x^{(j)})

takes only O(D) time to evaluate, and

f(x) = \sum_i \alpha_i y^{(i)} k(x^{(i)}, x) + b

takes O(ND) time, both independent of the augmented feature dimension. \alpha, \beta, and \gamma are new hyperparameters.
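The kernelized decision function can be sketched directly, never touching \Phi and costing O(ND) per query. The training set and dual variables \alpha, b below are made up for illustration, not a fitted model.

```python
import numpy as np

def rbf_kernel(A, x, gamma=0.5):
    # k(a, x) = exp(-gamma ||a - x||^2), vectorized over the rows of A
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=-1))

def decision(x, X, y, alpha, b, gamma=0.5):
    """f(x) = sum_i alpha_i y_i k(x_i, x) + b  -- O(N D) per query."""
    return np.sum(alpha * y * rbf_kernel(X, x, gamma)) + b

# Illustrative training set and (hypothetical) dual variables:
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
b = 0.0

print(decision(np.array([0.1, 0.0]), X, y, alpha, b) > 0)  # True: near the +1 SV
print(decision(np.array([0.9, 1.0]), X, y, alpha, b) > 0)  # False: near the -1 SV
```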

Sparse Kernel Machines

SVC is a kernel machine:

f(x) = \sum_i \alpha_i y^{(i)} k(x^{(i)}, x) + b

It is surprising that SVC works like K-NN in some sense. However, SVC is a sparse kernel machine: only the support vectors, i.e., the examples with \alpha_i > 0, contribute to f.

KKT Conditions and Types of SVs

By the KKT conditions, we have:
- Primal feasibility: y^{(i)}(w^\top \Phi(x^{(i)}) + b) \ge 1 - \xi_i and \xi_i \ge 0
- Complementary slackness: \alpha_i (1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i) = 0 and \beta_i (-\xi_i) = 0

Depending on the value of \alpha_i, each example x^{(i)} can be:
- A non-SV (\alpha_i = 0): y^{(i)}(w^\top \Phi(x^{(i)}) + b) \ge 1 (usually strict), since 1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i \le 0 and, because \beta_i = C - \alpha_i \ne 0, we have \xi_i = 0
- A free SV (0 < \alpha_i < C): y^{(i)}(w^\top \Phi(x^{(i)}) + b) = 1, since 1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i = 0 and, because \beta_i = C - \alpha_i \ne 0, we have \xi_i = 0
- A bounded SV (\alpha_i = C): y^{(i)}(w^\top \Phi(x^{(i)}) + b) \le 1 (usually strict), since 1 - y^{(i)}(w^\top \Phi(x^{(i)}) + b) - \xi_i = 0 and, because \beta_i = 0, we have \xi_i \ge 0

Page 100: Non-Parametric Methods and Support Vector Machines

Remarks I

Pros of SVC:

Global optimality (convex problem)
Works with different kernels (linear, polynomial, Gaussian RBF, etc.)
Works well with small training sets

Cons:

Nonlinear SVC is not scalable to large tasks
  Takes O(N^2) ∼ O(N^3) time to train using SMO in LIBSVM [1]; linear SVC, on the other hand, takes O(ND) time
  The kernel matrix K requires O(N^2) space; in practice, only a small portion of K is cached in memory
Sensitive to irrelevant data features (vs. decision trees)
Non-trivial hyperparameter tuning
  The effect of a (C, γ) combination is unknown in advance; tuning is usually done by grid search
Separates only 2 classes
  Usually wrapped by the one-vs-one technique for multi-class classification

Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 40 / 42
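The grid-search tuning mentioned above can be sketched as follows, assuming scikit-learn; the grid values are illustrative, not a recommendation from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data (illustrative only)
X, y = make_classification(n_samples=150, n_features=10, random_state=0)

# Since the effect of a (C, gamma) pair is unknown in advance,
# try every combination and pick the one with the best CV score.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3).fit(X, y)
print(search.best_params_, search.best_score_)
```

Each of the 9 combinations is cross-validated independently, which is why grid search scales poorly but parallelizes trivially.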


Page 108: Non-Parametric Methods and Support Vector Machines

Remarks II

Does nonlinear SVC always perform better than linear SVC? No.

Choose linear SVC (e.g., LIBLINEAR [2]) when:
  N is large (since nonlinear SVC does not scale), or
  D is large (since the classes may already be linearly separable)

Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 41 / 42
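The large-D case can be checked empirically. A sketch assuming scikit-learn, where LinearSVC wraps LIBLINEAR and SVC wraps LIBSVM; the dataset and scores are illustrative, and which model wins depends on the data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

# High-dimensional synthetic data (large D relative to N; illustrative only)
X, y = make_classification(n_samples=300, n_features=100, n_informative=30,
                           random_state=0)

# Linear SVC trains in O(ND); the RBF SVC pays the O(N^2)-O(N^3) kernel cost.
lin_acc = cross_val_score(LinearSVC(C=1.0, max_iter=10000), X, y, cv=3).mean()
rbf_acc = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=3).mean()
print(lin_acc, rbf_acc)
```

On data like this the two accuracies are often close, which is the point of the slide: when D is large, the extra cost of a nonlinear kernel may buy little.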


Page 110: Non-Parametric Methods and Support Vector Machines

Reference I

[1] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.

[2] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9(Aug):1871–1874, 2008.

[3] John Platt et al. Sequential minimal optimization: A fast algorithm for training support vector machines. 1998.

Shan-Hung Wu (CS, NTHU) Non-Parametric Methods & SVM Machine Learning 42 / 42