
Page 1: An Overview of Kernel-Based Learning Methods

An Overview of Kernel-Based Learning Methods

Yan Liu

Nov 18, 2003

Page 2: An Overview of Kernel-Based Learning Methods

Outline
Introduction
Theory basis: Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem, Representer theorem, regularization
Kernel-based learning algorithms
Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD)
Unsupervised learning: one-class SVM, kernel PCA
Kernel design: standard kernels, making kernels from kernels, application-oriented kernels (Fisher kernel)

Page 3: An Overview of Kernel-Based Learning Methods

Introduction: Example
Idea: map the problem into a higher-dimensional space.
Let F be a potentially much higher-dimensional feature space, and let Φ : X -> F, x -> Φ(x).
The learning problem now works with the samples (Φ(x_1), y_1), ..., (Φ(x_N), y_N) in F × Y.
Key question: can this mapped problem be classified in a "simple" way?
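To make the idea concrete, here is a small check (a sketch; the deck itself contains no code) that a degree-2 polynomial kernel computes an inner product in F without ever forming Φ(x) explicitly:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input:
    phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# The same number, computed two ways: explicitly in F, or via the kernel in X.
print(np.dot(phi(x), phi(z)))  # 121.0
print(poly_kernel(x, z))       # 121.0
```

For high-degree polynomials or the RBF kernel, F is enormous or even infinite-dimensional, yet the kernel evaluation stays cheap; this is the point of the mapping.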

Page 4: An Overview of Kernel-Based Learning Methods

Exploring Theory: Roadmap

Page 5: An Overview of Kernel-Based Learning Methods

Reproducing Kernel Hilbert Space -1

Inner product space: a vector space equipped with an inner product ⟨·, ·⟩.
Hilbert space: a complete inner product space, obeying the following:
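The defining properties are equation images on the original slide; a standard reconstruction:

```latex
% Axioms for an inner product on a real vector space H:
\begin{align*}
  \langle f, g \rangle &= \langle g, f \rangle
    && \text{(symmetry)} \\
  \langle a f + b g, h \rangle &= a \langle f, h \rangle + b \langle g, h \rangle
    && \text{(linearity)} \\
  \langle f, f \rangle \ge 0, \quad
  \langle f, f \rangle &= 0 \iff f = 0
    && \text{(positive definiteness)}
\end{align*}
% Completeness: every Cauchy sequence in H converges, in the norm
% \| f \| = \sqrt{\langle f, f \rangle}, to a limit that lies in H.
```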

Page 6: An Overview of Kernel-Based Learning Methods

Reproducing Kernel Hilbert Space -2

Reproducing Kernel Hilbert Space (RKHS)
Gram matrix: given a kernel k(x, y) and points x_1, ..., x_N, define the Gram matrix by K_ij = k(x_i, x_j).
We say the kernel is positive definite when every such Gram matrix is positive definite.

Definition of RKHS
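The definition is an image on the slide; the standard statement:

```latex
% An RKHS is a Hilbert space H of functions f : X -> R in which every
% evaluation functional f |-> f(x) is bounded: for each x there is M_x with
|f(x)| \le M_x \, \| f \|_{H} \qquad \text{for all } f \in H .
% By the Riesz representation theorem, evaluation at x is then realized by
% an inner product with a representer k(x, \cdot) in H: the reproducing kernel.
```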

Page 7: An Overview of Kernel-Based Learning Methods

Reproducing Kernel Hilbert Space -3

Reproducing properties:
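In standard notation (the identities are images in the transcript), the reproducing properties read:

```latex
\langle f, \, k(x, \cdot) \rangle_{H} = f(x),
\qquad
\langle k(x, \cdot), \, k(y, \cdot) \rangle_{H} = k(x, y) .
```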

Comment:
An RKHS is a "bounded" Hilbert space: evaluation functionals are bounded, so convergence in the RKHS norm implies pointwise convergence.
An RKHS is a "smoothed" Hilbert space: a small RKHS norm limits how fast a function can vary, so the norm acts as a smoothness penalty.

Page 8: An Overview of Kernel-Based Learning Methods

Mercer's Theorem-1
Mercer's Theorem: in the discrete case, let A be the Gram matrix. If A is positive definite, then:
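The slide's equation is missing from the transcript; the standard completion is the spectral decomposition:

```latex
% Eigendecomposition of the positive definite Gram matrix A:
A_{ij} = k(x_i, x_j) = \sum_{m=1}^{N} \lambda_m \, v_m(i) \, v_m(j),
  \qquad \lambda_m > 0 ,
% which exhibits an explicit feature map realizing the kernel:
\Phi(x_i) = \big( \sqrt{\lambda_1}\, v_1(i), \ldots, \sqrt{\lambda_N}\, v_N(i) \big),
  \qquad
k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle .
```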

Page 9: An Overview of Kernel-Based Learning Methods

Mercer's Theorem-2
Comment:
Mercer's theorem provides a concrete way to construct a basis for an RKHS.
Mercer's condition is the only constraint on a kernel: a function is a valid kernel if and only if every Gram matrix it induces is positive definite.

Page 10: An Overview of Kernel-Based Learning Methods

Representer Theorem-1
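The statement on this slide is an image; a standard form of the theorem is:

```latex
% Representer theorem: for any loss L and any strictly monotonically
% increasing regularizer Omega, every minimizer over the RKHS H of
\min_{f \in H} \;\; \sum_{i=1}^{N} L\big( y_i, f(x_i) \big)
  + \Omega\big( \| f \|_{H} \big)
% admits a finite kernel expansion over the training examples:
f(\cdot) = \sum_{i=1}^{N} \alpha_i \, k(x_i, \cdot) .
```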

Page 11: An Overview of Kernel-Based Learning Methods

Representer Theorem-2
Comment:
The representer theorem is a powerful result. It shows that although we search for the optimal solution in a possibly infinite-dimensional feature space, adding the regularization term reduces the problem to a finite-dimensional one, spanned by the training examples.
In this sense, regularization and the RKHS formulation are equivalent views of the same problem.

Page 12: An Overview of Kernel-Based Learning Methods

Exploring Theory: Roadmap

Page 13: An Overview of Kernel-Based Learning Methods

Outline
Introduction
Theory basis: Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem, Representer theorem, regularization
Kernel-based learning algorithms
Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD)
Unsupervised learning: one-class SVM, kernel PCA
Kernel design: standard kernels, making kernels from kernels, application-oriented kernels (Fisher kernel)

Page 14: An Overview of Kernel-Based Learning Methods

Support Vector Machines-1: Quick Overview

Page 15: An Overview of Kernel-Based Learning Methods

Support Vector Machines-2: Quick Overview
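The formulas on these overview slides are images; the standard soft-margin SVM, whose parameters the next slide refers to, is:

```latex
% Primal problem:
\min_{w, b, \xi} \;\; \tfrac{1}{2} \| w \|^{2} + C \sum_{i=1}^{N} \xi_i
  \quad \text{s.t.} \quad
  y_i \big( \langle w, \Phi(x_i) \rangle + b \big) \ge 1 - \xi_i,
  \;\; \xi_i \ge 0 .
% Dual problem, in which the data appear only through the kernel:
\max_{\alpha} \;\; \sum_{i=1}^{N} \alpha_i
  - \tfrac{1}{2} \sum_{i, j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
  \quad \text{s.t.} \quad
  0 \le \alpha_i \le C, \;\; \textstyle\sum_i \alpha_i y_i = 0 .
% Decision function:
f(x) = \operatorname{sgn}\Big( \sum_{i=1}^{N} \alpha_i y_i \, k(x_i, x) + b \Big) .
```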

Page 16: An Overview of Kernel-Based Learning Methods

Support Vector Machines-3: Parameters and Sparsity
Sparsity: most α_i are zero; only the support vectors have nonzero coefficients.
C: regularization constant; ξ_i: slack variables.

Page 17: An Overview of Kernel-Based Learning Methods

Support Vector Machines-4: Optimization Techniques
Chunking: each step solves the subproblem containing all non-zero α_i plus some of the α_i violating the KKT conditions.
Decomposition methods (e.g., SVM_light): the size of the subproblem is fixed; one sample is added and one removed in each iteration.
Sequential minimal optimization (SMO): each iteration solves a quadratic subproblem of size two, which has a closed-form solution; a minimal sketch follows.
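As a concrete illustration, here is a sketch of simplified SMO in Python/NumPy. The second index j is chosen at random rather than by Platt's second-choice heuristic, so this is a toy for intuition, not a production solver:

```python
import numpy as np

def smo_simplified(X, y, C=1.0, tol=1e-3, max_passes=5,
                   kernel=lambda x, z: float(np.dot(x, z))):
    """Simplified SMO: repeatedly pick a pair (alpha_i, alpha_j) and solve
    the size-two QP analytically, keeping 0 <= alpha <= C and
    sum_i alpha_i * y_i = 0."""
    N = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    alpha, b = np.zeros(N), 0.0
    f = lambda i: np.sum(alpha * y * K[:, i]) + b   # decision value at x_i
    rng = np.random.default_rng(0)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            E_i = f(i) - y[i]
            # Only touch examples that violate the KKT conditions beyond tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = int(rng.integers(N - 1))
            j = j if j < i else j + 1               # random j != i
            E_j = f(j) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Feasible segment [L, H] for alpha_j on the constraint line.
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]   # curvature along the line
            if L == H or eta >= 0:
                continue
            alpha[j] = float(np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H))
            if abs(alpha[j] - a_j) < 1e-5:
                continue
            alpha[i] += y[i] * y[j] * (a_j - alpha[j])
            # Recompute the threshold b from the KKT conditions.
            b1 = b - E_i - y[i]*(alpha[i]-a_i)*K[i,i] - y[j]*(alpha[j]-a_j)*K[i,j]
            b2 = b - E_j - y[i]*(alpha[i]-a_i)*K[i,j] - y[j]*(alpha[j]-a_j)*K[j,j]
            b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

# Toy usage: two separable points.
alpha, b = smo_simplified(np.array([[2.0, 2.0], [-2.0, -2.0]]),
                          np.array([1.0, -1.0]))
```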

Page 18: An Overview of Kernel-Based Learning Methods

Kernel Fisher Discriminant-1: Overview of LDA
Fisher's discriminant (or LDA): find the linear projection with the most discriminative direction by maximizing the Rayleigh coefficient
J(w) = (w^T S_B w) / (w^T S_W w),
where S_W is the within-class variance and S_B is the between-class variance.

Comparison with PCA: PCA finds the directions of largest variance without using labels, while LDA uses the labels to find the most discriminative direction.

Page 19: An Overview of Kernel-Based Learning Methods

Kernel Fisher Discriminant-2
KFD solves the problem of Fisher's linear discriminant in the feature space F, yielding a nonlinear discriminant in the input space. One can express w in terms of the mapped training patterns:
w = Σ_i α_i Φ(x_i).
The optimization problem for the KFD can then be written as:
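The problem itself is an image on the slide; in the standard formulation of Mika et al., substituting the expansion for w turns the Rayleigh coefficient into a function of α alone:

```latex
% Kernel Fisher discriminant as a Rayleigh quotient in alpha
% (ell = number of training points, ell_c = size of class c):
\max_{\alpha} \; J(\alpha)
  = \frac{\alpha^{\top} M \alpha}{\alpha^{\top} N \alpha},
\qquad
M = (m_1 - m_2)(m_1 - m_2)^{\top},
\quad
(m_c)_i = \frac{1}{\ell_c} \sum_{x \in \text{class } c} k(x_i, x) ,
% where N is the kernel analogue of the within-class scatter,
N = \sum_{c \in \{1, 2\}} K_c
      \big( I - \tfrac{1}{\ell_c} \mathbf{1}\mathbf{1}^{\top} \big) K_c^{\top},
% and K_c is the ell x ell_c matrix of kernel values against class c.
```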

Page 20: An Overview of Kernel-Based Learning Methods

Kernel PCA -1
The basic idea of PCA: find a set of orthogonal directions that capture most of the variance in the data.
However, the data can contain more structure (for example, more clusters) than linear projections in the input space can capture.
Kernel PCA therefore maps the data into a higher-dimensional feature space and performs standard PCA there. Using the kernel trick, all calculations run on the Gram matrix, so we never compute coordinates in the feature space explicitly.

Page 21: An Overview of Kernel-Based Learning Methods

Kernel PCA -2
Covariance matrix in feature space (assuming the mapped data are centered): C = (1/N) Σ_i Φ(x_i) Φ(x_i)^T.
By definition, a principal direction v satisfies C v = λ v.
Then we have: every eigenvector with λ ≠ 0 lies in the span of the mapped data, so v = Σ_i α_i Φ(x_i).
Define the Gram matrix K_ij = k(x_i, x_j) = ⟨Φ(x_i), Φ(x_j)⟩.
At last we have: K α = N λ α.
Therefore we simply have to solve an eigenvalue problem on the Gram matrix.
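A minimal NumPy sketch of the whole procedure (function and variable names are illustrative): center the Gram matrix, eigendecompose it, normalize the coefficient vectors, and project the training points.

```python
import numpy as np

def kernel_pca(X, kernel, n_components=2):
    """Kernel PCA: eigendecompose the centered Gram matrix and return the
    projections of the training points onto the top principal components."""
    N = X.shape[0]
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    # Center the data implicitly in feature space: K <- H K H, H = I - 11^T/N.
    H = np.eye(N) - np.ones((N, N)) / N
    Kc = H @ K @ H
    # Eigenvalue problem on the Gram matrix (eigh returns ascending order).
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:n_components]
    lam, alpha = eigvals[top], eigvecs[:, top]
    # Scale alpha so each feature-space eigenvector v = sum_i alpha_i Phi(x_i)
    # has unit norm: ||v||^2 = alpha^T K alpha = lam for unit-length alpha.
    alpha = alpha / np.sqrt(np.maximum(lam, 1e-12))
    # Projection of point j onto component m is sum_i alpha[i, m] * Kc[j, i].
    return Kc @ alpha

rbf = lambda x, z: np.exp(-np.sum((x - z) ** 2) / 2.0)
X = np.random.default_rng(0).normal(size=(10, 3))
print(kernel_pca(X, rbf).shape)   # (10, 2)
```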

Page 22: An Overview of Kernel-Based Learning Methods

Outline
Introduction
Theory basis: Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem, Representer theorem, regularization
Kernel-based learning algorithms
Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD)
Unsupervised learning: one-class SVM, kernel PCA
Kernel design: standard kernels, making kernels from kernels, application-oriented kernels (Fisher kernel)

Page 23: An Overview of Kernel-Based Learning Methods

Standard Kernels
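The table itself is an image on the slide; the kernels usually listed under this heading are:

```latex
k(x, z) = \langle x, z \rangle
  % linear
k(x, z) = \big( \langle x, z \rangle + c \big)^{d}
  % polynomial of degree d
k(x, z) = \exp\!\big( - \| x - z \|^{2} / (2 \sigma^{2}) \big)
  % Gaussian RBF
k(x, z) = \tanh\!\big( \kappa \langle x, z \rangle + \theta \big)
  % sigmoid (not positive definite for all parameter choices)
```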

Page 24: An Overview of Kernel-Based Learning Methods

Making Kernels out of Kernels
Theorem: if K1, K2, and K3 are kernels, so are the following (a quick numerical check is sketched after the list):
K(x, z) = K1(x, z) + K2(x, z)
K(x, z) = a K1(x, z), a > 0
K(x, z) = K1(x, z) · K2(x, z)
K(x, z) = f(x) f(z), for any real-valued function f
K(x, z) = K3(Φ(x), Φ(z))
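These closure rules are easy to sanity-check numerically (a sketch; data and names are illustrative): build the Gram matrix for each combination and confirm that its smallest eigenvalue is non-negative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))

def gram(kernel):
    """Gram matrix of the kernel over the sample X."""
    return np.array([[kernel(x, z) for z in X] for x in X])

k1 = lambda x, z: np.dot(x, z)                       # linear kernel
k2 = lambda x, z: np.exp(-np.sum((x - z) ** 2))      # RBF kernel
combos = {
    "K1 + K2":  lambda x, z: k1(x, z) + k2(x, z),
    "a * K1":   lambda x, z: 3.0 * k1(x, z),
    "K1 * K2":  lambda x, z: k1(x, z) * k2(x, z),
    "f(x)f(z)": lambda x, z: np.tanh(x).sum() * np.tanh(z).sum(),
}
for name, k in combos.items():
    # Smallest eigenvalue should be >= 0 (up to ~1e-10 numerical noise).
    print(f"{name:9s} min eigenvalue = {np.linalg.eigvalsh(gram(k)).min():.2e}")
```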

Kernel selection

Page 25: An Overview of Kernel-Based Learning Methods

Fisher Kernel
Jaakkola and Haussler proposed using a generative model as a kernel inside a discriminative (non-probabilistic) classifier:
Build an HMM for each protein family.
Compute the Fisher scores for each parameter of the HMM.
Use the scores as features and classify with an SVM with an RBF kernel.
This gives good performance for protein family classification.
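In formulas (following Jaakkola and Haussler), the Fisher score and Fisher kernel are:

```latex
% Fisher score: gradient of the log-likelihood under the generative model
U_x = \nabla_{\theta} \, \log P(x \mid \theta) ,
% Fisher kernel: inner product of scores, weighted by the inverse Fisher
% information matrix I (often approximated by the identity in practice)
K(x, x') = U_x^{\top} \, I^{-1} \, U_{x'} .
```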