
CS 59000 Statistical Machine Learning: Lecture 13

Yuan (Alan) Qi, Purdue CS

Oct. 8, 2008

Outline

• Review of the kernel trick, kernel ridge regression, and kernel Principal Component Analysis

• Gaussian processes (GPs)

• From linear regression to GP

• GP for regression

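Kernel Trick

1. Reformulate an algorithm such that the input vector enters only in the form of inner products $\mathbf{x}^T \mathbf{x}'$.

2. Replace the input $\mathbf{x}$ by its feature mapping: $\mathbf{x} \to \phi(\mathbf{x})$.

3. Replace the inner product by a kernel function: $\phi(\mathbf{x})^T \phi(\mathbf{x}') \to k(\mathbf{x}, \mathbf{x}')$.

Examples: Kernel PCA, Kernel Fisher discriminant, Support Vector Machines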
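The three steps above can be illustrated with a small sketch. The quadratic kernel and its explicit feature map `phi` below are illustrative choices, not taken from the slides; the point is only that the kernel value computed in input space matches the inner product computed in feature space.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel (x^T z)^2 (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def quadratic_kernel(x, z):
    """Same quantity evaluated directly in input space, without forming phi."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in feature space
implicit = quadratic_kernel(x, z)         # kernel evaluated in input space
print(explicit, implicit)
```

Any algorithm that touches the inputs only through such inner products can therefore swap in `quadratic_kernel` (or another valid kernel) without ever constructing the feature vectors.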

Dual Representation for Ridge Regression

Dual variables:
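The slide equations are not preserved in this text, so the following is a hedged reconstruction of the standard dual derivation for regularized least squares. The notation (targets $t_n$, design matrix $\Phi$ with rows $\phi(\mathbf{x}_n)^T$, regularizer $\lambda$) is assumed.

```latex
\begin{align}
J(\mathbf{w}) &= \frac{1}{2} \sum_{n=1}^{N} \left( \mathbf{w}^T \phi(\mathbf{x}_n) - t_n \right)^2
                 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w} \\
\nabla_{\mathbf{w}} J = 0 \;\Rightarrow\;
\mathbf{w} &= -\frac{1}{\lambda} \sum_{n=1}^{N} \left( \mathbf{w}^T \phi(\mathbf{x}_n) - t_n \right) \phi(\mathbf{x}_n)
            = \sum_{n=1}^{N} a_n \phi(\mathbf{x}_n) = \Phi^T \mathbf{a} \\
a_n &= -\frac{1}{\lambda} \left( \mathbf{w}^T \phi(\mathbf{x}_n) - t_n \right)
\end{align}
```

The weight vector is thus a linear combination of the training feature vectors, with one dual variable $a_n$ per data point.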

Kernel Ridge Regression

Using kernel trick:

Minimize over dual variables:
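A minimal sketch of kernel ridge regression, assuming the standard result that substituting $\mathbf{w} = \Phi^T \mathbf{a}$ and minimizing over the dual variables gives $\mathbf{a} = (K + \lambda I_N)^{-1} \mathbf{t}$ with prediction $y(\mathbf{x}) = \mathbf{k}(\mathbf{x})^T \mathbf{a}$. The data, the kernel width `sigma`, and the helper names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

def fit_kernel_ridge(K, t, lam):
    """Solve (K + lam * I) a = t for the dual variables a."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), t)

def predict_kernel_ridge(K_test_train, a):
    """Prediction y(x) = k(x)^T a for each test point (one row per test point)."""
    return K_test_train @ a

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))          # toy 1-D inputs
t = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
lam = 0.1

a = fit_kernel_ridge(gaussian_kernel(X, X), t, lam)
X_new = np.linspace(-3.0, 3.0, 5)[:, None]
y_new = predict_kernel_ridge(gaussian_kernel(X_new, X), a)

# Sanity check with a linear kernel: the dual solution recovers primal ridge regression.
a_lin = fit_kernel_ridge(X @ X.T, t, lam)
w_dual = X.T @ a_lin
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(1), X.T @ t)
```

The dual form solves an N x N system instead of an M x M one, which only pays off when the feature space is large (or infinite, as with the Gaussian kernel).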

Generate Kernel Matrix

Positive semidefinite

Consider the Gaussian kernel:
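As a quick numerical illustration, the sketch below builds a Gram matrix from the Gaussian kernel, assumed here in the common form $k(\mathbf{x}, \mathbf{x}') = \exp(-\|\mathbf{x} - \mathbf{x}'\|^2 / 2\sigma^2)$, and inspects its eigenvalues. The random inputs and the width `sigma` are arbitrary choices.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[n, m] = exp(-||x_n - x_m||^2 / (2 sigma^2))."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))   # six arbitrary 2-D points
K = gaussian_gram(X)

# A valid kernel yields a symmetric matrix with no negative eigenvalues
# (up to floating-point round-off).
eigvals = np.linalg.eigvalsh(K)
print(eigvals)
```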

Principal Component Analysis (PCA)

Assume

We have

is a normalized eigenvector:
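The equations for this slide are missing from the text. A hedged reconstruction of the usual PCA setup, with assumed notation ($S$ for the sample covariance, $\mathbf{u}_i$ for the eigenvectors), would read:

```latex
\begin{align}
\text{Assume zero mean:}\quad & \sum_{n=1}^{N} \mathbf{x}_n = \mathbf{0} \\
\text{Eigen-problem:}\quad & S \mathbf{u}_i = \lambda_i \mathbf{u}_i,
  \qquad S = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^T \\
\text{Normalization:}\quad & \mathbf{u}_i^T \mathbf{u}_i = 1
\end{align}
```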

Feature Mapping

Eigen-problem in feature space

Dual Variables

Suppose , we have

Eigen-problem in Feature Space (1)

Eigen-problem in Feature Space (2)

Normalization condition:

Projection coefficient:
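Since the slide equations did not survive, here is a hedged sketch of the standard kernel PCA derivation for the zero-mean case. The symbols $C$, $\mathbf{v}_i$, and $a_{in}$ are assumed notation.

```latex
\begin{align}
C &= \frac{1}{N} \sum_{n=1}^{N} \phi(\mathbf{x}_n) \phi(\mathbf{x}_n)^T,
  \qquad C \mathbf{v}_i = \lambda_i \mathbf{v}_i \\
\lambda_i > 0 \;\Rightarrow\;
\mathbf{v}_i &= \sum_{n=1}^{N} a_{in} \phi(\mathbf{x}_n) \\
\text{Substituting and multiplying by } \phi(\mathbf{x}_l)^T:\quad
K \mathbf{a}_i &= \lambda_i N \mathbf{a}_i \\
\text{Normalization:}\quad
1 = \mathbf{v}_i^T \mathbf{v}_i &= \mathbf{a}_i^T K \mathbf{a}_i = \lambda_i N \, \mathbf{a}_i^T \mathbf{a}_i \\
\text{Projection:}\quad
y_i(\mathbf{x}) = \phi(\mathbf{x})^T \mathbf{v}_i &= \sum_{n=1}^{N} a_{in} \, k(\mathbf{x}, \mathbf{x}_n)
\end{align}
```

Every step uses the data only through the kernel matrix $K$, so the feature vectors never need to be formed explicitly.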

General Case: Non-zero Mean

Kernel Matrix:
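A minimal kernel PCA sketch for the non-zero-mean case, assuming the usual centered kernel matrix $\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N$, where $\mathbf{1}_N$ is the N x N matrix with every entry equal to 1/N. The Gaussian kernel, the toy data, and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel on the rows of X."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * sigma ** 2))

def kernel_pca(K, n_components):
    """Return training projections, dual coefficients, and eigenvalues mu = lambda * N."""
    N = K.shape[0]
    one_n = np.full((N, N), 1.0 / N)
    # Center the data in feature space using kernel evaluations only.
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    mu, A = np.linalg.eigh(K_c)                 # eigenvalues in ascending order
    idx = np.argsort(mu)[::-1][:n_components]   # keep the largest ones
    mu, A = mu[idx], A[:, idx]
    # eigh returns unit-norm vectors; rescale so that mu_i * a_i^T a_i = 1.
    A = A / np.sqrt(mu)
    # Projection of training point n onto component i: sum_m a_im * K_c[n, m].
    Z = K_c @ A
    return Z, A, mu

rng = np.random.default_rng(7)
X = rng.standard_normal((20, 2))
Z, A, mu = kernel_pca(gaussian_gram(X), n_components=2)
```

The rescaling step assumes the retained eigenvalues are strictly positive, which holds for the leading components of this example.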

Gaussian Processes

How do kernels arise naturally in a Bayesian setting?

Instead of assigning a prior on the parameters w, we assign a prior on the function values y.

Infinite space in theory

Finite space in practice (a finite number of training and test points)

Linear Regression Revisited

We have

From Prior on Parameter to Prior on Function

The prior on function value:
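To make the step from a prior on parameters to a prior on function values concrete, the sketch below assumes the usual setup $y(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x})$ with $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} I)$, so that $\mathbf{y} = \Phi \mathbf{w}$ is Gaussian with zero mean and covariance $K = \alpha^{-1} \Phi \Phi^T$. The polynomial basis, the inputs, and the value of `alpha` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 2.0                                   # assumed prior precision on w
x = np.array([-1.0, 0.0, 1.5])                # three arbitrary input locations

# Hypothetical polynomial basis: phi(x) = (1, x, x^2), stacked into an N x M design matrix.
Phi = np.stack([np.ones_like(x), x, x ** 2], axis=1)

# Covariance of the function values implied by the prior on w.
K = Phi @ Phi.T / alpha

# Monte Carlo check: draw w ~ N(0, alpha^{-1} I) and map each draw to y = Phi w.
W = rng.standard_normal((200_000, Phi.shape[1])) / np.sqrt(alpha)
Y = W @ Phi.T                                 # each row is one sample of y
K_mc = Y.T @ Y / Y.shape[0]                   # empirical second moment of y

print(np.round(K, 3))
print(np.round(K_mc, 3))
```

The empirical covariance of the sampled function values approaches `K`, which depends on the basis functions only through inner products, that is, through a kernel.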

Stochastic Process

A stochastic process is specified by giving the joint distribution for any finite set of values in a consistent manner. Loosely speaking, consistency means that marginalizing a joint distribution over some of its variables gives the same distribution as the one defined directly on the remaining subset.

Gaussian Processes

The joint distribution of any finite set of function values is a multivariate Gaussian distribution.

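Without any prior knowledge, we often set the mean to 0. The GP is then specified by its covariance, which is given by the kernel function: $\mathbb{E}[y(\mathbf{x}_n)\, y(\mathbf{x}_m)] = k(\mathbf{x}_n, \mathbf{x}_m)$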

Impact of Kernel Function

The covariance matrix is given by the kernel function.

Applications: economics and finance

Gaussian Process for Regression

Likelihood:

Prior:

Marginal distribution:
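The three distributions named above are not preserved in the text. A hedged reconstruction under the usual assumptions (Gaussian noise with precision $\beta$, zero-mean GP prior with kernel matrix $K$) is:

```latex
\begin{align}
\text{Likelihood:}\quad p(\mathbf{t} \mid \mathbf{y}) &= \mathcal{N}(\mathbf{t} \mid \mathbf{y}, \beta^{-1} I_N) \\
\text{Prior:}\quad p(\mathbf{y}) &= \mathcal{N}(\mathbf{y} \mid \mathbf{0}, K) \\
\text{Marginal:}\quad p(\mathbf{t}) &= \int p(\mathbf{t} \mid \mathbf{y})\, p(\mathbf{y})\, d\mathbf{y}
   = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, C_N), \\
C_N(\mathbf{x}_n, \mathbf{x}_m) &= k(\mathbf{x}_n, \mathbf{x}_m) + \beta^{-1} \delta_{nm}
\end{align}
```

The covariances simply add because the noise is independent of the function values.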

Samples of GP Prior over Functions
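The original figure is not available, so the following sketch shows one way such samples could be drawn: evaluate the kernel on a grid, factor the covariance matrix, and transform standard normal draws. The two kernels, their parameters, and the jitter term are illustrative assumptions chosen to show how the kernel shapes the sampled functions.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=0.5):
    """Gaussian kernel: produces smooth sample functions."""
    return np.exp(-(x[:, None] - z[None, :]) ** 2 / (2.0 * sigma ** 2))

def exponential_kernel(x, z, theta=1.0):
    """Exponential kernel: produces continuous but rough sample functions."""
    return np.exp(-theta * np.abs(x[:, None] - z[None, :]))

def sample_gp_prior(K, n_samples, rng, jitter=1e-6):
    """Draw samples y ~ N(0, K) via a Cholesky factor; jitter keeps the factorization stable."""
    L = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    return (L @ rng.standard_normal((K.shape[0], n_samples))).T

rng = np.random.default_rng(3)
x = np.linspace(-1.0, 1.0, 50)

smooth = sample_gp_prior(gaussian_kernel(x, x), 5, rng)     # shape (5, 50)
rough = sample_gp_prior(exponential_kernel(x, x), 5, rng)   # shape (5, 50)
```

Plotting each row of `smooth` and `rough` against `x` would show the qualitative difference between the two priors.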

Samples of Data Points

Predictive Distribution

The predictive distribution is Gaussian, with mean and variance:
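A minimal sketch of GP regression prediction, assuming the standard results $m(\mathbf{x}_{N+1}) = \mathbf{k}^T C_N^{-1} \mathbf{t}$ and $\sigma^2(\mathbf{x}_{N+1}) = c - \mathbf{k}^T C_N^{-1} \mathbf{k}$, with $C_N = K + \beta^{-1} I_N$ and $c = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \beta^{-1}$. The kernel, the toy data, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel between two 1-D arrays of inputs."""
    return np.exp(-(x[:, None] - z[None, :]) ** 2 / (2.0 * sigma ** 2))

def gp_predict(x_train, t_train, x_test, beta, kernel=gaussian_kernel):
    """Predictive mean and variance of the target at each test input."""
    n = len(x_train)
    C = kernel(x_train, x_train) + np.eye(n) / beta     # C_N = K + beta^{-1} I
    k = kernel(x_train, x_test)                         # N x M cross-covariances
    c = np.diag(kernel(x_test, x_test)) + 1.0 / beta    # prior variance of each test target
    L = np.linalg.cholesky(C)
    C_inv_t = np.linalg.solve(L.T, np.linalg.solve(L, t_train))
    mean = k.T @ C_inv_t                                # k^T C_N^{-1} t
    v = np.linalg.solve(L, k)                           # L^{-1} k
    var = c - np.sum(v ** 2, axis=0)                    # c - k^T C_N^{-1} k
    return mean, var

rng = np.random.default_rng(4)
beta = 100.0                                            # assumed noise precision
x_train = np.linspace(0.0, 5.0, 8)
t_train = np.sin(x_train) + rng.standard_normal(8) / np.sqrt(beta)

# One test point inside the data range and one far outside it.
x_test = np.array([2.5, 50.0])
mean, var = gp_predict(x_train, t_train, x_test, beta)
```

Far from the training data the cross-covariances vanish, so the prediction falls back to the prior: zero mean and variance $k(\mathbf{x}, \mathbf{x}) + \beta^{-1}$.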

Predictive Mean

We see the same form as kernel ridge regression and kernel PCA.
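To spell out the correspondence, assuming the standard GP predictive mean and the kernel ridge solution $\mathbf{a} = (K + \lambda I_N)^{-1} \mathbf{t}$:

```latex
\begin{align}
m(\mathbf{x}_{N+1}) &= \mathbf{k}^T C_N^{-1} \mathbf{t}
  = \sum_{n=1}^{N} a_n \, k(\mathbf{x}_n, \mathbf{x}_{N+1}), \\
\mathbf{a} &= C_N^{-1} \mathbf{t} = \left( K + \beta^{-1} I_N \right)^{-1} \mathbf{t}
\end{align}
```

Under these assumptions the GP predictive mean coincides with the kernel ridge regression prediction when $\lambda = \beta^{-1}$. Like the kernel PCA projection, it is a weighted sum of kernel evaluations at the training points.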

GP Regression

Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?

Marginal Distribution of Target Values
