Post on 11-Jan-2016
CS 59000 Statistical Machine Learning, Lecture 13
Yuan (Alan) Qi, Purdue CS
Oct. 8, 2008
Outline
• Review of kernel trick, kernel ridge regression, and kernel Principal Component Analysis
• Gaussian processes (GPs)
• From linear regression to GP
• GP for regression
Kernel Trick
1. Reformulate an algorithm such that the input vector enters only in the form of an inner product $x^T x'$.
2. Replace the input $x$ by its feature mapping: $x \to \phi(x)$.
3. Replace the inner product by a kernel function: $k(x, x') = \phi(x)^T \phi(x')$.
Examples: kernel PCA, kernel Fisher discriminant, support vector machines
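As a small illustration of the three steps above, the sketch below checks numerically that a kernel evaluated in input space matches an inner product in feature space. The quadratic kernel $(x^T z)^2$ and its explicit 2-D feature map are chosen here purely as an example; they are an assumption and do not come from the slides.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the 2-D quadratic kernel (x^T z)^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def quadratic_kernel(x, z):
    """Kernel evaluated directly in input space, without building phi."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both routes should give the same number.
inner_in_feature_space = float(np.dot(phi(x), phi(z)))
kernel_value = quadratic_kernel(x, z)
print(inner_in_feature_space, kernel_value)
```

The point of the trick is that `quadratic_kernel` never forms `phi`, which matters when the feature space is very high (or infinite) dimensional.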
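Dual Representation for Ridge Regression
Dual variables: $a_n = -\frac{1}{\lambda}\left(w^T \phi(x_n) - t_n\right)$, so that $w = \Phi^T a$.
Kernel Ridge Regression
Using the kernel trick, with $K = \Phi \Phi^T$: $J(a) = \frac{1}{2} a^T K K a - a^T K t + \frac{1}{2} t^T t + \frac{\lambda}{2} a^T K a$
Minimizing over the dual variables: $a = (K + \lambda I_N)^{-1} t$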
Generate Kernel Matrix
The kernel matrix must be positive semidefinite.
Consider the Gaussian kernel: $k(x, x') = \exp\left(-\|x - x'\|^2 / 2\sigma^2\right)$
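The dual solution and the Gaussian kernel matrix can be sketched together as follows. This is a minimal example under stated assumptions: the toy sine data, the values `lam=0.1` and `sigma=1.0`, and the function names are all hypothetical, and the kernel is taken to be $\exp(-\|x - x'\|^2 / 2\sigma^2)$.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))

def fit_kernel_ridge(X, t, lam=0.1, sigma=1.0):
    """Dual variables a = (K + lam I)^{-1} t."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), t)

def predict_kernel_ridge(X_train, a, X_new, sigma=1.0):
    """Prediction y(x) = sum_n a_n k(x, x_n)."""
    return gaussian_kernel(X_new, X_train, sigma) @ a

# Toy data (an assumption for illustration): noisy samples of sin(x).
rng = np.random.default_rng(0)
X_train = np.linspace(0.0, 2.0 * np.pi, 20)[:, None]
t_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(20)

K_train = gaussian_kernel(X_train, X_train)
min_eig = np.linalg.eigvalsh(K_train).min()   # should be >= 0 up to rounding
a_dual = fit_kernel_ridge(X_train, t_train, lam=0.1)
y_new = predict_kernel_ridge(X_train, a_dual, np.array([[np.pi / 2.0]]))
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse; the smallest eigenvalue of `K_train` gives a quick numerical check of positive semidefiniteness.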
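Principal Component Analysis (PCA)
Assume the data have zero mean: $\sum_n x_n = 0$
We have $S u_i = \lambda_i u_i$, where $S = \frac{1}{N} \sum_{n=1}^{N} x_n x_n^T$
$u_i$ is a normalized eigenvector: $u_i^T u_i = 1$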
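Feature Mapping
$x_n \to \phi(x_n)$, assuming for now that $\sum_n \phi(x_n) = 0$
Eigen-problem in feature space
$C v_i = \lambda_i v_i$, where $C = \frac{1}{N} \sum_{n=1}^{N} \phi(x_n) \phi(x_n)^T$
Dual Variables
Suppose $\lambda_i > 0$; we have $v_i = \sum_{n=1}^{N} a_{in} \phi(x_n)$
Eigen-problem in Feature Space (1)
Substituting and multiplying by $\phi(x_l)^T$: $K^2 a_i = \lambda_i N K a_i$
Eigen-problem in Feature Space (2)
$K a_i = \lambda_i N a_i$
Normalization condition: $1 = v_i^T v_i = a_i^T K a_i = \lambda_i N a_i^T a_i$
Projection coefficient: $y_i(x) = \phi(x)^T v_i = \sum_{n=1}^{N} a_{in} k(x, x_n)$
General Case: Non-zero Mean
Kernel Matrix: $\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N$, where $1_N$ is the $N \times N$ matrix with every element equal to $1/N$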
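The kernel PCA steps (centering the kernel matrix, solving the eigen-problem for the dual variables, normalizing, and projecting) can be sketched as below. This is an illustrative sketch, not the lecture's code: the function name `kernel_pca` and the random toy data are assumptions, and a linear kernel is used so the result can be compared against ordinary PCA.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Project training points onto the leading kernel principal components.

    K is the (uncentered) N x N Gram matrix. Returns the projections and
    the eigenvalues lambda_i of the feature-space covariance.
    """
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    # Center in feature space: K~ = K - 1_N K - K 1_N + 1_N K 1_N
    K_c = K - one_N @ K - K @ one_N + one_N @ K @ one_N
    # Solve K~ a_i = (lambda_i N) a_i; eigh returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    mu = eigvals[order]            # mu_i = lambda_i * N
    A = eigvecs[:, order]          # unit-norm columns
    # Rescale so that a_i^T K~ a_i = 1, i.e. v_i has unit length.
    A = A / np.sqrt(mu)
    # Projection of training point n on component i: sum_m a_im k~(x_n, x_m)
    return K_c @ A, mu / N

# Toy data (an assumption): 50 points in 3-D with unequal variances.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3)) @ np.diag([3.0, 1.0, 0.2])
Z, lambdas = kernel_pca(X @ X.T, n_components=2)   # linear kernel
```

With the linear kernel $k(x, x') = x^T x'$, the projections `Z` agree with standard PCA scores up to the sign of each component, which is a convenient sanity check before switching to a nonlinear kernel.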
Gaussian Processes
How do kernels arise naturally in a Bayesian setting?
Instead of assigning a prior on the parameters w, we assign a prior on the function values y.
Infinite space in theory
Finite space in practice (a finite number of training and test points)
Linear Regression Revisited
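Let $y(x) = w^T \phi(x)$ with prior $p(w) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$
We have $y = \Phi w$, where $y = (y(x_1), \dots, y(x_N))^T$ and $\Phi$ is the design matrix with rows $\phi(x_n)^T$
From Prior on Parameter to Prior on Function
The prior on function value: $p(y) = \mathcal{N}(y \mid 0, K)$, where $K = \alpha^{-1} \Phi \Phi^T$, i.e. $K_{nm} = k(x_n, x_m) = \alpha^{-1} \phi(x_n)^T \phi(x_m)$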
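The step from a prior on parameters to a prior on function values can be checked numerically. The sketch below assumes a zero-mean Gaussian prior on w with precision `alpha` and a hypothetical set of Gaussian basis functions; it compares the analytic covariance of $y = \Phi w$ with a Monte Carlo estimate.

```python
import numpy as np

def gaussian_basis(x, centers, s=0.5):
    """Design matrix Phi with Gaussian basis functions (a hypothetical choice)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2.0 * s ** 2))

alpha = 2.0                                # assumed prior precision of w
x = np.linspace(-1.0, 1.0, 5)
Phi = gaussian_basis(x, centers=np.linspace(-1.0, 1.0, 4))

# Analytic covariance of y = Phi w when w ~ N(0, alpha^{-1} I)
K = Phi @ Phi.T / alpha

# Monte Carlo check: draw many w, push them through Phi, estimate cov[y].
rng = np.random.default_rng(0)
W = rng.standard_normal((200_000, Phi.shape[1])) / np.sqrt(alpha)
Y = W @ Phi.T                              # each row is one sampled function
K_empirical = Y.T @ Y / len(Y)
print(np.max(np.abs(K_empirical - K)))     # small sampling error expected
```

Because y is a linear function of a Gaussian vector, it is itself Gaussian, so the mean (zero) and the covariance K specify the prior over function values completely.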
Stochastic Process
A stochastic process is specified by giving the joint distribution for any finite set of values in a consistent manner. (Loosely speaking, consistency means that marginalizing a joint distribution over some of its variables gives the same distribution as the one defined directly on the remaining variables.)
Gaussian Processes
The joint distribution of any finite set of function values $y(x_1), \dots, y(x_N)$ is a multivariate Gaussian distribution.
Without any prior knowledge, we often set the mean to 0. Then the GP is specified by the covariance: $E[y(x_n) y(x_m)] = k(x_n, x_m)$
Impact of Kernel Function
Covariance matrix: determined by the kernel function
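To see how the kernel shapes the prior, one can draw sample functions from zero-mean GPs with different covariance functions. The sketch below is an illustration under assumptions: the two kernels (Gaussian and exponential), their parameter values, and the `jitter` term added for numerical stability are choices made here, not taken from the slides.

```python
import numpy as np

def gaussian_kernel(x, length_scale=0.3):
    """k(x, x') = exp(-(x - x')^2 / (2 l^2)); gives smooth samples."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2.0 * length_scale ** 2))

def exponential_kernel(x, theta=1.0):
    """k(x, x') = exp(-theta |x - x'|); gives rough samples."""
    return np.exp(-theta * np.abs(x[:, None] - x[None, :]))

def sample_gp_prior(K, n_samples, rng, jitter=1e-6):
    """Draw samples y ~ N(0, K) using a Cholesky factor of K + jitter I."""
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))
    return (L @ rng.standard_normal((len(K), n_samples))).T

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
smooth = sample_gp_prior(gaussian_kernel(x), 3, rng)
rough = sample_gp_prior(exponential_kernel(x), 3, rng)
```

Each row of `smooth` and `rough` is one function evaluated on the grid `x`; plotting them side by side shows that the choice of kernel controls properties such as smoothness.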
Application: economics and finance
Gaussian Process for Regression
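Likelihood: $p(t \mid y) = \mathcal{N}(t \mid y, \beta^{-1} I_N)$
Prior: $p(y) = \mathcal{N}(y \mid 0, K)$
Marginal distribution: $p(t) = \int p(t \mid y)\, p(y)\, dy = \mathcal{N}(t \mid 0, C_N)$, where $C_N = K + \beta^{-1} I_N$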
Samples of GP Prior over Functions
Samples of Data Points
Predictive Distribution
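$p(t_{N+1} \mid t)$ is a Gaussian distribution with mean and variance: $m(x_{N+1}) = k^T C_N^{-1} t$ and $\sigma^2(x_{N+1}) = c - k^T C_N^{-1} k$, where $C_N = K + \beta^{-1} I_N$, $\beta^{-1}$ is the noise variance, $k$ has elements $k(x_n, x_{N+1})$, and $c = k(x_{N+1}, x_{N+1}) + \beta^{-1}$
Predictive Mean
$m(x_{N+1}) = \sum_{n=1}^{N} a_n k(x_n, x_{N+1})$, where $a = C_N^{-1} t$
We see the same form as kernel ridge regression and kernel PCA.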
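The predictive mean and variance can be sketched in a few lines, together with a check that the GP mean coincides with kernel ridge regression when the regularizer equals the noise variance. The toy sine data, the Gaussian kernel, `beta=25.0`, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=1.0):
    """Kernel matrix between 1-D input arrays a and b."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * length_scale ** 2))

def gp_predict(x_train, t_train, x_new, beta=25.0, length_scale=1.0):
    """Predictive mean and variance of t at x_new for a zero-mean GP."""
    K = gaussian_kernel(x_train, x_train, length_scale)
    C_N = K + np.eye(len(x_train)) / beta               # C_N = K + beta^{-1} I
    k = gaussian_kernel(x_train, x_new, length_scale)   # N x M
    c = 1.0 + 1.0 / beta                                # k(x, x) = 1 for this kernel
    mean = k.T @ np.linalg.solve(C_N, t_train)          # k^T C_N^{-1} t
    var = c - np.sum(k * np.linalg.solve(C_N, k), axis=0)  # c - k^T C_N^{-1} k
    return mean, var

# Toy data (an assumption): noisy sine with noise std 0.2, i.e. beta = 25.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 2.0 * np.pi, 15)
t_train = np.sin(x_train) + 0.2 * rng.standard_normal(15)
x_new = np.array([np.pi / 2.0, 20.0])   # one point near the data, one far away
mean, var = gp_predict(x_train, t_train, x_new)

# Kernel ridge regression with lambda = 1 / beta gives the same mean.
lam = 1.0 / 25.0
a = np.linalg.solve(gaussian_kernel(x_train, x_train) + lam * np.eye(15), t_train)
krr_mean = gaussian_kernel(x_new, x_train) @ a
```

Unlike kernel ridge regression, the GP also returns a predictive variance: it is small near the training inputs and reverts to the prior value `c` far from them, where the mean also falls back to zero.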
GP Regression
Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?
Marginal Distribution of Target Values