
Page 1: A brief introduction to Gaussian process

A Brief Introduction to Gaussian Process

Eric Xihui Lin

December 19, 2014


Page 2: A brief introduction to Gaussian process

Stochastic Process

For a finite number of indices t: t_1, ..., t_k, the vector

(X_{t_1}, ..., X_{t_k})

follows a multivariate distribution.

A stochastic process is the generalization to infinitely many dimensions.

In one word, a stochastic process is a random function: X_t.

It can be used as a prior distribution over a function (explained later).


Page 3: A brief introduction to Gaussian process

Gaussian Process (GP)

Gaussian Process: X_t is a GP if, for any t_1, ..., t_k, the vector (X_{t_1}, ..., X_{t_k}) is Gaussian distributed.

A GP is completely defined by a mean function µ(t) and a covariance/kernel function K(s, t) := Cov(X_s, X_t), i.e.,

X_t ∼ GP(µ(·), K(·, ·)).

Usually µ(t) ≡ 0, and

K(s, t) = exp(−(θ/2) ||s − t||^2) or exp(−θ ||s − t||).

In practice, t takes only finitely many values, so the GP reduces to a multivariate normal distribution.
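Because a GP restricted to finitely many points is just a multivariate normal, sample functions can be drawn from the prior directly. A minimal sketch in base R, where the grid, θ = 10, and the jitter term are illustrative choices rather than anything from the slides:

# Sample functions from a zero-mean GP prior with an RBF kernel
# K(s, t) = exp(-(theta/2) * (s - t)^2), evaluated on a finite grid.
rbf_kernel <- function(s, t, theta = 10) {
  exp(-(theta / 2) * outer(s, t, function(a, b) (a - b)^2))
}

t_grid <- seq(0, 1, length.out = 100)
K <- rbf_kernel(t_grid, t_grid)

set.seed(1)
R <- chol(K + 1e-6 * diag(length(t_grid)))   # small jitter for numerical stability
samples <- t(R) %*% matrix(rnorm(length(t_grid) * 3), ncol = 3)   # 3 draws from N(0, K)

matplot(t_grid, samples, type = "l", lty = 1, xlab = "t", ylab = "X_t")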


Page 4: A brief introduction to Gaussian process

Covariance function


Page 5: A brief introduction to Gaussian process

GP as Linear Regression

Mapping φ : R^n → R^m, where usually m > n.

Bayesian linear regression on R^m:

y = β^T φ(x), given β, with prior β ∼ N(0, α^{-1} I).

Then E(y) = 0 and Cov(y) = (1/α) Φ Φ^T =: K, where Φ is the design matrix with rows φ(x_i)^T; this covariance can be specified directly by the kernel function.
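As a quick numerical check of this identity (not part of the slides): drawing many β from N(0, α^{-1} I) with an illustrative polynomial feature map φ, the sample covariance of y = Φβ approaches Φ Φ^T / α.

# Empirical check that random-weight linear regression induces a GP covariance.
set.seed(2)
alpha <- 2
x <- seq(-1, 1, length.out = 5)
Phi <- cbind(1, x, x^2, x^3)             # illustrative feature map phi(x) = (1, x, x^2, x^3)

n_draws <- 100000
Beta <- matrix(rnorm(ncol(Phi) * n_draws, sd = sqrt(1 / alpha)), nrow = ncol(Phi))
Y <- Phi %*% Beta                        # each column is one draw of y = Phi %*% beta

empirical_K <- cov(t(Y))                 # sample covariance across draws
theoretical_K <- Phi %*% t(Phi) / alpha
max(abs(empirical_K - theoretical_K))    # should be close to 0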


Page 6: A brief introduction to Gaussian process

Prediction

Given observations (x_1, y_1), ..., (x_N, y_N) and a new input x_0, since y ∼ GP,

(y_1, ..., y_N, y_0) ∼ N( 0, [ C_N  k ; k^T  c ] ),

and therefore

y_0 | y_1, ..., y_N ∼ N( k^T C_N^{-1} (y_1, ..., y_N)^T, c − k^T C_N^{-1} k ).

Here C_N is the N × N covariance matrix of the observed points, k is the vector of covariances between x_0 and the observed points, and c = K(x_0, x_0).
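A minimal sketch of these formulas in R; the kernel, θ, and the toy data are illustrative assumptions, not from the slides.

# Noise-free GP prediction at a new point x0, following the formulas above.
rbf <- function(a, b, theta = 10) exp(-(theta / 2) * (a - b)^2)

x <- c(0.1, 0.4, 0.7, 0.9)                      # observed inputs
y <- sin(2 * pi * x)                            # observed responses (illustrative)
x0 <- 0.5                                       # new input

C_N <- outer(x, x, rbf)                         # covariance matrix of the observations
k <- rbf(x, x0)                                 # covariances between x0 and the observations
c0 <- rbf(x0, x0)                               # prior variance at x0

post_mean <- drop(t(k) %*% solve(C_N, y))       # k^T C_N^{-1} y
post_var  <- c0 - drop(t(k) %*% solve(C_N, k))  # c - k^T C_N^{-1} k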


Page 7: A brief introduction to Gaussian process

GP: Example

[Figure: Gaussian process example; picture comes from scikit-learn.]

Page 8: A brief introduction to Gaussian process

Gaussian Process Regression

Assume Gaussian noise, y = f + ε_n, i.e.,

y | f ∼ N(f, σ^2).

Assign a Gaussian process prior to f, i.e.,

f ∼ GP(0, k(·, ·; θ)).

Classification can be handled through a link function.

Usually θ is specified in advance, but it can also be estimated by maximum likelihood.
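To make the last point concrete, the log marginal likelihood of the noisy model is that of N(y | 0, K_θ + σ^2 I) and can be maximized numerically; a rough sketch with optim(), where the kernel form, toy data, and starting values are illustrative assumptions:

# Estimate (theta, sigma^2) by minimizing the negative Gaussian log marginal
# likelihood -log N(y | 0, K_theta + sigma^2 I).
set.seed(3)
x <- seq(0, 1, length.out = 20)
y <- sin(2 * pi * x) + rnorm(length(x), sd = 0.1)   # illustrative noisy data

neg_log_marglik <- function(par, x, y) {
  theta <- exp(par[1]); sigma2 <- exp(par[2])       # log scale keeps both positive
  K <- exp(-(theta / 2) * outer(x, x, function(a, b) (a - b)^2))
  C <- K + sigma2 * diag(length(x))
  0.5 * (drop(t(y) %*% solve(C, y)) +
         as.numeric(determinant(C, logarithm = TRUE)$modulus) +
         length(y) * log(2 * pi))
}

fit <- optim(c(log(10), log(0.01)), neg_log_marglik, x = x, y = y)
exp(fit$par)                                        # estimated theta and sigma^2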


Page 9: A brief introduction to Gaussian process

GP Regression: Example


Page 10: A brief introduction to Gaussian process

GP Regression in R

library(kernlab)

gp.f <- gausspr(y ~ x, data = DATA,
                type = "regression",        # default depends on the type of y
                scaled = TRUE,              # default: TRUE
                kernel = "rbfdot",
                kpar = list(sigma = 0.1),
                var = 0.001,                # default noise variance
                variance.model = FALSE)
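A possible follow-up for prediction, assuming the DATA placeholder above and a hypothetical NEWDATA data frame; the type = "sdeviation" option is from memory of kernlab's interface and worth double-checking:

# Predictions at new inputs (NEWDATA is a hypothetical data frame with a column x).
y.hat <- predict(gp.f, NEWDATA)

# If gausspr() was called with variance.model = TRUE, predictive standard
# deviations should also be available:
# y.sd <- predict(gp.f, NEWDATA, type = "sdeviation")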


Page 11: A brief introduction to Gaussian process

Application in Mining

In geostatistics, this is called Kriging.

X is 2- or 3-D geographic information.

Given some observations, find the distribution of reserves of oil, gold, or other resources.


Page 12: A brief introduction to Gaussian process

Application in Optimization

Areas: geostatistics, experimental and clinical design, hyper-parameter tuning.

Problem: given an implicit function f(x) that is expensive to evaluate, find the x that maximizes f(x).

We need to avoid evaluating the function too frequently.

Steps (a minimal sketch of this loop is given below):

1. Fit a GP to the initial points.
2. Decide the next point to explore: maximize h(µ̂(x), σ̂(x)).
3. Evaluate f at the new point and update the GP.
4. Stop, or go back to step 2, based on some criterion.

The metric h is chosen to balance exploitation (high mean) and exploration (high variance, where an even better solution may still be found).
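A minimal sketch of this loop in R, using the noise-free prediction formulas from earlier and an upper-confidence-bound choice h(µ, σ) = µ + κσ; the target function, kernel, θ, κ, candidate grid, and stopping rule are all illustrative assumptions, not from the slides:

# Bayesian optimization of a black-box function f on [0, 1].
f <- function(x) -(x - 0.3)^2 + 0.05 * sin(20 * x)   # stand-in for an expensive function
rbf <- function(a, b, theta = 50) exp(-(theta / 2) * outer(a, b, function(u, v) (u - v)^2))

x_obs <- c(0.1, 0.5, 0.9)            # step 1: initial design points
y_obs <- f(x_obs)
grid <- seq(0, 1, length.out = 200)  # candidate points for the next evaluation
kappa <- 2                           # exploration/exploitation trade-off in h

for (iter in 1:10) {
  # GP posterior mean and variance on the grid (noise-free formulas).
  C <- rbf(x_obs, x_obs) + 1e-6 * diag(length(x_obs))
  K_star <- rbf(grid, x_obs)
  mu <- drop(K_star %*% solve(C, y_obs))
  s2 <- pmax(1 - rowSums(K_star * t(solve(C, t(K_star)))), 0)

  h <- mu + kappa * sqrt(s2)         # step 2: acquisition h(mu, sigma)
  x_new <- grid[which.max(h)]
  x_obs <- c(x_obs, x_new)           # step 3: evaluate f and update the GP data
  y_obs <- c(y_obs, f(x_new))
}
x_obs[which.max(y_obs)]              # best input found so far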


Page 13: A brief introduction to Gaussian process

Optimization: illustration


Page 14: A brief introduction to Gaussian process

Reference

1. Dr. Ruslan Salakhutdinov's course notes: http://www.cs.toronto.edu/~rsalakhu/sta4273_2013/

2. Brochu, E., Cora, M., and de Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. TR-2009-23, UBC, 2009.

3. Wikipedia: http://en.wikipedia.org/wiki/Kriging

4. Roustant, O., Ginsbourger, D., and Deville, Y. (2012). DiceKriging, DiceOptim: Two R Packages for the Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization. Journal of Statistical Software, Vol. 51, Issue 1.

5. Karatzoglou, A., and Smola, A. kernlab - An S4 Package for Kernel Methods in R.
