Gaussian Processes: Applications in Machine Learning
Abhishek Agarwal (05329022)
Under the Guidance of Prof. Sunita Sarawagi
KReSIT, IIT Bombay
Seminar Presentation, March 29, 2006
Outline
Introduction to Gaussian Processes (GP)
Prior & Posterior Distributions
GP Models: Regression
GP Models: Binary Classification
Covariance Functions
Conclusion
Introduction
Supervised Learning
Gaussian Processes
Defines a distribution over functions: a collection of random variables, any finite number of which have a joint Gaussian distribution. [1] [2]
f ∼ GP(m, k)
Hyperparameters and covariance function
Predictions
Prior Distribution
Represents our belief about the distribution over functions, which we express through its parameters (the mean and covariance functions).
Example: GP(m, k)
m(x) = ¼ x², k(x, x′) = exp(−½ (x − x′)²)
To draw a sample from the distribution (a short sketch follows):
Pick some data points.
Find the distribution parameters at each point: µi = m(xi) and Σij = k(xi, xj), for i, j = 1, . . . , n.
Draw the function values from the resulting joint Gaussian.
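A minimal NumPy sketch of this sampling procedure (not part of the original slides), using the example mean m(x) = ¼x² and the squared-exponential k from above; the grid of inputs, the jitter term, and the random seed are illustrative choices:

```python
import numpy as np

# Mean and covariance from the running example:
# m(x) = x^2 / 4, k(x, x') = exp(-(x - x')^2 / 2)
def m(x):
    return 0.25 * x**2

def k(x1, x2):
    return np.exp(-0.5 * (x1 - x2)**2)

# Pick some data points.
xs = np.linspace(-5, 5, 50)

# Find the distribution parameters at each point.
mu = m(xs)                           # mu_i = m(x_i)
Sigma = k(xs[:, None], xs[None, :])  # Sigma_ij = k(x_i, x_j)

# Draw the function values from the joint Gaussian N(mu, Sigma).
# A small jitter keeps the covariance matrix numerically positive definite.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mu, Sigma + 1e-8 * np.eye(len(xs)))
print(sample[:5])
```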
Prior Distribution (contd.)
Figure: Prior distribution over functions using a Gaussian Process (function values plotted against data points).
Posterior Distribution
The distribution changes in the presence of training data D = (X, y).
Functions which satisfy D are given higher probability.
Figure: Posterior distribution over functions using Gaussian Processes (function values plotted against data points).
Posterior Distribution (contd.)
Prediction for unlabeled data x∗:
The GP outputs the function distribution at x∗.
Let f be the function values at the data points in D and f∗ those at x∗.
f and f∗ have a joint Gaussian distribution, represented as:

[f, f∗]ᵀ ∼ N( [µ, µ∗]ᵀ, [[Σ, Σ∗], [Σ∗ᵀ, Σ∗∗]] )

The conditional distribution of f∗ given f can be expressed as (a sketch follows):

f∗ | f ∼ N( µ∗ + Σ∗ᵀ Σ⁻¹ (f − µ), Σ∗∗ − Σ∗ᵀ Σ⁻¹ Σ∗ )   (1)
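A small NumPy sketch of Eq. 1 (not from the original slides), computing the mean and covariance of f∗ | f; the mean and covariance functions are the ones from the prior-distribution example, and the training points and observed values are made up for illustration:

```python
import numpy as np

def m(x):
    return 0.25 * x**2                      # m(x) = x^2 / 4

def k(x1, x2):
    return np.exp(-0.5 * (x1 - x2)**2)      # squared-exponential covariance

def conditional(X, f, X_star):
    """Mean and covariance of f* | f as in Eq. (1)."""
    Sigma = k(X[:, None], X[None, :]) + 1e-8 * np.eye(len(X))   # Sigma
    Sigma_s = k(X[:, None], X_star[None, :])                    # Sigma*
    Sigma_ss = k(X_star[:, None], X_star[None, :])              # Sigma**
    alpha = np.linalg.solve(Sigma, f - m(X))                    # Sigma^-1 (f - mu)
    mean = m(X_star) + Sigma_s.T @ alpha
    cov = Sigma_ss - Sigma_s.T @ np.linalg.solve(Sigma, Sigma_s)
    return mean, cov

X = np.array([-4.0, -1.0, 2.0])     # training inputs (illustrative)
f = np.array([4.1, 0.4, 0.9])       # observed noise-free function values
mean, cov = conditional(X, f, np.array([0.0, 3.0]))
print(mean, np.diag(cov))
```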
Posterior Distribution (contd.)
Parameters of the posterior in Eq. 1 are:
f∗ | D ∼ GP(mD, kD), where
mD(x) = m(x) + Σ(X, x)ᵀ Σ⁻¹ (f − m)
kD(x, x′) = k(x, x′) − Σ(X, x)ᵀ Σ⁻¹ Σ(X, x′)
Figure: Prediction from GP (function values plotted against data points).
GP Models: Regression
GP can be applied directly to a Bayesian linear regression model:
f(x) = φ(x)ᵀw, with prior w ∼ N(0, Σp)
The parameters of this distribution are:
E[f(x)] = φ(x)ᵀ E[w] = 0
E[f(x) f(x′)] = φ(x)ᵀ E[wwᵀ] φ(x′) = φ(x)ᵀ Σp φ(x′)
So f(x) and f(x′) are jointly Gaussian with zero mean and covariance φ(x)ᵀΣpφ(x′) (see the sketch below).
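To make this weight-space view concrete, a small sketch (not from the slides) that checks the identity E[f(x)f(x′)] = φ(x)ᵀΣpφ(x′) by Monte Carlo; the feature map φ(x) = [1, x, x²] and the diagonal Σp are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Illustrative feature map phi(x) = [1, x, x^2].
    return np.array([1.0, x, x**2])

Sigma_p = np.diag([1.0, 0.5, 0.1])   # assumed prior covariance of the weights

x, x_prime = 0.7, -1.3
analytic = phi(x) @ Sigma_p @ phi(x_prime)        # phi(x)^T Sigma_p phi(x')

# Monte-Carlo check: sample w ~ N(0, Sigma_p) and estimate E[f(x) f(x')].
w = rng.multivariate_normal(np.zeros(3), Sigma_p, size=200_000)
empirical = np.mean((w @ phi(x)) * (w @ phi(x_prime)))

print(analytic, empirical)   # the two values should agree closely
```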
GP Models: Regression (contd.)
In regression, the posterior distribution over the weights is given by Bayes' rule [9]:

posterior = (likelihood × prior) / marginal likelihood

Both the prior p(f | X) and the likelihood p(y | f, X) are Gaussian:
prior: f | X ∼ N(0, K)   (see Eq. 5)
likelihood: y | f ∼ N(f, σn²I)
The marginal likelihood p(y | X) (see Eq. 6) is defined as:

p(y | X) = ∫ p(y | f, X) p(f | X) df   (2)
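A compact sketch of GP regression under these Gaussian assumptions (not from the original slides), using a Cholesky factorisation of K + σn²I to obtain both the predictive distribution and the log marginal likelihood of Eq. 6; the kernel, noise level, and data are illustrative:

```python
import numpy as np

def se_kernel(X1, X2):
    # Squared-exponential covariance on 1-D inputs.
    return np.exp(-0.5 * (X1[:, None] - X2[None, :])**2)

def gp_regression(X, y, X_star, sigma_n=0.1):
    """Zero-mean GP regression with Gaussian noise: predictive mean/covariance
    at X_star and the log marginal likelihood log p(y | X) (cf. Eq. 6)."""
    n = len(X)
    K = se_kernel(X, X)
    Ky = K + sigma_n**2 * np.eye(n)                       # K + sigma_n^2 I
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sigma_n^2 I)^-1 y
    K_s = se_kernel(X, X_star)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = se_kernel(X_star, X_star) - v.T @ v
    log_ml = (-0.5 * y @ alpha
              - np.sum(np.log(np.diag(L)))                # -1/2 log|K + sigma_n^2 I|
              - 0.5 * n * np.log(2 * np.pi))
    return mean, cov, log_ml

X = np.array([-3.0, -1.0, 0.0, 2.0])       # illustrative training data
y = np.array([-0.5, 0.3, 0.8, 0.1])
mean, cov, log_ml = gp_regression(X, y, np.linspace(-5, 5, 11))
print(log_ml)
```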
GP Models: Classification
Modeling a binary classifier:
Squash the output of a regression model using a response function, such as the sigmoid.
Example: the linear logistic regression model
p(C1 | x) = λ(xᵀw), with λ(z) = 1 / (1 + exp(−z))
The likelihood is expressed as (see Eq. 7):
p(yi | xi, w) = σ(yi fi), where fi = f(xi) = xiᵀw
and is therefore non-Gaussian (a small sketch follows).
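A tiny sketch of this squashing step (not from the slides), assuming labels yi ∈ {−1, +1}; the input and weight values are made up:

```python
import numpy as np

def sigmoid(z):
    # Logistic response function lambda(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(y, f):
    # p(y_i | x_i, w) = sigma(y_i * f_i) for labels y_i in {-1, +1},
    # using the symmetry sigma(-z) = 1 - sigma(z).
    return sigmoid(y * f)

x = np.array([1.0, -2.0])     # one input with two features (illustrative)
w = np.array([0.5, 0.3])      # illustrative weight vector
f = x @ w                     # latent value f = x^T w
print(likelihood(+1, f), likelihood(-1, f))   # the two probabilities sum to 1
```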
GP Models: Classification (contd.)
The distribution over the latent function at a test point, given the training data, is:

p(f∗ | X, y, x∗) = ∫ p(f∗ | X, x∗, f) p(f | X, y) df   (3)

where p(f | X, y) = p(y | f) p(f | X) / p(y | X) is the posterior over the latent variables.
Computing the above integral is analytically intractable:
Both the likelihood and the posterior are non-Gaussian.
We need some analytic approximation of the integrals.
GP Models: Laplace Approximations
Gaussian approximation of p(f | X, y):
Using a second-order Taylor expansion, we obtain

q(f | X, y) = N(f | f̂, A⁻¹)

where f̂ = argmax_f p(f | X, y) and A = −∇∇ log p(f | X, y)|_{f = f̂}.
To find f̂ we use Newton's method (sketched below), because ∇ log p(f | X, y) is non-linear in f. [9]
The prediction is given as:

π∗ = p(y∗ = +1 | X, y, x∗) = ∫ σ(f∗) p(f∗ | X, y, x∗) df∗   (4)
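The Newton iteration for the mode f̂ can be sketched as follows for a logistic likelihood with labels in {−1, +1}; this is a simplified, unoptimised version in the spirit of the mode-finding algorithm in [1], with a fixed iteration count and illustrative data, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_mode(K, y, n_iter=20):
    """Newton iteration for f_hat = argmax_f p(f | X, y) with a logistic
    likelihood and labels y in {-1, +1}."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        t = (y + 1) / 2.0
        grad = t - pi                        # gradient of log p(y | f)
        W = np.diag(pi * (1.0 - pi))         # W = -Hessian of log p(y | f)
        b = W @ f + grad
        # Newton update: f <- (K^-1 + W)^-1 b = K (I + W K)^-1 b
        f = K @ np.linalg.solve(np.eye(n) + W @ K, b)
    cov = K @ np.linalg.inv(np.eye(n) + W @ K)   # A^-1 = (K^-1 + W)^-1
    return f, cov

# Illustrative 4-point problem with a squared-exponential covariance.
xs = np.arange(4.0)
K = np.exp(-0.5 * (xs[:, None] - xs[None, :])**2) + 1e-8 * np.eye(4)
y = np.array([1.0, 1.0, -1.0, -1.0])
f_hat, cov = laplace_mode(K, y)
print(f_hat)
```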
Covariance Function
Encodes our belief about the prior distribution over functions.
Some properties:
Stationary
Isotropic
Dot-product covariance
Example: the squared exponential (SE) covariance function (sketched below):

cov(f(xp), f(xq)) = exp(−½ |xp − xq|²)

Its parameters are learned along with the other hyperparameters.
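A sketch of the SE covariance with explicit length-scale and signal-variance hyperparameters (assumed parameter names; with both set to 1 it reduces to the form on this slide):

```python
import numpy as np

def se_cov(xp, xq, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two inputs; length_scale and
    signal_var are the hyperparameters that would be learned from data."""
    sq_dist = np.sum((np.atleast_1d(xp) - np.atleast_1d(xq))**2)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

print(se_cov(0.0, 1.0))                      # the slide's version
print(se_cov(0.0, 1.0, length_scale=3.0))    # a smoother prior over functions
```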
Summary and Future Work
Current research:
Fast sparse approximation algorithms for matrix inversion.
Approximation algorithms for non-Gaussian likelihoods.
The GP approach has outperformed traditional methods in many applications:
Gaussian Process based Positioning System (GPPS) [6]
Multiuser Detection (MUD) in CDMA [7]
GP models are more powerful and flexible than simple linear parametric models, and less complex than other models such as multi-layer perceptrons. [1]
[1] Rasmussen and Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.
[2] Matthias Seeger. Gaussian Processes for Machine Learning. International Journal of Neural Systems, 14(2):69-106, 2004.
[3] Christopher Williams. Bayesian Classification with Gaussian Processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
[4] Rasmussen and Williams. Gaussian Processes for Regression. In Proceedings of NIPS, 1996.
[5] Rasmussen. Evaluation of Gaussian Processes and Other Methods for Non-linear Regression. PhD thesis, Dept. of Computer Science, University of Toronto, 1996. Available from http://www.cs.utoronto.ca/~carl/
[6] Anton Schwaighofer et al. GPPS: A Gaussian Process Positioning System for Cellular Networks. In Proceedings of NIPS, 2003.
[7] Murillo-Fuentes et al. Gaussian Processes for Multiuser Detection in CDMA Receivers. Advances in Neural Information Processing Systems, 2005.
[8] David MacKay. Introduction to Gaussian Processes.
[9] C. Williams. Gaussian Processes. In M. A. Arbib, editor, Handbook of Brain Theory and Neural Networks, pages 466-470. The MIT Press, second edition, 2002.
Thank You !!
Questions ??
Extra
Prior:

log p(f | X) = −½ fᵀK⁻¹f − ½ log |K| − (n/2) log 2π   (5)

Marginal likelihood:

log p(y | X) = −½ yᵀ(K + σn²I)⁻¹y − ½ log |K + σn²I| − (n/2) log 2π   (6)
Likelihood:

p(y = +1 | x, w) = σ(xᵀw)   (7)

For a symmetric likelihood, σ(−z) = 1 − σ(z), so

p(yi | xi, w) = σ(yi xiᵀw)   (8)
Extra (contd.)
Setting the first derivative of the log posterior to zero gives the self-consistent equation

f̂ = K ∇ log p(y | f̂)

Posterior over the weights (used for prediction):

p(w | y, X) = p(y | X, w) p(w) / p(y | X)