A practical introduction to Gaussian Processes for astronomy
Dan Foreman-Mackey Flatiron Institute // dfm.io // github.com/dfm // @exoplaneteer
Resources
a gaussianprocess.org/gpml
b george.readthedocs.io
c dfm.io/gp.js
d github.com/dfm/gp
1 Gaussian Processes
[Figure: the number of published astronomy papers containing the words "gaussian process" (screenshot from ui.adsabs.harvard.edu)]
Photo by Adi Goldstein on Unsplash
Don't you think you should be using Gaussian Processes?
After today, you will be.
Cool.
Today
1 Why?
2 What?
3 How?
2 The importance of correlated noise: a motivating example
[Figure: simulated data, y vs. x]
[Figure: the same data with the truth overplotted]
[Figure: the fit assuming independent noise]
[Figure: the fit assuming independent noise (BOMBCAT.GIF)]
What happened?
[Figure: the data, y vs. x]
[Figure: the true covariance matrix, indexed by data point n and data point m]
$$\log p(\{y_n\} \,|\, \theta) = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{(y_n - m_n)^2}{\sigma_n^2} + \log(2\pi\,\sigma_n^2) \right]$$

or, in matrix form,

$$\log p(\{y_n\} \,|\, \theta) = -\frac{1}{2}\, r^\mathrm{T} C^{-1}\, r - \frac{1}{2} \log\det C - \frac{N}{2} \log(2\pi)$$

where

$$r = \begin{pmatrix} y_1 - m_1 \\ \vdots \\ y_N - m_N \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_N^2 \end{pmatrix}$$

if...
* Note: this part has everything to do with Gaussian processes
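As a quick numerical check of the two expressions above (a sketch, not from the talk): for a diagonal C, the matrix form reduces to the familiar independent-noise sum.

import numpy as np

# Toy numbers, purely to check the algebra (assumed for illustration)
np.random.seed(0)
N = 5
y = np.random.randn(N)                    # data
m = np.zeros(N)                           # mean model
var = np.random.uniform(0.5, 2.0, N)      # per-point variances sigma_n^2

r = y - m
C = np.diag(var)

# Matrix form: -1/2 [r^T C^{-1} r + log det C + N log(2 pi)]
ll_matrix = -0.5 * (r @ np.linalg.solve(C, r)
                    + np.linalg.slogdet(C)[1]
                    + N * np.log(2 * np.pi))

# Sum form: -1/2 sum_n [(y_n - m_n)^2 / sigma_n^2 + log(2 pi sigma_n^2)]
ll_sum = -0.5 * np.sum(r**2 / var + np.log(2 * np.pi * var))

assert np.allclose(ll_matrix, ll_sum)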
[Figure: the same data, now fit with a Gaussian process]
👍 Excellent.
But: we don't know C.*
* We need to fit for it.
3 The math of Gaussian processes
Rasmussen & Williams gaussianprocess.org/gpml
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$

where $\theta$ are the parameters of the mean model, $\alpha$ are the parameters of the covariance model, $r_\theta$ is the residual vector, and $K_\alpha$ is the covariance matrix.
This is the equation for an N-dimensional Gaussian* (* hint: this is where the name comes from…). The generative model is

$$y \sim \mathcal{N}(m_\theta,\, K_\alpha)$$

with mean model $m_\theta$ and covariance matrix $K_\alpha$.
$$[K_\alpha]_{nm} = \sigma_n^2\,\delta_{nm} + k_\alpha(x_n, x_m)$$

where $n$ and $m$ index the pixels (data points) and $k_\alpha$ is the "kernel" function.
For example*:

$$k_\alpha(x_n, x_m) = a^2 \exp\left(-\frac{(x_n - x_m)^2}{2\,\tau^2}\right)$$

* Note: this part has nothing to do with the name Gaussian process
[Figures: samples from the GP (top panels, vs. x) and the kernel function (bottom panels, vs. Δx) for several kernel choices, illustrating the effect of the scale τ and the amplitude a]
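Samples like the ones in these figures can be drawn directly from the GP prior with a few lines of numpy; a minimal sketch, assuming the squared-exponential kernel above:

import numpy as np

def sample_gp_prior(x, amp, scale, n_samples=3):
    # Kernel matrix for the squared-exponential kernel
    K = amp**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / scale**2)
    # Tiny jitter on the diagonal for numerical stability
    K[np.diag_indices_from(K)] += 1e-10
    # Draw from the zero-mean multivariate Gaussian N(0, K)
    return np.random.multivariate_normal(np.zeros(len(x)), K, size=n_samples)

x = np.linspace(0, 10, 200)
samples = sample_gp_prior(x, amp=1.0, scale=1.0)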
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$

a drop-in replacement for $\chi^2$
import numpy as np

def gp_log_like(params, x, y, yerr):
    # Squared-exponential kernel: params[0] is the amplitude a,
    # params[1] is the scale tau
    K = params[0]**2 * np.exp(-0.5*(x[:, None] - x[None, :])**2/params[1]**2)
    # Add the measurement variances along the diagonal
    K[np.diag_indices_from(K)] += yerr**2
    # r^T K^{-1} r term (here the mean model is zero, so r = y) ...
    ll = np.dot(y, np.linalg.solve(K, y))
    # ... plus the log-determinant term; the constant N log(2 pi) is dropped
    ll += np.linalg.slogdet(K)[1]
    return -0.5*ll
A fully functional GP implementation in Python
+ scipy.optimize or emcee
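For instance, a maximum-likelihood fit of the two kernel parameters might look like this (a sketch with assumed simulated data, using gp_log_like from above):

import numpy as np
from scipy.optimize import minimize

# Simulated data (assumed, for illustration only)
np.random.seed(42)
x = np.sort(np.random.uniform(-5, 5, 50))
yerr = 0.1 * np.ones_like(x)
y = 0.5 * np.sin(x) + yerr * np.random.randn(len(x))

# Maximize the GP likelihood over (amplitude, scale)
result = minimize(lambda p: -gp_log_like(p, x, y, yerr),
                  x0=[0.5, 1.0], method="L-BFGS-B",
                  bounds=[(1e-3, 10.0), (1e-3, 10.0)])
print(result.x)  # maximum-likelihood amplitude and scale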
Using GPs in Python
1 george
2 scikit-learn
3 GPy
4 PyMC3
5 etc.
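As a concrete example, the same squared-exponential model in george might look like this (a sketch with assumed toy data; see george.readthedocs.io for the actual documentation):

import numpy as np
import george
from george import kernels

# Assumed toy data
np.random.seed(42)
x = np.sort(np.random.uniform(-5, 5, 50))
yerr = 0.1 * np.ones_like(x)
y = np.sin(x) + yerr * np.random.randn(len(x))

# Squared-exponential kernel: amplitude^2 * ExpSquaredKernel(tau^2)
kernel = 1.0**2 * kernels.ExpSquaredKernel(1.0**2)
gp = george.GP(kernel)

# Factorize the covariance matrix for the observed coordinates
gp.compute(x, yerr)

# The drop-in replacement for chi^2
print(gp.log_likelihood(y))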
4 Why? The problems with Gaussian processes
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$

a Choosing the kernel
b Scaling to large datasets
[Figure: computational cost [s] vs. number of data points for the direct GP solve; the measured cost ("experiment") scales as O(N³)]
a Approximation
b Structure
Approximation methods
i Subsample
ii Sparsity
iii Low-rank approximations (see HODLR solver in george)
iv etc.
Structured models
i Kronecker products
ii Evenly sampled data
iii Semi-separable kernels (see celerite)
iv etc.
celerite.readthedocs.io
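To give a flavor, a celerite model might look roughly like this (a sketch with assumed toy data; the SHOTerm is a stochastically driven, damped harmonic oscillator):

import numpy as np
import celerite
from celerite import terms

# Assumed toy data: an irregularly sampled time series
np.random.seed(42)
t = np.sort(np.random.uniform(0, 10, 5000))
yerr = 0.1 * np.ones_like(t)
y = np.sin(3.0 * t) + yerr * np.random.randn(len(t))

# One semi-separable kernel term (J = 1)
kernel = terms.SHOTerm(log_S0=0.0, log_Q=0.0, log_omega0=np.log(3.0))
gp = celerite.GP(kernel)

# O(N) factorization and O(N) log-likelihood evaluation
gp.compute(t, yerr)
print(gp.log_likelihood(y))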
[Figures: celerite scaling. Left: computational cost [seconds] vs. number of data points N for J = 1 to 256 kernel terms, compared to the direct solver; celerite scales as O(N). Right: cost vs. number of terms J for N = 64 to 262144; the cost scales as O(J²)]
5 Recap
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$
if there is correlated noise (instrumental or astrophysical) in your data*, try a Gaussian process:
* Hint: there is!
to do this in Python, try:
import george

george.readthedocs.io
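Once the covariance is computed, the conditional prediction (the smooth GP fits shown earlier) is a single call; a sketch, reusing gp, x, and y from the george example above and assuming a recent version of george:

# Conditional mean and variance of the GP on a fine grid, given y
x_pred = np.linspace(-5, 5, 500)
mu, var = gp.predict(y, x_pred, return_var=True)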