A practical introduction to Gaussian Processes for astronomy
Dan Foreman-Mackey Flatiron Institute // dfm.io // github.com/dfm // @exoplaneteer
Resources
a gaussianprocess.org/gpml
b george.readthedocs.io
c dfm.io/gp.js
d github.com/dfm/gp
1 Gaussian Processes
[Figure: the number of published astronomy papers containing the words "gaussian process" (screenshot from ui.adsabs.harvard.edu)]
Photo by Adi Goldstein on Unsplash
Don't you think you should be using Gaussian Processes?
After today, you will be.
Cool.
Today
1 Why?
2 What?
3 How?
2 The importance of correlated noise: a motivating example
[Figure: simulated data, y vs. x]
[Figure: the same data with the truth overplotted]
[Figure: the fit assuming independent noise]
[Figure: the fit assuming independent noise (BOMBCAT.GIF)]
What happened?
[Figure: the data, y vs. x]
[Figure: the true covariance matrix, indexed by data point n and data point m]
$$\log p(\{y_n\} \,|\, \theta) = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{(y_n - m_n)^2}{\sigma_n^2} + \log(2\pi\,\sigma_n^2) \right]$$

or, in matrix form,

$$\log p(\{y_n\} \,|\, \theta) = -\frac{1}{2}\, r^\mathrm{T} C^{-1}\, r - \frac{1}{2} \log\det C - \frac{N}{2} \log(2\pi)$$

where

$$r = \begin{pmatrix} y_1 - m_1 \\ \vdots \\ y_N - m_N \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_N^2 \end{pmatrix}$$

if...
* Note: this part has everything to do with Gaussian processes
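As a quick numerical check of the two expressions above (a sketch, not from the talk): for a diagonal C, the matrix form reduces to the familiar independent-noise sum.

import numpy as np

# Toy numbers, purely to check the algebra (assumed for illustration)
np.random.seed(0)
N = 5
y = np.random.randn(N)                    # data
m = np.zeros(N)                           # mean model
var = np.random.uniform(0.5, 2.0, N)      # per-point variances sigma_n^2

r = y - m
C = np.diag(var)

# Matrix form: -1/2 [r^T C^{-1} r + log det C + N log(2 pi)]
ll_matrix = -0.5 * (r @ np.linalg.solve(C, r)
                    + np.linalg.slogdet(C)[1]
                    + N * np.log(2 * np.pi))

# Sum form: -1/2 sum_n [(y_n - m_n)^2 / sigma_n^2 + log(2 pi sigma_n^2)]
ll_sum = -0.5 * np.sum(r**2 / var + np.log(2 * np.pi * var))

assert np.allclose(ll_matrix, ll_sum)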
[Figure: the same data, now fit with a Gaussian process]
👍 Excellent.
But: we don't know C.*
* We need to fit for it.
3 The math of Gaussian processes
Rasmussen & Williams gaussianprocess.org/gpml
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$

where $\theta$ are the parameters of the mean model, $\alpha$ are the parameters of the covariance model, $r_\theta$ is the residual vector, and $K_\alpha$ is the covariance matrix.
This is the equation for an N-dimensional Gaussian* (* hint: this is where the name comes from…). The generative model is

$$y \sim \mathcal{N}(m_\theta,\, K_\alpha)$$

with mean model $m_\theta$ and covariance matrix $K_\alpha$.
$$[K_\alpha]_{nm} = \sigma_n^2\,\delta_{nm} + k_\alpha(x_n, x_m)$$

where $n$ and $m$ index the pixels (data points) and $k_\alpha$ is the "kernel" function.
For example*:

$$k_\alpha(x_n, x_m) = a^2 \exp\left(-\frac{(x_n - x_m)^2}{2\,\tau^2}\right)$$

* Note: this part has nothing to do with the name Gaussian process
[Figures: samples from the GP (top panels, vs. x) and the kernel function (bottom panels, vs. Δx) for several kernel choices, illustrating the effect of the scale τ and the amplitude a]
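Samples like the ones in these figures can be drawn directly from the GP prior with a few lines of numpy; a minimal sketch, assuming the squared-exponential kernel above:

import numpy as np

def sample_gp_prior(x, amp, scale, n_samples=3):
    # Kernel matrix for the squared-exponential kernel
    K = amp**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / scale**2)
    # Tiny jitter on the diagonal for numerical stability
    K[np.diag_indices_from(K)] += 1e-10
    # Draw from the zero-mean multivariate Gaussian N(0, K)
    return np.random.multivariate_normal(np.zeros(len(x)), K, size=n_samples)

x = np.linspace(0, 10, 200)
samples = sample_gp_prior(x, amp=1.0, scale=1.0)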
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$

a drop-in replacement for $\chi^2$
import numpy as np

def gp_log_like(params, x, y, yerr):
    # Squared-exponential kernel: params[0] is the amplitude a,
    # params[1] is the scale tau
    K = params[0]**2 * np.exp(-0.5*(x[:, None] - x[None, :])**2/params[1]**2)
    # Add the measurement variances along the diagonal
    K[np.diag_indices_from(K)] += yerr**2
    # r^T K^{-1} r term (here the mean model is zero, so r = y) ...
    ll = np.dot(y, np.linalg.solve(K, y))
    # ... plus the log-determinant term; the constant N log(2 pi) is dropped
    ll += np.linalg.slogdet(K)[1]
    return -0.5*ll
A fully functional GP implementation in Python
+ scipy.optimize or emcee
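For instance, a maximum-likelihood fit of the two kernel parameters might look like this (a sketch with assumed simulated data, using gp_log_like from above):

import numpy as np
from scipy.optimize import minimize

# Simulated data (assumed, for illustration only)
np.random.seed(42)
x = np.sort(np.random.uniform(-5, 5, 50))
yerr = 0.1 * np.ones_like(x)
y = 0.5 * np.sin(x) + yerr * np.random.randn(len(x))

# Maximize the GP likelihood over (amplitude, scale)
result = minimize(lambda p: -gp_log_like(p, x, y, yerr),
                  x0=[0.5, 1.0], method="L-BFGS-B",
                  bounds=[(1e-3, 10.0), (1e-3, 10.0)])
print(result.x)  # maximum-likelihood amplitude and scale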
Using GPs in Python
1 george
2 scikit-learn
3 GPy
4 PyMC3
5 etc.
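As a concrete example, the same squared-exponential model in george might look like this (a sketch with assumed toy data; see george.readthedocs.io for the actual documentation):

import numpy as np
import george
from george import kernels

# Assumed toy data
np.random.seed(42)
x = np.sort(np.random.uniform(-5, 5, 50))
yerr = 0.1 * np.ones_like(x)
y = np.sin(x) + yerr * np.random.randn(len(x))

# Squared-exponential kernel: amplitude^2 * ExpSquaredKernel(tau^2)
kernel = 1.0**2 * kernels.ExpSquaredKernel(1.0**2)
gp = george.GP(kernel)

# Factorize the covariance matrix for the observed coordinates
gp.compute(x, yerr)

# The drop-in replacement for chi^2
print(gp.log_likelihood(y))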
4 Why? The problems with Gaussian processes
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$

a Choosing the kernel
b Scaling to large datasets
[Figure: computational cost [s] vs. number of data points for the direct GP solve; the measured cost ("experiment") scales as O(N³)]
a Approximation
b Structure
Approximation methods
i Subsample
ii Sparsity
iii Low-rank approximations (see HODLR solver in george)
iv etc.
Structured models
i Kronecker products
ii Evenly sampled data
iii Semi-separable kernels (see celerite)
iv etc.
celerite.readthedocs.io
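To give a flavor, a celerite model might look roughly like this (a sketch with assumed toy data; the SHOTerm is a stochastically driven, damped harmonic oscillator):

import numpy as np
import celerite
from celerite import terms

# Assumed toy data: an irregularly sampled time series
np.random.seed(42)
t = np.sort(np.random.uniform(0, 10, 5000))
yerr = 0.1 * np.ones_like(t)
y = np.sin(3.0 * t) + yerr * np.random.randn(len(t))

# One semi-separable kernel term (J = 1)
kernel = terms.SHOTerm(log_S0=0.0, log_Q=0.0, log_omega0=np.log(3.0))
gp = celerite.GP(kernel)

# O(N) factorization and O(N) log-likelihood evaluation
gp.compute(t, yerr)
print(gp.log_likelihood(y))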
[Figures: celerite scaling. Left: computational cost [seconds] vs. number of data points N for J = 1 to 256 kernel terms, compared to the direct solver; celerite scales as O(N). Right: cost vs. number of terms J for N = 64 to 262144; the cost scales as O(J²)]
5 Recap
$$\log p(\{y_n\} \,|\, \theta, \alpha) = -\frac{1}{2}\, r_\theta^\mathrm{T} K_\alpha^{-1}\, r_\theta - \frac{1}{2} \log\det K_\alpha - \frac{N}{2} \log(2\pi)$$
if there is correlated noise (instrumental or astrophysical) in your data*, try a Gaussian process:
* Hint: there is!
to do this in Python, try:
import george

george.readthedocs.io
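Once the covariance is computed, the conditional prediction (the smooth GP fits shown earlier) is a single call; a sketch, reusing gp, x, and y from the george example above and assuming a recent version of george:

# Conditional mean and variance of the GP on a fine grid, given y
x_pred = np.linspace(-5, 5, 500)
mu, var = gp.predict(y, x_pred, return_var=True)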