NASSP Masters 5003F - Computational Astronomy - 2010: Lecture 4


Page 1:

Lecture 4

• Random variables continued.

• Monte Carlo

• Uncertainty propagation

• Correct presentation of data.

• How to obtain the ‘best-fit’ model: basic considerations.

• Techniques for finding minima (or maxima).

Page 2:

Advertisement!

• A simple FITS dump-to-ASCII script is available off my web page.

• I’ll try to get FTOOLS installed on the NASSP machines.


Page 3:

A Python note:

• Sarah Blyth has a good intro manual for plotting with pylab:
  http://www.ast.uct.ac.za/~sarblyth/pythonGuide/PythonPlottingBeginnersGuide.pdf

Page 4:

Samples from the probability density function

[Figure: samples x_i drawn from the probability density function p(x), plotted against x.]

N samples x_i – random numbers having the distribution p(x).

Estimate of μ:   \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i

Estimate of σ²:  \hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2

Page 5:

Estimating the PDF: a frequency histogram

[Figure: frequency histogram estimate of p(x), with x on the horizontal axis.]

Definition of a histogram:
– Set up bins, usually (but not always) of equal width;
– Count the samples which fall in each bin.

Note that the bin heights have some scatter – in fact these numbers are Poisson random variables.

Page 6:

Estimating the PDF: mean, variance etc

• Estimating the three important properties of the PDF from N samples of X:
  – A frequency histogram serves as an estimate of p(x).
  – Estimate of the mean:   \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i
  – Estimate of the variance:   \hat{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2

• Note: the result of every (non-trivial) transformation of a set of random numbers is itself a random number. (For example, the estimators for the mean and variance are themselves random numbers.)

Note: the ‘hats’ here mean ‘estimate’.
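A minimal numpy sketch of these estimators (illustrative only: the Gaussian PDF and its true values μ = 2.0, σ = 0.5 are assumptions, not taken from the slides):

import numpy as np

# Illustrative only: draw N samples from an assumed Gaussian PDF with
# true mu = 2.0 and sigma = 0.5, then form the estimators above.
rng = np.random.default_rng(42)
N = 10_000
x = rng.normal(loc=2.0, scale=0.5, size=N)

mu_hat = x.sum() / N                          # (1/N) * sum of x_i
var_hat = ((x - mu_hat)**2).sum() / (N - 1)   # sum of (x_i - mu_hat)^2 / (N - 1)

print(mu_hat, var_hat)   # both estimators are themselves random numbers:
                         # a different seed gives slightly different values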

Page 7:

Estimating the correlation between two different random variables:

• Suppose we have N paired samples of variables A and B.

• The covariance between these samples is estimated by

  \hat{\sigma}^2_{A,B} = \frac{1}{N-1} \sum_{i=1}^{N} (a_i - \hat{\mu}_A)(b_i - \hat{\mu}_B)

• If we have M different variables, we can define an M x M covariance matrix.
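A minimal numpy sketch of this estimate (the two partially correlated variables are assumptions for illustration; np.cov returns the full covariance matrix with the same 1/(N−1) normalisation):

import numpy as np

# Illustrative only: N paired samples of two variables A and B, with B
# constructed to be partly correlated with A.
rng = np.random.default_rng(0)
N = 5_000
a = rng.normal(size=N)
b = 0.7 * a + rng.normal(scale=0.5, size=N)

# Covariance estimate as above:
cov_ab = ((a - a.mean()) * (b - b.mean())).sum() / (N - 1)
print(cov_ab)

# The full 2 x 2 covariance matrix (numpy uses the same 1/(N-1) factor):
print(np.cov(a, b))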

Page 8:

Uncertainty propagation

• If some random variable z is a function f(y) of other random variables y = [y_1, y_2, ... y_M], then

  \sigma_z^2 = \sum_{i=1}^{M} \sum_{j=1}^{M} \frac{\partial f}{\partial y_i} \frac{\partial f}{\partial y_j} \, \sigma^2_{i,j}

  – σ²_{i,i} ≡ σ²_i is just the variance of the variable y_i.
  – σ²_{i,j} is the covariance between y_i and y_j.

• The formula looks horrible, but in practice it is often simple to evaluate...

Page 9:

Uncertainty propagation

• Often (not always!) the different y_i are uncorrelated – ie, the value of one does not depend on another. In this case σ²_{i,j} = 0 for i ≠ j, and so

  \sigma_z^2 = \sum_{i=1}^{M} \left( \frac{\partial f}{\partial y_i} \right)^2 \sigma_i^2 .

• Examples (all uncorrelated):

  – T_{net} = T_{on} - T_{off}:   \sigma^2_{net} = \sigma^2_{on} + \sigma^2_{off} .

  – \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i:   \sigma^2_{\hat{\mu}} = \frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2 .

  – z = \ln y:   \sigma_z^2 = \frac{\sigma_y^2}{y^2} .
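A minimal sketch (with assumed numbers, not from the slides) that checks the z = ln y example against a direct Monte Carlo:

import numpy as np

# Illustrative only: compare the propagation formula for z = ln(y) with
# a direct Monte Carlo, for an assumed y of mean 10.0 and sigma_y = 0.3.
rng = np.random.default_rng(1)
y0, sigma_y = 10.0, 0.3

y = rng.normal(y0, sigma_y, size=100_000)
z = np.log(y)

print(z.std())           # Monte Carlo scatter of z
print(sigma_y / y0)      # propagation formula: sigma_z = sigma_y / y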

Page 10:

The Monte Carlo

• It is often useful to be able to construct simulated data.
  – Perhaps to test some code designed to process real data;
  – Or to estimate a PDF for which you don’t have a formula.

• But of course, realistic data must contain random noise.

• A procedure which constructs a set of N random variables is called a Monte Carlo (after the famous casino).


Page 11:

Simulating random variables:

• There are routines in numpy and scipy to give you random numbers from a large library of different PDFs (a short sketch follows below).
  – Bear in mind that these modules have essentially been written by amateurs – so be a bit wary – check them where practical!

• There are simple algorithms for simulating Gaussian and Poisson randoms.

• Joint randoms are a bit trickier:
  – the ‘rejection method’;
  – Markov-chain Monte Carlo.
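As mentioned above, a short sketch of Gaussian and Poisson draws with numpy, including the kind of quick sanity check the slide recommends (all parameters are assumptions for illustration):

import numpy as np

# Illustrative only: Gaussian and Poisson draws with numpy's random
# generator, followed by simple checks of the sample statistics.
rng = np.random.default_rng(7)

gauss = rng.normal(loc=0.0, scale=2.0, size=100_000)
pois = rng.poisson(lam=3.5, size=100_000)

print(gauss.mean(), gauss.std())   # expect roughly 0.0 and 2.0
print(pois.mean(), pois.var())     # for a Poisson, mean and variance are both ~3.5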


Page 12:

Making a frequency histogram

• The input information is a list of MC-generated samples from a PDF.

• Start by making a normal histogram of these samples:
  – Ie, set up bin boundaries, then count how many samples fall within each bin.
  – Calculate an uncertainty for each bin value from the square root of the counts in the bin.

• Because you want to compare to a PDF, the ‘integral’ of your FH must = 1 (see the sketch below).
  – To get this, divide each bin value by
    • the bin width; and
    • the total number of samples (summed over all bins).
  – NOTE! Everything you do to the bin value, also do to the uncertainty.
  – Histogram values are integers but FH values turn to reals (floating points).
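A minimal sketch of this recipe (the unit-Gaussian PDF and the bin choices are assumptions for illustration):

import numpy as np

# Illustrative only: apply the frequency-histogram recipe to samples from
# an assumed unit-Gaussian PDF. Everything done to the bin values is also
# done to the uncertainties.
rng = np.random.default_rng(3)
samples = rng.normal(size=10_000)

counts, edges = np.histogram(samples, bins=40)
widths = np.diff(edges)
errors = np.sqrt(counts)                 # Poisson uncertainty on each bin count

norm = widths * counts.sum()             # bin width * total number of samples
fh = counts / norm                       # frequency-histogram values (now floats)
fh_err = errors / norm                   # same treatment for the uncertainties

print(np.sum(fh * widths))               # ~1.0: the 'integral' of the FH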


Page 13:

Graphs – correct presentation of data.

• Distinguish between data and interpretation.
  – Any kind of fitted model counts as interpretation.
  – Usually draw data as points (or crosses, circles, etc – ie, as a discrete symbol). Joining the points up is EVIL!!
    • …unless this is the best way to clarify a crowded plot.
    • If you do join symbols with a line, make sure the symbols are not obscured by the line.
  – Include error bars where appropriate.
    • Most often on just the Y axis but occasionally also on X.
    • Sometimes you won’t know the errors,
    • or the scatter in the points will indicate it anyway.
  – Interpretation = theory can be drawn with curves (see the plotting sketch below).
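A minimal matplotlib sketch of this advice, using made-up data: points with Y error bars for the data, a curve for the interpretation:

import numpy as np
import matplotlib.pyplot as plt

# Illustrative only, with made-up data: data as discrete symbols with
# Y error bars, the model (interpretation) as a curve.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 20)
y_model = 2.0 * x + 1.0
y_err = np.full_like(x, 2.0)
y_obs = y_model + rng.normal(scale=y_err)

plt.errorbar(x, y_obs, yerr=y_err, fmt='o', label='data')    # points, not joined up
plt.plot(x, y_model, '-', label='model (interpretation)')    # curve for the theory
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()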

Page 14:

Error bars


Is this spike significant?

Page 15:

Error bars


Probably

Page 16:

Error bars


Probably not.

Page 17:

Correct presentation of data continued.

• Try to find some way to transform the axes such that the points are expected to fall on a line – sometimes one with zero slope.
  – Why? Because there are uncounted ways not to be linear but only 1 way to be linear.
  – You’ll also need to transform the error bars.
  – A log scale is sometimes useful too if the data has information over a wide range of scales.


Page 18:

Axes transforms – examples

• For a ‘power law’, ie y(x) = A x^α: take logs (to base 10) of both x and y axes (see the sketch below).
  – Gives a straight line of slope α and intercept log(A).

• For an exponential decay y(x) = A e^(-kx): take logs of the y axis only.

• y(x) = A x^(-2): plot y against 1/x².
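A minimal sketch of the first transform, using a made-up power law (A = 3, α = −1.5 are assumptions for illustration):

import numpy as np
import matplotlib.pyplot as plt

# Illustrative only: a made-up power law y = A * x**alpha with a little
# scatter; on log-log axes it becomes a straight line of slope alpha and
# intercept log10(A).
rng = np.random.default_rng(9)
A, alpha = 3.0, -1.5
x = np.logspace(0, 2, 30)                          # x from 1 to 100
y = A * x**alpha * np.exp(rng.normal(scale=0.05, size=x.size))

plt.plot(np.log10(x), np.log10(y), 'o')
plt.xlabel('log10(x)')
plt.ylabel('log10(y)')
plt.show()

# A straight-line fit to the transformed data recovers alpha and A:
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
print(slope, 10**intercept)                        # roughly -1.5 and 3.0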


Page 19:

Example of MC, error propagation, correct presentation:


1000 samples

Page 20:

With more samples…


10^5 samples

Page 21:

An example of when error bars are superfluous (also an example of axes transformation)

Page 22:

Another… this is a survival function histogram. Errors are not independent!


From M Tajer et al (2004).

Page 23:

The astronomer has:

• Lots of (noisy) data points y = [y1, y2, .. yn];

• A model with a relatively small number of adjustable parameters Θ = [θ1, θ2, .. θm].


Page 24:

What do we want from this?

1. We want to find a model which is in some sense the ‘best fit’ to the data. This means obtaining:
   – The best-fit values of the parameters Θ;
   – Uncertainties for these.

2. We may also want to compare competing models.
   – A very common example: comparing a model without any signal to a model (or rather, the whole class of models) with some signal.
   – The model without is known as the null hypothesis.

Model-comparison will come in later lectures...

Page 25:

The ‘best fit’ model.

1. First construct your model.
   – Informed by the physics of the situation.

2. There are two basic approaches: Bayesian and Frequentist. We will cover the Bayesian approach later. The Frequentist approach is to define some objective function U which, when minimized (or maximized), returns a ‘best fit’ model.
   – U must obviously be a function of both the data and the model parameters.
   – Examples: least-squares, likelihood, ‘entropy’ (a least-squares sketch follows below).
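As flagged above, a minimal sketch of a least-squares objective function U(Θ; data). The straight-line model and the made-up data are assumptions for illustration, not a prescription from the lecture:

import numpy as np

# Illustrative only: a least-squares objective U(theta; data) for an
# assumed straight-line model y = theta0 + theta1*x, evaluated on
# made-up data with known uncertainties.
def model(theta, x):
    return theta[0] + theta[1] * x

def U(theta, x, y, sigma):
    # Sum of squared, uncertainty-weighted residuals.
    residuals = (y - model(theta, x)) / sigma
    return np.sum(residuals**2)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 25)
sigma = np.full_like(x, 0.5)
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)        # true parameters (1.0, 2.0)

print(U(np.array([1.0, 2.0]), x, y, sigma))        # small: roughly the number of points
print(U(np.array([0.0, 0.0]), x, y, sigma))        # much larger for a poor model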


Page 26:

A backward arrangement..?

• I’m going to cover techniques of minimization first, then discuss ‘best-fit’ objective functions later.

• A good reference for all kinds of algorithms: Press et al, “Numerical Recipes <for Fortran, C, C++ etc>”.
  – The code they provide is not always very good, but the explanations and recipes are excellent.
  – Older versions (still excellent) are available on the web – eg:
    http://www.nrbook.com/a/bookfpdf.php

Page 27:

Minimization

• Nearly always in model-fitting we are trying to find the minimum in an analytic function – which is math-speak for a function which can be expanded in a Taylor series about the point of interest.

• ∇U, the gradient, is a vector of 1st derivatives of U w.r.t. each parameter.

• H is called the Hessian and is a matrix of 2nd derivatives of U.

To second order,

  U(\Theta) \approx U(\Theta_0) + (\Theta - \Theta_0)^T \nabla U(\Theta_0) + \tfrac{1}{2} (\Theta - \Theta_0)^T \mathbf{H} (\Theta - \Theta_0) + O(|\Theta - \Theta_0|^3)

Page 28:

Minimization

• The definition of a minimum in U is a place where the gradient equals 0 – ie where the partial derivatives ∂U/∂θ_i = 0 for all θ_i.

• IF the function were a perfect paraboloid*, ie if there were no terms in the Taylor series of order > 2, then no matter where we are in the Θ space, we could go to the minimum in 1 mighty jump, because

  \mathbf{H} (\Theta_{min} - \Theta) = -\nabla U(\Theta) , \quad \text{ie} \quad \Theta_{min} = \Theta - \mathbf{H}^{-1} \nabla U(\Theta) .

• But because this is nearly always NOT true, in practice minimization is an affair of many steps.
  – It’s of course desirable to keep the number of steps as small as possible.

*Or if we can directly invert the equations ∂U/∂θ_i = 0.
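A minimal sketch of this idea: repeated Newton-style jumps Θ → Θ − H⁻¹∇U on a made-up objective with a small quartic term, so that a single jump is not exact. The objective, its derivatives, and the starting point are all assumptions for illustration:

import numpy as np

# Illustrative only: Newton-style minimization of a made-up two-parameter
# objective. Each step is the 'mighty jump' theta -> theta - H^{-1} grad(U);
# the small quartic term means several steps are needed.
def U(theta):
    t0, t1 = theta
    return (t0 - 1.0)**2 + 2.0 * (t1 + 0.5)**2 + 0.1 * t0**4

def grad(theta):
    t0, t1 = theta
    return np.array([2.0 * (t0 - 1.0) + 0.4 * t0**3,
                     4.0 * (t1 + 0.5)])

def hessian(theta):
    t0, _ = theta
    return np.array([[2.0 + 1.2 * t0**2, 0.0],
                     [0.0,               4.0]])

theta = np.array([5.0, 5.0])                              # starting guess
for step in range(20):
    g = grad(theta)
    if np.linalg.norm(g) < 1e-8:                          # gradient ~ 0: at the minimum
        break
    theta = theta - np.linalg.solve(hessian(theta), g)    # jump to the paraboloid's minimum

print(step, theta, U(theta))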