nassp masters 5003f - computational astronomy - 2010 lecture 6 objective functions for model...

26
NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 • Objective functions for model fitting: – Sum of squared residuals (=> the ‘method of least squares’). – Likelihood • Hypothesis testing

Upload: brendan-poole

Post on 14-Jan-2016

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2010

Lecture 6

• Objective functions for model fitting:– Sum of squared residuals (=> the ‘method of

least squares’).– Likelihood

• Hypothesis testing

Page 2: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2009

Model fitting – reminder of the terminology:• We have data yi at

samples of some independent variable xi.

• The model is our estimate of the parent or truth function.

• Let’s express the model m(xi) as a function of a few parameters θ1, θ2 .. θM.

• Finding the ‘best fit’ model then just means best estimates of the θ. (Bold – shorthand for a list)

• Knowledge of physics informs choice of m, θ.

The parent function –what we’d like tofind out (but never can,exactly).

Page 3: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2009

Naive best fit calculation:• The residuals for a particular model = yi-mi.• To ‘thread the model through the middle of the

noise’, we want the magnitudes of all residuals to be small.

• A reasonable way (not the only way) to achieve this is to define a sum of squared residuals as our objective function:

• Fitting by minimizing this objective function is called the method of least squares. It is extremely common.

• NOTE! This approach IGNORES possible uncerts in x.

Page 4: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

But what if the noise is not homogeneous?• Some bits clearly have

higher σ than others.

• Answer: weight by 1/σ2i

• This form of U is sometimes called χ2 (pronounced kai squared).

• To use it, we need to know the σi.

NASSP Masters 5003F - Computational Astronomy - 2009

Page 5: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2009

Simple example: mi = θ1 + θ2siModel – red is si, green the flat background.

The data yi:

Contour map of Uls.

Truth values!

Page 6: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

An even simpler example:• Last lecture, I noted that there do exist

cases in which we can directly invert

• For least squares, this happens if the model is a polynomial function of the parameters θi.

• Expansion of grad U in this case gives a set of M linear equations in the M parameters called the normal equations.

• It is easy to solve these to get the θi.NASSP Masters 5003F - Computational Astronomy - 2009

0U

Page 7: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Simplest example of all: fitting a straight line.

• Called linear regression by the statisticians.

• There is a huge amount of literature on it.

• Normal equations for a line model turn out to be:

• Polynomial is an easy extension to this.NASSP Masters 5003F - Computational Astronomy - 2009

N

ii

ii

N

ii

i

N

ii

iN

ii

i

N

ii

iN

ii

i

yx

y

xx

xx

1 2

1 2

1 2

2

1 2

1 21 2

slope

intercept

Page 8: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2009

χ2 for Poisson data – possible, but problematic.

• Choose data yi as estimator for σi2?

– No - can have zero values in denominator.

• Choose (evolving) model as estimator for σi2?

– No - gives a biased result.

• Better: Mighell formula

• Unbiased, but no good for goodness-of-fit.– Use Mighell to fit θ then standard U for

“goodness of fit” (GOF).

Mighell K J, Ap J 518, 380 (1999)

Page 9: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Another choice of U: likelihood.• Likelihood is best illustrated by Poisson

data.

• Consider a single Poisson random variable y: its PDF is

where m here plays the role of the expectation value of y.

• We’re used to thinking of this as a function just of one variable, ie y;– but it is really a function of both y and m.

NASSP Masters 5003F - Computational Astronomy - 2009

!y

emmyp

my

Page 10: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Poisson PDF

NASSP Masters 5003F - Computational Astronomy - 2009

Page 11: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Poisson PDF

NASSP Masters 5003F - Computational Astronomy - 2009

Page 12: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2009

PDF for y vs likelihood for θ.

Probability p(y|θ) = θy e–θ / y!

Likelihood p(y|θ) = θy e–θ / y!

Page 13: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

The likelihood function.• Before, we thought “given m, let us apply

the PDF to obtain the probability of getting between y and y+dy.”

• Now we are saying “well we know y, we just measured it. We don’t know m. But surely the PDF taken as a function of m indicates the probability density for m.”

• Problems with this:– Likelihood function is not necessarily

normalized, like a ‘proper’ PDF;– What assurance do we have that the true PDF

for m has this shape??NASSP Masters 5003F - Computational Astronomy - 2009

Page 14: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Likelihood continued.• Usually we have many (N) samples yi.

Can we arrive at a single likelihood for all samples taken together?

• (Note that we’ve stopped talking just about Poisson data now – this expression is valid for any form of p.)

• Sometimes easier to deal with the log-likelihood L:

NASSP Masters 5003F - Computational Astronomy - 2009

N

i ii mypp1total

N

i ii myLL1total

Page 15: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Likelihood continued• To get the best-fit model m, we need to

maximize the likelihood (or equivalently, the log likelihood).

• If we want an objective function to minimize, it is convenient to choose –L.

• Can show that for Gaussian data, minimizing –L is equivalent to minimizing the variance-weighted sum of squared residuals (=chi squared) given before.– Proof left as an exercise!

NASSP Masters 5003F - Computational Astronomy - 2009

Page 16: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

NASSP Masters 5003F - Computational Astronomy - 2009

Poissonian/likelihood version of slide 3Model – red is si, green the flat background.

The data yi:

Map of the joint likelihood L.

Page 17: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

What if also errors in xi?• Tricky… Bayes better in this case.

NASSP Masters 5003F - Computational Astronomy - 2009

Page 18: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

What next?• In fitting a model, we want (amplifying a bit

on lecture 4):1. The best fit values of the parameters;

2. Then we want to know if these values are good enough!• If not: need to go back to the drawing board and

choose a new model.

3. If the model passes, want uncertainties in the best-fit parameters.• (I’ll put this off to a later lecture…)

• Number 1 is accomplished. √

NASSP Masters 5003F - Computational Astronomy - 2009

Page 19: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

How to tell if our model is correct.• Supposing our model is absolutely

accurate.• The U value we calculate is, nevertheless,

a random variable: each fresh set of data will give rise to a slightly different value of U.

• In other words, U, even in the case of a perfectly accurate model, will have some spread – in fact, like any other random variable, it will have a PDF.– This PDF is sometimes calculable from first

principles (if not, one can do a Monte Carlo to estimate it).

NASSP Masters 5003F - Computational Astronomy - 2009

Page 20: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

How to tell if our model is correct.• The procedure is:

– First calculate the PDF for U in the ‘perfect fit’ case;

– From this curve, obtain the value of the PDF at our best-fit value of U;

– If p(Ubest fit) is very small, it is unlikely that our model is correct.

– Note that both χ2 and –L have the property that they cannot be negative.

– A model which is a less than ideal match to the truth function will always generate U values with a PDF displaced to higher values of U.

NASSP Masters 5003F - Computational Astronomy - 2009

Page 21: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Perfect vs. imperfect p(U):

NASSP Masters 5003F - Computational Astronomy - 2009

A perfect model gives thisshape PDF

PDF for imperfect modelis ALWAYS displaced to higher U.

Page 22: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Goodness of model continued• Because plausible Us are >=0; and because an

imperfect model always gives higher U: we prefer to– generate the survival function for the perfect model;– that tells us the probability of a perfect model giving us

the measured value of U or higher.

• This procedure is called hypothesis testing.• Because we make the hypothesis:

– “Suppose our model is correct. What sort of U value should we expect to find?”

• We’ll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.

NASSP Masters 5003F - Computational Astronomy - 2009

Page 23: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

• If we use the least-squares U (also known as χ2), this is easy, because p(U) is known for this:

where– Г is the gamma function

– and υ is called the degrees of freedom.

• Note: the PDF has a peak at U~υ.

Perfect-model p(U)s:

NASSP Masters 5003F - Computational Astronomy - 2009

22

2exp2

22

UU

Up

0

1xt tedtx

Page 24: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

What are degrees of freedom?• The easiest way to illustrate what degrees

of freedom is, is to try fitting a polynomial of higher and higher order to a set of noisy data.

• The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals (χ2) will be, until…

• when M=N (ie the number of parameters, polynomial orders in this case, equals the number of data points), the model will go through every point exactly. χ2 will equal 0.

NASSP Masters 5003F - Computational Astronomy - 2009

Page 25: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

Degrees of freedom• Defined as N-M: number of data points

minus number of parameters fitted.

• It is sometimes convenient to define a reduced chi squared

– PDF for χ2reduced should of course peak at

about 1.– There is no advantage in using this for

minimization rather than the ‘raw’ χ2.

NASSP Masters 5003F - Computational Astronomy - 2009

22reduced

1 MN

Page 26: NASSP Masters 5003F - Computational Astronomy - 2010 Lecture 6 Objective functions for model fitting: –Sum of squared residuals (=> the ‘method of least

‘Survival function’ for U.• Remember the survival function of a PDF

is defined as

• For χ2 this is

• where Г written with 2 arguments like this is called the incomplete gamma function:

NASSP Masters 5003F - Computational Astronomy - 2009

2

2,21,2

UUP

xpdxxxPx

0

0

x at tedtxa

0

1,