NASSP Masters 5003F - Computational Astronomy - 2010
Lecture 6
• Objective functions for model fitting:
– Sum of squared residuals (=> the ‘method of least squares’).
– Likelihood
• Hypothesis testing
NASSP Masters 5003F - Computational Astronomy - 2009
Model fitting – reminder of the terminology:
• We have data y_i at samples of some independent variable x_i.
• The model is our estimate of the parent or truth function.
• Let’s express the model m(x_i) as a function of a few parameters θ_1, θ_2, ..., θ_M.
• Finding the ‘best fit’ model then just means finding the best estimates of the θ. (On the slides, a bold θ is shorthand for the whole list θ_1, ..., θ_M.)
• Knowledge of physics informs choice of m, θ.
[Figure: the parent function – what we’d like to find out (but never can, exactly).]
Naive best fit calculation:
• The residuals for a particular model are y_i - m_i.
• To ‘thread the model through the middle of the noise’, we want the magnitudes of all residuals to be small.
• A reasonable way (not the only way) to achieve this is to define a sum of squared residuals as our objective function:
$$ U_{\mathrm{ls}} = \sum_{i=1}^{N} (y_i - m_i)^2 $$
• Fitting by minimizing this objective function is called the method of least squares. It is extremely common.
• NOTE! This approach IGNORES possible uncertainties in x.
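A minimal sketch of least squares in Python, to make the objective function concrete. The one-parameter constant model, the data values and the grid scan are all assumptions made for illustration; they are not from the lecture.

```python
def sum_squared_residuals(y, m):
    """U_ls = sum over i of (y_i - m_i)^2 for data y and model values m."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, m))


def constant_model(theta, n):
    """Hypothetical one-parameter model: m_i = theta at every sample."""
    return [theta] * n


# Made-up data values, for illustration only.
y = [2.1, 1.9, 2.4, 1.8, 2.3]

# Brute-force scan over the single parameter on a grid of step 0.01.
thetas = [0.01 * k for k in range(401)]
U = [sum_squared_residuals(y, constant_model(t, len(y))) for t in thetas]
best_theta = thetas[U.index(min(U))]

print("best-fit theta:", best_theta)
print("mean of the data:", sum(y) / len(y))
```

For a constant model the minimum of U_ls sits at the mean of the data, which the scan reproduces to within the grid step. A grid scan is only practical for one or two parameters.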
But what if the noise is not homogeneous?
• Some bits clearly have higher σ than others.
• Answer: weight each squared residual by 1/σ_i²:
$$ U = \chi^2 = \sum_{i=1}^{N} \frac{(y_i - m_i)^2}{\sigma_i^2} $$
• This form of U is sometimes called χ² (pronounced ‘kai squared’).
• To use it, we need to know the σ_i.
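A short sketch of the variance-weighted objective function, assuming the σ_i are known. The data and uncertainties are made up, and the constant model is again a hypothetical stand-in for m.

```python
def chi_squared(y, m, sigma):
    """chi^2 = sum over i of (y_i - m_i)^2 / sigma_i^2."""
    return sum(((yi - mi) / si) ** 2 for yi, mi, si in zip(y, m, sigma))


# Made-up data: two precise points and one very noisy one.
y = [1.0, 2.0, 10.0]
sigma = [0.1, 0.1, 5.0]

# For a constant model, chi^2 is minimized by the inverse-variance weighted mean.
weights = [1.0 / s ** 2 for s in sigma]
weighted_mean = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
plain_mean = sum(y) / len(y)

chi2_weighted = chi_squared(y, [weighted_mean] * len(y), sigma)
chi2_plain = chi_squared(y, [plain_mean] * len(y), sigma)

print("weighted mean:", weighted_mean, "chi^2:", chi2_weighted)
print("plain mean:   ", plain_mean, "chi^2:", chi2_plain)
```

The noisy third point drags the plain mean upwards but has almost no influence on the weighted fit, which is the point of the 1/σ_i² weighting.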
Simple example: m_i = θ_1 + θ_2 s_i
[Figure: the model – red is s_i, green the flat background.]
[Figure: the data y_i.]
[Figure: contour map of U_ls, with the truth values marked.]
An even simpler example:
• Last lecture, I noted that there do exist cases in which we can directly invert
$$ \nabla U = 0 $$
• For least squares, this happens if the model is a linear function of the parameters θ_i (for example, a polynomial in x whose coefficients are the θ_i).
• Expansion of grad U in this case gives a set of M linear equations in the M parameters, called the normal equations.
• It is easy to solve these to get the θ_i.
Simplest example of all: fitting a straight line.
• Called linear regression by the statisticians.
• There is a huge amount of literature on it.
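• Normal equations for a line model turn out to be:
$$
\begin{pmatrix}
\sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2} & \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} \\
\sum_{i=1}^{N} \frac{x_i}{\sigma_i^2} & \sum_{i=1}^{N} \frac{1}{\sigma_i^2}
\end{pmatrix}
\begin{pmatrix} \text{slope} \\ \text{intercept} \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{N} \frac{x_i y_i}{\sigma_i^2} \\
\sum_{i=1}^{N} \frac{y_i}{\sigma_i^2}
\end{pmatrix}
$$
• Polynomial is an easy extension to this.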
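A minimal sketch of solving the normal equations for a straight line, y = intercept + slope * x, by Cramer's rule on the 2x2 system. The function name `fit_line` and the default of unit uncertainties are assumptions for illustration.

```python
def fit_line(x, y, sigma=None):
    """Weighted least-squares line fit; returns (slope, intercept)."""
    if sigma is None:
        sigma = [1.0] * len(x)  # assumption: equal uncertainties if none given
    w = [1.0 / s ** 2 for s in sigma]

    # The five weighted sums that appear in the normal equations.
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    # Solve  Sxx*slope + Sx*intercept = Sxy
    #        Sx*slope  + S*intercept  = Sy
    det = Sxx * S - Sx * Sx
    if det == 0.0:
        raise ValueError("need at least two distinct x values")
    slope = (S * Sxy - Sx * Sy) / det
    intercept = (Sxx * Sy - Sx * Sxy) / det
    return slope, intercept


# Noise-free made-up data lying exactly on y = 1 + 2x.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

No iteration or starting guess is needed, which is what makes the linear case so convenient.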
χ² for Poisson data – possible, but problematic.
• Choose data y_i as estimator for σ_i²?
– No - can have zero values in denominator.
• Choose (evolving) model as estimator for σ_i²?
– No - gives a biased result.
• Better: Mighell formula
$$ \chi^2_{\gamma} = \sum_{i=1}^{N} \frac{\left[\, y_i + \min(y_i, 1) - m_i \,\right]^2}{y_i + 1} $$
• Unbiased, but no good for goodness-of-fit.
– Use Mighell to fit θ, then standard U for “goodness of fit” (GOF).
Mighell K J, Ap J 518, 380 (1999)
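A small sketch of Mighell's statistic as given in the cited paper: each residual is taken as y_i + min(y_i, 1) - m_i and each variance estimate as y_i + 1, so zero counts never produce a zero denominator. The counts and model values are made up.

```python
def mighell_chi2(y, m):
    """Mighell's chi^2_gamma for Poisson counts y and model values m."""
    return sum((yi + min(yi, 1) - mi) ** 2 / (yi + 1) for yi, mi in zip(y, m))


counts = [0, 3, 1, 0, 2]             # made-up counts, including zeros
model = [0.5, 2.0, 1.0, 0.5, 2.5]    # hypothetical model values m_i

print("Mighell chi^2:", mighell_chi2(counts, model))
```

Using the data themselves as the variance estimate would divide by zero on the first and fourth samples here.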
Another choice of U: likelihood.
• Likelihood is best illustrated by Poisson data.
• Consider a single Poisson random variable y: its PDF is
$$ p(y|m) = \frac{m^y e^{-m}}{y!} $$
where m here plays the role of the expectation value of y.
• We’re used to thinking of this as a function just of one variable, i.e. y;
– but it is really a function of both y and m.
[Figures: two plots, each titled ‘Poisson PDF’.]
PDF for y vs likelihood for θ.
Probability p(y|θ) = θ^y e^(-θ) / y!
Likelihood p(y|θ) = θ^y e^(-θ) / y!
The likelihood function.
• Before, we thought “given m, let us apply the PDF to obtain the probability of getting between y and y+dy.”
• Now we are saying “well we know y, we just measured it. We don’t know m. But surely the PDF taken as a function of m indicates the probability density for m.”
• Problems with this:
– The likelihood function is not necessarily normalized, as a ‘proper’ PDF is;
– What assurance do we have that the true PDF for m has this shape??
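A sketch of the two ways of reading the same Poisson expression: as a probability over y at fixed m, and as a likelihood over m at fixed (measured) y. The function name `poisson_pmf`, the chosen values and the grid over m are assumptions for illustration.

```python
import math


def poisson_pmf(y, m):
    """p(y|m) = m^y e^(-m) / y!, evaluated via logs; assumes m > 0."""
    return math.exp(y * math.log(m) - m - math.lgamma(y + 1))


# Reading 1: fix m, vary y. The probabilities over y sum to 1.
m_fixed = 3.0
total_over_y = sum(poisson_pmf(y, m_fixed) for y in range(100))

# Reading 2: fix the measured y, vary m on a grid. This is the likelihood.
y_measured = 4
m_grid = [0.01 * k for k in range(1, 2001)]
likelihood = [poisson_pmf(y_measured, m) for m in m_grid]
m_best = m_grid[likelihood.index(max(likelihood))]

print("sum over y at fixed m:", total_over_y)
print("m that maximizes the likelihood:", m_best)
```

For a single Poisson sample the likelihood peaks at m = y, so the scan returns a value at the measured count, to within the grid step.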
Likelihood continued.
• Usually we have many (N) samples y_i. Can we arrive at a single likelihood for all samples taken together?
$$ p_{\mathrm{total}} = \prod_{i=1}^{N} p(y_i|m_i) $$
• (Note that we’ve stopped talking just about Poisson data now – this expression is valid for any form of p.)
• Sometimes easier to deal with the log-likelihood L:
$$ L_{\mathrm{total}} = \sum_{i=1}^{N} L(y_i|m_i) $$
where L(y_i|m_i) = ln p(y_i|m_i).
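A sketch of the joint likelihood for several samples, taking Poisson data as the example form of p (an assumption; any p would do). It checks that the sum of the log-likelihoods equals the log of the product of the probabilities. Data and model values are made up.

```python
import math


def poisson_log_pmf(y, m):
    """ln p(y|m) for a Poisson variable; assumes m > 0."""
    return y * math.log(m) - m - math.lgamma(y + 1)


def joint_log_likelihood(y, m):
    """L_total = sum over i of ln p(y_i|m_i)."""
    return sum(poisson_log_pmf(yi, mi) for yi, mi in zip(y, m))


y = [2, 0, 5]          # made-up counts
m = [1.5, 0.5, 4.0]    # hypothetical model values

L_total = joint_log_likelihood(y, m)

# The same quantity the long way round: multiply the probabilities, then log.
p_total = 1.0
for yi, mi in zip(y, m):
    p_total *= math.exp(poisson_log_pmf(yi, mi))

print(L_total, math.log(p_total))
```

With many samples the product of probabilities can underflow to zero in floating point, whereas the sum of logs stays well behaved; that is one practical reason to prefer L.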
Likelihood continued
• To get the best-fit model m, we need to maximize the likelihood (or equivalently, the log likelihood).
• If we want an objective function to minimize, it is convenient to choose –L.
• Can show that for Gaussian data, minimizing –L is equivalent to minimizing the variance-weighted sum of squared residuals (= chi squared) given before.
– Proof left as an exercise!
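Not a proof, but a numerical check of the Gaussian claim, assuming independent samples with known σ_i: the difference between –L and χ²/2 comes out the same for two different trial models, so the two objective functions have their minimum in the same place. All values are made up.

```python
import math


def gaussian_neg_log_likelihood(y, m, sigma):
    """-L for independent Gaussian samples with known sigma_i."""
    return sum(
        0.5 * ((yi - mi) / si) ** 2 + 0.5 * math.log(2.0 * math.pi * si ** 2)
        for yi, mi, si in zip(y, m, sigma)
    )


def chi_squared(y, m, sigma):
    """Variance-weighted sum of squared residuals."""
    return sum(((yi - mi) / si) ** 2 for yi, mi, si in zip(y, m, sigma))


y = [1.2, 0.7, 2.9, 3.8]
sigma = [0.3, 0.5, 0.4, 0.6]
model_a = [1.0, 1.0, 3.0, 4.0]   # two arbitrary trial models
model_b = [0.5, 1.5, 2.5, 3.5]

offset_a = gaussian_neg_log_likelihood(y, model_a, sigma) - 0.5 * chi_squared(y, model_a, sigma)
offset_b = gaussian_neg_log_likelihood(y, model_b, sigma) - 0.5 * chi_squared(y, model_b, sigma)

print(offset_a, offset_b)
```

The offset depends only on the σ_i, not on the model, which is the content of the equivalence.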
Poissonian/likelihood version of slide 3
[Figure: the model – red is s_i, green the flat background.]
[Figure: the data y_i.]
[Figure: map of the joint likelihood L.]
What if also errors in x_i?
• Tricky… Bayes better in this case.
What next?
• In fitting a model, we want (amplifying a bit on lecture 4):
1. The best fit values of the parameters;
2. Then we want to know if these values are good enough!
• If not: need to go back to the drawing board and choose a new model.
3. If the model passes, want uncertainties in the best-fit parameters.
• (I’ll put this off to a later lecture…)
• Number 1 is accomplished. ✓
How to tell if our model is correct.
• Suppose our model is absolutely accurate.
• The U value we calculate is, nevertheless, a random variable: each fresh set of data will give rise to a slightly different value of U.
• In other words, U, even in the case of a perfectly accurate model, will have some spread – in fact, like any other random variable, it will have a PDF.
– This PDF is sometimes calculable from first principles (if not, one can do a Monte Carlo to estimate it).
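A minimal Monte Carlo sketch of the spread in U for a perfectly accurate model. It assumes Gaussian noise with a known, constant σ and no fitted parameters, so each residual is just a noise draw; the function name and the numbers of points and trials are made up.

```python
import random


def simulate_U(n_points, sigma, n_trials, seed=0):
    """Draw n_trials values of U = sum of (residual/sigma)^2 for a perfect model."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trials):
        # Perfect model: data = truth + noise, so y_i - m_i is pure noise.
        U = sum((rng.gauss(0.0, sigma) / sigma) ** 2 for _ in range(n_points))
        samples.append(U)
    return samples


samples = simulate_U(n_points=10, sigma=2.0, n_trials=5000)
mean_U = sum(samples) / len(samples)

print("smallest U:", min(samples))
print("largest U: ", max(samples))
print("mean U:    ", mean_U)
```

A histogram of `samples` is an estimate of p(U). With nothing fitted, the mean should come out near the number of data points; fitting parameters to each simulated data set would lower it.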
How to tell if our model is correct.
• The procedure is:
– First calculate the PDF for U in the ‘perfect fit’ case;
– From this curve, obtain the value of the PDF at our best-fit value of U;
– If p(U_best fit) is very small, it is unlikely that our model is correct.
– Note that both χ² and –L have the property that they cannot be negative (for –L this holds when the p are probabilities no greater than 1, as for Poisson counts).
– A model which is a less than ideal match to the truth function will always generate U values with a PDF displaced to higher values of U.
Perfect vs. imperfect p(U):
[Figure: two PDFs of U. A perfect model gives the first shape; the PDF for an imperfect model is ALWAYS displaced to higher U.]
Goodness of model continued
• Because plausible Us are >= 0, and because an imperfect model always gives higher U, we prefer to
– generate the survival function for the perfect model;
– that tells us the probability of a perfect model giving us the measured value of U or higher.
• This procedure is called hypothesis testing.
• Because we make the hypothesis:
– “Suppose our model is correct. What sort of U value should we expect to find?”
• We’ll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.
• If we use the least-squares U (also known as χ²), this is easy, because p(U) is known for this:
$$ p(U) = \frac{U^{(\upsilon-2)/2} \exp(-U/2)}{2^{\upsilon/2}\, \Gamma(\upsilon/2)} $$
where
– Г is the gamma function
$$ \Gamma(x) = \int_0^{\infty} dt\, t^{x-1} e^{-t} $$
– and υ is called the degrees of freedom.
• Note: the PDF has a peak at U~υ.
[Figure: perfect-model p(U)s.]
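A sketch that evaluates the χ² PDF for a given number of degrees of freedom and checks three things numerically: that it integrates to 1, where its mean lies, and where it peaks. The choice of 10 degrees of freedom, the grid and the function name `chi2_pdf` are assumptions for illustration.

```python
import math


def chi2_pdf(U, dof):
    """PDF of chi^2 with `dof` degrees of freedom, evaluated via logs."""
    if U <= 0.0:
        return 0.0  # the boundary value at U = 0 is not treated carefully here
    half = 0.5 * dof
    log_p = (half - 1.0) * math.log(U) - 0.5 * U - half * math.log(2.0) - math.lgamma(half)
    return math.exp(log_p)


dof = 10
step = 0.01
grid = [step * k for k in range(10001)]          # U from 0 to 100
pdf = [chi2_pdf(U, dof) for U in grid]

# Trapezoidal integral of p(U), and a simple Riemann sum for the mean.
integral = step * (sum(pdf) - 0.5 * (pdf[0] + pdf[-1]))
mean_U = step * sum(U * p for U, p in zip(grid, pdf))
peak_U = grid[pdf.index(max(pdf))]

print("integral:", integral)
print("mean:    ", mean_U)
print("peak at: ", peak_U)
```

The mean comes out at the number of degrees of freedom, while the maximum sits two units lower; for large degrees of freedom the distinction matters little, which is the sense in which the peak is at roughly the degrees of freedom.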
What are degrees of freedom?
• The easiest way to illustrate what degrees of freedom are is to try fitting a polynomial of higher and higher order to a set of noisy data.
• The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals (χ²) will be, until…
• when M=N (i.e. the number of parameters, polynomial orders in this case, equals the number of data points), the model will go through every point exactly. χ² will equal 0.
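The experiment described above can be sketched as follows: fit polynomials with M = 1 to N coefficients to N made-up noisy points and watch the sum of squared residuals fall to zero at M = N. The fit uses unweighted normal equations solved by Gaussian elimination; the helper names and the data are assumptions for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_polynomial(x, y, n_params):
    """Least-squares polynomial coefficients (lowest order first)."""
    A = [[sum(xi ** (j + k) for xi in x) for k in range(n_params)]
         for j in range(n_params)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n_params)]
    return solve_linear(A, b)


def sum_squared_residuals(x, y, theta):
    """Sum over i of (y_i - polynomial(x_i))^2."""
    return sum((yi - sum(t * xi ** k for k, t in enumerate(theta))) ** 2
               for xi, yi in zip(x, y))


# Five made-up noisy points (N = 5).
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
y = [0.8, 1.9, 1.1, 2.6, 2.2]

U_values = []
for n_params in range(1, len(x) + 1):
    theta = fit_polynomial(x, y, n_params)
    U_values.append(sum_squared_residuals(x, y, theta))
    print("M =", n_params, " U =", U_values[-1])
```

At M = N there is nothing left over to test the model with, which is why the number that matters for goodness of fit is N - M rather than N.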
Degrees of freedom
• Defined as N-M: number of data points minus number of parameters fitted.
• It is sometimes convenient to define a reduced chi squared
$$ \chi^2_{\mathrm{reduced}} = \frac{1}{N-M}\, \chi^2 $$
– PDF for χ²_reduced should of course peak at about 1.
– There is no advantage in using this for minimization rather than the ‘raw’ χ².
‘Survival function’ for U.
• Remember the survival function of a PDF is defined as
$$ P(x > x_0) = \int_{x_0}^{\infty} dx\, p(x) $$
• For χ² this is
$$ P(\chi^2 > U) = 1 - \frac{\Gamma(\upsilon/2,\, U/2)}{\Gamma(\upsilon/2)} $$
• where Г written with 2 arguments like this is called the incomplete gamma function:
$$ \Gamma(a, x) = \int_0^{x} dt\, t^{a-1} e^{-t} $$
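A sketch of the χ² survival function using only the standard library. It assumes the incomplete gamma function meant here is the integral from 0 to x, and evaluates its ratio to the complete gamma function with a power series; the function names, tolerance and test values are made up for illustration.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-14, max_terms=10000):
    """Integral from 0 to x of t^(a-1) e^(-t) dt, divided by Gamma(a).

    Uses the series e^(-x) x^a * sum over n of x^n / (a (a+1) ... (a+n)).
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_survival(U, dof):
    """Probability that chi^2 with `dof` degrees of freedom exceeds U."""
    return 1.0 - regularized_lower_gamma(0.5 * dof, 0.5 * U)


# Hypothetical best-fit value U = 18.3 from N = 12 points and M = 2 parameters.
p_value = chi2_survival(18.3, 12 - 2)
print("probability of U this high or higher:", p_value)
```

For 2 degrees of freedom the result reduces to exp(-U/2), which gives a quick sanity check. The series converges slowly when U is much larger than the degrees of freedom, so in practice a library routine such as `scipy.stats.chi2.sf` would normally be preferred.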