A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics
Spring 2015
http://www.astro.cornell.edu/~cordes/A6523

Lectures 20 and 21:
–  Source confusion
–  Localization in the Fourier domain: derivation
–  Implementation in python
–  Prewhitening
–  Source finding in surveys

•  Notes Modeling2015.pdf on course web site (tbd)
•  Chapter 11 in Gregory (Nonlinear model fitting)
•  Chapter 29 of Mackay (Monte Carlo Methods)

http://www.inference.phy.cam.ac.uk/mackay/itila/



All chapters available online (entire book in one PDF as well)


1974ApJ...188..279C


Derivation of PDF for Source Confusion

•  Poisson locations
•  Power-law amplitude distribution
•  First and second moments using shot-noise formalism


Source Confusion: Application of Poisson Processes

Celestial sources are distributed on the sky according to Poisson statistics: the number of sources in a solid angle $\Delta\theta$ is a Poisson random variable. All telescopes have finite angular resolution, so there can be a large number of sources in the telescope's point spread function (PSF) at any instant. The variation in the number of these sources and variations in their intensity or flux density produce confusion noise. In some cases confusion noise can dominate other contributions to images, time series, spectra, etc. and therefore must be taken into account in detection and model-fitting contexts.

We can derive the mean, variance, and PDF of confusion by starting from simple properties of Poisson RVs. The probability of obtaining $k$ events or objects in some relevant interval of time, solid angle, volume, etc. is

\[ P(k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \tag{1} \]

where $\lambda$ is the mean number.

The characteristic function is

\[ \Phi(\omega) \equiv \left\langle e^{i\omega k} \right\rangle = \sum_k P_k\, e^{i\omega k} = \sum_k \frac{e^{-\lambda}\lambda^{k}}{k!}\, e^{i\omega k} = e^{-\lambda} \sum_k \frac{\left(\lambda e^{i\omega}\right)^{k}}{k!} = e^{-\lambda} e^{\lambda e^{i\omega}} = e^{\lambda\left(e^{i\omega}-1\right)}. \tag{2} \]

The first and second moments of $k$ can be derived from the CF by taking first and second derivatives.
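As a numerical sanity check of Equation 2, the sketch below compares the closed-form CF with a direct sum over the Poisson probabilities and recovers the first two moments from finite-difference derivatives at $\omega = 0$, using $\langle k \rangle = -i\,\Phi'(0)$ and $\langle k^2 \rangle = -\Phi''(0)$. The values of `lam`, `omega`, and the step `h` are illustrative assumptions, not values from the lecture.

```python
import numpy as np
from math import lgamma

lam = 3.5      # mean number of events (illustrative value)
omega = 0.7    # test frequency (illustrative value)

# Direct evaluation of <exp(i*omega*k)> by summing over the Poisson PMF.
# The PMF is built in log form to avoid overflow in k!.
k = np.arange(200)
log_pmf = -lam + k * np.log(lam) - np.array([lgamma(j + 1.0) for j in k])
cf_sum = np.sum(np.exp(log_pmf) * np.exp(1j * omega * k))

def phi(w):
    """Closed-form characteristic function from Equation 2."""
    return np.exp(lam * (np.exp(1j * w) - 1.0))

cf_closed = phi(omega)

# Moments from central finite differences of the CF at omega = 0:
# <k> = -i Phi'(0) and <k^2> = -Phi''(0).
h = 1e-4
mean_k = ((phi(h) - phi(-h)) / (2 * h) / 1j).real
second_k = (-(phi(h) - 2 * phi(0.0) + phi(-h)) / h**2).real
var_k = second_k - mean_k**2

print(abs(cf_sum - cf_closed), mean_k, var_k)
```

Both the mean and the variance should come out close to `lam`, as expected for a Poisson variable.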


Poisson Distributed Sources

Sky model:

\[ I(\theta) = \sum_j s_j\, h_j(\theta - \theta_j) \]

where $s_j$ = flux density, $h_j$ = source shape, and $\theta_j$ = source location. In the following we will assume all sources have the same shape and implicitly that they are point sources.

Measurements: The PSF is convolved with the sky model to produce the measured image, etc. in the simplest case. I.e. we ignore instrumental effects and calibration.

The measured quantity is

\[ J(\theta) = \sum_j s_j\, g(\theta - \theta_j) \]

where $g(\theta)$ is the PSF if all sources are point sources.

Flux density PDF: We assume a power law

\[ f_s(s) = K_s\, s^{-\gamma}, \qquad s_0 \le s \le s_1. \]

Sky density of sources: Let $\eta_\theta$ = the number of sources per unit solid angle.

Moments: As often is the case, we are interested in the mean and variance of the measured quantity $J$. Think of $J$ as a fluctuating quantity that is WSS over the sky and where the large number of weak sources produces a stochastic variation that can mask individual sources unless they are very strong. We want to know what the threshold should be to identify individual sources.


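Moments

Our sky model is basically a shot-noise model. There is a large literature on shot noise in a wide variety of contexts. Campbell's theorem tells us how to calculate the first and second moments of $J$. This theorem is proven by using the characteristic function of Poisson events and taking into account the PSF and source amplitude PDF.

Mean:

\[ \langle J \rangle = \left\langle \sum_j s_j\, g(\theta - \theta_j) \right\rangle = \eta_\theta \langle s \rangle \int d\theta\, g(\theta). \]

Second moment:

\[ \langle J^2 \rangle = \left\langle \left[ \sum_j s_j\, g(\theta - \theta_j) \right]^2 \right\rangle = \langle J \rangle^2 + \eta_\theta \langle s^2 \rangle \int d\theta\, g^2(\theta). \]

The two flux density moments are obtained by integrating over $f_s(s)$ to yield

\[ \langle s \rangle = K_s\, \frac{s_1^{2-\gamma} - s_0^{2-\gamma}}{2-\gamma}, \qquad \langle s^2 \rangle = K_s\, \frac{s_1^{3-\gamma} - s_0^{3-\gamma}}{3-\gamma}. \]

To avoid Olbers' paradox (a divergent mean intensity $\eta_\theta \langle s \rangle$ at fixed $\eta_\theta K_s$) we need $\gamma > 2$ if $s_1 \to \infty$ or $\gamma < 2$ if $s_0 \to 0$.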
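The flux-density moments above are easy to evaluate and cross-check numerically. The following is a minimal sketch that assumes $f_s$ is normalized to unit area on $[s_0, s_1]$ (so that $K_s$ follows from $\gamma$, $s_0$, $s_1$); the helper names and all numerical values, including the source density and the PSF integral, are hypothetical choices for illustration.

```python
import numpy as np

def powerlaw_norm(gamma, s0, s1):
    """K_s such that K_s * s**(-gamma) integrates to 1 on [s0, s1] (gamma != 1)."""
    return (1.0 - gamma) / (s1**(1.0 - gamma) - s0**(1.0 - gamma))

def powerlaw_moment(n, gamma, s0, s1):
    """<s^n> for the normalized power law (requires n + 1 - gamma != 0)."""
    p = n + 1.0 - gamma
    return powerlaw_norm(gamma, s0, s1) * (s1**p - s0**p) / p

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid version-specific NumPy names."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

gamma, s0, s1 = 2.5, 0.01, 10.0          # illustrative values
mean_s = powerlaw_moment(1, gamma, s0, s1)
mean_s2 = powerlaw_moment(2, gamma, s0, s1)

# Cross-check against numerical integrals on a logarithmic grid.
s = np.logspace(np.log10(s0), np.log10(s1), 20001)
pdf = powerlaw_norm(gamma, s0, s1) * s**(-gamma)
num_norm = trapezoid(pdf, s)
num_mean = trapezoid(s * pdf, s)
num_mean2 = trapezoid(s**2 * pdf, s)

# Confusion variance sigma_J^2 = eta_theta <s^2> int g^2 dtheta,
# with an assumed source density and an assumed value of the PSF integral.
eta_theta = 50.0
int_g2 = 1.0
var_J = eta_theta * mean_s2 * int_g2
print(mean_s, mean_s2, var_J)
```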


Issues

The first and second moments include all sources, weak and strong, and therefore are not useful for defining a threshold for detection.

We need to somehow separate the confusion fluctuations from multiple sources within the PSF from strong, individual sources that stand out.

From the above we have

\[ \sigma_J^2 = \eta_\theta \int ds\, s^2 f_s(s) \int d\theta\, g^2(\theta). \]

What we need to do is to cut off the integral over $s$.

For any given source we have $J = s\,g(\theta)$. We want to find $\sigma_J$ so that we cut off the integral at $s_{\max} = m\sigma_s$ where $m = 5$, say.

Since our observable is $J$, we really have a threshold $J_{\max} = m\sigma_J$ and therefore $s_{\max} = m\sigma_J / g(\theta)$.

The integral becomes

\[ \sigma_J^2 = \eta_\theta \int d\theta\, g^2(\theta) \int_{s_0}^{s_{\max} = m\sigma_J/g(\theta)} ds\, s^2 f_s(s). \]

Through a change of variable, and neglecting the contribution from the lower limit $s_0$ (which requires $\gamma < 3$), the result is obtained

\[ \sigma_J = m^{(3-\gamma)/(\gamma-1)} \left[ \frac{\eta_\theta K_s}{3-\gamma} \int d\theta\, g^{\gamma-1}(\theta) \right]^{1/(\gamma-1)}. \]
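Because $\sigma_J$ appears on both sides of the truncated integral, the closed form can be checked by solving the implicit equation iteratively. The sketch below does this for a one-dimensional Gaussian PSF with $s_0 \to 0$; the PSF shape, the 1-D angular coordinate, and the values of `gamma`, `m`, and `eta_K` (standing for $\eta_\theta K_s$) are assumptions made only for illustration.

```python
import numpy as np

gamma = 2.5      # power-law index (assumed, with 1 < gamma < 3)
m = 5.0          # threshold in units of sigma_J
eta_K = 0.3      # eta_theta * K_s (assumed value)
width = 1.0      # Gaussian PSF width (assumed PSF shape)

theta = np.linspace(-8.0 * width, 8.0 * width, 4001)
dtheta = theta[1] - theta[0]
g = np.exp(-0.5 * (theta / width)**2)

# Closed-form result quoted above.
int_g = np.sum(g**(gamma - 1.0)) * dtheta
sigma_closed = (m**((3.0 - gamma) / (gamma - 1.0))
                * (eta_K / (3.0 - gamma) * int_g)**(1.0 / (gamma - 1.0)))

# Fixed-point iteration on the implicit equation with s_0 -> 0:
# sigma^2 = eta_K * int dtheta g^2 * int_0^{m sigma / g} s^(2 - gamma) ds
sigma = 1.0
for _ in range(100):
    s_max = m * sigma / g
    inner = s_max**(3.0 - gamma) / (3.0 - gamma)
    sigma = np.sqrt(eta_K * np.sum(g**2 * inner) * dtheta)

print(sigma_closed, sigma)
```

The iteration converges here because the update scales as $\sigma^{(3-\gamma)/2}$, an exponent smaller than one for $\gamma > 1$.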


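Arrival Time Estimation from Matched Filtering

Least-squares solution and localization error

Consider discrete sampling of a template and measured profile from which we want to determine the amplitude $b$ and location $\tau$ through matched filtering (MF). The time and frequency domain quantities (indexed by $t$ and $k$, respectively) are:

\[ \begin{aligned}
s_t &= \text{template} &&\Longleftrightarrow\ s_k \\
m_t &= \text{model} = a + b\,s_{t-\tau} &&\Longleftrightarrow\ m_k \\
p_t &= \text{measured profile} = m_t + n_t &&\Longleftrightarrow\ p_k \\
n_t &= \text{noise} &&\Longleftrightarrow\ n_k,
\end{aligned} \]

where the Fourier transform is defined as $s_k = \sum_{t=0}^{N-1} s_t\, e^{-2\pi i t k/N}$.

Frequency-domain approach: Using weights $w_k$, minimize

\[ \chi^2 = \sum_{k=0}^{N-1} w_k \left| p_k - m_k \right|^2 = \sum_{k=0}^{N-1} w_k \left| p_k - aN\delta_{k0} - b\,s_k\, e^{-2\pi i k\tau/N} \right|^2. \]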


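We ignore the $k = 0$ term and consider only the first half of the array because time-domain quantities are real. So the cost function is (rewritten as $Q$)

\[ Q = \sum_{k=1}^{N/2} w_k \left[ |p_k|^2 + b^2 |s_k|^2 - 2b\, \mathrm{Re}\left\{ p_k s_k^{*}\, e^{+2\pi i k\tau/N} \right\} \right]. \tag{1} \]

Taking derivatives we can solve for $b$ and find an implicit equation for $\tau$.

\[ \partial_b Q = 2b \sum_{k=1}^{N/2} w_k |s_k|^2 - 2 \sum_{k=1}^{N/2} w_k\, \mathrm{Re}\left\{ p_k s_k^{*}\, e^{+2\pi i k\tau/N} \right\} = 0 \]

\[ \hat{b} = \frac{\displaystyle \sum_{k=1}^{N/2} w_k\, \mathrm{Re}\left\{ p_k s_k^{*}\, e^{+2\pi i k\tau/N} \right\}}{\displaystyle \sum_{k=1}^{N/2} w_k |s_k|^2} \]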


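\[ \partial_\tau Q = \frac{4\pi b}{N} \sum_{k=1}^{N/2} w_k\, k\, \mathrm{Im}\left\{ p_k s_k^{*}\, e^{+2\pi i k\tau/N} \right\} = 0 \]

$\hat{\tau}$ is the solution of

\[ \sum_{k=1}^{N/2} w_k\, k\, \mathrm{Im}\left\{ p_k s_k^{*}\, e^{+2\pi i k\tau/N} \right\} = 0. \tag{2} \]

Another approach: The same implicit equation for $\tau$ (Equation 2) can be obtained by multiplying the weighted DFT of the cross correlation, $w_k p_k s_k^{*}$, by the phasor $e^{2\pi i k\tau/N}$ and finding the best value of $\tau$:

\[ Q(\tau) = \sum_{k=1}^{N/2} w_k\, p_k s_k^{*}\, e^{2\pi i k\tau/N} = b \sum_{k=1}^{N/2} w_k |s_k|^2\, e^{2\pi i k(\tau - \hat{\tau})/N}, \tag{3} \]

where the second form holds for noise-free data $p_k = b\,s_k\, e^{-2\pi i k\hat{\tau}/N}$.

When $\tau = \hat{\tau}$ the sum is maximized and real,

\[ Q_{\max} = Q(\hat{\tau}) = \hat{b} \sum_{k=1}^{N/2} w_k |s_k|^2. \]

Another solution for the scale factor is

\[ \hat{b} = \frac{Q_{\max}}{\displaystyle \sum_{k=1}^{N/2} w_k |s_k|^2}. \]

$\hat{\tau}$ is the solution of $\partial_\tau \mathrm{Re}\{Q(\tau)\} = 0$, which is Equation 2. For noise-free data the condition $\mathrm{Im}\{Q(\tau)\} = 0$ is satisfied at the same $\hat{\tau}$.

Parameter errors:

Expand $Q$ from Equation 1 to second order about the minimum, where the first derivatives vanish:

\[ \begin{aligned}
Q(b, \tau) &\approx Q_{\min} + \partial_b Q\, \delta b + \partial_\tau Q\, \delta\tau + \partial_b^2 Q\, (\delta b)^2 + \partial_\tau^2 Q\, (\delta\tau)^2 + 2\, \partial^2_{b\tau} Q\, \delta b\, \delta\tau \\
&= Q_{\min} + \partial_b^2 Q\, (\delta b)^2 + \partial_\tau^2 Q\, (\delta\tau)^2 + 2\, \partial^2_{b\tau} Q\, \delta b\, \delta\tau.
\end{aligned} \]

Scale factor:

Defining $\sigma_b$ as the error along the $b$ axis in the $b$-$\tau$ plane and using

\[ \partial_b^2 Q = 2 \sum_{k=1}^{N/2} w_k |s_k|^2, \]

we have

\[ Q = Q_{\min} + \partial_b^2 Q\, (\delta b)^2 \]

\[ \langle Q \rangle = \langle Q_{\min} \rangle + \partial_b^2 Q\, \langle (\delta b)^2 \rangle \equiv \langle Q_{\min} \rangle + \partial_b^2 Q\, \sigma_b^2 \]

yielding

\[ \sigma_b^2 = \frac{\langle Q \rangle - \langle Q_{\min} \rangle}{\partial_b^2 Q} = \frac{\langle Q \rangle - \langle Q_{\min} \rangle}{2 \displaystyle\sum_{k=1}^{N/2} w_k |s_k|^2} = \frac{1}{2 \displaystyle\sum_{k=1}^{N/2} w_k |s_k|^2}, \]

where the rightmost equation results from defining the $1\sigma$ error as the contour of $Q$ that is one unit above the minimum.

It is reasonable to define the weights in terms of the additive noise, so $w_k = 1/\sigma_k^2$ and if they are all the same (as for white noise), then

\[ \sigma_b^2 = \frac{\sigma^2}{2 \displaystyle\sum_{k=1}^{N/2} |s_k|^2} = \frac{N \sigma_t^2}{2 \displaystyle\sum_{k=1}^{N/2} |s_k|^2}, \]

where all $\sigma_k$ have been set equal to $\sigma$ and then related to the rms noise level $\sigma_t$ in the time series.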


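Arrival time:

Using the same approach we have

\[ \sigma_\tau^2 = \frac{\langle Q \rangle - \langle Q_{\min} \rangle}{\partial_\tau^2 Q} = \frac{1}{\partial_\tau^2 Q}. \]

From earlier expressions we get

\[ \sigma_\tau^2 = \frac{1}{2 \left( \dfrac{2\pi}{N} \right)^2 \hat{b} \displaystyle\sum_{k=1}^{N/2} w_k\, k^2\, \mathrm{Re}\left\{ p_k s_k^{*}\, e^{+2\pi i k\hat{\tau}/N} \right\}}. \]

Note that the units of $\tau$ are in sample numbers. To get time units, we need to multiply by the sample interval, $\Delta t$.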


Template Fitting of Millisecond Pulsars

•  Pulse widths range from ~25 to 1000 μs
•  In the best cases, timing precision is ~50 ns (a factor of 1/500 of the pulse width)
•  Templates (average pulse shapes) appear to be stable over decades so pulsars can be used as precise clocks
•  Departures from the templates occur because of fluctuations at the single pulse level


Templates for MSPs


[Figure: timing characterization of NANOGrav MSPs at a frequency band of 1400 MHz. Rows are labeled by pulsar name and spin period P (ms), e.g. B1937+21 (1.56 ms) and J1713+0747 (4.57 ms). Plotted quantities: σ_30min (µs, logarithmic axis from 10^-4 to 10^1) and σ_J,1/FWHM (linear axis from 0.0 to 3.0).]

Characterization of all MSPs used by NANOGrav for precision timing. Most MSPs are limited by template fitting errors (finite S/N) but a few are limited by departures of pulse shapes from the template by pulsar-intrinsic effects.


Lecture 21


Implementation

The method is easily implemented in python. Specifically, a nonlinear solver is used to minimize the cost function written here as

\[ Q = \sum_{k=1}^{N/2} \left| p_k - m_k \right|^2 = \sum_{k=1}^{N/2} \left| p_k - b\,s_k\, e^{-2\pi i k\tau/N} \right|^2, \]

where we ignore the $k = 0$ term so that any mean offset of the template is irrelevant. We also use only the first half of the array because data and template are real. The two parameter fit yields the scale factor $b$ and time offset $\tau$. Errors are calculated as above.

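    import scipy.optimize as spo
    from numpy import array
    from numpy.fft import fft

    tfft = fft(template)
    pfft = fft(profile)
    # Initial guesses: bccf, tauccf and ishift are assumed to come from a
    # cross-correlation (CCF) step that is not shown here.
    bhat0 = bccf
    tauhat0 = tauccf + ishift
    paramvec0 = array((bhat0, tauhat0))

    # leastsq returns a tuple whose first element is the solution vector.
    paramvec = spo.leastsq(tfresids, paramvec0, args=(tfft, pfft))
    bhat = paramvec[0][0]
    tauhat = paramvec[0][1]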

The function leastsq in scipy.optimize starts from an initial guess of the parameters, paramvec0.

As such, the returned result can be at a local minimum rather than the global minimum.

How to check: try different starting values for paramvec0.
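The fit and the starting-value check can be tried end to end on simulated data. The following is a minimal, self-contained sketch rather than the course code: the Gaussian template, the noise level, the true parameters, and the helper `resids` are all assumptions for illustration, and the residual vector stacks real and imaginary parts (a substitution for the absolute-value residuals used in `tfresids`) so that the solver sees smooth residuals.

```python
import numpy as np
from scipy.optimize import leastsq

rng = np.random.default_rng(1)
N = 256
t = np.arange(N)
template = np.exp(-0.5 * ((t - N / 2) / 10.0)**2)   # Gaussian, width 10 bins

# Simulated profile: scaled template shifted by a fractional number of bins
# (Fourier shift theorem) plus white noise. All values are illustrative.
b_true, tau_true, sigma_t = 2.0, 3.3, 0.01
k_signed = np.fft.fftfreq(N) * N
shifted = np.fft.ifft(np.fft.fft(template)
                      * np.exp(-2j * np.pi * k_signed * tau_true / N)).real
profile = b_true * shifted + sigma_t * rng.standard_normal(N)

tfft = np.fft.fft(template)
pfft = np.fft.fft(profile)
k = np.arange(1, N // 2)          # skip k = 0, use the first half only

def resids(params, tfft, pfft):
    """Real and imaginary parts of p_k - b s_k exp(-2 pi i k tau / N)."""
    b, tau = params
    diff = pfft[k] - b * tfft[k] * np.exp(-2j * np.pi * k * tau / N)
    return np.concatenate((diff.real, diff.imag))

# Initial guess from the peak of the cross-correlation function (CCF).
ccf = np.fft.ifft(pfft * np.conj(tfft)).real
lag0 = int(np.argmax(ccf))
if lag0 > N // 2:
    lag0 -= N                     # interpret large lags as negative shifts
b0 = ccf.max() / np.sum(template**2)

(bhat, tauhat), ier = leastsq(resids, (b0, float(lag0)), args=(tfft, pfft))

# Repeat from different starting offsets to check for local minima.
starts = [(b0, lag0 + d) for d in (-2.0, 0.0, 2.0)]
fits = [leastsq(resids, p0, args=(tfft, pfft))[0] for p0 in starts]
print(bhat, tauhat)
```

If all starting values return the same `tauhat`, that is some evidence (not proof) that the solver has found the global minimum.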


    from numpy import abs, arange, cos, pi, sin, size, sqrt, sum

    def tfresids(params, tfft, pfft):
        """
        Calculates residuals between scaled and rotated template and data.
        """
        b = params[0]
        tau = params[1]
        Nfft = size(pfft)
        Nsum = Nfft // 2        # integer division so Nsum is a valid slice index
        arg = (2.*pi*tau/float(Nfft)) * arange(0., Nfft, 1.)
        phasevec = cos(arg) - 1j*sin(arg)       # exp(-2 pi i k tau / N)
        resids = abs(pfft[1:Nsum] - b*tfft[1:Nsum]*phasevec[1:Nsum])
        return resids

    def toa_errors_additive(tfft, b, sigma_t):
        """
        Calculates error in b = scale factor and tau = TOA due to additive noise.

        input:
            fft of template
            b = fit value for scale factor
            sigma_t = rms additive noise in time domain
        output (in the order returned):
            sigma_tau
            sigma_b
        """
        Nfft = size(tfft)
        Nsum = Nfft // 2
        kvec = arange(1, Nsum)
        sigma_b = sigma_t*sqrt(float(Nfft) / (2.*sum(abs(tfft[1:Nsum])**2)))
        sigma_tau = (sigma_t*Nfft/(2.*pi*abs(b))) \
            * sqrt(float(Nfft) \
            / (2.*sum(kvec**2*abs(tfft[1:Nsum])**2)))
        return sigma_tau, sigma_b


About 700 lines of code.


[Figure: Analysis of template fitting errors for simulated Gaussian profile. Axes: σ_TOA (bins) versus S/N, both logarithmic. Curves: Fourier RMS, Fourier Total, CCF RMS, CCF Total, MF Prediction.]

Template fitting errors for Gaussian pulse with width of 10 bins vs. S/N (100 realizations).
RMS = std(best fit)
Total = total mean-square difference from true TOA
MF prediction = predicted RMS error for matched filtering.

Parabolic interpolation of the CCF gives the same standard deviation as the Fourier approach and the theoretical MF error. But there is a systematic deviation from the true TOA that causes the large total error for parabolic interpolation.


Variance vs Mean Square Error for an Estimator

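Let $\hat{\theta}$ = estimator for a parameter $\theta$.

The estimator has a mean $\langle \hat{\theta} \rangle$ and variance $\mathrm{Var}(\hat{\theta}) = \left\langle (\hat{\theta} - \langle \hat{\theta} \rangle)^2 \right\rangle$.

Estimators may be biased:

Let $B$ = bias = $\langle \hat{\theta} \rangle - \theta$.

We can have the situation where the variance of the estimator is small but the mean-square error (MSE) is large because systematic errors are sizable.

\[ \begin{aligned}
\mathrm{MSE} &= \left\langle (\hat{\theta} - \theta)^2 \right\rangle \\
&= \left\langle \left( \hat{\theta} - \langle \hat{\theta} \rangle + \langle \hat{\theta} \rangle - \theta \right)^2 \right\rangle \\
&= \left\langle (\hat{\theta} - \langle \hat{\theta} \rangle)^2 \right\rangle + \left\langle (\langle \hat{\theta} \rangle - \theta)^2 \right\rangle + 2 \left\langle \left( \hat{\theta} - \langle \hat{\theta} \rangle \right) \left( \langle \hat{\theta} \rangle - \theta \right) \right\rangle \\
&= \mathrm{Var}(\hat{\theta}) + B^2.
\end{aligned} \]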
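The decomposition can be illustrated with a deliberately biased estimator. The sketch below uses the variance estimator with $1/N$ normalization on Gaussian samples, whose bias is $-\sigma^2/N$; the choice of estimator, sample size, and number of trials are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2_true = 4.0             # true variance (illustrative value)
N, ntrials = 10, 200000       # samples per trial, number of trials

x = rng.normal(0.0, np.sqrt(sigma2_true), size=(ntrials, N))

# Variance estimator with 1/N normalization (ddof=0), which is biased low.
est = x.var(axis=1)

bias = est.mean() - sigma2_true          # expected to be near -sigma2_true / N
var = est.var()                          # scatter of the estimator about its mean
mse = np.mean((est - sigma2_true)**2)    # scatter about the true value

print(bias, var, mse, var + bias**2)
```

For these sample moments the identity MSE = Var + B² holds to rounding error, while the bias itself is only recovered to within Monte Carlo scatter.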


Cramer-Rao Bound on the Variance of an Unbiased Estimator

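For a derivation see http://en.wikipedia.org/wiki/Cramer-Rao_bound.

There is a lower bound on the variance of an unbiased estimator $\hat{\theta}$ that involves the Fisher information.

Let

$\theta$ = parameter
$\mathcal{L}(\theta)$ = likelihood function
$L(\theta) = \ln \mathcal{L}(\theta)$ = log likelihood
$f_X(x)$ = PDF of data $x$

The Fisher information is

\[ I(\theta) \equiv \left\langle \left( \frac{\partial L(\theta; x)}{\partial \theta} \right)^2 \right\rangle \equiv - \left\langle \frac{\partial^2 L(\theta; x)}{\partial \theta^2} \right\rangle \]

where $\langle \cdots \rangle$ denotes averaging over the PDF of $x$. The two expressions can be shown to be equivalent by manipulating the relevant integrals.

The Cramer-Rao bound applies to an unbiased estimator:

\[ \mathrm{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}. \]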


Gaussian Example

Single datum from a Gaussian PDF $N(\mu, \sigma^2)$ where the variance $\sigma^2$ is known so the parameter to be found is $\theta = \mu$.

The log likelihood is simply

\[ L(\mu; x) = -\frac{1}{2} \ln(2\pi\sigma^2) - \frac{(x - \mu)^2}{2\sigma^2}. \]

This gives

\[ \left\langle \frac{\partial^2 L(\mu; x)}{\partial \mu^2} \right\rangle = -\frac{1}{\sigma^2} \]

which implies

\[ \mathrm{Var}(\hat{\mu}) \ge -\frac{1}{-1/\sigma^2} = \sigma^2. \]

In this simple case an estimator for the mean is simply $\hat{\mu} = x$ (the single data point), so the result is not surprising.

Now suppose we have $N$ i.i.d. samples $\mathbf{x}$ from the same Gaussian:

\[ L(\mu; \mathbf{x}) = \sum_{j=1}^{N} L_j(\mu; x_j) = \sum_{j=1}^{N} \ln f_x(x_j; \mu, \sigma^2). \]

In this case

\[ I(\theta) = \frac{N}{\sigma^2} \]

which says that the minimum variance is

\[ \mathrm{Var}(\hat{\mu}) \ge \frac{\sigma^2}{N}. \]

Lo and behold we have discovered the $\sqrt{N}$ law again!

The utility of the C-R bound is that in more complex situations, you can estimate what the best performance is for an estimator and compare with what you are actually getting.
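The comparison described above can be sketched with a quick Monte Carlo experiment for the Gaussian case: the sample mean should sit at the bound $\sigma^2/N$, while another unbiased estimator such as the sample median should have a larger variance. The parameter values and the choice of the median as the comparison estimator are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.0, 2.0          # true mean and known rms (illustrative values)
N, ntrials = 25, 100000       # samples per trial, number of trials

x = rng.normal(mu, sigma, size=(ntrials, N))

crb = sigma**2 / N                         # Cramer-Rao bound for the mean
var_mean = x.mean(axis=1).var()            # variance of the sample mean
var_median = np.median(x, axis=1).var()    # variance of the sample median

print(crb, var_mean, var_median)
```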


Prewhitening

What is Prewhitening? Prewhitening is an operation that processes a time series (or some other data sequence) to make it behave statistically like white noise. The 'pre' means that whitening precedes some other analysis that likely works better if the additive noise is white.

These operations can be viewed in either the time domain or the frequency domain:

1. Make the ACF of the time series appear more like a delta function.

2. Make the spectrum appear flat.

Example data sets that may require prewhitening:

1. A well behaved noise process with a low frequency (or polynomial) trend added to it.

2. A deterministic signal with an additive red-noise process.

Viewed in the frequency domain, prewhitening means that the dynamic range of the measured data is reduced.


Why bother? Recall from our discussions of spectral analysis the issues of leakage and bias. These arise from sidelobes inherent to spectral estimation. We can minimize leakage in two ways: (1) make sidelobes smaller and (2) minimize the power that is prone to leaking into sidelobes. Spectral windows address the former while prewhitening mitigates the latter. Leakage into sidelobes also constitutes bias in spectral estimates. However, bias appears in other data analysis procedures.

Consider least-squares fitting of a sinusoid to a signal of the form

\[ x(t) = A \cos(\omega t + \phi) + r(t) + n(t), \]

where $n(t)$ is WSS white noise and $r(t)$ is red noise with a steep power spectrum. Red noise can strongly bias fitting of a model $x(t) = A \cos(\omega t + \phi)$ because its power can leak across the underlying spectrum, causing a least-squares fit to give highly discrepant values of $A$, $\omega$, and $\phi$.

Prewhitening of the time series ideally would yield a transformed time series of the form

\[ x'(t) = A' \cos(\omega t + \phi) + n'(t) \]

to which fitting a sinusoidal model will be less biased.


Procedures:

We have already seen one analysis that is related to prewhitening: the matched filter (MF). The MF doesn't whiten the spectrum of the output but it does weight the frequency components of the measured quantity to maximize the S/N of the signal.

The signal model in this case is $x(t) = a_0 A(t) + n(t)$. Recall for an arbitrary spectrum $S_n(f)$ for additive noise that the frequency-domain MF for a signal $A(t)$ is

\[ h(f) \propto \frac{A(f)}{S_n(f)}. \]

Taking equality for simplicity, when the filter is applied to the measurements $x(t)$, we have

\[ y(f) = x(f)\, h^{*}(f) \propto \frac{a_0 |A(f)|^2}{S_n(f)} + \frac{n(f) A^{*}(f)}{S_n(f)}. \]

This means that the ensemble-average spectrum of the filter output is

\[ \begin{aligned}
\left\langle |y(f)|^2 \right\rangle &= \frac{a_0^2 |A(f)|^4}{S_n^2(f)} + \frac{\left\langle |n(f)|^2 \right\rangle |A(f)|^2}{S_n^2(f)} \\
&= \frac{a_0^2 |A(f)|^4}{S_n^2(f)} + \frac{S_n(f) |A(f)|^2}{S_n^2(f)} \\
&= \frac{a_0^2 |A(f)|^4}{S_n^2(f)} + \frac{|A(f)|^2}{S_n(f)} \\
&= \frac{|A(f)|^2}{S_n(f)} \left[ \frac{a_0^2 |A(f)|^2}{S_n(f)} + 1 \right].
\end{aligned} \]


Signals with trends: A common situation is where a quantity of the form $a_0 A(t) + n(t)$ is superposed with a strong trend, such as a baseline variation. Similar issues arise in measurements of spectra.

Consequences of trends include:

1. Bias in estimating parameters of $A(t - t_0)$ or its spectral analog $A(\nu - \nu_0)$.

2. Erroneous estimates of cross correlations between two time series such as

\[ x(t) = s_1(t) + n_1(t) \quad \text{and} \quad y(t) = s_2(t) + n_2(t), \]

where $s_{1,2}$ are signals of interest and $n_{1,2}$ are measurement errors. I.e. we may be interested in the correlation

\[ C = \frac{1}{N_t} \sum_t s_1(t)\, s_2(t) \quad \text{or} \quad C = \frac{1}{N_t} \sum_t \left[ s_1(t) - \bar{s}_1 \right] \left[ s_2(t) - \bar{s}_2 \right] \]

where $\bar{s}_{1,2} = (1/N_t) \sum_t s_{1,2}(t)$ are the sample means.

If there are trends $p_{1,2}(t)$ added to $x(t)$ and $y(t)$, the correlation $\hat{C}$ of $x$ and $y$ used to estimate $C$ may be dominated completely by the trends and not the signal parts of the measurements.

A fix: Trends can often be modeled as a polynomial of some order that can be fitted to the measurements. The order of the polynomial needs to be chosen 'wisely.' For a pulse or spectral line confined to some range of $t$ or $\nu$ this is straightforward. But for a detection problem where the signal location is not known, the situation is very tricky.
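For the straightforward case of a pulse confined to a known range, the polynomial fix can be sketched as follows: fit a low-order polynomial to the off-pulse samples only and subtract it everywhere. The quadratic trend, pulse shape, noise level, and off-pulse window below are all hypothetical choices, and the polynomial order is assumed known.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 512
t = np.linspace(-1.0, 1.0, N)      # time scaled to [-1, 1] for a well-conditioned fit

# Simulated data: pulse + quadratic baseline trend + white noise (illustrative).
trend = 5.0 + 3.0 * t - 4.0 * t**2
pulse = 2.0 * np.exp(-0.5 * ((t - 0.1) / 0.03)**2)
sigma_n = 0.1
x = pulse + trend + sigma_n * rng.standard_normal(N)

# Fit the trend using only samples away from the known pulse location,
# so that the pulse itself does not bias the baseline estimate.
off = np.abs(t - 0.1) > 0.15
coef = np.polyfit(t[off], x[off], 2)      # coefficients, highest power first
detrended = x - np.polyval(coef, t)

print(coef, detrended[off].std(), detrended.max())
```

After subtraction the off-pulse rms should be close to the noise level and the pulse amplitude should be largely preserved. When the pulse location is unknown, no such clean off-pulse mask is available, which is the tricky case noted above.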


Prewhitening filter: Consider again $x(t) = a_0 A(t) + n(t)$ and let's trivially construct a frequency-domain filter that whitens the measurements.

We want a filter $h(t)$ that flattens the noise $n(t)$ in the frequency domain. Let $y(t) = x(t) \otimes h(t)$ where $\otimes$ means convolution. All we need is $h(f) = 1/\sqrt{S_n(f)}$. Then the ensemble spectrum of the output $y(f)$ is

\[ \begin{aligned}
\left\langle |y(f)|^2 \right\rangle &= \left\langle |x(f)|^2 \right\rangle \left\langle |h(f)|^2 \right\rangle \\
&= \frac{\left\langle |x(f)|^2 \right\rangle}{S_n(f)} \\
&= \frac{a_0^2 \left\langle |A(f)|^2 \right\rangle}{S_n(f)} + 1.
\end{aligned} \]

Note how this differs from the result for a matched filter. But the result is that in the mean the spectrum of the additive noise has been flattened.

Prewhitening is important in both detection and estimation applications.
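A minimal numerical sketch of this filter, assuming the noise spectrum $S_n(f)$ is known exactly: red-plus-white noise is synthesized by shaping white noise in the Fourier domain, and the filter $1/\sqrt{S_n(f)}$ is then applied. The spectral shape, array sizes, and band definitions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, nreal = 1024, 200
f = np.fft.rfftfreq(N, d=1.0)

# Hypothetical noise spectrum: f^-2 red noise plus a unit white floor.
# The f = 0 bin is set to 1 to avoid dividing by zero.
Sn = np.ones_like(f)
Sn[1:] = 1.0 + (0.02 / f[1:])**2

# Synthesize noise realizations with spectrum Sn by shaping white noise.
white = rng.standard_normal((nreal, N))
noise = np.fft.irfft(np.fft.rfft(white, axis=1) * np.sqrt(Sn), n=N, axis=1)

# Prewhitening filter h(f) = 1 / sqrt(Sn(f)) applied in the frequency domain.
whitened = np.fft.irfft(np.fft.rfft(noise, axis=1) / np.sqrt(Sn), n=N, axis=1)

# Ensemble-average spectra before and after whitening.
spec_before = np.mean(np.abs(np.fft.rfft(noise, axis=1))**2, axis=0) / N
spec_after = np.mean(np.abs(np.fft.rfft(whitened, axis=1))**2, axis=0) / N

# Dynamic range: ratio of mean power in the lowest and highest frequency bins.
ratio_before = spec_before[1:11].mean() / spec_before[-100:].mean()
ratio_after = spec_after[1:11].mean() / spec_after[-100:].mean()
print(ratio_before, ratio_after)
```

The large low-to-high frequency ratio before filtering collapses to roughly unity afterwards, which is the reduction in dynamic range described earlier. With real data $S_n(f)$ must itself be estimated, which this sketch sidesteps.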


Leakage and Bias


Prewhitening in the least-squares estimation context:

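Consider our standard linear model

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\theta} + \mathbf{n}, \]

which has a least-squares solution for the parameter vector

\[ \hat{\boldsymbol{\theta}} = \left( \mathbf{X}^{\dagger} \mathbf{C}_n^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{\dagger} \mathbf{C}_n^{-1} \mathbf{y}, \]

where the covariance matrix of the noise vector $\mathbf{n}$ is

\[ \mathbf{C}_n = \left\langle \mathbf{n} \mathbf{n}^{\dagger} \right\rangle. \]

This is also the maximum likelihood solution in the right circumstances (which are?).

As with any covariance matrix, $\mathbf{C}_n$ is Hermitian and positive semi-definite. This means that the quadratic form for an arbitrary vector $\mathbf{z}$ satisfies

\[ \mathbf{z}^{\dagger} \mathbf{C}_n \mathbf{z} \ge 0. \]

Such matrices can always be factored according to the Cholesky decomposition:

\[ \mathbf{C}_n = \mathbf{L} \mathbf{L}^{\dagger} \]

where $\mathbf{L}$ is a lower-triangular matrix; e.g.

\[ \mathbf{L} = \begin{pmatrix} a & 0 & 0 & 0 \\ b & c & 0 & 0 \\ d & e & f & 0 \\ g & h & i & j \end{pmatrix}. \]

See http://en.wikipedia.org/wiki/Cholesky_decomposition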


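Utility: we can transform the model as follows using $\mathbf{L}$:

\[ \mathbf{y} = \mathbf{L} \mathbf{y}_w, \qquad \mathbf{X} = \mathbf{L} \mathbf{X}_w. \]

Substituting into the solution vector for $\boldsymbol{\theta}$ and using

\[ \mathbf{y}^{\dagger} = (\mathbf{L} \mathbf{y}_w)^{\dagger} = \mathbf{y}_w^{\dagger} \mathbf{L}^{\dagger}, \quad \mathbf{X}^{\dagger} = (\mathbf{L} \mathbf{X}_w)^{\dagger} = \mathbf{X}_w^{\dagger} \mathbf{L}^{\dagger}, \quad \text{and} \quad \mathbf{C}_n^{-1} = (\mathbf{L} \mathbf{L}^{\dagger})^{-1} = \mathbf{L}^{\dagger -1} \mathbf{L}^{-1} \]

yields

\[ \begin{aligned}
\hat{\boldsymbol{\theta}} &= \left( \mathbf{X}^{\dagger} \mathbf{C}_n^{-1} \mathbf{X} \right)^{-1} \mathbf{X}^{\dagger} \mathbf{C}_n^{-1} \mathbf{y} \\
&= \Big( \mathbf{X}_w^{\dagger} \underbrace{\mathbf{L}^{\dagger} \mathbf{C}_n^{-1} \mathbf{L}}_{\equiv\, \mathbf{I}} \mathbf{X}_w \Big)^{-1} \mathbf{X}_w^{\dagger} \underbrace{\mathbf{L}^{\dagger} \mathbf{C}_n^{-1} \mathbf{L}}_{\equiv\, \mathbf{I}} \mathbf{y}_w \\
&= \left( \mathbf{X}_w^{\dagger} \mathbf{X}_w \right)^{-1} \mathbf{X}_w^{\dagger} \mathbf{y}_w.
\end{aligned} \]

So what? The solution is identical to the least-squares case where the noise covariance matrix is diagonal; i.e. the noise vector $\mathbf{n}_w = \mathbf{L}^{-1} \mathbf{n}$ has been transformed to white noise. We have whitened the data.

When is this useful? An example is the fitting of a sinusoidal function amid red noise where leakage effects are important just as they are for spectral analysis. A specific example is the fitting of astrometric parameters or periodicities in radial velocity data.

What's the catch? You need to know the covariance matrix of the noise $\mathbf{n}$ to do the Cholesky decomposition. This can be easier said than done!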
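The equivalence between the covariance-weighted solution and ordinary least squares on whitened quantities can be checked numerically. The sketch below assumes the noise covariance is known exactly and uses a hypothetical exponential-plus-white covariance, a two-parameter sinusoid model, and `numpy.linalg.cholesky` (which returns the lower-triangular factor) in place of the SciPy call; none of these specific choices come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
t = np.arange(N)

# Hypothetical noise covariance: exponentially correlated (red) plus white.
Cn = 4.0 * np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0) + 0.25 * np.eye(N)
L = np.linalg.cholesky(Cn)                 # lower triangular, Cn = L L^T

# Linear model: cosine and sine amplitudes at a known period (assumed values).
P = 16.0
X = np.column_stack((np.cos(2 * np.pi * t / P), np.sin(2 * np.pi * t / P)))
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + L @ rng.standard_normal(N)   # correlated noise via L

# Covariance-weighted least-squares solution.
Cinv = np.linalg.inv(Cn)
theta_gls = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)

# Whitened quantities y_w = L^-1 y and X_w = L^-1 X, then ordinary least squares.
yw = np.linalg.solve(L, y)
Xw = np.linalg.solve(L, X)
theta_w, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# Covariance of the whitened noise, L^-1 Cn L^-T, should be the identity.
Cw = np.linalg.solve(L, np.linalg.solve(L, Cn).T)
print(theta_gls, theta_w)
```

Solving the triangular systems rather than forming $\mathbf{L}^{-1}$ explicitly is a design choice made here for numerical hygiene; a dedicated triangular solver would be more efficient for large $N$.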


Examples of sine wave + red and white noise

Examples were generated with a signal

\[ y(t) = \cos(2\pi t/P + \phi) + r(t)/\mathrm{snr}_r + w(t)/\mathrm{snr}_w \]

where $r$, $w$ have unit variance and are scaled by the signal to noise ratios $\mathrm{snr}_r$ and $\mathrm{snr}_w$, respectively.

The covariance matrix for the combined noise $\mathbf{n} = \mathbf{r} + \mathbf{w}$ was calculated by averaging $\mathbf{C}_n = \langle \mathbf{n} \mathbf{n}^{\dagger} \rangle$ over 1000 realizations.

Note that for some real situations where we have only a single time series, we would need to calculate $\mathbf{C}_n$ differently, e.g. from first principles, prior knowledge, etc.

In practice, realizations of $r$ were generated and the mean subtracted. Then white noise was added to form $\mathbf{n}$ and then the Cholesky decomposition was done using the command

    L = scipy.linalg.cholesky(Cn, lower=True)

For data vectors of length $N$, the lower-triangular matrix $\mathbf{L}$ is $N \times N$. If the mean had been subtracted from the white noise as well, the rank of the covariance matrix would be $N - 1$ and the decomposition would fail.

Results in the following figures indicate that

1. Power-law red noise with spectral indices $s_i \lesssim 2$ does not benefit particularly from whitening because leakage is much less.

2. What matters is the signal to noise ratio of the cosine relative to the noise contained in one resolution bandwidth $\Delta f \sim T^{-1}$ centered on the frequency of the sinusoid. For a steep power law, only a small fraction of the total power in the red noise is in this band whereas the flatter the spectrum, the larger this fraction is.


[Figure panels: Time Series (Signal+Noise and Noise only versus Time in bins) and Spectra (versus Frequency in bins, logarithmic axes). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 1.0, S/Nr = 0.01, S/Nw = 1.00]

Figure 1: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Figure panels: Time Series (Signal+Noise and Noise only versus Time in bins) and Spectra (versus Frequency in bins, logarithmic axes). Panel title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 1.0, S/Nr = 0.50, S/Nw = 1.00]

Figure 2: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 2.0, S/Nr = 0.01, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 3: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 2.0, S/Nr = 0.10, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 4: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 2.0, S/Nr = 0.50, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 5: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 3.0, S/Nr = 0.01, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 6: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 5.0, S/Nr = 0.01, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 7: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 5.0, S/Nr = 0.10, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 8: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


[Four-panel figure. Plot title: Cholesky whitening: N = 256, Sine+RN+WN, Si = 5.0, S/Nr = 0.50, S/Nw = 1.00. Left panels: Time Series, Signal+Noise (top) and Noise only (bottom) vs. Time (bins). Right panels: Spectra vs. Frequency (bins), logarithmic axes.]

Figure 9: Example of whitening using the Cholesky decomposition. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences.


Impulse Response and Spectrum of Whitening Filter

We can think of the Cholesky decomposition as a filter that suppresses low frequencies for the purpose of estimating the parameters of a sinusoid. The filter response can be calculated from the impulse response as follows:

Construct a data vector $\mathbf{i}$ with $i_j = 0$ for all $j$ except $j = j_0$, where $i_{j_0} = 1$.

Then the impulse response is $\mathbf{h} = \mathbf{L}^{-1}\mathbf{i}$. Expressed as a time function $h_j$, $j = 1, \ldots, N$, the frequency-domain response is the squared magnitude of the DFT $\tilde{h}_k$ of $h_j$:

$$H_k = |\tilde{h}_k|^2$$
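The recipe above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the figures: L is taken to be the lower-triangular Cholesky factor of a known noise covariance matrix C = L Lᵀ, and the covariance used here (exponentially correlated red noise plus white noise, with hypothetical values for `red_var`, `white_var` and `corr_time`) is chosen only because it is simple and positive definite.

```python
import numpy as np

n = 256
lags = np.arange(n)

# Hypothetical noise model: exponentially correlated red noise plus white noise
red_var, white_var, corr_time = 100.0, 1.0, 50.0
acf = red_var * np.exp(-lags / corr_time)
acf[0] += white_var                              # white noise adds at zero lag only
C = acf[np.abs(lags[:, None] - lags[None, :])]   # Toeplitz covariance matrix
L = np.linalg.cholesky(C)                        # lower triangular, C = L @ L.T

# Data vector that is zero everywhere except at j = j0
j0 = n // 2
impulse = np.zeros(n)
impulse[j0] = 1.0

# Impulse response h = L^{-1} i, obtained by solving L h = i
h = np.linalg.solve(L, impulse)

# Frequency-domain response: squared magnitude of the DFT of h
H = np.abs(np.fft.rfft(h)) ** 2
H_impulse = np.abs(np.fft.rfft(impulse)) ** 2    # flat spectrum of the input impulse
```

Since the inverse of a lower-triangular matrix is also lower triangular, `h` vanishes for j < j0, so the filter is causal. For this assumed noise model `H` is much smaller at the lowest frequencies than at the highest, which is the low-frequency suppression described in the text.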


Figure 10: Example of whitening using the Cholesky decomposition along with the impulse response and its spectrum. The signal consists of a sine wave with period of 10.23 time bins with additive red and white noise. Signal-to-noise ratios of the signal relative to each kind of noise are given. Left figure: Top left: original time series (red) and whitened time series (black). Bottom left: original noise (red) and whitened noise (black). Top right: power spectra of the original and whitened time series. Bottom right: power spectra of original and whitened noise sequences. Right figure: Top panel: input impulse (red) and impulse response of the Cholesky filter. Bottom panel: spectra of the impulse and impulse response, respectively. The filter shows the suppression of frequencies below about 25 bins; this frequency is signal-to-noise ratio dependent.
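The dependence of the suppression frequency on the noise levels can be explored with a short sketch like the following. Everything specific in it is an assumption made for illustration: the exponentially correlated red-noise model, the parameter values, the helper names `whitening_response` and `cutoff_bin`, and the definition of the cutoff as the first frequency bin where the filter response reaches half of its value at the highest frequency. It is not expected to reproduce the value of about 25 bins quoted in the caption.

```python
import numpy as np

def whitening_response(red_var, n=256, white_var=1.0, corr_time=50.0):
    """Spectrum H_k of the Cholesky whitening filter for an assumed noise model."""
    lags = np.arange(n)
    acf = red_var * np.exp(-lags / corr_time)    # hypothetical red-noise autocovariance
    acf[0] += white_var                          # white noise adds at zero lag only
    C = acf[np.abs(lags[:, None] - lags[None, :])]
    L = np.linalg.cholesky(C)                    # lower triangular, C = L @ L.T
    impulse = np.zeros(n)
    impulse[n // 2] = 1.0
    h = np.linalg.solve(L, impulse)              # impulse response h = L^{-1} i
    return np.abs(np.fft.rfft(h)) ** 2

def cutoff_bin(H):
    """First frequency bin where H reaches half of its highest-frequency value."""
    return int(np.argmax(H >= 0.5 * H[-1]))

# Cutoff bin for increasing red-noise variance at fixed white-noise variance
cutoffs = {rv: cutoff_bin(whitening_response(rv)) for rv in (1.0, 10.0, 100.0)}
```

In this assumed model the cutoff moves to higher frequency bins as the red noise becomes stronger relative to the white noise, because the red noise then dominates the spectrum over a wider range of frequencies.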
