a statistical application in astronomy: streaming …bodhi/talks/streamingmotionleo-i.pdfstreaming...

34
Streaming motion in Leo I Separating signal from background A statistical application in astronomy: Streaming motion in Leo I Bodhisattva Sen DPMMS, University of Cambridge, UK Columbia University, New York, USA [email protected] ETH Zurich, Switzerland 31 May, 2012

Upload: others

Post on 14-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

A statistical application in astronomy:Streaming motion in Leo I

Bodhisattva Sen

DPMMS, University of Cambridge, UKColumbia University, New York, [email protected]

ETH Zurich, Switzerland

31 May, 2012

Page 2: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

Outline

1 Streaming motion in Leo IModelingThreshold models

2 Separating signal from backgroundMethodTheory

Page 3: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

What is streaming motion?

Local Group dwarf spheroidal (dSph) galaxies: small, dim

Is Leo I in equilibrium or tidally disrupted by Milky Way?

Such a disruption can give rise to streaming motion: the leadingand trailing stars move away from the center of the main body ofthe perturbed system

Page 4: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Data on 328 stars

(Y ,Σ): Line of sight velocity; std. dev. of meas. error(R,Θ): Projected position, orthogonal to line of sightContaminated with foreground stars (in the line of sight)

Magnitude of streaming motion

Likely to increase beyond a threshold radiusLikely to be aligned with the major axis of the system

Page 5: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Statistical questions

Is streaming motion evident in Leo I?

If so, how can it be described and estimated?

To what extent can it be described by a threshold model?

Findings [Sen et al. (AoAS, 2009)]:

The magnitude of streaming motion appears to be modest,(nearly) significant at 5% level

Appears consistent with a threshold model

Difficult to identify the threshold radius precisely

Page 6: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Outline

1 Streaming motion in Leo IModelingThreshold models

2 Separating signal from backgroundMethodTheory

Page 7: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Yi = ν(Ri ,Θi ) + εi + δi ; εi ∼ N(0, σ2) independent of (R,Θ)

Measurement error: δi |Σi ∼ N(0,Σ2i )

Cosine Model: ν(r , θ) = ν + λ(r) cos θ; λ ↑

(ν, λ) = arg minv∈R,u↑

nXi=1

{Yi − v − u(Ri ) cos Θi}2

σ2 + Σ2i

σ2 =

1

n

nXi=1

[{Yi − ν − λ(Ri ) cos Θi}2 − Σ2

i ]

λ measures the effect of streaming; λ ↑

cos θ determines the deviation from the major axis

Page 8: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

λ Simulated effect of streaming

Page 9: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Is streaming evident?

Test λ = 0 using log-likelihood ratio statisticNo streaming: ν(r , θ) ≡ ν, a constantP-value ≈ 0.055 (using permutation test)

How much is streaming at radius r?

Need point-wise confidence intervals for λ

Result: n1/3{λ(r)− λ(r)} d→ ηC. Need to estimate η – tricky!

Need ways to by-pass the estimation of η.

Likelihood ratio (LR) based test or bootstrap methods

LR is pivotal and can be used to construct CI for λ(r) (Banerjeeand Wellner, AoS, 2001)

Page 10: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Likelihood ratio based methodBanerjee and Wellner, AoS, 2001

Test H0 : λ(r) = ξ0

∆SSE(r, ξ0) = minv∈R,u(r)=ξ0,u↑

nXi=1

{Yi − v − u(Ri ) cos Θi}2

σ2 + Σ2i

− minv∈R,u↑

nXi=1

{Yi − v − u(Ri ) cos Θi}2

σ2 + Σ2i

Under H0, ∆SSE(r , ξ0)d→ D

D does not contain any nuisance parameters

Invert this sequence of hypotheses tests (by varying ξ0) to get aCI for λ(r)

Page 11: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Bootstrap methods

Want to bootstrap n1/3{λ(r)− λ(r)}

Efron’s bootstrap fails

We claim that the bootstrap estimate does not have any weaklimit (Sen et al., AoS, 2010)

Smoothed bootstrap worksLR Bootstrap

0.90 0.95 0.90 0.95r0 L U CP L U CP L U L U λ400 0 3.54 .901 0 3.86 .952 0 3.57 0 3.57 1.92500 0.10 4.50 .882 0 5.02 .936 0 3.58 0 3.90 1.98600 0.26 6.66 .827 0 7.30 .897 0 3.30 0 3.64 1.98700 0.36 8.88 .913 0.05 9.56 .961 0 4.26 0 4.69 1.99750 1.85 8.88 .906 1.37 9.56 .952 0.44 7.86 0 8.37 5.37

Page 12: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Outline

1 Streaming motion in Leo IModelingThreshold models

2 Separating signal from backgroundMethodTheory

Page 13: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Threshold models

Goal: To determine a “threshold” in the domain of the functionwhere some “activity” takes place

Could either be a rapid change in the function value or adiscontinuity

Find the threshold radius in the Leo I

Model: ν(r , θ) = ν + λ(r) cos θ where λ(r) =

{0 0 ≤ r ≤ d0,

> 0 d0 < r ≤ 1.

Two approaches: change point versus split point

Page 14: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Change point models

ν(r , θ) = ν + βψ(r − ρ) cos θ

-1.0 -0.5 0.0 0.5 1.0

0.00.2

0.40.6

0.81.0

x

-1.0 -0.5 0.0 0.5 1.00.0

0.20.4

0.60.8

1.0

x

ψ(x) = 1{x ≥ 0} ψ(x) = max(0, x)

Page 15: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Change point models

ν(r , θ) = ν + βψ(r − ρ) cos θρ is the change-point

Minimize: SSE(v , β, r) =∑n

i=1{Yi−v−βψ(Ri−r) cos Θi}2

σ2+Σ2i

SSE(v , β, r)− SSE(v , β, r)

Page 16: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Split point model

Do not assume specific form for λ(r); λ ↑

Find the closest stump to λ (under mis-specification)

κ(b, r) =∫ r

0 λ2(s)ds +

∫ τr [λ(s)− b]2ds

Population parameters: (β, γ) = arg minκ(b, r)

γ is the split point

Estimate λ by λ to estimate γ

Page 17: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

ModelingThreshold models

Rescaled version of the criterion function

Asymptotics for the split point: n1/3-rate of convergence?

Page 18: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Outline

1 Streaming motion in Leo IModelingThreshold models

2 Separating signal from backgroundMethodTheory

Page 19: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Problem

Most astronomical data sets are polluted to some extent byforeground/background objects (“contaminants/noise”) that canbe difficult to distinguish from objects of interest(“member/signal”)

Contaminants may have the same apparent magnitudes, colors,and even velocities as signal stars

How do you separate out the “signal” stars?

We develop an algorithm for evaluating membership (estimatingparameters & probability of an object belonging to the signalpopulation)

Page 20: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Example

Data on stars in nearby dwarf spheroidal (dSph) galaxiesData: (X1i ,X2i ,V3i , σi ,ΣMgi , ...)

Velocity samples suffer from contamination by foreground MilkyWay stars

Page 21: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Approach

Our method is based on the Expectation-Maximization (EM)algorithm

We assign parametric distributions to the observables; derivedfrom the underlying physics in most cases

The EM algorithm provides estimates of the unknownparameters (mean velocity, velocity dispersion, etc.)

Also, probability of each star belonging to the signal population;see Walker et al. (2008)

Page 22: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Page 23: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

A toy example

Suppose N ∼ Poisson(b + s) is the number of stars observed

s = rate for observing a signal star

b = the foreground rate

Given N = n, we have W1, . . . ,Wn ∼ fb,s where data{Wi = (X1i ,X2i ,V3i , σi )}n

i=1, and fb,s(w) = bfb(w)+sfs(w)b+s

We assume that fb and fs are parameterized (modeled by theunderlying physics) probability densities

Page 24: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

For galaxy stars

The stellar density (number of stars per unit area) fallsexponentially with radius, RThe distribution of velocity given position is assumed to benormal with mean µ and variance σ2 + σ2

i

For foreground stars

The density is uniform over the field of viewThe distribution of velocities V3i is independent of position(X1i ,X2i )

We adopt V3i from the Besancon Milky Way model (Robin et al.2003), which specifies velocity distributions of Milky Way starsalong a given line of sight

Page 25: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Outline

1 Streaming motion in Leo IModelingThreshold models

2 Separating signal from backgroundMethodTheory

Page 26: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

ModelSuppose N ∼ Poisson(b + s) is the number of stars observeds = rate for observing a signal starb = the foreground rate

Given N = n, we have W1, . . . ,Wn ∼ fb,s where data{Wi = (X1i ,X2i ,V3i , σi ,ΣMgi , . . .)}n

i=1, and fb,s(w) = bfb(w)+sfs(w)b+s

We assume that fb and fs are parameterized (modeled by theunderlying physics) probability densities; β = (s,b, µ, σ2, . . .)

We would ideally like to maximize the likelihood

L(β) =N∏

i=1

{bfb(Wi ) + sfs(Wi )

b + s

}(Difficult!)

Page 27: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

The LikelihoodLet Yi be the indicator of a foreground star, i.e., Yi = 1 if the i ’thstar is a foreground star, and Yi = 0 otherwise

Note that Yi ’s are i.i.d. Bernoulli( bb+s )

Let Z = (W,Y,N) be the complete data [whereW = (W1,W2, . . . ,Wn) and Y = (Y1,Y2, . . . ,Yn)]

The likelihood for the complete data can be written as

LC(β) = e−(b+s) (b + s)N

N!

N∏i=1

{b fb(Wi )

b + s

}Yi{

s fs(Wi )

b + s

}1−Yi

Log-likelihood: lC(β)

Page 28: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Algorithm

Start with some initial estimates of the parameter β [in our simpleexample β = (s,b, µ, σ2, . . .)]

E-step:

Evaluates the expectation of the log-likelihood given theobserved data under the current estimates of the unknownparametersEvaluate Q(β, βn) = Ebβn

[lC(β)|W,N]

M-step:

Maximizes the expectation Q(β, βn) with respect to β

Iterate until the estimates stabilize (which is guaranteed!)

Page 29: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

The output of the algorithm

We obtain estimates of the unknown parameters βn

Can also be used to give estimated probabilities that the i ’th staris a signal

pmem(i) = P{Yi = 0|Data} =sfs(Wi )

sfs(Wi ) + bfb(Wi )

These probabilities can later be used as weights

Example: (ν, λ) = arg minv∈R,u↑∑n

i=1{Yi−v−u(Ri ) cos Θi}2

σ2+Σ2i

pmem(i)

Page 30: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Summary

Developed flexible methods to detect streaming motion in dSph

Methods rely on monotone function estimation

Change-point models versus split-point models

Foreground contamination addressed using a version of the EMalgorithm for finite mixture models

Can get estimated probability that the i ’th star is a signal, whichcan later be used as weights

Various variants of the EM algorithm can be used to fit morecomplex models

Thank you!

Questions?

Page 31: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

ReferencesBanerjee, M. & Wellner, J. (2001). AoS, 29, 1699- 1731.Cule, M., et al. (2010). JRSS-B, 72, 545-600.Sen, B., et al. (2010). AoS, 38, 1953-1977.Sen, B., et al. (2009). AoAS, 3, 96-116.Walker, M., et al. (2008). Astronomical Journal, 137, 3109.Robin, A. C., et al. (2003). Astron. Astrophys., 409, 523.

Page 32: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Scenario I

Introduce a non-parametric componentVelocity dispersion was assumed constant; now can model it asa function of projected radius RNeeds a tuning parameter to find σ(r)

Page 33: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

Scenario II

Do not assume exponential density profileAssume that as you move from the center of the galaxy, thechance of observing a signal star decreases

Page 34: A statistical application in astronomy: Streaming …bodhi/Talks/StreamingMotionLeo-I.pdfStreaming motion in Leo I Separating signal from background A statistical application in astronomy:

Streaming motion in Leo ISeparating signal from background

MethodTheory

A further extension [Cule et al. (JRSS-B, 2010)]

Mixture model f (x) =∑k

j=1 πj fj (x);∑k

j=1 πj = 1Model f1, f2, . . . , fk as log-concave densities on Rp

Maximize∏n

i=1{π1f1(Xi ) + π2f2(X1) + . . .+ πk fk (Xi )} over πi ’s andfi ’s log-concaveNo tuning parameter required – completely non-parametric