a statistical application in astronomy: streaming …bodhi/talks/streamingmotionleo-i.pdfstreaming...
TRANSCRIPT
Streaming motion in Leo ISeparating signal from background
A statistical application in astronomy:Streaming motion in Leo I
Bodhisattva Sen
DPMMS, University of Cambridge, UKColumbia University, New York, [email protected]
ETH Zurich, Switzerland
31 May, 2012
Streaming motion in Leo ISeparating signal from background
Outline
1 Streaming motion in Leo IModelingThreshold models
2 Separating signal from backgroundMethodTheory
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
What is streaming motion?
Local Group dwarf spheroidal (dSph) galaxies: small, dim
Is Leo I in equilibrium or tidally disrupted by Milky Way?
Such a disruption can give rise to streaming motion: the leadingand trailing stars move away from the center of the main body ofthe perturbed system
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Data on 328 stars
(Y ,Σ): Line of sight velocity; std. dev. of meas. error(R,Θ): Projected position, orthogonal to line of sightContaminated with foreground stars (in the line of sight)
Magnitude of streaming motion
Likely to increase beyond a threshold radiusLikely to be aligned with the major axis of the system
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Statistical questions
Is streaming motion evident in Leo I?
If so, how can it be described and estimated?
To what extent can it be described by a threshold model?
Findings [Sen et al. (AoAS, 2009)]:
The magnitude of streaming motion appears to be modest,(nearly) significant at 5% level
Appears consistent with a threshold model
Difficult to identify the threshold radius precisely
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Outline
1 Streaming motion in Leo IModelingThreshold models
2 Separating signal from backgroundMethodTheory
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Yi = ν(Ri ,Θi ) + εi + δi ; εi ∼ N(0, σ2) independent of (R,Θ)
Measurement error: δi |Σi ∼ N(0,Σ2i )
Cosine Model: ν(r , θ) = ν + λ(r) cos θ; λ ↑
(ν, λ) = arg minv∈R,u↑
nXi=1
{Yi − v − u(Ri ) cos Θi}2
σ2 + Σ2i
σ2 =
1
n
nXi=1
[{Yi − ν − λ(Ri ) cos Θi}2 − Σ2
i ]
λ measures the effect of streaming; λ ↑
cos θ determines the deviation from the major axis
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
λ Simulated effect of streaming
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Is streaming evident?
Test λ = 0 using log-likelihood ratio statisticNo streaming: ν(r , θ) ≡ ν, a constantP-value ≈ 0.055 (using permutation test)
How much is streaming at radius r?
Need point-wise confidence intervals for λ
Result: n1/3{λ(r)− λ(r)} d→ ηC. Need to estimate η – tricky!
Need ways to by-pass the estimation of η.
Likelihood ratio (LR) based test or bootstrap methods
LR is pivotal and can be used to construct CI for λ(r) (Banerjeeand Wellner, AoS, 2001)
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Likelihood ratio based methodBanerjee and Wellner, AoS, 2001
Test H0 : λ(r) = ξ0
∆SSE(r, ξ0) = minv∈R,u(r)=ξ0,u↑
nXi=1
{Yi − v − u(Ri ) cos Θi}2
σ2 + Σ2i
− minv∈R,u↑
nXi=1
{Yi − v − u(Ri ) cos Θi}2
σ2 + Σ2i
Under H0, ∆SSE(r , ξ0)d→ D
D does not contain any nuisance parameters
Invert this sequence of hypotheses tests (by varying ξ0) to get aCI for λ(r)
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Bootstrap methods
Want to bootstrap n1/3{λ(r)− λ(r)}
Efron’s bootstrap fails
We claim that the bootstrap estimate does not have any weaklimit (Sen et al., AoS, 2010)
Smoothed bootstrap worksLR Bootstrap
0.90 0.95 0.90 0.95r0 L U CP L U CP L U L U λ400 0 3.54 .901 0 3.86 .952 0 3.57 0 3.57 1.92500 0.10 4.50 .882 0 5.02 .936 0 3.58 0 3.90 1.98600 0.26 6.66 .827 0 7.30 .897 0 3.30 0 3.64 1.98700 0.36 8.88 .913 0.05 9.56 .961 0 4.26 0 4.69 1.99750 1.85 8.88 .906 1.37 9.56 .952 0.44 7.86 0 8.37 5.37
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Outline
1 Streaming motion in Leo IModelingThreshold models
2 Separating signal from backgroundMethodTheory
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Threshold models
Goal: To determine a “threshold” in the domain of the functionwhere some “activity” takes place
Could either be a rapid change in the function value or adiscontinuity
Find the threshold radius in the Leo I
Model: ν(r , θ) = ν + λ(r) cos θ where λ(r) =
{0 0 ≤ r ≤ d0,
> 0 d0 < r ≤ 1.
Two approaches: change point versus split point
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Change point models
ν(r , θ) = ν + βψ(r − ρ) cos θ
-1.0 -0.5 0.0 0.5 1.0
0.00.2
0.40.6
0.81.0
x
-1.0 -0.5 0.0 0.5 1.00.0
0.20.4
0.60.8
1.0
x
ψ(x) = 1{x ≥ 0} ψ(x) = max(0, x)
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Change point models
ν(r , θ) = ν + βψ(r − ρ) cos θρ is the change-point
Minimize: SSE(v , β, r) =∑n
i=1{Yi−v−βψ(Ri−r) cos Θi}2
σ2+Σ2i
SSE(v , β, r)− SSE(v , β, r)
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Split point model
Do not assume specific form for λ(r); λ ↑
Find the closest stump to λ (under mis-specification)
κ(b, r) =∫ r
0 λ2(s)ds +
∫ τr [λ(s)− b]2ds
Population parameters: (β, γ) = arg minκ(b, r)
γ is the split point
Estimate λ by λ to estimate γ
Streaming motion in Leo ISeparating signal from background
ModelingThreshold models
Rescaled version of the criterion function
Asymptotics for the split point: n1/3-rate of convergence?
Streaming motion in Leo ISeparating signal from background
MethodTheory
Outline
1 Streaming motion in Leo IModelingThreshold models
2 Separating signal from backgroundMethodTheory
Streaming motion in Leo ISeparating signal from background
MethodTheory
Problem
Most astronomical data sets are polluted to some extent byforeground/background objects (“contaminants/noise”) that canbe difficult to distinguish from objects of interest(“member/signal”)
Contaminants may have the same apparent magnitudes, colors,and even velocities as signal stars
How do you separate out the “signal” stars?
We develop an algorithm for evaluating membership (estimatingparameters & probability of an object belonging to the signalpopulation)
Streaming motion in Leo ISeparating signal from background
MethodTheory
Example
Data on stars in nearby dwarf spheroidal (dSph) galaxiesData: (X1i ,X2i ,V3i , σi ,ΣMgi , ...)
Velocity samples suffer from contamination by foreground MilkyWay stars
Streaming motion in Leo ISeparating signal from background
MethodTheory
Approach
Our method is based on the Expectation-Maximization (EM)algorithm
We assign parametric distributions to the observables; derivedfrom the underlying physics in most cases
The EM algorithm provides estimates of the unknownparameters (mean velocity, velocity dispersion, etc.)
Also, probability of each star belonging to the signal population;see Walker et al. (2008)
Streaming motion in Leo ISeparating signal from background
MethodTheory
Streaming motion in Leo ISeparating signal from background
MethodTheory
A toy example
Suppose N ∼ Poisson(b + s) is the number of stars observed
s = rate for observing a signal star
b = the foreground rate
Given N = n, we have W1, . . . ,Wn ∼ fb,s where data{Wi = (X1i ,X2i ,V3i , σi )}n
i=1, and fb,s(w) = bfb(w)+sfs(w)b+s
We assume that fb and fs are parameterized (modeled by theunderlying physics) probability densities
Streaming motion in Leo ISeparating signal from background
MethodTheory
For galaxy stars
The stellar density (number of stars per unit area) fallsexponentially with radius, RThe distribution of velocity given position is assumed to benormal with mean µ and variance σ2 + σ2
i
For foreground stars
The density is uniform over the field of viewThe distribution of velocities V3i is independent of position(X1i ,X2i )
We adopt V3i from the Besancon Milky Way model (Robin et al.2003), which specifies velocity distributions of Milky Way starsalong a given line of sight
Streaming motion in Leo ISeparating signal from background
MethodTheory
Outline
1 Streaming motion in Leo IModelingThreshold models
2 Separating signal from backgroundMethodTheory
Streaming motion in Leo ISeparating signal from background
MethodTheory
ModelSuppose N ∼ Poisson(b + s) is the number of stars observeds = rate for observing a signal starb = the foreground rate
Given N = n, we have W1, . . . ,Wn ∼ fb,s where data{Wi = (X1i ,X2i ,V3i , σi ,ΣMgi , . . .)}n
i=1, and fb,s(w) = bfb(w)+sfs(w)b+s
We assume that fb and fs are parameterized (modeled by theunderlying physics) probability densities; β = (s,b, µ, σ2, . . .)
We would ideally like to maximize the likelihood
L(β) =N∏
i=1
{bfb(Wi ) + sfs(Wi )
b + s
}(Difficult!)
Streaming motion in Leo ISeparating signal from background
MethodTheory
The LikelihoodLet Yi be the indicator of a foreground star, i.e., Yi = 1 if the i ’thstar is a foreground star, and Yi = 0 otherwise
Note that Yi ’s are i.i.d. Bernoulli( bb+s )
Let Z = (W,Y,N) be the complete data [whereW = (W1,W2, . . . ,Wn) and Y = (Y1,Y2, . . . ,Yn)]
The likelihood for the complete data can be written as
LC(β) = e−(b+s) (b + s)N
N!
N∏i=1
{b fb(Wi )
b + s
}Yi{
s fs(Wi )
b + s
}1−Yi
Log-likelihood: lC(β)
Streaming motion in Leo ISeparating signal from background
MethodTheory
Algorithm
Start with some initial estimates of the parameter β [in our simpleexample β = (s,b, µ, σ2, . . .)]
E-step:
Evaluates the expectation of the log-likelihood given theobserved data under the current estimates of the unknownparametersEvaluate Q(β, βn) = Ebβn
[lC(β)|W,N]
M-step:
Maximizes the expectation Q(β, βn) with respect to β
Iterate until the estimates stabilize (which is guaranteed!)
Streaming motion in Leo ISeparating signal from background
MethodTheory
The output of the algorithm
We obtain estimates of the unknown parameters βn
Can also be used to give estimated probabilities that the i ’th staris a signal
pmem(i) = P{Yi = 0|Data} =sfs(Wi )
sfs(Wi ) + bfb(Wi )
These probabilities can later be used as weights
Example: (ν, λ) = arg minv∈R,u↑∑n
i=1{Yi−v−u(Ri ) cos Θi}2
σ2+Σ2i
pmem(i)
Streaming motion in Leo ISeparating signal from background
MethodTheory
Summary
Developed flexible methods to detect streaming motion in dSph
Methods rely on monotone function estimation
Change-point models versus split-point models
Foreground contamination addressed using a version of the EMalgorithm for finite mixture models
Can get estimated probability that the i ’th star is a signal, whichcan later be used as weights
Various variants of the EM algorithm can be used to fit morecomplex models
Thank you!
Questions?
Streaming motion in Leo ISeparating signal from background
MethodTheory
ReferencesBanerjee, M. & Wellner, J. (2001). AoS, 29, 1699- 1731.Cule, M., et al. (2010). JRSS-B, 72, 545-600.Sen, B., et al. (2010). AoS, 38, 1953-1977.Sen, B., et al. (2009). AoAS, 3, 96-116.Walker, M., et al. (2008). Astronomical Journal, 137, 3109.Robin, A. C., et al. (2003). Astron. Astrophys., 409, 523.
Streaming motion in Leo ISeparating signal from background
MethodTheory
Scenario I
Introduce a non-parametric componentVelocity dispersion was assumed constant; now can model it asa function of projected radius RNeeds a tuning parameter to find σ(r)
Streaming motion in Leo ISeparating signal from background
MethodTheory
Scenario II
Do not assume exponential density profileAssume that as you move from the center of the galaxy, thechance of observing a signal star decreases
Streaming motion in Leo ISeparating signal from background
MethodTheory
A further extension [Cule et al. (JRSS-B, 2010)]
Mixture model f (x) =∑k
j=1 πj fj (x);∑k
j=1 πj = 1Model f1, f2, . . . , fk as log-concave densities on Rp
Maximize∏n
i=1{π1f1(Xi ) + π2f2(X1) + . . .+ πk fk (Xi )} over πi ’s andfi ’s log-concaveNo tuning parameter required – completely non-parametric