
Surveys of cosmic populations: Statistical issues

Eddington, Malmquist, Lutz-Kelker and all that

Tom Loredo, Dept. of Astronomy, Cornell University

http://www.astro.cornell.edu/staff/loredo/

Cosmic Populations @ CASt — 9–10 June 2014

1 / 42

We Survey Everything!

[Figure panels: GRBs, lunar craters, TNOs, stars & galaxies, solar flares]

2 / 42

Size-Frequency Distributions

log(N)–log(S) curves, number counts, number-size dist’ns. . .

[Figure panels: lunar craters, TNOs, GRBs, quasars, solar flares]

3 / 42

Intrinsic Size-Frequency Distributions

Craters, Solar flares, asteroids

Known source distance → convert apparent sizes to intrinsic sizes

[Figure panels: lunar craters, solar flares]

Note approximate power law behavior

4 / 42

Apparent Size-Frequency Distributions

TNOs, star counts, galaxy counts, GRBs. . .

Distance unknown → SFD is “projection” of intrinsic dist’n

$$F \propto L/d^2 \quad \text{(stars, galaxies)}$$

$$F \propto D^2/(d_\odot^2\, d_\oplus^2) \quad \text{(minor planets)}$$

∼ 1200 GRBs from 4th BATSE Survey

5 / 42

Diverse Distributions

Basic: Directions and Fluxes

Peak fluxes and directions of GRBs from 4B catalog

6 / 42

Directions, Fluxes and Indicators

Luminosity & Distance Distributions

$10^4$ Galaxies from Millennium Galaxy Catalogue

7 / 42

High-dimensional

Exoplanet properties

Unadjusted scatterplots from Open Exoplanet Catalogue

8 / 42

Complications

• Selection effects (truncation, censoring) — obvious (usually)

9 / 42

Complications

• “Scatter” effects (measurement error, etc.) — insidious

10 / 42

Two classes of surveys

Blind (ab initio discovery) surveys

Survey a region and attempt to find sources using only the new data

• Most large-scale sky surveys
• GRB surveys
• . . .

Targeted (follow-up/counterpart) surveys

Survey a known population in a new regime

• Multi-λ surveys
• Variability surveys
• SN surveys
• Exoplanet discovery via RV, transit observations
• . . .

Focus here on blind surveys aiming to learn luminosity distributions

See Feigelson’s talk for discussion of selection effects in targeted surveys

11 / 42

Surveying and “Un-surveying”

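[Diagram: Population space (L, r) → Mapping → Observables (F, z) → Observation → Measurements → Selection → Catalog. Effects entering at each step: indicator scatter & transformation bias (mapping), measurement error (observation), truncation & censoring (selection). Markers distinguish precise from uncertain quantities.]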

⇐ Inference goes this way!

• This lecture: Understanding the forward process

• Later lectures: How to go backward

12 / 42

Statistical issues for surveys

1 Observables, desirables & integrals
   Brightness distributions
   Distance–luminosity distributions

2 Cornucopia of complications
   Selection effects
   Scatter distortions

3 Statistical methodology — A glimpse

13 / 42

Statistical issues for surveys

1 Observables, desirables & integrals
   Brightness distributions
   Distance–luminosity distributions

2 Cornucopia of complications
   Selection effects
   Scatter distortions

3 Statistical methodology — A glimpse

14 / 42

“You can’t always get what you want”

What we want

• The distribution of sources in space: number density

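$$
\begin{aligned}
n(\mathbf{r}) &= n(r, \Omega) && \text{at distance } r, \text{ direction } \Omega \\
&= n(r) && \text{(isotropic case)}
\end{aligned}
$$

An intensity function for a (possibly non-homogeneous) Poisson point process:

$$p(\text{object in } dV) = n(\mathbf{r})\, dV$$

and for disjoint regions $V_1$ and $V_2$,

$$p(\text{object in } V_1 \mid \text{object in } V_2, n) = p(\text{object in } V_1 \mid n)$$

$$\Rightarrow\; p(N \text{ objects in } V \mid n) = \frac{\mu^N}{N!}\, e^{-\mu}, \qquad \mu = \int_V n(\mathbf{r})\, d\mathbf{r}$$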
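The counting rule above can be sketched numerically. This is a minimal illustration, assuming an isotropic density so that the volume integral reduces to a radial one; the function names, the uniform density value `n0`, and the radius are hypothetical choices, not taken from the talk.

```python
import math

def expected_count(n_of_r, r_max, steps=10_000):
    """mu = integral of n(r) dV over a sphere, with dV = 4*pi*r^2 dr (midpoint rule)."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += n_of_r(r) * 4.0 * math.pi * r * r * dr
    return total

def poisson_pmf(count, mu):
    """p(N objects in V | n) = mu^N exp(-mu) / N!, computed in log space."""
    return math.exp(count * math.log(mu) - mu - math.lgamma(count + 1))

n0 = 0.01  # hypothetical uniform density (objects per unit volume)
mu = expected_count(lambda r: n0, r_max=5.0)
p_empty = poisson_pmf(0, mu)  # probability that the sphere contains no objects
```

For a uniform density the integral is just `n0` times the sphere volume, which gives a quick check on the quadrature; a non-homogeneous `n_of_r` can be passed in the same way.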

15 / 42

• The distribution of source luminosities: luminosity distribution

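$$
\begin{aligned}
f_L(L; \mathbf{r}) &= \text{pdf for } L \text{ given } \mathbf{r} \\
&= \delta(L - L_0) && \text{(“standard candles”)} \\
&= f_L(L) && \text{(“universal”)} \\
&= f_L(L; r) && \text{(“evolution”, isotropic)}
\end{aligned}
$$

• The luminosity function: $\Psi(\mathbf{r}, L) = n(\mathbf{r})\, f_L(L; \mathbf{r})$

This defines a marked point process

Terminology & notation are not uniform in the literature. E.g., $f_L$ is sometimes called the “luminosity function”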

16 / 42

What we get

The primary observables are direction, Ω, and flux (energy/unit time/unit area):

$$F = \frac{L}{4\pi r^2} \quad \leftarrow \text{the “root of all evil!”}$$

Note this conflates r and L!
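A two-line numerical illustration of the conflation, as a sketch only; the luminosities and distances are arbitrary values in unspecified consistent units, chosen here for illustration.

```python
import math

def flux(L, r):
    """Inverse-square law in Euclidean space: F = L / (4*pi*r^2)."""
    return L / (4.0 * math.pi * r ** 2)

f_near = flux(L=1.0, r=10.0)     # faint source, nearby
f_far = flux(L=100.0, r=100.0)   # 100x more luminous, 10x farther away
```

The two sources are indistinguishable by flux alone, which is why an extra observable (an indicator) is needed to separate r from L.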

Further complications (ignored here!)

• Passbands/k-corrections

• Extinction

• Reflective sources ($F \propto 1/r^4$)

• Transient/variable sources

• . . .

17 / 42

Example: Quasar Optical Luminosity Function

[Figure panels: magnitude dist’n, redshifts]

2QZ Magnitudes, directions, & redshifts for > 25,000 QSOs with B < 21

18 / 42

Evol’n of Luminosity Function

Boyle+ (2000)

19 / 42

Cosmology

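Geometry of space-time alters inverse-square law (redshift, time dilation) and volume element in spatial integrals:

Flux in band b:

$$F_b = \frac{L_{\rm bol}}{4\pi r^2}\, C_1(r, \chi, \alpha)$$

where χ denotes the cosmo params ($H_0$, $\Omega_m$, $\Omega_\Lambda$) and α the spectral parameters

$$dV = r^2\, dr\, d\Omega\; C_2(r, \chi)$$

Redshift z (observable!) usually used as a proxy for r via Hubble’s law (at low z):

$$z = \frac{\lambda - \lambda_0}{\lambda_0} = \frac{v_{\rm Hub} + v_{\rm pec}}{c} \quad\rightarrow\quad cz = H_0 r + v_{\rm pec}$$

For $z \gtrsim 1$, must use the full luminosity distance-redshift law from GR; depends on $H_0$, $\Omega_m$, $\Omega_\Lambda$. . .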

20 / 42

What We Really Want

Physics works in phase space: positions and velocities.

We’d really like to infer $\rho(\mathbf{r}, \mathbf{v})$: all the issues of inferring $n(\mathbf{r})$, plus issues from imperfect measurement of $\mathbf{v}$.

Applications:

• Stars in Milky Way — Galactic dynamics, dark matter

• Stars in nearby dwarf galaxies — dark matter on small scales

• Galaxy proper motions — large scale structure

We’ll focus on position + luminosity; many issues we’ll discuss are only more important for upcoming peculiar velocity and parallax surveys (6dF, Gaia).

21 / 42

Diverse Units

Radio, x-ray & γ-ray surveys, and surveys of other quanta (cosmic rays, grav’l radiation, neutrinos) use energy units directly (L, F).

Optical & IR surveys use absolute magnitude M instead of L:

$$M \equiv -2.5 \log_{10} \frac{L}{L_{\rm fid}}, \qquad f_M(M; r) = \text{pdf for } M \text{ given } r$$

and apparent magnitude m instead of F:

$$m \equiv -2.5 \log_{10} \frac{F}{F_{\rm fid}} = -2.5 \log_{10} \frac{L}{4\pi r^2 F_{\rm fid}} = M + \mu$$

with distance modulus μ instead of r:

$$
\begin{aligned}
\mu &= 5 \log_{10} \frac{r}{10\,{\rm pc}} && \text{(stars)} \\
&= 5 \log_{10} \frac{r}{\rm Mpc} + 25 && \text{(galaxies)}
\end{aligned}
$$
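The two forms of the distance modulus above are the same quantity in different units, which a short sketch makes concrete. The function names are illustrative; only the formulas come from the slide.

```python
import math

def distance_modulus_pc(r_pc):
    """mu = 5 log10(r / 10 pc), the form usually used for stars."""
    return 5.0 * math.log10(r_pc / 10.0)

def distance_modulus_mpc(r_mpc):
    """mu = 5 log10(r / Mpc) + 25, the form usually used for galaxies."""
    return 5.0 * math.log10(r_mpc) + 25.0

def apparent_magnitude(M, r_pc):
    """m = M + mu, ignoring extinction and k-corrections."""
    return M + distance_modulus_pc(r_pc)

mu_stars = distance_modulus_pc(1.0e6)     # 1 Mpc expressed in pc
mu_galaxies = distance_modulus_mpc(1.0)   # the same distance in Mpc
m_example = apparent_magnitude(-21.0, 1.0e8)  # an M = -21 galaxy at 100 Mpc
```

The offset of 25 is just 5 log10(10^6 pc / 10 pc), so both functions agree at any distance.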

22 / 42

The Three-Halves (or Five-Halves) Law

Assumptions

• Euclidean space: $F = \dfrac{L}{4\pi r^2}$

• Homogeneous and isotropic: $n(\mathbf{r}) = n_0$

• Standard candles: $f_L(L; r) = \delta(L - L_0)$

Flux distribution

A precise flux measurement → $r(F) = \left(\dfrac{L_0}{4\pi F}\right)^{1/2}$

# with flux > F = # closer than r(F):

$$N_>(F) = \frac{4\pi}{3}\, [r(F)]^3\, n_0 \;\propto\; F^{-3/2}$$

Differential distribution (surf. dens. per unit flux & steradian):

$$\Sigma(F) = -\frac{1}{4\pi} \frac{dN_>}{dF} \;\propto\; F^{-5/2}$$
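The three-halves law can be checked with a small Monte Carlo under exactly the three assumptions listed: Euclidean space, uniform density, and standard candles. This is a sketch using only the standard library; the sample size, seed, and the two flux levels used for the slope are arbitrary choices made for illustration.

```python
import math
import random

def simulate_fluxes(n_sources, r_max, L0, seed=0):
    """Fluxes of standard candles placed uniformly in a Euclidean sphere."""
    rng = random.Random(seed)
    fluxes = []
    for _ in range(n_sources):
        u = 1.0 - rng.random()            # in (0, 1], avoids r = 0
        r = r_max * u ** (1.0 / 3.0)      # uniform in volume
        fluxes.append(L0 / (4.0 * math.pi * r * r))
    return fluxes

def count_brighter(fluxes, F):
    """N_>(F): number of sources with flux above F."""
    return sum(1 for f in fluxes if f > F)

def loglog_slope(fluxes, F_lo, F_hi):
    """Slope of log N_> versus log F between two flux levels."""
    n_lo = count_brighter(fluxes, F_lo)
    n_hi = count_brighter(fluxes, F_hi)
    return (math.log(n_hi) - math.log(n_lo)) / (math.log(F_hi) - math.log(F_lo))

fluxes = simulate_fluxes(100_000, r_max=1.0, L0=1.0)
F_lo = 1.0 / (4.0 * math.pi * 0.8 ** 2)   # flux of a source at r = 0.8
F_hi = 1.0 / (4.0 * math.pi * 0.4 ** 2)   # flux of a source at r = 0.4
slope = loglog_slope(fluxes, F_lo, F_hi)  # should be close to -3/2
```

Both flux levels are chosen above the flux of a source at the edge of the sphere; below that flux the counts flatten because the simulated volume ends, not because the law fails.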

23 / 42

Generalizing:

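Fundamental Equation of Stellar Statistics

Σ = density of sources wrt observables (direction, flux or magnitude, . . . )

$$p(F \text{ in } dF,\ \Omega \text{ in } d\Omega) = \Sigma(F, \Omega)\, dF\, d\Omega$$

[Σ] = #/(unit sr × unit flux) or #/(sq. degree × unit mag). . .

In spherical coordinates (r, θ, φ), volume element is $dV = r^2\, dr\, d\Omega$ with $d\Omega = \sin\theta\, d\theta\, d\phi$

Use law of total probability to calculate Σ(F, Ω) from luminosity function:

$$
\begin{aligned}
\Sigma(F, \Omega) &= \int dr\; r^2 \int dL\; p(r, \Omega, L, F) \\
&= \int dr\; r^2 \int dL\; p(r, \Omega, L)\, p(F \mid r, L) \\
&= \int dr \int dL\; r^2\, n(\mathbf{r})\, f_L(L; \mathbf{r})\, \delta\!\left(F - \frac{L}{4\pi r^2}\right)
\end{aligned}
$$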

24 / 42

Flux and magnitude versions

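$$
\begin{aligned}
\Sigma(F, \Omega) &= \int dr \int dL\; r^2\, n(\mathbf{r})\, f_L(L; \mathbf{r})\, \delta\!\left(F - \frac{L}{4\pi r^2}\right) \\
&= 4\pi \int dr\; r^4\, n(\mathbf{r})\, f_L(4\pi r^2 F; \mathbf{r})
\end{aligned}
$$

$$
\begin{aligned}
\Sigma(m, \Omega) &= \int dr \int dM\; r^2\, n(\mathbf{r})\, f_M(M; \mathbf{r})\, \delta[m - (M + \mu)] \\
&= \int dr\; r^2\, n(\mathbf{r})\, f_M(m - \mu(r); \mathbf{r})
\end{aligned}
$$

If either the density or luminosity function is known, and if Σ is accurately measured, this is a Fredholm integral equation.

But Σ is sampled (incompletely), often with measurement error, and usually both n and f are uncertain.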
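The magnitude version of the integral is easy to evaluate numerically in the forward direction. The sketch below assumes an isotropic density, a Gaussian magnitude distribution with illustrative values (mean −21, width 0.3 mag), distances in Mpc, and a plain midpoint rule; none of these choices are prescribed by the talk.

```python
import math

def gaussian_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mu_mpc(r_mpc):
    """Distance modulus for r in Mpc."""
    return 5.0 * math.log10(r_mpc) + 25.0

def sigma_m(m, n_of_r, M0=-21.0, sigma_M=0.3, r_max=400.0, steps=4000):
    """Sigma(m) = integral dr r^2 n(r) f_M(m - mu(r)), by the midpoint rule."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += r * r * n_of_r(r) * gaussian_pdf(m - mu_mpc(r), M0, sigma_M) * dr
    return total

def uniform_density(r):
    return 1.0  # hypothetical constant n(r), arbitrary normalization

ratio = sigma_m(13.0, uniform_density) / sigma_m(12.0, uniform_density)
```

For a uniform density and Euclidean geometry the counts rise by a factor of 10^0.6 per magnitude, which is the five-halves law rewritten in magnitudes, so `ratio` gives a check on the quadrature. Going the other way, from a noisy, truncated Σ back to n and f, is the hard inverse problem the slide describes.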

25 / 42

Visualizing the Integral

Luminosity: $M \sim \text{Norm}(-21, 0.3^2)$

Density: Uniform to r = 100; linear drop

Curve has constant m = 13.25 ± 0.25

26 / 42

Indicators

Indicators are additional observables, σ, that help make r and L identifiable ⇒ unravel the integral.

Two classes:

• Direct: Knowing σ → knowing either r or L

• Stochastic: r or L are correlated with σ

Several types (usually all called “distance indicators”):

• Distance indicators: p(r |σ)

• Luminosity indicators: p(L|σ)

• Size indicators: p(D|σ)→ r via geometry

27 / 42

Direct Distance Indicator: Parallax

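[Diagram: Earth's orbital motion (baseline 1 AU), target object at distance r, distant background objects, the target's apparent motion, and the parallax angle π]

Parallax directly measures the distance to nearby stars:

$$\tan\pi = \frac{1\,{\rm AU}}{r} \quad\rightarrow\quad \varpi \equiv \frac{\pi}{1\ {\rm arcsec}} \approx \frac{1\,{\rm pc}}{r}, \qquad r = \frac{1\,{\rm pc}}{\varpi}$$

“Parallax” is sometimes used as a synonym for distance.

Similar geometric considerations → orbits of minor planets.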

28 / 42

Direct Distance Indicator: Redshift

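Redshift z lets you infer r via Hubble’s law (at low z):

$$z = \frac{\lambda - \lambda_0}{\lambda_0} = \frac{v_{\rm Hub} + v_{\rm pec}}{c} \quad\rightarrow\quad cz = H_0 r + v_{\rm pec}$$

For $z \gtrsim 1$, must use the full luminosity distance-redshift law from GR; depends on $H_0$, $\Omega_m$, $\Omega_\Lambda$. . .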

Complications:

• Peculiar velocity → “scatter” at low z

• Dependence on (uncertain) cosmology

• When inferred indirectly (“photo-z”) uncertainties may be large/complex
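The size of the peculiar-velocity “scatter” at low z follows directly from cz = H0 r + v_pec. The sketch below assumes the low-z linear law only; the value H0 = 70 km/s/Mpc and the 300 km/s peculiar velocity are illustrative assumptions, not numbers from the talk.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def hubble_distance_mpc(z, H0=70.0):
    """Low-z distance estimate r = c z / H0, neglecting peculiar velocity."""
    return C_KM_S * z / H0

def fractional_distance_error(z, v_pec_km_s):
    """|delta r| / r caused by attributing all of c z to the Hubble flow."""
    return abs(v_pec_km_s) / (C_KM_S * z)

r_est = hubble_distance_mpc(0.01)                # roughly 43 Mpc for H0 = 70
frac_err = fractional_distance_error(0.01, 300)  # roughly 10% at z = 0.01
```

The fractional error scales as 1/z, so the same peculiar velocity matters ten times less at z = 0.1; it is independent of the assumed H0.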

29 / 42

Stochastic Luminosity Indicators

Measure a source property σ allowing statistical inference of L via:

$$p(L \mid \sigma) = g_\sigma(L), \quad \text{hopefully narrow } (\lesssim 30\% \text{ is good!})$$

Examples:

σ = Color/spectral type of star (H-R diagram, “photometric parallax”)

= Period & color of periodic variable star (Period-luminosity rel’n)

= Asymptotic rot’n velocity of spiral galaxy (Tully-Fisher)

= Velocity dispersion & angular size of elliptical galaxy (Fundamental plane)

= Shape & color of SN Ia light curve

. . .

Can consider measurement of σ → estimate of L with “measurement error” (real σ msmt. error (noise) may compound this)

30 / 42

Stellar Luminosity Indicators

[Figure panels: Hipparcos H-R diagram, variable star period-lum. rel’n]

31 / 42

Galaxy Luminosity Indicators

[Figure panels: Tully-Fisher, fundamental plane]

32 / 42

Inferential Goals

• Estimate shape of Σ(m) (no aux. info)

• Estimate characteristics $r$, $L$ for each object

• Estimate $g_\sigma(L)$ (“calibration”)

• Estimate $n(r)$ with $g_\sigma(L)$ known

• Estimate $f_L(L)$ for entire population

• Detect/estimate evolution, $f_L(L; \mathbf{r})$

• Jointly estimate $n(\mathbf{r})$, $f_L(L; \mathbf{r})$

• Estimate cosmological parameters ($n$ and $f_L$ are nuisances)

• . . .

33 / 42

Statistical issues for surveys

1 Observables, desirables & integrals
   Brightness distributions
   Distance–luminosity distributions

2 Cornucopia of complications
   Selection effects
   Scatter distortions

3 Statistical methodology — A glimpse

34 / 42

Scatter Biases in Univariate Distributions

“Eddington Bias”

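[Figure: number density n versus r, and the corresponding distribution versus the estimate $\hat{r}$ after smearing by the r uncertainty]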

“A series of quantities are measured and classified in equal ranges.

Each measure has a known uncertainty. On account of the errors of

measurement some quantities are put into the wrong ranges. If the

true number in a range is greater than those in the adjacent ranges,

we should expect more observations to be scattered out of the range

than into it, so that the observed number will need a positive

correction.” (Jeffreys 1938)
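The mechanism in the quotation can be reproduced with a threshold count instead of equal ranges. This is a sketch under stated assumptions: true fluxes follow a power law with differential slope 2.5 above a minimum flux of 1, measurement errors are Gaussian with a fixed width of 0.3, and the threshold of 2 is arbitrary. All of these values are illustrative.

```python
import random

def eddington_counts(n_sources=200_000, slope=2.5, noise_sigma=0.3,
                     threshold=2.0, seed=1):
    """Count sources above a threshold using true and noisy fluxes."""
    rng = random.Random(seed)
    n_true = 0
    n_obs = 0
    for _ in range(n_sources):
        u = 1.0 - rng.random()                    # in (0, 1]
        f_true = u ** (-1.0 / (slope - 1.0))      # p(F) proportional to F^-slope, F >= 1
        f_obs = f_true + rng.gauss(0.0, noise_sigma)
        n_true += f_true > threshold
        n_obs += f_obs > threshold
    return n_true, n_obs

n_true, n_obs = eddington_counts()
excess = n_obs / n_true - 1.0   # fractional overcount caused by scatter alone
```

Because faint sources far outnumber bright ones, more are scattered up across the threshold than are scattered down, so the noisy count exceeds the true one even though the errors have zero mean.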

35 / 42

Luminosity Calibration via Parallax

“Lutz-Kelker Bias”

Source msmt. complications

• Parallax error ($\lambda \equiv \sigma/\varpi$)
• Flux error
• Transformation bias ($\varpi \rightarrow r$)
• Density law (prior)

Selection complications

• Magnitude (flux) truncation/thinning
• Parallax censoring
• Usually “soft” (random)
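Of the complications listed, the transformation bias is the simplest to isolate: even with unbiased Gaussian parallax errors, the reciprocal of the measured parallax overestimates the distance on average. The sketch below shows only that one ingredient, not the full Lutz-Kelker correction, which also involves the density law and selection. The parallax value, 10% fractional error, and seed are illustrative assumptions.

```python
import random

def mean_inverse_parallax(true_parallax, frac_error, n_draws=200_000, seed=2):
    """Average of 1 / parallax_obs when parallax_obs has Gaussian errors."""
    rng = random.Random(seed)
    sigma = frac_error * true_parallax
    total = 0.0
    kept = 0
    for _ in range(n_draws):
        obs = rng.gauss(true_parallax, sigma)
        if obs > 0.0:                  # non-positive parallaxes give no distance
            total += 1.0 / obs
            kept += 1
    return total / kept

true_parallax = 0.01                   # arcsec, i.e. a true distance of 100 pc
true_distance = 1.0 / true_parallax
mean_distance = mean_inverse_parallax(true_parallax, frac_error=0.1)
bias = mean_distance / true_distance - 1.0   # about +1% for a 10% parallax error
```

To leading order the fractional bias is the square of the fractional parallax error, so it grows quickly as the measurements get noisier.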

36 / 42

Distance Estimation via Luminosity Indicator

“Malmquist Bias”

Source msmt. complications

• Indicator scatter
• Transformation bias
• Flux error

Selection complications

• Magnitude (flux) truncation/thinning
• Usually “soft” (random)
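The classical Malmquist result can be reproduced by simulation: for a uniform Euclidean density and a Gaussian magnitude distribution of width σ, a magnitude-limited sample has a mean absolute magnitude brighter than the population mean by 0.6 ln(10) σ², about 1.38 σ². The sketch below assumes a hard magnitude limit and no flux errors; all numerical values (M0, σ, limit, volume radius in Mpc, seed) are illustrative.

```python
import math
import random

def malmquist_shift(M0=-21.0, sigma_M=0.5, m_lim=14.0, r_max=400.0,
                    n_sources=200_000, seed=3):
    """Mean M of a magnitude-limited sample minus the population mean M0."""
    rng = random.Random(seed)
    selected = []
    for _ in range(n_sources):
        u = 1.0 - rng.random()                      # in (0, 1]
        r = r_max * u ** (1.0 / 3.0)                # Mpc, uniform in volume
        M = rng.gauss(M0, sigma_M)
        m = M + 5.0 * math.log10(r) + 25.0          # apparent magnitude
        if m < m_lim:                               # hard truncation
            selected.append(M)
    return sum(selected) / len(selected) - M0, len(selected)

shift, n_selected = malmquist_shift()
classical = -0.6 * math.log(10.0) * 0.5 ** 2        # expected shift for sigma_M = 0.5
```

The simulated volume is made large enough that the magnitude limit, not the edge of the volume, decides which sources are kept; a "soft" (random) selection function would change the size of the shift.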

37 / 42

Distance Errors Due to Indicator Scatter

Average true radius $r$ of sources assigned $\hat{r}$

[Figure: two panels plotted against assigned distance $\hat{r}$ (0 to 160): the fractional offset $\delta r / r$ (−0.2 to 0.2) and the average true distance (0 to 160)]

38 / 42

Statistical issues for surveys

1 Observables, desirables & integrals
   Brightness distributions
   Distance–luminosity distributions

2 Cornucopia of complications
   Selection effects
   Scatter distortions

3 Statistical methodology — A glimpse

39 / 42

Analyzing Surveys: Two Classes of Methods

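[Diagram: Population space (L, r) → Mapping → Observables (F, z) → Observation → Measurements → Selection → Catalog. Effects entering at each step: indicator scatter & transformation bias (mapping), measurement error (observation), truncation & censoring (selection). Markers distinguish precise from uncertain quantities.]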

Inverse methods

• Try to “correct” or “debias” data via adjustments/weights

• Focus on moments & empirical dist’n function (EDF)

Forward modeling methods

• Try to predict data by applying survey process to model

• Focus on likelihood

(Analogous to “design-based” vs. “model-based” methods in survey sampling)
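A very small instance of the forward-modeling idea, offered as a sketch rather than as the method of any later lecture: assume a power-law flux distribution, apply a hard detection threshold as the survey process, and write the likelihood of the detected fluxes under that truncated model. The slope, threshold, sample size, and function names are all illustrative assumptions, and measurement error is ignored.

```python
import math
import random

def simulate_catalog(alpha, F_min, F_th, n_sources, seed=4):
    """Draw fluxes from p(F) proportional to F^-alpha (F >= F_min); keep F > F_th."""
    rng = random.Random(seed)
    catalog = []
    for _ in range(n_sources):
        u = 1.0 - rng.random()                      # in (0, 1]
        F = F_min * u ** (-1.0 / (alpha - 1.0))
        if F > F_th:                                # hard truncation by the survey
            catalog.append(F)
    return catalog

def log_likelihood(alpha, catalog, F_th):
    """Log-likelihood of detected fluxes: p(F | F > F_th) = (a-1) F_th^(a-1) F^-a."""
    n = len(catalog)
    return (n * math.log(alpha - 1.0) + n * (alpha - 1.0) * math.log(F_th)
            - alpha * sum(math.log(F) for F in catalog))

def mle_alpha(catalog, F_th):
    """Closed-form maximizer of log_likelihood over alpha."""
    return 1.0 + len(catalog) / sum(math.log(F / F_th) for F in catalog)

catalog = simulate_catalog(alpha=2.5, F_min=1.0, F_th=2.0, n_sources=50_000)
alpha_hat = mle_alpha(catalog, F_th=2.0)            # should recover roughly 2.5
```

The truncation enters through the normalization of the likelihood rather than through any adjustment of the data, which is the contrast with inverse methods drawn above.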

40 / 42

Seminal Work

Eddington (1913, 1940) & Jeffreys (1938)

• Measurement error in univariate dist’ns (“density deconvolution/demixing”)

• Adjusted EDF/estimates vs. likelihood (Eddington vs. Jeffreys)

Malmquist (1920)

• Correct (r, L) dist’ns for truncation by adjusting moment estimators, assuming uniform n and Gaussian Φ(M)

Lutz-Kelker (1973)

• Correct parallax distances for scatter by adjusting moment estimators, assuming uniform n and Gaussian Φ(M)

41 / 42

Recent Developments & Open Issues

Brightness distributions

✓ Parametric modelling (via marginalizing latent variables)

✓ Nonparametric estimates with truncation/censoring; no meas. error

✗ Nonparametric estimates with scatter and truncation

(r, L) distributions

✓ Parametric f(L) given n(r), or n(r) given f(L)

✓ As above, nonparametric with no meas. error

✗ Everything else (including ext. to phase space)!

42 / 42

Recap of Key Ideas

• Blind vs. targeted surveys

• Intrinsic characteristics, observables, indicators

• Selection effects: truncation, censoring

• Scatter distortions: Eddington, Lutz-Kelker, Malmquist

• Inverse square law and mapping effects

• 3/2 or 5/2 law; fundamental (integral) equation

43 / 42