Photometric Redshifts with Poisson-only Noise
Christian Wolf
Oxford Physics
Edinburgh - 6 May 2009
Talk Outline
I. Why Photo-z’s?
II. State of the Art
III. Future challenges
IV. The χ²-empirical approach
V. Persistent photo-z issues
VI. χ²-test: noisy model
VII. The PHAISE proposal
I. Why Photo-z’s?
z < 0.01 Stebbins & Whitford, AJ, 1948
Photography is deeper than spectroscopy!
Baum 1957
Baum 1962, IAU Symp. 15
Photo-Zeeing in the 80’s/90’s
• Koo 1985: δz/(1+z) ~ 0.04 @ z < 0.6
– UBVI photographic plates
• Loh & Spillar 1986
– Degrading all filter images to worst seeing important
– Star and galaxy library
– 10% outliers
• Half zphot wrong
– Insufficiently blue templates
– Photometric blends
• Half zspec wrong
– Blends
– Single-line detections
• Connolly et al. 1995
– 4-D space (z, U-B, B-R, R-I)
• Distribution has Df = 1.8
– Colour ‘plane’ at z = [0,0.4], rotation after one filter
– Step-wise quadratic fits: δz/(1+z) < 0.04 @ z < 0.8
Redshift Errors & Resolution
• Objects at different redshifts
• Filter set λ's fixed
z = 0.843
z = 1.958
z = 2.828
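$\Delta z = \frac{\Delta\lambda}{\lambda_\mathrm{rest}} = \frac{\Delta\lambda}{\lambda/(1+z)} \;\Rightarrow\; \frac{\Delta z}{1+z} = \frac{\Delta\lambda}{\lambda}$
[Figure: flux / qef versus wavelength, 400 nm to 1000 nm; G2 star vs. QSO at z=3]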
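As a small worked illustration of the relation Δz/(1+z) = Δλ/λ, the sketch below evaluates the redshift resolution implied by a filter's width. The function name and the two filter widths are hypothetical values chosen for illustration, not numbers from the talk.

```python
def redshift_resolution(delta_lambda_nm: float, lambda_nm: float) -> float:
    """Return Delta z / (1+z) = Delta lambda / lambda for a single filter."""
    if lambda_nm <= 0:
        raise ValueError("wavelength must be positive")
    return delta_lambda_nm / lambda_nm


# Hypothetical filters at 600 nm (illustrative widths only).
broad_band = redshift_resolution(delta_lambda_nm=100.0, lambda_nm=600.0)
medium_band = redshift_resolution(delta_lambda_nm=20.0, lambda_nm=600.0)
print(f"broad band:  dz/(1+z) ~ {broad_band:.3f}")
print(f"medium band: dz/(1+z) ~ {medium_band:.3f}")
```

Under this simple scaling, a filter five times narrower resolves redshift five times more finely, which is the motivation for the medium-band surveys shown next.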
II. State of the Art: Medium-band SEDs
rms 0.008, 7%-20% outliers
rms 0.008 (R<21), 1 outlier
QSOs at z~2.8
R=23.8, R=22.9, R=21.5
R=20, R=22, R=23.7
Galaxies at z~0.45
State of the Art: ugriz-only
Collister & Lahav 2004
ANN, χ² template
~4% outliers δz/(1+z) > 0.1, rms δz/(1+z) = 0.042, bias -0.017
~0% outliers δz/(1+z) > 0.1, rms δz/(1+z) = 0.023, bias ~0.00
ANN easy at z<1: no ambiguities… but wait for future data!
III. Future Challenges
• Catastrophic failures & misclassifications
• Large z errors
• Mean z bias
• Unrealistic z errors
Origin of Challenges
• Catastrophic failures & misclassifications
• Large z errors
• Mean z bias
• Unrealistic z errors
• Model ambiguities in colour space
• PDF too unconstrained
• PDF wrong
• Mismatch between data and model
Common Fixes
• Catastrophic failures & misclassifications
• Large z errors
• Mean z bias
• Unrealistic z errors
• Model ambiguities in colour space
• PDF too unconstrained
• PDF wrong
• Mismatch between data and model
Add priors
Add more data
Repair models
Use template error function
Why do These Matter?
• Super-large photo-z surveys for cosmology
– Now: PanStarrs, DES
– 2015++: LSST, IDEM
• Redshift bias from
– Model:data calibration
– Catastrophic outliers: |Δz| ≈ |Δz_outlier| × (outlier fraction)
• Kitching, Taylor & Heavens: Δw ≈ 5Δz (3D cosmic shear)
– Δz = 0.01 unacceptable
– 1% outliers unacceptable
• Even spectroscopic surveys
– May have 1% wrong z’s
– Have incompleteness, i.e. more undiscovered outliers
http://www.astro.uni-bonn.de/~hendrik/PHAT/index.html
Back to The Principles: Overview
[Diagram labels]
Colour library
Estimator/classifier
result
model, data
estimator
spectral energy distribution
PDF: p(z)
empirical data or external template
χ²-fitting, artificial neural net, learning algorithms
Frequentist precision statistics:= “Using what IS there: N(z)!”
Bayesian frontier exploration:= “What do we (not) know: p(z)=?”
Model-Estimator Combinations
Code: χ², NN
Model: Template, Empirical
• χ²
+ PDF, ambiguity warning
• NN
– No PDF, no warning
• Template model
+ Can be extrapolated in z, mag
– Calibration issues
– Priors’ issues
• Empirical model
+ Good priors
+ No calibration issues
– Cannot be extrapolated
χ² code with empirical model: ?
IV. The χ²-empirical Approach
• Goal
– Combine χ²-PDF with reliability of empirical model
• Suggest
– Replace templates with empirical model: has correct calibration & priors, but also has noise
• However
– PDF from χ²-model testing only correct if model correct and noise-free
– Templates are noise-free but incorrect, so produce wrong PDF as well
Compare: Kernel Regression
• From global fits (1980s) to local fits (2000s)
– Locally optimal solution
– Requires more data and computing power
• Kernel function
– Smooth over wide range for robust solution
– Smooth over small range for good representation
• Identical to χ²-fitting if
– Model noise-free
– Gaussian kernel function with σ = σ_data
[Figure: colour given, locally fit z(colour) ⇒ z]
Equations: χ²-testing
• Probability of single given model object to produce data object
• Parameter estimate
• Expected error
• Bimodality detection
[Plot: p versus z]
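The equations on this slide did not survive in the transcript, so the following is only a minimal sketch of what such a χ²-test against an empirical model could look like. It assumes Gaussian errors with one σ for all colours, weights p_i ∝ exp(−χ²_i/2) for each model object, and a PDF-weighted mean and rms as parameter estimate and expected error. The function names are hypothetical, and `split_at_mean` is an illustrative stand-in, not the bimodality test used in the talk.

```python
import math


def chi2_photoz(data_colours, data_sigma, model_colours, model_z):
    """Weight every model object by its chi^2 probability of producing the data object."""
    chi2 = [
        sum(((m - d) / data_sigma) ** 2 for m, d in zip(colours, data_colours))
        for colours in model_colours
    ]
    chi2_min = min(chi2)  # subtract the minimum for numerical stability
    weights = [math.exp(-0.5 * (c - chi2_min)) for c in chi2]
    total = sum(weights)
    p = [w / total for w in weights]  # normalised p(z_i | data)
    z_est = sum(pi * zi for pi, zi in zip(p, model_z))  # parameter estimate
    z_err = math.sqrt(sum(pi * (zi - z_est) ** 2 for pi, zi in zip(p, model_z)))
    return z_est, z_err, p


def split_at_mean(p, model_z, z_est):
    """Illustrative split: (weight, mean z) of the PDF below and above z_est."""
    modes = []
    for side in (lambda z: z < z_est, lambda z: z >= z_est):
        pairs = [(pi, zi) for pi, zi in zip(p, model_z) if side(zi)]
        weight = sum(pi for pi, _ in pairs)
        if weight > 0:
            modes.append((weight, sum(pi * zi for pi, zi in pairs) / weight))
    return modes


# Toy model sample: one colour, four objects in two well-separated groups.
model_colours = [[0.0], [0.1], [1.0], [1.1]]
model_z = [0.5, 0.6, 2.0, 2.1]
z_est, z_err, p = chi2_photoz([0.05], 0.1, model_colours, model_z)
print(f"z_phot = {z_est:.3f} +/- {z_err:.3f}")
```

Because the weights come straight from the model objects, the redshift prior is whatever the empirical sample contains; that is the property the χ²-empirical approach relies on.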
For Now: Ignore Model Errors
• SDSS QSO sample
– Plenty of z-ambiguities
– DR5: 75,770 objects split half:half into model:data
– Pretend noise-free model
Richards et al. 2007
Result: Non-bimodal Objects
Fraction of outliers with |δz| > 3σ_z,limit
Photo-z “bias” = mean δz of non-outliers
(non-outliers)
Fraction of sample with σ_z < σ_z,limit
Result: Bimodal Objects
• 15474 detected ambiguities
– Two z’s given one colour
– Need more data to break
• Meanwhile
– Trust more probable z
• Mean p-ratio 78:22 predicts 12077 right : 3397 wrong
• 12051 right indeed!
– Use two weighted results
• Reliable: p_high ≈ f_high
• Sensitivity limits? 8% 1:>20 and 1% 1:>50
• Undetected ambiguities inevitable (= erroneously uni-modal)
– 30% of space, undetected 1:50-ambiguity ⇒ 0.6% outliers
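The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts quoted above. The binomial scatter is an added assumption: it treats every object as if it had the mean p_high, which overestimates the scatter when the individual p-values vary.

```python
# Counts quoted on the slide.
n_bimodal = 15474
n_pred_right = 12077
n_pred_wrong = 3397
n_obs_right = 12051

mean_p_high = n_pred_right / n_bimodal  # predicted fraction right, ~0.78
f_high = n_obs_right / n_bimodal        # observed fraction right

# Assumption: binomial scatter with one common p (upper bound on the true scatter).
sigma_count = (n_bimodal * mean_p_high * (1.0 - mean_p_high)) ** 0.5
deviation = (n_obs_right - n_pred_right) / sigma_count

# 30% of colour space carrying an undetected 1:50 ambiguity.
undetected_outlier_fraction = 0.30 * (1.0 / 50.0)

print(f"p_high = {mean_p_high:.4f}, f_high = {f_high:.4f}")
print(f"observed - predicted = {deviation:.2f} sigma")
print(f"outliers from undetected ambiguities: {undetected_outlier_fraction:.1%}")
```

The observed count sits within about one such scatter of the prediction, which is what the "reliable" statement amounts to under this assumption.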
Result: Redshift Distributions
Histogram of zphot-estimates: count bimodal objects twice using p-weights
Co-addition of all p(z)
p(z) inform beyond zphot
Result: Size of Model Sample
V. Persistent Photo-Z Issues
1. RMS redshift error
• Has a floor supported by intrinsic scatter, deeper photometry useless
2. Redshift bias
• Sub-samples can show local bias even when method globally bias-free
3. Catastrophic outliers
• Faint ambiguities (extreme p-ratio) undetectable, only guard is all-out spectroscopy
Error Floor from Intrinsic Scatter
• Example:
– QSO near (g-r)~1 or z~3.7
– z signal: Lyα forest in g-band
• Training sample in box
– Redshift distribution: mean 3.66, rms 0.115
– RMS/(1+z) = 0.024
• Testing sample in box
– RMS/(1+z) error 0.023
Locally linear
Local Redshift Biases
Not an issue when plotted over zphot (by design!)
Outliers from Undetected Ambiguities
Model objects within kernel: N_primary + N_2nd = N_model,local
Assume true density of 2nd mode > 0; observe N_2nd = 0: allow for N_2nd = 1
Hence, individual residual outlier risk: p_2nd = 1/N_model,local
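The residual risk rule above reduces to a one-line function. A minimal sketch follows; the function name is hypothetical, and returning a risk of 1 for an empty kernel is an assumption that follows the later remark that the outlier risk goes to 1 as N_model,local goes to 0.

```python
def residual_outlier_risk(n_model_local: int) -> float:
    """Residual outlier risk p_2nd = 1 / N_model,local for one data object.

    With no model object inside the kernel nothing constrains the redshift,
    so the risk is taken to be 1 (assumption for the empty-kernel limit).
    """
    if n_model_local < 1:
        return 1.0
    return 1.0 / n_model_local


for n_local in (1, 10, 100, 1000):
    print(f"N_model,local = {n_local:5d}  ->  p_2nd = {residual_outlier_risk(n_local):.3f}")
```

Read the other way round, a residual risk below 1% per object needs more than 100 model objects inside the kernel at that colour.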
Incomplete Models Mean Outliers
• Incomplete targeting
– No problem, use weights
• Incomplete z recovery
– Model completeness f(z)
– Main reason: z different
– Missed “model outliers”
– Part of data PDF missing
• Maximum bias risk for objects at fixed colour
• Assume deep survey with non-recovered fraction = 0.2
– |δz_out| = 1
⇒ |δz| = 0.2 !!!
Spectroscopic incompleteness deserves by far the greatest concern in empirical redshift estimation.*
* |δz| < 10^-3 means 99.9% completeness & reliability
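The two numbers on this slide follow from the same product: maximum bias = non-recovered fraction × outlier offset. The sketch below evaluates it in both directions; the function names are hypothetical, and the worst case assumed is that every non-recovered object sits at the outlier redshift.

```python
def max_bias(non_recovered_fraction: float, outlier_offset: float) -> float:
    """Worst-case mean redshift bias at fixed colour from missing model objects."""
    return non_recovered_fraction * abs(outlier_offset)


def required_completeness(bias_limit: float, outlier_offset: float) -> float:
    """Completeness needed to keep the worst-case bias below bias_limit."""
    return 1.0 - bias_limit / abs(outlier_offset)


deep_survey_bias = max_bias(non_recovered_fraction=0.2, outlier_offset=1.0)
needed = required_completeness(bias_limit=1e-3, outlier_offset=1.0)
print(f"bias at 20% non-recovery: {deep_survey_bias:.2f}")
print(f"completeness for |dz| < 1e-3: {needed:.1%}")
```

With the slide's values this reproduces the bias of 0.2 and the 99.9% completeness requirement.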
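χ²-Test: Noise-free Model
$p(\mathrm{data}\,|\,\mathrm{model}) = \mathrm{model} \otimes G_{\sigma^2}(\mathrm{data})$
$G_\mathrm{data} = G_{\sigma^2}$, i.e. $\sigma_\mathrm{data}^2 = \sigma^2$
[Figure: model objects in the (q1, q2) plane]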
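VI. χ²-Test: Noisy Model
$p(\mathrm{data}\,|\,\mathrm{model}) = \mathrm{model} \otimes G_{\sigma^2}(\mathrm{data})$
Noise-free model: $G_\mathrm{data} = G_{\sigma^2}$, i.e. $\sigma_\mathrm{data}^2 = \sigma^2$
Noisy model: $G_\mathrm{data} = G_\mathrm{model} \otimes G_{\sigma^2}$, i.e. $\sigma_\mathrm{data}^2 = \sigma_\mathrm{model}^2 + \sigma^2$, hence $\sigma^2 = \sigma_\mathrm{data}^2 - \sigma_\mathrm{model}^2$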
Data Noise vs. Model Noise
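• If $\sigma_\mathrm{data} > \sigma_\mathrm{model}$
– Replace model point by Gaussian with $\sigma^2 = \sigma_\mathrm{data}^2 - \sigma_\mathrm{model}^2$
• If $\sigma_\mathrm{data} \approx \sigma_\mathrm{model}$
– When $\sigma^2 \to 0$ then also $N_\mathrm{model,local} \to 0$ and outlier risk $\to 1$
– Define p(z) only for regions larger than one object or…
• If $\sigma_\mathrm{data} < \sigma_\mathrm{model}$
– Larger target smoothing, i.e.
  • Resample data point with $\sigma_\mathrm{resample}^2 = \sigma_\mathrm{target}^2 - \sigma_\mathrm{data}^2$
  • Replace model point by Gaussian with $\sigma^2 = \sigma_\mathrm{target}^2 - \sigma_\mathrm{model}^2$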
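The three cases above can be folded into one helper that returns the noise to add to the data point and the width of the Gaussian replacing each model point. This is a sketch under stated assumptions: the function name is hypothetical, noise is described by a single σ per object, and the degenerate case of a zero-width kernel is rejected rather than handled.

```python
import math


def smoothing_plan(sigma_data, sigma_model, sigma_target=None):
    """Return (sigma_resample, sigma_kernel) for a chosen target smoothing scale.

    sigma_resample: extra noise to add to the data point.
    sigma_kernel:   width of the Gaussian that replaces each model point.
    By default the target scale is the data noise itself (no resampling).
    """
    if sigma_target is None:
        sigma_target = sigma_data
    if sigma_target < sigma_data or sigma_target <= sigma_model:
        raise ValueError("target smoothing must be >= data noise and > model noise")
    sigma_resample = math.sqrt(sigma_target**2 - sigma_data**2)
    sigma_kernel = math.sqrt(sigma_target**2 - sigma_model**2)
    return sigma_resample, sigma_kernel


# Data noisier than model: no resampling, kernel from the quadratic difference.
resample, kernel = smoothing_plan(sigma_data=0.1414, sigma_model=0.1000)
print(f"resample = {resample:.4f}, kernel = {kernel:.4f}")

# Data cleaner than model: pick a larger (hypothetical) target scale.
resample2, kernel2 = smoothing_plan(sigma_data=0.05, sigma_model=0.10, sigma_target=0.15)
print(f"resample = {resample2:.4f}, kernel = {kernel2:.4f}")
```

With σ_data = 0.1414 and σ_model = 0.1000 the kernel width comes out close to 0.1, so data noise and total model smoothing match in quadrature.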
Error Propagation: Equations
Locally linear
Model has scatter
Data has scatter
At fixed colour: zphot = z
True z scatter:
Estimated photo-z error:
Only equal if
Noisy Model, Noisy Data: n(z)=?
Use noise levels of σ_data = 0.1414, σ_model = 0.1000
Reconstruction of n(z) with Poisson precision
Revisiting the σ_data ~ σ_model Case
[Plot labels: δz, rms δz; σ² → 0]
Unifying χ²-testing with Kernel Regression - Practical Requirements
• Merge two approaches
– Model smoothing by kernel function
– Correct χ² error scale
• Strictly require
– Target smoothing scale constant across space
– Data error > model error
  • Either from the start
  • Or noise be introduced into data on purpose
• Desire for
– Better model photometry
– Constant data error scale
  • Bright objects: errors in magnitudes
  • Faint objects: errors in flux units (background)
  • Or transform mag scale so that error constant
• Issues
– Varying exposure depth or interstellar extinction
VII. The PHAISE Proposal
PHoto-z Archive for Imaging Survey Exploitation
– Gaussian-precision photo-z code & residual risk quantification from model incompleteness/size moves all attention to model
– Residual outlier risk set by incompleteness
– Noise floor on n(z): σ²_n(z) ∝ 1/N_model,z-bin
• The plan: A central repository for empirical model data
– Avoid duplication of efforts, provide “the best empirical model”
– Best-possible n(z)/photo-z quality from here, by definition
– Dynamic and growing with time, well-known incompleteness
– Web submission of small “photo-zeeing” jobs or customer installation for large applications (PanStarrs, LSST, …)
PHAISE Issues & Plan
• Calibrating new photo-z survey to PHAISE?
– Pure calculation of colour transformations reliable?
– Must observe calibration fields?
• Digesting diverse input
– Start with SDSS, VVDS, GOODS etc.
– Keeping track of sources of incompleteness
• 5-year goal: It works!
– Existing spectroscopic sources digested
– Incompleteness at R>22 too high for cosmology
• 10-year goal
– Deep complete spec-z survey fills gaps
– VIMOS, FMOS, SIDE,…
– “Fundamental” limits, e.g. source blending, AGN ...
Summary
I. Presented method delivers n(z|c) or photo-z with Poisson precision if model complete
II. Completeness of empirical spectroscopic model in faint regime is primary quality limit
III. Need deep, large, very complete spec survey!
IV. Combine resources, do it once, and ABAP
• Set up PHAISE, codes, technicalities
• Propose “The Deep Complete” survey
• Campaign for suitable optical + NIR instrumentation