photometric redshifts with poisson-only noise christian wolf oxford physics edinburgh - 6 may 2009

Photometric Redshifts with Poisson-only Noise

Christian Wolf

Oxford Physics

Edinburgh - 6 May 2009

Talk Outline

I. Why Photo-z’s?

II. State of the Art

III. Future challenges

IV. The χ²-empirical approach

V. Persistent photo-z issues

VI. χ²-test: noisy model

VII. The PHAISE proposal

I. Why Photo-z’s?

z < 0.01 Stebbins & Whitford, AJ, 1948

Photography is deeper than spectroscopy!

Baum 1957

Baum 1962, IAU Symp. 15

Photo-Zeeing in the 80’s/90’s

• Koo 1985: σz/(1+z) ~ 0.04 @ z < 0.6
– UBVI photographic plates
• Loh & Spillar 1986
– Degrading all filter images to worst seeing important
– Star and galaxy library
– 10% outliers
• Half zphot wrong
– Insufficiently blue templates
– Photometric blends
• Half zspec wrong
– Blends
– Single-line detections
• Connolly et al. 1995
– 4-D space (z, U-B, B-R, R-I)
• Distribution has Df = 1.8
– Colour ‘plane’ at z = [0,0.4], rotation after one filter
– Step-wise quadratic fits: σz/(1+z) < 0.04 @ z < 0.8

Redshift Errors & Resolution

• Objects at different redshifts

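• Filterset λ's fixed

[Figure: example spectra at z = 0.843, z = 1.958 and z = 2.828; wavelength axis 400 nm to 1000 nm; vertical axis flux / qef; panel: G2 star vs. QSO z=3]

$$\Delta z = \frac{\Delta\lambda}{\lambda_{\rm rest}} = \frac{\Delta\lambda}{\lambda/(1+z)} \quad\Rightarrow\quad \frac{\Delta z}{1+z} = \frac{\Delta\lambda}{\lambda}$$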
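As a small worked example of the resolution relation Δz/(1+z) = Δλ/λ, the sketch below evaluates it for the three redshifts shown on the slide. The filter width of 20 nm and the observed wavelength of 600 nm are hypothetical numbers chosen only for illustration; they are not taken from the talk.

```python
def fractional_redshift_resolution(delta_lambda_nm, lambda_obs_nm):
    """Return dz/(1+z) = dlambda/lambda for a feature seen at observed wavelength lambda."""
    return delta_lambda_nm / lambda_obs_nm


def redshift_resolution(delta_lambda_nm, lambda_obs_nm, z):
    """Return dz = dlambda/lambda_rest, with lambda_rest = lambda/(1+z)."""
    lambda_rest_nm = lambda_obs_nm / (1.0 + z)
    return delta_lambda_nm / lambda_rest_nm


# Hypothetical filter: 20 nm wide, centred at 600 nm (illustration only).
for z in (0.843, 1.958, 2.828):
    dz = redshift_resolution(20.0, 600.0, z)
    print(f"z = {z}: dz = {dz:.3f}, dz/(1+z) = {dz / (1.0 + z):.3f}")
```

With a fixed filter set, dz/(1+z) stays the same at every redshift while dz itself grows with (1+z).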

II. State of the Art: Medium-band SEDs

rms 0.008, 7%-20% outliers
rms 0.008 (R<21), 1 outlier
QSOs at z~2.8: R=23.8, R=22.9, R=21.5
Galaxies at z~0.45: R=20, R=22, R=23.7

State of the Art: ugriz-only

Collister & Lahav 2004

ANN | χ² template
~4% outliers with |δz|/(1+z) > 0.1; rms δz/(1+z) = 0.042; bias -0.017
~0% outliers with |δz|/(1+z) > 0.1; rms δz/(1+z) = 0.023; bias ~0.00

ANN easy at z<1: no ambiguities… but wait for future data!

III. Future Challenges

• Catastrophic failures & misclassifications

• Large z errors

• Mean z bias

• Unrealistic z errors

Origin of Challenges


• Large z errors

• Mean z bias


• Model ambiguities in colour space

• PDF too unconstrained

• PDF wrong

• Mismatch between data and model

Common Fixes


• Large z errors

• Mean z bias


• Model ambiguities in colour space

• PDF too unconstrained

• PDF wrong

• Mismatch between data and model

Add priors

Add more data

Repair models

Use template error function

Why do These Matter?

• Super-large photo-z surveys for cosmology
– Now: PanStarrs, DES
– 2015++: LSST, IDEM
• Redshift bias from
– Model:data calibration
– Catastrophic outliers: |Δz| ≈ |Δz_outlier| × f_outlier
• Kitching, Taylor & Heavens: Δw ≈ 5Δz (3D cosmic shear)
– Δz = 0.01 unacceptable
– 1% outliers unacceptable
• Even spectroscopic surveys
– May have 1% wrong z’s
– Have incompleteness, i.e. more undiscovered outliers

http://www.astro.uni-bonn.de/~hendrik/PHAT/index.html

Back to The Principles: Overview

[Diagram: model + data → estimator → result]
• Model (colour library): empirical data or external template
• Data: spectral energy distribution
• Estimator (estimator/classifier): χ²-fitting, artificial neural net, learning algorithms
• Result: PDF p(z)

Frequentist precision statistics:= “Using what IS there: N(z)!”

Bayesian frontier exploration:= “What do we (not) know: p(z)=?”

Model-Estimator Combinations

[Table: codes sorted by estimator (χ², NN) and model (Template, Empirical)]
• χ²
+ PDF
+ Ambiguity warning
• NN
– No PDF, no warning
• Template model
+ Can be extrapolated in z, mag
– Calibration issues
– Priors’ issues
• Empirical model
+ Good priors
+ No calibration issues
– Cannot be extrapolated


IV. The χ²-empirical Approach

• Goal
– Combine χ²-PDF with reliability of empirical model
• Suggest
– Replace templates with empirical model: has correct calibration & priors, but also has noise
• However
– PDF from χ²-model testing is only correct if the model is correct and noise-free
– Templates are noise-free but incorrect, so they produce a wrong PDF as well

Compare: Kernel Regression

• From global fits (1980s) to local fits (2000s)
– Locally optimal solution
– Requires more data and computing power
• Kernel function
– Smooth over wide range for robust solution
– Smooth over small range for good representation
• Identical to χ²-fitting if
– Model noise-free
– Gaussian kernel function with σ = σ_data

[Figure: colour given; locally fit z(colour) → z]
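The trade-off between wide and narrow kernels can be illustrated with a minimal sketch. It assumes a one-dimensional colour, a toy relation z = colour² with added scatter, and a simple Gaussian-weighted mean; none of these choices come from the talk, and the function name `kernel_estimate` is hypothetical.

```python
import numpy as np


def kernel_estimate(colour, model_colours, model_z, width):
    """Gaussian-kernel weighted mean of model redshifts around a given colour."""
    weights = np.exp(-0.5 * ((model_colours - colour) / width) ** 2)
    return np.sum(weights * model_z) / np.sum(weights)


# Toy model sample (assumption): curved colour-redshift relation plus scatter.
rng = np.random.default_rng(0)
model_colours = np.linspace(0.0, 1.0, 201)
model_z = model_colours ** 2 + rng.normal(0.0, 0.05, model_colours.size)

z_true = 0.5 ** 2
z_narrow = kernel_estimate(0.5, model_colours, model_z, width=0.02)
z_wide = kernel_estimate(0.5, model_colours, model_z, width=0.30)
print(f"true {z_true:.3f}, narrow kernel {z_narrow:.3f}, wide kernel {z_wide:.3f}")
```

In this toy case the wide kernel averages over the curvature of the relation and is biased, while the narrow kernel follows the local relation but rests on far fewer model objects.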

Equations: χ²-testing

• Probability of single given model object to produce data object
• Parameter estimate
• Expected error
• Bimodality detection

[Figure: p versus z]
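The four quantities named on this slide can be sketched as follows. The formulas are the standard χ² forms and are an assumption here, not the talk's own equations: p_i ∝ exp(-χ²_i/2) for each model object, a p-weighted mean as estimate, and a p-weighted scatter as expected error. Splitting the PDF at the mean to expose two modes is likewise only an illustrative choice, and all names and toy numbers are hypothetical.

```python
import numpy as np


def chi2_pdf(data_colours, data_errors, model_colours, model_z):
    """Weights p_i ~ exp(-chi2_i / 2) per model object, plus mean z and its scatter."""
    chi2 = np.sum(((model_colours - data_colours) / data_errors) ** 2, axis=1)
    p = np.exp(-0.5 * (chi2 - chi2.min()))  # subtract the minimum for numerical stability
    p /= p.sum()
    z_est = np.sum(p * model_z)
    z_err = np.sqrt(np.sum(p * (model_z - z_est) ** 2))
    return p, z_est, z_err


def split_modes(p, model_z, z_split):
    """Return (probability, mean z) below and above z_split (illustrative bimodality check)."""
    modes = []
    for mask in (model_z < z_split, model_z >= z_split):
        weight = p[mask].sum()
        mean_z = np.sum(p[mask] * model_z[mask]) / weight if weight > 0 else np.nan
        modes.append((weight, mean_z))
    return modes


# Toy model: four objects share one colour but sit at two redshifts; one is far away in colour.
model_colours = np.array([[0.5, 0.2], [0.5, 0.2], [0.5, 0.2], [0.5, 0.2], [2.0, 1.0]])
model_z = np.array([1.0, 1.0, 1.0, 3.0, 0.2])
p, z_est, z_err = chi2_pdf(np.array([0.5, 0.2]), np.array([0.1, 0.1]), model_colours, model_z)
modes = split_modes(p, model_z, z_est)
print(z_est, z_err, modes)
```

Here the single estimate lands between the two branches with a large error, while the split recovers one mode at z = 1 with p = 0.75 and one at z = 3 with p = 0.25.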

For Now: Ignore Model Errors

• SDSS QSO sample
– Plenty of z-ambiguities
– DR5: 75,770 objects split half:half into model:data
– Pretend noise-free model

Richards et al. 2007

Result: Non-bimodal Objects

[Figure panels]
• Fraction of outliers with |δz| > 3σz,limit
• Photo-z “bias” = mean δz of non-outliers
• rms δz (non-outliers)
• Fraction of sample with σz < σz,limit

σz vs. rms δz

Result: Bimodal Objects

• 15474 detected ambiguities
– Two z’s given one colour
– Need more data to break
• Meanwhile
– Trust more probable z
• Mean p-ratio 78:22 predicts 12077 right : 3397 wrong
• 12051 right indeed!
– Use two weighted results
• Reliable: p_high ≈ f_high
• Sensitivity limits? 8% 1:>20 and 1% 1:>50
• Undetected ambiguities inevitable (= erroneously uni-modal)
– 30% of space, undetected 1:50-ambiguity → 0.6% outliers
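The arithmetic behind these counts can be checked in a few lines. The binomial scatter below is an extra consistency check that is not on the slide; it assumes, for simplicity, that every bimodal object shares the mean probability, which overestimates the true scatter.

```python
n_bimodal = 15474
predicted_right = 12077
observed_right = 12051

predicted_wrong = n_bimodal - predicted_right   # 3397, as quoted on the slide
mean_p_high = predicted_right / n_bimodal        # about 0.78
observed_f_high = observed_right / n_bimodal     # about 0.78

# Assumption: binomial scatter with one common p for all objects.
binomial_sigma = (n_bimodal * mean_p_high * (1.0 - mean_p_high)) ** 0.5

# 30% of colour space carrying an undetected 1:50 ambiguity.
outlier_fraction = 0.30 * (1.0 / 50.0)

print(f"p_high = {mean_p_high:.4f}, f_high = {observed_f_high:.4f}")
print(f"difference = {predicted_right - observed_right}, binomial sigma = {binomial_sigma:.1f}")
print(f"undetected-ambiguity outliers = {outlier_fraction:.3%}")
```

The difference of 26 objects between prediction and outcome is well inside this scatter, which is what the slide means by calling the probabilities reliable.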

Result: Redshift Distributions

Histogram of zphot estimates: count bimodal objects twice using p-weights

Co-addition of all p(z)

p(z) inform beyond zphot
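The two ways of building a redshift distribution can be sketched as below. This is a minimal illustration with hypothetical function names and toy values: the first function histograms point estimates, with each mode of a bimodal object entering with its p-weight, and the second co-adds binned p(z) from all objects.

```python
import numpy as np


def nz_from_point_estimates(z_modes, p_modes, bin_edges):
    """Histogram of zphot; each mode of an object enters once, weighted by its probability."""
    z_all = np.concatenate(z_modes)
    p_all = np.concatenate(p_modes)
    return np.histogram(z_all, bins=bin_edges, weights=p_all)[0]


def nz_from_pdfs(pdfs):
    """Co-add binned p(z); each row is one object and sums to 1."""
    return np.sum(pdfs, axis=0)


bin_edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Toy input: object A is unimodal, object B is bimodal with a 78:22 p-ratio.
z_modes = [np.array([0.5]), np.array([1.5, 3.5])]
p_modes = [np.array([1.0]), np.array([0.78, 0.22])]
hist = nz_from_point_estimates(z_modes, p_modes, bin_edges)

# The same two objects described by full binned PDFs (toy values).
pdfs = np.array([[0.90, 0.10, 0.00, 0.00],
                 [0.00, 0.70, 0.08, 0.22]])
coadd = nz_from_pdfs(pdfs)
print(hist, coadd)
```

Both results sum to the number of objects, but the co-added p(z) also carries the width of each mode, which the point estimates discard.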

Result: Size of Model Sample

V. Persistent Photo-Z Issues

1. RMS redshift error
• Has a floor set by intrinsic scatter; deeper photometry is useless

2. Redshift bias
• Sub-samples can show local bias even when the method is globally bias-free

3. Catastrophic outliers
• Faint ambiguities (extreme p-ratio) are undetectable; the only guard is all-out spectroscopy

Error Floor from Intrinsic Scatter

• Example:
– QSO near (g-r)~1 or z~3.7
– z signal: Lyα forest in g-band
• Training sample in box
– Redshift distribution: mean 3.66, rms 0.115
– RMS/(1+z) = 0.024
• Testing sample in box
– RMS/(1+z) error 0.023

Locally linear

Local Redshift Biases

Not an issue when plotted over zphot (by design!)

Outliers from Undetected Ambiguities

Model objects within kernel: N_primary + N_2nd = N_model,local

Assume a 2nd solution exists (rate > 0); observe N_2nd = 0: Poisson uncertainty σ(N_2nd) = 1

Hence, individual residual outlier risk: p_2nd = 1/N_model,local
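As a minimal sketch of this rule, the function below turns the number of model objects inside the kernel into the residual outlier risk p_2nd = 1/N_model,local. The function name and the example counts are hypothetical; capping the risk at 1 for an empty kernel is an added convention.

```python
def residual_outlier_risk(n_model_local):
    """Residual risk of an undetected second solution, p_2nd = 1 / N_model,local."""
    if n_model_local <= 0:
        return 1.0  # no model objects in the kernel: nothing constrains the redshift
    return min(1.0, 1.0 / n_model_local)


# Hypothetical local model counts.
for n in (1, 10, 50, 1000):
    print(f"N_model,local = {n:5d} -> residual outlier risk = {residual_outlier_risk(n):.3%}")
```

Under this rule a per-object risk of 0.1% needs about 1000 model objects inside the kernel, so the size of the model sample directly limits how well outliers can be excluded.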

Incomplete Models Mean Outliers

• Incomplete targeting
– No problem, use weights
• Incomplete z recovery
– Model completeness f(z)
– Main reason: z different
– Missed “model outliers”
– Part of data PDF missing
• Maximum bias risk for objects at fixed colour

• Assume deep survey with non-recovered fraction f_non-recov = 0.2
– |δz_out| = 1 ⇒ mean bias |δz| = 0.2 !!!

Spectroscopic incompleteness deserves by far the greatest concern in empirical redshift estimation.*

* |δz| < 10^-3 means 99.9% completeness & reliability
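The bias budget on this slide is simple arithmetic, sketched below under the assumption that every non-recovered object is an outlier displaced by the same |δz_out|. This is the worst case described on the slide; the function names are hypothetical.

```python
def max_bias(non_recovered_fraction, outlier_offset):
    """Worst-case mean redshift bias if all non-recovered objects sit at the outlier offset."""
    return non_recovered_fraction * abs(outlier_offset)


def required_completeness(target_bias, outlier_offset):
    """Completeness needed to keep the worst-case bias below target_bias."""
    return 1.0 - target_bias / abs(outlier_offset)


print(f"20% non-recovered, |dz_out| = 1 -> bias up to {max_bias(0.2, 1.0):.2f}")
print(f"bias < 1e-3 with |dz_out| = 1 -> completeness {required_completeness(1e-3, 1.0):.1%}")
```

With the slide's numbers this reproduces the bias of 0.2 and the 99.9% completeness requirement quoted in the footnote.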

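χ²-Test: Noise-free Model

[Diagram: model objects in the colour plane (q1, q2)]

$$p(\mathrm{data}\,|\,\mathrm{model}) = \mathrm{model} \otimes G_{\sigma^2}(\mathrm{data}), \qquad G_{\rm data} = G_{\sigma^2}, \qquad \sigma^2_{\rm data} = \sigma^2$$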

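VI. χ²-Test: Noisy Model

[Diagram: noisy model objects in the colour plane (q1, q2)]

$$p(\mathrm{data}\,|\,\mathrm{model}) = \mathrm{model} \otimes G_{\sigma^2}(\mathrm{data})$$

Noise-free model:

$$G_{\rm data} = G_{\sigma^2}, \qquad \sigma^2_{\rm data} = \sigma^2$$

Noisy model:

$$G_{\rm data} = G_{\rm model} \otimes G_{\sigma^2}, \qquad \sigma^2_{\rm data} = \sigma^2_{\rm model} + \sigma^2 \quad\Rightarrow\quad \sigma^2 = \sigma^2_{\rm data} - \sigma^2_{\rm model}$$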

Data Noise vs. Model Noise

• If σ_data > σ_model
– Replace model point by Gaussian with σ² = σ²_data - σ²_model
• If σ_data ≈ σ_model
– When σ² → 0 then also N_model,local → 0 and outlier risk → 1
– Define p(z) only for regions larger than one object or…
• If σ_data < σ_model
– Larger target smoothing, i.e.
• Resample data point with σ²_resample = σ²_target - σ²_data
• Replace model point by Gaussian with σ² = σ²_target - σ²_model
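The three cases can be turned into a small helper that returns the extra noise to add to the data point and the kernel width to apply to each model point. This is a sketch under the quadrature rules listed above; the function name, the return convention and the example target scale are hypothetical. The first example uses the noise levels quoted later in the talk (0.1414 and 0.1000).

```python
import math


def smoothing_scales(sigma_data, sigma_model, sigma_target=None):
    """Return (sigma_resample, sigma_kernel) so that data and smoothed model share one scale."""
    if sigma_data > sigma_model:
        # Data noisier than model: smooth the model up to the data scale.
        # As sigma_data approaches sigma_model the kernel shrinks towards zero,
        # the case the slide warns about (few model objects, high outlier risk).
        return 0.0, math.sqrt(sigma_data ** 2 - sigma_model ** 2)
    if sigma_target is None or sigma_target <= max(sigma_data, sigma_model):
        raise ValueError("need a target smoothing scale larger than both error scales")
    # Model noisier than data: degrade both to a common, larger target scale.
    sigma_resample = math.sqrt(sigma_target ** 2 - sigma_data ** 2)
    sigma_kernel = math.sqrt(sigma_target ** 2 - sigma_model ** 2)
    return sigma_resample, sigma_kernel


print(smoothing_scales(0.1414, 0.1000))                    # kernel width close to 0.1
print(smoothing_scales(0.05, 0.10, sigma_target=0.13))     # hypothetical target scale
```

The helper raises an error instead of guessing a target scale, since choosing that scale is a survey-level decision.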

Error Propagation: Equations

Locally linear

Model has scatter

Data has scatter

At fixed colour: zphot = z

True z scatter:

Estimated photo-z error:

Only equal if

Noisy Model, Noisy Data: n(z)=?

Use noise levels of σ_data = 0.1414, σ_model = 0.1000

Reconstruction of n(z) with Poisson precision

Revisiting The data~model Case

σz vs. rms δz; σ² → 0

Unifying χ²-testing with Kernel Regression - Practical Requirements

• Merge two approaches
– Model smoothing by kernel function
– Correct χ² error scale
• Strictly require
– Target smoothing scale constant across space
– Data error > model error
• Either from the start
• Or noise be introduced into data on purpose
• Desire for
– Better model photometry
– Constant data error scale
• Bright objects: errors in magnitudes
• Faint objects: errors in flux units (background)
• Or transform mag scale so that error is constant
• Issues
– Varying exposure depth or interstellar extinction

VII. The PHAISE Proposal

PHoto-z Archive for Imaging Survey Exploitation

– Gaussian-precision photo-z code & residual risk quantification from model incompleteness/size moves all attention to model
– Residual outlier risk ∝ incompleteness
– Noise floor on n(z): σ²_n(z) ∝ 1/N_model,z-bin

• The plan: A central repository for empirical model data
– Avoid duplication of efforts, provide “the best empirical model”
– Best-possible n(z)/photo-z quality from here, by definition
– Dynamic and growing with time, well-known incompleteness
– Web submission of small “photo-zeeing” jobs or customer installation for large applications (PanStarrs, LSST, …)

PHAISE Issues & Plan

• Calibrating new photo-z survey to PHAISE?
– Pure calculation of colour transformations reliable?
– Must observe calibration fields?
• Digesting diverse input
– Start with SDSS, VVDS, GOODS etc.
– Keeping track of sources of incompleteness
• 5-year goal: It works!
– Existing spectroscopic sources digested
– Incompleteness at R>22 too high for cosmology
• 10-year goal
– Deep complete spec-z survey fills gaps
– VIMOS, FMOS, SIDE,…
– “Fundamental” limits, e.g. source blending, AGN ...

Summary

I. Presented method delivers n(z|c) or photo-z with Poisson precision if model complete

II. Completeness of empirical spectroscopic model in faint regime is primary quality limit

III. Need deep, large, very complete spec survey!

IV. Combine resources, do it once, and ABAP

• Set up PHAISE, codes, technicalities

• Propose “The Deep Complete” survey

• Campaign for suitable optical + NIR instrumentation
