photometric redshifts foreventi.na.astro.it/wp-content/uploads/2018/04/brescia... · 2018. 4....

Photometric redshifts for LSST

LSST Science Collaboration: Dark Energy (Photo-z)

Stefano Cavuoti, Massimo Brescia

M. T. Botticella (PI), A. Mercurio, S. Cavuoti, M. Brescia (co-PI)

G. Angora, E. Cappellaro, M. Della Valle, L. Greggio, G. Longo, F. Mannucci, M. Paolillo, G. Riccio, M. Vicedomini

within

Italian Working Group on: Transient and Variable stars, Dark Energy (Photo-z), Galaxies

Huge data: LSST

3-Gigapixel camera

A 6 GB image every 20 seconds

30 TB every night for 10 years

100 PB final image data archive (all public)

20 PB final science catalogue database

50 billion object database

Real-time event mining: ~10 million events per night × 10 years (and most of them require a follow-up observation!)

It requires data-driven science solutions (Astroinformatics)!!

New perspective to do science

LSST-DESC Data Challenge 1 Photo-z Simulation Catalogue

The total simulated catalogue (named Buzzard) available to the collaboration covers 400 deg² and contains 238 million galaxies up to an apparent magnitude limit of r = 29, spanning the redshift range 0 < z ≤ 8.7

Systematic problems with galaxy colours were observed at z > 2, so the catalogue was trimmed to include only galaxies in the redshift range 0 < z ≤ 2

Six LSST filter passbands: u g r i z y

Magnitude errors were assigned in the six bands using the model described in Ivezic+ 2008, assuming full 10-year depth observations

Only 8 deg² were used for this data challenge

Our approach

фLAB

Feature Selection: PHiLAB (Parameter Handling investigation LABoratory), Brescia+ 2018, submitted to MNRAS

Photo-z point estimation: MLPQNA (Multi Layer Perceptron trained by Quasi Newton Algorithm), Brescia+ 2013, ApJ 772, 2, 140

Photo-z PDF: METAPHOR (Machine learning Estimation Tool for Accurate PHOtometric Redshifts), Cavuoti+ 2017, MNRAS 465, 2

Our novel FS method: PHiLAB

PHiLAB (Parameter Handling investigation LABoratory)

Based on two main concepts, «shadow features» and Naïve-LASSO regularization, and exploiting the Random Forest model as the importance computing engine.


Able to solve the all-relevant feature selection problem!

LASSO (Least Absolute Shrinkage and Selection Operator)

LASSO penalizes regression coefficients with an L1-norm penalty, shrinking many of them to zero. Features with non-zero regression coefficients are “selected”.

$$\min_f \sum_{i=1}^{n} L\big(f(x_i)\big) + \lambda L_p(w)$$

Regularization term: $\lambda L_p(w)$

Regularization in Machine Learning is a process of introducing additional information to solve learning overfitting or to perform Feature Selection in a sparse Parameter Space. The regularization is a functional term added to any loss function L and is also widely considered as a penalty term.
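As a minimal sketch of the L1 case (p = 1) of the objective above, the following pure-Python coordinate descent uses a squared-error loss with soft-thresholding. The choice of loss, the function names, the toy data and the value of λ are all assumptions made for illustration; this is not the Naïve-LASSO implemented in PHiLAB.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n) * sum_i (y_i - w.x_i)^2 + lam * sum_j |w_j|."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                partial = sum(w[k] * X[i][k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w


# Toy data (hypothetical): only feature 0 carries signal.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, lam=0.2)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

Features whose coefficient survives the shrinkage are the "selected" ones; a larger λ zeroes more coefficients.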

[Figure: two panels, each labelled with the bands u g r i z y]

The shadow features concept is related to the extension of a parameter space with artificial features. A shadow feature for each real one is introduced by randomly shuffling its values among the N samples of the given dataset.

SHADOW FEATURES represent the noisy versions of the real ones and their calculated importance can be used to estimate the relevance of the real features.
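The shuffling step described above can be sketched as follows. This is only an illustration under stated assumptions: the Random Forest importance engine is swapped for a plain absolute Pearson correlation with the target, the keep rule (importance above the best shadow) is an assumed threshold and not necessarily the one used by PHiLAB, and the feature names and data are hypothetical.

```python
import math
import random


def make_shadow(values, rng):
    """Return a shuffled copy: same values, but any link to the target is broken."""
    shadow = list(values)
    rng.shuffle(shadow)
    return shadow


def abs_correlation(a, b):
    """Stand-in importance measure (assumption): absolute Pearson correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return abs(cov) / math.sqrt(var_a * var_b)


# Hypothetical dataset: feat_a tracks the target, feat_b is pure noise.
rng = random.Random(42)
n_samples = 500
target = [rng.uniform(0.0, 2.0) for _ in range(n_samples)]
features = {
    "feat_a": [t + rng.gauss(0.0, 0.1) for t in target],
    "feat_b": [rng.gauss(0.0, 1.0) for _ in range(n_samples)],
}

# One shadow per real feature, built by shuffling among the N samples.
shadows = {name: make_shadow(col, rng) for name, col in features.items()}

importance = {name: abs_correlation(col, target) for name, col in features.items()}
shadow_importance = {name: abs_correlation(col, target) for name, col in shadows.items()}

# Assumed rule: a real feature is relevant if it beats the best shadow.
threshold = max(shadow_importance.values())
relevant = [name for name, imp in importance.items() if imp > threshold]
print(importance, threshold, relevant)
```

Because the shadows keep the value distribution of the real features but carry no information on the target, their importance gives a noise floor against which the real features can be compared.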

METAPHOR for photo-z PDF

Based on the concept of photometry perturbation

The photometry perturbation law is

$$\tilde{m} = m + \alpha\,F\,u_{(\mu=0,\,\sigma=1)}$$

where $\tilde{m}$ is the perturbed magnitude and:

α: multiplicative constant (useful in case of multi-survey photometry);

u(μ=0, σ=1): random value (from the standard normal distribution);

F: complex function (for example a polynomial fit of the mean magnitude errors on the binned bands).

1) evaluate photometric error distributions

2) assess the correlation between spectroscopic and photometric errors

3) disentangle photometric uncertainties from those intrinsic to the method itself
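A minimal sketch of the perturbation law, assuming F is evaluated at the unperturbed magnitude and is a low-order polynomial. The coefficients, the function names and the choice α = 1 are hypothetical and serve only to show the mechanics; they are not the values used by METAPHOR.

```python
import random


def polynomial_error(m, coeffs=(0.005, 0.0, 0.0001)):
    """Hypothetical F(m): polynomial c0 + c1*m + c2*m^2 mimicking a mean magnitude error."""
    return sum(c * m ** k for k, c in enumerate(coeffs))


def perturb_magnitude(m, alpha, error_model, rng):
    """Apply m + alpha * F(m) * u, with u drawn from a standard normal distribution."""
    u = rng.gauss(0.0, 1.0)
    return m + alpha * error_model(m) * u


rng = random.Random(1)
m_observed = 24.0

# Many perturbed copies of the same magnitude; feeding each copy to the
# trained regressor would give one photo-z sample per copy.
perturbed = [perturb_magnitude(m_observed, 1.0, polynomial_error, rng) for _ in range(1000)]
mean_perturbed = sum(perturbed) / len(perturbed)
print(mean_perturbed)
```

The spread of the photo-z estimates obtained from the perturbed copies is what would then be binned into a PDF.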

Feature Selection + Photo-z point estimates

Feature importance

FEATURE   MAG PS                 MAG+COLOURS
i-z       -                      0.480
z-y       -                      0.160
r-i       -                      0.135
g-r       -                      0.130
u-g       -                      0.052
g         0.092                  0.004
u         0.041                  0.003
r         0.042                  0.003
y         0.039                  0.003
i         0.008                  0.001
z         0.008                  0.001
NOTES     No features rejected   rejected i, z mags


Note that the colour i-z is the most important feature, while the i and z magnitudes were rejected

weakly relevant features + best relevant features = All-relevant FS

LSST MLPQNA PHZ TEST

Zspec Distributions

TRAIN 35,551 objects

TEST 8,864 objects

Buzzard catalogue

Photo-z + PDFs estimated with METAPHOR + MLPQNA

PHiLAB for LSST photo-z

Lesson learned from FS

The features which carry most of the information are not those usually selected by astronomers on the basis of their personal experience.

(LSST photo-z) – LSST Challenge Collab. 2018, in prep.

8,864 simulated objects

Comparison between 11 and 9 features (rejected i and z magnitudes)

LSST passbands ugrizy + colours


Statistics   11 features   9 features (removed i, z)   9 features (removed r, y)*
|bias|       0.002028      0.002011                    0.002885
σ            0.050         0.048                       0.062
NMAD         0.023         0.021                       0.031
η>0.15       2.12%         2.05%                       3.08%

* removing two features, randomly chosen from the relevant set
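The slides do not spell out how these statistics are defined, so the sketch below assumes the conventional choices: the normalised residual Δz = (z_phot − z_spec)/(1 + z_spec), bias as its mean, σ as its standard deviation, NMAD = 1.4826 × median(|Δz − median(Δz)|), and η as the percentage of objects with |Δz| > 0.15. The function name and the toy redshifts are hypothetical.

```python
import statistics


def photoz_statistics(z_phot, z_spec, outlier_threshold=0.15):
    """Point-estimate statistics on dz = (z_phot - z_spec) / (1 + z_spec) (assumed definitions)."""
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    median_dz = statistics.median(dz)
    deviations = [abs(d - median_dz) for d in dz]
    n_outliers = sum(1 for d in dz if abs(d) > outlier_threshold)
    return {
        "abs_bias": abs(statistics.mean(dz)),
        "sigma": statistics.pstdev(dz),
        "nmad": 1.4826 * statistics.median(deviations),
        "eta_percent": 100.0 * n_outliers / len(dz),
    }


# Toy example (hypothetical values): three perfect estimates and one outlier.
z_spec = [0.5, 1.0, 1.5, 0.2]
z_phot = [0.5, 1.0, 1.5, 0.8]
stats = photoz_statistics(z_phot, z_spec)
print(stats)
```

NMAD is far less sensitive to the single outlier than σ, which is why both are usually quoted together with the outlier fraction.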

Photo-z Challenge participants

BPZ               template      Benitez 2000
EAZY              template      Brammer+ 2008
LePhare           template      Arnouts+ 1999
ANNz2             ML            Sadeh+ 2016
NN                ML            Graham+ 2018
FlexZBoost        ML            Izbicki & Lee 2017
GPz               ML            Almosallam+ 2016
METAPHOR+MLPQNA   ML            Cavuoti+ 2017, Brescia+ 2013
SkyNet            ML            Graff+ 2014
TPZ               ML            Carrasco Kind & Brunner 2013
Delight           ML/template   Leistedt & Hogg 2017

Statistical comparison on:

• Point estimate photo-z

• Photo-z PDF

Photo-z Challenge results

Point estimate photo-z. Point estimate density is represented with fixed density contours, while outliers at lower density are represented by blue points

METHOD       |bias_median|   σ_RMS    %η>0.15
BPZ          0.002005        0.0215   0.032
EAZY         0.003765        0.0226   0.029
LePhare      0.002007        0.0239   0.056
ANNz2        0.000307        0.0244   0.047
NN           0.001049        0.0170   0.034
FlexZBoost   0.000211        0.0148   0.017
GPz          0.000950        0.0201   0.037
METAPHOR     0.000333        0.0242   0.048
SkyNet       0.000174        0.0218   0.037
TPZ          0.003048        0.0166   0.031
Delight      0.002005        0.0216   0.038

Photo-z Challenge results (PDFs)

Quantile-Quantile (QQ) plots: comparison between the quantiles of two distributions, CDF(z) vs ideal(z).

An easy way to assess some properties, like location, scale and skewness.

Delight and NN: excess of low values and dearth of high values, indicating a broad PDF

SkyNet: dearth of low values and excess of high values, indicating a narrow PDF

BPZ, EAZY: above the diagonal, indicating systematic underprediction of photo-z

TPZ: mostly below the diagonal, indicating systematic overprediction of photo-z

Photo-z Challenge results (PDFs)

PIT histograms: level of uniformity of the PDF distribution (the black dotted line is equivalent to the ideal QQ plot diagonal).

The amount of outliers is evident where the true redshift falls outside p(z), corresponding to PIT = 0.0 or 1.0

A broad PDF shows an excess at PIT = 0.5

A narrow PDF shows an excess at the edges of the PIT.

Underprediction shows up as fewer PIT values at PIT < 0.5 and more at PIT > 0.5

Overprediction shows the opposite.

$$\mathrm{PIT} = \int_{-\infty}^{z_{\mathrm{true}}} p(z)\,dz$$
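For a PDF sampled on a redshift grid, the PIT integral can be approximated numerically. The sketch below assumes a trapezoidal rule with linear interpolation of the cumulative distribution, normalises by the total area so that unnormalised PDFs are tolerated, and pairs the sorted PIT values with uniform quantiles to obtain QQ-plot points. The function names and the flat toy PDF are hypothetical.

```python
def pit_value(z_grid, pdf, z_true):
    """Approximate the integral of p(z) from the start of the grid up to z_true."""
    # Cumulative trapezoidal integral on the grid.
    cdf = [0.0]
    for k in range(1, len(z_grid)):
        step = z_grid[k] - z_grid[k - 1]
        cdf.append(cdf[-1] + 0.5 * (pdf[k] + pdf[k - 1]) * step)
    total = cdf[-1]
    # True redshift outside the support of p(z): PIT is 0 or 1.
    if z_true <= z_grid[0]:
        return 0.0
    if z_true >= z_grid[-1]:
        return 1.0
    # Linear interpolation of the CDF inside the bracketing grid cell
    # (exact for a flat PDF, approximate otherwise).
    for k in range(1, len(z_grid)):
        if z_true <= z_grid[k]:
            frac = (z_true - z_grid[k - 1]) / (z_grid[k] - z_grid[k - 1])
            return (cdf[k - 1] + frac * (cdf[k] - cdf[k - 1])) / total


def qq_points(pit_values):
    """Pair uniform (ideal) quantiles with the sorted PIT values for a QQ plot."""
    n = len(pit_values)
    return [((k + 0.5) / n, p) for k, p in enumerate(sorted(pit_values))]


# Toy example: a flat PDF on 0 <= z <= 2, so PIT = z_true / 2.
z_grid = [0.01 * k for k in range(201)]
flat_pdf = [1.0] * len(z_grid)
pit = pit_value(z_grid, flat_pdf, 0.5)
print(pit, qq_points([0.75, 0.25]))
```

A histogram of the PIT values over the whole test set should be flat for well-calibrated PDFs, and the QQ points should then lie on the diagonal.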

Photo-z Working Group activities recap

Our official involvement in LSST PZ WG established on June 9, 2017

As expected (cf. Brescia+ 2018, Springer CCIS Vol. 822, arXiv:1802.07683), there is no objective evaluation estimator, i.e. the discussion about statistical evaluation is still open

Mon. Not. R. Astron. Soc. 000-000 (0000) Printed April 4, 2018 (MN LATEX v.2.2)

LSST Challenge Collab. 2018, in prep.

Coming soon: the Challenge paper (estimated submission by the end of May 2018)
