Photometric redshifts for LSST
LSST Science Collaboration: Dark Energy (Photo-z)
Stefano Cavuoti, Massimo Brescia
M. T. Botticella (PI), A. Mercurio, S. Cavuoti, M. Brescia (co-PI)
G. Angora, E. Cappellaro, M. Della Valle, L. Greggio, G. Longo, F. Mannucci, M. Paolillo, G. Riccio, M. Vicedomini
within
Italian Working Group on: Transient and Variable stars, Dark Energy (Photo-z), Galaxies
Huge data: LSST
3-Gigapixel camera
A 6 GB image every 20 seconds
30 TB every night for 10 years
100 PB final image data archive (all public)
20 PB final science catalogue database
50 billion object database
Real-time event mining: ~10 million events per night × 10 years (and most of them require a follow-up observation!)
It requires data-driven science solutions (Astroinformatics)!!
New perspective to do science
LSST-DESC Data Challenge 1 Photo-z Simulation Catalogue
The total simulated catalogue (named Buzzard) available to the collaboration covers 400 deg² and contains 238 million galaxies up to an apparent magnitude limit of r = 29, spanning the redshift range 0 < z ≤ 8.7
Systematic problems with galaxy colours were observed at z > 2, so the catalogue was trimmed to include only galaxies in the redshift range 0 < z ≤ 2
Six LSST filter passbands: u g r i z y
Assigned magnitude errors in the six bands using the model described in Ivezic+ 2008, assuming full 10-year depth observations
Only 8 deg² used for this data challenge
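The magnitude error assignment can be illustrated with a minimal sketch of the point-source error model commonly quoted from Ivezic+ 2008. The slides do not give the numerical parameters, so the systematic floor, the gamma values and the depth used in the usage note are assumptions for illustration, and `mag_error` is a hypothetical helper name.

```python
import math

# Assumed parameters (not stated in the slides): systematic error floor
# and per-band gamma as commonly quoted for the Ivezic+ 2008 model.
SIGMA_SYS = 0.005
GAMMA = {"u": 0.038, "g": 0.039, "r": 0.039, "i": 0.039, "z": 0.039, "y": 0.039}


def mag_error(mag, m5, band="r"):
    """Total magnitude error for a source of magnitude `mag`,
    given the 5-sigma limiting depth `m5` of the band."""
    gamma = GAMMA[band]
    x = 10.0 ** (0.4 * (mag - m5))
    # Random (photon-noise) term of the model.
    sigma_rand_sq = (0.04 - gamma) * x + gamma * x * x
    # Systematic and random terms add in quadrature.
    return math.sqrt(SIGMA_SYS ** 2 + sigma_rand_sq)
```

For example, `mag_error(24.0, 27.5)` returns the error of an r = 24 source for an illustrative depth of m5 = 27.5 (the depth value is an assumption); at `mag == m5` the random term reduces to 0.2 mag, i.e. a 5-sigma detection.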
Our approach
Feature Selection: PHiLAB (Parameter Handling investigation LABoratory), Brescia+ 2018, submitted to MNRAS
Photo-z point estimation: MLPQNA (Multi Layer Perceptron trained by Quasi Newton Algorithm), Brescia+ 2013, ApJ 772, 2, 140
Photo-z PDF: METAPHOR (Machine learning Estimation Tool for Accurate PHOtometric Redshifts), Cavuoti+ 2017, MNRAS 465, 2
Our novel FS method: PHiLAB
PHiLAB (Parameter Handling investigation LABoratory)
Based on two main concepts, «shadow features» and Naïve-LASSO regularization, and exploiting the Random Forest model as importance computing engine.
Able to solve the all-relevant feature selection problem!
LASSO (Least Absolute Shrinkage and Selection Operator)
LASSO penalizes regression coefficients with an L1-norm penalty, shrinking many of them to zero. Features with non-zero regression coefficients are “selected”.
Regularization in Machine Learning is a process of introducing additional information to solve learning overfitting or to perform Feature Selection in a sparse Parameter Space. The regularization is a functional term added to any loss function L and is also widely considered as a penalty term.
$\min_f \sum_{i=1}^{n} L(f(x_i)) + \lambda L_p(\mathbf{w})$
Regularization term: $\lambda L_p(\mathbf{w})$
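The shrinking effect of the L1 penalty can be seen in a minimal sketch of plain LASSO regression solved by coordinate descent. This is a generic textbook LASSO with a squared-error loss, not the Naïve-LASSO variant used inside PHiLAB; the function names and the toy data are hypothetical.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n) * sum_i (y_i - w.x_i)^2 + lam * sum_j |w_j|."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z_j = 0.0
            for i in range(n):
                # Prediction using every feature except j.
                pred = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred)
                z_j += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z_j / n)
    return w


# Toy data: only the first of three features carries information.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
weights = lasso_coordinate_descent(X, y, lam=0.1)
```

In this toy case the weights of the two uninformative features are driven exactly to zero, so only the first feature is "selected".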
The shadow features concept is related to the extension of a parameter space with artificial features. A shadow feature for each real one is introduced by randomly shuffling its values among the N samples of the given dataset.
SHADOW FEATURES represent the noisy versions of the real ones and their calculated importance can be used to estimate the relevance of the real features.
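A minimal sketch of the shadow feature idea follows. Two points are assumptions and not taken from the slides: the importance measure is a stand-in (absolute Pearson correlation with the target) instead of the Random Forest importance used by PHiLAB, and the selection rule (keep a feature whose importance exceeds the largest shadow importance) is one simple choice among several. All names are hypothetical.

```python
import math
import random


def abs_pearson(a, b):
    """Absolute Pearson correlation, used here as a stand-in importance."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return abs(cov) / math.sqrt(var_a * var_b)


def select_with_shadows(columns, target, rng):
    """Compare each real feature with the shuffled (shadow) features."""
    real = {name: abs_pearson(v, target) for name, v in columns.items()}
    shadow = {}
    for name, values in columns.items():
        shuffled = list(values)
        rng.shuffle(shuffled)  # the shadow keeps the values, loses the link to the target
        shadow[name] = abs_pearson(shuffled, target)
    threshold = max(shadow.values())
    selected = [name for name, imp in real.items() if imp > threshold]
    return real, shadow, selected


rng = random.Random(1)
informative = [rng.gauss(0, 1) for _ in range(500)]
noise = [rng.gauss(0, 1) for _ in range(500)]
target = [3.0 * x + rng.gauss(0, 0.5) for x in informative]
real, shadow, selected = select_with_shadows(
    {"informative": informative, "noise": noise}, target, rng
)
```

Because shuffling destroys any relation with the target, the shadow importances give an empirical noise level against which the real importances can be judged.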
METAPHOR for photo-z PDF
Based on the concept of photometry perturbation
The photometry perturbation law is
$m_{\mathrm{perturbed}} = m + \alpha F \, u_{(\mu=0,\sigma=1)}$
α: multiplicative constant (useful in case of multi-survey photometry);
u(μ=0, σ=1): random value (from the standard normal distribution);
F: complex function (for example a polynomial fitting of the mean mag errors on the binned bands).
1) evaluate photometric error distributions
2) assess the correlation between spectroscopic and photometric errors
3) disentangle photometric uncertainties from those intrinsic to the method itself
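A minimal sketch of the perturbation law alone, not of the full METAPHOR pipeline. The polynomial used for F and its coefficients are invented for illustration, and the function names are hypothetical.

```python
import random


def example_error_fn(mag):
    """Hypothetical F: a quadratic polynomial standing in for a fit of the
    mean magnitude error versus magnitude (coefficients are made up)."""
    return 0.01 + 0.002 * (mag - 20.0) ** 2


def perturb_magnitudes(mags, alpha, error_fn, rng):
    """Apply m_perturbed = m + alpha * F(m) * u, with u drawn from N(0, 1)."""
    return [m + alpha * error_fn(m) * rng.gauss(0.0, 1.0) for m in mags]


rng = random.Random(42)
mags = [21.3, 22.8, 24.1]
# Ten perturbed replicas of the same photometry.
replicas = [perturb_magnitudes(mags, 1.0, example_error_fn, rng) for _ in range(10)]
```

Each replica is an equally plausible version of the same photometry given its errors; setting `alpha` to 0 returns the unperturbed magnitudes.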
Feature Selection + Photo-z point estimates
Feature importance
FEATURE   MAG PS                 MAG+COLOURS
i-z       -                      0.480
z-y       -                      0.160
r-i       -                      0.135
g-r       -                      0.130
u-g       -                      0.052
g         0.092                  0.004
u         0.041                  0.003
r         0.042                  0.003
y         0.039                  0.003
i         0.008                  0.001
z         0.008                  0.001
NOTES     No features rejected   rejected i, z mags
Note that the colour i-z is the best feature, while the i and z magnitudes were rejected
weakly relevant features + best relevant features = all-relevant FS
LSST MLPQNA PHZ TEST
Zspec Distributions
TRAIN 35,551 objects
TEST 8,864 objects
Buzzard catalogue
Photo-z + PDFs estimated
with METAPHOR + MLPQNA
PHiLAB for LSST photo-z
Lesson learned from FS
The features which carry most of the information are not those usually selected by astronomers on the basis of their personal experience.
(LSST photo-z) – LSST Challenge Collab. 2018, in prep.
8,864 simulated objects
Comparison between 11 and 9 features (rejected i and z magnitudes)
LSST passbands ugrizy + colours
Statistics   11 features   9 features (removed i, z)   9 features (removed r, y)
|bias|       0.002028      0.002011                    0.002885
σ            0.050         0.048                       0.062
NMAD         0.023         0.021                       0.031
η>0.15       2.12%         2.05%                       3.08%
9 features (removed r, y): removing two features, randomly chosen from the relevant set
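A minimal sketch of how statistics of this kind are often computed. The slides do not define them, so the definitions below are assumptions: the normalised residual Δz = (z_phot − z_spec) / (1 + z_spec), |bias| as the absolute mean of Δz, σ as its standard deviation, NMAD as 1.4826 × median(|Δz − median(Δz)|), and η as the fraction with |Δz| > 0.15. `photoz_statistics` is a hypothetical name.

```python
import statistics


def photoz_statistics(z_phot, z_spec, outlier_threshold=0.15):
    """Point-estimate statistics on the normalised residuals (assumed definitions)."""
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    median_dz = statistics.median(dz)
    return {
        "abs_bias": abs(statistics.fmean(dz)),
        "sigma": statistics.pstdev(dz),
        "nmad": 1.4826 * statistics.median([abs(d - median_dz) for d in dz]),
        "outlier_fraction": sum(abs(d) > outlier_threshold for d in dz) / len(dz),
    }


# Tiny illustrative sample: three perfect estimates and one catastrophic outlier.
stats = photoz_statistics([0.5, 1.0, 1.5, 0.8], [0.5, 1.0, 1.5, 0.2])
```

The toy sample shows why NMAD is quoted alongside σ: the single outlier inflates the mean and the standard deviation while leaving the median-based NMAD at zero.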
Photo-z Challenge participants
CODE              TYPE          REFERENCE
BPZ               template      Benitez 2000
EAZY              template      Brammer+ 2008
LePhare           template      Arnouts+ 1999
ANNz2             ML            Sadeh+ 2016
NN                ML            Graham+ 2018
FlexZBoost        ML            Izbicki & Lee 2017
GPz               ML            Almosallam+ 2016
METAPHOR+MLPQNA   ML            Cavuoti+ 2017, Brescia+ 2013
SkyNet            ML            Graff+ 2014
TPZ               ML            Carrasco Kind & Brunner 2013
Delight           ML/template   Leistedt & Hogg 2017
Statistical comparison on:
• Point estimate photo-z
• Photo-z PDF
Photo-z Challenge results
Point estimate photo-z. Point estimate density is represented with fixed density contours, while outliers at lower density are represented by blue points.
METHOD       |bias_median|   σ_RMS    %η>0.15
BPZ          0.002005        0.0215   0.032
EAZY         0.003765        0.0226   0.029
LePhare      0.002007        0.0239   0.056
ANNz2        0.000307        0.0244   0.047
NN           0.001049        0.0170   0.034
FlexZBoost   0.000211        0.0148   0.017
GPz          0.000950        0.0201   0.037
METAPHOR     0.000333        0.0242   0.048
SkyNet       0.000174        0.0218   0.037
TPZ          0.003048        0.0166   0.031
Delight      0.002005        0.0216   0.038
Photo-z Challenge results (PDFs)
Quantile-Quantile (QQ) plots. Comparison between quantiles of two distributions: CDF(z) vs ideal(z).
An easy way to assess some properties, like location, scale and skewness.
Delight and NN: excess of low values and dearth of high values → broad PDF
SkyNet: dearth of low values and excess of high values → narrow PDF
BPZ, EAZY: above diagonal → systematic underprediction of photo-z
TPZ: mostly below diagonal → systematic overprediction of photo-z
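A minimal sketch of how a QQ curve of this kind can be built, assuming it is derived from the PIT values of the sample (one value per galaxy in [0, 1]) and compared with the quantiles of a uniform distribution. The pairing order (ideal quantile first, data quantile second) and the name `qq_curve` are assumptions.

```python
def qq_curve(pit_values, n_points=11):
    """Return (ideal quantile, empirical quantile) pairs.
    For perfectly calibrated PDFs the pairs lie on the diagonal."""
    ordered = sorted(pit_values)
    n = len(ordered)
    curve = []
    for k in range(n_points):
        q = k / (n_points - 1)
        idx = min(int(q * n), n - 1)  # index of the q-th empirical quantile
        curve.append((q, ordered[idx]))
    return curve


# Evenly spread PIT values mimic an ideally calibrated sample.
ideal_pit = [(i + 0.5) / 1000 for i in range(1000)]
curve = qq_curve(ideal_pit)
```

Departures of the second element from the first, as a function of the quantile, are what the plots read as broad, narrow or biased PDFs.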
Photo-z Challenge results (PDFs)
PIT histograms. Level of uniformity of the PDF distribution (the black dotted line is equivalent to the ideal QQ plot diagonal).
The amount of outliers is evident where the true redshift falls outside the p(z), corresponding to PIT = 0.0 or 1.0
A broad PDF shows an excess at PIT = 0.5
A narrow PDF shows an excess at the edges of the PIT.
Underprediction shows fewer PIT values at PIT < 0.5 and more at PIT > 0.5
Overprediction shows the opposite.
$PIT = \int_{-\infty}^{z_{\mathrm{true}}} p(z)\, dz$
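A minimal sketch of the PIT integral for a PDF tabulated on a redshift grid, using the trapezoidal rule and normalising by the total area. The gridded representation and the name `pit_value` are assumptions for illustration.

```python
def pit_value(z_grid, pdf, z_true):
    """Integrate p(z) from the start of the grid up to z_true (trapezoidal rule),
    divided by the integral over the whole grid."""
    total = 0.0
    partial = 0.0
    for k in range(len(z_grid) - 1):
        z0, z1 = z_grid[k], z_grid[k + 1]
        p0, p1 = pdf[k], pdf[k + 1]
        area = 0.5 * (p0 + p1) * (z1 - z0)
        total += area
        if z_true >= z1:
            partial += area
        elif z_true > z0:
            # z_true falls inside this bin: interpolate p(z) linearly.
            p_true = p0 + (z_true - z0) / (z1 - z0) * (p1 - p0)
            partial += 0.5 * (p0 + p_true) * (z_true - z0)
    return partial / total


# Flat p(z) on 0 <= z <= 2 as a toy PDF.
z_grid = [k * 0.01 for k in range(201)]
flat_pdf = [1.0] * len(z_grid)
```

With this toy PDF a true redshift at the centre of the grid gives PIT ≈ 0.5, while a true redshift outside the support gives exactly 0.0 or 1.0, the values that flag outliers in the histograms.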
Photo-z Working Group activities recap
Our official involvement in LSST PZ WG established on June 9, 2017
As expected (cf. Brescia+ 2018, Springer CCIS Vol. 822, arXiv: 1802.07683), there is no objective evaluation estimator, i.e. the discussion about statistical evaluation is still open
Mon. Not. R. Astron. Soc. 000-000 (0000) Printed April 4, 2018 (MN LATEX v.2.2)
LSST Challenge Collab. 2018, in prep.
Coming soon: the Challenge paper (estimated submission by the end of May 2018)