

The Chandra Source Catalog 2.0: Data Processing Pipelines

Joseph B. Miller¹, Christopher Allen¹, Jamie A. Budynkiewicz¹, Judy C. Chen¹, Danny G. Gibbs II¹, Charles Paxson¹, Craig S. Anderson¹, Douglas Burke¹, Francesca Civano¹, Raffaele D’Abrusco¹, Stephen M. Doe², Ian N. Evans¹, Janet D. Evans¹, Giuseppina Fabbiano¹, Kenny J. Glotfelty¹, Dale E. Graessle¹, John D. Grier¹, Roger M. Hain¹, Diane M. Hall³, Peter N. Harbo¹, John C. Houck¹, Jennifer Lauer¹, Omar Laurino¹, Nicholas Lee¹, J. Rafael Martinez-Galarza¹, Michael L. McCollough¹, Jonathan C. McDowell¹, Warren McLaughlin¹, Douglas L. Morgan¹, Amy E. Mossman¹, Dan T. Nguyen¹, Joy S. Nichols¹, Michael A. Nowak⁴, David A. Plummer¹, Francis A. Primini¹, Arnold H. Rots¹, Aneta Siemiginowska¹, Beth A. Sundheim¹, Michael S. Tibbetts¹, David W. Van Stone¹, Panagoula Zografou¹

¹ Smithsonian Astrophysical Observatory
² formerly Smithsonian Astrophysical Observatory
³ Northrop Grumman Mission Systems
⁴ MIT Kavli Institute for Astrophysics and Space Research

This work has been supported by NASA under contract NAS 8-03060 to the Smithsonian Astrophysical Observatory for operation of the Chandra X-ray Center.

Processing Overview

The second Chandra Source Catalog (CSC2.0) brought new requirements and new techniques, and with them a software system that probes deeper into the X-ray sky as seen by Chandra. A software system capable of processing ~10,000 observations and nearly 320,000 point and compact X-ray sources was developed, built, and run in a partnership between scientists and software developers.

• 20 data processing pipelines organized into 3 distinct phases: detection, master matching and source properties
  • Detection phase - adjusts for shifts in fine astrometry, combines observations, detects sources, and assesses the likelihood that sources are real
  • Master Match phase - detections across stacks of observations are analyzed for coverage of the same source to produce a master source
  • Source Property phase - each source is characterized by over 350 properties including photometry, spectroscopy, and variability, at the observation, stack, and master levels across 6 energy bands
• 2 additional pipelines - one to compute limiting sensitivity and another to produce source properties on large extended sources characterized by convex hulls
• Quality Assurance (QA) steps ensure accuracy of data products by handling corner cases through manual intervention where the software tools may get confused
• Processing pipelines are designed to run in parallel on a high-performance computing cluster (HPCC) - see poster #238.04
• Processing is done on the smallest unit for a given task - pipelines may be run on an observation, a stack of observations, an ensemble (or group of stacks), or even a single source.
• Distribution is managed by the Catalog Automatic Processing system, which optimizes the processing time

Master Match

After all detections are found, the true and marginal detections within groups of overlapping stacks (ensembles) are compared against each other to determine the final list of sources in the field.

Detecting Sources

Prior to processing, observations (OBIs) within one arcmin of each other are grouped into stacks. While the collection of pipelines operates on a stack, each pipeline may operate on a component of the stack (e.g., an observation in the stack or even a group of detections). The results of this phase include source detections, stacked products, and per-OBI products.
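The grouping of observations into stacks can be illustrated with a minimal sketch. It assumes each OBI is described only by its pointing (RA, Dec in degrees) and that OBIs are linked transitively, friends-of-friends style, whenever two pointings lie within one arcmin; the function names, the OBI identifiers, and the transitive linking rule are illustrative assumptions, not the catalog's actual implementation.

```python
import math

ARCMIN_DEG = 1.0 / 60.0  # one arcminute expressed in degrees


def separation_deg(p, q):
    """Angular separation in degrees between two (ra, dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (*p, *q))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def group_into_stacks(pointings, max_sep_deg=ARCMIN_DEG):
    """Group OBIs into stacks (assumed rule: transitive linking within max_sep_deg).

    pointings: dict mapping a hypothetical OBI id to its (ra, dec) in degrees.
    Returns a sorted list of stacks, each a sorted list of OBI ids.
    """
    ids = sorted(pointings)
    parent = {i: i for i in ids}

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if separation_deg(pointings[a], pointings[b]) <= max_sep_deg:
                parent[find(a)] = find(b)

    stacks = {}
    for i in ids:
        stacks.setdefault(find(i), []).append(i)
    return sorted(stacks.values())


# Hypothetical pointings: two nearly co-pointed OBIs and one far away.
example = {
    "obi_a": (10.0, 20.0),
    "obi_b": (10.005, 20.0),
    "obi_c": (50.0, -5.0),
}
stacks = group_into_stacks(example)
```

In this example the first two pointings are about 0.3 arcmin apart and end up in one stack, while the third forms a single-OBI stack.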

Fine Astrometry

• Match bright sources between OBIs
• Compute offsets between each OBI and the OBI with the most sources
• Using a weighted average, find a common tangent point for all OBIs

PreCal and Detect

• Calibrate each OBI using the same CALDB and software
• Run source detection to find bright sources in the OBI

Multi-OBI Stacks ONLY
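The fine astrometry steps listed above can be sketched as follows. This is a simplified illustration, not the pipeline's code: it assumes source positions are already (x, y) pairs on a shared local plane, uses a nearest-neighbor match within a hypothetical `max_dist`, models the offset as a pure translation, and takes the weights for the common tangent point as given (the poster does not say what the weights are).

```python
import math


def pick_reference_obi(sources_by_obi):
    """Return the OBI with the most bright sources (ties broken by id order)."""
    return max(sorted(sources_by_obi), key=lambda obi: len(sources_by_obi[obi]))


def match_bright_sources(ref_sources, obi_sources, max_dist):
    """Pair each source with its nearest reference source within max_dist."""
    pairs = []
    for src in obi_sources:
        nearest = min(ref_sources, key=lambda ref: math.dist(ref, src))
        if math.dist(nearest, src) <= max_dist:
            pairs.append((nearest, src))
    return pairs


def mean_offset(pairs):
    """Mean (dx, dy) shift that moves the OBI's sources onto the reference."""
    if not pairs:
        raise ValueError("no matched sources")
    dx = sum(ref[0] - src[0] for ref, src in pairs) / len(pairs)
    dy = sum(ref[1] - src[1] for ref, src in pairs) / len(pairs)
    return dx, dy


def common_tangent_point(tangent_points, weights):
    """Weighted average of per-OBI tangent points (ra, dec) in degrees.

    A plain weighted mean is adequate here only because the OBIs in a stack
    are assumed to point within about an arcminute of each other.
    """
    total = sum(weights[obi] for obi in tangent_points)
    ra = sum(tangent_points[obi][0] * weights[obi] for obi in tangent_points) / total
    dec = sum(tangent_points[obi][1] * weights[obi] for obi in tangent_points) / total
    return ra, dec


# Hypothetical bright-source lists for a two-OBI stack.
sources_by_obi = {
    "obi_1": [(10.0, 20.0), (30.0, 5.0), (50.0, 50.0)],
    "obi_2": [(9.5, 20.25), (29.5, 5.25)],
}
ref = pick_reference_obi(sources_by_obi)
pairs = match_bright_sources(sources_by_obi[ref], sources_by_obi["obi_2"], max_dist=2.0)
offset = mean_offset(pairs)
tangent = common_tangent_point(
    {"obi_1": (150.0, 2.0), "obi_2": (150.0, 2.5)},
    {"obi_1": 1.0, "obi_2": 3.0},
)
```

A production implementation would more likely fit the offsets with per-source positional uncertainties and may include rotation or scale terms; the translation-only mean is used here purely to keep the example short.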

Calibrate

• Calibrate each OBI using the same CALDB and software
• Transform multi-OBI stacks to the same tangent plane
• Make per-OBI products: images, backgrounds, etc.

Combine and Detect

• Create stacked products by combining per-OBI products
• Detect sources on the stacked event files using 2 methods: wavdetect and Voronoi Tessellation

Validate Sources

• Choose the best detection between the 2 detection algorithms
• Organize detections into bundles of overlapping sources

QA: Reviewer can manipulate detections in difficult fields

Maximum Likelihood Estimator (MLE)

1st Pass
• Use MLE for initial fit, correcting for detect outliers
• Re-center source at fitted position

2nd Pass
• Refine position using MLE
• Compute likelihood
• Calculate errors using Markov Chain Monte Carlo

Stack Final Detection List

• Create the mrgsrc product, a combination of all processing steps
• Qualify detections using likelihoods into True, Marginal, or False

Rebundle and Rerun

QA: Adjust the fit for sources with poor fits
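The final qualification step, sorting detections into True, Marginal, or False by likelihood, amounts to a two-threshold classification. The sketch below assumes a single scalar likelihood per detection and two caller-supplied thresholds; the function name and the threshold values in the example are hypothetical, since the poster does not state the values used.

```python
def classify_detection(likelihood, true_threshold, marginal_threshold):
    """Label a detection as TRUE, MARGINAL, or FALSE from its likelihood.

    Both thresholds are placeholders supplied by the caller; they are not
    the catalog's actual values.
    """
    if marginal_threshold > true_threshold:
        raise ValueError("marginal_threshold must not exceed true_threshold")
    if likelihood >= true_threshold:
        return "TRUE"
    if likelihood >= marginal_threshold:
        return "MARGINAL"
    return "FALSE"


# Hypothetical likelihoods for three detections in one stack.
labels = [classify_detection(value, true_threshold=10.0, marginal_threshold=5.0)
          for value in (12.0, 7.5, 1.0)]
```

Only the TRUE and MARGINAL detections would be carried forward, matching the statement that true and marginal detections are compared during master matching.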

Find Master Sources

• Identify detections with significant overlap as the same source
• Ignore overlapping sources from the same stack
• Sources that overlap multiple sources have an ambiguous association

Find Missed Detections

• Find areas in valid parts of the observation where no detection was found
• "Nodetects" are used as upper limits in photometry
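The master-source matching rules above can be sketched with detections modeled as circles. Everything specific here is an assumption made for illustration: the circular regions, the overlap criterion (centers closer than the smaller radius), and the rule that a detection is ambiguous when it overlaps more than one detection from the same other stack.

```python
import math
from collections import defaultdict


def significant_overlap(a, b):
    """Assumed criterion: centers closer than the smaller of the two radii."""
    return math.dist((a["x"], a["y"]), (b["x"], b["y"])) <= min(a["r"], b["r"])


def find_master_sources(detections):
    """Link overlapping detections from different stacks into master sources.

    detections: list of dicts with keys id, stack, x, y, r (all hypothetical).
    Returns (masters, ambiguous): a list of sorted id lists, and the set of
    detection ids that overlap several detections from one other stack.
    """
    links = defaultdict(set)
    for i, a in enumerate(detections):
        for b in detections[i + 1:]:
            if a["stack"] == b["stack"]:
                continue  # overlaps within the same stack are ignored
            if significant_overlap(a, b):
                links[a["id"]].add(b["id"])
                links[b["id"]].add(a["id"])

    stack_of = {d["id"]: d["stack"] for d in detections}
    ambiguous = set()
    for det_id, partners in links.items():
        partner_stacks = [stack_of[p] for p in partners]
        if len(partner_stacks) != len(set(partner_stacks)):
            ambiguous.add(det_id)

    # Connected components of the link graph become master sources.
    masters, seen = [], set()
    for d in detections:
        if d["id"] in seen:
            continue
        group, todo = [], [d["id"]]
        seen.add(d["id"])
        while todo:
            current = todo.pop()
            group.append(current)
            for nxt in links.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        masters.append(sorted(group))
    return masters, ambiguous


# Hypothetical detections: one in stack 1 overlapping two in stack 2,
# plus an isolated detection in stack 3.
detections = [
    {"id": "s1_a", "stack": 1, "x": 0.0, "y": 0.0, "r": 2.0},
    {"id": "s2_a", "stack": 2, "x": 0.5, "y": 0.0, "r": 2.0},
    {"id": "s2_b", "stack": 2, "x": -0.5, "y": 0.0, "r": 2.0},
    {"id": "s3_a", "stack": 3, "x": 100.0, "y": 100.0, "r": 2.0},
]
masters, ambiguous = find_master_sources(detections)
```

In this sketch ambiguous cases are only flagged, on the assumption that they would then go to the manual QA step rather than being resolved automatically.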

Source Properties

Once all the sources have been identified, each source is characterized at the OBI, stack, and master levels. The catalog identifies on the order of 350 properties for each source. When expanded out for each band and when including errors, there are on the order of 1600 properties computed.

3 Stack Ensemble

QA: Cases where too many overlaps prevent clear identification of the actual sources

Combine to 1 master

QA: Cases where no reliable match could be found

Compute "nodetect"

Per-OBI Properties

• Compute properties for each OBI in which the detected source is valid.

Stack Properties

• Compute property averages across all OBIs in the detect stack

Bayesian Blocks

• For all detections associated with the source, create groups of similar OBIs based on flux

Master Properties

• For each Bayesian group, compute property averages across all OBIs in the group
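The idea of grouping similar OBIs by flux and then averaging within each group can be illustrated with a deliberately simplified stand-in. The code below is not the Bayesian Blocks algorithm: it uses a greedy rule (an OBI joins the current group if its flux agrees with the group mean within `n_sigma` combined errors) and an inverse-variance weighted mean, both chosen here only for illustration.

```python
import math


def weighted_mean(group):
    """Inverse-variance weighted mean flux and its error for one group.

    group: list of (obi_id, flux, flux_err) tuples.
    """
    weights = [1.0 / err ** 2 for _, _, err in group]
    total = sum(weights)
    mean = sum(w * flux for w, (_, flux, _) in zip(weights, group)) / total
    return mean, math.sqrt(1.0 / total)


def group_by_flux(obis, n_sigma=3.0):
    """Greedy stand-in for flux grouping (NOT the Bayesian Blocks algorithm).

    OBIs are visited in order of increasing flux; each one joins the current
    group if it is consistent with that group's mean, else it starts a new one.
    """
    groups = []
    for obi in sorted(obis, key=lambda o: o[1]):
        if groups:
            mean, err = weighted_mean(groups[-1])
            if abs(obi[1] - mean) <= n_sigma * math.hypot(err, obi[2]):
                groups[-1].append(obi)
                continue
        groups.append([obi])
    return groups


# Hypothetical per-OBI fluxes (arbitrary units) for one master source.
obis = [("o1", 1.0, 0.1), ("o2", 1.1, 0.1), ("o3", 5.0, 0.1)]
groups = group_by_flux(obis)
group_ids = [[obi_id for obi_id, _, _ in g] for g in groups]
group_means = [weighted_mean(g)[0] for g in groups]
```

For real analysis, an established Bayesian Blocks implementation would replace `group_by_flux`; the per-group averaging step would stay conceptually the same.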

Major Properties