NAASC data processing capabilities (including reprocessing scope)
Mark Lacy
Data Services Lead, NAASC, NRAO
ANASAC 13-14 Sept 2010
NAASC Data Services
• Data services group formed within the NAASC (other groups are User Support Services [Brogan] and JAO support [Hibbard]).
• Goal: to provide processed ALMA data and the tools to analyze it to NA users.
• Responsibilities:
– NA ALMA archive and user portal, including VO and interaction with VAO LLC
– Splatalogue
– Simdata
– Pipeline (implementation)
– “Advanced tools” (e.g. data cube visualization and marginalization).
Overview of ES processing
• JAO plans to process all Early Science (ES) data in order to perform Quality Assurance (QA2).
• Processing at SCO will be performed using desktop machines.
• Tests indicate that these will be able to deal with ES data rates (expected to be ~20 TB/year, ~1/10th of Full Science, but with ramp-up near the end).
• NAASC will provide reprocessing capabilities for NA users.
– Already getting experience with CSV data processing through the NAASC SV “Tiger team”.
Details of NAASC processing plans in ES
• We have recently written a computing plan for the NAASC covering ES through operations.
– A small cluster (~4-12 nodes), forerunner of the NAASC pipeline machine, will be built up slowly, based on EVLA experience.
– In addition, we will purchase desktop machines for visitor use and evaluation, with the aim of producing recommendations to users for offline processing.
– We will thus have the ability to perform reprocessing of all NA data.
NAASC cluster - ES
NAASC cluster - operations
How users will reprocess
• Option 1: Come to the NAASC and use the cluster through a login on an NRAO desktop machine (or the desktop directly for small datasets).
• Option 2: Use VNC from their home institution to log in to the cluster.
• Option 3: Submit a pipeline job remotely to the NAASC via a webpage.
Which option we use will depend on the level of support and interaction with the data that is required. Likely to begin with option 1 and move to option 3 as algorithms for, e.g., automatic flagging improve, with option 2 as a backup.
(Also likely to have ASDM to MS conversion implemented for users getting their data from the archive.)
Getting the data to NA
• Baseline plan is disk shipment for bulk data, but we will attempt to take advantage of improved links to Chile required by NOAO for DES and LSST.
• Have AUI/AURA agreement to share fast data link Chile to Florida Intl University (10 Gb/s).
• Thereafter data travels via Internet2 to Charlottesville/UVa.
• Should be adequate to move both bulk data and metadata without requiring shipping of disks.
• Archive replication tests to begin next year.
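As a rough check on whether the link is adequate, the arithmetic can be sketched as below. The 20 TB/year ES volume, the factor of ten for Full Science, and the 10 Gb/s link come from these slides; the 10% usable share of the link is a purely hypothetical figure, chosen only because the link is shared with other users.

```python
def transfer_time_hours(volume_tb, link_gbps, efficiency=1.0):
    """Hours needed to move volume_tb (decimal terabytes) over a link_gbps link.

    efficiency is the fraction of the nominal link rate actually available.
    """
    bits = volume_tb * 1e12 * 8                # terabytes -> bits
    rate = link_gbps * 1e9 * efficiency        # usable bits per second
    return bits / rate / 3600.0


ES_TB_PER_YEAR = 20.0                          # from the ES overview slide
FULL_TB_PER_YEAR = 10 * ES_TB_PER_YEAR         # ES is ~1/10th of Full Science
LINK_GBPS = 10.0                               # Chile to Florida link

for label, volume in (("Early Science", ES_TB_PER_YEAR),
                      ("Full Science", FULL_TB_PER_YEAR)):
    for eff in (1.0, 0.1):                     # 0.1 is a hypothetical shared-link share
        hours = transfer_time_hours(volume, LINK_GBPS, eff)
        print(f"{label}: {volume:.0f} TB at {eff:.0%} of link = {hours:.1f} h")
```

Under these assumptions a full year of ES data moves in a few hours at the nominal rate, and a year of Full Science data in under three weeks even at a 10% share.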
NAASC and related software systems
• Splatalogue
– Currently concentrating on documentation and database enhancement.
– Future plans include improvements to usability (new front end).
– Plan to make Splatalogue an “official” ALMA software project, working on a Splatalogue memo to ALMA describing the database and the plan for management and maintenance.
• Simdata (task in CASA)
– Simdata now largely complete, including single dish capability (in collaboration with NAOJ).
– CASA code freeze Sept 17th prior to October release.
– Working on new ES examples.
Simdata will allow us to demonstrate the limitations of the ES array both in terms of sensitivity and dynamic range/uv-coverage.
Example: ALMA Band 6 deep pointing
9x8hr 234 GHz ALMA track in continuum. Simulated using Oxford S-cubed simulations (Obreschkow & Rawlings 2009) for the model and simdata2 in CASA for the “observation”.
Panels: Model; Early Science (16 ants); Full Science (50 ants)
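The gap between the 16-antenna and 50-antenna panels can be estimated with the standard scaling for an interferometer, where point-source noise goes as 1/sqrt(N(N-1)) and the number of instantaneous baselines is N(N-1)/2. The sketch below is not part of simdata; it assumes identical antennas, system temperature, bandwidth, and integration time for both arrays.

```python
import math


def n_baselines(n_ant):
    """Number of antenna pairs (instantaneous uv samples) in an n_ant array."""
    return n_ant * (n_ant - 1) // 2


def relative_noise(n_ant, n_ref):
    """Point-source noise of an n_ant array relative to an n_ref array.

    Assumes identical antennas, bandwidth and integration time, so that
    noise scales as 1 / sqrt(N * (N - 1)).
    """
    return math.sqrt(n_ref * (n_ref - 1) / (n_ant * (n_ant - 1)))


ES_ANTS, FULL_ANTS = 16, 50                    # antenna counts from the example panels

noise_ratio = relative_noise(ES_ANTS, FULL_ANTS)
print("ES baselines:  ", n_baselines(ES_ANTS))
print("Full baselines:", n_baselines(FULL_ANTS))
print(f"ES noise is {noise_ratio:.2f}x the Full Science noise")
print(f"ES needs {noise_ratio ** 2:.1f}x the time for equal noise")
```

With these assumptions the ES array has 120 baselines against 1225 for Full Science, about 3.2 times the noise in the same integration time, and needs roughly ten times longer on source to match it.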
CASA/pipeline performance
• CASA currently has similar speed to other packages for ~10 GB datasets except for a few high nails being aggressively pursued (flagging, plotting)
• CASA’s architecture has been written with parallelization in mind
• Channelization of radio data makes the problem “embarrassingly parallelizable”
• However, particularly for imaging, the problem is I/O and not CPU limited, making the problem trickier (~60:40 I/O:CPU).
• Pursuing mitigation through hardware solutions (fast file systems e.g. Lustre with Infiniband interconnect), and software solutions (improving I/O efficiency in code).
• Nevertheless, parallelization efforts of highest current risk and priority
• Release of multi-core CASA functionality will be staged so that functionality becomes available for pipeline testing and the community as soon as possible
• Simple imaging (single field or simple mosaic cube) well progressed, expected for October 2010 release
• Multi-core flagging and more imaging cases (multi-frequency synthesis continuum) expected June 2011
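Two points from this slide can be illustrated with a small sketch: splitting channels into independent chunks, and why a ~60:40 I/O:CPU split makes naive parallelization disappointing. This is not CASA code. The function names are hypothetical, and the speedup bound takes the pessimistic assumption that the I/O share does not speed up at all as workers are added, which is the situation the hardware and software mitigations above aim to avoid.

```python
def split_channels(n_chan, n_workers):
    """Contiguous (start, stop) channel ranges, one per worker.

    Chunk sizes differ by at most one channel; empty chunks are dropped.
    """
    base, extra = divmod(n_chan, n_workers)
    ranges = []
    start = 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)
        if size:
            ranges.append((start, start + size))
        start += size
    return ranges


def speedup_if_io_serial(io_fraction, n_workers):
    """Amdahl-style bound: I/O time is fixed, CPU time divides across workers."""
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction + cpu_fraction / n_workers)


print(split_channels(10, 4))                   # each range can be imaged independently

IO_FRACTION = 0.6                              # ~60:40 I/O:CPU from the slide
for workers in (2, 8, 64):
    print(workers, "workers:", round(speedup_if_io_serial(IO_FRACTION, workers), 2))
```

Under that assumption the speedup can never exceed 1/0.6, about 1.67, however many cores are added, which is why the I/O path matters as much as the multi-core code.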
CASA development priorities
• Support of ALMA and EVLA commissioning needs
• Parallelization and cluster fine-tuning for imaging and flagging
• Working on combining Torque resource manager with Python scripting in CASA
• Improvements needed for polarization calibration of linear feeds
• Improvements to calibration table plotting (incorporate into plotms)
• Planet models for use as resolved calibrators
• Splatalogue search capabilities (including offline database) and overplotting
• Viewer improvements (especially for spectral line plotting and analysis)
• Improvements to image analysis tasks
• Improvements to “TV” based flagging in the Viewer (on-the-fly spectral and time averaging)
• A CARMA miriad filler (through partnership with Peter Teuben at U. Maryland)
• Expanded and more modularized simulation capabilities.
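One way the Torque and Python combination mentioned above might look is a small helper that writes a Torque job script wrapping a CASA Python script. This is only an illustrative sketch, not the NAASC implementation: the helper name, job name, script name, and resource values are hypothetical, and the `casapy --nologger --nogui -c` invocation is assumed to be available on the cluster nodes.

```python
import shlex


def make_torque_job(casa_script, job_name="casa_job", cores=8, walltime="04:00:00"):
    """Return the text of a Torque (PBS) job script that runs one CASA script.

    All defaults are hypothetical; adjust them to the local cluster setup.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                  # job name shown by qstat
        f"#PBS -l nodes=1:ppn={cores}",         # one node, `cores` processors
        f"#PBS -l walltime={walltime}",         # wall-clock limit
        "cd $PBS_O_WORKDIR",                    # start in the submission directory
        # Assumed CASA invocation: run a script without the GUI or logger window.
        f"casapy --nologger --nogui -c {shlex.quote(casa_script)}",
    ]
    return "\n".join(lines) + "\n"


job_text = make_torque_job("image_cube.py", job_name="es_cube")
print(job_text)
```

The resulting text could be saved to a file and passed to `qsub`; submission is left out so that the sketch runs without a Torque installation.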
NAASC advanced tools
• The NAASC staff will push some of the ALMA-related software development items as Splatalogue & Simdata reach completion. We will also be hiring an additional developer.
– For example, image cube visualization and analysis are areas which will likely require work.
– Can’t do this all ourselves, so will aim to be responsive to community suggestions and contributions, incorporating some into CASA and posting others as “contributed software”.
Summary
• Within 1 year expect significant data from ALMA, comparable data rate to that from e.g. HST, Spitzer.
• Within 3 years, data rate will exceed by more than an order of magnitude that from any other PI-driven telescope apart from the EVLA.
• Must continue to be focused on the challenges and opportunities this presents.
Backup slides
CASA tutorial examples
3C391 polarization (EVLA)
M99 moment maps (CARMA)