The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Arna Karick, eResearch Consultant/Data Analyst/Astro, Swinburne Research


Post on 04-Dec-2014


DESCRIPTION

Presented at the Astroinformatics 2013: Knowledge from Data conference - December 11, 2013

TRANSCRIPT

Page 1: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Arna Karick

eResearch Consultant/Data Analyst/Astro

Swinburne Research

Swinburne Pulsar Portal

Real-time Supercomputing Processing of Big Data

Page 2: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

This project is an extension of the Swinburne University of Technology Metadata Stores Project and is partly supported by the Australian National Data Service (ANDS).

ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.

Page 3: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

The ERA of All-Sky Science @ radio, optical and infrared wavelengths...

ASKAP: WALLABY all-sky HI survey - 620 gigavoxels/2.5 TB data cubes

MWA: All-sky radio survey, ~ 6 PB/yr archived data

WISE: All-sky IR survey – recent AllWISE data release (Nov 2013). Source catalog: 747 million objects

VST ATLAS: 4500 square deg. of the Southern sky (U, V, R, I, Z)

VPHAS+: VST H-alpha Survey of the Southern Galactic Plane

+ Molonglo Observatory Synthesis Telescope (MOST)

WISE IR Survey

Page 4: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Value of Data Access & Analysis Tools for researchers and citizen scientists...

Hubble Legacy Archive

HST & SDSS obviously...

• Dealing with the data deluge

• Efficient research (less reinventing the wheel)

• Greater Exposure

• New Collaborations

• Publications & increased citations

• New Discoveries

Page 5: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal - The Who?

Matthew Bailes: Research Astronomer & Pro-Vice Chancellor (Research)

Andrew Jameson: Software Developer & Systems Engineer

Chris Flynn: Research Astronomer (Molonglo Telescope)

Willem van Straten: Research Astronomer (Software & Instrumentation)

Ewan Barr: Pulsar Postdoc (HTRU reprocessing & Molonglo)

Arna Karick: eResearch (Research Data Management & Policy) & Astronomer (optical: galaxy clusters & ETGs)

Page 6: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal - The What?

An online tool facilitating remote access to, and processing of, CSIRO Parkes pulsar data

Survey snapshot

Page 7: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

High Time Resolution Universe - HTRU (P630)

• Paper I - Keith et al. (2010) + discovery papers

• Collaborative research (Swinburne, Manchester, ATNF, Cagliari)

• Low-lat survey: thin strip of the Galactic Plane (deep) -> faint pulsars

• Med-lat survey: bright MSPs for timing array projects

• High-lat survey: snapshot of the transient sky (Sth +10)

• Rotating radio transients, short-duration radio bursts

• Running for ~5 years, over 100 new pulsars, including 26 millisecond pulsars

• Survey has produced over 600 TB of raw data (total ~875 TB)

• Data archived to tape & streamed to Swinburne via 1 Gb/s link - cont. observing

Pulsar Timing Array projects (P140)

• Detection of gravitational waves

High Time Resolution Universe North (HTRU - North)

• Effelsberg Radio Telescope

Molonglo Observatory Synthesis Telescope (MOST): ??

possibly... in consultation with research groups
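As a rough sense of scale for the HTRU data volumes and the 1 Gb/s link quoted on this slide, the arithmetic can be sketched as below. This is only a back-of-envelope estimate: it assumes the quoted volumes are decimal terabytes (1 TB = 1e12 bytes), that the link is fully utilised, and that there is no protocol overhead. None of these assumptions is stated in the talk, and the function name is illustrative.

```python
# Back-of-envelope transfer times for the survey volumes quoted on the slide.
# Assumptions (not from the talk): decimal terabytes, a saturated 1 Gb/s link,
# and zero protocol overhead.

LINK_BITS_PER_SECOND = 1e9  # 1 Gb/s
SECONDS_PER_DAY = 86400


def transfer_days(volume_tb: float, link_bps: float = LINK_BITS_PER_SECOND) -> float:
    """Days needed to move `volume_tb` terabytes over a link of `link_bps` bits/s."""
    bits = volume_tb * 1e12 * 8
    return bits / link_bps / SECONDS_PER_DAY


for label, volume_tb in [("raw data", 600), ("total", 875)]:
    print(f"{label}: {volume_tb} TB -> {transfer_days(volume_tb):.1f} days")
```

Under these assumptions the raw data corresponds to roughly 56 days of continuous transfer, which is consistent with streaming the data as it is observed over a multi-year survey rather than moving it in bulk afterwards.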

Page 8: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal - The How?

• User friendly web interface

• Sophisticated analysis tools backed by significant processing power

• MySQL database with a PHP frontend

• Accesses a PB-scale database

• XML headers for instrumental and astrophysical metadata - format independent & editable - facilitates easy indexing

• Uses the supercomputer (gSTAR) batch queue system - email alerts - currently has ‘timeout’ in place

• Modular - datasets and analysis tools can be added over time

• Attempt to write ‘non-expert’ analysis tools
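To illustrate how XML headers can make metadata easy to index, here is a minimal sketch. It is not the portal's code: the element names and values in the header are invented for illustration, and Python's built-in `sqlite3` stands in for the MySQL database so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical header: element names and values are made up for illustration
# and are not the portal's actual schema.
HEADER_XML = """
<observation>
  <instrumental>
    <telescope>Parkes</telescope>
    <beam>7</beam>
  </instrumental>
  <astrophysical>
    <source>J0000-0000</source>
    <survey>example-survey</survey>
  </astrophysical>
</observation>
"""


def flatten_header(xml_text: str) -> dict:
    """Flatten <section><key>value</key></section> into {"section.key": "value"}."""
    root = ET.fromstring(xml_text)
    return {
        f"{section.tag}.{field.tag}": (field.text or "").strip()
        for section in root
        for field in section
    }


def index_header(conn: sqlite3.Connection, obs_id: str, fields: dict) -> None:
    """Store every header field as one (obs_id, key, value) row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS header_index (obs_id TEXT, key TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO header_index VALUES (?, ?, ?)",
        [(obs_id, key, value) for key, value in fields.items()],
    )
    conn.commit()


def find_observations(conn: sqlite3.Connection, key: str, value: str) -> list:
    """Return the ids of observations whose header has `key` equal to `value`."""
    rows = conn.execute(
        "SELECT obs_id FROM header_index WHERE key = ? AND value = ?", (key, value)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
index_header(conn, "obs-001", flatten_header(HEADER_XML))
print(find_observations(conn, "instrumental.telescope", "Parkes"))
```

Storing each field as a key/value row means a new header field needs no schema change, which is one way to read the "format independent & editable" point above.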

Page 9: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal - The Why?

• Sharing of collaborative datasets - secure / proprietary periods

• Target: project collaborators, registered astronomers.. public?

• Enables users to query AND analyse data and download data products (metadata available via CSIRO's Data Access Facility)

• Alleviates TB-PB storage issues, and the guesswork associated with setup & maintenance of software & hardware infrastructure

• Access pulsar observations, search object catalogues, process time-series data with sophisticated analysis software

• Test & validate analysis techniques (for Parkes data, Molonglo & SKA)

• Improved multi-processing (e.g. orbital solutions for high-eccentricity binaries)

• Science-ready results & greater discovery potential

Page 10: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal – Data Processing

Page 11: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal – Data Tools

standard routines + novel techniques

candidate sorting

folding & optimisation

pulse periods

plotting software

editable? user software? e.g. Geophysics VO on NeCTAR
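For readers unfamiliar with the terms above, folding and candidate sorting can be sketched in a few lines. This is a generic epoch-folding illustration, not the portal's pipeline: the peak-minus-mean score is a deliberately crude stand-in for a real significance measure, and the signal is synthetic.

```python
def fold(samples, tsamp, period, nbins=16):
    """Average samples (spaced by `tsamp` seconds) into `nbins` bins of pulse phase."""
    sums = [0.0] * nbins
    counts = [0] * nbins
    for i, value in enumerate(samples):
        phase = (i * tsamp / period) % 1.0
        phase_bin = int(phase * nbins) % nbins
        sums[phase_bin] += value
        counts[phase_bin] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def profile_contrast(profile):
    """Peak minus mean of a folded profile: a crude figure of merit."""
    return max(profile) - sum(profile) / len(profile)


def rank_candidates(samples, tsamp, trial_periods, nbins=16):
    """Fold at every trial period and sort the candidates, best score first."""
    scored = [
        (period, profile_contrast(fold(samples, tsamp, period, nbins)))
        for period in trial_periods
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Synthetic, noise-free test signal: a pulse lasting 5 samples every 100 samples,
# i.e. a 0.1 s period at 1 ms sampling. Real data would be far noisier.
TSAMP = 0.001
samples = [1.0 if i % 100 < 5 else 0.0 for i in range(20000)]

ranked = rank_candidates(samples, TSAMP, [0.09, 0.095, 0.1, 0.105, 0.11])
for period, score in ranked:
    print(f"P = {period:.3f} s  contrast = {score:.3f}")
```

Folding at the correct period stacks every pulse into the same phase bins, whereas a wrong trial period smears the pulses across the profile, so the true period ranks first.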

Page 12: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal – Data Products

failures ~2%, e.g. beams with interference/timeouts
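The idea of a per-job timeout and a resulting failure fraction can be sketched as follows. This is a local stand-in only: on gSTAR the limit would be enforced by the batch queue, whose configuration the slides do not describe, and the three jobs here are trivial placeholder commands rather than real processing tasks.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return "ok", "failed" (non-zero exit status) or "timeout"."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def failure_rate(outcomes):
    """Fraction of jobs that did not finish with status "ok"."""
    if not outcomes:
        return 0.0
    return sum(outcome != "ok" for outcome in outcomes) / len(outcomes)


# Placeholder jobs: one succeeds, one exits with an error, one runs too long.
jobs = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "import time; time.sleep(30)"],
]
outcomes = [run_with_timeout(job, timeout_s=3.0) for job in jobs]
print(outcomes, f"failure rate = {failure_rate(outcomes):.2f}")
```

Recording the reason for each failure separately from the count makes it possible to tell timeouts apart from jobs that failed for other reasons, such as interference in a beam.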

Page 13: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal – Data Products

Page 14: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal – Data Products

Page 15: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Swinburne Pulsar Portal – Data Products

Page 16: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Coming Soon...

Mid-2014

Page 17: The Swinburne Pulsar Portal: Real-time Supercomputing Processing of Big Data

Other Projects

• MyTardis@Swinburne data solutions - Brain Imaging: MEG & EEG - Microscopy/Eng: Raman spectrometer & confocal microscope

• Research Data Management & Policy - Institutional research storage, CloudStor+ (file sharing & storage) - Research Conduct policy, analytics & strategy

• Swinburne (ANDS) Metadata Store Project - Research data collections for Research Data Australia - Copyright, software licensing, DOIs

• Astronomy Research - HST/ACS Coma Cluster Treasury Survey - HST imaging of the Atlas3D galaxy sample