Desperately Trying to Cope with the Data Explosion in Astronomical Sciences
Ray Norris
CSIRO Australia Telescope National Facility
Overview
• Background: astronomical data
• Good news
• Bad news
• Data Manifesto
Astronomical Data
Q: How did the first galaxies in the Universe form?
Need many wavelengths:
Source “c” at 3 cm wavelength
[Figure: the mysterious “source c”, WFPC2 image; scale 2 arcsec]
The hard questions:
• Give me the WFPC image to normalise my spectral line cube
– Obviously best to do computation locally
• Give me every source in NED with J-K>4
– Obviously best to do computation at host
• Give me the radio spectral indices (using ATCA data) of all the objects in SLOAN which have J-K>4 in available ESO/STScI databases
– Some computations local, some on hosts
– VO needs to make sensible decisions
– VO needs grid computing standards
[Figure: data locations. Terabyte database in Baltimore; local megabyte dataset; NASA Extragalactic Database in Pasadena; terabyte database in Sydney; terabyte database in New Mexico; multi-terabyte databases in Europe & US]
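The "compute locally or at the host?" decision in the hard questions above can be sketched as a simple byte-counting rule: run the computation at whichever site already holds most of the bytes it has to read, so that as little data as possible crosses the network. This is only an illustration, not how any VO tool actually decides; the `Dataset` class, the function names, and all sizes below are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Dataset:
    name: str        # hypothetical label for an input
    site: str        # where the data currently lives
    size_bytes: int  # bytes the computation must actually read


def choose_compute_site(inputs: List[Dataset]) -> str:
    """Pick the site that already holds the most input bytes."""
    bytes_by_site: Dict[str, int] = {}
    for d in inputs:
        bytes_by_site[d.site] = bytes_by_site.get(d.site, 0) + d.size_bytes
    return max(bytes_by_site, key=lambda site: bytes_by_site[site])


def bytes_moved(inputs: List[Dataset], site: str) -> int:
    """Bytes that must be shipped if the computation runs at `site`."""
    return sum(d.size_bytes for d in inputs if d.site != site)


# Question 1: one archive image is needed to normalise a local cube.
# Sizes are illustrative guesses, not measurements.
q1 = [
    Dataset("spectral line cube", "local", 50 * 10**6),
    Dataset("one WFPC image", "Baltimore", 10 * 10**6),
]

# Question 2: a colour cut has to scan a whole remote catalogue.
q2 = [
    Dataset("query text", "local", 10**3),
    Dataset("catalogue scan", "Pasadena", 10**12),
]

for question in (q1, q2):
    site = choose_compute_site(question)
    print(site, bytes_moved(question, site))
```

The third question mixes both cases, which is why the VO needs to make this kind of decision per step rather than per query.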
Good News
• The Virtual Observatory
• Astronomical Data Centres
• Public-domain data
The Virtual Observatory (VO)
• The FITS standard (~1980) paved the way in interoperability
• International Virtual Observatory Alliance involves all major astronomical observatories worldwide – IVOA established 2002
• VO is a collection of interoperating data archives and software tools which are linked to form a research environment in which astronomical research programs can be conducted.
• It includes terabyte distributed databases, data dictionaries, standards, protocols, tools, algorithms, web services, etc.
Examples of VO operations
Give me a list of all the objects which satisfy:
– Criterion A in the CDS database (in Strasbourg, France),
– Criterion B in the Parkes HIPASS survey (in Australia)
– Criterion C in the Hubble archive (in Baltimore, USA)
P.S.
– Each of these databases has a different format, coordinate system, and ontology, and each is several Tbyte in size.
– Metadata is of variable quality
– The object names will be different in each database.
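Because the object names differ between archives, one plausible way to combine the per-archive lists is to pair objects by sky position rather than by name. The sketch below assumes every position has already been converted to a single coordinate system (RA/Dec in degrees); the catalogue entries, names, and the 2 arcsec match radius are invented for illustration and do not come from any VO service.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """For each object in cat_a, find the nearest cat_b object within the radius.

    Each catalogue is a list of (name, ra_deg, dec_deg) tuples.
    Brute force: fine for short lists, far too slow for terabyte archives.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0]))
    return matches


# Made-up objects satisfying "criterion A" and "criterion B" in two archives.
archive_a = [("A-001", 10.0, -30.0), ("A-002", 150.0, 2.0)]
archive_b = [("B-xyz", 10.0002, -30.0001), ("B-abc", 200.0, 45.0)]

print(cross_match(archive_a, archive_b))
```

A third archive would be handled by matching its list against the result in the same way.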
VO Status
• VO is not a project-managed project – it is a collaboration of different groups, with different drivers, but united by a common goal.
• Several groups worldwide are now defining standards, tools, protocols, etc.
• Some prototype tools and web services already available (e.g. http://www.aus-vo.org/services.html)
• More will become available over the next 1-2 years
• See http://www.ivoa.net/
Astronomical Data Centres
• Centre de Données astronomiques de Strasbourg, France (CDS)
– attempts to hold electronic copies of all published astronomical data, surveys, etc.
• NASA Astronomical Data Centre (ADC), Baltimore, USA
• NASA Extragalactic Database (NED)
– Interprets and combines extragalactic data
• Astrophysics Data System (ADS)
– All published astronomical literature
• Others
Good News
• Public-domain data
Security, confidentiality, and IP protection are not major issues in astronomy – most data are in the public domain – hence VO is interesting to Microsoft etc.
Bad News
• Intellectual Property controls
• Journal data
• Bad planning of new instruments
• Digital Divide
• Legacy data
• Lack of awareness
• "Why should I share my data with my competitors?"
Intellectual Property Protection
• Patents
– protect inventions
• Copyright
– protects written work and creative work
• Proposed database protection
– protects information (about anything)
– No “fair use” provisions
– You cannot cite someone else’s data without obtaining their permission
– Each paper will need a paper trail showing rights to cite data
[Diagram of the organisations involved: ICSU (International Council of Science); IAU, IUGG, etc.; CODATA (Committee on Data for Science and Technology); United Nations; WIPO (World Intellectual Property Organisation); National Representatives]
Journal Data
• Most data published in journals never make it to the data centres
• When they do appear in data centres, they rarely carry the metadata or ontology that enable machine-understanding
• Journals need to impose standards (e.g. VOTable) on authors
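To make the VOTable suggestion concrete, here is a rough sketch of turning a small published table into VOTable XML using only Python's standard library. The element layout (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) follows the VOTable format; the function name, table name, column choices, and the two data rows are assumptions invented for the example, and a real submission would follow whatever a journal and data centre agree on.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def table_to_votable(fields, rows, table_name="published_table"):
    """Serialise column descriptions and rows as a minimal VOTable string.

    fields: list of dicts of FIELD attributes (name, datatype, unit, ucd, ...)
    rows:   list of row tuples, one value per field
    """
    root = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOTABLE_NS})
    resource = ET.SubElement(root, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", {"name": table_name})
    # Column metadata: this is what enables machine understanding.
    for field in fields:
        ET.SubElement(table, "FIELD", field)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical columns and made-up rows, for illustration only.
fields = [
    {"name": "source", "datatype": "char", "arraysize": "*", "ucd": "meta.id;meta.main"},
    {"name": "ra", "datatype": "double", "unit": "deg", "ucd": "pos.eq.ra;meta.main"},
    {"name": "dec", "datatype": "double", "unit": "deg", "ucd": "pos.eq.dec;meta.main"},
]
rows = [("example-1", 10.0, -30.0), ("example-2", 150.0, 2.0)]

votable_xml = table_to_votable(fields, rows)
print(votable_xml)
```

The point of the FIELD attributes (units and UCDs) is that a data centre can ingest the table without a human re-reading the paper.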
Bad News
• Bad planning of new instruments
Many new instruments are planned without sufficient planning or funding for data management (decreasing scientific productivity)
Bad News
• Digital Divide
We take for granted instant access to literature and databases. Our colleagues in developing countries still dream of it (thus disadvantaging them even further)
Bad News
• Legacy data
Digitising old data competes for funding with new instruments
Bad News
• Lack of awareness
BORING!
Bad News
• "Why should I share my data with my competitors?"
The Data Manifesto
http://www.ivoa.net/twiki/bin/view/Astrodata/AstronomersManifesto
We, the global community of astronomy, aspire to the following guidelines for managing astronomical data, believing that this would maximise the rate and cost-effectiveness of scientific discovery…
1. All major tables, images, and spectra published in journals should appear in the astronomical data centres.
• Journals should, in collaboration with data centres, define formats, table descriptions, and metadata that are easy for authors to adhere to, and can automatically be translated into a format (e.g. VOTable, FITS, etc) that can be entered by the data centre into their database.
2. All data obtained with publicly-funded observatories should, after appropriate proprietary periods, be placed in the public domain.
• Consistent with ICSU and OECD recommendations
• …to which Australia is a signatory
3. In any new major astronomical construction project, the data processing, storage, migration, and management requirements should be built in at an early stage of the project plan, and costed along with other parts of the project.
• Isn’t this obvious?
– apparently not!
4. Astronomers in all countries should have the same access to astronomical data and information.
5. Legacy astronomical data can be valuable, and high-priority legacy data should be preserved and stored in digital form in the data centres.
• How do you prioritise?
6. The IAU should work with other international organisations to achieve our common goals and learn from our colleagues in other fields.
• Use bodies such as CODATA to cross-fertilise
But the major challenge to coping with the data explosion remains…
Why can’t someone else do it?