Page 1: Kitchen Sinks, Plumbing and Virtual Observatories Peter Fox pfox@cs.rpi.edu June 4, 2010 – CSIRO Aspendale

Kitchen Sinks, Plumbing and Virtual Observatories

Peter Fox, pfox@cs.rpi.edu

June 4, 2010 – CSIRO Aspendale


Tetherless World Constellation 2

Introduction

• Systems compared to frameworks?
• The need, and shifting the burden
• Virtual Observatories
• Architectures of VOs and semantics
• In the lower layers of VOs
  – Data access and transport
  – Formats, formats, formats
  – Sensor streams
• How do you (or would you) participate?


Frameworks vs. Systems

• Rough definitions
  – Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extension are limited and usually require engineering.
  – Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design.
• Treat this as a working definition


Diversity, Integration, Size, …

• It is not just large (well-organized, long-lived, well-funded) projects and programs that want to make their data available
• Data policies are emerging but are still highly variable (or non-existent)
  – How does a user deal with this?
• Need to manage data to solve challenging scientific or societal problems without the continued need for a scientist to know every detail of complex data management systems
• Large-scale scientific data repositories:
  – Most data are still created in a manner that simplifies generation, not access or use
  – Very diverse organization of data: files, directories, metadata, emails, etc.
  – Source/origin management is driven by meta-mechanisms for integration, interoperability (but still need performance)
    • Virtual Observatories
    • Data Grids
• Increasing realization: need management for all forms of ‘data’, i.e. virtual data products are becoming the norm

Size matters: personal data management is as big a problem as source data management, or bigger.


Shifting the Burden from the User to the Provider (with the help of VOs)


Terminology

• Workshop: A Virtual Observatory (VO) is a suite of software applications on a set of computers that allows users to uniformly find, access, and use resources (data, software, document, and image products and services using these) from a collection of distributed product repositories and service providers. A VO is a service that unites services and/or multiple repositories.

• VxOs - x is one discipline, domain, community, country

• NB: VO also refers to Virtual Organization


What should a VO do?

• Make “standard” scientific research much more efficient.
  – Even the principal investigator (PI) teams should want to use them.
  – Must improve on existing services (mission and PI sites, etc.). VOs will not replace these, but will use them in new ways.
• Enable new, global problems to be solved.
  – Rapidly gain integrated views from the solar origin to the terrestrial effects of an event.
  – Find data related to any particular observation.
  – (Ultimately) answer “higher-order” queries such as “Show me the data from cases where a large coronal mass ejection observed by the Solar and Heliospheric Observatory was also observed in situ” (science-speak) or “What happens when the Sun disrupts the Earth’s environment?” (general public)


Virtual Observatories

• Conceptual examples:
• In-situ: Virtual measurements
  – Related measurements
• Remote sensing: Virtual, integrative measurements
  – Data integration

• Both usage patterns lead to additional data management challenges at the source and for users; now managing virtual ‘datasets’


Virtual Observatories

Make data and tools quickly and easily accessible to a wide audience.

Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software so that researchers can use them without effort or interference, as if all the materials were available on their local computer and in their preferred language; i.e. they appear to be local and integrated.

Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains along with database interfaces for access and storage


Early days of VxOs

[Figure: VO1, VO2 and VO3 shown above data sources DB1, DB2, DB3, …, DBn, with an open question mark ("?")]


Federation

[Figure: VO1, VO2, VO3 and an additional VO4, shown with data sources DB1, DB2, DB3, …, DBn]


The Astronomy approach; data-types as a service

[Figure: VO App1, VO App2 and VO App3; a VO layer (VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol, Simple Time Access Protocol); data sources DB1, DB2, DB3, …, DBn]

• Limited interoperability
• Lightweight semantics
• Limited meaning, hard coded
• Limited extensibility
• Under review

OGC {WFS, WCS, WMS} and SWE {SOS, SPS, SAS} use the same approach.


Similarities to Astronomy

• Some disciplines have chosen a data format (some even use FITS)
• Common applications, community standards appearing
• Images, spectra (incl. multi-band), …
• More and more data is on-line, some (near) real-time
• Data flood - synoptic measurements, spatial/spectral resolution, number of instruments, cadence - all increasing (peta-byte to exa-byte is real); data mining and knowledge extraction are now real needs
• Don’t move (or replicate?) the data when possible
• Means for interoperation are being demanded - service-oriented architectures
• Some VOs are even implementing IVOA standards (primarily heliophysics and space physics)


Differences with astronomy

• Data types (+station/point, irregular, multi-resolution, ragged arrays, swath, …)
• Data formats - many
• Lots of VOs
• Metadata conventions range from strict to non-existent
• Provenance, derivation and semantics being applied in (more) formal ways
• Geo-spatial dominates (cf. helio-spatial), some standards but little/no enforcement - efforts at conventions/standards are at data model level
• New to the theme of integration and inter-disciplinary
• Number and complexity of projects, systems, frameworks - need to interoperate at many levels
• Social, political and mission forces are immense


Fox - APAC 2007, Driving e-research:

Grids and Semantics

[Figure: layered architecture diagram. Components: VO Portal, Web Services, VO API; data sources DB1, DB2, DB3, …, DBn; semantic mediation layer (VSTO, low level); semantic mediation layer (mid- to upper-level); education, clearinghouses, other services, disciplines, etc. Side labels: metadata, schema, data; query, access and use of data; semantic query, hypothesis and inference; semantic interoperability. "Added value" is marked at four points.]

Mediation Layer

• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
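The mediation role described above can be illustrated with a small sketch. It is not VSTO code: the class name, the toy vocabulary and the dataset identifiers are all hypothetical, chosen only to show how a query phrased in terms of Parameter, Instrument and Date/Time might be mapped to access requests on underlying holdings.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Holding:
    """One underlying dataset (all values here are made up for illustration)."""
    dataset_id: str
    instrument: str
    parameters: frozenset
    start: date
    end: date


# Toy "ontology": each user-facing parameter term maps to the
# provider-specific variable names it subsumes (hypothetical names).
PARAMETER_TERMS = {
    "NeutralTemperature": {"tn", "temp_neutral"},
    "ElectronDensity": {"ne", "nel"},
}

HOLDINGS = [
    Holding("db1/radar_a", "IncoherentScatterRadar",
            frozenset({"ne", "tn"}), date(2001, 1, 1), date(2005, 12, 31)),
    Holding("db2/fpi_b", "FabryPerotInterferometer",
            frozenset({"temp_neutral"}), date(2003, 6, 1), date(2009, 6, 1)),
]


def mediate(parameter, instrument, day):
    """Map a semantic query onto access requests for matching holdings."""
    variables = PARAMETER_TERMS.get(parameter, set())
    requests = []
    for h in HOLDINGS:
        matched = sorted(variables & h.parameters)
        if not matched:
            continue  # holding has no variable for this parameter term
        if instrument is not None and h.instrument != instrument:
            continue
        if not (h.start <= day <= h.end):
            continue
        requests.append({"dataset": h.dataset_id, "variables": matched,
                         "date": day.isoformat()})
    return requests
```

A caller asks for a concept such as "NeutralTemperature" and never needs to know that one provider calls it `tn` and another `temp_neutral`; that translation is the part the mediation layer takes over from the user.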


Semantic Web Benefits

• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries, which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, exposed as smart web services
  – understanding of coordinate systems, relationships, data synthesis, transformations
  – returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)


Virtual Carbon Observatory


Environmental Assessment


Understand Communities Of Stakeholders


Multi-domain Knowledge Base


Vocabularies and Ontologies

• An underlying aspect of all VOs is the need to develop/ agree on a common presentation of the (virtual) holdings, aka a catalog

• As discipline boundaries are crossed… (ecology)
• Vocabularies are increasingly important in this provision
• And, interestingly, there is a real push toward more explicit representations of semantics in the form of ontologies

• … and provision of vocabulary services*


Let’s turn to plumbing

• Data formats are of resurgent interest but not so much for exchange
  – For structural representation and efficiency
  – For transparency and preservation
  – However, a lot of end-users still care about formats immensely
• Data access and transport
• Implications of computing closer to the data


netCDF and similar

• Version 3 (classic) vs. version 4 (aka CDM)
• V4 - slow adoption to date (no specific reason)
• Conventions (e.g. units, CF-1) make it work
• Traditional focus on grids is now evolving as in-situ data and model comparisons are becoming common, i.e. unstructured data


Discipline neutral access

• One such approach, since 1993, is the DAP – Data Access Protocol (NASA, NOAA standard)

• opendap.org (U.S. not-for-profit)
• OPeNDAP is the software
  – Core, server (version 4 – Hyrax), client, services
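A DAP2 request is an ordinary URL: a dataset URL, a suffix naming the response type, and an optional constraint expression selecting a variable and index ranges. The sketch below only builds such URLs. The server host, file path and variable name are hypothetical, and no request is sent.

```python
from urllib.parse import quote

# DAP2 response suffixes: dataset structure (dds), attributes (das),
# binary data (dods), plus the ASCII, HTML-form and info responses.
SUFFIXES = {"dds", "das", "dods", "ascii", "html", "info"}


def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range as [start:stride:stop] (stop inclusive)."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def dap2_url(dataset_url, response, variable=None, ranges=()):
    """Build a DAP2 request URL with an optional projection constraint."""
    if response not in SUFFIXES:
        raise ValueError(f"unknown DAP2 response type: {response}")
    url = f"{dataset_url}.{response}"
    if variable is not None:
        constraint = variable + "".join(hyperslab(*r) for r in ranges)
        # Keep brackets and colons readable; escape anything else unsafe.
        url += "?" + quote(constraint, safe="[]:,._")
    return url


# Hypothetical server and dataset, for illustration only.
example = dap2_url("http://example.org/opendap/data/nc/sst.nc", "ascii",
                   variable="sst", ranges=[(0, 0), (10, 20, 2)])
```

Because the subset is expressed in the URL, the server can return only the requested slice, which fits the "don't move the data when possible" point made earlier in the talk.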


OPeNDAP Hyrax Architecture

[Figure: Client, OLFS, BES, Data]

OPeNDAP Lightweight Front end Server (OLFS)
• Receives requests and asks the BES to fill them
• Uses Java Servlets
• Does not directly ‘touch’ data
• Multi-protocol

Back End Server (BES)
• Reads data files, databases, etc., returns info
• May return DAP2 objects or other data
• Does not require web server
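The division of labour between the two servers can be sketched in a few lines. This is an in-process toy, not Hyrax: the real OLFS is a Java servlet that talks to a separate BES process, and every class and function name below is hypothetical.

```python
class BackEnd:
    """Stand-in for the BES: the only component that reads data."""

    def __init__(self):
        self._readers = {}  # file extension -> reader function

    def register(self, extension, reader):
        """Add a format handler (an extension point of the framework)."""
        self._readers[extension] = reader

    def fill(self, path, response):
        extension = path.rsplit(".", 1)[-1]
        reader = self._readers.get(extension)
        if reader is None:
            raise LookupError(f"no reader registered for .{extension}")
        return reader(path, response)


class FrontEnd:
    """Stand-in for the OLFS: parses requests, never touches data."""

    def __init__(self, back_end):
        self._back_end = back_end

    def handle(self, request_path):
        # "/data/x.nc.dds" -> dataset "/data/x.nc", response type "dds"
        dataset, _, response = request_path.rpartition(".")
        if not dataset:
            raise ValueError("request must end in .<response type>")
        return self._back_end.fill(dataset, response)


def fake_netcdf_reader(path, response):
    """Placeholder reader: reports what it would return, reads nothing."""
    return f"{response} response for {path}"


bes = BackEnd()
bes.register("nc", fake_netcdf_reader)
olfs = FrontEnd(bes)
```

Keeping data access behind `register` means a new format can be supported by adding a reader, without changing the front end.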


[Figure: OPeNDAP Lightweight Front end Server. A request from the client passes through Request Formulation** to the BES, and a response is returned to the client. Labels: DAP2 over GridFTP; DAP2 over HTTP; ASCII output; HTML form; Info output; THREDDS; SOAP-DAP (HTTP); DAP2 (GridFTP, HTTP); RDF, OWL, JSON (HTTP); PML output.]


Hyrax/ Back-end Server

[Figure: BES Framework. Network protocol and process start/stop activities: PPT*, Initialization/Termination. Data store interfaces: DAP2 Access, NetCDF3, HDF4, RDF/SPARQL, …, Provenance. Commands**: BES Commands/XML Documents. Underlying stores: Data, Data Catalogs.]

* PPT is built in (other protocols)
** Some commands are built in


Status of the Community OPeNDAP Server Software

• Hyrax 1.6 provides support for NcML-based aggregation

• Faster THREDDS implementation (but not full featured)

• Full security audit and static code analysis certification to comply with NOAA and NASA requirements

• DAP4 (which includes netCDF 4 support) is not available yet

• AND other things
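NcML-based aggregation, mentioned in the first bullet, describes several files as one logical dataset. As a rough illustration, the following sketch generates a small NcML document that joins files along an existing time dimension. The file names are hypothetical, and the snippet only produces XML; it does not talk to a Hyrax server.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def join_existing(dim_name, locations):
    """Return NcML text aggregating `locations` along dimension `dim_name`."""
    # Serialize NcML elements with a default namespace (no prefix).
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        {"dimName": dim_name, "type": "joinExisting"})
    for location in locations:
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", {"location": location})
    return ET.tostring(root, encoding="unicode")


# Hypothetical monthly files forming one time series.
ncml = join_existing("time", ["sst_2010_01.nc", "sst_2010_02.nc"])
```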


Earth System Grid Center for Enabling Technologies: (ESG-CET)

Earth System Grid Center for Enabling Technologies

• Large data sets, numbers and sizes
  – High performance
  – Flexible architecture, both client and several types and numbers of servers
  – Aggregation
  – Server side operations
  – Multiple transport protocol options
• Full ESG security support as well as loose federation
• Full function client access via API (netCDF/CDM)

To satisfy the new goals, the OPeNDAP services for ESG have been re-architected. We now use parts of the standard OPeNDAP framework Hyrax, focusing on high performance for the client side and extended flexibility.


Requirements leading to OPeNDAP-g

• Separation of the core Data Access Protocol (DAP) from the transport protocol (HTTP).

• High Performance Computing. The previous CGI based servers did not have the capacity required by ESG. Error and memory handling added.

• Security. Once OPeNDAP was independent of the transport protocol, adding security was possible by relying on the Globus gsiFTP system.

• Aggregation. OPeNDAP 3.0 did not operate on aggregated datasets. OPeNDAP-g does.

• Transport protocol independence and HPC were incorporated back into OPeNDAP, leading to the current version. Security and aggregation were initially ESG-only features.


The Remote NetCDF Invocation (RNI)

The client is the netCDF library. It has exactly the same API as the standard netCDF C library, but it can deal with local files or files reachable via HTTP, PPT or gridFTP. The third tier, the BES server, can be reached only via PPT. NetCDF services for all NetCDF calls are implemented as a BES module. The middle tier acts like a proxy between the RNI client and server and deals with security.
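The idea of one API over local and remote files can be sketched as a dispatch on the URL scheme. This is not the RNI implementation: the function names are hypothetical, and the result is only a description of where a real client would connect.

```python
from urllib.parse import urlparse

# Transports named on the slide; the mapping to backends is an assumption.
REMOTE_TRANSPORTS = {"http", "ppt", "gridftp"}


def choose_backend(location):
    """Pick a (backend, transport) pair from a file path or URL."""
    scheme = urlparse(location).scheme.lower()
    if scheme in REMOTE_TRANSPORTS:
        return ("remote", scheme)
    if scheme in ("", "file"):
        return ("local", None)
    raise ValueError(f"unsupported scheme: {scheme!r}")


def nc_open(location):
    """Single entry point: callers never branch on local vs. remote."""
    backend, transport = choose_backend(location)
    return {"location": location, "backend": backend, "transport": transport}
```

Application code calls `nc_open` the same way for `/data/x.nc` and for a `gridftp://` URL, which is the property the slide describes as having exactly the same API.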


RNI Architecture

[Figure: RNI architecture. Labels: CLIENT, NetCDF Library, RNI Library, GridFTP, OPeNDAP, BES, RNI Module, DATA; annotation "connection acts like" (truncated in the source)]


Characteristics of the RNI as part of a data access system

• Full Support of standard OPeNDAP URLs. RNI is being developed with the integrated Unidata/OPeNDAP netCDF library (and CDM)

• Transparent access to either standard netCDF files or aggregated datasets via the NetCDF Markup Language (NcML).

• For remote containers, all write operations are disabled for security. That is, for HTTP/HTTPS, PPT and gridFTP/gsiFTP the RNI system is a read-only API.

• RNI utilizes Just in Time access. Caching is only for metadata. No pre-fetching of data.

• RNI transparently accesses secure (gsiFTP, HTTPS) or insecure (gridFTP, HTTP) remote data.
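The read-only, just-in-time behaviour listed above might look roughly like this in a client. It is a sketch under assumptions, not RNI code: the class name and the two injected fetch functions are hypothetical stand-ins for real network calls.

```python
class RemoteDataset:
    """Read-only view of a remote dataset with metadata-only caching."""

    def __init__(self, fetch_metadata, fetch_data):
        # Both callables are injected stand-ins for network requests.
        self._fetch_metadata = fetch_metadata
        self._fetch_data = fetch_data
        self._metadata = None  # cached after first use

    def metadata(self):
        if self._metadata is None:
            self._metadata = self._fetch_metadata()
        return self._metadata

    def read(self, variable, start, stop):
        """Fetch values only when asked (no pre-fetching, no data cache)."""
        if variable not in self.metadata()["variables"]:
            raise KeyError(variable)
        return self._fetch_data(variable, start, stop)

    def write(self, *args, **kwargs):
        raise PermissionError("remote containers are read-only")


# Fake transport that counts calls, so the caching behaviour is visible.
calls = {"metadata": 0, "data": 0}


def fake_metadata():
    calls["metadata"] += 1
    return {"variables": {"sst": 100}}


def fake_data(variable, start, stop):
    calls["data"] += 1
    return list(range(start, stop))


ds = RemoteDataset(fake_metadata, fake_data)
first = ds.read("sst", 0, 3)
second = ds.read("sst", 0, 3)
```

After the two identical reads, the metadata has been fetched once and the data twice: metadata is cached, while data is always fetched on demand.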


Other DAP client/ API library status

• OPeNDAP-Unidata project to fold ‘libnc-dap’ into the standard netCDF distribution, i.e. you get ‘DAP’ for free

• New C-API for DAP – ‘oc’ replaces ocapi and will be the basis for rewrites of the IDL and Matlab (and other) client interfaces


NOAA/IOOS

• DAP adopted by DMAC
• Gateway project for OPeNDAP
  – Support for WCS/WFS as source and response type in Hyrax
  – Implementation of AIS (Ancillary Information Service) for RDF return prototype
  – Initial DAP ontology data model


Cloud

• Microsoft ported OPeNDAP Hyrax to their Azure cloud
  – http://opendap.cloudapp.net/dap
  – Web-client/form is at http://opendap.cloudapp.net/dap/data/nc/contents.html
• Work on Azure Drive (Xdrive) underway
• No decisions on future or other cloud environments


Security (authn/z)

• Developed with Bryan Lawrence (BADC/STFC) for federation of OPeNDAP security

• Specified in May 2009; implementations presented at EGU in 2010

• Will appear in ESG and community OPeNDAP releases

• AAF compatible?


Sensors

• Due to the increasing demand to process off the sensor:
  – Sky surveys – volume
  – Monitoring – for rapid response and decision support
  – As part of a network, or on the internet, a web

• There is a corresponding increase in need to ingest/ publish data much earlier than has previously been needed

• Trend toward treating them as RT/NRT sensors


Directions for sensor and spatial standards (my view)

• Has grown out of a limited set of semantic constructs
  – Geography, features, coverages, maps, streams

• Integration needs are driving different (good) developments, e.g. WCS 2 v WFS 2.

• Transparency requirements are going to drive very different approaches, e.g. encapsulation can be a barrier

• Refactoring of standards, much as is happening in astronomy, will be required


Who is developing? Your participation?

• VOs
  – U.S. – NASA, NSF, NOAA are developing/funding
  – EU – many, e.g. HELIO, SOTERIA
• DAP/OPeNDAP
  – World-wide community, strong Australian contributions/use
• Sensors
  – W3 recent – incubator for semantic sensor web – very, very important work
• Vocabulary servers (more than the vocabularies)
  – Interest in community-based (or W3) effort


Issues for Virtual Observatories - Geo

• Scaling to large numbers of data providers
• Security, policy enforcement
• Data quality
• Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
• Sustainability


Summary/ Discussion

• The VO paradigm is in wide-spread use in Earth and Space Sciences
  – Successful implementations in production and use (some even have evaluations)
  – New science is being enabled and performed
  – There are active programs at the agency level
  – Active communities: meeting, publishing, developing, implementing
• Data access and transport is an active field
• New attention to spatio-temporal standards and vocabularies in the context of services
• Substantial re-visiting of architectures due to the need to accommodate explicit semantics (esp. in regard to sensors)


Further Information

• http://tw.rpi.edu/
• http://www.opendap.org and http://docs.opendap.org
• Lots of others (ask me)
• Contact: pfox@cs.rpi.edu