Resource Discovery for Extreme Scale Collaboration




Jesse Weaver¹ (Jesse.Weaver@pnnl.gov), Alan Chappell¹, Sumit Purohit¹, William Smith¹, Patrick West², Benno Lee², Karen Schuchardt¹, Peter Fox²

¹Pacific Northwest National Laboratory, ²Rensselaer Polytechnic Institute

The amount of data produced in the practice of science is growing rapidly. Despite the accumulation of and demand for scientific data, relatively little is actually made available to the broader scientific community. We surmise that the root of the problem is the perceived difficulty of electronically publishing scientific data and associated metadata in a way that makes them discoverable. We propose to exploit Semantic Web technologies and practices to make (meta)data discoverable and easy to publish. We share our experiences in curating metadata to illustrate both the flexibility of our approach and the pain of discovering data in the current research environment. We also recommend, by concrete example, how data publishers can provide their (meta)data by adding a limited amount of markup to HTML pages on the Web. With little additional effort from data publishers, the difficulty of data discovery, access, and sharing can be greatly reduced and the impact of research data greatly enhanced.

RDESC Architecture

TWC/RPI S2S Faceted Browser

Facets on the left allow users to constrain their search based on data resources, GCMD Keywords, Special Measured Parameters, and latitude/longitude coordinates. The facets changed over time as new metadata were extracted while ingesting the various data resources.
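The facet behavior described above can be illustrated with a minimal sketch. This is not the S2S implementation: the records, field names, and functions below are hypothetical, and only show how facet constraints might narrow a result set and how facet counts might be derived from whatever metadata the remaining records carry.

```python
from collections import Counter

# Hypothetical dataset records; ids, field names, and values are illustrative only.
DATASETS = [
    {"id": "ds1", "source": "GCMD", "keywords": {"ATMOSPHERE", "CLOUDS"}, "lat": 36.6, "lon": -97.5},
    {"id": "ds2", "source": "ARM", "keywords": {"ATMOSPHERE", "AEROSOLS"}, "lat": 71.3, "lon": -156.6},
    {"id": "ds3", "source": "GCMD", "keywords": {"OCEANS"}, "lat": -12.4, "lon": 130.9},
]


def matches(record, sources=None, keywords=None, bbox=None):
    """Return True if the record satisfies every facet constraint that was supplied."""
    if sources and record["source"] not in sources:
        return False
    # A keyword facet matches when the record shares at least one selected keyword.
    if keywords and not (keywords & record["keywords"]):
        return False
    if bbox:
        south, west, north, east = bbox
        if not (south <= record["lat"] <= north and west <= record["lon"] <= east):
            return False
    return True


def search(records, **constraints):
    """Keep only the records that satisfy all facet constraints."""
    return [r for r in records if matches(r, **constraints)]


def facet_counts(records):
    """Count how many of the given records carry each facet value."""
    return {
        "source": Counter(r["source"] for r in records),
        "keywords": Counter(k for r in records for k in r["keywords"]),
    }


hits = search(DATASETS, keywords={"ATMOSPHERE"}, bbox=(0.0, -180.0, 90.0, 0.0))
print([r["id"] for r in hits])
print(facet_counts(hits)["source"])
```

Because the counts are computed from the records themselves, newly ingested metadata changes the available facet values without any change to the code, which is one way the facets could evolve over time.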

RDESC RDF Graphs

An example description of a GCMD dataset as an RDF graph, using the initial ontology.
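As a rough illustration of what such a description looks like as data, the sketch below builds a few triples for one dataset and prints them in N-Triples form. The `http://example.org/rdesc-sketch/` namespace, the `Dataset` class, and the `keyword` property are hypothetical stand-ins, not terms from the RDESC ontology; only `rdf:type` and the Dublin Core `title` term are real vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"
EX = "http://example.org/rdesc-sketch/"  # hypothetical namespace, not the RDESC ontology


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string literal, escaping backslashes, quotes, and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_dataset(dataset_iri, title, keywords):
    """Return (subject, predicate, object) triples describing one dataset."""
    subject = iri(dataset_iri)
    triples = [
        (subject, iri(RDF_TYPE), iri(EX + "Dataset")),
        (subject, iri(DCT_TITLE), literal(title)),
    ]
    for keyword in keywords:
        # Hypothetical property linking a dataset to a keyword string.
        triples.append((subject, iri(EX + "keyword"), literal(keyword)))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = describe_dataset(
    EX + "dataset/1",
    'Cloud "radar" data',
    ["CLOUDS", "ATMOSPHERE"],
)
print(to_ntriples(triples))
```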

The current ontology. Ovals represent classes/concepts, and arrows indicate subClassOf relationships. Darker classes were established in the ontology earlier than lighter ones.
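The subClassOf arrows matter because class membership is inherited along them. The following sketch, using invented class names rather than the actual RDESC classes, shows one way to compute all superclasses of a class and to test whether a resource typed with a specific class also counts as an instance of a more general one.

```python
# Hypothetical class hierarchy: each class maps to its direct superclasses.
# These names are illustrative and are not taken from the RDESC ontology.
SUBCLASS_OF = {
    "DataStream": {"Dataset"},
    "Dataset": {"Resource"},
    "Instrument": {"Resource"},
}


def superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subClassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_instance_of(instance_types, target, subclass_of):
    """True if any asserted type equals the target or is a subclass of it."""
    return any(
        t == target or target in superclasses(t, subclass_of)
        for t in instance_types
    )


print(sorted(superclasses("DataStream", SUBCLASS_OF)))
print(is_instance_of({"DataStream"}, "Resource", SUBCLASS_OF))
print(is_instance_of({"Instrument"}, "Dataset", SUBCLASS_OF))
```

Tracking visited classes in `seen` keeps the traversal from looping if the hierarchy ever contains a cycle.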

An example of an RDF description for an ARM data stream, showing how the ARM measured property hierarchy is used to link data streams to measured properties of interest.
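A small sketch can make the role of the property hierarchy concrete: a search for a general measured property should also return data streams that measure any more specific property beneath it. The property names, stream names, and hierarchy below are invented for illustration and are not taken from the ARM vocabulary.

```python
# Hypothetical measured-property hierarchy: child property -> broader property.
BROADER = {
    "cloud_base_height": "cloud_properties",
    "cloud_fraction": "cloud_properties",
    "cloud_properties": "atmospheric_state",
    "air_temperature": "atmospheric_state",
}

# Hypothetical data streams and the specific properties each one measures.
MEASURES = {
    "stream_a": {"cloud_base_height"},
    "stream_b": {"air_temperature"},
    "stream_c": {"cloud_fraction", "air_temperature"},
}


def ancestors(prop):
    """Return the chain of broader properties above prop, nearest first."""
    chain = []
    while prop in BROADER:
        prop = BROADER[prop]
        chain.append(prop)
    return chain


def streams_for(prop_of_interest):
    """Find streams measuring the property itself or any narrower property."""
    found = []
    for stream, props in MEASURES.items():
        if any(p == prop_of_interest or prop_of_interest in ancestors(p) for p in props):
            found.append(stream)
    return sorted(found)


print(streams_for("cloud_properties"))
print(streams_for("air_temperature"))
```

This sketch assumes each property has at most one broader property; a hierarchy with multiple parents would need a set-based traversal instead of a single chain.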

Conclusion

We have emphasized the importance of data publishers providing their (meta)data in a way that makes structural and semantic integration a natural process. This is accomplished by following a shared vocabulary of terms embodied as an ontology, and by expressing metadata as RDF triples that use the ontology. Although this can sound daunting, we showed that doing so is actually quite easy in practice (section 5). We demonstrated the flexibility of this approach by curating existing metadata into the recommended format. Publishing (meta)data in this (or a similar) way will ameliorate, at least in part, the poor data sharing practices that currently pervade the practice of science.

No matter what dataset we have ingested, we will be able to present its metadata in a search-and-browse interface such as S2S above, and provide a splash page for each dataset with the information retrieved from the external system.
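One way a splash page could carry machine-readable metadata alongside its human-readable content is to embed a few extra attributes in the HTML. The sketch below is an assumption on our part: it uses RDFa Lite attributes (`vocab`, `typeof`, `property`) with the schema.org `Dataset` type as a stand-in vocabulary, since this text does not state which markup syntax or terms are actually used. The record fields and the function name are hypothetical.

```python
from html import escape


def render_splash_page(record):
    """Render a minimal HTML fragment for one dataset with RDFa Lite markup.

    The record keys ("name", "description", "url") are hypothetical, and
    schema.org stands in for whatever ontology a publisher actually uses.
    """
    name = escape(record["name"])
    description = escape(record["description"])
    url = escape(record["url"], quote=True)
    return (
        '<div vocab="https://schema.org/" typeof="Dataset">\n'
        f'  <h1 property="name">{name}</h1>\n'
        f'  <p property="description">{description}</p>\n'
        f'  <a property="url" href="{url}">Access the data</a>\n'
        "</div>"
    )


page = render_splash_page(
    {
        "name": "Clouds & Aerosols",
        "description": "Measurements where height < 10 km",
        "url": "http://example.org/data?id=1&format=nc",
    }
)
print(page)
```

Escaping every value before interpolation keeps metadata retrieved from an external system from breaking the page structure.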

Acknowledgments: Eric Rozell, Master's student at Rensselaer Polytechnic Institute, now with Microsoft

Sponsors:

US Department of Energy

Glossary:
OWL – Web Ontology Language
PNNL – Pacific Northwest National Laboratory
RDESC – Resource Discovery for Extreme Scale Collaboration
RDFS – RDF Schema (Resource Description Framework Schema)
RPI – Rensselaer Polytechnic Institute
SPARQL – an RDF query language
S2S – a faceted web browser
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute

Resources:
http://rdesc.org - site developed for the RDESC project
http://rdesc.org/2014/ - the RDESC ontology