Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Krishnaprasad Thirunarayan and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435
1
Outline
• Semantics-empowered Cyberinfrastructure for Geoscience Applications
– Approaches, Benefits, and Challenges (reflecting cost/convenience/pay-off trade-offs)
• Expressive search and integration using Geospatial information
– SPARQL enhancements
– Practical applications using semantic technologies, sensor data streams, and spatial information
2
Semantics-empowered Cyberinfrastructure for Geoscience Applications
3
Domain Goals and Challenges
Data-driven understanding of the evolution of oceans, atmosphere, and solid earth over time through physical, chemical and biological processes.
• Cultural challenges
– Proper protection, control, and credit for sharing data
• Technological challenges
– Computational tools and repositories conducive to easy exchange, curation, and attribution of data
Data sharing can promote re-analysis/re-interpretation of extant data, reducing “redundant” data collections.
4
Category of Geoscience Data: Short tail science data, created by large organizations and projects
– Characteristics: Few, large (TB+), structured, spatially rich (e.g., remote sensing), largely homogeneous, highly visible, curated
– Strategy for Reuse: Planned integration strategies; could use formal ontologies / domain models and vocabularies, visualization tools and APIs
– CI Strategy: Data centers / grids generally using relational databases and files, maintained by people with significant IT skills
Category of Geoscience Data: Long tail science data, created by individual scientists and small groups
– Characteristics: Many, small (GB+), heterogeneous, invisible (except via publications), poorly curated
– Strategy for Reuse: Multi-domain and broad vocabularies (including community-established ones); create semantic metadata (annotations) and optionally publish, search and download legacy data, or use an open data initiative
– CI Strategy: Web-based, easy to learn and use semantic tools for annotation, publication, search and download that can be used by individual scientists without significant IT skills
5
Our Thesis
Associating machine-processable semantics with the long tail of science data and documents can help overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity.
6
What?: Nature of Data
• Structured Data (e.g., relational)
• Semi-structured, Heterogeneous Documents (e.g., Geoscience publications and technical specs, which usually include text, numerics, maps and images)
• Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
7
What?: Granularity of Semantics and Associated Applications
• Lightweight semantics: File-level annotation to enable discovery and sharing of long tail of science data
• Richer semantics: Document-level annotation and extraction for semantic search and summarization
• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data
8
Why?: Benefits of Lightweight Semantics
• Ease of use by domain experts
– Faster and wider adoption, promoting evolution
• Low upfront cost to support
• Shallow semantics applies to a wider range of documents/data and appeals to a broader community of geoscientists
• Bottom line: “Learn to Walk before we Run”
9
How?: Ingredients for Semantics-based Cyber Infrastructure
• Use of community-ratified controlled vocabularies and lightweight ontologies (upper-level, hierarchies)
• Ease of self-publishing and discovery
• Data citation index to give credit for data sharing
• Semi-automatic annotation of data and documents: Manual + Automatic
10
Example: Lightweight Semantic Registration of Data
– Title of data
– Keywords: selected from the five-tier vocabulary provided
– Type of data: maps, excel files, images, text
– Data format: structured or unstructured
– Description of data: brief unstructured description of content
– Contact information of provider(s): name of provider(s), email for verification, lineage
– Spatial extent of data and reference system: location
– Temporal extent of data: date range in time, or age range if not recent
– Date and type of related publication(s): Journal, Thesis, Agency report, not published
– Host site for publication: Journal, Library, Personal computer
– Access restrictions: copyright regulations
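The registration fields above could be captured in a small record type. The following is a minimal sketch, not the project's actual schema: the class name `DataRegistration`, its field names, the `missing_fields` helper, and the choice of which fields count as required are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataRegistration:
    """Hypothetical file-level metadata record mirroring the fields listed above."""
    title: str = ""
    keywords: List[str] = field(default_factory=list)  # terms from a controlled vocabulary
    data_type: str = ""            # e.g. "maps", "excel files", "images", "text"
    data_format: str = ""          # "structured" or "unstructured"
    description: str = ""          # brief free-text description
    provider_name: str = ""
    provider_email: str = ""       # used for verification
    spatial_extent: str = ""       # location and reference system
    temporal_extent: str = ""      # date range, or age range if not recent
    publication: Optional[str] = None          # None when not published
    host_site: Optional[str] = None
    access_restrictions: Optional[str] = None


# Fields this sketch treats as mandatory (an assumption, not stated in the slides).
REQUIRED = ("title", "keywords", "data_type", "data_format", "provider_email")


def missing_fields(record: DataRegistration) -> List[str]:
    """Return the names of required fields that are still empty."""
    return [name for name in REQUIRED if not getattr(record, name)]
```

A form front end could call `missing_fields` before accepting a submission, which keeps the upfront burden on the data provider small while still guaranteeing enough metadata for discovery.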
11
System Architecture and Components
12
Problems and A Practical Approach (“When rubber meets the road”)
Deeper Issues: Semantic Formalization of Tabular Data
13
Nature of tables
• Compact structures for sharing information
– Minimize duplication
• Types of Tables
– Regular: Dense grid with explicit schema information in terms of column and row headings => Tractable
– Irregular: Sparse grid with implicit schema and ad hoc placement of headings => Hard
14
Challenges Associated with Typical Spreadsheet/Table
• Meant for human consumption
• Irregular
– Not a simple rectangular grid
• Heterogeneous
– Not all rows are interpreted similarly
• Complex
– Meaning of each row and each column is context dependent
• Footnotes modify meaning of entries (esp. in materials and process specifications)
15
Practical Semi-Automatic Content Extraction
• DESIGN: Develop regular data structures that can be used to formalize tabular information.
– Provide a natural expression of data
– Provide semantics to data, thereby removing potential ambiguities
– Enable automatic translation
• USE: Manual population of regular tables and automatic translation into LOD
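The "automatic translation into LOD" step can be illustrated with a toy converter from a regular table to RDF triples. This is a sketch under stated assumptions, not the translator used in the project: the `http://example.org/` namespace and the `table_to_ntriples` function are hypothetical, and the code assumes the key values and column headings are already safe to embed in IRIs.

```python
import csv
import io
from typing import Iterator

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def table_to_ntriples(csv_text: str, key_column: str) -> Iterator[str]:
    """Yield one N-Triples line per non-empty cell of a regular table.

    Each row becomes a subject named after its key column, each other
    column heading becomes a predicate, and each cell becomes a plain literal.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}row/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}column/{column}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            yield f'{subject} {predicate} "{literal}" .'
```

Because the table is regular (one heading per column, one entity per row), the mapping needs no table-specific logic; irregular tables would first have to be re-entered manually in this shape.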
16
Expressive search and integration using Geospatial information
17
Outline
• Query Language Support for Spatio-Temporal Context: SPARQL-ST (=> GeoSPARQL)
• Practical Applications that use Spatio-Temporal information for joining Sensor Data to enable Machine Perception
18
Overview: SPARQL-ST
• SPARQL
– W3C recommended query language for RDF data (as of Jan. 15, 2008)
– Graph pattern-based queries (subgraph match)
• SPARQL-ST
– Spatial variables
– Temporal variables
– Spatial filter expressions
– Temporal filter expressions
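To make "graph pattern-based queries (subgraph match)" concrete, here is a toy matcher over an in-memory list of triples. It is only a sketch of the idea, not how a SPARQL engine is implemented: the `match` function and the convention that variables are strings starting with `?` are assumptions of this example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(patterns: List[Triple], graph: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding under which all patterns occur in the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, graph, candidate)
```

Shared variables across patterns (such as `?y` in `?x knows ?y . ?y knows ?z`) are what join the individual triple matches into a subgraph.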
19
SELECT ?n
WHERE {
  ?p foaf:name ?n .
  ?p usgov:hasRole ?r .
  ?r usgov:forOffice ?o .
  ?o usgov:represents ?q .
  ?q stt:located_at %g .
  ?a foaf:name "Nancy Pelosi" .
  ?a usgov:hasRole ?b .
  ?b usgov:forOffice ?c .
  ?c usgov:represents ?d .
  ?d stt:located_at %h .
  SPATIAL FILTER (distance(%g, %h) <= 100 miles)
}
Find all politicians that represent areas within 100 miles of the district represented by Nancy Pelosi.
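The spatial filter in this query compares a distance against a threshold. A rough sketch of that check follows, with a strong simplification: each district is reduced to a single representative latitude/longitude point, whereas SPARQL-ST operates on full geometries. The function names and the use of the haversine great-circle formula are assumptions of this example.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_miles(a: Point, b: Point) -> float:
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlam = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def within(places: Dict[str, Point], anchor: Point, max_miles: float) -> List[str]:
    """Return the names of places no farther than max_miles from the anchor."""
    return [name for name, point in places.items()
            if haversine_miles(point, anchor) <= max_miles]
```

For example, with an anchor near San Francisco, a point near Oakland passes a 100-mile filter while a point near Los Angeles does not.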
20
SELECT ?p
WHERE {
  ?p usgov:hasRole ?r #t1 .
  ?r usgov:forOffice ?o #t2 .
  ?o usgov:represents ?c #t3 .
  ?c stt:located_at %g #t4 .
  SPATIAL FILTER (inside(%g, GEOM(POLYGON((-75.14 40.88, -70.77 40.88, -70.77 42.35, -75.14 42.35, -75.14 40.88)))))
  TEMPORAL FILTER (anyinteract(intersect(#t1, #t2, #t3, #t4), interval(10:01:2013, 10:31:2013, MM:DD:YYYY)))
}
Find all politicians representing congressional districts within a given geographical area at any time in October 2013
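The temporal filter combines the validity intervals of the four triples and tests the result against October 2013. The sketch below shows one plausible reading, assuming `intersect` yields the sub-interval common to all its arguments and `anyinteract` is true when two intervals overlap at all; the Python functions and the closed-interval representation are assumptions of this example, not the SPARQL-ST implementation.

```python
from datetime import date
from typing import Optional, Tuple

Interval = Tuple[date, date]  # closed interval [start, end]


def intersect(*intervals: Interval) -> Optional[Interval]:
    """Return the interval common to all arguments, or None if there is none."""
    start = max(interval[0] for interval in intervals)
    end = min(interval[1] for interval in intervals)
    return (start, end) if start <= end else None


def anyinteract(a: Optional[Interval], b: Interval) -> bool:
    """True if the two intervals share at least one day."""
    if a is None:
        return False
    return a[0] <= b[1] and b[0] <= a[1]
```

A binding survives the filter only if all four triples were valid at the same time and that shared period touches the query interval.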
21
Summary of SPARQL-ST
• Relationship-centric nature of the RDF data model extended for querying spatio-temporal-thematic (STT) data
• Querying
– Supports spatial and temporal relationships in graph pattern queries
– Integrates well with current standards
• Implementation
– Good scalability on large synthetic/real-world data
– Only system for spatial and temporal RDF
22
4th Annual Spatial Ontology Community of Practice Workshop (SOCoP)
USGS, 12201 Sunrise Valley Drive, Reston, VA
December 2, 2011
OGC GeoSPARQL Slides by Matt Perry of Oracle
(also: Kno.e.sis Alumnus)
23
What Does GeoSPARQL Give Us?
• Vocabulary for Query Patterns
– Classes
  • Spatial Object, Feature, Geometry
– Properties
  • Topological relations
  • Links between features and geometries
– Datatypes for geometry literals
  • ogc:WKTLiteral, ogc:GMLLiteral
• Query Functions
– Topological relations, distance, buffer, intersection, …
• Entailment Components
– RIF rules to expand feature-feature query into geometry query
– Gives a common interface for qualitative and quantitative systems
24
Example Query
PREFIX : <http://my.com/appSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ogc: <http://www.opengis.net/geosparql#>
PREFIX ogcf: <http://www.opengis.net/geosparql/functions#>
PREFIX epsg: <http://www.opengis.net/def/crs/EPSG/0/>
SELECT ?restaurant
WHERE {
  ?restaurant rdf:type :Restaurant .
  ?restaurant :cuisine :Mexican .
  ?restaurant :pointGeometry ?rGeo .
  ?rGeo ogc:asWKT ?rWKT
}
ORDER BY ASC(ogcf:distance("POINT(…)"^^ogc:WKTLiteral, ?rWKT, ogc:KM))
LIMIT 3
Find the three closest Mexican restaurants
25
Practical Applications that use Spatio-Temporal information for joining Sensor Data to enable Machine Perception
26
Applications using spatial and/or temporal information
• Location-aware applications
– Foursquare
– OpenStreetMap
• Spatio-temporal-thematic (STT) context-enhanced data integration, querying, and inferencing (machine perception)
– Semantic Sensor Web (+ SemSOS)
  • Abstract weather sensor data streams to weather features
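Abstracting raw sensor readings into a weather feature can be pictured as a small set of rules over co-located, co-temporal observations. The sketch below is illustrative only: the observation keys, the feature names, and both thresholds are assumptions of this example, not meteorological definitions or the rules used in the Semantic Sensor Web.

```python
from typing import Dict

# Illustrative thresholds only (assumptions of this sketch).
FREEZING_C = 0.0
HIGH_WIND_KMH = 50.0


def abstract_weather(obs: Dict[str, float]) -> str:
    """Map one set of readings from the same place and time to a feature name."""
    precipitating = obs["precipitation_mm_per_h"] > 0.0
    freezing = obs["temperature_c"] <= FREEZING_C
    windy = obs["wind_speed_kmh"] >= HIGH_WIND_KMH
    if precipitating and freezing and windy:
        return "Blizzard"
    if precipitating and freezing:
        return "Snow"
    if precipitating:
        return "Rain"
    if windy:
        return "HighWind"
    return "Clear"
```

The spatio-temporal join matters here: the three readings only support a single feature if they come from sensors at the same location and time.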
27
Applications using spatial and/or temporal information (cont’d)
• Applications supporting expressive queries
– Human comprehensible vs machine processable
  • Geonames (LOD) ↔ Lat-long, GPS data
    – What is the current temperature or traffic delay at Dayton International Airport?
– Knowledge-based query expansion/reasoning
  • Bridging vocabulary mismatches in the queries and the data, e.g., using semantic relationships between regions and landmark locations
    – Find schools in OH
    – Find schools near Wright State University
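A query such as "Find schools in OH" can be expanded with containment knowledge, so that data annotated with a city still matches a query posed at the state level. The following is a minimal sketch: the `CONTAINED_IN` table is a tiny hand-written stand-in for a gazetteer such as Geonames, and the school names and function names are hypothetical.

```python
from typing import Dict, List, Set

# Tiny hand-written containment table standing in for a gazetteer.
CONTAINED_IN: Dict[str, str] = {
    "Dayton": "Montgomery County",
    "Montgomery County": "OH",
    "Columbus": "Franklin County",
    "Franklin County": "OH",
    "Detroit": "Wayne County",
    "Wayne County": "MI",
}


def regions_within(region: str, contained_in: Dict[str, str]) -> Set[str]:
    """Return the region plus every place transitively contained in it."""
    result = {region}
    changed = True
    while changed:
        changed = False
        for child, parent in contained_in.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def find_in_region(items: Dict[str, str], region: str,
                   contained_in: Dict[str, str]) -> List[str]:
    """Return names of items whose annotated place lies inside the region."""
    places = regions_within(region, contained_in)
    return sorted(name for name, place in items.items() if place in places)
```

The query term "OH" never appears in the school annotations; the match comes entirely from the containment relationships, which is the vocabulary-bridging role described above.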
28
Semantic Sensor Observation Service Architecture : Making the Data Smart
29
SSW demo with Mesowest data (Machine Perception)
http://archive.knoesis.org/projects/sensorweb/demos/semsos_mesowest/ssos_demo.htm
30
Implementation of Perception Cycle
31
Trusted Perception Cycle Demo
http://www.youtube.com/watch?v=lTxzghCjGgU
32
Sensor Discovery on Linked Data Demo
http://archive.knoesis.org/projects/sensorweb/demos/sensor_discovery_on_lod/sample.htm
33
34
Thank you, and please visit us at http://knoesis.org/
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, Ohio, USA