semantic data integration proof of concept
TRANSCRIPT
Semantic Data Integration
I6 Core Group
Nic Bertrand
Herbert Schentz
LTER-Europe Conference, Mallorca, Dec. 2008
Overview
■ Testing goals
■ Test architecture
■ Results
■ Outlook: applicability for LTER-Europe
Architecture
Goal: Enable seamless access to distributed data
Allow local data analysis for all members with their own tools
Distributed Socio-Ecological Data
See all data as if it came from ONE Data Source
Distributed Data Mining with local tools
Portal
Distributed Applications
Longer-term vision
Extend seamless access to distributed services (SOA)
Allow local data analysis for all members with their own tools and common services
See all data as if it came from ONE Data Source and were processed within ONE application
Distributed Socio-ecological Data
Distributed Data Mining with local tools
Role of Ontology
Distributed Socio-Ecological Data
SERONTO
SERONTO: basis to discover, retrieve and integrate distributed heterogeneous data
common concepts and structures
Portal
Testing... Why?
■ Validate the use of SERONTO for data integration in ALTER-Net and LTER-Europe
■ Test the feasibility of mapping REAL ecological data to SERONTO
■ Test querying of the connected database(s) via the semantic concepts in SERONTO
Proof of concept: Acceptance Criteria
• The databases must have different structures and must have been developed independently of SERONTO;
• The databases must feature reference lists (e.g. species lists);
• The database structures must not be altered as a result of the integration work;
• New concepts may be imported into SERONTO as and when required;
• The databases must contain data relevant to Long Term Ecological Research (e.g. vegetation surveys, records of species occurrences, measurement of biotic and abiotic components).
Testing: Connecting 5 databases
JOKL cultural landscapes
JODI vegetation
2835 floodplain
ECN Summary Database
More about the databases:
• Independently developed
• Not developed for the purpose of data integration
• Different data models
• Different languages
• Similar data types collected in ALTER-Net
• Some obvious integration points (e.g. vegetation)
Pythia vegetation
SERONTO
Data Integration using SERONTO
Import Ontology
Connect Databases
Query SERONTO
Results
Getting value sets back
SERONTO
parameter_method
parameter method
Value_sets Unit
Scale
Data Integration Results
➢ Import SERONTO and Units ontologies into OntoStudio
SERONTO
Data Integration Results
➢ Import diverse ecological databases
JOKL cultural landscapes
JODI vegetation
2835 floodplain
ECN Summary Database
Pythia vegetation
Data Integration Results
Extend SERONTO classes using the content of the databases
(SERONTO Core does not contain domain specific concepts)
Map databases to SERONTO (Simple and complex mappings)
Query individual databases directly
Query multiple databases from SERONTO (simple and complex queries)
Map once, reuse data many times; querying does not require knowledge of the database structures
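The "map once, query many times" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OntoStudio/SERONTO implementation: the two in-memory SQLite databases, their table and column names, the `MAPPINGS` dictionary and the `query_species_cover` function are all hypothetical. One mapping is a simple rename; the other is a "complex" mapping that also converts a cover fraction to a percentage.

```python
import sqlite3

# Hypothetical source A: English column names, cover stored as a percentage.
def make_db_a():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE veg_records (plot TEXT, taxon TEXT, cover_pct REAL)")
    db.executemany(
        "INSERT INTO veg_records VALUES (?, ?, ?)",
        [("A1", "Fagus sylvatica", 40.0), ("A2", "Picea abies", 15.0)],
    )
    return db

# Hypothetical source B: German column names, cover stored as a fraction (0..1).
def make_db_b():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE aufnahme (flaeche TEXT, art TEXT, deckung REAL)")
    db.executemany(
        "INSERT INTO aufnahme VALUES (?, ?, ?)",
        [("B7", "Fagus sylvatica", 0.25)],
    )
    return db

# Written once per database: how the shared concept (plot, species, cover in %)
# is expressed in each local schema. The source databases are not altered.
MAPPINGS = {
    "db_a": "SELECT plot, taxon, cover_pct FROM veg_records",       # simple mapping
    "db_b": "SELECT flaeche, art, deckung * 100.0 FROM aufnahme",   # complex mapping
}

def query_species_cover(sources, species):
    """Query every connected database through the shared concept only."""
    results = []
    for name, db in sources.items():
        for plot, taxon, cover in db.execute(MAPPINGS[name]):
            if taxon == species:
                results.append(
                    {"source": name, "plot": plot, "species": taxon, "cover_percent": cover}
                )
    return results

sources = {"db_a": make_db_a(), "db_b": make_db_b()}
hits = query_species_cover(sources, "Fagus sylvatica")
for hit in hits:
    print(hit)
```

The caller names only the shared concept (species, cover in percent); the local table names and units stay hidden behind the mappings, which is the property the slide describes.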
Semantic data integration is possible
Open Questions
SERONTO Core
domain ontologies ?
<?xml version="1.0" encoding="UTF-8"?>
<flg:flogic xmlns:flg="http://www.wsmo.org/2004/d16/d16.2/v0.1/">
  <!-- Test data to test the WSML F-Logic XML syntax -->
  <!-- The following <rule></rule> encodes this fact (taken from the F-Logic JACM paper, page 7):
       bob[name -> "Bob"; age -> 40; affiliation -> cs1[dname -> "CS"; mngr -> bob; assistents -> {john, sally}]
       this encoding writes only elementary molecules -->
  <rule>
    <head>
      <molecule>
        <object> <constant name="bob"/> </object>
        <superclass isaType=":">
          <class> <constant name="empl"/> </class>
        </superclass>
        <methodSpec arrow="->">
          <name> <constant name="name"/> </name>
          <result> <oid> <constant name="&quot;Bob&quot;"/> </oid> </result>
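The comment in the XML above says the encoding "writes only elementary molecules". As a rough illustration of what that decomposition means, the sketch below flattens the nested `bob[...]` fact from the comment into one (object, method, value) triple per statement. The `Molecule` class and `flatten` function are hypothetical helpers written for this example; they are not part of WSML or any F-Logic tool, and treating each member of the set `{john, sally}` as a separate triple is an assumption of the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Molecule:
    """A (possibly nested) F-Logic-style molecule: an object id plus its methods."""
    oid: str
    methods: dict = field(default_factory=dict)

def flatten(molecule):
    """Return elementary (object, method, value) triples for a nested molecule."""
    facts = []
    for method, value in molecule.methods.items():
        # A list stands for a set-valued result such as {john, sally}.
        members = value if isinstance(value, list) else [value]
        for member in members:
            if isinstance(member, Molecule):
                # Reference the nested object by id, then flatten it in turn.
                facts.append((molecule.oid, method, member.oid))
                facts.extend(flatten(member))
            else:
                facts.append((molecule.oid, method, member))
    return facts

# The fact from the XML comment; string literals keep their quotes,
# and the method name "assistents" is spelled as in the source.
cs1 = Molecule("cs1", {"dname": '"CS"', "mngr": "bob", "assistents": ["john", "sally"]})
bob = Molecule("bob", {"name": '"Bob"', "age": 40, "affiliation": cs1})

facts = flatten(bob)
for obj, method, value in facts:
    print(f"{obj}[{method} -> {value}]")
```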
Portal
Query
Databases
Performance
Possible uses for LTER Europe
Distributed Data Mining with local tools
Distributed Socio-Ecological Data
SERONTO & Domain Ontologies
common concepts and domain knowledge
Portal
Seamless access... Ready for use now