PalGov © 2011
The Palestinian eGovernment Academy
www.egovacademy.ps
Tutorial II: Data Integration and Open Information Systems
Session 13.2
GAV and LAV Integration
Dr. Mustafa Jarrar
University of Birzeit
www.jarrar.info
About
This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the
Commission of the European Communities, grant agreement
511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps
Project Consortium:
University of Trento, Italy
University of Namur, Belgium
Vrije Universiteit Brussel, Belgium
TrueTrust, UK
Birzeit University, Palestine (Coordinator)
Palestine Polytechnic University, Palestine
Palestine Technical University, Palestine
Université de Savoie, France
Ministry of Local Government, Palestine
Ministry of Telecom and IT, Palestine
Ministry of Interior, Palestine
Coordinator:
Dr. Mustafa Jarrar
Birzeit University, P.O. Box 14, Birzeit, Palestine
Telfax: +972 2 2982935 [email protected]
© Copyright Notes
Everyone is encouraged to use this material, or part of it, but should
properly cite the project (logo and website), and the author of that part.
No part of this tutorial may be reproduced or modified in any form or by
any means, without prior written permission from the project, who have
the full copyrights on the material.
Attribution-NonCommercial-ShareAlike
CC-BY-NC-SA
This license lets others remix, tweak, and build upon your work
non-commercially, as long as they credit you and license their new creations
under the identical terms.
Tutorial Map
Topic Hours
Session 1: XML Basics and Namespaces 3
Session 2: XML DTDs 3
Session 3: XML Schemas 3
Session 4: Lab-XML Schemas 3
Session 5: RDF and RDFs 3
Session 6: Lab-RDF and RDFs 3
Session 7: OWL (Web Ontology Language) 3
Session 8: Lab-OWL 3
Session 9: Lab-RDF Stores: Challenges and Solutions 3
Session 10: Lab-SPARQL 3
Session 11: Lab-Oracle Semantic Technology 3
Session 12_1: The problem of Data Integration 1.5
Session 12_2: Architectural Solutions for the Integration Issues 1.5
Session 13_1: Data Schema Integration 1
Session 13_2: GAV and LAV Integration 1
Session 13_3: Data Integration and Fusion using RDF 1
Session 14: Lab-Data Integration and Fusion using RDF 3
Session 15_1: Data Web and Linked Data 1.5
Session 15_2: RDFa 1.5
Session 16: Lab-RDFa 3
Intended Learning Objectives
A: Knowledge and Understanding
2a1: Describe tree and graph data models.
2a2: Understand the notation of XML, RDF, RDFS, and OWL.
2a3: Demonstrate knowledge of querying techniques for data
models, such as SPARQL and XPath.
2a4: Explain the concepts of identity management and Linked Data.
2a5: Demonstrate knowledge of the integration and fusion of
heterogeneous data.
B: Intellectual Skills
2b1: Represent data using tree and graph data models (XML &
RDF).
2b2: Describe data semantics using RDFS and OWL.
2b3: Manage and query data represented in RDF, XML, OWL.
2b4: Integrate and fuse heterogeneous data.
C: Professional and Practical Skills
2c1: Using Oracle Semantic Technology and/or Virtuoso to store
and query RDF stores.
D: General and Transferable Skills
2d1: Working in a team.
2d2: Presenting and defending ideas.
2d3: Use of creativity and innovation in problem solving.
2d4: Develop communication skills and logical reasoning abilities.
Module ILOs
After completing this module students will be able to:
- Understand and apply GAV and LAV integration.
More about GAV and LAV Integration
Mapping in GAV:
• A GAV mapping is a set of queries on the local sources S1, S2,
…, Sn (which contain the real data), one for each element g of
the global schema.
• Such queries can be expressed in SQL or in a formal
logic. We will follow the first approach.
• g = SQL command (S1, S2, …, Sn)
• This means that the mapping tells us exactly how the
element g is computed from the local sources.
Mapping in LAV:
• A LAV mapping is a set of queries on the global schema
(which contains virtual data), one for each local source Si (which
contains the real data).
• Si = SQL command (GS), where GS is the global schema.
• In LAV, views express how the sources contribute to the
global schema (and the related virtual database instance).
EXAMPLE
Global Schema: GProf (Name, Age)

Source S1 contains a first set of professors.
Schema: S1 (Name, Age)
S1      Name    Age
        Khaled  24
        Munir   51

Source S2 contains a second set of professors.
Schema: S2 (Name, Age)
S2      Name    Age
        Layla   56
        Khaled  24

Expected extension
GProf   Name    Age
        Khaled  24
        Munir   51
        Layla   56
EXAMPLE – GAV Mapping
Let’s define the global schema as a view on the data sources:

CREATE VIEW GProf AS
SELECT S1.Name AS Name, S1.Age AS Age FROM S1
UNION
SELECT S2.Name AS Name, S2.Age AS Age FROM S2

The extension of this view is:
GProf   Name    Age
        Khaled  24
        Munir   51
        Layla   56

This view is called ‘EXACT’ because it is exactly equal to the
expected extension.
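
The GAV mapping above can be tried out with a minimal sketch. It assumes, purely for illustration, that both sources live in one in-memory SQLite database (real sources would be separate systems), and the variable names `expected_extension`, `view_extension`, and `is_exact` are hypothetical.

```python
import sqlite3

# Assumption: one in-memory database stands in for the two local sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S1 (Name TEXT, Age INTEGER);
    CREATE TABLE S2 (Name TEXT, Age INTEGER);
    INSERT INTO S1 VALUES ('Khaled', 24), ('Munir', 51);
    INSERT INTO S2 VALUES ('Layla', 56), ('Khaled', 24);

    -- GAV mapping: the global element GProf is a query over the sources.
    CREATE VIEW GProf AS
        SELECT S1.Name AS Name, S1.Age AS Age FROM S1
        UNION
        SELECT S2.Name AS Name, S2.Age AS Age FROM S2;
""")

expected_extension = {("Khaled", 24), ("Munir", 51), ("Layla", 56)}
view_rows = conn.execute("SELECT Name, Age FROM GProf").fetchall()
view_extension = set(view_rows)

# UNION drops the duplicate ('Khaled', 24), so the view returns three rows
# and coincides with the expected extension.
is_exact = view_extension == expected_extension
print(sorted(view_extension), is_exact)
```

Using UNION ALL instead of UNION would return ('Khaled', 24) twice, so the choice of UNION is what keeps the row count in line with the expected extension here.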
We want to query the global schema to extract the names of professors who are older than 50 years.
LET’S QUERY!

SELECT GProf.Name
FROM GProf
WHERE Age > 50
TRY TO EXECUTE THE QUERY:

SELECT GProf.Name
FROM GProf
WHERE Age > 50

You should have performed the following process:
Substitution of GProf with the definition of the view:

SELECT GProf.Name
FROM (SELECT S1.Name, S1.Age FROM S1 UNION …) AS GProf
WHERE Age > 50
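
The substitution step can be sketched as follows. This is a naive textual rewrite for this one query, not a general unfolding algorithm; the in-memory SQLite database and the names `VIEW_DEFINITION`, `query_on_global`, and `unfolded_query` are assumptions made for the illustration.

```python
import sqlite3

# The GAV mapping of GProf, kept as text so it can be substituted into queries.
VIEW_DEFINITION = """
    SELECT S1.Name AS Name, S1.Age AS Age FROM S1
    UNION
    SELECT S2.Name AS Name, S2.Age AS Age FROM S2
"""

# Assumption: one in-memory database stands in for the two local sources.
conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE S1 (Name TEXT, Age INTEGER);
    CREATE TABLE S2 (Name TEXT, Age INTEGER);
    INSERT INTO S1 VALUES ('Khaled', 24), ('Munir', 51);
    INSERT INTO S2 VALUES ('Layla', 56), ('Khaled', 24);
    CREATE VIEW GProf AS {VIEW_DEFINITION};
""")

query_on_global = "SELECT GProf.Name FROM GProf WHERE GProf.Age > 50"

# Unfolding: replace the reference to GProf with its definition over S1 and S2.
unfolded_query = query_on_global.replace(
    "FROM GProf", f"FROM ({VIEW_DEFINITION}) AS GProf"
)

global_result = sorted(conn.execute(query_on_global).fetchall())
unfolded_result = sorted(conn.execute(unfolded_query).fetchall())
print(global_result, unfolded_result)
```

Both queries return the same names, which is the point of unfolding: the rewritten query mentions only the local sources.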
Results:
GProf   Name    Age
        Munir   51
        Layla   56
(The query selects only Name, so it returns Munir and Layla.)
How is the query executed:
The query is expressed and executed by the mediator naturally: in GAV,
to execute the query we only have to substitute the references to GProf
in the query with the mapping of GProf in terms of the local schemas
(this operation is called unfolding).
EXAMPLE – LAV Mapping
Here the mapping describes the contribution of the local sources to the
expected extension of the global schema.

Mapping for S1 (Name, Age):

CREATE VIEW S1 (Name, Age) AS
SELECT GProf.Name AS Name, GProf.Age AS Age
FROM GProf
Mapping for S2 (Name, Age):

CREATE VIEW S2 (Name, Age) AS
SELECT GProf.Name AS Name, GProf.Age AS Age
FROM GProf
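
One way to read these LAV mappings is sketched below. It assumes the views are interpreted as sound, meaning each source holds a subset of what its view would return on the global instance; the slides do not state this assumption explicitly. The expected extension is loaded as a candidate global instance, and the names `lav_mappings`, `source_data`, and `is_sound` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Candidate instance of the (virtual) global relation: the expected extension.
    CREATE TABLE GProf (Name TEXT, Age INTEGER);
    INSERT INTO GProf VALUES ('Khaled', 24), ('Munir', 51), ('Layla', 56);
""")

# LAV mappings: one query over the global schema for each local source.
lav_mappings = {
    "S1": "SELECT GProf.Name AS Name, GProf.Age AS Age FROM GProf",
    "S2": "SELECT GProf.Name AS Name, GProf.Age AS Age FROM GProf",
}

# The real data actually stored in each source.
source_data = {
    "S1": {("Khaled", 24), ("Munir", 51)},
    "S2": {("Layla", 56), ("Khaled", 24)},
}

def is_sound(source):
    """Check that every tuple of the source appears in its view over GProf."""
    view_result = set(conn.execute(lav_mappings[source]))
    return source_data[source] <= view_result

soundness = {source: is_sound(source) for source in lav_mappings}
print(soundness)
```

Under this reading, each source contributes some of the tuples of GProf, and neither source alone has to contain all of them.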
Query Execution:
Let’s see the mapping as a query on the global schema. In this case the
mediator cannot perform the unfolding operation during query execution,
since the mapping is in the opposite direction.
So, the mediator has to perform some reasoning.
The mediator may adopt a strategy in which, starting from the definitions
of the mappings, it looks for names of professors in both views
and subsequently fuses the results.
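
The strategy described here can be sketched for this specific example. It is hand-written for the two mappings of the example, where both sources expose the same attributes as GProf, and it is not a general LAV query-rewriting algorithm. The in-memory SQLite database and the function name `answer_over_sources` are assumptions.

```python
import sqlite3

# Assumption: one in-memory database stands in for the two local sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S1 (Name TEXT, Age INTEGER);
    CREATE TABLE S2 (Name TEXT, Age INTEGER);
    INSERT INTO S1 VALUES ('Khaled', 24), ('Munir', 51);
    INSERT INTO S2 VALUES ('Layla', 56), ('Khaled', 24);
""")

def answer_over_sources(min_age):
    """Names of professors older than min_age, collected from both sources."""
    names = set()
    # Both sources are described as views over GProf(Name, Age), so the
    # mediator can pose the same selection to each of them.
    for source in ("S1", "S2"):
        rows = conn.execute(f"SELECT Name FROM {source} WHERE Age > ?", (min_age,))
        names.update(name for (name,) in rows)
    # Fusion: the set removes professors reported by more than one source.
    return sorted(names)

older_than_50 = answer_over_sources(50)
all_professors = answer_over_sources(0)
print(older_than_50, all_professors)
```

Khaled appears in both sources but only once in the fused answer, which is the fusion step the mediator has to perform itself because no single view definition over the sources does it.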
References
• Carlo Batini: Course on Data Integration. BZU IT Summer School
2011.
• Stefano Spaccapietra: Information Integration. Presentation at the IFIP
Academy. Porto Alegre. 2005.
• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI
International, Artificial Intelligence Center. Menlo Park, USA. 2009.