Data Integration with CloverETL
Fariz Darari (FU Bolzano) [email protected]
Outline
1. Information Integration
2. CloverETL
3. Demo
   – Global Schema
   – Data Sources
   – Queries
Information Integration
Information integration (II) aims to provide uniform access to data stored in a number of autonomous and heterogeneous sources.
Challenges
• Different data models (structured, semi-structured, text)
• Different schemata
• Differences in the representation of
  – values (km vs. miles, USD vs. EUR)
  – entities (addresses, dates, etc.)
• Inconsistencies among the data
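As a small illustration of the representation differences above, the following sketch normalizes distances to kilometres and parses dates written in several formats. The function names and the list of date formats are assumptions made for this example; they do not come from any particular source or tool.

```python
from datetime import date, datetime

KM_PER_MILE = 1.609344  # exact, by definition of the international mile

def to_km(value, unit):
    """Convert a distance to kilometres; `unit` is 'km' or 'mi'."""
    if unit == "km":
        return value
    if unit == "mi":
        return value * KM_PER_MILE
    raise ValueError(f"unknown unit: {unit}")

# Hypothetical date formats that different sources might use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def parse_date(text):
    """Try each known format in turn and return a datetime.date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognised date: {text}")
```

Currency conversion (USD vs. EUR) would follow the same pattern but needs an exchange rate for a given day, so it is left out here.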
Components
• Consists of:
  1. Global Schema: the unifying schema among the local schemata.
  2. Wrappers: make the sources accessible.
  3. Mediators: translate queries and combine the answers of wrappers and other mediators.
Information Integration - GAV
• An approach to mapping between the source schemata and the global schema
• GAV (Global-as-View) = relations in the global schema are defined as views over the sources
• Views are virtual relations, so the global schema describes a virtual DB
CloverETL
• An open-source-based platform for information integration.
• Data can be:
  – extracted from any number of sources
  – validated and modified along the way
  – written to one or more destinations.
CloverETL - Designer
• Transformation graphs are created in CloverETL Designer.
• Transformation graphs are divided into:
  – Extract (Green)
  – Transformation (Yellow)
  – Load (Blue)
• The edges correspond to the data flows from data sources to data targets.
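The extract, transform, and load stages can be sketched in plain Python as three functions chained together, with each call playing the role of an edge in the graph. This is only an analogy and does not use the CloverETL API; the CSV layout follows the StudentMI source from the demo, and the sample rows are hypothetical.

```python
import csv
import io

def extract(csv_text):
    """Extract: read records from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(records):
    """Transform: validate and reshape records; rows with a non-numeric age are dropped."""
    for rec in records:
        if rec["age"].isdigit():
            yield {"sid": rec["student_id"], "sname": rec["name"], "age": int(rec["age"])}

def load(records, target):
    """Load: append the records to the target (a list standing in for a database table)."""
    target.extend(records)
    return target

# Hypothetical sample data; the second row fails validation.
SAMPLE = "student_id,name,age\nmi1,Lena,23\nmi2,Luca,n/a\n"

target = load(transform(extract(SAMPLE)), [])
```

After running the pipeline, `target` holds only the row that passed validation.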
Global Schema - Example
• Student(sid, sname, age, nationality)
• Country(cid, cname, currency)
Data Sources
• Unibz (Bolzano), from Relational DB
  – StudentBZ(id, name, sex, age, nationality, address)
• Unitr (Trento), from XML
  – StudentTR(id, full_name, age, nationality)
• Unimi (Milan), from CSV
  – StudentMI(student_id, name, gender, age, citizenship)
• UN (United Nations), from Excel
  – CountryUN(id, country_name, population, capital, currency)
Data Sources - Mapping
• Student(sid, sname, age, nationality) :- StudentBZ(sid, sname, _, age, nationality, _)
• Student(sid, sname, age, nationality) :- StudentTR(sid, sname, age, nationality)
• Student(sid, sname, age, nationality) :- StudentMI(sid, sname, _, age, nationality)
• Country(cid, cname, currency) :- CountryUN(cid, cname, _, _, currency)
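Because GAV defines each global relation as a view over the sources, the mappings above can be mimicked with SQL views. The sketch below uses Python's built-in sqlite3 module and assumes, for illustration only, that the four sources have already been loaded into local tables (in a real setting, wrappers would expose them). All sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source relations, with the attribute lists from the Data Sources slide.
CREATE TABLE StudentBZ(id, name, sex, age, nationality, address);
CREATE TABLE StudentTR(id, full_name, age, nationality);
CREATE TABLE StudentMI(student_id, name, gender, age, citizenship);
CREATE TABLE CountryUN(id, country_name, population, capital, currency);

-- Global relations as views: one SELECT per mapping rule, combined with UNION.
CREATE VIEW Student AS
    SELECT id AS sid, name AS sname, age, nationality FROM StudentBZ
    UNION
    SELECT id, full_name, age, nationality FROM StudentTR
    UNION
    SELECT student_id, name, age, citizenship FROM StudentMI;

CREATE VIEW Country AS
    SELECT id AS cid, country_name AS cname, currency FROM CountryUN;
""")

# Hypothetical sample rows (population is left as NULL).
conn.execute("INSERT INTO StudentBZ VALUES ('bz1', 'Anna', 'F', 24, 'Italy', 'Bolzano')")
conn.execute("INSERT INTO StudentTR VALUES ('tr1', 'Marco', 21, 'Italy')")
conn.execute("INSERT INTO StudentMI VALUES ('mi1', 'Lena', 'F', 23, 'Germany')")
conn.execute("INSERT INTO CountryUN VALUES (1, 'Italy', NULL, 'Rome', 'EUR')")
conn.execute("INSERT INTO CountryUN VALUES (2, 'Germany', NULL, 'Berlin', 'EUR')")

students = conn.execute(
    "SELECT sid, sname, age, nationality FROM Student ORDER BY sid"
).fetchall()
countries = conn.execute(
    "SELECT cid, cname, currency FROM Country ORDER BY cid"
).fetchall()
```

The views store no data of their own: every query against Student or Country is answered from the source tables, which is the sense in which the global schema describes a virtual database.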
Queries
1. All students with their information.
   q(sid, sname, age, nationality) :- Student(sid, sname, age, nationality).
2. All students whose age is more than 22.
   q(sid, sname) :- Student(sid, sname, age, nationality), age > 22.
3. All students with their nationality’s currency.
   q(sid, sname, age, nationality, currency) :- Student(sid, sname, age, nationality), Country(cid, nationality, currency).
4. The number of students per country.
   SELECT nationality, count(sid) FROM Student GROUP BY nationality
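To make the meaning of the four queries concrete, here is a minimal Python sketch that evaluates them over small in-memory versions of the global relations. The tuples are hypothetical sample data, and the variable names are chosen for this example only.

```python
from collections import Counter

# Global relations as lists of tuples (hypothetical rows).
student = [  # (sid, sname, age, nationality)
    ("bz1", "Anna", 24, "Italy"),
    ("tr1", "Marco", 21, "Italy"),
    ("mi1", "Lena", 23, "Germany"),
]
country = [  # (cid, cname, currency)
    (1, "Italy", "EUR"),
    (2, "Germany", "EUR"),
]

# 1. All students with their information.
q1 = list(student)

# 2. All students whose age is more than 22 (selection, then projection).
q2 = [(sid, sname) for sid, sname, age, _ in student if age > 22]

# 3. Students with their nationality's currency (join on nationality = cname).
q3 = [
    (sid, sname, age, nat, currency)
    for sid, sname, age, nat in student
    for _, cname, currency in country
    if cname == nat
]

# 4. The number of students per country (GROUP BY nationality with count).
q4 = Counter(nat for _, _, _, nat in student)
```

In query 3 the shared variable nationality in the Datalog rule becomes the explicit condition `cname == nat`.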
Demo
• Query:
  q(sid, sname) :- Student(sid, sname, age, nationality), age > 22.
• Logical Plans:
  q(sid, sname) :- StudentBZ(sid, sname, _, age, nationality, _), age > 22.
  q(sid, sname) :- StudentTR(sid, sname, age, nationality), age > 22.
  q(sid, sname) :- StudentMI(sid, sname, _, age, nationality), age > 22.
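The unfolding of the query into one logical plan per source can be sketched as follows. Each source is paired with a function that projects its rows onto the global Student attributes, the filter is applied per source, and the answers are unioned. The sample rows and the names `SOURCES` and `students_older_than` are assumptions for illustration.

```python
# Hypothetical source rows, following the source schemas from the demo.
student_bz = [("bz1", "Anna", "F", 24, "Italy", "Bolzano")]  # id, name, sex, age, nationality, address
student_tr = [("tr1", "Marco", 21, "Italy")]                 # id, full_name, age, nationality
student_mi = [("mi1", "Lena", "F", 23, "Germany")]           # student_id, name, gender, age, citizenship

# One entry per mapping rule: the source rows and a projection onto
# the global attributes (sid, sname, age, nationality).
SOURCES = [
    (student_bz, lambda r: (r[0], r[1], r[3], r[4])),
    (student_tr, lambda r: (r[0], r[1], r[2], r[3])),
    (student_mi, lambda r: (r[0], r[1], r[3], r[4])),
]

def students_older_than(threshold):
    """Evaluate q(sid, sname) by running one plan per source and taking the union."""
    answers = set()
    for rows, to_global in SOURCES:
        for row in rows:
            sid, sname, age, _nationality = to_global(row)
            if age > threshold:
                answers.add((sid, sname))
    return answers

result = students_older_than(22)
```

With the sample rows, only the Bolzano and Milan plans contribute answers, since the Trento student is 21.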
References
• http://www.cloveretl.com/
• http://www.inf.unibz.it/~nutt/InfInt1112/