Data Integration with CloverETL


Upload: fariz-darari

Post on 11-May-2015


Page 1: Data Integration with CloverETL

Fariz Darari (FU Bolzano) [email protected]

The Practical Side of Information Integration with CloverETL

Page 2: Data Integration with CloverETL


Outline

1. Information Integration
2. CloverETL
3. Demo
– Global Schema
– Data Sources
– Queries

Page 3: Data Integration with CloverETL


INFORMATION INTEGRATION

Page 4: Data Integration with CloverETL


Information Integration

Information integration (II) aims to provide uniform access to data stored in a number of autonomous and heterogeneous sources.

Page 5: Data Integration with CloverETL


Challenges

• Different data models (structured, semi-structured, text)

• Different schemata

• Differences in the representation of
– values (km vs. miles, USD vs. EUR)
– entities (addresses, dates, etc.)

• Inconsistencies among the data
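The representation differences listed above are typically resolved by a normalisation step before data from different sources are combined. The following is a minimal sketch, not taken from the slides: the function names, the fixed exchange rate, and the accepted date formats are all assumptions chosen for illustration.

```python
from datetime import datetime

KM_PER_MILE = 1.609344  # exact, by definition of the international mile

# Hypothetical fixed rate for illustration only; a real system would look rates up.
ASSUMED_USD_PER_EUR = 1.10


def miles_to_km(miles: float) -> float:
    """Convert a distance in miles to kilometres."""
    return miles * KM_PER_MILE


def eur_to_usd(amount_eur: float, usd_per_eur: float = ASSUMED_USD_PER_EUR) -> float:
    """Convert an amount in EUR to USD using the given (assumed) rate."""
    return amount_eur * usd_per_eur


def normalize_date(text: str) -> str:
    """Return an ISO 8601 date (YYYY-MM-DD) from one of a few assumed input formats."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognised date format: {text!r}")
```

Each source would apply only the conversions it needs, so that all values reach the global schema in one unit and one date format.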

Page 6: Data Integration with CloverETL


Components

• An information integration system consists of:

1. Global Schema
The unifying schema among local schemata.

2. Wrappers
Wrappers make sources accessible.

3. Mediators
Translate queries, combine answers of wrappers and other mediators.
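The division of labour between wrappers and mediators can be sketched in a few lines. This is an illustrative sketch only: the class names `Wrapper` and `Mediator`, the `rows()` and `query()` methods, and the dictionary-based row format are assumptions, not part of CloverETL or of the slides. Query translation is reduced here to passing a Python predicate through.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class Wrapper:
    """Makes one source accessible as rows that use the global schema's attribute names."""

    def __init__(self, fetch: Callable[[], Iterable[Row]], rename: Dict[str, str]):
        self._fetch = fetch      # returns the source's native rows
        self._rename = rename    # source attribute name -> global attribute name

    def rows(self) -> List[Row]:
        # Keep only the mapped attributes and rename them to the global names.
        return [
            {global_name: row[source_name]
             for source_name, global_name in self._rename.items()}
            for row in self._fetch()
        ]


class Mediator:
    """Combines the answers of wrappers and of other mediators."""

    def __init__(self, sources: Iterable[object]):
        self._sources = list(sources)  # anything that offers a rows() method

    def rows(self) -> List[Row]:
        combined: List[Row] = []
        for source in self._sources:
            combined.extend(source.rows())
        return combined

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Answer a selection query over the combined rows."""
        return [row for row in self.rows() if predicate(row)]
```

Because a `Mediator` exposes the same `rows()` method as a `Wrapper`, mediators can be nested, which mirrors the statement that mediators combine answers of wrappers and other mediators.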

Page 7: Data Integration with CloverETL


Information Integration - GAV

• An approach to mapping between the source schemata and the global schema

• GAV (Global-as-View): relations in the global schema are defined as views over the sources

• Views are virtual relations, so the global schema describes a virtual DB

Page 8: Data Integration with CloverETL


Information Integration - GAV

Page 9: Data Integration with CloverETL


Information Integration - ETL

Page 10: Data Integration with CloverETL


Information Integration - ETL Products

Page 11: Data Integration with CloverETL


CLOVERETL

Page 12: Data Integration with CloverETL


CloverETL

• An open-source-based platform for information integration.

• Data can be:
– extracted from any number of sources
– validated and modified along the way
– written to one or more destinations
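The three stages above (extract, validate and modify, write) can be sketched as a chain of Python generators. This is a generic ETL sketch and does not use any CloverETL API; the function names, the `name` and `age` fields, and the validation rule are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def extract(csv_text: str) -> Iterator[Dict[str, str]]:
    """Extract: read records from a CSV source (here an in-memory string)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Validate and modify: drop records without a numeric age, clean the rest."""
    for record in records:
        age = (record.get("age") or "").strip()
        if not age.isdigit():
            continue  # validation failed, skip the record
        yield {"name": (record.get("name") or "").strip(), "age": int(age)}


def load(records: Iterable[Dict[str, object]], destination: List[Dict[str, object]]) -> int:
    """Load: write records to a destination (here a list) and return the count."""
    count = 0
    for record in records:
        destination.append(record)
        count += 1
    return count
```

A run is then a single composition such as `load(transform(extract(text)), target)`, with records streaming through one at a time.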

Page 13: Data Integration with CloverETL


CloverETL - Company

Page 14: Data Integration with CloverETL


CloverETL - Architecture

Page 15: Data Integration with CloverETL


CloverETL - Designer

Page 16: Data Integration with CloverETL


CloverETL - Designer

• Transformation graphs are created in CloverETL Designer.

• Transformation graphs are divided into:
– Extract (Green)
– Transformation (Yellow)
– Load (Blue)

• The edges correspond to the data flows from data sources to data targets.
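One way to reason about such a graph is as a directed acyclic graph whose edges carry data from sources to targets. The sketch below models that idea with Python's standard `graphlib` module. The component names are hypothetical, and the ordering shown is only a way to read the data flow, not a description of how the CloverETL engine schedules components.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical components; each one maps to the components whose output it reads.
GRAPH: Dict[str, Set[str]] = {
    "read_bz": set(),                    # Extract
    "read_tr": set(),                    # Extract
    "merge": {"read_bz", "read_tr"},     # Transformation
    "filter_by_age": {"merge"},          # Transformation
    "write_result": {"filter_by_age"},   # Load
}


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return the components in an order that respects every data-flow edge."""
    return list(TopologicalSorter(graph).static_order())
```

Any order returned places every reader before the merge and the writer last; the relative order of the two readers is not fixed.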

Page 17: Data Integration with CloverETL


DEMO

Page 18: Data Integration with CloverETL


Global Schema

Page 19: Data Integration with CloverETL


Global Schema - Example

• Student(sid, sname, age, nationality)

• Country(cid, cname, currency)

Page 20: Data Integration with CloverETL


Data Sources

• Unibz (Bolzano), from Relational DB
– StudentBZ(id, name, sex, age, nationality, address)

• Unitr (Trento), from XML
– StudentTR(id, full_name, age, nationality)

• Unimi (Milan), from CSV
– StudentMI(student_id, name, gender, age, citizenship)

• UN (United Nations), from Excel
– CountryUN(id, country_name, population, capital, currency)
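To show what reading two of these heterogeneous sources might look like outside CloverETL, here is a minimal sketch using only the Python standard library. The attribute names come from the source schemata above; the function names, the CSV header row, and the XML element layout (`<student>` elements with one child element per attribute) are assumptions, since the slides do not show the files themselves.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_student_mi(csv_text: str) -> List[Tuple[str, str, str, int, str]]:
    """Read StudentMI(student_id, name, gender, age, citizenship) from CSV text.

    Assumes a header row that uses exactly these attribute names.
    """
    return [
        (row["student_id"], row["name"], row["gender"], int(row["age"]), row["citizenship"])
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def read_student_tr(xml_text: str) -> List[Tuple[str, str, int, str]]:
    """Read StudentTR(id, full_name, age, nationality) from XML text.

    Assumes one <student> element per record, with a child element per attribute.
    """
    root = ET.fromstring(xml_text)
    return [
        (
            student.findtext("id"),
            student.findtext("full_name"),
            int(student.findtext("age")),
            student.findtext("nationality"),
        )
        for student in root.iter("student")
    ]
```

Both functions return plain tuples in the attribute order of their source relation, which is the form the mappings to the global schema operate on.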


Page 21: Data Integration with CloverETL


Data Sources - Mapping

• Student(sid, sname, age, nationality) :- StudentBZ(sid, sname, _, age, nationality, _)

• Student(sid, sname, age, nationality) :- StudentTR(sid, sname, age, nationality)

• Student(sid, sname, age, nationality) :- StudentMI(sid, sname, _, age, nationality)

• Country(cid, cname, currency) :- CountryUN(cid, cname, _, _, currency)
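These GAV mappings can also be read as SQL view definitions: each global relation is the union of one projection per source rule. The sketch below expresses them as SQLite views using Python's `sqlite3` module. It assumes, purely for illustration, that all four sources have already been loaded into tables of one database; the function name `build_global_schema` is hypothetical.

```python
import sqlite3

GAV_VIEWS = """
-- Source relations (assumed to be loaded into one database for this sketch).
CREATE TABLE StudentBZ(id, name, sex, age, nationality, address);
CREATE TABLE StudentTR(id, full_name, age, nationality);
CREATE TABLE StudentMI(student_id, name, gender, age, citizenship);
CREATE TABLE CountryUN(id, country_name, population, capital, currency);

-- One SELECT per mapping rule; UNION gives the set semantics of the rules.
CREATE VIEW Student(sid, sname, age, nationality) AS
    SELECT id, name, age, nationality FROM StudentBZ
    UNION
    SELECT id, full_name, age, nationality FROM StudentTR
    UNION
    SELECT student_id, name, age, citizenship FROM StudentMI;

CREATE VIEW Country(cid, cname, currency) AS
    SELECT id, country_name, currency FROM CountryUN;
"""


def build_global_schema() -> sqlite3.Connection:
    """Create an in-memory database holding the source tables and the GAV views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(GAV_VIEWS)
    return conn
```

The views store no data of their own, which matches the point that the global schema describes a virtual database.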

Page 22: Data Integration with CloverETL


Queries

1. All students with their information.
q(sid, sname, age, nationality) :- Student(sid, sname, age, nationality).

2. All students whose age is more than 22.
q(sid, sname) :- Student(sid, sname, age, nationality), age > 22.

3. All students with their nationality’s currency.
q(sid, sname, age, nationality, currency) :- Student(sid, sname, age, nationality), Country(cid, nationality, currency).

4. The number of students per country.
SELECT nationality, count(sid) FROM Student GROUP BY nationality
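For comparison, the four queries can be written in SQL and run against a small stand-in for the global schema. In this sketch the `Student` and `Country` relations are ordinary SQLite tables filled with made-up rows; the helper names `make_demo_db` and `run_query` and the sample data are assumptions for illustration, not data from the demo.

```python
import sqlite3
from typing import List, Tuple

QUERIES = {
    # 1. All students with their information.
    "all_students": "SELECT sid, sname, age, nationality FROM Student",
    # 2. All students whose age is more than 22.
    "older_than_22": "SELECT sid, sname FROM Student WHERE age > 22",
    # 3. All students with their nationality's currency (join on the country name).
    "with_currency": (
        "SELECT s.sid, s.sname, s.age, s.nationality, c.currency "
        "FROM Student AS s JOIN Country AS c ON c.cname = s.nationality"
    ),
    # 4. The number of students per country.
    "students_per_country": (
        "SELECT nationality, count(sid) FROM Student GROUP BY nationality"
    ),
}


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up rows standing in for the global schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Student(sid, sname, age, nationality);
        CREATE TABLE Country(cid, cname, currency);
        INSERT INTO Student VALUES
            (1, 'Anna', 23, 'Italy'), (2, 'Hans', 21, 'Germany'), (3, 'Marco', 25, 'Italy');
        INSERT INTO Country VALUES (1, 'Italy', 'EUR'), (2, 'Germany', 'EUR');
    """)
    return conn


def run_query(conn: sqlite3.Connection, name: str) -> List[Tuple]:
    """Run one of the named queries and return all result rows."""
    return conn.execute(QUERIES[name]).fetchall()
```

Query 3 shows why the `Country` relation is part of the global schema: the currency is only reachable by joining on the student's nationality.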

Page 23: Data Integration with CloverETL


Demo

• Query:
q(sid, sname) :- Student(sid, sname, age, nationality), age > 22.

• Logical Plans:
q(sid, sname) :- StudentBZ(sid, sname, _, age, nationality, _), age > 22.
q(sid, sname) :- StudentTR(sid, sname, age, nationality), age > 22.
q(sid, sname) :- StudentMI(sid, sname, _, age, nationality), age > 22.
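The step from the query to its logical plans is view unfolding: the global atom is replaced by the body of each mapping rule that defines it. A minimal sketch of that step for a query with a single global atom is given below. The `MAPPINGS` structure and the `unfold` function are hypothetical names introduced here, and plans are produced as plain strings rather than executable operators.

```python
from typing import Dict, List, Sequence, Tuple

# GAV mappings: global relation -> list of (view variables, source relation, source arguments).
Mapping = Tuple[Tuple[str, ...], str, Tuple[str, ...]]

STUDENT_VARS = ("sid", "sname", "age", "nationality")

MAPPINGS: Dict[str, List[Mapping]] = {
    "Student": [
        (STUDENT_VARS, "StudentBZ", ("sid", "sname", "_", "age", "nationality", "_")),
        (STUDENT_VARS, "StudentTR", ("sid", "sname", "age", "nationality")),
        (STUDENT_VARS, "StudentMI", ("sid", "sname", "_", "age", "nationality")),
    ],
}


def unfold(head: str, relation: str, args: Sequence[str], condition: str) -> List[str]:
    """Unfold one global atom into one logical plan per mapping rule."""
    plans = []
    for view_vars, source, source_args in MAPPINGS[relation]:
        # Rename the variables of the view definition to those used in the query atom.
        renaming = dict(zip(view_vars, args))
        body_args = [renaming.get(arg, arg) for arg in source_args]
        plans.append(f"{head} :- {source}({', '.join(body_args)}), {condition}.")
    return plans
```

Calling `unfold("q(sid, sname)", "Student", STUDENT_VARS, "age > 22")` yields the three logical plans listed above, one per source.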

Page 24: Data Integration with CloverETL


Demo - Execution Plan