Rule-based Management of Schema Changes at ETL sources
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), T. Sellis (1,4), Y. Vassiliou (1)
(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas, yv}@dblab.ece.ntua.gr
(2) University of Ioannina, Ioannina, Hellas (Greece) [email protected]
(3) HP Labs, Palo Alto, California, USA [email protected]
(4) Institute for the Management of Information Systems (Greece) [email protected]
MEDWa ‘09, Riga, September 2009 2
Outline
• Motivation
• Graph-based representation of ETL processes
• Regulating ETL Evolution
• Hecataeus Internals
• Conclusions
Data Warehouse Schema Evolution
Data warehouses are evolving environments, e.g.:
• A dimension is removed or renamed
• The structure of a dimension table is updated
• A fact table is completely decoupled from a dimension
• The measures of a fact table change
• An ETL source is modified, etc.
Evolving ETL sources…
• Schema changes occur at the sources of ETL processes. Design constructs are:
– Added, removed, or modified
• ETL processes are affected:
– Syntactically, i.e., they become invalid
– Semantically, i.e., they must conform to the new source database semantics
• Adaptation of ETL flows is:
– a time-consuming task
– handled manually, in most cases, by the administrators/developers
We would like to know...
• What part of the process is affected, and how, if, e.g., an attribute is deleted?
• Can we predict and handle the impact of changes?
• To what extent can readjustment be automated?
Hecataeus Framework
• Mechanism for performing what-if analysis for potential changes of ETL sources
• Graph-based representation of ETL workflows
• Annotation of the graph with rules for adapting ETL processes to source schema evolution
• Evolution events are mapped to changes on the graph constructs
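One way to picture the what-if analysis is as a reachability question: given a changed construct, which other constructs depend on it, directly or transitively? The sketch below is a minimal illustration in Python, not the Hecataeus implementation. The function name `affected_by` and the `depends_on` dictionary are assumptions made for this example, and treating a query as dependent on its own output attributes is a simplification.

```python
from collections import deque


def affected_by(changed, depends_on):
    """Return every node that directly or transitively depends on `changed`.

    `depends_on` maps a node to the nodes it relies on (hypothetical format).
    """
    # Invert the mapping: provider -> nodes that rely on it.
    dependents = {}
    for node, providers in depends_on.items():
        for provider in providers:
            dependents.setdefault(provider, []).append(node)

    # Breadth-first traversal over the inverted edges.
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Assumed toy workload: query Q reads EMP.Emp# and WORKS.Hours.
depends_on = {
    "Q.Emp#": ["EMP.Emp#"],
    "Q.T_Hours": ["WORKS.Hours"],
    "Q": ["Q.Emp#", "Q.T_Hours"],  # simplification: Q is affected if an output is
}

# What if attribute WORKS.Hours is deleted?
print(sorted(affected_by("WORKS.Hours", depends_on)))
```

With this toy input, deleting WORKS.Hours touches the output attribute Q.T_Hours and, through it, the query Q, while Q.Emp# is left alone.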
Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours
FROM EMP, WORKS
WHERE EMP.Emp# = WORKS.Emp#
GROUP BY EMP.Emp#
[Figure: graph representation of query Q; node labels include Join and GB]
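To make the graph representation more concrete, query Q can be encoded as labelled nodes and edges. This is a sketch under stated assumptions, not the Hecataeus data model: the `Graph` class, the node kinds, and the `where`, `op` and `group-by` edge labels are hypothetical, while `S`, `from` and `map-select` follow labels that appear in the slides.

```python
from collections import defaultdict


class Graph:
    """Directed graph with labelled edges (hypothetical helper class)."""

    def __init__(self):
        self.nodes = {}                 # node name -> node kind
        self.edges = defaultdict(list)  # source -> [(label, target), ...]

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def targets(self, source, label):
        """Targets of the edges leaving `source` that carry `label`."""
        return [t for (l, t) in self.edges[source] if l == label]


def build_query_graph():
    g = Graph()
    # Relations and their attributes, linked by "S" edges.
    for rel, attrs in {"EMP": ["Emp#"], "WORKS": ["Emp#", "Hours"]}.items():
        g.add_node(rel, "relation")
        for attr in attrs:
            g.add_node(f"{rel}.{attr}", "attribute")
            g.add_edge(rel, "S", f"{rel}.{attr}")
    # Query node Q and its output attributes.
    g.add_node("Q", "query")
    for out in ["Q.Emp#", "Q.T_Hours"]:
        g.add_node(out, "attribute")
        g.add_edge("Q", "S", out)
    # FROM clause and SELECT list.
    g.add_edge("Q", "from", "EMP")
    g.add_edge("Q", "from", "WORKS")
    g.add_edge("Q.Emp#", "map-select", "EMP.Emp#")
    g.add_edge("Q.T_Hours", "map-select", "WORKS.Hours")
    # Join condition and GROUP BY (edge labels assumed for this sketch).
    g.add_node("Join", "condition")
    g.add_edge("Q", "where", "Join")
    g.add_edge("Join", "op", "EMP.Emp#")
    g.add_edge("Join", "op", "WORKS.Emp#")
    g.add_node("GB", "group-by")
    g.add_edge("Q", "group-by", "GB")
    g.add_edge("GB", "group-by", "EMP.Emp#")
    return g


g = build_query_graph()
print(g.targets("Q", "from"))
```

Keeping every clause of the query as an explicit node or edge is what lets a schema change at a source be traced to the exact part of the query it touches.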
Graph Annotation with rules
We annotate a set of graph elements with rules for reacting to a set of evolution events:
• Set of graph elements: Query Node (Q1), Attribute Node (EMP.E_TITLE), View Node (Emps_Prjs), etc.
• Set of evolution events: Add Attribute, Delete Attribute, Rename View, etc.
• Set of rules: Propagate, Block, Prompt
According to the prevailing policy, the proper action is taken when the graph evolves.
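The annotation can be read as a lookup from (graph element, evolution event) to a rule. A minimal sketch of that reading follows. The `PolicyTable` class and its method names are hypothetical, and falling back to Prompt when nothing is annotated is an assumption of this example rather than something the slides state.

```python
from enum import Enum


class Rule(Enum):
    PROPAGATE = "propagate"
    BLOCK = "block"
    PROMPT = "prompt"


class Event(Enum):
    ADD_ATTRIBUTE = "add attribute"
    DELETE_ATTRIBUTE = "delete attribute"
    RENAME_VIEW = "rename view"


class PolicyTable:
    """Maps (graph element, event) pairs to rules (hypothetical helper)."""

    def __init__(self, default=Rule.PROMPT):
        # Assumption: elements without an annotation fall back to `default`.
        self._rules = {}
        self._default = default

    def annotate(self, element, event, rule):
        self._rules[(element, event)] = rule

    def rule_for(self, element, event):
        return self._rules.get((element, event), self._default)


policies = PolicyTable()
policies.annotate("EMP", Event.ADD_ATTRIBUTE, Rule.PROPAGATE)
policies.annotate("Q1", Event.DELETE_ATTRIBUTE, Rule.BLOCK)

print(policies.rule_for("EMP", Event.ADD_ATTRIBUTE))
```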
Example
Event: Add attribute Phone to relation EMP
Policy on EMP: on attribute addition to EMP, then propagate
Before the event:
Q: SELECT EMP.Emp#, EMP.Name
FROM EMP
After the event:
Q: SELECT EMP.Emp#, EMP.Name, Phone
FROM EMP
[Figure: graphs of relation EMP and query Q before and after the event. Nodes: EMP, Q, Emp#, Name, and (after the event) Phone. Edge labels: S, map-select, from. Status shown after the event: Add_Child]
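The effect of the propagate rule in this example can be sketched in a few lines. The dictionaries used for the schema and the query, and the functions `on_attribute_added` and `to_sql`, are assumptions made for illustration. The sketch qualifies the new column as EMP.Phone, whereas the slide writes it unqualified, and it treats any rule other than propagate as leaving the query untouched (a Prompt rule would ask the user, which is not modelled here).

```python
def to_sql(query):
    """Render the toy query structure as SQL text."""
    return f"SELECT {', '.join(query['select'])} FROM {query['from']}"


def on_attribute_added(schema, query, relation, attribute, rule):
    """Add `attribute` to `relation`; adapt `query` only under 'propagate'.

    Returns True if the query was rewritten, False otherwise.
    """
    schema[relation].append(attribute)
    if rule == "propagate" and query["from"] == relation:
        query["select"].append(f"{relation}.{attribute}")
        return True
    return False  # e.g. 'block': the query keeps its old definition


schema = {"EMP": ["Emp#", "Name"]}
query = {"from": "EMP", "select": ["EMP.Emp#", "EMP.Name"]}

on_attribute_added(schema, query, "EMP", "Phone", "propagate")
print(to_sql(query))  # SELECT EMP.Emp#, EMP.Name, EMP.Phone FROM EMP
```

Under a block rule the same call would still extend the schema of EMP but return False and leave the SELECT list as it was.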
System architecture
[Figure: system architecture. Inputs: DDL files, SQL scripts, DB Catalog. Components: Parser, Evolution Manager, Graph Viewer, Metric Manager. Other labels: Create DB Schema, Validate Workload, DB Schema representation, Workload representation, Evolution Semantics, Graph Visualization, Import/Export Scenarios (XML, jpeg)]
Research in DB Evolution
• DB Schema Evolution
– OODB evolution
– Schema versioning
• DW Schema Evolution
– Taxonomy of evolution events
– Versioning
– Materialized Views Evolution
– View adaptation & synchronization
• Evolution wrt Model Mappings
Summarizing
• The problem of adapting ETL workflows to evolving data sources
• Graph-based representation of ETL activities
• Graph enrichment with semantics for evolution events
• Graph annotation with rules for handling evolution events a priori
• Hecataeus: a framework for performing and evaluating evolution scenarios in DW environments