ddam lecture 6 integration 1
TRANSCRIPT
-
8/6/2019 DDAM Lecture 6 Integration 1
1/42
1
Distributed Data Architecture &
Management
Dr Simon Scola
-
Acknowledgement
These LN series slides are based on slides adapted from the authors of the textbook:
1) Distributed DBMS, M. Tamer Özsu & Patrick Valduriez
Prentice Hall, 1999
2) OU course notes M877
With grateful acknowledgement
-
Lectures / ground rules
-
Concepts & topics for this LN
- Interoperability
- Integration
- Schema: global conceptual schemas, schema translation, schema integration, ...
- Homogenization
-
Motivation
Why interoperate or integrate? Kinds of interoperability
The general interoperability problem:
- Business push: mergers; globalisation
- Technology pull: new distributed applications (web, etc.); many sources of data; huge volumes; heterogeneous data & systems; legacy systems still used to handle lots of data
-
Motivation... thus
Much renewed interest in a broader question:
- Information system interoperability
But what does that mean? How do we achieve it?
Many issues here, but our focus in this lecture is:
- Data/base interoperability
- Heterogeneity & integration issues
-
Distributed Data (seen this before!)
[Figure: sites Boston, Montreal, Paris, New York and Tokyo linked by a communication network; the sites hold local and shared fragments of the employee, project and assignment data (e.g. Paris projects, Boston employees, New York projects with budget > 200000)]
-
Integration problem (about heterogeneity)
[Figure: the same sites, Boston, Montreal, Paris, New York and Tokyo, linked by a communication network]
-
Transparent system: user view
[Figure: users see a single Distributed Database]
-
Recap. Multi-Distributed Database systems
-
Recap. Multi-Distributed Database systems
Features:
- A collection of databases in which a global logical schema exists to enable distributed data access and management, but in which each database can be accessed independently of the global system for local use
- Local sites can operate independently and apply a full set of DBMS operations locally
- Local external schemas are available to local users
- The global schema represents the shared information over the MDDB
-
Recap. Multi-Distributed Database systems
Features (continued):
- The global schema requires translation of the heterogeneous local schemas
- An MDDB requires both local and global management and processing
-
Data/base integration
The process of conceptually integrating many data sources (databases or otherwise) to form a single, cohesive database.
What does this mean in the context of designing distributed data?
It means designing the global conceptual schema bottom-up.
-
Global Logical schema
1. Fragmentation: f1, f2, ..., f5
2. Replication: f3 & f4 replicated
3. Partitioning: 2 partitions, p1 & p2
4. Allocation: 2 sites
[Figure: fragments f1 to f5 placed in partitions p1 and p2, labelled "Bottom-up"]
-
Data/base integration/2
It is the process of designing the global conceptual schema bottom-up.
It is thus NOT applicable to all 4 kinds of architecture that we looked at;
it is applicable only where a global conceptual schema is part of the architecture.
Bottom-up means that the individual data sources already exist.
-
Thus...
Designing the global conceptual schema involves integrating the component local conceptual schemas
- E.g. CW integration of schemas
How do we achieve such integration? What are the problems/issues?
How do we solve them?
-
Data integration
Kinds of problems to resolve include:
- Schema integration issues vs data/instance integration issues
- Semantic vs syntactic issues
-
General Db Integration Process
Two-step process, in general:
- Translation
- Integration
[Figure: Data source 1 → Translator 1 → InS 1; Data source 2 → Translator 2 → InS 2; ...; Data source N → Translator N → InS N; the intermediate schemas InS 1 ... InS N feed the Integrator, which produces the GCS]
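The two-step process above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not a real integration tool: each data source is assumed to describe its schema as a plain dict, each translator is a hypothetical function that rewrites that description into one shared canonical form (entity name mapped to a set of attribute names), and the integrator simply unions the resulting intermediate schemas (InS) into a GCS without resolving any conflicts.

```python
from typing import Any, Callable, Dict, List, Set

# Simplified canonical form assumed for this sketch: entity name -> attribute names.
Schema = Dict[str, Set[str]]

def translate_relational(tables: Dict[str, List[str]]) -> Schema:
    """Hypothetical translator: each table becomes one canonical entity."""
    return {name.upper(): {col.upper() for col in cols} for name, cols in tables.items()}

def translate_er(entities: Dict[str, Dict[str, List[str]]]) -> Schema:
    """Hypothetical translator for an ER description {'ENTITY': {'attrs': [...]}}."""
    return {name.upper(): {a.upper() for a in spec["attrs"]} for name, spec in entities.items()}

def integrate(intermediate: List[Schema]) -> Schema:
    """Naive integrator: unions entities that share a name; no conflict resolution."""
    gcs: Schema = {}
    for ins in intermediate:
        for entity, attrs in ins.items():
            gcs.setdefault(entity, set()).update(attrs)
    return gcs

def build_gcs(sources: List[Any], translators: List[Callable[[Any], Schema]]) -> Schema:
    # Step 1, translation: data source i -> intermediate schema InS i.
    intermediate = [translate(src) for src, translate in zip(sources, translators)]
    # Step 2, integration: all InS -> one global conceptual schema (GCS).
    return integrate(intermediate)

db1 = {"proj": ["pno", "pname", "budget"]}
db2 = {"project": {"attrs": ["pno", "loc"]}}
gcs = build_gcs([db1, db2], [translate_relational, translate_er])
print(sorted(gcs))  # ['PROJ', 'PROJECT']
```

PROJ and PROJECT stay apart because this naive integrator matches on names only; recognising that they describe the same concept is exactly the schema integration problem discussed later in the lecture.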
-
General Db Integration Process/2
Translation step
- Into a canonical model; necessary if the data sources are heterogeneous
So what does heterogeneous mean here?
- Aim to reduce translation to a minimum
- Canonical model: sufficiently expressive to subsume/include diverse concepts from many sources/databases
- More expressive? A colour printer or a black-and-white printer?
The English alphabet (26) or the Portuguese alphabet (23 + 13 accents)?
-
Expressiveness & heterogeneity
The Chinese language itself is remarkably concrete. There is no word for "size," for example. If you want to fit someone for shoes, you ask them for the "big-small" of their feet. There is no suffix equivalent to "ness" in Chinese. So there is no "whiteness" -- only the white of the swan and the white of the snow.
The Chinese are disinclined to use precisely defined terms or categories in any arena, but instead use expressive, metaphoric language.
-
General Db Integration Process/3
Translation step
- Q1: Is this step needed if all data are held in relational databases?
- Q2: Are Oracle & MS SQL Server homogeneous?
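Q2 can be explored with a small sketch. Even two relational products differ at the type level: Oracle uses types such as VARCHAR2 and NUMBER, while SQL Server uses VARCHAR/NVARCHAR, INT and DECIMAL. The mapping tables below are a deliberately partial, hypothetical translator into a made-up canonical type set (STRING, INTEGER, DECIMAL, DATETIME); they are an illustration, not a complete or official correspondence between the two products.

```python
import re
from typing import List, Tuple

# Hypothetical canonical types for this sketch; the tables are intentionally partial.
ORACLE_TO_CANONICAL = {
    "VARCHAR2": "STRING",
    "NVARCHAR2": "STRING",
    "DATE": "DATETIME",       # an Oracle DATE also stores a time of day
    "TIMESTAMP": "DATETIME",
}

SQLSERVER_TO_CANONICAL = {
    "VARCHAR": "STRING",
    "NVARCHAR": "STRING",
    "INT": "INTEGER",
    "BIGINT": "INTEGER",
    "DECIMAL": "DECIMAL",
    "DATETIME": "DATETIME",
    "DATETIME2": "DATETIME",
}

def parse_type(decl: str) -> Tuple[str, List[str]]:
    """Split a declaration such as 'NUMBER(8,2)' into ('NUMBER', ['8', '2'])."""
    m = re.fullmatch(r"\s*(\w+)\s*(?:\(([^)]*)\))?\s*", decl)
    if m is None:
        raise ValueError(f"cannot parse type {decl!r}")
    args = [a.strip() for a in m.group(2).split(",")] if m.group(2) else []
    return m.group(1).upper(), args

def oracle_to_canonical(decl: str) -> str:
    name, args = parse_type(decl)
    if name == "NUMBER":
        # NUMBER(p) and NUMBER(p,0) hold integers; other NUMBER forms are treated as decimals.
        if len(args) == 1 or (len(args) == 2 and args[1] == "0"):
            return "INTEGER"
        return "DECIMAL"
    return ORACLE_TO_CANONICAL[name]  # unknown types raise KeyError: extend the table

def sqlserver_to_canonical(decl: str) -> str:
    name, _ = parse_type(decl)
    return SQLSERVER_TO_CANONICAL[name]  # unknown types raise KeyError: extend the table

print(oracle_to_canonical("NUMBER(8,2)"), sqlserver_to_canonical("DECIMAL(8,2)"))
print(oracle_to_canonical("VARCHAR2(30)"), sqlserver_to_canonical("NVARCHAR(30)"))
```

Both calls on each line end in the same canonical type even though the source declarations differ, which is the sense in which even two relational systems can need a (light) translation step.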
-
General Db Integration Process/4
Integration step
Each InS x is then integrated into a GCS.
We have assumed conceptual schemas throughout, from local to global.
-
Illustrative Example
Assume two databases to be integrated:
1. Relational:
- our running example of EMP-PROJ-ASG-SAL
- slightly modified; tables or relations
2. An ER model:
- similar schema & data,
- but NOT IDENTICAL; uses ER concepts
Which is more expressive?
-
Illustrative Example/2
db1: relational, modified as follows
EMP (ENO, ENAME, TITLE)
PROJ (PNO, PNAME, BUDGET, LOC, CNAME)
ASG (ENO, PNO, RESP, DUR)
PAY (TITLE, SAL)
This is one version taken from the textbook, which we have modified as shown above.
-
Illustrative Example/3
Db2: ER model/concepts
Similar, but with one significant difference: it keeps data about the clients who contracted the projects.
[Figure: ER diagram with entities ENGINEER, PROJECT and CLIENT; relationship WORKS-IN between ENGINEER and PROJECT and relationship CONTRACTED-BY between PROJECT and CLIENT; cardinality labels n, n, 1, 1]
Different graphical notations! Note: no attributes shown; relationships drawn as diamonds; no PKs shown here!
-
Illustrative Example/4
Db2: ER model/concepts
[Figure: the same ER diagram: ENGINEER, PROJECT and CLIENT with relationships WORKS-IN and CONTRACTED-BY; cardinality labels n, n, 1, 1]
ENGINEER (ENO, ENG-NAME, TITLE, SAL)
PROJECT (PROJNO, PROJNAME, LOC, BUDGET)
CLIENT (CNAME, ADDRESS, ...)
& the relationship attributes are:
WORKS-IN (RESPONSIBILITY, DURATION)
CONTRACTED-BY (C-DATE)
-
Translation RM->ER
And... what degree is the relationship?
- 1:many or many:many?
- An EMP is assigned to many projects and a project has many EMPs assigned to it: m:n
- The PAY relation is difficult to handle. Why? Is it an entity? Is it an attribute?
- If an entity, how is it related to other entities?
- How do we know what it is?
- Can we create a 1:m relationship from PAY to EMP?
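One common reverse-engineering heuristic (an assumption here, not a rule stated on the slide) answers part of the degree question from keys alone: a relation whose primary key consists entirely of foreign keys to two other relations, like ASG(ENO, PNO, ...), is translated as an m:n relationship, and a foreign key outside the primary key, such as EMP.TITLE referencing PAY, signals an n:1 link. The key and foreign-key declarations below are assumed, since the slide lists attributes only, and the helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Relation:
    name: str
    attributes: List[str]
    primary_key: List[str]
    # Assumed foreign-key declarations: attribute -> referenced relation.
    foreign_keys: Dict[str, str] = field(default_factory=dict)

def classify(rel: Relation) -> str:
    """Heuristic RM -> ER classification based only on key structure."""
    pk_all_foreign = all(a in rel.foreign_keys for a in rel.primary_key)
    referenced = {rel.foreign_keys[a] for a in rel.primary_key if a in rel.foreign_keys}
    if pk_all_foreign and len(referenced) == 2:
        return "m:n relationship between " + " and ".join(sorted(referenced))
    return "entity"

def many_to_one_links(rel: Relation) -> List[Tuple[str, str]]:
    """Non-key foreign keys: many rows of rel may reference one target row (n:1)."""
    return [(rel.name, target) for a, target in rel.foreign_keys.items()
            if a not in rel.primary_key]

db1 = [
    Relation("EMP", ["ENO", "ENAME", "TITLE"], ["ENO"], {"TITLE": "PAY"}),
    Relation("PROJ", ["PNO", "PNAME", "BUDGET", "LOC", "CNAME"], ["PNO"]),
    Relation("ASG", ["ENO", "PNO", "RESP", "DUR"], ["ENO", "PNO"],
             {"ENO": "EMP", "PNO": "PROJ"}),
    Relation("PAY", ["TITLE", "SAL"], ["TITLE"]),
]

for r in db1:
    print(r.name, "->", classify(r), many_to_one_links(r))
```

Note what the heuristic cannot do: it classifies PAY as an entity simply because it has its own key, so keys alone do not settle whether PAY should really be an entity or folded into EMP as attributes.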
-
Translation
So the RM is translated to an ER model thus:
[Figure: EMP and PROJECT linked by relationship ASG, cardinality n:m]
How do we know the degree of the relationships?
And where is the PAY entity?
-
Translation issues: model PAY as attributes
[Figure: EMP and PROJECT linked by ASG (n:m); EMP carries the attributes Eng. No., Eng. Name, Title and Salary]
-
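Translation issues: model PAY as entity
[Figure: EMP and PROJECT linked by ASG (n:m); a separate entity PAY linked to EMP by relationship PAYMENT, cardinality labels n and 1; attribute labels Salary, Title, Eng. No., Eng. Name]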
-
So, which solution & why?
We need to understand the difference(s) between the 2 solutions:
- Sol-1: PAY as an entity
- Sol-2: PAY as attributes
- Differences:
1) Which is the neater/simpler of the two?
2) Which is more expressive? Why?
3) Which would make the better canonical model, ER or RM? Why?
4) Which is better? Why?
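A small sketch can make questions 2 and 4 more concrete. It stores some hypothetical sample rows (not from the source) in both representations: Sol-2 copies SAL into every employee, while Sol-1 keeps PAY as a separate entity keyed by TITLE. Changing the salary for one title then touches one PAY instance in Sol-1 but every matching employee in Sol-2, and only Sol-1 can record a title/salary pair that no employee currently holds.

```python
from typing import Dict, List

# Hypothetical sample data, not taken from the lecture.
emp = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Elect. Eng."},
]
pay = {"Elect. Eng.": 40000, "Syst. Anal.": 34000, "Programmer": 24000}

def pay_as_attribute(emp_rows: List[dict], pay_table: Dict[str, int]) -> List[dict]:
    """Sol-2: copy SAL into each employee (a join of EMP and PAY on TITLE)."""
    return [{**e, "SAL": pay_table[e["TITLE"]]} for e in emp_rows]

def raise_salary_sol1(pay_table: Dict[str, int], title: str, new_sal: int) -> int:
    """Sol-1: PAY is an entity, so exactly one PAY instance is updated."""
    pay_table[title] = new_sal
    return 1  # instances touched

def raise_salary_sol2(engineers: List[dict], title: str, new_sal: int) -> int:
    """Sol-2: SAL is repeated per employee, so every matching row is updated."""
    touched = 0
    for e in engineers:
        if e["TITLE"] == title:
            e["SAL"] = new_sal
            touched += 1
    return touched

engineers = pay_as_attribute(emp, pay)
print(raise_salary_sol1(dict(pay), "Elect. Eng.", 42000))  # 1
print(raise_salary_sol2(engineers, "Elect. Eng.", 42000))  # 2
# 'Programmer' survives only in Sol-1: no employee holds that title.
print(any(e["TITLE"] == "Programmer" for e in engineers))  # False
```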
-
Schema Integration (SI)
The canonical model should be the more expressive one, hence the one carrying more metadata; ER is more expressive than RM.
Why?
- It means that ER can capture more concepts, more semantics/more meaning, etc., from the real world, and thus results in a much more faithful model
- What about OO vs ER (from the expressiveness perspective)?
-
Illustrative Example/5 Schema Translation (ST)
ST = mapping from one schema to another (see next slide)
- Requires us to specify what the target global schema data model is: OO, ER, or some other kind
- Not 100% essential if it can be achieved during the integration step; the integrator then has all the information about the entire global data set at one and the same time
Which target model to use is thus chosen by the integrator, who can decide which model to use (OO, ER, etc.)
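As a minimal sketch of schema translation as a mapping, the dictionary below records assumed correspondences from db1's relational names to db2's ER names (inferred from the attribute names on the example slides, not stated in the source), and `translate_tuple`, a hypothetical helper, applies them to a single tuple.

```python
from typing import Dict, Optional, Tuple

# Assumed correspondences: (relation, attribute) in db1 -> (ER construct, attribute) in db2.
# ASG.ENO and ASG.PNO are absent on purpose: they become WORKS-IN participants, not attributes.
DB1_TO_DB2: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("EMP", "ENO"): ("ENGINEER", "ENO"),
    ("EMP", "ENAME"): ("ENGINEER", "ENG-NAME"),
    ("EMP", "TITLE"): ("ENGINEER", "TITLE"),
    ("PROJ", "PNO"): ("PROJECT", "PROJNO"),
    ("PROJ", "PNAME"): ("PROJECT", "PROJNAME"),
    ("PROJ", "BUDGET"): ("PROJECT", "BUDGET"),
    ("PROJ", "LOC"): ("PROJECT", "LOC"),
    ("ASG", "RESP"): ("WORKS-IN", "RESPONSIBILITY"),
    ("ASG", "DUR"): ("WORKS-IN", "DURATION"),
}

def translate_tuple(relation: str, row: Dict[str, object]) -> Tuple[Optional[str], Dict[str, object]]:
    """Rename a db1 tuple into db2 terms; unmapped attributes raise instead of being dropped."""
    target: Optional[str] = None
    out: Dict[str, object] = {}
    unmapped = [attr for attr in row if (relation, attr) not in DB1_TO_DB2]
    if unmapped:
        raise KeyError(f"no mapping for {relation}: {', '.join(unmapped)}")
    for attr, value in row.items():
        target, new_attr = DB1_TO_DB2[(relation, attr)]
        out[new_attr] = value
    return target, out

print(translate_tuple("EMP", {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."}))
# ('ENGINEER', {'ENO': 'E1', 'ENG-NAME': 'J. Doe', 'TITLE': 'Elect. Eng.'})
```

PROJ.CNAME deliberately has no entry: in db2 the client is a separate CLIENT entity reached through CONTRACTED-BY, a structural difference that a plain attribute renaming cannot express.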
-
General Db Integration Process
Two-step process, in general:
- Translation
- Integration
[Figure: Data source 1 → Translator 1 → InS 1; Data source 2 → Translator 2 → InS 2; ...; Data source N → Translator N → InS N; the intermediate schemas InS 1 ... InS N feed the Integrator, which produces the GCS]
-
Illustrative Example/6 Schema Translation (ST)
The integrator can decide which target model to use (OO, ER, etc.)
- It can make trade-offs between local schemas,
- to choose an appropriate representation,
- in case of conflicts between the local models, as we will illustrate in this example
- Thus, the integrator must have knowledge of all possible trade-offs between the many different schemas (which may be heterogeneous) being integrated
-
Schema Integration (SI)
Follows the translation/mapping step, by integrating the intermediate schemas.
SI is a process of:
- Identifying components (in the two or more intermediate models) that are related to each other
- Selecting the best representation for the GCS
- And then, finally, integrating the related components
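The three SI activities can be sketched as three small functions over intermediate schemas held in a simplified form (entity name mapped to a set of attribute names). Everything here is an illustrative assumption: the correspondences are supplied by hand rather than discovered, and the "best representation" rule (keep the name from the second schema, take the union of attributes) is just one possible choice; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Schema = Dict[str, Set[str]]

def identify(correspondences: List[Tuple[str, str]], s1: Schema, s2: Schema) -> List[Tuple[str, str]]:
    """Step 1: keep the asserted correspondences whose components exist in both schemas."""
    return [(a, b) for a, b in correspondences if a in s1 and b in s2]

def select_representation(a: str, b: str, s1: Schema, s2: Schema) -> Tuple[str, Set[str]]:
    """Step 2 (one possible rule): use the name from s2 and the union of both attribute sets."""
    return b, s1[a] | s2[b]

def integrate_components(s1: Schema, s2: Schema, correspondences: List[Tuple[str, str]]) -> Schema:
    """Step 3: merge related components; copy unrelated components unchanged."""
    gcs: Schema = {}
    matched1: Set[str] = set()
    matched2: Set[str] = set()
    for a, b in identify(correspondences, s1, s2):
        name, attrs = select_representation(a, b, s1, s2)
        gcs.setdefault(name, set()).update(attrs)
        matched1.add(a)
        matched2.add(b)
    for schema, matched in ((s1, matched1), (s2, matched2)):
        for name, attrs in schema.items():
            if name not in matched:
                gcs.setdefault(name, set()).update(attrs)
    return gcs

ins1 = {"EMP": {"ENO", "ENAME", "TITLE"}, "PROJ": {"PNO", "PNAME", "BUDGET", "LOC"}}
ins2 = {"ENGINEER": {"ENO", "ENG-NAME", "TITLE", "SAL"},
        "PROJECT": {"PROJNO", "PROJNAME", "LOC", "BUDGET"},
        "CLIENT": {"CNAME", "ADDRESS"}}
gcs = integrate_components(ins1, ins2, [("EMP", "ENGINEER"), ("PROJ", "PROJECT")])
print(sorted(gcs))              # ['CLIENT', 'ENGINEER', 'PROJECT']
print(sorted(gcs["ENGINEER"]))  # ['ENAME', 'ENG-NAME', 'ENO', 'SAL', 'TITLE']
```

ENAME and ENG-NAME both survive in ENGINEER: entity-level correspondences alone are not enough, and attribute-level correspondences would also have to be identified before the GCS is clean.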
-
Now, onto Schema Integration (SI)/2
What does related mean? Two components can be related:
- 1. as equivalent components
- 2. one component contains the other
- 3. disjoint
- 4. any other?
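One way to make "related" precise, assumed here rather than taken from the slide, is to compare the sets of real-world instances that two components describe. Set comparison then yields equivalence, containment and disjointness, plus a fourth case, overlap, which is one possible answer to "any other?". The engineer numbers in the example are hypothetical.

```python
from typing import Hashable, Set

def relationship(c1: Set[Hashable], c2: Set[Hashable]) -> str:
    """Classify two components by comparing the instances they describe."""
    if c1 == c2:
        return "equivalent"
    if c1 > c2:  # proper superset
        return "first contains second"
    if c1 < c2:  # proper subset
        return "second contains first"
    if c1.isdisjoint(c2):
        return "disjoint"
    return "overlapping"  # share some instances, but neither contains the other

all_engineers = {"E1", "E2", "E3", "E4"}
paris_engineers = {"E1", "E2"}
boston_engineers = {"E3"}
print(relationship(all_engineers, paris_engineers))     # first contains second
print(relationship(paris_engineers, boston_engineers))  # disjoint
```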
-
Schema Integration Methods (SI)/3
Taxonomy/classification:
Integration process
- Binary
  - Ladder
  - Balanced
- N-ary
  - One-shot
  - Iterative
-
Binary Integration Methods/4
[Figure: a) Ladder (step-wise); b) Pure binary]
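The two binary strategies differ only in the order in which pairs of schemas are merged. The sketch below assumes a hypothetical `merge` function that integrates exactly two schemas; here it only records the pairing as a string so that the order is visible. Ladder integration adds one new local schema to the running result at each step; balanced integration (presumably the "pure binary" case in the figure) merges schemas pairwise, then merges those results pairwise, and so on.

```python
from functools import reduce
from typing import List

def merge(a: str, b: str) -> str:
    """Hypothetical stand-in for integrating two schemas: records the pairing."""
    return f"({a}+{b})"

def ladder(schemas: List[str]) -> str:
    """Ladder (step-wise): integrate S1 and S2, then add S3 to the result, and so on."""
    return reduce(merge, schemas)

def balanced(schemas: List[str]) -> str:
    """Balanced binary: merge neighbouring pairs level by level until one schema is left."""
    level = list(schemas)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd schema out waits for the next level
        level = nxt
    return level[0]

s = ["S1", "S2", "S3", "S4"]
print(ladder(s))    # (((S1+S2)+S3)+S4)
print(balanced(s))  # ((S1+S2)+(S3+S4))
```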
-
N-ary Integration Methods/5
a) One-pass iteration: >2 schemas integrated at each step
One-pass operation = all schemas are integrated, producing the GCS in one iteration step
Pros:
- All info about the local schemas is available
- Trade-offs between all schemas, not just between a few
Cons:
- Increased complexity
- Difficult to automate
b) Iterative n-ary integration
Pros:
- More flexibility
- More general
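A matching sketch for the n-ary strategies uses a hypothetical `merge_n` that integrates any number of schemas at once (again only recording the grouping). One-shot integrates all schemas in a single step; the iterative variant assumed here integrates groups of up to `k` schemas per step, where `k` is an illustrative parameter, not something the slide specifies.

```python
from typing import List

def merge_n(schemas: List[str]) -> str:
    """Hypothetical n-ary integrator: records which schemas were merged together."""
    return "(" + "+".join(schemas) + ")"

def one_shot(schemas: List[str]) -> str:
    """One pass: all schemas are integrated into the GCS in a single step."""
    return merge_n(schemas)

def iterative(schemas: List[str], k: int = 3) -> str:
    """Iterative n-ary: merge groups of up to k schemas per step until one remains."""
    if k < 3:
        raise ValueError("n-ary integration merges more than two schemas per step")
    level = list(schemas)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), k):
            group = level[i:i + k]
            # A leftover single schema is carried forward to the next step unchanged.
            next_level.append(group[0] if len(group) == 1 else merge_n(group))
        level = next_level
    return level[0]

s = ["S1", "S2", "S3", "S4", "S5"]
print(one_shot(s))      # (S1+S2+S3+S4+S5)
print(iterative(s, 3))  # ((S1+S2+S3)+(S4+S5))
```

The one-shot call makes a single integration decision over every schema, which is why all trade-offs are visible at once, but that single step is also the one that is hardest to automate.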
-
Summary
- Introduction to integration
- Schema integration
  - Canonical model
  - Mapping
- Schema integration approaches