1 distributed database concepts 8:30-10:00am thursday, july 21 st 2005 csig05 chaitan baru
![Page 1: 1 Distributed Database Concepts 8:30-10:00AM Thursday, July 21 st 2005 CSIG05 Chaitan Baru](https://reader030.vdocuments.mx/reader030/viewer/2022032707/56649e555503460f94b4cf1c/html5/thumbnails/1.jpg)
1
Distributed Database Concepts
8:30-10:00AM, Thursday, July 21st 2005
CSIG05
Chaitan Baru
2
What is the issue?
• Ability to access data stored in multiple, different databases using a single request, e.g.
  – Get geologic information from multiple geologic databases
  – Get employee information from all branches
• Ability to update data stored in multiple databases, e.g.
  – Transfer salary amount from University to my bank account
  – Transfer funds from Visa account to vendor’s account
3
Distributed data access
Client
Database 1, Database 2, Database 3
Homogeneous: mySQL, mySQL, mySQL
Heterogeneous: mySQL, Oracle, DB2
How about creating a “cached” local copy?
mySQL Excel ASCII flat file
4
Data Warehousing
Client
Data Warehouse (common schema)
ETL (Extract, Transform, Load) between each data source and the warehouse
Data Source 1, Data Source 2, Data Source 3
1. Load data from sources to warehouse
2. Query processing interaction only between client and warehouse
But, warehouse data could be “stale”, i.e. out of sync with source data…
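A minimal sketch of the ETL step and of the staleness problem, assuming two hypothetical sources held as in-memory rows and a SQLite warehouse. The table name `Samples`, the source field names, and all values are illustrative and do not come from the slides.

```python
import sqlite3

# Hypothetical source data; field names and values are illustrative only.
source1_rows = [{"SampleID": "S1-001", "Rock_Type": "Granite", "Age": 150}]
source2_rows = [{"sample": "S2-007", "rock": "basalt", "age_ma": "66"}]

def transform_source1(row):
    """Source 1 already matches the warehouse schema apart from letter case."""
    return (row["SampleID"], row["Rock_Type"].lower(), int(row["Age"]))

def transform_source2(row):
    """Source 2 uses different column names and stores age as text."""
    return (row["sample"], row["rock"].lower(), int(row["age_ma"]))

def run_etl(warehouse):
    """Extract rows from each source, transform them, and load the warehouse."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS Samples "
        "(SampleID TEXT PRIMARY KEY, Rock_Type TEXT, Age INTEGER)"
    )
    for rows, transform in ((source1_rows, transform_source1),
                            (source2_rows, transform_source2)):
        warehouse.executemany(
            "INSERT OR REPLACE INTO Samples VALUES (?, ?, ?)",
            [transform(row) for row in rows],
        )
    warehouse.commit()

def count_samples(warehouse):
    return warehouse.execute("SELECT COUNT(*) FROM Samples").fetchone()[0]

warehouse = sqlite3.connect(":memory:")
run_etl(warehouse)

# A source changes after the load: the warehouse is now stale.
source1_rows.append({"SampleID": "S1-002", "Rock_Type": "Gneiss", "Age": 1700})
stale_count = count_samples(warehouse)   # still reflects the old load

run_etl(warehouse)                       # reloading brings it back in sync
fresh_count = count_samples(warehouse)
```

Between the two `run_etl` calls the client sees only the old load, which is the "stale" condition noted on the slide.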
5
Data integration via middleware
Client
Data integration middleware (aka Mediator)
Database 1, Database 2, Database 3
1. Each client request goes to sources, via middleware
2. Result collected by middleware and returned to client
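The two steps above can be sketched as a function that forwards each request to every source and merges the results. This assumes the simplest (homogeneous) case, with three in-memory SQLite databases standing in for the sources; `make_source`, `mediate`, and the sample rows are hypothetical.

```python
import sqlite3

def make_source(rows):
    """Create one stand-in source database with a Samples table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Samples (SampleID TEXT, Rock_Type TEXT)")
    db.executemany("INSERT INTO Samples VALUES (?, ?)", rows)
    return db

# Three hypothetical sources that share the same schema.
sources = [
    make_source([("A-1", "granite"), ("A-2", "basalt")]),
    make_source([("B-1", "granite")]),
    make_source([("C-1", "shale")]),
]

def mediate(query, params=()):
    """Step 1: send the request to every source. Step 2: collect the results."""
    results = []
    for db in sources:
        results.extend(db.execute(query, params).fetchall())
    return results

granite_samples = mediate(
    "SELECT SampleID, Rock_Type FROM Samples WHERE Rock_Type = ?", ("granite",)
)
```

Unlike the warehouse, nothing is copied ahead of time, so every answer reflects the current contents of the sources.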
6
Warehousing vs Mediation
• Warehousing: Use ETL to “massage” local data to fit into a common, global warehouse schema
• Mediation: Modify user query to match schemas exported by each source
  – But, which schema does the user query?
  – The Integrated View Schema
  – Sources “export” a view (the export schema)
• Federated databases
  – Local sources belong to different “administrative domains”, i.e. different owners.
  – Local autonomy
7
The Canonical Mediator / Wrapper Architecture
Client Application
Q1
Mediator (Integrated view in mediator data model, e.g. relational, XML)
Cached data
Q11, Q12, Q13, Q14
Wrapper, Wrapper, Wrapper, Wrapper
Export view in mediator data model
Local view in local data model
q14
Data source 1 (local schema), Data source 2 (local schema), Data source 3 (local schema), Data source 4 (local schema)
Wrapper processes could execute at sources, at mediator, or elsewhere
8
Example: A Relational Mediator
Client Application
Mediator (Relational data model)
Wrapper, Wrapper
Relational DBMS (e.g. PostGIS), Shape file
9
Example: A Shape-file Based Mediator
Client Application
Mediator (Shape file-based data model)
Wrapper, Wrapper
Relational DBMS (e.g. PostGIS), Shape file
10
Example: An XML Mediator
User / Applications
Mediator (XML-based data model, e.g. GML)
Wrapper, Wrapper, Wrapper
Relational DBMS (e.g. PostGIS), Shape file, XML file (e.g. ArcXML)
11
User Authentication and Access Control
Client Application
Mediator
Wrapper Wrapper
Data source 1
Data source 2
1. User authenticates to system
2. User connects to mediator (passes credentials to mediator)
3. Mediator connects to sources
   a) Using original user credentials
   b) Or, mapped credentials (role-based access)
4. Need to define users or roles in sources
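Options 3a and 3b can be sketched as a lookup the mediator performs before connecting to a source. All user names, role names, and source account names below are hypothetical, and a real deployment would handle secrets rather than plain account names.

```python
# Hypothetical mapping from authenticated users to roles (kept at the mediator).
ROLE_OF_USER = {"alice": "geologist", "bob": "guest"}

# Hypothetical accounts defined in each source, one per role (step 4).
SOURCE_ACCOUNTS = {
    "source1": {"geologist": "geo_reader", "guest": "public_reader"},
    "source2": {"geologist": "s2_geo", "guest": "s2_public"},
}

def source_credentials(user, source, use_original=False):
    """Return the account the mediator uses when connecting to a source."""
    if use_original:
        # Option (a): pass the user's own identity through to the source.
        return user
    # Option (b): map the user to a role, then to that role's source account.
    role = ROLE_OF_USER[user]
    return SOURCE_ACCOUNTS[source][role]
```

With option (a) every user must be defined in every source; with option (b) only the roles need to be defined there.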
12
Different types of heterogeneity in data integration
• Platform heterogeneity: different OS platforms
• DBMS heterogeneity: different database systems, e.g. SQLServer, mySQL, DB2
• Data type heterogeneity
• Schema heterogeneity
• Heterogeneity in units, accuracy, resolution
• Semantic heterogeneity
13
Schema Integration
• A long-standing Computer Science problem
• Simple case
  – Mediator View: (SampleID varchar, Rock_Type varchar, Age int)
  – In Source2 Table, map Age to int
Wrapper
Source 1 Table: (Sample ID varchar, Rock type varchar, Age int, …)
Wrapper: convert between int and varchar for Age
Source 2 Table: (Sample ID varchar, Rock type varchar, Age varchar, …)
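A minimal sketch of the Source 2 wrapper for this simple case, assuming rows arrive as 3-tuples and that Age is stored as a plain numeric string. The function names are hypothetical.

```python
def source2_to_mediator(row):
    """Convert a Source 2 row (Age as varchar) to the mediator view (Age as int)."""
    sample_id, rock_type, age_text = row
    return (sample_id, rock_type, int(age_text.strip()))

def mediator_to_source2(age):
    """Convert an int Age from a mediator query into Source 2's varchar form."""
    return str(age)
```

`int()` raises `ValueError` for text such as "unknown", so a wrapper for real data would also need a policy for values that are not numeric.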
14
Another integration scenario
– Mediator View: (SampleID varchar, Rock_Type varchar, Age varchar, Era varchar, Period varchar)
– In Source 2 Table, parse Age to obtain sub-components of the field
Source 1 Table: (Sample ID varchar, Rock type varchar, Eon varchar, Era varchar, Period varchar)
  Example values: Phanerozoic, Mesozoic, Jurassic
Source 2 Table: (Sample ID varchar, Rock type varchar, Age varchar)
  Example value: “Phanerozoic/mesozoic;jur”
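Parsing the Source 2 Age field could look like the sketch below. It assumes the field always has the form eon/era;period, as in the one example on the slide, and that period abbreviations are unique prefixes of a known list. `KNOWN_PERIODS`, `expand`, and `parse_age` are hypothetical names.

```python
import re

# Hypothetical list of period names used to expand abbreviations such as "jur".
KNOWN_PERIODS = ["Triassic", "Jurassic", "Cretaceous"]

def expand(abbrev, known):
    """Expand an abbreviation when it is a prefix of exactly one known name."""
    matches = [name for name in known if name.lower().startswith(abbrev.lower())]
    return matches[0] if len(matches) == 1 else abbrev.capitalize()

def parse_age(age):
    """Split an Age string like 'Phanerozoic/mesozoic;jur' into (Eon, Era, Period)."""
    eon, era, period = (part.strip() for part in re.split(r"[/;]", age))
    return eon.capitalize(), era.capitalize(), expand(period, KNOWN_PERIODS)
```

A string with more or fewer than three parts raises `ValueError`, which makes malformed values visible to the wrapper rather than silently mis-assigning them.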
15
A more advanced integration scenario
• Mediator View: (SampleID varchar, Rock_Type varchar, Eon varchar, Era varchar, Period varchar)
  – Same as Source1 table schema
• Query: Get rock types for all rocks from the Jurassic period
Source 1 Table: (Sample ID varchar, Rock type varchar, Eon varchar, Era varchar, Period varchar)
  Example values: Phanerozoic, Mesozoic, Jurassic
Source 2 Table: (Sample ID varchar, Rock type varchar, Age int)
  Example value: 150
16
Doing the integration
• Query sent to mediator:
  SELECT DISTINCT(Rock_Type) FROM Mediator_View WHERE Period='Jurassic'
• Query to Source 1:
  SELECT DISTINCT(Rock_Type) FROM Source1_Table WHERE Period='Jurassic'
• For Source2, need to map Period='Jurassic' to Age values
Source 2 Table: (Sample ID varchar, Rock type varchar, Age int)
Geologic_Time Table: (Eon varchar, Era varchar, Period varchar, Min int, Max int)
17
Query “fragment” sent to Source 2
• SELECT DISTINCT (S2.Rock_Type)
FROM
Source2_Table S2,
Geologic_Time_Table GT
WHERE
GT.Period = 'Jurassic' AND
(S2.Age >= GT.Min) AND
(S2.Age <= GT.Max)
Where is the Geologic_Time table stored?
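The fragment can be tried out with the sketch below, which assumes one possible answer to the question: a copy of the Geologic_Time table is stored at Source 2, so the join runs entirely at the source. The sample rows are invented, and the Min/Max ages (in millions of years) are rounded, illustrative values rather than an authoritative time scale.

```python
import sqlite3

source2 = sqlite3.connect(":memory:")
source2.executescript("""
CREATE TABLE Source2_Table (Sample_ID TEXT, Rock_Type TEXT, Age INTEGER);
CREATE TABLE Geologic_Time_Table
    (Eon TEXT, Era TEXT, Period TEXT, Min INTEGER, Max INTEGER);

-- Invented sample rows.
INSERT INTO Source2_Table VALUES
    ('S2-1', 'limestone', 150),
    ('S2-2', 'sandstone', 160),
    ('S2-3', 'limestone', 170),
    ('S2-4', 'basalt', 60);

-- Rounded, illustrative age ranges in millions of years.
INSERT INTO Geologic_Time_Table VALUES
    ('Phanerozoic', 'Mesozoic', 'Triassic', 201, 252),
    ('Phanerozoic', 'Mesozoic', 'Jurassic', 145, 201),
    ('Phanerozoic', 'Mesozoic', 'Cretaceous', 66, 145);
""")

# The query fragment from the slide, with the period passed as a parameter.
FRAGMENT = """
SELECT DISTINCT (S2.Rock_Type)
FROM Source2_Table S2, Geologic_Time_Table GT
WHERE GT.Period = ?
  AND (S2.Age >= GT.Min)
  AND (S2.Age <= GT.Max)
"""

jurassic_rock_types = sorted(
    row[0] for row in source2.execute(FRAGMENT, ("Jurassic",))
)
```

If the table were kept only at the mediator, the mediator would instead look up the Min and Max values itself and send Source 2 a purely numeric predicate on Age.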
18
Another complex query
• Query: Get rock types for all rocks from the Mesozoic era
  – Easy to do for Source 1: Era = “Mesozoic”
  – For Source 2:
    • Need to find numeric age range for Mesozoic
      – Find age range across all subclasses of Mesozoic (Cretaceous, Jurassic, Triassic)
    • Select all Source 2 Table records whose age range falls within the Mesozoic age range
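A sketch of the two Source 2 steps, this time assuming the Geologic_Time table is kept at the mediator: the mediator computes the Mesozoic range across its periods, then sends Source 2 a numeric predicate on Age. The two in-memory databases, the sample rows, and the rounded age values (millions of years) are all illustrative.

```python
import sqlite3

# Mediator-side lookup table (assumed location; rounded, illustrative ages).
mediator = sqlite3.connect(":memory:")
mediator.executescript("""
CREATE TABLE Geologic_Time_Table
    (Eon TEXT, Era TEXT, Period TEXT, Min INTEGER, Max INTEGER);
INSERT INTO Geologic_Time_Table VALUES
    ('Phanerozoic', 'Mesozoic', 'Triassic', 201, 252),
    ('Phanerozoic', 'Mesozoic', 'Jurassic', 145, 201),
    ('Phanerozoic', 'Mesozoic', 'Cretaceous', 66, 145),
    ('Phanerozoic', 'Cenozoic', 'Paleogene', 23, 66);
""")

# Source 2 holds only numeric ages (invented sample rows).
source2 = sqlite3.connect(":memory:")
source2.executescript("""
CREATE TABLE Source2_Table (Sample_ID TEXT, Rock_Type TEXT, Age INTEGER);
INSERT INTO Source2_Table VALUES
    ('S2-1', 'limestone', 150),
    ('S2-2', 'sandstone', 230),
    ('S2-3', 'basalt', 30),
    ('S2-4', 'granite', 300);
""")

# Step 1: age range across all periods (subclasses) of the Mesozoic.
era_min, era_max = mediator.execute(
    "SELECT MIN(GT.Min), MAX(GT.Max) FROM Geologic_Time_Table GT WHERE GT.Era = ?",
    ("Mesozoic",),
).fetchone()

# Step 2: select Source 2 records whose age lies inside that range.
mesozoic_rock_types = sorted(
    row[0]
    for row in source2.execute(
        "SELECT DISTINCT Rock_Type FROM Source2_Table WHERE Age >= ? AND Age <= ?",
        (era_min, era_max),
    )
)
```

Taking the minimum and maximum over the periods is what lets an era-level question be answered from a table that only lists period-level ranges.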
19
Data Integration Carts©
• Integrating data sets without explicitly creating views
• An example request: Plot all gravity data points that fall within the spatial extent of rocks of a given type, in the Rocky Mountain testbed region
  – Use GEONsearch to find all gravity and geologic data using bounding box for “Rocky Mountain testbed region”
    • Need gazetteer / spatial ontology to determine Rocky Mountain region
    • Need to know classification of datasets (as gravity and geology)
    • Intersect extent of gravity and geologic datasets (from metadata) with extent of Rocky Mountain region
  – Plot gravity point data that fall within polygons of rocks of given type
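The spatial steps of this request can be sketched with two small geometric tests: a bounding-box intersection for the metadata search and a point-in-polygon test for the final selection. This is plain Python with made-up coordinates and is not GEONsearch or the Data Integration Cart itself. Every name and number below is hypothetical.

```python
def boxes_intersect(a, b):
    """Boxes are (min_x, min_y, max_x, max_y); True if they overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def point_in_polygon(x, y, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical region extent, as a gazetteer might return it.
region_box = (0, 0, 10, 10)

# Hypothetical dataset extents, as registered in a metadata catalog.
dataset_extents = {
    "gravity_A": (2, 2, 8, 8),
    "gravity_B": (20, 20, 30, 30),
    "geology_A": (0, 0, 6, 6),
}
in_region = sorted(
    name for name, box in dataset_extents.items()
    if boxes_intersect(box, region_box)
)

# Hypothetical polygon for rocks of the requested type, and gravity points.
granite_polygon = [(0, 0), (4, 0), (4, 4), (0, 4)]
gravity_points = [(2, 2), (5, 2), (3, 1)]
points_to_plot = [
    (x, y) for x, y in gravity_points
    if point_in_polygon(x, y, granite_polygon)
]
```

The bounding-box test only needs dataset metadata, so it can prune datasets before any point data is fetched.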
20
Ad hoc integration
GEONsearch, Plot map, Map
Data Integration Cart© Query
Search Metadata Catalog
“Geologic and gravity data in Rocky Mountains”
21
Data Registration
Rock Classification Ontology: Igneous; Granite, Quartz monzonite
Spatial Ontology: Location; Latitude, Longitude; Point, Polygon
Gravity dataset: (X, Y), Metadata
Geologic dataset: Lat, Long, RockType, Metadata
Item Registration (Schema registration)
Item Detail Registration
22
Data Registration is Important!