itec 3220a using and designing database systems
DESCRIPTION
ITEC 3220A Using and Designing Database Systems. Instructor: Gordon Turpin Course Website: www.cse.yorku.ca/~gordon/itec3220S07 Office: CSEB3020. Chapter 13. The Data Warehouse. Transaction Processing Versus Decision Support. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/1.jpg)
ITEC 3220AUsing and Designing Database Systems
Instructor: Gordon TurpinCourse Website: www.cse.yorku.ca/~gordon/itec3220S07 Office: CSEB3020
![Page 2: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/2.jpg)
Chapter 13
The Data Warehouse
![Page 3: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/3.jpg)
3
• Transaction processing allows organizations to conduct daily business in an efficient manner– Operational database
• Decision support helps management provide medium-term and long-term direction for an organization
Transaction Processing Versus Decision Support
![Page 4: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/4.jpg)
4
Decision Support System (DSS) Components
![Page 5: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/5.jpg)
5
• Operational data – Relational, normalized database – Optimized to support transactions – Real time updates
• DSS – Snapshot of operational data– Summarized – Large amounts of data
• Data analyst viewpoint– Timespan– Granularity– Dimensionality
Operational vs. Decision Support Data
![Page 6: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/6.jpg)
6
• Database schema– Support complex (non-normalized) data– Extract multidimensional time slices
• Data extraction and filtering• End-user analytical interface• Database size
– Very large databases (VLDBs)– Contains redundant and duplicated data
The DSS Database Requirements
![Page 7: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/7.jpg)
7
• Integrated– Centralized– Holds data retrieved from entire organization
• Subject-Oriented – Optimized to give answers to diverse
questions– Used by all functional areas
• Time Variant – Flow of data through time– Projected data
• Non-Volatile – Data never removed– Always growing
Data Warehouse
![Page 8: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/8.jpg)
8
• Single-subject data warehouse subset
• Decision support to small group
• Can be tested for exploring potential benefits of Data warehouses
• Address local or departmental problems
Data Marts
![Page 9: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/9.jpg)
9
Data Warehouse Versus Data Mart
![Page 10: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/10.jpg)
10
• Data-modeling technique • Maps multidimensional decision support
into relational database• Yield model for multidimensional data
analysis while preserving relational structure of operational DB
• Four Components:– Facts– Dimensions– Attributes– Attribute hierarchies
Star Schema
![Page 11: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/11.jpg)
11
Simple Star Schema
![Page 12: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/12.jpg)
12
Slice and Dice View of Sales
![Page 13: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/13.jpg)
13
• Facts and dimensions represented by physical tables in data warehouse DB
• Fact table related to each dimension table (M:1)
• Fact and dimension tables related by foreign keys
• Subject to the primary/foreign key constraints
Star Schema Representation
![Page 14: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/14.jpg)
14
Star Schema for Sales
![Page 15: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/15.jpg)
15
ExampleCanadian financial organization is interested in building a data warehouse to analyze customers’ credit payments over time, location where the payments were made, customers, and types of credit cards. A customer may use the credit card to make a payment in different locations across the country and abroad. If a payment is made abroad it can be based on domestic currency and then converted into Canadian dollars based on currency rate. •Time is described by Time_ID, day, month, quarter and year.•Location is presented by Location_ID, name of the organization billing the customer, city and country where the organization is located, domestic currency.•A credit card is described by credit card number, type of the credit account, and customer’s credit rate. The customer’s rate depends on the type of the credit account.•A customer is described by ID, name, address, and phone.
![Page 16: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/16.jpg)
16
• Normalization of dimensional tables
• Multiple fact tables representing different aggregation levels
• Denormalization of the fact tables
• Table partitioning and replication
Performance-Improving Techniques for Star
Schema
![Page 17: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/17.jpg)
17
Normalization Example
• Normalize the star schema that you developed for Canadian financial organization on page 16 into 3NF.
![Page 18: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/18.jpg)
18
More Example
A supermarket chain is interested in building a data warehouse to analyze the sales of different products in different supermarkets at different times using different payment method.
– Each supermarket is presented by location_ID, city, country, and domestic currency.
– Time can be measured in time_ID, day, month, quarter, and year.
– Each product is described by product_ID, product_name, and vendor.
– Payment method is described by payment_ID, payment_ type.
Design a star schema for this problem and then normalize the star schema that you developed into 3NF.
![Page 19: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/19.jpg)
19
Data Warehouse Implementation Road Map
![Page 20: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/20.jpg)
Chapter 12
Distributed Database Management Systems
![Page 21: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/21.jpg)
21
The Evolution of Distributed Database Management
Systems
• Distributed database management system (DDBMS)
– Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
![Page 22: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/22.jpg)
22
Distributed Database Environment
![Page 23: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/23.jpg)
23
Database Systems: Levels of Data and Process
Distribution
![Page 24: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/24.jpg)
24
Single-Site Processing, Single-Site Data (SPSD)
• All processing is done on single CPU or host computer (mainframe, midrange, or PC)
• All data are stored on host computer’s local disk
• Processing cannot be done on end user’s side of the system
• Typical of most mainframe and midrange computer DBMSs
• DBMS is located on the host computer, which is accessed by dumb terminals connected to it
• Also typical of the first generation of single-user microcomputer databases
![Page 25: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/25.jpg)
25
Single-Site Processing, Single-Site Data
(Centralized)
![Page 26: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/26.jpg)
26
Multiple-Site Processing, Single-Site Data (MPSD)
• Multiple processes run on different computers sharing a single data repository
• MPSD scenario requires a network file server running conventional applications that are accessed through a LAN
• Many multi-user accounting applications, running under a personal computer network, fit such a description
![Page 27: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/27.jpg)
27
Multiple-Site Processing, Single-Site Data
![Page 28: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/28.jpg)
28
Multiple-Site Processing, Multiple-Site Data
(MPMD)• Fully distributed database
management system with support for multiple data processors and transaction processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
– Integrate only one type of centralized DBMS over a network
![Page 29: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/29.jpg)
29
Multiple-Site Processing, Multiple-Site Data (MPMD)
(Cont’d)• Heterogeneous DDBMSs
– Integrate different types of centralized DBMSs over a network
• Fully heterogeneous DDBMS– Support different DBMSs that may
even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers
![Page 30: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/30.jpg)
30
Distributed Database Design
• Data fragmentation:
– How to partition the database into fragments
• Data replication:
– Which fragments to replicate
• Data allocation:
– Where to locate those fragments and replicas
![Page 31: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/31.jpg)
31
Data Fragmentation
• Breaks single object into two or more segments or fragments
• Each fragment can be stored at any site over a computer network
• Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
![Page 32: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/32.jpg)
32
Data Fragmentation Strategies
• Horizontal fragmentation: – Division of a relation into subsets
(fragments) of tuples (rows)
• Vertical fragmentation: – Division of a relation into attribute
(column) subsets
• Mixed fragmentation: – Combination of horizontal and
vertical strategies
![Page 33: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/33.jpg)
33
Data Replication
• Storage of data copies at multiple sites served by a computer network
• Fragment copies can be stored at several sites to serve specific information requirements
– Can enhance data availability and response time
– Can help to reduce communication and total query costs
![Page 34: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/34.jpg)
34
Replication Scenarios
• Fully replicated database: – Stores multiple copies of each database
fragment at multiple sites– Can be impractical due to amount of
overhead
• Partially replicated database:– Stores multiple copies of some database
fragments at multiple sites– Most DDBMSs are able to handle the
partially replicated database well• Unreplicated database:
– Stores each database fragment at a single site
– No duplicate database fragments
![Page 35: ITEC 3220A Using and Designing Database Systems](https://reader035.vdocuments.mx/reader035/viewer/2022070410/56814652550346895db36899/html5/thumbnails/35.jpg)
35
Data Allocation
• Deciding where to locate data• Allocation strategies:
– Centralized data allocation• Entire database is stored at one site
– Partitioned data allocation• Database is divided into several disjointed
parts (fragments) and stored at several sites– Replicated data allocation
• Copies of one or more database fragments are stored at several sites
• Data distribution over a computer network is achieved through data partition, data replication, or a combination of both