MIS 451
Building Business Intelligence Systems
Logical Design (1)
Project Planning
Requirements Analysis
Logical Design
Physical Design
Data Staging
Data Analysis (OLAP)
Introduction to Dimensional Modeling
Dimensional modeling is a data warehouse (DW) logical design technique that presents data in a standard framework that is intuitive and allows for high-performance data access.
Intuitive: SQL queries are easy to write.
High performance: SQL queries run fast.
ER Model:
Customer places Order (1:M)
Order contains OrderLine (1:M)
Product to OrderLine (1:M)
Product belongs to ProductCategory (M:1)

Dimensional Model (Star Schema), where # marks a primary-key attribute:
SALES: # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, PRICE, QUANTITY, SALES
CUSTOMER: # CUSTOMER_KEY, CID, CNAME, STATE, CITY
PRODUCT: # PRODUCT_KEY, PID, PNAME, PCNAME
TIME: # TIME_KEY, ORDERDATE, DAY_OF_WEEK, DAY_NUMBER_IN_MONTH, DAY_NUMBER_IN_YEAR, WEEK_NUMBER, MONTH, QUARTER, HOLIDAY_FLAG, FISCAL_YEAR, FISCAL_QUARTER
SALES references CUSTOMER, PRODUCT, and TIME; each of these dimension tables is referenced by SALES.
For detailed information, please refer to handout 1.
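The following is a minimal sketch of the star schema above as SQLite DDL, run through Python's built-in sqlite3 module. The column types are assumptions, because the slides list only column names. The TIME dimension is trimmed to a few of its columns. The fact table's primary key is the combination of the dimension keys it references.

```python
import sqlite3

# Column types are assumptions; the TIME dimension is trimmed for brevity.
STAR_SCHEMA_DDL = """
CREATE TABLE Customer (
    Customer_Key INTEGER PRIMARY KEY,   -- warehouse key
    CID INTEGER, CName TEXT, State TEXT, City TEXT
);
CREATE TABLE Product (
    Product_Key INTEGER PRIMARY KEY,
    PID INTEGER, PName TEXT, PCName TEXT
);
CREATE TABLE Time (
    Time_Key INTEGER PRIMARY KEY,
    OrderDate TEXT, Day_Of_Week INTEGER, Month INTEGER, Quarter INTEGER
);
-- Fact table: its primary key is the set of dimension keys it references.
CREATE TABLE Sales (
    Time_Key     INTEGER REFERENCES Time(Time_Key),
    Product_Key  INTEGER REFERENCES Product(Product_Key),
    Customer_Key INTEGER REFERENCES Customer(Customer_Key),
    Price REAL, Quantity INTEGER, Sales REAL,
    PRIMARY KEY (Time_Key, Product_Key, Customer_Key)
);
"""


def create_star_schema() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.executescript(STAR_SCHEMA_DDL)
    return con


if __name__ == "__main__":
    con = create_star_schema()
    # Each dimension table is "referenced by" the fact table.
    for fk in con.execute("PRAGMA foreign_key_list(Sales)"):
        print(fk[3], "->", fk[2])  # fact column -> referenced dimension table
```

With foreign keys enabled, SQLite rejects a fact row whose keys do not exist in the dimensions. This enforces the "reference / referenced by" links drawn in the diagram.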
Introduction to Dimensional Modeling
Analytical report: a two-dimensional report of January sales by customer state and product category.
Query: list January sales by customer state and product category.
Introduction to Dimensional Modeling
Query based on ER Model:
Select C.State, PC.PCName, SUM(Price * Quantity)
From OrderLine OL, "Order" O, Customer C, Product P, Product_Category PC
Where OL.OID = O.OID and OL.PID = P.PID and O.CID = C.CID and P.PCID = PC.PCID and to_char(O.OrderDate, 'MON') = 'JAN'
Group by C.State, PC.PCName
Join: 5 tables
Query based on Dimensional Model:
Select C.State, P.PCName, SUM(S.Sales)
From Sales S, Customer C, Product P, Time T
Where S.Time_Key = T.Time_Key and S.Product_Key = P.Product_Key and S.Customer_Key = C.Customer_Key and T.Month = 1
Group by C.State, P.PCName
Join: 4 tables
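The sketch below runs the two queries above against Python's built-in sqlite3 module with a few made-up rows. It checks that both layouts return the same January report.

Several adaptations are assumptions made for SQLite:
- "Order" is quoted because ORDER is an SQL keyword.
- strftime('%m', ...) replaces Oracle's to_char(..., 'MON').
- The time dimension stores Month as 1-12, as the time dimension slide describes.

```python
import sqlite3

# Made-up sample data; customers 101-103 follow the slides' example.
CUSTOMERS = [(101, "Jon", "Arizona", "Tucson"), (102, "Tom", "Arizona", "Tucson"),
             (103, "Mark", "Arizona", "Phoenix")]          # (CID, CName, State, City)
CATEGORIES = [(1, "Office"), (2, "Furniture")]            # (PCID, PCName)
PRODUCTS = [(1, "Pen", 1), (2, "Desk", 2)]                # (PID, PName, PCID)
# One line per order: (OID, CID, OrderDate, PID, Price, Quantity)
LINES = [(1, 101, "2024-01-05", 1, 2.0, 10), (2, 102, "2024-01-20", 2, 150.0, 1),
         (3, 103, "2024-01-11", 1, 2.0, 5), (4, 101, "2024-02-02", 2, 150.0, 2)]


def query_er_model() -> list:
    """Normalized (ER) layout: the January report joins 5 tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT, City TEXT);
        CREATE TABLE Product_Category (PCID INTEGER PRIMARY KEY, PCName TEXT);
        CREATE TABLE Product (PID INTEGER PRIMARY KEY, PName TEXT, PCID INTEGER);
        CREATE TABLE "Order" (OID INTEGER PRIMARY KEY, CID INTEGER, OrderDate TEXT);
        CREATE TABLE OrderLine (OID INTEGER, PID INTEGER, Price REAL, Quantity INTEGER);
    """)
    con.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)", CUSTOMERS)
    con.executemany("INSERT INTO Product_Category VALUES (?, ?)", CATEGORIES)
    con.executemany("INSERT INTO Product VALUES (?, ?, ?)", PRODUCTS)
    con.executemany('INSERT INTO "Order" VALUES (?, ?, ?)', [row[:3] for row in LINES])
    con.executemany("INSERT INTO OrderLine VALUES (?, ?, ?, ?)",
                    [(row[0], row[3], row[4], row[5]) for row in LINES])
    # SQLite has no to_char(); strftime('%m', ...) = '01' picks January instead.
    return con.execute("""
        SELECT C.State, PC.PCName, SUM(OL.Price * OL.Quantity)
        FROM OrderLine OL, "Order" O, Customer C, Product P, Product_Category PC
        WHERE OL.OID = O.OID AND OL.PID = P.PID AND O.CID = C.CID
          AND P.PCID = PC.PCID AND strftime('%m', O.OrderDate) = '01'
        GROUP BY C.State, PC.PCName ORDER BY C.State, PC.PCName
    """).fetchall()


def query_star_schema() -> list:
    """Star schema: the same report joins only 4 tables."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer (Customer_Key INTEGER PRIMARY KEY, CID INTEGER,
                               CName TEXT, State TEXT, City TEXT);
        CREATE TABLE Product (Product_Key INTEGER PRIMARY KEY, PID INTEGER,
                              PName TEXT, PCName TEXT);
        CREATE TABLE Time (Time_Key INTEGER PRIMARY KEY, OrderDate TEXT, Month INTEGER);
        CREATE TABLE Sales (Time_Key INTEGER, Product_Key INTEGER, Customer_Key INTEGER,
                            Price REAL, Quantity INTEGER, Sales REAL,
                            PRIMARY KEY (Time_Key, Product_Key, Customer_Key));
    """)
    # Warehouse keys 1, 2, 3, ... are assigned while each dimension is loaded.
    cust_key = {c[0]: k for k, c in enumerate(CUSTOMERS, 1)}
    prod_key = {p[0]: k for k, p in enumerate(PRODUCTS, 1)}
    dates = sorted({row[2] for row in LINES})
    time_key = {d: k for k, d in enumerate(dates, 1)}
    pcname = dict(CATEGORIES)
    con.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?, ?)",
                    [(cust_key[c[0]], *c) for c in CUSTOMERS])
    con.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)",
                    [(prod_key[pid], pid, name, pcname[pcid]) for pid, name, pcid in PRODUCTS])
    con.executemany("INSERT INTO Time VALUES (?, ?, ?)",
                    [(time_key[d], d, int(d[5:7])) for d in dates])
    # SALES = PRICE * QUANTITY is calculated once, at load time.
    con.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
                    [(time_key[day], prod_key[pid], cust_key[cid], price, qty, price * qty)
                     for _oid, cid, day, pid, price, qty in LINES])
    return con.execute("""
        SELECT C.State, P.PCName, SUM(S.Sales)
        FROM Sales S, Customer C, Product P, Time T
        WHERE S.Time_Key = T.Time_Key AND S.Product_Key = P.Product_Key
          AND S.Customer_Key = C.Customer_Key AND T.Month = 1
        GROUP BY C.State, P.PCName ORDER BY C.State, P.PCName
    """).fetchall()


print(query_er_model() == query_star_schema())  # True
```

Both functions return the same rows. The star-schema version avoids the Order and Product_Category joins because PCNAME, the month, and the precomputed SALES are already in the dimension and fact tables.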
Fact and Dimension
Fact table: SALES (# TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY, PRICE, QUANTITY, SALES)
Dimension tables: CUSTOMER, PRODUCT, and TIME, each referenced by SALES
Fact and Dimension
There are two types of tables in dimensional modeling:
Fact table: attributes in fact tables are measurements for analysis, i.e., the contents of reports.
Dimension table: attributes in dimension tables are constraints on the measurements, i.e., the row or column headers of reports.
Facts and Dimensions
| Criteria | Fact attributes | Dimension attributes |
|---|---|---|
| Purpose | Measurements for analysis | Constraints for the measurements |
| Reporting use | Report content | Row or column report headers |
| Data type | Most facts are numeric and additive; some facts are semi-additive or non-additive | Textual, descriptive |
| Size | Larger number of records | Smaller number of records |
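The small sketch below uses made-up rows and assumes PRICE is a unit price. It illustrates why SALES and QUANTITY are additive while PRICE is not:
- Summing unit prices across rows yields a number with no business meaning.
- Dividing two additive totals gives a meaningful average price.

```python
# Made-up fact rows: (customer, unit PRICE, QUANTITY). SALES = PRICE * QUANTITY.
FACT_ROWS = [("Jon", 2.0, 10), ("Tom", 150.0, 1), ("Mark", 2.0, 5)]


def total_sales(rows):
    """SALES is additive: per-row amounts can be summed across any dimension."""
    return sum(price * qty for _, price, qty in rows)


def total_quantity(rows):
    """QUANTITY is additive as well."""
    return sum(qty for _, _, qty in rows)


def naive_price_sum(rows):
    """PRICE is a unit price, so it is non-additive: this total has no business meaning."""
    return sum(price for _, price, _ in rows)


def average_unit_price(rows):
    """A meaningful aggregate for PRICE: a ratio of two additive totals."""
    return total_sales(rows) / total_quantity(rows)


print(total_sales(FACT_ROWS), naive_price_sum(FACT_ROWS), average_unit_price(FACT_ROWS))
# 180.0 154.0 11.25
```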
Facts and Dimensions
How do we identify facts and dimensions? Through requirements analysis:
Analytical requirements: marketing managers want to know sales performance for different product categories in different states.
Information requirements: quantity of products sold, sales amount, product category, and customer state.
ER Model
F1: Calculation
F: refers to a special consideration for fact tables or a special type of fact table.
F1: Calculation
Normalization in an RDB: 1NF, 2NF, 3NF.
Because data in a data warehouse is non-volatile (loaded and queried, not updated), DW design can depart from strict normalization to improve query performance.
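This sketch reads F1 as storing the calculated fact SALES = PRICE * QUANTITY in the fact table. That is an interpretation; the slides do not spell it out. The code computes SALES once while a row is loaded, so queries can simply sum the stored column.

In a normalized RDB this derived column would be redundant. In a non-volatile warehouse, loaded rows are not updated afterwards, so the stored value stays consistent with PRICE and QUANTITY. The names SalesFact and load_sales_fact are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    """One row of the SALES fact table (hypothetical Python representation)."""
    time_key: int
    product_key: int
    customer_key: int
    price: float
    quantity: int
    sales: float  # derived value, stored redundantly to avoid recomputation at query time


def load_sales_fact(time_key: int, product_key: int, customer_key: int,
                    price: float, quantity: int) -> SalesFact:
    """Hypothetical staging step: calculate SALES once, when the row is loaded."""
    return SalesFact(time_key, product_key, customer_key, price, quantity,
                     sales=price * quantity)


facts = [load_sales_fact(1, 1, 1, 2.0, 10), load_sales_fact(1, 2, 2, 150.0, 1)]
# Queries can now sum the stored column instead of multiplying per row.
total = sum(f.sales for f in facts)
print(total)  # 170.0
```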
D1: Slowly changing dimension
D: refers to a special consideration for dimension tables or a special type of dimension table.
D1: Slowly changing dimension
Values of attributes in dimension tables may change over time. For example, a customer may move from one city to another.
| CID | CName | State | City |
|---|---|---|---|
| 101 | Jon | Arizona | Tucson |
| 102 | Tom | Arizona | Tucson |
| 103 | Mark | Arizona | Phoenix |
Tom moved from Tucson to Phoenix.
D1: Slowly changing dimension
There are three ways to handle a slowly changing dimension.
Method 1: Overwrite old values with new values.
Before:
| CID | CName | State | City |
|---|---|---|---|
| 101 | Jon | Arizona | Tucson |
| 102 | Tom | Arizona | Tucson |
| 103 | Mark | Arizona | Phoenix |
After:
| CID | CName | State | City |
|---|---|---|---|
| 101 | Jon | Arizona | Tucson |
| 102 | Tom | Arizona | Phoenix |
| 103 | Mark | Arizona | Phoenix |
D1: Slowly changing dimension
Drawbacks of method 1:
Historical information is completely lost: we can no longer tell that customer 102 once lived in Tucson.
Moreover, when sales are listed by city, all sales of customer 102 are counted as Phoenix sales, even those made while customer 102 lived in Tucson.
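The following is a minimal SQLite sketch of Method 1 using the customer rows from the slides. The sales amounts are made up. For brevity, the fact table references the natural key CID rather than a warehouse key. After the UPDATE, Tom's earlier purchase is reported as Phoenix sales.

```python
import sqlite3


def sales_by_city_after_overwrite() -> dict:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT, City TEXT);
        CREATE TABLE Sales (CID INTEGER REFERENCES Customer(CID), Sales REAL);
        INSERT INTO Customer VALUES (101, 'Jon', 'Arizona', 'Tucson'),
                                    (102, 'Tom', 'Arizona', 'Tucson'),
                                    (103, 'Mark', 'Arizona', 'Phoenix');
        -- Made-up sales; Tom's 100.0 was bought while he still lived in Tucson.
        INSERT INTO Sales VALUES (101, 50.0), (102, 100.0), (103, 70.0);
    """)
    # Method 1: overwrite the old value in place; the Tucson history is gone.
    con.execute("UPDATE Customer SET City = 'Phoenix' WHERE CID = 102")
    rows = con.execute("""
        SELECT C.City, SUM(S.Sales)
        FROM Sales S JOIN Customer C ON S.CID = C.CID
        GROUP BY C.City
    """)
    return dict(rows)


print(sales_by_city_after_overwrite())
```

Tucson is credited with only Jon's 50.0. Phoenix gets 170.0, including Tom's Tucson-era purchase.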
D1: Slowly changing dimension
Method 2: Add a new attribute to record the current value of the changing attribute.
Before:
| CID | CName | State | City |
|---|---|---|---|
| 101 | Jon | Arizona | Tucson |
| 102 | Tom | Arizona | Tucson |
| 103 | Mark | Arizona | Phoenix |
After:
| CID | CName | State | Original City | Current City |
|---|---|---|---|---|
| 101 | Jon | Arizona | Tucson | Tucson |
| 102 | Tom | Arizona | Tucson | Phoenix |
| 103 | Mark | Arizona | Phoenix | Phoenix |
D1: Slowly changing dimension
Drawbacks of method 2:
Only partial historical information (original and current values) is kept.
For example, if customer 102 moved from Tucson to Flagstaff and then to Phoenix, the record for customer 102 would include only Tucson and Phoenix.
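Here is a tiny Python sketch of Method 2, applied to the Tucson, Flagstaff, Phoenix sequence from the slide. The class and function names are hypothetical. Only two city slots exist, so the intermediate city is overwritten.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    """Customer dimension row under Method 2 (one original and one current city)."""
    cid: int
    cname: str
    state: str
    original_city: str
    current_city: str


def move(customer: CustomerRow, new_city: str) -> None:
    """Method 2: keep the original city, overwrite only the current city."""
    customer.current_city = new_city


tom = CustomerRow(102, "Tom", "Arizona", "Tucson", "Tucson")
for city in ["Flagstaff", "Phoenix"]:  # the sequence of moves from the slide
    move(tom, city)

# Flagstaff was overwritten by the second move and is no longer recorded anywhere.
print(tom.original_city, tom.current_city)  # Tucson Phoenix
```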
D1: Slowly changing dimension
Method 3: Add a new record whenever a dimension attribute changes.
| CID | CName | State | City |
|---|---|---|---|
| 101 | Jon | Arizona | Tucson |
| 102 | Tom | Arizona | Tucson |
| 103 | Mark | Arizona | Phoenix |
| 102 | Tom | Arizona | Phoenix |
D1: Slowly changing dimension
Method 3 keeps all of the historical information.
However, is there any problem with it?
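A minimal SQLite sketch shows one way to see the problem. If CID, the natural key, is still the primary key of the customer table, the new record for Tom cannot be stored at all. The helper name add_version is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- CID, the natural key from the operational RDB, is still the primary key.
    CREATE TABLE Customer (CID INTEGER PRIMARY KEY, CName TEXT, State TEXT, City TEXT);
    INSERT INTO Customer VALUES (101, 'Jon', 'Arizona', 'Tucson'),
                                (102, 'Tom', 'Arizona', 'Tucson'),
                                (103, 'Mark', 'Arizona', 'Phoenix');
""")


def add_version(con: sqlite3.Connection, cid: int, cname: str, state: str, city: str) -> bool:
    """Method 3: insert a new record for the changed customer.

    Returns False when the primary key rejects a second row for the same CID.
    """
    try:
        with con:  # commit on success, roll back on error
            con.execute("INSERT INTO Customer VALUES (?, ?, ?, ?)", (cid, cname, state, city))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_version(con, 102, "Tom", "Arizona", "Phoenix"))  # False: CID 102 already exists
```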
D1: Slowly changing dimension
Method 4: Warehouse key + method 3.
A warehouse key is a sequence of non-negative integers that serves as the primary key of a table in the data warehouse.
| CID | CName | State | City |
|---|---|---|---|
| 101 | Jon | Arizona | Tucson |
| 102 | Tom | Arizona | Tucson |
| 103 | Mark | Arizona | Phoenix |
A warehouse key column is added to this table and becomes its primary key.
D1: Slowly changing dimension
Why is a warehouse key needed in a data warehouse?
It solves the slowly changing dimension problem.
Compared with natural keys (i.e., primary keys of tables in the RDB, such as CID of the Customer table), warehouse keys provide better join performance.
D1: Slowly changing dimension
Warehouse key
Primary keys of dimension tables are warehouse keys.
The primary key of a fact table is a combination of the warehouse keys of all or some of its associated dimensions.
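The following is a minimal SQLite sketch of Method 4. It rests on these assumptions:
- A customer's current version is, hypothetically, the row with the highest warehouse key.
- SQLite's INTEGER PRIMARY KEY AUTOINCREMENT stands in for the warehouse key sequence.
- The sales amounts are made up.

Each sale references the customer version that was valid when it happened, so Tom's Tucson purchase stays in Tucson.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Customer_Key is the warehouse key; the natural key CID may now repeat.
    CREATE TABLE Customer (Customer_Key INTEGER PRIMARY KEY AUTOINCREMENT,
                           CID INTEGER, CName TEXT, State TEXT, City TEXT);
    -- The fact table references the warehouse key, not CID.
    CREATE TABLE Sales (Customer_Key INTEGER REFERENCES Customer(Customer_Key),
                        Sales REAL);
    INSERT INTO Customer (CID, CName, State, City) VALUES
        (101, 'Jon', 'Arizona', 'Tucson'),
        (102, 'Tom', 'Arizona', 'Tucson'),
        (103, 'Mark', 'Arizona', 'Phoenix');
""")


def current_key(con, cid):
    """Hypothetical rule: a customer's current version is the one with the highest key."""
    return con.execute("SELECT MAX(Customer_Key) FROM Customer WHERE CID = ?",
                       (cid,)).fetchone()[0]


def record_sale(con, cid, amount):
    """Attach a sale to the customer version that is current when the sale happens."""
    con.execute("INSERT INTO Sales VALUES (?, ?)", (current_key(con, cid), amount))


def move_customer(con, cid, new_city):
    """Method 3 on top of warehouse keys: add a new row instead of overwriting."""
    con.execute("""
        INSERT INTO Customer (CID, CName, State, City)
        SELECT CID, CName, State, ? FROM Customer WHERE Customer_Key = ?
    """, (new_city, current_key(con, cid)))


def sales_by_city(con):
    return dict(con.execute("""
        SELECT C.City, SUM(S.Sales)
        FROM Sales S JOIN Customer C ON S.Customer_Key = C.Customer_Key
        GROUP BY C.City
    """))


# Made-up sales: Tom buys 100.0 in Tucson, moves to Phoenix, then buys 30.0 there.
record_sale(con, 101, 50.0)
record_sale(con, 102, 100.0)
record_sale(con, 103, 70.0)
move_customer(con, 102, "Phoenix")
record_sale(con, 102, 30.0)
print(sales_by_city(con))
```

Grouping by CID instead of City still gives Tom's full history (130.0 across both versions). This illustrates why the warehouse key keeps all the information that Method 3 aims for.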
D1: Slowly changing dimension
Notation: # marks a primary-key attribute. Each dimension's primary key is its warehouse key (# CUSTOMER_KEY, # PRODUCT_KEY, # TIME_KEY), and the primary key of SALES is the combination # TIME_KEY, # PRODUCT_KEY, # CUSTOMER_KEY.
D2: Time Dimension
D2: Time Dimension
A data warehouse needs an explicit time dimension table instead of just a time attribute (e.g., ORDERDATE).
Besides the time attribute, the time dimension table includes the following additional attributes:
Day_of_week (1-7); Day_number_in_month (1-31); Day_number_in_year (1-366); Week_number (1-53); Month (1-12); Quarter (1-4); Holiday_flag (y/n); Fiscal_quarter; Fiscal_year
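The sketch below shows one way to generate the time dimension rows in Python with the standard datetime module. Several choices are hypothetical assumptions, because the slides define none of them:
- The holiday set.
- A fiscal year that starts in July.
- The ISO 8601 convention for day_of_week and week_number.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical inputs: the slides define neither a holiday calendar nor a fiscal year.
HOLIDAYS = {date(2024, 1, 1), date(2024, 7, 4), date(2024, 12, 25)}
FISCAL_YEAR_START_MONTH = 7  # assumption: the fiscal year starts on July 1


@dataclass(frozen=True)
class TimeRow:
    time_key: int             # warehouse key
    order_date: date
    day_of_week: int          # 1 = Monday ... 7 = Sunday (ISO convention)
    day_number_in_month: int  # 1-31
    day_number_in_year: int   # 1-366
    week_number: int          # ISO week, 1-53
    month: int                # 1-12
    quarter: int              # 1-4
    holiday_flag: str         # 'y' or 'n'
    fiscal_year: int          # named after the calendar year in which it ends
    fiscal_quarter: int       # 1-4


def build_time_dimension(start: date, end: date) -> list[TimeRow]:
    """Precompute one row per calendar day so queries never redo calendar math."""
    rows = []
    day, key = start, 1
    while day <= end:
        months_into_fy = (day.month - FISCAL_YEAR_START_MONTH) % 12
        fy_ends_next_year = FISCAL_YEAR_START_MONTH > 1 and day.month >= FISCAL_YEAR_START_MONTH
        rows.append(TimeRow(
            time_key=key,
            order_date=day,
            day_of_week=day.isoweekday(),
            day_number_in_month=day.day,
            day_number_in_year=day.timetuple().tm_yday,
            week_number=day.isocalendar()[1],
            month=day.month,
            quarter=(day.month - 1) // 3 + 1,
            holiday_flag="y" if day in HOLIDAYS else "n",
            fiscal_year=day.year + 1 if fy_ends_next_year else day.year,
            fiscal_quarter=months_into_fy // 3 + 1,
        ))
        day, key = day + timedelta(days=1), key + 1
    return rows


time_dim = build_time_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(time_dim))  # 366 (2024 is a leap year)
```

The rows would be loaded once into the TIME table. A query can then filter on MONTH, QUARTER, or FISCAL_QUARTER directly instead of deriving them from ORDERDATE.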
D2: Time Dimension
A time dimension can:
Save computation effort and improve query performance.
Hide complex calendar calculations from end users of the data warehouse.
D3: Snowflake
D3: Snowflake
Snowflake structure: the PRODUCT dimension is split into two tables.
PRODUCT: # PRODUCT_KEY, PID, PNAME, PRODUCT_CATEGORY_KEY
PRODUCT_CATEGORY: # PRODUCT_CATEGORY_KEY, PCID, PCNAME
SALES references CUSTOMER, TIME, and PRODUCT; PRODUCT in turn references PRODUCT_CATEGORY.
D3: Snowflake
Snowflake structures should be avoided in data warehouse design.
Tradeoff of avoiding snowflakes:
Advantage: improves query performance (fewer joins).
Disadvantage: requires more storage space (for example, PCNAME is repeated in every PRODUCT row).
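The following minimal SQLite sketch, with made-up rows, illustrates the tradeoff. Both layouts answer "sales by product category" identically. The snowflake needs one more join, while the star repeats PCNAME in every product row. The table names Product_Snowflake and Product_Star are hypothetical, chosen so both layouts can share one database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Snowflake: the category lives in its own table, referenced by PRODUCT.
    CREATE TABLE Product_Category (Product_Category_Key INTEGER PRIMARY KEY,
                                   PCID INTEGER, PCName TEXT);
    CREATE TABLE Product_Snowflake (Product_Key INTEGER PRIMARY KEY, PID INTEGER, PName TEXT,
                                    Product_Category_Key INTEGER
                                        REFERENCES Product_Category(Product_Category_Key));
    -- Star: PCName is copied into every product row (more storage, one join fewer).
    CREATE TABLE Product_Star (Product_Key INTEGER PRIMARY KEY, PID INTEGER, PName TEXT,
                               PCName TEXT);
    CREATE TABLE Sales (Product_Key INTEGER, Sales REAL);

    -- Made-up rows.
    INSERT INTO Product_Category VALUES (1, 10, 'Office'), (2, 20, 'Furniture');
    INSERT INTO Product_Snowflake VALUES (1, 1, 'Pen', 1), (2, 2, 'Stapler', 1), (3, 3, 'Desk', 2);
    INSERT INTO Product_Star VALUES (1, 1, 'Pen', 'Office'), (2, 2, 'Stapler', 'Office'),
                                    (3, 3, 'Desk', 'Furniture');
    INSERT INTO Sales VALUES (1, 20.0), (2, 15.0), (3, 150.0);
""")

# Sales by category needs an extra join through Product_Category in the snowflake.
SNOWFLAKE_QUERY = """
    SELECT PC.PCName, SUM(S.Sales)
    FROM Sales S, Product_Snowflake P, Product_Category PC
    WHERE S.Product_Key = P.Product_Key
      AND P.Product_Category_Key = PC.Product_Category_Key
    GROUP BY PC.PCName ORDER BY PC.PCName
"""
STAR_QUERY = """
    SELECT P.PCName, SUM(S.Sales)
    FROM Sales S, Product_Star P
    WHERE S.Product_Key = P.Product_Key
    GROUP BY P.PCName ORDER BY P.PCName
"""

snowflake_result = con.execute(SNOWFLAKE_QUERY).fetchall()
star_result = con.execute(STAR_QUERY).fetchall()
print(snowflake_result == star_result)  # True: same answer, one join fewer in the star
```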