l16 l17 datawarehouse

47
Rushdi Shams, Dept of CSE, KUET Database Systems Database Systems Data Warehousing Data Warehousing Version 1.0 Version 1.0 1

Upload: rushdi-shams

Post on 19-Jan-2015

132 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Database SystemsDatabase Systems

Data WarehousingData WarehousingVersion 1.0Version 1.0

1

Page 2: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Advent of Data WarehousingThe Advent of Data Warehousing

The existing database models were not The existing database models were not suitable to meet the requirements.suitable to meet the requirements.

The requirements can be categorized into The requirements can be categorized into two-two-

1.1. Operational Use Operational Use 2.2. Decision Support UseDecision Support Use

2

Page 3: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Operational UseOperational Use Requires precise, accurate, and Requires precise, accurate, and instantinstant picture of picture of

databasedatabase Day to day basis business-Day to day basis business-1.1. Customer comesCustomer comes2.2. Orders partsOrders parts

1.1. Search the partsSearch the parts2.2. Book/purchase the partsBook/purchase the parts3.3. Add datesAdd dates

1.1. Bank transactions on the purchase/ bookingBank transactions on the purchase/ booking2.2. InvoiceInvoice

3

Page 4: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Operational UseOperational Use Customer-company direct interactionCustomer-company direct interaction All the information are processed All the information are processed

instantaneously (or almost instantaneously)instantaneously (or almost instantaneously)

4

Page 5: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Decision Support UseDecision Support Use

Operational use magnifies the scope-Operational use magnifies the scope-Which customer, where he lives, what is his Which customer, where he lives, what is his phone number, which part he bought, how phone number, which part he bought, how much he paid, what was the date, bla bla bla…much he paid, what was the date, bla bla bla…

Decision support use narrows the scope-Decision support use narrows the scope-I need only the business related issues- which I need only the business related issues- which customer, which part he bought, how much customer, which part he bought, how much he paid and what was the datehe paid and what was the date

5

Page 6: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Decision Support UseDecision Support Use

… … & the benefits are-& the benefits are- In december, the company may need to stock In december, the company may need to stock

DDR RAM more than HDDDDR RAM more than HDD SATA HDDs are more sold than PATA HDDsSATA HDDs are more sold than PATA HDDs Mr. X is our honourable customer who bought Mr. X is our honourable customer who bought

most of the RAMs and Mr. Y is our honourable most of the RAMs and Mr. Y is our honourable customer who bought most of the SATA HDDscustomer who bought most of the SATA HDDs

6

Page 7: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

And The War Begins…And The War Begins…

So, the conflict between lightspeed So, the conflict between lightspeed applications (OLTP) and slog future predictions applications (OLTP) and slog future predictions led an advent of data warehousing.led an advent of data warehousing.

7

Page 8: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Relational DatabasesRelational Databases

Too granular, too many little piecesToo granular, too many little pieces Processing takes longer time for larger Processing takes longer time for larger

transactions by joining those little piecestransactions by joining those little pieces Very effective for Front End applications that Very effective for Front End applications that

are accessed by too many people too are accessed by too many people too frequentlyfrequently

Requires less hardware specificationRequires less hardware specification

8

Page 9: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Data warehousingData warehousing

Processes large amount of informationProcesses large amount of information Too less users (basically the owners)Too less users (basically the owners) Mainly for reporting and analysisMainly for reporting and analysis Hardware requirements are hugeHardware requirements are huge

9

Page 10: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The relation between themThe relation between them

Data Warehousing is simplest form of Data Warehousing is simplest form of relational databaserelational database

Try to only add data and remove data… Try to only add data and remove data… because most often changing requires huge because most often changing requires huge data processingdata processing

And you often do mistake in Keys for just two And you often do mistake in Keys for just two records, in this case you are dealing with records, in this case you are dealing with millions of records- so, think about data millions of records- so, think about data modificationsmodifications

10

Page 11: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The relation between themThe relation between them

The one-many / many-many / many-one The one-many / many-many / many-one relations and key constraints of relational relations and key constraints of relational model is still present in data warehousingmodel is still present in data warehousing

11

Page 12: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Dimensional Data ModelThe Dimensional Data Model

So, if data warehouse needs a different data So, if data warehouse needs a different data model rather than relational model, what that model rather than relational model, what that would be?would be?

The answer is dimensional data modelThe answer is dimensional data model

12

Page 13: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Dimensional Data ModelThe Dimensional Data Model

Contains-Contains-1.1. FactsFacts2.2. DimensionsDimensions Fact table contains transactions. For Fact table contains transactions. For

example, invoices of all the customers for example, invoices of all the customers for the last 5 years.the last 5 years.

The dimension tables describe the fact table.The dimension tables describe the fact table.

13

Page 14: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Dimensional Data ModelThe Dimensional Data ModelStatic Data

Dynamic Data

14

Page 15: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Star SchemaThe Star Schema

The most effective approach to model data The most effective approach to model data using dimensional data model is the using dimensional data model is the Star Star SchemaSchema

15

Page 16: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Star SchemaThe Star Schema

16

Page 17: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Star Schema: Equivalent DiagramThe Star Schema: Equivalent Diagram

17

Page 18: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Star Schema: PropertiesThe Star Schema: Properties

So, a star schema contains a fact table- which So, a star schema contains a fact table- which is robust as the time goes by, very dynamic, is robust as the time goes by, very dynamic, changes all the timechanges all the time

A star schema contains dimension tables- A star schema contains dimension tables- which are static, changes very little as the which are static, changes very little as the time goes bytime goes by

Star schema aids queries to join a bulky fact Star schema aids queries to join a bulky fact table with dimension tables to be simple and table with dimension tables to be simple and not time complexnot time complex

18

Page 19: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Snowflake SchemaThe Snowflake Schema

Normalized star schemaNormalized star schema Only the dimensions are normalizedOnly the dimensions are normalized The result is a fact table connected directly The result is a fact table connected directly

with some dimension tables and some with some dimension tables and some dimension tables connected to other dimension tables connected to other dimension tablesdimension tables

19

Page 20: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Snowflake SchemaThe Snowflake Schema

Fact Table

NormalizedDimension

Dimension

20

Page 21: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Snowflake Schema: Equivalent The Snowflake Schema: Equivalent ViewView

21

Page 22: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The ProblemThe Problem Not too many tables but too many layersNot too many tables but too many layers The most used Relational algebra in The most used Relational algebra in

dimensional database is dimensional database is JoinJoin Too many tables in joins, too many overheads.Too many tables in joins, too many overheads. There are not many tables here There are not many tables here But too many layers, joining one table But too many layers, joining one table

requires joining other related tables requires joining other related tables And if one of those tables (Fact) have trillions And if one of those tables (Fact) have trillions

of data, you are dead!of data, you are dead!

22

Page 23: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The ProblemThe Problem

If the SALE fact table has 1 million records, and all dimensions contain 10 records each, a Cartesian product would return 106 multiplied by 109 records. That makes for 1015 records

23

Page 24: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The SolutionThe Solution

Convert the snowflake schema into star Convert the snowflake schema into star schema.schema.

24

Page 25: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The SolutionThe Solution

25

Page 26: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The SolutionThe Solution

a join occurs between one fact table and six dimensional tables. That is a Cartesian product of 106 multiple by 106, resulting in 1012 records returned.

26

Page 27: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The DifferenceThe Difference

The difference between 1012 and 1015 is three decimals.

Three decimals is not just three zeroes and thus 1,000 records. The difference is actually 1,000,000,000,000,000 – 1,000,000,000,000 = 999,000,000,000,000.

27

Page 28: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Types of Dimension TablesTypes of Dimension Tables

Dimension tables showed so far are inadequateDimension tables showed so far are inadequate Typically, there are some conventions for Typically, there are some conventions for

dimension tables.dimension tables. Such as dates and locations are two common Such as dates and locations are two common

dimension tables in data warehouses.dimension tables in data warehouses. Why?? Most businesses have two common Why?? Most businesses have two common

issues- date of a transaction, place of shipment/ issues- date of a transaction, place of shipment/ deliverydelivery

28

Page 29: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Types of Dimension Tables: DatesTypes of Dimension Tables: Dates

29

Page 30: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Types of Dimension Tables: DatesTypes of Dimension Tables: Dates

30

Page 31: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Types of Dimension Tables: LocationsTypes of Dimension Tables: Locations

Locations, states, country, continent, etcLocations, states, country, continent, etc

31

Page 32: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Let’s Create a Data Let’s Create a Data Warehouse ModelWarehouse Model

32

Page 33: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

The Relational ModelThe Relational Model

33

Page 34: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 1Step 1

Identify the Fact tableIdentify the Fact table The Fact table contains (mostly) transactions The Fact table contains (mostly) transactions

that occur day-to-day basis/ that are related that occur day-to-day basis/ that are related with money/ anything that is the main with money/ anything that is the main purpose of a businesspurpose of a business

34

Page 35: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 1: Finding the Fact TableStep 1: Finding the Fact Table

35

Page 36: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 1Step 1

So, our fact table would be (in this case) So, our fact table would be (in this case) RoyaltyRoyalty

36

Page 37: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 2: Find Dimension TablesStep 2: Find Dimension Tables

Find the tables that are static, not dynamic… Find the tables that are static, not dynamic… dynamic one is the Fact table.dynamic one is the Fact table.

We will take a look at both the static We will take a look at both the static (dimension) tables and dynamic (fact) tables (dimension) tables and dynamic (fact) tables when we will finish step 3when we will finish step 3

37

Page 38: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 3Step 3

Develop a snowflake schema with the fact and Develop a snowflake schema with the fact and dimension tablesdimension tables

38

Page 39: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 3: Snowflake SchemaStep 3: Snowflake Schema

39

Page 40: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 3: Snowflake SchemaStep 3: Snowflake Schema

40

Page 41: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 4Step 4

Develop a star schema by denormalizing the Develop a star schema by denormalizing the snowflake schemasnowflake schema

41

Page 42: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 4: Star SchemaStep 4: Star Schema

42

Page 43: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Step 4: Star SchemaStep 4: Star Schema

43

Page 44: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Surrogate Key: Important Key in Data Surrogate Key: Important Key in Data WarehouseWarehouse

A customer is recognized in table 1 by customer nameA customer is recognized in table 1 by customer name The same person in table 2 is recognized by telephone The same person in table 2 is recognized by telephone

numbernumber The same person in table 3 is recognized by SSN numberThe same person in table 3 is recognized by SSN number If you have to make table 1, 2, 3 as dimension tables, If you have to make table 1, 2, 3 as dimension tables,

then the fact table will not be able to recognize the then the fact table will not be able to recognize the same person having 3 foreign keys from those tablessame person having 3 foreign keys from those tables

44

Page 45: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Surrogate Key: Important Key in Data Surrogate Key: Important Key in Data WarehouseWarehouse

45

Page 46: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

Understanding the Fact TableUnderstanding the Fact Table

Facts are numeric valuesFacts are numeric values Facts are not the foreign key fieldsFacts are not the foreign key fields The foreign keys are used to provide more The foreign keys are used to provide more

detail with a fact- which are the main focus of detail with a fact- which are the main focus of the businessthe business

46

Page 47: L16 l17  datawarehouse

Rushdi Shams, Dept of CSE, KUET

ReferenceReference

Beginning Database Design by Gavin Beginning Database Design by Gavin Powell, Wrox Publications, 2005Powell, Wrox Publications, 2005

47