11-dmdw11-dw curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/dmdw11.pdf · facts and dimensions...

53
Dimensional Modeling Dimensional Modeling Prof.dr.ing. Florin Radulescu Universitatea Politehnica din Bucureşti

Upload: vodat

Post on 31-Jan-2018

239 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Dimensional ModelingDimensional Modeling

Prof.dr.ing. Florin Radulescu

Universitatea Politehnica din Bucureşti

Page 2: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Facts and dimensions

�Steps in dimensional modeling

�Modeling example

Summary

Road Map

2

Florin Radulescu, Note de curs

DMDW-11

�Summary

Page 3: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

In the previous lesson we saw that:

�A large enterprise data warehouse will consist of 20 or more very similar-looking data marts, with similar dimensional

Facts and dimensions

3

Florin Radulescu, Note de curs

DMDW-11

data marts, with similar dimensional models.

�Each data mart may contain several fact tables, each with 5 to 15 dimension tables.

�Many of these dimension tables will be shared between several fact tables.

Page 4: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�A fact table represents a business process and contains the values for the main measurements describing that process.

�For a sale business process, this table

Facts

4

Florin Radulescu, Note de curs

DMDW-11

�For a sale business process, this table contains for example quantity sold, total price, total cost, total gross profit.

�The attributes of a fact table may be additive, semi-additive or non-additive

Page 5: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• Additive measures can be aggregated across all dimensions.

• All examples in the previous slide are additive.

Additive measures

5

Florin Radulescu, Note de curs

DMDW-11

additive.

• For example, SUM(total price) is meaningful on all dimensions: time, location, store, customer, etc.

Page 6: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Semi-Additive measures can be aggregated across some dimensions but not all.

�Here are for example periodic measurements:�account balance for a bank account or

�inventory level for a retail chain.

Semi-Additive

6

Florin Radulescu, Note de curs

DMDW-11

�inventory level for a retail chain.

� In the first case an average may be computed for knowing the average daily balance but the sum of daily balances is not meaningful.

� In the second case, inventory level is additive on product and warehouse but not across time: the sum of yesterday and today inventory level for a given product is not a meaningful value.

Page 7: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• Non-additive measures cannot be aggregated across all/any dimension.

• A classical example is the unit price.

• Considering a retail company, the sum of unit prices along any dimension (product, customer,

Non-additive measures

7

Florin Radulescu, Note de curs

DMDW-11

prices along any dimension (product, customer, location, etc.) is not meaningful.

• For that reason, if these values can be computed based on additive measures, the non-additive measures are not stored in the fact tables.

• For our example, the unit price can always be computed dividing total cost by the quantity sold.

Page 8: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�The level of detail of o record in a fact table is called the “grain” of the table.

�Besides business process measurements, the fact table contains also foreign keys for

Grain

8

Florin Radulescu, Note de curs

DMDW-11

the fact table contains also foreign keys for all the dimension tables and possibly some pseudo-foreign keys for some degenerate dimensions.

�More details on this topic in the next paragraphs of this lesson.

Page 9: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

In [CS680, 2004], dimension tables are characterized as follows:

�Represent the who, what, where, when and how of a measurement/artifact

�Represent real-world entities not business

Dimensions

9

Florin Radulescu, Note de curs

DMDW-11

�Represent real-world entities not business processes

�Give the context of a measurement (subject)

Example: in a retail company DW, the Sales fact table can be linked with dimensions like Location (Where), Time (When), Product (What), Customer (Who), Sales Channel (How).

Page 10: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

The Dimension Attributes are the columns of the dimension table. [Wikipedia] lists some features for these attributes:

�Verbose - labels consisting of full words,

�Descriptive,

Dimensions

10

Florin Radulescu, Note de curs

DMDW-11

�Descriptive,

�Complete - no missing values,

�Discretely valued - only one value per row in dimensional table,

�Quality assured - no misspelling, no impossible values.

Page 11: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Star scheme

11

Florin Radulescu, Note de curs

DMDW-11

Page 12: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Each fact table is surrounded by several linked dimension tables, as in Figure 1.

�Because of its appearance, such a construction is called a ‘star scheme’.

�A star scheme has several advantages:

Advantages

12

Florin Radulescu, Note de curs

DMDW-11

�A star scheme has several advantages:�Is easy to understand. Graphic representations have

almost always this advantage

�Provide better performance: data is de-normalized in fact and dimension tables, so for obtaining a query result needs only the joins between the fact table and the implied dimensions

�Is extensible. Attributes and dimensions may be added easily

Page 13: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

SQL Query

SELECT P.Name, SUM(S.Sales), . . .

FROM Sales S, Date D, Product P, Location L, Promotion R

WHERE S.Date_Id = D.Date_Id // join condition

AND S.Poduct_Id = P.Product_Id

AND S.Location_Id = L.Location_Id

13

Florin Radulescu, Note de curs

DMDW-11

AND S.Location_Id = L.Location_Id

AND S.Promotion_Id = R.Promotion_Id

// additional condition

AND D.Month='JUN' AND D.Year='2012' AND L.Country_Name='ROU'

GROUP BY P.Product_Id, P.Name // P.Name is needed because

// is in the SELECT clause

Page 14: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Sometimes the dimension tables are normalized: each dimension is stored as a set of tables.

�In this case, the scheme is called ‘Snow-

Snow-flake schemes

14

Florin Radulescu, Note de curs

DMDW-11

�In this case, the scheme is called ‘Snow-flake scheme’

Page 15: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Example

15

Florin Radulescu, Note de curs

DMDW-11

Page 16: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Facts and dimensions

�Steps in dimensional modeling

�Modeling example

Summary

Road Map

16

Florin Radulescu, Note de curs

DMDW-11

�Summary

Page 17: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�There are four steps in dimensional modeling design.

�These steps must be performed in particular order and every review of a step

The four step approach

17

Florin Radulescu, Note de curs

DMDW-11

particular order and every review of a step triggers the review of all subsequent steps.

Page 18: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�These four steps are:

1. Select the business processes

2. Declare the grain

3. Choose the dimensions

The four step approach

18

Florin Radulescu, Note de curs

DMDW-11

3. Choose the dimensions

4. Identify the facts

Page 19: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

� An organization has several departments and carries out several business processes.

�Selecting the business process does not

Select the business processes

19

Florin Radulescu, Note de curs

DMDW-11

�Selecting the business process does not refer to one structural department of the organization but to a process carried out by one or more departments together.

Page 20: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�In a general company for example some main business processes are:

�Supply chain management

�Orders,

Select the business processes

20

Florin Radulescu, Note de curs

DMDW-11

�Orders,

�Shipments,

�Invoicing,

�Stocking and inventory

�General ledger

Page 21: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

� This approach also ensures that data contains no duplicate data.

� If a department approach in structuring the data warehouse is used, same data may be used by several departments and must be presented redundantly in the DW.

� For example, inventory data are used for supply chain management but also for production management in a car

No duplicate data

21

Florin Radulescu, Note de curs

DMDW-11

management but also for production management in a car factory.

� A data warehouse organized based on departmental structure will duplicate inventory data but organizing it on business processes will avoid redundancy and both departments –supply management and production – will use the same data.

Page 22: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�No duplicate data means also that:

�Consistency is better preserved; redundancy

is known for inducing consistency problems

(as the update anomaly)

No duplicate data

22

Florin Radulescu, Note de curs

DMDW-11

(as the update anomaly)

�Data are published once and this single view

is used as published for decision support in all

departments and activities.

Page 23: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

� Each line in a fact table is a grain in our data warehouse. In step 2 of the dimensional design process the level of detail for these lines / grains must be defined.

� Thinking at a retail company with registered users (like Metro or Selgros), for the POS sales business

Step 2: Declare the grain

23

Florin Radulescu, Note de curs

DMDW-11

Metro or Selgros), for the POS sales business process, a grain may be:

1. An individual line item on a customer’s retail sales ticket or invoice, as measured by a scanner device (in that case the same item may be on several lines in the same ticket/invoice because the quantity was greater than one and each product was scanned individually).

Page 24: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

2. The same significance as above but lines containing the same part number are summarized in a single line.

3. A daily reunion of the sales tickets of a customer containing items and prices.

4. A sales ticket / invoice for a customer. The same

Step 2: Declare the grain

24

Florin Radulescu, Note de curs

DMDW-11

4. A sales ticket / invoice for a customer. The same customer may have several sales tickets each day.

5. A daily summary on the sales tickets of a customer, containing only total sales amount.

6. A weekly summary on the sales tickets of a customer containing only total sales amount.

Page 25: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• In the first three cases data on what products

bought each customer is preserved in the

DW, so there is a low detail lever allowing

much more queries and much more ‘slicing

Discussion

25

Florin Radulescu, Note de curs

DMDW-11

much more queries and much more ‘slicing

and dicing’ actions: reports, analysis, etc.

• For the last three cases these data are

removed, the level of detail is higher and only

a part of the above actions may be

performed.

Page 26: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

A key idea in choosing the granularity level is emphasized in [Kimball, Ross, 2002]:

� “Preferably you should develop dimensional models for the most atomic information captured by a business process. Atomic data is the most detailed information collected; such data cannot be subdivided

Discussion

26

Florin Radulescu, Note de curs

DMDW-11

information collected; such data cannot be subdivided further.”

and

� “A data warehouse almost always demands data expressed at the lowest possible grain of each dimension not because queries want to see individual low-level rows, but because queries need to cut through the details in very precise ways.”

Page 27: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Some features of atomic data listed in Kimball & Ross book are:

�Is highly dimensional,

�Being highly dimensional, detailed and atomic

Atomic data features

27

Florin Radulescu, Note de curs

DMDW-11

�Being highly dimensional, detailed and atomic

fact measurement allow more dimensions and

so data may be drilled in more ways,

�Dimensional approach is favored by atomic

data, each extra dimension being easily

added to the star schemes,

Page 28: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Features – cont.:�Provides maximum analytic flexibility,�Detailed data allow more ad hoc queries,�Low level grain does not prohibit adding also

summary high level grain in the DW for speeding

Atomic data features

28

Florin Radulescu, Note de curs

DMDW-11

summary high level grain in the DW for speeding up frequent queries and reports.

�Note that declaring the grain is a critical step. If later the granularity choice is proved to be wrong, the process must go back to step 2 for re-declaring the grain correctly, and after that steps 3 and 4 must be run again.

Page 29: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Knowing the grain, dimensions can be determined easily: each dimension represents a possible fact table line description.

�Example of common dimensions used in a sales data warehouse:�Product,

Step 3: Choose the dimensions

29

Florin Radulescu, Note de curs

DMDW-11

�Product,

�Customer,

�Date,

�Ticket number

�Status,

�Store,

�Salesperson

�Promotion

Page 30: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• Dimensions can be found by asking ourselves haw can we describe a single line in the fact table.

• In the above example, a line in the fact

Step 3: Choose the dimensions

30

Florin Radulescu, Note de curs

DMDW-11

• In the above example, a line in the fact table represents a single line on a sales ticket.

• This line is about selling a product to a customer at a given date and in a given store possibly under a promotion.

Page 31: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• The line is on a ticket having a number, a status (for example paid by credit card) and has been made by a particular salesperson.

Step 3: Choose the dimensions

31

Florin Radulescu, Note de curs

DMDW-11

• The number of attributes in a dimension table is not so small.

• This lesson presents an example showing that each dimension may have several tens of attributes.

Page 32: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Every line in a fact table must contain some attribute values.

� These attributes represents the measures assigned to that business process that must be determined at this step.

� In the case of a star scheme containing data on POS

Step 4: Identify the facts

32

Florin Radulescu, Note de curs

DMDW-11

� In the case of a star scheme containing data on POS retail sales in a store chain, possible attributes of the fact table are:�Quantity sold – additive value

�Total line value amount – additive value

�Line cost amount – additive value

�Line profit amount – additive value

�Unit price – not an additive value

Page 33: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• The product sold, store, date, time, customer, promotion, sales ticket and other data are also identified by the linked records in the corresponding dimension

Discussion

33

Florin Radulescu, Note de curs

DMDW-11

records in the corresponding dimension tables.

• In the fact table are stored only information specific to an association of an instance for each dimension.

Page 34: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

• Additive measures are preferred. So unit price, which is not an additive value, will be removed because it can be computed by division from the total line value amount and quantity sold.

• Redundant data can be stored in a fact table if

Discussion

34

Florin Radulescu, Note de curs

DMDW-11

• Redundant data can be stored in a fact table if they are additive or semi-additive.

• For example, Line profit amount may be computed by subtracting the cost amount from the value amount.

• The presence of these redundant values is allowed for speeding up processing.

Page 35: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Facts and dimensions

�Steps in dimensional modeling

�Modeling example

Summary

Road Map

35

Florin Radulescu, Note de curs

DMDW-11

�Summary

Page 36: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

� A retail sales modeling example is presented in [Kimball, Ross, 2002] for a store chain.

�Each store has several departments and sales several tens of thousands items (called stock keeping units – SKU).

Modeling example

36

Florin Radulescu, Note de curs

DMDW-11

keeping units – SKU).

�Each SKU has either a universal product code imprinted by the manufacturers or a local code for bulk goods (for example agricultural products -vegetables and fruits, meat, bakery, etc.).

�Package variation of a product is another SKU and by consequence has a different code.

Page 37: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Some products are sold under some promotions. There are four types of promotions in the modeling example cited:

�Temporary price reductions,

Modeling example

37

Florin Radulescu, Note de curs

DMDW-11

�Temporary price reductions,

�Ads in newspapers and newspaper inserts,

�Displays in the store (end-aisle displays

included),

�Coupons.

Page 38: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Step 1: Select the business process

�The most important business process in a retail company is customer purchases as captured by the POS system, so the

Step 1

38

Florin Radulescu, Note de curs

DMDW-11

captured by the POS system, so the business process modeled is “POS retail sales”.

Page 39: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Step 2: Declare the grain

�As seen earlier, in dimensional modeling is preferably to store atomic information, data collected at the POS location and not

Step 2

39

Florin Radulescu, Note de curs

DMDW-11

data collected at the POS location and not summarized or aggregated data based on POS transactions.

�So the grain in this modeling example will be an individual line on the sales ticket generated by the POS.

Page 40: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Step 3: Choose the dimensions

�As presented earlier, a line on a sales ticket is about selling a product to a customer at a given date and in a given

Step 3

40

Florin Radulescu, Note de curs

DMDW-11

customer at a given date and in a given store possibly under a promotion.

�The line is on a ticket having a number and is made by a particular salesperson.

Page 41: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Step 3 – star scheme

Product Product_Key (PK) Product attributes

Store Store_Key (PK) Store attributes

POS_Sales Product_Key (FK) Date_Key (FK) Store_Key (FK) SP_Key (FK)

41

Florin Radulescu, Note de curs

DMDW-11

Date Date_Key (PK) Date attributes

Promotion Promotion_Key (PK) Promotion attributes

Salesperson SP_Key (PK) SP attributes

SP_Key (FK) Promotion_Key (FK) Ticket_number (FK) Fact table attributes

Page 42: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�From this definition of the grain, the dimensions are:

�Product

�Date

Step 3 - details

42

Florin Radulescu, Note de curs

DMDW-11

�Date

�Store

�Promotion

�Salesperson

�Sales ticket

Page 43: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�The sales ticket is a so-called “degenerate dimension”.

�Such degenerate dimensions come from operational control numbers: ticket

Degenerate dimensions

43

Florin Radulescu, Note de curs

DMDW-11

operational control numbers: ticket number, order number, invoice number, and so on.

�These dimensions are empty - without other attributes.

Page 44: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�No associated dimension table is present in the

star scheme, but they are necessary in some

queries, for example for finding products sold

together in the same sales basket.

Degenerate dimensions

44

Florin Radulescu, Note de curs

DMDW-11

�For these dimensions only the pseudo-foreign

key associated with the dimension is present as

attribute in the fact table.

�Every dimension has a surrogate primary key

and this key is also contained as foreign key in

the fact table.

Page 45: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�The candidate fact attributes:

�Quantity_sold – additive value

�Line_amount – additive value

�Cost_amount – additive value

Step 4: Identify the facts

45

Florin Radulescu, Note de curs

DMDW-11

�Cost_amount – additive value

�Profit amount – additive value

�Unit_price – not an additive value

Page 46: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Star scheme again

Product Product_Key (PK) Product attributes

Date

Store Store_Key (PK) Store attributes

POS_Sales Product_Key (FK) Date_Key (FK) Store_Key (FK) SP_Key (FK)

Promotion_Key (FK) Ticket_number (FK) Quantity_sold Line_amount

46

Florin Radulescu, Note de curs

DMDW-11

Date Date_Key (PK) Date attributes

Promotion Promotion_Key (PK) Promotion attributes

Salesperson SP_Key (PK) SP attributes

Line_amount Cost_amount Profit amount

Page 47: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�Because Unit_price is not additive, this attribute will not be stored in the fact table but can always be computed by dividing Line_amount to Quantity_sold.

Discussion

47

Florin Radulescu, Note de curs

DMDW-11

�This applies to percentages and ratios, (almost all are non-additive values) and and in these cases the numerator and denominator must be stored in the fact table.

Page 48: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Dimension attributes example

Date Key (PK) Date Full Date Description Day of Week Day Number in Epoch Week Number in Epoch Month Number in Epoch

Calendar Quarter Calendar Year-Quarter Calendar Half Year Calendar Year Fiscal Week Fiscal Week Number in Year Fiscal Month

48

Florin Radulescu, Note de curs

DMDW-11

Month Number in Epoch Day Number in Calendar Month Day Number in Calendar Year Day Number in Fiscal Month Day Number in Fiscal Year Last Day in Week Indicator Last Day in Month Indicator Calendar Week Ending Date Calendar Week Number in Year Calendar Month Name Calendar Month Number in Year Calendar Year-Month (YYYY-MM)

Fiscal Month Fiscal Month Number in Year Fiscal Year-Month Fiscal Quarter Fiscal Year-Quarter Fiscal Half Year Fiscal Year Holiday Indicator Weekday Indicator Selling Season Major Event SQL Date Stamp

Page 49: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Product attributes example

Product Key (PK) SKU Number (Natural Key) Category Description Package Type Description Fat Content Weight

Product Description Brand Description Department Description Package Size Diet Type Weight Units of Measure

49

Florin Radulescu, Note de curs

DMDW-11

Weight Storage Type Shelf Width Shelf Depth

Weight Units of Measure Shelf Life Type Shelf Height

Page 50: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Store attributes example

Store Name Store Number (Natural Key) Store Street Address Store City Store County Store State

Store Region Floor Plan Type Photo Processing Type Financial Service Type Selling Square Footage Total Square Footage

50

Florin Radulescu, Note de curs

DMDW-11

Store State Store Zip Code Store Manager Store District

Total Square Footage First Open Date Last Remodel Date

Page 51: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

Promotion attributes example

Promotion Key (PK) Promotion Name Price Reduction Type Promotion Media Type Ad Type

Coupon Type Ad Media Name Display Provider Promotion Cost Promotion Begin Date

51

Florin Radulescu, Note de curs

DMDW-11

Ad Type Display Type

Promotion Begin Date Promotion End Date

Page 52: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

�This course presented the dimensional model of data warehouses:�Definitions for facts and dimensions, definitions

for star scheme and snow-flake scheme.

�The four steps in dimensional modeling: identify

Summary

52

Florin Radulescu, Note de curs

DMDW-11

�The four steps in dimensional modeling: identify the business process, declare the grain, choose dimensions and identify the facts

�A modeling example for a sales chain with illustration of attributes in fact and dimension tables

�Next week: Data warehouse case study

Page 53: 11-DMDW11-DW curs 2cursuri.cs.pub.ro/~radulescu/dmdw/dmdw-nou/DMDW11.pdf · Facts and dimensions Steps in dimensional modeling Modeling example Summary Road Map 2 Florin Radulescu,

[CS680, 2004] Introduction to Data Warehouses, Drexel Univ. CS

680 Course notes, 2004 (page

https://www.cs.drexel.edu/~dvista/cs680/2.DW.Overview.ppt

visited 2010)

[Kimball, Ross, 2002] Ralph Kimball, Margy Ross - The Data

Warehouse Toolkit, Second Edition, Wiley & Sons, 2002

References

53

Florin Radulescu, Note de curs

DMDW-11

Warehouse Toolkit, Second Edition, Wiley & Sons, 2002

[Wikipedia] Wikipedia, the free encyclopedia, en.wikipedia.org