Slowly Changing Dimension
Slowly Changing Dimension: Categories
By: Prof. Sunita Sahu, Assistant Professor, VESIT, Mumbai
Slowly Changing Dimension: Categories
Slowly changing dimensions are dimensions that change slowly over time, rather than on a regular, time-based schedule.
In a data warehouse, there is a need to track changes in dimension attributes in order to report historical data.
The usual changes to dimension tables are classified into three types: Type 1, Type 2, and Type 3.
Example
Order fact: Product Key, Time Key, Customer Key, Salesperson Key, Order Dollars, Cost Dollars, Margin Dollars, Sale Units
Customer: Customer Key, Customer Name, Customer Code, Marital Status, Address, State, Zip
Salesperson: Salesperson Key, Salesperson Name, Territory Name, Region Name
Product: Product Key, Product Name, Product Code, Product Line, Brand
Time: Time Key, Date, Month, Quarter, Year
Type 1 Changes: Error Correction
Type 1 changes usually relate to the correction of errors in the source system.
For example, in the customer dimension: a change in a customer's name because of a spelling mistake.
Type 1 Changes, cont.
General Principles for Type 1 changes:
Usually, the changes relate to correction of errors in the source system
Sometimes the change in the source system has no significance
The old value in the source system needs to be discarded
The change in the source system need not be preserved in the DWH
Applying Type 1 changes
Overwrite the attribute value in the dimension table row with the new value.
The old value of the attribute is not preserved.
No other changes are made in the dimension table row.
The key of this dimension table or any other key values are not affected.
Easiest to implement.
Before the change:

| Customer_ID | Customer_Name | Customer_Type |
|---|---|---|
| 1 | Cust_1 | Corporate |

After the change:

| Customer_ID | Customer_Name | Customer_Type |
|---|---|---|
| 1 | Cust_1 | Retail |
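
As a rough illustration of the Type 1 overwrite, here is a minimal Python sketch using the standard-library `sqlite3` module. The table name `customer_dim`, the lowercase column names, and the helper `apply_type1_change` are assumptions made for this example; they do not come from the slides.

```python
import sqlite3

def apply_type1_change(conn, customer_id, new_type):
    """Overwrite customer_type in place; the old value is not kept anywhere."""
    conn.execute(
        "UPDATE customer_dim SET customer_type = ? WHERE customer_id = ?",
        (new_type, customer_id),
    )

# In-memory database holding the "before" row from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_dim ("
    "customer_id INTEGER PRIMARY KEY, customer_name TEXT, customer_type TEXT)"
)
conn.execute("INSERT INTO customer_dim VALUES (1, 'Cust_1', 'Corporate')")

# Type 1 change: same row, same key, new attribute value.
apply_type1_change(conn, 1, "Retail")

rows = conn.execute("SELECT * FROM customer_dim").fetchall()
print(rows)  # [(1, 'Cust_1', 'Retail')]
```

The row count and the key stay the same after the update, which is why this type is the easiest to implement and also why no history can be reported from it.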
Type 2 Changes
Let's look at the marital status of a customer.
One of the DWH's requirements is to track orders by marital status.
All orders before 11/10/2004 will be under Marital Status = Single, and all orders after that date will be under Marital Status = Married.
We need to aggregate the orders before and after the marriage separately.
Type 2 Changes, cont.
General Principles for Type 2 changes:
They usually relate to true changes in source systems.
There is a need to preserve history in the DWH.
This type of change partitions the history in the DWH.
Every change for the same attributes must be preserved.
Type 2 Implementation
The steps:
Add a new dimension table row with the new value of the changed attribute.
An effective date will be included in the dimension table.
There are no changes to the original row in the dimension table.
The key of the original row is not affected.
The new row is inserted with a new surrogate key.
Type 2 Example
Before the change:

| Customer_ID | Customer_Name | Customer_Type | Start_Date | End_Date |
|---|---|---|---|---|
| 1 | Cust_1 | Corporate | 22-07-2010 | 31-12-9999 |

After the change:

| Customer_ID | Customer_Name | Customer_Type | Start_Date | End_Date |
|---|---|---|---|---|
| 1 | Cust_1 | Corporate | 22-07-2010 | 31-12-9999 |
| 2 | Cust_1 | Retail | 22-07-2010 | 31-12-9999 |
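
The Type 2 steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dimension is held as a list of dicts, `customer_id` plays the role of the surrogate key, and the helper `add_type2_row`, the effective date of the second row, and the order amounts are all made up for the example.

```python
from datetime import date

# Hypothetical in-memory dimension; customer_id acts as the surrogate key.
customer_dim = [
    {"customer_id": 1, "customer_name": "Cust_1",
     "customer_type": "Corporate", "effective_date": date(2010, 7, 22)},
]

def add_type2_row(dim, customer_name, new_type, effective_date):
    """Insert a new row with a new surrogate key; existing rows are untouched."""
    new_key = max(row["customer_id"] for row in dim) + 1
    dim.append({"customer_id": new_key, "customer_name": customer_name,
                "customer_type": new_type, "effective_date": effective_date})
    return new_key

# Made-up effective date for the change from Corporate to Retail.
retail_key = add_type2_row(customer_dim, "Cust_1", "Retail", date(2011, 1, 1))

# Hypothetical fact rows: (customer surrogate key, order dollars).
# Orders placed after the change reference the new surrogate key.
orders = [(1, 100.0), (1, 250.0), (retail_key, 75.0)]

# Aggregate order dollars by the customer type in force at order time.
type_by_key = {row["customer_id"]: row["customer_type"] for row in customer_dim}
totals = {}
for key, dollars in orders:
    ctype = type_by_key[key]
    totals[ctype] = totals.get(ctype, 0.0) + dollars

print(totals)  # {'Corporate': 350.0, 'Retail': 75.0}
```

Because each fact row points at the dimension row that was current when the order was placed, the history is partitioned without touching the original row.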
Type 3 Changes
In a Type 3 Slowly Changing Dimension, there are two columns for the particular attribute of interest: one indicating the original value and one indicating the current value.
There is also a column that indicates when the current value became active.
Not common at all.
Time-consuming.
We want to track history without lifting a heavy burden.
There are many soft changes and we don't care about the "far" history.
Type 3 Changes
General Principles:
They usually relate to "soft" or tentative changes in the source systems.
There is a need to keep track of history with old and new values of the changed attribute.
They are used to compare performances across the transition.
They provide the ability to track forward and backward.
Type 3
No new dimension row is needed.
The existing queries will seamlessly switch to the current value.
Any queries that need to use the old value must be revised accordingly.
The technique works best for one soft change at a time.
If there is a succession of changes, more sophisticated techniques must be devised.
Type 3

| Customer Key | Name | State |
|---|---|---|
| 1001 | Williams | New York |

After Williams moved from New York to Los Angeles, the original information gets updated, and we have the following table (assuming the effective date of change is February 20, 2010):

| Customer Key | Name | Original State | Current State | Effective Date |
|---|---|---|---|---|
| 1001 | Williams | New York | Los Angeles | 20-FEB-2010 |
Type 3
Advantages
This does not increase the size of the table, since the new information is updated in place.
This allows us to keep some part of history.
Disadvantages
Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Williams later moves to Texas, the Los Angeles information will be lost.
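
The Type 3 behaviour, including the loss of intermediate values, can be illustrated with a small Python sketch. The dict layout, the helper `apply_type3_change`, and the date of the second move are assumptions made for this example.

```python
from datetime import date

# One customer row with Type 3 columns, following the example table.
# Before any change, the current state is assumed to equal the original state.
customer = {
    "customer_key": 1001,
    "name": "Williams",
    "original_state": "New York",
    "current_state": "New York",
    "effective_date": None,
}

def apply_type3_change(row, new_state, effective_date):
    """Overwrite the current value in place; the original value is kept."""
    row["current_state"] = new_state
    row["effective_date"] = effective_date

# First change: no new row is added, only the current-value columns move.
apply_type3_change(customer, "Los Angeles", date(2010, 2, 20))
print(customer["original_state"], "->", customer["current_state"])

# Second change (made-up date): Los Angeles is overwritten and cannot be recovered.
apply_type3_change(customer, "Texas", date(2012, 1, 1))
states_kept = {customer["original_state"], customer["current_state"]}
print("Los Angeles" in states_kept)  # False
```

Only the original and the latest values survive, which matches the advantage (no table growth) and the disadvantage (partial history) listed above.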
Large Dimension Table
A dimension table can be large in two ways:
Very deep: the dimension has a very large number of rows.
Very wide: the dimension has a large number of attributes or columns.
In a data warehouse, the customer and product dimensions are typically the ones likely to be large.
Such customer dimension tables may have as many as 100 million rows.
The product dimension of large retailers is also quite large.
Junk Dimension
The junk dimension is simply a structure that provides a convenient place to store the junk attributes. It is just a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension.
OLTP tables are often full of flag fields and yes/no attributes, many of which are used for operational support and have no documentation except for the column names and the memory banks of the person who created them. Not only do those types of attributes not integrate easily into conventional dimensions such as Customer, Vendor, Time, Location, and Product, but you also don't want to carry bad design into the data warehouse. However, some of the miscellaneous attributes will contain data that has significant business value, so you have to do something with them.
Advantages of a junk dimension:
It provides a recognizable location for related codes, indicators and their descriptors in a dimensional framework.
This avoids the creation of multiple dimension tables.
It provides a smaller, quicker point of entry for queries compared to performance when these attributes are directly in the fact table.
An interesting use for a junk dimension is to capture the context of a specific transaction. While our common, conformed dimensions contain the key dimensional attributes of interest, there are likely attributes about the transaction that are not known until the transaction is processed.
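
One possible way to build a junk dimension is to enumerate every combination of the miscellaneous flags and give each combination a surrogate key. The Python sketch below assumes three hypothetical flags (`payment_type`, `gift_wrap`, `rush_order`) and a hypothetical helper `build_junk_dimension`; none of these names come from the slides, and pre-populating all combinations is just one design option.

```python
from itertools import product

# Hypothetical low-cardinality flags that fit no regular dimension.
flag_values = {
    "payment_type": ["Cash", "Credit"],
    "gift_wrap": ["Y", "N"],
    "rush_order": ["Y", "N"],
}

def build_junk_dimension(values):
    """Return one row per combination of flag values, keyed by a surrogate key."""
    names = list(values)
    rows = {}
    for key, combo in enumerate(product(*values.values()), start=1):
        rows[key] = dict(zip(names, combo))
    return rows

junk_dim = build_junk_dimension(flag_values)

# Reverse lookup used at load time: flag combination -> junk key.
junk_key_for = {tuple(row.values()): key for key, row in junk_dim.items()}

# A fact row then carries a single junk key instead of three flag columns.
fact_row = {"order_dollars": 120.0,
            "junk_key": junk_key_for[("Credit", "Y", "N")]}

print(len(junk_dim), junk_dim[fact_row["junk_key"]])
```

With 2 x 2 x 2 flag values the junk dimension has only 8 rows, so the fact table replaces three text columns with one small integer key.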
Rapidly Changing Dimensions
A dimension is considered rapidly changing if one or more of its attributes changes frequently.
When you deal with a Type 2 change, you create an additional dimension table row with the new value of the changed attribute. By doing so, you are able to preserve the history.
Consider the customer dimension. Here the number of rows tends to be large, sometimes in the range of a million or more rows, and significant attributes in a customer dimension may change many times in a year. Rapidly changing large dimensions can be too problematic for the Type 2 approach.
Rapidly Changing Dimensions
One effective approach is to break the large dimension table into one or more simpler dimension tables. How can you accomplish this?
Obviously, you need to break off the rapidly changing attributes into another dimension table, leaving the slowly changing attributes behind in the original table.
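
A minimal Python sketch of breaking off the rapidly changing attributes might look like the following. The attribute names, the choice of which attributes count as rapidly changing, and the helpers `split_dimension` and `behavior_key` are all assumptions for illustration; the slides do not specify how the new table is keyed or linked back.

```python
# Hypothetical choice of rapidly changing attributes.
RAPID_ATTRS = ("income_band", "credit_rating")

def split_dimension(row):
    """Split one wide row into a slowly changing part and a rapidly changing part."""
    slow = {k: v for k, v in row.items() if k not in RAPID_ATTRS}
    rapid = {k: row[k] for k in RAPID_ATTRS}
    return slow, rapid

wide_row = {"customer_key": 1001, "name": "Williams", "state": "New York",
            "income_band": "50-75K", "credit_rating": "A"}

customer_row, behavior_row = split_dimension(wide_row)

# The rapidly changing combinations live in their own small table.
# Each distinct combination is stored once and gets its own key.
behavior_dim = {}

def behavior_key(rapid):
    combo = tuple(rapid[a] for a in RAPID_ATTRS)
    return behavior_dim.setdefault(combo, len(behavior_dim) + 1)

k1 = behavior_key(behavior_row)
k2 = behavior_key({"income_band": "75-100K", "credit_rating": "A"})
k3 = behavior_key(behavior_row)  # same combination, same key as k1

print(customer_row)
print(k1, k2, k3)
```

When a customer's income band changes, the original customer row stays as it is; only the reference to the small table changes, so the large dimension does not gain a new row for every change.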
Solution to rapidly changing dimension
Large dimensions call for special considerations.
Because of the sheer size, many data warehouse functions involving large dimensions may be slow and inefficient.
You need to address the following issues by using effective design methods, by choosing proper indexes, and by applying other optimizing techniques: