data warehousing concepts by sathish yellanki
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 1
Data Warehousing Concepts
Dimensional Data Model • Dimensional Data Model is Commonly Used in Data
Warehousing Systems. • The Two Common Schema Types
• Star Schema • Snowflake Schema
Slowly Changing Dimension • Slowly Changing Dimensions Are Common Issues Facing Data
Warehousing Development Process.
Conceptual Data Model • A Data Warehouse Specialist Should Be Very Familiar With The
Concept of The Conceptual Data Model.
Logical Data Model • A Data Warehouse Specialist Should Be Very Clear With The
Concepts And Process of A Logical Data Model.
Physical Data Model • A Data Warehouse Specialist Should Be Very Clear of The
Concept And Process of Developing The Physical Data Model.
Compare Conceptual, Logical, And Physical Data Model • A Data Warehouse Specialist Should Be Familiar With The Different
Levels of Abstraction For A Data Model.
Data Integrity • A Data Warehouse Specialist Should Be Clear With “What is
Data Integrity” And How it is Enforced in Data Warehousing.
What is OLAP? • All The Data Warehousing Experts Should Be Familiar With The
Definition of OLAP.
MOLAP, ROLAP, AND HOLAP • A Data Warehousing Specialist Should Be Crystal Clear About
The Different Types of OLAP Technology.
Bill Inmon Vs. Ralph Kimball Process • A Data Warehousing Specialist Should Know The Difference of
Opinion About The Roles of The DWH And The Data Mart. • The Question is Whether Development Should Proceed From DWH To Data Mart OR
Vice Versa.
Factless Fact Table • A Data Warehousing Specialist Should Understand
The Use of A Fact Table Without Any Facts.
Junk Dimension • A Data Warehousing Specialist Should Be
Clear About The Concept of A Junk Dimension: When To Use it, And Why And Where it is Useful.
Conformed Dimension • A Data Warehouse Specialist Should Be Clear About
The Concept of A Conformed Dimension. • The Specialist Should Understand in Detail What A
Conformed Dimension is And Why it is Important.
Dimensional Data Model
• The Dimensional Data Model is The Dominant Approach For Building Data Warehousing Systems.
• Dimensional Modeling Differs From Third Normal Form (3NF), The Standard Commonly Used For Transactional (OLTP) Systems.
Jargon For Dimensional Modeling
Dimension • A Dimension Always Represents A Specific Category of
Information As A Single Collection. • The Dimension is Planned According To The Subject of Analysis For Which it is
Chosen.
Attribute • An Attribute is A Unique Level Within The Dimension. • An Attribute in A Dimensional Model Always Belongs To A Hierarchy
OR Level.
Hierarchy • A Hierarchy is The Specification of Levels That Represent The
Relationship Between Different Attributes Within A Dimension.
Fact Table • A Fact Table is A Table That Contains The Measures of Interest
Specific To The Subject of Analysis. • Fact Table is Generally A Collection of Aggregates At Different
Levels of Granularity. • A Fact Table is A Collection of Measures of The Subject of
Analysis, Integrated To The Associated Dimensions.
Lookup Table • The Lookup Table Provides The Detailed Information About The
Attributes.
• The Lookup Table Keeps The Information That is Essential To Describe The Attribute More Fully, As Per The Requirement of Analysis.
• The Lookup Table For Any Attribute Would Include A List of All of The Descriptive Details Available in The Data Warehouse.
Common Points To Consider • A Dimensional Model Includes A Collection of Fact Tables And
Lookup Tables.
• Fact Tables Connect To One OR More Lookup Tables, But Fact Tables Do Not Have Direct Relationships To One Another.
• Dimensions And Hierarchies Are Represented By Lookup Tables.
• Attributes Are The Non-Key Columns in The Lookup Tables.
Data Models For Data Warehouses / Data Marts • The Most Commonly Used Schema Types in Data Warehousing
Are • Star Schema
• Snowflake Schema
• Using A Star OR A Snowflake Schema Largely Depends on Personal Preference And Business Needs.
• A Snowflake Schema is The Better Choice When There is A Business Case To Analyze The Information At A Particular Level of The Dimension Hierarchy.
What is Meant By Granularity? • Granularity Refers To The Level of Detail of The Data Stored in
The Fact Tables in A Data Warehouse. • High Granularity Refers To Data That is At OR Near The
Transaction Level. • Data That is At The Transaction Level is Usually Called As
Atomic Level Data. • Low Granularity Refers To Data That is Summarized OR
Aggregated, Usually From The Atomic Level Data. • Summarized Data Can Be Lightly Summarized As in Daily OR
Weekly Summaries OR Highly Summarized Data Such As Yearly Averages And Totals.
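The difference between atomic-level and summarized data can be sketched in a few lines of Python. This is only an illustration: the transaction rows, the store codes and the `summarize_daily` helper are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical atomic-level rows: (sale_date, store, amount), one per transaction.
transactions = [
    ("2014-08-30", "S1", 10.0),
    ("2014-08-30", "S1", 5.0),
    ("2014-08-30", "S2", 7.5),
    ("2014-08-31", "S1", 12.0),
]

def summarize_daily(rows):
    """Roll atomic rows up to one total per (date, store), a lightly summarized grain."""
    totals = defaultdict(float)
    for sale_date, store, amount in rows:
        totals[(sale_date, store)] += amount
    return dict(totals)

daily = summarize_daily(transactions)
```

Once only `daily` is stored, the individual transactions can no longer be recovered, which is why the grain has to be chosen before the fact table is built.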
What is Meant By Fact Table Granularity? • The First Step in Designing A Fact Table is To Determine The
Granularity of The Fact Table. • Fact Table Granularity Decides The Lowest Level of Information
That Will Be Stored in The Fact Table. • The Fact Table Granularity Depends on The Construct of The
Type And The Number of Dimensions That Are Included in The Schema.
• Fact Table Granularity Also Determines How Densely The Measures Are Populated.
What Constitutes The Fact Table Granularity? • Determining Fact Table Granularity Involves Two Steps
• Determine Which Dimensions Will Be Included. • Determine Where Along The Hierarchy of Each Dimension The
Information Will Be Kept.
• The Determining Factors of Fact Table Granularity Usually Go Back To The Requirements Phase.
Which Dimensions We Should Include? • Determining Which Dimensions To Include in The Data
Warehouse is Usually A Straightforward Process, As Business Processes Will Often Dictate Clearly What Are The Relevant Dimensions.
What Level Should Be Included Within Each Dimension? • Determining Where Along The Hierarchy of Each Dimension The Information is Stored
is Not An Exact Science. • The Level of The Dimension is Dictated Purely By User
Requirements. • Sometimes The Users Will Not Specify Certain Requirements,
But Based on The Industry Knowledge, The Data Warehousing Team Must Foresee Certain Requirements And Include Them.
• It is Prudent For The Data Warehousing Team To Design The Fact Table Such That Lower-Level Information is Included, To Avoid Re-Design of The Fact Table in The Future.
• Level of Dimension is More of An Art Than Science.
What is A Fact Table? • A Fact Table Consists of The Measurements, Metrics OR Facts of
A Business Process, Located At The Center of A Star Schema OR A Snowflake Schema Surrounded By Dimension Tables.
• A Fact Table Stores Quantitative Information For Analysis And is Often De-Normalized.
• A Fact Table Typically Has Two Types of Columns • Columns Containing Facts • Foreign Keys To Dimension Tables
• The Primary Key of A Fact Table is Usually A Composite Key That is Made Up of All of its Foreign Keys.
• Fact Tables Store Different Types of Measures • Additive Measures. • Semi-Additive Measures. • Non-Additive Measures.
Types of Facts Additive Facts • Additive Facts Are Facts That Can Be Summed Up Through All of The
Dimensions in The Fact Table.
Semi-Additive Facts • Semi-Additive Facts Are Facts That Can Be Summed Up For Some of The
Dimensions in The Fact Table, But Not The Others.
Non-Additive Facts
• Non-Additive Facts Are Facts That Cannot Be Summed Up For Any of The Dimensions Present in The Fact Table.
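A small sketch may help separate the three kinds of facts. It assumes a hypothetical account-balance snapshot: the rows, the column layout and both helper functions are invented for illustration.

```python
# Hypothetical snapshot rows: (day, account, balance, interest_rate_pct).
snapshots = [
    ("Mon", "A", 100.0, 2.0),
    ("Mon", "B", 50.0, 3.0),
    ("Tue", "A", 120.0, 2.0),
    ("Tue", "B", 50.0, 3.0),
]

def total_balance_for_day(rows, day):
    """Balance is semi-additive: summing across accounts for one day is meaningful."""
    return sum(balance for d, _, balance, _ in rows if d == day)

def average_balance_over_days(rows, account):
    """Across the time dimension, balances are averaged rather than summed."""
    values = [balance for _, a, balance, _ in rows if a == account]
    return sum(values) / len(values)

# The interest rate column is non-additive: adding 2.0 and 3.0 gives no useful
# figure along any dimension, so it is never summed here.
monday_total = total_balance_for_day(snapshots, "Mon")
account_a_average = average_balance_over_days(snapshots, "A")
```

A fully additive fact, such as a sales amount, would need neither special case: it could be summed by day, by account, or by both.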
Types of Fact Tables Cumulative Fact Table • Cumulative Fact Table Describes What Has Happened Over A Period of
Time.
• The Facts For This Type of Fact Table Are Mostly Additive Facts. Snapshot Fact Table • A Snapshot Fact Table Describes The State of Things At A Particular
Instant in Time, And Usually Includes More Semi-Additive And Non-Additive Facts.
What is Star Schema? • In The Star Schema Design, A Single Object, Called The
Fact Table, Sits in The Middle And is Radially Connected To The Surrounding Objects, Which Are Dimension Lookup Tables, Like A Star.
• Each Dimension is Represented As A Single Table. • The Primary Key in Each Dimension Table is Related To A
Foreign Key in The Fact Table. • All Measures in The Fact Table Are Related To All The
Dimensions With Which The Fact Table is Related. • All The Measures Will Have The Same Level of Granularity. • A Star Schema Can Be Simple OR Complex, A Simple Star
Consists of One Fact Table And A Complex Star Can Have More Than One Fact Table With Measures Integrated on The Dimensions.
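As a minimal sketch of a simple star, the following uses Python's built-in `sqlite3` module with an in-memory database. The table names (`fact_sales`, `dim_product`, `dim_store`), the columns and the sample rows are assumptions made for the example, not part of the original deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each dimension is a single lookup table with its own primary key.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_state  TEXT);
-- The fact table holds foreign keys to the dimensions plus the measure;
-- its primary key is the composite of its foreign keys.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    sales_amount REAL,
    PRIMARY KEY (product_key, store_key)
);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_store   VALUES (10, 'CA'), (20, 'IL');
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (2, 10, 40.0), (1, 20, 60.0);
""")

# Analysis joins the fact table to one dimension and aggregates the measure.
sales_by_state = conn.execute("""
    SELECT s.store_state, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.store_state
    ORDER BY s.store_state
""").fetchall()
```

Every query follows the same shape: start at the fact table, join outward to whichever dimensions the question needs, and aggregate.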
What is A Snowflake Schema? • The Snowflake Schema is An Extension of The Star Schema,
Where Each Point of The Star Explodes into More Points. • In A Snowflake Schema, The Dimensional Table is Normalized
into Multiple Lookup Tables, Each Representing A Level in The Dimensional Hierarchy.
Advantage • Improvement in Query Performance Due To Minimized Disk
Storage Requirements And Joining Smaller Lookup Tables.
Disadvantage
• Additional Maintenance Effort is Needed Due To The Increased Number of Lookup Tables.
What is A Slowly Changing Dimension? • Slowly Changing Dimensions Are Dimensions That Change
Slowly Over Time, Rather Than Changing on A Regular, Time-Based Schedule.
• In Data Warehouse There is A Need To Track Changes in Dimensional Attributes in Order To Report Historical Data.
• Slowly Changing Dimensions Can Be Implemented in Multiple Ways. Implementing One of The SCD Types Should Enable Users To Assign The Proper Dimensional Attribute Value For Given Data.
• Not All Dimensions Are Suitable For SCD Handling.
• Slowly Changing Dimension Applies To Cases Where The Attribute For A Record Varies Over Time.
Types of Slowly Changing Dimensions
SCD Type 0 • The SCD Type 0 Method is A Passive Method. • No Action is Performed When A Dimensional Change
Occurs. • The Values in The Dimension Remain As They Were At The Time
The Dimension Record Was First Inserted. • In Certain Circumstances History is Preserved With Type 0,
But Type 0 Provides The Least OR No Control Over The History.
SCD Type 1 • SCD Type 1 Methodology Overwrites Old Data With New Data,
And Therefore Does Not Track Historical Data. • SCD Type 1 Methodology is Used When There is No Need To
Store Historical Data in The Dimension Table. • SCD Type 1 is Used To Correct Data Errors in The Dimension. • Usage is 50% in The Development of Data Warehouse.
Advantage • SCD Type 1 is Easy To Maintain.
Disadvantage • There is No History in The Data Warehouse.
Illustrative Example Original Dimension
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
Illustrative Example Changed Dimension SCD Type 1
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
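The Type 1 overwrite can be sketched as follows, using the supplier row from the example. The `apply_scd_type1` function and the dictionary layout are hypothetical, chosen only to make the behavior visible.

```python
def apply_scd_type1(dimension, supplier_code, new_state):
    """Overwrite the state in place for every row with the given natural key."""
    for row in dimension:
        if row["supplier_code"] == supplier_code:
            row["supplier_state"] = new_state  # the old value is lost
    return dimension

suppliers = [
    {"supplier_key": 123, "supplier_code": "ABC",
     "supplier_name": "Acme Supply Co", "supplier_state": "CA"},
]
apply_scd_type1(suppliers, "ABC", "IL")
```

After the call the table still has exactly one row for the supplier, and nothing records that the state was ever CA.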
SCD Type 2 • SCD Type 2 Method Tracks Historical Data By Creating Multiple
Records For A Given Natural Key in The Dimensional Tables, With Separate Surrogate Keys AND/OR Different Version Numbers.
• With SCD Type 2, Unlimited History is Preserved, Since A New Record is Inserted For Each Change.
• Usage is 50% in The Development of Data Warehouse.
Methods of Implementing SCD Type 2
Method 1 • Add One Extra Column For Managing Version Numbers. • The Version Column Will Be Incremented Sequentially For Each Change That Takes Place on The Dimensional Value.
Illustrative Example Original Dimension
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
Illustrative Example Changed Dimension SCD Type 2 (Method 1)
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Version
123           ABC            Acme Supply Co  CA              0
124           ABC            Acme Supply Co  IL              1
Method 2 • Add One Extra Column Which Manages The "Effective Date". • The "Effective Date" Column Will Register The Date on Which The New Change is Registered.
Illustrative Example Changed Dimension SCD Type 2 (Method 2)
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Start_Date   End_Date
123           ABC            Acme Supply Co  CA              01-Jan-2000  21-Dec-2004
124           ABC            Acme Supply Co  IL              22-Dec-2004
Advantage • SCD Type 2 Accurately Keeps All The Historical Information.
Disadvantage • SCD Type 2 Will Cause The Size of The Table To Grow Fast. • For Tables With Many Rows, Storage And Performance Can
Become A Concern. • SCD Type 2 Necessarily Complicates The ETL Process.
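A minimal sketch of the date-based variant of SCD Type 2 is given below. It assumes that an open row is marked by `end_date` being `None` and that the new surrogate key is simply the largest existing key plus one; both conventions, like the `apply_scd_type2` function itself, are illustrative assumptions.

```python
from datetime import date, timedelta

def apply_scd_type2(dimension, supplier_code, new_state, change_date):
    """Close the current row for the natural key and append a new version."""
    current = next(r for r in dimension
                   if r["supplier_code"] == supplier_code and r["end_date"] is None)
    if current["supplier_state"] == new_state:
        return dimension  # nothing changed, keep the current row open
    # Expire the old version the day before the change takes effect.
    current["end_date"] = change_date - timedelta(days=1)
    # Copy the old row, then override the key, the changed attribute and the dates.
    new_row = dict(current,
                   supplier_key=max(r["supplier_key"] for r in dimension) + 1,
                   supplier_state=new_state,
                   start_date=change_date,
                   end_date=None)
    dimension.append(new_row)
    return dimension

suppliers = [
    {"supplier_key": 123, "supplier_code": "ABC",
     "supplier_name": "Acme Supply Co", "supplier_state": "CA",
     "start_date": date(2000, 1, 1), "end_date": None},
]
apply_scd_type2(suppliers, "ABC", "IL", date(2004, 12, 22))
```

The extra lookup, expiry and insert on every change are the kind of work that makes the ETL process more involved than a Type 1 overwrite.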
SCD Type 3 • SCD Type 3 Method Tracks Changes Using Separate Columns
And Preserves Limited History. • SCD Type 3 Preserves Limited History As it is Limited To The
Number of Columns Designated For Storing Historical Data. • The Original Table Structure in Type 1 And Type 2 is The Same
But Type 3 Adds Additional Columns. • We Can Have One Additional Column That Specifies When The
Change Took Effect. • SCD Type 3 is Rarely Used in Actual Practice.
Illustrative Example Original Dimension
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
Illustrative Example Changed Dimension SCD Type 3
Supplier_Key  Supplier_Code  Supplier_Name   Original_Supplier_State  Effective_Date  Current_Supplier_State
123           ABC            Acme Supply Co  CA                       22-Dec-2004     IL
Advantage • SCD Type 3 Does Not Increase The Number of Rows in The Table, Since New
Information Is Updated in Place. • SCD Type 3 Allows Us To Keep Some Part of History.
Disadvantage • SCD Type 3 Will Not Be Able To Keep All History Where An
Attribute is Changed More Than Once.
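The limitation can be seen in a short sketch. The `apply_scd_type3` function is hypothetical, and the second change (to NY in 2008) is invented purely to show what happens when an attribute changes more than once.

```python
from datetime import date

def apply_scd_type3(row, new_state, effective_date):
    """Leave the original-value column alone and overwrite only the current one."""
    row["current_supplier_state"] = new_state
    row["effective_date"] = effective_date
    return row

supplier = {
    "supplier_key": 123, "supplier_code": "ABC",
    "supplier_name": "Acme Supply Co",
    "original_supplier_state": "CA",
    "effective_date": None,
    "current_supplier_state": "CA",
}
apply_scd_type3(supplier, "IL", date(2004, 12, 22))
# Hypothetical second change: the intermediate value IL is no longer stored anywhere.
apply_scd_type3(supplier, "NY", date(2008, 2, 4))
```

Only the original and the current state survive; keeping IL as well would need yet another column.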
SCD Type 4 • SCD Type 4 Method Uses “History Tables”, Where One Table
Keeps The Current Data, And An Additional Table is Used To Keep A Record of Some OR All Changes.
• In SCD Type 4 Both The Surrogate Keys Are Referenced in The Fact Table To Enhance Query Performance.
• SCD Type 4 Method Resembles How Database Audit Tables And Change Data Capture Techniques Function.
Illustrative Example Original Dimension
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
Illustrative Example Supplier Current Table
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
Illustrative Example Supplier History Table
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State  Create_Date
123           ABC            Acme Supply Co  CA              22-Dec-2004
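A rough sketch of the Type 4 pattern follows. It assumes the current table is keyed by the surrogate key and that the history row is stamped with the date of the change, as in the history table of the example; the `apply_scd_type4` function and the data layout are hypothetical.

```python
from datetime import date

def apply_scd_type4(current_table, history_table, supplier_key, new_state, change_date):
    """Copy the outgoing row to the history table, then overwrite the current row."""
    row = current_table[supplier_key]
    # The copy is taken before the overwrite, so it still holds the old state.
    history_table.append(dict(row, supplier_key=supplier_key, create_date=change_date))
    row["supplier_state"] = new_state

current = {
    123: {"supplier_code": "ABC", "supplier_name": "Acme Supply Co",
          "supplier_state": "CA"},
}
history = []
apply_scd_type4(current, history, 123, "IL", date(2004, 12, 22))
```

Queries that only need the latest values read the small current table, while audits read the history table.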
Data Modeling (Conceptual, Logical, And Physical Data Models)
• There Are Three Levels of Data Modeling Standards • Conceptual Data Model • Logical Data Model • Physical Data Model
Conceptual Data Model • A Conceptual Data Model Identifies The Highest-Level
Relationships Between The Different Entities.
Features of Conceptual Data Model • Enterprise-Wide Coverage of The Business Concepts
• Customer • Product • Store • Location • Asset
• Designed And Developed Primarily For A Business Audience • Contains Around 20-50 Entities OR Concepts With No OR
Extremely Limited Number of Attributes Described. • Contains Relationships Between Entities, But May OR May Not
Include Cardinality And Nullability. • Entities Will Have Definitions. • Designed And Developed To Be Independent of DBMS, Data
Storage Locations OR Technologies. • Model Addresses Digital And Non-Digital Concepts. Includes
The Important Entities And The Relationships Among Them. • No Attribute is Specified, No Primary Key is Specified.
Logical Data Model • A Logical Data Model is A Fully-Attributed Data Model That is
Independent of • DBMS • Technology • Data Storage
• Organizational Constraints
• Logical Data Model Typically Describes Data Requirements From The Business Point of View.
• Has No Requirement That Resulting Data Implementations Must Be Created Using Relational Technologies.
Features of A Logical Data Model • Typically Describes Data Requirements For A Single Project OR
Major Subject Area. • May Be Integrated With Other Logical Data Models Via A
Repository of Shared Entities • Typically Contains 100-1000 Entities, Although These Numbers
Are Highly Variable Depending on The Scope of The Data Model. • Contains Relationships Between Entities That Address
Cardinality And Nullability of The Relationships. • Designed And Developed To Be Independent of DBMS, Data
Storage Locations OR Technologies. • Data Attributes Will Typically Have Datatypes With Precisions
And Lengths Assigned, Nullability (Optionality) Assigned. • Entities And Attributes Will Have Definitions. • All Kinds of Other Meta Data May Be Included Like Retention
Rules, Privacy Indicators, Volumetrics, Data Lineage. • A Logical Data Model May Show Only A Tiny Percentage of The
Meta Data Contained Within The Model. • A Logical Data Model Will Normally Be Derived From And OR
Linked Back To Objects in A Conceptual Data Model.
Steps For Designing The Logical Data Model • Specify Primary Keys For All Entities. • Find The Relationships Between Different Entities. • Find All Attributes For Each Entity. • Resolve Many-To-Many Relationships. • Normalization.
Differences Between Conceptual And Logical Data Models • In A Logical Data Model, Primary Keys Are Present, Whereas in
A Conceptual Data Model, Primary Keys Are Not Present. • In A Logical Data Model, All Attributes Are Specified Within An
Entity. A Conceptual Data Model Does Not Specify Attributes. • Relationships Between Entities Are Specified Using Primary
Keys And Foreign Keys in A Logical Data Model. In A Conceptual Data Model, The Relationships Are Simply Stated, Not Specified, So We Simply Know That Two Entities Are Related, But We Do Not Specify What Attributes Are Used For This Relationship.
Physical Data Model • A Physical Data Model is A Fully-Attributed Data Model That is
Dependent Upon A Specific Version of A Data Persistence Technology.
• The Target Implementation Technology May Be • A Relational DBMS • An XML Document • A NOSQL Data Storage Component • A Spreadsheet • Other Data Implementation Option
Features of A Physical Data Model • Physical Data Model Typically Describes Data Requirements For
A Single Project OR Application, OR Sometimes Even A Portion of An Application.
• May Be Integrated With Other Physical Data Models Via A Repository of Shared Entities
• Typically Contains 10-1000 Tables, Although These Numbers Are Highly Variable Depending on The Scope of The Data Model.
• Contains Relationships Between Tables That Address Cardinality And Nullability of The Relationships.
• Designed And Developed To Be Dependent on A Specific Version of A DBMS, Data Storage Location OR Technology.
• Columns Will Have Datatypes With Precisions And Lengths Assigned.
• Columns Will Have Nullability Assigned. • Tables And Columns Will Have Definitions. • Denormalization May Occur Based on User Requirements. • Physical Data Model Includes Other Physical Objects Such As
• Views • Primary Key Constraints • Foreign Key Constraints • Indexes • Security Roles • Stored Procedures • XML Extensions • File Stores
• The Diagram of A Physical Data Model May Show Only A Tiny Percentage of The Meta Data Contained Within The Model.
The Steps For Physical Data Model • Convert Entities into Tables. • Convert Relationships into Foreign Keys. • Convert Attributes into Columns. • Modify The Physical Data Model Based on Physical Constraints
OR Requirements.
Differences Between Logical And Physical Data Models • Entity Names Are Now Table Names. • Attributes Are Now Column Names. • Data Type For Each Column is Specified. • Data Types Can Be Different Depending on The Actual Database
Being Used.
Cross Comparison of All Models
Feature               Conceptual  Logical  Physical
Entity Names          ✓           ✓
Entity Relationships  ✓           ✓
Attributes                        ✓
Primary Keys                      ✓        ✓
Foreign Keys                      ✓        ✓
Table Names                                ✓
Column Names                               ✓
Column Data Types                          ✓
A View on Data Integrity
• Data Integrity Refers To The Validity of Data, Which Concentrates on Data Consistency And Correctness.
• In A Data Warehouse OR A Data Mart, There Are Three Areas Where Data Integrity Needs To Be Enforced: • Database Level • ETL Process Level • Access Level
Database Level Integrity
Referential Integrity • The Relationship Between The Primary Key of One Table And The Foreign Key of Another Table Must Always Be Maintained.
Primary Key / Unique Constraint • Primary Keys And Unique Constraints Are Used To Make Sure Every Row in A Table Can Be Uniquely Identified.
Not Null Versus Nullable • Columns Identified As Not Null Cannot Have A Null Value.
Valid Values • Only Allowed Values Are Permitted in The Database.
ETL Process Level Integrity • For Each Step of The ETL Process, Data Integrity Checks Should Be Put in Place To Ensure That Source Data is The Same As The Data in The Destination.
• Most Common Checks Include Record Counts OR Record Sums.
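A record-count and record-sum check might look like the sketch below. The `check_load` function, the `amount` field and the sample rows are assumptions made for illustration, and a real ETL step would usually compare more than one control total.

```python
def check_load(source_rows, target_rows, amount_field):
    """Compare the record count and a control sum between source and destination."""
    problems = []
    if len(source_rows) != len(target_rows):
        problems.append(
            f"count mismatch: {len(source_rows)} source vs {len(target_rows)} target")
    source_sum = sum(r[amount_field] for r in source_rows)
    target_sum = sum(r[amount_field] for r in target_rows)
    # A small tolerance avoids false alarms from floating-point rounding.
    if abs(source_sum - target_sum) > 1e-9:
        problems.append(f"sum mismatch: {source_sum} source vs {target_sum} target")
    return problems

source = [{"amount": 10.0}, {"amount": 5.0}]
complete_target = [{"amount": 10.0}, {"amount": 5.0}]
partial_target = [{"amount": 10.0}]  # one row was dropped during the load

clean_result = check_load(source, complete_target, "amount")
failed_result = check_load(source, partial_target, "amount")
```

An empty list means both checks passed; any entry should stop the load before the next ETL step runs.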
Access Level • We Need To Ensure That Data is Not Altered By Any
Unauthorized Means Either During The ETL Process OR in The Data Warehouse.
• Design Safeguards Against Unauthorized Access To Data Including Physical Access To The Servers, As Well As Logging of All Data Access History.
• Data Integrity Can Only Be Ensured if There is No Unauthorized Access To The Data.
What is Meant By OLAP? • OLAP is An Abbreviated Form For On-Line Analytical
Processing. • The First Attempt To Provide A Definition To OLAP Was By Dr.
Codd, Who Proposed 12 Rules For OLAP. • The Key Feature of The OLAP Environment is its
"Multidimensional" Architecture. • Depending on The Underlying Technology Used, OLAP Can Be
Broadly Divided into Three Different Flavors • MOLAP(Multi Dimensional On-Line Analytical Processing) • ROLAP(Relational On-Line Analytical Processing) • HOLAP(Hybrid Online Analytical Processing)
• OLAP is A Field of Analysis of Data Considering The Samples Collected on A Time Based Variance, Related To The Business Process.
MOLAP • MOLAP is The More Traditional Way of OLAP Analysis, in Which,
Data is Stored in A Multidimensional Cube. • The Data Need Not Be Stored in A Relational Database,
But Can Be in Proprietary Formats. • MOLAP Processes Data That is Already Stored in A
Multidimensional Array in Which All Possible Combinations of Data Are Reflected, Each in A Cell That Can Be Accessed Directly.
Advantages • Excellent Performance Due To Optimized Storage,
Multidimensional Indexing And Caching, Optimal For Slicing And Dicing Operations.
• MOLAP Can Perform Complex Calculations. All Calculations Are Pre-Generated When The Cube is Created, So Complex Calculations Are Not Only Possible But Are Also Returned Quickly.
Disadvantages • MOLAP is Limited in The Amount of Data it Can Handle. Because
All Calculations Are Performed When The Cube is Built, it is Not Possible To Include A Large Amount of Data in The Cube Itself.
• Only Summary-Level Information Will Be Included in The Cube. • Requires Additional Investment. Cube Technology is Often
Proprietary And Does Not Already Exist in The Organization. Therefore, To Adopt MOLAP Technology, Chances Are That Additional Investments in Human And Capital Resources Will Be Needed.
ROLAP • ROLAP Methodology Relies on Manipulating The Data Stored in
The Relational Database To Give The Appearance of Traditional OLAP's Slicing And Dicing Functionality.
• Each Action of Slicing And Dicing is Equivalent To Adding A "WHERE" Clause in The SQL Statement.
• ROLAP Differs Significantly in That it Does Not Require The Pre-Computation And Storage of Information.
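The idea that each slice or dice adds a condition to the WHERE clause can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns and the `total_sales` helper are hypothetical; a real ROLAP engine generates far more elaborate SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (year INTEGER, state TEXT, product TEXT, amount REAL);
INSERT INTO sales VALUES
    (2013, 'CA', 'Widget', 100.0),
    (2013, 'IL', 'Widget', 60.0),
    (2014, 'CA', 'Gadget', 40.0),
    (2014, 'CA', 'Widget', 80.0);
""")

ALLOWED_COLUMNS = {"year", "state", "product"}

def total_sales(conn, **filters):
    """Each filter (one slice) becomes one more condition in the WHERE clause."""
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    sql = "SELECT COALESCE(SUM(amount), 0) FROM sales"
    if filters:
        # Column names come from the allow-list; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return conn.execute(sql, tuple(filters.values())).fetchone()[0]

grand_total = total_sales(conn)
california = total_sales(conn, state="CA")
california_2014 = total_sales(conn, state="CA", year=2014)
```

Nothing is pre-computed here: every call runs a fresh query against the relational table, which is the trade-off ROLAP makes against MOLAP's pre-built cube.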
Advantages • ROLAP Can Handle Large Amounts of Data. The Data Size
Limitation of ROLAP Technology is The Limitation on Data Size of The Underlying Relational Database.
• Can Leverage Functionalities Inherent in The Relational Database.
• ROLAP is Considered To Be More Scalable in Handling Large Data Volumes, Especially Models With Dimensions With Very High Cardinality.
Disadvantages • Performance Can Be Slow, Because Each ROLAP Report is
Essentially An SQL Query OR Multiple SQL Queries Where The Query Time Can Be Long if The Underlying Data Size is Large.
• Limited By SQL Functionalities, As ROLAP Technology Mainly Relies on Generating SQL Statements To Query The Relational Database, And SQL Statements Do Not Fit All Needs.
HOLAP • HOLAP Technologies Attempt To Combine The Advantages of
MOLAP And ROLAP. • For Summary-Type Information, HOLAP Leverages Cube
Technology For Faster Performance. When Detail Information is Needed, HOLAP Can "Drill Through" From The Cube into The Underlying Relational Data.
• HOLAP Stores Data in Both A Relational Database (RDB) And A Multidimensional Database (MDDB) And Uses Whichever One is Best Suited To The Type of Processing Desired.
Factless Fact Table • A Factless Fact Table is A Fact Table That Does Not Have Any
Measures. • A Factless Fact Table is Essentially An Intersection of Dimensions. • Factless Fact Tables Offer The Most Flexibility in Data
Warehouse Design. In Certain Situations, if A Factless Fact Table is Not Designed, We May End Up With Multiple Fact Tables.
• A Factless Fact Table Captures Events That Are Recorded For Information Only And Carry No Measures For Calculation.
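A common illustration, assumed here rather than taken from the slides, is class attendance: each row records only that a student attended a class on a date. The keys and rows below are hypothetical.

```python
from collections import Counter

# Hypothetical factless fact rows: (date_key, student_key, class_key).
# Every column is a dimension key; there is no measure column at all.
attendance = [
    (20140829, 1, 101),
    (20140829, 2, 101),
    (20140830, 1, 101),
    (20140830, 1, 102),
]

# Analysis works by counting rows, since there is nothing to sum.
attendance_per_class = Counter(class_key for _, _, class_key in attendance)
```

The existence of a row is itself the information, so the only aggregate available is a count.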
Junk Dimension • In Data Warehouse Design, Frequently We Run into A Situation
Where There Are Yes/No Indicator Fields in The Source System.
• As Per The Business Analysis, We Have To Keep Boolean Information in The Fact Table.
• Keeping Every Boolean Indicator Field Separately in The Fact Table Requires Many Small Dimension Tables And Also Greatly Increases The Amount of Information Stored in The Fact Table, Leading To Possible Performance And Management Issues.
• Junk Dimension is A Dimension in Which We Combine The Boolean Indicator Fields into A Single Dimension, Leading To A Single Dimension Table.
• The Content in The Junk Dimension Table is The Combination of All Possible Values of The Individual Indicator Fields.
Advantage of Junk Dimension • It Provides A Recognizable Location For Related Codes,
Indicators And Their Descriptors in A Dimensional Framework. • Avoids The Creation of Multiple Dimension Tables. • Provides Smaller, Quicker Point of Entry Queries Compared To
Performance When Attributes Are Directly in The Fact Table. • An Interesting Use For A Junk Dimension Is To Capture The
Context of A Specific Transaction.
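Building a junk dimension from all possible indicator combinations can be sketched with `itertools.product`. The three indicator names and the `build_junk_dimension` helper are hypothetical examples.

```python
from itertools import product

# Hypothetical yes/no indicator fields found in the source system.
indicators = {
    "is_gift": ("Y", "N"),
    "is_online": ("Y", "N"),
    "is_rush": ("Y", "N"),
}

def build_junk_dimension(indicators):
    """Create one dimension row per combination of indicator values."""
    names = list(indicators)
    rows = []
    for key, combo in enumerate(product(*indicators.values()), start=1):
        rows.append({"junk_key": key, **dict(zip(names, combo))})
    return rows

junk_dim = build_junk_dimension(indicators)

# The fact table then stores a single junk_key instead of three flag columns.
lookup = {tuple(row[name] for name in indicators): row["junk_key"] for row in junk_dim}
```

Three two-valued indicators give 2 x 2 x 2 = 8 rows, so the dimension stays tiny even though it replaces three separate columns in the fact table.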
Conformed Dimension • A Conformed Dimension is A Dimension That Has Exactly The
Same Meaning And Content When Being Referred From Different Fact Tables.
• A Conformed Dimension Can Refer To Multiple Tables in Multiple Data Marts Within The Same Organization.
• For Two Dimension Tables To Be Considered As Conformed, They Must Either Be Identical OR One Must Be A Subset of Another.
• Two Dimension Tables That Are Exactly The Same Except For The Primary Key Are Not Considered Conformed Dimensions.
Rapidly Changing Dimensions • A Dimension Attribute That Changes Frequently is A Rapidly
Changing Attribute.
• If We Move The Rapidly Changing Attribute To its Own Dimension, With A Separate Foreign Key in The Fact Table, Then That New Dimension is Called A Rapidly Changing Dimension.
Degenerate Dimension • A Degenerate Dimension is A Dimension Which is Derived From
The Fact Table And Doesn't Have its Own Dimension Table.
• These Are Essentially Dimension Keys For Which There Are No Other Attributes.
Inferred Dimension • While Loading Fact Records, A Dimension Record May Not Yet
Be Ready. • One Solution is To Generate A Surrogate Key With Null For All
The Other Attributes. • The Generated Surrogate Key is Called An Inferred Member, But
is Often Referred To As An Inferred Dimension.
Role Playing Dimension • A Role-Playing Dimension is One Where The Same Dimension
Key Along With its Associated Attributes Can Be Joined To More Than One Foreign Key in The Fact Table.
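A typical case, assumed here for illustration, is a single date dimension that plays both an order-date role and a ship-date role. The sketch uses Python's built-in `sqlite3` module; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
-- Two foreign keys in the fact table point at the same dimension table.
CREATE TABLE fact_orders (
    order_id       INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES dim_date(date_key),
    ship_date_key  INTEGER REFERENCES dim_date(date_key)
);
INSERT INTO dim_date VALUES (1, '2014-08-30'), (2, '2014-08-31');
INSERT INTO fact_orders VALUES (500, 1, 2);
""")

# The dimension is joined once per role, each time under a different alias.
orders = conn.execute("""
    SELECT f.order_id, od.calendar_date, sd.calendar_date
    FROM fact_orders f
    JOIN dim_date od ON od.date_key = f.order_date_key
    JOIN dim_date sd ON sd.date_key = f.ship_date_key
""").fetchall()
```

Only one physical `dim_date` table is stored and maintained; the roles exist purely in the aliases used at query time.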
Shrunken Dimension • A Shrunken Dimension is A Subset of Another Dimension.
Static Dimension • Static Dimensions Are Not Extracted From The Original Data
Source, But Are Created Within The Context of The DWH. • A Static Dimension Can Be Loaded Manually.
Data Warehouse VS Data Mart • A DWH Holds Multiple Subject Areas; A DM Holds A Single Subject. • A DWH Holds Very Detailed Information; A DM Holds Summaries. • A DWH Works To Integrate All Data Sources; A DM Integrates A Given
Subject Only. • A DWH May Operate on A Dimensional Model; A DM Works Only on A
Dimensional Model.