BI Dimension Modelling Basics
Introduce Microsoft BI & Basics of Dimension Modelling
Parikshit Savjani
Parikshit Savjani is a Premier Field Engineer at Microsoft, specializing in SQL Server and Business Intelligence (SSAS, SSIS and SSRS). His role involves consulting, performance tuning, and delivering workshops and chalk talks to Microsoft Premier customers. He has 4.5 years of experience with Microsoft and SQL Server. He contributes to the community by blogging what he learns at www.sqlserverfaq.net and on MSDN Blogs.
Know the Presenter
Agenda
Introduce Microsoft BI & Basics of Dimension Modelling
What is Business Intelligence?
• BI is the process that allows business analysts to make informed decisions better and faster.
• Data warehousing is the process of consolidating data from disparate data sources to facilitate BI.
• Dimension modeling is the data modeling principle used to architect the data warehouse to support BI.
Enterprise BI
• Comprehensive View of Corporate Data
• Dedicated IT Staff
• Large Volumes of Data
• Complex Business Logic
• Complex Security
Team BI
• Created and Managed by a Team of Information Workers
• Multi-User, but not corporate level
• Variable Security Requirements
• Consistency of Data and Terms
• Reduced Data Volumes
• Fewer Users
• Monitored by IT Staff
Personal BI
• Built and Managed by Information Workers/Analysts
• Use Familiar Tools (Excel)
• Models Evolve Dynamically
• Data Owned by Information Workers
• Variable Data Sources
• Small Data Volumes
• Single User
BI Overview
Enterprise BI Solution
Microsoft BI Stack
Data Modeling Concepts
Normalization Principles
• 1st Normal Form
  Every row should be uniquely identified by a primary key (PK).
  No repeating groups of columns.
• 2nd Normal Form
  With a composite primary key there should be no partial dependency, i.e. no non-key attribute may depend on only part of the key.
• 3rd Normal Form
  A non-key attribute should depend only on the key attribute and on no other non-key attribute.
Data Modeling Demo
Single flat table:
  ORDER NUMBER, CUSTOMER ID, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY, EMPLOYEEID, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE, PRODUCTID, PRODUCTNAME, PRODUCTCATEGORY, MODELID, MODEL, VENDORID, VENDOR, UNITPRICE, QUANTITY, DISCOUNT, SALESAMOUNT
Same table with ORDER NUMBER placed next to PRODUCTID:
  ORDER NUMBER, CUSTOMER ID, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY, EMPLOYEEID, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE, PRODUCTID, ORDER NUMBER, PRODUCTNAME, PRODUCTCATEGORY, MODELID, MODEL, VENDORID, VENDOR, UNITPRICE, QUANTITY, DISCOUNT, SALESAMOUNT
Product columns split out:
  Product: PRODUCTID, PRODUCTNAME, PRODUCTCATEGORY, UNIT PRICE, MODELID, MODEL, VENDORID, VENDOR
  Order line: PRODUCTID, ORDER NUMBER, QUANTITY, DISCOUNT, SALESAMOUNT
3rd Normal OLTP Design
  ORDER MASTER: ORDER NUMBER, CUSTOMER ID, EMPLOYEEID
  ORDER TRANSACTIONS: PRODUCTID, ORDER NUMBER, QUANTITY, DISCOUNT, SALESAMOUNT
  PRODUCT MASTER: PRODUCTID, PRODUCTNAME, PRODUCTCATEGORY, UNIT PRICE, MODELID, VENDORID
  PRODUCT MODEL MASTER: MODELID, MODELNAME
  VENDOR MASTER: VENDORID, VENDOR
  EMPLOYEE MASTER: EMPLOYEEID, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE
  CUSTOMER MASTER: CUSTOMER ID, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY
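The 3rd normal form design can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names follow the demo, while the data types, the sample rows and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

# In-memory database. Names follow the 3NF demo; types and rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VendorMaster (VendorID INTEGER PRIMARY KEY, Vendor TEXT);
CREATE TABLE ProductModelMaster (ModelID INTEGER PRIMARY KEY, ModelName TEXT);
CREATE TABLE ProductMaster (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, ProductCategory TEXT,
    UnitPrice REAL,
    ModelID INTEGER REFERENCES ProductModelMaster(ModelID),
    VendorID INTEGER REFERENCES VendorMaster(VendorID));
CREATE TABLE CustomerMaster (
    CustomerID INTEGER PRIMARY KEY, CustomerName TEXT,
    CustomerCity TEXT, CustomerState TEXT, Country TEXT);
CREATE TABLE EmployeeMaster (
    EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT,
    EmployeeEmail TEXT, EmployeePhone TEXT);
CREATE TABLE OrderMaster (
    OrderNumber INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES CustomerMaster(CustomerID),
    EmployeeID INTEGER REFERENCES EmployeeMaster(EmployeeID));
CREATE TABLE OrderTransactions (
    ProductID INTEGER REFERENCES ProductMaster(ProductID),
    OrderNumber INTEGER REFERENCES OrderMaster(OrderNumber),
    Quantity INTEGER, Discount REAL, SalesAmount REAL,
    PRIMARY KEY (ProductID, OrderNumber));

INSERT INTO VendorMaster VALUES (1, 'Vendor A');
INSERT INTO ProductModelMaster VALUES (10, 'Model X');
INSERT INTO ProductMaster VALUES (100, 'Road Bike', 'Bikes', 500.0, 10, 1);
INSERT INTO CustomerMaster VALUES (1, 'Alice', 'Pune', 'MH', 'India');
INSERT INTO EmployeeMaster VALUES (7, 'Bob', 'bob@example.com', '555-0100');
INSERT INTO OrderMaster VALUES (1000, 1, 7);
INSERT INTO OrderTransactions VALUES (100, 1000, 2, 0.0, 1000.0);
""")

# Rebuilding the original flat order record takes six joins in this design.
flat_rows = conn.execute("""
SELECT o.OrderNumber, c.CustomerName, e.EmployeeName,
       p.ProductName, m.ModelName, v.Vendor, t.Quantity, t.SalesAmount
FROM OrderTransactions t
JOIN OrderMaster o        ON o.OrderNumber = t.OrderNumber
JOIN CustomerMaster c     ON c.CustomerID = o.CustomerID
JOIN EmployeeMaster e     ON e.EmployeeID = o.EmployeeID
JOIN ProductMaster p      ON p.ProductID = t.ProductID
JOIN ProductModelMaster m ON m.ModelID = p.ModelID
JOIN VendorMaster v       ON v.VendorID = p.VendorID
""").fetchall()
print(flat_rows)
```

The number of joins needed for a simple report is one reason an OLTP design like this is reshaped before it is used for analysis.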
Dimension Modeling Demo
DIMENSION MODEL
  PRODUCT DIMENSION: PRODUCTID, PRODUCTBUSINESSKEY, PRODUCTNAME, PRODUCTCATEGORY, SIZE, COLOR, UNIT PRICE, MODEL, VENDOR
  CUSTOMER DIMENSION: CUSTOMER ID, CUSTOMERBUSINESSKEY, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY
  EMPLOYEE DIMENSION: EMPLOYEEID, EMPLOYEEBUSINESSKEY, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE
  FACT SALES: PRODUCTID, CUSTOMERID, EMPLOYEEID, QUANTITY, DISCOUNT, SALESAMOUNT, ORDER NUMBER
Dimension Modeling Concepts
Dimension Tables
• Provide context to slice the data
• Map to the master tables of the OLTP system
• Should be denormalized and in 1st NF
• Are wide in nature and comparatively shallow compared to fact tables; include as many columns as you can think of
• Are related only to the fact table and should otherwise be unrelated
Fact Tables
• Hold the measures of interest
• Map to the transactional tables of the OLTP system
• Are in 3 NF
• Narrow in nature
• Very deep: contain a row for every transaction
• Aggregated in the context of the dimensions
• Consist of key columns and measure columns
Star Schema
A star schema contains a fact table and one or more dimension tables.
1. Fact table: the central fact table stores the numeric facts (measures) such as sales dollars, costs and unit sales.
2. Dimension tables: they surround the central fact table and store descriptive information about the measures.
The shape looks like a star.
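A minimal star schema sketch, again using sqlite3, shows how a dimension gives context to the measures in the fact table. The table names (DimProduct, DimCustomer, FactSales), the reduced column lists and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive, denormalized attributes.
CREATE TABLE DimProduct (
    ProductID INTEGER PRIMARY KEY, ProductName TEXT, ProductCategory TEXT);
CREATE TABLE DimCustomer (
    CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT);
-- Fact table: dimension keys plus numeric measures.
CREATE TABLE FactSales (
    ProductID INTEGER REFERENCES DimProduct(ProductID),
    CustomerID INTEGER REFERENCES DimCustomer(CustomerID),
    Quantity INTEGER, SalesAmount REAL);

INSERT INTO DimProduct VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO DimCustomer VALUES (1, 'Alice', 'India'), (2, 'Bob', 'USA');
INSERT INTO FactSales VALUES (1, 1, 1, 500.0), (2, 1, 2, 60.0), (1, 2, 1, 500.0);
""")

# Slice the SalesAmount measure by a dimension attribute: one join per dimension.
sales_by_category = conn.execute("""
SELECT p.ProductCategory, SUM(f.SalesAmount)
FROM FactSales f
JOIN DimProduct p ON p.ProductID = f.ProductID
GROUP BY p.ProductCategory
ORDER BY p.ProductCategory
""").fetchall()
print(sales_by_category)
```

Slicing by country instead would only swap the joined dimension table; the fact table stays the same.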
SnowFlake Schema
Dimension Modelling - Caveat
• If there are m dimensions and each dimension has n rows, the theoretical size of the cube is n^m cells, one for every combination of dimension members.
• Adding one redundant dimension can therefore increase the size of the cube by a large amount.
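As a rough illustration of this caveat, the sketch below counts one cell per combination of dimension members, so the theoretical size is the product of the dimension row counts. The function name and the row counts are hypothetical.

```python
import math

def theoretical_cube_cells(dimension_row_counts):
    """One cell per combination of members: the product of the row counts."""
    return math.prod(dimension_row_counts)

# Three dimensions of 100 rows each, then the same cube with a fourth one added.
three_dims = theoretical_cube_cells([100, 100, 100])
four_dims = theoretical_cube_cells([100, 100, 100, 100])
print(three_dims, four_dims)
```

With these assumed counts, a single extra 100-row dimension multiplies the theoretical cell count by 100.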
Dimension Modeling Designs
• Conformed Dimensions
• Reference Dimensions
• Role Playing Dimensions
• Parent Child Dimension
• Many to Many Dimensions
• Slowly Changing Dimensions
• Degenerate Dimensions/Fact Dimensions
• Factless Fact
Conformed Dimensions
• A conformed dimension is a dimension that has exactly the same meaning and content when referred to from different fact tables in multiple data marts.
• For two dimension tables to be considered conformed, they must either be identical or one must be a subset of the other.
• There cannot be any other type of difference between the two tables. For example, two dimension tables that are exactly the same except for the primary key are not considered conformed dimensions.
• The time dimension is a common conformed dimension in an organization.
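The "identical or subset" rule can be sketched as a simple set comparison. This is a simplified check under the assumption that each dimension row is a tuple that includes its key; the function name and sample rows are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """True if the two dimension tables are identical or one is a subset
    of the other, comparing whole rows (primary key included)."""
    rows_a, rows_b = set(dim_a), set(dim_b)
    return rows_a <= rows_b or rows_b <= rows_a

# Rows are (DateKey, CalendarDate, MonthName); values are illustrative.
date_dim_sales = [(20240101, "2024-01-01", "Jan"), (20240201, "2024-02-01", "Feb")]
date_dim_inventory = [(20240101, "2024-01-01", "Jan")]  # subset of the sales copy
date_dim_rekeyed = [(1, "2024-01-01", "Jan"), (2, "2024-02-01", "Feb")]  # same content, different keys

subset_result = is_conformed(date_dim_sales, date_dim_inventory)
rekeyed_result = is_conformed(date_dim_sales, date_dim_rekeyed)
print(subset_result, rekeyed_result)
```

The re-keyed copy fails the check even though its descriptive content matches, which mirrors the primary key example in the bullets.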
Reference Dimensions
• Snowflake schema
• A reference dimension uses columns from multiple tables, or its dimension table is linked to the fact table indirectly, through another dimension that is directly linked to the fact table.
Role Playing Dimensions
• A role-playing dimension is used in a cube more than once, each time for a different purpose.
• Each role-playing dimension is joined to a fact table on a different foreign key.
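A date dimension that plays both an order date and a ship date role is a common illustration. In the sketch below the same table is joined twice under different aliases, each time on a different foreign key; all table names, column names and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, CalendarDate TEXT);
CREATE TABLE FactOrders (
    OrderNumber INTEGER,
    OrderDateKey INTEGER REFERENCES DimDate(DateKey),  -- role 1: order date
    ShipDateKey INTEGER REFERENCES DimDate(DateKey),   -- role 2: ship date
    SalesAmount REAL);
INSERT INTO DimDate VALUES (20240101, '2024-01-01'), (20240105, '2024-01-05');
INSERT INTO FactOrders VALUES (1000, 20240101, 20240105, 250.0);
""")

# One physical dimension table, two aliases, two different foreign keys.
order_dates = conn.execute("""
SELECT f.OrderNumber, od.CalendarDate AS OrderDate, sd.CalendarDate AS ShipDate
FROM FactOrders f
JOIN DimDate od ON od.DateKey = f.OrderDateKey
JOIN DimDate sd ON sd.DateKey = f.ShipDateKey
""").fetchall()
print(order_dates)
```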
Parent Child Dimensions
• A Parent Child Dimension is a standard dimension which contains parent-child hierarchy.
• A parent-child hierarchy is a hierarchy in a standard dimension that contains a parent attribute.
• A parent attribute describes a self-referencing relationship, or self-join, within a dimension main table.
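The self-referencing relationship can be sketched with an employee table whose parent attribute points back at the same table. The recursive query below walks the hierarchy from the root; the table name, the ParentEmployeeID column and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimEmployee (
    EmployeeID INTEGER PRIMARY KEY,
    EmployeeName TEXT,
    ParentEmployeeID INTEGER REFERENCES DimEmployee(EmployeeID));  -- self-join
INSERT INTO DimEmployee VALUES
    (1, 'Asha', NULL), (2, 'Ben', 1), (3, 'Chen', 2), (4, 'Dita', 1);
""")

# Start at rows without a parent, then repeatedly attach their children.
hierarchy = conn.execute("""
WITH RECURSIVE tree(EmployeeID, EmployeeName, Level) AS (
    SELECT EmployeeID, EmployeeName, 0
    FROM DimEmployee WHERE ParentEmployeeID IS NULL
    UNION ALL
    SELECT e.EmployeeID, e.EmployeeName, t.Level + 1
    FROM DimEmployee e JOIN tree t ON e.ParentEmployeeID = t.EmployeeID
)
SELECT EmployeeName, Level FROM tree ORDER BY Level, EmployeeName
""").fetchall()
print(hierarchy)
```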
Many to Many Dimension
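• DIMENSION MODEL (BANK)
  Fact table keyed by customer: BRANCHID, TIMEKEY, CUSTOMERID, TRANSACTIONAMOUNT, TRANSACTIONTYPE
  Dimensions (keys): BRANCHID, TIMEKEY, CUSTOMERID, ACCOUNTID
  Fact table keyed by account: BRANCHID, TIMEKEY, ACCOUNTID, TRANSACTIONAMOUNT, TRANSACTIONTYPE
  INTERMEDIATE FACT TABLE: ACCOUNTID, CUSTOMERID
  Customer dimension (key): CUSTOMERID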
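The bank model can be sketched as a transaction fact keyed by account, with an intermediate fact table linking accounts to customers. The column names follow the diagram; the table names, the joint-account scenario and the amounts are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactTransaction (
    BranchID INTEGER, TimeKey INTEGER, AccountID INTEGER,
    TransactionAmount REAL, TransactionType TEXT);
-- Intermediate fact table: one row per (account, customer) pair.
CREATE TABLE FactAccountCustomer (AccountID INTEGER, CustomerID INTEGER);

-- Account 1 is a joint account held by customers 1 and 2.
INSERT INTO FactAccountCustomer VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO FactTransaction VALUES
    (10, 20240101, 1, 100.0, 'Deposit'),
    (10, 20240101, 2, 50.0, 'Deposit');
""")

# Reach customers from transactions through the intermediate fact table.
amount_by_customer = conn.execute("""
SELECT b.CustomerID, SUM(f.TransactionAmount)
FROM FactTransaction f
JOIN FactAccountCustomer b ON b.AccountID = f.AccountID
GROUP BY b.CustomerID
ORDER BY b.CustomerID
""").fetchall()

# The grand total must come from the fact table alone to avoid double counting.
grand_total = conn.execute(
    "SELECT SUM(TransactionAmount) FROM FactTransaction").fetchone()[0]
print(amount_by_customer, grand_total)
```

In this sketch the per-customer amounts add up to more than the grand total because the joint account's deposit is attributed to both holders.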
Slowly Changing Dimension
• Ideally, dimension attributes are never expected to change over time, e.g. Month, City, State, Cost.
• Some attributes of a dimension might change over time, e.g. ProductUnitPrice, CustomerCity. Such a dimension is referred to as a slowly changing dimension (SCD).
• Type 1 SCD
  • No history is maintained.
• Type 2 SCD
  • History is maintained in the form of rows.
• Type 3 SCD
  • History is maintained in the form of columns.
Columns shown: SurrogateKey, StartDate, EndDate, Status
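A Type 2 change can be sketched in plain Python: the current row is closed and a new row with a new surrogate key is appended, so history lives in rows. SurrogateKey, StartDate, EndDate and Status come from the slide; the BusinessKey column, the "Current"/"Expired" status values and the helper name are assumptions.

```python
from datetime import date

def apply_type2_change(rows, business_key, new_attrs, change_date):
    """Expire the current row for business_key and append a new current row."""
    current = next(r for r in rows
                   if r["BusinessKey"] == business_key and r["Status"] == "Current")
    current["EndDate"] = change_date
    current["Status"] = "Expired"
    new_row = {
        **current,       # carry over unchanged attributes
        **new_attrs,     # overwrite the attributes that changed
        "SurrogateKey": max(r["SurrogateKey"] for r in rows) + 1,
        "StartDate": change_date,
        "EndDate": None,
        "Status": "Current",
    }
    rows.append(new_row)
    return new_row

customer_dim = [{
    "SurrogateKey": 1, "BusinessKey": "C001", "CustomerCity": "Pune",
    "StartDate": date(2020, 1, 1), "EndDate": None, "Status": "Current",
}]

# The customer moves city: the old row is kept as history, a new row is added.
apply_type2_change(customer_dim, "C001", {"CustomerCity": "Mumbai"}, date(2024, 6, 1))
for row in customer_dim:
    print(row)
```

A Type 1 change would instead overwrite CustomerCity in place, and a Type 3 change would keep the old value in an extra column on the same row.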
Degenerate Dimensions
• Also known as a fact dimension.
• A degenerate (fact) dimension is a standard dimension that is constructed from columns stored directly in the fact table.
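As a small sketch, an order number kept in the fact table can act as such a dimension: it has no dimension table of its own, yet the fact rows can still be grouped by it. The table name, the 'SO' order number format and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OrderNumber lives in the fact table itself; there is no DimOrder table.
CREATE TABLE FactSales (
    OrderNumber TEXT, ProductID INTEGER, Quantity INTEGER, SalesAmount REAL);
INSERT INTO FactSales VALUES
    ('SO1000', 1, 1, 500.0), ('SO1000', 2, 2, 60.0), ('SO1001', 1, 1, 500.0);
""")

# Group the fact rows by the degenerate dimension column.
sales_by_order = conn.execute("""
SELECT OrderNumber, COUNT(*), SUM(SalesAmount)
FROM FactSales
GROUP BY OrderNumber
ORDER BY OrderNumber
""").fetchall()
print(sales_by_order)
```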
Factless Fact
• Fact tables with no measures.
• Used to measure the occurrence of an event.
DIMENSION MODEL (SCHOOL)
  Fact table: StudentID, TeacherID, TimeKey, ClassID
  Dimensions (keys): StudentID, TeacherID, TimeKey, ClassID
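The school model can be sketched as a fact table that holds only dimension keys; the "measure" is simply the count of rows. The column names follow the diagram, while the FactAttendance table name and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Only keys, no measure columns: each row records that an event occurred.
CREATE TABLE FactAttendance (
    StudentID INTEGER, TeacherID INTEGER, TimeKey INTEGER, ClassID INTEGER);
INSERT INTO FactAttendance VALUES
    (1, 10, 20240101, 100),
    (2, 10, 20240101, 100),
    (1, 11, 20240102, 101);
""")

# Counting rows measures how often the event happened per class.
attendance_per_class = conn.execute("""
SELECT ClassID, COUNT(*)
FROM FactAttendance
GROUP BY ClassID
ORDER BY ClassID
""").fetchall()
print(attendance_per_class)
```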
References
• Data Warehousing Toolkit 2.0 – Ralph Kimball
Q&A
• Parikshit Savjani Email: [email protected] Blog: http://www.sqlserverfaq.net