Last Updated: 26th May 2003
Center of Excellence, Data Warehousing
Introduction to Data Modeling
Objectives
  At the end of this lesson, you will know:
    Data Modeling for Data Warehouse
    What are dimensions and facts
    Star Schema and Snowflake Schemas
    Coverage Tables
    Factless Tables
    What to look for in Modeling tools
    Some modeling tools
Data Modeling for Data Warehouse
  How to structure the data in your data warehouse?
  Process that produces abstract data models for one or more database components of the data warehouse
  Modeling for Warehouse is different from that for Operational database
    Dimensional Modeling, Star Schema Modeling or Fact/Dimension Modeling
Modeling Techniques
  Entity-Relationship Modeling
    Traditional modeling technique
    Technique of choice for OLTP
    Suited for corporate data warehouse
  Dimensional Modeling
    Analyzing business measures in the specific business context
    Helps visualize very abstract business questions
    End users can easily understand and navigate the data structure
Entity-Relationship Modeling - Basic Concepts
  The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements.
  The highest art form of ER modeling is to remove all redundancy in the data.
  This has created databases that cannot be queried!
An Order Processing ER Model
  [Diagram: Order Header, Order Details, Customer Table, Item Table and Salesrep table linked by foreign keys (FK), with further entities City, Sales District, Sales Region, Sales Country, Product Brand and Product Category]
Entity-Relationship Modeling - Basic Concepts
  Entity
    Object that can be observed and classified by its properties and characteristics
    Business definition with a clear boundary
    Characterized by a noun
    Example:
      Product
      Employee
Entity-Relationship Modeling - Basic Concepts
  Relationship
    Relationship between entities - structural interaction and association
    Described by a verb
    Cardinality
      1-1
      1-M
      M-M
    Example: Books belong to Printed Media
Entity-Relationship Modeling - Basic Concepts
  Attributes
    Characteristics and properties of entities
    Example:
      Book Id, Description, book category are attributes of entity “Book”
    Attribute name should be unique and self-explanatory
    Primary Key, Foreign Key, Constraints are defined on Attributes
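The entity, relationship and attribute concepts above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names are hypothetical, and treating "Books belong to Printed Media" as a 1-M relationship carried by a foreign key is an assumption, not something the slides specify.

```python
import sqlite3

# Hypothetical tables; only the "Book" entity, its attributes and "Printed Media"
# come from the slides. The 1-M reading of the relationship is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE printed_media (
    media_id   INTEGER PRIMARY KEY,      -- primary key attribute
    media_type TEXT NOT NULL UNIQUE      -- constraints defined on an attribute
);
CREATE TABLE book (
    book_id       INTEGER PRIMARY KEY,
    description   TEXT NOT NULL,
    book_category TEXT NOT NULL,
    media_id      INTEGER NOT NULL REFERENCES printed_media(media_id)  -- foreign key
);
""")
conn.execute("INSERT INTO printed_media VALUES (1, 'Book')")
conn.execute("INSERT INTO book VALUES (10, 'Intro to Modeling', 'Technical', 1)")

# A book that points at a missing printed_media row violates the foreign key.
try:
    conn.execute("INSERT INTO book VALUES (11, 'Orphan', 'Technical', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(orphan_rejected, book_count)
```

The rejected insert shows the constraint doing its job: the relationship is enforced by the database rather than by convention.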
Entity-Relationship Modeling – Why Not?
  End users cannot understand or remember an ER model.
  There is no graphical user interface (GUI) that takes a general ER model and makes it usable by end users.
  Software cannot usefully query a general ER model.
  Use of the ER modeling technique defeats the basic allure of data warehousing, namely intuitive and high-performance retrieval of data.
Dimensional Modeling - Basic Concepts
  Represents the data in a standard, intuitive framework that allows for high-performance access
  Schema designed to process large, complex, ad hoc and data-intensive queries
  No concern for concurrency, locking and insert/update/delete performance
  Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
  This characteristic "star-like" structure is often called a star join.
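Star Schema
  [Diagram: a central fact table keyed by CITY, PRODUCT, PERIOD and CUSTOMER, holding the Measures SALES AMOUNT and UNITS, surrounded by the Dimensions: location (REGION, STATE, DISTRICT, CITY), PRODUCT (BRAND, COLOR, CATEGORY, SIZE), period (DAY, MONTH, YEAR, QUARTER) and CUSTOMER (CATEGORY, CONTACT, ADDRESS)]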
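A minimal sketch of such a star schema, using Python's built-in sqlite3 module. The table names, the reduced set of columns and all sample rows are made up for illustration, and the CUSTOMER dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: single-part keys plus descriptive attributes (hypothetical rows).
CREATE TABLE city_dim    (city_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product TEXT, brand TEXT);
CREATE TABLE period_dim  (period_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);

-- Fact table: multipart key made of the dimension keys, plus additive measures.
CREATE TABLE sales_fact (
    city_key     INTEGER REFERENCES city_dim(city_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    period_key   INTEGER REFERENCES period_dim(period_key),
    sales_amount REAL,
    units        INTEGER,
    PRIMARY KEY (city_key, product_key, period_key)
);

INSERT INTO city_dim    VALUES (1, 'Springfield', 'East'), (2, 'Riverton', 'West');
INSERT INTO product_dim VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Acme'), (3, 'Gizmo', 'Zenith');
INSERT INTO period_dim  VALUES (1, 'Jan', 2003), (2, 'Feb', 2003);
INSERT INTO sales_fact  VALUES (1, 1, 1, 100.0, 10), (1, 2, 1, 50.0, 5),
                               (2, 3, 1, 70.0, 7),   (2, 1, 2, 30.0, 3);
""")

# A star join: constrain and group by dimension attributes, sum the measures.
rows = conn.execute("""
    SELECT c.region, p.brand, SUM(f.sales_amount), SUM(f.units)
    FROM sales_fact f
    JOIN city_dim c    ON c.city_key = f.city_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY c.region, p.brand
    ORDER BY c.region, p.brand
""").fetchall()
print(rows)
```

Every query follows the same shape: the fact table in the middle, one join per dimension, and the measures aggregated over whatever dimension attributes the question names.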
Dimensional Modeling - Basic Concepts
  Fact Tables
    The most useful facts in a fact table are numeric and additive
    Typically represents a business transaction, or event that can be used in analyzing business process
    By nature fact tables are sparse
    Usually very large - billions of records
Dimensional Modeling - Basic Concepts
  Dimension Tables
    Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
    Dimension tables most often contain descriptive textual information
    Determine contextual background for facts
    Examples:
      Time
      Location/Region
      Customers
Dimensional Modeling - Basic Concepts
  Measures
    A numeric attribute of a fact
    Represents performance or behavior of the business relative to the dimensions
    The actual numbers are called variables
    Occupy very little space compared to Fact Tables
    Examples:
      Quantity supplied
      Transaction amount
      Sales volume
Fact Table & Dimension Tables
  Fact Tables
    Numerical measurements of business are stored in Fact Tables.
  Dimensional Tables
    Dimensions are attributes about facts.
Conformed Dimensions
  Dimension that means the same thing with every possible fact table that it can be joined with
  Conformed dimensions are most essential for
    The Bus Architecture
    Integrated function of the Data Warehouse
  Some common dimensions are:
    Customer
    Product
    Location
    Time
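One way to see why conformed dimensions matter is a drill-across query: results from two fact tables can be lined up only because both use the same product dimension. The sketch below is plain Python with made-up fact tables (sales and shipments) and made-up rows; it is an illustration, not a prescribed implementation.

```python
# Conformed product dimension shared by two hypothetical fact tables.
product_dim = {1: "Widget", 2: "Gadget"}

sales_fact = [(1, 100.0), (2, 40.0), (1, 60.0)]   # (product_key, sales_amount)
shipments_fact = [(1, 12), (2, 5), (2, 3)]        # (product_key, units_shipped)


def total_by_product(fact_rows):
    """Sum the single measure of a fact table per product key."""
    totals = {}
    for product_key, value in fact_rows:
        totals[product_key] = totals.get(product_key, 0) + value
    return totals


sales = total_by_product(sales_fact)
shipped = total_by_product(shipments_fact)

# Drill across: the shared dimension key lines up the two result sets.
drill_across = {
    name: (sales.get(key, 0), shipped.get(key, 0))
    for key, name in product_dim.items()
}
print(drill_across)
```

If each fact table had its own differently keyed product dimension, the last step would need a mapping between the two, which is exactly the integration work conformed dimensions avoid.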
Surrogate Keys
  All tables (facts and dimensions) should not use production keys but Data Warehouse generated surrogate keys
    Production keys get reused sometimes
    In case of mergers/acquisitions, protects you from different key formats
    Production systems may change their systems to generalize key definitions
    Using surrogate key will be faster
    Can handle Slowly Changing dimensions well
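A small sketch of surrogate key assignment, assuming the warehouse keeps a lookup from (source system, production key) to a warehouse-generated integer. The class name, the source system names and the key formats are all hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hands out warehouse-generated integer keys, one per production key."""

    def __init__(self):
        self._next = count(1)
        self._lookup = {}  # (source_system, production_key) -> surrogate key

    def key_for(self, source_system, production_key):
        natural = (source_system, production_key)
        if natural not in self._lookup:
            self._lookup[natural] = next(self._next)
        return self._lookup[natural]


keys = SurrogateKeyGenerator()
a = keys.key_for("erp", "CUST-001")      # first key seen
b = keys.key_for("acquired_co", "0001")  # different key format, same key space
c = keys.key_for("erp", "CUST-001")      # same production key maps to the same surrogate
print(a, b, c)
```

Because fact rows carry only the integer, differing production key formats from an acquired company never reach the fact table.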
Slowly Changing Dimensions
  Certain kinds of dimension attribute changes need to be handled differently in Data Warehouse
  Type I - Overwrite
    e.g. Name Correction, Description changes
  Type II - Partition History
    e.g. Packing change, Customer movement
    Create a new dimension record with new surrogate key
  Type III - Organizational changes
    e.g. Sales Force Reorganization
    Show sales broken down by new and old organizations
    Need to create an old and a new field
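The three change types can be sketched on a tiny in-memory customer dimension. Everything here is illustrative: the column names, the sample customer and the `current` flag used to mark the active row are assumptions, not part of the slides.

```python
from itertools import count

next_key = count(1)
customer_dim = []  # each row is a dict; "current" marks the active version (an assumption)


def add_customer(customer_id, name, city, sales_org):
    row = {"customer_key": next(next_key), "customer_id": customer_id, "name": name,
           "city": city, "sales_org": sales_org, "old_sales_org": None, "current": True}
    customer_dim.append(row)
    return row


def current_row(customer_id):
    return next(r for r in customer_dim
                if r["customer_id"] == customer_id and r["current"])


def type1_overwrite(customer_id, name):
    """Type I: overwrite in place, no history kept (e.g. a name correction)."""
    for row in customer_dim:
        if row["customer_id"] == customer_id:
            row["name"] = name


def type2_new_row(customer_id, city):
    """Type II: close the old row and add a new one with a new surrogate key."""
    old = current_row(customer_id)
    old["current"] = False
    new = dict(old, customer_key=next(next_key), city=city, current=True)
    customer_dim.append(new)
    return new


def type3_old_and_new(customer_id, sales_org):
    """Type III: keep the previous value in an 'old' field next to the new one."""
    row = current_row(customer_id)
    row["old_sales_org"] = row["sales_org"]
    row["sales_org"] = sales_org


add_customer("C1", "Jon Smith", "Boston", "East")
type1_overwrite("C1", "John Smith")   # spelling fix
type2_new_row("C1", "Denver")         # customer moved
type3_old_and_new("C1", "Mountain")   # sales force reorganization
for row in customer_dim:
    print(row)
```

Type II is the case that depends on surrogate keys: the production key `customer_id` stays the same while each version of the customer gets its own `customer_key`.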
Factless Fact Tables
  For Event Tracking, e.g. attendance
  [Diagram: attendance fact table with keys Date_Key, Student_Key, Course_Key, Teacher_Key and Facility_Key, joined to the Date, Student, Course, Teacher and Facility dimensions]
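A factless fact table has keys but no measure columns, so the only thing to aggregate is the existence of rows. A minimal sketch in plain Python, with made-up key values:

```python
from collections import Counter

# Each row records that an attendance event happened; there is no measure column.
# (date_key, student_key, course_key, teacher_key, facility_key), all values hypothetical
attendance_fact = [
    (20030526, 1, 101, 7, 3),
    (20030526, 2, 101, 7, 3),
    (20030526, 1, 102, 8, 4),
    (20030527, 2, 101, 7, 3),
]

# Questions are answered by counting rows, for example attendance per course.
attendance_by_course = Counter(course_key for _, _, course_key, _, _ in attendance_fact)
print(attendance_by_course)
```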
Problem: How to find out which Products on promotion did not sell?
  [Diagram: sales Fact Table with keys Date_Key, Product_Key, Store_Key and Promotion_Key and measures Dollars Sold and Units Sold, joined to the Date, Product, Store and Promotion dimensions]
Coverage Tables
  Solution - Coverage Tables
  [Diagram: Sales Promotion Coverage Table with keys Date_Key, Product_Key, Store_Key and Promotion_Key, joined to the Date, Product, Store and Promotion dimensions]
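The sales fact table only has rows for things that sold, so it cannot answer the question on its own. With a coverage table listing every product on promotion, the answer is a set difference. A sketch in plain Python with made-up key values:

```python
# Coverage table: one row per product on promotion, whether or not it sold.
# (date_key, product_key, store_key, promotion_key), all values hypothetical
promotion_coverage = {
    (1, 10, 100, 5),
    (1, 11, 100, 5),
    (1, 12, 100, 5),
}

# Sales fact table: rows exist only for products that actually sold.
# (date_key, product_key, store_key, promotion_key, dollars_sold, units_sold)
sales_fact = [
    (1, 10, 100, 5, 25.0, 5),
    (1, 13, 100, 0, 8.0, 1),
]

sold = {row[:4] for row in sales_fact}          # keep only the key columns
promoted_not_sold = sorted(promotion_coverage - sold)
products_not_sold = sorted({product_key for _, product_key, _, _ in promoted_not_sold})
print(products_not_sold)
```

In SQL the same idea would typically be a difference or anti-join between the coverage table and the sales fact table on the four keys.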
Snowflake Schema
  Dimension tables are normalized by decomposing at the attribute level
  Each dimension has one key for each level of the dimension’s hierarchy
  Good performance when queries involve aggregation
  Complicated maintenance and metadata, explosion in number of tables
  Makes user representation more complex and intricate
Snowflake schema - Example
  [Diagram: one Fact Table and four Dim Tables]
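A minimal sketch of a snowflaked dimension, using Python's built-in sqlite3 module. The product, brand and category hierarchy, the table names and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is decomposed into one table per hierarchy level,
-- each with its own key (hypothetical hierarchy: product -> brand -> category).
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE brand_dim (
    brand_key    INTEGER PRIMARY KEY,
    brand        TEXT,
    category_key INTEGER REFERENCES category_dim(category_key)
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    product     TEXT,
    brand_key   INTEGER REFERENCES brand_dim(brand_key)
);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    units       INTEGER
);

INSERT INTO category_dim VALUES (1, 'Tools'), (2, 'Toys');
INSERT INTO brand_dim    VALUES (1, 'Acme', 1), (2, 'Zenith', 2);
INSERT INTO product_dim  VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Gizmo', 2);
INSERT INTO sales_fact   VALUES (1, 10), (2, 5), (3, 7), (1, 3);
""")

# Reaching the category level now takes one join per level of the hierarchy.
rows = conn.execute("""
    SELECT c.category, SUM(f.units)
    FROM sales_fact f
    JOIN product_dim p  ON p.product_key = f.product_key
    JOIN brand_dim b    ON b.brand_key = p.brand_key
    JOIN category_dim c ON c.category_key = b.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(rows)
```

The extra joins are the cost side of the trade-off: more tables to maintain and a structure that is harder for end users to navigate than a single flat product dimension.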
Aggregates
  Pre-stored summaries in the database
  Significant Performance advantage
  Preferably should not be stored in fact tables.
  May take significant time to build aggregates
  Many tools can automatically navigate to most aggregated table that can service a query
Aggregate Navigators
  Automatically redirect queries to the most summarized table
  Some tools like Business Objects, Discoverer, Microstrategy, Metacube etc. support this
  Native database support already available
Aggregate Navigator
  [Diagram: Aggregate Navigator, DBMS, LAN]
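The redirection idea can be sketched as a lookup over a catalog of tables: pick the smallest table whose levels cover what the query asks for. This is not how any of the named tools is implemented; the catalog, table names, levels and row counts below are hypothetical.

```python
# Hypothetical catalog: (table name, levels the table can answer, approximate row count).
CATALOG = [
    ("sales_fact", {"day", "month", "year", "product", "brand", "store"}, 1_000_000),
    ("agg_month_product", {"month", "year", "product", "brand"}, 50_000),
    ("agg_year_brand", {"year", "brand"}, 200),
]


def navigate(requested_levels):
    """Return the smallest table that can answer a query on the requested levels."""
    requested = set(requested_levels)
    candidates = [(rows, name) for name, levels, rows in CATALOG if requested <= levels]
    if not candidates:
        raise ValueError(f"no table can answer levels {sorted(requested)}")
    return min(candidates)[1]


print(navigate({"year", "brand"}))   # served by the most summarized table
print(navigate({"month", "brand"}))  # needs the monthly aggregate
print(navigate({"day", "store"}))    # only the base fact table has this detail
```

The user or reporting tool always writes the query against the base schema; the choice of physical table stays hidden behind the navigator.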
Examples of Data Modeling Tools
  ERWIN
    Supports Data Warehouse design as a modeling technique
  Powersoft WarehouseArchitect
    Module of Power Designer specifically for DW Modeling
  Oracle Designer
    Can be extended for Warehouse modeling
  Others like Infomodeler, Silverrun are also used
Questions