Introduction to Data Modeling
Center of Excellence, Data Warehousing
Last Updated: 26th May 2003


Page 2: Objectives
At the end of this lesson, you will know:
- Data Modeling for Data Warehouse
- What are dimensions and facts
- Star Schema and Snowflake Schemas
- Coverage Tables
- Factless Tables
- What to look for in Modeling tools
- Some modeling tools

Page 3: Data Modeling for Data Warehouse
- How to structure the data in your data warehouse?
- Process that produces abstract data models for one or more database components of the data warehouse
- Modeling for Warehouse is different from that for Operational database
  - Dimensional Modeling, Star Schema Modeling or Fact/Dimension Modeling

Page 4: Modeling Techniques
- Entity-Relationship Modeling
  - Traditional modeling technique
  - Technique of choice for OLTP
  - Suited for corporate data warehouse
- Dimensional Modeling
  - Analyzing business measures in the specific business context
  - Helps visualize very abstract business questions
  - End users can easily understand and navigate the data structure

Page 5: Entity-Relationship Modeling - Basic Concepts
- The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements.
- The highest art form of ER modeling is to remove all redundancy in the data.
- The result: databases that cannot be queried!

Page 6: An Order Processing ER Model
[Figure: ER diagram. Entities shown: Order Header, Order Details, Customer Table, Item Table, Salesrep Table, City, Sales District, Sales Region, Sales Country, Product Brand, Product Category. Relationships are marked as foreign keys (FK).]

Page 7: Entity-Relationship Modeling - Basic Concepts
Entity
- Object that can be observed and classified by its properties and characteristics
- Business definition with a clear boundary
- Characterized by a noun
- Examples: Product, Employee

Page 8: Entity-Relationship Modeling - Basic Concepts
Relationship
- Relationship between entities: structural interaction and association
- Described by a verb
- Cardinality: 1-1, 1-M, M-M
- Example: Books belong to Printed Media

Page 9: Entity-Relationship Modeling - Basic Concepts
Attributes
- Characteristics and properties of entities
- Example: Book Id, Description, and Book Category are attributes of the entity "Book"
- Attribute names should be unique and self-explanatory
- Primary Key, Foreign Key, and Constraints are defined on Attributes
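The entity, relationship, and attribute ideas on the last three slides can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (printed_media, book, media_id, and so on) and the sample rows are assumptions made for the example, not part of the original deck. It shows a 1-M relationship (one printed-media type, many books) with a primary key, a foreign key, and NOT NULL constraints defined on attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE printed_media (               -- entity: Printed Media
    media_id   INTEGER PRIMARY KEY,        -- primary key attribute
    media_type TEXT NOT NULL UNIQUE        -- constraint on an attribute
);
CREATE TABLE book (                        -- entity: Book
    book_id       INTEGER PRIMARY KEY,
    description   TEXT NOT NULL,
    book_category TEXT NOT NULL,
    -- foreign key: each book belongs to exactly one printed-media row (1-M)
    media_id      INTEGER NOT NULL REFERENCES printed_media(media_id)
);
""")

conn.execute("INSERT INTO printed_media VALUES (1, 'Book')")
conn.execute("INSERT INTO book VALUES (10, 'Intro to Data Modeling', 'Technical', 1)")

# A book that points at a non-existent printed-media row violates the FK.
try:
    conn.execute("INSERT INTO book VALUES (11, 'Orphan title', 'Technical', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(fk_rejected, book_count)
```

The rejected insert is the point of the sketch: the constraint lives on the attribute, so the database itself refuses a book that belongs to no printed-media entry.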

Page 10: Entity-Relationship Modeling - Why Not?
- End users cannot understand or remember an ER model.
- No graphical user interface (GUI) takes a general ER model and makes it usable by end users.
- Software cannot usefully query a general ER model.
- Use of the ER modeling technique defeats the basic allure of data warehousing, namely intuitive and high-performance retrieval of data.

Page 11: Dimensional Modeling - Basic Concepts
- Represents the data in a standard, intuitive framework that allows for high-performance access
- Schema designed to process large, complex, ad hoc and data-intensive queries
- No concern for concurrency, locking and insert/update/delete performance
- Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
- This characteristic "star-like" structure is often called a star join.
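A minimal sketch of a star join, using Python's built-in sqlite3 module. The table names (sales_fact, product_dim, period_dim), the brand names, and the sample figures are all hypothetical, chosen only to show the shape: one fact table whose multipart primary key is made of the dimension keys, joined to each small dimension table in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, year  INTEGER);

-- Fact table: the primary key is multipart, one part per dimension.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    period_key   INTEGER REFERENCES period_dim(period_key),
    sales_amount REAL,
    units        INTEGER,
    PRIMARY KEY (product_key, period_key)
);

INSERT INTO product_dim VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO period_dim  VALUES (1, 2002), (2, 2003);
INSERT INTO sales_fact  VALUES (1, 1, 100.0, 10), (1, 2, 150.0, 12), (2, 2, 80.0, 4);
""")

# Star join: the fact table sits in the middle and joins to every dimension.
star_join = """
SELECT p.brand, t.year, SUM(f.sales_amount), SUM(f.units)
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN period_dim  t ON t.period_key  = f.period_key
GROUP BY p.brand, t.year
ORDER BY p.brand, t.year
"""
rows = conn.execute(star_join).fetchall()
for row in rows:
    print(row)
```

Every query against the model has this same shape (fact table joined to the dimensions it needs, grouped by dimension attributes), which is what makes the structure predictable for end users and query tools.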

Page 12: Star Schema
[Figure: Star schema. A central fact table holds the measures SALES AMOUNT and UNITS and is surrounded by the dimensions CITY, PRODUCT, PERIOD, and CUSTOMER. Dimension attributes shown: CITY (REGION, STATE, DISTRICT, CITY); PRODUCT (PRODUCT, BRAND, COLOR, CATEGORY, SIZE); PERIOD (DAY, MONTH, YEAR, QUARTER); CUSTOMER (CUSTOMER, CATEGORY, CONTACT, ADDRESS).]

Page 13: Dimensional Modeling - Basic Concepts
Fact Tables
- The most useful facts in a fact table are numeric and additive
- Typically represents a business transaction, or an event that can be used in analyzing a business process
- By nature fact tables are sparse
- Usually very large: billions of records
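To make "additive" and "sparse" concrete, here is a small pure-Python sketch. The cities, products, dates, and unit counts are invented for illustration. An additive fact can be summed along any dimension and still give the same grand total; the table is sparse because only key combinations where something actually happened are stored.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, day) -> units sold.
# Only combinations with at least one sale are stored.
fact_rows = {
    ("Pune", "Soap", "2003-05-01"): 5,
    ("Pune", "Soap", "2003-05-02"): 3,
    ("Delhi", "Tea", "2003-05-01"): 7,
}
cities = ["Pune", "Delhi"]
products = ["Soap", "Tea"]
days = ["2003-05-01", "2003-05-02"]


def roll_up(rows, position):
    """Sum the measure over every dimension except the one at `position`."""
    totals = defaultdict(int)
    for key, units in rows.items():
        totals[key[position]] += units
    return dict(totals)


by_city = roll_up(fact_rows, 0)  # {'Pune': 8, 'Delhi': 7}
by_day = roll_up(fact_rows, 2)   # {'2003-05-01': 12, '2003-05-02': 3}

# Sparsity: stored rows versus all possible key combinations.
possible = len(cities) * len(products) * len(days)
density = len(fact_rows) / possible

print(by_city, by_day, density)
```

Here only 3 of the 8 possible key combinations exist, and both roll-ups sum to the same 15 units, which is what additivity means in practice.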

Page 14: Dimensional Modeling - Basic Concepts
Dimension Tables
- Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
- Dimension tables most often contain descriptive textual information
- Determine contextual background for facts
- Examples: Time, Location/Region, Customers

Page 15: Dimensional Modeling - Basic Concepts
Measures
- A numeric attribute of a fact
- Represents performance or behavior of the business relative to the dimensions
- The actual numbers are called variables
- Occupy very little space compared to Fact Tables
- Examples: Quantity supplied, Transaction amount, Sales volume

Page 16: Fact Table & Dimension Tables
- Fact Tables: Numerical measurements of the business are stored in fact tables.
- Dimension Tables: Dimensions are attributes about facts.

Page 17: Conformed Dimensions
- A dimension that means the same thing with every possible fact table that it can be joined with
- Conformed dimensions are most essential for
  - The Bus Architecture
  - The integrated function of the Data Warehouse
- Some common dimensions are: Customer, Product, Location, Time
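One way to see why conformed dimensions matter is a "drill across" report that combines two fact tables through a shared dimension. The sketch below uses Python's built-in sqlite3 module; the tables (sales_fact, inventory_fact, product_dim), the helper by_brand, and all data are hypothetical. Because both fact tables use the same product dimension with the same keys and the same meaning of "brand", their results can be lined up row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One product dimension shared (conformed) across both fact tables.
CREATE TABLE product_dim    (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact     (product_key INTEGER, units_sold INTEGER);
CREATE TABLE inventory_fact (product_key INTEGER, units_on_hand INTEGER);

INSERT INTO product_dim    VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Zenith');
INSERT INTO sales_fact     VALUES (1, 10), (2, 5), (3, 7);
INSERT INTO inventory_fact VALUES (1, 40), (3, 20);
""")


def by_brand(fact_table, measure):
    """Total one measure per brand. Names are fixed literals, not user input."""
    sql = f"""
        SELECT p.brand, SUM(f.{measure})
        FROM {fact_table} f
        JOIN product_dim p ON p.product_key = f.product_key
        GROUP BY p.brand
    """
    return dict(conn.execute(sql).fetchall())


sold = by_brand("sales_fact", "units_sold")
on_hand = by_brand("inventory_fact", "units_on_hand")

# Drill across: query each fact table separately, then merge on the
# conformed attribute (brand).
report = {
    brand: (sold.get(brand, 0), on_hand.get(brand, 0))
    for brand in sorted(set(sold) | set(on_hand))
}
print(report)
```

If the two fact tables used different product keys or different brand definitions, the final merge step would silently produce mismatched rows, which is the integration problem conformed dimensions avoid.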

Page 18: Surrogate Keys
- All tables (facts and dimensions) should use Data Warehouse generated surrogate keys, not production keys
  - Production keys sometimes get reused
  - In case of mergers/acquisitions, protects you from different key formats
  - Production systems may change to generalize key definitions
  - Using surrogate keys will be faster
  - Can handle Slowly Changing Dimensions well
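A minimal sketch of surrogate key assignment, written in plain Python. The class name SurrogateKeyGenerator, the source-system labels (erp_a, erp_b), and the sample production keys are assumptions for illustration; a real warehouse would typically persist this lookup in a key-mapping table. The idea is that the warehouse hands out its own integer keys, so differing production key formats (for example after a merger) never leak into fact tables.

```python
class SurrogateKeyGenerator:
    """Maps (source system, production key) pairs to warehouse integer keys."""

    def __init__(self):
        self._lookup = {}      # (source_system, production_key) -> surrogate key
        self._next_key = 1

    def key_for(self, source_system, production_key):
        natural = (source_system, production_key)
        if natural not in self._lookup:
            # First time this production key is seen: assign a new integer.
            self._lookup[natural] = self._next_key
            self._next_key += 1
        return self._lookup[natural]


keys = SurrogateKeyGenerator()
a = keys.key_for("erp_a", "CUST-001")  # string key from one source system
b = keys.key_for("erp_b", 1)           # integer key from an acquired company
c = keys.key_for("erp_a", "CUST-001")  # same production key seen again

print(a, b, c)
```

The repeated lookup returns the same surrogate key, while the two source systems get distinct keys even though their production key formats differ.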

Page 19: Slowly Changing Dimensions
Certain kinds of dimension attribute changes need to be handled differently in the Data Warehouse
- Type I - Overwrite
  - e.g. Name correction, Description changes
- Type II - Partition History
  - e.g. Packing change, Customer movement
  - Create a new dimension record with a new surrogate key
- Type III - Organizational changes
  - e.g. Sales Force Reorganization
  - Show sales broken down by both new and old organizations
  - Need to create an old and a new field
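The three handling types can be sketched in plain Python on a single customer dimension row. Everything named here (CustomerRow, its fields, the apply_type1/2/3 helpers, the sample customer) is hypothetical and kept deliberately small; it is a sketch of the idea, not a production loader.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str                          # production key
    name: str
    city: str
    sales_org: str
    previous_sales_org: Optional[str] = None  # "old" field used by Type III
    is_current: bool = True


def apply_type1(rows, customer_id, new_name):
    """Type I: overwrite in place; no history is kept."""
    for row in rows:
        if row.customer_id == customer_id:
            row.name = new_name


def apply_type2(rows, customer_id, new_city):
    """Type II: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows if r.customer_id == customer_id and r.is_current)
    current.is_current = False
    new_key = max(r.surrogate_key for r in rows) + 1
    rows.append(replace(current, surrogate_key=new_key, city=new_city, is_current=True))


def apply_type3(rows, customer_id, new_org):
    """Type III: keep the old value in a separate field next to the new one."""
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            row.previous_sales_org = row.sales_org
            row.sales_org = new_org


dim = [CustomerRow(1, "C100", "Jon Smith", "Pune", "West")]
apply_type1(dim, "C100", "John Smith")  # name correction
apply_type2(dim, "C100", "Mumbai")      # customer movement
apply_type3(dim, "C100", "Coastal")     # sales force reorganization

for row in dim:
    print(row)
```

After the three changes the dimension holds two rows for the same production key: the expired Pune row (surrogate key 1) and the current Mumbai row (surrogate key 2), which carries both the new and the old sales organization. Old fact rows keep pointing at key 1, which is how Type II partitions history.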

Page 20: Factless Fact Tables
- For event tracking, e.g. attendance
[Figure: Factless fact table with keys Date_Key, Student_Key, Course_Key, Teacher_Key, Facility_Key, joined to the Date, Course, Facility, Student, and Teacher dimensions.]
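A factless fact table has keys but no measures; the "fact" is simply that a row exists. The sketch below, using Python's built-in sqlite3 module, follows the attendance example with the five keys from the slide. The table names (attendance_fact, course_dim), the course names, and the sample rows are assumptions for illustration, and only one of the five dimension tables is created to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course_dim (course_key INTEGER PRIMARY KEY, course_name TEXT);

-- Factless fact table: only dimension keys, no numeric measures.
CREATE TABLE attendance_fact (
    date_key     INTEGER,
    student_key  INTEGER,
    course_key   INTEGER,
    teacher_key  INTEGER,
    facility_key INTEGER,
    PRIMARY KEY (date_key, student_key, course_key, teacher_key, facility_key)
);

INSERT INTO course_dim VALUES (1, 'Data Modeling'), (2, 'ETL Basics');
INSERT INTO attendance_fact VALUES
    (20030526, 1, 1, 1, 1),
    (20030526, 2, 1, 1, 1),
    (20030526, 1, 2, 2, 1);
""")

# With no measure to sum, analysis is done by counting rows (events).
attendance = conn.execute("""
    SELECT c.course_name, COUNT(*)
    FROM attendance_fact a
    JOIN course_dim c ON c.course_key = a.course_key
    GROUP BY c.course_name
    ORDER BY c.course_name
""").fetchall()
print(attendance)
```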

Page 21: Coverage Tables
Problem: How to find out which products on promotion did not sell?
[Figure: Fact table with keys Date_Key, Product_Key, Store_Key, Promotion_Key and measures Dollars Sold, Units Sold, joined to the Date, Store, Product, and Promotion dimensions.]

Page 22: Coverage Tables
Solution - Coverage Tables
[Figure: Sales Promotion Coverage Table with keys Date_Key, Product_Key, Store_Key, Promotion_Key, joined to the Date, Store, Product, and Promotion dimensions.]
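The sales fact table only records what sold, so it cannot by itself answer "what was promoted but did not sell". A coverage table records every product that was on promotion, and the answer is the coverage rows with no matching sales row. The sketch below uses Python's built-in sqlite3 module; the table names (promotion_coverage, sales_fact, product_dim), product names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);

CREATE TABLE sales_fact (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    promotion_key INTEGER, dollars_sold REAL, units_sold INTEGER
);

-- Coverage table: one row per product on promotion, whether or not it sold.
CREATE TABLE promotion_coverage (
    date_key INTEGER, product_key INTEGER, store_key INTEGER, promotion_key INTEGER
);

INSERT INTO product_dim VALUES (1, 'Soap'), (2, 'Tea'), (3, 'Rice');
INSERT INTO promotion_coverage VALUES
    (20030526, 1, 1, 7), (20030526, 2, 1, 7), (20030526, 3, 1, 7);
INSERT INTO sales_fact VALUES
    (20030526, 1, 1, 7, 50.0, 5), (20030526, 3, 1, 7, 20.0, 2);
""")

# Promoted products with no matching sales row: coverage minus sales.
unsold = conn.execute("""
    SELECT p.product_name
    FROM promotion_coverage c
    JOIN product_dim p ON p.product_key = c.product_key
    LEFT JOIN sales_fact s
           ON s.date_key      = c.date_key
          AND s.product_key   = c.product_key
          AND s.store_key     = c.store_key
          AND s.promotion_key = c.promotion_key
    WHERE s.product_key IS NULL
    ORDER BY p.product_name
""").fetchall()
print(unsold)
```

Note that the coverage table is itself a factless fact table: it carries the same four keys as the sales fact table and no measures.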

Page 23: Snowflake Schema
- Dimension tables are normalized by decomposing at the attribute level
- Each dimension has one key for each level of the dimension's hierarchy
- Good performance when queries involve aggregation
- Complicated maintenance and metadata, explosion in the number of tables
- Makes user representation more complex and intricate
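A minimal sketch of a snowflaked product dimension, using Python's built-in sqlite3 module. The hierarchy levels (product, brand, category), table names, and data are assumptions for illustration. Each level of the hierarchy gets its own table and key, so a query by category has to walk the chain of joins that a star schema would have flattened into one dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table, and one key, per level of the product hierarchy.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE brand_dim (
    brand_key    INTEGER PRIMARY KEY,
    brand_name   TEXT,
    category_key INTEGER REFERENCES category_dim(category_key)
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    brand_key    INTEGER REFERENCES brand_dim(brand_key)
);
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    units       INTEGER
);

INSERT INTO category_dim VALUES (1, 'Grocery'), (2, 'Personal Care');
INSERT INTO brand_dim    VALUES (1, 'Acme', 1), (2, 'Zenith', 2);
INSERT INTO product_dim  VALUES (1, 'Tea', 1), (2, 'Rice', 1), (3, 'Soap', 2);
INSERT INTO sales_fact   VALUES (1, 4), (2, 6), (3, 5);
""")

# Three joins to reach the category level of a single dimension.
units_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.units)
    FROM sales_fact f
    JOIN product_dim  p ON p.product_key  = f.product_key
    JOIN brand_dim    b ON b.brand_key    = p.brand_key
    JOIN category_dim c ON c.category_key = b.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(units_by_category)
```

The extra tables and joins per dimension are the maintenance and usability cost the slide refers to.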

Page 24: Snowflake schema - Example
[Figure: Snowflake schema diagram with one Fact Table and four Dim Tables.]

Page 25: Aggregates
- Pre-stored summaries in the database
- Significant performance advantage
- Preferably should not be stored in fact tables
- May take significant time to build aggregates
- Many tools can automatically navigate to the most aggregated table that can service a query

Page 26: Aggregate Navigators
- Automatically redirect queries to the most summarized table
- Some tools like Business Objects, Discoverer, Microstrategy, Metacube etc. support this
- Native database support already available
[Figure: Aggregate Navigator, DBMS, LAN]
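The core decision an aggregate navigator makes can be sketched in a few lines of plain Python. This is not how any of the named tools implement it; the table names, row counts, level names, and the ROLLS_UP_TO mapping are all assumptions for illustration. Given the dimension levels a query asks for, the navigator picks the smallest table (base fact or pre-stored aggregate) that can still answer it.

```python
# Candidate tables: the levels each one stores and an approximate row count.
TABLES = [
    {"name": "sales_fact", "levels": {"day", "product", "store"}, "rows": 1_000_000},
    {"name": "sales_by_month_product", "levels": {"month", "product"}, "rows": 50_000},
    {"name": "sales_by_month_brand", "levels": {"month", "brand"}, "rows": 2_000},
]

# Which levels can be derived from each stored level (itself plus coarser ones).
ROLLS_UP_TO = {
    "day": {"day", "month"},
    "month": {"month"},
    "product": {"product", "brand"},
    "brand": {"brand"},
    "store": {"store"},
}


def answerable_levels(table):
    """All levels a table can serve, directly or by further summarizing."""
    levels = set()
    for level in table["levels"]:
        levels |= ROLLS_UP_TO[level]
    return levels


def navigate(requested_levels):
    """Return the name of the smallest table that covers the requested levels."""
    candidates = [t for t in TABLES if set(requested_levels) <= answerable_levels(t)]
    # min() raises ValueError if no table can answer; acceptable for a sketch.
    return min(candidates, key=lambda t: t["rows"])["name"]


month_brand = navigate({"month", "brand"})
month_product = navigate({"month", "product"})
day_store = navigate({"day", "store"})
print(month_brand, month_product, day_store)
```

A query by month and brand is redirected to the smallest aggregate, while a query by day and store falls back to the base fact table because no aggregate keeps those levels.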

Page 27: Examples of Data Modeling Tools
- ERWIN
  - Supports Data Warehouse design as a modeling technique
- Powersoft WarehouseArchitect
  - Module of Power Designer specifically for DW modeling
- Oracle Designer
  - Can be extended for Warehouse modeling
- Others like Infomodeler, Silverrun are also used

Page 28: Questions