data warehousing basic concepts

People Making Technology Work™ DATA WAREHOUSING: Basic Concepts

Upload: sriram-krishnamoorthy

Post on 07-Apr-2018


Page 1: Data Warehousing Basic Concepts

8/3/2019 Data Warehousing Basic Concepts

http://slidepdf.com/reader/full/data-warehousing-basic-concepts 1/33



  Agenda

► Evolution of DWH

► Why should we consider Data Warehousing solutions?

► Definition of Data Warehouse

► Characteristics of DWH

► Difference between DWH and OLTP

► DWH Life Cycle

► DWH Architecture

► Dimensional Data Modeling

► Star Schema Design

► Fact Table

► Fact Granularity

► Dimension Tables

► Snowflake Schema Design

► Important aspects of Star Schema & Snowflake Schema

► Data Acquisition (ETL)

► ETL Concepts


Evolution of DWH

Traditional approaches to computer system design during the 1980s had these limitations:

► They were not optimized for analysis and reporting

► Company-wide reporting could not be supported from a single system

► Developing reports often required writing specific computer programs, which was slow and expensive


Why should we consider Data Warehousing solutions?

When users request access to a large amount of historical information for reporting purposes, you should strongly consider a warehouse or mart. Users benefit when the information is organized efficiently for this type of access.


Definition of Data Warehouse

A DWH is a type of relational database system specially designed for query and analysis processing rather than transaction processing.

DWH systems are also called historical databases, read-only databases, integrated databases, decision support systems, executive information systems, and business information systems.


Characteristics of DWH

► Subject Oriented

► Non Volatile

► Integrated

► Time Variant


Differences between a DWH database (OLAP) and an OLTP database

DWH database (OLAP) | OLTP database

Designed for analysis of business measures by category and attributes. | Designed for real-time business operations.

Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. | Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.

Loaded with consistent, valid data; requires no real-time validation. | Optimized for validation of incoming data during transactions; uses validation data tables.

Supports few concurrent users relative to OLTP. | Supports thousands of concurrent users.
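The contrast in access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the `orders` table, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: add or retrieve a single row at a time.
conn.execute("INSERT INTO orders VALUES (1, 'East', 100.0)")
conn.execute("INSERT INTO orders VALUES (2, 'West', 250.0)")
conn.execute("INSERT INTO orders VALUES (3, 'East', 40.0)")
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style work: one query reads many rows and summarizes a measure by category.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(by_region)
```

In practice the two workloads usually run on separate databases; a single in-memory table is used here only to keep the sketch short.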


OLAP database | OLTP database

Multidimensional database structures | Normalized data structures

Indexes: many | Indexes: few

Joins: few | Joins: many

Aggregated data: more | Aggregated data: less

Number of users: few | Number of users: more

Periodic update of data | Data modification: more

Huge volumes of data | Small volumes of data


DWH Life Cycle

Business Analyst

Data Modeler

ETL Developer

Report Developer

Testing


DWH Architecture

Three common architectures are:

►DWH Architecture (Basic)

►DWH Architecture (With a staging area)

►DWH Architecture (With a staging area and data marts) 


DWH Architecture (Basic)


DWH Architecture (with a staging area)


DWH Architecture (with a staging area and data marts)


Dimensional Data Modeling

To develop a star schema design, a data modeler follows the dimensional modeling approach.

Dimensional modeling is a three-stage process:

►Conceptual modeling

►Logical Modeling

►Physical Modeling


Before implementing the schema design, a data modeler should work through the following steps:

► Understand the client's business requirements

► Understand the grain of the fact

► Design the dimension tables

► Design the fact tables


Example of Dimensional Data Model (Star Schema Design)


Fact Table

► Contains numeric measures of the business

► Contains facts and is connected to dimensions

► Has two types of columns:

    facts or measures

    foreign keys to dimension tables

► May contain date-stamped data

► May contain either detail-level facts or facts that have been aggregated
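The two column types described above can be sketched with Python's built-in sqlite3 module. The table and column names (`sales_fact`, `product_dim`, `location_dim`, `time_dim`, `sales_dollar`) are assumptions chosen for illustration, not a schema taken from the slides.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY, calendar_date TEXT);

-- Fact table: foreign keys to the dimensions plus one numeric measure.
CREATE TABLE sales_fact (
    product_id   INTEGER NOT NULL REFERENCES product_dim(product_id),
    location_id  INTEGER NOT NULL REFERENCES location_dim(location_id),
    time_id      INTEGER NOT NULL REFERENCES time_dim(time_id),
    sales_dollar REAL    NOT NULL
);
""")

# Split the fact table's columns into foreign keys and measures.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
foreign_keys = sorted(row[3] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)"))
measures = [col for col in fact_columns if col not in foreign_keys]

print("foreign keys:", foreign_keys)
print("measures:", measures)
```

Every column of the fact table is either a key pointing at a dimension or a measure, which is the property the bullets describe.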


Steps in designing Fact Table

► Identify a business process for analysis (such as sales).

► Identify measures or facts (sales dollar).

► Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).

► List the columns that describe each dimension (region name, branch name).

► Determine the lowest level of summary in a fact table (sales dollar).


Types of Facts (Measures)

► Additive - Measures that can be added across all dimensions.

► Semi Additive - Measures that can be added across some dimensions but not across others.

► Non Additive - Measures that cannot be added across any dimension.
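A small Python sketch may make the three categories concrete. The account snapshot rows below are hypothetical sample data; deposits stand in for an additive measure, end-of-day balance for a semi-additive one, and a ratio for a non-additive one.

```python
# Hypothetical daily snapshot rows: (day, account, deposits, end_of_day_balance)
rows = [
    ("Mon", "A", 100.0, 100.0),
    ("Mon", "B", 50.0, 50.0),
    ("Tue", "A", 20.0, 120.0),
    ("Tue", "B", 0.0, 50.0),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(r[2] for r in rows)

# Semi-additive: balances can be summed across accounts for a single day...
tuesday_balance = sum(r[3] for r in rows if r[0] == "Tue")
# ...but summing them across days counts the same money more than once.
misleading_sum = sum(r[3] for r in rows)

# Non-additive: a ratio (account A's share of the Tuesday balance) has to be
# recomputed from its components; adding ratios together is meaningless.
share_a_tuesday = 120.0 / tuesday_balance

print(total_deposits, tuesday_balance, misleading_sum, round(share_a_tuesday, 3))
```

Here `misleading_sum` comes out larger than the money that actually exists, which is why a balance is normally reported per point in time rather than summed over the time dimension.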


In the example, the sales fact table is connected to the location, product, time and organization dimensions. The measure "Sales Dollar" in the sales fact table can be added across all dimensions, independently or in combination, for example:

► Sales Dollar value for a particular product

► Sales Dollar value for a product in a location

► Sales Dollar value for a product in a year within a location

► Sales Dollar value for a product in a year within a location sold or serviced by an employee
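The combinations listed above can be sketched as progressively narrower sums over one fact table. The rows, the column names, and the helper `total` are hypothetical; for brevity the dimension attributes are stored inline instead of being joined from dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for brevity: descriptive values are stored directly in the fact
# table; a real star schema would hold keys here and join to dimensions.
conn.execute(
    "CREATE TABLE sales_fact "
    "(product TEXT, location TEXT, year INTEGER, employee TEXT, sales_dollar REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("Widget", "Jersey City", 2004, "Ann", 100.0),
        ("Widget", "Jersey City", 2005, "Ann", 150.0),
        ("Widget", "Jersey City", 2005, "Bob", 50.0),
        ("Widget", "Manhattan", 2005, "Bob", 200.0),
        ("Gadget", "Manhattan", 2005, "Ann", 75.0),
    ],
)


def total(**filters):
    """Sum sales_dollar over the rows matching the given dimension values.

    Column names are interpolated into the SQL text, so this helper is only
    suitable for trusted, hard-coded names as in this sketch.
    """
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = f"SELECT COALESCE(SUM(sales_dollar), 0) FROM sales_fact WHERE {where}"
    return conn.execute(sql, tuple(filters.values())).fetchone()[0]


by_product = total(product="Widget")
by_product_location = total(product="Widget", location="Jersey City")
by_product_location_year = total(product="Widget", location="Jersey City", year=2005)
by_all_four = total(product="Widget", location="Jersey City", year=2005, employee="Ann")
print(by_product, by_product_location, by_product_location_year, by_all_four)
```

Because the measure is additive, each extra dimension filter simply narrows the set of rows being summed.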


Fact Granularity

► A fact table stores numerical information.

► Granularity is defined as the level of detail at which fact information is stored.

► The level is determined by the dimension tables, for example by the level of the time dimension:

Year?

Quarter?

Month?

Week?

Day?
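The choice of grain matters because facts stored at a fine grain can be rolled up to coarser levels, while the reverse is not possible. A minimal Python sketch, assuming hypothetical day-grain sales rows and a helper named `roll_up`:

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at day grain: (sale_date, sales_dollar)
daily_facts = [
    (date(2005, 1, 1), 100.0),
    (date(2005, 1, 15), 40.0),
    (date(2005, 2, 3), 60.0),
    (date(2005, 4, 9), 25.0),
]


def roll_up(facts, level):
    """Aggregate day-grain facts to a coarser level of the time dimension."""
    key_for = {
        "month": lambda d: (d.year, d.month),
        "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
        "year": lambda d: (d.year,),
    }[level]
    totals = defaultdict(float)
    for day, amount in facts:
        totals[key_for(day)] += amount
    return dict(totals)


monthly = roll_up(daily_facts, "month")
quarterly = roll_up(daily_facts, "quarter")
yearly = roll_up(daily_facts, "year")
print(monthly)
print(quarterly)
print(yearly)
```

Had the facts been stored at month grain, the daily figures could not be recovered from them, so the grain has to be at least as fine as the most detailed question the business needs answered.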


Dimension Tables

► Contain textual information that represents attributes of the business

► Contain relatively static data

► Are joined to the fact table through a foreign key reference

► Are usually smaller than fact tables

Example of Location Dimension


Location Dimension

Location Dimension Id | Country Name | State Name | County Name | City Name | Date Time Stamp

1 | USA | New York | Shelby | Manhattan | 1/1/2005 11:23:31 AM

2 | USA | Florida | Jefferson | Panama City | 1/1/2005 11:23:31 AM

3 | USA | California | Montgomery | San Jose | 1/1/2005 11:23:31 AM

4 | USA | New Jersey | Hudson | Jersey City | 1/1/2005 11:23:31 AM


Star Schema Design benefits

► Easy for users to understand

► Fast response to queries

► Supports multidimensional analysis

► Supported by many front end tools


Snowflake Schema Design

► Dimension table hierarchies are broken into simpler tables

► Some organizations normalize the dimension tables to save space

► Both fact and dimension tables are normalized

► Increases the number of joins, which slows data retrieval

► May become large and unmanageable

► Degrades query performance
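The extra joins can be seen by answering the same question against both designs. The sketch below uses Python's built-in sqlite3 module; all table names, column names, and rows are hypothetical, and the snowflaked fact is assumed to reference the lowest level of the hierarchy (the city).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole location hierarchy lives in one dimension table.
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
INSERT INTO location_dim VALUES (1, 'Jersey City', 'New Jersey', 'USA'),
                                (2, 'Manhattan', 'New York', 'USA');

-- Snowflake: the same hierarchy split into one table per level.
CREATE TABLE country_dim (country_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE state_dim   (state_id INTEGER PRIMARY KEY, state TEXT,
                          country_id INTEGER REFERENCES country_dim(country_id));
CREATE TABLE city_dim    (city_id INTEGER PRIMARY KEY, city TEXT,
                          state_id INTEGER REFERENCES state_dim(state_id));
INSERT INTO country_dim VALUES (1, 'USA');
INSERT INTO state_dim VALUES (1, 'New Jersey', 1), (2, 'New York', 1);
INSERT INTO city_dim VALUES (1, 'Jersey City', 1), (2, 'Manhattan', 2);

-- One fact table serves both variants; location_id doubles as city_id here.
CREATE TABLE sales_fact (location_id INTEGER, sales_dollar REAL);
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 200.0);
""")

# Star schema: one join reaches every level of the hierarchy.
STAR_QUERY = """
SELECT d.country, d.state, SUM(f.sales_dollar)
FROM sales_fact f
JOIN location_dim d ON d.location_id = f.location_id
GROUP BY d.country, d.state ORDER BY d.state
"""

# Snowflake schema: one join per hierarchy level.
SNOWFLAKE_QUERY = """
SELECT n.country, s.state, SUM(f.sales_dollar)
FROM sales_fact f
JOIN city_dim c    ON c.city_id = f.location_id
JOIN state_dim s   ON s.state_id = c.state_id
JOIN country_dim n ON n.country_id = s.country_id
GROUP BY n.country, s.state ORDER BY s.state
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snowflake_result)
```

Both queries return the same totals; the snowflake version needs three joins where the star version needs one, which is the trade-off the bullets describe.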


Example of Snowflake Schema


Important aspects of Star Schema & Snowflake Schema

► In a star schema, every dimension has a primary key.

► In a star schema, a dimension table does not have any parent table.

► In a snowflake schema, by contrast, a dimension table has one or more parent tables.

► In a star schema, hierarchies for the dimensions are stored in the dimension table itself.

► In a snowflake schema, hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.


Data Acquisition

► It is the process of extracting the relevant business information from the different source systems, transforming the data from one format into another, integrating the data into a homogeneous format, and loading the data into a warehouse database.

►Data Extraction (E)

►Data Transformation (T)

►Data Loading (L)


Sample ETL Process Flow


ETL Process

The ETL process has the following basic steps:

► Mapping the data between source systems and the target database

► Cleansing the source data in the staging area

► Transforming the cleansed source data and then loading it into the target system
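These steps can be sketched end to end in a few lines of Python. Everything named here is an assumption made for illustration: the source field names, the `MAPPING` and `COUNTRY_CODES` dictionaries, the sample rows, and the `sales_fact` target table. An in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an operational system.
source_rows = [
    {"cust": " alice ", "amt": "100.50", "ctry": "usa"},
    {"cust": "BOB", "amt": "n/a", "ctry": "US"},  # amount cannot be parsed
    {"cust": "Carol", "amt": "20", "ctry": "U.S.A."},
]

# Mapping: source field -> target column.
MAPPING = {"cust": "customer_name", "amt": "sales_dollar", "ctry": "country_code"}
# Assumed lookup used to bring country spellings into one homogeneous format.
COUNTRY_CODES = {"usa": "US", "us": "US", "u.s.a.": "US"}


def extract_to_staging(rows):
    """Copy source rows into a staging list, renamed per the mapping."""
    return [{MAPPING[field]: value for field, value in row.items()} for row in rows]


def cleanse(staged):
    """Split staged rows into usable rows and rejects with unfixable values."""
    clean, rejected = [], []
    for row in staged:
        try:
            float(row["sales_dollar"])
        except ValueError:
            rejected.append(row)
        else:
            clean.append(row)
    return clean, rejected


def transform(clean):
    """Convert cleansed rows to the target types and formats."""
    return [
        {
            "customer_name": row["customer_name"].strip().title(),
            "sales_dollar": float(row["sales_dollar"]),
            "country_code": COUNTRY_CODES[row["country_code"].strip().lower()],
        }
        for row in clean
    ]


def load(rows, conn):
    """Insert transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer_name TEXT, country_code TEXT, sales_dollar REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (:customer_name, :country_code, :sales_dollar)",
        rows,
    )
    conn.commit()


staged = extract_to_staging(source_rows)
clean, rejected = cleanse(staged)
target = sqlite3.connect(":memory:")
load(transform(clean), target)
loaded = target.execute(
    "SELECT customer_name, country_code, sales_dollar FROM sales_fact ORDER BY customer_name"
).fetchall()
print(loaded)
print("rejected:", rejected)
```

Keeping the rejected rows instead of silently dropping them is a design choice of this sketch; it lets the rejects be inspected and corrected at the source.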


► Source System

A database, application, file, or other storage facility from which the data in a data warehouse is derived.

► Mapping

The definition of the relationship and data flow between source and target objects.

► Staging Area

A place where data is processed before entering the warehouse.

► Cleansing

The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.


► Transformation

The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

► Transportation

The process of moving copied or transformed data from a source to a data warehouse.

► Target System

A database, application, file, or other storage facility in a data warehouse to which the transformed source data is loaded.


Thank You !!!