63516396 data warehousing basic concepts
![Page 1: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/1.jpg)
People Making Technology Work™
DATA WAREHOUSING
Basic Concepts
![Page 2: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/2.jpg)
Agenda
► Evolution of DWH
► Why should we consider Data Warehousing solutions?
► Definition of Data Warehouse
► Characteristics of DWH
► Differences between DWH and OLTP
► DWH Life Cycle
► DWH Architecture
► Dimensional Data Modeling
► Star Schema Design
► Fact Table
► Fact Granularity
► Dimension Tables
► Snowflake Schema Design
► Important aspects of Star Schema & Snowflake Schema
► Data Acquisition (ETL)
► ETL Concepts
![Page 3: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/3.jpg)
Evolution of DWH
Traditional approaches to computer system design during the 1980s:
► Not optimized for analysis and reporting
► Company-wide reporting couldn't be supported from a single system
► Developing reports often required writing specific computer programs, which was slow and expensive
![Page 4: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/4.jpg)
Why should we consider Data Warehousing solutions?
When users are requesting access to a large amount of
historical information for reporting purposes, you should
strongly consider a warehouse or mart. The user will benefit
when the information is organized in an efficient manner for
this type of access.
![Page 5: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/5.jpg)
Definition of Data Warehousing
A DWH is a type of relational database system designed specifically for query and analysis processing rather than transaction processing.
DWH systems are also called historical databases, read-only databases, integrated databases, decision support systems, executive information systems, or business information systems.
![Page 6: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/6.jpg)
Characteristics of DWH
► Subject Oriented
► Non Volatile
► Integrated
► Time Variant
![Page 7: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/7.jpg)
Differences between DWH (OLAP) and OLTP

| DWH database (OLAP) | OLTP database |
| --- | --- |
| Designed for analysis of business measures by category and attributes. | Designed for real-time business operations. |
| Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. | Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table. |
| Loaded with consistent, valid data; requires no real-time validation. | Optimized for validation of incoming data during transactions; uses validation data tables. |
| Supports few concurrent users relative to OLTP. | Supports thousands of concurrent users. |
![Page 8: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/8.jpg)
| OLAP database | OLTP database |
| --- | --- |
| Multidimensional database structures | Normalized data structures |
| Indexes: many | Indexes: few |
| Joins: few | Joins: many |
| Aggregated data: more | Aggregated data: less |
| Number of users: few | Number of users: many |
| Periodic update of data | Frequent data modification |
| Huge volumes of data | Small volumes of data |
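The contrast in access patterns can be illustrated with a minimal sketch, assuming a hypothetical `orders` table in an in-memory SQLite database (the table, columns, and rows are invented for illustration and do not come from the slides). The first query shows the OLTP shape, one row fetched by key; the second shows the OLAP shape, many rows scanned and aggregated by a category.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0)],
)

# OLTP-style access: retrieve a single row by its key.
single_order = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style access: read many rows and aggregate a measure by category.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single_order)
print(sales_by_region)
```

A real warehouse would run the second kind of query over far larger tables; a single small table is used here only to keep the two query shapes side by side.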
![Page 9: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/9.jpg)
DWH Life Cycle
Business Analyst
Data Modeler
ETL Developer
Report Developer
Testing
![Page 10: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/10.jpg)
DWH Architecture
Three common architectures are:
►DWH Architecture (Basic)
►DWH Architecture (With a staging area)
►DWH Architecture (With a staging area and data marts)
![Page 11: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/11.jpg)
DWH Architecture (Basic)
![Page 12: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/12.jpg)
DWH Architecture (with a staging area)
![Page 13: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/13.jpg)
DWH Architecture (with a staging area and data marts)
![Page 14: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/14.jpg)
Dimensional Data Modeling
To develop a star schema design, a data modeler follows the dimensional modeling approach.
Dimensional modeling is a three-stage process:
► Conceptual modeling
► Logical modeling
► Physical modeling
![Page 15: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/15.jpg)
Before implementing the schema design, a data modeler should complete the following steps:
► Understand the client's business requirements
► Understand the grain of the fact
► Design the dimension tables
► Design the fact tables
![Page 16: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/16.jpg)
Example of Dimensional Data Model (Star Schema Design)
![Page 17: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/17.jpg)
Fact Table
► Contains numeric measures of the business
► Contains facts and is connected to dimensions
► Has two types of columns: facts (measures) and foreign keys to dimension tables
► May contain date-stamped data
► May contain either detail-level facts or facts that have been aggregated
![Page 18: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/18.jpg)
Steps in designing Fact Table
► Identify a business process for analysis (such as sales).
► Identify measures or facts (sales dollar).
► Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
► List the columns that describe each dimension (region name, branch name).
► Determine the lowest level of summary in a fact table (sales dollar).
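The steps above can be sketched as table definitions. This is a minimal illustration using an in-memory SQLite database; apart from the names taken from the slide (sales dollar, region name, branch name, and the four dimensions), every table and column name is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension identified for the sales process.
-- Column names other than region_name and branch_name are hypothetical.
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region_name TEXT, branch_name TEXT);
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE organization_dim (organization_id INTEGER PRIMARY KEY, employee_name TEXT);

-- Fact table: foreign keys to every dimension plus the measure.
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product_dim(product_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    time_id INTEGER REFERENCES time_dim(time_id),
    organization_id INTEGER REFERENCES organization_dim(organization_id),
    sales_dollar REAL
);
""")

# List the fact table's columns: four foreign keys and one measure.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
print(fact_columns)
```

The fact table holds only keys and measures; the descriptive columns live in the dimension tables.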
![Page 19: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/19.jpg)
Types of Facts (Measures)
► Additive - Measures that can be added across all dimensions.
► Semi Additive - Measures that can be added across some dimensions but not across others.
► Non Additive - Measures that cannot be added across any dimension.
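A semi-additive measure can be illustrated with account balances, a commonly used example that is not part of the slides. In this sketch, with hypothetical snapshot data, balances can be summed across accounts on a given day, but summing one account's balance across days gives a number with no business meaning.

```python
# Hypothetical daily balance snapshots: (day, account, balance).
balances = [
    ("2005-01-01", "A", 100.0),
    ("2005-01-01", "B", 200.0),
    ("2005-01-02", "A", 150.0),
    ("2005-01-02", "B", 200.0),
]


def total_balance_on(day):
    """Adding across the account dimension is meaningful."""
    return sum(balance for d, _, balance in balances if d == day)


total_jan_2 = total_balance_on("2005-01-02")

# Adding across the time dimension is not meaningful: account A never held 250.
misleading_sum_a = sum(balance for _, acct, balance in balances if acct == "A")

# Across time, a different rule such as "latest snapshot" is used instead.
closing_balance_a = max((d, balance) for d, acct, balance in balances if acct == "A")[1]

print(total_jan_2, misleading_sum_a, closing_balance_a)
```

A ratio such as a margin percentage would be a non-additive measure: it cannot be summed across any dimension and has to be recomputed from its additive components.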
![Page 20: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/20.jpg)
In the example, the sales fact table is connected to the location, product, time, and organization dimensions. The measure "Sales Dollar" in the sales fact table can be added across all dimensions, independently or in combination, for example:
► Sales Dollar value for a particular product
► Sales Dollar value for a product in a location
► Sales Dollar value for a product in a year within a location
► Sales Dollar value for a product in a year within a location, sold or serviced by an employee
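The four combinations listed above can be sketched in a few lines. The fact rows, product names, and employee names below are hypothetical; only the measure name and the four dimensions come from the example.

```python
# Hypothetical rows of a sales fact table, with dimension attributes inlined
# for brevity instead of being looked up through foreign keys.
sales_fact = [
    {"product": "Laptop", "location": "New York", "year": 2005, "employee": "Ann", "sales_dollar": 1000.0},
    {"product": "Laptop", "location": "New York", "year": 2005, "employee": "Bob", "sales_dollar": 500.0},
    {"product": "Laptop", "location": "Florida", "year": 2005, "employee": "Ann", "sales_dollar": 700.0},
    {"product": "Laptop", "location": "New York", "year": 2004, "employee": "Ann", "sales_dollar": 300.0},
    {"product": "Phone", "location": "New York", "year": 2005, "employee": "Bob", "sales_dollar": 200.0},
]


def total_sales(**filters):
    """Sum sales_dollar over the rows matching every given dimension value."""
    return sum(
        row["sales_dollar"]
        for row in sales_fact
        if all(row[key] == value for key, value in filters.items())
    )


by_product = total_sales(product="Laptop")
by_product_location = total_sales(product="Laptop", location="New York")
by_product_location_year = total_sales(product="Laptop", location="New York", year=2005)
by_all_four = total_sales(product="Laptop", location="New York", year=2005, employee="Ann")

print(by_product, by_product_location, by_product_location_year, by_all_four)
```

Because the measure is additive, every combination of filters yields a valid total, and each added dimension narrows the previous result.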
![Page 21: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/21.jpg)
Fact Granularity
► A fact table stores numerical information.
► Granularity is defined as the level at which fact information is stored.
► The level is determined by the dimension tables.
Year? Quarter? Month? Week? Day?
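The choice of grain matters because facts stored at a fine grain can be rolled up to coarser levels, while facts stored at a coarse grain cannot be broken back down. A minimal sketch, assuming hypothetical sales stored at day grain with ISO-formatted dates:

```python
from collections import defaultdict

# Hypothetical facts stored at day grain: (date, sales_dollar).
daily_sales = [
    ("2005-01-01", 100.0),
    ("2005-01-15", 50.0),
    ("2005-02-03", 75.0),
    ("2006-01-10", 20.0),
]


def roll_up(rows, key_length):
    """Aggregate day-grain rows by a date prefix (7 chars = month, 4 = year)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day[:key_length]] += amount
    return dict(totals)


monthly = roll_up(daily_sales, 7)
yearly = roll_up(daily_sales, 4)

print(monthly)
print(yearly)
```

If only the yearly totals had been stored, the monthly and daily figures could not be recovered from them.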
![Page 22: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/22.jpg)
Dimension Tables
► Contain textual information that represents attributes of the business
► Contain relatively static data
► Are joined to fact table through a foreign key reference
► Are usually smaller than fact tables
Example of Location Dimension
![Page 23: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/23.jpg)
Location Dimension

| Location Dimension Id | Country Name | State Name | County Name | City Name | Date Time Stamp |
| --- | --- | --- | --- | --- | --- |
| 1 | USA | New York | Shelby | Manhattan | 1/1/2005 11:23:31 AM |
| 2 | USA | Florida | Jefferson | Panama City | 1/1/2005 11:23:31 AM |
| 3 | USA | California | Montgomery | San Jose | 1/1/2005 11:23:31 AM |
| 4 | USA | New Jersey | Hudson | Jersey City | 1/1/2005 11:23:31 AM |
![Page 24: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/24.jpg)
Star Schema Design benefits
► Easy for users to understand
► Fast response to queries
► Supports multidimensional analysis
► Supported by many front end tools
![Page 25: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/25.jpg)
Snowflake Schema Design
► Dimension table hierarchies are broken into simpler tables
► Some organizations normalize the dimension tables to save space
► Both fact and dimension tables are normalized
► Increases the number of joins, which slows data retrieval
► May become large and unmanageable
► Degrades query performance
![Page 26: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/26.jpg)
Example of Snowflake Schema
![Page 27: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/27.jpg)
Important aspects of Star Schema & Snow Flake Schema
► In a star schema, every dimension has a primary key.
► In a star schema, a dimension table does not have any parent table.
► In a snowflake schema, by contrast, a dimension table has one or more parent tables.
► In a star schema, hierarchies for the dimensions are stored in the dimension table itself.
► In a snowflake schema, hierarchies are broken into separate tables. These hierarchies help to drill down the data from the topmost level to the lowermost level.
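The difference in join count can be sketched with one location hierarchy modeled both ways. This example uses an in-memory SQLite database; all table names, column names, and rows are assumptions for illustration. The same question (sales by state) needs one join in the star layout and two in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole hierarchy (country > state > city) sits in one dimension table.
CREATE TABLE location_dim_star (
    location_id INTEGER PRIMARY KEY, country_name TEXT, state_name TEXT, city_name TEXT);

-- Snowflake: the hierarchy is split out, so the dimension table has parent tables.
CREATE TABLE country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY, state_name TEXT,
    country_id INTEGER REFERENCES country(country_id));
CREATE TABLE location_dim_snow (
    location_id INTEGER PRIMARY KEY, city_name TEXT,
    state_id INTEGER REFERENCES state(state_id));

CREATE TABLE sales_fact (location_id INTEGER, sales_dollar REAL);

INSERT INTO location_dim_star VALUES
    (1, 'USA', 'New York', 'Manhattan'), (2, 'USA', 'Florida', 'Panama City');
INSERT INTO country VALUES (1, 'USA');
INSERT INTO state VALUES (1, 'New York', 1), (2, 'Florida', 1);
INSERT INTO location_dim_snow VALUES (1, 'Manhattan', 1), (2, 'Panama City', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# Star schema: one join from the fact table to the dimension table.
star_result = conn.execute("""
    SELECT d.state_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN location_dim_star d ON d.location_id = f.location_id
    GROUP BY d.state_name ORDER BY d.state_name
""").fetchall()

# Snowflake schema: an extra join to reach the parent table holding the state.
snowflake_result = conn.execute("""
    SELECT s.state_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN location_dim_snow d ON d.location_id = f.location_id
    JOIN state s ON s.state_id = d.state_id
    GROUP BY s.state_name ORDER BY s.state_name
""").fetchall()

print(star_result)
print(snowflake_result)
```

Both queries return the same totals; the snowflake version avoids repeating the country and state names on every location row at the cost of the additional join.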
![Page 28: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/28.jpg)
Data Acquisition
► Data acquisition is the process of extracting the relevant business information from the different source systems, transforming the data from one format into another, integrating the data into a homogeneous format, and loading the data into a warehouse database.
► Data Extraction (E)
► Data Transformation (T)
► Data Loading (L)
![Page 29: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/29.jpg)
Sample ETL Process Flow
![Page 30: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/30.jpg)
ETL Process
The ETL process has the following basic steps:
► Mapping the data between the source systems and the target database
► Cleansing the source data in the staging area
► Transforming the cleansed source data and then loading it into the target system
![Page 31: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/31.jpg)
► Source System: A database, application, file, or other storage facility from which the data in a data warehouse is derived.
► Mapping: The definition of the relationship and data flow between source and target objects.
► Staging Area: A place where data is processed before entering the warehouse.
► Cleansing: The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
![Page 32: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/32.jpg)
► Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
► Transportation: The process of moving copied or transformed data from a source to a data warehouse.
► Target System: A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
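The terms above can be tied together in a small end-to-end sketch. It assumes a hypothetical CSV export as the source system, plain Python lists as the staging area, and an in-memory SQLite table as the target system; the file layout, function names, and cleansing rules are all invented for illustration and do not represent any particular ETL tool.

```python
import csv
import io
import sqlite3

# Source system: a hypothetical CSV export with inconsistent formatting.
SOURCE_CSV = """customer,region,amount
 alice ,east,100.50
BOB,East,200
carol,WEST,not_a_number
"""


def extract(text):
    """Extraction: read raw rows from the source into the staging area."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    """Cleansing: normalize text and reject rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop the anomalous row
        clean.append({
            "customer": row["customer"].strip().title(),
            "region": row["region"].strip().title(),
            "amount": amount,
        })
    return clean


def transform(rows):
    """Transformation: aggregate cleansed amounts per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return sorted(totals.items())


def load(conn, rows):
    """Loading: write the transformed rows into the target table."""
    conn.execute("CREATE TABLE region_sales (region TEXT PRIMARY KEY, total_amount REAL)")
    conn.executemany("INSERT INTO region_sales VALUES (?, ?)", rows)


target = sqlite3.connect(":memory:")
load(target, transform(cleanse(extract(SOURCE_CSV))))
loaded = target.execute(
    "SELECT region, total_amount FROM region_sales ORDER BY region"
).fetchall()
print(loaded)
```

The mapping between source and target is implicit here: the source columns `region` and `amount` feed the target columns `region` and `total_amount`. In practice that mapping is usually documented or configured separately.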
![Page 33: 63516396 Data Warehousing Basic Concepts](https://reader036.vdocuments.mx/reader036/viewer/2022062522/577c77861a28abe0548c6f99/html5/thumbnails/33.jpg)
Thank You !!!