Data Warehouse Introduction by Quontra Solutions
DESCRIPTION
Quontra Solutions' main motto is to provide industry-oriented online training on all IT courses. All our courses are taught by experienced trainers who have extensive field knowledge of the topics they teach. We offer a job-oriented online training program on Informatica, taught by trainers with real-time experience. Quontra Solutions provides training to a wide range of customers: working professionals, job-seeking candidates, corporate clients, and students. To learn to work in Informatica, you need at least an intermediate level of SQL knowledge and a basic level of UNIX knowledge (Informatica is installed in a UNIX environment in most cases), along with some analytical skill in writing expressions.
TRANSCRIPT
![Page 1: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/1.jpg)
INTRODUCTION TO DATA WAREHOUSING
BY QUONTRA SOLUTIONS
PHONE : (404)-900-9988
EMAIL :
WEBSITE : WWW.QUONTRASOLUTIONS.COM
![Page 2: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/2.jpg)
DATA WAREHOUSE
• Maintain historic data
• Analysis to get a better understanding of the business
• Better decision making
Definition: A data warehouse is a
• subject-oriented
• integrated
• time-variant
• non-volatile
collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996
![Page 3: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/3.jpg)
SUBJECT ORIENTED
• Data warehouse is organized around subjects such as sales, product, customer.
• It focuses on modeling and analysis of data for decision makers.
• Excludes data not useful in decision support process.
![Page 4: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/4.jpg)
INTEGRATED
• Data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Diagram: RDBMS, legacy system, and flat file sources feed the data warehouse through data processing and data transformation]
![Page 5: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/5.jpg)
NON-VOLATILE
• Mostly, data once recorded will not be updated.
• Data warehouse requires two operations in data accessing
- Incremental loading of data
- Access of data
[Diagram: load, access]
![Page 6: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/6.jpg)
TIME VARIANT
• Provides information from historical perspective e.g. past 5-10 years
• Every key structure contains either implicitly or explicitly an element of time
![Page 7: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/7.jpg)
WHY DATA WAREHOUSE?
Problem Statement:
• ABC Pvt Ltd is a company with branches in the USA, UK, Canada, and India.
• The Sales Manager wants a quarterly sales report across the branches.
• Each branch has a separate operational system where sales transactions are recorded.
![Page 8: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/8.jpg)
WHY DATA WAREHOUSE?
[Diagram: the Sales Manager collects figures separately from the USA, UK, CANADA, and INDIA branch systems]
Get the quarterly sales figure for each branch and manually calculate the sales figure across branches.
What if he needs a daily sales report across the branches?
![Page 9: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/9.jpg)
WHY DATA WAREHOUSE?
Solution:
• Extract sales information from each database.
• Store the information in a common repository at a single site.
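The solution above can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the slides: the branch names come from the problem statement, while the sales figures, the `consolidate` and `quarterly_totals` functions, and the list-of-dicts repository are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from each branch's operational system:
# branch -> list of (quarter, sales_amount).
BRANCH_EXTRACTS = {
    "USA": [("2010-Q1", 120.0), ("2010-Q2", 150.0)],
    "UK": [("2010-Q1", 80.0), ("2010-Q2", 95.0)],
    "CANADA": [("2010-Q1", 60.0), ("2010-Q2", 70.0)],
    "INDIA": [("2010-Q1", 40.0), ("2010-Q2", 55.0)],
}


def consolidate(extracts):
    """Store every branch's rows in one common repository."""
    repository = []
    for branch, rows in extracts.items():
        for quarter, amount in rows:
            repository.append(
                {"branch": branch, "quarter": quarter, "sales": amount}
            )
    return repository


def quarterly_totals(repository):
    """Sum sales per quarter across all branches in a single pass."""
    totals = defaultdict(float)
    for row in repository:
        totals[row["quarter"]] += row["sales"]
    return dict(totals)
```

Once the rows sit in one place, a daily report would only need a finer-grained key in the same loop, instead of another round of manual collection from four systems.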
![Page 10: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/10.jpg)
WHY DATA WAREHOUSE?
[Diagram: the USA, UK, CANADA, and INDIA branch systems feed the data warehouse; the Sales Manager works on it through query and analysis tools]
![Page 11: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/11.jpg)
CHARACTERISTICS OF DATA WAREHOUSE
• Relational / Multidimensional database
• Query and Analysis rather than transaction
• Historical data from transactions
• Consolidates Multiple data sources
• Separates query load from transactions
• Mostly non volatile
• Large amount of data in order of TBs
![Page 12: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/12.jpg)
WHEN WE SAY LARGE - WE MEAN IT!
• Terabytes -- 10^12 bytes: Yahoo! – 300 Terabytes and growing
• Petabytes -- 10^15 bytes: Geographic Information Systems
• Exabytes -- 10^18 bytes: National Medical Records
• Zettabytes -- 10^21 bytes: Weather images
• Yottabytes -- 10^24 bytes: Intelligence Agency Videos
![Page 13: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/13.jpg)
OLTP VS DATA WAREHOUSE (OLAP)

| | OLTP | Data Warehouse (OLAP) |
|---|---|---|
| Indexes | Few | Many |
| Data | Normalized | Generally De-normalized |
| Joins | Many | Some |
| Derived data and aggregates | Rare | Common |
![Page 14: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/14.jpg)
DATA WAREHOUSE ARCHITECTURE
[Diagram: operational systems and flat files feed ETL (Extract, Transform and Load), which loads the data warehouse; the warehouse feeds the Inventory, Sales, and Generic data marts, which support data mining, analysis, and reporting]
![Page 15: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/15.jpg)
ETL
• ETL stands for Extract, Transform and Load
• Data is distributed across different sources
– Flat files, Streaming Data, DB Systems, XML, JSON
• Data can be in different formats
– CSV, Key Value Pairs
• Different units and representations
– Country: IN or India
– Date: 20 Nov 2010 or 20101120
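Reconciling different representations of the same value could look like the sketch below. It is only an illustration: the `COUNTRY_NAMES` lookup, the two accepted date formats, and the choice of ISO dates as the target form are assumptions, not something the slides prescribe.

```python
from datetime import datetime

# Hypothetical lookup from the spellings seen in source systems to one name.
COUNTRY_NAMES = {"IN": "India", "INDIA": "India"}

# Date layouts assumed to occur in the sources: "20 Nov 2010" and "20101120".
DATE_FORMATS = ("%d %b %Y", "%Y%m%d")


def normalize_country(value):
    """Map a country code or name to a single canonical name."""
    cleaned = value.strip()
    return COUNTRY_NAMES.get(cleaned.upper(), cleaned)


def normalize_date(value):
    """Parse any of the known layouts and return an ISO 8601 date string."""
    cleaned = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known layout
    raise ValueError(f"unrecognized date format: {value!r}")
```

Raising on an unknown layout, rather than guessing, keeps bad rows visible during cleansing.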
![Page 16: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/16.jpg)
ETL FUNCTIONS
• Extract
– Collect data from different sources
– Parse data
– Remove unwanted data
• Transform
– Project
– Generate Surrogate keys
– Encode data
– Join data from different sources
– Aggregate
• Load
![Page 17: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/17.jpg)
ETL STEPS
• The first step in the ETL process is mapping the data between source systems and target database.
• The second step is cleansing of source data in the staging area.
• The third step is transforming cleansed source data.
• The fourth step is loading into the target system.
[Figure: data before ETL processing]
[Figure: data after ETL processing]
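The four steps can be strung together as in the following sketch. The source column names (`cust_nm`, `amt`, `ctry`), the cleansing rules, and the plain dict standing in for the target system are all hypothetical; a real ETL tool such as Informatica expresses the same stages through its own mappings.

```python
# Step 1 (mapping): hypothetical source column -> target column names.
COLUMN_MAPPING = {"cust_nm": "customer_name", "amt": "amount", "ctry": "country"}


def apply_mapping(source_row):
    """Rename mapped columns and drop anything the target does not need."""
    return {
        COLUMN_MAPPING[name]: value
        for name, value in source_row.items()
        if name in COLUMN_MAPPING
    }


def cleanse(row):
    """Step 2 (cleansing): resolve inconsistencies in the staged row."""
    return {
        "customer_name": row["customer_name"].strip().title(),
        "amount": float(row["amount"]),
        "country": row["country"].strip().upper(),
    }


def transform(rows):
    """Step 3 (transforming): aggregate the amount per country."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


def load(totals, target):
    """Step 4 (loading): write the result into the target (a dict here)."""
    target.update(totals)
    return target


def run_etl(source_rows, target):
    staged = [cleanse(apply_mapping(row)) for row in source_rows]
    return load(transform(staged), target)
```

For example, `run_etl([{"cust_nm": " alice ", "amt": "10.5", "ctry": "in "}], {})` stages one cleansed row and loads a single total for `IN`.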
![Page 18: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/18.jpg)
ETL GLOSSARY
Mapping:
Defining the relationship between source and target objects.
Cleansing:
The process of resolving inconsistencies in source data.
Transformation:
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include aggregating, and integrating data from multiple sources.
Staging Area:
A place where data is processed before entering the warehouse.
![Page 19: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/19.jpg)
DIMENSION
• Categorizes the data. For example - time, location, etc.
• A dimension can have one or more attributes. For example - day, week and month are attributes of time dimension.
• Role of dimensions in data warehousing.
- Slice and dice
- Filter by dimensions
![Page 20: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/20.jpg)
TYPES OF DIMENSIONS
• Conformed Dimension - A dimension that is shared across fact tables.
• Junk Dimension - A junk dimension is a convenient grouping of flags and indicators. For example, payment method, shipping method.
• Degenerate Dimension - A dimension key that has no attributes and hence does not have its own dimension table. For example, transaction number, invoice number. Values of these dimensions are mostly unique within a fact table.
• Role Playing Dimensions - A role playing dimension is a dimension that plays different roles in fact tables depending on the context. For example, the Date dimension can be used for the ordered date, shipment date, and invoice date.
• Slowly Changing Dimensions - Dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule.
![Page 21: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/21.jpg)
TYPES OF SLOWLY CHANGING DIMENSION
• Type 1 - The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all.
• Type 2 - The Type 2 method tracks historical data by creating multiple records for a given value in dimension table with separate surrogate keys.
• Type 3 - The Type 3 method tracks changes using separate columns. Whereas Type 2 had unlimited history preservation, Type 3 has limited history preservation, as it's limited to the number of columns we designate for storing historical data.
• Type 4 - The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of all changes.
Types 1, 2 and 3 are commonly used.
Some books also discuss Type 0 and Type 6.
http://en.wikipedia.org/wiki/Slowly_changing_dimension
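The difference between Type 1 and Type 2 is easiest to see side by side. The sketch below is an illustration under stated assumptions: the dimension is a plain list of dicts, and the `valid_from` / `valid_to` columns are one common way to mark the current row, not something the slides specify. The customer and city values are made up.

```python
from datetime import date


def apply_type1(dimension, customer_id, new_city):
    """Type 1: overwrite the attribute in place, so no history survives."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def apply_type2(dimension, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    next_key = max((row["surrogate_key"] for row in dimension), default=0) + 1
    for row in dimension:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = change_date
    dimension.append({
        "surrogate_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,  # None marks the current version
    })


def new_dimension():
    """One customer living in Pune since 2009 (made-up sample data)."""
    return [{
        "surrogate_key": 1,
        "customer_id": "C100",
        "city": "Pune",
        "valid_from": date(2009, 1, 1),
        "valid_to": None,
    }]
```

After a Type 1 change the dimension still holds one row for `C100`; after a Type 2 change it holds two, and fact rows recorded before the move keep pointing at surrogate key 1.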
![Page 22: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/22.jpg)
FACTS
• Facts are values that can be examined and analyzed. For example - Page Views, Unique Users, Pieces Sold, Profit.
• Fact and measure are synonymous.
• Types of facts:
– Additive - Measures that can be added across all dimensions.
– Non Additive - Measures that cannot be added across any dimension, such as ratios and percentages.
– Semi Additive - Measures that can be added across some dimensions but not others.
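A small worked example may help separate the three kinds of facts. The account snapshot data and function names below are hypothetical: deposits behave as an additive measure, closing balance as a semi-additive one (it sums across accounts but not across days), and a share of deposits as a non-additive ratio.

```python
# Hypothetical daily account snapshots: (day, account, deposits, closing_balance).
SNAPSHOTS = [
    ("2010-11-19", "A", 20, 100),
    ("2010-11-19", "B", 0, 50),
    ("2010-11-20", "A", 30, 130),
    ("2010-11-20", "B", 0, 45),
]


def total_deposits(rows):
    """Additive: deposits can be summed across every dimension (day and account)."""
    return sum(deposits for _, _, deposits, _ in rows)


def total_balance_on(rows, day):
    """Semi-additive: balances add up across accounts, but only within one day."""
    return sum(balance for d, _, _, balance in rows if d == day)


def deposit_share(rows, account):
    """Non-additive: a ratio is recomputed from its additive parts, never summed."""
    own = sum(deposits for _, acct, deposits, _ in rows if acct == account)
    return own / total_deposits(rows)
```

Summing the closing balance over both days would give 325, a number with no business meaning, which is why the balance is only added within a single day here.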
![Page 23: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/23.jpg)
HOW TO STORE DATA?
Facts and Dimensions:
1. Select the business process to model
2. Declare the grain of the business process
3. Choose the dimensions that apply to each fact table row
4. Identify the numeric facts that will populate each fact table row
![Page 24: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/24.jpg)
DIMENSION TABLE
• Contains attributes of dimensions e.g. month is an attribute of Time dimension.
• Can also have foreign keys to another dimension table
• Usually identified by a unique integer primary key called surrogate key
![Page 25: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/25.jpg)
FACT TABLE
• Contains Facts
• Foreign keys to dimension tables
• Primary Key: usually composite key of all FKs
![Page 26: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/26.jpg)
TYPES OF SCHEMA USED IN DATA WAREHOUSE
• Star Schema
• Snowflake Schema
• Fact Constellation Schema
![Page 27: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/27.jpg)
STAR SCHEMA
• Multi-dimensional Data
• Dimension and Fact Tables
• A fact table with pointers to Dimension tables
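A star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumption-laden illustration: the table and column names (`dim_time`, `dim_product`, `dim_region`, `fact_sales`) and the sample rows are invented, and the fact table uses a composite primary key made of its foreign keys.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema: one fact table, three dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
        CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT);
        CREATE TABLE fact_sales (
            time_key INTEGER REFERENCES dim_time(time_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            region_key INTEGER REFERENCES dim_region(region_key),
            units_sold INTEGER,
            revenue REAL,
            PRIMARY KEY (time_key, product_key, region_key)
        );
        INSERT INTO dim_time VALUES (1, 2010, 'Q1'), (2, 2010, 'Q2');
        INSERT INTO dim_product VALUES
            (1, 'Toaster', 'Kitchen Appliances'),
            (2, 'Blender', 'Kitchen Appliances'),
            (3, 'Lamp', 'Lighting');
        INSERT INTO dim_region VALUES (1, 'USA'), (2, 'UK');
        INSERT INTO fact_sales VALUES
            (1, 1, 1, 10, 200.0),
            (1, 2, 1, 5, 150.0),
            (2, 1, 2, 4, 80.0),
            (1, 3, 2, 2, 40.0);
    """)
    return conn


def revenue_by_category_and_quarter(conn):
    """Join the fact table to two dimensions and aggregate the revenue fact."""
    query = """
        SELECT p.category, t.quarter, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_time AS t ON t.time_key = f.time_key
        GROUP BY p.category, t.quarter
        ORDER BY p.category, t.quarter
    """
    return conn.execute(query).fetchall()
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.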
![Page 28: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/28.jpg)
STAR SCHEMA
![Page 29: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/29.jpg)
SNOWFLAKE SCHEMA
• An extension of star schema in which the dimension tables are partly or fully normalized.
• Dimension table hierarchies broken down into simpler tables.
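To illustrate the normalization, the sketch below moves a product's category out of the product dimension into its own table. The table names, columns, and rows are hypothetical, and Python's built-in `sqlite3` module stands in for a real warehouse database.

```python
import sqlite3


def build_snowflake():
    """Product dimension normalized: the category lives in its own table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            category_key INTEGER REFERENCES dim_category(category_key)
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            revenue REAL
        );
        INSERT INTO dim_category VALUES (1, 'Kitchen Appliances'), (2, 'Lighting');
        INSERT INTO dim_product VALUES (1, 'Toaster', 1), (2, 'Blender', 1), (3, 'Lamp', 2);
        INSERT INTO fact_sales VALUES (1, 200.0), (2, 150.0), (3, 40.0);
    """)
    return conn


def revenue_by_category(conn):
    """Reaching the category now takes two joins instead of one."""
    query = """
        SELECT c.category_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_category AS c ON c.category_key = p.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """
    return conn.execute(query).fetchall()
```

The category name is stored once rather than repeated on every product row, at the cost of an extra join in queries.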
![Page 30: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/30.jpg)
SNOWFLAKE SCHEMA
![Page 31: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/31.jpg)
FACT CONSTELLATION SCHEMA
• A fact constellation schema allows dimension tables to be shared between fact tables.
• This schema is used mainly for the aggregate fact tables, or where we want to split a fact table for better comprehension.
For example, a separate fact table for daily, weekly and monthly reporting requirement.
![Page 32: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/32.jpg)
FACT CONSTELLATION SCHEMA
In this example, the dimension tables for time, item, and location are shared between both the sales and shipping fact tables.
![Page 33: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/33.jpg)
OPERATIONS ON DATA WAREHOUSE
• Drill Down
• Roll up
• Slice & Dice
• Pivoting
![Page 34: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/34.jpg)
DRILL DOWN
[Diagram: data cube with Time, Region, and Product axes]
Category e.g. Home Appliances
Sub Category e.g. Kitchen Appliances
Product e.g. Toaster
![Page 35: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/35.jpg)
ROLL UP
[Diagram: time hierarchies: Year > Quarter > Month > Day, and Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day]
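Roll up and drill down can be sketched as aggregating the same measure at a coarser or finer level of a hierarchy. The example below uses only the calendar hierarchy (day, month, quarter, year); the daily sales figures and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical daily sales: (ISO date, units sold).
DAILY_SALES = [
    ("2010-01-15", 10),
    ("2010-02-03", 20),
    ("2010-04-20", 5),
    ("2010-11-20", 7),
]


def level_key(day, level):
    """Return the label of the given hierarchy level for one ISO date."""
    year, month, _ = day.split("-")
    if level == "day":
        return day
    if level == "month":
        return f"{year}-{month}"
    if level == "quarter":
        return f"{year}-Q{(int(month) - 1) // 3 + 1}"
    if level == "year":
        return year
    raise ValueError(f"unknown level: {level!r}")


def aggregate(rows, level):
    """Sum the measure at the requested level of the time hierarchy."""
    totals = defaultdict(int)
    for day, units in rows:
        totals[level_key(day, level)] += units
    return dict(totals)
```

Moving from `aggregate(DAILY_SALES, "month")` to `"quarter"` to `"year"` is a roll up; going the other way is a drill down. Drilling down is only possible because the detailed rows are kept.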
![Page 36: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/36.jpg)
SLICE & DICE
[Diagram: data cube with Time, Region, and Product axes; selecting Product = Toaster leaves a plane with Time and Region axes]
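As a rough sketch, a slice fixes one dimension to a single value, while a dice restricts several dimensions to sets of values. The cube below is a hypothetical list of cells, and `slice_cube` and `dice_cube` are illustrative helper names.

```python
# Hypothetical cube cells over the Time, Region, and Product dimensions.
CUBE = [
    {"time": "2010-Q1", "region": "USA", "product": "Toaster", "sales": 10},
    {"time": "2010-Q1", "region": "UK", "product": "Toaster", "sales": 4},
    {"time": "2010-Q2", "region": "USA", "product": "Blender", "sales": 6},
    {"time": "2010-Q2", "region": "UK", "product": "Toaster", "sales": 3},
]


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [cell for cell in cube if cell[dimension] == value]


def dice_cube(cube, **allowed):
    """Dice: keep cells whose value is in the allowed set for every named dimension."""
    return [
        cell for cell in cube
        if all(cell[dimension] in values for dimension, values in allowed.items())
    ]
```

`slice_cube(CUBE, "product", "Toaster")` corresponds to the Product = Toaster selection on the slide, leaving only Time and Region to vary.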
![Page 37: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/37.jpg)
PIVOTING
• Also called rotation
• Rotate on an axis
• Interchange Rows and Columns
[Diagram: data cube with Time, Region, and Product axes, rotated so that the Time and Region axes are interchanged]
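Interchanging rows and columns can be shown on a two-dimensional view of the cube. In this minimal sketch the table is a nested dict of the form `{row: {column: value}}`; the `SALES` figures and the `pivot` helper are hypothetical.

```python
# Hypothetical view: rows are regions, columns are quarters.
SALES = {
    "USA": {"Q1": 10, "Q2": 12},
    "UK": {"Q1": 7, "Q2": 9},
}


def pivot(table):
    """Interchange rows and columns of a nested dict {row: {column: value}}."""
    rotated = {}
    for row_label, columns in table.items():
        for column_label, value in columns.items():
            rotated.setdefault(column_label, {})[row_label] = value
    return rotated
```

Pivoting changes only the presentation: the values are the same, and pivoting twice returns the original view.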
![Page 38: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/38.jpg)
ADVANTAGES OF DATA WAREHOUSE
• One consistent data store for reporting, forecasting, and analysis
• Easier and timely access to data
• Scalability
• Trend analysis and detection
• Drill down analysis
![Page 39: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/39.jpg)
DISADVANTAGES OF DATA WAREHOUSE
• Preparation may be time consuming.
• High associated cost
![Page 40: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/40.jpg)
CASE STUDY: WHY DATA WAREHOUSE
• G2G Courier Pvt. Ltd. is an established brand in the courier industry. It has its own network in the main cities and has subcontracted to various partners in rural areas across the country.
• The President of the company wants to look deep into the financial health of the company and different performance aspects.
![Page 41: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/41.jpg)
CHALLENGES
• Apart from G2G’s own transaction system, each partner has its own system, which makes the data very heterogeneous.
• Granularity of data in various systems is also different. For example: minute accuracy and day accuracy.
• To analyze metrics like revenue and timely delivery across various geographical locations and partners, we need a unified system.
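The granularity mismatch can be handled by reducing every source to the coarsest common grain before combining them. The sketch below assumes two timestamp layouts (minute-level and day-level) and made-up delivery records; the real formats used by G2G and its partners are not given in the slides.

```python
from collections import Counter
from datetime import datetime

# Hypothetical delivery records: one system logs to the minute, another to the day.
G2G_DELIVERIES = ["2010-11-20 09:15", "2010-11-20 17:40", "2010-11-21 08:05"]
PARTNER_DELIVERIES = ["2010-11-20", "2010-11-21", "2010-11-21"]

# Assumed timestamp layouts, finest grain first.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d")


def to_day(value):
    """Reduce a minute-level or day-level timestamp string to a calendar date."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next assumed layout
    raise ValueError(f"unrecognized timestamp: {value!r}")


def deliveries_per_day(*sources):
    """Count deliveries per day across all sources at the common (day) grain."""
    counts = Counter()
    for source in sources:
        for value in source:
            counts[to_day(value).isoformat()] += 1
    return dict(counts)
```

Day is chosen as the common grain because a day-level record cannot be refined to the minute, whereas a minute-level record can always be truncated.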
![Page 42: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/42.jpg)
DATA WAREHOUSE MODEL
[Diagram: Sales Fact table linked to the Region, Product, and Time dimensions; Product is linked to Product Category]
![Page 43: Dataware House Introduction by QuontraSolutions](https://reader036.vdocuments.mx/reader036/viewer/2022062321/55cf929e550346f57b98064b/html5/thumbnails/43.jpg)
THANK YOU