introduction to data warehousingcsci253/presentations s12/data warehousin… · a data warehouse is...

32
Introduction to Data Warehousing

Upload: others

Post on 24-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Introduction to Data Warehousing

Page 2: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

What is Data Warehouse ?

Page 3: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

What is Data Warehouse ?

Data warehouse (DW) is a database mainly used to facilitate

reporting and analysis in businesses.

Page 4: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

What is Data Warehouse ? The definition of the data warehouse focuses on data storage. The main

source of the data extracted is cleaned, transformed, cataloged and

made available for use by managers and other business professionals

for data mining, online analytical processing, market research and

decision support.

Page 5: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

What is Data Warehouse ?

In other words many large organizations are data rich but information

poor. So Data Warehousing reverses the situation by turning the

organization to be more information rich.

Page 6: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

What is Data Warehouse ?

It combines data from multiple and varied sources into one

comprehensive and easily manipulated database.

Page 7: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

What is Data Warehouse ? The data stored in the warehouse are uploaded from several operational DB

systems (such as marketplace, sales etc) and integrated into one large DB.

Then it can be subdivided into small group units of data called Data Marts

allowing the user to choose the source and type of data depending on current

needs.

Page 8: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Why Data Warehouse ?

Page 9: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Scenario: ABC Pvt Ltd.

ABC Pvt Ltd is a company with several branches in

different cities. The Sales Manager wants a quarterly sales

report for the whole company. Each branch has it’s own

separate operational system.

Page 10: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Scenario: ABC Pvt Ltd.

City A

City B

City C

City D

Sales

Manager Report on sales per item

type for first quarter.

Page 11: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Solution: Date Warehousing

Extract sales information from each database.

Store the information in a common database at a

single warehouse.

Refresh and update warehouse at regular intervals so that it contains up to date information for analysis.

Retrieve and analyze information from warehouse as it contains all data with a historical perspective.

Page 12: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

City A

City B

City C

City D

Data

Warehouse

Sales

Manager

Query &

Analysis tools

Report

Page 13: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Why is Data Warehousing special ?

Page 14: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Operational vs. Warehouse System

A data warehouse is a

- Subject-oriented,

- Integrated,

- Time-variant,

- Nonvolatile

collection of data in support of the management’s decision making

process.

Page 15: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Subject-oriented

Data warehouse can organized around subjects such as sales, products

and customers.

For example, to learn more about your company's sales data, you can

build a warehouse that concentrates on sales. Using this warehouse,

you can answer questions like “Who was our best customer for this

item last year? ” or “Which employee made the most sales last

month?”

It excludes data not useful in decision support process.

Page 16: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Integration

Data Warehouse is constructed by integrating multiple heterogeneous

sources with different data types, such as relational databases, OLTP

files, Flat files, and refined into one common coding scheme.

RDBMS

OLTP

Data

Warehouse

Flat File Data Processing

Data Transformation

Page 17: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Time-variant

Provides information from historical perspective

Every data key structure contains either implicitly or explicitly an

element of time.

A data warehouse generally stores data that is 5-10 years old, to be

used for comparisons, trends, and forecasting.

Page 18: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Nonvolatile

A Data Warehouse is always a physically separate store of data.

Due to this separation, data warehouses do not require transaction processing, recovery, or backup. The data are not updated or changed in any way once they enter the data warehouse, but are only loaded, refreshed and accessed for queries.

Page 19: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Operational vs. Warehouse System

Features Operational Warehouse

Characteristics Operational processing Informational processing

Orientation Transaction Analysis

User Clerk,DBA,database

professional

Knowledge workers

Function Day to day operation Decision support

Data Current Historical

View Detailed,flat relational Summarized,

multidimensional

DB design Application oriented Subject oriented

Unit of work Short ,simple transaction Complex query

Access Read/write Mostly read

Page 20: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Operational vs. Warehouse System

Features Operational Warehouse

Focus Data in Information out

Number of records

accessed

tens millions

Number of users thousands hundreds

DB size 100MB to GB 100 GB to TB

Priority High performance,high

availability

High flexibility, end-

user autonomy

Page 21: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

The component of the data

warehouse

Page 22: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Two-Tier Architecture

For the dimensional analysis the client software may

require the data to be well structured as a star schema that

simplifies both the software operations and users view for

the data

Page 23: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Three-Tier Architecture

Adding application server between the client and the

warehouse to improve performance and reduce networking

traffic

It will manage the interaction with the warehouse , perform

the calculations and send the results to the client

Page 24: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Three-Tier (MDD) Architecture

The Multi Dimensional Database can store the data in

special structure designed to facilitate the dimensional

analysis

Page 25: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

The application for Data

Warehousing

Page 26: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Data warehousing tools / Extract ,

Transform , Load

Page 27: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Summary of the processes the applications do

Page 28: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Data

Extraction and Transformation

Normalized Changed into

Relational Data &

Page 29: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Data

Cleaning

AT&T

ATT

AT and T

are all different

spelling of the

same name

ATT

Page 30: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Stored in

Data warehouse

SQL

End User

Middle

ware

Page 32: Introduction to Data Warehousingcsci253/Presentations S12/data warehousin… · A Data Warehouse is always a physically separate store of data. Due to this separation, data warehouses

Thank You