data warehousing

Data Warehousing & Data Mining By Mandar Kulkarni PRN 10030141129 MBA-IT SICSR

Upload: mandar-kulkarni

Posted on 20-Jun-2015


TRANSCRIPT

Page 1: Data warehousing

Data Warehousing & Data Mining

By Mandar Kulkarni, PRN 10030141129

MBA-IT, SICSR

Page 2: Data warehousing

Contents

• Data warehousing
• Understanding data warehousing
• Data warehouse architecture
• Data mining
• Data mining techniques

Page 3: Data warehousing

Warehouse?

Real time example?

Page 4: Data warehousing

Data Warehousing

Page 5: Data warehousing

[Figure: Samsung branches in Mumbai, Delhi, Chennai and Bangalore report to a Sales Manager, who needs sales per item type per branch for the first quarter.]

Page 6: Data warehousing

• Now the sales manager wants to know the sales for the first quarter. How?

• Solution: extract the information from each database, store it in a single place, and process it using operational systems.
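The extract, store and process idea can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the branch names come from the slides, but the row layout (month, item type, amount), the sample figures, the helper names `extract_all` and `q1_sales_per_item_per_branch`, and the assumption that the first quarter means months 1 to 3 are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-branch "databases": each is a list of (month, item_type, amount) rows.
branch_dbs = {
    "Mumbai": [(1, "TV", 500.0), (2, "Phone", 300.0), (4, "TV", 200.0)],
    "Delhi": [(1, "Phone", 250.0), (3, "Phone", 150.0)],
    "Chennai": [(2, "TV", 400.0)],
    "Bangalore": [(3, "Fridge", 350.0), (5, "Fridge", 100.0)],
}

def extract_all(dbs):
    """Extract rows from every branch and store them in one list, tagged with the branch."""
    combined = []
    for branch, rows in dbs.items():
        for month, item_type, amount in rows:
            combined.append((branch, month, item_type, amount))
    return combined

def q1_sales_per_item_per_branch(rows):
    """Sum sales per (branch, item type) for months 1-3 (assumed to be the first quarter)."""
    totals = defaultdict(float)
    for branch, month, item_type, amount in rows:
        if 1 <= month <= 3:
            totals[(branch, item_type)] += amount
    return dict(totals)

report = q1_sales_per_item_per_branch(extract_all(branch_dbs))
for (branch, item_type), total in sorted(report.items()):
    print(f"{branch:10s} {item_type:7s} {total:8.2f}")
```

Each branch's rows are copied into one combined list before any summarizing happens, which mirrors the point of the slide: the manager's question spans all branches, so the data has to be brought together first.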

Page 7: Data warehousing

Solution

[Figure: Data from the Mumbai, Delhi, Chennai and Bangalore branches flows into a data warehouse; the Sales Manager uses query and analysis tools on it to produce a report.]

Page 8: Data warehousing

Operational Systems

• Running the business in real time
• Routine tasks
• Decision Support Systems (DSS)
  – Help in taking actions

• Used by people who deal with customers and products

• They are increasingly used by customers

Page 9: Data warehousing

Data Warehouse

• A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.

• A process of transforming data into information and making it available to users quickly enough to make a difference.

Page 10: Data warehousing

Definition

• An integrated, subject-oriented, time-variant, nonvolatile database that provides support for decision making
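Two of these properties, time-variant and nonvolatile, can be illustrated with a small append-only store. In this hypothetical sketch (the `SnapshotStore` class and its methods are made up for illustration), each load adds rows stamped with a snapshot date, and nothing already loaded is ever updated or deleted, so earlier states remain queryable.

```python
from datetime import date

class SnapshotStore:
    """Hypothetical append-only store: loads add dated snapshots, nothing is updated in place."""

    def __init__(self):
        self._rows = []  # list of (snapshot_date, key, value)

    def load(self, snapshot_date, records):
        """Append a batch of records captured at snapshot_date (time-variant)."""
        for key, value in records.items():
            self._rows.append((snapshot_date, key, value))

    def as_of(self, when):
        """Return the latest value of each key captured on or before `when`."""
        latest = {}
        for snap_date, key, value in sorted(self._rows, key=lambda r: r[0]):
            if snap_date <= when:
                latest[key] = value
        return latest

    def history(self, key):
        """Every captured value of `key`, oldest first (older rows are never overwritten)."""
        return [(d, v) for d, k, v in sorted(self._rows, key=lambda r: r[0]) if k == key]

store = SnapshotStore()
store.load(date(2015, 3, 31), {"Mumbai": 1000.0, "Delhi": 800.0})
store.load(date(2015, 6, 30), {"Mumbai": 1200.0})
print(store.as_of(date(2015, 4, 1)))  # state as captured by the end-of-March load
print(store.history("Mumbai"))
```

Because the Mumbai figure from 31 March is kept rather than overwritten, a question about the first quarter still gets the first-quarter answer after later loads.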

Page 11: Data warehousing

Data warehouse architecture

Page 12: Data warehousing

External

Production

Internal

Source Data

Archived Data MartsData Staging

Metadata

Data Warehouse DBMS

MDDB

Information DeliveryManagement & Control

OLAP

Report /Query

Data Mining

Page 13: Data warehousing

Components

• Source data
• Data staging (data extraction, cleaning and loading)
  – e.g. Talend, an open-source ETL tool
• Data storage
• Information delivery (EIS)
• Management and control
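The data staging component (extraction, cleaning and loading) can be sketched without any ETL tool. The example below is hypothetical plain Python, not Talend or any other product: two inline CSV strings stand in for source systems, and the cleaning rules (a spelling fix for "banglore", trimming whitespace, upper-casing item codes, rejecting unparseable amounts) are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical raw extracts from two source systems with inconsistent conventions.
SOURCE_A = "branch,item,amount\nMumbai,TV,500\nbanglore,Phone,300\n"
SOURCE_B = "branch,item,amount\nDelhi , tv ,250\nChennai,Phone,not-a-number\n"

# Hypothetical cleaning rule: map known misspellings to canonical branch names.
BRANCH_FIXES = {"banglore": "Bangalore"}

def extract(raw_csv):
    """Extraction: read raw rows from one source."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def clean(row):
    """Cleaning: trim, standardize names and types; return None for unusable rows."""
    branch = row["branch"].strip()
    branch = BRANCH_FIXES.get(branch.lower(), branch)
    item = row["item"].strip().upper()
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # reject rows whose amount cannot be parsed
    return (branch, item, amount)

def load(warehouse, rows):
    """Loading: append cleaned rows to the (here in-memory) warehouse table."""
    warehouse.extend(rows)

warehouse = []
for source in (SOURCE_A, SOURCE_B):
    cleaned = [clean(r) for r in extract(source)]
    load(warehouse, [r for r in cleaned if r is not None])
print(warehouse)
```

Keeping the three steps as separate functions makes it easy to see which rule rejected or changed a row, which is the kind of information the metadata component is meant to record.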

Page 14: Data warehousing

OLAP

• Online Analytical Processing (OLAP) tools
• DSS tools that use multidimensional data analysis techniques
  – Support for a DSS data store
  – Data extraction and integration filter
  – Specialized presentation interface
• Example: Oracle OLAP 11g
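Multidimensional analysis treats each attribute such as branch, item type or quarter as a dimension of a cube. The sketch below shows two common OLAP operations, roll-up and slice, on a tiny in-memory fact list. It is an illustration under assumed data and hypothetical function names (`roll_up`, `slice_cube`), not the Oracle OLAP API.

```python
from collections import defaultdict

# Hypothetical fact rows: (branch, item_type, quarter, sales). All fields but sales are dimensions.
FACTS = [
    ("Mumbai", "TV", "Q1", 500.0),
    ("Mumbai", "Phone", "Q1", 300.0),
    ("Delhi", "TV", "Q1", 250.0),
    ("Delhi", "TV", "Q2", 400.0),
    ("Chennai", "Phone", "Q2", 150.0),
]
DIMENSIONS = ("branch", "item_type", "quarter")

def roll_up(facts, keep):
    """Aggregate sales over every dimension not listed in `keep` (a roll-up)."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_cube(facts, dimension, value):
    """Fix one dimension to a single value (a slice)."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]

print(roll_up(FACTS, ["branch"]))                                   # sales per branch
print(roll_up(slice_cube(FACTS, "quarter", "Q1"), ["item_type"]))   # Q1 sales per item type
```

Combining the two operations answers the kind of question from the earlier Samsung example without writing a separate report for each combination of dimensions.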

Page 15: Data warehousing

Multidimensional analysis

Page 16: Data warehousing

OLAP architecture

Page 17: Data warehousing

12 Rules of a Data Warehouse

1. Data warehouse and operational environments are separated

2. Data is integrated

3. Contains historical data over a long period of time

4. Data is snapshot data captured at a given point in time

5. Data is subject-oriented

Page 18: Data warehousing

6. Mainly read-only, with periodic batch updates

7. The development life cycle follows a data-driven approach rather than the traditional process-driven approach

8. Data contains several levels of detail: current, old, lightly summarized and highly summarized

Page 19: Data warehousing

9. The environment is characterized by read-only transactions against very large data sets

10. A system traces data sources, transformations and storage

11. Metadata is a critical component: source, transformation, integration, storage, relationships, history, etc.

12. Contains a chargeback mechanism for resource usage that enforces optimal use of data by end users
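Rule 8 (several levels of detail) can be illustrated by deriving summary levels from detail rows. In this minimal sketch the grains are assumptions: monthly totals per branch stand in for "lightly summarized" and quarterly totals across branches for "highly summarized"; the sales rows and function names are hypothetical, and the "old detail" level is not modeled.

```python
from collections import defaultdict
from datetime import date

# Hypothetical current detail: one row per sale (sale_date, branch, amount).
DETAIL = [
    (date(2015, 1, 5), "Mumbai", 100.0),
    (date(2015, 1, 20), "Mumbai", 50.0),
    (date(2015, 2, 3), "Delhi", 70.0),
    (date(2015, 4, 9), "Delhi", 30.0),
]

def lightly_summarized(detail):
    """Monthly totals per branch."""
    out = defaultdict(float)
    for d, branch, amount in detail:
        out[(d.year, d.month, branch)] += amount
    return dict(out)

def highly_summarized(monthly):
    """Quarterly totals across all branches, built from the monthly level."""
    out = defaultdict(float)
    for (year, month, _branch), amount in monthly.items():
        quarter = (month - 1) // 3 + 1
        out[(year, quarter)] += amount
    return dict(out)

monthly = lightly_summarized(DETAIL)
quarterly = highly_summarized(monthly)
print(monthly)
print(quarterly)
```

Building the highly summarized level from the lightly summarized one, rather than from the raw detail again, keeps the levels consistent with each other.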

Page 20: Data warehousing

OLTP vs. Data Warehousing

OLTP
• Application oriented
• Used to run the business
• Detailed data
• Current, up-to-date
• Isolated data
• Repetitive access
• Performance sensitive
• Few records accessed
• Read/update access

Data Warehousing
• Subject oriented
• Used to analyze the business
• Summarized and refined
• Snapshot data
• Integrated data
• Ad hoc access
• Performance relaxed
• Large volumes accessed at a time
• Mostly read
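The difference in access patterns can be seen with Python's built-in `sqlite3` module. This sketch uses one hypothetical `sales` table for both styles, which a real deployment would not do (rule 1 keeps the two environments separate): an OLTP-style statement reads and updates one row by key, while a warehouse-style query scans and summarizes every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, branch TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (branch, item, amount) VALUES (?, ?, ?)",
    [("Mumbai", "TV", 500.0), ("Delhi", "TV", 250.0), ("Mumbai", "Phone", 300.0)],
)

# OLTP-style access: read and update a single record by key (few records, read/update).
conn.execute("UPDATE sales SET amount = amount + 20 WHERE id = ?", (2,))
row = conn.execute("SELECT branch, amount FROM sales WHERE id = ?", (2,)).fetchone()

# Warehouse-style access: an ad hoc, read-only scan that summarizes every row.
summary = conn.execute(
    "SELECT branch, SUM(amount) FROM sales GROUP BY branch ORDER BY branch"
).fetchall()
print(row)
print(summary)
conn.close()
```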

Page 21: Data warehousing

Data Warehouse summary

• Integrated platform for OLAP and DSS

• Helps optimize business operations

• Easy access to multidimensional data

Page 22: Data warehousing

Data Mining

Page 23: Data warehousing

Why Data Mining?

Strategic decision making

Wealth generation

Analyzing trends

Security

Page 24: Data warehousing

Data Mining

• Looks for hidden patterns and trends in data that are not immediately apparent from summarizing the data

• No query…

• …but "interestingness criteria"

Page 25: Data warehousing

Data Mining

Data + Interestingness criteria = Hidden patterns

Page 26: Data warehousing

Data Mining

Data + Interestingness criteria = Hidden patterns

Type of Patterns

Page 27: Data warehousing

Data Mining

Data + Interestingness criteria = Hidden patterns

Type of data

Type of interestingness criteria

Page 28: Data warehousing

Types of Data

• Tabular (e.g. transaction data)
  – Relational
  – Multidimensional
• Tree (e.g. XML data)
• Graphs
• Sequence (e.g. DNA, activity logs)
• Text, multimedia …

Page 29: Data warehousing

Types of Interestingness

• Frequency
• Rarity
• Correlation
• Length of occurrence (for sequence and temporal data)
• Consistency
• Repeating / periodicity
• "Abnormal" behavior
• Other patterns of interestingness …
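Three of these criteria can be expressed as simple measures over market-basket data. In this sketch, frequency is measured as support (the fraction of transactions containing an itemset), correlation as lift, and rarity as "appears in exactly one transaction"; the transactions and the rarity threshold are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"caviar"},
]

def support(itemset, transactions):
    """Frequency: fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(a, b, transactions):
    """Correlation: support(a and b) / (support(a) * support(b)); above 1 suggests positive correlation."""
    return support({a, b}, transactions) / (support({a}, transactions) * support({b}, transactions))

items = sorted(set().union(*TRANSACTIONS))
counts = Counter(item for t in TRANSACTIONS for item in t)
rare = [item for item in items if counts[item] == 1]  # rarity: items seen in only one transaction
pairs = {
    (a, b): lift(a, b, TRANSACTIONS)
    for a, b in combinations(items, 2)
    if support({a, b}, TRANSACTIONS) > 0  # skip pairs that never occur together
}
print(rare)
print(pairs)
```

The same data yields different "interesting" results depending on the criterion: bread and butter stand out by lift, while caviar stands out only by rarity.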

Page 30: Data warehousing

Data Mining vs Statistical Inference

Statistics:

Conceptual model (hypothesis) → Statistical reasoning → "Proof" (validation of hypothesis)

Page 31: Data warehousing

Data Mining vs Statistical Inference

Data mining:

Data → Mining algorithm (based on interestingness) → Pattern (model, rule, hypothesis) discovery

Page 32: Data warehousing

Used for

• Data mining is used for:
  – Frequent itemsets
  – Associations
  – Classification
  – Clustering
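For classification, decision-tree methods grow a tree by repeatedly choosing the split that best separates the classes; SLIQ, for example, evaluates candidate splits with the Gini index. The sketch below computes only that one step, the best threshold on a single numeric attribute, and leaves out SLIQ's own machinery (pre-sorting and class lists). The age and purchase data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Try each threshold 'value <= t' and return (t, weighted Gini) with the lowest impurity."""
    best = None
    n = len(values)
    for t in sorted(set(values))[:-1]:  # splitting at the largest value would leave one side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

# Hypothetical training data: customer age and whether the customer bought the product.
ages = [22, 25, 30, 35, 40, 45, 50]
bought = ["no", "no", "no", "yes", "yes", "yes", "yes"]
print(best_split(ages, bought))
```

Here the split "age <= 30" separates the two classes perfectly, so its weighted Gini impurity is zero; a full tree builder would apply the same search recursively to each side.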

Page 33: Data warehousing

Techniques

• Algorithms
  – Apriori algorithm
  – Decision trees
    • SLIQ (Supervised Learning In Quest), IBM
• "GROUP BY" queries, e.g. the sum of salaries per department in MySQL:
  mysql> SELECT deptno, SUM(sal) FROM emp GROUP BY deptno;
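A compact version of the Apriori algorithm is sketched below. It follows the standard level-wise scheme: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, and count the survivors against the data. Treating `min_support` as an absolute transaction count, and the sample baskets, are assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: every (k-1)-subset of a candidate must itself be frequent.
                if len(union) == k and all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                    candidates.add(union)
        # Count candidate support with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
found = apriori(baskets, min_support=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_support=2` every single item and every pair is frequent, but the three-item set appears in only one basket, so it is dropped.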

Page 34: Data warehousing

Data Mining Summary

• Helps in pattern analysis, and thus in taking actions, both in real time and for the future.

• Analyzing trends and clusters in business operations.
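The summary mentions finding clusters in business data. The slides do not name a clustering algorithm, so the sketch below uses k-means, a common choice, purely as an example; the customer points, the choice of k = 2, the fixed random seed and the helper `squared_distance` are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers: (visits per month, average spend in thousands).
customers = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary; what matters is that the low-visit, low-spend customers end up in one group and the high-visit, high-spend customers in the other.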


Page 36: Data warehousing

Thank you

Any Questions?