Paul K Chen: Introduction to Data Warehouse (Data Warehouse Fundamentals, Chapter 1)

Upload: dwayne

Post on 31-Jan-2016


Page 1: Paul K Chen

Paul K Chen

Introduction to Data Warehouse

Chapter 1

Data Warehouse Fundamentals

Page 2: Paul K Chen

Introduction to Data Warehouse

Portions of the materials at this website on the subject of Data Warehouse Fundamentals are drawn from the textbooks below:

Data Warehouse Fundamentals. Author: Paulraj Ponniah. Publisher: John Wiley & Sons, Inc., 2001.

Database Systems. Authors: Thomas Connolly and Carolyn Begg. Publisher: Addison Wesley Longman, Inc., Second Edition.

Page 3: Paul K Chen

Road Map for Learning by Subject

[Diagram: course topics and the chapters that cover them]

Topics: DW Overview; DW Architecture/Components/Building Blocks; Relational & Dimensional Modeling (DW DB Design); Analyzing DW Business Requirements; Trends; DW Information Delivery/Data Retrieval by OLAP and Data Mining via Web; DW Project Planning and Management; Physical Design Process and Data Quality

Chapter labels: Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapters 6, 7; Chapters 8, 9, 10; Chapter 11

Page 4: Paul K Chen

Chapter 1 - Objectives

Understand the differences between data and information and the information crisis

Recognize the information crisis at every enterprise

Understand the various ways of organizing and managing information for decision-making use

Review the history of decision support systems

Learn briefly what a data warehouse is and see why data warehousing is the viable solution

Page 5: Paul K Chen

Data and Information

We’re told we live in the “information age”. People often talk about data and information as if they were the same. They are, in many regards, opposite.

A datum is just a fact: your name is a fact, your phone number is a fact.

Information is data that is presented in a meaningful, understandable, and beneficial format. Information is data that has been organized, sequenced, correlated, and summarized, such as a phone book.

Page 6: Paul K Chen

Data and Information

A phone book is information. It not only contains names and phone numbers, but it correctly associates each person’s phone number with their name. It presents this list of correlated names and phone numbers in alphabetical sequence, so that we can find the phone number from the name. In addition, it divides the phone numbers into two types: personal and business.

It is the function of the computer to convert data to information.

Page 7: Paul K Chen

Definitions

Database: The database is a place where you put your data; data that you wish to convert to information at some future time.

Database Management System: A DBMS is the software that converts the data in your database to information. It is the DBMS that provides you the capability for cross-referencing, correlating, sorting, summarizing, etc.
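As a rough illustration of the two definitions above, the sketch below uses Python's built-in sqlite3 module as the DBMS and the phone book as the data. The table name phone_book, its columns, and the sample rows are invented for this example and are not part of the course material.

```python
import sqlite3

# In-memory database: the "place where you put your data".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (name TEXT, phone TEXT, kind TEXT)")

# Raw data: isolated facts in no particular order (hypothetical sample rows).
rows = [
    ("Lee, Ann", "555-0102", "personal"),
    ("Acme Hardware", "555-0199", "business"),
    ("Baker, Tom", "555-0147", "personal"),
]
conn.executemany("INSERT INTO phone_book VALUES (?, ?, ?)", rows)

# The DBMS sorts and correlates the facts: an alphabetical personal listing.
personal = conn.execute(
    "SELECT name, phone FROM phone_book WHERE kind = 'personal' ORDER BY name"
).fetchall()

# The DBMS summarizes the facts: how many numbers of each type.
counts = dict(
    conn.execute("SELECT kind, COUNT(*) FROM phone_book GROUP BY kind").fetchall()
)

print(personal)
print(counts)
```

The stored rows are the data; the sorted listing and the per-type counts are the information the DBMS derives from them.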

Page 8: Paul K Chen

Information as A Competitive Weapon

Information technology and quality information are not goals in themselves; they merely support organizations in reaching the goals of:

Superior products and services

Greater productivity

Eventual success

Page 9: Paul K Chen

Data, Information, and Decision

Data

Information (Data + Process)

Knowledge

Decision (Information + Knowledge)

Data/Information/Decision

Data Resource Management (DRM)

MIS (OLTP) & OOAD

KM (Knowledge Mgt), KWS (Knowledge Work Systems)

DSS; ESS, EIS (Executive Information Systems)

Data Warehousing/Data Mart/Data Mining/OLAP (Executive, Collaborative and individual levels)

Page 10: Paul K Chen

Data, Information, and Decision by Subject

Data: Data Processing

+ Processing: System Analysis/Design

Information: MIS, Database Systems

Object (Data + Processing): Object-Oriented SD/DA

Knowledge: Artificial Intelligence

+ Information: Expert System

Decision (executive level): DSS, EIS

Decision (all levels, sophisticated): Data Warehousing, Data Mining

Page 11: Paul K Chen

The Information Crisis

Integrated: Must have a single, enterprise-wide view.

Data Integrity: Information must be accurate and must conform to business rules.

Accessible: Easily accessible with intuitive access paths, and responsive for analysis.

Credible: Every business factor must have one and only one value.

Timely: Information must be available within the stipulated time frame.
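The "Credible" requirement in the list above can be checked mechanically. The sketch below is one possible way to do it, not a method given in the slides: it flags any business factor that different systems report with different values. The function name find_conflicts, the system names, and the figures are all hypothetical.

```python
from collections import defaultdict

def find_conflicts(records):
    """Return business factors reported with more than one distinct value.

    records: iterable of (source, factor, value) tuples.
    """
    values = defaultdict(set)
    for _source, factor, value in records:
        values[factor].add(value)
    return {factor: sorted(vals) for factor, vals in values.items() if len(vals) > 1}

# Hypothetical figures reported by three departmental systems.
reports = [
    ("sales_system", "q1_revenue", 120_000),
    ("finance_system", "q1_revenue", 118_500),
    ("sales_system", "active_customers", 340),
    ("crm_system", "active_customers", 340),
]

conflicts = find_conflicts(reports)
print(conflicts)
```

Here q1_revenue fails the "one and only one value" test, while active_customers passes.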

Page 12: Paul K Chen

The Era of Information-Based Management—Five Themes

A Single Information Source (E-Business)

Distributed Information Availability (XML)

Information In A Business Context (Decision Support Systems)

Automated Information Delivery (e.g., triggers)

Information Quality and Ownership (e.g., DRM)

Page 13: Paul K Chen

Complete E-Business Suite

One Database

Marketing, Sales

Order Mgt

Procurement

Supply Chain (SCM), Manufacturing

Financial Services

Human Resources

Projects

Customer Relationship (CRM)

ERP, EAI

Page 14: Paul K Chen

What is EAI?

EAI refers to Enterprise Application Integration.

EAI is the merging of applications and data from various new and legacy systems within a business. Various means are employed to accomplish EAI, including middleware, in order to unify IT resources, maximize new ERP investments, diminish errors and get everyone on the same page. EAI enables companies to link their existing software applications with each other and with portals. EAI provides the ability to get their applications to exchange critical data. EAI is usually close to the top of any CIO's list of concerns. There are different approaches to EAI. Some rely on linking specific applications with tailored code, but most rely on generic solutions, typically called middleware. XML, combined with SOAP and UDDI, is a kind of middleware.

Page 15: Paul K Chen

Data Warehouse & ERP

– ERP = Enterprise Resource Planning

– A software solution that addresses enterprise needs, taking the process view of an organization to meet the organization’s goals.

– It integrates all the departments and functions across a company into a single computer system that can serve all those different departments’ particular needs.

Page 16: Paul K Chen

Information System Categories

Page 17: Paul K Chen

Information System Categories

Page 18: Paul K Chen

DATA RESOURCE MANAGEMENT (DRM)

DEFINITION

DATA RESOURCE MANAGEMENT (DRM) IS THE BUSINESS DISCIPLINE WHICH FOCUSES ON HOW DATA CAN BE MANAGED TO MOST EFFICIENTLY SUPPORT THE BUSINESS ENTERPRISE. DRM ADDRESSES THE MANAGEMENT OF ALL ENTERPRISE DATA. WHEN COMBINED WITH OTHER ENTERPRISE PROCESSES, DRM PROVIDES INFORMATION WHEN NEEDED, WHERE NEEDED, IN THE FORM NEEDED, WITH DESIRED ACCURACY AND AT MINIMUM COST FOR BUSINESS ENTERPRISE.

Page 19: Paul K Chen

DATA RESOURCE MANAGEMENT (DRM)

DATA RESOURCE MANAGEMENT BECOMES INCREASINGLY CRITICAL TO THE SUCCESS OF THE CORPORATION IN THE MARKETPLACE DUE TO THESE NEW REALITIES:

THE COMPETITIVE, GLOBAL ENVIRONMENT THAT BUSINESS IS FACING

EXPLOSIVE GROWTH OF THE WEB OVER THE INTERNET

INCREASING USE OF DATA WAREHOUSE SYSTEMS TO MAKE BETTER DECISIONS

Page 20: Paul K Chen

DATA RESOURCE MANAGEMENT (DRM)

WHAT IT IS:

PROVIDING A UNIFIED AND INTEGRATED APPROACH FOR PLANNING, CONTROL AND INTEGRATION OF OUR DATA ASSETS IN SUPPORT OF THE ENTERPRISE’S BUSINESS

ENCOURAGING THE REDUCTION OF UNNECESSARY DATA DUPLICATION

ENCOURAGING THE REUSE AND SHARING OF HIGH QUALITY DATA

DONE RIGHT, THE INVESTMENT CAN BE PAID BACK MANY TIMES OVER.

Page 21: Paul K Chen

DRM PRINCIPLES

THE FOLLOWING PRINCIPLES SERVE AS GUIDELINES FOR MANAGING DATA AS AN ENTERPRISE ASSET:

STRATEGICALLY AND TECHNICALLY DRIVEN:

THE EXISTENCE OF EACH DATA ITEM MUST BE JUSTIFIED BY A BUSINESS PROCESS REQUIRED BY EITHER SHORT-TERM OR LONG-TERM GOALS.

Page 22: Paul K Chen

DRM PRINCIPLES (Continued)

DATA LIFE CYCLE ASSESSMENT

DATA LIFE CYCLE FROM ACQUISITION OR CREATION TO PRODUCTION OR DELETION MUST BE PERIODICALLY ASSESSED BASED ON BUSINESS NEEDS AND CLIMATES.

Page 23: Paul K Chen

DRM PRINCIPLES (Continued)

DATA DEFINED

DATA MUST BE UNIQUELY DEFINED AND ASSIGNED PRECISE MEANING PER ORGANIZATION VOCABULARY.

Page 24: Paul K Chen

DRM PRINCIPLES (Continued)

INTEGRITY

DATA INTEGRITY RULES MUST BE MAINTAINED TO ASSURE CONSISTENCY AND TO CONTROL REDUNDANCY.

Page 25: Paul K Chen

DRM PRINCIPLES (Continued)

SECURITY/CONFIDENTIALITY

DATA MUST BE PROTECTED FROM UNAUTHORIZED AND INADVERTENT ACCESS, MODIFICATION, DESTRUCTION AND DISCLOSURE.

Page 26: Paul K Chen

DRM PRINCIPLES (Continued)

ACCESSIBILITY

DATA MUST BE MADE AVAILABLE WHEN AND WHERE NEEDED FOR SHARING AND REUSE.

Page 27: Paul K Chen

DRM PRINCIPLES (Continued)

DATA STEWARDSHIP

DATA SUBJECT AREAS WILL BE MANAGED BY A TEAM OF PEOPLE KNOWN AS DATA OWNERS AND CUSTODIANS. THE GROUP IS RESPONSIBLE FOR ASSURING THAT DATA STRUCTURE REFLECTS BUSINESS POLICIES AND RULES.

Page 28: Paul K Chen

DRM PRINCIPLES (Continued)

COST/BENEFIT OPTIMIZATION

DATA MUST BE UTILIZED TO MAXIMIZE BUSINESS BENEFITS AT A MINIMUM COST.

Page 29: Paul K Chen

Knowledge Management (KM) – Side Benefits of DRM

It is a systematic process for capturing, integrating, organizing, and communicating knowledge accumulated by employees.

It is a vehicle to share corporate knowledge so that employees may be more effective and productive in their work.

A knowledge management system must store all such knowledge in a knowledge repository.

Page 30: Paul K Chen

What is AI?

What is intelligence?

– The ways humans think

– The ways humans behave

– The ways rational/intelligent things think

– The ways rational/intelligent things behave

AI is the science of understanding intelligence and the art of making intelligent things.

Page 31: Paul K Chen

What does AI do?

Automation of problem solving

– Learning

– Memory (Knowledge Representation)

– Reasoning

– Acting

Study of mental faculties through computational models

Making computers do what people do better now (or did better at some point!)

Page 32: Paul K Chen

History of Decision-Support Systems

Ad Hoc Reports

Special Extract Programs

Small Applications

Information Centers

Decision-Support Systems

Executive Information Systems

Page 33: Paul K Chen

Four Levels of Analytical Processing

In a modern organization, at least four levels of analytical processing should be supported by information systems:

– First level: Consists of simple queries and reports against current and historical data

– Second level: Goes deeper and requires the ability to do “what if” processing across data store dimensions

Page 34: Paul K Chen

Four Levels of Analytical Processing

– Third level: Needs to step back and analyze what has previously occurred to bring about the current state of the data

– Fourth level: Analyzes what has happened in the past and what needs to be done in the future in order to bring about some specific change
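To make the first two levels concrete, here is a minimal sketch under stated assumptions: the sales facts, the function revenue_by_region, and the price_change_pct parameter are all hypothetical. A plain report against stored data stands in for the first level, and re-running the same report under an assumed price change stands in for second-level "what if" processing. The third and fourth levels are not attempted here.

```python
# Hypothetical sales facts: (year, region, units, unit_price_in_dollars).
sales = [
    (2000, "East", 100, 20),
    (2000, "West", 80, 20),
    (2001, "East", 120, 20),
    (2001, "West", 90, 20),
]

def revenue_by_region(facts, year, price_change_pct=0):
    """Total revenue per region for one year.

    price_change_pct=0 gives the plain report (first level). A non-zero
    value recomputes the report under a hypothetical price change
    (second level, "what if").
    """
    totals = {}
    for fact_year, region, units, unit_price in facts:
        if fact_year == year:
            revenue = units * unit_price * (100 + price_change_pct) / 100
            totals[region] = totals.get(region, 0.0) + revenue
    return totals

report = revenue_by_region(sales, 2001)
what_if = revenue_by_region(sales, 2001, price_change_pct=10)
print(report)
print(what_if)
```

The same stored data answers both questions; only the assumption passed to the query changes.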

Page 35: Paul K Chen

The Evolution of Data Warehousing

Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.

This resulted in the accumulation of growing amounts of data in operational databases.

Page 36: Paul K Chen

The Evolution of Data Warehousing

Organizations now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.

However, operational systems were never designed to support such business activities.

Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions.

Page 37: Paul K Chen

The Evolution of Data Warehousing

Organizations need to turn their archives of data into a source of knowledge, so that a single integrated / consolidated view of the organization’s data is presented to the user.

A data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.

Page 38: Paul K Chen

Objectives of Today’s Businesses

Access and combine data from a variety of data stores

Perform complex data analysis across these data stores

Create multidimensional views of data and its metadata

Easily summarize and roll up the information across subject areas and business dimensions

Page 39: Paul K Chen

These objectives cannot be met easily

Data is scattered in many types of incompatible structures.

Lack of documentation has prevented the integration of older legacy systems with newer systems

Internet software such as search engines needs to be improved

Accurate and accessible metadata across multiple organizations is hard to get

Page 40: Paul K Chen

A New Type of System Environment

Data is designed for analytical tasks

Data from multiple applications

Easy to use and conducive to long interactive sessions by users

Read-intensive data usage

Direct interaction with the system by the users without IT assistance

Content updated periodically and stable

Content to include current and historical data

Ability for users to run queries and get results online

Ability for users to initiate reports

Page 41: Paul K Chen

What is a Data Warehouse?

Data warehousing is a decision support system. It has the following characteristics:

1. A central database that is loaded from multiple operational databases for the purpose of end-user access and decision support.

Page 42: Paul K Chen

What is a Data Warehouse? - Continued

2. A data warehouse differs from an operational system in that the data it contains is normally static and updated in a scheduled manner through massive loading procedures.
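The scheduled loading described in characteristic 2 might look roughly like the sketch below, which appends each batch in a single transaction and otherwise only reads the data. It uses Python's built-in sqlite3 module; the table sales_fact, the function bulk_load, and the sample batches are assumptions made for illustration.

```python
import sqlite3

def bulk_load(warehouse, batch, load_date):
    """Append one scheduled batch to the warehouse in a single transaction."""
    with warehouse:  # commits on success, rolls back on error
        warehouse.executemany(
            "INSERT INTO sales_fact (load_date, order_id, amount) VALUES (?, ?, ?)",
            [(load_date, order_id, amount) for order_id, amount in batch],
        )

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (load_date TEXT, order_id INTEGER, amount REAL)"
)

# Hypothetical extracts from an operational system, one per nightly run.
bulk_load(warehouse, [(1, 25.0), (2, 40.0)], "2001-03-01")
bulk_load(warehouse, [(3, 15.5)], "2001-03-02")

# Between loads the contents are only read, never updated in place.
row_count = warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(row_count)
```

Wrapping each batch in one transaction means a failed load leaves the warehouse exactly as it was before the run.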

Page 43: Paul K Chen

What is a Data Warehouse? - Continued

3. A data warehouse is developed to accommodate random, ad hoc queries and to allow users to ‘drill down’ to minute levels of detail.
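Drilling down can be sketched as running the same aggregate query at progressively finer levels of a hierarchy. The example below assumes a hypothetical sales table with a year, quarter, month hierarchy and uses Python's built-in sqlite3 module; the names HIERARCHY and drill_down are invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, quarter TEXT, month INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2001, "Q1", 1, 100), (2001, "Q1", 2, 150), (2001, "Q1", 3, 120),
        (2001, "Q2", 4, 90), (2001, "Q2", 5, 110),
    ],
)

# Fixed list of grouping columns, ordered from coarse to fine.
HIERARCHY = ("year", "quarter", "month")

def drill_down(depth, quarter=None):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    cols = ", ".join(HIERARCHY[:depth])
    sql = f"SELECT {cols}, SUM(amount) FROM sales"
    params = ()
    if quarter is not None:
        sql += " WHERE quarter = ?"
        params = (quarter,)
    sql += f" GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql, params).fetchall()

by_year = drill_down(1)                    # coarsest view
by_quarter = drill_down(2)                 # one level down
q1_by_month = drill_down(3, quarter="Q1")  # detail for a single quarter
print(by_year, by_quarter, q1_by_month, sep="\n")
```

Column names come only from the fixed HIERARCHY tuple, so the f-string never interpolates user input; the quarter filter is passed as a bound parameter.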

Page 44: Paul K Chen

Definition

Bill Inmon defines a central data warehouse as a database that is:

1. Subject Oriented

Data naturally congregates around major categories within any corporation. These categories are called subject areas. Examples of subject areas are bill of material, customer, product, and criminal profile. Each subject area is designed to contain only the data appropriate for decision support analysis.

Page 45: Paul K Chen

Definition (Continued)

2. Integrated

Data integration shows in the consistency of variable measurements, naming conventions, and physical data definitions across the data. There will be only one definition, identifier, etc., for each subject area.
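As a small sketch of what integration involves, the example below converts records from two source systems to a single coding and a single unit of measurement before they enter the warehouse. The two source conventions (systems "A" and "B"), the field names, and the function integrate are hypothetical.

```python
# Hypothetical source conventions: system A stores gender as "m"/"f" and
# length in centimetres; system B stores gender as 1/0 and length in inches.
GENDER_MAP = {"m": "M", "f": "F", 1: "M", 0: "F"}
CM_PER_INCH = 2.54

def integrate(record, source):
    """Convert one source record to the warehouse's single convention."""
    length = record["length"]
    length_cm = length * CM_PER_INCH if source == "B" else length
    return {
        "customer_id": record["id"],
        "gender": GENDER_MAP[record["gender"]],
        "length_cm": round(length_cm, 2),
    }

row_a = integrate({"id": 7, "gender": "f", "length": 30.0}, source="A")
row_b = integrate({"id": 8, "gender": 1, "length": 10}, source="B")
print(row_a)
print(row_b)
```

After this step every row uses one gender coding and one unit, whichever system it came from.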

Page 46: Paul K Chen

Definition (Continued)

3. Time Variant

Data in the DW is historical and accurate as of some point in time. Since DW data is extracted from operational systems, it must have an element of time as part of its key structure.
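One way to read "an element of time as part of its key structure" is a composite key that includes the snapshot date, so each load adds a new row instead of overwriting the old one. The sketch below assumes a hypothetical customer_snapshot table and a helper city_as_of, using Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_snapshot (
        customer_id   INTEGER,
        snapshot_date TEXT,      -- ISO date, part of the key
        city          TEXT,
        PRIMARY KEY (customer_id, snapshot_date)
    )
    """
)
# Two snapshots of the same (hypothetical) customer, taken a month apart.
conn.executemany(
    "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
    [(42, "2001-01-31", "Boston"), (42, "2001-02-28", "Chicago")],
)

def city_as_of(customer_id, date):
    """Return the city recorded in the latest snapshot on or before `date`."""
    row = conn.execute(
        "SELECT city FROM customer_snapshot "
        "WHERE customer_id = ? AND snapshot_date <= ? "
        "ORDER BY snapshot_date DESC LIMIT 1",
        (customer_id, date),
    ).fetchone()
    return row[0] if row else None

print(city_as_of(42, "2001-02-15"))
print(city_as_of(42, "2001-03-01"))
```

Because ISO dates sort correctly as text, a plain string comparison is enough to find the snapshot that was accurate at a given point in time.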

Page 47: Paul K Chen

Definition (Continued)

4. Static

Since the data in the DW is a snapshot extracted from the operational system, it must be static or non-updateable.

Page 48: Paul K Chen

Definition (Continued)

5. Data Granularity

Data in the warehouse is summarized at different levels.

Granularity levels are based on the data types and the expected system performance for queries.
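The idea of summarizing the same data at different levels can be sketched as follows. The daily figures and the function summarize are hypothetical; a real warehouse would normally store such summaries in separate tables.

```python
from collections import defaultdict

# Hypothetical detail rows at the finest grain: one total per day.
daily_sales = [("2001-01-05", 200), ("2001-01-20", 300), ("2001-02-03", 150)]

def summarize(rows, key_length):
    """Sum amounts by an ISO date prefix: 10 = day, 7 = month, 4 = year."""
    totals = defaultdict(int)
    for date, amount in rows:
        totals[date[:key_length]] += amount
    return dict(totals)

monthly = summarize(daily_sales, 7)
yearly = summarize(daily_sales, 4)
print(monthly)
print(yearly)
```

Coarser summaries hold fewer rows and answer high-level queries faster, while the daily rows remain available when detail is needed.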

Page 49: Paul K Chen

The Benefits of Data Warehouse

Enable workers to make better and wiser decisions

A data warehouse is specifically developed to let users explore data in an unlimited number of ways, accommodating essentially any query a manager could dream up and providing access to the data sources that are behind the results. For example, information gleaned from a data warehouse can lead a company to change its pricing.

Page 50: Paul K Chen

The Benefits of Data Warehouse

Identify hidden business opportunities

A data warehouse performs a second, very valuable function by searching data for trends and abnormalities that users may not know to look for.

For example, it can assist companies in spotting sales trends and in detecting erroneous or fraudulent billings.
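As a very simple stand-in for this kind of abnormality search, the sketch below flags billing amounts that lie far from the mean in standard-deviation terms (a z-score rule). This rule is an assumption chosen for illustration, not a technique named in the slides; the function flag_outliers, the threshold, and the sample amounts are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(amounts, threshold=2.0):
    """Return amounts more than `threshold` sample standard deviations from the mean."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]

# Hypothetical billing amounts with one suspicious entry.
billings = [100, 102, 98, 101, 99, 103, 97, 100, 500]
suspicious = flag_outliers(billings)
print(suspicious)
```

A flagged amount is only a candidate for review; deciding whether it is an error or fraud still needs a person or a more careful model.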

Page 51: Paul K Chen

The Benefits of Data Warehouse

Bending with the customer

A data warehouse can help companies really understand who their customers are and what services they are using.

For example, by collecting and analyzing internet portal clickstream data, companies are able to build extensive user profiles to boost profits through their sales channels.

Page 52: Paul K Chen

The Benefits of Data Warehouse

Precision Marketing

A data warehouse can aid in detecting segments of the marketplace (geographically and demographically) which remain untapped, and help show the best way to reach out to these potential customers (rapid response to market and technology trends).