1 Chapter 5 Analyzing and Defining Business Requirements for a Data Warehouse Paul K Chen Data Warehouse Fundamentals

Upload: jordan-may

Post on 18-Jan-2018




Chapter 5

Analyzing and Defining Business Requirements for a Data Warehouse

Paul K Chen

Data Warehouse Fundamentals


Chapter 5 - Objectives

• Learn the definition of business requirements
• Understand the role of business dimensions in DW business requirements
• Learn the specific steps in defining and recording DW business requirements
• Review methods for gathering requirements (JAD, interviews, and sampling)
• Briefly discuss architecture concepts impacted by business requirements


Definition of The Business Requirements

The definition of requirements is the user’s statement of how he or she wants to do business, and the information required to support his or her new methods of operations.


Definition of The Business Requirements

The requirements can be broadly divided into two areas:

1. Functional requirements: written in user terminology, since it is user operations that are being described.

2. Non-functional requirements: the limitations and demands imposed upon the computing solution, such as the architectural plan, data storage specifications, and information system performance expectations.


Requirements As the Driving Force for Data Warehousing

• Understand why business requirements are the driving force
• Discuss how requirements drive every development phase
• Learn specifically how requirements influence data design
• Review the impact of requirements on architecture
• Note the special considerations for ETL and metadata
• Examine how requirements shape information delivery


Business Requirements As the Driving Force

[Figure: Business Requirements at the center, driving each phase:]

• Planning & Management
• Design: Architecture, Infrastructure, Data Acquisition, Data Storage, Information Delivery
• Construction: Architecture, Infrastructure, Data Acquisition, Data Storage, Information Delivery
• Deployment
• Maintenance


Dimensional Nature of Business Data

[Figure: three dimensions labeled Product, Time, and Geography]

The business data of sales units (the fact) is measured and analyzed along three dimensions: product, time, and geography.


Examples of Business Facts and Dimensions

Supermarket Chain
• Fact: Sales Units
• Dimensions: Time, Promotion, Product, Store

Manufacturing Company
• Fact: Shipments
• Dimensions: Time, Cust-ship-to, Ship from, Ship Mode, Product, Deal

Airline Company
• Fact: Frequent Flyer Flights
• Dimensions: Time, Customer, Flight, Fare Class, Airport, Status

Insurance Business
• Fact: Claims
• Dimensions: Time, Agent, Claim, Insured Party, Status, Policy


Defining and Recording Information Requirements for a Data Warehouse

The Nine-Step Methodology includes the following steps:

Step 1: Choosing the process
Step 2: Choosing the grain
Step 3: Identifying and conforming the dimensions
Step 4: Choosing the facts
Step 5: Storing pre-calculations in the fact table
Step 6: Rounding out the dimension tables
Step 7: Choosing the duration of the database
Step 8: Tracking slowly changing dimensions
Step 9: Deciding the query priorities and the query modes


Step 1: Choosing The Process (Subject Area)

The process (function) refers to the subject matter of a particular data mart.

The first data mart built should be the one that is most likely to be delivered on time and within budget, and to answer the most commercially important business questions.


Subject Area

• Select the first subject area or areas to be populated
• Use the enterprise-level data model in selecting appropriate subject area(s)
• Three options:
– Implement a single subject area (best option)
– Implement a subset of a subject area
– Implement a subset of several subject areas (most common)
• Determine how much data should be loaded and its variety


Step 2: Choosing The Grain

Decide what a record of the fact table is to represent.

Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table.

Also include time as a core dimension, which is always present in star schemas. Because of disk space constraints, the data selected must be time-relevant in terms of trend, predictability, and profitability for the enterprise.
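
As a rough illustration of the grain decision, the sketch below declares that one fact row represents one product sold in one store on one day, and lets the database reject a second row at the same grain. The table and column names (sales_fact, date_key, and so on) are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical grain: one row per (date, product, store).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales_fact (
        date_key    INTEGER NOT NULL,   -- time is always one of the dimensions
        product_key INTEGER NOT NULL,
        store_key   INTEGER NOT NULL,
        sales_units INTEGER NOT NULL,
        PRIMARY KEY (date_key, product_key, store_key)
    )
    """
)
conn.execute("INSERT INTO sales_fact VALUES (20180101, 1, 10, 300)")


def insert_violates_grain(row):
    """Return True if the row duplicates an existing grain key."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True


# Same day, product, and store: rejected because the grain is already filled.
duplicate_rejected = insert_violates_grain((20180101, 1, 10, 50))
# A different day is a new grain key, so the row is accepted.
new_day_accepted = not insert_violates_grain((20180102, 1, 10, 50))
```

Making the grain the primary key is only one possible design choice; it is used here because it makes the "what does one record represent" decision explicit.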


Step 3: Identifying And Conforming The Dimensions

Dimensions set the context for asking questions about the facts in the fact table.

If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other.

A dimension used in more than one data mart is referred to as being conformed (shared).
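
The conformance rule can be checked mechanically. The following is a minimal sketch, assuming each copy of a dimension is held as a mapping from key to description; the function name is_conformed and the sample product data are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """Two copies of a dimension conform if they are identical or if
    one is a subset of the other (same keys with the same attributes)."""
    a, b = set(dim_a.items()), set(dim_b.items())
    return a <= b or b <= a


# Hypothetical product dimension as used by three data marts.
product_dim_sales = {1: "nuts", 2: "bolts", 3: "washers"}
product_dim_inventory = {1: "nuts", 2: "bolts"}       # subset: conformed
product_dim_conflicting = {1: "nuts", 2: "screws"}    # key 2 disagrees

sales_vs_inventory = is_conformed(product_dim_sales, product_dim_inventory)
sales_vs_conflicting = is_conformed(product_dim_sales, product_dim_conflicting)
```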


Step 4: Choosing The Facts

The grain of the fact table determines which facts can be used in the data mart.

Facts should be numeric and additive.

Unusable facts include:
– non-numeric facts
– non-additive facts
– facts at a different granularity from the other facts in the table
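
To see why additivity matters, here is a small sketch with made-up numbers. Units and sales dollars can be summed across rows, while a unit price cannot: adding prices gives a meaningless figure, so the price has to be recomputed from the additive facts.

```python
# Hypothetical fact rows for two stores on the same day.
rows = [
    {"store": "A", "units": 100, "sales": 300.0},
    {"store": "B", "units": 50, "sales": 250.0},
]

# Additive facts: summing across rows is meaningful.
total_units = sum(r["units"] for r in rows)
total_sales = sum(r["sales"] for r in rows)

# Non-additive fact: summing the unit prices (3.0 and 5.0) means nothing.
summed_prices = sum(r["sales"] / r["units"] for r in rows)

# The correct overall unit price is derived from the additive facts.
true_unit_price = total_sales / total_units
```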


Fact Criteria

Weight the fact attributes based upon the following criteria:

• They exhibit measurable results to the users and management.

• They are visible within the business and through management.

• They are manageable.


Subject Area

Subject areas are collections of like data that support analysis of the major subjects in a business. Selection criteria:

• They consist of two or more attributes.

• They are essential to the successful operation of the target system or business area to meet client objectives.

• They can be defined by governing business rules.


Step 5: Storing Pre-Calculations In The Fact Table

Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.


ADD DERIVED DATA

• Benefits
– Less space used
– Enhanced performance

• Example: Breaking_lease
– Percentage_of_breaking_lease (< 3 months)
– Percentage_of_breaking_lease (> 3 but < 6 months)
– Percentage_of_breaking_lease (> 6 but < 9 months)
– Percentage_of_breaking_lease (> 9 but < 12 months)
– Percentage_of_breaking_lease (> 12 months)
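
A derived field such as the lease-breaking percentages could be pre-computed along the following lines. This is only a sketch: the function name, the input format (months elapsed before each lease was broken), and the treatment of the bucket boundaries are assumptions, since the slide leaves the exact boundaries open.

```python
from collections import Counter


def lease_break_percentages(months_before_break):
    """Percentage of broken leases falling in each duration bucket.

    Assumption: each bucket includes its lower boundary, e.g. exactly
    3 months counts as "3-6".
    """
    def bucket(months):
        if months < 3:
            return "<3"
        if months < 6:
            return "3-6"
        if months < 9:
            return "6-9"
        if months < 12:
            return "9-12"
        return ">=12"

    counts = Counter(bucket(m) for m in months_before_break)
    total = len(months_before_break)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical input: five leases broken after 1, 2, 4, 7, and 13 months.
pct = lease_break_percentages([1, 2, 4, 7, 13])
```

Storing these percentages once at load time means that queries read a handful of numbers instead of rescanning every lease record.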


Add Summarization Schemes

Simple summation

Summation by group

Aggregation

Vertical summarization


Simple Summation --Add Summarization Schema

Individual Daily Sales

Date    Product   Qty   Sales $
Jan 1   Nuts      100   300
Jan 1   Nuts      200   600
Jan 2   Nuts      300   900
Jan 2   Nuts      100   300
Jan 3   Nuts       50   150
Jan 3   Nuts       40   120

Daily Sales Summary

Date    Product   Qty   Sales $
Jan 1   Nuts      300   900
Jan 2   Nuts      400   1,200
Jan 3   Nuts       90   270
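
The simple summation can be reproduced in a few lines. The sketch below uses the six individual sales rows from the slide and sums quantity and sales dollars per date and product; the variable names are illustrative only.

```python
from collections import defaultdict

# Individual daily sales rows: (date, product, qty, sales dollars).
daily_sales = [
    ("Jan 1", "nuts", 100, 300),
    ("Jan 1", "nuts", 200, 600),
    ("Jan 2", "nuts", 300, 900),
    ("Jan 2", "nuts", 100, 300),
    ("Jan 3", "nuts", 50, 150),
    ("Jan 3", "nuts", 40, 120),
]

# Sum quantity and dollars for each (date, product) pair.
summary = defaultdict(lambda: [0, 0])
for date, product, qty, amount in daily_sales:
    summary[(date, product)][0] += qty
    summary[(date, product)][1] += amount

for (date, product), (qty, amount) in summary.items():
    print(date, product, qty, amount)
```

Run on these rows, the totals are 300 units and $900 for Jan 1, 400 units and $1,200 for Jan 2, and 90 units and $270 for Jan 3.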


Summation By Group

Group data attributes based on usage and stability.

• Group stable and slowly changing data all in one table

• Group unstable and frequently changing data all in another table


Aggregation

Aggregation is used to create data marts.

For instance, a group of users frequently performs analysis comparing sales across geographic regions, broken down by product line. If a data mart were created that stores the sales data already aggregated to the desired level, the users’ queries would be simpler.


Aggregation

Add up amounts by day. In SQL: SELECT date, SUM(amt) FROM sale GROUP BY date

sale
p    store   date   amt
p1   c1      1      1
p2   c2      1      2
p1   c3      2      4
p2   c1      1      3

ans
date   sum
1      6
2      4

Roll up: from sale to ans (detail to summary). Drill down: from ans to sale (summary to detail).
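
The roll-up query can be tried directly with Python's built-in sqlite3 module. The rows below are a best reading of the slide's sale table, which is garbled in the transcript, and the column name prod is an assumption for the column the slide labels p.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prod TEXT, store TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("p1", "c1", 1, 1),
        ("p2", "c2", 1, 2),
        ("p1", "c3", 2, 4),
        ("p2", "c1", 1, 3),
    ],
)

# Roll up: collapse the product and store detail, keeping one row per date.
ans = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()
print(ans)
```

The ORDER BY clause is added only to make the row order deterministic; it is not part of the slide's query.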


Vertical Summarization

Summarization building upon a single dimensional theme:

Monthly renters
• Total # of all renters
• Total # of new renters
• Total rental income

Monthly sales
• Staff name
• Total sales $
• Total houses sold


Step 6: Rounding Out The Dimension Tables

Text descriptions are added to the dimension tables.

Text descriptions should be as intuitive and understandable to the users as possible.

The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.


Step 7: Choosing The Duration Of The Database

Duration measures how far back in time the fact table goes. It may be driven by, for example, insurance and tax considerations.

Very large fact tables raise at least two very significant data warehouse design issues:
– It is often difficult to source increasingly old data.
– It is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘Slowly Changing Dimension’ problem.


Step 8: Tracking Slowly Changing Dimensions

The slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data.

Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
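
One common way to keep multiple snapshots of a dimension is to give each snapshot its own generalized (surrogate) key and a validity period, so that old facts can be joined to the description that was in force at the time. The sketch below assumes that approach; the class, field names, and sample customer are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerVersion:
    surrogate_key: int        # generalized key: unique per snapshot
    customer_id: str          # natural (business) key, shared by snapshots
    city: str
    valid_from: int           # first year this snapshot applies
    valid_to: Optional[int]   # first year it no longer applies; None = current


# Hypothetical customer who moved from Boston to Denver in 2017.
versions = [
    CustomerVersion(1, "C42", "Boston", 2015, 2017),
    CustomerVersion(2, "C42", "Denver", 2017, None),
]


def version_for(customer_id, year):
    """Return the snapshot that was in force for the given year, if any."""
    for v in versions:
        still_valid = v.valid_to is None or year < v.valid_to
        if v.customer_id == customer_id and v.valid_from <= year and still_valid:
            return v
    return None


old_fact_version = version_for("C42", 2016)   # fact recorded in 2016
new_fact_version = version_for("C42", 2018)   # fact recorded in 2018
```

A fact row would store the surrogate key rather than the customer ID, so a 2016 sale keeps pointing at the Boston description even after the move.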


Step 9: Deciding The Query Priorities And The Query Modes

The most critical physical design issues affecting the end user’s perception include:
– physical sort order of the fact table on disk
– presence of pre-stored summaries or aggregations

Additional physical design issues include administration, backup, indexing performance, and security.


Criteria For Assessing The Dimensionality Of A Data Warehouse

Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing.

Twenty criteria divided into three broad groups: architecture, administration, and expression.


Architectural Criteria

Architectural criteria describe the way the entire system is organized.

– Explicit declaration
– Conformed dimensions and facts
– Dimensional integrity
– Open aggregate navigation
– Dimensional symmetry
– Dimensional scalability
– Sparsity tolerance


Administration Criteria

Administration criteria are considered to be essential to the ‘smooth running’ of a dimensionally-oriented data warehouse.

– Graceful modification
– Dimensional replication
– Changed dimension notification
– Surrogate key administration
– International consistency


Expression Criteria

Expression criteria are mostly analytic capabilities that are needed in real-life situations.

– Multiple-dimension hierarchies
– Ragged-dimension hierarchies
– Multiple valued dimensions
– Slowly changing dimensions
– Roles of a dimension
– Hot-swappable dimensions
– On-the-fly fact range dimension
– On-the-fly behaviour dimension


Business Requirements (Use Automaker Sales as an example)

To get an idea of the data to be used by the sales and inventory department, a facilitation session was held with 15 key end users and the IT data warehouse team. The following business questions were generated from that meeting:

What is the sales trend in quantity and dollar amounts sold each Make, Model, Series and Color for a specific dealer, for each


Matching User Requirements to DW Data Requirements (Develop Fact Table)

Primary Key
– dealer_id
– month_year
– sales_area_id
– make
– model
– series


Determine Dimensions & Attributes

Dimensions
– sales_area_dim
– sales_time_dim
– dealer_dim

Attributes
– dealer_mms_sales_qty
– dealer_mms_sales_dollar_amt
– dealer_ytd_mms_sales_qty
– dealer_ytd_mms_sales_amt
– dealer_inventory_qty


Collecting The Business Requirements via JAD Sessions

JAD (Joint Application Development) vs. Traditional Way of Gathering Requirements

JAD sessions (also called facilitated sessions) are used to gather information and feedback and to confirm the results of requirements gathering.

JAD sessions replace the traditional way of conducting a series of one-to-one interviews with the users.

Advantages: achieving consensus during the session when multiple sources of information exist; raising and addressing issues or assigning them for resolution; and immediately confirming information.


JAD Session

JAD sessions are used to scope the project. Each session should last two to three days. They are very focused and fast-paced.

JAD sessions can be very formal and follow strict guidelines or be informal group sessions.


JAD - Roles

Whether they are formal or informal, there are four necessary roles to be filled:

Facilitator: The facilitator is the session leader. It is the facilitator’s responsibility to ensure that the objectives of the sessions are met.

Scribe(s): Scribes are responsible for recording the minutes of the session and optionally constructing deliverables using an automated tool as the session progresses.


JAD - Roles

Users: The users provide knowledge specific to the scope of the project.

Developers: Developers are the team members who will be building the system.


JAD Session

The session is divided into three segments:

Introduction: welcoming remarks; a description of the facilities, such as restroom locations and message handling; reviewing the agenda; and setting expectations.

Conducting the session: confirming the deliverables set out in the session objectives.

Wrapping up the session: summarizing progress towards the objectives, reviewing the agenda for the next session, and obtaining feedback from the participants.


JAD Session

Potential drawbacks

The commitment of a large block of time for all participants

The requirements collected could be less than satisfactory, either because JAD sessions are unpredictable or because the organizational culture is not sufficiently developed to support the concerted effort needed to be productive in a JAD setting.


Five Steps in Interview Preparation

• Reading background material
• Establishing interview objectives
• Deciding when to interview
• Preparing the interviewee
• Deciding on question type and structure


Two Types of Questions – Open-Ended Questions vs. Closed Questions

Open-ended interview questions: "open" describes the interviewee’s options for responding. They are open.

Advantages:
• Putting the interviewee at ease
• Allowing more spontaneity

Disadvantages:
• Possibly losing control of the interview
• Possibly not getting the types of answers you want


Two Types of Questions – Open-Ended Questions vs. Closed Questions

Closed interview questions, such as “How many subordinates do you have?”

Benefits:
• Getting to relevant data
• Keeping control over the interview

Drawbacks:
• Failing to obtain rich detail
• Intimidating the interviewee


Three Basic Ways of Structuring Interviews

Pyramid Structure: Start with closed questions, then gradually expand into open-ended territory.

Funnel Structure: The reverse of the pyramid structure. Start with open-ended questions, then narrow down to closed ones.

Diamond-Shaped: A combination of the two structures above.


The Need for Sampling

• Containing costs
• Speeding up the data gathering
• Improving effectiveness
• Reducing bias


Sampling Design

Four steps:

• Determine the data to be collected or described
• Determine the population to be sampled
• Choose the type of sample
• Decide on the sample size


Kinds of Information Sought in Investigation

Types of hard data (other than interviewing and observation)

Quantitative Data

• Reports for decision making
• Performance reports
• Records
• Data capture forms


Kinds of Information Sought in Investigation

Qualitative Data

• Memos
• Signs on bulletin boards and in work areas
• Corporate websites
• Manuals
• Policy handbooks


The Architectural Plan (Non-Functional Requirements)

Relevant Architecture Concepts Impacted by Requirements

• Client / Server Architecture
• Data Warehouse Parallel Database Technology
• RAID Technology


Client / Server Architecture

Machine Configuration Example

• Desktop PC runs "X" windows
• HP Unix Server runs the main application logic
• Sequent Server handles the data storage and data retrieval requests

[Figure: the three machines connected in sequence, with links labeled Application Data and SQL]


Client / Server Architecture Example 1

Client / Server architecture provides flexibility to support different combinations of host machine configurations.

[Figure: three-tier configuration. Machine 1: User Interface Client. Machine 2: Application Logic Server. Machine 3: Data Server.]

Network message count between client and application logic server (query request, query results):
1 - Request
1 - Results

Network message count between application logic server and data server (SQL statements, SQL responses):
1000 - SQL statements
1000 - SQL responses
2000 - Total

A 3-tier hosting configuration for typical queries (or any process that makes a high volume of DB calls) suffers in performance due to network messaging overhead.


Client / Server Architecture Example 2

A 2-tier hosting configuration supports the performance requirements of typical queries by eliminating 99.8% of network messaging overhead.

[Figure: two-tier configuration. Machine 1: User Interface Client. Machine 2: Application Logic Server and Data Server.]

Network message count between client and machine 2 (query request, query results):
1 - Request
1 - Results

SQL statements and SQL responses are exchanged within machine 2 and do not cross the network.
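
The message counts in the two examples can be worked through with a little arithmetic. The sketch below is an illustration only: the function name is hypothetical, and it assumes one query that triggers 1000 SQL calls, as in the three-tier example.

```python
def network_messages(sql_calls, app_and_data_on_same_machine):
    """Count the messages that cross the network for one user query."""
    client_messages = 2  # one query request plus one set of query results
    if app_and_data_on_same_machine:
        # 2-tier: SQL traffic stays inside one machine.
        return client_messages
    # 3-tier: every SQL statement and every SQL response crosses the network.
    return client_messages + 2 * sql_calls


three_tier = network_messages(1000, app_and_data_on_same_machine=False)
two_tier = network_messages(1000, app_and_data_on_same_machine=True)
reduction = 1 - two_tier / three_tier
print(three_tier, two_tier, f"{reduction:.1%}")
```

Under these counts the three-tier setup sends 2002 messages and the two-tier setup sends 2, a reduction of roughly 99.9 percent. That is close to the figure quoted on the slide; the exact percentage depends on which messages are included in the count.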


Data Warehouse Parallel Database Technologies

Shared memory architecture (SMP)
– All the servers share all the data

Shared nothing architecture (MPP)
– Each server has its own partition of data


RAID Technology

• RAID 0: Sector interleave, no error checking (no redundancy)
• RAID 1: Mirroring (duplicate copy)
• RAID 2: Bit interleave with error correction codes on multiple drives
• RAID 3: Bit interleave with error correction on a single drive
• RAID 4: Sector interleave with dedicated parity drive
• RAID 5: Sector interleave, parity stored on all drives
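
Parity in RAID 4 and RAID 5 is commonly computed as a bytewise XOR across the data blocks of a stripe, which is what allows one lost block to be rebuilt from the survivors. The sketch below illustrates that idea with three tiny made-up blocks; it is not a model of any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks of two bytes each.
data = [b"\x0f\xa0", b"\x33\x55", b"\xf0\x0f"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the others
# plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
```

RAID 4 keeps all parity blocks on one dedicated drive, while RAID 5 rotates them across the drives; the XOR calculation itself is the same in both cases.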


Assignment

Explain what is meant by a business requirement.

What do you do when working on business requirements?