1 chapter 5 analyzing and defining business requirements for a data warehouse paul k chen data...
DESCRIPTION
Definition of The Business Requirements u The definition of requirements is the user’s statement of how he or she wants to do business, and the information required to support his or her new methods of operations.TRANSCRIPT
1
Chapter 5
Analyzing and Defining Business Requirements for
a Data Warehouse
Paul K Chen
Data Warehouse Fundamentals
Chapter 5- Objectives Learn what is the definition of the business
requirements Understand the role of business dimensions related to
DW business requirements Learn specifically the steps in defining and recording
DW business requirements Review methods for gathering requirements (JAD,
Interviews and Sampling) Discuss briefly architecture concepts impacted by
business requirements
Definition of The Business Requirements
The definition of requirements is the user’s statement of how he or she wants to do business, and the information required to support his or her new methods of operations.
Definition of The Business Requirements
The requirements can be broadly divided into two areas:
1 Functional requirements—written in user terminology since it is user operations that are being described.
2 Non-functional requirements –these are the limitations and demands imposed upon the computing solutions; such as architectural plan, data storage specifications and information system performance expectations.
Requirements As the Driving Force for Data Warehousing
Understand why business requirements are the driving force
Discuss how requirements drive every development phase
Specifically Learn how requirements influence data design
Review the impact of requirements on architecture Note the special considerations for ETL and metadata Examine how requirements shape information delivery
Business Requirements As the Driving Force
Business Requirements
Planning &Management
DesignArchitecture Infrastructure
Data AcquisitionData Storage
Information Delivery
ConstructionArchitecture Infrastructure
Data AcquisitionData Storage
Information Delivery
Maintenance
Deployment
Dimensional Nature of Business Data
Product
TimeGeography
The business data of sales units (fact) is measured and analyzed in three dimensional.
Examples of Business Facts and Dimensions
Supermarket ChainSupermarket Chain Manufacturing Company
Airline CompanyInsurance Business
Time Promotion
ProductStoreShipment
Time Cust-ship-toShip from
Ship ModeProductDeal
Time Customer
Flight
Fare Class
AirportStatus
Sale Unit
FrequentFlyer Flights
Time Agent
ClaimClaims Insured
PartyStatus Policy
Defining and Recording Information Requirements for a Data Warehouse
Nine-Step Methodology includes the following steps:Step 1: Choosing the process Step 2: Choosing the grain Step 3: Identifying and conforming the dimensions Step 4: Choosing the facts Step 5: Storing pre-calculations in the fact table Step 6: Rounding out the dimension tables Step 7: Choosing the duration of the database Step 8: Tracking slowly changing dimensions Step 9: Deciding the query priorities and the query
modes.
Step 1: Choosing The Process (Subject Area)
The process (function) refers to the subject matter of a particular data mart.
First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
Subject Area
Selecting the first subject area or areas to be populated Use the enterprise level data model in selecting
appropriate subject area(s) Three Options: -- Implement a single subject area (best option) -- Implement a subset of a subject area -- Implement a subset of several subject areas (most
common)• Determine how much data should be loaded and its
variety
Step 2: Choosing The Grain
Decide what a record of the fact table is to represent.
Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table.
Also include time as a core dimension, which is always present in star schemas. Due to disk space constraint, data selected must be time relevant in terms of trend, predictability, and profitability for the enterprise.
Step 3: Identifying And Conforming The Dimensions
Dimensions set the context for asking questions about the facts in the fact table.
If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other.
A dimension used in more than one data mart is referred to as being conformed (shared).
Step 4: Choosing The Facts
The grain of the fact table determines which facts can be used in the data mart.
Facts should be numeric and additive.
Unusable facts include:– non-numeric facts– non-additive facts– fact at different granularity from other facts in
table.
Fact Criteria
Weight the Fact attributes based upon the following criteria: They exhibit measurable results to the Users and
Management.
• They are visible within the business and through management.
• They are manageable.
Subject Area
Subject areas are collections of like data that support analysis of the major subjects in a business. Election criteria:
They consist of two or more attributes.
• They are essential to the successful operation of the target system or business area to meet client objectives.
• They can be defined by governing business rules.
Step 5: Storing Pre-Calculations In The Fact Table
Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
ADD DERIVED DATA
• Benefits Less space used Enhanced performance Breaking_lease Percentage_of_breaking_lease (< 3_months) Percentage_of_breaking_lease (>3 but < 6 months) Percentage_of_breaking_lease (>6 but <9 months) Percentage_of_breaking_lease (>9 but <12 months) Percentage_of_breaking_lease (> 12 months)
Add Summarization Schemes
Simple summation
Summation by group
Aggregation
Vertical summarization
Simple Summation --Add Summarization Schema
Individual Daily salesDate Product Qty Sales $Jan 1 nuts 100 300Jan 1 nuts 200 600
Jan 2 nuts 300 900Jan 2 nuts 100 300
Jan 3 Nuts 50 150Jan 3 Nuts 40 120
Daily Sales Summary
Date Product Qty Sales $Jan 1 Nuts 300 900Jan 2 Nuts 400 1,200Jan 3 Nuts 90 20
Summation By Group
Group data attributes based on usage and stability.
• Group stable and slowly changing data all in one table
• Group unstable and frequently changing data all in another table
Aggregation
Aggregation is used to create data marts.
For instance, a group of users frequently perform analysis comparing sales across geographic regions, broken by product line. If a data mart were created that stores the sales data already aggregated to the desired level, the users’ queries would be simpler.
Aggregation
Add up amounts by dayin sql: SELECT date, sum (amt) FROM SALE GROUP BY date
sale p Storedateamt
p1p2p1p2
c1c2c3c1
1121
1243
ans
date sum
12
64
Roll UpDrill Down
Vertical Summarization
Summarization building upon a single dimensional theme: Monthly renters• Total # of all renters • Total # of new renters• Total rental income Monthly sales• Staff name• Total sales $• Total houses sold
Step 6: Rounding Out The Dimension Tables
Text descriptions are added to the dimension tables.
Text descriptions should be as intuitive and understandable to the users as possible.
Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
Step 7: Choosing The Duration Of The Database
Duration measures how far back in time the fact table goes. For ex. Insurance &Tax Considerations.
Very large fact tables raise at least two very significant data warehouse design issues. – Often difficult to source increasing old data. – It is mandatory that the old versions of the
important dimensions be used, not the most current versions, known as the ‘Slowly Changing Dimension’ problem.
Step 8: Tracking Slowly Changing Dimensions
Slowly changing dimension problem means that the proper description of the old dimension data must be used with old fact data.
Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
Step 9: Deciding The Query Priorities And The Query Modes
Most critical physical design issues affecting the end-user’s perception includes:– physical sort order of the fact table on disk– presence of pre-stored summaries or aggregations.
Additional physical design issues include administration, backup, indexing performance, and security.
Criteria For Assessing The Dimensionality Of A Data Warehouse
Criteria proposed by Ralph Kimball to measure the extent to which a system supports the dimensional view of data warehousing.
Twenty criteria divided into three broad groups: architecture, administration, and expression.
Architectural CriteriaArchitectural criteria describes way the entire system is organized.
– Explicit declaration – Conformed dimensions and facts – Dimensional integrity– Open aggregate navigation– Dimensional symmetry– Dimensional scalability– Sparsity tolerance
Administration CriteriaAdministration criteria are considered to be essential to the ‘smooth running’ of a dimensionally-oriented data warehouse.
– Graceful modification – Dimensional replication – Changed dimension notification– Surrogate key administration– International consistency
Expression CriteriaExpression criteria are mostly analytic capabilities that are needed in real-life situations.
– Multiple-dimension hierarchies – Ragged-dimension hierarchies– Multiple valued dimensions– Slowly changing dimensions – Roles of a dimension– Hot-swappable dimensions– On-the-fly fact range dimension– On-the-fly behaviour dimension
Business Requirements (Use Automaker Sales as an example)
In order to get an idea of the data to be used by the sales andInventory department, a facilitation session was held with15 key end users and the IT data warehouse team. The following business questions were generated from that meeting:
What is the sales trend in quantity and dollar amounts sold each Make, Model, Series and Color for a specific dealer, for each
Matching User Requirements to DW Data Requirements (Develop Fact Table)
Primary Key dealer_id month_year sales_area_id make model series
Determine Dimensions & Attributes
Dimensions sales_area_dim sales_time_dim dealer_dimAttributes dealer_mms_sales_qty dealer_mms_sales_dollar_amt dealer_ytd_mms_sales_qty dealer_ytd_mms_sales_amt dealer_inventory_qty
Determine Dimensions & Attributes
Collecting The Business Requirements via JAD Sessions
JAD (Joint Application Development) vs. Traditional Way of
Gathering Requirements JAD sessions (also called facilitated session) are used to gather
information and feedback and confirm the results of requirements gathering.
JAD sessions replace the traditional way of conducting a series of interviews on a one-to-one basis with the users.
Advantages: Achieving consensus during the session when multiple sources of information exist, raising and addressing issues or assigning them for resolution, and immediately confirming information.
JAD Session
JAD sessions are used to scope the project. Each session should last two to three day. They are very focused and fast-paced.
JAD sessions can be very formal and follow strict guidelines or be informal group sessions.
JAD - Roles
Whether they are formal or informal, there are four necessary roles to be filled:
Facilitator The Facilitator is the session leader. It is the facilitator’s responsibility to ensure that the objectives of the sessions are met.
Scribes(s)Scribes are responsible for recording the minutes of the session and optionally constructing deliverables using an automated tool as the session progresses.
JAD - Roles
User The users provide knowledge specific to the scope of the project.
DevelopersDevelopers are the team members who will be building the system.
JAD Session
The session is divided into three segments:
Introduction: Welcoming remarks; description of the facilities such as rest room locations, messages, reviewing the agenda and setting expectations.
Conducting the session: To confirm deliverables set out in the session objectives.
Wrapping up the session: By summarizing progress towards the objectives; reviewing the agenda for the next one and obtaining feedback from the participants.
JAD Session
Potential drawbacks
The commitment of a large block of time for all participants
Requirements collected could be less than satisfactory due to unpredictability of the JAD session or organizational culture not sufficiently developed to enable the concerted efforts required to be productive in a JAD setting.
Five Steps in Interview Preparation Reading background material Establishing interview objectives
Deciding when to interview
Preparing the interviewee
Deciding on question type and structure
Two Types of Questions – Open-End Questions vs. Closed Questions
Open-ended interview questions Open describes the interviewee’s options for responding.They are open.
Advantages: Putting the interviewee at ease Allowing more spontaneity
Disadvantages: Possibly losing control of the interview May not get the types of answers you want
Two Types of Questions – Open-End Questions vs. Closed Questions
Closed interview questions Such as “ How many subordinates do you have?
Benefits: Getting to relevant data Keeping control over the interview
Drawbacks: Failing to obtain rich detail Intimidating the interviewee
Three Basic Ways of Structuring Interviews
Pyramid Structure: Starting from closed questions, then gradually expand into open territory.
Funnel Structure: The reverse of pyramid structure approach.
Diamond-Shaped: A combination of the two above structures.
The Needs for Sampling Containing costs
Speeding up the data gathering
Improving effectiveness
Reducing bias
Sampling Design
Four steps: Determine the data to be collected or described
Determine the population to be sampled
Choose the type of sample
Decide on the sample size
Kinds of Information Sought in Investigation
Type of hard data (other than interviewing and observation) - Quantitative Data
Reports for decision making Performance reports Records Data capture forms
Kinds of Information Sought in Investigation
Qualitative Data
Memos Signs in bulletin boards and in work areas Corporate websites Manuals Policy handbooks
The Architectural Plan (Non-Functional Requirements)
• Client / Server Architecture• Data Warehouse Parallel Database Technology• RAID Technology
Relevant Architecture Concepts Impacted by Requirements
Client / Server Architecture
Desktop PC runs "X" windows
HP Unix Server runs the main application logic
Sequent Server handles the data storage and data retrieval requests
Application Data
SQL
Machine Configuration Example
Client / Server architecture provides flexibility to support different combinations of host machine configurations.
User Interface Client
ApplicationLogicServer
Data Server
machine 2machine 1 machine 3
query request
Network Message Count
1 - Request1 - Results
Network Message Count
1000 - SQL statements1000 - SQL responses__________________2000 - Total
SQL statements
SQL responsesquery results
Client / Server ArchitectureExample 1
3 Tier hosting configuration for Typical Queries (or any process that makes a high volume of DB calls) suffers in performance due to Network Messaging overhead.
2 Tier hosting configuration supports the performance requirements of Typical Queries by eliminating 99.8% of Network Messaging overhead.
User Interface Client
ApplicationLogicServer
Data Server
machine 2machine 1
query request
Network Message Count
1 - Request1 - Results
SQL statements
SQL responsesquery results
Client / Server ArchitectureExample 2
Data Warehouse Parallel Database Technologies
Shared memory architecture (SMP)– All the servers share all the data
Shared nothing architecture (MPP)– Each server has its own partition of data
RAID Technology RAID 0: Sector interleave, no error checking (no
redundancy) RAID 1: Mirroring (duplicate copy) RAID 2: Bit interleave with error correction codes on
multiple drives RAID 3: Bit interleave with error correction on single
drive RAID 4: Sector interleave with dedicated parity drive RAID 5: Sector interleave, parity stored on all drives
Tugas
Jelaskan apa yang dimaksud dengan bussines requirement?
Apa yang Anda lakukan dalam bussiness requirement?