BI Infrastructure Capacity Planning Approach July 2009 Michael Thompson [email protected]

Upload: mstmike

Post on 19-Jun-2015


DESCRIPTION

Traditional operational views of capacity planning are not the same as BI capacity planning. I created this presentation to help establish a BI infrastructure capacity planning process.

TRANSCRIPT

Page 1: Bi Capacity Planning

BI Infrastructure Capacity Planning Approach

July 2009

Michael Thompson

[email protected]

Page 2: Bi Capacity Planning

Background – BI Capacity Planning

In the past, Business Intelligence workloads were viewed as non-essential, discretionary work and consequently have often been given a low priority when planning computing resource requirements. Today, however, we have come to view BI systems as equal in importance to operational systems. This presents challenges.

Traditional operational views of capacity planning are not the same as BI capacity planning. Traditional workloads in operational systems tend to grow predictably, in a linear fashion, whereas BI workloads are less predictable. In summary, the differences are:

TRADITIONAL WORKLOAD | BI WORKLOAD

Small units of work with consistent elapsed times (which are usually very short) | Units of work are heterogeneous in nature, varying in elapsed times from sub-second to many hours

Predictable access paths to the data, using direct index lookup with virtually no scanning of large numbers of records | Unpredictable access paths, sometimes using indices but frequently scanning very large volumes (gigabytes and terabytes) of data

Very small answer sets (often a single row), requiring little I/O | Very large answer sets (millions of rows) are common, requiring a lot of concurrent I/O that affects elapsed times and resource consumption

Simple SQL, easily optimized and rarely requiring parallelism | Frequently complex SQL that is difficult to optimize and heavily dependent on parallelism, capable of consuming all available resources for extended periods of time

Users have little or no control over the SQL that is executed | Users generate their own SQL, with unpredictable selections

Page 3: Bi Capacity Planning

Capacity Planning Methodology

Page 4: Bi Capacity Planning

The Goals of the BI Capacity Planning Exercise:

Most systems will respond to increased load with some degree of decreasing performance. A system's ability to accept higher load is called scalability, and modifying a system to handle a higher load is synonymous with performance tuning. The following are the goals of the BI Capacity Planning Exercise:

– Determine current system scalability needs
– Proactively identify problematic user activity and potential query performance issues
– Optimize the data warehouse, systems, and environment based on user and application activity
– Develop future state view of expected growth activities (user, query, data)

Conducting a BI Capacity Planning Exercise is a five-step process that consists of:

Step 1: Assemble BI Capacity Team
Step 2: Profiling
Step 3: Develop Growth Model
Step 4: Conduct Sizing Model / Estimation Exercise
Step 5: Recommendation

Approach: BI Capacity Planning

Page 5: Bi Capacity Planning

Step 1: Assemble the BI Capacity Planning Team

Different professionals are involved in the Capacity Planning Exercise.

– Hardware Capacity Planner – responsible for developing the hardware capacity plan
– Storage Specialist – understands the storage requirements of the warehouse and data marts
– Database Administrator – uses formulas to calculate the size of tables and optimize the data architecture
– Developer – conducts the proof of concept work
– LOB Representative – develops the current profile of the user base and expected growth goals


Page 6: Bi Capacity Planning

Step 2: Profiling

The first step to capacity planning is to create a system profile. Generally, to produce a credible capacity forecast, you should plan for the peaks rather than the “average” times of resource consumption. The following areas will be reviewed during peak times:

– System characterization – create a high-level profile of the capacity requirements of a BI environment. This identifies the analysis period to be used as a basis for capacity planning of a workload.
– Workload characterization – determine the current peak processing workload
– Data Profile – quantify the relationship between the data and the installed disk, and determine the amount of raw data in the database subsystem at peak processing times
– User Profile – develop current profile of user base:
a) operational user
b) analytic user
c) data miner
– Query Profile – profile high frequency queries and categorize query types by: trivial, small, medium, large
– Data Usage Patterns – profile the most actively used data elements and characterize data elements by: trivial, small, medium, high usage patterns
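
The query profile and the peak-versus-average point above can be sketched in a few lines of Python. This is only an illustration: the elapsed-time thresholds, the function names, and the sample figures are assumptions made for the example, not values from the presentation, and real cut-offs would come from the site's own query logs.

```python
from collections import Counter

# Hypothetical upper limits (seconds) for the query classes named in the
# profile. These cut-offs are assumptions; derive real ones from query logs.
QUERY_CLASS_LIMITS = [
    ("trivial", 1.0),
    ("small", 60.0),
    ("medium", 600.0),
]


def classify_query(elapsed_seconds):
    """Return the query class for one query based on its elapsed time."""
    for name, upper_limit in QUERY_CLASS_LIMITS:
        if elapsed_seconds < upper_limit:
            return name
    return "large"


def query_profile(elapsed_times):
    """Count how many queries fall into each class."""
    return Counter(classify_query(t) for t in elapsed_times)


def peak_and_average(hourly_cpu_percent):
    """Return (peak, average) CPU utilization over the analysis period."""
    peak = max(hourly_cpu_percent)
    average = sum(hourly_cpu_percent) / len(hourly_cpu_percent)
    return peak, average


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample_elapsed = [0.2, 0.8, 12.0, 45.0, 300.0, 5400.0]
    print(query_profile(sample_elapsed))

    sample_cpu = [20, 25, 30, 85, 90, 40]
    peak, average = peak_and_average(sample_cpu)
    print(f"peak={peak}%, average={average:.1f}%")
```

In the made-up sample, the average utilization is roughly half the peak, which is why sizing on the average would understate what the system needs during its busiest hour.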


Page 7: Bi Capacity Planning


Page 8: Bi Capacity Planning


Step 3: Develop the growth model

When considering the growth of a BI workload, the planning model will consider:

– Data Growth – In general, growing the data will have significantly different impacts on the CPU requirements of the various types of queries in the workload. Trivial and small queries are hardly impacted by data growth, since they often use highly efficient access paths to locate the data in the tables. Large and x-large queries will most likely scan more data to locate the needed records and are significantly impacted when the data grows.

– User Growth – The type of user growth is as important as the number. Simple numbers, like 20% growth of users, are not enough without understanding where the new users fit in the query profile spectrum: a) operational users, b) analytic users, c) data mining users. Also, the growth profile must take into consideration that an operational user may become an analytic user or a data mining user.

– Query Growth - The number of queries executing is related to the users that have access to a system, and it is that relationship that determines the increase in the workload and processing requirements. It is important to discuss this with the end-user department representative, to establish the user-query relationship.
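
The three growth dimensions above can be combined in a small planning sketch. Everything numeric here is an illustrative assumption (user counts, queries per user, and the data-growth sensitivity of each query class), and the function names are hypothetical. The point is only to show why a single "20% more users" figure is not enough.

```python
# All figures below are illustrative assumptions, not measurements.
CURRENT_USERS = {"operational": 200, "analytic": 50, "data_miner": 5}
QUERIES_PER_USER_PER_DAY = {"operational": 20, "analytic": 10, "data_miner": 2}

# Assumed fraction of data growth that shows up as extra CPU per query class:
# trivial and small queries use efficient access paths, large ones scan.
DATA_GROWTH_SENSITIVITY = {"trivial": 0.0, "small": 0.1, "medium": 0.5, "large": 1.0}


def project_users(users, growth_rates, migrations=()):
    """Apply per-profile growth, then move users between profiles.

    growth_rates: fractional growth per profile, e.g. 0.20 for 20%.
    migrations: (from_profile, to_profile, count) tuples.
    """
    projected = {
        profile: round(count * (1 + growth_rates.get(profile, 0.0)))
        for profile, count in users.items()
    }
    for source, target, count in migrations:
        moved = min(count, projected[source])
        projected[source] -= moved
        projected[target] += moved
    return projected


def daily_queries(users, queries_per_user):
    """Queries per day by user profile (the user-query relationship)."""
    return {profile: count * queries_per_user[profile] for profile, count in users.items()}


def cpu_scaling_for_data_growth(data_growth):
    """CPU multiplier per query class for a fractional data growth."""
    return {
        query_class: 1 + data_growth * sensitivity
        for query_class, sensitivity in DATA_GROWTH_SENSITIVITY.items()
    }


if __name__ == "__main__":
    growth = {"operational": 0.20, "analytic": 0.20, "data_miner": 0.20}
    # Assume 10 operational users become analytic users.
    future_users = project_users(CURRENT_USERS, growth, [("operational", "analytic", 10)])
    print(future_users)
    print(daily_queries(future_users, QUERIES_PER_USER_PER_DAY))
    print(cpu_scaling_for_data_growth(0.5))
```

The queries-per-user figures are exactly the kind of input the end-user department representative would need to confirm before the projection is trusted.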

Page 9: Bi Capacity Planning


Step 4: Sizing Modeling / Estimation

Using the modeling inputs from the previous two steps, a sizing estimation of the environment will be conducted. Two methods will be used:

Estimator Tools (DB2 Estimator, Appfluent)

Two predictive modeling tools will use the inputs of the previous steps to estimate the CPU cost and elapsed time associated with the execution of specific SQL statements under various scenarios.

Proof of Concept

Using the ideas gathered during the profiling review, the Proof of Concept tests those ideas using actual data and stress test scenarios. The proof of concept is the best method of testing the expected outcome.
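
As a rough cross-check on either method, the arithmetic behind a sizing estimate can be written down directly. The sketch below is a back-of-envelope calculation, not a stand-in for the estimator tools or the proof of concept. The 70% target utilization, the function names, and the sample numbers are assumptions for illustration.

```python
import math


def projected_peak(current_peak_cpu_seconds, workload_growth):
    """Scale the measured peak-hour CPU demand by a fractional growth rate."""
    return current_peak_cpu_seconds * (1 + workload_growth)


def required_cpus(peak_cpu_seconds_per_hour, target_utilization=0.7):
    """CPUs needed so peak-hour demand stays at or below target utilization.

    One CPU supplies 3600 CPU-seconds per hour; only target_utilization of
    that is treated as usable, leaving headroom for unpredictable queries.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_seconds_per_cpu = 3600 * target_utilization
    return math.ceil(peak_cpu_seconds_per_hour / usable_seconds_per_cpu)


if __name__ == "__main__":
    # Made-up inputs: 10,000 CPU-seconds in the peak hour, 50% growth.
    current_peak = 10_000
    future_peak = projected_peak(current_peak, 0.5)
    print(required_cpus(current_peak))
    print(required_cpus(future_peak))
```

The growth rate fed into `projected_peak` would come from the growth model in Step 3, and the measured peak from the profiling in Step 2.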

Page 10: Bi Capacity Planning

Execution Timeline

[Figure: execution timeline. Phase labels: Planning; Profiling; Develop Growth Formulas; Estimator Tools Results; Proof of Concept Work; Future State Definition; Recommendation]