BI Capacity Planning
Posted on 19-Jun-2015
DESCRIPTION: Traditional operational capacity planning is not the same as BI capacity planning. I created this presentation to help establish a BI infrastructure capacity planning process.
- 1. BI Infrastructure Capacity Planning Approach, July 2009, Michael Thompson [email_address]
2. Background BI Capacity Planning
- In the past, Business Intelligence workloads were viewed as non-essential, discretionary work and were consequently often given a low priority when planning computing resource requirements. Today, however, we have come to view BI systems as equal in importance to operational systems. This presents challenges.
- Traditional operational capacity planning is not the same as BI capacity planning. Workloads in operational systems tend to grow predictably, in a linear fashion, whereas BI workloads are less predictable. In summary, the differences are:
TRADITIONAL WORKLOAD vs. BI WORKLOAD
- Traditional: Small units of work with consistent, usually very short, elapsed times. BI: Units of work are heterogeneous in nature, with elapsed times varying from sub-second to many hours.
- Traditional: Predictable access paths to the data, using direct index lookups with virtually no scanning of large numbers of records. BI: Unpredictable access paths, sometimes using indexes but frequently scanning very large volumes (gigabytes and terabytes) of data.
- Traditional: Very small answer sets (often a single row), requiring little I/O. BI: Very large answer sets (millions of rows) are common, requiring heavy concurrent I/O that affects elapsed times and resource consumption.
- Traditional: Simple SQL, easily optimized and rarely requiring parallelism. BI: Frequently complex SQL that is difficult to optimize and heavily dependent on parallelism, capable of consuming all available resources for extended periods of time.
- Traditional: Users have little or no control over the SQL that is executed. BI: Users generate their own SQL, with unpredictable selections.
3. Capacity Planning Methodology
4.
- The Goals of the BI Capacity Planning Exercise:
- Most systems respond to increased load with some degree of decreasing performance. A system's ability to accept higher load is called scalability, and modifying a system to handle a higher load is synonymous with performance tuning. The goals of the BI Capacity Planning Exercise are:
- Determine current system scalability needs
- Proactively identify potential problematic user activity and query performance issues
- Optimize the data warehouse, systems, and environment based on user and application activity
- Develop future state view of expected growth activities (user, query, data)
- Conducting a BI Capacity Planning Exercise is a five-step process that consists of:
- Step 1: Assemble BI Capacity Team
- Step 2: Profiling
- Step 3: Develop Growth Model
- Step 4: Conduct Sizing Model / Estimation Exercise
- Step 5: Recommendation
Approach: BI Capacity Planning
5.
- Step 1: Assemble the BI Capacity Planning Team
- Different professionals are involved in the Capacity Planning Exercise.
- Hardware Capacity Planner: responsible for developing the hardware capacity plan
- Storage Specialist: understands the storage requirements of the warehouse and data marts
- Database Administrator: uses formulas to calculate table sizes and optimize the data architecture
- Developer: conducts the proof-of-concept work
- LOB Representative: develops the current profile of the user base and expected growth goals
Approach: BI Capacity Planning
6.
- Step 2: Profiling
- The first step in capacity planning is to create a system profile. Generally, to produce a credible capacity forecast, you should plan for the peaks rather than the average times of resource consumption. The following areas are reviewed during peak times:
- System characterization: create a high-level profile of the capacity requirements of the BI environment. This identifies the analysis period to be used as the basis for capacity planning of a workload.
- Workload characterization: determine the current workload at peak processing
- Data Profile: quantify the relationship between the data and the disk installed; determine the amount of raw data in the database subsystem at peak processing times
- User Profile: develop the current profile of the user base:
- a) operational user
- b) analytic user
- c) data miner
- Query Profile: profile high-frequency queries and categorize query types as trivial, small, medium, or large
- Data Usage Patterns: profile the most actively used data elements and characterize them as trivial, small, medium, or high usage
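The query-profile step above can be sketched as a simple classifier over a peak-hour query log. The elapsed-time thresholds below are illustrative assumptions for the sketch, not values from this presentation.

```python
from collections import Counter

# Illustrative elapsed-time cut-offs (seconds) per query class.
# These thresholds are assumptions, not standard values.
THRESHOLDS = [
    ("trivial", 1.0),
    ("small", 10.0),
    ("medium", 300.0),
]  # anything slower is classified "large"

def classify(elapsed_seconds):
    """Map a query's elapsed time to a profile category."""
    for name, limit in THRESHOLDS:
        if elapsed_seconds < limit:
            return name
    return "large"

def query_profile(elapsed_times):
    """Count queries per category, e.g. from a peak-hour query log."""
    return Counter(classify(t) for t in elapsed_times)

# Hypothetical elapsed times (seconds) sampled from a peak hour.
sample = [0.2, 0.5, 3.0, 45.0, 1200.0, 0.8, 7.5, 600.0]
print(query_profile(sample))
# → Counter({'trivial': 3, 'small': 2, 'large': 2, 'medium': 1})
```

The same shape of classifier works for the data-usage-pattern categories, substituting access counts for elapsed times.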
Approach: BI Capacity Planning
- Step 3: Develop the growth model
- When considering the growth of a BI workload, the planning model considers:
- Data Growth: In general, data growth has significantly different impacts on the CPU requirements of the various query types in the workload. Trivial and small queries are hardly affected by data growth, since they often use highly efficient access paths to locate the data in the tables. Large and x-large queries will most likely scan more data to locate the needed records and are significantly impacted as the data grows.
- User Growth: The type of user growth is as important as the number. Simple figures, such as 20% user growth, are not meaningful without understanding where those users fit in the query profile spectrum: a) operational users, b) analytic users, c) data mining users. The growth profile must also take into consideration that an operational user may migrate to become an analytic user or data mining user.
- Query Growth: The number of queries executed is related to the users that have access to the system, and it is that relationship that determines the increase in workload and processing requirements. It is important to discuss this with the end-user department representative to establish the user-query relationship.
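The growth factors above can be combined into a simple projection. The baseline CPU shares and data-growth sensitivities below are made-up illustrations; the deck's point is only that large, scan-heavy queries are far more sensitive to data growth than trivial index lookups.

```python
# Project relative CPU demand under data and user growth, per query class.
# All numbers here are illustrative assumptions, not measured values.

# Fraction of today's CPU consumed by each query class (assumed).
baseline_cpu = {"trivial": 0.10, "small": 0.20, "medium": 0.30, "large": 0.40}

# How CPU cost scales with data volume: cost ~ data_growth ** exponent.
# Index lookups (trivial/small) are nearly flat; full scans (large)
# scale roughly linearly with the data.
data_sensitivity = {"trivial": 0.1, "small": 0.3, "medium": 0.7, "large": 1.0}

def projected_cpu(data_growth, user_growth):
    """Relative CPU demand (1.0 = today) after growth.

    data_growth: e.g. 1.5 for 50% more raw data.
    user_growth: e.g. 1.2 for 20% more users, assumed here to scale
    query counts uniformly across classes.
    """
    total = 0.0
    for cls, share in baseline_cpu.items():
        total += share * user_growth * data_growth ** data_sensitivity[cls]
    return total

# 50% data growth combined with 20% user growth:
print(round(projected_cpu(1.5, 1.2), 2))
# → 1.59
```

Refining the user-growth term per class (operational vs. analytic vs. data mining users, including migration between them) follows directly from the user-growth discussion above.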
9. Approach: BI Capacity Planning
- Step 4: Sizing Modeling / Estimation
- Using the modeling inputs from the previous two steps, a sizing estimate of the environment is produced. Two methods are used:
- Estimator Tools (DB2 Estimator, Appfluent)
- These predictive modeling tools take the inputs from the previous steps and estimate the CPU cost and elapsed time associated with executing specific SQL statements under various scenarios.
- Proof of Concept
- Using the ideas gathered during the profiling review, the proof of concept tests them against actual data and stress-test scenarios. The proof of concept is the best method of validating the expected outcome.
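A back-of-the-envelope version of the estimation step can be sketched as peak CPU-seconds demanded versus CPU-seconds available. The query counts, per-class CPU costs, and utilization target below are invented for illustration; a real exercise would use measured profiles from tools such as DB2 Estimator or Appfluent.

```python
# Rough sizing: CPU-seconds demanded in the peak window vs. what one
# core can supply at the target utilization. All inputs are assumptions.

PEAK_WINDOW_SECONDS = 3600   # size for the busiest hour, not the average
TARGET_UTILIZATION = 0.70    # leave headroom above 70% busy

# Queries expected in the peak hour, and average CPU-seconds each (assumed).
peak_queries = {"trivial": 5000, "small": 800, "medium": 120, "large": 15}
cpu_seconds  = {"trivial": 0.05, "small": 2.0, "medium": 60.0, "large": 900.0}

def cores_required():
    """Cores needed to serve the peak hour at the target utilization."""
    demand = sum(peak_queries[c] * cpu_seconds[c] for c in peak_queries)
    usable_per_core = PEAK_WINDOW_SECONDS * TARGET_UTILIZATION
    return demand / usable_per_core

print(f"peak demand needs {cores_required():.1f} cores")
# → peak demand needs 8.9 cores
```

Comparing this figure against the installed core count is what turns the estimate into a recommendation in Step 5.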
10. Execution Timeline
- Planning
- Profiling
- Develop Growth Formulas
- Estimator Tools
- Results
- Recommendation
- Future State Definition
- Proof of Concept Work