Capacity Management for Large Virtual Server Estates A Rationalized Approach Copyright 2014, PerfCap Corporation

Upload: darrell-williamson

Post on 18-Dec-2015


Page 1: Capacity Management for Large Virtual Server Estates

A Rationalized Approach

Copyright 2014, PerfCap Corporation

Page 2: The Capacity Planning Dilemma

• Computing style shift: complex distributed systems
• “Real CP” too expensive/complex & “Cycles are free”
• Vast estates of underutilized systems
• Real capacity planning marginalized
• Little or no capacity planning and little cost control
• Reactive capacity management
• Expensive “Black Swans”

[Diagram labels: capacity, cost, workload]

Page 3: Applications and Infrastructure

• Trade Finance
• Retail Banking
• Cash Management
• Trust & Securities
• Securities Origination, Sales, Trading
• Corporate Advisory
• FOREX Trading
• Mutual Funds Investments
• Alternative Investments (RREEF)
• Institutional Asset Management
• Insurance Asset Management
• Online Banking
• ETFs

A multitude of applications share the same IT infrastructure. Each application has its own capacity management needs.

IT managers struggle to balance costs and performance.

Page 4: PM/CP Process

Reactive analysis
• Monitor Key Performance Indicators
• Select and define KPI thresholds which suggest performance problems
• Alert and trigger investigation when KPI thresholds are crossed

Proactive management
• Attempt to predict future behavior of KPIs based on past history
• Determine risk by predicted time to failure
• Trigger investigation and corrective action in a timely fashion
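
The two halves of the PM/CP process can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the deck's tooling: the function names, the sample CPU history, the 80% threshold and the 60-day lead time are all hypothetical, and the prediction is a plain least-squares straight line over the KPI history.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept).

    Needs at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def threshold_crossed(latest_value, threshold):
    """Reactive check: has the KPI already reached its threshold?"""
    return latest_value >= threshold


def days_until_threshold(days, values, threshold):
    """Proactive check: days from the last sample until the fitted trend
    line reaches the threshold.

    Returns 0.0 if the threshold is already reached and None if the trend
    is flat or falling (no predicted crossing).
    """
    if threshold_crossed(values[-1], threshold):
        return 0.0
    slope, intercept = fit_line(days, values)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical KPI history: CPU utilization (%) sampled every 7 days.
days = [0, 7, 14, 21, 28]
cpu_util = [50.0, 52.0, 54.0, 56.0, 58.0]

remaining = days_until_threshold(days, cpu_util, threshold=80.0)
lead_time_days = 60  # assumed time needed to investigate and add capacity
needs_investigation = remaining is not None and remaining <= lead_time_days
```

With this sample history the trend reaches the threshold later than the assumed lead time, so no investigation is triggered yet. A straight-line trend is the simplest possible predictor and is used here only to show the shape of the process.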

Page 5: The Problem

How do you do capacity management for a large server estate?

• Automate monitoring of performance data
• Automate risk evaluation
• Automate timely triggers for capacity investigation
• Selectively perform in-depth capacity planning

Page 6: The Challenges

• Visualize performance, capacity and risk status of all distributed application services in a single enterprise-wide view
• Go beyond simplistic trending to projections of actual system responsiveness reflecting end-user satisfaction
• Do realistic capacity planning with limited business forecasts
• A solution that scales from 10s to 10,000s of servers

Page 7: Automated Solution

Uses new:
• Methodology - Risk Analysis
• Metric - Headroom
• Risk Visualization Format
  - Status Dashboards
  - Enterprise-wide rollup status (by service, business, etc.)
  - Transition Reports

Page 8: Automated Collection and Analysis

[Diagram labels: Internet; Analysis; CMDB; Hypervisors; Physical Servers; Storage Arrays; VMs; Array Console; Networks; Storage; Events; Trending; Clusters; Real Time; Applications; Performance/Capacity Reports; Risk Dashboards; Notifications]

Page 9: Automated Risk Analysis Using Common KPIs

[Chart: transaction response time versus time (days/weeks/months), marked with Breakthrough and Maximum levels, lead times, and the current risk status color]

Page 10: Application Performance

The key issue of application performance is responsiveness, e.g. transaction response time, batch turnaround time, end-to-end processing time, time to db update, trade execution time, etc.

Page 11: Response Time vs KPI

Page 12: Application Response Time Changes As Workload Changes

[Chart: response time (ms, 0 to 1000) versus transactions/second (0 to 10)]

Page 13: Using Trending to Determine Capacity

[Chart: response time (ms, 0 to 1000) versus transactions/second (0 to 20)]

If acceptable response time should not exceed 600 ms, then application load capacity should not exceed 19 transactions/second.

Estimated application capacity is 19 trans/sec.

Page 14: Application Performance: Reality vs Linear Trend

[Chart: response time (ms, 0 to 3000) versus transactions/second (0 to 20)]

This is the typical relationship between load and response time. After the “knee” of the curve is reached, response time degrades rapidly.

Page 15: True Application Capacity

[Chart: response time (ms, 0 to 2000) versus transactions/second (0 to 12)]

Actual application capacity is 9 trans/sec.

True capacity is not the maximum sustainable load but the maximum load with acceptable performance.
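
The gap between a linear trend and the real curve can be illustrated with a simple queueing model. The sketch below assumes a single-server M/M/1 queue, where response time is R = S / (1 - load × S); the deck does not say which model produced its curves, so this is a stand-in. The 93.75 ms service time is hypothetical, chosen only so that the model's capacity at a 600 ms limit comes out at the 9 trans/sec quoted above, and all function names are illustrative.

```python
def mm1_response_time_ms(load_tps, service_time_ms):
    """Response time of an M/M/1 queue: R = S / (1 - utilization)."""
    utilization = load_tps * service_time_ms / 1000.0
    if utilization >= 1.0:
        raise ValueError("load is at or beyond saturation")
    return service_time_ms / (1.0 - utilization)


def mm1_capacity_tps(service_time_ms, max_response_ms):
    """Largest load whose response time stays within max_response_ms.

    Solving R = S / (1 - load * S) for load gives load = 1/S - 1/R.
    """
    if max_response_ms <= service_time_ms:
        return 0.0
    return 1000.0 / service_time_ms - 1000.0 / max_response_ms


def linear_capacity_tps(load_a, load_b, service_time_ms, max_response_ms):
    """Capacity implied by a straight line through two low-load measurements."""
    r_a = mm1_response_time_ms(load_a, service_time_ms)
    r_b = mm1_response_time_ms(load_b, service_time_ms)
    slope = (r_b - r_a) / (load_b - load_a)
    return load_a + (max_response_ms - r_a) / slope


SERVICE_TIME_MS = 93.75  # hypothetical service time per transaction

true_capacity = mm1_capacity_tps(SERVICE_TIME_MS, max_response_ms=600.0)
trend_capacity = linear_capacity_tps(1.0, 5.0, SERVICE_TIME_MS,
                                     max_response_ms=600.0)
```

Here `trend_capacity` is extrapolated from measurements at 1 and 5 trans/sec, below the knee, and lands far above `true_capacity`. The exact figures depend entirely on the assumed model and service time, so they are not expected to reproduce the 19 trans/sec shown on the trending slide.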

Page 16: Capacity Headroom

Where do you want to operate?

[Chart: response time versus workload, marking Current Workload, Headroom, Operational Capacity and Saturation Point]

Response time is a function of CPU, disk, memory, adapters, etc.

Headroom is the portion of operational capacity remaining.
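
As a minimal sketch of that definition, headroom can be expressed as the unused fraction of operational capacity. The function name and the sample workload of 6 trans/sec are hypothetical; the 9 trans/sec operational capacity is borrowed from the earlier example.

```python
def headroom_fraction(current_workload, operational_capacity):
    """Fraction of operational capacity still unused (0.0 when exhausted)."""
    if operational_capacity <= 0:
        raise ValueError("operational capacity must be positive")
    remaining = operational_capacity - current_workload
    return max(0.0, remaining / operational_capacity)


# Hypothetical example: 6 trans/sec against an operational capacity of 9.
headroom = headroom_fraction(current_workload=6.0, operational_capacity=9.0)
```

In this example one third of the operational capacity remains. Operational capacity here means the workload at which response time is still acceptable, not the saturation point.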

Page 17: Headroom Risk Analysis

Page 18: Risk History Dashboard

Page 19: Capacity Risk Monitoring

• Automated Risk Analysis Computations
• Risk Status History Dashboard
• Risk Status Dashboards
• Automated Color Transition Notification

Page 20: A Tractable Solution

• Reduces the capacity planner’s workload
• Closer to real user-perceived performance
• Capacity-manage 10,000s of servers

Page 21: VIRTUALIZED INFRASTRUCTURES

Same Issues, New Complexity

Page 22: New Challenges

• New complexity
• Hierarchical views / service views
• Which systems should be virtualized to save cost?
• Performance/capacity consequences
• “What-if” provisioning scenarios

Page 23: New Level of Complexity

[Diagram: a capacity management (CaM) cycle repeated for each host (predict capacity demand, then update / rebalance host hardware) and for each VM (predict capacity demand, then update VM provisioning). Each cycle plots a metric against time, marked with a physical limit, a breakthrough threshold and lead times.]

Must do CM on both physical and virtual levels.

Page 24: Key Principle

It is essential to provide capacity management from both the perspective of each virtual machine and the perspective of the host systems on which the virtual machines operate.

Page 25: Capacity Risk (Two Perspectives)

[Diagram labels. Infrastructure views: Enterprise View, Data Centre Views, Cluster Views, Host Views, Guest Views. Service views: Service View - ERP, Service View - eMail, Service View - CRM, Service View - HR]

Page 26: Capacity Risk (Two Perspectives)

Page 27: Projected Resource View (Any Level)

[Chart: London Data Centre, CPU GHz Resource Projections, 31-Dec-2011. Vertical axis 0 to 40,000 GHz. Series: Total GHz, GHz Available, Peak GHz Used, Average GHz Used, and the projected (Proj.) value of each]

Page 28: Underutilized Systems

[Chart: Maximum 12-month CPU Utilizations (0 to 100) per server, extract from CMDB. Servers listed: PRDB-MP05-E00, PRDB-MP01-E00, PRDB-MP02-E00, PRDA-MP05-E00, PRDB-MP04-E00, PRDB-MP07-E00, PRDA-MP04-E00, PRDA-MP02-E00, PRDA-MP07-E00, PRDB-MP06-E00, PRDB-MP09-E00, PRDB-MP12-E00, PRDA-MP06-E00, PRDB-MP03-E00, PRDA-MP09-E00, PRDA-MP12-E00, PRDA-MP03-E00, PRDB-MP11-E00, PRDB-MP08-E00, PRDA-MP11-E00, PRDA-MP08-E00, PRDB-MP10-E00, PRDA-MP10-E00]

Page 29: Underutilized Risk Color Status

[Chart: metric versus time, marked with the physical limit, breakthrough threshold, lead time, underutilized threshold and the new risk color]

Use a new purple color status to identify virtualization candidates.
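
One way such a color status could be computed is sketched below. The deck names only the purple status; the red, amber and green labels, the 80% and 10% default thresholds, the function name and the ordering of the checks are all assumptions made for illustration.

```python
def risk_color(peak_utilization, days_to_breakthrough, lead_time_days,
               breakthrough=80.0, underutilized=10.0):
    """Classify one system into an illustrative risk color.

    peak_utilization     -- e.g. maximum 12-month CPU utilization, in percent
    days_to_breakthrough -- predicted days until the breakthrough threshold
                            is reached, or None if no crossing is predicted
    lead_time_days       -- time needed to act before the threshold is hit
    """
    if peak_utilization >= breakthrough:
        return "red"      # assumed label: threshold already breached
    if peak_utilization < underutilized:
        return "purple"   # underutilized: virtualization candidate
    if days_to_breakthrough is not None and days_to_breakthrough <= lead_time_days:
        return "amber"    # assumed label: breach predicted within lead time
    return "green"        # assumed label: no action needed yet


# Hypothetical estate extract: (peak utilization %, predicted days to breach).
estate = {
    "server-a": (95.0, 0.0),
    "server-b": (5.0, None),
    "server-c": (60.0, 30.0),
    "server-d": (60.0, 200.0),
}
colors = {name: risk_color(peak, days, lead_time_days=60)
          for name, (peak, days) in estate.items()}
```

Because every system reduces to a single color, the same function can feed both a status dashboard and a notification whenever a system's color changes between runs.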

Page 30: Virtualization Consequences

Page 31: Virtualization Consequences

What happens if I move VMs, re-provision VMs, clone VMs, change host hardware, etc.?
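
A very simple "what-if" check for moving a VM is sketched below. It assumes that peak CPU demands of the VMs on a host simply add up, which ignores hypervisor overhead, contention and the fact that peaks rarely coincide. The host and VM names, GHz figures and the 80% threshold are hypothetical.

```python
def host_peak_demand(placement, vm_peak_ghz):
    """Sum the peak CPU demand (GHz) of the VMs placed on each host."""
    demand = {}
    for vm, host in placement.items():
        demand[host] = demand.get(host, 0.0) + vm_peak_ghz[vm]
    return demand


def what_if_move(placement, vm, target_host):
    """Return a new placement with one VM moved; the input is not modified."""
    new_placement = dict(placement)
    new_placement[vm] = target_host
    return new_placement


def hosts_over_threshold(placement, vm_peak_ghz, host_capacity_ghz,
                         threshold=0.8):
    """Hosts whose summed VM demand exceeds threshold * physical capacity."""
    demand = host_peak_demand(placement, vm_peak_ghz)
    return sorted(host for host, capacity in host_capacity_ghz.items()
                  if demand.get(host, 0.0) > threshold * capacity)


# Hypothetical estate: two hosts, three VMs.
host_capacity_ghz = {"hostA": 40.0, "hostB": 40.0}
vm_peak_ghz = {"vm1": 20.0, "vm2": 10.0, "vm3": 8.0}
placement = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB"}

before = hosts_over_threshold(placement, vm_peak_ghz, host_capacity_ghz)
proposed = what_if_move(placement, "vm3", "hostA")
after = hosts_over_threshold(proposed, vm_peak_ghz, host_capacity_ghz)
```

In this example no host is over the threshold before the move, while consolidating vm3 onto hostA pushes that host past it. The check covers only the host perspective; a fuller analysis would also re-evaluate each VM's own headroom after the move.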

Page 32: Virtual Infrastructure CP Challenges

• Enterprise-to-host performance and capacity visibility
  - IT infrastructure servers
  - Distributed application services
• Automated performance analysis, advising and modeling
• Smooth scaling from 10s to 10,000s of servers
• “What if” modeling of vSphere clusters and services