© 2009 IBM Corporation

AIX Rightsizing

Clea Zolotow, Senior Technical Staff Member, IBM Corporation

Nicholas Lydakis, Manager, Capacity Planning, WellPoint Corporation

June 3, 2011

ABSTRACT

There are many ways to reduce cost in a datacenter. One of the easiest ways to decrease costs is to decrease the number of servers on the floor. Now, along with physical consolidation, we can logically simplify the datacenter by utilizing virtualization.

Some technical barriers to virtualization are: performance concerns (workloads competing for resources); growth concerns (workloads cannot reserve space for growth); and architectural constraints (servers run out of IO or memory before they run out of CPU).

This presentation provides a mass analysis methodology to address performance concerns, growth concerns, and architectural constraints, as well as methodologies that can be used to coadunate LPARs to achieve higher utilization rates at the hardware level.

This methodology has been quite successful at IBM. Our largest savings were a run rate of $2.4 million per year in hardware, plus $2 million in software savings due to decreased engine utilization.

Virtualization = Infrastructure Simplification

Efficient virtualization provides the best ROI and minimizes the risk.

Complex (Windows, Unix, and Linux servers; management servers; storage; networking)
– 1 workload per server
– Manual provisioning
– No sharing
– Vertical silos
– Disparate mgmt tools
– Multiple sites

Physical Consolidation (Windows, Unix, and Linux servers; SAN; networking)
– Fewer sites
– Use of larger servers / SANs
– Mostly environmental savings
– Disparate management tools
– Labor-intense provisioning
– Workload mgmt and isolation issues

Virtualization: Logical Simplification (virtual servers, storage, networks)
– Multiple virtual servers (OSs) per physical server
– Significant savings: fewer servers, higher utilization
– Rapid “provisioning”
– Automatic workload mgmt
– Preserve logical “server to application” relations

Virtualization’s popularity today is based on its ability to optimize IT

Virtualization has been around for decades

And it is here to stay

Large and small organizations alike are rapidly adopting the technology

Virtualization motivators

Reduce costs 57%

Simplify IT infrastructure and administration 48%

Increase server utilization 48%

Increase scalability of infrastructure 29%

Enhance resiliency and reliability 25%

Improve application performance 15%

Manage a heterogeneous server environment 9%

Source: IBM Systems and Technology Group (1Q06)

Why do organizations adopt virtualization?

For reasons that range from reduced IT costs to simplified IT environments, streamlined management and increased IT flexibility

Each Workload is Evaluated for Suitability Based on Technical Attributes

Priority Workloads for Consolidation:

WebSphere® applications

Domino® Applications

Selected tools: Tivoli®, WebSphere® and internally developed

WebSphere MQ

DB2® Universal Database™

Current Mid-Range Server Location by State – Physical Consolidation Opportunities still exist!

Chart: Unix and Intel by Location

States shown: California, Colorado, Connecticut, Georgia, Illinois, Indiana, Kentucky, Maine, Massachusetts, Michigan, Missouri, Nevada, New Hampshire, New York, North Carolina, Ohio, Texas, Virginia, West Virginia, Wisconsin

Analysis methodology to address performance and growth concerns: Rightsize individual LPARs (CPU and Memory)

Know your current hardware utilization rates and derive potential cost savings to get customer/app owner buy-in.

Rightsize individual LPARs
– Initial pass is “perfect world”.
– Second pass is the initial meeting with app owners.
– Third and subsequent passes take into account most-loved and business-critical applications.

Roll out resizing in waves.
– Capacity planning has to measure pre- and post-wave to ensure that there is headroom for processing.
– Find potential resource problems before the app owner does.

Actual hardware savings are usually 50% or less of what the perfect-world analysis predicts.

Chart: Physical Box Busy, comparing Non-Production and Production (data labels 8.43 and 13.66; axis 0 to 16).

UNIX Virtualized vs. Non-Virtualized Utilization Large Company – Recent Data

Chart data (axes 0 to 90 and 0 to 1200):

                                   Non-Production  Non-Production  Production  Production  Total/
                                   LPAR            Server          LPAR        Server      Averages
Average CPU Busy                   11.88           9.43            12.94       10.85       11.28
Average CPU Max                    84.36           71.26           81.81       67.25       76.17
Number of Virtual Machines/LPARs   499             54              406         153         1112

Capped and Uncapped Mode

In the configuration of Micro-Partitioning, two types are available, capped and uncapped. The difference is in defining the ability of a partition to use extra capacity available in the system. If a processor donates unused cycles back to the shared pool, or if the system has idle capacity (because there is not enough workload running), the extra cycles may be used by other partitions, depending on their type and configuration.

Capped mode: The processing capacity never exceeds the assigned processing capacity.

Uncapped mode: The processing capacity may be exceeded when the shared processing pool has available resources.

Capped Mode

A capped partition is defined with a hard maximum limit of processing capacity. That means that it cannot go over its defined maximum capacity in any situation, unless you change the configuration for that partition (either by modifying the partition profile or by executing a dynamic LPAR operation). Even if the system is otherwise idle, the capped partition cannot exceed its entitled capacity.

Uncapped Mode

With an uncapped partition, you must specify the uncapped weight of that partition. If multiple uncapped logical partitions require idle processing units, the managed system distributes idle processing units to the logical partitions in proportion to each logical partition's uncapped weight. The higher the uncapped weight of a logical partition, the more processing units the logical partition gets.
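The proportional split described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the hypervisor's actual dispatch algorithm: the function name, the LPAR names, and the weights are hypothetical, and real-world limits (such as the number of virtual processors) are ignored.

```python
def share_idle_capacity(idle_units, weights):
    """Split idle processing units among competing uncapped LPARs
    in proportion to their uncapped weights (hypothetical helper)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Assumption for this sketch: with no weight, nobody gets extra capacity.
        return {name: 0.0 for name in weights}
    return {name: idle_units * w / total_weight for name, w in weights.items()}


# Made-up example: 2.0 idle processing units, three competing LPARs.
shares = share_idle_capacity(2.0, {"web": 128, "db": 64, "batch": 64})
```

Here the LPAR with twice the weight receives twice the share of the idle capacity.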

Min, Max and Desired

When assigning processor values you must specify minimum, desired, and maximum values for both processing units and virtual processors.

If any of the three types of resources cannot satisfy the specified minimum and required values, the activation of a partition fails. If the available resources satisfy all the minimum and required values but do not satisfy the desired values, the activated partition will get as many of the resources as are available.

                    Min    Desired    Max
Processing Units    0.1    0.5        1
Virtual CPUs        1      1          2

The maximum value is used to limit the maximum processor resources when dynamic logical partitioning operations are performed on the partition.

This is the Cap
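The activation rule on this slide can be expressed as a small function. This is a minimal sketch for a single resource, assuming a hypothetical `activate` helper; it is not an HMC or hypervisor API.

```python
def activate(available, minimum, desired):
    """Return the amount granted at activation, or None if activation fails.

    Hypothetical helper: activation fails when the minimum cannot be met;
    otherwise the partition gets the desired amount, or as much as is available.
    """
    if available < minimum:
        return None
    return min(available, desired)


# Using the processing-unit values from the slide (min 0.1, desired 0.5):
granted_when_short = activate(available=0.3, minimum=0.1, desired=0.5)
granted_when_plenty = activate(available=2.0, minimum=0.1, desired=0.5)
```

The maximum value does not appear here because, per the slide, it only limits later dynamic LPAR operations.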

Rightsizing Methodology: AIX CPU Sizing Parameters (Uncapped)

Physical engines
– Minimum: half of the Physical Entitlement.
– Entitlement: the average CPU consumed by the LPAR, or 10% of the Virtual Entitlement, whichever is higher. The total of this number cannot exceed the activated CPUs on the frame.
– Maximum: twice the Physical Entitlement.

Virtual engines
– Minimum: half of the Virtual Entitlement.
– Entitlement: the maximum of the CPU consumed by the LPAR * 1.30 (a 30% uplift).
– Maximum: twice the Virtual Entitlement.

Minimum = the lowest configuration available without rebooting

Physical Entitlement = the starting configuration of the LPAR

Maximum = the highest configuration available without rebooting

Virtual Entitlement = the maximum the LPAR can receive

Rightsizing Methodology: AIX CPU Sizing Parameters (Capped)

Physical engines
– Minimum: half of the Physical Entitlement.
– Entitlement: the maximum of the CPU consumed by the LPAR * 1.30 (a 30% uplift). The total of this number cannot exceed the activated CPUs on the frame.
– Maximum: twice the Physical Entitlement.

Minimum = the lowest configuration available without rebooting

Maximum = the highest configuration available without rebooting

Physical Entitlement = the capacity the LPAR can receive
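The uncapped and capped sizing rules can be written down as a short sketch. The names (`CpuSizing`, `size_uncapped`, `size_capped`, `fits_frame`) are hypothetical, the uplift is read as a multiplier of 1.30, and no rounding is applied, so treat this as an illustration of the rules rather than the exact tool behind the slides.

```python
from dataclasses import dataclass

UPLIFT = 1.30  # 30% uplift over the observed maximum; tune per environment


@dataclass
class CpuSizing:
    minimum: float
    entitlement: float
    maximum: float


def _around(entitlement):
    # Minimum is half the entitlement; maximum is twice the entitlement.
    return CpuSizing(entitlement / 2, entitlement, entitlement * 2)


def size_uncapped(avg_consumed, max_consumed):
    """Return (physical, virtual) sizing for an uncapped LPAR."""
    virtual_ent = max_consumed * UPLIFT
    # Physical entitlement: average consumed or 10% of virtual, whichever is higher.
    physical_ent = max(avg_consumed, 0.10 * virtual_ent)
    return _around(physical_ent), _around(virtual_ent)


def size_capped(max_consumed):
    """Return physical sizing for a capped LPAR."""
    return _around(max_consumed * UPLIFT)


def fits_frame(physical_entitlements, activated_cpus):
    # The entitlements on a frame may not total more than its activated CPUs.
    return sum(physical_entitlements) <= activated_cpus
```

For example, `size_uncapped(avg_consumed=0.4, max_consumed=2.0)` gives a virtual entitlement of 2.6 and a physical entitlement of 0.4, since the average exceeds 10% of the virtual entitlement.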

Advanced Power Virtualization

Diagram (labels only): Hypervisor; dynamically resizable partitions of 1, 2, 4, and 6 CPUs running AIX 5L V5.2, AIX 5L V5.3, Linux, and i5/OS V5R3; Micro-Partitioning; Virtual I/O Server partition with Ethernet sharing and storage sharing over virtual I/O paths; PLM partitions (with PLM agents) and unmanaged partitions; LPAR 1 (AIX 5L V5.2), LPAR 2 (AIX 5L V5.3), LPAR 3 (Linux); Manager Server; IVM.

Virtual I/O Server
– Shared Ethernet
– Shared SCSI and Fibre Channel-attached disk subsystems
– Supports AIX 5L V5.3 and Linux partitions

Micro-Partitioning
– Share processors across multiple partitions
– Minimum partition 1/10th processor

Partition Load Manager
– Balances processor and memory requests

Managed via HMC or IVM

Tooling and Data Retrieval: SRM

To the right is the SRM methodology and data streams. This works like many other performance and capacity systems.

Agents that collect data every minute are deployed (1), and their data is sent to an interim holding spot (2), where it is processed and rolled up to 15-minute or hourly intervals (3), stored in DB2 (4), and presented on the SRM website (4).
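The roll-up step (3) can be illustrated as follows. This is a minimal sketch that assumes the roll-up is a simple average per interval; the slides do not say which statistic SRM stores, and `roll_up` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(samples, minutes=15):
    """Average per-minute (timestamp, cpu_busy) samples into fixed intervals.

    Use minutes=15 for quarter-hour intervals or minutes=60 for hourly ones.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Truncate the timestamp to the start of its interval.
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        buckets[start].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}


# Made-up data: 30 one-minute samples starting at 10:00.
samples = [(datetime(2011, 6, 3, 10, m), float(m)) for m in range(30)]
quarter_hourly = roll_up(samples, minutes=15)
hourly = roll_up(samples, minutes=60)
```

Averaging hides peaks inside each interval, which is one reason the sizing rules later in the deck add an uplift.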

Tooling and Data Retrieval: Brio (ODBC)

After the data is loaded into the SRM data warehouse, it is extracted to the PC using Microsoft’s Open Database Connectivity (ODBC).

There, the architectural and utilization information is merged together to produce three reports utilized for rightsizing and server consolidation studies.

Diagram: utilization information, architectural information, and custom categorization flow from the SRM Data Warehouse into Brio, which produces rightsizing reporting, architectural reporting, and utilization reporting.

Rightsizing Methodology: AIX CPU Sizing Parameters

Part One: Pull the data.

Part Two: Analyze it.

Use this later; start with the forest, not the trees.

Spreadsheet formulas shown on the slide:

=ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)

=IF(K3/10>M3,K3/10,M3)

=ROUNDUP(IF(A3="Capped",G3*I3/100,J3),1)
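For readers who prefer code to spreadsheet cells, the three formulas can be translated literally. The slide does not say what each column holds, so the parameters keep the cell names; a plausible reading (an assumption, not confirmed by the source) is that G3 is a CPU count, I3 a percent busy, J3 a count of CPUs consumed, K3 the virtual entitlement, and M3 the average CPU consumed. `Decimal` is used so that rounding up behaves like Excel's ROUNDUP without floating-point surprises.

```python
from decimal import Decimal, ROUND_UP


def excel_roundup(value, digits):
    """Mimic Excel ROUNDUP: round away from zero to `digits` decimal places."""
    quantum = Decimal(1).scaleb(-digits)  # digits=0 -> 1, digits=1 -> 0.1
    return value.quantize(quantum, rounding=ROUND_UP)


def _is_capped(mode):
    # Excel's "=" comparison on text ignores case.
    return mode.lower() == "capped"


def formula_1(a3, g3, i3, j3):
    """=ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)"""
    g3, i3, j3 = (Decimal(str(x)) for x in (g3, i3, j3))
    raw = g3 * i3 / 100 * Decimal("1.3") if _is_capped(a3) else j3
    return excel_roundup(raw, 0)


def formula_2(k3, m3):
    """=IF(K3/10>M3,K3/10,M3), i.e. the larger of K3/10 and M3."""
    k3, m3 = Decimal(str(k3)), Decimal(str(m3))
    return max(k3 / 10, m3)


def formula_3(a3, g3, i3, j3):
    """=ROUNDUP(IF(A3="Capped",G3*I3/100,J3),1)"""
    g3, i3, j3 = (Decimal(str(x)) for x in (g3, i3, j3))
    raw = g3 * i3 / 100 if _is_capped(a3) else j3
    return excel_roundup(raw, 1)
```

With made-up inputs, `formula_1("Capped", 4, 50, 0)` works out as 4 × 50 / 100 × 1.3 = 2.6, which rounds up to 3.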

The Big Picture

In the previous example, I chose only the 34 32-way boxes at this corporation (1088 CPUs).

385 physical CPUs on capped LPARs are currently allocated to the workload.

After rightsizing, in a perfect world, we uncapped all the LPARs and could run them on 261 virtual CPUs and 174.8 physical CPUs, or 5.5 32-way boxes, a savings of 25 physical frames after accounting for headroom (2 CPUs per frame) and 4 engines per frame dedicated to VIOS.

Your mileage will vary.

Technical Barriers to Virtualization: Workloads Competing for Resources

Monitoring workloads is essential.

Siloed corporations seem to believe that in shared-host systems, someone else is stealing their CPU.

The next chart shows how physical utilization can be calculated at the frame level.

Uncapped LPAR utilization is calculated from the number of CPUs dispatched to service the workload, and therefore includes any LPAR overhead or frame overhead (PURR value, physical processors consumed).

Capped LPAR utilization can be calculated in two ways:
– Simple count of engines, as they are no longer in the shared pool (i.e., the number of physical CPUs).
– CPU utilization * the number of physical CPUs assigned.

To prove to management that the boxes are underutilized and run a cost savings project, I usually use CPU utilization (as seen on the next page).

To prove to application owners that the CPU isn’t being “stolen”, I use the “simple count of engines” for the capped environment and the CPUs dispatched for the uncapped.
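The two views of frame utilization can be sketched as one function with a switch. The dictionary keys (`mode`, `assigned_cpus`, `cpu_busy_pct`, `physc`) and the function names are hypothetical; `physc` stands for the physical processors consumed by an uncapped LPAR.

```python
def lpar_physical_cpus(lpar, view="utilization"):
    """Physical CPUs attributed to one LPAR.

    view="utilization": capped LPARs count CPU utilization * assigned CPUs.
    view="engines":     capped LPARs count every assigned engine.
    Uncapped LPARs always count the physical processors consumed.
    """
    if lpar["mode"] == "uncapped":
        return lpar["physc"]
    if view == "engines":
        return lpar["assigned_cpus"]
    return lpar["assigned_cpus"] * lpar["cpu_busy_pct"] / 100


def frame_cpus_used(lpars, view="utilization"):
    # Frame-level figure is the sum over the LPARs hosted on the frame.
    return sum(lpar_physical_cpus(lpar, view) for lpar in lpars)


# Made-up frame with one capped and one uncapped LPAR.
frame = [
    {"mode": "capped", "assigned_cpus": 4, "cpu_busy_pct": 25},
    {"mode": "uncapped", "physc": 1.5},
]
for_management = frame_cpus_used(frame, view="utilization")
for_app_owners = frame_cpus_used(frame, view="engines")
```

The same frame reads as 2.5 CPUs used in the management view and 5.5 in the application-owner view, which is why it matters to say which one a chart uses.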

Technical Barriers to Virtualization: Workloads Competing for Resources

Chart: one column per frame, labeled by serial number (34 frames, IBM,01025C24A through IBM,011049B0F). Axes: Physical CPUs (0 to 35) and CPU Utilization (0 to 30). Series: CPUs on Frame, Max HW Used, Avg HW Used, 90th Pctile HW Used.

The top (yellow bar) is the number of physical CPUs, here 32.

The red square is the 90th percentile of the CPU utilization of the frame, using hourly data.

The top of the blue line is the maximum CPU utilization of the frame.

The bottom of the blue line is the average utilization of the frame.
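The three statistics plotted for each frame can be computed from hourly data as sketched below. The percentile uses the nearest-rank method, which is an assumption: the slides do not say which percentile definition the reporting tool applies, and other definitions give slightly different values. The function names are hypothetical.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; `pct` is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling of pct * n / 100
    return ordered[rank - 1]


def frame_summary(hourly_cpus_used, cpus_on_frame):
    """Summarize one frame's hourly physical CPU usage."""
    return {
        "cpus_on_frame": cpus_on_frame,
        "max": max(hourly_cpus_used),
        "avg": sum(hourly_cpus_used) / len(hourly_cpus_used),
        "p90": percentile_nearest_rank(hourly_cpus_used, 90),
    }


# Made-up hourly readings for a 32-way frame.
summary = frame_summary([2, 4, 6, 8, 10, 12, 14, 16, 18, 20], cpus_on_frame=32)
```

The gap between `cpus_on_frame` and `p90` is a rough measure of how much of the frame sits idle most of the time.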

Growth Concerns – Workloads Cannot Reserve Space for Growth

In an uncapped environment, workloads can reserve space for growth through the number of virtual CPUs made available to the workload.

This was used to “sell” the benefits of uncapped LPARs to the application owners.

In the previous example, a 30% uplift was built into the calculation for the virtual CPUs:
– =ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)
– As you work with your individual environment, you can customize that uplift.
– Note that the uplift covers not only growth but also intra-hour peaks (as I utilized hourly average data).

Architectural Constraints – Servers Run Out of IO or Memory Before They Run Out of CPU

These machines require 1,393,664 MB of memory to run their workload. (Memory optimization will have to wait for another day.)

Spread evenly over 7 machines, each machine would require 199,095 MB of memory, which rounds up to 200,704 MB (the next multiple of 4,096) or 204,800 MB (the next multiple of 8,192).

Unfortunately, these machines came with only 131,072 MB.

Further, there are 7 Oracle databases whose application owner will not let the LPAR run on a shared VIOS, which adds to the number of frames and the number of engines.
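The memory arithmetic on this slide can be reproduced directly. The last figure, the number of frames needed if memory were the only constraint, is derived here for illustration and does not appear on the slide; the variable and function names are hypothetical.

```python
import math

TOTAL_MB = 1_393_664      # memory required by the workload (from the slide)
FRAMES = 7                # frames suggested by the CPU analysis
INSTALLED_MB = 131_072    # memory the machines came with


def round_up_to_multiple(value, step):
    """Round an integer up to the next multiple of `step`."""
    return -(-value // step) * step


per_frame_mb = math.ceil(TOTAL_MB / FRAMES)
per_frame_4096 = round_up_to_multiple(per_frame_mb, 4096)
per_frame_8192 = round_up_to_multiple(per_frame_mb, 8192)

# Derived illustration: frames needed if installed memory were the only limit.
frames_by_memory = math.ceil(TOTAL_MB / INSTALLED_MB)
```

With the installed memory, CPU alone suggests 7 frames while memory alone would call for 11, which is the architectural constraint the slide is pointing at.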

Methodologies to Coadunate LPARs

coadunation: the state or condition of being united by growth.

coadunate, adj.

Coadunation Example

Mixing workloads shares headroom, but you pay in response time at low utilization. Workload management shifts peaks based on business priorities to use "white space", but the response time of lower-priority work is traded off.

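Why mixing workloads shares headroom can be shown with a toy calculation: when peaks fall in different hours, the peak of the combined load is lower than the sum of the individual peaks. The profiles below are made up for illustration, and `peak_comparison` is a hypothetical name.

```python
def peak_comparison(profiles):
    """Compare sizing each workload for its own peak against sizing for
    the peak of the combined load.

    profiles: equal-length lists of hourly CPU use, one list per workload.
    Returns (sum_of_individual_peaks, peak_of_combined_load).
    """
    sum_of_peaks = sum(max(profile) for profile in profiles)
    combined = [sum(hour) for hour in zip(*profiles)]
    return sum_of_peaks, max(combined)


# Made-up profiles: an online workload that peaks mid-day and a batch
# workload that peaks at the start and end of the period.
online = [1, 4, 6, 2]
batch = [5, 1, 1, 4]
separate, shared = peak_comparison([online, batch])
```

Sized separately, these two workloads need 11 CPUs of peak capacity; sharing a frame, they need 7.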
Data Preparation

Data is readily available from the SRM database at srmweb.raleigh.ibm.com.

Data is extracted and normalized to the receiving machine using the Ideas International database.

The CSV file is briefly edited, then loaded into SPOT.

This extraction and load process takes about 20 minutes (depending on the response time of the SRM database).

The SPOT tool takes about 10 minutes to run each datacenter (Southbury and Boulder).

Total study time is 60 minutes. Easy!

SPOT Screenshot #1

SPOT Screenshot #2

SPOT Screenshot #3

Results of Co-adunation Study, Boulder

Boulder has 24 physical frames holding 93 LPARs, averaging 3.875 LPARs per frame. Based on CPU utilization, the LPARs could all be deployed to 5 x445s, which would then run an average of 47.4% busy, a savings of 19 physical frames. 2 LPARs would be migrated to stand-alone. (This is an average of 18.2 LPARs per frame.)

Current host utilization for Boulder for March, 2007 was 7.33% busy.

Results of Co-adunation Study, Southbury

Southbury has 17 physical frames holding 59 LPARs, averaging 3.47 LPARs per frame. Based on CPU utilization, the LPARs could all be deployed to 4 x445s, which would then run an average of 45.6% busy, a savings of 13 physical frames. 2 LPARs would be migrated to stand-alone. (This is an average of 14.25 LPARs per frame.)

Current utilization for Southbury was 7.62% busy.
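The density and savings figures for both sites can be reproduced with simple arithmetic. The sketch below assumes the "after" density excludes the 2 LPARs moved to stand-alone servers, which is the reading that matches the slides' numbers; `consolidation_summary` is a hypothetical name.

```python
def consolidation_summary(frames, lpars, target_frames, standalone):
    """LPAR density before and after consolidation, and frames saved."""
    return {
        "lpars_per_frame_before": lpars / frames,
        # Assumption: stand-alone LPARs are not counted on the target frames.
        "lpars_per_frame_after": (lpars - standalone) / target_frames,
        "frames_saved": frames - target_frames,
    }


boulder = consolidation_summary(frames=24, lpars=93, target_frames=5, standalone=2)
southbury = consolidation_summary(frames=17, lpars=59, target_frames=4, standalone=2)
```

Boulder works out to 3.875 LPARs per frame before, 18.2 after, and 19 frames saved; Southbury to about 3.47 before, 14.25 after, and 13 frames saved.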

Conclusion

There are many ways to reduce cost in a datacenter.

Decrease the number of servers on the floor using physical or virtual consolidation.

Address Concerns:
– Performance concerns – workloads competing for resources;
– Growth concerns – workloads cannot reserve space for growth; and
– Architectural constraints – servers run out of IO or memory before they run out of CPU.

Utilize a statistical or bin-packing mass analysis methodology to coadunate LPARs to achieve higher utilization rates at the hardware level.

Get those cost savings!
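As one example of the bin-packing approach named above, a first-fit-decreasing heuristic places each LPAR, largest first, on the first frame with room. This is a generic textbook heuristic offered as a sketch; it is not the algorithm used by SPOT, and the LPAR names and capacities are made up.

```python
def first_fit_decreasing(lpar_peaks, frame_capacity):
    """Pack LPARs onto frames by peak CPU demand, largest first.

    lpar_peaks: dict of LPAR name -> peak CPUs needed.
    frame_capacity: usable CPUs per frame (after headroom and reservations).
    Returns a list of frames, each a list of LPAR names.
    """
    frames, loads = [], []
    by_size = sorted(lpar_peaks.items(), key=lambda kv: kv[1], reverse=True)
    for name, peak in by_size:
        if peak > frame_capacity:
            raise ValueError(f"{name} does not fit on any frame")
        for i, load in enumerate(loads):
            if load + peak <= frame_capacity:
                frames[i].append(name)
                loads[i] += peak
                break
        else:
            # No existing frame has room: start a new one.
            frames.append([name])
            loads.append(peak)
    return frames


# Made-up demand figures and a 10-CPU usable capacity per frame.
placement = first_fit_decreasing({"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}, 10)
```

In this example five LPARs fit on two frames. A real study would also need to account for memory, IO, and workloads that cannot share a frame.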

Questions?