victor barra, chris hayes, john paul

VMWorld 2007 Presentations Redux (IP10 and IP26) Virtualization Center of Competency: “The Road to a World Class Implementation” Victor Barra, Chris Hayes, John Paul Siemens Medical Solutions, Virtualization Center of Competency


Post on 27-Jan-2016



TRANSCRIPT

Page 1: Victor Barra, Chris Hayes, John Paul

VMWorld 2007 Presentations Redux (IP10 and IP26)

Virtualization Center of Competency: “The Road to a World Class Implementation”

Victor Barra, Chris Hayes, John Paul

Siemens Medical Solutions, Virtualization Center of Competency

Page 2: Victor Barra, Chris Hayes, John Paul

Siemens Medical Solutions

Siemens Medical Solutions employs more than 36,000 people in over 130 countries and is a key business unit of Siemens AG, a leading global electronics and engineering company.

Over fifteen years of virtualization experience on different technical platforms

8 Virtual Centers

Over 80 ESX Hosts

Approximately 500 Virtual Machines of varying types… (Infrastructure, Applications, and Database Servers)

Page 3: Victor Barra, Chris Hayes, John Paul

Common Virtualization Myths and Perceptions

“I don’t need education on virtualization, I can just do it.”

“I’m a technician, I don’t need to get involved in the business side.”

“I don’t need to get buy-in.”

“I still need physical access to my server, even if it’s virtual”

“Virtualization gives you more resources than a physical system.”

“The virtual platform can’t handle database servers.”

“I don’t have to worry about the other stuff, I can just do ESX.”

“We can continue to keep throwing hardware\resources at problems.”

“Storage is storage, and networking is networking, it works the same.”

“I don’t see any errors (on the switch, frame, etc.), it must be the host.”

“Everyone knows that virtualization is an intuitively cheaper solution.”

“It’s okay to ‘Set it and Forget it’ once the workload is virtualized.”

Page 4: Victor Barra, Chris Hayes, John Paul

If We had Known Then What We Know Now…

Know Your Load

Master the Cost Model

Set the Ground Rules from the Start

Not Because You Can…Because You Should

Growth will happen faster than you think

Virtualization Changes Legacy Business Practices

The smallest, earliest decisions impact Scalability

Plan for Capacity rather than Reacting to Capacity

LUN Sizing and Naming Standards are Critical!

VM Architects need to know Everything about Everything!

Virtualization density challenges conventional thinking

So much hinges on Mins, Max-es, Resource & Component Influences

Page 5: Victor Barra, Chris Hayes, John Paul

Virtualization Timeline

[Timeline figure: Start-Up, Initial Build, Initial Load, Build Out, Operationalization]

Initial Phases of Virtualization: Implemented by Core Team

Core Team Launched: Infrastructure, Virtualization, Sneakerware

VCoC Launched: Each Center of Competency Now Trained in Virtualization

Asset Management, Capacity Planning, Networking, Performance, Processes, Procurement, Security, Storage, Virtualization

World Class Implementation

Page 6: Victor Barra, Chris Hayes, John Paul

Definition of Each Timeline Phase

Start-up: This phase includes the initial design, strategies, resource allocation, and consensus gathering prior to the actual build.

Initial Build: This phase forms the foundation for your deployment of the first virtual infrastructure. The number and size of the ESX hosts, LUNs, networking, etc., vary per site, but all are key to a successful build out.

Initial Load: This phase places an initial workload on the virtual infrastructure. It includes a burn-in process, establishment of the guest-to-host ratio, and an increasingly heavy, heterogeneous workload.

Build Out: This phase is the process of scaling up the deployment of your virtual infrastructure, expertise, and staff to accommodate the business needs.

Operationalization: This is the steady-state phase of training, support, and turnover of the day-to-day operations to the support teams (a.k.a. mainstreaming).


Page 7: Victor Barra, Chris Hayes, John Paul

Forming the Virtualization Core Team

This is the core team involved with initial implementation. They are the keys to your virtualization success. Their multi-disciplined strengths include:

Infrastructure

Storage

Networking

Hardware Configurations

Virtualization

ESX Deployment

Virtual Center Operation

Performance

Security

Sneakerware and Translation

A willingness to track down (via sneakerware) any help needed or do any task in order to keep the project on schedule an

Good communications and teaching skills

Ability to “speak” in the languages of the other supporting groups and translate the needs for them


Page 8: Victor Barra, Chris Hayes, John Paul

Some Best Practices from Other Platforms to Apply

Capacity Planning and Reporting

Change Management

File Backup and Recovery Standardization

Financial Targets with Quantifiable/Measurable Metrics

Hardware Monitoring and Alert Notification

Performance Monitoring and Reporting

Recovery Point and Recovery Time Options

Repeatable Processes

Security Role Classification and Assignment

Service Level Agreements with Customers and Vendors


Page 9: Victor Barra, Chris Hayes, John Paul

Necessary Perspectives for Virtualization - SAN

Take a serialized view of each component of the SAN path

HBA

Storage fabric switches/directors

Frame Adaptor (SAN front-end)

Front end CPUs

RAID configuration

Disk Speeds and Logical Groupings

LUNs are now shared and active across multiple hosts

LUN queuing becomes more of a concern

Keeping track of WWIDs and masking is essential

The Core Team (and subsequent VCoC) needs to embrace some new and old perspectives on established technologies


Page 10: Victor Barra, Chris Hayes, John Paul

Necessary Perspectives for Virtualization (continued)

Networking

High network density for virtual infrastructure

The network now terminates within the virtual infrastructure

Performance

The virtual context is critical

View the performance from the end user’s perspective first

Perfmon and Task Manager for the Windows view

Virtual Infrastructure Client and ESXTop for the ESX view

The sampling intervals are key to correct performance analysis

Look at both trends and real time monitoring for a complete perspective

Some issues will not be quantifiable since they are one time occurrences (such as when different workloads all peak at the same time)
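To illustrate why the sampling interval matters, here is a minimal Python sketch using made-up utilization numbers. The function name and the data are hypothetical and are not taken from Perfmon, ESXTop, or any VMware tool. It shows how averaging over a longer interval can hide a short spike that a finer interval reveals.

```python
def average_over_interval(samples, interval):
    """Average consecutive samples in groups of `interval` (last group may be shorter)."""
    averaged = []
    for start in range(0, len(samples), interval):
        group = samples[start:start + interval]
        averaged.append(sum(group) / len(group))
    return averaged


# Hypothetical CPU utilization (%), one sample per second for 12 seconds.
per_second = [20, 20, 20, 20, 95, 95, 20, 20, 20, 20, 20, 20]

fine = average_over_interval(per_second, 1)    # 1-second view: spike visible
coarse = average_over_interval(per_second, 6)  # 6-second view: spike smoothed

print("peak at 1s interval:", max(fine))
print("peak at 6s interval:", max(coarse))
```

The same two-second spike looks like a saturated CPU in the fine-grained view and like a moderately busy one in the coarse view, which is why trends and real-time data need to be read together.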


Page 11: Victor Barra, Chris Hayes, John Paul

Key Things to Remember During Start-up

Get buy-in from the financial officers - No $$, No Go!

Build a solid financial plan and get senior level backing

Leverage the existing talent in the various disciplines

The mainframe model has been successful for many years. Learn from it!

Most of their expertise from other platforms applies to the virtualization platform

Don’t let them assume that all things are the same

Get them involved early and facilitate their education

Virtual Center and VMotion are must haves

Deploy your initial virtual machines on network storage, avoid local disk

Consider cold migration times when establishing any maintenance window and choosing your data drive sizes
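As a rough way to reason about cold migration times, the sketch below estimates how long a copy takes from the disk size and a sustained copy rate. The function names and the 40 MB/s figure are illustrative assumptions, not measured values. Real throughput depends on the storage and network path and should be measured in your own environment.

```python
def cold_migration_minutes(disk_gb, throughput_mb_per_s):
    """Estimate minutes needed to copy `disk_gb` at a sustained rate in MB/s."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (disk_gb * 1024) / throughput_mb_per_s
    return seconds / 60


def fits_window(disk_gb, throughput_mb_per_s, window_minutes):
    """Return True if the estimated copy time fits inside the maintenance window."""
    return cold_migration_minutes(disk_gb, throughput_mb_per_s) <= window_minutes


# Hypothetical example: two data drive sizes at an assumed 40 MB/s copy rate.
for size_gb in (100, 500):
    minutes = cold_migration_minutes(size_gb, 40)
    print(f"{size_gb} GB: about {minutes:.0f} minutes")
```

Under these assumptions a 100 GB drive fits a one-hour window, while a 500 GB drive does not fit a two-hour one, which is the kind of trade-off to weigh when choosing data drive sizes.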


Page 12: Victor Barra, Chris Hayes, John Paul

Best Practices: Essential Processes During Start-up

Escalation

The initial users MUST know who to call for any perceived virtualization issue

The Core team must know who to call for any supporting disciplines

Set realistic expectations on how quickly the Core Team can respond

Performance Analysis

Establish a process to do increasingly deep performance analysis of workloads

Leverage performance analysis skills from other technology platforms

Provisioning

Establish a repeatable process to provision net new virtual machines

Establish a timely process to procure new ESX host systems

Sandbox and Staging Servers

Required to identify and resolve problems and to apply fixes


Page 13: Victor Barra, Chris Hayes, John Paul

Key Things to Remember During the Initial Build

Think ENTERPRISE, plan for substantial growth

Build in additional time for education ramp-up

Build and get an approved Capacity Plan in place

Initial decisions must be geared toward scalability

Determine metrics that can be used to measure success

Commit to an initially low virtual machine/host virtualization ratio

Use industry\vendor preferred practices for World Class Implementation

Use tools such as Platespin’s PowerRecon and VMWare’s Capacity Planner to estimate your target configuration but rely on real world testing after the loads are virtualized


Page 14: Victor Barra, Chris Hayes, John Paul

Technical Considerations for the ESX Host Hardware

All critical resources must be “Truly” redundant

Always Consult Vendor HCLs\SCLs during farm design

The faster the CPU the better, to overcome virtualization overhead

Avoid large (8-way or greater) systems except for large SMP implementations

Memory capacity is key (2-4GB per CPU in ESX host)

Local disk capacity is not as important

Deploy multiple hosts of the same processor family for VMotion

Bundle NICs whenever possible to avoid single point of failure

Build a migration strategy for technology refreshes

Create LUN naming conventions that both the VCoC and Storage Teams understand


Page 15: Victor Barra, Chris Hayes, John Paul

Getting the Storage Subsystem Design Correct!

ESX places unique demands on storage (all down one “pipe”)

Use multiple fiber HBAs for SAN

Manually assign different HBAs to guest machines (manual load balancing)

Spread I/O across multiple adapters and front end CPUs

Design by capacity AND I/Os per Second

Design SAN for performance at a LUN level not just at the frame level

Choose the type of RAID carefully

Consider tiered storage based on capacity and performance needs

Tiers can be implemented inside the same SAN frame

Different speed and types of disk (e.g., Fiber and S-ATA)

Different RAID configurations

Different groupings of physical disk
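The point about designing by capacity AND I/Os per second can be sketched as a small calculation: size the disk group both ways and take the larger answer. Everything in this sketch is an assumption for illustration: the function name, the per-disk figures, and the RAID write penalties, which are the commonly quoted rules of thumb rather than vendor data.

```python
import math

# Commonly quoted back-end write penalties per RAID type (rule of thumb).
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def disks_needed(capacity_gb, read_iops, write_iops,
                 usable_gb_per_disk, iops_per_disk, raid="raid5"):
    """Return (disks required, disks by capacity, disks by IOPS)."""
    penalty = RAID_WRITE_PENALTY[raid]
    backend_iops = read_iops + write_iops * penalty
    by_iops = math.ceil(backend_iops / iops_per_disk)
    by_capacity = math.ceil(capacity_gb / usable_gb_per_disk)
    return max(by_iops, by_capacity), by_capacity, by_iops


# Hypothetical workload: 2 TB, 1500 read and 500 write IOPS,
# on disks assumed to give 146 GB usable and 150 IOPS each.
total, by_capacity, by_iops = disks_needed(2000, 1500, 500, 146, 150, "raid5")
print(f"by capacity: {by_capacity}, by IOPS: {by_iops}, required: {total}")
```

In this made-up case capacity alone would suggest 14 disks, but the I/O load needs 24, so a design based only on capacity would be undersized.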


Page 16: Victor Barra, Chris Hayes, John Paul

Standards of Practice

Critical VMWare Stated Mins\Max Values

Number of hosts sharing one LUN: 16

Total LUNs per ESX host: 256

Guests per LUN: 32

Number of hosts per DRS cluster: 32

Number of hosts per HA cluster: 16

Number of hosts per Virtual Center server: 20 (VC 1.3x); 100 (VC 2.x)

Hosts per virtual cluster (VI3): 32

RAM per host: 64GB

RAM to Service Console: 800MB

Number of total paths to a LUN: 1024

Number of physical NICs: 26 (100Mb), 32 (1000Mb), 20 (Broadcom)

Number of HBAs: 16

How do these values affect your deployment?
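One way to see how these values affect a deployment is to check a planned design against them before building. The sketch below is only an illustration: the dictionary keys and function name are hypothetical, and the numbers are copied from the values stated on this slide. They should be verified against the vendor's current documentation for your version.

```python
# Limits as stated on this slide; key names are hypothetical.
STATED_LIMITS = {
    "hosts_per_lun": 16,
    "luns_per_host": 256,
    "guests_per_lun": 32,
    "hosts_per_drs_cluster": 32,
    "hosts_per_ha_cluster": 16,
    "ram_gb_per_host": 64,
    "hbas_per_host": 16,
}


def find_violations(design, limits=STATED_LIMITS):
    """Return {name: (planned, limit)} for every planned value over its limit."""
    return {
        name: (planned, limits[name])
        for name, planned in design.items()
        if name in limits and planned > limits[name]
    }


# Hypothetical design: 20 hosts sharing each LUN exceeds the stated limit of 16.
design = {"hosts_per_lun": 20, "guests_per_lun": 24, "hosts_per_ha_cluster": 16}
print(find_violations(design))
```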

World Class Virtualization Deployment


Page 17: Victor Barra, Chris Hayes, John Paul

World Class Virtualization Deployment

Standards of Practice: Impact on Deployment


Vendor\Industry Standards impact Deployment Decisions

Deployment Decisions impact Resource Allocations

Resource Allocations Impact Farm Functionality

Example: LUN Size

Standard\Typical VM Guest Footprint: X

Max Guest Per LUN: 32

Max Hosts Per Virtual Cluster (VI3): 32

Max Number of hosts sharing one LUN: 16

Industry Standards

Deployment Standards

Resource Allocations

Build your equation for LUN size and Total Cluster Size
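A minimal sketch of such an equation follows. The function names, the headroom percentage, and the 20 GB footprint are assumptions for illustration. The limits of 32 guests per LUN, 32 hosts per cluster, and 16 hosts per LUN are the values stated in this example. The cluster-size function further assumes that every host in the cluster shares every LUN.

```python
import math

MAX_GUESTS_PER_LUN = 32  # value stated on the slide


def lun_size_gb(vm_footprint_gb, guests_per_lun, headroom_pct=0):
    """LUN size = footprint x guests per LUN, plus optional headroom (assumption)."""
    if guests_per_lun > MAX_GUESTS_PER_LUN:
        raise ValueError("guests per LUN exceeds the stated maximum of 32")
    return vm_footprint_gb * guests_per_lun * (100 + headroom_pct) / 100


def luns_needed(total_vms, guests_per_lun):
    """Number of LUNs required to hold `total_vms` at the chosen density."""
    return math.ceil(total_vms / guests_per_lun)


def max_cluster_hosts(max_hosts_per_cluster=32, max_hosts_per_lun=16):
    """If all hosts share every LUN, the smaller limit caps the cluster size."""
    return min(max_hosts_per_cluster, max_hosts_per_lun)


# Hypothetical: 20 GB footprint, 16 guests per LUN, 20% headroom, 500 VMs.
print(lun_size_gb(20, 16, headroom_pct=20))
print(luns_needed(500, 16))
print(max_cluster_hosts())
```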

LUN Naming: <ESX Farm name><Array Serial Number><Device ID>, where the array serial number is the last 4 digits of the array serial number and the device ID (volume number) is exactly 4 digits in length. Ex.: dvlp41691249
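The naming convention above can be sketched as a pair of helper functions. The function names are hypothetical, the full serial number in the example is made up so that its last four digits match the slide's example, and padding a short device ID with leading zeros is an assumption rather than something the slide specifies.

```python
def build_lun_name(farm, array_serial, device_id):
    """Compose <farm><last 4 digits of array serial><4-digit device ID>."""
    serial_suffix = str(array_serial)[-4:]
    device = str(device_id).zfill(4)  # zero-padding is an assumption
    if len(serial_suffix) != 4 or len(device) != 4:
        raise ValueError("serial suffix and device ID must each be 4 characters")
    return f"{farm}{serial_suffix}{device}"


def parse_lun_name(name):
    """Split a LUN name into (farm, serial suffix, device ID)."""
    if len(name) < 9:
        raise ValueError("name too short to contain farm, serial, and device ID")
    return name[:-8], name[-8:-4], name[-4:]


# Hypothetical full serial whose last four digits are 4169.
name = build_lun_name("dvlp", "000190104169", "1249")
print(name)
print(parse_lun_name(name))
```

A fixed-width suffix lets both the VCoC and the storage team recover the array and device from the name alone.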

Page 18: Victor Barra, Chris Hayes, John Paul

Standards of Practice: ESX Host Build Partitions

World Class Virtualization Deployment


Service Console Memory (Right Sizing)

Find the right size for the Service Console, up to the maximum of 800MB, depending on the third-party software installed (remember the “free” command).

Page 19: Victor Barra, Chris Hayes, John Paul

Infrastructure Redundancy & Fault Tolerance:

Key Components

Hardware (Power, PCI Devices, etc.)

Storage (Highly-available LUNs, Paths, Switches, etc.)

Network (Multi-path, multi-module per VLAN)

HA & DRS: VI3 component aiding VM availability and load balancing

World Class Virtualization Deployment


[Diagram: redundant host connectivity. Labels: HBA 1, HBA 2, VLAN 1, VLAN 2, ports 1-4]

Page 20: Victor Barra, Chris Hayes, John Paul

Base Deployment (Close Up View)

[Diagram: close-up of the base host deployment. Labels: Service Console, VMotion Network, Networking (VLAN 1), HBA 1, HBA 2, VLAN 1, VLAN 2, ports 1-4. Legend: Fiber Cable, Networking, VMotion, Service Console]

Page 21: Victor Barra, Chris Hayes, John Paul

Security:

Host security

Limit root level access

Limit all Host level access

Limit ssh and ftp access

Standard physical server security

VMDK security

Ideally encrypted vmdk files

Virtual Center Security

Least access possible approach

World Class Virtualization Deployment


(Very Important)

Page 22: Victor Barra, Chris Hayes, John Paul

Price\Cost Model:

Per Usage (Based on Core Four Resources)

Per Container (Based on Flat Hosting Fee)

Tiered cost model

Chargeback Reporting

Home-grown

3rd Party
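The difference between the per-usage and per-container models can be sketched with a small home-grown chargeback calculation. All rates, resource names, and function names below are hypothetical. The sketch also assumes the "Core Four" resources are CPU, memory, disk, and network.

```python
# Hypothetical monthly rates per unit of each assumed core resource.
RATES = {"cpu_ghz": 10.0, "ram_gb": 5.0, "disk_gb": 0.5, "network_mbps": 0.25}
FLAT_FEE_PER_VM = 150.0  # hypothetical flat hosting fee


def per_usage_charge(usage, rates=RATES):
    """Charge based on the amount of each resource consumed."""
    return sum(rates[resource] * amount for resource, amount in usage.items())


def per_container_charge(vm_count, flat_fee=FLAT_FEE_PER_VM):
    """Charge a flat hosting fee per virtual machine."""
    return vm_count * flat_fee


# Hypothetical small VM: 2 GHz CPU, 4 GB RAM, 40 GB disk, 20 Mbps network.
usage = {"cpu_ghz": 2, "ram_gb": 4, "disk_gb": 40, "network_mbps": 20}
print("per usage:", per_usage_charge(usage))
print("per container:", per_container_charge(1))
```

With these made-up rates the small VM costs less under the per-usage model than under the flat fee, which is the kind of gap a tiered cost model is meant to narrow.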

World Class Virtualization Deployment


[Diagram: inputs to the Price\Cost Model: Hardware, Software, Floor Space, Power, Air Conditioning, Disaster Recovery, Backup\Recovery, Operational Support, Management]

Page 23: Victor Barra, Chris Hayes, John Paul


World Class Virtualization Deployment Standards of Practice: Resource Interdependency

Farm

VM Footprint

Storage Config

Cluster Size

Capacity Plan

Cost Model

Performance

Host Config

Network Config

All components affect the farm and each other

Deployment decisions impact components, resources & capacity

The Cost Model & Capacity Plan affect more than each other!

Storage decisions affect:

Host Config

Cluster Sizing

Capacity Planning

Cost Model

VM Footprint affects:

Performance

Storage Config

Capacity Planning

Cost Model

Page 24: Victor Barra, Chris Hayes, John Paul

Backup and Recovery

Does your backup and recovery process meet your customer & performance requirements?

How will each virtual machine be backed up?

Host-based VMDK and VMX backup, recovery, and deployment

Does the backup need to be done at the file level or only VMDK level?

Full Site Replication (SAN Replication) of Virtual Machines

World Class Virtualization Deployment


Page 25: Victor Barra, Chris Hayes, John Paul

Disaster Recovery “One Size Does Not Fit All”

Providing a tiered DR solution, along with classifying what counts as a disaster, will enable you to optimize your recovery process.

Tiered DR Approach

What's the retention on each VMDK?

How often do we do VMDK backups?

Full vs. Incremental Backups?

What is the SLA and importance of each guest?

Defining a Disaster

Catastrophic outage (ESX Cluster, Entire Site Down, etc.)

SAN has gone down

ESX server or Virtual Center Crash


Page 26: Victor Barra, Chris Hayes, John Paul

Supportability:

The success of a VM deployment is contingent on the support teams that make up your IT infrastructure

Remember that not all levels of support will have the base knowledge. Virtualization changes many things

Provide hands on training before turnover

Develop a support process work-flow

World Class Virtualization Deployment


Page 27: Victor Barra, Chris Hayes, John Paul

Capacity Planning:

Capacity Planning 5-step checklist:

Identify Core Component Profiles

Develop Critical Resource Thresholds

Understand Replenishment Timeframes

Evaluate Resource Demands

Establish Capacity Replenishment Triggers

World Class Virtualization Deployment


Storage LUN Replenishment Trigger Example:

Platform Storage Frame Threshold: 80%

Total Farm Storage: 4.4 TB (4480GB)

Replenishment Timeframe: 4 Weeks

Storage Demand: 70GB\Week

Critical Threshold: 80% of 4480GB = 3584GB

Demand During Replenishment: 4 weeks x 70GB\week = 280GB

Replenishment Trigger: 3584GB - 280GB = 3304GB

Replenishment Trigger Equation: RT = CT - (RR x D)

Replenishment Trigger = Critical Threshold - (Replenishment Rate x Demand)
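The trigger equation can be written as a short function. This is a sketch with a hypothetical function and parameter names, and it uses the figures from the storage example above as its input.

```python
def replenishment_trigger(total_capacity_gb, threshold_pct,
                          lead_time_weeks, demand_gb_per_week):
    """RT = CT - (RR x D): critical threshold minus demand during the lead time."""
    critical_threshold = total_capacity_gb * threshold_pct / 100
    demand_during_lead_time = lead_time_weeks * demand_gb_per_week
    return critical_threshold - demand_during_lead_time


# Figures from the example: 4480 GB farm, 80% threshold, 4 weeks, 70 GB per week.
trigger = replenishment_trigger(4480, 80, 4, 70)
print(trigger)  # 3304.0
```

When used storage reaches the trigger value, ordering starts, so that the new capacity arrives before the critical threshold is crossed.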


Page 28: Victor Barra, Chris Hayes, John Paul

Determine Standard Profiles (VM, Host & Storage)

Understand Critical Thresholds (Platform & Organizational)

Evaluate Fulfillment Timeframes (Host, Memory, Storage, etc.)

Determine Demand: Current & Forecasted (good luck!)

Set Replenishment Triggers For Each Resource

Establish a Repeatable Capacity Evaluation and Replenishment Process

[Figure: Critical Thresholds for SAN, CPU, RAM, and VMs. Values shown: 82%, 79%, and 75% utilization, and 100% max run rate]

[Figure: Identify Replenishment Timeframes (worst case). Lead times for SAN, RAM, and host. Values shown: 90 days, 60 days, 30 days]

Set Replenishment Triggers

Account for Demand!

World Class Virtualization Deployment; Capacity Planning

What Are Your Replenishment Triggers?


Page 29: Victor Barra, Chris Hayes, John Paul

Building the Virtualization Center of Competency

The VCoC transitions the virtual landscape from initial build to the build out phase with world class implementation measures. The VCoC comprises the original Core team augmented by resources from the following disciplines:

Asset Management

Capacity Planning

Networking

Performance

Processes

Procurement

Security

Storage

Systems Administration

Virtualization


Remember that you can’t move to a World Class Implementation until ALL of the various disciplines move there! World Class applies to the entire infrastructure and organization, not just to the part that the VCoC handles.

Page 30: Victor Barra, Chris Hayes, John Paul

Transitioning from the Initial Load to the Build Out

Review the Initial Build for Lessons Learned

Validate the Financial Plan

Gather the Requirements for the Build Out

Engage all of the technical, financial, and process groups to determine what should and should not be in the VCoC

Recognize any final bastions of resistance to virtualization and address their concerns

Rely on metrics from the Initial Build phase to help forecast Build Out requirements

Use the capacity plan to perpetuate scalability and resource fulfillment

Consult with critical discipline areas to ensure infrastructure availability

Target hardware consistency

Ongoing supportability is very high on the priority list

Build net new VMs when and where possible

Be careful of over confidence and stretching the Rules of Thumb


Page 31: Victor Barra, Chris Hayes, John Paul

Best Practices: Essential Processes During the Build Out

Documentation

As painful as it may be, you need to document your processes

This is necessary for 24x7 support

“If you can’t repeat it, you can’t support it”

Establish Repeatable and Measurable Financial Metrics

Chargebacks and resource consumption are company requirements

Knowledge Transfer

Establish a mentoring or buddy program to do knowledge transfer

Avoid any single points of failure, especially with individuals

Patching

Create a process for patching and upgrading ESX hosts and virtual machines

Scalability

Every infrastructure component must be scalable

Achievable timeframes and costs are needed for infrastructure scaling


Page 32: Victor Barra, Chris Hayes, John Paul

Best Practices: Putting Together the Deployment Puzzle

Big Bang or Phased

Mass build out

Incremental deployment of resources

“Build and Fill” or “Evaluate and Fill”

Load-based deployment

Container-based deployment plan

“Build and Fill” (practical) or “Evaluate and Fill” (Ideal)

Keys to Successful Deployment

Load distribution

Load classification

What is the footprint?

Establish virtualization candidate metrics

Implementation of a Staging Server and Production Zones

Understand isolation requirements; deploy loads and VMWare features accordingly


Page 33: Victor Barra, Chris Hayes, John Paul

Final Thoughts

Think beyond the ESX Server: “Think Enterprise” (power, storage, and networking become more important)

The cost of unused storage space

Leverage existing management tools

Licensing issues (OS, Backup Agent, Apps)

Features misused (redo, cloning and snapshot)

The end users’, management’s, and operational teams’ lack of virtualization knowledge will be a hurdle to overcome

Do not overcommit your core 4 resources; use only what is needed.

Financial buy-in should not be assumed. Go out and get it!

Learn from others; don’t make their mistakes (you will make plenty on your own!)

Find and fix the hot spots and recognize atypical behavior
