victor barra, chris hayes, john paul
DESCRIPTION
VMWorld 2007 Presentations Redux (IP10 and IP26) Virtualization Center of Competency: “The Road to a World Class Implementation”. Victor Barra, Chris Hayes, John Paul Siemens Medical Solutions, Virtualization Center of Competency. Siemens Medical Solutions. - PowerPoint PPT PresentationTRANSCRIPT
VMWorld 2007 Presentations Redux (IP10 and IP26)
Virtualization Center of Competency: “The Road to a World Class Implementation”
Victor Barra, Chris Hayes, John Paul
Siemens Medical Solutions, Virtualization Center of Competency
Siemens Medical Solutions
Siemens Medical Solutions employs more than 36,000 people in over 130 countries and is a key business unit of Siemens AG, a leading global electronics and engineering company.
Over fifteen years of virtualization experience on different technical platforms
8 Virtual Centers
Over 80 ESX Hosts
Approximately 500 Virtual Machines of varying types… (Infrastructure, Applications, and Database Servers)
Common Virtualization Myths and Perceptions
“I don’t need education on virtualization, I can just do it.”
“I’m a technician, I don’t need to get involved in the business side.”
“I don’t need to get buy-in.”
“I still need physical access to my server, even if it’s virtual”
“Virtualization gives you more resources then a physical system.”
“The virtual platform can’t handle database servers.”
“I don’t have to worry about the other stuff, I can just do ESX.”
“We can continue to keep throwing hardware\resources at problems.”
“Storage is storage, and networking is networking, it works the same.”
“I don’t see any errors (on the switch, frame, etc.), it must be the host.”
“Everyone knows that virtualization is an intuitively cheaper solution.”
“It’s okay to ‘Set it and Forget it’ once the workload is virtualized.”
If We had Known Then What We Know Now…
Know Your Load
Master the Cost Model
Set the Ground Rules from the Start
Not Because You Can…Because You Should
Growth will happen faster then you think
Virtualization Changes Legacy Business Practices
The smallest, earliest decisions impact Scalability
Plan for Capacity rather than Reacting to Capacity
LUN Sizing and Naming Standards are Critical!
VM Architects need to know Everything about Everything!
Virtualization density challenges conventional thinking
So much hinges on Mins, Max-es, Resource & Component Influences
Virtualization Timeline
OperationalizationBuild OutInitial LoadStart-Up Initial Build
Initial Phases of Virtualization
Implemented by Core Team
Asset Management Capacity Planning
NetworkingPerformanceProcesses
ProcurementSecurityStorage
VirtualizationCo
re T
ea
m L
au
nc
he
d
Each Center of Competency Now Trained in Virtualization
InfrastructureVirtualizationSneakerware
Wo
rld
Cla
ss
Im
ple
me
nta
tio
n
VC
oC
La
un
ch
ed
Definition of Each Timeline PhaseStart-up: This phase includes the initial design, strategies, resource allocation, and consensus gathering prior to the actual build.
Initial Build: This phase forms the foundation for your deployment of the first Virtual infrastructure, The number and size of the ESX Hosts, LUNs, Networking, etc., vary per site but are all key to a successful build out.
Initial Load: This phase places an initial workload on the virtual infrastructure. It includes a burn in process, guest to host ratio establishment, and increasingly heavy, heterogeneous workload.
Build Out: This phase is process of scaling up the deployment of your virtual infrastructure, expertise, and staff to accommodate the business needs.
Operationalization: This is the steady state phase of training, support and turn over of the day to day operations to the support teams (a.k.a. mainstreaming)
OperationalizationBuild OutInitial LoadStart-Up Initial Build
Virtualization Timeline
Forming the Virtualization Core Team
InfrastructureStorage
Networking
Hardware Configurations
VirtualizationESX Deployment
Virtual Center Operation
Performance
Security
Sneakerware and TranslationA willingness to track down (via sneakerware) any help needed or do any task in order to keep the project on schedule an
Good communications and teaching skills
Ability to “speak” in the languages of the other supporting groups and translate the needs for them
This is the core team involved with initial implementation. They are the keys to your virtualization success. Their multi-disciplined strengths include:
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Some Best Practices from Other Platforms to Apply
Capacity Planning and Reporting
Change Management
File Backup and Recovery Standardization
Financial Targets with Quantifiable/Measurable Metrics
Hardware Monitoring and Alert Notification
Performance Monitoring and Reporting
Recovery Point and Recovery Time Options
Repeatable Processes
Security Role Classification and Assignment
Service Level Agreements with Customers and Vendors
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Necessary Perspectives for Virtualization - SAN
Take a serialized view of each component of the SAN path
HBA
Storage fabric switches/directors
Frame Adaptor (SAN front-end)
Front end CPUs
RAID configuration
Disk Speeds and Logical Groupings
LUNs are now shared and active across multiple hosts
LUN queuing becomes more of a concern
Keeping track of WWIDs and masking is essential
The Core Team (and subsequent VCoC) needs to embrace some new and old perspectives on established technologies
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
KeyPoints
Necessary Perspectives for Virtualization (continued)
Networking
High network density for virtual infrastructure
The network now terminates within the virtual infrastructure
Performance
The virtual context is critical
View the performance from the end user’s perspective first Perfmon and Task Manager for the Windows view Virtual Infrastructure Client and ESXTop for the ESX view
The sampling intervals are key to correct performance analysis
Look at both trends and real time monitoring for a complete perspective
Some issues will not be quantifiable since they are one time occurrences (such as when different workloads all peak at the same time)
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
KeyPoints
Key Things to Remember During Start-up
Get buy-in from the financial officers - No $$, No Go!Build a solid financial plan and get senior level backing
Leverage the existing talent in the various disciplines
The mainframe model has been successful for many years. Learn from it!
Most of their expertise from other platforms apply to the virtualization platform
Don’t let them assume that all things are the same
Get them involved early and facilitate their education
Virtual Center and VMotion are must haves
Deploy your initial virtual machines on network storage, avoid local disk
Consider cold migration times when establishing any maintenance window and choosing your data drive sizes
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Best Practices: Essential Processes During Start-upEscalation
The initial users MUST know who to call for any perceived virtualization issue
The Core team must know who to call for any supporting disciplines
Set realistic expectations on how quickly the Core Team can respond
Performance Analysis
Establish a process to do increasingly deep performance analysis of workloads
Leverage performance analysis skills from other technology platforms
Provisioning
Establish a repeatable process to provision net new virtual machines
Establish a timely process to procure new ESX host systems
Sandbox and Staging Servers
Required to identify and resolve problems and to apply fixes
KeyPoints
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Key Things to Remember During the Initial Build
Think ENTERPRISE, plan for substantial growth
Build in additional time for education ramp-up
Build and get an approved Capacity Plan in place
Initial decisions must be geared toward scalability
Determine metrics that can be used to measure success
Commit to an initially low virtual machine/host virtualization ratio
Use industry\vendor preferred practices for World Class Implementation
Use tools such as Platespin’s PowerRecon and VMWare’s Capacity Planner to estimate your target configuration but rely on real world testing after the loads are virtualized
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Technical Considerations for the ESX Host Hardware
All critical resources must be “Truly” redundant
Always Consult Vendor HCLs\SCLs during farm design
The faster CPU the better to overcome virtualization overhead
Avoid large (8-way or greater) systems except for large SMP implementations
Memory capacity is key (2-4GB per CPU in ESX host)
Local disk capacity is not as important
Deploy multiple hosts of the same processor family for VMotion
Bundle NICs whenever possible to avoid single point of failure
Build a migration strategy for technology refreshes
Create LUN naming conventions that both the VCoC and Storage Teams understand
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Getting the Storage Subsystem Design Correct!
ESX places unique demands on storage (all down one “pipe”)Use multiple fiber HBAs for SAN
Manually assign different HBAs to guest machines (manual load balancing)
Spread I/O across multiple adapters and front end CPUs
Design by capacity AND I/Os per Second
Design SAN for performance at a LUN level not just at the frame level
Choose the type of RAID carefully
Consider tiered storage based on capacity and performance needs
Tiers can be implemented inside the same SAN frame Different speed and types of disk (ex., Fiber and S-ATA) Different RAID configurations Different groupings of physical disk
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Standards of Practice
Critical VMWare Stated Mins\Max Values
Number of Hosts Sharing one LUN: 16
256 Total LUNs per ESX host
32 Guest Per LUN
Number of hosts per DRS cluster 32
Number of hosts per HA cluster 16
Number of hosts per Virtual Center server: VC1.3x—20; VC 2.x--100
32 Hosts Per Virtual Cluster (VI3)
64GB RAM Per Host
800MB RAM to Service Console
Number of total paths to a LUN 1024
Number of Physical NICs: 26 100, 32 1000, 20 BroadCom
Number of HBAs 16
How do these values effect your deployment?
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Standards of Practice: Impact on deploymentWorld Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Vendor\Industry Standards impact Deployment Decisions
Deployment Decisions impact Resource Allocations
Resource Allocations Impact Farm Functionality
Example: LUN Size
Standard\Typical VM Guest Footprint: X
Max Guest Per LUN: 32
Max Hosts Per Virtual Cluster (VI3): 32
Max Number of hosts sharing one LUN: 16
Industry Standards
Deployment Standards
Resource Allocations
Build your equation for LUN size and Total Cluster Size
LUN Naming – <ESX Farm name><Array Serial Number><Device ID> , where array serial number is the last 4 digits of the array serial number and volume number is exactly 4 digits in length. ex. dvlp41691249
Standards of Practice:
ESX Host
Build Partitions
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Service Console Memory (Right Sizing)
Find the right size for the Service Console up to the maximum of 800MB, depending on installed 3rd party software. (remember the “Free” command)
Infrastructure Redundancy & Fault Tolerance:
Key Components
Hardware (Power, PCI Devices, etc.)
Storage (Highly-available LUNs, Paths, Switches, etc.)
Network (Multi-path, multi-module per VLAN)
HA & DRS: VI3 component aiding VM availability and load balancing
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
::::::
:H
BA
1
::::::
:H
BA
2
1
VL
AN
1V
LA
N 2
2 1
VL
AN
1V
LA
N 2
2
1 2 3 4
Base Deployment (Close Up View)
Service Console
Vmotion Network
Networking (VLAN 1)
Networking (VLAN 1)
::::::
:H
BA
1
::::::
:H
BA
21
VL
AN
1V
LA
N 2
2 1
VL
AN
1V
LA
N 2
2
1 2 3 4
Ser
vice
Co
nso
le
VM
oti
on
Fiber Cable
Networking
VMotion
ServiceConsole
Security:
Host security
Limit root level access
Limit all Host level access
Limit ssh and ftp access
Standard physical server security
VMDK security
Ideally encrypted vmdk files
Virtual Center Security
Least access possible approach
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
(Very Important)
Price\Cost Model:
Per Usage (Based on Core Four Resources)
Per Container (Based on Flat Hosting Fee)
Tiered cost model
Chargeback Reporting
Home-grown
3rd Party
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Hardware
FloorSpace
PowerAir
Conditioning
DisasterRecovery
BackupRecovery
OperationalSupport
Management
Software
Price Cost Model
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
World Class Virtualization Deployment Standards of Practice: Resource Interdependency
Farm
VM Footprint
Storage Config
Cluster Size
Capacity Plan
Cost Model
Performance
Host Config
Network Config
All components affect the farm each other
Deployment decisions impact components, resources & capacity
The Cost Model & Capacity Plan affect more than each other!
Storage decisions affects:
Host Config
Cluster Sizing
Capacity Planning
Cost Model
VM Footprintaffects:
Performance
Storage Config
Capacity Planning
Cost Model
Backup and Recovery
Does your backup and recovery process meet your customer & performance requirements?
How will each virtual machine be backed up?
Host Based, VMDK an VMX backup recovery and deployment
Does the backup need to be done at the file level or only VMDK level?
Full Site Replication (SAN Replication) of Virtual Machines
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Disaster Recovery “One Size Does Not Fit All”
Providing a tired DR solution along with classifying what is a disaster will enable you to optimize your recovery process.
Tiered DR Approach
What's the retention on each VMDK?
How often do we do VMDK backups?
Full vs. Incremental Backups?
What is the SLA and importance of each guest?
Defining a Disaster
Catastrophic outage (ESX Cluster, Entire Site Down, etc.)
SAN has gone down
ESX server or Virtual Center Crash
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Supportability:
The success of the deployment of VM is contingent on support teams that comprise your IT Infrastructure
Remember that not all levels of support will have the base knowledge. Virtualization changes many things
Provide hands on training before turnover
Develop a support process work-flow
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Capacity Planning:
Capacity Planning 5-step checklist:
Identify Core Component Profiles
Develop Critical Resource Thresholds
Understand Replenishment Timeframes
Evaluate Resource Demands
Establish Capacity Replenishment Triggers
World Class Virtualization Deployment
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Storage LUN Replenishment Trigger Example:Platform Storage Frame Threshold: 80%Total Farm Storage: 4.4 TBReplenishment Timeframe: 4 WeeksStorage Demand: 70GB\Week
(4 weeks X 70GB\week)= 280GB 80% of 4480GB = 3584
Replenishment Trigger 3304GB
Replenishment Trigger Equation: RT%=CT-(RRxD)Replenishment Trigger = Critical Threshold-Replenishment Rate X Demand
Capacity Planning 5-step checklist:
Identify Core Component Profiles
Develop Critical Resource Thresholds
Understand Replenishment Timeframes
Evaluate Resource Demands
Establish Capacity Replenishment Triggers
82%(UTIL.)
79%(UTIL.)
75%(UTIL.)
SANCPU
RAM
100 %(max run-rate)
VMs
Critical Thresholds
Understand Critical Thresholds (Platform & Organizational)
Determine Standard Profiles (VM, Host & Storage)
Understand Critical Thresholds (Platform and Organizational)
Evaluate Fulfillment Timeframes (Host, Memory, Storage, etc.)
Determine Demand: Current & Forecasted (good luck!)
Set Replenishment Triggers For Each Resource
Establish a Repeatable Capacity Evaluation and Replenishment Process
Identify Replenishment
Timeframes(worst case)
SANRAM
90 days
Lead Times
HOST
60 days
30 days
Timeframes
79%(util.)
75%(util.)
100 %(run rate)
Critical Thresholds
SANRAM
90 days
Lead Times
HOST
60 days
30 days
Timeframes
Set Replenishment Triggers
Account for Demand!
World Class Virtualization Deployment; Capacity Planning
What Are Your Replenishment Triggers?
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Building the Virtualization Center of Competency
Asset Management
Capacity Planning
Networking
Performance
Processes
The VCoC transitions the virtual landscape from initial build to the build out phase with world class implementation measures. The VCoC comprises the original Core team augmented by resources from the following disciplines:
Procurement
Security
Storage
Systems Administration
Virtualization
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Remember that you can’t move to a World Class Implementation until ALL of the various disciplines move there! World Class applies to the entire infrastructure and organization, not just to the part that the VCoC handles.
Transitioning from the Initial Load to the Build Out
Review the Initial Build for Lessons Learned
Validate the Financial Plan
Gather the Requirements for the Build Out
Engage all of the technical, financial, and process groups to determine what should and should not be in the VCoC
Recognize any final bastions of resistance to virtualization and address their concerns. Rely on metrics from the Initial Build phase to help forecast Build Out requirements
Use the capacity plan to perpetuate scalability and resource fulfillment
Consult with critical discipline areas to ensure infrastructure availability
Target hardware consistency
Ongoing supportability is very high on the priority list
Build net new VMs when and where possible
Be careful of over confidence and stretching the Rules of Thumb
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Best Practices: Essential Processes During the Build OutDocumentation
As painful as it may be, you need to document your processes This is necessary for 24x7 support “If you can’t repeat it, you can’t support it”
Establish Repeatable and Measurable Financial Metrics
Chargebacks and resource consumption are company requirements
Knowledge Transfer
Establish a mentoring or buddy program to do knowledge transfer
Avoid any single point of failures, especially with individuals
Patching
Create a process for patching and upgrading ESX hosts and virtual machines
Scalability
Every infrastructure component must be scaleable
Achievable timeframes and costs are needed for infrastructure scaling
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Best Practices: Putting Together the Deployment PuzzleBig Bang or Phased
Mass build out
Incremental deployment of resources
“Build and Fill” or “Evaluate and Fill”
Load-based deployment
Container-based deployment plan
“Build and Fill” (practical) or “Evaluate and Fill” (Ideal)
Keys to Successful Deployment
Load distribution
Load classification
What is the footprint?
Establish virtualization candidate metrics
Implementation of a Staging Server and Production Zones
Understand isolation requirements; deploy loads and VMWare features accordingly
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e
Final Thoughts
Think beyond the ESX Server “Think Enterprise” (Power, Storage and Networking becomes more important)
The cost of unused storage space
Leverage existing management tools
Licensing issues (OS, Backup Agent, Apps)
Features misused (redo, cloning and snapshot)
The end user, management, and operational teams lack of virtualization knowledge will be a hurdle to overcome
Do not over commit your core 4 resources, use only what is needed.
Financial buy-in should not be assumed, go out and get it!
Learn from others, don’t make their mistakes (you will make plenty on your own!)
Find and fix the hot spots and recognize atypical behavior
OperationalizationBuild OutInitial LoadStart-Up Initial Build
V i r t u a l i z a t i o n T i m e l i n e