Virtualizing Exchange Best Practices
APP-BCA1684
#vmworldapps
Scott Salyer
VMware, Inc.
Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
Agenda
Exchange Design on vSphere
Support Considerations
vSphere Best Practices for Exchange
vSphere HA, DRS and vMotion with Exchange
Site Resiliency for Exchange
Exchange Design Process
Set aside virtualization for a moment… remember, we’re designing an Exchange environment first
Complete prerequisite work before diving into the design:
• Understand the business and technical requirements
• Evaluate the current workload, or estimated workload for a new environment
• Know the health of the current and supporting infrastructure
• Understand support and licensing considerations
Follow a design methodology:
1. Establish the design requirements
2. Gather the compute requirements based on the Exchange design
requirements (Exchange Role Requirements Calculator)
3. Design the virtualization platform based on the Exchange design and
compute requirements
4. Determine virtual machine sizing and distribution among virtualization hosts
Design Requirements
High availability requirements
• Database Availability Groups (DAG)
• vSphere High Availability
Site resiliency requirements
• DAG
• Site Recovery Manager
Dedicated or Multi-role servers
Database Sizing
• Few, large databases or many, small databases
• Consider backup and restore times
Number of mailboxes
• Growth over next X years
Mailbox tiers
• Quota, activity, deleted item retention, etc.
Compute Requirements Based on Exchange Design
Requirements
• 1 site, high availability using DAG and vSphere HA
• 10,000 heavy users (2 GB quota, 150 messages/mailbox/day)
• Dedicated mailbox role, combined client access and hub transport servers
Physical Compute Requirements
• (7) mailbox cores to support activated databases
• (7) client access/hub transport cores
*CPU recommendations based on the Intel X5660 processor
vSphere Compute Requirements
Exchange compute requirements
• 14 total mailbox cores to support failover
• 14 total client access/hub transport cores to support failover
• 256 GB memory for mailbox role to support failover (assuming a two-node DAG)
• 48 GB memory for CAS/HT role to support failover (assuming four CAS/HT VMs)
• No. of CAS/HT VMs * (4 GB + (2 GB * No. of vCPUs)); see the worked example below
Sample vSphere host configuration
• CPU: (2) six-core Intel x5660
• Memory: 144 GB
vSphere minimum requirements
• Three vSphere hosts provide 36 cores, 432 GB
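The CAS/HT memory rule above can be verified with a quick calculation using this design's values (four CAS/HT VMs, 4 vCPUs each); a minimal PowerShell sketch:

# CAS/HT memory rule from the slide: per VM, 4 GB base + 2 GB per vCPU.
$casHtVMs   = 4   # number of CAS/HT virtual machines in this design
$vCpusPerVM = 4   # vCPUs assigned to each CAS/HT VM
$casHtMemGB = $casHtVMs * (4 + (2 * $vCpusPerVM))
"CAS/HT memory to support failover: $casHtMemGB GB"   # 4 * 12 GB = 48 GB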
Virtual Machine Sizing and Placement
Option 1:
• (2) DAG nodes (77% utilized during failover)
  • 8 vCPU, 128 GB
• (4) client access/hub transport nodes
  • 4 vCPU, 12 GB (2 GB/vCPU + 4 GB)
• CPU and memory overcommitted during host failure
Option 2:
• (3) DAG nodes (54% utilized during failover)
  • 6 vCPU, 64 GB
• (4) client access/hub transport nodes
  • 4 vCPU, 12 GB (2 GB/vCPU + 4 GB)
• Sufficient resources for peripheral services
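A quick check of these options against the three-host cluster from the previous slide (144 GB per host; one failure leaves two hosts) confirms the memory claims; a minimal PowerShell sketch:

# Memory demand vs. capacity with one failed host (2 x 144 GB hosts remain).
$survivingMemGB = 2 * 144                  # 288 GB
$option1MemGB   = (2 * 128) + (4 * 12)     # 304 GB -> overcommitted on failure
$option2MemGB   = (3 * 64)  + (4 * 12)     # 240 GB -> fits with headroom
"Option 1 needs $option1MemGB GB, Option 2 needs $option2MemGB GB, of $survivingMemGB GB available"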
Support Considerations (What is and what isn’t?)
Support for Exchange has evolved drastically over the last two years, leading to confusion and misconceptions
What is Supported?
• Virtualization of all server roles, including Unified Messaging with Exchange
2010 SP1
• Combining Exchange 2010 SP1 DAG and vSphere HA and vMotion
• Thick virtual disks and raw-device mappings (pass-thru disk)
• Fibre channel, FCoE, iSCSI (native and in-guest)
Not Supported?
• NAS Storage for Exchange files (mailbox database, HT queue, logs)
• Thin virtual disks
• Virtual machine snapshots (what about backups?)
MS TechNet – Understanding Exchange 2010 Virtualization:
(http://technet.microsoft.com/en-us/library/jj126252)
Virtual CPUs
Best Practices for vCPUs
• vCPUs assigned to all Exchange virtual machines should be equal to or less than the total number of physical cores on the ESX host
• Enable hyper-threading, but understand that logical processor cores are not equal to physical processor cores for sizing
• Enable NUMA -- Exchange is not NUMA-aware, but ESX is
• Determine the CPU requirements based on physical CPU throughput and mailbox profile
• Use the SPECint2006 rating for the proposed processor
• Exchange Processor Query Tool
• Calculate adjusted megacycles required and match up to proposed hardware
• Exchange Mailbox Role Calculator
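To make the adjusted-megacycle step concrete, here is a sketch of the calculation; the baseline figures are assumptions taken from Microsoft's published Exchange 2010 guidance (a reference platform rated at 18.75 SPECint2006 per core, equating to 3,300 megacycles per core), and the proposed-CPU rating is hypothetical, so treat the Mailbox Role Calculator's output as authoritative:

# Assumed Exchange 2010 baseline: 18.75 SPECint2006 rate/core = 3,300 megacycles/core.
$baselinePerCore    = 18.75
$baselineMegacycles = 3300
$proposedPerCore    = 22.0    # hypothetical per-core SPECint2006 rate of the proposed CPU
$adjusted = ($proposedPerCore / $baselinePerCore) * $baselineMegacycles
"Adjusted megacycles per core: {0:N0}" -f $adjusted   # ~3,872 for these sample numbers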
Hyper-threading Confusion
VMware’s Performance Best Practices for vSphere whitepaper
recommends enabling hyper-threading in the BIOS
• http://www.VMware.com/pdf/Perf_Best_Practices_vSphere4.0.pdf (page 15)
• www.VMware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf (page 20)
Microsoft recommends disabling hyper-threading for production
Exchange implementations
• http://technet.microsoft.com/en-us/library/dd346699.aspx#Hyper
• “Hyper-threading causes capacity planning and monitoring challenges, and as
a result, the expected gain in CPU overhead is likely not justified…”
“So, do I or don’t I?”
YES! Enable hyper-threading to take advantage of ESXi’s
intelligent scheduler, but size to the capability of physical cores
Sizing Exchange VMs to the NUMA Node
[Diagram: a host with two NUMA nodes, each with 4 cores and 32 GB of memory, joined by the NUMA interconnect. A 4 vCPU/32 GB VM fits within a single NUMA node; a 6 vCPU/48 GB VM reaches beyond one NUMA node and must cross the interconnect.]
NUMA Considerations
The following recommendations should be followed whenever possible to
ensure the best performance on systems that support NUMA:
• Make sure NUMA is enabled in the BIOS. On many systems this is achieved by disabling Node Interleaving (typically the default)
• When possible, size Exchange virtual machines so that the number of vCPUs and the amount of memory do not exceed the cores and memory of a single NUMA node
• The size of a NUMA node is not always the number of cores in a socket; for example, AMD Magny-Cours packages place two six-core dies in a single socket
• Avoid using CPU affinity features in vSphere, as this can circumvent the NUMA architecture and reduce performance
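As a concrete check of the sizing rule above, using the four-core/32 GB NUMA node from the earlier diagram (values are illustrative; substitute your host's actual topology):

$coresPerNode = 4; $memPerNodeGB = 32   # per-NUMA-node resources (hypothetical host)
$vmVcpus = 4;      $vmMemGB = 32        # proposed Exchange VM size
$fitsInNode = ($vmVcpus -le $coresPerNode) -and ($vmMemGB -le $memPerNodeGB)
"VM fits within a single NUMA node: $fitsInNode"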
Virtual Memory
Best Practices
• Avoid memory over-commitment
• Use memory reservations only to avoid over-commitment, to guarantee available physical memory, or to reclaim VM swap file space
• “Right-size” the configured memory of a VM
• More memory means fewer IOs, but only slightly. Test in your environment and fine-tune
• Check “Unlimited” in resource allocation, or set the limit higher than the configured memory
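Where a reservation is warranted, it can be set with PowerCLI; a minimal sketch, assuming a connected vCenter session and a hypothetical mailbox VM configured with 128 GB:

# Reserve the full configured memory for a mailbox VM (server and VM names are hypothetical).
Connect-VIServer -Server vcenter.example.com
Get-VM -Name "ex-mbx-01" |
  Get-VMResourceConfiguration |
  Set-VMResourceConfiguration -MemReservationMB (128 * 1024)   # 128 GB reservation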
Storage: Best Practices
Follow storage vendor recommendations for path policy;
no restrictions like with MSCS
Format NTFS partitions from within the guest with a 64KB
allocation unit size
Verify the partition is aligned
• The partition is aligned when StartingOffset divides evenly by StripeUnitSize (see the check below)
Eagerzeroedthick virtual disks eliminate first write penalty
• Do not use if using array thin provisioning
Set power policy to High Performance
• Or disable power management in BIOS
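The alignment check can be run from within the guest with WMI; a minimal sketch, assuming a 64 KB stripe unit size (confirm the real value with your storage vendor):

$stripeUnitSize = 65536   # assumed 64 KB stripe unit; verify with the storage vendor
Get-WmiObject Win32_DiskPartition |
  Select-Object Name, StartingOffset,
    @{ n = 'Aligned'; e = { ($_.StartingOffset % $stripeUnitSize) -eq 0 } }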
Storage: Common Questions
PVSCSI vs. LSI Logic?
• PVSCSI can provide better throughput with less CPU utilization; test using JetStress and your proposed workload
More virtual SCSI adapters?
• Evenly distribute virtual disks and RDMs across four vSCSI adapters
VMDK or RDM?
• Next slide…
One virtual disk per VMFS or consolidated virtual disks?
• Performance – the VMFS datastore must be sized to accommodate the combined workload
• Management – fewer datastores eases management
When should I use raw-device mappings (RDMs)?
Performance
• Performance is no longer a deciding factor for using RDMs
• VMDK disks perform comparably to RDMs
Capacity
• VMDK files are limited to 2 TB
• Physical mode RDMs support up to 64 TB
Storage Interaction
• Backup solutions may require RDMs due to the storage interaction needed for hardware-based VSS
Considerations
• Easier to exhaust the 255-LUN limit in ESXi
• VMFS volumes can support multiple virtual disks
• vSphere storage features leverage virtual disks
What about NFS and In-guest iSCSI?
NFS
• Not supported for Exchange data (databases or logs) or shared-disk MSCS
configs (it works great, but consider support implications)
• Consider using for guest OS and app data
In-guest iSCSI
• Supported for DAG database storage
• Facilitates easy storage zoning and access masking
• Useful for minimizing number of LUNs zoned to an ESXi host
• Offloads storage processing from the ESXi host to the guest OS
Networking
Best Practices
• vSphere Distributed Switch or Standard vSwitch?
• Choice is yours, but distributed switches require less management overhead
• Separate traffic types:
• Management (vmkernel, vMotion, FT)
• Storage (iSCSI, FCoE)
• Virtual Machine (MAPI, replication, DMZ, etc.)
• Configure vMotion to use multiple NICs to increase throughput
• Allocate at least 2 NICs per virtual switch to leverage NIC teaming capabilities
• Use the VMXNET3 para-virtualized network interface within the guest (see the sketch below)
• Following Microsoft best practices, allocate multiple NICs to Exchange VMs participating in a DAG
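For existing VMs, the adapter type can be changed with PowerCLI; a minimal sketch with a hypothetical VM name (the guest needs VMware Tools for the VMXNET3 driver):

# Switch a VM's NICs to VMXNET3 (hypothetical VM name).
Get-VM -Name "ex-cas-01" | Get-NetworkAdapter |
  Set-NetworkAdapter -Type "Vmxnet3" -Confirm:$false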
Exchange DAG Networking
DAG VMs *should* have two virtual network adapters for
replication and client access (MAPI)
• If separate networks are not possible, use a single virtual NIC
Two common layouts:
• A single vSwitch using VLAN trunking to separate traffic
• Physical separation using multiple vSwitches and physical NICs
vSphere High Availability
Best Practices
Enable HA to protect all VMs in the case of an unexpected host failure
Configure Admission Control policy to ensure failover capacity is available
Enable VM Monitoring to restart non-responsive VMs
Protect database availability groups with vSphere HA
• Deploy vSphere clusters as N+1, where N
is the number of nodes in a DAG
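A minimal PowerCLI sketch of the HA settings above, assuming a connected vCenter session; the cluster name is hypothetical (VM Monitoring and the admission control policy details are set in the vSphere Client):

# Enable vSphere HA with admission control on the Exchange cluster (name is hypothetical).
Set-Cluster -Cluster "ExchangeCluster" -HAEnabled:$true -HAAdmissionControlEnabled:$true -Confirm:$false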
VMware Distributed Resource Scheduling
Best Practices
Enable DRS in fully automated mode for entire cluster
• Set individual automation levels for one-off needs
When possible keep VMs small
• VMs with less configured memory and fewer vCPUs can be placed by DRS
more efficiently
vSphere cluster members should have compatible CPU models so vMotion can move VMs between hosts
• EVC mode can guarantee host compatibility
Use host and VM groups and affinity rules
DRS VM and Host Groups and Rules
Host DRS Groups – Use to group like hosts; hosts in a rack or
chassis, or for licensing purposes
VM DRS Groups – Use to group like or dependent VMs together;
all CAS/HT VMs, all DAG nodes, etc.
Rules:
• Keep VMs Together – use for non-clustered VMs
• Separate VMs – use for DAG VMs
• VMs to Hosts:
  • Must run on hosts in group
  • Should run on hosts in group
  • Must not run on hosts in group
  • Should not run on hosts in group
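The “Separate VMs” rule for DAG nodes can be scripted with PowerCLI; a sketch with hypothetical cluster and VM names:

# Anti-affinity ("Separate VMs") rule so DAG nodes never share a host.
$cluster = Get-Cluster -Name "ExchangeCluster"
$dagVMs  = Get-VM -Name "ex-mbx-01", "ex-mbx-02"
New-DrsRule -Cluster $cluster -Name "Separate-DAG-Nodes" -KeepTogether $false -VM $dagVMs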
Should Run On Versus Must Run On Rules
“Should run on” rules are preferred to keep like VMs separated across chassis, racks, datacenters, etc. In the case of a failure, VMs can be powered on across the surviving hosts to maintain performance.
“Must run on” rules are preferred when a VM must never run on a particular host or set of hosts. In the case of a failure, VMs will NOT be allowed to run on the specified hosts unless the rule is removed, even if vCenter Server is offline.
Avoid Database Failover during vMotion
When using vMotion with DAG nodes…
If supported at the physical networking layer, enable jumbo frames on all vmkernel ports to reduce the number of frames that must be generated and processed
If jumbo frames are not supported, modify the cluster heartbeat SameSubnetDelay parameter to its maximum of 2000 ms (default = 1000 ms):
  C:\> cluster.exe /cluster:dag-name /prop SameSubnetDelay=2000
  PS C:\> $cluster = Get-Cluster dag-name; $cluster.SameSubnetDelay = 2000
Always dedicate vMotion interfaces for the best performance
Always use multiple vMotion interfaces for increased throughput
Multi-NIC vMotion Support
Configure in the teaming properties of each port group:
• vDS management port group: both vmnics active
• vDS vMotion port group 1: vmnic0 active, vmnic1 standby
• vDS vMotion port group 2: vmnic0 standby, vmnic1 active
Backups
Virtual Machine Backups
• Requires an Exchange-aware agent to ensure a supported Exchange backup
• Large virtual machines (many virtual disks) take longer to snapshot and
commit – may affect DAG heartbeats
Software VSS
• Wide third party adoption
• Flexible configuration support
Hardware VSS
• Storage array level protection, either full clones or snapshots
• Storage vendor provides VSS integration
• Most solutions require physical mode raw-device mappings (unless using in-
guest attached iSCSI)
Site Resiliency – Exchange Native
Deployment may involve two or more sites serving as failover targets for
each other
Passive databases located in primary and secondary sites provide local
high availability and site resilience
“Datacenter switchover” process:
1. Terminate services in the primary site (Stop-DatabaseAvailabilityGroup)
2. Validate dependencies in the second datacenter
3. Activate mailbox servers (Restore-DatabaseAvailabilityGroup)
4. Update DNS records for client access endpoints
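For reference, steps 1 and 3 map to the Exchange 2010 Management Shell cmdlets named above; a minimal sketch, with hypothetical DAG and Active Directory site names:

# Step 1: mark the failed primary site's DAG members as stopped
# (-ConfigurationOnly updates AD when the servers themselves are unreachable).
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite PrimarySite -ConfigurationOnly
# Step 3: activate the surviving members in the secondary site.
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite SecondarySite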
Site Resiliency – Site Recovery Manager
Deployment may involve two or more sites serving as failover targets for
each other
Passive databases at the primary site provide high availability, storage
replication provides site resiliency
SRM failover process:
1. Initiate the SRM recovery plan (also updates DAG node IP addresses)
2. Configure the DAG IP address and witness server (may be part of the SRM recovery plan)
3. Update DAG and DAG node A records (may be part of the SRM recovery plan)
4. Update DNS records for client access endpoints (may be part of the SRM recovery plan)
Choosing a Site Resilience Plan
Exchange Native Site Resilience
• Active-active and Active-passive deployments supported
• Manual activation during site failure
• Requires application specific site recovery procedure
• Testing the recovery procedure requires failing over production services
Site Recovery Manager
• Active-active and Active-passive deployments supported
• Manual activation during site failure
• Application independent recovery
• Testing of recovery plans can be done without impacting production
Takeaways…
100% support from VMware and Microsoft
Virtualization doesn’t change the compute requirements for Exchange
Design the Exchange environment based on requirements, base vSphere design on Exchange design
Visit our Exchange Page on VMware.com
http://www.vmware.com/go/exchange
VMware Communities: Exchange, Domino and RIM