wp vm top10hamisconfigissues

Upload: ascrivner

Post on 05-Apr-2018

    Expert Reference Series of White Papers

    1-800-COURSES www.globalknowledge.com

    Ten vSphere HA and DRS Misconfiguration Issues

    Copyright 2011 Global Knowledge Training LLC. All rights reserved.

    John Hales, Global Knowledge VMware instructor, A+, Network+, CTT+, MCSE, MCDBA, MOUS, MCT, VCP, VCAP, VCI, EMCSA

    Introduction

    VMware has a popular and powerful virtualization suite in the vSphere and vCenter family of products. This white paper focuses on ten of the biggest mistakes people make when configuring the High Availability (HA) and Distributed Resource Scheduler (DRS) features. We won't rehash what HA and DRS are and how they work. I'll instead refer you to the webinar and white paper I wrote for Global Knowledge, titled Top Five New Features in vSphere 5, and will focus on common configuration mistakes and how to avoid them. We'll begin by looking at five common HA issues, then we'll look at four common DRS issues, then conclude with an issue that affects both HA and DRS. A top-ten list is always subjective, but I'd suggest that these belong among the top mistakes commonly seen in vSphere deployments. Within each section, we'll rank the issues in order from those with the biggest impact to those with the smallest.

    HA Issues

    HA is included in almost every version of vSphere, including one of the small business bundles (Essentials Plus), as the impact of an ESXi host failure is much bigger than the loss of a single server in the traditional world because many virtual machines (VMs) are affected. Thus, it is very important to get HA designed and configured correctly.

    1. Purchasing Differently Configured Servers

    One of the common mistakes people make is buying differently sized servers (more CPU and/or memory in some servers than others) and placing them in the same cluster. This is often done with the idea that some VMs require a lot more resources than others, and the big, powerful servers are more expensive than several smaller servers. The problem with this thinking is that HA is pessimistic and assumes that the largest servers will fail.

    Solution: Either buy servers that are configured the same (or at least similarly) or create a couple of different clusters, with each cluster having servers configured the same. Some people also implement affinity rules to keep the big VMs on designated servers, but this impacts DRS; we'll cover that issue later.
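    To make the cost of mixed server sizes concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which HA must be able to absorb the loss of the single largest host and only memory is considered; the host sizes and the function name reserved_fraction are made up for illustration and are not part of any VMware tool.

```python
def reserved_fraction(host_memory_gb):
    """Share of total cluster memory that must stay free so the
    largest host can fail (simplified, memory-only model)."""
    total = sum(host_memory_gb)
    return max(host_memory_gb) / total


# Hypothetical clusters with the same total memory (384 GB).
mixed = [64, 64, 256]      # two small hosts and one large host
uniform = [128, 128, 128]  # three identical hosts

print(f"mixed cluster reserves   {reserved_fraction(mixed):.0%}")
print(f"uniform cluster reserves {reserved_fraction(uniform):.0%}")
```

    With these made-up numbers, the mixed cluster has to hold back about two thirds of its memory, while the uniform cluster holds back one third.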

    2. Insufficient Hosts to Run All VMs Accounting for HA Overhead

    When budgets are tight, many administrators will size their environments to have sufficient resources to run all the VMs that are needed, but forget to take into account the overhead that HA imposes to guarantee that sufficient resources will exist to restart the VMs on a failed host (or multiple hosts, if you are pessimistic). See the next two configuration issues for guidance on planning for host failures (and thus the overhead for HA).

    http://www.globalknowledge.com/training/whitepaperdetail.asp?pageid=502&wpid=883&country=United+States

    VMware's best practice is to always leave Admission Control enabled so that HA automatically sets aside resources to restart VMs after a host failure. We strongly recommend this as well.

    Solution: Plan for the overhead of HA and purchase sufficient hardware to cover the resources required by the VMs in the environment plus the overhead for HA.
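    A rough way to plan for that overhead is to size for the VM workload first and then add the hosts you want to be able to lose. The Python sketch below assumes identically sized hosts and uses made-up demand figures; hosts_needed and its parameters are hypothetical names used only for this illustration.

```python
import math


def hosts_needed(vm_cpu_ghz, vm_mem_gb, host_cpu_ghz, host_mem_gb,
                 failures_to_tolerate=1):
    """Hosts required to run all VMs plus spare hosts for HA.
    Assumes every host has the same CPU and memory capacity."""
    for_cpu = math.ceil(vm_cpu_ghz / host_cpu_ghz)
    for_mem = math.ceil(vm_mem_gb / host_mem_gb)
    # The scarcer resource decides the base host count.
    return max(for_cpu, for_mem) + failures_to_tolerate


# Hypothetical demand: 90 GHz of CPU and 700 GB of memory,
# on hosts offering 24 GHz and 192 GB each.
print(hosts_needed(90, 700, 24, 192, failures_to_tolerate=0))  # workload only
print(hosts_needed(90, 700, 24, 192, failures_to_tolerate=1))  # with HA spare
```

    The difference between the two results is the hardware that a budget sized only for the running VMs would be missing.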

    3. Using the Host Failures Cluster Tolerates Policy

    Recall that there are three admission control policies, namely:

    Host failures the cluster tolerates: Originally the only option for HA, this policy assumes the loss of a specified number of hosts (one to four in versions 3 and 4, up to 31 in vSphere 5).

    Percentage of cluster resources reserved as failover spare capacity: Introduced in vSphere 4, this option sets aside a specified percentage of both CPU and memory resources from the total in the cluster for failover use; vSphere 5 improved this option by allowing different percentages to be specified for CPU and memory.

    Specify failover hosts: This policy specifies a standby host that runs all the time, but is never used for running VMs unless a host in the cluster fails. It was introduced in vSphere 4 and upgraded in version 5 by allowing multiple hosts to be specified.

    As described previously, HA is pessimistic and always assumes the largest host will fail, so the Host failures the cluster tolerates policy reserves more resources than are usually needed if the hosts are sized differently (though, per issue one, we don't recommend that).

    This policy also uses a concept called "slots" to reserve the right amount of spare capacity, but it assumes a "one size fits all" policy in this regard and uses the largest CPU reservation and the largest memory reservation of any VM as the slot size for all VMs, thus wasting additional resources unless all VMs are sized the same.

    Solution: Use the VMware recommended policy of Percentage of cluster resources reserved as failover spare capacity instead, which will take a percentage of the entire cluster's resources and use actual reservations on each VM instead of using the largest reservation.
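    The slot arithmetic described above can be sketched in a few lines of Python. This is a simplified model, not VMware's exact algorithm: it ignores per-VM memory overhead and any default minimum slot size, and it assumes every VM has a non-zero reservation. The host and VM figures, and the function names, are made up for illustration.

```python
def slot_policy(hosts, vms, host_failures=1):
    """hosts: list of (cpu_mhz, mem_mb) capacities.
    vms: list of (cpu_reservation_mhz, mem_reservation_mb).
    Returns (slot_cpu, slot_mem, usable_slots)."""
    # One slot size for everyone: the largest reservation of each kind.
    slot_cpu = max(cpu for cpu, _ in vms)
    slot_mem = max(mem for _, mem in vms)
    # A host holds as many slots as its scarcer resource allows.
    slots = sorted(min(cpu // slot_cpu, mem // slot_mem) for cpu, mem in hosts)
    # Pessimistic: assume the hosts with the most slots are the ones that fail.
    usable = sum(slots[: len(slots) - host_failures])
    return slot_cpu, slot_mem, usable


def actual_reservations(vms):
    """What a policy based on real per-VM reservations would count."""
    return sum(cpu for cpu, _ in vms), sum(mem for _, mem in vms)


hosts = [(20000, 65536)] * 3
vms = [(500, 1024), (500, 1024), (2000, 8192), (1000, 2048)]

slot_cpu, slot_mem, usable = slot_policy(hosts, vms)
print("slot size:", slot_cpu, "MHz /", slot_mem, "MB; usable slots:", usable)
print("counted by slots:", len(vms) * slot_cpu, "MHz /", len(vms) * slot_mem, "MB")
print("actual reservations:", actual_reservations(vms))
```

    In this example one large VM sets the slot size, so the four VMs are counted as if they reserved twice the CPU and well over twice the memory they actually reserve.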

    4. Forgetting to Update the Percentage Admission Control Policy as Cluster Grows

    If the Percentage of cluster resources reserved as failover spare capacity policy is used (as suggested), it is important to reserve the correct amount of CPU and memory based on the needs of the VMs and the size of the cluster. For example, in a two-node cluster, the loss of one of the nodes removes half of the cluster resources (assuming they are sized the same). Thus, the percentage may be set to 50. However, if additional nodes are added to the cluster later, that value is probably too high and should be reduced to take into account the additional node(s) and the number of simultaneous failures expected (for example, with four nodes, the loss of one node suggests that the percentage be set to 25, while if two failures are expected, then 50 percent should be used).

    Solution: Go back and recalculate the appropriate value in your cluster whenever hosts are added to or removed from the cluster.
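    The recalculation follows directly from the examples above. A minimal Python sketch, assuming equally sized hosts and rounding up to a whole percent (failover_percentage is a hypothetical helper name, not a vSphere setting):

```python
def failover_percentage(host_count, failures_to_tolerate=1):
    """Percentage of cluster resources to reserve so that the given
    number of equally sized hosts can fail. Rounds up to a whole percent."""
    if not 0 < failures_to_tolerate < host_count:
        raise ValueError("failures must be at least 1 and fewer than the host count")
    # Integer ceiling of 100 * failures / hosts.
    return (100 * failures_to_tolerate + host_count - 1) // host_count


print(failover_percentage(2))      # two nodes, one failure
print(failover_percentage(4))      # four nodes, one failure
print(failover_percentage(4, 2))   # four nodes, two failures
```

    Rerunning the calculation each time the host count changes keeps the reserved percentage from drifting too high as the cluster grows or too low as it shrinks.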

    5. Configuring VM Restart Priorities Inefficiently

    One of the settings that can be set in an HA cluster is the default restart priority of VMs after a host failure. This defaults to Medium, but can be set to Low, Medium, or High, or to Disabled if most VMs should not be restarted after a host failure. The default is okay if all of the VMs are the same priority, or if there are three priorities of VMs (low, medium, and high) and most are medium (normal). However, if you have normal, high, and very high, this might not work as well.

    Solution: Consider setting the cluster default for restart priority to Low, enabling two higher levels for VMs. For example, maybe infrastructure VMs such as domain controllers or DNS servers are the highest priority (setting those VMs to High), followed by critical services, such as database or e-mail servers (setting those VMs to Medium), and then the rest of the VMs will be at the default (Low). Any VMs that don't need to be restarted can be set to Disabled to save resources after a host failure.
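    The effect of a Low cluster default with per-VM overrides can be sketched as follows. This Python example only models the ordering by priority level; the VM names are made up, and the alphabetical tie-break within a level is an assumption added to make the output deterministic, not a description of how HA orders VMs of equal priority.

```python
RESTART_RANK = {"High": 0, "Medium": 1, "Low": 2}


def restart_plan(vm_overrides, cluster_default="Low"):
    """vm_overrides maps VM name -> priority override, or None to
    inherit the cluster default. Returns VM names in restart order,
    leaving out VMs whose restart priority is Disabled."""
    effective = {vm: prio or cluster_default for vm, prio in vm_overrides.items()}
    candidates = [vm for vm, prio in effective.items() if prio != "Disabled"]
    return sorted(candidates, key=lambda vm: (RESTART_RANK[effective[vm]], vm))


vms = {
    "dc01": "High",       # domain controller
    "dns01": "High",      # DNS server
    "sql01": "Medium",    # database server
    "mail01": "Medium",   # e-mail server
    "web01": None,        # inherits the cluster default (Low)
    "test01": "Disabled", # never restarted
}
print(restart_plan(vms))
```

    Only the handful of infrastructure and critical VMs need explicit settings; everything else falls through to the cluster default.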

    DRS Issues

    The DRS feature is a little more advanced than HA but, for all but the smallest environments, no less important, as performance across many VMs in a dynamic environment is an ever-present concern and one that could easily consume one or more administrators full time without it. Leverage DRS for an environment that will run as smoothly as possible.

    6. Not Preparing for New Hardware

    One of the biggest changes many administrators don't plan for is new hardware (new CPUs with more advanced capabilities or switching CPU vendors between Intel and AMD). The problem here is that you end up with islands of vMotion compatibility, where VMs, once started, can only be moved to some of the other servers in the cluster. This severely limits what DRS can do to load balance the cluster.

    Solutions: This issue has several solutions:

    Build separate clusters for AMD and Intel (if you have both CPU architectures; better yet, stick with a single CPU vendor) to solve the CPU vendor issue.

    Always enable Enhanced vMotion Compatibility (EVC) on every cluster so that, as new nodes are added, they will be dumbed down to the level of your existing hosts.

    As old hosts are removed from a cluster, remember to upgrade the EVC level to expose the capabilities of the new hosts added to the cluster. The setting must always be set to the lowest CPU type in the cluster.
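    The "lowest CPU type in the cluster" rule amounts to taking a minimum over an ordered list of CPU generations. The Python sketch below assumes an illustrative oldest-to-newest ordering of Intel generation names; treat that list as a placeholder rather than an authoritative list of EVC baselines, and check the modes your vCenter version actually offers. The function name is hypothetical.

```python
# Assumed oldest-to-newest ordering, for illustration only.
CPU_GENERATIONS = ["Merom", "Penryn", "Nehalem", "Westmere", "Sandy Bridge"]


def highest_usable_evc_level(host_generations):
    """Return the newest level every host in the cluster can support,
    which is simply the oldest generation present."""
    return min(host_generations, key=CPU_GENERATIONS.index)


cluster = ["Penryn", "Westmere", "Westmere"]
print(highest_usable_evc_level(cluster))

# After the old Penryn host is retired, the level can be raised.
cluster.remove("Penryn")
print(highest_usable_evc_level(cluster))
```

    Running a check like this whenever a host is retired is a reminder that the EVC level can be raised once the oldest CPU generation has left the cluster.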

    7. "I'm Smarter than DRS" Mentality

    This is a common mentality for administrators who are new to using DRS: they don't trust DRS or want to know where VMs are all the time. I once had a student who said that his security department mandated documentation of which host each VM was located on, which is very silly in a virtual environment that is designed to be very dynamic. In other cases, administrators think they are smarter than DRS and can better balance the load.

    Solution: Let DRS run in Fully Automated mode. You are not smarter than DRS. There are just too many VMs to watch, and you can't always watch them, but DRS will check on the load balance of the cluster every five minutes and will automatically load balance as conditions change.

    8. Setting the Migration Threshold Too Aggressively

    One of the mistakes new administrators often make with DRS is that they set the Migration Threshold too aggressively. This value goes on a five-point scale from Conservative to Aggressive. Conservative only implements Priority 1 (five-star) recommendations, namely: the host is going into maintenance mode, reservations on the host exceed the host's capacity, or affinity rules are violated. Priorities 2 through 5 (four- to one-star recommendations, respectively) take performance into account by using higher priority recommendations when the cluster is more out of balance and lower priority recommendations when the difference between nodes is less. Many administrators think that they want to be as aggressive as possible to be as balanced as possible, but remember that there is a trade-off between being perfectly balanced and the cost of achieving that balance; in other words, the cost of vMotion. Doing too much vMotion may actually cost more than the benefits of being perfectly balanced.

    Solution: Set the threshold to the mid-point (Priority 3) unless the load is fairly static. Analyze cluster performance and recommendations and adjust as necessary.
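    One way to picture the five-point scale is as a filter: a threshold of n lets through recommendations of Priority 1 through n. The Python sketch below models only that filtering, using made-up recommendations; it does not reproduce how DRS generates or ranks recommendations, and the function name is hypothetical.

```python
def applied_recommendations(recommendations, threshold):
    """recommendations: list of (priority, description) tuples, where
    priority 1 is the most urgent. threshold: 1 (Conservative) to
    5 (Aggressive). Returns the recommendations that would be applied."""
    if threshold not in range(1, 6):
        raise ValueError("threshold must be between 1 and 5")
    return [rec for rec in recommendations if rec[0] <= threshold]


# Hypothetical recommendations for illustration.
recs = [
    (1, "evacuate host entering maintenance mode"),
    (2, "move VM off heavily loaded host"),
    (3, "move VM to even out moderate imbalance"),
    (5, "move VM for a marginal balance gain"),
]

for level in (1, 3, 5):
    print(level, [desc for _, desc in applied_recommendations(recs, level)])
```

    At the mid-point, the marginal Priority 5 move is skipped, which is exactly the kind of vMotion whose cost may outweigh its benefit.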

    9. Non-optimal Sizing of Clusters

    A cluster (HA or DRS) can have up to 32 nodes in it, but just because you can doesn't mean you should.

    Very small clusters give DRS few options for load-balancing and often incur higher overhead by HA, reducing the available capacity to run VMs.

    On the other hand, very large clusters may be fine from an HA perspective. If you are running vSphere 5, there is one master node and the rest are slaves, but any slave can be promoted to master if the master fails, so large cluster sizes are okay. If you are running version 4 or below, however, you may wish to use a smaller cluster size, as there are a maximum of five primary nodes, with the other nodes being secondary nodes, and secondary nodes are usually not automatically promoted to primary if a primary fails. This is important because, if all primary nodes are down, HA will not automatically restart anything.

    DRS clusters are another matter, however. The problem is that the larger the cluster and the more VMs in the cluster, the more possible scenarios vCenter has to analyze, dramatically increasing the load on that server.

    Solution: Many experts recommend putting between 16 and 24 hosts in a cluster as a good balance between the reduced overhead for HA and the increased load on vCenter for DRS. If you will be using linked clones, such as with View, the maximum cluster size is eight nodes.

    HA and DRS

    Finally, there's an issue that affects both HA and DRS. Optimizing VMs through the proper use of reservations, limits, and shares will be a more time-consuming and challenging task than many of those previously listed, but will pay dividends day in and day out.

    10. Overuse of Reservations, Limits, and Affinities

    One of the powerful features in vSphere is the ability to guarantee a certain level of resources (via reservations) or to cap consumption (via limits) for VMs. While this can be done, it reduces the options that HA and DRS have in load-balancing and restarting VMs. Affinity rules, while convenient and sometimes necessary for HA, performance, or licensing reasons, add even more constraints to HA and DRS.

    Solution: Use shares whenever possible instead of using reservations and limits, and minimize the use of affinity (VM-to-VM, as well as VM-to-Host) rules to give HA and DRS the most possible options. If limits and reservations are needed, implement them at the resource pool level whenever possible instead of at the individual VM level.
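    Shares are less restrictive than reservations and limits because they only matter under contention, when they divide the contested resource proportionally. A minimal Python sketch of that proportional split, assuming every VM wants more than it receives and no reservations or limits are set (the VM names and share values are made up, and split_by_shares is a hypothetical helper):

```python
def split_by_shares(capacity_mhz, shares):
    """Divide contested CPU capacity among VMs in proportion to their
    shares. Models full contention only: every VM is assumed to want
    more than it is given, with no reservations or limits in play."""
    total_shares = sum(shares.values())
    return {vm: capacity_mhz * s / total_shares for vm, s in shares.items()}


# Hypothetical host with 6000 MHz being fought over by three VMs.
allocation = split_by_shares(6000, {"db": 2000, "web": 1000, "batch": 1000})
for vm, mhz in allocation.items():
    print(f"{vm}: {mhz:.0f} MHz")
```

    Because nothing is set aside in advance, HA and DRS remain free to place or restart these VMs on any host with capacity.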

    Learn More

    Learn more about how you can improve productivity, enhance efficiency, and sharpen your competitive edge through training.

    VMware vSphere: Fast Track [V5.0]

    VMware vSphere: Install, Configure, Manage [V5.0]

    VMware vSphere: What's New [V5.0]

    Visit www.globalknowledge.com or call 1-800-COURSES (1-800-268-7737) to speak with a Global Knowledge training advisor.

    Related VMware Certifications

    VMware Certified Professional 5 (VCP5)

    About the Author

    John Hales, VCP, VCAP, VCI, is a VMware instructor at Global Knowledge, teaching all of the vSphere and View classes that Global Knowledge offers. John is also the author of many books (including a book on vSphere 5: Professional vSphere 5: Implementation and Management), from involved technical books from Sybex, to exam preparation books, to many quick reference guides from BarCharts, in addition to custom courseware for individual customers. John has various certifications, including the VMware VCP (3, 4, and 5), VCAP, and VCI; the Microsoft MCSE, MCDBA, MOUS, and MCT; the EMC EMCSA (Storage Administrator for EMC Clariion SANs); and

    http://www.globalknowledge.com
    http://www.globalknowledge.com/training/certification_listing.asp?pageid=12&certid=686&country=United+States