sto7534 vsan day 2 operations (vmworld 2016)

Virtual SAN - Day 2 Operations
Cormac Hogan, VMware, Inc
Paudie O'Riordan, VMware, Inc
STO7534 #STO7534

Upload: cormac-hogan

Post on 07-Jan-2017



CONFIDENTIAL

Disclaimer
This presentation may contain product features that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features discussed or presented have not been determined.

This Session
Virtual SAN has been available since March 2014, almost 2.5 years.
To date, we have over 5,000 VSAN customers.
VMware recognises that dealing with Virtual SAN operations on a day-to-day basis requires more than 2 clicks.
Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting Virtual SAN have become available.
In this session, approaches to common problems that actual Virtual SAN administrators face will be discussed.
We will discuss how various tools and approaches can help you manage your data now that the VMware consultant has left the building.

Agenda
1. Introduction to Session
2. Monitor - Getting the Basics Right
3. Alerting - What Are My Options?
4. Virtual SAN Upgrade
5. Bring It All Together - Handling a Failure (Demo)

Monitoring - Get the Basics Right
vSphere Logging
Virtual SAN Trace Files
ESXi Core Files

Persistent Logging - Challenges with ESXi Boot Devices
vSphere hosts can be deployed on several different types of media, each with drawbacks and advantages: SCSI, SSD, USB, SATADOM
If you are already in production, consider how logging gets laid out
SCSI / SAS / SATA / SSD: VMFS automatically added, scratch located on VMFS
SATADOM: VMFS automatically added, scratch located on VMFS
USB / SD (any capacity): no VMFS, no persistent scratch area, 512 MB RAMDISK instead

[Diagram: boot device layout showing /bootbank, /altbootbank, /store, vmkDiagnostic, VMFS and /scratch (RAMDISK)]
VMware strongly recommends setting up syslog in all cases
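The boot media rules on this slide can be captured in a small lookup. The following is only a minimal sketch for illustration, not a VMware tool or API; the function names and the media labels used as keys are hypothetical.

```python
# Illustrative lookup of the boot-media rules listed on the slide.
# All names here are hypothetical; this is not a VMware API.

PERSISTENT_MEDIA = {"SCSI", "SAS", "SATA", "SSD", "SATADOM"}  # VMFS is added automatically
RAMDISK_MEDIA = {"USB", "SD"}                                 # no VMFS, regardless of capacity


def scratch_location(boot_media: str) -> str:
    """Return where /scratch ends up for a given boot device type."""
    media = boot_media.strip().upper()
    if media in PERSISTENT_MEDIA:
        return "VMFS (persistent)"
    if media in RAMDISK_MEDIA:
        return "512 MB RAMDISK (not persistent)"
    raise ValueError(f"unknown boot media: {boot_media!r}")


def scratch_is_persistent(boot_media: str) -> bool:
    """True when logs written to /scratch survive a reboot."""
    return scratch_location(boot_media).startswith("VMFS")


for media in ("SSD", "SATADOM", "USB", "SD"):
    print(f"{media:8s} -> {scratch_location(media)}")
```

Syslog is recommended in all cases; a helper like `scratch_is_persistent` would only highlight the hosts where local logs are certain to be lost on reboot.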

SD/USB: with a 4GB boot device, 2.2GB of the USB is set aside for the core dump. Before vSphere 5.5, the vmkcore partition was only 100MB in size.

Virtual SAN Trace Files
Provide extremely low-level logging for VSAN
VSAN traces require ~500MB of disk space
Majority of traces are in binary format
Persisted to VMFS or NFS if available
VSAN datastore does not support log redirection at this time
Stored on RAMDISK if no persistent storage is available
  In case of reboot, the most recent/important VSAN traces are persisted to the store partition
  In case of crash, VSAN traces are persisted to the diagnostic partition
Since Virtual SAN 6.2, urgent trace files can be redirected to a syslog target

[Diagram: boot device layout showing /bootbank, /altbootbank, /store, vmkDiagnostic, VMFS and /scratch (RAMDISK)]

Since these traces are of extreme importance to VMware support, extra efforts are made to preserve them when /scratch is not on persistent storage. In these cases, when the ESXi host is booted from SD/USB and the VSAN traces are on a RAMdisk, they also get copied to /locker for persistence via /etc/init.d/vsantraced when the host reboots. Since /locker is relatively small, typically not all of the VSAN trace files will fit. To accommodate this, they are saved in value order so that the most recent/significant information is captured first.
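The "most recent first" idea in this note can be illustrated with a short sketch. This is not how vsantraced is implemented; the file names, sizes and space budget are made-up values, and the selection rule (newest first, skipping any file that no longer fits) is an assumption chosen purely for illustration.

```python
from typing import List, NamedTuple


class TraceFile(NamedTuple):
    name: str      # hypothetical file name
    mtime: int     # last-modified time; larger means newer
    size_mb: int   # size in MB


def pick_traces_to_persist(files: List[TraceFile], budget_mb: int) -> List[str]:
    """Choose trace files newest-first, skipping any that no longer fit."""
    chosen = []
    remaining = budget_mb
    for f in sorted(files, key=lambda t: t.mtime, reverse=True):
        if f.size_mb <= remaining:
            chosen.append(f.name)
            remaining -= f.size_mb
    return chosen


# Made-up example: four 45 MB trace files, but room for only 100 MB.
traces = [
    TraceFile("trace-a", mtime=100, size_mb=45),
    TraceFile("trace-b", mtime=200, size_mb=45),
    TraceFile("trace-c", mtime=300, size_mb=45),
    TraceFile("trace-d", mtime=400, size_mb=45),
]
print(pick_traces_to_persist(traces, budget_mb=100))
```

With these made-up numbers only the two newest files are kept, which mirrors the point of the note: when space is short, the oldest traces are the ones that are lost.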

When VSAN trace files are being written to a RAMdisk, they should also be persisted on a PSOD. This can be verified by the command esxcli system visorfs ramdisk list.

A common question is why we do not just persist the VSAN traces to the SD/USB device rather than doing this step. Again, it is due to the bandwidth of the VSAN trace files. The concern is that the number of writes generated by VSAN traces, and there are a lot of them, can burn out a USB/SD card.

DOM and CMMDS use vmkernel.log only for very important messages, and usually don't publish to the vmkernel logs.

There are two types of VSAN traces: urgent and normal. Urgent traces are supposed to be 1/10 as chatty as normal traces. vsanUrgent.log is that "urgent trace channel". It was introduced in 6.2 to give Log Insight and other aggregators access to more events from DOM/CMMDS.

ESXi Core Dump Partition
Special partition used in case of a diagnostic crash
2.2GB of space set aside for the memory dump
Ensures the full memory dump gets written to persistent media
ESXi hosts with less than 512GB physical memory
Use SAS/SATA, SATADOM, or the vSphere ESXi Network Dump Collector if no suitable persistent media is available

[Diagram: boot device layout showing /bootbank, /altbootbank, /store, vmkDiagnostic and /scratch (RAMDISK)]

SD/USB: with a 4GB boot device, 2.2GB of the USB is set aside for the core dump. Before vSphere 5.5, the vmkcore partition was only 100MB in size. Size is irrelevant for SSD.

Alerting - What Are My Options?
vSphere Built-In
vRealize Operations
vRealize Log Insight

vSphere Built-In
vSphere native alerting
  70+ Virtual SAN health alarms
  Many more vSphere alarms
  Alert via SNMP / SMTP
Create custom alarms
  Use VMware ESXi VOBs or Observation IDs for VSAN
Virtual SAN Management API 6.2 interface for bespoke solutions

VMware ESXi Observation IDs for Virtual SAN
Each VOB event is associated with an identifier (ID). Before you create a Virtual SAN alarm in the vCenter Server, you must identify an appropriate VOB ID for the Virtual SAN event for which you want to create an alert. You can create alerts in the VMware ESXi Observation Log file (vobd.log).

To review the list of VOB IDs for Virtual SAN, open the vobd.log file located on your ESXi host in the /var/log directory. The log file contains the following VOB IDs that you can use for creating Virtual SAN alarms.
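One way to shortlist candidate VOB IDs is to scan a copy of vobd.log for them. The sketch below is a rough illustration only: the regular expression assumes that Virtual SAN VOB IDs are dotted identifiers beginning with esx.audit, esx.clear or esx.problem and containing a "vsan" segment, and the sample log lines are invented to exercise the pattern rather than copied from a real host.

```python
import re
from collections import Counter

# Assumption: VSAN VOB IDs look like esx.problem.vsan.net.redundancy.lost,
# i.e. an esx.audit / esx.clear / esx.problem prefix followed by dotted
# segments, one of which is exactly "vsan".
VSAN_VOB_ID = re.compile(
    r"\besx\.(?:audit|clear|problem)\.(?:[a-z0-9]+\.)*vsan(?:\.[a-z0-9]+)+"
)


def count_vsan_vob_ids(lines):
    """Count how often each VSAN-related VOB ID appears in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(VSAN_VOB_ID.findall(line))
    return counts


# Invented sample lines; a real vobd.log will be laid out differently.
SAMPLE_LINES = [
    "sample-1 [esx.problem.vsan.net.redundancy.lost] text",
    "sample-2 [esx.problem.vob.vsan.lsom.diskerror] text",
    "sample-3 [esx.problem.vsan.net.redundancy.lost] text",
    "sample-4 [esx.audit.net.firewall.enabled] text",
]

for vob_id, seen in sorted(count_vsan_vob_ids(SAMPLE_LINES).items()):
    print(f"{seen:3d}  {vob_id}")
```

On a real system the same function could be fed the lines of a copy of /var/log/vobd.log; the IDs it reports are then candidates for custom vCenter alarms.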

vRealize Operations + Log Insight

Virtual SAN awareness with the Storage Management Pack
  Virtual SAN dashboards and heat maps
  Host and device statistics
  Health alerts

Log Insight also has Virtual SAN awareness
  Virtual SAN content pack
  Log aggregation from Virtual SAN nodes
  Integration with vROps alerting

Virtual SAN Upgrade
Prerequisites
Workflow
Monitoring
Gotchas

Upgrade Overview
Virtual SAN 6.2 has a new on-disk format for disk groups and exposes new data services
Upgrades are performed in multiple phases
  Phase 1: Upgrade to vSphere 6.0 U2
  Phase 2: Object and disk format conversion (DFC)
[Diagram labels: Phase 1, Phase 2; Virtual SAN 6.2; rvc > vsan.v2_ondisk_upgrade; Cluster: Manual Mode]
But before you begin, Phase 0: Validate your current environment

Phase 1: Fresh deployment or upgrade to vSphere 6.0 U2
  vCenter Server
  ESXi hypervisor
  Apply critical patches*

Phase 2: Disk format conversion (DFC)
  Prechecks
  Object conversion
  Reformat disk groups

Phase 0 - Please Read Before You Start
Virtual SAN 6.2 Release Notes
VMware Product Interoperability
VMware Virtual SAN Hardware
  Server, controller, SSD, disk on HCL
  Controller firmware, disk firmware, controller driver, enclosure firmware

The Disk Format Conversion (DFC) phase is where the VMFS-L disk format is replaced by VirstoFS on all participating magnetic devices.

What happens during the disk reformat phase?
All nodes should have completed their software upgrade (ESXi 6.0 U2, VSAN 2.0 cluster)
Operates on one node and one disk group at a time; must be orchestrated at cluster level as objects get a 1 MB address space and get aligned to 4K
Node --> disk group --> data evacuation --> reformat disks --> disk group comes online
The above flow repeats for the remaining disk groups in the node, and then the process jumps to the next node.
No VSAN node with ESXi 5.5.x software is allowed to join the VSAN 2.0 cluster
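The rolling order described in these notes (one node at a time, one disk group at a time) can be sketched as a simple nested loop. This is only an illustration of the ordering, not the actual upgrade tooling; the host and disk group names are hypothetical.

```python
from typing import Dict, List


def plan_disk_format_conversion(cluster: Dict[str, List[str]]) -> List[str]:
    """Return the ordered steps: one node, then one disk group, at a time."""
    steps = []
    for node, disk_groups in cluster.items():
        for dg in disk_groups:
            # Each disk group is fully evacuated, reformatted and brought
            # back online before the next one is touched.
            steps.append(f"{node}/{dg}: evacuate data")
            steps.append(f"{node}/{dg}: reformat disks")
            steps.append(f"{node}/{dg}: disk group online")
    return steps


# Hypothetical inventory: two hosts, three disk groups in total.
cluster = {"esxi-01": ["dg-1", "dg-2"], "esxi-02": ["dg-1"]}
for step in plan_disk_format_conversion(cluster):
    print(step)
```

Because every disk group is evacuated before it is reformatted, the cluster needs enough free capacity elsewhere to hold that data while the conversion is in flight.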

Phase 1 - Upgrading from Virtual SAN 5.5
You can upgrade from VSAN 5.5 to VSAN 6.X. However, patching is critical.
During upgrade, some older releases of vSphere 5.5 may cause VMware Virtual SAN data unavailability and instability.
Make sure all critical patches are installed prior to upgrade.
Not an issue between VSAN 6.0 and VSAN 6.X
For more details, please read VMware KB 2113024 and VMware KB 2139969

Upgrading from 5.5 EP06 or 5.5 P04 to vSphere 6.0 GA can cause VMware Virtual SAN Data Unavailability (2113024).
Resolved with VMware ESXi 5.5, Patch Release ESXi550-201504001 (2112672) and VMware ESXi 5.5, Patch ESXi550-201504201-BG: Updates esx-base (2112675).

Upgrading from ESXi 5.5 to ESXi 6.x in a Virtual SAN cluster can cause permanent loss of data (kb.vmware.com/kb/2139969).
The cluster is mixed between ESXi host versions 5.5 and 6.0, such as during the upgrade of a cluster.
A VSAN object is reconfigured while the cluster is in a mixed state.
Resolved with VMware ESXi 5.5, Patch Release ESXi550-201601501 (2141164).

Phase 1 - VSAN Disk Format Conversion Table
Virtual SAN Starting Version | Virtual SAN Target Version | Post-upgrade on-disk format upgrade required? | Version
Virtual SAN 5.5 U1 | Virtual SAN 5.5 Update X | No | -
Virtual SAN 5.5 Update X | Virtual SAN 6.X | Yes | 1.0 to 2.0 / 3.0
Virtual SAN 6.0 | Virtual SAN 6.1 | No | -
Virtual SAN 6.0 or 6.1 | Virtual SAN 6.2 | Yes | 2.0 to 3.0
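For scripting around an upgrade plan, the conversion table can be held as a small lookup. The sketch below simply transcribes the rows above; the dictionary layout, the short version labels used as keys and the function name are hypothetical choices, and any pairing not listed in the table is deliberately rejected rather than guessed.

```python
from typing import Optional, Tuple

# Rows transcribed from the conversion table; keys are (starting, target).
# Values are (on-disk upgrade required?, on-disk format change or None).
CONVERSION_TABLE = {
    ("5.5 U1", "5.5 Update X"): (False, None),
    ("5.5 Update X", "6.X"): (True, "1.0 to 2.0 / 3.0"),
    ("6.0", "6.1"): (False, None),
    ("6.0", "6.2"): (True, "2.0 to 3.0"),
    ("6.1", "6.2"): (True, "2.0 to 3.0"),
}


def on_disk_upgrade(start: str, target: str) -> Tuple[bool, Optional[str]]:
    """Look up whether a post-upgrade on-disk format upgrade is required."""
    try:
        return CONVERSION_TABLE[(start, target)]
    except KeyError:
        # Do not guess: only the combinations shown in the table are known.
        raise ValueError(f"no table entry for {start} -> {target}") from None


for start, target in CONVERSION_TABLE:
    required, change = on_disk_upgrade(start, target)
    print(f"{start:13s} -> {target:13s} required={required} format={change}")
```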

Starting version: can I go to 6.2?

Phase 1 - vSphere Software Upgrade
Step 1: Upgrade vCenter Server to 6.0 U2
Step 2: Upgrade ESXi hosts to 6.0 U2
Maintenance Mode?
  Ensure accessibility: fast, but with risk
  Full data migration: slower, but no risk


What happens during the disk reformat phase?
All nodes should have completed their software upgrade (ESXi 6.0 U2, VSAN 2.0 cluster)
Operates on one node and one disk group at a time; must be orchestrated at cluster level as objects get aligned to 4K
Node --> disk group --> data evacuation --> reformat MDs with VirstoFS --> disk group comes online
The above flow repeats for the remaining disk groups in the node, and then the process jumps to the next node.
No VSAN node with ESXi 5.5.x software is allowed to join the VSAN 2.0 cluster after starting DFC.

Phase 1 - vSphere Software Health Check GOTCHA
vCenter 6.0 Update 2 installed
Health check will not work when the ESXi version is < 6.0 U2


Phase 1 - vSphere Software Health Check
Software upgraded?
  Check your Virtual SAN health
  Update your HCL database files
  Make sure it's all green

Address any failed tests BEFORE proceeding to the on-disk format upgrade!

Phase 2 - Disk Upgrade Prechecks
All hosts in the cluster are connected to vCenter Server
All hosts upgraded to ESXi 6.0 U2
No network partitions in the VSAN cluster
No hosts with auto-claim storage
No hosts in maintenance mode
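The precheck list lends itself to a simple gate that reports every failing condition before the conversion is started. The sketch below is a hedged illustration only: the `Host` record, its field names and the way cluster state is supplied are all hypothetical and do not correspond to a VMware API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical host record; field names are illustrative only.
    name: str
    connected_to_vcenter: bool = True
    upgraded: bool = True
    auto_claim_storage: bool = False
    in_maintenance_mode: bool = False


def precheck_failures(hosts: List[Host], network_partitioned: bool) -> List[str]:
    """Return one message per failed precheck; an empty list means proceed."""
    failures = []
    if network_partitioned:
        failures.append("cluster has a network partition")
    for h in hosts:
        if not h.connected_to_vcenter:
            failures.append(f"{h.name}: not connected to vCenter Server")
        if not h.upgraded:
            failures.append(f"{h.name}: ESXi not upgraded")
        if h.auto_claim_storage:
            failures.append(f"{h.name}: auto-claim storage enabled")
        if h.in_maintenance_mode:
            failures.append(f"{h.name}: in maintenance mode")
    return failures


# Hypothetical cluster state with one host still in maintenance mode.
hosts = [Host("esxi-01"), Host("esxi-02", in_maintenance_mode=True)]
print(precheck_failures(hosts, network_partitioned=False))
```

Collecting all failures in one pass, rather than stopping at the first, lets an administrator fix everything before retrying the conversion.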


Once all the prechecks are done, CMMDS will not allow 5.5.x hosts to join the cluster.

Phase 2 - Are You Sure?

Phase 2 - Virtual SAN Object and Disk Format Conversion
Two conversion steps

Objects On Disk Format

Version