

Post on 22-Jul-2022


Page 1: vSAN Trends from the Trenches: What Our Largest Global

#vmworld

HCI2049BU

vSAN Trends from the Trenches: What Our Largest Global Customers Are Doing

David Boone, VMware, Inc.

Dave Morera, VMware, Inc.

#HCI2049BU

VMworld 2019 Content: Not for publication or distribution

Page 2: vSAN Trends from the Trenches: What Our Largest Global

©2019 VMware, Inc.

Disclaimer

This presentation may contain product features or functionality that are currently under development.

This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery.

Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined.


The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.

Page 3: vSAN Trends from the Trenches: What Our Largest Global

Agenda

• Trends from Global Customers
• How these trends are changing the way they do business
• How these trends change requirements
• Tips, tricks, and lessons learned

Page 4: vSAN Trends from the Trenches: What Our Largest Global

Overview

vSAN Success in Global Enterprises

Page 5: vSAN Trends from the Trenches: What Our Largest Global

Variety of Industries

Our team interacts with our largest customers

• Financials / Insurance
• Healthcare
• Automotive
• Communications
• Logistics

Page 6: vSAN Trends from the Trenches: What Our Largest Global

Key Areas of vSAN Design

• APPLICATION WORKLOADS
• SERVERS (CPU, MEMORY, NICS)
• DISK SUBSYSTEMS
• NETWORKING
• BC/DR/AVAILABILITY REQUIREMENTS
• ADVANCED FEATURES PROS AND CONS
• COST VS. EVERYTHING

Page 7: vSAN Trends from the Trenches: What Our Largest Global

Trend 1: Tier 1 Applications

Proper design and sizing

Page 8: vSAN Trends from the Trenches: What Our Largest Global

Design and Sizing

Tier 1 Applications

Overview

vSAN Clusters for Tier 1
• Not all workloads are equal
• Expect high performance
• High Availability

vSAN design is key
• Best practices

Problems Observed

Not sizing properly
• SPBM
• Growth

Workload Profiles
• Read/write ratio
• Peak metrics
• Random vs. Sequential

Latency vs. IOPS

Page 9: vSAN Trends from the Trenches: What Our Largest Global

Application Workloads – Prepare Yourself

Understand vSAN’s Distributed Object Model and how vSAN achieves Availability

Availability Branching

Page 10: vSAN Trends from the Trenches: What Our Largest Global

Application Workloads

Build generic when possible

Application characteristics
• Read and write I/O ratios
• Average read and write I/O sizes
• Average and peak IOPS and throughput
• Application-level replication?

Define “success”
• Latency < “x” ms
• Write throughput > “y” MB/sec
• Data remains available with “x” simultaneous failures in cluster
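One way to make the "define success" step concrete is to write the three targets down and compare them against measured results. The sketch below is only an illustration: the class names, field names, and sample numbers are hypothetical and do not come from the session.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SuccessCriteria:
    """Targets agreed on before the design starts (placeholder values only)."""
    max_latency_ms: float       # the slide's latency "x"
    min_write_mb_per_s: float   # the slide's write throughput "y"
    failures_to_survive: int    # simultaneous failures with data still available


@dataclass
class Measured:
    """Results from a proof of concept or benchmark run (hypothetical fields)."""
    latency_ms: float
    write_mb_per_s: float
    failures_survived: int


def unmet_criteria(target: SuccessCriteria, result: Measured) -> List[str]:
    """Return the names of the criteria that the measured result does not meet."""
    unmet = []
    if not result.latency_ms < target.max_latency_ms:
        unmet.append("latency")
    if not result.write_mb_per_s > target.min_write_mb_per_s:
        unmet.append("write throughput")
    if result.failures_survived < target.failures_to_survive:
        unmet.append("availability")
    return unmet
```

An empty list means every stated criterion was met; anything else names what still needs design attention.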

Page 11: vSAN Trends from the Trenches: What Our Largest Global

Workload Profile

Live Optics

Page 12: vSAN Trends from the Trenches: What Our Largest Global

Servers

Sizing and Performance

CPU
• Clock speed more important than cores to vSAN
• Plan 10% overhead for standard vSAN, 35% with Deduplication & Compression
• Intel vs. AMD

Memory
• Plan 10% overhead for standard vSAN, 35% with Deduplication & Compression
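The planning figures above translate into simple arithmetic. The sketch below assumes the percentage is taken as a flat share of a host's total CPU or memory; the function name and the 512 GB example are hypothetical.

```python
def plan_vsan_overhead(total: float, dedup_compression: bool = False) -> dict:
    """Split a host resource (e.g. GB of RAM or GHz of CPU) into vSAN overhead and the rest.

    Uses the planning figures from the slide: 10% for standard vSAN,
    35% with Deduplication & Compression. Treating the figure as a flat
    percentage of the total is an assumption made for illustration.
    """
    pct = 35 if dedup_compression else 10
    return {
        "overhead_pct": pct,
        "reserved": total * pct / 100,
        "available": total * (100 - pct) / 100,
    }


if __name__ == "__main__":
    # Hypothetical host with 512 GB of memory.
    print(plan_vsan_overhead(512))
    print(plan_vsan_overhead(512, dedup_compression=True))
```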

Page 13: vSAN Trends from the Trenches: What Our Largest Global

Disk Subsystems

2 disk groups per host
• Additional controller if > 2 DGs

Key vSAN performance metric in OEM stats is “4KB Random Write IOPS”

800GB SSDs for vSAN cache
• Resyncs can use it – faster MTTR
• Exception: 375GB Intel Optane P4800X is so fast and parallel, its space is enough

NVMe for caching tier

NVMe or SAS SSDs for capacity tier

Page 14: vSAN Trends from the Trenches: What Our Largest Global

BC/DR/Availability

Snapshots are not backups

Stretched Cluster
• Consider Secondary level of Failures To Tolerate (SFTT, SFTM)

Consider FTT=2
• Maintenance mode counts as a Fault
• Disks don’t immediately fail as Bad Blocks form; a bad block only gets corrected and repaired if the block is read

RAID-6 is attractive but needs 20Gb+ vSAN network bandwidth

The worst data corruption bugs did not impact customers who were not using DD&C

vSAN
• Name: General Purpose; FTM: RAID-1; FTT: FTT=1
• Name: Dev/Test; FTM: RAID-5; FTT: FTT=1; IOPS Limit: 1,000
• Name: SQL Servers; FTM: RAID-1; FTT: FTT=2
• Name: Default Storage Policy; FTM: RAID-1; FTT: FTT=1
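To compare what the example policies above cost in raw capacity, the sketch below applies the usual vSAN space multipliers. These multipliers are background knowledge rather than something stated on the slide: RAID-1 keeps FTT+1 full copies, RAID-5 (FTT=1) is a 3+1 stripe, and RAID-6 (FTT=2) is a 4+2 stripe. The function names and the 100 GB example are hypothetical.

```python
def raw_capacity_multiplier(ftm: str, ftt: int) -> float:
    """Raw capacity consumed per unit of usable data for a given policy.

    Assumed multipliers (not from the slide): RAID-1 stores FTT+1 copies,
    RAID-5 with FTT=1 uses 3 data + 1 parity, RAID-6 with FTT=2 uses 4 data + 2 parity.
    """
    if ftm == "RAID-1" and ftt >= 0:
        return float(ftt + 1)
    if ftm == "RAID-5" and ftt == 1:
        return 4 / 3
    if ftm == "RAID-6" and ftt == 2:
        return 6 / 4
    raise ValueError(f"unsupported combination: {ftm} with FTT={ftt}")


# The four policies listed on the slide, as (FTM, FTT) pairs.
POLICIES = {
    "General Purpose": ("RAID-1", 1),
    "Dev/Test": ("RAID-5", 1),
    "SQL Servers": ("RAID-1", 2),
    "Default Storage Policy": ("RAID-1", 1),
}


def raw_gb_needed(policy_name: str, usable_gb: float) -> float:
    """Raw GB consumed on the vSAN datastore to store usable_gb under a policy."""
    ftm, ftt = POLICIES[policy_name]
    return usable_gb * raw_capacity_multiplier(ftm, ftt)


if __name__ == "__main__":
    for name in POLICIES:
        print(f"{name}: {raw_gb_needed(name, 100):.1f} GB raw per 100 GB usable")
```

The output makes the trade-off visible: FTT=2 on RAID-1 triples raw consumption, which is why erasure coding looks attractive when the network can carry it.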

Page 15: vSAN Trends from the Trenches: What Our Largest Global

Trend 2: All-NVMe vSAN

Superior Performance

Page 16: vSAN Trends from the Trenches: What Our Largest Global

All-NVMe vSAN Clusters

Investment in latest tech

Overview

vSAN Clusters with All-NVMe devices
• Future proof
• Great price compared to older SSDs
• Fewer bottlenecks

Application driven
• Tier 1 applications
• Demanding workloads

Superior Performance

Problems Observed

Unknown Success Criteria

Application Best Practices not applied
• Oracle RAC
• MSSQL Server

Network Stack Ignored
• NIC speed
• Switch buffers
• Too many hops
• MTU

Page 17: vSAN Trends from the Trenches: What Our Largest Global

vSAN Performance on Hardware

All-flash: More predictable and responsive than hybrid

SATA protocol is 1-to-1. Locks bus. Avoid if possible

Storage controllers can be a bottleneck.

Limited support of SAS expanders (due to performance)

NVMe: Fastest, simple (no external controller), and low CPU overhead

The cost/performance pyramid of storage device types

Capacity

• 3D XPoint cache / NVMe capacity
• All NVMe
• NVMe cache / SAS capacity
• All SAS
• NVMe cache / SATA capacity
• SAS cache / SATA capacity
• All SATA

Page 18: vSAN Trends from the Trenches: What Our Largest Global

I/O Flow

vSAN can go as fast as the hardware allows
• Potential bottlenecks are based on hardware selected
• New tech = fewer obstacles
• A race car can only go so fast on a rocky road

Potential Bottlenecks
• SATA/SAS – Queue Depth
• Disk Controllers
• Network – Buffer Size

[Diagram: disk groups on vSphere/vSAN hosts forming a single vSAN datastore across the cluster]

Page 19: vSAN Trends from the Trenches: What Our Largest Global

Networking

Switching

Switch Buffers
• 16MB port buffers minimum
• Deep buffer size (1GB+)
• 16MB ÷ 48 ports = 0.33MB

Port Speed

Active/Standby vs. LACP

Network extenders, QoS, NetQueue
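The buffer arithmetic above generalizes to any switch. The sketch below repeats the slide's calculation and assumes, as the slide does, that the shared buffer is divided evenly across all ports; real switches may allocate buffers dynamically. The function name is hypothetical.

```python
def per_port_buffer_mb(shared_buffer_mb: float, ports: int) -> float:
    """Average buffer available per port if a shared buffer is split evenly."""
    if ports <= 0:
        raise ValueError("ports must be a positive integer")
    return shared_buffer_mb / ports


if __name__ == "__main__":
    # The slide's example: a 16MB buffer shared by 48 ports.
    print(f"{per_port_buffer_mb(16, 48):.2f} MB per port")
    # A deep-buffer switch (1GB, per the slide's "1GB+") with the same port count.
    print(f"{per_port_buffer_mb(1024, 48):.2f} MB per port")
```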

Page 20: vSAN Trends from the Trenches: What Our Largest Global

NICs

Often overlooked

NICs – virtualization offloading
• 30% IOPS gain – Mellanox CX4

Firmware and Driver
• KB 2030818
• Native inbox drivers

Page 21: vSAN Trends from the Trenches: What Our Largest Global

Trend 3: Automation

The “Easy Button”

Page 22: vSAN Trends from the Trenches: What Our Largest Global

Environment Automation

Accomplish more, faster

Overview / Problems Observed

Empower teams
• Faster deployments
• Less time wasted

Faster Life Cycle Management (LCM)
• Upgrade solutions end-to-end with no downtime

Interoperability

Homebrewed scripts – outdated

Knowledge base
• Too many components

Low visibility between solutions

Page 23: vSAN Trends from the Trenches: What Our Largest Global

Deployment at Scale

Cluster-wide settings must be consistent
• Use Distributed vSwitch or NSX
• Active / Standby, Standby / Active, LACP
• vSwitch MTU, vmkernel MTU
• Advanced Settings
• Boot settings
• Power Management

Automation is Key
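Checking that cluster-wide settings really are consistent is easy to automate once the per-host values have been collected. The sketch below is a minimal illustration that works on plain dictionaries; the host names, setting keys, and values are hypothetical, and how the settings are gathered (PowerCLI, the vSAN API, or another tool) is left out.

```python
from typing import Any, Dict


def find_inconsistent_settings(hosts: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Return {setting: {host: value}} for every setting that differs across hosts.

    A setting missing on a host is reported as None for that host.
    """
    all_keys = set()
    for settings in hosts.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {host: settings.get(key) for host, settings in hosts.items()}
        # Compare by repr so unhashable values (lists, dicts) are handled too.
        if len({repr(v) for v in values.values()}) > 1:
            drift[key] = values
    return drift


if __name__ == "__main__":
    # Hypothetical inventory of two hosts in one cluster.
    cluster = {
        "esx01": {"vswitch_mtu": 9000, "vmk_mtu": 9000, "power_policy": "High Performance"},
        "esx02": {"vswitch_mtu": 9000, "vmk_mtu": 1500, "power_policy": "High Performance"},
    }
    for setting, per_host in find_inconsistent_settings(cluster).items():
        print(f"{setting} differs: {per_host}")
```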

Page 24: vSAN Trends from the Trenches: What Our Largest Global

vSAN Automation Overview

[Diagram: vCenter and the hosts of a vSAN cluster, each exposing a vSAN API endpoint, accessed through UI, CLI, and SDK]

• vSAN API endpoint on ESXi
• vSAN API endpoint on vCenter
• UI - vSphere H5 / Embedded Host Client
• CLI - PowerCLI, ESXCLI & RVC
• SDK - Programming/Scripting languages

Page 25: vSAN Trends from the Trenches: What Our Largest Global

vSAN & PowerCLI

vSAN also uses SPBM (Storage Policy Based Management), so make sure you know the following cmdlets

Replication
• Start-SpbmReplicationFailover
• Sync-SpbmReplicationGroup
• Get-SpbmReplicationGroup
• Get-SpbmReplicationPair
• Start-SpbmReplicationPrepareFailover
• Start-SpbmReplicationPromote
• Start-SpbmReplicationReverse
• Start-SpbmReplicationTestFailover
• Stop-SpbmReplicationTestFailover

Rules
• New-SpbmRule
• New-SpbmRuleSet

Storage Policy
• Remove-SpbmStoragePolicy
• New-SpbmStoragePolicy
• Import-SpbmStoragePolicy
• Get-SpbmStoragePolicy
• Set-SpbmStoragePolicy
• Export-SpbmStoragePolicy

Others
• Get-SpbmCapability
• Get-SpbmCompatibleStorage
• Get-SpbmEntityConfiguration
• Set-SpbmEntityConfiguration
• Get-SpbmFaultDomain
• Get-SpbmPointInTimeReplica

https://github.com/jasemccarty

Page 26: vSAN Trends from the Trenches: What Our Largest Global

VMware Cloud Foundation

Automation for the entire stack

Consistency & Security

Standardized Architecture

Full Stack Approach

Built-in Security

Apps/Services/Infrastructure Automation

Tested and Validated

Simplified Experience

[Diagram: VMware Cloud Foundation (Management, Compute, Storage, Networking) across Data Center, Public Cloud, and Edge]

Page 27: vSAN Trends from the Trenches: What Our Largest Global

Trend 4: Operationalizing vSAN

Day 2 Operations

Page 28: vSAN Trends from the Trenches: What Our Largest Global

Operations and Support teams

Overview / Problems

Other teams taking over Day 2 operations
• Scale up/out
• HW replacement
• Driver/FW/BIOS updates

vSAN is easy
• No training provided

Lack of knowledge from these teams
• Self-inflicted outages

Best Practices not applied

May result in performance issues

Distributed Architecture

Page 29: vSAN Trends from the Trenches: What Our Largest Global

The vSAN Difference

Add capacity the way you want
• Scale UP by adding drives
• Scale OUT by adding hosts
• Scale UP and OUT for maximum agility

[Diagram: vSAN datastore across vSphere/vSAN hosts, with Scale Up and Scale Out directions]

Page 30: vSAN Trends from the Trenches: What Our Largest Global

Disk Groups – Use as a Strategy for Growth

Easily design hosts to increase capacity and performance without adding hosts or licenses
• Initial purchase consisting of only some drives populated
• In 12-18 months, populate remaining bays
• Cycle out older devices in a future purchasing cycle to increase density even further

Take advantage of technology improvements and market conditions

[Diagram: All-Flash vSAN host with disk groups (cache and capacity devices) contributing to a single vSAN datastore across the cluster]

Page 31: vSAN Trends from the Trenches: What Our Largest Global

VMware Compatibility Guide (VCG) Rules for vSAN

BIOS
• Match ESXi version to BIOS version; can use equal or newer

NIC
• Match ESXi version to NIC firmware/device driver version; can use equal or newer
• Alternative: See KB 2030818

Storage Controller
• Match ESXi version to Storage Controller firmware version and device driver version. FW & DD versions must align in the same row. Must be an exact match to what was tested. No newer versions until certified.

Disks
• If disk firmware is listed, match ESXi version to disk firmware. Must be equal or newer
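The rules above differ only in whether a newer version is acceptable, which makes them easy to encode. The sketch below is a simplified illustration: it assumes versions are dotted numeric strings (real firmware and driver version strings are often vendor-specific), and the component keys and function names are hypothetical. For a storage controller, the check would be run for both the firmware and the driver against the same VCG row.

```python
from typing import Tuple

# Rule per component type, as listed on the slide.
RULES = {
    "bios": "equal_or_newer",
    "nic": "equal_or_newer",
    "storage_controller": "exact",
    "disk": "equal_or_newer",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '16.17.0' into (16, 17, 0). Assumes purely numeric, dot-separated parts."""
    return tuple(int(part) for part in text.split("."))


def is_compliant(component: str, installed: str, certified: str) -> bool:
    """Check an installed version against the certified version for a component type."""
    rule = RULES[component]
    if rule == "exact":
        return parse_version(installed) == parse_version(certified)
    return parse_version(installed) >= parse_version(certified)


if __name__ == "__main__":
    # Hypothetical version numbers for illustration only.
    print(is_compliant("bios", "2.4.1", "2.3.0"))
    print(is_compliant("storage_controller", "16.17.1", "16.17.0"))
```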

Page 32: vSAN Trends from the Trenches: What Our Largest Global

Fix Knowledge Gaps

Train your staff
• vSAN Badges
• Hands On Lab (HOL)
• StorageHub
• Videos
