Amitabha Banerjee and Suraj Kasi STO1515BU #VMworld #STO1515BU Extreme Performance Series: vSAN Performance Troubleshooting VMworld 2017 Content: Not for publication or distribution


Amitabha Banerjee and Suraj Kasi

STO1515BU

#VMworld #STO1515BU

Extreme Performance Series: vSAN Performance Troubleshooting


Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

#STO1515BU CONFIDENTIAL 2


Agenda

1 vSAN Day 1 Performance

2 Debugging and Improving vSAN Performance


vSAN 6.6 Is Loaded with Performance

• vSAN 6.6 delivers 50%+ performance improvements

• Enhancements behind performance improvements

– Optimizations in caching checksums

– Optimizations in Write buffer de-staging

– Improvements in Dedup & Compression IO path

– Improvements in Object placement

• https://storagehub.vmware.com/#!/vmware-vsan/vsan-6-6-performance-improvements


vSAN 6.6 Performance Improvements vs vSAN 6.5

[Chart: IOPS improvement (%) of vSAN 6.6 over vSAN 6.5, y-axis 0 to 70, for three workloads (Random Read, Random Write, Mixed 70% Read / 30% Write) and four configurations (Erasure Coding + Dedup, Erasure Coding, Default Features, Checksum Disabled)]

Data collected using FIO Benchmark on a 4-node vSAN cluster, 2 disk groups per node, 60% capacity usage

Improvements in:

• High VM consolidation

• Transactional workloads

• Batch jobs on large files/datasets


vSAN Day 1 Performance


So You Just Deployed vSAN…

How fast can it go? What are the best IOPS? How about throughput and latency?


But, Benchmarking is Non-trivial…

• Which benchmark?

– HCIBench, HammerDB, FIO, TPC-E, etc.

• Am I running the benchmark the right way?

• Have I configured vSAN the right way?

• Am I interpreting the results correctly?

• Is this the best that vSAN can do?


Introducing vSAN Performance Diagnostics

• Evaluate benchmarks against a specific goal: IOPS, throughput, or latency

• Select the time range during which the benchmark ran

• Detect performance issues using vSAN Performance Diagnostics

• "Ask VMware" documents possible recommendations for each issue

• Hopefully, recommendations help you:

– Fine tune benchmark parameters

– Configure vSAN optimally

– Alter SPBM policy parameters to achieve desired goal

Available in vSAN 6.6.1, vSphere 6.5 U1


vSAN Performance Diagnostics… Salient Features

• Cloud Service

– Requires CEIP to be enabled and vCenter Internet connectivity (no sensitive information is collected)

– VMware can ship updates independent of on-prem installation.

– Integrated with VMware internal performance debugging

• Fully integrated with vSAN Performance Service

– Enable vSAN Performance Service

– Explore more data at a mouse click

– 90 days of data. Explore 1 day at a time

• Integrated with HCIBench 1.6.2

– HCIBench Time Ranges automatically populated

– Consume results within HCIBench


vSAN Performance Diagnostics Demo

• Run HCIBench with Random Write Workload

• Analyze with vSAN Performance Diagnostics

• Tweak vSAN / Workload

• Rerun and iterate


Improvement in HCIBench Result

[Chart: IOPS (axis 0 to 3,500) and 95th percentile latency in ms (axis 0 to 1.2) for two configurations: "1 VM, 1 disk, 2 threads, Stripes=1, FTT=1" and "1 VM, 1 disk, 2 threads, Stripes=7, FTT=1". Annotation on the chart: 10%]
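The two configurations compared on this slide differ only in stripe width. One way to picture what that changes is to count the data components behind a single VMDK. The sketch below assumes RAID-1 mirroring (the slide does not state the RAID level), where FTT=n keeps n + 1 replicas and each replica is split into as many stripes as the stripe width. The function name is made up for illustration, and witness components are ignored.

```python
def data_components(stripe_width: int, ftt: int) -> int:
    """Data components for one object under RAID-1 mirroring.

    Assumes FTT=n keeps n + 1 full replicas and each replica is split
    into `stripe_width` stripes. Witness components and the extra
    splits vSAN applies to very large objects are ignored.
    """
    if stripe_width < 1 or ftt < 0:
        raise ValueError("stripe_width must be >= 1 and ftt >= 0")
    return stripe_width * (ftt + 1)


for stripes in (1, 7):
    count = data_components(stripes, ftt=1)
    print(f"Stripes={stripes}, FTT=1 -> {count} data components")
```

Under these assumptions a single disk goes from 2 data components to 14, so its IO can be spread over more capacity devices.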


Improvement in HCIBench Result

[Chart: IOPS (axis 0 to 60,000) and 95th percentile latency in ms (axis 0 to 3) for three configurations: "1 VM, 1 disk, 2 threads, Stripes=1, FTT=1", "1 VM, 1 disk, 2 threads, Stripes=7, FTT=1", and "8 VM, 1 disk, 8 threads, Stripes=7, FTT=1". Annotations on the chart: 10% and 1600%]
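The chart is annotated with 10% and 1600%. Assuming these are relative improvements over the first configuration (the extracted text does not say which metric or baseline they refer to), the arithmetic below shows how a percentage improvement maps to a multiplier. The function names and the sample values passed to `improvement_pct` are illustrative, not readings from the chart.

```python
def improvement_pct(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def speedup_factor(pct: float) -> float:
    """Convert a percentage improvement into a multiplier."""
    return 1.0 + pct / 100.0


print(speedup_factor(10))         # 1.1
print(speedup_factor(1600))       # 17.0
print(improvement_pct(2.0, 34.0)) # 1600.0 (made-up inputs)
```

A 1600% improvement therefore means roughly 17 times the baseline, not 16 times.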

Debugging and Improving vSAN Performance


vSAN Architecture Overview

• VSCSI: Virtual SCSI

• DOM: Distributed Object Manager

– Client: Provide access to VSCSI/other clients

– Owner: Arbitrate access to objects, enforce policy

– Component Manager: Handles vSAN components

• LSOM: Log Structured Object Manager

– Handles objects in cache and capacity

• PSA/Device Layer

• https://www.youtube.com/watch?v=2XldDuBeY1k

[Diagram: IO stack from top to bottom: VSCSI, DOM / vSAN Client, DOM Owner, then two parallel stacks of DOM Comp. Mgr, LSOM, and PSA]


Debugging and Improving vSAN Performance

• Resync Throttling

• Hardware Configuration

• Unaligned IO

• Congestion


Example: Resync Throttling

• Typical resync workflows

– One or more node or disk failures

– Node or disk evacuation

– VM Storage policy reconfiguration

– Cluster rebalancing & disk format upgrades

• Designed for shortest resync time

• Resync traffic can impact VM IO

• Use the throttling mechanism to control the resync speed

– vSAN 6.6 (vSphere 6.5 EP2): end-user control over resync activity through the UI

– Before vSAN 6.6: a VSI config option called MaxinflightIO

• If resync is slow, reduce the VM IO traffic



vSAN Observer

• Capture & visualize vSAN performance statistics.

• Collected in Ruby vSphere Console (RVC)

$ rvc username@ipAddress

or

$ (Windows) %PROGRAMFILES%\VMware\Infrastructure\VirtualCenter Server\support\rvc\rvc.bat

$ vsan.observer -r -k ./VMWorldTestBed/

usage: observer [opts] cluster

-r, --run-webserver Run a webserver to view live stats

-k, --keep-observation-in-memory or -f output_filename

--max-diskspace-gb=<i>, --interval=<i>, --generate-html-bundle=<s>

• Access the live data via web URL:

– https://vCenterServer_hostname_or_IP_Address:8010


Hardware Configuration

• Sizing

– An undersized vSAN cluster might not deliver the performance you expect

– Choose the right CPU, Memory, Cache, Capacity and Disk Group sizes

– https://vsansizer.vmware.com

• Driver Firmware Compatibility

– Check vSAN Health

– Check VMware Compatibility Guide for vSAN (VCG)

– Noncompliance can have a large performance impact


Example: NVMe Driver

• Certain Intel NVMe drives do not perform at their best with the inbox NVMe driver

• Large sequential write workloads will cause performance issues

• For vSphere 6.5 and earlier releases, the Intel NVMe driver (not the inbox driver) is recommended


Wrong NVMe Driver

vSAN Client Latency – 60 ms

DOM Owner Latency – 60 ms

vSAN Disks Latency – 20 ms

vSAN Disk IOPS – 5,000

NVMe Disk

• Read latency – 10 ms

• Write latency – 4 ms


Correct NVMe Driver

vSAN Disks Latency – 3 ms

vSAN Disks IOPS – 20,000

NVMe Disk

• Read latency – 1.2 ms

• Write latency – 0.2 ms
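To put the two driver slides side by side, the sketch below computes the change for each metric that appears on both. The numbers are the ones shown on the "Wrong NVMe Driver" and "Correct NVMe Driver" slides; the dictionary keys and the helper name are made up for illustration.

```python
# Figures taken from the "Wrong NVMe Driver" and "Correct NVMe Driver" slides.
wrong = {"nvme_read_ms": 10.0, "nvme_write_ms": 4.0,
         "vsan_disk_ms": 20.0, "vsan_disk_iops": 5000}
correct = {"nvme_read_ms": 1.2, "nvme_write_ms": 0.2,
           "vsan_disk_ms": 3.0, "vsan_disk_iops": 20000}


def ratios(before: dict, after: dict) -> dict:
    """Improvement factor per metric.

    Latencies improve when they shrink (before / after); IOPS improve
    when they grow (after / before).
    """
    out = {}
    for key in before:
        if key.endswith("_iops"):
            out[key] = after[key] / before[key]
        else:
            out[key] = before[key] / after[key]
    return out


for metric, factor in ratios(wrong, correct).items():
    print(f"{metric}: {factor:.1f}x better")
```

With the recommended driver, vSAN disk IOPS are 4x higher and vSAN disk latency is about 6.7x lower in this example.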



Know Your Workload – IOInsight

• What does your IO workload look like?

• Understand your application better with IOInsight

– IOInsight monitors IO pattern and provides high-level insights on your application

• Metrics

– IO size distribution

– Read/Write ratio

– Sequentiality

– 4k-alignment

– Data fill rate

– Cache friendliness

• Download fling: https://labs.vmware.com/flings/ioinsight
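As a rough illustration of what several of these metrics measure, the sketch below derives a read/write ratio, 4k alignment, sequentiality, and an IO size distribution from a small trace. This is not how IOInsight itself is implemented: the `IO` record format, the sample trace, and the definitions of "aligned" and "sequential" are all assumptions made for the example.

```python
from collections import Counter
from typing import NamedTuple


class IO(NamedTuple):
    op: str      # "R" for read, "W" for write
    offset: int  # starting byte offset
    size: int    # length in bytes


def summarize(trace: list) -> dict:
    """Compute a few IOInsight-style metrics from a list of IO records."""
    if not trace:
        raise ValueError("trace is empty")
    total = len(trace)
    reads = sum(1 for io in trace if io.op == "R")
    # Assumed definition: offset and size are both multiples of 4096.
    aligned = sum(1 for io in trace
                  if io.offset % 4096 == 0 and io.size % 4096 == 0)
    # Assumed definition: an IO is sequential if it starts exactly
    # where the previous IO ended.
    sequential = sum(1 for prev, cur in zip(trace, trace[1:])
                     if cur.offset == prev.offset + prev.size)
    return {
        "read_pct": 100.0 * reads / total,
        "aligned_4k_pct": 100.0 * aligned / total,
        "sequential_pct": 100.0 * sequential / max(total - 1, 1),
        "size_histogram": dict(Counter(io.size for io in trace)),
    }


trace = [
    IO("W", 2048, 4096),
    IO("W", 6144, 4096),
    IO("W", 10240, 4096),
    IO("R", 0, 8192),
]
print(summarize(trace))
```

In this made-up trace the three writes are sequential but none of them starts on a 4k boundary, which is the pattern examined in the unaligned IO example.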


IOInsight

• Can be used as a pre-deployment tool to configure vSAN correctly

• Read/Write Ratio:

– Write-intensive applications can benefit from using >1 disk group

– Read-intensive and write-intensive workloads benefit from a higher stripe width

• 4k-Alignment

– Expect reduced performance if IOs are misaligned

• Data fill rate

– Roughly estimate rate of space growth for capacity planning

• Cache friendliness

– Estimate space requirement for caching tier for hybrid vSAN setups


Example: Unaligned IO

• 4k block size was introduced in vSAN disk format version 3.0

• Unaligned IO causes IO inflation

[Diagram: a 4k write from the VMDK covering offsets 2048 to 6144 straddles two vSAN 4k blocks (0 to 4096 and 4096 to 8192) on the physical disk]

• VMDK

– 1 Write

• vSAN IO inflation

– 2 Reads

– Modify

– 2 Writes
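The inflation on this slide can be reproduced with a little block arithmetic. The sketch below is a simplified read-modify-write model, not vSAN's actual code path: it assumes a block that is only partly covered by a write must be read before it is rewritten, and that every touched block is written once. The function names are illustrative.

```python
BLOCK = 4096  # block size in bytes, per the slide


def blocks_touched(offset: int, length: int, block: int = BLOCK) -> range:
    """Indices of the blocks overlapped by [offset, offset + length)."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset // block
    last = (offset + length - 1) // block
    return range(first, last + 1)


def backend_ops(offset: int, length: int, block: int = BLOCK) -> dict:
    """Count block reads and writes caused by one guest write."""
    touched = blocks_touched(offset, length, block)
    head_partial = offset % block != 0
    tail_partial = (offset + length) % block != 0
    if len(touched) == 1:
        # A single block is read once if the write does not cover it fully.
        reads = 1 if (head_partial or tail_partial) else 0
    else:
        # Only the partly covered first and last blocks need a read.
        reads = int(head_partial) + int(tail_partial)
    return {"reads": reads, "writes": len(touched)}


print(backend_ops(0, 4096))     # {'reads': 0, 'writes': 1}
print(backend_ops(2048, 4096))  # {'reads': 2, 'writes': 2}
```

The second call is the case drawn on the slide: a 4k write at offset 2048 turns one guest write into two reads, a modify, and two writes.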


Unaligned IO

vSAN Disks Latency – 0.5 ms

DOM Owner Latency – 20 ms

CPU Utilization – Low

Network Errors – Low

Unaligned IO – IO Insight

[Screenshots of IOInsight output; slide captions:]

• Write Intensive workload

• 4KB Write Workload

• 4KB Sequential Write Workload

• 4KB Unaligned Sequential Write



Congestion

• Flow control mechanism used by vSAN

• A bottleneck in a lower layer is relieved by controlling the rate of incoming IO at the vSAN ingress

• Types of congestion

– SSD congestion – Cache tier write buffer space runs out

– Comp congestion – IO activity on a particular vSAN component is exceeding the threshold

– Mem congestion – Memory heap usage by vSAN internal components exceeds the threshold

– IOPS congestion – IOPS on vSAN components exceeds reservations

– Log congestion – Occurs when vSAN internal log space usage in cache tier disk runs out

• We encourage you to attend the vSAN Beyond the Basics [STO1479BU] talk
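The flow-control idea can be pictured with a toy back-pressure model. Everything in the sketch below is an assumption made for illustration, not vSAN's actual algorithm: the 0 to 255 range for a congestion value, the linear delay curve, the `max_delay_ms` parameter, the rule that the worst congestion type drives the throttle, and the sample values.

```python
MAX_CONGESTION = 255  # assumed upper bound of a congestion value


def ingress_delay_ms(congestion: int, max_delay_ms: float = 10.0) -> float:
    """Toy back-pressure: delay grows linearly with the congestion value.

    `max_delay_ms` and the linear shape are illustrative assumptions,
    not vSAN's real flow-control curve.
    """
    if not 0 <= congestion <= MAX_CONGESTION:
        raise ValueError("congestion out of range")
    return max_delay_ms * congestion / MAX_CONGESTION


def effective_congestion(per_type: dict) -> int:
    """Assume the most congested type determines the throttle."""
    return max(per_type.values(), default=0)


# Hypothetical sample: only log congestion is non-zero.
sample = {"ssd": 0, "comp": 0, "mem": 0, "iops": 0, "log": 190}
level = effective_congestion(sample)
print(level, round(ingress_delay_ms(level), 2))
```

The point of the model is only that incoming IO is slowed at the top of the stack in proportion to how congested a lower layer reports itself to be.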


Example: Log Congestion

• Occurs when vSAN internal log space usage in cache tier disk runs out

• Observed with write-intensive workloads when deduplication & compression are turned on

• Log compaction

– Efficient utilization of the available log space

– Introduced in vSphere 6.0 Update 3 and vSAN 6.6 (vSphere 6.5 EP2)

• Software update will unleash better performance


Log Congestion

[Screenshots: two log congestion graphs, labeled "Log congestion – 190" and "Log congestion – 52"]


Conclusion

• vSAN 6.6 delivers 50% better performance

• vSAN Performance Diagnostics available in vSAN 6.6.1

– Integrated with HCIBench 1.6.2

• vSAN Performance Debugging Tools

– vSAN Performance Service Graphs

– vSAN Observer

– vSAN Health Checks

– IO Insight


Extreme Performance Series – Las Vegas

• SER2724BU Performance Best Practices

• SER2723BU Benchmarking 101

• SER2343BU vSphere Compute & Memory Schedulers

• SER1504BU vCenter Performance Deep Dive

• SER2734BU Byte Addressable Non-Volatile Memory in vSphere

• SER2849BU Predictive DRS – Performance & Best Practices

• SER1494BU Encrypted vMotion Architecture, Performance, & Futures

• STO1515BU vSAN Performance Troubleshooting

• VIRT1445BU Fast Virtualized Hadoop and Spark on All-Flash Disks

• VIRT1397BU Optimize & Increase Performance Using VMware NSX

• VIRT2550BU Reducing Latency in Enterprise Applications with VMware NSX

• VIRT1052BU Monster VM Database Performance

• VIRT1983BU Cycle Stealing from the VDI Estate for Financial Modeling

• VIRT1997BU Machine Learning and Deep Learning on VMware vSphere

• FUT2020BU Wringing Max Perf from vSphere for Extremely Demanding Workloads

• FUT2761BU Sharing High Performance Interconnects across Multiple VMs


Extreme Performance Series – Barcelona

• SER2724BE Performance Best Practices

• SER2343BE vSphere Compute & Memory Schedulers

• SER1504BE vCenter Performance Deep Dive

• SER2849BE Predictive DRS – Performance & Best Practices

• VIRT1445BE Fast Virtualized Hadoop and Spark on All-Flash Disks

• VIRT1397BE Optimize & Increase Performance Using VMware NSX

• VIRT1052BE Monster VM Database Performance

• FUT2020BE Wringing Max Perf from vSphere for Extremely Demanding Workloads


Extreme Performance Series – Hands-on Labs

Don’t miss these popular Extreme Performance labs:

• HOL-1804-01-SDC: vSphere 6.5 Performance Diagnostics & Benchmarking

– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools

• HOL-1804-02-CHG: vSphere Challenge Lab

– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems


Performance Survey

The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools, interfaces and where we can improve.

Scan this QR code to access a short survey and provide us direct feedback.

Alternatively: www.vmware.com/go/perf

Thank you!
