TRANSCRIPT
Amitabha Banerjee and Suraj Kasi
STO1515BU
#VMworld #STO1515BU
Extreme Performance Series: vSAN Performance Troubleshooting
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
#STO1515BU CONFIDENTIAL
Agenda
1 vSAN Day 1 Performance
2 Debugging and Improving vSAN Performance
vSAN 6.6 Is Loaded with Performance
• vSAN 6.6 delivers 50%+ performance improvements
• Enhancements behind performance improvements
– Optimizations in caching checksums
– Optimizations in Write buffer de-staging
– Improvements in Dedup & Compression IO path
– Improvements in Object placement
• https://storagehub.vmware.com/#!/vmware-vsan/vsan-6-6-performance-improvements
vSAN 6.6 Performance Improvements vs vSAN 6.5
[Figure: bar chart of IOPS improvement (%), y-axis from 0 to 70. Workloads: Random Read, Random Write, Mixed 70% Read / 30% Write. Configurations: Erasure Coding + Dedup, Erasure Coding, Default Features, Checksum Disabled.]
Data collected using the FIO benchmark on a 4-node vSAN cluster, 2 disk groups per node, 60% capacity usage
Improvements in:
• High VM consolidation
• Transactional workloads
• Batch jobs on large files/datasets
So You Just Deployed vSAN…
How fast can it go? What are the best achievable IOPS? How about throughput and latency?
But, Benchmarking is Non-trivial…
• Which benchmark?
– HCIBench, HammerDB, FIO, TPC-E, etc.
• Am I running the benchmark the right way?
• Have I configured vSAN the right way?
• Am I interpreting the results correctly?
• Is this the best that vSAN can do?
Introducing vSAN Performance Diagnostics
• Evaluate benchmarks against a specific goal: IOPS, throughput, or latency
• Select the time range during which the benchmark ran for analysis
• Detect performance issues using vSAN Performance Diagnostics
• "Ask VMware" documents possible recommendations
• Hopefully, recommendations help you:
– Fine tune benchmark parameters
– Configure vSAN optimally
– Alter SPBM policy parameters to achieve desired goal
vSAN 6.6.1, vSphere 6.5 U1
vSAN Performance Diagnostics… Salient Features
• Cloud Service
– Requires CEIP to be enabled and vCenter Internet connectivity (no sensitive information collected)
– VMware can ship updates independent of on-prem installation.
– Integrated with VMware internal performance debugging
• Fully integrated with vSAN Performance Service
– Enable vSAN Performance Service
– Explore more data at a mouse click
– 90 days of data. Explore 1 day at a time
• Integrated with HCIBench 1.6.2
– HCIBench Time Ranges automatically populated
– Consume results within HCIBench
vSAN Performance Diagnostics Demo
• Run HCIBench with Random Write Workload
• Analyze with vSAN Performance Diagnostics
• Tweak vSAN / Workload
• Rerun and iterate
Improvement in HCIBench Result
[Figure: bar chart of IOPS (axis from 0 to 3,500) and 95th percentile latency in ms (axis from 0 to 1.2) for two configurations:
• 1 VM, 1 disk, 2 threads, Stripes=1, FTT=1
• 1 VM, 1 disk, 2 threads, Stripes=7, FTT=1
Annotation on the chart: 10%]
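To make the Stripes=1 versus Stripes=7 comparison concrete, here is a minimal sketch of how many data components one object is split into. It assumes RAID-1 mirroring, where FTT=1 means two replicas and each replica is striped across the stripe width. It ignores witness components and any automatic splitting of large objects, and the function name is hypothetical.

```python
def data_components(stripe_width: int, ftt: int) -> int:
    """Data components for one object, assuming RAID-1 mirroring.

    Assumption: each of the (ftt + 1) replicas is striped across
    `stripe_width` components. Witnesses are not counted.
    """
    if stripe_width < 1 or ftt < 0:
        raise ValueError("stripe_width must be >= 1 and ftt >= 0")
    replicas = ftt + 1
    return stripe_width * replicas


if __name__ == "__main__":
    for stripes in (1, 7):
        print(f"Stripes={stripes}, FTT=1 ->",
              data_components(stripes, 1), "data components")
```

More components give a single VMDK more capacity devices to spread its IO over, which is one plausible reason a higher stripe width can help a workload with few outstanding IOs.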
Improvement in HCIBench Result
[Figure: bar chart of IOPS (axis from 0 to 60,000) and 95th percentile latency in ms (axis from 0 to 3) for three configurations:
• 1 VM, 1 disk, 2 threads, Stripes=1, FTT=1
• 1 VM, 1 disk, 2 threads, Stripes=7, FTT=1
• 8 VM, 1 disk, 8 threads, Stripes=7, FTT=1
Annotations on the chart: 10% and 1600%]
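One way to reason about why adding VMs and threads raises IOPS so sharply is Little's Law: outstanding IOs = IOPS × latency. The sketch below turns that into an upper bound on IOPS for a given benchmark configuration. It assumes each thread keeps exactly one IO outstanding, and the 1 ms latency is a hypothetical value, not a number read from the chart.

```python
def iops_bound(vms: int, disks_per_vm: int, threads_per_disk: int,
               latency_ms: float) -> float:
    """Upper bound on IOPS from Little's Law.

    Assumption: every thread has exactly one IO in flight, so the
    number of outstanding IOs is vms * disks_per_vm * threads_per_disk.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    outstanding = vms * disks_per_vm * threads_per_disk
    return outstanding * 1000.0 / latency_ms


if __name__ == "__main__":
    hypothetical_latency_ms = 1.0  # illustrative only
    small = iops_bound(1, 1, 2, hypothetical_latency_ms)
    large = iops_bound(8, 1, 8, hypothetical_latency_ms)
    print(f"1 VM x 1 disk x 2 threads: at most {small:.0f} IOPS")
    print(f"8 VM x 1 disk x 8 threads: at most {large:.0f} IOPS")
```

With only two outstanding IOs, the benchmark itself caps IOPS no matter how fast the cluster is, so a low result may say more about the workload parameters than about vSAN.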
Debugging and Improving vSAN Performance
vSAN Architecture Overview
• VSCSI: Virtual SCSI
• DOM: Distributed Object Manager
– Client: Provide access to VSCSI/other clients
– Owner: Arbitrate access to objects, enforce policy
– Component Manager: Handles vSAN components
• LSOM: Log Structured Object Manager
– Handles objects in cache and capacity
• PSA/Device Layer
• https://www.youtube.com/watch?v=2XldDuBeY1k
[Diagram: IO stack from top to bottom: VSCSI, DOM / vSAN Client, DOM Owner, DOM Comp. Mgr (two instances), LSOM (two instances), PSA (two instances)]
Debugging and Improving vSAN Performance
• Resync Throttling
• Hardware Configuration
• Unaligned IO
• Congestion
Example: Resync Throttling
• Typical resync workflows
– One or more node or disk failures
– Node or disk evacuation
– VM Storage policy reconfiguration
– Cluster rebalancing & disk format upgrades
• Designed for shortest resync time
• Resync traffic can impact VM IO
• Use the throttling mechanism to control the resync speed
– For vSAN 6.6 or vSphere 6.5 EP2: end-user control over resync activity through the UI
– Before vSAN 6.6: VSI config option called MaxinflightIO
• If resync is slow, reduce the VM IO traffic
vSAN Observer
• Capture & visualize vSAN performance statistics.
• Collected through the Ruby vSphere Console (RVC)
$ rvc username@ipAddress
or, on Windows:
%PROGRAMFILES%\VMware\Infrastructure\VirtualCenter Server\support\rvc\rvc.bat
Then, inside RVC:
vsan.observer -r -k ./VMWorldTestBed/
usage: observer [opts] cluster
-r, --run-webserver    Run a webserver to view live stats
-k, --keep-observation-in-memory, or -f output_filename
--max-diskspace-gb=<i>, --interval=<i>, --generate-html-bundle=<s>
• Access the live data via web URL:
– https://vCenterServer_hostname_or_IP_Address:8010
Hardware Configuration
• Sizing
– An undersized vSAN cluster might not deliver the performance you expect
– Choose the right CPU, Memory, Cache, Capacity and Disk Group sizes
– https://vsansizer.vmware.com
• Driver and Firmware Compatibility
– Check vSAN Health
– Check the VMware Compatibility Guide (VCG) for vSAN
– Non-compliance can have a severe performance impact
Example: NVMe Driver
• Certain Intel NVMe drives do not perform at their best with the inbox NVMe driver
• Large sequential write workloads will cause performance issues
• For vSphere 6.5 and earlier releases, the Intel NVMe driver (not the inbox driver) is recommended
Wrong NVMe Driver
• vSAN Client Latency – 60 ms
• DOM Owner Latency – 60 ms
• vSAN Disks Latency – 20 ms
• vSAN Disks IOPS – 5,000
• NVMe Disk
– Read latency – 10 ms
– Write latency – 4 ms
Correct NVMe Driver
• vSAN Disks Latency – 3 ms
• vSAN Disks IOPS – 20,000
• NVMe Disk
– Read latency – 1.2 ms
– Write latency – 0.2 ms
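The effect of the driver change is easier to see as ratios. This short sketch computes the improvement factor for each metric reported on both the wrong-driver and correct-driver slides. The dictionary keys and the function name are illustrative, and only the numbers come from the slides.

```python
# Values as reported on the "Wrong NVMe Driver" and "Correct NVMe Driver" slides.
wrong = {"disk_iops": 5000, "disk_latency_ms": 20.0,
         "nvme_read_ms": 10.0, "nvme_write_ms": 4.0}
correct = {"disk_iops": 20000, "disk_latency_ms": 3.0,
           "nvme_read_ms": 1.2, "nvme_write_ms": 0.2}


def improvement(before: dict, after: dict) -> dict:
    """Improvement factor per metric (greater than 1 means better).

    IOPS improve when they go up, latencies improve when they go down,
    so the ratio is inverted for latency metrics.
    """
    gains = {}
    for key in before:
        if key.endswith("_iops"):
            gains[key] = after[key] / before[key]
        else:
            gains[key] = before[key] / after[key]
    return gains


gains = improvement(wrong, correct)

if __name__ == "__main__":
    for metric, factor in gains.items():
        print(f"{metric}: {factor:.1f}x better")
```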
Know Your Workload – IOInsight
• What does your IO workload look like?
• Understand your application better with IOInsight
– IOInsight monitors IO pattern and provides high-level insights on your application
• Metrics
– IO size distribution
– Read/Write ratio
– Sequentiality
– 4k-alignment
– Data fill rate
– Cache friendliness
• Download fling: https://labs.vmware.com/flings/ioinsight
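To illustrate what several of the metrics above mean, here is a minimal sketch that computes them from a small IO trace. It is not how IOInsight is implemented. The `IO` record, the `summarize` function and the sample trace are all hypothetical, and sequentiality is defined here simply as "starts where the previous IO ended".

```python
from collections import Counter
from typing import NamedTuple


class IO(NamedTuple):
    op: str      # "R" for read, "W" for write
    offset: int  # byte offset within the virtual disk
    length: int  # IO size in bytes


def summarize(trace: list) -> dict:
    """Compute simple workload metrics from an ordered IO trace."""
    n = len(trace)
    if n == 0:
        raise ValueError("empty trace")
    reads = sum(1 for io in trace if io.op == "R")
    # An IO counts as 4k-aligned if both its start and its size are multiples of 4096.
    aligned = sum(1 for io in trace
                  if io.offset % 4096 == 0 and io.length % 4096 == 0)
    # An IO counts as sequential if it starts exactly where the previous one ended.
    sequential = sum(1 for prev, cur in zip(trace, trace[1:])
                     if cur.offset == prev.offset + prev.length)
    return {
        "read_fraction": reads / n,
        "size_histogram": dict(Counter(io.length for io in trace)),
        "sequential_fraction": sequential / (n - 1) if n > 1 else 0.0,
        "aligned_4k_fraction": aligned / n,
    }


trace = [IO("W", 0, 4096), IO("W", 4096, 4096),
         IO("R", 8192, 8192), IO("W", 2048, 4096)]
stats = summarize(trace)

if __name__ == "__main__":
    for name, value in stats.items():
        print(name, value)
```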
IOInsight
• Can be used as a pre-deployment tool to configure vSAN correctly
• Read/Write Ratio:
– Write-intensive applications can benefit from using >1 disk group
– Read- or write-intensive workloads benefit from a higher stripe width
• 4k-Alignment
– Expect reduced performance if IOs are misaligned
• Data fill rate
– Roughly estimate rate of space growth for capacity planning
• Cache friendliness
– Estimate space requirement for caching tier for hybrid vSAN setups
Example: Unaligned IO
• 4k block size was introduced in vSAN disk format version 3.0
• Unaligned IO causes IO Inflation
[Diagram: a 4k write from the VMDK covering offsets 2048 to 6144 straddles two vSAN 4k blocks on the physical disk (0 to 4096 and 4096 to 8192)]
• VMDK
– 1 Write
• vSAN IO Inflation
– 2 Reads
– Modify
– 2 Writes
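The inflation in the diagram can be reproduced with a simplified read-modify-write model: every 4k block the write touches must be written, and every block that is only partially covered must first be read. This is a sketch of that model only, not of vSAN's actual IO path, and the function name is hypothetical.

```python
BLOCK = 4096  # 4k block size


def backend_ops(offset: int, length: int, block: int = BLOCK) -> tuple:
    """Return (reads, writes) needed to service one guest write.

    Simplified model: a block that the write covers only partially
    needs a read before it can be modified and written back.
    """
    if length <= 0 or offset < 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length
    reads = writes = 0
    for index in range(offset // block, (end - 1) // block + 1):
        block_start, block_end = index * block, (index + 1) * block
        fully_covered = offset <= block_start and end >= block_end
        if not fully_covered:
            reads += 1
        writes += 1
    return reads, writes


if __name__ == "__main__":
    print("aligned 4k write at offset 0:", backend_ops(0, 4096))
    print("unaligned 4k write at offset 2048:", backend_ops(2048, 4096))
```

Under this model the aligned write costs one backend write, while the same-sized write shifted by 2048 bytes costs two reads and two writes, matching the slide.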
Unaligned IO
• vSAN Disks Latency – 0.5 ms
• DOM Owner Latency – 20 ms
• CPU Utilization – Low
• Network Errors – Low
Unaligned IO – IOInsight
[IOInsight screenshots; slide captions in order:]
• Write-intensive workload
• 4KB write workload
• 4KB sequential write workload
• 4KB unaligned sequential write
Congestion
• Flow control mechanism used by vSAN
• A bottleneck in a lower layer is relieved by controlling the rate of incoming IO at the vSAN ingress
• Types of congestion
– SSD congestion – Cache tier write buffer space runs out
– Comp congestion – IO activity on a particular vSAN component exceeds the threshold
– Mem congestion – Memory heap usage by vSAN internal components exceeds the threshold
– IOPS congestion – IOPS on vSAN components exceeds reservations
– Log congestion – vSAN internal log space in the cache tier disk runs out
• We encourage you to attend the vSAN Beyond the Basics [STO1479BU] talk
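The flow-control idea can be sketched as backpressure: the worst congestion reported by any lower layer determines how much the ingress slows incoming IO. Everything specific in this sketch is an assumption made for illustration. The 0 to 255 scale, the linear mapping, the 10 ms ceiling and the function name do not describe vSAN's actual algorithm.

```python
MAX_CONGESTION = 255  # assumed congestion scale, for illustration only


def ingress_delay_ms(congestion: dict, max_delay_ms: float = 10.0) -> float:
    """Delay to add to each incoming IO, given per-type congestion values.

    Hypothetical policy: take the worst congestion across all types
    (ssd, comp, mem, iops, log) and scale the delay linearly with it.
    """
    worst = max(congestion.values(), default=0)
    worst = min(max(worst, 0), MAX_CONGESTION)  # clamp to the assumed scale
    return max_delay_ms * worst / MAX_CONGESTION


if __name__ == "__main__":
    readings = {"ssd": 0, "comp": 0, "mem": 0, "iops": 0, "log": 190}
    print(f"delay per IO: {ingress_delay_ms(readings):.2f} ms")
```

The point of the sketch is the shape of the mechanism: the bottleneck sits in a lower layer, but the visible symptom is added latency at the top of the stack.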
Example: Log Congestion
• Occurs when vSAN internal log space in the cache tier disk runs out
• Observed with write-intensive workloads when deduplication & compression are turned on
• Log compaction
– Efficient utilization of the available log space
– Introduced in vSphere 6.0 Update 3 and vSAN 6.6 (vSphere 6.5 EP2)
• Updating to these releases improves performance
Log Congestion
• Log congestion – 190
• Log congestion – 52
Conclusion
• vSAN 6.6 delivers 50%+ better performance
• vSAN Performance Diagnostics available in vSAN 6.6.1
– Integrated with HCIBench 1.6.2
• vSAN Performance Debugging Tools
– vSAN Performance Service Graphs
– vSAN Observer
– vSAN Health Checks
– IOInsight
Extreme Performance Series – Las Vegas
• SER2724BU Performance Best Practices
• SER2723BU Benchmarking 101
• SER2343BU vSphere Compute & Memory Schedulers
• SER1504BU vCenter Performance Deep Dive
• SER2734BU Byte Addressable Non-Volatile Memory in vSphere
• SER2849BU Predictive DRS – Performance & Best Practices
• SER1494BU Encrypted vMotion Architecture, Performance, & Futures
• STO1515BU vSAN Performance Troubleshooting
• VIRT1445BU Fast Virtualized Hadoop and Spark on All-Flash Disks
• VIRT1397BU Optimize & Increase Performance Using VMware NSX
• VIRT2550BU Reducing Latency in Enterprise Applications with VMware NSX
• VIRT1052BU Monster VM Database Performance
• VIRT1983BU Cycle Stealing from the VDI Estate for Financial Modeling
• VIRT1997BU Machine Learning and Deep Learning on VMware vSphere
• FUT2020BU Wringing Max Perf from vSphere for Extremely Demanding Workloads
• FUT2761BU Sharing High Performance Interconnects across Multiple VMs
Extreme Performance Series – Barcelona
• SER2724BE Performance Best Practices
• SER2343BE vSphere Compute & Memory Schedulers
• SER1504BE vCenter Performance Deep Dive
• SER2849BE Predictive DRS – Performance & Best Practices
• VIRT1445BE Fast Virtualized Hadoop and Spark on All-Flash Disks
• VIRT1397BE Optimize & Increase Performance Using VMware NSX
• VIRT1052BE Monster VM Database Performance
• FUT2020BE Wringing Max Perf from vSphere for Extremely Demanding Workloads
Extreme Performance Series – Hands-on Labs
Don’t miss these popular Extreme Performance labs:
• HOL-1804-01-SDC: vSphere 6.5 Performance Diagnostics & Benchmarking
– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools
• HOL-1804-02-CHG: vSphere Challenge Lab
– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems
Performance Survey
The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools, interfaces and where we can improve.
Scan this QR code to access a short survey and provide us direct feedback.
Alternatively: www.vmware.com/go/perf
Thank you!