![Page 1: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/1.jpg)
© 2010 VMware Inc. All rights reserved
Confidential
Storage Troubleshooting with VC Ops 5
![Page 2: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/2.jpg)
Things we want to know in performance
The storage team has requested greater visibility for:
• Joint troubleshooting, capacity planning, and performance monitoring.
• Is there any storage bottleneck? If yes, where?
• You need to know both the big picture and the details.
Your needs:
• Be able to quickly tell the overall workload.
• Be able to quickly tell which VMs are generating the most IOPS.
• Be able to tell the total IOPS generated by all VMs, and see a chart to check whether there is a spike.
You want to know 3 dimensions:
• IOPS: Read, Write, Read/Write Ratio, Total IOPS
• Latency: Read, Write, Total
• Throughput
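To make the IOPS dimension and the needs above concrete, here is a minimal Python sketch. It assumes you have already exported per-VM read and write IOPS (for example from the Datastore counters) as plain numbers. The `VmIops` class and the `summarize` and `find_spikes` helpers are hypothetical names, not part of VC Ops or vCenter, and the "2x the median" spike rule is an arbitrary assumption.

```python
import statistics
from dataclasses import dataclass


@dataclass
class VmIops:
    """Read and write IOPS for one VM (hypothetical export format)."""
    name: str
    read_iops: float
    write_iops: float

    @property
    def total(self) -> float:
        return self.read_iops + self.write_iops

    @property
    def read_write_ratio(self) -> float:
        # Reads per write; infinite when the VM issued no writes.
        return self.read_iops / self.write_iops if self.write_iops else float("inf")


def summarize(vms: list[VmIops], top_n: int = 3) -> dict:
    """Total IOPS from all VMs, overall read/write ratio, and the top-N VMs by IOPS."""
    total_read = sum(vm.read_iops for vm in vms)
    total_write = sum(vm.write_iops for vm in vms)
    top = sorted(vms, key=lambda vm: vm.total, reverse=True)[:top_n]
    return {
        "total_iops": total_read + total_write,
        "read_write_ratio": total_read / total_write if total_write else float("inf"),
        "top_vms": [vm.name for vm in top],
    }


def find_spikes(series: list[float], factor: float = 2.0) -> list[int]:
    """Indexes of samples above `factor` times the median of the series."""
    baseline = statistics.median(series)
    return [i for i, value in enumerate(series) if value > factor * baseline]


vms = [VmIops("vm-a", 200, 100), VmIops("vm-b", 50, 50), VmIops("vm-c", 10, 0)]
print(summarize(vms, top_n=2))
print(find_spikes([100, 110, 95, 400, 105]))  # [3]
```

The median is used as the baseline so that a single large spike does not raise its own threshold, as it would with a plain mean.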
This is Level 300 material. I’m assuming you have hands-on experience with both vSphere 5 and VC Ops 5.
This is based on vSphere 5.0.1 and VC Ops 5.0.1.
Please read the speaker notes.
![Page 3: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/3.jpg)
The challenges
Your environment:
• Production site
• 500 server VMs, 3,000 desktop VMs
• 2 vCenters, 80 ESXi hosts, 10 clusters, 60 datastores, 6 RDMs
• 50 physical servers (mostly UNIX)
• You use VMFS on FC and NFS on 10 GbE
• 2 storage arrays: 1 high-end, 1 midrange
• DR site: let’s not talk about this. Production is complex enough already!
![Page 4: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/4.jpg)
Storage counters: ESXi host
Datastore, Disk, Storage Adapter or Storage Path
![Page 5: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/5.jpg)
ESXi: Adapter, Device and Path
1 adapter can access many Devices (LUNs).
1 Device can be accessed via many paths.
1 path can only access 1 Device.
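These relationships can be modelled as a small data structure. This is a sketch only: the `Path` tuple and the helper functions are hypothetical, the device names are made up (the path names merely follow the ESXi runtime-name style), and the per-device total assumes each I/O to a device is counted on exactly one of its paths.

```python
from collections import defaultdict
from typing import NamedTuple


class Path(NamedTuple):
    """One storage path: it joins one adapter to exactly one device (LUN)."""
    name: str
    adapter: str
    device: str


def devices_per_adapter(paths: list[Path]) -> dict[str, set[str]]:
    """1 adapter -> many devices."""
    result: dict[str, set[str]] = defaultdict(set)
    for p in paths:
        result[p.adapter].add(p.device)
    return dict(result)


def paths_per_device(paths: list[Path]) -> dict[str, list[str]]:
    """1 device -> many paths."""
    result: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        result[p.device].append(p.name)
    return dict(result)


def device_iops(path_iops: dict[str, float], paths: list[Path]) -> dict[str, float]:
    """Roll per-path IOPS up to per-device IOPS (assumes each I/O uses one path)."""
    totals: dict[str, float] = defaultdict(float)
    for p in paths:
        totals[p.device] += path_iops.get(p.name, 0.0)
    return dict(totals)


paths = [
    Path("vmhba2:C0:T0:L1", "vmhba2", "lun-01"),
    Path("vmhba3:C0:T0:L1", "vmhba3", "lun-01"),
    Path("vmhba2:C0:T0:L2", "vmhba2", "lun-02"),
]
print(devices_per_adapter(paths))
print(paths_per_device(paths))
print(device_iops({"vmhba2:C0:T0:L1": 120, "vmhba3:C0:T0:L1": 80, "vmhba2:C0:T0:L2": 50}, paths))
```

Because each `Path` carries exactly one `device`, the "1 path can only access 1 Device" rule is enforced by the structure itself.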
![Page 6: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/6.jpg)
ESXi: Disk
![Page 7: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/7.jpg)
ESXi: Adapter, Device and Path
[Diagram labels: ESXi 5.0; Storage Adapter 1, Storage Adapter 2 (vmhba2, vmhba3); Storage Path; Disk; VMFS Datastore; RDM; NFS Datastore via vmnic]
![Page 8: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/8.jpg)
Storage counters: VM
Disk, Virtual Disk (VMDK, RDM), Datastore
[Diagram labels: VM; Drive 1, Drive 2, Drive 3; vDisk (scsi0:0, scsi0:2); VMFS, NFS and RDM; Datastore; Disk]
![Page 9: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/9.jpg)
VC Ops has 4 groups of Storage metrics for a VM
Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage? I’ll try to answer in the next few slides. If you want to know now, the counters marked with a black arrow are the ones I think we should use.
? Not sure what this is
IOPS counters
Other counters
Latency counters
Thruput counters
Why only at Disk level?
? Not sure what this is
These don’t exist in vCenter. RDM?
Don’t use
![Page 10: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/10.jpg)
VM: Storage
![Page 11: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/11.jpg)
Comparing VC Ops with vCenter
Datastore shows the metric for this VM only, not for every VM in that datastore. Datastore figures will be higher if your VM has a snapshot.
Disk = the physical LUN backing the datastore. If there is no extent, then Disk = Datastore.
Where does the Storage counter come from, given that there is no Storage group in vCenter? vCenter only has Datastore, Disk and Virtual Disk, as shown in this screenshot. If you know, let me know.
![Page 12: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/12.jpg)
VC Ops has 2 groups of Storage metrics for a Datastore
Not sure of the difference between Max Observed and Highest Observed.
Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage? I’ll try to answer in the next few slides.
IOPS counters
Other counters
Latency counters
Thruput counters
VMFS datastore NFS datastore
![Page 13: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/13.jpg)
VC Ops has 4 groups of Storage metrics for an ESXi host
Which counters do you take? There are so many of them. Say you want Write Latency. Which one do you take: Virtual Disk, Datastore, Disk, or Storage? I’ll try to answer in the next few slides.
IOPS counters
Other counters
Latency counters
Thruput counters
![Page 14: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/14.jpg)
VC Ops: Storage metrics from Cluster until World
Notice that the group is Disk, not Storage. I was hoping for Storage, as it is more intuitive.
For IOPS or Throughput, it is the sum of all components (e.g. all VMs in that vCenter).
For Latency, I’m not sure if it is the average or the max. If it is the max, that would be an awesome Super Metric!
IOPS counters
Other counters
Latency counters
Thruput counters
Cluster, Datacenter, World, vCenter
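Whether VC Ops rolls latency up as an average or a max is the open question above, and the choice matters a lot. In this hypothetical Python sketch, one busy VM with low latency and one quiet VM with high latency give very different parent-level numbers depending on the rule. The `rollup` helper and the sample values are illustrative assumptions, not VC Ops internals.

```python
def rollup(children: list[tuple[float, float]]) -> dict[str, float]:
    """Roll child (iops, latency_ms) pairs up to a parent object such as a cluster."""
    total_iops = sum(iops for iops, _ in children)  # IOPS: sum of all components
    latencies = [lat for _, lat in children]
    return {
        "iops": total_iops,
        # Plain average: every child counts equally, however little I/O it does.
        "latency_avg": sum(latencies) / len(latencies),
        # IOPS-weighted average: the latency seen by the average I/O.
        "latency_weighted": (
            sum(iops * lat for iops, lat in children) / total_iops if total_iops else 0.0
        ),
        # Max: surfaces the single worst child.
        "latency_max": max(latencies),
    }


print(rollup([(1000, 2.0), (10, 50.0)]))
```

The weighted average almost completely hides the quiet 50 ms VM, while the max flags it. That is why a max-based rollup would be the more useful "super metric" for spotting a bottleneck.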
![Page 15: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/15.jpg)
Storage counters at VC level
![Page 16: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/16.jpg)
Storage counters at World level
![Page 17: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/17.jpg)
Part 1: IOPS
![Page 18: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/18.jpg)
![Page 19: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/19.jpg)
Same data, but on 1 chart
![Page 20: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/20.jpg)
![Page 21: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/21.jpg)
vCenter: performance chart
This is the object name. In this case, it is a VM named vCenter5.
This tells us that it is the Datastore group, showing Past Day data (the last 24 hours).
![Page 22: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/22.jpg)
Same VM & timeline, but from the Disk counter.
![Page 23: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/23.jpg)
vCenter Ops might aggregate differently than vCenter
Same info, but this time from vCenter Ops. They are similar, but not identical. Is this because of the way VC Ops aggregates?
Read peaks at 245 in vCenter vs 217 in VC Ops, around 11% lower in VC Ops (equivalently, vCenter is about 13% higher).
Write peaks at 137 vs 135. This is close enough.
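The size of the gap depends on which number you treat as the baseline. Here is the worked arithmetic for the peaks quoted above:

```latex
\frac{245 - 217}{245} \approx 11.4\%\ \text{(VC Ops is lower than vCenter)}
\qquad
\frac{245 - 217}{217} \approx 12.9\%\ \text{(vCenter is higher than VC Ops)}
\qquad
\frac{137 - 135}{137} \approx 1.5\%\ \text{(write peaks)}
```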
![Page 24: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/24.jpg)
IOPS: Snapshot causes real IOPS penalty
This is from the Virtual Disk counters. 173 reads at Virtual Disk translate into 245 reads at Datastore, about 40% more.
70 writes at Virtual Disk translate into 137 writes at Datastore, almost double (nearly 200% of the Virtual Disk figure)!
So a snapshot can cause much higher IOPS.
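A snapshot inflates Datastore IOPS relative to Virtual Disk IOPS, so comparing the two counters for each VM can hint at which VMs carry snapshots. Below is a minimal Python sketch, assuming you have exported both counters per VM. The `snapshot_suspects` helper and its 1.2 threshold are hypothetical choices, and `vm-b` is a made-up VM. A high ratio is only a reason to check the VM, not proof of a snapshot.

```python
def amplification(virtual_disk_iops: float, datastore_iops: float) -> float:
    """How many Datastore I/Os each Virtual Disk I/O turns into."""
    return datastore_iops / virtual_disk_iops


def snapshot_suspects(vms: dict[str, tuple[float, float]], threshold: float = 1.2) -> list[str]:
    """VMs whose Datastore IOPS exceed Virtual Disk IOPS by more than `threshold` times."""
    return sorted(
        name
        for name, (vdisk, ds) in vms.items()
        if vdisk > 0 and amplification(vdisk, ds) > threshold
    )


# Read and write peaks from the slide for vCenter5: 173 -> 245 and 70 -> 137.
print(round(amplification(173, 245), 2))  # 1.42
print(round(amplification(70, 137), 2))   # 1.96
print(snapshot_suspects({"vCenter5": (173, 245), "vm-b": (100, 104)}))  # ['vCenter5']
```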
![Page 25: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/25.jpg)
Again, the same gap remains between vCenter and VC Ops.
![Page 26: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/26.jpg)
![Page 27: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/27.jpg)
IOPS: Conclusion
Use the Datastore counter for VMDK
• The Virtual Disk counter is useful if you are comparing with the actual IOPS issued at the Guest OS level. It will be too low if you have a snapshot.
• The Storage counter = Virtual Disk
• The Disk counter is useful when you are discussing with the storage team, who show you LUN-by-LUN metrics. Disk = LUN.
• It is not useful if your datastore spans multiple LUNs due to extents.
• In most cases, Disk = Datastore, as you should avoid extents.
Use the Disk counter for RDM
VC Ops counters may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
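The cross-check against vCenter across several VMs can be scripted. The sketch below is hypothetical. It assumes you have collected the same counter (for example peak read IOPS) for several VMs from both VC Ops and vCenter. The 10% tolerance is an arbitrary choice, and `vm-b` and `vm-c` are made-up VMs.

```python
def cross_check(vc_ops: dict[str, float], vcenter: dict[str, float],
                tolerance_pct: float = 10.0) -> dict[str, float]:
    """Return {vm: % difference} for VMs where VC Ops deviates from vCenter beyond tolerance."""
    mismatched = {}
    for vm, vc_value in vcenter.items():
        ops_value = vc_ops.get(vm)
        if ops_value is None or vc_value == 0:
            continue  # no data in VC Ops, or nothing to compare against
        diff_pct = abs(ops_value - vc_value) / vc_value * 100
        if diff_pct > tolerance_pct:
            mismatched[vm] = round(diff_pct, 1)
    return mismatched


vcenter_peaks = {"vCenter5": 245, "vm-b": 100, "vm-c": 80}
vc_ops_peaks = {"vCenter5": 217, "vm-b": 99}
print(cross_check(vc_ops_peaks, vcenter_peaks))  # {'vCenter5': 11.4}
```

A mismatch on one VM may just be an odd sample. Mismatches on most of the VMs you check point to a systematic difference in how the two tools aggregate.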
![Page 28: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/28.jpg)
Part 2: Latency
![Page 29: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/29.jpg)
VM level: Total Latency
![Page 30: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/30.jpg)
VM Level: Read Latency
![Page 31: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/31.jpg)
![Page 32: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/32.jpg)
![Page 33: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/33.jpg)
![Page 34: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/34.jpg)
Avoid the counter “Datastore | Highest Latency”
![Page 35: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/35.jpg)
![Page 36: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/36.jpg)
![Page 37: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/37.jpg)
Data in VC Ops
![Page 38: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/38.jpg)
Total Latency ≠ Read Latency + Write Latency
![Page 39: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/39.jpg)
View at Datastore level
![Page 40: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/40.jpg)
Latency: Conclusion
Use the Datastore counter for VMDK
• The Virtual Disk counter is useful if you are comparing with the actual IOPS issued at the Guest OS level. It will be too low if you have a snapshot.
• The Storage counter = Virtual Disk
• The Disk counter is useful when you are discussing with the storage team, who show you LUN-by-LUN metrics. Disk = LUN.
• It is not useful if your datastore spans multiple LUNs due to extents.
• In most cases, Disk = Datastore, as you should avoid extents.
Use the Disk or Virtual Disk counter for RDM
VC Ops counters may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
![Page 41: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/41.jpg)
Latency: Conclusion
Do not use Total Latency
• When creating a super metric, manually add Read and Write.
Use the Datastore counter for VMDK
Use the Disk counter for RDM
VC Ops counters may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
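The recommended super metric adds Read and Write latency. The Python below only sketches the arithmetic on exported values; in VC Ops itself you would build it from the Read Latency and Write Latency metrics. The IOPS-weighted variant is an alternative not taken from the slides. It is included as an assumption for cases where you want latency per I/O rather than a sum.

```python
def read_plus_write(read_ms: float, write_ms: float) -> float:
    """Combined latency as recommended: manually add Read and Write latency."""
    return read_ms + write_ms


def iops_weighted_latency(read_iops: float, read_ms: float,
                          write_iops: float, write_ms: float) -> float:
    """Alternative: average latency per I/O, weighting each side by its IOPS."""
    total_iops = read_iops + write_iops
    if total_iops == 0:
        return 0.0
    return (read_iops * read_ms + write_iops * write_ms) / total_iops


print(read_plus_write(4.0, 12.0))                  # 16.0
print(iops_weighted_latency(300, 4.0, 100, 12.0))  # 6.0
```

The sum is conservative, since it is never lower than either component, which suits alerting. The weighted average is closer to what an individual I/O experiences.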
![Page 42: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/42.jpg)
Part 3: Throughput
![Page 43: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/43.jpg)
Throughput counters for VM
![Page 44: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/44.jpg)
Throughput counters for VM
![Page 45: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/45.jpg)
Same VM, vastly different data
![Page 46: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/46.jpg)
![Page 47: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/47.jpg)
Throughput: Conclusion
Use the Datastore counter for VMDK
• The Virtual Disk counter is useful if you are comparing with the actual IOPS issued at the Guest OS level. It will be too low if you have a snapshot.
• The Storage counter = Virtual Disk
• The Disk counter is useful when you are discussing with the storage team, who show you LUN-by-LUN metrics. Disk = LUN.
• It is not useful if your datastore spans multiple LUNs due to extents.
• In most cases, Disk = Datastore, as you should avoid extents.
Be careful with the Disk counters, as they can report large numbers
• vCenter: Disk | Disk Throughput usage
• VC Ops: Disk | IO Usage capacity
VC Ops counters may differ from vCenter
• If the number looks strange, check with vCenter.
• Sometimes the data in vCenter itself is wrong.
• Check a few VMs, not just 1.
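One way to tell whether a large throughput number is plausible is to use the relationship throughput = IOPS x average I/O size. The following Python sketch is hypothetical. It derives the average I/O size a reading implies and flags readings above an assumed 1,024 KB ceiling. That ceiling is an arbitrary sanity limit, not a documented vSphere value.

```python
def implied_io_size_kb(throughput_kbps: float, iops: float) -> float:
    """Average I/O size implied by a reading, from throughput = IOPS x I/O size."""
    return throughput_kbps / iops if iops else 0.0


def looks_suspicious(throughput_kbps: float, iops: float, max_io_kb: float = 1024.0) -> bool:
    """Flag readings whose implied I/O size exceeds an assumed ceiling."""
    if iops == 0:
        return throughput_kbps > 0  # throughput with no I/O at all is likely inconsistent
    return implied_io_size_kb(throughput_kbps, iops) > max_io_kb


print(implied_io_size_kb(8000, 250))     # 32.0
print(looks_suspicious(8000, 250))       # False
print(looks_suspicious(5_000_000, 250))  # True
```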
![Page 48: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/48.jpg)
Part 4: Other Interesting Metrics
![Page 49: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/49.jpg)
Built-in Super Metric?
The 3 charts below show a summary at the World level.
• The actual World is on the right. It has 5 vCenters.
![Page 50: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/50.jpg)
Other interesting metrics
![Page 51: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/51.jpg)
![Page 52: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/52.jpg)
vCenter “equivalent” dashboard
![Page 53: Storage Troubleshooting with VC Ops 5](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56816523550346895dd7a672/html5/thumbnails/53.jpg)
Capacity
You have 1000 VMs on 50 datastores.
max([$This:M180/14,$This:M1978/($This:M1977*0.8)])
which translates to:
max([This Resource: summary|total_number_vms/14,This Resource: capacity|used_space/(This Resource: capacity|total_capacity*0.8)])
A value above 1 flags a datastore where either the number of attached VMs is more than 14 or less than 20% of its space is left.
You can imagine how great this can look on a heatmap.
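To check the thresholds before building the heatmap, the same logic can be evaluated outside VC Ops. This is a minimal Python sketch. The function and parameter names are hypothetical, and it assumes M180 is the VM count, M1978 the used space and M1977 the total capacity, as in the translated formula above. Note that `used / total * 0.8` evaluates left to right as `(used / total) * 0.8`. That value can never exceed 1 for a real datastore, so the space test needs `used / (total * 0.8)`.

```python
def datastore_score(num_vms: int, used_gb: float, total_gb: float,
                    max_vms: int = 14, max_used_fraction: float = 0.8) -> float:
    """Above 1 means more than max_vms VMs, or less than 20% free space (by default)."""
    vm_pressure = num_vms / max_vms
    space_pressure = used_gb / (total_gb * max_used_fraction)
    return max(vm_pressure, space_pressure)


datastores = {  # hypothetical datastores: (VM count, used GB, total GB)
    "ds-01": (10, 850, 1000),  # 85% used
    "ds-02": (20, 100, 1000),  # too many VMs
    "ds-03": (10, 500, 1000),  # healthy
}
flagged = [name for name, args in datastores.items() if datastore_score(*args) > 1]
print(flagged)  # ['ds-01', 'ds-02']
```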