Monitoring and Building for SharePoint Farm Performance
Sean P. McDonough
Microsoft MVP
Chief Technology Officer
Bitstream Foundry LLC
Search for SPTechCon in your App Store and download the 2018 Mobile App to stay connected throughout the entire event.
What We’ll Be Covering
1. Farm Environments
2. Getting a Solid Start
3. Tools and Monitoring Servers
4. Page Performance Monitoring
5. Questions & Answers
6. (LOTS of) References
Farm Environments
Yes, I said farm, not stamp
• Subtle distinction, but it means we’re likely on-premises …
  • No SharePoint Online / Office 365
  • Unless you’re on a “farm in the cloud”
• Why on-premises?
  • Significant surface reduction for monitoring in the cloud
  • It’s “someone else’s” problem (i.e., a value-add for consumers)
  • Administrative APIs very limited vs. on-premises
  • Limited tools (no perfmon, developer dashboard, etc.)
  • In short: we can’t get at the counters and logs we need!
Farm Environments
Some on-premises assumptions I’m making for this session:
• The big question: are you virtualizing?
  • Virtualization affords many options
  • Virtualization provides many ways to destroy performance
  • My assumption: you are virtualizing your environment
• Within the datacenter and beyond it
  • Easier than ever to build farm interdependencies and distributed environments
  • Application and customization options push connections beyond the farm
  • My assumption: we are focusing on basic (end-user) SharePoint performance
Getting a Solid Start
“An ounce of prevention is worth a pound of cure.”
When you have the luxury of starting from scratch, you can get the basics right:
• VM Product Configuration
• Virtual Machine (VM) Setup
• Operating System Configuration
• SQL Server Installation
Getting a Solid Start
VM Product Configuration
• Treatment of Memory
  • Amount of memory actively used and swapped is configurable
  • Depending on product, oversubscription is possible
  • Your goal should be 1:1
• What if I can’t do 1:1?
  • Minimize swapping
  • Swapping degrades performance
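To make the 1:1 memory goal concrete, here is a minimal Python sketch that compares the memory assigned to VMs against the physical memory on the host. The function name and the sample VM sizes are hypothetical, and the calculation ignores whatever memory the hypervisor itself needs.

```python
def memory_subscription_ratio(vm_memory_gb, host_memory_gb):
    """Return total memory assigned to VMs divided by physical host memory."""
    if host_memory_gb <= 0:
        raise ValueError("host_memory_gb must be positive")
    return sum(vm_memory_gb) / host_memory_gb


# Hypothetical host with 128 GB of RAM and four farm VMs.
vms = [32, 32, 48, 16]
ratio = memory_subscription_ratio(vms, 128)
print(f"Assigned-to-physical ratio: {ratio:.2f}:1")
if ratio > 1.0:
    print("Oversubscribed: expect swapping under load")
```

A ratio above 1.0 means the host has promised more memory than it has, which is where swapping starts.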
Getting a Solid Start
VM Product Configuration
• Virtualization Options
  • Avoid oversubscription on processors and cores – nothing is free!
  • Understand what you’re enabling, and don’t go overboard; every option can carry a performance cost
  • Do you really know what Intel’s VT-x/EPT extensions are?
  • VMs inside of VMs …
Getting a Solid Start
VM Setup
• Creating virtual disks
  • SharePoint is an I/O monster, especially when it comes to SQL Server
  • Big question: what is the storage technology in use? Traditional hard drives? SSDs? Combination?
  • Next question: where are the bottlenecks likely to occur? Drives? Controllers? Elsewhere?
  • Our goal: avoid too many (abstraction) layers; each layer adds overhead
Getting a Solid Start
VM Setup
• Common example: set up a new 100GB HD
  • Create a new 100GB drive
  • Place on traditional hard drive(s)
  • Drives are in a RAID-5 configuration
  • Space allocated on demand
• What you don’t see …
  • Cost of parity calculations
  • Cost of virtual translation
  • Cost to “auto-grow”/allocate real drive space
  • Latency at every step
Getting a Solid Start
VM Setup: ways to improve the scenario
• Understand common RAID configurations
  • RAID-5: cheaper, but costly parity calculations
  • RAID-10: better performance, more expensive
• Go “direct to disk” with pass-through options
  • Hyper-V – “pass-through” storage
  • VMware – “mapping” a disk
• Pre-allocate virtual disk space
  • Initialize drive space before it’s needed
  • This will chew up real drive space!
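The RAID-5 versus RAID-10 trade-off above can be roughed out with the commonly cited write-penalty rule of thumb: each logical write costs about 4 back-end I/Os on RAID-5 (read data, read parity, write data, write parity) and about 2 on RAID-10 (one write per mirror). The sketch below is an estimate under those assumptions, not a benchmark; the function name and the sample workload are hypothetical.

```python
# Rule-of-thumb back-end I/Os per logical write (an assumption, not a measurement).
WRITE_PENALTY = {"RAID-5": 4, "RAID-10": 2}


def functional_iops(raw_iops, write_fraction, raid_level):
    """Estimate usable IOPS for an array given its raw IOPS and write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[raid_level]
    read_fraction = 1.0 - write_fraction
    return raw_iops * read_fraction + raw_iops * write_fraction / penalty


# Hypothetical array delivering 1,000 raw IOPS with a 50% write workload.
for level in WRITE_PENALTY:
    print(f"{level}: ~{functional_iops(1000, 0.5, level):.0f} usable IOPS")
```

The gap widens as the write share grows, which matters for write-heavy SQL Server workloads.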
Getting a Solid Start
VM Setup: ways to improve the scenario (summary)
• Remove unnecessary sources of latency
  • Allocation at run-time also hurts a lot. Pre-allocate your storage space.
  • Stay off of USB drives in performance-critical scenarios. USB latency hurts.
  • Avoid VMDKs and VHDs by passing through to dedicated storage.
  • Software RAID kills performance in so many ways. Use hardware RAID.
  • Realize that multiple virtual drives on a single drive array degrade performance!
Getting a Solid Start
VM Setup: ways to improve the scenario (summary)
• The great equalizer – solid state drives (SSDs)
  • Dramatically better performance hands-down
  • No moving parts
• Performance varies for SSDs
  • Choose your protocol: SATA (AHCI) vs. NVMe
  • M.2 is a new form factor, not a new standard
• Limits (theoretical)
  • SATA: 300MB/s (SATA 2) or 600MB/s (SATA 3)
  • NVMe: 2GB/s (PCIe Gen 2) or 4GB/s (PCIe Gen 3)
• The role of so-called “hybrids”
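The theoretical limits above fall out of the link rate and the line-encoding overhead. The sketch below reproduces them in Python, assuming 8b/10b encoding for SATA and PCIe Gen 2, 128b/130b encoding for PCIe Gen 3, and a four-lane (x4) link for NVMe; the lane count is an assumption, since the slide does not state it, and the function name is hypothetical.

```python
def link_throughput_mb_s(gigatransfers_per_s, payload_bits, encoded_bits, lanes=1):
    """Usable bandwidth in MB/s for a serial link after line-encoding overhead."""
    raw_mbit = gigatransfers_per_s * 1000 * lanes
    return raw_mbit * payload_bits / encoded_bits / 8


# (link rate in GT/s, payload bits, encoded bits, lanes)
LINKS = {
    "SATA 2": (3.0, 8, 10, 1),
    "SATA 3": (6.0, 8, 10, 1),
    "PCIe Gen 2 x4": (5.0, 8, 10, 4),
    "PCIe Gen 3 x4": (8.0, 128, 130, 4),
}

for name, (rate, payload, encoded, lanes) in LINKS.items():
    mb_s = link_throughput_mb_s(rate, payload, encoded, lanes)
    print(f"{name}: {mb_s:.0f} MB/s")
```

Real drives land below these ceilings because of protocol and controller overhead.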
Getting a Solid Start
Operating System Options: Paging File
• Various strategies for managing
  • System managed
  • Manual allocation(s)
  • Remove entirely
• I prefer to assign an allocation manually
  • Create a dedicated paging drive
  • Pre-allocate space in VM environment
  • Size is 1.5x the amount of memory
  • Alter Windows paging options
  • (note: be sure to set no paging for C:)
Getting a Solid Start
Operating System Options: Miscellaneous
• Visual Effects
  • “Adjust for best performance”
  • Turns off animations and other CPU-wasting eye-candy
• If your host drives are removable …
  • Review removal policies for each drive
  • Will likely have policies that optimize for performance – typically by enabling caching.
Getting a Solid Start
SQL Server is the performance lynchpin in nearly all SharePoint environments. Luckily, there are some pretty basic adjustments that will help improve performance:
• Instant File Initialization …
• Storage Selection …
• Drive Formatting …
• Data and Log Assignment …
• TempDB Configuration …
• DB Sizing and Autogrowth …
Tools and Monitoring Servers
Reasons
Why do we monitor performance? Reasons typically fall into one of the following three categories:
• We are seeking to understand why our SharePoint environment is underperforming.
  • Troubleshooting!
• We want to ensure that we have enough headroom to scale and grow as desired.
  • Capacity!
• We want to quantify changes we’ve made to our farm in terms of performance.
  • Improvements!
Troubleshooting
We’re looking for the source of a performance problem. Where should we start?
Performance issues typically originate in at least one general sub-system:
• Memory
• Network
• Processor (CPU)
• Storage (Disk)
Of course, SharePoint problems often muddy the waters by spanning more than one category.
Tools
Recommendation: start with monitoring the server(s) over time to gain an understanding:
• First understand “the normal state” of a server
• Then observe the server when a problem occurs
Many different tools at our disposal:
• Farm Health Analyzer
• Event Viewer
• ULS Viewer
• Fiddler
• Developer Dashboard
• Wireshark
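The “normal state first, then observe” recommendation above can be sketched as a simple baseline comparison: collect samples of a counter while the server is healthy, then ask how far a reading taken during a problem sits from that baseline. This is a minimal illustration only; the function name, the three-standard-deviation cutoff, and the sample values are all assumptions.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline_samples, observed, max_sigma=3.0):
    """True when observed is more than max_sigma standard deviations from the baseline mean."""
    if len(baseline_samples) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline_samples)
    sigma = stdev(baseline_samples)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > max_sigma


# Hypothetical % Processor Time samples gathered during normal operation.
normal_cpu = [22, 25, 24, 27, 23, 26]
print(deviates_from_baseline(normal_cpu, 81))  # reading taken during an incident
print(deviates_from_baseline(normal_cpu, 26))  # reading within the normal range
```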
Performance Counters
Today’s focus for performance monitoring is on counters
• Specific performance counters that can help direct further investigation and keep us out of the weeds
How do we view performance counters?
• Windows Performance Monitor (perfmon.exe)
• Windows Resource Monitor (resmon.exe)
• More specialized tools (e.g., SysKit’s tools)
Performance Counters
Performance Counter Basics
The operating system exposes counters
• Memory, CPU, network, and more
Applications oftentimes expose their own counters
• For instance, SharePoint alone exposes over 20 categories and hundreds of counters
Bottom line: unless you know what to watch, you’ll suffer a cruel and horrible death at the hands of the Performance Counter Gods.
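Once you have picked a handful of counters to watch, logged samples are easy to summarize. The sketch below averages each counter in a CSV log, assuming a layout with a timestamp in the first column and one column per counter (modeled on the CSV logs Performance Monitor can write). The sample data, server name, and function name are invented for illustration.

```python
import csv
import io


def average_counters(csv_text):
    """Average every counter column in a CSV log whose first column is a timestamp."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    names = header[1:]
    totals = [0.0] * len(names)
    counts = [0] * len(names)
    for row in reader:
        for i, cell in enumerate(row[1:len(header)]):
            try:
                value = float(cell)
            except ValueError:
                continue  # skip blank or non-numeric samples
            totals[i] += value
            counts[i] += 1
    return {name: totals[i] / counts[i] for i, name in enumerate(names) if counts[i]}


# Invented sample log; the third row has a missing memory sample.
SAMPLE_LOG = r'''"(PDH-CSV 4.0)","\\WFE01\Memory\Available MBytes","\\WFE01\Processor(_Total)\% Processor Time"
"05/01/2018 10:00:00.000","4096","20.0"
"05/01/2018 10:00:15.000","4000","30.0"
"05/01/2018 10:00:30.000"," ","40.0"
'''

for counter, average in average_counters(SAMPLE_LOG).items():
    print(f"{counter}: {average:.1f}")
```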
Server Roles and Counters
What should I be watching?
That depends on the role of the server
• Web Front-End
• Application Server
• SQL Server
Web Front-Ends
WFEs serve up pages through IIS, so we want low values for all of these counters:
• ASP.NET: Requests Queued (should be “low”)
• ASP.NET: Requests Rejected (should be 0)
• ASP.NET: Request Wait Time (should be near 0)
• ASP.NET: Worker Process Restarts (should be 0)
WFEs also use their memory for caching to accelerate web requests.
• ASP.NET Applications: Cache API Trims (should be near 0)
• ASP.NET Applications: Cache API Hit Ratio (should be “high”)
• SharePoint Publishing Cache: Total Number of Cache Compactions (should be near 0)
• SharePoint Publishing Cache: Publishing Cache Hit Ratio (should be “high”)
• SharePoint Publishing Cache: Publishing Cache Flushes / Second (should be 0)
Web Front-Ends
WFEs use disks for BLOB caching
• SharePoint Publishing Cache: BLOB Cache % Full (maintain headroom)
Application Servers
Unless an application server is experiencing issues specific to its function (which might require monitoring specialized counters), consider monitoring the following:
• Processor: % Processor Time (>75% - 85% is bad)
• Memory: Available MBytes (<2 GB is bad)
• Memory: Cache Faults/sec (>1 is bad)
• Memory: Pages/sec (>10 is bad)
• Disk: Avg. Disk Queue Length (depends)
• Disk: % Idle Time (<90% is bad)
• Disk: % Free Space (<30% is bad)
These are valid for WFEs as well!
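The thresholds above translate directly into a small check. The sketch below flags any counter in a sample that crosses its “bad” line; it uses the lower bound of the 75% - 85% processor range, treats 2 GB as 2048 MB, and leaves out Avg. Disk Queue Length because its threshold “depends.” The dictionary keys, function name, and sample readings are hypothetical.

```python
import operator

# (comparison that means "bad", limit) taken from the list above.
THRESHOLDS = {
    "Processor: % Processor Time": (operator.gt, 75.0),
    "Memory: Available MBytes": (operator.lt, 2048.0),
    "Memory: Cache Faults/sec": (operator.gt, 1.0),
    "Memory: Pages/sec": (operator.gt, 10.0),
    "Disk: % Idle Time": (operator.lt, 90.0),
    "Disk: % Free Space": (operator.lt, 30.0),
}


def find_violations(sample):
    """Return the names of counters in sample that cross their 'bad' threshold."""
    violations = []
    for counter, (is_bad, limit) in THRESHOLDS.items():
        if counter in sample and is_bad(sample[counter], limit):
            violations.append(counter)
    return violations


# Hypothetical readings from one application server.
reading = {
    "Processor: % Processor Time": 91.0,
    "Memory: Available MBytes": 6144.0,
    "Memory: Pages/sec": 42.0,
    "Disk: % Free Space": 55.0,
}
print(find_violations(reading))
```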
SQL Servers
Consider watching the following:
• SQLServer:Buffer Manager: Buffer Cache Hit Ratio
• SQLServer:Databases: Transactions/sec
• SQLServer:General Statistics: User Connections
• SQLServer:Latches: Average Latch Wait Time (ms)
• SQLServer:Latches: Latch Waits/sec
• SQLServer:Locks: Average Wait Time (ms)
• SQLServer:Locks: Lock Wait Time (ms)
• SQLServer:Locks: Number of Deadlocks/sec
• SQLServer:Plan Cache: Cache Hit Ratio
• SQLServer:SQL Statistics: SQL Compilations/sec
• SQLServer:SQL Statistics: SQL Re-Compilations/sec
Page Performance Monitoring
We’ve been looking at server-side performance monitoring thus far. It represents only half of the overall equation.
We need to put ourselves in the role of the end user to monitor and diagnose a number of other issues, including page performance issues.
What can we do from the other end of the wire?
Page Performance Monitoring
The answer is “quite a bit”
Your browser is an amazingly capable performance tool – if you understand how to use it.
Requests and their responses are recorded chronologically – including all sorts of information such as HTTP headers, response codes, cookies, and much more.
Page Performance Monitoring
X-SharePointHealthScore
• A measure of the front-end’s general load or stress. Values from 0 (no stress) to 10 (max stress). We want this low.
SPRequestDuration
• The amount of time your request spends processing on the server (in ms). Ideally less than three seconds (3000ms)
SPIisLatency
• The amount of time your request spends waiting on the server (in ms). Should be near zero.
Page Performance Monitoring
Round Trip Time – (SPRequestDuration + SPIisLatency) = Time lost “Elsewhere”
For example:
• Round Trip Time = 76.04ms
• SPRequestDuration = 51ms
• SPIisLatency = 0
• Time Lost Elsewhere = 25.04ms
This is a high-performance SharePoint farm that is not under load.
• May not reflect real-world conditions
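The formula above is simple enough to script against the response headers you copy out of the browser’s network tools. The sketch below assumes the headers arrive as a dictionary of strings keyed by the exact header names and that the round-trip time is read from the browser separately; the function name is hypothetical.

```python
def time_lost_elsewhere(round_trip_ms, headers):
    """Round trip time minus the time SharePoint reports spending on the request."""
    duration = float(headers.get("SPRequestDuration", 0))
    latency = float(headers.get("SPIisLatency", 0))
    return round_trip_ms - (duration + latency)


# Values from the example above.
response_headers = {"SPRequestDuration": "51", "SPIisLatency": "0"}
lost = time_lost_elsewhere(76.04, response_headers)
print(f"Time lost elsewhere: {lost:.2f}ms")
```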
SharePoint On-Premises
This will work for …
• SharePoint 2013 on-prem
• SharePoint 2016 on-prem
The Common Outcomes
I’ve got consistently high SPRequestDuration values
• This is oftentimes where we find questionable dev practices
• May be related to server (over-)load or other factors
• X-SharePointHealthScore can corroborate (or not)
I’m seeing a lot of “time lost elsewhere”
• Network congestion or failure
• Web proxies inserting themselves between you and SharePoint
• DNS resolution issues
• Routing problems
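The two outcomes above can be combined into a rough first-pass triage. In the sketch below, the 3000ms duration limit comes from the earlier guidance on SPRequestDuration; the health score cutoff of 5 and the 1000ms “elsewhere” cutoff are placeholder assumptions to tune for your own farm, and the function name is hypothetical.

```python
def triage(round_trip_ms, request_duration_ms, iis_latency_ms, health_score,
           duration_limit_ms=3000.0, health_limit=5, elsewhere_limit_ms=1000.0):
    """Return a list of first-pass findings for one page request."""
    findings = []
    if request_duration_ms > duration_limit_ms:
        if health_score >= health_limit:
            findings.append("high SPRequestDuration with high health score: suspect server load")
        else:
            findings.append("high SPRequestDuration with low health score: review custom code")
    elsewhere = round_trip_ms - (request_duration_ms + iis_latency_ms)
    if elsewhere > elsewhere_limit_ms:
        findings.append("time lost elsewhere: check network, proxies, DNS, routing")
    return findings


# Hypothetical slow page on a lightly loaded server.
for finding in triage(4200, 3900, 0, 1):
    print(finding)
```

Treat the output as a pointer for where to look next, not as a diagnosis.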
Other Questions?
References
1. What does virtualize Intel VT-x/EPT or AMD-V/RVI do?
   https://communities.vmware.com/thread/525101
2. What Are VMware Virtual CPU Performance Counters (vPMCs)?
   https://www.vladan.fr/what-are-vmware-virtual-cpu-performance-monitoring-counters-vpm1cs/
3. The road to IOMMU (directed video card memory access)
   https://communities.vmware.com/thread/399066
4. Advantages and Disadvantages of Various RAID Levels
   https://10gbps.io/blog/advantages-disadvantages-various-raid-levels/
5. Configuring Pass-Through Disks in Hyper-V
   https://blogs.technet.microsoft.com/askcore/2008/10/24/configuring-pass-through-disks-in-hyper-v/
6. NVMe vs. SSD vs HDD Performance: Is it Time to Switch?
   https://photographylife.com/nvme-vs-ssd-vs-hdd-performance
7. Why Storage Drive Speeds Don’t Hit Their Theoretical Limits
   http://www.tested.com/tech/pcs/457172-why-storage-drive-speeds-dont-hit-their-theoretical-limits/
8. What is SSHD (Solid State Hybrid Drive)
   https://www.lifewire.com/solid-state-hybrid-drive-833451
9. Disable Automatic Updates in 2016
   https://social.technet.microsoft.com/Forums/lync/en-US/d3a2694c-32da-4158-943a-81c2904ffb3d/disable-automatic-updates-in-2016?forum=WinServerPreview
10. Storage and SQL Server Capacity Planning and Configuration (SharePoint Server)
    https://technet.microsoft.com/en-us/library/cc298801(v=office.16).aspx
11. Best Practices for SQL Server in a SharePoint Server Farm
    https://technet.microsoft.com/en-us/library/hh292622(v=office.16).aspx
12. Storage and SQL Server Capacity Planning and Configuration (SharePoint Server)
    https://technet.microsoft.com/en-us/library/a96075c6-d315-40a8-a739-49b91c61978f(v=office.16)#Section6_3
13. Diskspd Utility: A Robust Storage Testing Tool (superseding SQLIO)
    https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223
14. GitHub repository for diskspd
    https://github.com/microsoft/diskspd
15. Using Microsoft DiskSpd to Test Your Storage Subsystem
    https://sqlperformance.com/2015/08/io-subsystem/diskspd-test-storage
16. CrystalDiskMark 6.0.0
    https://crystalmark.info/download/index-e.html
17. The Ultimate SharePoint Performance Guide
    https://leanpub.com/SharePointPerformanceGuide/c/SysKit
18. SysInternals Suite
    https://docs.microsoft.com/en-us/sysinternals/downloads/sysinternals-suite
19. AutoSPInstaller on GitHub
    https://github.com/brianlala/AutoSPInstaller
20. AutoSPInstaller GUI
    https://autospinstaller.com
21. Monitoring and maintaining SharePoint Server 2013
    https://technet.microsoft.com/en-us/library/ff758658(v=office.16).aspx
22. Performance Testing for SharePoint Server 2013
    https://technet.microsoft.com/en-us/library/ff758659(v=office.16).aspx
23. Capacity management and sizing overview for SharePoint Server 2013
    https://technet.microsoft.com/en-us/library/ff758647(v=office.16).aspx
24. SharePoint Performance Monitoring – How and Why?
    http://blog.syskit.com/sharepoint-performance-monitoring
25. Performance Counters for ASP.NET
    https://msdn.microsoft.com/en-us/library/fxk122b4.aspx
26. Monitor Cache Performance in SharePoint Server 2016
    https://technet.microsoft.com/en-us/library/ff934623(v=office.16).aspx
27. ASP.NET Performance Monitoring, and When to Alert Administrators
    https://msdn.microsoft.com/en-us/library/ms972959.aspx
28. MOSS Object Cache Memory Tuning is not an Intuitive Process
    https://sharepointinterface.com/2009/08/30/moss-object-cache-memory-tuning-is-not-an-intuitive-process/
29. High Avg Disk Queue Length and Finding the Cause
    http://www.ithacks.com/2008/09/12/high-avg-disk-queue-length-and-finding-the-cause/
30. SharePoint Performance: Best Practices from the Field
    https://www.slideshare.net/jasonhimmelstein/sharepoint-performance
31. ULS Viewer
    https://www.microsoft.com/en-us/download/details.aspx?id=44020
32. Fiddler
    https://www.telerik.com/download/fiddler
33. Using the Developer Dashboard
    https://msdn.microsoft.com/en-us/library/office/ff512745(v=office.16).aspx
34. The Five-Minute Page Performance Troubleshooting Guide for SharePoint Online
    https://sharepointinterface.com/2017/07/07/the-five-minute-page-performance-troubleshooting-guide-for-sharepoint-online/
35. Akamai Reveals 2 Seconds As The New Threshold Of Acceptability For Ecommerce Web Page Response Times
    https://www.akamai.com/us/en/about/news/press/2009-press/akamai-reveals-2-seconds-as-the-new-threshold-of-acceptability-for-ecommerce-web-page-response-times.jsp
36. How Loading Time Affects Your Bottom Line
    https://blog.kissmetrics.com/loading-time/
Sean P. McDonough
SharePoint and Office 365 Gearhead, Tinkerer, Microsoft MVP
Email: [email protected]
Twitter: @spmcdonough
Blog: http://SharePointInterface.com
About: http://about.me/spmcdonough