Read the full guide at: http://www.datadoghq.com/blog/monitoring-windows-server/
Lifting the Blinds: Monitoring Windows Server 2012
Datadog Overview
• SaaS-based infrastructure and app monitoring
• Open Source Agent
• Time series data (metrics and events)
• Processing nearly a trillion data points per day
• Intelligent Alerting and Insightful Dashboards
Monitor Everything
Operating Systems, Cloud Providers (AWS), Containers, Web Servers, Datastores, Caches, Queues and more...
Agenda
- Why should I monitor Windows Server?
- What are some indicators of performance issues?
- How can I collect performance metrics for analysis?
What to monitor?
CPU metrics
- PercentProcessorTime
- ContextSwitchesPersec
- ProcessorQueueLength
- DPCsQueuedPersec
- PercentPrivilegedTime
- PercentDPCTime
- PercentInterruptTime
CPU: ContextSwitchesPersec
What it tracks:
Rate at which processors switch from one thread to another
Correlate with:
Memory: PageFaultsPersec
Disk: DiskTransfersPersec
Network: BytesSentPersec/BytesReceivedPersec
Issue resolution: Adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disabling I/O counters
CPU: PercentProcessorTime
What it tracks:
Percentage of time the processor spends executing non-idle threads
Correlate with: ProcessorQueueLength
Issue resolution: Adding processors, moving to a bigger instance, optimizing the offending application
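Windows computes PercentProcessorTime by measuring how long the idle thread ran during the interval and subtracting that share from 100%. The sketch below reproduces that arithmetic from two raw samples, assuming both the cumulative idle time and the timestamps are in 100-nanosecond ticks; the sample values are hypothetical, and the formatted WMI classes already report the finished percentage.

```python
def percent_processor_time(idle0: int, time0: int, idle1: int, time1: int) -> float:
    """Busy percentage between two raw samples.

    idle0/idle1: cumulative time the idle thread has run, in 100-ns ticks.
    time0/time1: sample timestamps, also in 100-ns ticks.
    """
    elapsed = time1 - time0
    if elapsed <= 0:
        raise ValueError("second sample must be later than the first")
    idle_fraction = (idle1 - idle0) / elapsed
    # Clamp small timing skew so the result stays within 0-100.
    return max(0.0, min(100.0, 100.0 * (1.0 - idle_fraction)))


# Hypothetical samples 1 s apart (10,000,000 ticks) with 0.25 s of idle time.
busy = percent_processor_time(0, 0, 2_500_000, 10_000_000)
print(busy)  # 75.0
```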
CPU: ProcessorQueueLength
What it tracks:
Number of ready threads waiting in the processor queue
Correlate with:
CPU: PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime, PercentInterruptTime
Issue resolution: Adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disabling I/O counters
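Because a single ProcessorQueueLength sample is noisy, the counter is usually judged over a window. The sketch below flags sustained congestion only; the limit of 2 waiting threads per logical processor and the 5-sample run are assumptions (a commonly quoted rule of thumb, not a value from this guide), so tune them to your workload.

```python
def queue_is_congested(samples: list[int], logical_processors: int,
                       per_cpu_threshold: float = 2.0,
                       min_consecutive: int = 5) -> bool:
    """True if ProcessorQueueLength stays above the limit for a sustained run.

    The limit is per_cpu_threshold * logical_processors; both the per-CPU
    threshold and the run length are assumptions to tune per workload.
    """
    limit = per_cpu_threshold * logical_processors
    run = 0
    for length in samples:
        # Count consecutive samples over the limit; any dip resets the run.
        run = run + 1 if length > limit else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical 4-CPU host: a lone spike is ignored, a sustained backlog is not.
spike = [1, 2, 12, 1, 0, 2, 1]
backlog = [3, 9, 10, 11, 12, 14, 9]
print(queue_is_congested(spike, 4))    # False
print(queue_is_congested(backlog, 4))  # True
```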
CPU: DPCsQueuedPersec
What it tracks:
Rate at which deferred procedure calls (DPCs) are added to the processor's DPC queue
Correlate with:
CPU: PercentDPCTime
Disk: DiskTransfersPersec
Network: BytesSentPersec/BytesReceivedPersec
Issue resolution: Removing the buggy device, rolling back the driver
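Counters ending in Persec, such as DPCsQueuedPersec, are rates derived from a cumulative count sampled twice. The sketch below shows that arithmetic, assuming raw samples shaped like those of the `Win32_PerfRawData_PerfOS_Processor` WMI class (a cumulative count plus `Timestamp_PerfTime` and `Frequency_PerfTime`); the numbers are hypothetical.

```python
def per_second_rate(count0: int, time0: int, count1: int, time1: int,
                    frequency: int) -> float:
    """Average events per second between two samples of a cumulative counter.

    count0/count1: raw cumulative counts (e.g. total DPCs queued so far).
    time0/time1: timestamps in ticks of a clock running at `frequency` Hz.
    """
    elapsed_seconds = (time1 - time0) / frequency
    if elapsed_seconds <= 0:
        raise ValueError("second sample must be later than the first")
    return (count1 - count0) / elapsed_seconds


# Hypothetical: 15,000 DPCs queued over 2 s of a 10 MHz clock.
rate = per_second_rate(100_000, 0, 115_000, 20_000_000, 10_000_000)
print(rate)  # 7500.0
```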
CPU: PercentPrivilegedTime/PercentDPCTime/PercentInterruptTime
What they track:
Percentage of time the CPU spent in privileged (kernel) mode, servicing deferred procedure calls, and servicing hardware interrupts
Correlate with: ContextSwitchesPersec/PercentPrivilegedTime/PercentDPCTime/PercentInterruptTime
Issue resolution: Adding processors, thread partitioning, DPC partitioning, hardware interrupt partitioning, disabling I/O counters
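When PercentProcessorTime is high, expressing these three counters as shares of it shows whether the busy time is going to kernel mode, DPCs, or interrupts, which points toward drivers and hardware rather than the application. A minimal sketch, assuming formatted percentages for the same processor instance and sample interval; the function name and sample values are hypothetical.

```python
def kernel_time_breakdown(processor: float, privileged: float,
                          dpc: float, interrupt: float) -> dict[str, float]:
    """Share of busy CPU time spent in privileged mode, DPCs and interrupts.

    All inputs are percentages for the same processor instance and interval
    (PercentProcessorTime, PercentPrivilegedTime, PercentDPCTime,
    PercentInterruptTime). Returns fractions of the busy time.
    """
    if processor <= 0:
        # An idle CPU has no busy time to break down.
        return {"privileged": 0.0, "dpc": 0.0, "interrupt": 0.0}
    return {
        "privileged": privileged / processor,
        "dpc": dpc / processor,
        "interrupt": interrupt / processor,
    }


# Hypothetical sample: 80% busy, most of it in kernel mode.
shares = kernel_time_breakdown(processor=80.0, privileged=48.0, dpc=12.0, interrupt=8.0)
print(shares)  # {'privileged': 0.6, 'dpc': 0.15, 'interrupt': 0.1}
```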
Memory metrics
- PoolNonpagedBytes
- PageFaultsPersec
- PagesInputPersec
Memory: PoolNonpagedBytes
What it tracks:
Bytes of memory in the nonpaged pool (kernel memory that cannot be paged out to disk)
Correlate with:
Windows Event ID 2019 “Nonpaged Memory Pool Empty”
Issue resolution: Identifying the troublesome driver, rolling back to a known good state
Memory: PageFaultsPersec
What it tracks:
Rate of page faults, counting both soft faults (resolved in memory) and hard faults (requiring a disk read)
Correlate with:
PagesInputPersec
Issue resolution: Increase system memory
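Because PageFaultsPersec counts cheap soft faults as well as expensive hard faults, it is most useful read alongside PagesInputPersec. The sketch below reports memory pressure only when both are high; the function name and both thresholds are hypothetical defaults, not values from this guide.

```python
def memory_pressure(page_faults_per_sec: float, pages_input_per_sec: float,
                    fault_threshold: float = 1000.0,
                    input_threshold: float = 100.0) -> bool:
    """True only when the fault rate is high AND pages are being read from disk.

    PageFaultsPersec counts soft faults (resolved from memory) as well as hard
    faults (which need disk reads), so a high fault rate alone does not prove
    a memory shortage. Both thresholds are hypothetical defaults.
    """
    return (page_faults_per_sec > fault_threshold
            and pages_input_per_sec > input_threshold)


print(memory_pressure(5000.0, 20.0))   # False: faults mostly resolved in memory
print(memory_pressure(5000.0, 800.0))  # True: faults are hitting the disk
```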
Memory: PagesInputPersec
What it tracks:
Rate at which pages are read from disk into memory to resolve hard page faults
Correlate with:
PageFaultsPersec/DiskTransfersPersec
Issue resolution: Increase system memory, move page file to separate physical disk
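Converting PagesInputPersec into throughput makes it easier to compare with what the disk holding the page file can sustain. This minimal sketch assumes the standard 4 KiB page size of x86/x64 Windows; the function name and sample rate are hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # assumed: standard 4 KiB page on x86/x64 Windows
MIB = 2**20


def paging_read_mib_per_sec(pages_input_per_sec: float,
                            page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert PagesInputPersec into MiB/s read from disk for paging."""
    return pages_input_per_sec * page_size / MIB


print(paging_read_mib_per_sec(2560))  # 10.0
```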
Disk metrics
- AvgDiskQueueLength
- DiskTransfersPersec
- PercentIdleTime
Disk: AvgDiskQueueLength
What it tracks:
Average number of read and write requests queued for the disk during the sample interval
Correlate with: DiskTransfersPersec
Issue resolution: Move data for I/O-intensive applications to a separate disk; add disks to the system
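AvgDiskQueueLength and DiskTransfersPersec are linked by Little's law: the average number of I/Os outstanding equals the transfer rate multiplied by the average time each transfer takes. A queue that grows while DiskTransfersPersec stays flat therefore means each I/O is taking longer. The sketch below applies the relation in both directions; the helper names, latency input, and sample numbers are assumptions for illustration.

```python
def expected_queue_length(transfers_per_sec: float, seconds_per_transfer: float) -> float:
    """Little's law: average I/Os outstanding = arrival rate x average time per I/O."""
    return transfers_per_sec * seconds_per_transfer


def implied_latency(avg_queue_length: float, transfers_per_sec: float) -> float:
    """Rearranged Little's law: average seconds per transfer = queue / rate."""
    if transfers_per_sec <= 0:
        raise ValueError("transfer rate must be positive")
    return avg_queue_length / transfers_per_sec


# Hypothetical disk: 400 transfers/s at 5 ms each keeps about 2 I/Os in flight.
print(expected_queue_length(400, 0.005))
# Same rate, but the queue has grown to 8: each I/O now takes about 20 ms.
print(implied_latency(8.0, 400))
```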
Disk: DiskTransfersPersec
What it tracks:
Aggregate I/O rate (read and write operations per second)
Correlate with: AvgDiskQueueLength
Issue resolution: Move data for I/O-intensive applications to a separate disk; add disks to the system; increase disk cache
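DiskTransfersPersec is the sum of reads and writes per second, so comparing it with what the underlying device can sustain shows how much headroom is left. In this sketch `rated_iops` is an assumed figure you would take from the disk or cloud volume specification; the function name and sample values are hypothetical.

```python
def disk_iops_utilization(reads_per_sec: float, writes_per_sec: float,
                          rated_iops: float) -> tuple[float, float]:
    """Return (DiskTransfersPersec, fraction of the device's rated IOPS used).

    DiskTransfersPersec is reads/sec + writes/sec. rated_iops is an assumed
    figure taken from the disk or cloud volume specification.
    """
    if rated_iops <= 0:
        raise ValueError("rated_iops must be positive")
    transfers = reads_per_sec + writes_per_sec
    return transfers, transfers / rated_iops


# Hypothetical volume rated for 500 IOPS.
transfers, utilization = disk_iops_utilization(300.0, 150.0, rated_iops=500.0)
print(f"{transfers:.0f} transfers/s, {utilization:.0%} of rated IOPS")
```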
Disk: PercentIdleTime
What it tracks:
Percentage of time the disk was idle during the sample interval
Correlate with: AvgDiskQueueLength
Issue resolution: Move page file to a separate disk; add disks to the system; use SSDs
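Subtracting PercentIdleTime from 100 gives the share of the interval the disk spent busy. The sketch below flags a disk that is busy most of the time across a window of samples; the helper names, the 90% busy threshold, and the 80% share of samples are hypothetical defaults to tune for your hardware.

```python
def disk_busy_percent(percent_idle_time: float) -> float:
    """Busy share of the sample interval, derived from PercentIdleTime."""
    return min(100.0, max(0.0, 100.0 - percent_idle_time))


def disk_mostly_busy(idle_samples: list[float], busy_threshold: float = 90.0,
                     required_fraction: float = 0.8) -> bool:
    """True if at least required_fraction of samples are at or above busy_threshold.

    Both defaults are hypothetical; tune them for your disks.
    """
    if not idle_samples:
        return False
    busy = [disk_busy_percent(sample) for sample in idle_samples]
    hot = sum(1 for value in busy if value >= busy_threshold)
    return hot / len(busy) >= required_fraction


print(disk_mostly_busy([5.0, 3.0, 8.0, 2.0, 50.0]))  # True: 4 of 5 samples >= 90% busy
print(disk_mostly_busy([60.0, 70.0, 5.0]))           # False
```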
Tooling
Word of Warning
PowerShell
- Windows’ scripting language (no more batch files!)
- Powerful language with deep OS support
- Integrates natively with .NET and C#
- Output is typed objects, not plain text (unlike *NIX shells)
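Because PowerShell emits typed objects, its output can be handed to other tools as structured data instead of scraped text. The sketch below drives PowerShell from Python: it queries the `Win32_PerfFormattedData_PerfOS_Processor` WMI class with `Get-CimInstance`, keeps the `_Total` instance, and serializes the result with `ConvertTo-Json`. It assumes Windows with PowerShell 3.0 or later (the version that ships with Windows Server 2012); the helper names are illustrative, and only the JSON parsing step runs on other operating systems.

```python
import json
import subprocess

# PowerShell pipeline: query formatted CPU counters for the _Total instance
# and emit them as JSON so no text scraping is needed.
PS_COMMAND = (
    "Get-CimInstance -ClassName Win32_PerfFormattedData_PerfOS_Processor | "
    "Where-Object { $_.Name -eq '_Total' } | "
    "Select-Object PercentProcessorTime, PercentPrivilegedTime | "
    "ConvertTo-Json"
)


def parse_counters(json_text: str) -> dict[str, int]:
    """Turn the single object emitted by ConvertTo-Json into a dict of ints."""
    data = json.loads(json_text)
    return {name: int(value) for name, value in data.items()}


def read_cpu_counters() -> dict[str, int]:
    """Run the pipeline via powershell.exe (Windows only) and parse its output."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_counters(result.stdout)


# Parsing a hypothetical captured output works on any OS:
print(parse_counters('{"PercentProcessorTime": 42, "PercentPrivilegedTime": 7}'))
```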
Perfmon
Windows Performance Toolkit
Requires the Windows Assessment and Deployment Kit (Windows ADK), which now ships the Windows Performance Toolkit
https://www.microsoft.com/en-US/download/details.aspx?id=39982
Windows Performance Recorder
Questions?
Evan Mouzakitis
Research Engineer
Twitter: @vagelim