Smart Storage Sizing
Storage Intelligence
Proactive Storage Performance and Capacity Management
Lee LaFrese – Senior Consultant [email protected]
214-432-7920 www.intellimagic.net
© IntelliMagic 2014
Lee LaFrese, Senior Storage Consultant
Speaker
Objectives
What is Storage Intelligence?
Storage Technology Overview and Measurement
IntelliMagic Vision’s Proactive Analytics
Case Study – High Front-end Response Times
Conclusion and Q&A
Performance Management: Perception and Reality
Common IT Perceptions and the Reality
• Perception: “We have ‘proactive’ monitoring – we get alerts on outages, usually before the end user calls to report the problem”
  Reality: ‘Proactive’ requires predictive analytics to prevent problems. Anything less is just ‘airbag monitoring’
• Perception: “Disruptions due to performance problems are inherently unexpected and cannot be predicted”
  Reality: Most I/O problems are not due to failures. They come from unbalanced or over-saturated components, which can be predicted
• Perception: “Optimization is too complex and time consuming, we don’t have the hours or headcount for that”
  Reality: The more constrained your staff, the more you need automated risk prediction to avoid fire fighting and constant ‘whack a mole’
• Perception: “Our storage hardware vendor’s proprietary tools provide me with good enough insight”
  Reality: Most IT environments are multi-vendor, which implies multiple proprietary tools and duplicate resources to manage them
Storage Cost Trends: Better and Worse
Storage Costs Are Decreasing
• Virtualization
• De-duplication, thin provisioning
• Compression
• SSD, Automated Tiering
• Commoditization of hardware
• Cloud storage options
• Blurring of the lines between file and block level storage
Storage Costs Are Increasing
• Constant growth in data
• More frequent spikes in demand (e.g., mobile) create SLA pressure
• More hotspots: access density is higher with de-duplicated, thin-provisioned and compressed data
• More complexity to manage
• Expertise shortage, with a constantly growing ratio of devices per head
Constant, fresh Storage Intelligence about your workloads on your hardware is the way to minimize both business risk and total storage costs.
[Figure: Response Time vs. Time, with an SLA performance limit. Labels: “Classic Monitoring tools”, “Sub-component Saturation”, “Storage Intelligence provides Key Risk Indicators”.]
Storage Intelligence delivers constant knowledge about how your infrastructure is handling your peak workloads
Storage Intelligence uses predictive analytics to produce Key Risk Indicators that identify root issues before the “knee of the curve” is reached, avoiding SLA violations.
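The “knee of the curve” can be illustrated with the textbook M/M/1 queueing approximation, in which mean response time R = S / (1 - U) for service time S and utilization U. This is a minimal sketch for intuition only, not IntelliMagic's predictive model; the 0.5 ms service time and 5 ms SLA are hypothetical values.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


def knee_utilization(service_time_ms: float, sla_ms: float) -> float:
    """Utilization at which the modeled response time reaches the SLA: U = 1 - S / SLA."""
    return max(0.0, 1.0 - service_time_ms / sla_ms)


if __name__ == "__main__":
    service_ms = 0.5  # hypothetical service time of a sub-component (ms)
    for u in (0.3, 0.5, 0.7, 0.8, 0.9, 0.95):
        print(f"U={u:.2f}  R={mm1_response_time(service_ms, u):.2f} ms")
    print(f"A 5 ms SLA is breached above U={knee_utilization(service_ms, 5.0):.2f}")
```

In this model the response time doubles between 80% and 90% utilization (2.5 ms to 5 ms), which is why a warning has to come from sub-component saturation well before the SLA line itself is crossed.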
IntelliMagic is a leader in advanced predictive analytics, especially for large data storage infrastructures
Over 20 years developing storage performance modeling solutions
Privately held, financially independent
Customer centric and highly responsive
Proactive Performance Monitoring: Find / Avoid Problems
Predictive Performance Modeling Services: Optimize Investments
Who uses IntelliMagic?
IntelliMagic solutions are used to manage many of the largest, most complex IT environments in the world:
• 4 of the 5 largest banks in the US
• The largest wireless provider in the US market
• The world’s largest manufacturer of farm equipment
• One of the largest auto manufacturers in the world
• Multiple US Federal Agencies
• The largest bank in Brazil
• The largest data center in Germany
Trusted to provide solutions for some of the largest enterprise storage vendors:
• IBM (Disk Magic, Capacity Magic, Batch Magic, etc.)
• HP
IntelliMagic is the market leader in providing advanced, actionable Storage Intelligence
• Visualizes workload constraints deep inside storage devices
• Quantifies Key Risk Indicators (KRIs) across the enterprise
• Includes performance best practices learned from peer sites
Storage Intelligence: See Deep Inside Storage Devices
Quantifies the risk that specific storage components will be unable to handle the daily peak-period workloads running on that hardware
Storage Intelligence: See Risk Across the Entire Enterprise
Provides hardware- and workload-sensitive Key Risk Indicators (KRIs) that can be understood at a glance and give early warning of issues
The intuitive web interface provides a clear, prioritized view of risks and observations
IntelliMagic Vision Architecture z/OS Disk
IntelliMagic Vision Architecture z/OS Tape
IntelliMagic Vision Architecture Distributed SAN
IntelliMagic Vision as a Service (IVaaS) is available for all three platforms and includes IntelliMagic consulting, so you benefit from hard-to-obtain knowledge gained from experience outside your company’s borders.
Infrastructure Layers
Potential Bottlenecks
• Hosts (Unix Servers, VMware Servers, System z Servers): host multi-pathing or HA contention
• Fabric: Fabric/FICON or ISL congestion
• Storage Systems: front-end, cache, CPU or back end
Why Do I/O Bottlenecks Occur?
• DSS Front End
  Symptom: Poor/inconsistent I/O response time on host
  Bottleneck reason: Imbalanced and over-subscribed Host Adapters or ports
  Common solutions: Rebalance or add adapters or ports
• DSS Back-end HDDs
  Symptom: Poor/inconsistent I/O response time on host
  Bottleneck reason: Too many I/Os per spindle
  Common solutions: Add disks/SSDs or move workload
• DSS Cache
  Symptom: Low read-hit ratio on and/or back-end storage controller
  Bottleneck reason: Cache-hostile applications (e.g., a large data warehouse) and large/few storage pools
  Common solutions: Add cache, rebalance or add storage pool resources
• Host Multi-Pathing
  Symptom: Poor/inconsistent I/O response time on host
  Bottleneck reason: Only a single path is used when the required bandwidth is greater
  Common solutions: Increase pathing
• Fabric ISL or path
  Symptom: Poor/inconsistent front-end response times
  Bottleneck reason: Poor design, oversubscribed ISLs or hardware outage
  Common solutions: Use design best practices; reduce oversubscription if appropriate
Bottlenecks are a leading cause of performance issues.
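The DSS Back-end HDDs row (“Too many I/Os per spindle”) comes down to simple arithmetic. This sketch estimates back-end I/Os per drive from front-end read misses and destaged writes, using the common rule-of-thumb RAID write penalties (2 for RAID-10, 4 for RAID-5, 6 for RAID-6) and ignoring full-stride writes and cache effects. The function names, the 180 I/O/s drive rating and the workload numbers are hypothetical.

```python
# Rule-of-thumb back-end operations per random host write (assumption:
# small random writes, no full-stride writes, no write coalescing).
RAID_WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}


def backend_iops_per_drive(read_miss_iops: float, write_iops: float,
                           raid_level: str, drives: int) -> float:
    """Back-end I/Os per second that each drive in the RAID group must serve."""
    backend_iops = read_miss_iops + write_iops * RAID_WRITE_PENALTY[raid_level]
    return backend_iops / drives


def drive_busy(iops_per_drive: float, rated_iops: float) -> float:
    """Fraction of a drive's rated random I/O capability in use (> 1.0 means overloaded)."""
    return iops_per_drive / rated_iops


if __name__ == "__main__":
    # Hypothetical peak interval: 2,000 read misses/s and 1,000 writes/s on 32 drives.
    per_drive = backend_iops_per_drive(2000, 1000, "RAID-5", drives=32)
    print(f"{per_drive:.1f} I/O/s per drive, busy {drive_busy(per_drive, 180):.0%}")
```

Here 6,000 back-end I/O/s spread over 32 drives is 187.5 I/O/s per drive, above the assumed 180 I/O/s rating, so the “add disks/SSDs or move workload” remedy would apply.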
z/OS Data Sources
Collect data on a per-sysplex basis; consolidate by DSS for reporting.
[Diagram: CF, CHPIDs / MIF, z/OS systems and FCDs; the legend marks each record type as required or optional.]
SMF record types:
• 30: Job
• 42.5/6: DFSMS Dataset
• 70: Processor
• 72: Workload
• 73: Channel
• 74.1: z/OS Device
• 74.2: XCF
• 74.4: CF
• 74.5: Cache
• 74.7: FICON Director
• 74.8: Link and HDD
• 75: Paging
• 78.3: I/O Queueing
Optional user records: Type 105: GDPS GM; Type 206: SRDF/A. VISION may also process GDPS GEOXPARM when loading data from an XRC installation.
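The record types above can be picked out of a raw SMF stream by reading the standard SMF record header. This sketch assumes each record is available as bytes that begin with its 4-byte RDW, uses the standard header layout (record type at offset 5, subtype at offset 22), and assumes flag bit X'40' marks a valid subtype field. `classify` and `SLIDE_SOURCES` are hypothetical names, the table only restates the slide, and this is not how Vision actually loads data.

```python
import struct
from typing import Optional, Tuple

# Standard SMF record header, big-endian, starting at the 4-byte RDW:
# length(2) segment(2) flag(1) type(1) time(4) date(4) system id(4)
# subsystem id(4) subtype(2): 24 bytes in total.
SMF_HEADER = struct.Struct(">HHBBII4s4sH")
SUBTYPE_PRESENT = 0x40  # assumption: flag bit X'40' marks a valid subtype field

# Hypothetical lookup table restating the record types named on the slide.
SLIDE_SOURCES = {
    "30": "Job", "42.5": "DFSMS Dataset", "42.6": "DFSMS Dataset",
    "70": "Processor", "72": "Workload", "73": "Channel", "75": "Paging",
    "74.1": "z/OS Device", "74.2": "XCF", "74.4": "CF", "74.5": "Cache",
    "74.7": "FICON Director", "74.8": "Link and HDD", "78.3": "I/O Queueing",
}


def classify(record: bytes) -> Optional[Tuple[str, str]]:
    """Return (key, description) if the record is one named on the slide, else None."""
    _, _, flag, rtype, _, _, _, _, subtype = SMF_HEADER.unpack_from(record)
    keys = [str(rtype)]
    if flag & SUBTYPE_PRESENT:
        keys.insert(0, f"{rtype}.{subtype}")  # prefer the more specific type.subtype key
    for key in keys:
        if key in SLIDE_SOURCES:
            return key, SLIDE_SOURCES[key]
    return None


if __name__ == "__main__":
    # Synthetic type 74 subtype 5 record, for demonstration only.
    sample = SMF_HEADER.pack(SMF_HEADER.size, 0, SUBTYPE_PRESENT, 74, 0, 0,
                             b"SYSA", b"RMF ", 5)
    print(classify(sample))  # ('74.5', 'Cache')
```

Trying the `type.subtype` key before the bare type lets entries such as 74.5 and 70 coexist in one table.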
Data Sources for z/OS Tape
Collect data on a per-library (Grid/Cluster) basis; consolidate by Grid/Library Cluster for reporting.
[Diagram: CF, CHPIDs / MIF, z/OS systems and real and/or virtual tape; the legend marks each source as required or optional.]
• SMF 14: DSN Read
• SMF 15: DSN Write
• SMF 21: Mounts
• SMF 30: Jobs/Pgms
• TMS
• BVIR from the TS7700 (Virtual Tape)
• Back-end Tape (optional)
What Can We Measure – SAN Storage
Component: Metrics
• Port: Throughput
• Adapter / processor: Throughput, I/Os, response time, utilization
• Volume: Throughput, I/Os, response time, cache hits, sequential stages, cache-full delays, backend activity
• Disk / RAID group: Throughput, I/Os, response time / utilization
[Diagram: Storage View of the SAN, from host HBAs through core and edge switches to storage HAs and LUNs.]
What Can We Measure – SAN Fabric
Component: Metrics
• Port traffic: Bytes, packets, and frames transmitted and received
• Port errors: Address errors, zero buffer-to-buffer credits, CRC errors, link failures, signal failures, …
[Diagram: Fabric View of the SAN, from host HBAs through core and edge switches to storage HAs and LUNs.]
Rated Performance Reports
• No border: no rating
• Green border: good
• Yellow border: early warning
• Red border: performance exceptions
Reports for key metrics are rated according to adaptive thresholds defined per platform, providing proactive warning of potential performance issues.
Static Thresholds
Thresholds are based on the hardware configuration and can be changed if required.
The chart frame indicates whether there is a problem.
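Hardware-based static thresholds and the border colors from the rated reports can be sketched as a lookup followed by two comparisons. The configuration names (`HA-8Gb`, `HA-16Gb`), the threshold values and the `border` helper are hypothetical. The `overrides` parameter stands in for “thresholds can be changed if required”. Vision's actual threshold values are not given in this deck.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Threshold:
    warning: float    # yellow border at or above this value
    exception: float  # red border at or above this value


# Hypothetical defaults for host adapter utilization (%), keyed by hardware type.
DEFAULTS: Dict[str, Threshold] = {
    "HA-8Gb": Threshold(warning=60.0, exception=80.0),
    "HA-16Gb": Threshold(warning=65.0, exception=85.0),
}


def border(value: Optional[float], hw_config: str,
           overrides: Optional[Dict[str, Threshold]] = None) -> str:
    """Map a metric value to a chart border: none, green, yellow or red."""
    threshold = (overrides or {}).get(hw_config, DEFAULTS.get(hw_config))
    if value is None or threshold is None:
        return "none"  # no data or no threshold for this hardware: no rating
    if value >= threshold.exception:
        return "red"
    if value >= threshold.warning:
        return "yellow"
    return "green"


if __name__ == "__main__":
    print(border(72.0, "HA-8Gb"))                                 # yellow
    print(border(72.0, "HA-8Gb", {"HA-8Gb": Threshold(75, 90)}))  # green
```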
Dynamic Workload Based Thresholds
Some thresholds, like those for front-end read response time, are based on the capabilities of the controller, its activity, and the workload.
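A workload-based threshold for front-end read response time can be sketched as follows: estimate what a healthy controller should deliver for the observed read-hit ratio and transfer size, then add headroom. The cost model is hypothetical and is not IntelliMagic's formula. It combines a cache-hit time, a read-miss time and transfer time over a port of given bandwidth, and its default coefficients and the 1.5x headroom factor are assumptions too.

```python
def expected_read_ms(read_hit_ratio: float, transfer_kb: float,
                     hit_ms: float = 0.2, miss_ms: float = 6.0,
                     port_mb_s: float = 800.0) -> float:
    """Expected front-end read response time (ms) under a simple hypothetical model."""
    transfer_ms = (transfer_kb / 1024.0) / port_mb_s * 1000.0  # assumes 1 MB = 1024 KB
    return read_hit_ratio * hit_ms + (1.0 - read_hit_ratio) * miss_ms + transfer_ms


def dynamic_threshold_ms(read_hit_ratio: float, transfer_kb: float,
                         headroom: float = 1.5, **model: float) -> float:
    """Alert threshold that scales with the current workload mix."""
    return headroom * expected_read_ms(read_hit_ratio, transfer_kb, **model)


if __name__ == "__main__":
    # A cache-friendly and a cache-hostile interval get different thresholds.
    for hit, kb in ((0.95, 4.0), (0.60, 64.0)):
        print(f"hit={hit:.0%} xfer={kb:.0f} KB -> {dynamic_threshold_ms(hit, kb):.2f} ms")
```

The point of the design is that the same measured response time can be healthy for a cache-hostile interval and an early warning for a cache-friendly one.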
Vision for z/OS Examples
Throughput per DSS (MB/s) [rating: 0.15] for all Disk Storage Systems by Serial
Throughput shows the capabilities of the storage machine, so it is important for proactive monitoring.
Response Time (ms) [rating: 0.00] for all Disk Storage Systems by Serial
From this chart, you can drill down to further analyze response time.
Dissecting Disconnect Time
The read-miss and synchronous-replication components of disconnect time are computed; “Queuing & Delays” is the remaining disconnect time that cannot be accounted for by those components.
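Once the explained components are estimated, dissecting disconnect time is a subtraction. The sketch assumes two estimation formulas that the slide does not spell out. The read-miss component is taken as the fraction of I/Os that miss cache times the back-end read response time. The synchronous-replication component is taken as the fraction of I/Os that are replicated writes times the replication response time. The helper name and all numbers are hypothetical.

```python
from typing import Dict


def dissect_disconnect(disconnect_ms: float,
                       read_miss_fraction: float, backend_read_ms: float,
                       sync_write_fraction: float, replication_ms: float) -> Dict[str, float]:
    """Split average disconnect time (ms per I/O) into explained and unexplained parts."""
    read_miss = read_miss_fraction * backend_read_ms
    sync_replication = sync_write_fraction * replication_ms
    # Whatever the two components do not explain is reported as queuing & delays.
    queuing_and_delays = max(0.0, disconnect_ms - read_miss - sync_replication)
    return {
        "read_miss": read_miss,
        "sync_replication": sync_replication,
        "queuing_and_delays": queuing_and_delays,
    }


if __name__ == "__main__":
    # Hypothetical interval: 1.2 ms disconnect, 10% read misses at 5 ms,
    # 20% synchronously mirrored writes at 2 ms.
    parts = dissect_disconnect(1.2, 0.10, 5.0, 0.20, 2.0)
    print({name: round(ms, 2) for name, ms in parts.items()})
```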
Front-end Adapter Utilization (%) [rating: 0.00] by Serial
IntelliMagic Vision goes beyond just showing the response time. This chart looks at the Host Adapters themselves.
Storage Pool Read & Write Response Time
Response time at the drive level tells us when the disks are overloaded.
From this chart, you can drill down to further analyze response time.
Replication Send (MB/s) [rating: 0.00] for all Ports by Serial
Average RPO over interval (sec) [rating: 0.00] for all Global Mirror Sessions by Session name
All CP, zIIP and zAAP time used (processors) for all Service Classes by Service Class
The “by service class” charts help show changes in workloads.
From this chart, you can drill down to further analyze response time.
Address spaces with highest Disk I/O intensity (#) (top 20)
For Service Class 'YXDNXY' by Address Space Name
You can easily see which service classes can benefit from proper storage management.
From this chart, you can drill down to further analyze response time.
Coupling Facility Dashboard [rating: 3.00] for all Coupling Facilities by CF Name
Example of a Tape Dashboard
z/OS Tape: Drill Down to a Detailed Rated Chart
Vision for Distributed SAN Examples
Operations: Instant Enterprise Wide Health Check of Storage Environment
Automated Enterprise Performance Dashboard Report.
Quickly survey your enterprise storage performance health.
Cluster (Host) Performance Dashboard
Summarize all key clusters (hosts) and drill down to identify performance issues.
Sample Fabric Health Check
Quickly detect and identify any port/SFP issues across the SAN environment.
Performance/Capacity: Identify Imbalances
Drill down to ports and volumes to identify the volumes and associated hosts causing the imbalances.
The chart shows the average, min/max and standard deviation over time.
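The imbalance chart's statistics can be reproduced per interval across the ports (or volumes) of one storage system. This sketch assumes measurements arrive as a mapping of interval to per-port values. The helper names, port names and numbers are hypothetical. Population standard deviation is used because every port is measured rather than sampled.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Stats = Tuple[float, float, float, float]  # average, min, max, standard deviation


def imbalance_by_interval(samples: Dict[str, Dict[str, float]]) -> Dict[str, Stats]:
    """Per interval: average, min, max and population std dev across components."""
    result = {}
    for interval, per_port in sorted(samples.items()):
        values = list(per_port.values())
        result[interval] = (mean(values), min(values), max(values), pstdev(values))
    return result


def busiest(per_port: Dict[str, float], n: int = 2) -> List[str]:
    """Names of the n most heavily loaded components, to drill down into next."""
    return sorted(per_port, key=per_port.get, reverse=True)[:n]


if __name__ == "__main__":
    samples = {
        "22:00": {"P0": 120.0, "P1": 110.0, "P2": 115.0},
        "23:00": {"P0": 390.0, "P1": 370.0, "P2": 20.0},
    }
    for interval, (avg, lo, hi, sd) in imbalance_by_interval(samples).items():
        print(f"{interval}: avg={avg:.0f} min={lo:.0f} max={hi:.0f} sd={sd:.0f}")
    print(busiest(samples["23:00"]))  # ['P0', 'P1']
```

A standard deviation that jumps while the average stays moderate is the signature of imbalance rather than overall overload.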
Capacity Management: Location, Tier Trending, and Detailed Reports
Identify trends and determine which storage pools are growing over time.
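Capacity trending per storage pool can be sketched as an ordinary least-squares line through daily used-capacity samples, extrapolated to the pool's capacity. A straight-line fit is an assumption because real growth is often not linear. The helper names and sample values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: Sequence[float], used_tb: Sequence[float],
                    capacity_tb: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches capacity (None if not growing)."""
    slope, intercept = linear_fit(days, used_tb)
    if slope <= 0:
        return None
    fitted_now = intercept + slope * days[-1]
    return (capacity_tb - fitted_now) / slope


if __name__ == "__main__":
    # Hypothetical pool: grows 0.2 TB/day from 50 TB, 60 TB usable.
    print(days_until_full([0, 10, 20, 30], [50.0, 52.0, 54.0, 56.0], 60.0))
```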
Case Study: High Front-end Response Times
DSS Dashboard Minicharts
The drill-down shows rated mini line charts, which are useful for seeing relationships in the data.
Notice the relationship between throughput, response time and Read Hit %.
Response Time (ms) [rating: 0.61] For Serial 'IBM-000'
Response time peaks at 11:00 PM.
Drive Read Response (ms) [rating: 0.00] For Serial 'IBM-000'
Average back-end response time has many peaks, which indicates a lack of correlation with front-end response time.
Fibre Front-End Read Response (ms) [rating: 1.62] for all Ports by Serial
Fibre port read response times peak at 11:00 PM!
Read+Write (MB/s) For Serial 'IBM-000' by HA Name
Peak throughput by HA shows imbalanced host adapter throughput.
Throughput is reaching the peak capability of this type of host adapter.
Conclusion
Finding
HA0000 & HA0001 are saturated during peak periods. Back-end drive write response times peak during the same time, but there are no FW bypasses. Front-end write response time should not be affected by increases in back-end response time.
Recommendation
Redistribute some of the load from HA0000 & HA0001 to HA0002, which was previously not in use.
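The recommendation to shift load from HA0000 and HA0001 to the idle HA0002 can be checked with a greedy redistribution. Load is moved off any adapter above a target utilization onto the least-utilized adapters, until those reach the same target. The `redistribute` helper, the 800 MB/s adapter capability, the 70% target and the load figures are hypothetical. In practice the move is made by changing host pathing or zoning, which this arithmetic does not model.

```python
from typing import Dict


def redistribute(load_mb_s: Dict[str, float], capacity_mb_s: Dict[str, float],
                 target_util: float = 0.7) -> Dict[str, float]:
    """Greedily move throughput from adapters above target_util to those below it."""
    new = dict(load_mb_s)
    limit = {ha: capacity_mb_s[ha] * target_util for ha in new}
    for donor in [ha for ha in new if new[ha] > limit[ha]]:
        excess = new[donor] - limit[donor]
        # Fill the least-utilized adapters first; adapters at or above the limit are skipped.
        for receiver in sorted(new, key=lambda ha: new[ha] / capacity_mb_s[ha]):
            if excess <= 0:
                break
            room = limit[receiver] - new[receiver]
            if room <= 0:
                continue
            moved = min(room, excess)
            new[receiver] += moved
            new[donor] -= moved
            excess -= moved
    return new


if __name__ == "__main__":
    capacity = {"HA0000": 800.0, "HA0001": 800.0, "HA0002": 800.0}
    peak = {"HA0000": 760.0, "HA0001": 740.0, "HA0002": 0.0}
    for ha, mb_s in redistribute(peak, capacity).items():
        print(f"{ha}: {mb_s:.0f} MB/s")
```

With these numbers HA0000 and HA0001 drop to 560 MB/s each and HA0002 picks up 380 MB/s, with total throughput unchanged.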
IntelliMagic Offerings Summary
Proactive Performance Monitoring: Find / Avoid Problems
IntelliMagic Vision as a Service
• Daily Monitoring / Analysis / Reporting
• Expert Recommendations
• 1-2 Day Install – Analytics Next Day
IntelliMagic Vision On Premise
• SAN, z/OS or Tape
• Perpetual or Term Licenses
Predictive Performance Modeling Services: Optimize Investments
• IntelliMagic Direction storage modeling engagements
• In a “tech refresh cycle”, model prospective hardware vendors’ ability to deliver your specific workloads
Special Offer for Today’s Webinar Participants
Customers who sign up for IntelliMagic Vision as a Service by August 29 will have their Initial Setup Fee waived (a $10,000 value).
Questions?
Contact us for a custom demo
Call 1-877-815-3799 (toll free)
Email [email protected]
Web www.intellimagic.net
Twitter: @IntelliMagic
Appendix
Platform Support – Distributed SAN
Vendor: Models
• EMC: VNX (block), VMAX, DMX, CX
• IBM: SVC, V7000, DS8000, XIV, DS3000, DS4000, DS5000
• HP: 3PAR, P2000, P9500
• HDS: VSP, AMS
• Brocade: Fabric Switches