TRANSCRIPT
Assuring Service Quality Despite Limited Resources
North Carolina Digital Government Summit, Sept. 2, 2009
David Hayward Sr. Principal Service Assurance [email protected]
Every User Matters. Every Transaction Counts.
Copyright © 2009 CA
A quality user experience is your government’s lifeline.
IT is critical to delivering a quality user experience to fulfill your organization’s mission and mandates.
Constituents
Network
IT Operations
DB
System
App
The Government IT Service Challenge
It’s difficult to get a handle on each of these problems and even harder to understand how these problems affect each other.
Users are complaining
What’s happening in my domain?
How is it affecting services?
What should I prioritize?
Where is the problem?
What caused it?
Not my issue…
Stakeholders
Agency & Department Executives
Service Owners
Why do constituents see the problem before IT?
Are we meeting expectations or causing dissatisfaction?
Why can’t I complete my transaction?
Why is this taking so long?
Understanding Service Assurance Challenges
IT Service Assurance Paradox
Legend: Unavailable or Slow | Available, Performing as Expected
Timeline: 05:00 to 16:00, hourly
Component: % available
Web Server: 99%
App Server: 99%
Mainframe: 99.9%
Database: 99.9%
Network: 99.999%
Application: 99.9%
End-User Service: ?
And who is measuring the real user experience … and managing the whole service?
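The "?" for the end-user service can be estimated with a little arithmetic. The sketch below is only an illustration. It assumes the six components sit in series, that they fail independently of each other, and that the percentages on the slide pair with the components in the order listed. None of those assumptions is stated in the deck.

```python
from math import prod

# Per-component availability, paired with components in slide order (an assumption).
component_availability = {
    "Web Server": 0.99,
    "App Server": 0.99,
    "Mainframe": 0.999,
    "Database": 0.999,
    "Network": 0.99999,
    "Application": 0.999,
}


def end_to_end_availability(availabilities):
    """Availability of a serial chain, assuming independent failures."""
    return prod(availabilities)


service = end_to_end_availability(component_availability.values())
print(f"Estimated end-user service availability: {service:.3%}")
```

Under these assumptions the service comes out below every individual component figure, which is the paradox the slide points at: each silo can report a healthy number while the user sees something worse.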
Managing Technologies vs. Services
Managing Infrastructure Top-Down in Context of Services is Required
Technical Challenge: Managing Services
Databases
Service #1
Service #2
Service #3
Service #4
Service #5
Service #6
Storage
Client Systems
Networks
Applications
Servers
Intelligent Service Modeling and Analytics is Required
Managerial Challenge: People & Processes
Service Desk
Network Engineering
System Engineering
Application Engineering
DB Administration
Prioritized & Shared Understanding of Services, Their Condition & Root Cause is Required
Databases
Service #1
Service #2
Service #3
Service #4
Service #5
Service #6
Storage
Client Systems
Networks
Applications
Servers
What IT Professionals Say About Service Assurance Challenges
IT is Reactive
When application-related problems occur, what are the primary ways in which IT most often finds out about the problem?
[Bar chart, 0% to 60%: Calls from Users; NOC or Monitoring Center; Transaction Management products]
54% of problems first detected by end users (n = 186, Feb 2009)
Enterprise Management Associates: 2008 Survey
Why Is This Number Important?
89%
Enterprise Management Associates : 2008 Survey
Slow Root Cause Determination/MTTR
89% of IT organizations use triage teams to solve service performance problems
Adapted from Enterprise Management Associates: 2008 Survey
Impact on Productivity
For these cross-domain teams, what types of personnel are most often included?
Sample Size = 180, Valid Cases = 148; Responses not shown received 0%
Enterprise Management Associates: 2008 Survey
IT App/Service Management Roadblocks
In your opinion, which reasons hamper your organization's effectiveness in terms of managing applications and services?
1 Changes to applications and infrastructure not well documented or controlled
2 Poor coordination between support teams
3 Troubleshooting/root cause analysis takes too long
Sample Size = 232, Valid Cases = 232
Enterprise Management Associates: 2008 Survey
Wish List: Top Three “Most Wanted” Tools
What are the top three tools and/or functionalities that your organization currently lacks that would most benefit your current support requirements?
1 Consolidated event correlation
2 Change tracking, verification, audit
3 Transaction management, problem identification, root cause analysis
Enterprise Management Associates : 2008 Survey
Top Three Pressures Facing IT Operations Management
Gartner: 2008 Survey
Trends Affecting IT Operations Management
Accelerating adoption of ITIL and process frameworks
Increasing IT service delivery orientation
New organizational roles and collaborative responsibilities
Focus on automation
Need to improve (and prove) IT operations efficiencies
Improve or establish IT operations accountability
Priority shift from management of infrastructure to management of services and processes
Movement back towards vendor frameworks
Reducing TCO of management via deployment of agentless solutions
Increasing technical complexity, which involves new management buyers
Increasing commoditization of platform agentry
CMDB and workflow (RBA) as new age framework
Technology Trends
Global Trends
Green IT (power & cooling)
Data Center Consolidation (cost reductions)
IT virtualization
Adapted from Gartner: 2009 Report
Summary: Barriers to Overcome
> Too many tools
> Silo tools not well integrated
> Technology silos prevent end-to-end management
> Determining root cause is difficult
> Reactive: little or no predictive visibility
> Performance becoming most significant service issue
> Little time to optimize
> Flat or down budgets
CA: 2008 Focus Groups
People, Processes and Technology for IT in Lean Economic Times
IT Approach for Lean Economic Times
Focus on what’s most important: to meet your mission & mandates, delivering value to constituents
Aim resources at high value deliverables
Improve user experience
Reduce waste
Increase productivity
Executives/Officers
Staff
Compliance
CIO
PMO
CISO
Support
Engineering/Development
IT Hardware & Software Assets
COST
VALUE
Application Performance Mgmt
Service Management
Project & Portfolio Mgmt
Infrastructure Management
Security Management
Mainframe 2.0
Application Performance Mgmt
Service Management
Project & Portfolio Mgmt
Infrastructure Management
Security Management
Self-Service
External Constituents
Internal Constituents
IT Mngt & Operations
Infrastructure
Citizens
Businesses
Suppliers
Technology Building Blocks for Transforming IT in Lean Times
3 Building Blocks
Infrastructure Management
Service Assurance
Application Management
3 Building Blocks
Infrastructure Management
Networks
Physical & virtual systems
Databases
Service Assurance
Application Management
Pivotal Insight
Fault, Performance, Configuration & Capacity
September 5, 2009
#1: Infrastructure Management
End-to-End View of Infrastructure Status
[Diagram: tiers USER, NETWORK, FRONT END, MIDDLEWARE, BACK END; components End User, WAN/WWW, Router, Firewall, Switch, Load Balancer, Web Servers, Portal, App Server, Databases, Mainframe]
> Discover networks, systems and databases; map to IT services
> Monitor performance trends/violations and outages
> Correlate alarms; diagnose network, system and database root cause
Best Practice: Automated Discovery
> Inductive Modeling
Discover assets
Learn relationships
Create model
Monitor the model for conditions
Best Practice: Automated Root Cause Analysis
> Active monitoring does not just rely on event streams
> Distinguish between a flood of events and meaningful alarms
Symptomatic event
Causal event
Best Practice: Automated Root Cause Analysis
> Correlate symptoms
> Suppress unnecessary, symptomatic alarms
> Pinpoint root cause
Must be automated
Manual methods overwhelming and slow
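The correlate, suppress and pinpoint steps can be sketched in a few lines. This is a minimal illustration, not how any particular product works. It assumes a discovered topology in which each device has a single upstream dependency, and the device names are hypothetical.

```python
# Hypothetical topology: each device maps to the upstream device it depends on.
UPSTREAM = {
    "web-1": "switch-a",
    "web-2": "switch-a",
    "switch-a": "router-1",
    "router-1": None,
}


def correlate(alarming, upstream=UPSTREAM):
    """Split alarming devices into causal and symptomatic sets.

    An alarm is treated as symptomatic when any device on its upstream
    path is also alarming; otherwise it is a root-cause candidate.
    """
    alarming = set(alarming)
    causal, symptomatic = set(), set()
    for device in alarming:
        parent = upstream.get(device)
        # Walk upstream until an alarming ancestor is found or the path ends.
        while parent is not None and parent not in alarming:
            parent = upstream.get(parent)
        (symptomatic if parent is not None else causal).add(device)
    return causal, symptomatic


causal, symptomatic = correlate(["web-1", "web-2", "switch-a"])
print("root cause candidates:", sorted(causal))
print("suppressed symptoms:", sorted(symptomatic))
```

With three alarms raised, only the switch is reported as a candidate root cause. The two web server alarms are suppressed as symptoms, which is the noise reduction the slide calls for.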
Best Practice: Configuration-Aware Root Cause
> Manage to standards
Gold standard
Change Authorization
> Know infrastructure impact of change
Fault-Awareness
Degradation-Awareness
Automated detection and roll-back
[Diagram labels: Misconfiguration; Original Configuration]
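Detecting drift from a gold standard and rolling back can be pictured as a dictionary comparison. The sketch below is illustrative only. The setting names and values are hypothetical, and a real device configuration would need parsing before it could be compared this way.

```python
# Hypothetical "gold standard" and running configurations for one device.
GOLD = {"snmp_community": "ops-ro", "ntp_server": "10.0.0.1", "duplex": "full"}
RUNNING = {"snmp_community": "ops-ro", "ntp_server": "10.0.0.9", "duplex": "half"}


def config_drift(gold, running):
    """Map each differing setting to its (gold, running) pair."""
    keys = gold.keys() | running.keys()
    return {
        key: (gold.get(key), running.get(key))
        for key in sorted(keys)
        if gold.get(key) != running.get(key)
    }


def roll_back(gold):
    """Return a fresh copy of the gold standard to re-apply."""
    return dict(gold)


drift = config_drift(GOLD, RUNNING)
for setting, (expected, actual) in drift.items():
    print(f"{setting}: expected {expected!r}, found {actual!r}")

# Re-apply the gold standard only when drift was detected.
restored = roll_back(GOLD) if drift else RUNNING
```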
Best Practice: Proactive Perf. & Capacity Management
Networks, Systems, Databases, Energy
Time over Threshold
Identify when something is too wrong for too long
[Chart: metric vs. time against a threshold; Time Window = 1 hour; Time Over Threshold = 15 min.]
Deviation from Normal
Identify when unusual behavior is happening
[Chart: metric vs. time around a baseline with upper and lower thresholds; Time Window = 1 hour; Time is Unacceptable = 15 min.]
Automatically determine baseline values
Automatically reduce noise
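Both checks on this slide reduce to short rules. The sketch below assumes one reading per minute, which the slide does not specify. It also uses a mean plus or minus k standard deviations band as a stand-in for the baseline, since the actual baselining method is not described.

```python
from statistics import mean, stdev


def time_over_threshold(samples, threshold, window_min=60, limit_min=15):
    """True when readings exceed `threshold` for at least `limit_min` minutes
    within the most recent `window_min` minutes.

    `samples` holds one reading per minute, oldest first (assumed interval).
    """
    recent = samples[-window_min:]
    minutes_over = sum(1 for value in recent if value > threshold)
    return minutes_over >= limit_min


def deviates_from_baseline(history, value, k=3.0):
    """True when `value` lies outside mean +/- k standard deviations of `history`.

    The k-sigma band is an illustrative choice of baseline.
    """
    centre, spread = mean(history), stdev(history)
    return abs(value - centre) > k * spread


cpu = [40] * 50 + [95] * 10      # 10 minutes above 90 in the last hour
brief_spike = time_over_threshold(cpu, threshold=90)
cpu += [95] * 5                  # now 15 of the last 60 minutes are high
sustained = time_over_threshold(cpu, threshold=90)
print(brief_spike, sustained)
```

A short spike does not raise an alarm, while a condition that persists for 15 minutes of the hour does. That is one way to read "too wrong for too long" and to cut down on noise.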
3 Building Blocks
Infrastructure Management
Networks
Physical & virtual systems
Databases
Service Assurance
Application Management
Pivotal Insight
Fault, Performance, Configuration & Capacity
IT Benefits
Improve Infra MTTR 50%
Reduce Infra DT 70%
Boost Efficiency 43%
Pop Quiz!!
What Does This Mean?
MTTI
An Important Concept
Mean Time To Innocence
3 Building Blocks
Infrastructure Management
Networks
Physical & virtual systems
Databases
Service Assurance
Application Management
Web portals & servers
Application servers
J2EE, .NET & client/server apps
Pivotal Insight
Fault, Performance, Configuration & Capacity
Customer Experience & Transaction Behavior
IT Benefits
Improve Infra MTTR 50%
Reduce Infra DT 30%
Reduce Help Desk Calls 40%
MTTR = Mean-Time-to-Repair; DT = Downtime
#2: Application Performance Management
End-to-End View of Transaction Behavior
[Diagram: tiers USER, NETWORK, FRONT END, MIDDLEWARE, BACK END; components End User, WAN/WWW, Router, Firewall, Switch, Load Balancer, Web Servers, Portal, App Server, SAP, Siebel, PSFT, Web Services, 3rd Party Applications, Databases, Mainframe]
> Discover applications; understand user experience; establish SLAs
> Monitor all business transactions through infrastructure; measure response and SLAs
> Conduct triage across infrastructure; diagnose application root cause
CA Wily Technology, Copyright © 2008 CA
Best Practice: Understand Real User Experience
> Monitor all transactions all the time
24x7 in production
From browser to back-end
> Understand your users’ experience
Monitor end-user browser response times
Identify users by name and priority
See transaction behavior’s impact on the service
Best Practice: Set SLAs
> Set and measure SLAs on processes
> Understand the impact of Application Performance
> Communicate in the language of the stakeholders
[Dashboard labels: DMV H.P.; Unemployment H.P.; Unemployment Reg; Driving Test; Online Reg.; Driving Manual; Job Postings; Prof. Skills]
Best Practice: Incident Prioritization
> Assign value to successful and unsuccessful transactions
> Prioritize incidents based on the impact
Criticality of the transaction
Priority of the users
Severity of the error
> Work to resolve the most service-critical issues first
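One way to turn the three impact factors into a work queue is a simple score. The sketch below is illustrative. The 1 to 5 scales, the multiplication rule and the sample incidents are all assumptions, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    service: str
    transaction_criticality: int  # 1 (low) to 5 (high); hypothetical scale
    user_priority: int            # 1 to 5
    error_severity: int           # 1 to 5

    @property
    def impact(self) -> int:
        # Multiplying the factors is an illustrative choice, not the slide's formula.
        return self.transaction_criticality * self.user_priority * self.error_severity


def prioritize(incidents):
    """Order incidents so the highest-impact one is worked first."""
    return sorted(incidents, key=lambda incident: incident.impact, reverse=True)


open_incidents = [
    Incident("Driving Manual download", 2, 2, 3),
    Incident("Online Reg. payment", 5, 4, 4),
    Incident("Job Postings search", 3, 3, 2),
]
work_queue = prioritize(open_incidents)
for incident in work_queue:
    print(incident.impact, incident.service)
```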
Best Practice: Triage and Root Cause Analysis
> Triage
Identify the problem source’s domain
Aid remediation
> Diagnosis and root cause analysis
Monitor the transaction calls through the application
Identify application components or back end calls behind failures
3 Building Blocks
Infrastructure Management
Networks
Physical & virtual systems
Databases
Service Assurance
Application Management
Web portals & servers
Application servers
J2EE, .NET & client/server apps
Pivotal Insight
Fault, Performance, Configuration & Capacity
Customer Experience & Transaction Behavior
IT Benefits
Improve App MTTR 50%
Reduce App DT 70%
Reduce App Perf Delays 71%
Improve Infra MTTR 50%
Reduce Infra DT 30%
Reduce Help Desk Calls 40%
MTTR = Mean-Time-to-Repair; DT = Downtime
3 Building Blocks
Infrastructure Management
Networks
Physical & virtual systems
Databases
Service Assurance
End-to-end infrastructure
Applications
Application Management
Web portals & servers
Application servers
J2EE, .NET & client/server apps
Pivotal Insight
Fault, Performance, Configuration & Capacity
Customer Experience & Transaction Behavior
Infrastructure Impact on Quality & Risk
IT Benefits
Improve App MTTR 50%
Reduce App DT 70%
Reduce App Perf Delays 71%
Improve Infra MTTR 50%
Reduce Infra DT 70%
Reduce Help Desk Calls 40%
MTTR = Mean-Time-to-Repair; DT = Downtime
#3: Service Assurance Management…!
Complete View of the IT Service
[Diagram: tiers USER, NETWORK, FRONT END, MIDDLEWARE, BACK END; components End User, WAN/WWW, Router, Firewall, Switch, Load Balancer, Web Servers, Portal, App Server, SAP, Siebel, PSFT, Databases, Mainframe]
> Transaction behavior insight
> Infrastructure status insight
> Service quality, risk to service delivery & root cause insight
Best Practice: Service Modeling & Impact Analysis
Service Topology: real-time model, component relationships & impact
Services & Components: based on infrastructure, application and customer experience
Component Detail: status, severity, impact & root cause
[Screenshot labels: Service; Switches; Servers; Transactions; Databases; DMV H.P.; Driver’s Manual; Tax Filing; Medicare; Fish & Game; Quality; Risk; Health; Availability]
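A service model of this kind can be pictured as a mapping from services to the components they depend on, with component status rolled up to the service. The sketch below is a simplified illustration. The component names, the four-level severity scale and the "worst component wins" rule are assumptions made for the example.

```python
SEVERITY = {"ok": 0, "minor": 1, "major": 2, "critical": 3}

# Hypothetical service model: service name -> components it depends on.
SERVICE_MODEL = {
    "DMV H.P.": ["web-portal", "dmv-db"],
    "Tax Filing": ["web-portal", "tax-app", "mainframe"],
    "Fish & Game": ["license-app", "dmv-db"],
}


def service_impact(model, component_state):
    """Roll component states up to services.

    Each service takes the worst state among its components and reports
    that component as the likely cause (None when everything is ok).
    """
    impact = {}
    for service, components in model.items():
        worst = max(components, key=lambda c: SEVERITY[component_state.get(c, "ok")])
        state = component_state.get(worst, "ok")
        impact[service] = (state, worst if state != "ok" else None)
    return impact


report = service_impact(SERVICE_MODEL, {"dmv-db": "major", "tax-app": "minor"})
for service, (state, cause) in report.items():
    print(f"{service}: {state} (cause: {cause})")
```

A single database fault shows up against both services that depend on it, which gives every team the same view of which services are affected and why.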
Best Practice: Service Dashboards & Reports
Historical Service Status Details
Services listed according to: Business Importance, Quality level, Risk to quality
Real-time Service Status Indicators
3 Building Blocks
Infrastructure Management
Networks
Physical & virtual systems
Databases
Service Assurance
End-to-end infrastructure
Applications
Application Management
Web portals & servers
Application servers
J2EE, .NET & client/server apps
Pivotal Insight
Fault, Performance, Configuration & Capacity
Customer Experience & Transaction Behavior
Infrastructure Impact on Quality & Risk
IT Benefits
Focus staff on your mission
Improve quality/predictability
Lower IT cost
Improve App MTTR 50%
Reduce App DT 70%
Reduce App Perf Delays 71%
Improve Infra MTTR 50%
Reduce Infra DT 30%
Reduce Help Desk Calls 40%
MTTR = Mean-Time-to-Repair; DT = Downtime
What if You Could Solve the Paradox?
Legend: Unavailable or Slow | Available, Performing as Expected
Timeline: 05:00 to 16:00, hourly
Component: % available
Web Server: 99%
App Server: 99%
Mainframe: 99.9%
Database: 99.9%
Network: 99.999%
Application: 99.9%
End-User Service: ?
And who is measuring the real user experience … and managing the whole service
Managing Technologies vs. Services
Managing Infrastructure Top-Down in Context of Services is Required
What if You Could Define & Manage Services?
Databases
Service #1
Service #2
Service #3
Service #4
Service #5
Service #6
Storage
Client Systems
Networks
Applications
Servers
Intelligent Service Modeling and Analytics is Required
What if You Could Focus People & Processes?
Service Desk
Network Engineering
System Engineering
Application Engineering
DB Administration
Prioritized & Shared Understanding of Services, Their Condition & Root Cause is Required
Databases
Service #1
Service #2
Service #3
Service #4
Service #5
Service #6
Storage
Client Systems
Networks
Applications
Servers
Analyst Recommendations
Have a Common Framework for IT Ops Tools
60% of the IT operations management market share is in the hands of BMC, CA, HP and IBM Tivoli
Gartner Dataquest: IT Operations Management Software, Worldwide, 2007
Framework Should Support IT Ops Processes
> Example: Fault to Remediation Process
> Reducing the MTTR by eliminating manual intervention and automatically passing event data between products
Gartner Dataquest: IT Operations Management Software, Worldwide, 2007
Settle on a Management Strategy
> Focus staff on highest value deliverables
> Improve user experience
> Reduce waste
> Improve productivity
CA Leaders in Lean IT: 2009
Settle on a Management Process
> Set and measure SLAs on key business processes
> Monitor 100% of all end user transactions
> Employ predictive and proactive monitoring
> Prioritize incidents based on service impact
> Implement the capability for rapid triage and root cause analysis
> Report results and implement continuous improvement processes
Adapted from Ashton-Metzler Associates: 2009
Evaluate:
How staff determine & weigh risks
Percent reactive vs. proactive
Continuous improvement methods & success
Evaluate:
Triage process
Manual versus automated methods
Building block (tools) capability & integrations (enable best practices?)
Evaluate:
Time/cost to determine root cause
Resources to manage problems vs. add value
Pick three top services
Evaluate staff focus, understanding & approach to monitoring
Focus on high value
Reduce Waste
Improve Service
Improve Productivity
Getting Started
Final Quiz
Why Are These Numbers Important?
30, 40, 50, 70, 71, 25, 9, 433
Sources: “Achieving Business Value & Gaining ROI with CA’s EITM Software,” IDC, 2007, and “IT Economics, Gaining Business Value with CA’s Enterprise IT Management Software: An ROI Study,” IDC, 2008, and direct customer data
Analyst Data on Best Practices
> Lower overall cost of service delivery by up to 30%
Reduce service desk calls up to 40%
Improve MTTR up to 50%
Reduce downtime up to 70%
Reduce performance delays up to 71%
Improve staff productivity up to 25%
Reduce number of IT tools up to 50%
> New Management Technologies
Average payback 9 months or less
Average 433% ROI over 3 years
COST
VALUE
Application Performance Mgmt
Service Management
Project & Portfolio Mgmt
Infrastructure Management
Security Management
Mainframe 2.0
Thank You
Assuring Service Quality Despite Limited Resources