Information Technology Division
Executive Office for Administration and Finance
IT Service Excellence Committee (ITSEC)
Sept. 19, 2011
Topic | Desired Outcomes | Discussion lead | Allotted time
August KPI review | Discussion of KPI volumes in order to determine opportunity areas | Tom/All | 10 min
Systems Criticality Rating Initiative | Teeing up for a coming discussion on Application Criticality | Tom/Dan/Ron | 25 min
Change Management | Review of proposed Change Management discussion for ISB | Ron/Deb Seaward | 30 min
FY 12 Plan | Review proposed goals | Tom | 5 min
Roundtable | News of member activity | All | 15 min
Agenda
August Incident Activity by Priority
August Incident Activity by Category
Background: Critical Systems designation
• In 2009 in response to Pandemic planning exercise the Commonwealth undertook effort to identify critical services – led by both EPS, EHS (Dept. of Public Health) and MEMA
• A Senior level pandemic readiness committee across the Executive staff was formed and supported by an IT committee and by an expert business continuity consultant (members included Dan Walsh, Curt Wood, Ron T., etc.)
• Across the Executive Office, identified the Essential Functions (ESFs) and then the critical IT systems that support them across Health, Safety and Financial Services
• Initial focus of Pandemic planning was on loss of personnel and effect on critical services
• There were multiple Tiering levels and priorities assigned
• A list was developed which identified approx. 200 applications and their priorities (see appendix)
• There is a need to fully operationalize this work because it drives:
  • Disaster Recovery and Business Continuity Planning
  • Back-up
  • Service Restoration in terms of Incident Management
  • System Design choices
Executive Sponsorship to operationalize
• In August of this year Dan Walsh and Claudia Boldman from ITD brought forth a proposed classification scheme to formalize the process of defining system criticality- see appendix
• Categories: Vital, Important, Supporting
• This work was endorsed by the CIO Cabinet (and ITD Executive Committee)
• The CIO Cabinet directed that the Commonwealth’s Service Excellence program operationalize this classification scheme (actually Emmet Millet had raised the need for having a formal approach to identify critical systems)
• Objective is to move beyond “loss of personnel resources” to a more normalized approach – loss of IT service
• Why this committee has been asked to lead this work
  • Aligns well with the overall thrust of Service Excellence program & ITSEC has the necessary representation
  • Some ITSEC members have been engaged with their business partners to determine SLO levels (and thus have some understanding of the expected levels of support)
  • The ITSEC has done very good work to map business driven examples of Incident Impact and Urgency into an Incident Prioritization model
* Maureen Quinn from ITD who supported this effort
Considerations/questions
• Considerations- the customers would drive the classification, i.e. this is a business call
• If a system is classified as being of a certain criticality, then it must meet certain minimum technical standards and be subject to certain policies:
  • D/R requirements (i.e. designed such that 2nd data center capabilities can be leveraged)
  • Availability expectation
  • Supportability: runs on a supported OS, supported version of DB, supported h/w, etc.
• For new systems in design phase, a critical classification would be determined and if a system was designated as a Commonwealth Critical system (i.e. Vital classification) certain design characteristics must be incorporated
• Significant Changes to Systems classified as vital would be made visible through a commonwealth-wide Change Management process
• Retrofitting of such system designation to existing systems and implied level of technical compliance- i.e. there is a price associated w/ designating a system as mission critical
• This work was driven by a sense of urgency during the 2009 pandemic awareness (H1N1 Flu virus) – how can we re-ignite that sense of importance on the business side?
CIO endorsed Service Excellence Program 3 year Road Map
Timeline: FY 11 Q2-Q4, FY 12, FY 13 (markers: "We Are Here", "Model fully implemented")
Program tracks:
• Establish SE Culture
• Offer of Enterprise Service Delivery tool
• Commonwealth Policies, Processes and Metrics for Incident Management
• Develop Commonwealth SLOs and SLO Reporting
• Commonwealth Policies, Processes & Metrics for Change Management
• Common H/W & S/W Asset Mgt tool
• Institute single Commonwealth Virtual Operations Culture
Milestones:
• ITSEC established
• Internal ITD Pilot- Monthly LOB/SLO Rpting
• Single Metrics reporting Framework established
• ITSEC Wiki presence
• ITD COMiT implemented
• 1st Sec. Adoption COMiT
• 2nd Sec. Adoption COMiT
• Follow-on COMiT Adoptions or integrations
• Incident Priorities Defined & ISB approved
• Incident end-to-end Model defined
• ITD LOB/SLO reporting to Customers
• ITD BSLO customer metrics
• Change Types & Stnd Windows Defined & ISB approved
• Weekly CW wide Change Calendar
• ITD weekly CM Calendar published
• All ITD h/w s/w in Enterprise tool
• Commonwealth h/w s/w in Enterprise tool
• Operational Framework Defined and agreed
• Monitoring tools rationalized/integrated in support of end to end SE model
• Education & Marketing plan
• ITSEC Road show
• ITSEC Day & Symposium
• Commonwealth-wide Reporting of Incident metrics
Change Management – Context
• At our 8/22 meeting, Donna P. raised the question of focus on Change Management (CM); this coincides with our FY 12 committee objectives
• Objective: Obtain ITSEC approval to have a collective initiative on Change activity
• Change meaning both Change Tickets and Service Requests
• Rationale:
  • One major challenge is minimizing the impact of Change on our Production Environment
  • A second important challenge we face is the often conflicting demands of Incident Management and Service/Request Management
  • A large part of what our Helpdesks are managing volume wise is Customer Requests
• Start with Change Management, then work on Service Requests
• Ron offered to share one approach to CM at the next meeting
• Measurement
  • As Secretariats begin to adopt Change Management & Service Request processes, we would begin to gather that data in a manner similar to our Incident Metrics
  • ITD and EHS would initially begin reporting Change metrics as this data becomes available
Vision
• End State:
• Changes to the Commonwealth-wide IT Production environment will be managed and controlled through a formal change process under the responsibility of a Commonwealth Change Advisory Board (CCAB)
• A Commonwealth Change Advisory Board will be supported by similar boards at the Secretariat level established to advise the CCAB in the Assessment, prioritization and scheduling of Changes.
• The scope of changes covered by this policy would be based on a collective agreement across the Secretariats
• Metrics would be published similar to those of Incident Management by the ITSEC resulting in a demonstrated collective improvement in the stability of IT Services across the Commonwealth
Sharing of a Change Management process
• Background: Process has been in place for several years, and we are constantly working to improve it, e.g. we just introduced a real-time Change Dashboard*
• We are very excited about other groups moving forward in this area and see it as a real win
• Process and Procedures are documented here (click on Procedure): http://www.mass.gov/?pageID=itdintranetsubtopic&L=3&L0=Home&L1=IT+Service+Management+(ITSM)&L2=Change+Management&sid=Aitdintranet
• Weekly Change Management meeting is always open to representatives from other Secretariats
• Forward Schedule of Change: http://osgdashboard.anf.govt.state.ma.us:83/scripts/comit_forwardschedule.asp?MyMonth=9&MyYear=2011
• *Daily Change Dashboard: http://osgdashboard.anf.govt.state.ma.us:83/scripts/comit_chgdaily-1.asp
Gaining traction around CM- what does it take?
1. Obtain Executive level support for adopting Change Management
2. Designate a single Change Owner for the Secretariat level
3. Establish a monthly or weekly CAB (Change Advisory Board)
4. Draft a documented change process and policy
   a) Should have well defined categories of changes
   b) Criteria for approval (e.g. Testing plan, adequate notice, emergency changes)
5. Establish some kind of ticketing system, even if a manual spreadsheet
6. Start reporting basic metrics: # successful; # failed; # of emergency changes
7. Implement a Post Implementation Review (PIR) process
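The basic metrics in step 6 can be produced from very little data. The sketch below tallies them from a list of change records such as rows kept in a manual spreadsheet. The record fields (`outcome`, `emergency`), the sample IDs, and the function name are hypothetical and used only for illustration.

```python
from collections import Counter


def basic_change_metrics(changes):
    """Tally successful, failed, and emergency changes.

    Each change is a dict with hypothetical keys:
      "outcome"   -> "successful" or "failed"
      "emergency" -> True if it was an emergency change
    """
    outcomes = Counter(c["outcome"] for c in changes)
    return {
        "successful": outcomes["successful"],
        "failed": outcomes["failed"],
        "emergency": sum(1 for c in changes if c["emergency"]),
    }


# Illustrative data only, not real change records.
sample_changes = [
    {"id": "CHG-1", "outcome": "successful", "emergency": False},
    {"id": "CHG-2", "outcome": "failed", "emergency": True},
    {"id": "CHG-3", "outcome": "successful", "emergency": True},
]
metrics = basic_change_metrics(sample_changes)
print(metrics)
```

An emergency change is counted both under its outcome and under the emergency total, so the three numbers are not expected to add up to the number of changes.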
Proposal
• Establish a monthly (to start with) Commonwealth-wide Change Management meeting
  • Really looking initially to sponsor an information exchange, i.e. folks don’t need to have a formal change calendar or change process currently in place
  • A single Representative for each Secretariat -- sharing can be very informal
  • Representatives would have a broad view of change activity across their Secretariat
  • Each Representative would speak to known significant change activity planned for the upcoming month in their area
  • ITD would include its Service Account Managers as additional resources
• Criteria for discussion:
  • Type of changes: Infrastructure or Application changes
  • Scope (as determined by this Board) -- High level: Change activity that will, or possibly could, affect other Secretariats if unforeseen issues arise
    • Major application release events such as NMMIS, ALARS, etc., but small application changes would be excluded
    • Major infrastructure changes: new Main Core switch, Internet changes
  • Expected numbers: no more than 8-10 total changes at first
• Benefits:
  • Help groups avoid collisions (e.g. ITD making changes and being unaware of concurrent changes to the same environment being made by another Secretariat)
  • Provide an awareness of major change activity and enable quicker incident resolution on change induced outages
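The first benefit amounts to spotting overlapping change windows. Below is a minimal sketch of such a check over a shared forward schedule, assuming each planned change records its Secretariat, target environment, and start/end times. The field names, sample IDs, and sample dates are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def find_collisions(changes):
    """Return ID pairs of changes from different Secretariats that touch
    the same environment in overlapping time windows."""
    collisions = []
    for a, b in combinations(changes, 2):
        same_env = a["environment"] == b["environment"]
        different_owner = a["secretariat"] != b["secretariat"]
        # Two windows overlap when each one starts before the other ends.
        overlap = a["start"] < b["end"] and b["start"] < a["end"]
        if same_env and different_owner and overlap:
            collisions.append((a["id"], b["id"]))
    return collisions


# Illustrative schedule only.
schedule = [
    {"id": "ITD-1", "secretariat": "ITD", "environment": "core network",
     "start": datetime(2011, 10, 15, 22, 0), "end": datetime(2011, 10, 16, 2, 0)},
    {"id": "EHS-1", "secretariat": "EHS", "environment": "core network",
     "start": datetime(2011, 10, 15, 23, 0), "end": datetime(2011, 10, 16, 1, 0)},
    {"id": "EHS-2", "secretariat": "EHS", "environment": "application servers",
     "start": datetime(2011, 10, 15, 23, 0), "end": datetime(2011, 10, 16, 1, 0)},
]
print(find_collisions(schedule))
```

With only 8-10 changes expected per month, comparing every pair is sufficient; nothing more elaborate is needed at this scale.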
Next Steps
• ITSEC members provide names of representatives for a Commonwealth-wide CAB (CCAB) by Oct. 7th (3 weeks)
• Idea presented to 10/11 ISB meeting by ITSEC members
• Deb Seaward schedules first meeting for 3rd week October
• Tom/Deb to work off-line on pre-work to help structure discussion and prepare our first FSC (Forward Schedule of Change) with direct Secretariat access
• ITSEC debriefs at following meeting
Caveats/Considerations/Concerns
• We know folks are in different places in terms of maturity
• What kind of visibility do folks have in their respective areas?
• This work requires executive level support
• Continue implementation of the Enterprise Service Delivery tool for Secretariats – Q1-3
• Develop Common Policies, Processes and Metrics for Change Management and Request Management - Q2
• Expand delivery of Incident KPI reporting to include breakdown by service categories and formalize definition of Response Time – Q2
• Complete definition of SLOs in support of Service Catalogs for 50% of ITSEC membership – Q2
• Complete migration of ITD and begin migration of Secretariats to a common hardware & software asset management tool - Q3
• Continue development of Service Excellence culture- complete ITIL training through eLearning at practitioner level for Incident, Change and Configuration (as appropriate) Q1-4
ITSEC - Proposed FY 12 Goals
Appendix
Service Request and Change Tickets
• Proposal
  • As Secretariats begin to adopt Change Management & Service Request processes, we would begin to gather that data in a manner similar to our Incident Metrics
**SLO: if (scheduled finish time - scheduled start time) is greater than or equal to (actual finish time - actual start time), then the SLO is met.
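The SLO rule above compares two durations. A minimal sketch of that comparison follows; the function name, parameter names, and sample times are hypothetical.

```python
from datetime import datetime


def slo_met(scheduled_start, scheduled_finish, actual_start, actual_finish):
    """Return True when the actual duration fits within the scheduled duration."""
    scheduled_duration = scheduled_finish - scheduled_start
    actual_duration = actual_finish - actual_start
    return scheduled_duration >= actual_duration


# Illustrative times only: a two-hour scheduled window.
sched_start = datetime(2011, 9, 19, 20, 0)
sched_finish = datetime(2011, 9, 19, 22, 0)

# Finished in 90 minutes: SLO met.
on_time = slo_met(sched_start, sched_finish,
                  datetime(2011, 9, 19, 20, 15), datetime(2011, 9, 19, 21, 45))
# Took three hours: SLO missed.
overrun = slo_met(sched_start, sched_finish,
                  datetime(2011, 9, 19, 20, 0), datetime(2011, 9, 19, 23, 0))
print(on_time, overrun)
```

As written, the rule looks only at elapsed time, so a change that starts late but takes no longer than scheduled still meets the SLO.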
Procedure Step: Record a Change Request
Procedure Owner: Change Owner
Description: The Change owner creates a Change Request in COMiT and attaches all of the required supporting information.
Required Time Period:
• LOW Priority Change: two or more weeks before Scheduled (requested) start date.
• MEDIUM Priority Change: more than 48 hours and less than two weeks before Scheduled (requested) start date.
• HIGH Priority Change: 48 hours or less before Scheduled (requested) start date.
Change Meetings and Reports: Change Management Reports, including the Change Management Calendar, are available real-time through the ITD Internal Portal in the section titled Service Metrics.
Procedure Step: Review and Accept a Change Request
Procedure Owner: Change Management Support
Description: Change Manager or designee reviews the Change Request for: 1. schedule conflicts 2. correct classification 3. completeness. Non-standard changes require a technical assessment as well as the Change Implementation steps and back-out plan. Customer approvals are needed, where appropriate.
Required Time Period: Change Request reviews need to be completed in a timely manner so that:
1. Change Request will be included on the Weekly Change Management Report of Proposed and Approved Changes for two weeks, and
2. Change Request can be scheduled for a CAB review in sufficient time for customer notification.
Change Meetings and Reports: Weekly Change Management Meeting – Every Thursday at 11:00 AM. The Proposed and Approved Change Management report is available for this meeting. This real-time report is available through the ITD Internal Portal in the Service Metrics section.
Procedure Step: Approve a Change Request
Procedure Owner: Change Management Support/CAB
Description: Standard changes are approved by the Change Manager or designee. All other changes are reviewed for approval by the CAB. The Change Manager prepares the agenda for the meeting and provides the CAB members with the change requests. The Change Manager or designee follows up on any tasks or issues that the CAB members address in their review meeting. The Change Manager or designee distributes the CAB meeting minutes to CAB members and change owners.
Required Time Period: Change Request must be reviewed and approved in order to provide customer notifications at least 24 hours before the change. Changes that have widespread impact with extended downtime require 2 weeks’ notice for customers.
Change Meetings and Reports: CAB Meeting: Every Tuesday and Thursday at 10:30 am. CAB EC meetings are scheduled as needed for High Priority changes. The Changes Approved by CAB report is available through the ITD Internal Portal in the section titled Service Metrics.
Procedure Step: Implement a Change Request
Procedure Owner: Change Owner
Description: The change owner implements the change and follows the COMiT change request process, indicating the success of both implementation tasks and the overall change request.
Required Time Period: The change is implemented according to the approved scheduled date and time. Change record is expected to be updated within 60 minutes of the work being done.
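The LOW/MEDIUM/HIGH thresholds given for recording a Change Request can be read as a rule keyed on lead time, i.e. how far ahead of the scheduled start the request is recorded. The sketch below takes that reading. The function and parameter names are hypothetical, and this is not a description of how COMiT assigns priority.

```python
from datetime import datetime, timedelta


def change_priority(recorded_at, scheduled_start):
    """Classify a change by lead time before its scheduled start.

    LOW:    two or more weeks of lead time
    MEDIUM: more than 48 hours and less than two weeks
    HIGH:   48 hours or less
    """
    lead_time = scheduled_start - recorded_at
    if lead_time >= timedelta(weeks=2):
        return "LOW"
    if lead_time > timedelta(hours=48):
        return "MEDIUM"
    return "HIGH"


recorded = datetime(2011, 9, 1, 9, 0)  # illustrative timestamp
print(change_priority(recorded, datetime(2011, 9, 15, 9, 0)))  # exactly two weeks ahead
print(change_priority(recorded, datetime(2011, 9, 5, 9, 0)))   # four days ahead
print(change_priority(recorded, datetime(2011, 9, 3, 9, 0)))   # exactly 48 hours ahead
```

The boundary cases follow the wording of the thresholds: exactly two weeks counts as LOW, and exactly 48 hours counts as HIGH.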
Change Management Procedure
Change Management Summary Procedure
Procedure Step: Conduct Post Implementation Review (PIR)
Procedure Owner: Change Management Support
Description: Change Manager or designee reviews the status of non-standard changes and may conduct a Post Implementation Review (PIR). The PIR will be conducted for unsuccessful changes. A PIR may be conducted for changes that have high impact or are complex.
Required Time Period: After the change has been attempted or completed. (in the future)
Procedure Step: Close a Change Request
Procedure Owner: Change Management Support
Description: Change Manager or designee reviews the change record for the status reported by the change owner, and indicates the status of the change taking into consideration the perspective of the customer. A PIR may have been conducted before this step occurs.
Required Time Period: As soon as the Change owner has updated the status of the change in COMiT and/or after the PIR has been completed.
Change Management Procedure (con’t)
Commonwealth Change Management Policy- a sample framework for future consideration
Change Type and key Change Management business rules
Change Type | Assessment required | CAB approval required | Emergency CAB approval required | Only Change Manager approval required | Change Manager or Duty Manager (if off hours); LOB Owner and Product Owner (always) and (where appropriate) Customer agency IT executive approval required
Normal | No (this requirement may possibly come at a later point of maturity in several months)
Standard
Emergency
Classification of Critical IT Services That Support Essential Commonwealth Functions
Problem Statement
At various points in the lifecycle of an IT system there is a need to account for the criticality of that system. System criticality is an important attribute to consider in making investment decisions in system design, business continuity and disaster recovery planning, determining business and information security risk, and incident management to name a few examples. Over the years different definitions for critical systems have been used as part of Y2K planning, pandemic planning, disaster recovery, and incident management. More often than not the “critical” designation was applied based on subjective criteria. There is a need to establish a standard definition for Commonwealth Critical systems that can be used for classification in an objective manner throughout the system lifecycle across Commonwealth entities. This common classification can then be used as a starting point to apply other criteria that will determine the next course of action in specific planning or service management processes.
The word “system” as used above can also be problematic in that it has been used to generally refer to an application often in isolation of all the related system supports that allows that application to function appropriately and effectively. Alternatively, the term “IT service” refers to all IT products (including people, processes, and technologies) necessary to serve a specific business need. When referring to IT support for Essential Commonwealth Functions it is important to think not only about the core application or system but also all the supporting IT products without which the system could not meet business requirements. We use the term IT Service rather than system in the classification guidance below.
I. The Commonwealth engages in a wide variety of government functions on behalf of its constituents. Three of those are considered to be Commonwealth Essential Service Functions (ESFs):
• Protect the health of constituents
• Protect the safety of constituents
• Provide financial services and support to constituents
It is acknowledged that there are other important government functions and associated IT Services that are considered critical to the mission of specific secretariats and agencies. On a statewide basis, however, only those that impact the health, safety, and financial support for constituents are considered Commonwealth ESFs.
II. IT Services that support Commonwealth ESFs are considered “Commonwealth Critical”. The degree of the IT Service’s impact on the performance of the essential function is categorized as follows:
• Vital – Non-availability or degradation of this service will significantly impact the health, safety and/or financial well-being of constituents
• Important – Non-availability or degradation of this service could potentially impact the health, safety and/or financial well-being of constituents
• Supporting – Non-availability or degradation of this service will not significantly impact the health, safety and/or financial well-being of constituents
I. Classification of Commonwealth Critical IT Services
• Identify the Commonwealth ESF impacted by the system
• Categorize the IT Service as Vital, Important or Supporting using the standard definitions above
• Provide enough detailed impact information to justify the Vital, Important, or Supporting designation
Examples:
IT Service Name | Commonwealth ESF | Category | Impact of Non-Availability/Degradation
Meditech Hospital | Health | Vital | Medical practitioner orders will be delayed and workflow will be disrupted impacting the health and possibly life of patients
COMETS Child Support | Financial | Vital | Child support payments will be stopped or delayed affecting the well-being of children and their families
MMARS | Financial | Important | Vendor payments will be delayed potentially affecting vendors’ cash flow
Dept. of Corrections Intranet | Safety | Important | Communications among DOC staff will be disrupted and access to applications and databases important to the operation of the department will be made more difficult potentially affecting the day to day management of correctional facilities and inmates
Commonwealth Information Warehouse | Financial | Supporting | Agency reports will not reflect most current data delaying reconciliation of accounts
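The three classification steps (identify the ESF, assign a category, justify it with impact detail) map naturally onto a small record type. The sketch below is one possible shape for such a record, using two of the examples above. The class and field names are hypothetical, and the rule that an empty impact statement is rejected is an assumption drawn from the requirement to justify the designation.

```python
from dataclasses import dataclass
from enum import Enum


class ESF(Enum):
    HEALTH = "Health"
    SAFETY = "Safety"
    FINANCIAL = "Financial"


class Category(Enum):
    VITAL = "Vital"
    IMPORTANT = "Important"
    SUPPORTING = "Supporting"


@dataclass(frozen=True)
class ServiceClassification:
    service_name: str
    esf: ESF
    category: Category
    impact: str  # justification for the chosen category

    def __post_init__(self):
        # Assumed rule: a classification without impact detail is incomplete.
        if not self.impact.strip():
            raise ValueError("impact justification is required")


examples = [
    ServiceClassification(
        "COMETS Child Support", ESF.FINANCIAL, Category.VITAL,
        "Child support payments will be stopped or delayed"),
    ServiceClassification(
        "MMARS", ESF.FINANCIAL, Category.IMPORTANT,
        "Vendor payments will be delayed"),
]
vital_services = [e.service_name for e in examples if e.category is Category.VITAL]
print(vital_services)
```

Using enumerations for the ESF and the category keeps the values limited to the three defined in each case, which supports the goal of applying the classification consistently across Commonwealth entities.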
Classification of Critical IT Services (con’t)
I. Other Important Criteria
Once an IT Service is classified as Vital, Important or Supporting, additional criteria will be applied to initiate a standard course of action.
• Disaster Recovery example - In Disaster Recovery planning reference is made to the additional criteria of Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) defined as a part of the Business Continuity Planning process. When used in conjunction with Critical IT Service classification, RTOs and RPOs help determine the most appropriate Disaster Recovery strategy, i.e. nature of back-ups, data replication, mirroring techniques, etc.
• Incident Management example – Critical IT Service classification can aid Incident Management processes and the service restoration activities covered under the ITIL Service Management framework. Determining the priority of an incident is a first step in the Incident Management process and Critical IT Service classification is one of the elements to consider. Prioritization of incidents is based on two factors: Impact and Urgency. Impact is determined by the scope and/or severity of the outage or degradation. Impact can also be determined by considering the party(ies) directly affected. Critical IT Service classification is an aid in determining Impact. Urgency refers to the relative need for immediate attention, work around options, time and cost of a work around implementation versus full service restoration, etc. All things being equal (i.e. scope of outage, number of people affected), incidents affecting Critical IT Services classified as Vital will have a higher impact and thus higher priority for service restoration than those classified as Important or Supporting.
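The Incident Management example can be illustrated with a toy scoring function in which classification raises Impact and Impact combines with Urgency to give priority. This is a simple additive score chosen for illustration, not the ITSEC Incident Prioritization model. The weights, the score scales, and the function names are all hypothetical.

```python
# Hypothetical weights: a higher classification adds more to impact.
CRITICALITY_WEIGHT = {"Vital": 2, "Important": 1, "Supporting": 0}


def incident_impact(scope_score, category):
    """Impact = scope/severity of the outage plus a weight for criticality."""
    return scope_score + CRITICALITY_WEIGHT[category]


def incident_priority(scope_score, urgency_score, category):
    """Combine impact and urgency; a higher result means restore sooner."""
    return incident_impact(scope_score, category) + urgency_score


# Same scope and urgency, differing only in classification.
scores = {
    category: incident_priority(scope_score=2, urgency_score=1, category=category)
    for category in ("Vital", "Important", "Supporting")
}
print(scores)
```

With scope and urgency held equal, the Vital service scores highest, matching the "all things being equal" statement above.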
Classification of Critical IT Services (con’t)