ITIL v3 / 2011 Service Operation

Upload: abdul-salam-infobhan

Post on 25-Oct-2014


Page 1: ITIL v3 Service Operation

ITIL v3 / 2011

Service Operation

Page 2: ITIL v3 Service Operation

ITIL v3/2011 Processes

Page 3: ITIL v3 Service Operation

Objective

The objective of ITIL Service Operation is to make sure that IT services are delivered effectively and efficiently. This includes:

1. fulfilling user requests
2. resolving service failures
3. fixing problems
4. carrying out routine operational tasks

Page 4: ITIL v3 Service Operation

Processes

• Event Management
• Incident Management
• Request Fulfillment
• Access Management
• Problem Management
• IT Operations Control
• Facilities Management
• Application Management
• Technical Management

Page 5: ITIL v3 Service Operation

Event Management

• Objective: The objective of ITIL Event Management is to make sure CIs and services are constantly monitored. Event Management aims to filter and categorize Events in order to decide on appropriate actions if required.

• Process Description: Essentially, the activities and process objectives of the Event Management process are identical in ITIL V3 and V2. In ITIL 2011, Event Management has been updated to reflect the concept of 1st Level Correlation and 2nd Level Correlation.

Page 6: ITIL v3 Service Operation

Sub Processes

1. Maintenance of Event Monitoring Mechanisms and Rules - To set up and maintain the mechanisms for generating meaningful Events and effective rules for their filtering and correlating.

2. Event Filtering and 1st Level Correlation - To filter out Events which are merely informational and can be ignored, and to communicate any Warning and Exception Events.

3. 2nd Level Correlation and Response Selection - To interpret the meaning of an Event and select a suitable response if required.

4. Event Review and Closure - To check if Events have been handled appropriately and may be closed. This process also makes sure that Event logs are analyzed in order to identify trends or patterns which suggest corrective action must be taken.
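The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Event` fields, the `disk_used_pct` metric and the thresholds in `RULES` are hypothetical and do not come from ITIL or from any monitoring product. It shows Events being rated as informational, Warning or Exception, with informational Events dropped before further correlation.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class Event:
    ci: str        # Configuration Item that raised the Event
    metric: str    # hypothetical metric name, e.g. "disk_used_pct"
    value: float


# Hypothetical filtering rules: (warning threshold, exception threshold) per metric.
RULES = {"disk_used_pct": (80.0, 95.0)}


def classify(event):
    """Rate an Event against the thresholds defined for its metric."""
    no_limit = float("inf")
    warning, exception = RULES.get(event.metric, (no_limit, no_limit))
    if event.value >= exception:
        return EventType.EXCEPTION
    if event.value >= warning:
        return EventType.WARNING
    return EventType.INFORMATIONAL


def filter_events(events):
    """Drop informational Events; pass on Warning and Exception Events."""
    significant = []
    for event in events:
        event_type = classify(event)
        if event_type is not EventType.INFORMATIONAL:
            significant.append((event, event_type))
    return significant
```

Events for metrics without a rule are treated as informational here; a real rule set would more likely be maintained in the monitoring tool itself.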

Page 7: ITIL v3 Service Operation

Definitions to represent process outputs and inputs

• Event - see Event Record

• Event Categorization Scheme - The Categorization Scheme for Events supports a consistent approach to dealing with specific types of Events. Ideally, this scheme should be harmonized with the schemes to categorize CIs, Incidents and Problems.

• Event Filtering and Correlation Rules - Rules and criteria used to determine if an Event is significant and to decide upon an appropriate response. Event Filtering and Correlation Rules are typically used by Event Monitoring systems. Some of those rules are defined during the Service Design stage, for example to ensure that Events are triggered when the required service availability is endangered.

• Event Record - A record describing a change of state which has significance for the management of a Configuration Item or service. The term Event is also used to mean an alert or notification created by any IT service, Configuration Item or monitoring tool. Events often require IT operations personnel to take actions, and may lead to Incidents being logged.

• Event Trends and Patterns - Any trends and patterns identified during analysis of significant Events, which suggest that improvements to the infrastructure are needed.

Page 8: ITIL v3 Service Operation

Responsibility Matrix: ITIL Event Management

ITIL Role / Sub-Process | IT OPM | IT OPR | EMS | Other Roles
Maintenance of Event Monitoring Mechanisms and Rules | A[1] R[2] | R | - | R
Event Filtering and 1st Level Correlation | A | - | R | -
2nd Level Correlation and Response Selection | A | R | R | -
Event Review and Closure | A R | - | - | -

Remarks
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Event Management process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Event Management.
[3] In cooperation, as appropriate: IT Operations Manager, Access Manager, Capacity Manager, Availability Manager, IT Service Continuity Manager, Information Security Manager, Applications Analyst and/or Technical Analyst.

Page 9: ITIL v3 Service Operation

Incident Management

• Objective: ITIL Incident Management aims to manage the lifecycle of all Incidents. The primary objective of Incident Management is to return the IT service to users as quickly as possible.

• Parent Process: Service Operation

• Process Owner: Incident Manager

Page 10: ITIL v3 Service Operation

Process Description (v3)

• Incident Management according to ITIL V3 distinguishes between Incidents (Service Interruptions) and Service Requests (standard requests from users, e.g. password resets). Service Requests are no longer fulfilled by Incident Management; instead there is a new process called Request Fulfilment.

• There is a dedicated process in ITIL V3 for dealing with emergencies ("Major Incidents"). Furthermore, a process interface was added between Event Management and Incident Management: significant Events trigger the creation of an Incident.

Page 11: ITIL v3 Service Operation

Process Description (2011)

• Guidance has been improved in Incident Management on how to prioritize an Incident (see Checklist Incident Prioritization Guideline).

• Additional steps have been added to Incident Resolution by 1st Level Support to explain that Incidents should be matched (if possible) to existing Problems and Known Errors.

• Incident Resolution by 1st Level Support and Incident Resolution by 2nd Level Support have been considerably expanded to provide clearer guidance on when to invoke Problem Management from Incident Management. The emphasis is now on restoring services as quickly as possible, and on seeking the help of Problem Management if the underlying cause of an Incident cannot be resolved with a minor Change and/or within the committed resolution time.

• The Incident Management sub-process Incident Closure and Evaluation now states more clearly that it is important to check whether there are new Problems, Workarounds or Known Errors that must be submitted to Problem Management.

• The process overview of ITIL Incident Management shows the most important interfaces (see Figure 1).

Page 12: ITIL v3 Service Operation

Incident Prioritization Guideline

• This describes the rules for assigning priorities to Incidents, including the definition of what constitutes a Major Incident. Since Incident Management escalation rules are usually based on priorities, assigning the correct priority to an Incident is essential for triggering appropriate Incident escalations.

• An Incident’s priority is usually determined by assessing its impact and urgency, where:

– Urgency is a measure of how quickly a resolution of the Incident is required.

– Impact is a measure of the extent of the Incident and of the potential damage caused by the Incident before it can be resolved.

Page 13: ITIL v3 Service Operation

Incident Urgency (Categories)

• This section establishes categories of urgency. The definitions must suit the type of organization, so the following table is only an example.

• To determine the Incident’s urgency, choose the highest relevant category:

Category / Description

High (H)
• Staff are not able to do their job
• Customers are being acutely disadvantaged in some way

Medium (M)
• Staff are unable to do their job properly
• Customers are inconvenienced in some way

Low (L)
• Staff are able to deliver an acceptable service but this requires extra effort
• Customers are inconvenienced but not in a significant way

Page 14: ITIL v3 Service Operation

Incident Impact (Categories)

• This section establishes categories of impact. The definitions must suit the type of organization, so the following table is only an example.

• To determine the Incident’s impact, choose the highest relevant category:

Category / Description

High (H)
• A large number of users is affected
• A large number of customers is affected
• The financial impact of the Incident is (for example) likely to exceed $10,000
• The damage to the reputation of the business is likely to be high
• Someone has been injured

Medium (M)
• A moderate number of users is affected
• A moderate number of customers is affected
• The financial impact of the Incident is (for example) likely to exceed $1,000 but will not be more than $10,000
• The damage to the reputation of the business is likely to be moderate

Low (L)
• A minimal number of users is affected
• A minimal number of customers is affected
• The financial impact of the Incident is (for example) likely to be less than $1,000
• The damage to the reputation of the business is likely to be minimal

Page 15: ITIL v3 Service Operation

Incident Priority Classes

• Incident Priority is derived from urgency and impact. If classes are defined to rate urgency and impact (see above), an Urgency-Impact Matrix can be used to define priority classes, identified in this example by colors and priority codes:

Urgency-Impact Matrix (rows: Urgency, columns: Impact)

Urgency | Impact H | Impact M | Impact L
H | 1 | 2 | 3
M | 2 | 3 | 4
L | 3 | 4 | 5

Priority Code | Description | Target Response Time | Target Resolution Time
1 | Critical | Immediate | 1 Hour
2 | High | 10 Minutes | 4 Hours
3 | Medium | 1 Hour | 8 Hours
4 | Low | 4 Hours | 24 Hours
5 | Very low | 1 Day | 1 Week
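As an illustration, the example matrix and target times above can be written as two lookup tables. This is a minimal Python sketch; the names `PRIORITY_MATRIX`, `PRIORITY_TARGETS`, `incident_priority` and `priority_targets` are made up for this example, and the values simply repeat the example figures from the slide rather than any mandated ITIL setting.

```python
# Priority code keyed by (urgency, impact), each rated "H", "M" or "L".
PRIORITY_MATRIX = {
    ("H", "H"): 1, ("H", "M"): 2, ("H", "L"): 3,
    ("M", "H"): 2, ("M", "M"): 3, ("M", "L"): 4,
    ("L", "H"): 3, ("L", "M"): 4, ("L", "L"): 5,
}

# Priority code -> (description, target response time, target resolution time).
PRIORITY_TARGETS = {
    1: ("Critical", "Immediate", "1 Hour"),
    2: ("High", "10 Minutes", "4 Hours"),
    3: ("Medium", "1 Hour", "8 Hours"),
    4: ("Low", "4 Hours", "24 Hours"),
    5: ("Very low", "1 Day", "1 Week"),
}


def incident_priority(urgency, impact):
    """Return the priority code for an urgency/impact pair."""
    try:
        return PRIORITY_MATRIX[(urgency.upper(), impact.upper())]
    except KeyError:
        raise ValueError(f"unknown urgency/impact: {urgency!r}, {impact!r}")


def priority_targets(code):
    """Return (description, target response, target resolution) for a code."""
    return PRIORITY_TARGETS[code]
```

For example, an Incident with high urgency and low impact gets `incident_priority("H", "L") == 3`, which maps to the "Medium" class with an 8-hour target resolution time.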

Page 16: ITIL v3 Service Operation

Circumstances that warrant the Incident to be treated as a Major Incident

• Major Incidents call for the establishment of a Major Incident Team and are managed through the Handling of Major Incidents process. The above prioritization scheme notwithstanding, it is often appropriate to define additional, readily understandable indicators for identifying Major Incidents (see also the comments below on identifying Major Incidents). Examples for such indicators are:

1. Certain (groups of) business-critical services, applications or infrastructure components are unavailable and the estimated time for recovery is unknown or exceedingly long (specify services, applications or infrastructure components)

2. Certain (groups of) Vital Business Functions (business-critical processes) are affected and the estimated time for restoring these processes to full operating status is unknown or exceedingly long (specify business-critical processes)

Page 17: ITIL v3 Service Operation

Identifying Major Incidents

• It is not easy to give clear guidelines on how to identify Major Incidents, although 1st Level Support often develops a "sixth sense" for these. It is also probably better to err on the side of caution in this respect. Major Incidents tend to be characterized by their impact, especially on customers. Consider some examples:

1. A high speed network communications link fails and part of or all data communication to and from outside the organization is cut off.

2. A website grinds to a halt because of unexpected heavy demand prior to a deadline (for example to reserve tickets or make a legal submission) resulting in large numbers of customers failing to meet that deadline.

3. A key business database is found to be corrupted.

4. More than one business server is infected by a worm.

5. The private and confidential information of a significant number of individuals is accidentally disclosed in a public forum.

• Note also that all disasters (covered by the IT Service Continuity Strategy and underpinning ITSCM Plans) are Major Incidents and that smaller incidents that are compounded by errors or inaction can become Major Incidents.

Page 18: ITIL v3 Service Operation

Some of the key characteristics that make these Major Incidents are:

• The ability of significant numbers of customers and/or key customers to use services or systems is or will be affected.

• The cost to customers and/or the service provider is or will be substantial, both in terms of direct and indirect costs (including consequential loss).

• The reputation of the Service Provider is likely to be damaged.

AND

• The amount of effort and/or time required to manage and resolve the incident is likely to be large and it is very likely that agreed service levels (target resolution times) will be breached.

• A Major Incident is also likely to be categorized as a critical or high priority incident.

Page 19: ITIL v3 Service Operation

9 Sub-Processes

• Incident Management Support - To provide and maintain the tools, processes, skills and rules for an effective and efficient handling of Incidents.

• Incident Logging and Categorization - To record and prioritize the Incident with appropriate diligence, in order to facilitate a swift and effective resolution.

• Immediate Incident Resolution by 1st Level Support - To solve an Incident (service interruption) within the agreed time schedule. The aim is the fast recovery of the IT service, where necessary with the aid of a Workaround. As soon as it becomes clear that 1st Level Support is not able to resolve the Incident itself or when target times for 1st level resolution are exceeded, the Incident is transferred to a suitable group within 2nd Level Support.

• Incident Resolution by 2nd Level Support - To solve an Incident (service interruption) within the agreed time schedule. The aim is the fast recovery of the service, where necessary by means of a Workaround. If required, specialist support groups or third-party suppliers (3rd Level Support) are involved. If the correction of the root cause is not possible, a Problem Record is created and the error-correction transferred to Problem Management.

• Handling of Major Incidents - To resolve a Major Incident. Major Incidents cause serious interruptions of business activities and must be resolved with greater urgency. The aim is the fast recovery of the service, where necessary by means of a Workaround. If required, specialist support groups or third-party suppliers (3rd Level Support) are involved. If the correction of the root cause is not possible, a Problem Record is created and the error-correction transferred to Problem Management.

• Incident Monitoring and Escalation - To continuously monitor the processing status of outstanding Incidents, so that counter-measures may be introduced as soon as possible if service levels are likely to be breached.

• Incident Closure and Evaluation - To submit the Incident Record to a final quality control before it is closed. The aim is to make sure that the Incident is actually resolved and that all information required to describe the Incident's life-cycle is supplied in sufficient detail. In addition to this, findings from the resolution of the Incident are to be recorded for future use.

• Pro-Active User Information - To inform users of service failures as soon as these are known to the Service Desk, so that users are in a position to adjust themselves to interruptions. Proactive user information also aims to reduce the number of inquiries by users. This process is also responsible for distributing other information to users, e.g. security alerts.

• Incident Management Reporting - To supply Incident-related information to the other Service Management processes, and to ensure that improvement potentials are derived from past Incidents.

Page 20: ITIL v3 Service Operation

Definitions

The following ITIL terms and acronyms (information objects) are used in the ITIL Incident Management process to represent process outputs and inputs:

• Incident - An Incident is defined as an unplanned interruption or reduction in quality of an IT service (a Service Interruption).

• Incident Escalation Rules - A set of rules defining a hierarchy for escalating Incidents, and triggers which lead to escalations. Triggers are usually based on Incident severity and resolution times. See also: Checklist Incident Priority

• Incident Management Report - A report supplying Incident-related information to the other Service Management processes.

• Incident Model - An Incident Model contains the pre-defined steps that should be taken for dealing with a particular type of Incident. This is a way to ensure that routinely occurring Incidents are handled efficiently and effectively.

• Incident Prioritization Guideline - The Incident Prioritization Guideline describes the rules for assigning priorities to Incidents, including the definition of what constitutes a Major Incident. Since Incident Management escalation rules are usually based on priorities, assigning the correct priority to an Incident is essential for triggering appropriate escalations. See also: Checklist Incident Prioritization Guideline

• Incident Record -A set of data with all details of an Incident, documenting the history of the Incident from registration to closure. An Incident is defined as an unplanned interruption or reduction in quality of an IT service. Every event that could potentially impair an IT service in the future is also an Incident (e.g. the failure of one hard-drive of a set of mirrored drives). See also: ITIL Checklist Incident Record

• Incident Status Information -A message containing the present status of an Incident sent to a user who earlier reported a service interruption. Status information is typically provided to users at various points during an Incident's lifecycle.

• Major Incident - Major Incidents cause serious interruptions of business activities and must be solved with greater urgency. See also: Checklist Incident Priority: Major Incidents

• Major Incident Review - A Major Incident Review takes place after a Major Incident has occurred. The review documents the Incident's underlying causes (if known) and the complete resolution history, and identifies opportunities for improving the handling of future Major Incidents.

• Notification of Service Failure - The reporting of a service failure to the Service Desk, for example by a user via telephone or e-mail, or by a system monitoring tool.

• Pro-Active User Information - A notification to users of existing or imminent service failures even if the users are not yet aware of the interruptions, so that users are in a position to prepare themselves for a period of service unavailability.

• Status Inquiry - An inquiry regarding the present status of an Incident or Service Request, usually from a user who earlier reported an Incident or submitted a request.

• Support Request - A request to support the resolution of an Incident or Problem, usually issued from the Incident or Problem Management processes when further assistance is needed from technical experts.

• User Escalation -Escalation regarding the processing of an Incident or Service Request, initiated by a user experiencing delays or a failure to restore their services.

• User FAQs -Self-help information for users supplied by the Service Desk, usually as part of the Support Pages on the intranet.

Page 21: ITIL v3 Service Operation

ITIL KPIs Incident Management

Key Performance Indicator (KPI) | Definition
Number of repeated Incidents | Number of repeated Incidents, with known resolution methods
Incidents resolved Remotely | Number of Incidents resolved remotely by the Service Desk (i.e. without carrying out work at user's location)
Number of Escalations | Number of escalations for Incidents not resolved in the agreed resolution time
Number of Incidents | Number of incidents registered by the Service Desk, grouped into categories
Average Initial Response Time | Average time taken between the time a user reports an Incident and the time that the Service Desk responds to that Incident
Incident Resolution Time | Average time for resolving an incident, grouped into categories
First Time Resolution Rate | Percentage of Incidents resolved at the Service Desk during the first call, grouped into categories
Resolution within SLA | Rate of incidents resolved during solution times agreed in SLA, grouped into categories
Incident Resolution Effort | Average work effort for resolving Incidents, grouped into categories
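Three of these KPIs can be computed directly from timestamps on the Incident Records. The following Python sketch assumes a simplified, hypothetical `IncidentRecord` with only the fields needed here; real Service Desk tools store this data differently, and grouping by category is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    # Hypothetical, simplified record; field names are assumptions.
    reported_at: datetime          # user reports the Incident
    responded_at: datetime         # Service Desk first responds
    resolved_at: datetime          # service restored
    resolved_on_first_call: bool
    sla_resolution: timedelta      # resolution time agreed in the SLA


def average_initial_response_time(incidents):
    """Average time between report and first Service Desk response."""
    total = sum((i.responded_at - i.reported_at for i in incidents), timedelta())
    return total / len(incidents)


def first_time_resolution_rate(incidents):
    """Percentage of Incidents resolved during the first call."""
    hits = sum(1 for i in incidents if i.resolved_on_first_call)
    return 100.0 * hits / len(incidents)


def resolution_within_sla_rate(incidents):
    """Percentage of Incidents resolved within the agreed SLA time."""
    met = sum(
        1 for i in incidents
        if i.resolved_at - i.reported_at <= i.sla_resolution
    )
    return 100.0 * met / len(incidents)
```

All three functions expect a non-empty list; a reporting job would normally call them once per category and per reporting period.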

Page 23: ITIL v3 Service Operation

Checklist Incident Record - ITIL V2

The following data is recorded during the registration of an Incident:

• Unique ID of the Incident (usually allocated automatically by the system)
• Date and time of the creation (usually allocated automatically by the system)
• Service Desk agent responsible for the registration
• Caller/User data
• Incident type (Service Interruption, Service Request)
• Description of symptoms
• Affected IT Service(s)
• Relevant SLAs
• Relationship to CIs
• Product category, usually selected from a category-tree according to the following example:
  – Client PC
    • Standard configuration 1
    • ...
  – Printer
    • Manufacturer 1
    • ...
  – Incident category, i.e.
    • Hardware error
    • Software error
    • ...
• Link/Attribution to another Incident (if a similar outstanding Incident exists, to which the new Incident is able to be attributed)

Page 24: ITIL v3 Service Operation

Checklist Initial Analysis of an Incident

Using the assignment of the Incident to CIs and to Product and Incident categories, the Support Knowledge Base is searched for:

• Known Solutions
• Known Workarounds
• Known Errors

If it becomes apparent during the initial analysis that the attributions originally assigned were not applicable, these are corrected:

• Relationships to CIs
• Product category, usually selected from a category-tree according to the following example:
  – Client PC
    • Standard configuration 1
    • ...
  – Printer
    • Manufacturer 1
    • ...
  – Incident category, i.e.
    • Hardware error
    • Software error
    • ...

Page 25: ITIL v3 Service Operation

Checklist Incident Escalation

The Escalation of Incidents follows pre-defined rules:

• Defined triggers for Escalations, i.e. combinations of
  – Degree of severity of an Incident (severe Incidents are, for example, immediately escalated)
  – Duration (an Escalation occurs if the Incident was not resolved within a pre-determined period, for example the maximum resolution times agreed within the SLAs)
  – In an ideal case this would be system-controlled, triggered by customisable Escalation rules

• Defined Escalation levels in the form of an Escalation Hierarchy, for example
  – 1st Level Support
  – Incident Manager
  – Manager of Data Processing Centre
  – CIO

• Assigned triggers to the Escalation Hierarchy (conditions/rules which lead to the Escalation to a particular level within the Escalation Hierarchy)
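A system-controlled version of such rules might look like the Python sketch below. The hierarchy is the example from the checklist; the trigger logic itself is an assumption made for illustration (a severe Incident starts one level up, and each full overrun of the maximum resolution time moves the Incident one level further), not a rule prescribed by ITIL.

```python
from datetime import timedelta

# Example Escalation Hierarchy from the checklist, lowest level first.
ESCALATION_HIERARCHY = [
    "1st Level Support",
    "Incident Manager",
    "Manager of Data Processing Centre",
    "CIO",
]


def escalation_level(severe, elapsed, max_resolution):
    """Return the hierarchy level that should currently own the Incident.

    Assumed rules (hypothetical):
    - severity trigger: a severe Incident is escalated one level immediately
    - duration trigger: one further level per full max_resolution period elapsed
    """
    level = 1 if severe else 0
    overruns = elapsed // max_resolution  # timedelta // timedelta gives an int
    level = min(level + overruns, len(ESCALATION_HIERARCHY) - 1)
    return ESCALATION_HIERARCHY[level]
```

With a 4-hour maximum resolution time, a non-severe Incident stays with 1st Level Support for the first four hours and is then escalated to the Incident Manager; in practice the thresholds would be read from the SLA rather than hard-coded.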

Page 26: ITIL v3 Service Operation

Checklist Closure of an Incident

The following entries of an Incident Record are investigated for their integrity and completeness during the closure of an Incident:

• Protocol of actions
  – Person in charge
  – Support Group
  – Time and Date
  – Description of the activity

• History of status changes, for example
  – "New" into "Initial Analysis Completed"
  – "Initial Analysis Completed" into "Assigned to 2nd Level Support"
  – ...
  – "Resolved" into "Closed"

• Documentation of applied Workarounds
• Documentation of the root cause of the Service interruption
• Documentation of the applied resolution to eliminate the root cause
• Date of the Incident resolution
• Date of the Incident closure
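The check on the history of status changes can be automated with a small transition table. The Python sketch below uses the status names from the checklist; the checklist elides the intermediate transitions, so the ones marked as assumed are placeholders chosen for this example only.

```python
# Allowed status changes. Entries marked "assumed" are not in the checklist.
ALLOWED_TRANSITIONS = {
    "New": {"Initial Analysis Completed"},
    "Initial Analysis Completed": {
        "Assigned to 2nd Level Support",
        "Resolved",                                  # assumed
    },
    "Assigned to 2nd Level Support": {"Resolved"},   # assumed
    "Resolved": {"Closed"},
    "Closed": set(),
}


def history_is_valid(history):
    """Check that a status history starts at "New", ends at "Closed",
    and only uses allowed status changes in between."""
    if not history or history[0] != "New" or history[-1] != "Closed":
        return False
    return all(
        later in ALLOWED_TRANSITIONS.get(earlier, set())
        for earlier, later in zip(history, history[1:])
    )
```

A record whose history jumps straight from "New" to "Closed" fails this check and would be sent back for completion before closure.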

Page 27: ITIL v3 Service Operation

Checklist Incident Report

The Incident Manager's report includes the following information:

• Adherence to agreed Service Levels
  – Agreed Service Levels
  – Attained Service Levels

• Major Incidents causing breaches of agreed IT Service Levels
  – In the past (prolonged IT Service failures etc.)
    • Type of event
    • Causes
    • Counter-measures for the elimination of the Incident
    • Measures for the future avoidance of similar occurrences
  – In the future (e.g. planned prolonged downtimes to IT Services)

• Statistical evaluations
  – Number of Incidents
    • Over time
    • According to categories
  – Resolution times
    • According to duration
    • According to categories
  – Initial resolution rate
    • Over time
    • According to categories
  – Trend analyses

• Technical analysis of important or repetitive Incidents
  – Description
  – Applied resolution strategy
    • Elimination of the root cause
    • Workaround

Page 28: ITIL v3 Service Operation

Roles | Responsibilities

• Incident Manager - Process Owner - The Incident Manager is responsible for the effective implementation of the Incident Management process and carries out the corresponding reporting. He represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.

• 1st Level Support - The responsibility of 1st Level Support is to register and classify received Incidents and to undertake an immediate effort in order to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also keeps users informed about their Incidents' status at agreed intervals.

• 2nd Level Support - 2nd Level Support takes over Incidents which cannot be solved immediately with the means of 1st Level Support. If necessary, it will request external support, e.g. from software or hardware manufacturers. The aim is to restore a failed IT service as quickly as possible. If no solution can be found, the 2nd Level Support passes on the Incident to Problem Management.

• 3rd Level Support - 3rd Level Support is typically located at hardware or software manufacturers (third-party suppliers). Its services are requested by 2nd Level Support if required for solving an Incident. The aim is to restore a failed IT Service as quickly as possible.

• Major Incident Team - A dynamically established team of IT managers and technical experts, usually under the leadership of the Incident Manager, formed to concentrate on the resolution of a Major Incident.

Page 29: ITIL v3 Service Operation

Responsibility Matrix: ITIL Incident Management

ITIL Role / Sub-Process | Incident Manager | 1st Level Support | 2nd Level Support | Major Incident Team | Applications Analyst[3] | Technical Analyst[3] | IT Operator[3]
Incident Management Support | A[1] R[2] | - | - | - | - | - | -
Incident Logging and Categorization | A | R | - | - | - | - | -
Immediate Incident Resolution by 1st Level Support | A | R | - | - | - | - | -
Incident Resolution by 2nd Level Support | A | - | R | - | R[4] | R[4] | R[4]
Handling of Major Incidents | A R | R | - | R | - | - | R
Incident Monitoring and Escalation | A R | R | - | - | - | - | -
Incident Closure and Evaluation | A | R | - | - | - | - | -
Pro-Active User Information | A | R | - | - | - | - | -
Incident Management Reporting | A R | - | - | - | - | - | -

Remarks
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Incident Management process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Incident Management.
[3] see → Role descriptions...
[4] In cooperation, as required. 2nd Level Support Groups often include Applications Analysts and/or Technical Analysts.

Page 30: ITIL v3 Service Operation

Request Fulfilment

• Process Objective: To fulfill Service Requests, which in most cases are minor (standard) Changes (e.g. requests to change a password) or requests for information.

• Process Description - Request Fulfilment was added as a new process to ITIL V3 with the aim to have a dedicated process dealing with Service Requests. This was motivated by a clear distinction in ITIL V3 between Incidents (Service Interruptions) and Service Requests (standard requests from users, e.g. password resets).

• In ITIL 2011, Request Fulfilment has been completely revised. To reflect the latest guidance Request Fulfilment now consists of five sub-processes, to provide a detailed description of all activities and decision points.

• Request Fulfilment now contains interfaces with Incident Management (if a Service Request turns out to be an Incident) and with Service Transition (if fulfilling a Service Request requires the involvement of Change Management). The process overview of ITIL Request Fulfilment shows the most important interfaces (see Figure 1).

• A clearer explanation of the information that describes a Service Request and its life cycle has been added.

• The concept of Service Request Models is explained in more detail.

Page 31: ITIL v3 Service Operation

Service Request Fulfilment

Page 32: ITIL v3 Service Operation

Sub Processes

• Request Fulfilment Support
Process Objective: To provide and maintain the tools, processes, skills and rules for an effective and efficient handling of Service Requests.

• Request Logging and Categorization
Process Objective: To record and categorize the Service Request with appropriate diligence and check the requester's authorization to submit the request, in order to facilitate a swift and effective processing.

• Request Model Execution
Process Objective: To process a Service Request within the agreed time schedule.

• Request Monitoring and Escalation
Process Objective: To continuously monitor the processing status of outstanding Service Requests, so that counter-measures may be introduced as soon as possible if service levels are likely to be breached.

• Request Closure and Evaluation
Process Objective: To submit the Request Record to a final quality control before it is closed. The aim is to make sure that the Service Request is actually processed and that all information required to describe the request's life-cycle is supplied in sufficient detail. In addition to this, findings from the processing of the request are to be recorded for future use.

Page 33: ITIL v3 Service Operation

Definitions

• Request for Service - A formal request from a user for something to be provided – for example, a request for information or advice; to reset a password; or to install a workstation for a new user. The details of a Request for Service are recorded by Request Fulfilment in a Service Request Record.

• Service Request Model - A (Service) Request Model defines specific agreed steps that will be followed for a Service Request of a particular type (or category).

• Service Request Record - A record containing all details of a Service Request. Service Requests are formal requests from a user for something to be provided – for example, a request for information or advice; to reset a password; or to install a workstation for a new user.

• Service Request Status Information - A message containing the present status of a Service Request, sent to a user who earlier requested a service. Status information is typically provided to users at various points during a Service Request's lifecycle.

Page 34: ITIL v3 Service Operation

Roles | Responsibilities

• Incident Manager - (Process Owner) - The Incident Manager is responsible for the effective implementation of the Incident Management process and carries out the respective reporting. He represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.

• 1st Level Support - The responsibility of 1st Level Support is to register and classify received Incidents and to undertake an immediate effort in order to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also processes Service Requests and keeps users informed about their Incidents' status at agreed intervals.

• Service Request Fulfilment Group - Groups that specialize in the fulfilment of certain types of Service Requests. Typically, 1st Level Support will process simpler requests, while others are forwarded to the specialized Fulfilment Groups.

Page 35: ITIL v3 Service Operation

Responsibility Matrix: ITIL Request Fulfilment

ITIL Role / Sub-Process | Incident Manager | 1st Level Support | Service Request Fulfilment Group
Request Fulfilment Support | A[1] R[2] | - | -
Request Logging and Categorization | A | R | -
Request Model Execution | A | R | R
Request Monitoring and Escalation | A R | R | -
Request Closure and Evaluation | A | R | -

Remarks
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Request Fulfilment process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Request Fulfilment.

Page 36: ITIL v3 Service Operation

Access Management

• Objective: ITIL Access Management aims to grant authorized users the right to use a service, while preventing access to non-authorized users. The Access Management processes essentially execute policies defined in Information Security Management. Access Management is sometimes also referred to as Rights Management or Identity Management.

• Part of: Service Operation

• Process Owner: Access Manager

• Process Description - Access Management was added as a new process in ITIL V3. The decision to include this dedicated process was motivated by information security reasons, as granting access to IT services and applications only to authorized users is of high importance from an Information Security viewpoint. In ITIL 2011 an interface between Access Management and Event Management has been added, to emphasize that (some) Event filtering and correlation rules should be designed by Access Management to support the detection of unauthorized access to services. The process overview of ITIL Access Management shows the most important interfaces (see Figure 1). A dedicated activity has been added to make it clearer that access rights are revoked if required.

• In ITIL 2011 it has been made clearer in the Request Fulfilment and Incident Management processes that the requester's authorization must be checked.

Page 37: ITIL v3 Service Operation

Sub Processes

• Maintenance of Catalogue of User Roles and Access Profiles

Process Objective: To make sure that the catalogue of User Roles and Access Profiles is still appropriate for the services offered to customers, and to prevent unwanted accumulation of access rights.

• Processing of User Access Requests

Process Objective: To process requests to add, change or revoke access rights, and to make sure that only authorized users are granted the right to use a service.

Page 38: ITIL v3 Service Operation

Definitions

• Access Rights - A set of data defining what services a user is allowed to access. This definition is achieved by assigning the user, identified by his User Identity, to one or more User Roles.

• Request for Access Rights - A request to grant, change or revoke the right to use a particular service or access certain assets.

• User Identity Record - A set of data with all the details identifying a user or person. It is used to grant rights to that user or person.

• User Identity Request - A request to create, modify or delete a User Identity.

• User Role - A role as part of a catalogue or hierarchy of all the roles (types of users) in the organization. Access rights are based on the roles that individual users have as part of an organization.

• User Role Access Profile - A set of data defining the level of access to a service or group of services for a certain type of user (User Role). User Role Access Profiles help to protect the confidentiality, integrity and availability of assets by defining what information computer users can utilize, the programs that they can run, and the modifications that they can make.

• User Role Requirements - Requirements from the business side for the catalogue or hierarchy of user roles (types of users) in the organization. Access rights are based on the roles that individual users have as part of an organization.
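
The relationship between User Identities, User Roles and User Role Access Profiles defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the role names ("hr_clerk", "hr_manager"), the "payroll" service and the access levels "read" and "write" are all hypothetical and do not come from ITIL.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessProfile:
    """Level of access a User Role has to one service (hypothetical shape)."""
    service: str
    level: str  # e.g. "read" or "write"; the level names are assumptions


@dataclass
class UserIdentity:
    """A User Identity Record reduced to an ID and its assigned User Roles."""
    user_id: str
    roles: set[str] = field(default_factory=set)


# Catalogue of User Roles and their Access Profiles (illustrative entries).
ROLE_CATALOGUE: dict[str, list[AccessProfile]] = {
    "hr_clerk": [AccessProfile("payroll", "read")],
    "hr_manager": [AccessProfile("payroll", "write")],
}


def access_rights(user: UserIdentity) -> set[AccessProfile]:
    """Derive a user's Access Rights from the roles assigned to them."""
    rights: set[AccessProfile] = set()
    for role in user.roles:
        rights.update(ROLE_CATALOGUE.get(role, []))
    return rights


def may_use(user: UserIdentity, service: str, level: str) -> bool:
    """Return True only if one of the user's roles grants this access."""
    return AccessProfile(service, level) in access_rights(user)
```

Because rights are derived from roles rather than stored per user, removing a role from a User Identity revokes the corresponding access in one step, which is one way to limit the unwanted accumulation of access rights.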

Page 39: ITIL v3 Service Operation

Roles | Responsibilities - Matrix

• Access Manager – (Process Owner) grants authorized users the right to use a service, while preventing access to non-authorized users. The Access Manager essentially executes policies defined in Information Security Management.

Responsibility Matrix: ITIL Access Management

ITIL Role / Sub-Process | Access Manager
Maintenance of Catalogue of User Roles and Access Profiles | A[1]R[2]
Processing of User Access Requests | AR

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Access Management process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Access Management.

Page 40: ITIL v3 Service Operation

Process Implementation: Notes

• There are a number of different approaches to implementing Access Management. Depending on the size of an organization the methods applied can be rather complex. In this context, ITIL does not provide a detailed explanation of all aspects of Access Management.

• Well-defined interfaces between the business and Access Management are vital to achieve high security standards. Typically, responsibilities of both sides are defined in a dedicated Information Security Policy. This policy would, for example, stipulate that HR is to inform Access Management without delay about employees entering or leaving the company.

Page 41: ITIL v3 Service Operation

Problem Management

• Objective: The objective of ITIL Problem Management is to manage the lifecycle of all Problems. The primary objectives of Problem Management are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. Proactive Problem Management analyzes Incident Records, and uses data collected by other IT Service Management processes to identify trends or significant Problems.

• Process Description - Essentially, the activities and process objectives of ITIL Problem Management are identical in ITIL V3 and ITIL V2.

• A new sub-process Major Problem Review was introduced in ITIL V3 to review the solution history of major Problems in order to prevent a recurrence and learn lessons for the future.

• In ITIL 2011 the new sub-process Proactive Problem Identification has been added to emphasize the importance of proactive Problem Management.

• In Problem Categorization and Prioritization, it has been made clearer that categorization and prioritization should be harmonized with the approach used in Incident Management, to facilitate matching between Incidents and Problems. The process overview of ITIL Problem Management shows the most important interfaces (see Figure 1).

• The concept of recreating Problems during Problem Diagnosis and Resolution is now more prominent. This sub-process has been completely revised to provide clearer guidance on how this process cooperates with Incident Management.

• Note: The new ITIL 2011 books also contain an expanded section on problem analysis techniques and examples for situations where the various techniques may be applied.

Page 42: ITIL v3 Service Operation
Page 43: ITIL v3 Service Operation

Sub-Processes

1. Proactive Problem Identification - To improve overall availability of services by proactively identifying Problems. Proactive Problem Management aims to identify and solve Problems and/or provide suitable Workarounds before (further) Incidents recur.

2. Problem Categorization and Prioritization - To record and prioritize the Problem with appropriate diligence, in order to facilitate a swift and effective resolution.

3. Problem Diagnosis and Resolution - To identify the underlying root cause of a Problem and initiate the most appropriate and economical Problem solution. If possible, a temporary Workaround is supplied.

4. Problem and Error Control - To constantly monitor outstanding Problems with regard to their processing status, so that where necessary corrective measures may be introduced.

Page 44: ITIL v3 Service Operation

Sub-Processes

5. Problem Closure and Evaluation - To ensure that - after a successful Problem solution - the Problem Record contains a full historical description, and that related Known Error Records are updated.

6. Major Problem Review - To review the resolution of a Problem in order to prevent recurrence and learn any lessons for the future. Furthermore it is to be verified whether the Problems marked as closed have actually been eliminated.

7. Problem Management Reporting - ITIL Problem Management Reporting aims to ensure that the other Service Management processes as well as IT Management are informed of outstanding Problems, their processing-status and existing Workarounds (see "Problem Management Report").

Page 45: ITIL v3 Service Operation

Definitions

• Known Error - A Problem that has a documented root cause and a Workaround. Known Errors are managed throughout their lifecycle by the Problem Management process. The details of each Known Error are recorded in a Known Error Record stored in the Known Error Database (KEDB). As a rule, Known Errors are identified by Problem Management, but Known Errors may also be suggested by other Service Management disciplines, e.g. Incident Management, or by suppliers.

• Known Error Database (KEDB) - is created by Problem Management and used by Incident and Problem Management to manage all Known Error Records.

• Problem - The cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created.

• Problem Management Report - A report supplying Problem-related information to the other Service Management processes.

• Problem Record - contains all details of a Problem, documenting the history of the Problem from detection to closure (see: ITIL Checklist Problem Record).

• Suggested new Known Error - A suggestion to create a new entry in the Known Error Database, for example raised by the Service Desk or by Release Management. Known Errors are managed throughout their lifecycle by Problem Management.

• Suggested new Problem - A notification about a suspected Problem, handed over to Problem Management for further investigation, possibly leading to the formal logging of a Problem.

• Suggested new Workaround - A suggestion to enter a new Workaround in the Known Error Database, for example raised by the Service Desk or by Release Management. Workarounds are managed throughout their lifecycle by Problem Management.

• Workaround - A temporary solution aimed at reducing or eliminating the impact of Known Errors (and thus Problems) for which a full resolution is not yet available. As such, Workarounds are often applied to reduce the impact of Incidents or Problems if their underlying causes cannot be readily identified or removed.
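
To illustrate how Incident Management might use the Known Error Database described above, the following minimal sketch stores Known Error Records and looks up a Workaround for an incoming Incident. The class and field names are hypothetical, and the plain substring match on symptoms is an assumption chosen for brevity; ITIL does not prescribe a matching method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    """A Known Error Record: documented root cause plus a Workaround."""
    error_id: str
    symptoms: str      # short symptom phrase used for matching (assumption)
    root_cause: str
    workaround: str


class KnownErrorDatabase:
    """A tiny in-memory stand-in for a KEDB."""

    def __init__(self) -> None:
        self._records: dict[str, KnownError] = {}

    def add(self, record: KnownError) -> None:
        """Store or overwrite a Known Error Record by its ID."""
        self._records[record.error_id] = record

    def find_workaround(self, incident_description: str) -> Optional[str]:
        """Return the Workaround of the first record whose symptom phrase
        appears in the Incident description, or None if nothing matches."""
        text = incident_description.lower()
        for record in self._records.values():
            if record.symptoms.lower() in text:
                return record.workaround
        return None
```

A real KEDB would match on categories and CIs rather than free text, which is one reason the checklists recommend harmonizing Problem, Incident and CI categories.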

Page 46: ITIL v3 Service Operation

ITIL KPIs Problem Management

Key Performance Indicator (KPI) | Definition
Number of Problems | Number of Problems registered by Problem Management, grouped into categories
Problem Resolution Time | Average time for resolving Problems, grouped into categories
Number of unresolved Problems | Number of Problems where the underlying root cause is not known at a particular time
Number of Incidents per Known Problem | Number of reported Incidents linked to the same Problem after problem identification
Time until Problem Identification | Average time between first occurrence of an Incident and identification of the underlying root cause
Problem Resolution Effort | Average work effort for resolving Problems, grouped into categories
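
Two of the KPIs above, Number of Problems and Problem Resolution Time, can be computed from a list of Problem Records as sketched below. The `Problem` class, its field names and the choice of hours as the unit are assumptions made for this example only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Problem:
    """Minimal Problem Record with only the fields these KPIs need."""
    category: str
    opened: datetime
    resolved: Optional[datetime] = None  # None while the Problem is open


def number_of_problems(problems: list[Problem]) -> dict[str, int]:
    """KPI 'Number of Problems': registered Problems per category."""
    counts: dict[str, int] = defaultdict(int)
    for problem in problems:
        counts[problem.category] += 1
    return dict(counts)


def average_resolution_hours(problems: list[Problem]) -> dict[str, float]:
    """KPI 'Problem Resolution Time': mean hours to resolve, per category.

    Problems that are still open are ignored, so a category with no
    resolved Problems does not appear in the result.
    """
    durations: dict[str, list[float]] = defaultdict(list)
    for problem in problems:
        if problem.resolved is not None:
            elapsed = problem.resolved - problem.opened
            durations[problem.category].append(elapsed.total_seconds() / 3600)
    return {category: mean(hours) for category, hours in durations.items()}
```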

Page 47: ITIL v3 Service Operation

Checklists for Problem Management

Checklist Problem Record - ITIL V2

The following data is entered during the creation of a Problem Record:

• Unique Problem ID (usually assigned automatically by the system)
• Creation date and time (usually allocated automatically by the system)
• Person in charge for the creation
• Description of symptoms
• Affected IT Service(s)
• Relevant SLAs
• Relationship to CIs
• Product category, usually selected from a category-tree according to the following example:
  – Client PC
    • Standard configuration 1
    • ...
  – Printer
    • Manufacturer 1
    • ...
  – Problem category, for example
    • Hardware error
    • Software error
    • ...
• Links to
  – Incidents associated with this problem
  – Other Problems, whose resolution is associated with this Problem
• Workaround for the circumvention of the Problem, if known

Page 48: ITIL v3 Service Operation

Checklist Problem Record - ITIL 2011

A Problem Record typically contains the following information:

• Unique ID of the Problem (usually allocated automatically by the system)
• Date and time of detection
• Problem owner
• Description of symptoms
• Affected users/ business areas
• Affected service(s)
• Prioritization, a function of the following components:
  – Urgency (available time until the resolution of the Problem), e.g.
    • Up to 5 working days
    • Up to 2 weeks
    • Up to 4 weeks
  – Degree of severity (damage caused to the business), e.g.
    • "High" (interruption to critical business processes)
    • "Normal" (interruption to the work of individual employees)
    • "Low" (hindrance to the work of individual employees, continuation of work possible by means of a circumventive solution)
  – Priority (for example in stages 1, 2 and 3): The result from the combination of urgency and the degree of severity
• Relationships to CIs
• Problem category, usually selected from a category-tree according to the following example (Problem categories should be harmonized with CI and Incident categories to support matching between Incidents, Problems and CIs):
  – Hardware error
    • Server A
      – Component x
        » Symptom a
        » Symptom b
      – Component y
      – …
    • Server B
    • …
  – Software error
    • System A
    • System B
    • …
  – Network error
  – ...
• Links to related Problem Records (if there are other outstanding Problems related to this one)
• Links to related Incident Records (if outstanding Incidents exist, whose solution depends on the solution of this Problem)
• Links to Known Errors and Workarounds (if Known Errors and Workarounds related to the Problem have been identified)
• Problem Recovery Procedures: Any procedures that are required to be performed to eliminate the Problem. These procedures may need to be performed as part of removing Workarounds that have been applied while solving related Incidents.
• Activity log/ resolution history
  – Date and time
  – Person in charge
  – Description of activities
  – New Problem status (if the activity results in a change of status)

Page 49: ITIL v3 Service Operation

Checklist Problem Priority

The priority of a Problem is assigned according to the following rules:

• Urgency (available time until the resolution of the Problem), e.g.
  – 1: up to 4 hrs.
  – 2: up to 1 day
  – 3: up to 5 days
• Degree of severity (damage caused to the business), e.g.
  – 1: "High" (interruption to critical business processes)
  – 2: "Normal" (interruption to the work of individual employees)
  – 3: "Low" (hindrance to the work of individual employees, continuation of work possible by means of a circumventive solution)
• Priority (e.g. in stages 1, 2 and 3): A function of urgency and the degree of severity
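
The checklist states that priority is a function of urgency and severity but does not define that function. One common way to express it is a lookup matrix, sketched below. The specific cell values are an assumption made for this example, not an ITIL rule; each organization would agree its own matrix.

```python
# Keys are (urgency, severity), each on the 1-3 scale from the checklist,
# where 1 is the most urgent / most severe. Values are the priority stage.
# The mapping itself is illustrative only.
PRIORITY_MATRIX: dict[tuple[int, int], int] = {
    (1, 1): 1, (1, 2): 1, (1, 3): 2,
    (2, 1): 1, (2, 2): 2, (2, 3): 3,
    (3, 1): 2, (3, 2): 3, (3, 3): 3,
}


def problem_priority(urgency: int, severity: int) -> int:
    """Return the priority stage (1-3) for an urgency/severity pair.

    Raises ValueError if either value is outside the 1-3 scale.
    """
    try:
        return PRIORITY_MATRIX[(urgency, severity)]
    except KeyError:
        raise ValueError(
            f"urgency and severity must be 1, 2 or 3, got {urgency}, {severity}"
        ) from None
```

Keeping the mapping in a table rather than in arithmetic makes it easy to review with the business and to harmonize with the matrix used in Incident Management.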

Page 50: ITIL v3 Service Operation

Checklist Closure of a Problem

The following entries are investigated with regard to their completeness and integrity during the closure of a Problem:

• Protocol of actions
  – Person in charge
  – Support group
  – Time and date
  – Description of the activity
• History of the change in status, e.g.
  – "New" into "Initial Analysis Completed"
  – "Initial Analysis Completed" into "Assigned to Specialists"
  – ...
  – "Resolved" into "Closed"
• Documentation of the root cause of the Problem (Known Error)
• Documentation of possible Workarounds
• Documentation of the applied (causal) resolution
• Date of Problem resolution
• Date of Problem closure
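
The closure checklist above lends itself to an automated completeness check. The sketch below reports which entries are still missing from a Problem Record before it may be closed. The class and field names are hypothetical, and treating every listed entry as mandatory (including Workarounds) is a simplifying assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ProblemClosureData:
    """The closure-relevant parts of a Problem Record (hypothetical shape)."""
    action_log: list[str] = field(default_factory=list)
    status_history: list[str] = field(default_factory=list)
    root_cause: str = ""
    workarounds: str = ""
    resolution: str = ""
    resolved_on: Optional[datetime] = None
    closed_on: Optional[datetime] = None


def missing_closure_entries(record: ProblemClosureData) -> list[str]:
    """Return the names of checklist entries that are empty or unset."""
    checks = {
        "protocol of actions": bool(record.action_log),
        "status history": bool(record.status_history),
        "root cause": bool(record.root_cause.strip()),
        "workarounds": bool(record.workarounds.strip()),
        "applied resolution": bool(record.resolution.strip()),
        "date of resolution": record.resolved_on is not None,
        "date of closure": record.closed_on is not None,
    }
    return [name for name, present in checks.items() if not present]
```

An empty result means the record passes this simple check; it says nothing about the quality of the documentation, which still needs human review.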

Page 51: ITIL v3 Service Operation

Checklist Problem Report

The Problem Manager's report includes the following information:

• Statistical evaluations
  – Outstanding Problems
    • According to duration since creation of the Problem Record
    • According to categories
  – Resolution times of closed Problems
    • According to duration
    • According to categories
  – Trend analyses
• Problems with special importance regarding Availability, Capacity, IT Service Continuity and IT Security Management
  – Description
  – Problem cause
  – Applied resolution strategy
    • Elimination of the root cause
    • Possible Workarounds
  – Time schedule for the resolution of the Problem
• Other important Problems with extensive effects upon the quality of the IT Services
  – Description
  – Problem cause
  – Applied resolution strategy
    • Elimination of the root cause
    • Possible Workarounds
  – Time schedule for the resolution of the Problem

Page 52: ITIL v3 Service Operation

Roles | Responsibilities

• Problem Manager – (Process Owner) is responsible for managing the lifecycle of all Problems. His primary objectives are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. To this purpose he maintains information about Known Errors and Workarounds.

Responsibility Matrix: ITIL Problem Management

ITIL Role / Sub-Process | Problem Manager | Applications Analyst[3] | Technical Analyst[3]
Proactive Problem Identification | A[1]R[2] | - | -
Problem Categorization and Prioritization | AR | - | -
Problem Diagnosis and Resolution | AR | R | R
Problem and Error Control | AR | - | -
Problem Closure and Evaluation | AR | - | -
Major Problem Review | AR | - | -
Problem Management Reporting | AR | - | -

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Problem Management process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Problem Management.

[3] see Role descriptions...
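
A responsibility matrix like the one above can be held as data and checked mechanically. The sketch below encodes the Problem Management matrix and verifies two rules: exactly one Accountable role and at least one Responsible role per sub-process. These rules are a widely used RACI convention, stated here as an assumption rather than taken from the slides, and the function name is hypothetical.

```python
# Sub-process -> {role: RACI codes}; roles marked "-" in the matrix are omitted.
PROBLEM_MANAGEMENT_RACI: dict[str, dict[str, str]] = {
    "Proactive Problem Identification": {"Problem Manager": "AR"},
    "Problem Categorization and Prioritization": {"Problem Manager": "AR"},
    "Problem Diagnosis and Resolution": {
        "Problem Manager": "AR",
        "Applications Analyst": "R",
        "Technical Analyst": "R",
    },
    "Problem and Error Control": {"Problem Manager": "AR"},
    "Problem Closure and Evaluation": {"Problem Manager": "AR"},
    "Major Problem Review": {"Problem Manager": "AR"},
    "Problem Management Reporting": {"Problem Manager": "AR"},
}


def raci_violations(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Return one message per rule broken; an empty list means no violations."""
    violations: list[str] = []
    for sub_process, assignments in matrix.items():
        accountable = [role for role, codes in assignments.items() if "A" in codes]
        responsible = [role for role, codes in assignments.items() if "R" in codes]
        if len(accountable) != 1:
            violations.append(
                f"{sub_process}: expected exactly one Accountable role, "
                f"found {len(accountable)}"
            )
        if not responsible:
            violations.append(f"{sub_process}: no Responsible role assigned")
    return violations
```

The same check can be run against the other matrices in this deck by encoding them in the same dictionary shape.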

Page 53: ITIL v3 Service Operation

IT Operations Control

• Objective: IT Operations Control aims to monitor and control the IT services and their underlying infrastructure. The process IT Operations Control executes day-to-day routine tasks related to the operation of infrastructure components and applications. This includes job scheduling, backup and restore activities, print and output management, and routine maintenance.

• Part of: Service Operation

• Process Owner: IT Operations Manager

Page 54: ITIL v3 Service Operation

Process Description

• ITIL does not provide a detailed explanation of all aspects of IT Operations, as the activities to be carried out will depend on the specific applications and infrastructure components in use. Rather, ITIL 2011 highlights common operational activities and assists in identifying important interfaces with other Service Management processes. The official ITIL publications treat IT Operations Control as a "function". The process overview of IT Operations Control shows the most important interfaces (see Figure 1).

• Remark: In ITIL V3, IT Operations Control activities were covered in the process "IT Operations Management".

Page 55: ITIL v3 Service Operation
Page 56: ITIL v3 Service Operation

Roles | Responsibilities

• IT Operations Manager - Process Owner. An IT Operations Manager will be needed to take overall responsibility for a number of Service Operation activities. For instance, this role will ensure that all day-to-day operational activities are carried out in a timely and reliable way.

• IT Operator - IT Operators are the staff who perform the day-to-day operational activities. Typical responsibilities include performing backups, ensuring that scheduled jobs are performed, and installing standard equipment in the data center.

Page 57: ITIL v3 Service Operation

Responsibility Matrix: IT Operations Control

ITIL Role / Sub-Process | IT Operations Manager | IT Operator
IT Operations Control (no sub-processes specified) | A[1] | R[2]

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the IT Operations Control process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within IT Operations Control.

Page 58: ITIL v3 Service Operation

IT Facilities Management

• Objective: The objective of ITIL Facilities Management is to manage the physical environment where the IT infrastructure is located. Facilities Management includes all aspects of managing the physical environment, for example power and cooling, building access management, and environmental monitoring.

• Part of: Service Operation

• Process Owner: Facilities Manager

Page 59: ITIL v3 Service Operation

Process Description

• ITIL Facilities Management is part of ICT Infrastructure Management in ITIL V2, where some aspects of managing facilities are described in more detail than in the new ITIL V3 books.

• Interfaces between Facilities Management and the other ITIL processes were adjusted in order to reflect the new ITIL V3 process structure. The process overview of ITIL Facilities Management shows the most important interfaces (see Figure 1).

• Note: The official ITIL publications treat Facilities Management as a "function".

Page 60: ITIL v3 Service Operation
Page 61: ITIL v3 Service Operation

Roles | Responsibilities

• Facilities Manager – (Process Owner) The Facilities Manager is responsible for managing the physical environment where the IT infrastructure is located. This includes all aspects of managing the physical environment, for example power and cooling, building access management, and environmental monitoring.

Responsibility Matrix: ITIL Facilities Management

ITIL Role / Sub-Process | Facilities Manager
Facilities Management (no sub-processes specified) | A[1]R[2]

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the ITIL Facilities Management process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within ITIL Facilities Management.

Page 62: ITIL v3 Service Operation

ITIL Application Management

• Objective: ITIL Application Management is responsible for managing applications throughout their lifecycle. This process plays an important role in the application-related aspects of designing, testing, operating and improving IT services, as well as in developing the skills required to operate the IT organization's applications. Application Management is an ongoing activity, as opposed to Application Development which is typically a one-time set of activities to construct applications.

• Part of: Service Operation

• Process Owner: Applications Analyst

Page 63: ITIL v3 Service Operation

Process Description

• Application Management is treated in ITIL as a "function". It plays an important role in the management of applications and systems.

• Many Application Management activities are embedded in various ITIL processes - but not all Application Management activities. For this reason, at IT Process Maps we decided to introduce an Application Management process as part of the ITIL Process Map which contains the Application Management activities not covered in any other ITIL process.

• Application Management activities embedded in other processes are shown there, with responsibility assigned to the Applications Analyst role.

• The process overview of ITIL Application Management shows the most important interfaces (see Figure 1).

Page 64: ITIL v3 Service Operation
Page 65: ITIL v3 Service Operation

Definitions / Roles | Responsibilities

• Skills Inventory - identifies the skills required to deliver IT services (now and in future), as well as the individuals who possess those skills. The Skills Inventory is the basis for developing training plans for individual employees.

• Applications Analyst - Process Owner - is an Application Management role which manages applications throughout their lifecycle. There is typically one Applications Analyst or team of analysts for every key application. This role plays an important part in the application-related aspects of designing, testing, operating and improving IT services. It is also responsible for developing the skills required to operate the applications required to deliver IT services.

Responsibility Matrix: ITIL Application Management

ITIL Role / Sub-Process | Applications Analyst
Application Management (no sub-processes specified) | A[1]R[2]

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the ITIL Application Management process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Application Management.

Page 66: ITIL v3 Service Operation

ITIL Technical Management

• Objective: ITIL Technical Management provides technical expertise and support for the management of the IT infrastructure. Technical Management plays an important role in the technical aspects of designing, testing, operating and improving IT services, as well as in developing the skills required to operate the IT infrastructure.

• Part of: Service Operation

• Process Owner: Technical Analyst

Page 67: ITIL v3 Service Operation

Process Description

• Technical Management is treated in ITIL as a "function". It plays an important role in the management of the IT infrastructure.

• Many Technical Management activities are embedded in various ITIL processes - but not all Technical Management activities. For this reason, at IT Process Maps we decided to introduce a Technical Management process as part of the ITIL Process Map which contains the Technical Management activities not covered in any other ITIL process.

• Technical Management activities embedded in other processes are shown there, with responsibility assigned to the Technical Analyst role.

• The process overview of ITIL Technical Management shows the most important interfaces (see Figure 1).

Page 68: ITIL v3 Service Operation
Page 69: ITIL v3 Service Operation

Roles | Responsibilities

• Technical Analyst - Process Owner - is a Technical Management role which provides technical expertise and support for the management of the IT infrastructure. There is typically one Technical Analyst or team of analysts for every key technology area. This role plays an important part in the technical aspects of designing, testing, operating and improving IT services. It is also responsible for developing the skills required to operate the IT infrastructure.

Responsibility Matrix: ITIL Technical Management

ITIL Role / Sub-Process | Technical Analyst
Technical Management (no sub-processes specified) | A[1]R[2]

Remarks

[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the ITIL Technical Management process.

[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within ITIL Technical Management.

Page 70: ITIL v3 Service Operation

ITIL roles and boards - Service Operation

• 1st Level Support - The responsibility is to register and classify received Incidents and to undertake an immediate effort in order to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also processes Service Requests and keeps users informed about their Incidents' status at agreed intervals.

• 2nd Level Support - takes over Incidents which cannot be solved immediately with the means of 1st Level Support. If necessary, it will request external support, e.g. from software or hardware manufacturers. The aim is to restore a failed IT Service as quickly as possible. If no solution can be found, the 2nd Level Support passes on the Incident to Problem Management.

• 3rd Level Support - is typically located at hardware or software manufacturers (third-party suppliers). Its services are requested by 2nd Level Support if required for solving an Incident. The aim is to restore a failed IT Service as quickly as possible.

• Access Manager - grants authorized users the right to use a service, while preventing access to non-authorized users. The Access Manager essentially executes policies defined in Information Security Management.

• Facilities Manager - is responsible for managing the physical environment where the IT infrastructure is located. This includes all aspects of managing the physical environment, for example power and cooling, building access management, and environmental monitoring.

• Incident Manager - is responsible for the effective implementation of the Incident Management process and carries out the corresponding reporting. He represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.

• IT Operations Manager - will be needed to take overall responsibility for a number of Service Operation activities. For instance, this role will ensure that all day-to-day operational activities are carried out in a timely and reliable way.

• IT Operator - IT Operators are the staff who perform the day-to-day operational activities. Typical responsibilities include: Performing backups, ensuring that scheduled jobs are performed, installing standard equipment in the data center.

• Major Incident Team - A dynamically established team of IT managers and technical experts, usually under the leadership of the Incident Manager, formed to concentrate on the resolution of a Major Incident.

• Problem Manager - is responsible for managing the lifecycle of all Problems. His primary objectives are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. To this purpose he maintains information about Known Errors and Workarounds.

• Service Request Fulfilment Group - These groups specialize in the fulfilment of certain types of Service Requests. Typically, 1st Level Support will process simpler requests, while others are forwarded to the specialized Fulfilment Groups.
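
The hand-over between the support levels described above can be sketched as a simple chain: each level tries to resolve the Incident, and if none succeeds the Incident is passed on to Problem Management. This is a simplification under stated assumptions: the function name `escalate` and the handler signature are hypothetical, and in practice 3rd Level Support is called in by 2nd Level Support rather than forming an independent step.

```python
from typing import Callable, Optional

# A handler takes an Incident description and returns a solution,
# or None if that support level cannot resolve it (assumed interface).
Handler = Callable[[str], Optional[str]]


def escalate(
    incident: str, levels: list[tuple[str, Handler]]
) -> tuple[str, Optional[str]]:
    """Try each support level in order and return (level name, solution).

    If no level finds a solution, the Incident is handed to Problem
    Management and the solution is None.
    """
    for name, handler in levels:
        solution = handler(incident)
        if solution is not None:
            return name, solution
    return "Problem Management", None
```

For example, with a 1st Level handler that only knows password resets and a 2nd Level handler that knows database faults, a database Incident is resolved at 2nd Level, while an unrecognized fault ends up with Problem Management.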