11 itil v3 foundation notes_ service operation [2] - pmp exam, agile pmi-acp, itil certification...

ITIL FOUNDATION

ITIL v3 Foundation Notes: Service Operation [2]
BY EDWARD CHUNG · DECEMBER 18, 2014

[ITIL v3 Foundation Notes] Other processes of the Service Operation phase for the ITIL v3 Foundation Certification exam are covered here, including the Incident Management Process and the Problem Management Process. The purpose, objectives, and scope of the processes and their importance in the Service Operation lifecycle stage are addressed. Definitions of some generic concepts are also discussed: incident, impact, urgency and priority, problem, workaround, known error and known error database (KEDB).

Requests vs Incidents vs Problems
Requests are not incidents as no service has been impacted
Incident Management Process and Problem Management Process are two of the most important processes in ITIL and are often the first ones to be implemented
Incident Management – fix faults as quickly as possible to resume service; incidents will NEVER become problems
Problem Management – find the root cause to prevent faults from happening again, to improve overall quality and free up resources needed to deal with repeated incidents

Incident Management Process
[definition] An incident is defined as an unplanned interruption to an IT service, a reduction in the quality of an IT service, or a failure of a CI (configuration item) that has not yet impacted an IT service.
Incident Management is responsible for progressing all incidents from reporting to closure – usually the responsibility of the service desk.

Purpose

to restore normal service operation (defined in SLA) as soon as possible and minimize impact to business operations

Objectives
ensure all incidents are responded to (logged, managed, resolved and reported) efficiently with standard procedures according to business priority
improve customer satisfaction

Scope
handle all incidents (any event which disrupts, or which could disrupt, a service), whether reported through the service desk or raised by event management tool alerts

Concepts and definitions
Timescales – time is of the essence; log the time taken and seek improvement
Incident Models – incident templates with the necessary steps to resolve common incidents, allowing faster resolution (stored in the SKMS)
Major Incidents – define what constitutes a major incident and follow predefined procedures; users need to be informed of the progress
Incident Status – the current status of the incident
  Open – identified and logged
  Assigned – sent to a support team
  Allocated or In Progress – a support technician has been allocated
  On Hold – cannot contact the user
  Resolved – work completed but not yet confirmed by the customer, or awaiting automatic closure
  Closed – accepted by the user
Expanded Incident Lifecycle – used by the Service Design availability management process and within CSI; breaks down each step of an incident for closer examination of its impact
Impact – a measure of the effect of an incident, problem, or change on business processes. Impact is often based on how service levels will be affected. Impact and urgency are used to assign priority
Urgency – a measure of how long it will be until an incident, problem or change has a significant impact on the business
Priority – a category used to identify the relative importance of an incident, problem or change, based on impact and urgency. High priority (Priority 1) is given to an incident with high impact and high urgency.
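
The way impact and urgency combine into a priority can be sketched as a small lookup. This is only an illustration: the three-step `Level` scale and the 1-to-5 priority codes are assumptions, and the only cell taken from these notes is that high impact with high urgency gives Priority 1. Real organizations define their own matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> int:
    """Return a priority code from 1 (highest) to 5 (lowest).

    Only the HIGH/HIGH -> 1 corner comes from the notes; the rest of the
    mapping is an assumed, evenly graded matrix.
    """
    return 7 - (impact + urgency)


print(priority(Level.HIGH, Level.HIGH))  # 1
print(priority(Level.HIGH, Level.LOW))   # 3
print(priority(Level.LOW, Level.LOW))    # 5
```

Because priority is derived from the two inputs rather than stored on its own, re-assessing impact or urgency during the incident lifecycle changes the priority automatically.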

Lifecycle of Incidents
1. Incident Identification – detect an incident through event management before the user notices or reports it (a reactive process)

2. Incident Logging – log ALL incidents for service level management reporting and problem management
   unique reference number
   incident category, impact, urgency and priority, symptoms, steps to resolution and known errors
   time from logging to closure
   how to identify

3. Incident Categorization – use a simple categorization for effective implementation

4. Incident Prioritization – consider business impact and urgency; to be completed in a pre-agreed time depending on the priority, which may change during the lifecycle

5. Initial Diagnosis – the service desk diagnoses the fault and tries to resolve it with the known error database (maintained by problem management), incident models or other tools (incident matching)

6. Incident Escalation – the incidents are owned by the service desk (need to track till closure)
   functional escalation – the service desk is unable to solve the incident within a given time
   hierarchic escalation – inform management of major incidents / incidents not progressing based on SLA target time

7. Investigation and Diagnosis – try to find out what has happened and how to resolve it

8. Resolution and Recovery – test potential resolutions to ensure the incident has been solved without causing adverse consequences

9. Incident Closure – contact the user to verify, review the categorization and finish the documentation. Closed incidents may be reopened if the incident resurfaces. Any appropriate function can close the incident.
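
The incident statuses and the lifecycle above can be pictured as a small state machine. The sketch below is a hedged illustration only: the status names come from these notes, but the `ALLOWED` transition table, the `Incident` class and the reference number are assumptions made for the example, not something ITIL prescribes.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "In Progress"
    ON_HOLD = "On Hold"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumed transition table: the notes name the statuses but not the allowed moves.
ALLOWED = {
    Status.OPEN: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.ON_HOLD, Status.RESOLVED},
    Status.ON_HOLD: {Status.IN_PROGRESS},
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},
    Status.CLOSED: {Status.OPEN},  # a closed incident may be reopened
}


class Incident:
    def __init__(self, reference: str) -> None:
        self.reference = reference
        self.status = Status.OPEN
        self.history = [Status.OPEN]  # every status the incident has held

    def move_to(self, new_status: Status) -> None:
        """Change status, rejecting moves the transition table does not list."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        self.status = new_status
        self.history.append(new_status)


incident = Incident("INC-0001")  # hypothetical unique reference number
for step in (Status.ASSIGNED, Status.IN_PROGRESS, Status.RESOLVED, Status.CLOSED):
    incident.move_to(step)
print(incident.status.value)  # Closed
```

Keeping the full status history on the record supports the logging requirement, since the time spent in each status can later be reported to service level management.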

Interfaces with other stages
[Service Design] Service Level Management, Information Security Management, Capacity Management, Availability Management
[Service Transition] Change Management, Service Asset and Configuration Management – to identify impact of problems
[Service Operation] Problem Management, Access Management – security breaches / unauthorized access

Problem Management Process
[definition] A problem is defined as an underlying cause of one or more incidents. The cause is not usually known at the time a problem record is created, and the problem management process is responsible for further investigation.
[definition] A known error is a problem that has a documented root cause and a workaround. Known errors are created and managed throughout their lifecycle by problem management. Known errors may also be identified by development or suppliers.
[definition] A workaround is a way of reducing or eliminating the impact of an incident or problem for which a full resolution is not yet available. Workarounds for known errors are documented in known error records. The problem will remain open in this case as the problem is not fully resolved.
Problem Management is the process that investigates the root cause of incidents and implements a permanent solution / workaround to prevent them from happening again
Not visible to the users / business
Incidents will not become problems, they must be handled separately
Although incident and problem management are separate processes, they are closely related and will typically use the same tools, and may use similar categorization, impact and priority coding systems. This will ensure effective communication when dealing with related incidents and problems.
The time to resolve a problem cannot be defined in the SLA

Purpose

to document, investigate, and remove causes of incidents
to provide workarounds

Objectives
prevent problems from happening
eliminate recurring incidents
minimize impact of incidents that cannot be prevented

Scope
diagnose the root cause of incidents
take steps to eliminate them (with other processes, in particular the change management process)
document problems, workarounds and resolutions (maintain the known error database) for more effective handling of similar incidents

Output
Known errors (and entry to KEDB)
Workarounds
Resolutions (may include RFCs)
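
To make the role of the KEDB concrete, here is a minimal sketch of how a service desk tool might match an incident's symptoms against known error records to retrieve a workaround. Everything in it is hypothetical: the `KnownError` fields, the two sample records and the symptom-overlap rule are assumptions for illustration, not part of ITIL or of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnownError:
    reference: str
    symptoms: frozenset
    root_cause: str
    workaround: str


# Hypothetical KEDB contents, invented for this example only.
KEDB = [
    KnownError("KE-001", frozenset({"vpn", "timeout"}),
               "expired gateway certificate", "connect through the backup gateway"),
    KnownError("KE-002", frozenset({"email", "slow"}),
               "mailbox database fragmentation", "use webmail until maintenance completes"),
]


def match_known_error(symptoms: set) -> Optional[KnownError]:
    """Return the known error sharing the most symptoms, or None if nothing overlaps."""
    best = max(KEDB, key=lambda ke: len(ke.symptoms & symptoms), default=None)
    if best is None or not (best.symptoms & symptoms):
        return None
    return best


hit = match_known_error({"vpn", "timeout", "laptop"})
print(hit.workaround if hit else "no match: continue diagnosis or escalate")
```

When no record matches, the incident simply continues through initial diagnosis and escalation; the KEDB speeds up resolution but does not replace it.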

Concepts and definitions
Reactive and Proactive Activities – triggered by incident reports (reactive) / analysis of incident trends (proactive)
Problem Models – handle problems that have not been and will not be resolved (e.g. the cost of a permanent resolution is too high) with a predefined workaround
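
Proactive problem detection through incident trend analysis can be as simple as counting recurring incident categories. The sketch below assumes a hypothetical `problem_candidates` helper, made-up category names and an arbitrary threshold of three incidents; a real process would pick its own criteria.

```python
from collections import Counter


def problem_candidates(incident_categories, threshold=3):
    """Return categories that recur at least `threshold` times, most frequent first."""
    counts = Counter(incident_categories)
    return [category for category, n in counts.most_common() if n >= threshold]


# Hypothetical categories of recently closed incidents.
closed_incidents = ["email", "vpn", "email", "printer", "email", "vpn", "vpn"]
print(problem_candidates(closed_incidents))  # ['email', 'vpn']
```

Each category returned is only a candidate: a problem record would still need to be logged, categorized and prioritized before any investigation starts.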

Lifecycle of Problem Management
1. Detecting Problems – identify problems in reactive / proactive ways

2. Logging Problems – log in the problem record (link to the incidents)

3. Categorizing Problems – same categorization as incident management

4. Prioritizing Problems – depends on impact and urgency

5. Investigating and Diagnosing Problems – uses CMS and KEDB

6. [in some cases] Identifying a Workaround – provides the workaround to the service desk for resolving the incident and reassesses the priority

7. Raising a Known Error Record – after the root cause has been identified and a workaround/solution found, for future reference

8. Problem Resolution – implement the solution through change management (as emergency change)

9. Problem Closure – a permanent solution has been tested and implemented so that the problem will not occur again (user confirmation NOT needed)

Major Problem Review – lessons learned for proactive problem detection

Interfaces with other stages
[Service Strategy] Financial Management for IT Services – to determine whether the solution is financially justified
[Service Design] Availability Management, Capacity Management, IT Service Continuity Management, Service Level Management – problem management supplies the information for solving problems handled by these processes
[Service Transition] Change Management, Service Asset and Configuration Management – to identify impact of problems; Release and Deployment Management – implement the change; Knowledge Management – KEDB
[Continual Service Improvement] The Seven-Step Improvement Process – actions are entered into the CSI register

Conclusion: ITIL v3 Foundation Service Operation
This ITIL v3 Foundation study note touches upon the definition, purpose, objectives and scope of two important processes of Service Operation, namely the Incident Management Process and the Problem Management Process. These two processes also work with processes in other stages of the service lifecycle to provide high quality IT services. Key ITIL concepts are examined, including: incident, impact, urgency, priority, problem, workaround, known error, known error database (KEDB).
