8/8/2019 Lecture2 - Fault Management
1/23
CIT 443: Enterprise Network Management
Fault Management
-
Fault?
An event that causes adverse, unintended, or non-specification operating conditions in or on an enterprise network system
May be masked by automatic error correction routines
May be perceived initially as performance problems
Incidents that recur with increasing frequency may indicate more serious issues
-
Classification of Faults
Event Type    Severity         Response
Incident      Informational    Notice
Problem       Alarm            Alert
Error         Emergency        Caution
Failure       Critical         Warning
-
Fault Management
The process of identifying, locating, documenting, & resolving adverse, unintended, or non-specification operating conditions of enterprise network systems
Includes the necessary policies, processes, &/or procedures for all steps as well
-
Benefits of Fault Management
Reduce down-time
Reduce the need for fire-fighting
Allow more time for other management tasks
-
Elements of Fault Management
System Monitoring
Alarm Processing
Fault Resolution
-
Forouzan, B. A. TCP/IP Protocol Suite, Second Edition. McGraw Hill, 2003.
System Monitoring & Alarm Processing
3 Relevant Protocols
SNMP (v3): Defines the format of packets exchanged between a manager and an agent. It reads and changes the status (values) of objects (variables) in SNMP packets. (Forouzan, p. 625)
MIB (v2): Creates a collection of named objects, their types, and their relationships to each other in an entity to be managed. (Forouzan, p. 625)
SMI (v2): A guideline for SNMP that emphasizes three attributes to handle an object:
1. Name
2. Data Type
3. Encoding Method
-
SNMP: Managers and Agents
Framework for managing devices in an internetwork using the TCP/IP protocol suite.
Manager: Host that runs the SNMP client program
Agent: Host (router, switch, etc.) that runs the SNMP server
Agent maintains information in a database to be queried and/or modified by the manager
Agent can also contribute to the management process by sending unsolicited messages to the manager (traps) to notify of system events
-
SNMP: Three Management Functions
1. Manager can query an agent for information
2. Manager can force an agent to perform a task
3. Agent can contribute to management process (traps)
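
The three functions can be illustrated with a toy, in-memory model. This is only a sketch and not real SNMP: no packets are exchanged, and the names ToyAgent, get, set_value, local_event, and the send_trap callback are hypothetical. The OIDs are the standard MIB-2 ones for sysName.0, ifAdminStatus, and ifOperStatus; the device name and values are made up.

```python
from typing import Callable, Dict, List, Tuple

SYS_NAME = "1.3.6.1.2.1.1.5.0"               # sysName.0 (MIB-2 system group)
IF_ADMIN_STATUS_1 = "1.3.6.1.2.1.2.2.1.7.1"  # ifAdminStatus for interface 1
IF_OPER_STATUS_1 = "1.3.6.1.2.1.2.2.1.8.1"   # ifOperStatus for interface 1
UP, DOWN = 1, 2                              # status values defined by MIB-2


class ToyAgent:
    """Hypothetical in-memory stand-in for an SNMP agent (no network I/O)."""

    def __init__(self, objects: Dict[str, object],
                 send_trap: Callable[[str, object], None]):
        self._objects = dict(objects)  # the agent's database of managed objects
        self._send_trap = send_trap    # how unsolicited messages reach the manager

    def get(self, oid: str) -> object:
        """Function 1: the manager queries the agent for information."""
        return self._objects[oid]

    def set_value(self, oid: str, value: object) -> None:
        """Function 2: the manager forces a task by changing an object's value."""
        self._objects[oid] = value

    def local_event(self, oid: str, value: object) -> None:
        """Function 3: the agent reports a system event without being asked."""
        self._objects[oid] = value
        self._send_trap(oid, value)


# The "manager" side: a list that collects traps, plus direct calls to the agent.
received_traps: List[Tuple[str, object]] = []
agent = ToyAgent(
    {SYS_NAME: "core-sw-1", IF_ADMIN_STATUS_1: UP, IF_OPER_STATUS_1: UP},
    send_trap=lambda oid, value: received_traps.append((oid, value)),
)

name = agent.get(SYS_NAME)                  # 1. query
agent.set_value(IF_ADMIN_STATUS_1, DOWN)    # 2. force a task (disable the port)
agent.local_event(IF_OPER_STATUS_1, DOWN)   # 3. agent-initiated trap
```

In a real deployment the manager and agent would exchange SNMP messages over the network; the callback here only stands in for that transport.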
-
Structure of Management Info
Abstract Syntax Notation One (ASN.1) is used to access information contained within the MIB structure.
A notation system that identifies data structures for reliable encoding, transmission, and decoding of messages.
Nearly all entities managed by SNMP have an object ID that starts with 1.3.6.1.2.1
iso.org.dod.internet.mgmt.mib-2
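
To make the "encoding" part concrete, the sketch below maps the 1.3.6.1.2.1 prefix to its arc names and encodes an object ID into bytes, assuming the Basic Encoding Rules (BER) that SNMP uses with ASN.1. The function names encode_base128 and encode_oid are hypothetical helpers written for this illustration, and only short-form lengths (content under 128 bytes) are handled.

```python
from typing import List

# Arc numbers and names of the MIB-2 prefix quoted on the slide.
MIB2_ARCS = [(1, "iso"), (3, "org"), (6, "dod"),
             (1, "internet"), (2, "mgmt"), (1, "mib-2")]


def encode_base128(value: int) -> List[int]:
    """Encode one sub-identifier; every byte except the last has the high bit set."""
    digits = [value & 0x7F]
    value >>= 7
    while value:
        digits.append(0x80 | (value & 0x7F))
        value >>= 7
    return digits[::-1]


def encode_oid(dotted: str) -> bytes:
    """BER-encode a dotted object ID as tag 0x06, length, content."""
    arcs = [int(part) for part in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an object ID needs at least two arcs")
    # BER packs the first two arcs into a single sub-identifier: 40 * first + second.
    body = encode_base128(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        body.extend(encode_base128(arc))
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x06, len(body)] + body)


mib2_oid = ".".join(str(number) for number, _ in MIB2_ARCS)
mib2_name = ".".join(name for _, name in MIB2_ARCS)
mib2_hex = encode_oid(mib2_oid).hex()
```

Here mib2_oid is "1.3.6.1.2.1", mib2_name is "iso.org.dod.internet.mgmt.mib-2", and mib2_hex is "06052b06010201": the tag 06, a length of 5, then 2b for the combined first two arcs (40 * 1 + 3 = 43) followed by one byte per remaining arc.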
-
Fault Resolution Process
1. Identify the fault
What are the fault symptoms?
What could be the problem?
2. Isolate the fault
3. Prioritize the fault
4. Correct the fault (if possible)
5. Fault Reporting
-
Identify a Fault - Collect Information
Log Network Events
  Through the use of SNMP Traps, etc.
  Which device(s) originated the events?
Watchdog Timers
  Reset with the completion of a given task
  Generate a trap when the timer expires and the task is not complete
Polling
  Periodic monitoring of network activity
  Polled data is often logged to a server
  Useful in trend analysis and resolving intermittent faults
  Useful for resolving problems after the fact
  Polling uses bandwidth; shorter polling intervals require more bandwidth
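
A minimal sketch of two of the ideas on this slide: a watchdog timer that signals when a task has not completed in time, and a back-of-the-envelope estimate of polling bandwidth. The class WatchdogTimer, the function polling_bandwidth_bps, and every number below are assumptions chosen for illustration; timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class WatchdogTimer:
    """Hypothetical watchdog: the monitored task must call reset() before the timeout."""

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.deadline = now + timeout_s

    def reset(self, now: float) -> None:
        """Called when the task completes; pushes the deadline forward."""
        self.deadline = now + self.timeout_s

    def expired(self, now: float) -> bool:
        """True when the task did not complete in time, so a trap should be generated."""
        return now >= self.deadline


def polling_bandwidth_bps(devices: int, objects_per_device: int,
                          bytes_per_object: int, interval_s: float) -> float:
    """Average polling load in bits per second.

    bytes_per_object is an assumed request-plus-response size per polled object.
    """
    return devices * objects_per_device * bytes_per_object * 8 / interval_s


# Watchdog: 30-second timeout, task completes at t=20, then goes silent.
watchdog = WatchdogTimer(timeout_s=30.0, now=0.0)
watchdog.reset(now=20.0)                      # deadline moves to t=50
expired_at_45 = watchdog.expired(now=45.0)    # still within the deadline
expired_at_60 = watchdog.expired(now=60.0)    # deadline passed: generate a trap

# Polling: same network, 5-minute interval versus 1-minute interval.
slow_poll_bps = polling_bandwidth_bps(200, 50, 200, interval_s=300.0)
fast_poll_bps = polling_bandwidth_bps(200, 50, 200, interval_s=60.0)
```

With these assumed figures, cutting the interval from 300 seconds to 60 seconds multiplies the average polling load by five, which is the trade-off the last line of the slide describes.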
-
Isolate the Fault
Look Beyond the Symptoms
Use a Fault Isolation Methodology
  Top Down
  Bottom Up
Intermittent Problems are Difficult!
  Why?
  Attempt to take a snapshot of the network at the time of service interruption
  Take note of recurrence time
  Attempt to correlate data: What is the same?
  Determine if part of a Common Cause Fault (Failure) Group
Root Cause Analysis
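
The "What is the same?" question can be sketched as a set intersection over the network elements each affected device depends on. The function common_elements, the device names, and the upstream paths below are all hypothetical; a shared element is only a candidate for the common cause, not a proven root cause.

```python
from typing import Dict, List


def common_elements(affected: Dict[str, List[str]]) -> List[str]:
    """Return the elements shared by every affected device.

    Each value lists the elements a device depends on, nearest first.
    The result keeps the order of the first device's path.
    """
    paths = list(affected.values())
    if not paths:
        return []
    shared = set(paths[0]).intersection(*paths[1:])
    return [element for element in paths[0] if element in shared]


# Snapshot taken at the time of a service interruption (made-up data).
affected_devices = {
    "ws-101": ["sw-a", "dist-1", "core-1"],
    "ws-102": ["sw-a", "dist-1", "core-1"],
    "ws-207": ["sw-b", "dist-1", "core-1"],
}

candidates = common_elements(affected_devices)
```

Here candidates is ["dist-1", "core-1"]: the access switches differ, so the nearest element common to all three workstations, dist-1, is the first place to look.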
-
Isolate the Fault
-
Prioritize Faults
Not all faults are of the same priority
Determine which faults to take immediate action on and which to defer
Some prioritization can be performed at the help desk level
Divide and conquer
-
Prioritize Faults
Priority by Criticality (rows) and Impact (columns)
Criticality    High Impact    Medium Impact    Low Impact
Low            3              4                5
Medium         2              3                4
High           1              2                3
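
The matrix can be applied as a simple lookup, with 1 as the most urgent priority. The sketch below copies the values from the slide; the function name fault_priority and the sample fault queue are hypothetical.

```python
from typing import Dict, List, Tuple

# (criticality, impact) -> priority, copied from the matrix; 1 is most urgent.
PRIORITY_MATRIX: Dict[Tuple[str, str], int] = {
    ("low", "high"): 3,    ("low", "medium"): 4,    ("low", "low"): 5,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("high", "high"): 1,   ("high", "medium"): 2,   ("high", "low"): 3,
}


def fault_priority(criticality: str, impact: str) -> int:
    """Look up the priority for a fault; raises KeyError on an unknown level."""
    return PRIORITY_MATRIX[(criticality.lower(), impact.lower())]


# Made-up fault queue: (description, criticality, impact).
open_faults: List[Tuple[str, str, str]] = [
    ("printer offline", "low", "low"),
    ("core router flapping", "high", "high"),
    ("slow file server", "medium", "high"),
]

# Work the queue most-urgent first; defer the higher-numbered priorities.
work_order = sorted(open_faults, key=lambda f: fault_priority(f[1], f[2]))
```

Sorting by the looked-up value puts "core router flapping" (priority 1) first and "printer offline" (priority 5) last.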
-
Correct the Fault
Repair, Restore, Replace, then Reevaluate
Remember, faults can be caused by just about anything in the network, including users.
Fixing the underlying fault may require a change in the policies of how users interact with network systems
-
Fault Reporting
Symptoms
Effect on Network Operations
Cause
Resolution
Update Documentation
What is the purpose of reporting?
-
Reporting/Documentation
MTTF (Mean Time To Failure)
MTBF (Mean Time Between Failures)
Failure Rate
MTTR (Mean Time To Repair)
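
The slide only names these metrics, so the sketch below assumes one common set of definitions: MTTR is downtime per failure, MTTF is operating (up) time per failure, MTBF = MTTF + MTTR, and failure rate = 1 / MTTF. Other texts define MTBF and failure rate slightly differently. The function reliability_metrics and the outage log are hypothetical.

```python
from typing import Dict, List, Tuple


def reliability_metrics(observation_h: float,
                        outages: List[Tuple[float, float]]) -> Dict[str, float]:
    """Compute metrics from (start_h, end_h) outage records within an observation window."""
    failures = len(outages)
    if failures == 0:
        raise ValueError("at least one failure is needed to compute these metrics")
    downtime_h = sum(end - start for start, end in outages)
    uptime_h = observation_h - downtime_h

    mttr_h = downtime_h / failures   # average time to repair one failure
    mttf_h = uptime_h / failures     # average operating time before a failure
    mtbf_h = mttf_h + mttr_h         # assumed convention: MTBF = MTTF + MTTR
    return {
        "mttr_h": mttr_h,
        "mttf_h": mttf_h,
        "mtbf_h": mtbf_h,
        "failure_rate_per_h": 1.0 / mttf_h,          # failures per operating hour
        "availability": mttf_h / (mttf_h + mttr_h),  # fraction of time in service
    }


# Made-up log: three outages during a 1000-hour observation window.
metrics = reliability_metrics(1000.0, [(100.0, 102.0), (400.0, 401.0), (900.0, 905.0)])
```

For this log, downtime totals 8 hours over 3 failures, so MTTR is about 2.67 hours, MTTF about 330.67 hours, MTBF about 333.33 hours, and availability 0.992.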
-
Fault Management: Network Entities
PBX
Hubs
Routers
Switches
Servers
Workstations
Firewalls
Intrusion Detection/Prevention Systems
Wireless Access Points
Power Management Systems
Network SCADA systems
Temperature Management Systems (HVAC)
Home Appliances?
Others?
-
Industry Trends
Enterprise Network Management is a key initiative for large companies
All encompassing: Manage every part of the enterprise network
Automate Correlation
  80% of time is spent trying to isolate & determine the fault (root cause analysis)
  Notify a manager or engineer of what to fix
Automate Fault Resolution: Device manager fixes problems local to the box or network comprised of the same components
-
Topics for Further Investigation
1. Technologies for Automating Fault Diagnosis
2. Methods for Automated Fault Resolution
3. Evolution of Protocols for Fault Notification and Trapping
4. Fault Management System Architectures
5. Enterprise Network Management: Best Practices and Lessons Learned
6. Corporate Implementations of Enterprise Network Management Systems
7. Current Issues with Enterprise Network Management
8. Enterprise Network Management of Wireless Networks
9. Enterprise Network Management of Converged Networks with Differentiated Services
-
Forouzan, B. A. TCP/IP Protocol Suite, Second Edition. McGraw Hill, 2003.
Questions?