Analysis of Trouble Tickets Issued by APAN JP NOC
Jin Tanaka, KDDI
APAN NOC Session in Busan, Korea, on 27 August 2003
Agenda
• Introduction to APAN JP Site NOC
• Statistics of Trouble Tickets
• Trouble Analysis
 – Equipment in TokyoXP
 – TransPAC
• Characteristics of Our Trouble
• Proposal for Improving Network Service Level
APAN JP Site NOC
APAN JP Site NOC:
• Location:
 – The NOC is on the 12th floor of the KDDI Otemachi Building in Tokyo; the APAN Tokyo XP equipment is installed on the 5th floor.
• Staff:
 – Operators on standby 24×7
 – Operators also handle operations for other networks: scientific, academic and commercial
• Duties:
 – Opening and closing trouble tickets
 – Receiving problem reports
 – Troubleshooting
 – Development and maintenance of measurement and operation tools
Monitoring Environment
Figure: Monitoring Environment (diagram components: KDDI Circuit Division; operation staff; HP OpenView NNM; Mail & Web client; Physical Layer Monitor; hubs on the KDDI and APAN segments; NOC on 12F; APAN equipment on 5F)
• HP OpenView works independently in the NOC segment.
• The NOC staff use Mail & Web clients to detect alerts.
• KDDI's Physical Layer Monitor system observes the circuits. When an alert is detected, the NOC staff can check the same circuit status that the KDDI Circuit Division sees.
Statistics of Trouble Tickets
• Scope
 – All trouble tickets issued by APAN JP NOC over the last 12 months (August 2002 to July 2003)
 – About 200 tickets in total
 – Ticket-selection rules
  • Trouble
   – All outages on TransPAC are included. Elsewhere, only outages of 15 minutes or more are included.
  • Maintenance
   – All maintenance work is included, even circuit switch-hits of 1 msec or less.
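The selection rules above amount to a simple filter. The Python sketch below is a minimal illustration, assuming a hypothetical Ticket record with kind, site and outage fields; these names are not taken from the NOC's actual tools, and tracking tickets are left out because the rules only cover trouble and maintenance.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    kind: str          # "trouble" or "maintenance"
    site: str          # e.g. "TransPAC north", "CERNET", "Tokyo XP"
    outage: timedelta  # length of the outage or maintenance window


MIN_OUTAGE = timedelta(minutes=15)  # threshold for troubles outside TransPAC


def is_counted(t: Ticket) -> bool:
    """Return True if the ticket falls under the selection rules."""
    if t.kind == "maintenance":
        return True                    # all maintenance, even switch-hits of 1 msec or less
    if t.kind == "trouble":
        if t.site.startswith("TransPAC"):
            return True                # every TransPAC outage is covered
        return t.outage >= MIN_OUTAGE  # elsewhere: 15 minutes or more
    raise ValueError(f"rules cover only trouble and maintenance, not {t.kind!r}")


print(is_counted(Ticket("trouble", "TransPAC south", timedelta(minutes=3))))  # True
print(is_counted(Ticket("trouble", "CERNET", timedelta(minutes=14))))         # False
```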
Statistics of Trouble Tickets: Trouble Tickets on Tokyo XP
Fig1: Trouble Tickets on Tokyo XP (number of tickets by type: Trouble 88, Maintenance 82, Tracking 29)
Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble/Maintenance
Fig2: Number of Monthly Tickets for Trouble/Maintenance (number of tickets per month, 8/2002 to 7/2003; series: Trouble, Maintenance)
Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble
Fig2: Number of Monthly Tickets for Trouble on Circuit/Equipment/Others/Unknown (number of tickets per month, 8/2002 to 7/2003; series: Trouble (Total), Trouble (Circuit), Trouble (Equipment), Trouble (Others), Trouble (Unknown))
Statistics of Trouble Tickets: Number of Monthly Tickets for Maintenance
Fig3: Number of Monthly Tickets for Maintenance on Circuit/Equipment (number of tickets per month, 8/2002 to 7/2003; series: Maintenance (Total), Maintenance (Circuit), Maintenance (Equipment))
Statistics of Trouble Tickets: Total Length of Time of Trouble/Maintenance of APAN Tokyo XP
Fig4: Time Volume of Trouble/Maintenance of APAN Tokyo XP (total time [hh:mm:ss]: Trouble 223:03:37, Maintenance 54:22:17)
Statistics of Trouble Tickets: Total Availability of APAN Network
Fig5: Total Availability of APAN Network (time under service 96.83%, time under trouble 2.55%, time under maintenance 0.62%)
Results of Trouble Ticket Statistics
• The total numbers of trouble and maintenance tickets are almost equal (88 and 82).
• The number of tickets varies mainly with circuit troubles and maintenance; this is especially clear on TransPAC.
• Availability of the whole APAN network is 96.83% (97.45% when maintenance is not counted as outage).
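The availability figures can be reproduced from the Fig4 totals, assuming they are total outage time divided by the 365-day reporting period (August 2002 to July 2003). The helper names below are hypothetical; under that assumption the sketch gives 96.83% with maintenance counted as outage and 97.45% without it, matching the values above.

```python
# Reporting period: August 2002 to July 2003, 365 days (no 29 February).
PERIOD_S = 365 * 24 * 3600


def hms_to_seconds(hms: str) -> int:
    """Convert 'hh:mm:ss' (hours may exceed 24) to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


def availability_pct(outages: list[str], period_s: int = PERIOD_S) -> float:
    """Percentage of the period not covered by the given outage totals."""
    down = sum(hms_to_seconds(x) for x in outages)
    return 100.0 * (1 - down / period_s)


trouble, maintenance = "223:03:37", "54:22:17"  # Fig4 totals
print(round(availability_pct([trouble, maintenance]), 2))  # 96.83
print(round(availability_pct([trouble]), 2))               # 97.45
```

The same division gives the Fig5 slices: trouble time is about 2.55% and maintenance time about 0.62% of the year.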
Trouble Analysis
Trouble Analysis: Trouble Tickets Classified by Area
Fig6: Trouble Tickets Classified by Area (number of tickets per site, grouped by area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.; sites: APAN Seoul XP, APAN Taiwan, PHnet, NECTEC, CERNET, AI3-NAIST, AI3-SFC, APAN Genkai XP, CRL, IMnet, KDDI LABS, MAFFIN, NIG, Osaka UNIV, QGPOP, RIKEN, SINET, Softopia, Tokyo UNIV, WIDE, TransPAC north, TransPAC south, Abilene, ANL, Genuity, NISN, StarLight)
Trouble Analysis: Total Outage Time Classified by Area
Fig7: Total Outage Time Classified by Area (total outage time [hh:mm:ss] per site, for the same areas and sites as Fig6)
Trouble Analysis: Average Outage Time Classified by Area
Fig8: Average Outage Time Classified by Area (average outage time [hh:mm:ss] per area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.)
Trouble Analysis: Number of Trouble Tickets by Trouble-occurring Area
Fig9: Number of Trouble Tickets by Trouble-occurring Area (number of tickets per area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.; broken down by cause: Int'l circuit to Seoul XP; Equipment of PHnet; Maintenance at PHnet; Equipment of NECTEC; Int'l circuit to NECTEC; Domestic circuit in China; Equipment of CERNET; Int'l circuit to CERNET; Domestic circuit to Genkai XP; Equipment of Tokyo XP; JGN circuit in Japan; Operation mistake at Tokyo XP; Routing trouble of Tokyo XP; Equipment of Abilene; Equipment of ANL; Equipment of Genuity; Domestic circuit in U.S.A. for TransPAC north; Domestic circuit in U.S.A. for TransPAC south; Equipment of Indiana; Equipment of TransPAC; Int'l circuit to TransPAC north. Annotated on the chart: Equipment of PHnet; Local circuit in China; Equipment of TokyoXP; Routing trouble of TokyoXP; Int'l circuit to TransPAC)
Trouble Analysis: Distribution by Reason for Amount of Troubles
Fig10: Distribution by Reason for Amount of Trouble (share of trouble tickets: Circuit 38.64%, Unknown 31.82%, Equipment 22.73%, Routing 4.55%, Operation mistake 1.14%, Maintenance 1.14%)
Trouble Analysis: Distribution by Reason for Outage Time
Fig11: Distribution by Reason for Outage Time (share of total outage time: Unknown 40.62%, Equipment 36.36%, Circuit 15.97%, Routing 6.68%, Operation mistake 0.37%)
Equipment Trouble Analysis in TokyoXP
Equipment Trouble Analysis in TokyoXP: Classification by Vendor for TokyoXP
Fig12: Classification by Vendor for TokyoXP (Foundry 49.18%, Cisco 32.79%, Juniper 16.39%, Others 1.64%)
Equipment Trouble Analysis in TokyoXP: Classification by Software/Hardware for TokyoXP
Fig13: Classification by Software/Hardware for TokyoXP (Software 45.45%, Others 45.45%, Hardware 9.09%)
Trouble Analysis on TransPAC
Trouble Analysis on TransPAC:
Fig14: Ticket Volume on Northern/Southern links (number of tickets: Northern link 20, Southern link 5)
Fig15: Total Outage Time on Northern/Southern links (total time [hh:mm:ss]: Northern link 15:01:40, Southern link 16:49:00)
Fig16: Ticket Volume on TransPAC links Classified by Circuit/Equipment (number of tickets, Northern / Southern link: Circuit 17 / 4, Equipment 3 / 1)
Fig17: Total Outage Time on TransPAC links Classified by Circuit/Equipment (total time [hh:mm:ss], Northern / Southern link: Circuit 7:27:40 / 4:49:00, Equipment 7:34:00 / 12:00:00)
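Fig17 splits each link's outage time into circuit and equipment causes, so the two parts should add up to the Fig15 totals, and dividing by the Fig14 ticket counts gives the average outage per ticket on each link. A short sketch (the hms helper is hypothetical):

```python
from datetime import timedelta


def hms(text: str) -> timedelta:
    """Parse 'h:mm:ss' into a timedelta."""
    h, m, s = (int(x) for x in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


# link: (equipment outage, circuit outage, number of tickets) from Fig17 and Fig14
links = {
    "Northern": (hms("7:34:00"), hms("7:27:40"), 20),
    "Southern": (hms("12:00:00"), hms("4:49:00"), 5),
}

for name, (equipment, circuit, tickets) in links.items():
    total = equipment + circuit          # should match the Fig15 total
    print(name, total, total / tickets)  # total outage, average per ticket
# Northern 15:01:40 0:45:05
# Southern 16:49:00 3:21:48
```

The averages make the contrast visible: the southern link had far fewer tickets but much longer outages per ticket, largely because of its single 12-hour equipment trouble.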
Trouble Analysis on TransPAC:
Fig18: Ticket Volume of Circuit Troubles on TransPAC links Classified by Reason (number of tickets; where two values are given, Northern / Southern link: US local 12 / 3, Japan local 2 / 1, Unknown 2, Submarine cable 1)
Fig19: Time Volume of Circuit Troubles on TransPAC links Classified by Reason (total time [hh:mm:ss]; where two values are given, Northern / Southern link: US local 7:27:29 / 3:59:00, Japan local 0:00:02 / 0:50:00, Unknown 0:00:03, Submarine cable 0:00:06)
Trouble Analysis on TransPAC:
Fig20: Ticket Volume of Equipment Troubles on TransPAC links Classified by Reason (number of tickets: StarLight 1, TransPAC and TokyoXP 2, TokyoXP 1, Unknown none shown)
Fig21: Time Volume of Equipment Troubles on TransPAC links Classified by Reason (total time [hh:mm:ss]: StarLight 12:00:00, TransPAC and TokyoXP 6:19:00, TokyoXP 1:15:00)
Trouble Analysis on TransPAC: Availability of TransPAC
Fig22: Availability of TransPAC Northern link (time under service 99.81942%, time under trouble 0.17155%, time under maintenance 0.009028%)
Fig23: Availability of TransPAC Southern link (time under service 99.80732%, time under trouble 0.19197%, time under maintenance 0.000710%)
• Northern link availability = 99.819422% (counting both trouble and maintenance as outage)
• Southern link availability = 99.807319% (counting both trouble and maintenance as outage)
• Total availability = 100 - ((100 - 99.819422) * (100 - 99.807319)) / 100 = 99.999652%, assuming the two links fail independently
• Redundancy is provided by the northern and southern links.
• Fortunately, the two links were never down at the same time.
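The total availability treats the two links as parallel components: TransPAC is unavailable only when both links are down, so the unavailabilities, taken as fractions, multiply. A minimal Python sketch, assuming independent failures; the function name is hypothetical:

```python
def parallel_availability(*links_pct: float) -> float:
    """Availability (%) of redundant links, assuming independent outages.

    The service is down only when every link is down, so the
    unavailability fractions (100 - A) / 100 multiply.
    """
    unavailable = 1.0
    for a in links_pct:
        unavailable *= (100.0 - a) / 100.0
    return 100.0 * (1.0 - unavailable)


north, south = 99.819422, 99.807319  # from Fig22 and Fig23
print(f"{parallel_availability(north, south):.6f}")  # 99.999652
```

Dividing the product by 100 is what keeps the result in percent; multiplying the two percentage gaps directly would give 99.965%, which is not the intended figure.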
Characteristics of Our Trouble
Characteristics of Our Trouble:
• Average outage time per trouble: 2:32:09
• Longest outage time per trouble: 34:45:00

Table1: APAN Network Outages
Outage time (minutes)   Number of tickets   Share
0<                      18                  20.45%
10<                     13                  14.77%
30<                     16                  18.18%
60<                     19                  21.59%
120<                    10                  11.36%
240<                    4                   4.55%
480<                    5                   5.68%
960<                    3                   3.41%
Total                   88

Fig22: APAN Network Outages; Fig23: Distribution of APAN Network Outages by Length of Time (number of tickets per outage-time class, same data as Table1)
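Table1's share column follows directly from the ticket counts, and placing a single outage in a row is a lookup over the class bounds. In this sketch, row "N<" is assumed to mean longer than N minutes up to the next bound; the slides do not say whether the bounds are inclusive, and the function names are hypothetical.

```python
from bisect import bisect_left

# Lower bounds (minutes) of the outage-time classes in Table1.
EDGES = [0, 10, 30, 60, 120, 240, 480, 960]


def outage_class(minutes: float) -> int:
    """Row index for an outage, assuming rows (0,10], (10,30], ..., (960, inf)."""
    if minutes <= 0:
        raise ValueError("outage length must be positive")
    return bisect_left(EDGES, minutes) - 1


def shares(counts: list[int]) -> list[float]:
    """Percentage of tickets in each class, rounded to two decimals."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


counts = [18, 13, 16, 19, 10, 4, 5, 3]  # Table1
print(sum(counts), shares(counts))
# 88 [20.45, 14.77, 18.18, 21.59, 11.36, 4.55, 5.68, 3.41]
```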
Characteristics of Our Trouble:
• 70% of all troubles were cleared within 60 minutes.
• Equipment troubles are prominent and in many cases cause long outages.
 – Using housing sites and cooperating with vendors are important.
• Domestic troubles are frequent, but their average outage time is short.
 – Sharing trouble information internationally is difficult (time zones, language).
• Troubles on lower layers, such as Layer 1 (circuits) and Layer 2 (Ethernet switches), are prominent.
• Redundant circuits and equipment, as on the TransPAC network, will help shorten outage time.
Proposal for Improving Network Service Level
• Shortening of trouble-handling time
 – Start trouble handling and announce the information quickly
  • Operation tools that let us issue trouble tickets automatically and announce information quickly
 – Shorten troubleshooting time
  • Remote troubleshooting from other areas (cf. Router Proxy on Global NOC)
 – These are under examination at TokyoXP
• Worldwide information sharing
 – Installation of a shared information server providing the following information:
  • Performance and operation status of the whole APAN network (cf. Animated Traffic Map on Global NOC)
  • Trouble and maintenance information
  • Syslog of routers in XPs and APs
 ※ Such a server should preferably be installed at a commercial ISP, outside the APAN networks.
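As an illustration of the kind of operation tool proposed above, here is a minimal sketch that turns monitoring alerts into numbered tickets and a one-line announcement. Everything in it (the Alert and TicketIssuer classes, the APAN-JP ticket prefix, the message format) is hypothetical. Timestamps are converted to UTC because time zones are one of the obstacles to sharing trouble information internationally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from itertools import count


@dataclass
class Alert:
    """Hypothetical alert record as a monitoring system might deliver it."""
    source: str            # e.g. router, circuit or link name
    message: str           # short description of the problem
    detected_at: datetime  # timezone-aware detection time


class TicketIssuer:
    """Hypothetical tool: opens numbered trouble tickets from alerts."""

    def __init__(self, prefix: str = "APAN-JP") -> None:
        self._prefix = prefix
        self._seq = count(1)  # ticket serial numbers 1, 2, 3, ...
        self.open_tickets: dict[str, Alert] = {}

    def issue(self, alert: Alert) -> str:
        """Open a ticket for the alert and return its ID."""
        ticket_id = f"{self._prefix}-{next(self._seq):05d}"
        self.open_tickets[ticket_id] = alert
        return ticket_id

    def announcement(self, ticket_id: str) -> str:
        """One-line notice with the detection time in UTC."""
        alert = self.open_tickets[ticket_id]
        when = alert.detected_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        return f"[{ticket_id}] Trouble on {alert.source} since {when}: {alert.message}"


jst = timezone(timedelta(hours=9))
issuer = TicketIssuer()
tid = issuer.issue(Alert("TransPAC north", "link down", datetime(2003, 8, 27, 9, 0, tzinfo=jst)))
print(issuer.announcement(tid))
# [APAN-JP-00001] Trouble on TransPAC north since 2003-08-27 00:00 UTC: link down
```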
Proposal for Improving Network Service Level:
• Redundant network configuration
 – The TransPAC links show that a redundant configuration is very effective in achieving high availability. We should build redundant configurations wherever possible.
• Monitoring of lower layers
 – For the operation of worldwide networks, it is very important to check the status of international circuits in cooperation with circuit carriers.
 – Possible use of new Ethernet technologies, e.g.:
  • BNDP – Bridge Neighbor Discovery Protocol
  • LFS – Link Fault Signaling (10GbE: 802.3ae)
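Link Fault Signaling in 802.3ae lets each end of a 10GbE link learn whether a fault is on its own receive path or is being reported by the far end. The sketch below models only the Reconciliation Sublayer's reaction: on Local Fault it stops data and sends Remote Fault, on Remote Fault it stops data and sends Idle, otherwise it sends MAC data. It is a simplified model that ignores how fault sequences are detected and debounced, and the enum and function names are hypothetical.

```python
from enum import Enum


class RxStatus(Enum):
    """What the Reconciliation Sublayer (RS) is currently receiving."""
    OK = "no fault"
    LOCAL_FAULT = "Local Fault"    # the local PHY detected a receive-path fault
    REMOTE_FAULT = "Remote Fault"  # the link partner reports it sees a fault


class TxAction(Enum):
    """What the RS transmits in response."""
    MAC_DATA = "MAC data"
    REMOTE_FAULT = "Remote Fault"
    IDLE = "Idle"


def rs_reaction(rx: RxStatus) -> TxAction:
    """Simplified 802.3ae link fault signaling reaction of one RS."""
    if rx is RxStatus.LOCAL_FAULT:
        return TxAction.REMOTE_FAULT  # stop data, tell the far end we cannot receive
    if rx is RxStatus.REMOTE_FAULT:
        return TxAction.IDLE          # far end cannot receive: stop data, send Idle
    return TxAction.MAC_DATA          # no fault status: normal operation


for rx in RxStatus:
    print(f"{rx.value} -> {rs_reaction(rx).value}")
# no fault -> MAC data
# Local Fault -> Remote Fault
# Remote Fault -> Idle
```

In NOC terms, an interface that is sending Remote Fault points at a problem on its own receive side, while one receiving Remote Fault knows the far end is not getting its signal; exposing this state is the kind of lower-layer visibility proposed above.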