analysis of trouble tickets issued by apan jp noc

Post on 09-Jan-2016

Analysis of Trouble Tickets Issued by APAN JP NOC

Jin Tanaka, tanaka@kddnet.ad.jp

KDDI

APAN NOC Session in Busan, Korea, on 27 August 2003

Agenda

• Introduction to APAN JP Site NOC
• Statistics of Trouble Tickets
• Trouble Analysis
– Equipment in TokyoXP
– TransPAC
• Characteristics of Our Trouble
• Proposal for Improving Network Service Level

APAN JP Site NOC

APAN JP Site NOC:

Location:
– Physically located on the 12F of the KDDI Otemachi Bldg in Tokyo; the APAN Tokyo XP equipment is installed on the 5F

Staff:
– Operators stand by 24×7
– The operators are also in charge of operations for other networks
• Scientific, Academic, Commercial

Duties:
– Opening and closing of Trouble Tickets
– Receiving problem reports
– Troubleshooting
– Development and maintenance of measurement and operation tools

APAN JP Site NOC: Monitoring Environment

(Diagram: Operation Staff, OpenView NNM, Mail & Web Client and Physical Layer Monitor in the NOC on 12F; KDDI Circuit Division; hubs labeled KDDI and APAN; APAN Equipment on 5F)

• HP OpenView works independently in the NOC segment.
• The NOC staff use Mail & Web clients to detect alerts.
• The KDDI Physical Layer Monitor system observes circuits. When an alert is detected, we can check the same status as the KDDI Circuit Division.

Statistics of Trouble Tickets

• Objects
– All trouble tickets issued by APAN JP NOC over the last 12 months (August 2002 to July 2003)
– The tickets total about 200
– Issue-selecting rules
• Trouble
– All outages on TransPAC are covered. For other networks, outages of 15 minutes or more are covered.
• Maintenance
– All maintenance work is covered (including circuit switch-hits shorter than 1 msec)
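The issue-selecting rules above can be written down as a small predicate. This is a minimal sketch, not the NOC's actual tooling: the function name `should_issue_ticket`, the `kind` strings and the convention that TransPAC network names start with "TransPAC" are all assumptions made for illustration.

```python
from datetime import timedelta

# Threshold taken from the slide: non-TransPAC outages count from 15 minutes.
MIN_OUTAGE_OTHERS = timedelta(minutes=15)


def should_issue_ticket(kind, network, duration):
    """Return True if an event falls under the issue-selecting rules.

    kind     -- "trouble" or "maintenance" (hypothetical labels)
    network  -- network name, e.g. "TransPAC north" (hypothetical naming)
    duration -- length of the outage or maintenance window as a timedelta
    """
    if kind == "maintenance":
        # All maintenance work is covered, however short.
        return True
    if kind == "trouble":
        if network.startswith("TransPAC"):
            # All outages on TransPAC are covered.
            return True
        # Other networks: only outages of 15 minutes or more.
        return duration >= MIN_OUTAGE_OTHERS
    raise ValueError(f"unknown event kind: {kind!r}")


print(should_issue_ticket("trouble", "TransPAC north", timedelta(seconds=2)))
print(should_issue_ticket("trouble", "SINET", timedelta(minutes=14)))
```

A predicate like this is also the natural first step toward the automatic ticket issuing proposed at the end of the talk.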

Statistics of Trouble Tickets: Trouble Tickets on Tokyo XP

Fig1: Trouble Tickets on Tokyo XP (number of tickets: Trouble 88, Maintenance 82, Tracking 29)

Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble/Maintenance

Fig2: Number of Monthly Tickets for Trouble/Maintenance (y-axis: number of tickets, 0 to 16; x-axis: month, 8/2002 to 7/2003; series: Trouble, Maintenance)

Statistics of Trouble Tickets: Number of Monthly Tickets for Trouble

Fig2: Number of Monthly Tickets for Trouble on Circuit/Equipment/Others/Unknown (y-axis: number of tickets, 0 to 16; x-axis: month, 8/2002 to 7/2003; series: Trouble (Total), Trouble (Circuit), Trouble (Equipment), Trouble (Others), Trouble (Unknown))

Statistics of Trouble Tickets: Number of Monthly Tickets for Maintenance

Fig3: Number of Monthly Tickets for Maintenance on Circuit/Equipment (y-axis: number of tickets, 0 to 16; x-axis: month, 8/2002 to 7/2003; series: Maintenance (Total), Maintenance (Circuit), Maintenance (Equipment))

Statistics of Trouble Tickets: Total Length of Time of Trouble/Maintenance of APAN Tokyo XP

Fig4: Time Volume of Trouble/Maintenance of APAN Tokyo XP (total time [hh:mm:ss]: Trouble 223:03:37, Maintenance 54:22:17)

Statistics of Trouble Tickets: Total Availability of APAN Network

Fig5: Total Availability of APAN Network (Time under Service 96.83%, Time under Trouble 2.55%, Time under Maintenance 0.62%)

Results of Trouble Tickets Statistics

• The total numbers of trouble and maintenance tickets are almost equal
• The number of tickets varies mainly with circuit trouble and maintenance, most visibly on TransPAC
• Availability of the whole APAN network is 96.83% (97.45% when maintenance is excluded from outage)
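The availability figures follow from the total trouble and maintenance times in Fig4. The sketch below reproduces them under one assumption that the slides do not state: the observation period (August 2002 to July 2003) is taken as exactly 365 days. The helper names are illustrative.

```python
def parse_hms(text):
    """Convert 'h:mm:ss' (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Assumption: the 12-month period is counted as 365 days.
PERIOD_SECONDS = 365 * 24 * 3600


def percent_of_period(seconds):
    """Share of the observation period, in percent."""
    return 100 * seconds / PERIOD_SECONDS


trouble_pct = percent_of_period(parse_hms("223:03:37"))      # Fig4, Trouble
maintenance_pct = percent_of_period(parse_hms("54:22:17"))   # Fig4, Maintenance

availability = 100 - trouble_pct - maintenance_pct
availability_excl_maintenance = 100 - trouble_pct

print(f"trouble      {trouble_pct:.2f}%")
print(f"maintenance  {maintenance_pct:.2f}%")
print(f"availability {availability:.2f}% "
      f"({availability_excl_maintenance:.2f}% excluding maintenance)")
```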

Trouble Analysis

Trouble Analysis: Trouble Tickets Classified by Area

Fig6: Trouble Tickets Classified by Area (y-axis: number of tickets, 0 to 45; x-axis: area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.; series: APAN Seoul XP, APAN Taiwan, PHnet, NECTEC, CERNET, AI3-NAIST, AI3-SFC, APAN Genkai XP, CRL, IMnet, KDDI LABS, MAFFIN, NIG, Osaka UNIV, QGPOP, RIKEN, SINET, Softopia, Tokyo UNIV, WIDE, TransPAC north, TransPAC south, Abilene, ANL, Genuity, NISN, StarLight)

Trouble Analysis: Total Outage Time Classified by Area

Fig7: Total Outage Time Classified by Area (y-axis: total time [hh:mm:ss], 0:00:00 to 96:00:00; x-axis: area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.; series: APAN Seoul XP, APAN Taiwan, PHnet, NECTEC, CERNET, AI3-NAIST, AI3-SFC, APAN Genkai XP, CRL, IMnet, KDDI LABS, MAFFIN, NIG, Osaka UNIV, QGPOP, RIKEN, SINET, Softopia, Tokyo UNIV, WIDE, TransPAC north, TransPAC south, Abilene, ANL, Genuity, NISN, StarLight)

Trouble Analysis: Average Outage Time Classified by Area

Fig8: Average Outage Time Classified by Area (y-axis: average outage time [hh:mm:ss], 0:00:00 to 14:24:00; x-axis: area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.)

Trouble Analysis: Number of Trouble Tickets by Trouble-occurring Area

Fig9: Number of Trouble Tickets by Trouble-occurring Area (y-axis: number of tickets, 0 to 30; x-axis: area: Korea, Taiwan, Philippines, Thailand, China, Japan, U.S.A.; series: Int'l circuit to Seoul XP, Equipment of PHnet, Maintenance at PHnet, Equipment of NECTEC, Int'l circuit to NECTEC, Domestic circuit in China, Equipment of CERNET, Int'l circuit to CERNET, Domestic circuit to Genkai XP, Equipment of Tokyo XP, JGN circuit in Japan, Operation mistake at Tokyo XP, Routing trouble of Tokyo XP, Equipment of Abilene, Equipment of ANL, Equipment of Genuity, Domestic circuit in U.S.A. on TransPAC north, Domestic circuit in U.S.A. on TransPAC south, Equipment of Indiana, Equipment of TransPAC, Int'l circuit to TransPAC north; callouts: Equipment of PHnet, Local circuit in China, Equipment of TokyoXP, Routing trouble of TokyoXP, Int'l circuit to TransPAC)

Trouble Analysis: Distribution by Reason for Number of Troubles

Fig10: Distribution by Reason for Number of Troubles (Circuit 38.64%, Unknown 31.82%, Equipment 22.73%, Routing 4.55%, Operation mistake 1.14%, Maintenance 1.14%)
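As a rough cross-check, the shares in Fig10 can be converted back into ticket counts. The sketch assumes the shares are taken over the 88 trouble tickets reported in Fig1; the slide does not state the denominator, so the counts below are implied values, not figures from the talk.

```python
# Assumption: Fig10 shares are fractions of the 88 trouble tickets in Fig1.
TOTAL_TROUBLE_TICKETS = 88

FIG10_SHARES = {
    "Circuit": 38.64,
    "Unknown": 31.82,
    "Equipment": 22.73,
    "Routing": 4.55,
    "Operation mistake": 1.14,
    "Maintenance": 1.14,
}


def implied_counts(shares, total):
    """Round each percentage share back to the nearest whole ticket count."""
    return {reason: round(pct / 100 * total) for reason, pct in shares.items()}


counts = implied_counts(FIG10_SHARES, TOTAL_TROUBLE_TICKETS)
for reason, count in counts.items():
    print(f"{reason:18s} {count:3d}")
print(f"{'Total':18s} {sum(counts.values()):3d}")
```

Under that assumption every share maps cleanly onto a whole number of tickets and the counts add up to 88.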

Trouble Analysis: Distribution by Reason for Outage Time

Fig11: Distribution by Reason for Outage Time (Unknown 40.62%, Equipment 36.36%, Circuit 15.97%, Routing 6.68%, Operation mistake 0.37%)

Equipment Trouble Analysis in TokyoXP

Equipment Trouble Analysis in TokyoXP: Classification by Vendor for TokyoXP

Fig12: Classification by Vendor for TokyoXP (Foundry 49.18%, Cisco 32.79%, Juniper 16.39%, Others 1.64%)

Equipment Trouble Analysis in TokyoXP: Classification by Software/Hardware for TokyoXP

Fig13: Classification by Software/Hardware for TokyoXP (Software 45.45%, Others 45.45%, Hardware 9.09%)

Trouble Analysis on TransPAC

Trouble Analysis on TransPAC:

Fig14: Ticket Volume on Northern/Southern links (number of tickets: Northern link 20, Southern link 5)

Fig15: Total Outage Time on Northern/Southern links (total time [hh:mm:ss]: Northern link 15:01:40, Southern link 16:49:00)

Fig16: Ticket Volume on TransPAC links Classified by Circuit/Equipment (number of tickets, Northern link / Southern link: Equipment 3 / 1, Circuit 17 / 4)

Fig17: Total Outage Time on TransPAC links Classified by Circuit/Equipment (total time [hh:mm:ss], Northern link / Southern link: Equipment 7:34:00 / 12:00:00, Circuit 7:27:40 / 4:49:00)

Trouble Analysis on TransPAC:

Fig18: Ticket Volume of Circuit Troubles on TransPAC links Classified by Reason (number of tickets, Northern link / Southern link: US local 12 / 3, Japan local 2 / 1, Unknown 2 / none, Submarine cable 1 / none)

Fig19: Time Volume of Circuit Troubles on TransPAC links Classified by Reason (total time [hh:mm:ss], Northern link / Southern link: US local 7:27:29 / 3:59:00, Japan local 0:00:02 / 0:50:00, Unknown 0:00:03 / none, Submarine cable 0:00:06 / none)

Trouble Analysis on TransPAC:

Fig20: Ticket Volume of Equipment Troubles on TransPAC links Classified by Reason (number of tickets: Northern link: TransPAC and TokyoXP 2, TokyoXP 1; Southern link: StarLight 1; Unknown: none)

Fig21: Time Volume of Equipment Troubles on TransPAC links Classified by Reason (total time [hh:mm:ss]: Northern link: TransPAC and TokyoXP 6:19:00, TokyoXP 1:15:00; Southern link: StarLight 12:00:00; Unknown: none)
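The per-reason outage times in Fig19 and Fig21 can be checked against the per-link totals in Fig15 and Fig17. The sketch below is only such a cross-check. Where the chart data gives a single value for a reason, attributing it to one link is an assumption; the attribution used here is the one under which the sums match the totals.

```python
def parse_hms(text):
    """Convert 'h:mm:ss' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(seconds):
    """Convert seconds back to 'h:mm:ss'."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Fig19 (circuit) and Fig21 (equipment) values; link attribution of the
# single-valued rows is an assumption.
CIRCUIT = {
    "Northern": ["0:00:03", "0:00:06", "7:27:29", "0:00:02"],
    "Southern": ["3:59:00", "0:50:00"],
}
EQUIPMENT = {
    "Northern": ["6:19:00", "1:15:00"],
    "Southern": ["12:00:00"],
}


def total_seconds(times):
    return sum(parse_hms(t) for t in times)


totals = {}
for link in ("Northern", "Southern"):
    circuit = total_seconds(CIRCUIT[link])
    equipment = total_seconds(EQUIPMENT[link])
    totals[link] = (format_hms(circuit), format_hms(equipment),
                    format_hms(circuit + equipment))
    print(link, *totals[link])
```

The three values printed per link are the circuit total, the equipment total and their sum, which can be compared with Fig17 and Fig15 respectively.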

Trouble Analysis on TransPAC: Availability of TransPAC

Fig22: Availability of TransPAC Northern link (Time under Service 99.81942%, Time under Trouble 0.17155%, Time under Maintenance 0.009028%)

Fig23: Availability of TransPAC Southern link (Time under Service 99.80732%, Time under Trouble 0.19197%, Time under Maintenance 0.000710%)

• Northern link Availability = 99.819422% (trouble and maintenance both counted as outage)
• Southern link Availability = 99.807319% (trouble and maintenance both counted as outage)
• Total Availability = 100 - ( (100 - 99.819422) * (100 - 99.807319) / 100 ) = 99.999652%
• Redundancy is achieved by the northern and southern links
• Fortunately, we had no outage on both links at the same time!
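The total availability treats the two links as a redundant pair that fails only when both links are down at once. A minimal sketch of that calculation follows; it assumes the two links fail independently, which is a modelling assumption and not something the slides measure. Because the inputs are percentages, the product of the two unavailabilities has to be scaled back by 100.

```python
def combined_availability(a_pct, b_pct):
    """Availability (%) of two links in parallel, assuming independent failures."""
    unavailable_a = (100 - a_pct) / 100   # fraction of time link A is down
    unavailable_b = (100 - b_pct) / 100   # fraction of time link B is down
    # The pair is down only when both links are down simultaneously.
    return 100 * (1 - unavailable_a * unavailable_b)


northern = 99.819422
southern = 99.807319
total = combined_availability(northern, southern)
print(f"{total:.6f}%")  # prints 99.999652%
```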

Characteristics of Our Trouble

Characteristics of Our Trouble:

• Average outage time per trouble: 2:32:09
• Longest outage time per trouble: 34:45:00

Table1: APAN Network Outages

Minutes   Number of Tickets
0<        18
10<       13
30<       16
60<       19
120<      10
240<      4
480<      5
960<      3
Total     88

Fig22: APAN Network Outages (y-axis: number of tickets, 0 to 20; x-axis: minutes, classes 0<, 10<, 30<, 60<, 120<, 240<, 480<, 960<)

Fig23: Distribution of APAN Network Outages by Length of Time (0<: 20.45%, 10<: 14.77%, 30<: 18.18%, 60<: 21.59%, 120<: 11.36%, 240<: 4.55%, 480<: 5.68%, 960<: 3.41%)
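The percentage labels in the distribution chart are the ticket counts of Table1 divided by the total of 88. A short sketch of that arithmetic, with the class labels copied from the table as plain strings:

```python
# Ticket counts per outage-length class (minutes), as listed in Table1.
OUTAGE_CLASSES = [
    ("0<", 18),
    ("10<", 13),
    ("30<", 16),
    ("60<", 19),
    ("120<", 10),
    ("240<", 4),
    ("480<", 5),
    ("960<", 3),
]

total_tickets = sum(count for _, count in OUTAGE_CLASSES)

# Share of each class in percent, rounded to two decimals as on the chart.
shares = {label: round(100 * count / total_tickets, 2)
          for label, count in OUTAGE_CLASSES}

for label, count in OUTAGE_CLASSES:
    print(f"{label:>5s} {count:3d} {shares[label]:6.2f}%")
print(f"Total {total_tickets:3d}")
```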

Characteristics of Our Trouble:

• 70% of all troubles are cleared within 60 minutes
• Equipment troubles are noticeable and cause long outages in many cases
– Making use of housing sites and cooperating with vendors is important
• Domestic troubles are noticeable, but their average outage time is short
– Sharing trouble information internationally is difficult (time zones, language)
• Troubles on lower layers, such as Layer 1 (circuit) and Layer 2 (Ethernet switch), are noticeable
• Redundant circuits and equipment, as seen on the TransPAC network, are useful for shortening outage time

Proposal for Improving Network Service Level

Proposal for Improving Network Service Level:

• Shortening of trouble-handling time
– Start trouble-handling and announce the information quickly
• Operation tools that enable us to issue trouble tickets automatically and announce information quickly
– Shorten troubleshooting time
• Remote troubleshooting from other areas (cf. Router Proxy on Global NOC)
– These are under examination in TokyoXP
• Worldwide information sharing
– Installation of a shared information server providing the following information
• Performance and operation status of the whole APAN network (cf. Animated Traffic Map on Global NOC)
• Trouble and maintenance information
• Syslog of routers in XPs and APs
※ It is desirable that such a server be installed at a commercial ISP, distant from the APAN networks.

Proposal for Improving Network Service Level:

• Redundant network configuration
– The TransPAC links show that a redundant configuration is very effective in realizing high availability. It is desirable to establish redundant configurations wherever possible.
• Monitoring of lower layers
– For the operation of worldwide networks, it is very important to check the status of international circuits in cooperation with circuit carriers.
– Possibility of using new Ethernet technologies, e.g.
• BNDP – Bridge Neighbor Discovery Protocol
• LFS – Link Fault Signaling (10GbE: 802.3ae)
