high availability survivable networks

Page 1: high availability survivable networks

High Availability Survivable Networks: When is Reducing MTTR Better than Adding Protection Capacity?

Wayne D. Grover, Anthony Sack

9 October 2007

Presented at: DRCN 2007, 7-10 October 2007, La Rochelle, France

Page 2: high availability survivable networks

High Availability Survivable Networks: When is Reducing MTTR Better than Adding Protection Capacity? DRCN 2007 2

Segue: back to DRCN 2005

Beautiful Ischia, and the closing Panel Discussion.

Question for Dr. Grover (paraphrasing):

“In a network that is already designed for single-failure restorability, to get yet higher availability of services, would you think it is better to add still more spare capacity to increase the dual-failure restorability, or to invest at that point in MTTR reduction to enhance availability?”

GOOD QUESTION! ...and it led to this study.

Page 3: high availability survivable networks

Why this question is so insightful…

To minimize the impact of dual-failure events, we can:

Reduce physical MTTR values for network spans:

physical repairs will happen faster

time spent in an overlapping dual-failure repair state will go down as $\mathrm{MTTR}^2$

as MTTR -> 0 there are no dual failures

Increase the network restorability to dual span failures by adding more spare capacity:

fewer dual-failure pairs will be outage-causing

R2 = 1 means a triple failure will be needed to cause outage!

Which is the best approach? Is there an optimal investment strategy?

Page 4: high availability survivable networks

In a survivable network, MTTR takes on new importance…

The key point is that in an R1=1 survivable network, outage requires two failures that interact in the network (restoration-wise) and whose repair processes overlap in time.

Otherwise the two failures are simply time-successive single failures.

This means that:

In a survivable network, unavailability drops as the square of the MTTR !

Page 5: high availability survivable networks

Illustrating the principle

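[Figure: timeline of two failures and their repair intervals, TTR_Failure_1 and TTR_Failure_2, with arrows labelled "Decreasing time to repair" and "Increasing time to repair".]

Repair intervals that do not overlap: no risk of outage (R1=1).

Repair intervals that overlap, with R2<1: risk of outage. The duration is proportional to the repair overlap time, and reduces as $O(\mathrm{MTTR}^2)$.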
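The timeline idea can be made concrete with a small calculation. The sketch below is illustrative only: the function name `repair_overlap_hours` and the example failure times are assumptions, not values from the study. It computes how long two repair intervals overlap, which is the only period in which an R1=1 network is exposed to outage.

```python
def repair_overlap_hours(start_1, ttr_1, start_2, ttr_2):
    """Return the time (hours) during which both spans are under repair.

    Each failure is described by its start time and its time to repair (TTR).
    """
    end_1 = start_1 + ttr_1
    end_2 = start_2 + ttr_2
    # The overlap runs from the later start to the earlier end, if positive.
    return max(0.0, min(end_1, end_2) - max(start_1, start_2))


# Hypothetical example: the second failure starts 8 h after the first.
long_repairs = repair_overlap_hours(0.0, 12.0, 8.0, 12.0)  # 12 h repairs
short_repairs = repair_overlap_hours(0.0, 6.0, 8.0, 6.0)   # 6 h repairs

print(long_repairs)   # 4.0: the two repairs overlap for four hours
print(short_repairs)  # 0.0: the first repair ends before the second failure
```

In this example, halving the repair time removes the overlap entirely, so the second failure becomes just a time-successive single failure.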

Page 6: high availability survivable networks

Study mandate and design

Explore the trade-off between availability improvement through R2 enhancement and MTTR reduction.

What is the most cost-effective strategy for combined investment in capacity additions and repair time improvements to maximize availability?

Framework: a Total Availability Investment is divided between two uses, ranging from 100% to Dual-failure Restorability (R2) at one extreme to 100% to Physical MTTR Reduction at the other.

Interesting because the response of both variables to increasing investment is not linear.

Page 7: high availability survivable networks

Some Terminology

“R1” is the level of single-failure restorability (range 0 to 1).

“R2” is the level of dual-failure restorability (range 0 to 1 as well).

Examples:

R1 = 1 indicates a network fully restorable to all single failures.

R2 = 0.60 means that 60% of failed working capacity units (or service paths) are restorable to dual failures, on average.
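As a rough illustration of the R2 definition, the sketch below averages restorability over all dual-failure pairs. The per-pair figures and span names are invented for the example, and weighting by failed capacity is one plausible reading of "on average", not necessarily the authors' exact definition.

```python
def dual_failure_restorability(pairs):
    """Capacity-weighted average R2 over dual-failure scenarios.

    `pairs` maps a span pair (i, j) to a tuple
    (failed working units, units restored after both spans fail).
    """
    total_failed = sum(failed for failed, _ in pairs.values())
    total_restored = sum(restored for _, restored in pairs.values())
    if total_failed == 0:
        return 1.0  # nothing failed, so nothing was left unrestored
    return total_restored / total_failed


# Hypothetical three-span example (all numbers invented for illustration).
example_pairs = {
    ("A", "B"): (10, 7),
    ("A", "C"): (10, 5),
    ("B", "C"): (10, 6),
}
r2 = dual_failure_restorability(example_pairs)
print(f"R2 = {r2:.2f}")  # 18 of 30 failed units restored -> R2 = 0.60
```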

Page 8: high availability survivable networks

Typical capacity cost profile of enhancing R2

Any network designed for 100% R1 will always have a non-zero R2 level as well.

This is true even if the R1 network design is optimal.

The R2 vs. cost curve then asymptotically approaches unity: there is always a diminishing return to further capacity investment.

This characteristic curve shape is well known in the literature. Exact shapes for this curve have been found for different networks in, for example, the Ph.D. thesis by Clouqueur.

[Figure: R2 (0.0 to 1.0) versus cost for restorability, with R2inherent marked on the R2 axis and CR1 and CR2 marked on the cost axis.]

Page 9: high availability survivable networks

Properties of MTTR versus expenditure

The shape of the MTTR vs. cost curve is much less certain, but a plausible parametric model seems defensible.

For example, initial investments lead to large MTTR reductions, with diminishing returns thereafter.

Conceivably, however, this curve could also be convex, with initial investments leading to only small reductions, and larger investments required for larger changes.

Both scenarios will be tested in our experimental calculations.

[Figure: MTTR versus cost for MTTR reduction.]

Page 10: high availability survivable networks

Theoretical model development

Consider the availability of an individual reference path in an R1 survivable network (for conceptual investigation, all spans are taken as identical; S: number of spans in the network, N: number of spans on the path):

$$A_{path} = 1 - \left[\, 2N(S-N) + N(N-1) \,\right] \cdot U_{span}^{2} \cdot (1 - R_2)$$

where:

$2N(S-N)$: one failure on the path

$N(N-1)$: both failures on the path

$\left[\, 2N(S-N) + N(N-1) \,\right]$: number of dual-failure scenarios that may cause service path outage

$U_{span}^{2}$: probability of any dual-failure event actually occurring

$R_2$: dual-span failure restorability of the network

• This expression excludes contributions due to triple (or higher-order) independent failure scenarios.
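A minimal sketch of the path-availability expression on this slide, assuming the form A_path = 1 - [2N(S-N) + N(N-1)] * U_span^2 * (1 - R2). The function names and sample numbers are illustrative assumptions, not part of the original study.

```python
def dual_failure_scenarios(n_path, s_total):
    """Count dual-failure scenarios that touch a path of n_path spans."""
    one_on_path = 2 * n_path * (s_total - n_path)  # one failure on the path
    both_on_path = n_path * (n_path - 1)           # both failures on the path
    return one_on_path + both_on_path


def path_availability(n_path, s_total, u_span, r2):
    """Availability of a reference path, ignoring triple and higher failures."""
    scenarios = dual_failure_scenarios(n_path, s_total)
    return 1.0 - scenarios * u_span ** 2 * (1.0 - r2)


# Small hypothetical network: a 3-span path in a 10-span network.
print(dual_failure_scenarios(3, 10))         # 2*3*7 + 3*2 = 48
print(path_availability(3, 10, 0.001, 1.0))  # R2 = 1 leaves no dual-failure outage: 1.0
```

With R2 = 1 the dual-failure term vanishes, which matches the earlier remark that a triple failure is then needed to cause outage.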

Page 11: high availability survivable networks

Relation of unavailability to MTTR

We can now express the span unavailability in terms of MTTR:

Well-known expression:

$$U_{span} = \frac{MTTR}{MTTR + MTTF}$$

Approximation (where λ is the failure rate):

$$U_{span} \approx \lambda \cdot MTTR$$

Also, switch to an unavailability orientation and algebraically simplify the “number of scenarios” term from before:

$$U_{path} = N \left[\, 2S - (N+1) \,\right] \cdot \lambda^{2} \cdot MTTR^{2} \cdot (1 - R_2)$$
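The two steps on this slide can be checked numerically. The sketch below compares the exact span unavailability with the approximation λ·MTTR, and verifies the algebraic simplification 2N(S-N) + N(N-1) = N[2S-(N+1)]. It borrows the failure rate 0.0005 quoted later in the deck; treating it as a per-hour rate, and taking MTTF = 1/λ, are assumptions made here for illustration.

```python
def span_unavailability_exact(mttr, mttf):
    """Steady-state unavailability: MTTR / (MTTR + MTTF)."""
    return mttr / (mttr + mttf)


def span_unavailability_approx(mttr, failure_rate):
    """Approximation U_span ~ lambda * MTTR, good when MTTR << MTTF."""
    return failure_rate * mttr


failure_rate = 0.0005       # assumed to be failures per hour
mttf = 1.0 / failure_rate   # 2000 h, taking MTTF = 1 / lambda
for mttr in (4.0, 12.0, 40.0):  # hours
    exact = span_unavailability_exact(mttr, mttf)
    approx = span_unavailability_approx(mttr, failure_rate)
    print(f"MTTR={mttr:5.1f} h  exact={exact:.6f}  approx={approx:.6f}")

# Check the algebraic simplification of the "number of scenarios" term.
for n in range(1, 8):
    for s in range(n, 30):
        assert 2 * n * (s - n) + n * (n - 1) == n * (2 * s - (n + 1))
```

The relative error of the approximation equals MTTR/MTTF, so it stays small as long as repairs are much shorter than the time between failures.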

Page 12: high availability survivable networks

Unavailability as a function of MTTR and R2

We now have the operative expression that relates unavailability to both MTTR and R2:

$$U_{path} = N \left[\, 2S - (N+1) \,\right] \cdot \lambda^{2} \cdot MTTR^{2} \cdot (1 - R_2)$$

Note that unavailability responds linearly to R2, but to the square of the MTTR or of the failure rate (λ).

Are availability improvements best gained through MTTR improvements alone, or through some type of combined strategy with R2 enhancement?
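The linear versus quadratic response can be seen with a short numerical check. This is a sketch assuming the expression U_path = N[2S-(N+1)]·(λ·MTTR)^2·(1-R2); the baseline MTTR and R2 values are chosen only for illustration.

```python
def path_unavailability(n_path, s_total, failure_rate, mttr, r2):
    """U_path = N[2S - (N+1)] * (lambda * MTTR)^2 * (1 - R2)."""
    scenarios = n_path * (2 * s_total - (n_path + 1))
    return scenarios * (failure_rate * mttr) ** 2 * (1.0 - r2)


# Illustrative baseline (values chosen for the example only).
base = path_unavailability(6, 20, 0.0005, 24.0, 0.5)
half_mttr = path_unavailability(6, 20, 0.0005, 12.0, 0.5)
half_gap = path_unavailability(6, 20, 0.0005, 24.0, 0.75)  # (1 - R2) halved

print(f"halving MTTR divides U_path by {base / half_mttr:.1f}")     # 4.0
print(f"halving (1 - R2) divides U_path by {base / half_gap:.1f}")  # 2.0
```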

Page 13: high availability survivable networks

MTTR and R2 as functions of cost

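To study the economic tradeoff, we define MTTR and R2 as functions of cost:

$$MTTR = f_m(C_m), \qquad R_2 = f_r(C_r)$$

$$U_{path} = N \left[\, 2S - (N+1) \,\right] \cdot \lambda^{2} \cdot \left[ f_m(C_m) \right]^{2} \cdot \left[ 1 - f_r(C_r) \right]$$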

If Cm + Cr is a constant total budget amount, then an optimum split of total investment must exist which minimizes Upath.

What is this optimum split for each of the two MTTR characteristic curve shapes postulated earlier?
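The search for an optimum split can be sketched as a simple sweep over the budget share. The two cost curves `f_m` and `f_r` below are hypothetical shapes invented for this example (fast early MTTR gains, and an R2 that approaches 1 asymptotically); they are not the curves used in the study, and λ is assumed to be a per-hour rate.

```python
import math

N_PATH, S_TOTAL, FAILURE_RATE = 6, 20, 0.0005
BUDGET = 100  # arbitrary cost units


def f_m(c_m):
    """Hypothetical MTTR (hours) after spending c_m: fast early gains."""
    return 4.0 + 36.0 * math.exp(-c_m / 20.0)


def f_r(c_r):
    """Hypothetical R2 after spending c_r: approaches 1 asymptotically."""
    return 1.0 - 0.7 * math.exp(-c_r / 30.0)


def path_unavailability(c_m, c_r):
    """U_path for a given split of the budget between MTTR and R2."""
    scenarios = N_PATH * (2 * S_TOTAL - (N_PATH + 1))
    return scenarios * (FAILURE_RATE * f_m(c_m)) ** 2 * (1.0 - f_r(c_r))


# Sweep the share of the fixed budget given to MTTR reduction.
results = [(c_m, path_unavailability(c_m, BUDGET - c_m))
           for c_m in range(0, BUDGET + 1)]
best_c_m, best_u = min(results, key=lambda item: item[1])
print(f"best split: {best_c_m}% to MTTR, {BUDGET - best_c_m}% to R2, "
      f"U_path = {best_u:.6f}")
```

With these assumed curves the minimum falls strictly inside the range, so neither all-MTTR nor all-R2 spending is best. Different curve shapes would move the optimum, which is the point of the two tests that follow.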

Page 14: high availability survivable networks

Experimental calculations

For experimental calculations, we set a numerical value of 100 “arbitrary cost units” to be divided between MTTR and R2 (i.e. total investment remains constant, only the allocation changes).

Data points from the characteristic curves for both variables were used in the equation just presented to generate new curves for the unavailability of a typical reference path.

Other assumed parameters:

N = 6 (length of the reference path)

S = 20 (number of spans in the network)

λ = 0.0005 (failure rate per span)
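With these parameters the "number of scenarios" term is fixed, so the unavailability expression can also be inverted to ask what MTTR a given R2 would require. The sketch below does this for a target of U_path = 0.001. The target value, the R2 levels and the helper name `mttr_for_target` are illustrative assumptions, as is reading λ as a per-hour rate.

```python
import math

N_PATH = 6             # spans on the reference path
S_TOTAL = 20           # spans in the network
FAILURE_RATE = 0.0005  # per span; per hour is an assumption

SCENARIOS = N_PATH * (2 * S_TOTAL - (N_PATH + 1))  # 6 * (40 - 7) = 198


def mttr_for_target(u_target, r2):
    """MTTR (hours) at which U_path equals u_target for a given R2 < 1."""
    return math.sqrt(u_target / (SCENARIOS * (1.0 - r2))) / FAILURE_RATE


for r2 in (0.5, 0.8, 0.95):
    mttr = mttr_for_target(0.001, r2)
    print(f"R2={r2:.2f}: MTTR <= {mttr:.1f} h keeps U_path <= 0.001")
```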

[Figure: flow of the experimental calculation. The R2 curve (Average R2(i,j) Achieved, 0.0 to 1.0, versus % of Total Budget Spent on R2 Enhancement) and the MTTR curve (MTTR in hours, 0.0 to 40.0, versus % of Total Budget Spent on MTTR Reduction) are combined with N, S, λ in

$$U_{path} = N \left[\, 2S - (N+1) \,\right] \cdot \lambda^{2} \cdot \left[ f_m(C_m) \right]^{2} \cdot \left[ 1 - f_r(C_r) \right]$$

to give Reference Path Unavailability (0.0000 to 0.0025) versus % of Total Budget Spent on MTTR Reduction (balance goes to R2 Enhancement). Panels: Test one: concave MTTR; Test two: convex MTTR.]

Page 15: high availability survivable networks

Experimental R2 curve (both tests)

[Figure: Average R2(i,j) Achieved (0.0 to 1.0) versus % of Total Budget Spent on R2 Enhancement (axis labelled from 100 down to 0).]

Page 16: high availability survivable networks

Experimental MTTR curve – concave (test one)

[Figure: MTTR (hours), 0.0 to 40.0, versus % of Total Budget Spent on MTTR Reduction (0 to 100).]

Page 17: high availability survivable networks

Test one result

[Figure: Reference Path Unavailability (0.0000 to 0.0025) versus % of Total Budget Spent on MTTR Reduction (balance goes to R2 Enhancement), 0 to 100, with Region 1, Region 2 and Region 3 marked.]

Page 18: high availability survivable networks

Test one: discussion

Three distinct regions are evident.

In Region 1, availability benefits greatly from relatively easily obtained initial reductions in MTTR.

In Region 3, MTTR reduction is a matter of diminishing returns. It would have been better to spend the same money on more capacity, to enhance R2.

In Region 2, the overall unavailability is lowest and not very sensitive to exactly how the budget is split between R2 and MTTR.

[Figure: test one result. Reference Path Unavailability (0.0000 to 0.0025) versus % of Total Budget Spent on MTTR Reduction (balance goes to R2 Enhancement), 0 to 100, with Region 1, Region 2 and Region 3 marked.]

Page 19: high availability survivable networks

Experimental MTTR curve – convex (test two)

[Figure: MTTR (hours), 0.0 to 40.0, versus % of Total Budget Spent on MTTR Reduction (0 to 100).]

Page 20: high availability survivable networks

Test two result

[Figure: Reference Path Unavailability (0.0000 to 0.0040) versus % of Total Budget Spent on MTTR Reduction (balance goes to R2 Enhancement), 0 to 100.]

Page 21: high availability survivable networks

Test two discussion

This curve portrays the case where it is very difficult (costly) to obtain any MTTR reductions.

We think this a less plausible shape, but it is worthwhile as a “what if” to show the range of strategies on which this analysis can inform an operator.

In this case, the preferred strategy leans strongly toward capacity addition to enhance R2.

[Figure: test two result. Reference Path Unavailability (0.0000 to 0.0040) versus % of Total Budget Spent on MTTR Reduction (balance goes to R2 Enhancement), 0 to 100.]

Page 22: high availability survivable networks

Concluding comments

This suffices to show that, at least conceptually, an optimum combined strategy in MTTR and R2 investment exists.

A unique and interesting phenomenon arising specifically in the context of networks that are already “R1=1” survivable by design. (Note MTTR has no special role for R1=1 design)

Once a network is R1=1, however, MTTR takes on new importance because thereafter $U \sim O(\mathrm{MTTR}^2)$.

Other factors to consider:

MTTR improvements are probably annual expenses (manpower).

R2 improvement is, however, probably a capital investment.

Added capacity never hurts in a network (throughput, flexibility, growth).

But fast repairs will be directly appreciated by users too.

Page 23: high availability survivable networks

Thank You

(And thanks again to the great question from the DRCN 2005 Panel Discussion !)

www.telus.com www.trlabs.ca www.ualberta.ca