
Page 1: Cascading Failures in  Infrastructure Networks

Cascading Failures in Infrastructure Networks

David Alderson

Ph.D. Candidate

Dept. of Management Science and Engineering

Stanford University

April 15, 2002

Advisors: William J. Perry, Nicholas Bambos

Page 2: Cascading Failures in  Infrastructure Networks

IPAM 4/15/2002 David Alderson

Outline

• Background and Motivation

• Union Pacific Case Study

• Conceptual Framework

• Modeling Cascading Failures

• Ongoing Work

Page 3: Cascading Failures in  Infrastructure Networks

Background

• Most of the systems we rely on in our daily lives are designed and built as networks
– Voice and data communications
– Transportation
– Energy distribution

• Large-scale disruption of such systems can be catastrophic because of our dependence on them

• Large-scale failures in these systems
– Have already happened
– Will continue to happen

Page 4: Cascading Failures in  Infrastructure Networks

Recent Examples

• Telecommunications
– ATM network outage: AT&T (February 2001)
– Frame Relay outage: AT&T (April 1998), MCI (August 1999)

• Transportation
– Union Pacific Service Crisis (May 1997 - December 1998)

• Electric Power
– Northeast Blackout (November 1965)
– Western Power Outage (August 1996)

• All of the above
– Baltimore Tunnel Accident (July 2001)

Page 5: Cascading Failures in  Infrastructure Networks

Public Policy

• U.S. Government interest from 1996 (and earlier)

• Most national infrastructure systems are privately owned and operated
– Misalignment between business imperatives (efficiency) and public interest (robustness)

• Previously independent networks now tied together through common information infrastructure

• Current policy efforts directed toward building new public-private relationships
– Policy & Partnership (CIAO)
– Law Enforcement & Coordination (NIPC)
– Defining new roles (Homeland Security)

Page 6: Cascading Failures in  Infrastructure Networks

Research Questions

Broadly:

• Is there something about the network structure of these systems that contributes to their vulnerability?

More specifically:

• What is a cascading failure in the context of an infrastructure network?

• What are the mechanisms that cause it?

• What can be done to control it?

• Can we design networks that are robust to cascading failures?

• What are the implications for network-based businesses?

Page 7: Cascading Failures in  Infrastructure Networks

Outline

• Background and Motivation

• Union Pacific Case Study

• Conceptual Framework

• Modeling Cascading Failures

• Ongoing Work

Page 8: Cascading Failures in  Infrastructure Networks

Union Pacific Railroad

• Largest RR in North America
– Headquartered in Omaha, Nebraska
– 34,000 track miles (west of Mississippi River)

• Transporting
– Coal, grain, cars, other manifest cargos
– 3rd party traffic (e.g. Amtrak passenger trains)

• 24x7 Operations:
– 1,500+ trains in motion
– 300,000+ cars in system

• More than $10B in revenue annually

Page 9: Cascading Failures in  Infrastructure Networks

Union Pacific Railroad

• Four major resources constraining operations:
– Line capacity (# parallel tracks, speed restrictions, etc.)
– Terminal capacity (in/out tracks, yard capacity)
– Power (locomotives)
– Crew (train personnel, yard personnel)

• Ongoing control of operations is mainly by:
– Dispatchers
– Yardmasters
– Some centralized coordination, primarily through a predetermined transportation schedule

Page 10: Cascading Failures in  Infrastructure Networks

Union Pacific Railroad

• Sources of network disruptions:
– Weather (storms, floods, rock slides, tornados, hurricanes, etc.)
– Component failures (signal outages, broken wheels/rails, engine failures, etc.)
– Derailments (~1 per day on average)
– Minor incidents (e.g. crossing accidents)

• Evidence for system-wide failures
– 1997-1998 Service Crisis

• Fundamental operating challenge

Page 11: Cascading Failures in  Infrastructure Networks

UPRR Fundamental Challenge

Two conflicting drivers:

• Business imperatives necessitate a lean operation that maximizes efficiency and drives the system toward high utilization of available network resources.

• An efficient operation that maximizes utilization is very sensitive to disruptions, particularly because of the effects of network congestion.

Page 12: Cascading Failures in  Infrastructure Networks

Railroad Congestion

There are several places where congestion may be seen within the railroad:

• Line segments

• Terminals

• Operating Regions

• The Entire Railroad Network

• (Probably not locomotives or crews)

Congestion is related to capacity.

Page 13: Cascading Failures in  Infrastructure Networks

UPRR Capacity Model Concepts

Factors Affecting Observed Performance:
• Dispatcher / Corridor Manager Expertise
• On-Line Incidents / Equipment Failure
• Weather
• Temporary Speed Restrictions

[Chart: Line Segment Velocity vs. Volume (trains per day), showing an empirically-derived relationship and the effect of forcing volume in excess of capacity.]

Page 14: Cascading Failures in  Infrastructure Networks

Implications of Congestion

Concepts of traffic congestion are important for two key aspects of network operations:
– Capacity Planning and Management
– Service Restoration

In the presence of service interruptions, the objective of Service Restoration is to:
– Minimize the propagation across the network of any disturbance caused by a service interruption
– Minimize the time to recovery to fluid operations

Page 15: Cascading Failures in  Infrastructure Networks

Modeling Congestion

We can model congestion using standard models from transportation engineering.

Define the relationships between:

• Number of items in the system (Density)

• Average processing rate (Velocity)

• Input Rate

• Output Rate (Throughput)

Page 16: Cascading Failures in  Infrastructure Networks

Modeling Congestion

Velocity vs. Density: Assume that velocity decreases (linearly) with the traffic density:

v(n) = K (1 - n/N)

where K is the free-flow velocity and N is the maximum (jam) density.

[Chart: Velocity (v) vs. Density (n), falling linearly from K at n = 0 to 0 at n = N.]

Page 17: Cascading Failures in  Infrastructure Networks

Modeling Congestion

Throughput vs. Density: Throughput = Velocity · Density

μ(n) = n · v(n) = K (n - n²/N)

Throughput is maximized at n = N/2, with value μ* = N/4 (for K = 1).

[Chart: Throughput (μ) vs. Density (n), a parabola peaking at μ* when n = N/2 and returning to zero at n = N.]
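The claim on this slide is easy to check numerically. A minimal sketch in Python (the values of K and N are arbitrary illustrative choices, not from the talk):

```python
# Linear velocity-density model from the slide: v(n) = K * (1 - n/N),
# so throughput mu(n) = n * v(n) is a parabola peaking at n = N/2.
K = 1.0   # free-flow velocity (illustrative value)
N = 10.0  # maximum (jam) density (illustrative value)

def velocity(n):
    return K * (1 - n / N)

def throughput(n):
    return n * velocity(n)

# Scan densities from 0 to N and locate the throughput peak.
densities = [i * 0.5 for i in range(int(2 * N) + 1)]
peak = max(densities, key=throughput)
print(peak, throughput(peak))  # n = N/2 gives mu* = K*N/4
```

As the slide states, pushing density past N/2 only reduces throughput, which is the seed of congestion collapse.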

Page 18: Cascading Failures in  Infrastructure Networks

Modeling Congestion

[Charts: Velocity vs. Density (falling from K to 0 over densities 0 to N) alongside Throughput vs. Density (peaking at μ*), relating Velocity and Throughput.]

Page 19: Cascading Failures in  Infrastructure Networks

Modeling Congestion

Let p represent the intensity of congestion onset:

v(n) = K (1 - (n/N)^p)
μ(n) = n · v(n) = K · n (1 - (n/N)^p)

[Charts: v(n) = 1 - (n/10)^p and mu(n) = n*v(n) = n * { 1 - (n/10)^p } plotted over n = 0..10 for p = 0.1, 0.25, 0.5, 1, 2, 4, 10.]

Page 20: Cascading Failures in  Infrastructure Networks

Modeling Congestion

It is clear that in the limit p → ∞,

μ(n) = K · n (1 - (n/N)^p)

becomes

μ(n) = K · n for n < N, and 0 otherwise.

[Chart: mu(n) = n * { 1 - (n/10)^p } for p = 0.1, 0.25, 0.5, 1, 2, 4, 10, approaching this sharp cutoff as p grows.]
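The limiting behavior can be seen numerically with the plotted parameterization (K = 1 and N = 10 as in the plots; values of p from the legend):

```python
# Family from the slide: mu(n) = n * K * (1 - (n/N)**p).
# Small p means congestion bites early; large p approaches the ideal
# cutoff K*n for n < N, 0 otherwise.
K, N = 1.0, 10.0

def mu(n, p):
    return n * K * (1 - (n / N) ** p)

# Throughput at a near-jam load n = 9 for increasingly sharp onset p:
for p in (0.5, 1, 10, 1000):
    print(p, round(mu(9.0, p), 3))
```

For large p the node processes at nearly full rate right up to the jam density, which is exactly the "efficient but fragile" regime discussed later.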

Page 21: Cascading Failures in  Infrastructure Networks

Modeling Congestion

[Charts: the Velocity vs. Density and Throughput vs. Density curves shown again, recalling the relationship between Velocity and Throughput.]

Page 22: Cascading Failures in  Infrastructure Networks

UP Service Crisis

• Initiating Event
– 5/97 derailment at a critical train yard outside of Houston

• Additionally
– Loss of BNSF route that was decommissioned for repairs
– Embargo at Laredo interchange point to Mexico

• Complicating Factors
– UP/SP merger and transition to consolidated operations
– Hurricane Danny, fall 1997
– Record rains and floods (esp. Kansas) in 1998

• Operational Issues
– Tightly optimized transportation schedule
– Traditional service priorities

Page 23: Cascading Failures in  Infrastructure Networks

Union Pacific Railroad
Total System Inventory, December 1996 - November 1998

[Chart: Inventory (cars), ranging from roughly 280,000 to 360,000, plotted weekly from Dec-96 through Nov-98. The UP Service Crisis appears as a sustained inventory surge, with annotations for Houston-Gulf Coast, the Central Corridor (Kansas-Nebraska-Wyoming), and Southern California.]

Source: UP Filings with Surface Transportation Board, September 1997 - December 1998

Southern California

Page 24: Cascading Failures in  Infrastructure Networks

Case Study: Union Pacific

Completed Phase 1 of case study:

• Understanding of the factors affecting system capacity, system dynamics

• Investigation of the 1997-98 Service Crisis

• Project definition: detailed study of Sunset Route

• Data collection, preliminary analysis for the Sunset Route

Ongoing work:

• A detailed study of their specific network topology

• Development of real-time warning and analysis tools

Page 25: Cascading Failures in  Infrastructure Networks

Outline

• Background and Motivation

• Union Pacific Case Study

• Conceptual Framework

• Modeling Cascading Failures

• Ongoing Work

Page 26: Cascading Failures in  Infrastructure Networks

Basic Network Concepts

• Networks allow the sharing of distributed resources

• Resource use ⇒ resource load
– Total network usage = total network load

• Total network load is distributed among the components of the network
– Many networking problems are concerned with finding a “good” distribution of load

• Resource allocation ⇔ load distribution

Page 27: Cascading Failures in  Infrastructure Networks

Infrastructure Networks

• Self-protection as an explicit design criterion

• Network components themselves are valuable
– Expensive
– Hard to replace
– Long lead times to obtain

• Willingness to sacrifice current system performance in exchange for future availability

• With protection as an objective, connectivity between neighboring nodes is
– Helpful
– Harmful

Page 28: Cascading Failures in  Infrastructure Networks

Cascading Failures

Cascading failures occur in networks where
– Individual network components can fail
– When a component fails, the natural dynamics of the system may induce the failure of other components

Network components can fail because of initiating events:
– Accident
– Internal failure
– Attack

A cascading failure is not
– A single point of failure
– The occurrence of multiple concurrent failures
– The spread of a virus

Page 29: Cascading Failures in  Infrastructure Networks

Related Work

Cascading Failures:
– Electric Power: Parrilo et al. (1998), Thorp et al. (2001)
– Social Networks: Watts (1999)
– Public Policy: Little (2001)

Other network research initiatives:
– “Survivable Networks”
– “Fault-Tolerant Networks”

Large-Scale Vulnerability:
– Self-Organized Criticality: Bak (1987), many others
– Highly Optimized Tolerance: Carlson and Doyle (1999)
– Normal Accidents: Perrow (1999)
– Influence Models: Verghese et al. (2001)

Page 30: Cascading Failures in  Infrastructure Networks

Our Approach

• Cascading failures in the context of flow networks
– conservation of flow within the network

• Overloading a resource leads to degraded performance and eventual failure

• Network failures are not independent
– Flow allocation pattern ⇒ resource interdependence

• Focus on the dynamics of network operation and control

• Design for robustness (not protection)

Page 31: Cascading Failures in  Infrastructure Networks

Taxonomy of Network Flow Models

Modeling Approach          Quantity of Interest
Static Flow Models         Long-Term Averages         (coarse-grained)
Fluid Approximations       Time-Dependent Averages
Diffusion Approximations   Averages & Variances
Queueing Models            Probability Distributions
Simulation Models          Event Sequences            (fine-grained)

Relevant decisions span the same range: Capacity Planning (coarse-grained models), Failure & Recovery, and Ongoing Operation (Processing & Routing; fine-grained models).

Reference: Janusz Filipiak

Page 32: Cascading Failures in  Infrastructure Networks

Time Scales in Network Operations

Relevant Decisions                        Computer Routing          Railroad Transportation
Ongoing Operation (Processing & Routing)  milliseconds to seconds   minutes to hours
Failure & Recovery                        minutes to hours          days to weeks
Capacity Planning                         days to weeks             months to years

(Short time scales at the top, long time scales at the bottom.)

Page 33: Cascading Failures in  Infrastructure Networks

What Are Network Dynamics?

Type of Network Dynamics   Underlying Assumption
Dynamics ON Networks       Network topology is STATIC
Dynamics OF Networks       Network topology is CHANGING (Failure & Recovery)

Page 34: Cascading Failures in  Infrastructure Networks

Network Flow Optimization

• Original work by Ford and Fulkerson (1956)

• One of the most studied areas for optimization

• Three main problem types– Shortest path problems– Maximum flow problems– Minimum cost flow problems

• Special interpretation for some of the most celebrated results in optimization theory

• Broad applicability to a variety of problems

Page 35: Cascading Failures in  Infrastructure Networks

Single Commodity Flow Problem

Notation:
N    set of nodes, indexed i = 1, 2, … N
A    set of arcs, indexed j = 1, 2, … M
di   demand (supply) at node i
fj   flow along arc j
uj   capacity along arc j
A    node-arc incidence matrix, with entries

aij = +1 if arc j exits node i, -1 if arc j enters node i, 0 otherwise

A set of flows f is feasible if it satisfies the constraints:

Ai f = di  ∀ i ∈ N   (flows balanced at node i, and supply/demand is satisfied)
0 ≤ fj ≤ uj  ∀ j ∈ A   (flow on arc j less than capacity)
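To make the notation concrete, here is a sketch of the incidence matrix and feasibility check on a made-up 3-node, 3-arc network (the instance is illustrative, not from the talk):

```python
# Incidence convention from the slide:
# A[i][j] = +1 if arc j exits node i, -1 if it enters node i, 0 otherwise.
# A flow f is feasible if A f = d and 0 <= f_j <= u_j on every arc.
nodes = [0, 1, 2]
arcs = [(0, 1), (1, 2), (0, 2)]   # arc j = (tail, head), hypothetical example
u = [2.0, 2.0, 1.0]               # arc capacities
d = [3.0, 0.0, -3.0]              # supply at node 0, demand at node 2

A = [[0] * len(arcs) for _ in nodes]
for j, (tail, head) in enumerate(arcs):
    A[tail][j] = 1
    A[head][j] = -1

def is_feasible(f):
    balanced = all(
        abs(sum(A[i][j] * f[j] for j in range(len(arcs))) - d[i]) < 1e-9
        for i in nodes
    )
    within_capacity = all(0 <= f[j] <= u[j] for j in range(len(arcs)))
    return balanced and within_capacity

print(is_feasible([2.0, 2.0, 1.0]))   # 2 units via node 1, 1 unit direct
print(is_feasible([3.0, 3.0, 0.0]))   # violates capacity on arcs (0,1), (1,2)
```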

Page 36: Cascading Failures in  Infrastructure Networks

Single Commodity Flow Problem

Feasible region, denoted F(σ):

Ai f = di  ∀ i ∈ N   (flows balanced at node i)
0 ≤ fj ≤ uj  ∀ j ∈ A   (flow on arc j feasible)

where di = σ if i = s, -σ if i = t, 0 otherwise (source s, sink t)

Page 37: Cascading Failures in  Infrastructure Networks

Minimum Cost Problem

Let cj = cost on arc j.

Minimize over f:  Σj∈A cj fj

subject to:
Ai f = di  ∀ i ∈ N   (flows balanced at node i)
0 ≤ fj ≤ uj  ∀ j ∈ A   (flow on arc j feasible)

where di = σ if i = s, -σ if i = t, 0 otherwise
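A tiny worked instance (made-up, not from the talk): one source, one sink, and two parallel arcs with costs c = [1, 3] and capacities u = [2, 5], shipping σ = 4 units. Since the data are integers, the small feasible set can simply be enumerated; real instances would use an LP or network-simplex solver.

```python
# Minimum cost flow on two parallel s->t arcs (hypothetical instance).
c = [1, 3]    # cost per unit on each arc
u = [2, 5]    # arc capacities
sigma = 4     # units to ship from s to t

best = None
for f1 in range(u[0] + 1):
    f2 = sigma - f1                     # flow balance: f1 + f2 = sigma
    if 0 <= f2 <= u[1]:                 # arc capacity constraints
        cost = c[0] * f1 + c[1] * f2    # objective: sum_j c_j f_j
        if best is None or cost < best[1]:
            best = ((f1, f2), cost)

flows, cost = best
print(flows, cost)  # the cheap arc saturates first: (2, 2) at cost 8
```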

Page 38: Cascading Failures in  Infrastructure Networks

Shortest Path Problem

Let costs cj correspond to “distances”, and set σ = 1.

Minimize over f:  Σj∈A cj fj

subject to:
Ai f = di  ∀ i ∈ N   (flows balanced at node i)
0 ≤ fj ≤ uj = 1  ∀ j ∈ A   (flow on arc j feasible)

where di = 1 if i = s, -1 if i = t, 0 otherwise

Page 39: Cascading Failures in  Infrastructure Networks

Maximum Flow Problem

Maximize over f:  σ

subject to:
Ai f = di  ∀ i ∈ N   (flows balanced at node i)
0 ≤ fj ≤ uj  ∀ j ∈ A   (flow on arc j feasible)

where di = σ if i = s, -σ if i = t, 0 otherwise

Page 40: Cascading Failures in  Infrastructure Networks

Network Optimization

Traditional Assumptions:
– Complete information
– Static network (capacities, demands, topology)
– Centralized decision maker

Solution obtained from global optimization algorithms

Relevant issues:
– Computational (time) complexity
• Function of problem size (number of inputs)
• Based on worst-case data
– Parallelization (decomposition)
– Synchronization (global clock)

Page 41: Cascading Failures in  Infrastructure Networks

New Challenges

Most traditional assumptions no longer hold…

• Modern networks are inherently dynamic
– Connectivity fluctuates, components fail, growth is ad hoc
– Traffic demands/patterns constantly change

• Explosive growth ⇒ massive size scale

• Faster technology ⇒ shrinking time scale

• Operating decisions are made with incomplete, incorrect information

• Claim: A global approach based on static assumptions is no longer viable

Page 42: Cascading Failures in  Infrastructure Networks

Cascading Failures & Flow Networks

• In general, we assume that network failures result from violations of network constraints
– Node feasibility (flow conservation)
– Arc feasibility (arc capacity)

• That is, failure ⇔ infeasibility

• The network topology provides the means by which failures (infeasibilities) propagate

• In the optimization context, a cascading failure is a collapse of the feasible region of the optimization problem that results from the interaction of the constraints when a parameter is changed

Page 43: Cascading Failures in  Infrastructure Networks

Addressing New Challenges

• Extend traditional notions of network optimization to model cascading failures in flow networks
– Allow for node failures
– Include flow dynamics

• Consider solution approaches based on
– Decentralized control
– Local information

• Leverage ideas from dual problem formulation

• Identify dimensions along which there are explicit tensions and tradeoffs between vulnerability and performance

Page 44: Cascading Failures in  Infrastructure Networks

Dual Problem Formulation

Primal Problem:
Min  c^T f
s.t. A f = d
     f ≥ 0
     f ≤ u

Dual Problem:
Max  π^T d - λ^T u
s.t. π^T A - λ^T ≤ c^T
     π unrestricted
     λ ≥ 0

• Dual variables π, λ have interpretation as prices at nodes, arcs

• Natural decomposition as distributed problem
• e.g. Nodes set prices based on local information

• Examples:
• Kelly, Low and many others for TCP/IP congestion control
• Boyd and Xiao for dual decomposition of SRRA problem
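Strong duality can be checked by hand on a made-up two-arc instance (source s, sink t, costs c = [1, 3], capacities u = [2, 5], σ = 4). The prices below are derived via complementary slackness for this instance; they are illustrative, not taken from the talk.

```python
# Two parallel s->t arcs; the optimal primal flow is f = (2, 2) at cost 8.
c = [1.0, 3.0]   # arc costs
u = [2.0, 5.0]   # arc capacities
sigma = 4.0      # demand: d_s = +sigma, d_t = -sigma

primal_cost = c[0] * 2.0 + c[1] * 2.0   # c^T f at the optimal flow (2, 2)

# Complementary slackness: the unsaturated arc carries flow, so its reduced
# cost is zero, giving pi_s - pi_t = c_2 = 3 (normalize pi_t = 0). The
# saturated cheap arc earns congestion rent lambda_1 = 3 - 1 = 2.
pi = {"s": 3.0, "t": 0.0}   # node prices
lam = [2.0, 0.0]            # arc prices (only the tight arc pays)

# Dual objective: pi^T d - lambda^T u
dual_value = pi["s"] * sigma + pi["t"] * (-sigma) - (lam[0] * u[0] + lam[1] * u[1])
print(primal_cost, dual_value)  # strong duality: both equal 8
```

The node price π_s is exactly the marginal cost of pushing one more unit out of s, which is why these duals decompose naturally into local prices.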

Page 45: Cascading Failures in  Infrastructure Networks

Outline

• Background and Motivation

• Union Pacific Case Study

• Conceptual Framework

• Modeling Cascading Failures

• Ongoing Work

Page 46: Cascading Failures in  Infrastructure Networks

Node Dynamics

• Consider each node as a simple input-output system running in discrete time: arrivals a(k), load n(k), departures d(k)…

• Let n(k) = flow being processed in interval k

• Node dynamics:
n(k+1) = n(k) + a(k) - d(k)

• Processing capacity N and state-dependent output:
d(k) = μ(n(k)) if 0 ≤ n(k) ≤ N, 0 otherwise

[Chart: d(k) (performance) as a unimodal function of n(k) (load).]

• For constant a(k), the sign of a(k) - d(k) indicates how n(k) is changing

• n* is the equilibrium point

• The system is feasible for a(k) < μ*

• Node “fails” if n(k) > N
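A direct simulation of this node model, using the earlier congestion curve μ(n) = n(1 - n/N) with K = 1 and N = 10 as an assumed concrete output function (illustrative values, not from the talk):

```python
# Discrete-time node: n(k+1) = n(k) + a(k) - d(k), with state-dependent
# output d(k) = mu(n(k)) = n(k) * (1 - n(k)/N). Peak service rate is
# mu* = N/4 = 2.5; constant arrivals a < mu* settle at an equilibrium n*,
# while a > mu* drives the load past N and the node "fails".
N = 10.0

def mu(n):
    return max(0.0, n * (1 - n / N))

def simulate(a, n0=0.0, steps=500):
    n = n0
    for _ in range(steps):
        n = n + a - mu(n)
        if n > N:
            return None   # node failed: load exceeded capacity
    return n

print(simulate(a=2.0))  # below mu*: converges to an equilibrium n*
print(simulate(a=3.0))  # above mu*: load grows without bound -> failure
```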

Page 47: Cascading Failures in  Infrastructure Networks

Network Dynamics

• The presence of an arc between adjacent nodes couples their behavior: flow departing node 1 arrives at node 2, so that (with arc capacity u1)

a2(k) = min( d1(k), u1(k) )

• Each node i has its own load ni(k) and output curve di(ni)

• Arc capacities limit both outgoing and incoming flow

[Diagram: tandem nodes n1(k) → n2(k), with arrivals a1(k), arc capacity u1(k) between them, and departures d2(k).]

Page 48: Cascading Failures in  Infrastructure Networks

Network Dynamics

• The failure of one node can lead to the failure of another

[Diagram: tandem nodes n1(k) and n2(k); when node 2 fails, the connecting arc capacity drops from u1 to 0.]

• When a node fails, the capacity of its incoming arcs drops effectively to zero

• The upstream node loses the capacity of the arc

• In the absence of control, the upstream node fails too

Result: Node failures propagate “upstream”…

Question:

• How will the network respond to perturbations?
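The upstream-propagation mechanism can be sketched with two tandem nodes, using the same assumed output curve μ(n) = n(1 - n/N); all parameter values here are illustrative, not from the talk:

```python
# Node 1 feeds node 2 over an arc of capacity u1. If node 2 fails, u1
# effectively drops to zero, node 1 can no longer discharge, and (absent
# control) its own load climbs past capacity: the failure moves upstream.
N = 10.0    # per-node jam density
u1 = 3.0    # arc capacity between node 1 and node 2

def mu(n):
    return max(0.0, n * (1 - n / N))

def simulate(a1, steps, node2_fails_at):
    n1, n2 = 0.0, 0.0
    node2_failed = False
    for k in range(steps):
        cap = 0.0 if node2_failed else u1   # a failed node accepts no flow
        d1 = min(mu(n1), cap)               # departures limited by the arc
        n1 = n1 + a1 - d1
        n2 = n2 + d1 - mu(n2)
        if k == node2_fails_at or n2 > N:
            node2_failed = True
        if n1 > N:
            return "node 1 failed upstream"
    return "node 1 survived"

print(simulate(a1=2.0, steps=200, node2_fails_at=50))
```

With a sustainable arrival rate (a1 = 2 < μ* = 2.5) node 1 runs at a healthy equilibrium until node 2 is knocked out at k = 50, after which its load ramps up and it fails within a few steps.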

Page 49: Cascading Failures in  Infrastructure Networks

Network Robustness

Consider the behavior of the network in response to a perturbation to arc capacity:
1. Does the disturbance lead to a local failure?
2. Does the failure propagate?
3. How far does it propagate?

Measure the consequences in terms of:
– Size of the resulting failure island
– Loss of network throughput

Key factors:
– Flow processing sensitivity to congestion
– Network topology
– Local routing and flow control policies
– Time scales

Page 50: Cascading Failures in  Infrastructure Networks

Congestion Sensitivity

In many real network systems, components are sensitive to congestion.

[Chart: System Performance vs. System Load, with the downturn in the curve marking evidence of congestion.]

• Using the aforementioned family of functions we can tune the sensitivity to congestion

• Direct consequences on local dynamics, stability, and control

• Tradeoff between system efficiency and fragility

• Implications for local behavior

Page 51: Cascading Failures in  Infrastructure Networks

Qualitative Behavior

[Chart: Output Rate vs. Total System Load, with a horizontal Input Rate line crossing the output curve at a Stable Equilibrium x1* and an Unstable Equilibrium x2*; beyond x2*, Congestion Collapse.]

Page 52: Cascading Failures in  Infrastructure Networks

Qualitative Behavior

[Chart: the same Output Rate vs. Total System Load diagram, divided into Fluid Processing, Mild Congestion, and Severe Congestion regions.]

System response to changes in input rate is opposite in fluid vs. congested regions.

Page 53: Cascading Failures in  Infrastructure Networks

Qualitative Behavior

[Chart: the same diagram with a raised New Input Rate line; the equilibria shift from (x1*, x2*) to (y1*, y2*), shrinking the Safety Margin between the operating point and the unstable equilibrium.]

“Efficiency” results in “Fragility”
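The stable/unstable pair in this figure falls out of the same assumed load dynamics (μ(n) = n(1 - n/N), N = 10, input rate a = 2, so x1* = 5 - √5 ≈ 2.76 and x2* = 5 + √5 ≈ 7.24); the parameter values are illustrative:

```python
# Load map n -> n + a - mu(n). With a below the peak service rate this map
# has a stable fixed point x1* on the uncongested branch and an unstable
# one x2* on the congested branch: start below x2* and the system recovers
# to x1*; start above x2* and the load runs away (congestion collapse).
N, a = 10.0, 2.0

def step(n):
    return n + a - max(0.0, n * (1 - n / N))

def settle(n0, steps=300):
    n = n0
    for _ in range(steps):
        n = step(n)
        if n > 2 * N:
            return "congestion collapse"
    return round(n, 3)

print(settle(6.0))   # below the unstable point: pulled back to x1*
print(settle(8.0))   # past the unstable point: load runs away
```

Raising the input rate a moves x1* and x2* toward each other, which is exactly the shrinking safety margin the slide labels "efficiency results in fragility."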

Page 54: Cascading Failures in  Infrastructure Networks

Ongoing Work

• Modeling behavior of flow networks
– Vulnerability to cascading failures
– Sensitivity to congestion

• Bringing together notions from network optimization, dynamical systems, and distributed control

• Exploring operating tradeoffs between
– efficiency and robustness
– global objectives vs. local behavior
– system performance vs. system vulnerability

• Collectively, these features provide a framework for study of real systems
– UPRR case study
– Computer networks

Page 55: Cascading Failures in  Infrastructure Networks

Future Directions

• Development of decision support tools to support real-time operations
– Warning systems
– Incident recovery

• Investigation of issues related to topology

• Notions from economics
– Network complements and substitutes
– Node cooperation and competition

Page 56: Cascading Failures in  Infrastructure Networks

Key Takeaways

• Large-scale failures happen
– Elements of vulnerability associated with connectivity
– But we are moving to connect everything together…

• Critical tradeoff for network-based businesses
– Business profitability from resource efficiency
– System robustness

• Two fundamental aspects to understanding large-scale failure behavior
– Networks
– Dynamics

• Relevance to a wide variety of applications

Page 57: Cascading Failures in  Infrastructure Networks

Thank You

[email protected]