IGP Scaling and Stability
-
8/8/2019 IGP Scaling and Stability
1/22
Why are we scared of SPF?
IGP Scaling and Stability
Dave Katz
-
Overview
History
Components of IGP Convergence
Conclusions
-
Copyright 2002, Juniper Networks, Inc. 3
History
1990: Stability, Scalability, Speed, Correctness--Choose one
First few years spent just getting implementations to work
Naive implementations had enough trouble accomplishing correctness without being complicated by reality
Prototype-quality software shipped; things tended to fall apart in really ugly ways when pushed hard
-
History
1994: Stability, Scalability, Speed, Correctness--Choose two
Convergence speed became marketing bullet, InterOp booth fodder
Cute trick for demos, but the world wasn't clamoring for it
Fast convergence == network back up before someone can call the NOC
Efforts to speed convergence tended to cause instability
-
History
1995: Stability, Scalability, Speed, Correctness--Choose 2.5
Networks started getting larger; the era of large ISPs began
Stability and scalability were really important, lest you end up in the newspaper (AOL down for 19 hours, other less famous catastrophes)
Simplistic software/hardware architectures were inherently unstable
Big guard rails used to stay away from the instability cliff
Speed was sacrificed (chunky timers)
-
The Modern Era
Pressure is mounting to get fast again
Real applications exist that could make use of it (VoIP, etc.)
Not just a parlor trick any more
Perception of IP as being too slow used to promote other technologies
We know how to do better now
-
Components of IGP Convergence
Detection
LSA/LSP Generation
Flooding/Propagation
SPF Calculation
Route Recursion
Route Download
-
Detection
Hardware detection is vastly preferable
Can be debounced, held down, etc., in or close to hardware to reduce churn
GE and 10GE use in POPs makes this difficult (since you need a way to detect a failed path to a neighbor, not just a failed interface)
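
The debounce and hold-down idea above can be illustrated in software. This is only a minimal sketch, not the hardware mechanism the slide refers to; the class name DebouncedLink and the 50 ms hold time are assumptions made for the example. A raw state change is reported to the IGP only once it has stayed unchanged for the hold time, so a bouncing link produces one event instead of many.

```python
from dataclasses import dataclass


@dataclass
class DebouncedLink:
    """Report a link state change only after it has been stable for hold_ms."""
    hold_ms: int = 50          # hypothetical hold-down value, not from the slides
    reported_up: bool = True   # state the IGP currently believes
    _raw_up: bool = True       # most recent raw sample
    _since_ms: int = 0         # time at which the raw state last changed

    def sample(self, now_ms: int, raw_up: bool) -> bool:
        """Feed one raw sample; return True if the reported state changed."""
        if raw_up != self._raw_up:
            # Raw state flipped: restart the stability timer.
            self._raw_up = raw_up
            self._since_ms = now_ms
        stable_for = now_ms - self._since_ms
        if self._raw_up != self.reported_up and stable_for >= self.hold_ms:
            self.reported_up = self._raw_up
            return True
        return False
```

Feeding it a link that bounces at 10, 20, and 30 ms and then stays down yields a single reported change once the down state has persisted for the full hold time.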
-
Detection
Software detection (Hellos) ultimately needed
Fast hellos have been destabilizing in the past due to scheduling latencies (relative to adjacency timeouts)
Fast hellos are now doable, and are even somewhat scalable (subsecond detection and hundreds of neighbors)
Intelligent scheduling and/or distributed processing
If Hello load exceeds 100% of capacity (CPU or protocol I/O bandwidth) things will still fail
Adjacency maintenance must be immune to heavy CPU load
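
The capacity point above is simple arithmetic, sketched here. The function name, the dead multiplier, and every number in the usage note are assumptions chosen for illustration; the slides give no specific figures beyond "subsecond detection and hundreds of neighbors".

```python
def hello_budget(neighbors, hello_interval_ms, dead_multiplier, capacity_pps):
    """Estimate detection time and Hello load for one router.

    capacity_pps is a hypothetical limit on Hello packets per second that
    the control plane can send plus receive.
    """
    # A neighbor is declared down after this many milliseconds of silence.
    detection_ms = hello_interval_ms * dead_multiplier
    # One Hello sent and one received per neighbor per interval.
    hello_pps = 2 * neighbors * 1000 / hello_interval_ms
    # Values at or above 1.0 mean Hellos will be lost and adjacencies will flap.
    utilization = hello_pps / capacity_pps
    return detection_ms, hello_pps, utilization
```

With 300 neighbors, a 300 ms Hello interval, a multiplier of 3, and an assumed capacity of 5000 packets per second, detection takes 900 ms at 40% utilization. Dropping the interval to 100 ms pushes utilization to 120%, which is the "things will still fail" case.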
-
LSA/LSP Generation
When something changes, you have to tell the world
Traditionally, generation delayed to collect multiple changes, then hold down to limit network traffic (on order of seconds)
More intelligent strategy is to rapidly announce interesting changes, allow several successive changes to be announced quickly before holddown
Newer LSPs will tend to overtake old ones during flooding on systems under load, if done intelligently
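
One way to read the "several quick announcements, then holddown" strategy is sketched below. It is an assumption-laden illustration, not any vendor's algorithm: the class name, the burst size of 3, and all timer values are hypothetical. Changes that arrive while a generation is already pending ride along with it.

```python
class LspGenerationThrottle:
    """Allow a short burst of fast LSP generations, then enforce a holddown."""

    def __init__(self, fast_delay_ms=20, holddown_ms=5000, burst=3, quiet_ms=10000):
        self.fast_delay_ms = fast_delay_ms  # delay before a fast generation
        self.holddown_ms = holddown_ms      # spacing once the burst is used up
        self.burst = burst                  # fast generations allowed per burst
        self.quiet_ms = quiet_ms            # calm period that restores the burst
        self.fast_used = 0
        self.last_gen_ms = None             # time of last (or pending) generation

    def schedule(self, change_ms):
        """Return the time at which an LSP reflecting this change is generated."""
        if self.last_gen_ms is not None and self.last_gen_ms > change_ms:
            # A generation is already pending; this change is coalesced into it.
            return self.last_gen_ms
        if (self.last_gen_ms is not None
                and change_ms - self.last_gen_ms >= self.quiet_ms):
            # The network has been quiet long enough to allow a new burst.
            self.fast_used = 0
        send_at = change_ms + self.fast_delay_ms
        if self.fast_used < self.burst:
            self.fast_used += 1
        else:
            # Burst exhausted: wait out the holddown after the last generation.
            send_at = max(send_at, self.last_gen_ms + self.holddown_ms)
        self.last_gen_ms = send_at
        return send_at
```

The first few changes go out within tens of milliseconds, a sustained flap is slowed to one generation per holddown, and a quiet period restores the fast behavior.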
-
LSA/LSP Generation
ISIS relatively malleable; some time constants specified but none are truly normative
OSPF requires receivers to drop LSAs updated within five seconds (limiting senders is sufficient)
Suggestion--drop receiver behavior completely, use adaptive strategy on transmit
Old receivers will drop rapid updates, but retransmission will operate in similar timeframe (or add a knob)
-
Flooding/Propagation
Propagation of received LSA/LSPs delayed
Group LSAs into bigger LSUpd packets in OSPF
Throttling transmission bounds neighbor load (no flow control)
Propagation delays directly affect convergence
The next guy can't even think of calculating routes until the LSA/LSP arrives
Background noise (refreshes, flaps) adds to the problem
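
The grouping of LSAs into larger LSUpd packets mentioned above amounts to packing queued LSAs up to a size limit. The sketch below is a simplified illustration: the function name, the byte sizes, and the 1400-byte payload limit are assumptions, and real OSPF packing also has to account for packet headers and per-neighbor retransmission state.

```python
def pack_lsas(lsa_sizes, max_payload):
    """Greedily pack queued LSAs, in order, into as few packets as the limit allows.

    lsa_sizes: sizes in bytes of the LSAs waiting to be flooded.
    max_payload: hypothetical number of LSA bytes that fit in one update packet.
    Returns a list of packets, each a list of LSA sizes.
    """
    packets = []
    current, used = [], 0
    for size in lsa_sizes:
        if current and used + size > max_payload:
            # The next LSA does not fit; close this packet and start another.
            packets.append(current)
            current, used = [], 0
        # An oversized LSA still gets a packet of its own.
        current.append(size)
        used += size
    if current:
        packets.append(current)
    return packets
```

Fewer, fuller packets reduce per-packet processing on the neighbor, at the cost of whatever delay is spent waiting for the queue to fill, which is the trade-off the slide describes.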
-
SPF Calculation
Traditionally viewed with abject terror
Naive implementations were slow
Run-to-completion scheduling led to lost hellos
Inefficient implementations caused even more overhead (reinstalling all routes in FIB)
Holddowns and scheduling delays added to work around stability problems
Delays slow convergence, create routing loops (2-3 times delay value)
-
SPF Calculation
In a properly engineered system, SPF should not be destabilizing
Do adjacency maintenance in a preemptive fashion
Schedule SPF calculations as background (relative to LSA/LSP processing, flooding, etc.)
SPF should be able to run back-to-back all day long without threatening stability, and with only marginal impact on overall convergence
Incremental SPF helps even more, though gains are not significant compared to other things given current networks
Backoff algorithms arguably unnecessary (especially exponential backoff)
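
For reference, the SPF calculation being scheduled here is Dijkstra's shortest-path algorithm run over the link-state database. The following is a minimal full (non-incremental) sketch using a binary heap; the function name, the dictionary-based topology format, and the example metrics are assumptions made for illustration.

```python
import heapq


def spf(graph, root):
    """Dijkstra's algorithm over a link-state graph.

    graph: {node: {neighbor: metric}} with positive metrics.
    Returns (dist, first_hop): the cost to each reachable node and the
    directly connected neighbor of root used to reach it.
    """
    dist = {root: 0}
    first_hop = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale entry left over from an earlier, worse path
        done.add(node)
        for nbr, metric in graph.get(node, {}).items():
            new_cost = cost + metric
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                # Neighbors of the root are their own first hop; everything
                # else inherits the first hop of the node it is reached through.
                first_hop[nbr] = nbr if node == root else first_hop[node]
                heapq.heappush(heap, (new_cost, nbr))
    return dist, first_hop


# Hypothetical four-router topology with symmetric metrics.
topology = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "C": 3, "D": 1},
    "C": {"A": 5, "B": 3, "D": 9},
    "D": {"B": 1, "C": 9},
}
```

Running spf(topology, "A") reaches B at cost 8 and D at cost 9, both with C as the first hop.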
-
Route Recursion
A change in IGP next hop may cause a next hop change in many thousands of BGP routes
By far the richest target in improving convergence
Traditionally done in software in order to produce a flat forwarding table
Indirect lookup in hardware has minimal forwarding time cost (essentially free if forwarding engine has any free cycles) with huge win in convergence time
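
The difference between a flat forwarding table and an indirect lookup can be shown with plain dictionaries. This is a conceptual sketch only: the function names, prefix strings, and interface names are hypothetical, and real forwarding hardware uses very different data structures. The point is the number of entries that must be rewritten when one IGP next hop changes.

```python
def build_flat_fib(bgp_routes, igp_resolution):
    """Flat table: each prefix stores its fully resolved forwarding next hop."""
    return {prefix: igp_resolution[nh] for prefix, nh in bgp_routes.items()}


def update_flat_fib(fib, bgp_routes, changed_nh, new_fwd):
    """Rewrite every prefix whose BGP next hop resolved through changed_nh."""
    writes = 0
    for prefix, nh in bgp_routes.items():
        if nh == changed_nh:
            fib[prefix] = new_fwd
            writes += 1
    return writes


def lookup_indirect(bgp_routes, indirect_table, prefix):
    """Indirect lookup: prefix -> BGP next hop -> forwarding next hop."""
    return indirect_table[bgp_routes[prefix]]


def update_indirect(indirect_table, changed_nh, new_fwd):
    """One shared entry changes, however many prefixes point at it."""
    indirect_table[changed_nh] = new_fwd
    return 1
```

With 50,000 prefixes sharing one BGP next hop, the flat table needs 50,000 writes after an IGP change and the indirect table needs one, at the cost of a second lookup per packet.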
-
Route Download
Output of route calculations typically must be downloaded to hardware
Download overhead typically rises with the number of forwarding tables
Can be very expensive unless recursion is done in hardware
Some level of distribution (multiple engines) necessary for scaling; fixing recursion problem and careful engineering minimizes cost
-
Conclusions
-
Conclusions
Stability and Scalability have been the primary concerns until recently; this effort was quite successful
Some of the biggest barriers to overall network convergence have been outside of the IGP implementation per se; examine the behavior of the system as a whole (and the network as a whole)
As these barriers fall it becomes more interesting to take more heroic measures to improve IGP performance
-
Conclusions
2002: Stability, Scalability, Speed, Correctness--Choose 3.5
Careful engineering should be able to provide speed, scalability, and stability
The only effect of a heavily loaded system should be a gradual slowing in convergence (not to crash and burn)
IGPs are not inherently unstable, at least until it is no longer possible to support all of the adjacencies (and even then it should be possible to gnaw off limbs)
-
Conclusions
Adding knobs is not the answer
Nobody really knows how to set them
Most settings are wrong
Either make the parameters adaptive, or make them non-critical
Keep adaptivity simple and bounded; behavior is chaotic enough as it is
-
http://www.juniper.net
Copyright 2002, Juniper Networks, Inc. All rights reserved. Juniper Networks is registered in the U.S. Patent and Trademark Office and in other countries as a trademark of Juniper Networks, Inc. G10, Internet Processor, Internet Processor II, JUNOS, JUNOScript, M5, M10, M20, M40, M40e, and M160 are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. All specifications are subject to change without notice.
Juniper Networks assumes no responsibility for any inaccuracies in this presentation. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this information without notice.