IGP Scaling and Stability

Post on 09-Apr-2018

  • 8/8/2019 IGP Scaling and Stability

    1/22

    Why are we scared of SPF?

    IGP Scaling and Stability

    Dave Katz

    Overview

    History

    Components of IGP Convergence

    Conclusions

    Copyright 2002, Juniper Networks, Inc.

    History

    1990: Stability, Scalability, Speed, Correctness--Choose one

    First few years spent just getting implementations to work

    Naive implementations had enough trouble accomplishing correctness without being complicated by reality

    Prototype-quality software shipped; things tended to fall apart in really ugly ways when pushed hard

    History

    1994: Stability, Scalability, Speed, Correctness--Choose two

    Convergence speed became marketing bullet, InterOp booth fodder

    Cute trick for demos, but the world wasn't clamoring for it

    Fast convergence == network back up before someone can call the NOC

    Efforts to speed convergence tended to cause instability

    History

    1995: Stability, Scalability, Speed, Correctness--Choose 2.5

    Networks started getting larger; the era of large ISPs began

    Stability and scalability were really important, lest you end up in the newspaper (AOL down for 19 hours, other less famous catastrophes)

    Simplistic software/hardware architectures were inherently unstable

    Big guard rails used to stay away from the instability cliff

    Speed was sacrificed (chunky timers)

    The Modern Era

    Pressure is mounting to get fast again

    Real applications exist that could make use of it (VoIP, etc.)

    Not just a parlor trick any more

    Perception of IP as being too slow used to promote other technologies

    We know how to do better now

    Components of IGP Convergence

    Detection

    LSA/LSP Generation

    Flooding/Propagation

    SPF Calculation

    Route Recursion

    Route Download

    Detection

    Hardware detection is vastly preferable

    Can be debounced, held down, etc., in or close to hardware to reduce churn

    GE and 10GE use in POPs makes this difficult (since you need a way to detect a failed path to a neighbor, not just a failed interface)

    Detection

    Software detection (Hellos) ultimately needed

    Fast hellos have been destabilizing in the past due to scheduling latencies (relative to adjacency timeouts)

    Fast hellos are now doable, and are even somewhat scalable (subsecond detection and hundreds of neighbors)

    Intelligent scheduling and/or distributed processing

    If Hello load exceeds 100% of capacity (CPU or protocol I/O bandwidth) things will still fail

    Adjacency maintenance must be immune to heavy CPU load
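
The trade-off between detection time and Hello load can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: the function names, the per-packet cost, the miss multiplier and the neighbor count are all illustrative assumptions, not figures from the talk or from any implementation.

```python
def detection_time_ms(hello_interval_ms: int, miss_multiplier: int) -> int:
    """Time to declare a neighbor down: a fixed number of missed Hellos."""
    return hello_interval_ms * miss_multiplier


def hello_load(neighbors: int, hello_interval_ms: int, cost_per_packet_us: float) -> float:
    """Fraction of one CPU spent on Hellos (1.0 means 100% of capacity).

    Assumes one Hello sent and one received per neighbor per interval,
    and a fixed processing cost per packet (a hypothetical figure).
    """
    packets_per_second = 2 * neighbors * 1000 / hello_interval_ms
    return packets_per_second * cost_per_packet_us / 1_000_000


def max_neighbors(hello_interval_ms: int, cost_per_packet_us: float, budget: float) -> int:
    """Largest neighbor count whose Hello load stays within `budget`."""
    per_neighbor = hello_load(1, hello_interval_ms, cost_per_packet_us)
    return int(budget / per_neighbor)


# Hypothetical example: 300 neighbors, 300 ms Hellos, 3 missed Hellos allowed,
# 50 microseconds of CPU per Hello packet.
print(detection_time_ms(300, 3))
print(hello_load(300, 300, 50.0))
```

Under these assumed numbers, subsecond detection with hundreds of neighbors costs a modest share of the CPU, but the load grows linearly with neighbor count and inversely with the Hello interval, so it can still exceed capacity.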

    LSA/LSP Generation

    When something changes, you have to tell the world

    Traditionally, generation delayed to collect multiple changes, then hold down to limit network traffic (on order of seconds)

    More intelligent strategy is to rapidly announce interesting changes, allow several successive changes to be announced quickly before holddown

    Newer LSPs will tend to overtake old ones during flooding on systems under load, if done intelligently
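
One way to picture the "announce quickly, then hold down" strategy is a small throttle that grants a few fast announcements before falling back to a long holddown. This is a minimal sketch under assumptions: the class name, the 50 ms fast delay, the burst of three and the 5 s holddown are hypothetical values chosen for illustration, not parameters from the talk or from any router.

```python
class GenerationThrottle:
    """Decides when a locally detected change may be announced in a new LSP/LSA."""

    def __init__(self, fast_delay_ms: int = 50, burst: int = 3, holddown_ms: int = 5000):
        if burst < 1:
            raise ValueError("burst must be at least 1")
        self.fast_delay_ms = fast_delay_ms   # short delay for "interesting" changes
        self.burst = burst                   # fast announcements allowed in a row
        self.holddown_ms = holddown_ms       # spacing enforced once the burst is spent
        self.sent_in_burst = 0
        self.last_send_ms = None             # time of the latest (possibly pending) send

    def schedule(self, now_ms: int) -> int:
        """Return the time at which a change noticed at now_ms is announced."""
        last = self.last_send_ms
        if last is not None and last > now_ms:
            return last                      # coalesce into the pending announcement
        if last is None or now_ms - last >= self.holddown_ms:
            self.sent_in_burst = 0           # a quiet period restores the fast budget
        if self.sent_in_burst < self.burst:
            send_at = now_ms + self.fast_delay_ms
        else:
            send_at = max(now_ms + self.fast_delay_ms, last + self.holddown_ms)
        self.sent_in_burst += 1
        self.last_send_ms = send_at
        return send_at


throttle = GenerationThrottle()
changes_ms = [0, 100, 200, 300, 400, 20000]
print([throttle.schedule(t) for t in changes_ms])
```

The first three changes go out after the short delay, the fourth and fifth are collected into a single announcement after the holddown, and a change arriving after a long quiet period is fast again.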

    LSA/LSP Generation

    ISIS relatively malleable; some time constants specified but none are truly normative

    OSPF requires receivers to drop LSAs updated within five seconds (limiting senders is sufficient)

    Suggestion--drop receiver behavior completely, use adaptive strategy on transmit

    Old receivers will drop rapid updates, but retransmission will operate in similar timeframe (or add a knob)

    Flooding/Propagation

    Propagation of received LSA/LSPs delayed

    Group LSAs into bigger LSUpd packets in OSPF

    Throttling transmission bounds neighbor load (no flow control)

    Propagation delays directly affect convergence

    The next guy can't even think of calculating routes until the LSA/LSP arrives

    Background noise (refreshes, flaps) adds to the problem
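
The interaction between grouping, pacing and propagation delay can be sketched as follows. This is an illustration only: the function name, the 1400-byte payload budget and the 33 ms pacing interval are assumptions, and LSAs are reduced to their sizes in bytes.

```python
def pack_and_pace(lsa_sizes, max_payload, pacing_ms):
    """Greedily group LSAs (given by size) into update packets, in order,
    and assign each packet a paced transmit time.

    Returns a list of (send_time_ms, [lsa sizes in that packet]).
    """
    packets = []
    current, used = [], 0
    for size in lsa_sizes:
        if size > max_payload:
            raise ValueError("single LSA larger than the packet payload")
        if current and used + size > max_payload:
            packets.append(current)          # packet is full, start a new one
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        packets.append(current)
    # One packet per pacing interval bounds the load placed on the neighbor,
    # at the price of delaying everything queued behind the first packet.
    return [(i * pacing_ms, pkt) for i, pkt in enumerate(packets)]


# Hypothetical queue of six LSAs waiting to be flooded to one neighbor.
schedule = pack_and_pace([36, 48, 60, 36, 1200, 300], max_payload=1400, pacing_ms=33)
for send_time, packet in schedule:
    print(send_time, packet)
```

In this example the last LSA leaves one pacing interval later than the first five, and the neighbor cannot start calculating routes on it until then, so every pacing interval spent in a queue adds directly to convergence time.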

    SPF Calculation

    Traditionally viewed with abject terror

    Naive implementations were slow

    Run-to-completion scheduling led to lost hellos

    Inefficient implementations caused even more overhead (reinstalling all routes in FIB)

    Holddowns and scheduling delays added to work around stability problems

    Delays slow convergence, create routing loops (2-3 times delay value)

    SPF Calculation

    In a properly engineered system, SPF should not be destabilizing

    Do adjacency maintenance in a preemptive fashion

    Schedule SPF calculations as background (relative to LSA/LSP processing, flooding, etc.)

    SPF should be able to run back-to-back all day long without threatening stability, and with only marginal impact on overall convergence

    Incremental SPF helps even more, though gains are not significant compared to other things given current networks

    Backoff algorithms arguably unnecessary (especially exponential backoff)
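
The scheduling idea above can be illustrated with a toy priority scheduler in which SPF is cut into slices and runs only when nothing more urgent is waiting. This is a simplified sketch, not how any particular router is built: the priority levels, the one-tick cost per work item and the slice count are assumptions made for the example.

```python
import heapq
from itertools import count

HELLO, FLOOD, SPF = 0, 1, 2   # lower number = more urgent (assumed ordering)


def run_scheduler(arrivals, spf_slices):
    """Simulate a scheduler where every work item costs one tick.

    arrivals: list of (arrival_tick, priority, name) for Hello/flooding work.
    spf_slices: number of one-tick slices the SPF run is divided into.
    Returns a log of (tick, name) in execution order.
    """
    pending = sorted(arrivals)
    ready, log = [], []
    seq = count()                 # tie-breaker keeps FIFO order within a priority
    slices_left = spf_slices
    if slices_left > 0:
        heapq.heappush(ready, (SPF, next(seq), "spf"))
    tick, i = 0, 0
    while ready or i < len(pending):
        while i < len(pending) and pending[i][0] <= tick:
            _, prio, name = pending[i]
            heapq.heappush(ready, (prio, next(seq), name))
            i += 1
        if ready:
            prio, _, name = heapq.heappop(ready)
            log.append((tick, name))
            if prio == SPF:
                slices_left -= 1
                if slices_left > 0:   # requeue the rest of SPF as background work
                    heapq.heappush(ready, (SPF, next(seq), "spf"))
        tick += 1
    return log


log = run_scheduler([(1, HELLO, "hello"), (1, FLOOD, "lsp"), (2, HELLO, "hello")], spf_slices=4)
print(log)
```

In this run both Hellos and the LSP are handled right after the SPF slice that was in progress when they arrived, and SPF finishes three ticks later than it would have alone. With run-to-completion scheduling the same Hellos would wait for the whole SPF run.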

    Route Recursion

    A change in IGP next hop may cause a next hop change in many thousands of BGP routes

    By far the richest target in improving convergence

    Traditionally done in software in order to produce a flat forwarding table

    Indirect lookup in hardware has minimal forwarding time cost (essentially free if forwarding engine has any free cycles) with huge win in convergence time
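
The difference between a flat table and an indirect lookup can be shown with plain dictionaries. This is a conceptual sketch only: the table layouts, function names, prefixes, the 192.0.2.1 next hop and the interface names are hypothetical and do not describe any real forwarding engine.

```python
def build_bgp_routes(route_count, protocol_next_hop):
    """Hypothetical BGP table: every prefix points at the same protocol next hop."""
    return {f"10.{i // 256}.{i % 256}.0/24": protocol_next_hop for i in range(route_count)}


def flat_update(flat_fib, bgp_routes, protocol_next_hop, new_forwarding_next_hop):
    """Flat table: recursion is resolved in software, so an IGP change
    rewrites every dependent prefix. Returns the number of entries written."""
    writes = 0
    for prefix, next_hop in bgp_routes.items():
        if next_hop == protocol_next_hop:
            flat_fib[prefix] = new_forwarding_next_hop
            writes += 1
    return writes


def indirect_update(resolution, protocol_next_hop, new_forwarding_next_hop):
    """Indirect table: prefixes keep pointing at the protocol next hop,
    and only its single resolution entry is rewritten."""
    resolution[protocol_next_hop] = new_forwarding_next_hop
    return 1


def indirect_lookup(bgp_routes, resolution, prefix):
    """Forwarding decision with one extra level of indirection."""
    return resolution[bgp_routes[prefix]]


routes = build_bgp_routes(10_000, "192.0.2.1")
flat_fib = {prefix: "ge-0/0/0" for prefix in routes}
resolution = {"192.0.2.1": "ge-0/0/0"}

# The IGP moves the path toward 192.0.2.1 onto another interface.
print(flat_update(flat_fib, routes, "192.0.2.1", "ge-0/0/1"))
print(indirect_update(resolution, "192.0.2.1", "ge-0/0/1"))
```

Both tables give the same forwarding answer afterwards. The flat table needed one write per BGP route, while the indirect table needed a single write at the cost of one extra lookup per packet.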

    Route Download

    Output of route calculations typically must be downloaded to hardware

    Download overhead typically rises with the number of forwarding tables

    Can be very expensive unless recursion is done in hardware

    Some level of distribution (multiple engines) necessary for scaling; fixing recursion problem and careful engineering minimizes cost

    Conclusions

    Stability and Scalability have been the primary concerns until recently; this effort was quite successful

    Some of the biggest barriers to overall network convergence have been outside of the IGP implementation per se; examine the behavior of the system as a whole (and the network as a whole)

    As these barriers fall it becomes more interesting to take more heroic measures to improve IGP performance

    Conclusions

    2002: Stability, Scalability, Speed, Correctness--Choose 3.5

    Careful engineering should be able to provide speed, scalability, and stability

    The only effect of a heavily loaded system should be a gradual slowing in convergence (not to crash and burn)

    IGPs are not inherently unstable, at least until it is no longer possible to support all of the adjacencies (and even then it should be possible to gnaw off limbs)

    Conclusions

    Adding knobs is not the answer

    Nobody really knows how to set them

    Most settings are wrong

    Either make the parameters adaptive, or make them non-critical

    Keep adaptivity simple and bounded; behavior is chaotic enough as it is

    http://www.juniper.net

    Copyright 2002, Juniper Networks, Inc. All rights reserved. Juniper Networks is registered in the U.S. Patent and Trademark Office and in other countries as a trademark of Juniper Networks, Inc. G10, Internet Processor, Internet Processor II, JUNOS, JUNOScript, M5, M10, M20, M40, M40e, and M160 are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. All specifications are subject to change without notice.

    Juniper Networks assumes no responsibility for any inaccuracies in this presentation. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this information without notice.