network layer routing - andrzej dudaduda.imag.fr/m1/routing.pdf · goal: determine “good” path...
TRANSCRIPT
1
Computer Networking
Network Layer routing
http://duda.imag.fr
Prof. Andrzej Duda [email protected]
2
Network Layer
Chapter goals: § understand principles of routing protocols
§ internal protocols § distance vector (RIP § link state (OSPF)
§ external routing § distance path (BGP)
3
Routing and Packet forwarding § Packet forwarding - data plane
§ forward every packet to the next hop § in real time
§ Routing - control plane § computation of routing tables or data structures for unicast and
multicast § normally only between routers § non-real time: latency up to several minutes § two level hierarchy
§ internal routing inside an administrative domain (called autonomous system - AS)
§ external routing between AS (administrative domains or ISPs) § uses routing protocols such as
§ internal: RIP, OSPF, EIGRP § external: BGP
4
Autonomous systems
host
subnetwork
autonomous system
border router
internal router
switch (bridge)
interconnection layer 2
interconnection layer 3
VLAN
5
Interconnection of AS
NAP, MAE, GIX, IXP
subnetworks
border router
autonomous system
6
Routing
Graph abstraction for routing algorithms:
§ graph nodes are routers § graph edges are
physical links § link cost: delay, $ cost,
or congestion level
Goal: determine “good” path (sequence of routers) thru
network from source to dest.
Routing protocol
A
E D
C B
F 2
2 1 3
1
1 2
5 3
5
§ “good” path: § typically means minimum
cost path § other def’s possible
7
Distance vector § Dynamic routing based on distributed estimation of
the distance to the destination § uses the distributed algorithm by Bellman-Ford (dynamic
programming) § each router receives aggregated information from its
neighbors § estimates the local cost to its neighbors § computes the best routes § no global network states
§ Distance § number of hops § delay
8
Bellman-Ford algorithm § Bellman-Ford algorithm
§ node i knows cost c(i,k) to its immediate neighbours (+∞ for most values of k)
§ distance D(i,n) is given by: D(i,n) = mink (c(i,k) + D(k,n)) § in the worst case, convergence after N-1 iterations
§ Distributed Bellman-Ford algorithm § initially: D(i,n) = 0 if i directly connected to n and D(i,n) =
+∞ otherwise § node i receives from neighbour k latest values of D(k,n) for
all n (distance vector) § node i computes the best estimates
D(i,n) = mink (c(i,k) + D(k,n))
9
Bellman-Ford algorithm
c(i,m)
c(i,1) D(1,n)
c(i,k) D(k,n)
D(m,n)
i n
1
k
m
10
Example of Bellman-Ford
A B
G H
K J I
A I H K Table of J
A 0 24 20 21 8 A B 12 36 31 28 20 A G 18 31 6 31 18 H H 17 20 0 19 12 H I 21 0 14 22 10 I J 9 11 7 10 0 - K 24 22 22 0 6 K J: 8 10 12 6
computation of G : 18+8=26, 31+10=41, 6+12=18, 6+31=37 → choice of 18, H
11
Distance vector example § Simple network
§ routers connected by links § destinations = subnetworks connected to routers § symmetric links § cost = number of hops
12
Initialization
dest link cost
A local 0
A
l1 A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0
B
dest link cost
C local 0
C
dest link cost
D local 0
D
dest link cost
E local 0
E
13
Distance vector announcement
dest link cost
A local 0
A
l1 A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l1 1
B
dest link cost
C local 0
C
dest link cost
D local 0 A l3 1
D
dest link cost
E local 0
E
from A: dest cost A 0
from A: dest cost A 0
14
Distance vector announcement
dest link cost
A local 0 B l1 1
A
l1 A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l1 1
B
dest link cost
C local 0 A l2 2 B l2 1
C
dest link cost
D local 0 A l3 1
D
dest link cost
E local 0 A l4 2 B l4 1
E
from B: dest cost A 1 B 0
from B: dest cost B 0 A 1
from B: dest cost A 1 B 0
15
Distance vector announcement
dest link cost
A local 0 B l1 1 D l3 1
A
l1 A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l1 1
B
dest link cost
C local 0 A l2 2 B l2 1
C
dest link cost
D local 0 A l3 1
D
dest link cost
E local 0 A l4 2 B l4 1 D l6 1
E
from D: dest cost D 0 A 1
from D: dest cost D 0 A 1
16
Final
dest link cost
A local 0 B l1 1 D l3 1 C l1 2 E l1 2
A
l1 A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l1 1 C l2 1 E l4 1 D l1 2
B
dest link cost
C local 0 A l2 2 B l2 1 D l5 2 E l5 1
C
dest link cost
D local 0 A l3 1 B l3 2 C l6 2 E l6 1
D
dest link cost
E local 0 A l4 2 B l4 1 D l6 1 C l5 1
E
17
Link failure
dest link cost
A local 0 B l1 ∞ D l3 1 C l1 ∞ E l1 ∞
A
A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l1 ∞ C l2 1 E l4 1 D l1 ∞
B
dest link cost
C local 0 A l2 ∞ B l2 1 D l5 2 E l5 1
C
dest link cost
D local 0 A l3 1 B l3 ∞ C l6 2 E l6 1
D
dest link cost
E local 0 A l4 ∞ B l4 1 D l6 1 C l5 1
E
from A: dest cost A 0 B,C,E ∞ D 1
18
Link failure
dest link cost
A local 0 B l1 ∞ D l3 1 C l1 ∞ E l1 ∞
A
A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l1 ∞ C l2 1 E l4 1 D l1 ∞
B
dest link cost
C local 0 A l2 ∞ B l2 1 D l5 2 E l5 1
C
dest link cost
D local 0 A l3 1 B l6 2 C l6 2 E l6 1
D
dest link cost
E local 0 A l4 ∞ B l4 1 D l6 1 C l5 1
E
from B: dest cost E 0 A ∞ B,D,C 1
19
Final state after failure
dest link cost
A local 0 B l3 3 D l3 1 C l3 3 E l3 2
A
A B
l6 D E
l4 l3 C l5
l2
dest link cost
B local 0 A l4 3 C l2 1 E l4 1 D l4 2
B
dest link cost
C local 0 A l5 3 B l2 1 D l5 2 E l5 1
C
dest link cost
D local 0 A l3 1 B l6 2 C l6 2 E l6 1
D
dest link cost
E local 0 A l6 2 B l4 1 D l6 1 C l5 1
E
20
Equal link costs - link failures
dest link cost
A local 0 B l3 3 D l3 1 C l3 3 E l3 2
A
A B
D E
l4 l3 C l5
l2
dest link cost
B local 0 A l4 3 C l2 1 E l4 1 D l4 2
B
dest link cost
C local 0 A l5 3 B l2 1 D l5 2 E l5 1
C
dest link cost
D local 0 A l3 1 B l6 ∞ C l6 ∞ E l6 ∞
D
dest link cost
E local 0 A l6 2 B l4 1 D l6 1 C l5 1
E
21
Counting to infinity
dest link cost
A local 0 B l3 3 D l3 1 C l3 3 E l3 2
A
A
D
l3
dest link cost
D local 0 A l3 1 B l3 4 C l3 4 E l3 3
D
from A: dest cost A 0 B,C 3 D 1 E 2
§ Loop between A and D § Exchange of routes, costs increase
by 2 each cycle § Convergence to a stable state
§ ∞ = large number § e.g. RIP: ∞ = 16
22
Split horizon
§ Minimize the effects of bouncing and counting to infinity
§ Rule § if A routes packets to X via B, it does not announce this
route to B
23
Example of split horizon
dest link cost
A local 0 B l3 3 D l3 1 C l3 3 E l3 2
A
A B
D E
l4 l3 C l5
l2
dest link cost
B local 0 A l4 3 C l2 1 E l4 1 D l4 2
B
dest link cost
C local 0 A l5 3 B l2 1 D l5 2 E l5 1
C
dest link cost
D local 0 A l3 1 B l6 ∞ C l6 ∞ E l6 ∞
D
dest link cost
E local 0 A l6 2 B l4 1 D l6 1 C l5 1
E
24
Split horizon
dest link cost
A local 0 B l3 3 D l3 1 C l3 3 E l3 2
A
A
D
l3
dest link cost
D local 0 A l3 1 B l6 ∞ C l6 ∞ E l6 ∞
D
from A: dest cost A 0
§ Split horizon cuts the process of counting to infinity
25
Split horizon
dest link cost
A local 0 B l3 ∞ D l3 1 C l3 ∞ E l3 ∞
A
A
D
l3
dest link cost
D local 0 A l3 1 B l6 ∞ C l6 ∞ E l6 ∞
D
from D: dest cost D 0 B,C,E ∞
§ Split horizon cuts the process of counting to infinity
26
Split horizon may fail
B
E
l4 C l5
l2
dest link cost
B local 0 A l4 ∞ C l2 1 E l4 1 D l4 ∞
B
dest link cost
C local 0 A l5 3 B l2 1 D l5 2 E l5 1
C
dest link cost
E local 0 A l6 ∞ B l4 1 D l6 ∞ C l5 1
E
from E: dest cost E 0 A ∞ C 1 D ∞
27
Split horizon may fail
B
E
l4 C l5
l2
dest link cost
B local 0 A l2 4 C l2 1 E l4 1 D l2 3
B
dest link cost
C local 0 A l5 3 B l2 1 D l5 2 E l5 1
C
dest link cost
E local 0 A l6 ∞ B l4 1 D l6 ∞ C l5 1
E
from C: dest cost C 0 A 3 D 2 E 1
from C: dest cost C 0 B 1
28
Split horizon may fail
B
E
l4 C l5
l2
dest link cost
B local 0 A l2 4 C l2 1 E l4 1 D l2 3
B
dest link cost
C local 0 A l5 3 B l2 1 D l5 2 E l5 1
C
dest link cost
E local 0 A l4 5 B l4 1 D l4 4 C l5 1
E from B: dest cost A 4 B 0 C 1 D 3
29
RIP v1 § Distance vector protocol § Metric - hops § Network span limited to 15
§ ∞ = 16
§ Split horizon § Destination network identified by IP address
§ no prefix/subnet information - derived from address class
§ Encapsulated as UDP packets, port 520 § Largely implemented (routed on Unix) § Broadcast every 30 seconds or when update detected § Route not announced during 3 minutes
§ cost becomes ∞
30
Message format
address family zero
IP address zero
zero
0 31
command version zero
metric
§ May be repeated 25 times § Command
§ REQUEST - 1 (sent at boot to initialize) § RESPONSE - 2 (broadcast each 30 sec)
31
Missing netmask
B
C
D
A
E F
10.0.0.0 (255.0.0.0)
10.0.0.0 (255.0.0.0)
10.1.0.0 255.255.0.0
10.2.0.0 255.255.0.0
§ A and E can forward to 10.0.0.0 § Packet to 10.2.0.1 can go through F or B
§ if sent to B, it goes through A and C
§ If link C-D broken, no route to destination
packet to 10.2.0.1
32
RIP v2 (RFC 2453)
§ Subnetworks § take into account CIDR prefixes and netmasks
§ Authentication § Multicast
§ 224.0.0.9 mapped to MAC 01-00-5E-00-00-09 § on LAN only, no need for IGMP
33
Message format
address family route tag
IP address netmask
next router
0 31
command version unused
metric
§ Command, version unchanged § One address family - authentication § Next router
§ used at the border of different routing domains (e.g. RIP and OSPF)
§ Route tag § for external routes (used by BGP)
34
Announcing netmasks
B
C
D
A
E F
10.1.0.0 (255.255.0.0)
10.2.0.0 (255.255.0.0)
10.1.0.0 255.255.0.0
10.2.0.0 255.255.0.0
§ E can forward to 10.2.0.0 § Packet to 10.2.0.1 can go through F
packet to 10.2.0.1
35
Routing domains
B C
D
A
E F
§ Different routing domains § e.g. routers under different administrations that run different
routing protocols (RIP, OSPF)
§ If A wants to send a packet to F, it goes through D and E
§ When announcing F, D adds E as next router
domain 1 (OSPF)
domain 2 (RIP)
36
Simple authentication
xFFFF authentication type = 2
password on 16 bytes
0 31
command version unused
§ Configuration of gated (/etc/gated.conf) rip yes { interface all version 2 multicast authentication simple "qptszwmz" }
37
MD5 authentication
xFFFF authentication type = 3
zero
0 31
command version unused
packet length key Id
xFFFF x01
auth. length increasing sequence no.
zero
route info
seal
38
MD5 authentication
§ Seal § MD5 digest on the message using a shared secret § sequence number avoids replay attacks
§ Configuration of gated (/etc/gated.conf) rip yes { interface all version 2 multicast authentication md5 "qptszwmz" }
39
Link State Routing § Principles
§ estimate metrics with neighbors § bandwidth, delay, cost (fixed by administrator)
§ build a packet with the metrics of all neighbors § flood to all routers § compute the shortest path to all destinations (Dijkstra) § update if modification of topology
§ Used in OSPF (Open Shortest Path First) and PNNI (ATM routing protocol)
40
Topology Database Synchronization
§ Neighbouring nodes synchronize before starting any relationship § Hello protocol; keep alive § initial synchronization of database § description of all links (no information yet)
§ Once synchronized, a node accepts link state advertisements § contain a sequence number, stored with record in the
database § only messages with new sequence number are accepted § accepted messages are flooded to all neighbours § sequence number prevents anomalies (loops or blackholes)
41
Example network
n1
A
B
n6
D E
n4
n3
C
n5 n2
F
n7
§ Each router knows directly connected networks
42
Initial routing tables
net type
n1 Ether n2 P-to-P
A
n1
A
B
n6
D E
n4
n3
C
n5 n2
F
n7
net type
n6 Ether n5 P-to-P
D net type
n6 Ether n7 Ether
E
net type
n1 Ether n7 Ether
F
net type
n1 Ether n4 P-to-P n5 P-to-P
C
net type
n3 Ether n2 P-to-P n4 P-to-P
B
43
Flooding
net cost
n1 10 n2 100
A
net cost
n6 10 n5 100
D net cost
n6 10 n7 10
E net cost
n1 10 n7 10
F
net cost
n1 10 n4 100 n5 100
C
net cost
n3 10 n2 100 n4 100
B
§ The local metric information is flooded to all routers § After convergence, all routers have the same
information
44
Topology graph § Arrows to nets with a given metric
§ except P-to-P, stub, and external networks
§ From nets to routers, metric = 0
A
B
C
D
F
E
100
10 10
10
10
100 100
n1
100 n6
n7
n3
100
10
100
10
10
10
0
0
0
0
0 0
external network
54
0
stub network
point to point link
broadcast network
external network
45
Simplified graph
§ Only arrows with metrics between routers § Execute the SPF (Shortest Path First - Dijkstra)
algorithm on the graph
A
B
C
D
F
E
100
10
10
10
10
10
100 100
46
SPF at A
1. Initialization 1. PATH variable: router A (the best path to destination) 2. TENT variable: empty (tentative paths)
2. For each router N in PATH 1. for each neighbor M of N
1. c(A, M) = c(A, N) + c(N, M) 2. if M is not in PATH nor in TENT with a better cost, insert M
with direction N in TENT
3. If TENT is empty, end. Otherwise take the entry with the best cost from TENT, insert it into PATH and go to 2.
At the end PATH contains the tree of best paths to all destinations
47
Executing SPF
100 10
A
B C F
10
PATH TENT consider a router
Before: TENT: A-B(100), A-C(10), A-F(10) PATH: A After: TENT: A-B(100), A-F(10) PATH: A-C(10)
48
Executing SPF
100 10
A
B C F
10
20 110
A B D 110
F 20
Before: TENT: A-B(100), A-F(10), C-D(110) PATH: A-C(10) After: TENT: A-B(100), C-D(110) PATH: A-C(10), A-F(10)
49
Executing SPF
100 10
A
B C F
10
D 110
20
A
C
E
20
20
Before: TENT: A-B(100), C-D(110), F-E(20) PATH: A-C(10), A-F(10) After: TENT: A-B(100), C-D(110) PATH: A-C(10), A-F(10), F-E(20)
50
Executing SPF
100 10
A
B C F
10
D 110
20
E
F 30
D 30
Before: TENT: A-B(100), C-D(110) TENT: A-B(100), E-D(30) PATH: A-C(10), A-F(10), F-E(20) After: TENT: A-B(100) PATH: A-C(10), A-F(10), F-E(20), E-D(30)
51
Executing SPF
100 10
A
B C F
10
20
E
D 30
130 40
C E
Before: TENT: A-B(100) PATH: A-C(10), A-F(10), F-E(20), E-D(30) After: TENT: PATH: A-C(10), A-F(10), F-E(20), E-D(30) A-B(100)
52
Executing SPF
100 10
A
B C F
10
20
E
D 30
200 200
A C
Before: TENT: PATH: A-C(10), A-F(10), F-E(20), E-D(30) After: TENT: PATH: A-C(10), A-F(10), F-E(20), E-D(30), A-B(100)
53
Result
100 10
A
B C F
10
§ Tree of best paths to all destinations
20
E
D 30
dest next cost
B direct 100 C direct 100 D F 30 E F 20 F direct 10
A
54
Routing table of A
net next
n1 direct n2 direct n3 B n4 C n5 C n6 F n7 F
A
n1
A
B
n6
D E
n4
n3
C
n5 n2
F
n7
55
Towards OSPF § OSPF (Open Shortest Path First)
§ Link State protocol § Link State information: LSA (Link State Advertisement) § different sub-protocols: Hello, Database Description, Link
State flooding
§ It allows to § separate hosts and routers § consider different types of networks
§ broadcast (Ethernet), NBMA (ATM, X.25), point-to-point (PPP)
§ divide large networks into several areas § independent route computing in each area
56
Separate hosts and routers
§ Link should be described in the DB § link between a router and each host,
but LANs in most cases: advertize the link to the "stub network"
§ link of the form of a broadcast network (Ethernet)
§ IP address of the subnetwork (stub network)
§ e.g. n3 identified by 128.88.38/24
§ link to a neighbor router § IP address of the neighbor router
§ e.g. n2 identified by 176.44.23.254 § no IP address assigned to the interface
– interface index
A
B
n3
n2
57
Designated routers
§ Number of neighbors § if n routers, n(n-1)/2 neighbors
§ Election of a designated router on a LAN § n-1 neighbors § flooding
§ advertise to 224.0.0.6 (all designated routers) § flooded to 224.0.0.5 (all routers)
§ back-up designated router § listens to advertisements, but does not flood § failure of the designated router detected by Hello
– back-up becomes designated router
l1
A B D C A B
D C
A B
D C
A B
D C
58
Virtual networks
§ LAN represented as a virtual network § less entries in the DB § real cost to n1, zero to routers
n1
A B D C
A C
F
n1
F
10
0
10
10
10
0 0
0
59
Divide large networks § Why divide large networks? § Cost of computing routing tables
§ update when topology changes § SPF algorithm
§ n routers, k links § complexity O(n*k)
§ size of DB, update messages grows with the network size
§ Limit the scope of updates and computational overhead § divide the network into several areas § independent route computing in each area § inject aggregated information on routes into other areas
60
Hierarchical Routing § A large OSPF domain can be configured into areas
§ one backbone area (area 0) § non backbone areas (areas numbered other than 0)
§ All inter-area traffic goes through area 0 § strict hierarchy
§ Inside one area: link state routing as seen earlier § one topology database per area
area 0
B1 X3
X1
X4 A1
area 2 area 1
X1
X3 X4 B2 A2
61
Principles § Routing method used in the higher level:
§ distance vector § no problem with loops - one backbone area
§ Mapping of higher level nodes to lower level nodes § area border routers (inter-area routers) belong to two areas
§ Inter-level routing information § summary link state advertisements (LSA) from other areas
are injected into the local topology databases
62
Example
§ Assume networks n1 and n2 become visible at time 0. Show the topology databases at all routers
area 0
B1 X4
X1
X3 A1
area 2 area 1
X2
X6 X5 B2 A2
n1
n2
10
10
10
6 6
6 6
6
6
10
10
10
63
Solution area 0
B1 X4
X1
X3 A1
area 2 area 1
X2
X6 X5 B2 A2
n1
n2
10
10
10
6 6
6 6
6
6
n1
n2
area 2 topology database
area 0 topology database
n1, m=10 n2, m=16
n1, m=16 n2, m=10
n1, m=28 n2, m=22
n1, m=22 n2, m=16
10
10
10
area 1 topology database
64
Explanations
§ All routers in area 2 propagate the existence of n1 and n2, directly attached to B1 (resp. B2).
§ Area border routers X4 and X6 belong to area 2, thus they can compute their distances to n1 and n2
§ Area border routers X4 and X6 inject their distances to n1 and n2 into the area 0 topology database (item 3 of the principle). The corresponding summary LSA is propagated to all routers of area 0.
§ All routers in area 0 can now compute their distance to n1 and n2, using their distances to X4 and X6, and using the principle of distance vector (item 1 of the principle).
65
Comments § Distance vector computation causes none of the RIP
problems § strict hierarchy: no loop between areas
§ External and summary LSA for all reachable networks are present in all topology databases of all areas § most LSAs are external § can be avoided in configuring some areas as terminal: use default entry to the backbone
§ Area partitions require specific support
66
Classification of routers
§ Internal routers § a router with all directly connected networks belonging to
the same area
§ Area border routers § attached to multiple areas § condense LSA of their attached areas for distribution to the
backbone
§ Backbone routers § a router that has an interface to the backbone area
§ AS boundary routers § exchange routing information with routers belonging to
other AS
67
Classification of routers
area 0
B1 X4
X1
X3 A1
area 2 area 1
X2
X6 X5 B2 A2
n1
n2
10
10
10
6 6
6 6
6
6
10
10
10
AS-boundary router
backbone router
area border router internal router
68
OSPF protocol § On top of IP (protocol type = 89) § Multicast
§ 224.0.0.5 - all routers of a link § 224.0.0.6 - all designated and backup routers
§ Sub-protocols § Hello to identify neighbors, elect a designated and a
backup router § Database description to diffuse the topology between
adjacent routers § Link State to request, update, and ack the information on
a link (LSA - Link State Advertisement)
§ LSA § tagged with the router Id and checksum § 5 different types
69
Convergence § Route timeout after 1 hour
§ LS Update every 30 min.
§ Detect a failure § 40 sec (dead interval)
§ Smallest interval to recompute SPF § 30 sec (Dijkstra interval)
§ Reconfiguration time § 70 sec.
§ Proposals § Hello each 100 ms § SPF immediately
70
OSPF vs. RIP § much more complex, but presents many advantages
§ no count to infinity § no limit on the number of hops (OSPF topologies limited by
Network and Router LSA size (max 64KB) to O(5000) links) § less signaling traffic (LS Update every 30 min) § advanced metric § large networks - hierarchical routing
§ most of the traffic when change in topology § but periodic Hello messages § in RIP: periodic routing information traffic
§ drawback § difficult to configure
71
Interconnection of AS
NAP, MAE, GIX, IXP
subnetworks
border router
autonomous system
72
Interconnection of AS
§ Border routers § interconnect AS
§ NAP or GIX, or IXP § exchange of traffic - peering
§ Route construction § based on the path through a series of AS § based on administrative policies § routing tables: aggregation of entries § works if no loops and at least one route - external routing
protocols, e.g. BGP
73
Border Gateway Protocol
§ Path vector § knowledge of the global state
§ path: sequence of AS with attributes § loop detection: AS appears twice § global optimization
§ Policy routing § define what routes are accepted, chosen, and advertised
74
§ AS border router - BGP speaker § peer-to peer relation with another AS border router § connected communication (on top of a TCP connection)
Autonomous System C
C2
C1
C4 C3
IGRP
B2
B1 B4
B3
A2
A1
A4
A3
Autonomous System A (ex: IMAG) Autonomous System B
BGP
BGP
OSPF
Autonomous System D
BGP
BGP
D2
D3
D1
D5
D4 OSPF
area 0
area 2 area 1
BGP - Hierarchical routing
75
What does BGP do? § BGP is a routing protocol between AS. It is
used to establish routes from one router in one AS to any network prefix in the world
§ There are two levels in BGP: § Inter-domain: one AS is a virtual node in the higher
layer § Intra-domain: distribution of routes inside one AS
§ The method of routing is § Path vector § With policy
§ A route advertisement from B to A for a destination prefix is an agreement by B that it will forward packets sent via A destined for any destination in the prefix.
advertisement C B:n1,n2
A
B
C
packet to n2
n1, n2
76
A
B
C
E n1, n2
A:n1,n2
A:n1,n2
C A:n1,n2 C:n3
B A:n1,n2 B:n5
D
D C A:n1,n2 D C: n3 D: n4
dest AS path
n1 B A n2 B A n3 D C n4 D n5 B
BGP table in E n5
n3
n4
Path Vector routing
§ AS maintains a table of best paths known so far § Table updated using local rules § Suitable when
§ no global meaning for costs can be assumed (heterogeneous environments) § global topology is fairly stable
77
Border Routers, E-BGP and I-BGP § E-BGP: BGP runs on border routers = “BGP speakers”
belonging to one AS only § two border routers per boundary (OSPF - one per area boundary)
§ I-BGP: BGP speakers talks to each other inside the AS using “Internal-BGP” § full mesh called the “BGP mesh” § I-BGP is the same as E-BGP except for one rule: routes learned from a
neighbour in the mesh are not repeated inside the mesh
D1 D2
D4 D5
D3
A B
G H
C D
E F
X:n1 X:n1
A->C: D1,X: n1 C->E: D1,X: n1 C->D: D1,X: n1 C->F: D1,X: n1 E->G: D3,D1,X: n1
E-BGP
E-BGP
I-BGP
78
Policy Routing
§ Mainly 3 types of relations depending on money flows § customer: EPFL is customer of Switch. EPFL pays Switch § provider: Switch is provider for EPFL; Switch is paid by
EPFL § peer: EPFL and CERN are peers: costs of interconnection is
shared
§ Type of relation is negotiated in bilateral agreements there is no architecture rule, just business
79
Goal of Policy Routing § Example:
§ ISP3 - ISP2 is transatlantic link, cost shared between ISP2 and ISP 3
§ ISP 3 - ISP 1 is a local, inexpensive link § Ci is customer of ISPi, ISPs are peers
§ It is advantageous for ISP3 to send traffic to n2 via ISP1
§ ISP1 does not agree to carry traffic from C3 to C2 § ISP1 offers a “transit service” to C1 and
a “non-transit” service to ISP 2 and ISP3
§ The goal of “policy routing” is to support this and other similar requirements
ISP 1
ISP 3 ISP 2
C1
C2 C3
n2
provider
customer peers
80
How does Policy Routing Work ? § Implemented by rules followed by BGP speakers
§ refuse to import or announce some routes § modify the attributes that control which route is
preferred (see later)
§ Example § ISP 1 announces to ISP 3 all networks of C1 § ISP 1 announces to C1 all routes it has learnt from
ISP3 and ISP2 § ISP2 announces “ISP2 n2” to ISP3 and ISP1;
assume that ISP1 announces “ISP1 ISP2 n2” to ISP3.
§ ISP 3 has two routes to n2: “ISP2 n2” and “ISP1 ISP2 n2”; assume that ISP3 prefers “ISP1 ISP2 n2”
§ packets from n3 to n2 are routed via ISP1 – undesired
§ solution: ISP 1 announces to ISP3 only routes to ISP1’s customers (not “ISP1 ISP2 n2”)
ISP 1
ISP 3 ISP 2
C1
C2 C3
n2 n3
provider
customer peers
81
Typical Policy Routing Rules § Provider (ISP1) to customer (C1)
§ announce all routes learnt from other ISPs § import only routes that belong to C1
example: import from IMAG only one route 129.88/16
§ Customer (C1) to Provider (ISP1) § announce all routes that belong to C1 § import all routes
§ Peers (ISP1 to ISP3) § announce only routes to all customers of ISP1 § import only routes to ISP3’s customer § these routes are defined as part of peering
agreement
§ The rules are defined by every AS and implemented in all BGP speakers in one AS
ISP 1
ISP 3 ISP 2
C1
C2 C3
n2 n3
provider
customer peers
82
BGP (Border Gateway Protocol)
§ BGP-4, RFC 1771 § AS border router - BGP speaker
§ peer-to peer relation with another AS border router § connected communication
§ on top of a TCP connection, port 179 (vs. datagram (RIP, OSPF))
§ external connections (E-BGP) § with border routers of different AS
§ internal connections (I-BGP) § with border routers of the same AS
§ BGP only transmits modifications
83
BGP principles
§ Establish BGP session § Update
§ list of destinations reachable via each router § path attributes such as degree of preference for a particular
route
AS x AS y
1.1.1.1 2.2.2.2
n1 n2 n3
n4
n1,n2
n3,n4
84
BGP principles
§ n1 no longer reachable § Incremental update
§ withdraw n1
AS x AS y
1.1.1.1 2.2.2.2
n2 n3 n4
withdraw n1
85
Summary § The network layer transports packets from a sending
host to the receiver host § Internet network layer
§ connectionless § best-effort
§ Main components: § addressing § packet forwarding § routing protocols and routers
§ Hierarchical routing protocols § internal (RIP, OSPF, EIGRP) § external (BGP)
86