1
The Network Layer
• Services:
– Deliver packets between any two hosts, reliably or unreliably.
• A network-wide concern:
– Transport layer (above): between two end hosts.
– Data link layer (below): between two physically connected hosts.
– Network layer: involves each and every host, router, and gateway in the network.
2
Architectural Approaches
• Connectionless - similar to postal system; endpoint puts data to send into a packet and hands it to the network for delivery
• Connection-oriented - similar to telephone system; endpoints establish and maintain a connection as long as they have data to exchange
3
Connectionless (Datagram) Service
• No connection established
• Source of data adds destination information to data and delivers to network
• Network delivers each data item individually
• No routes set up at connection establishment time - each packet may follow different route to destination (but typically won’t).
• No guarantee of reliable, or in-order delivery (although data link layer may still do link-by-link error control).
• Advantages:
– Robust with respect to node / link failures.
– Recovery at end to end (transport) level.
• Examples: IP
4
Connection-oriented Service
• One endpoint requests connection from network
• Other endpoint agrees to connection
• Computers exchange data through connection
• Typically uses a “stream” interface
• Source delivers stream of data to network
• Network breaks into packets for delivery
• Data transmission not necessarily continuous; like telephone, connection remains in place while no data transmitted
• One endpoint requests network to break connection when transmission is complete
• Examples: Asynchronous Transfer Mode (ATM), X.25
5
Connection duration and persistence
• Connections can be made on-demand or set up permanently
– Switched connection or switched virtual circuit
– Permanent connection or provisioned virtual circuit
• Permanent connections
– Originally hard-wired
– Now configured at system initialization
• Switched connections
– Computer maintains permanent connection to network
– Network makes connection on demand
6
Virtual circuits
• Virtual: acts like a circuit, but isn’t really one.
• “Reliable” delivery of packets between end hosts.
• All packets within connection follow the same route.
[Figure: network of nodes A, B, C, D, E and F in which two VCs share link B-C]
7
Virtual circuits (2)
• At connection establishment time:
– Connection setup packet flows from sender to receiver.
– Routing tables updated at intermediate nodes to reflect new virtual circuit (VC).
– Fits well with quality of service (QoS) guarantees: reject call on path if QoS can’t be guaranteed.
– Potential difficulty: recovery from link or router failure.
8
Address and Connection Identifiers
• Asynchronous Transfer Mode (ATM)
– 160-bit address, 28-bit connection identifier
– Connection identifier includes a 12-bit virtual path identifier (VPI) and a 16-bit virtual circuit identifier (VCI)
– Connection identifier is local, so it may be different in different parts of the ATM network (switches rewrite it)
• Address is a complete, unique identifier
• Connectionless delivery requires address on each packet
• Connection-oriented delivery can use a shorthand that identifies the connection rather than the destination
9
Internetworking
• In the real world, computers are connected by many different technologies
• Internetworking is a scheme for interconnecting multiple networks of dissimilar technologies
• Uses both hardware and software
• Extra hardware positioned between networks
• Software on each attached computer
• System of interconnected networks is called an “internetwork” or an internet
10
Routers
• A router is a hardware component used to interconnect networks
• The router is the main layer 3 building block for large internets.
• A router has interfaces on multiple networks
• Networks can use different technologies
• Router forwards packets between networks
• Transforms packets as necessary to meet standards for each network
11
Internet Architecture
• An internetwork is composed of arbitrarily many networks interconnected by routers
• Routers can have more than two interfaces
12
A virtual network
[Figure: Nets 1, 2 and 3 presented to users as one virtual network]
• Internetworking software builds a single, seamless virtual network out of multiple physical networks
• Universal addressing scheme
• Universal service
• All details of physical networks hidden from users and application programs
13
A virtual network
[Figure: the same Nets 1, 2 and 3 as physical networks interconnected by routers]
14
Internetworking Protocols
• TCP/IP is by far the most widely used internetworking protocol suite
– First internetworking protocol suite
– Initially funded through ARPA
– Picked up by NSF
– Vendor and platform independent
• Others include IPX, VINES, AppleTalk
15
Internet addresses
• One key aspect of virtual network is single, uniform address format
• Cannot use hardware addresses because different technologies have different address formats
• Address format must be independent of any particular hardware address format
• Sending host puts destination internet address in packet
• Destination address can be interpreted by any intermediate router
• Routers examine address and forward packet on to the destination
16
IP addresses
• Addressing in TCP/IP is specified by the Internet Protocol (IP)
• Each host is assigned a 32-bit number
• Called the IP address or Internet address
• Unique across entire Internet
• Each IP address is divided into a prefix and a suffix
• Prefix identifies network to which computer is attached
• Suffix identifies computer within that network
• Address format makes routing efficient
17
Network and Host Numbers
• Every network in a TCP/IP internet is assigned a network number.
• Each host on a specific network is assigned a host number or host address that is unique within that network.
• Host's IP address is the combination of the network number (prefix) and host address (suffix)
• Network numbers must be unique.
• Host addresses may be reused on different networks; combination of network number prefix and host address suffix will be unique.
• Assignment of network numbers must be coordinated globally; assignment of host addresses can be managed locally.
18
IP address format
• IP designers chose 32-bit addresses (see RFC 790)
• Allocate some bits for prefix, some for suffix
– Large prefix, small suffix - many networks, few hosts per network
– Small prefix, large suffix - few networks, many hosts per network
• Because of variety of technologies, need to allow for both large and small networks
• Designers chose a compromise - multiple address formats that allow both large and small prefixes
• Each format is called an address class
• Class of an address is identified by first four bits
19
Dotted Decimal Notation
• 32 bits divided into 4 octets
• Each octet is converted to decimal value
• Dots used to separate the 4 decimal values
• Examples:
32-bit binary number                     Dotted decimal
10000001 00110100 00000110 00000000      129.52.6.0
11000000 00000101 00110000 00000011      192.5.48.3
10000000 10000000 11111111 00000000      128.128.255.0
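The octet-by-octet conversion above can be sketched in C. This is a minimal illustration, assuming the address is held in host byte order as a `uint32_t`; `to_dotted` and `from_dotted` are hypothetical helper names, not library functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a host-order 32-bit address as dotted
 * decimal. buf must hold at least 16 chars ("255.255.255.255" + NUL). */
void to_dotted(uint32_t addr, char buf[16])
{
    snprintf(buf, 16, "%u.%u.%u.%u",
             (unsigned)((addr >> 24) & 0xFF),   /* octet 1 */
             (unsigned)((addr >> 16) & 0xFF),   /* octet 2 */
             (unsigned)((addr >> 8) & 0xFF),    /* octet 3 */
             (unsigned)(addr & 0xFF));          /* octet 4 */
}

/* Hypothetical helper: parse dotted decimal into a host-order value.
 * Returns 0 on success, -1 if the text is not four values 0-255
 * (a loose check: sscanf also skips leading whitespace). */
int from_dotted(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    char extra;
    /* a fifth successful conversion means junk after octet 4 */
    if (sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
    return 0;
}
```

For example, `to_dotted(0x81340600, buf)` produces the first row of the table, 129.52.6.0.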
20
IP addresses in C/C++
From /usr/include/netinet/in.h
/*
 * Internet address
 * This definition contains obsolete fields for
 * compatibility with SunOS 3.x and 4.2bsd. The
 * presence of subnets renders divisions into fixed
 * fields misleading at best. New code should use
 * only the s_addr field.
 */
struct in_addr {
union {
struct { u_char s_b1,s_b2,s_b3,s_b4; } S_un_b;
struct { u_short s_w1,s_w2; } S_un_w;
u_long S_addr;
} S_un;
#define s_addr S_un.S_addr /* should be used for all code */
};
21
Useful function calls
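in_addr_t inet_addr( const char* cp )
– Converts string with dotted address to 32 bit value, in network byte order (older headers declare the return type as unsigned long)
– Returns INADDR_NONE if the string is not a valid address
– Example: inet_addr("129.0.0.1")
– Typical use: socketAddress.sin_addr.s_addr = inet_addr( charIPAddress );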
char* inet_ntoa(struct in_addr in)
– Converts 32 bit value of IP address to a string in dotted decimal format.
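A sketch of how these two calls are typically combined on a POSIX system, assuming the declarations in `<arpa/inet.h>`. `make_sockaddr` is a hypothetical helper; `inet_addr`, `inet_ntoa`, `htons` and `ntohl` are the standard calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in an IPv4 socket address from a dotted
 * decimal string and a port. Returns 0 on success, -1 on a bad string. */
int make_sockaddr(const char *dotted, unsigned short port, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);                 /* port in network byte order */
    sa->sin_addr.s_addr = inet_addr(dotted);    /* already network byte order */
    /* INADDR_NONE is all 1s, so inet_addr cannot tell a bad string
     * apart from 255.255.255.255 */
    return sa->sin_addr.s_addr == INADDR_NONE ? -1 : 0;
}
```

`inet_ntoa(sa.sin_addr)` converts back to text; it returns a pointer to a static buffer that the next call overwrites.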
22
IP Addresses in Java
• Class java.net.InetAddress
static InetAddress getByName(String host)
– Creates new instance of InetAddress based on a string address
– String can either be a dotted decimal IP address (e.g. “129.0.0.1”), or a host name
static InetAddress getByAddress(byte[] address)
– Creates new instance of InetAddress based on bytes containing the 4 values for the IP address
String getHostAddress( )
– Returns the IP address as a dotted decimal string
byte[] getAddress( )
– Returns the raw IP address as an array of bytes
23
IP Address Classes
Class  Leading bits  Octets 1-4                                 Address range
A      0             prefix (octet 1), suffix (octets 2-4)      1.0.0.1 to 126.255.255.254
B      10            prefix (octets 1-2), suffix (octets 3-4)   128.0.0.1 to 191.255.255.254
C      110           prefix (octets 1-3), suffix (octet 4)      192.0.0.1 to 223.255.255.254
D      1110          multicast                                  224.0.0.0 to 239.255.255.255
E      1111          reserved for future use                    240.0.0.0 to 254.255.255.255
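Because the class is fixed by the leading bits, it can be found with a few masks. A minimal sketch, assuming a host-order `uint32_t`; `address_class` and `classful_prefix_len` are hypothetical names.

```c
#include <stdint.h>

/* Return the class ('A' to 'E') of a host-order IPv4 address,
 * decided by its leading bits: 0, 10, 110, 1110 or 1111. */
char address_class(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)           return 'A';  /* 0...    */
    if ((addr & 0xC0000000u) == 0x80000000u) return 'B';  /* 10...   */
    if ((addr & 0xE0000000u) == 0xC0000000u) return 'C';  /* 110...  */
    if ((addr & 0xF0000000u) == 0xE0000000u) return 'D';  /* 1110... */
    return 'E';                                           /* 1111... */
}

/* Prefix length in bits, counting the leading class bits:
 * 8, 16 or 24 for classes A, B, C; 0 for D and E (no prefix/suffix). */
int classful_prefix_len(uint32_t addr)
{
    switch (address_class(addr)) {
    case 'A': return 8;
    case 'B': return 16;
    case 'C': return 24;
    default:  return 0;
    }
}
```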
24
Special IP addresses
Prefix    Suffix    Type of address       Purpose
All 0s    All 0s    This computer         Used during rebooting
Network   All 0s    Network               Identifies a network
Network   All 1s    Directed broadcast    Broadcast on specified net
All 1s    All 1s    Limited broadcast     Broadcast on local net
127       Any       Loopback              Testing
25
Allocation of IP address classes
Class  Bits in prefix  Maximum number of networks  Bits in suffix  Maximum number of hosts per network
A      7               128                         24              16777216
B      14              16384                       16              65536
C      21              2097152                     8               256
26
CIDR addresses
• CIDR = Classless Inter-Domain Routing
• Created to allow more flexibility in subnet sizes; in particular, different values between 256 and 65536
• Notation: IP address / # bits in prefix
• Usage:
– Set up 32 bit mask with indicated number of 1 bits followed by 0 bits
– Logical AND with mask and IP address to get network prefix
27
CIDR Example
• Example: allocate 2 sub-networks that can hold 14 hosts each
• Prefix calculated by logical AND:
• Network 1: 128.211.0.16 / 28 ← 28 bits in prefix
• Network 2: 128.211.0.32 / 28
• Mask is: 11111111 11111111 11111111 11110000
• Net 1: 10000000 11010011 00000000 0001––––
– Allows IP addresses 128.211.0.17 through 128.211.0.30, since the suffix cannot be all 0s or all 1s.
• Net 2: 10000000 11010011 00000000 0010––––
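The mask-and-AND procedure can be sketched in C. This assumes host-order `uint32_t` addresses and a prefix of at most 30 bits (so that a usable host range exists); `cidr_mask`, `cidr_describe` and `struct cidr_block` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with prefix_len leading 1 bits followed by 0 bits. */
uint32_t cidr_mask(int prefix_len)
{
    assert(prefix_len >= 0 && prefix_len <= 32);
    /* shifting a 32-bit value by 32 is undefined, so /0 is special-cased */
    return prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
}

struct cidr_block {
    uint32_t network;     /* suffix all 0s */
    uint32_t broadcast;   /* suffix all 1s (directed broadcast) */
    uint32_t first_host;  /* lowest usable host address */
    uint32_t last_host;   /* highest usable host address */
};

/* Describe the block containing addr; meaningful for prefix_len <= 30. */
struct cidr_block cidr_describe(uint32_t addr, int prefix_len)
{
    uint32_t mask = cidr_mask(prefix_len);
    struct cidr_block b;
    b.network = addr & mask;           /* logical AND gives the prefix */
    b.broadcast = b.network | ~mask;   /* set every suffix bit */
    b.first_host = b.network + 1;
    b.last_host = b.broadcast - 1;
    return b;
}
```

With 128.211.0.17/28 (0x80D30011) this yields network 128.211.0.16 and hosts .17 through .30, matching Net 1; any address in Net 2 yields hosts 128.211.0.33 through .46.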
28
Routers and IP addressing
• IP address depends on network address
• What about routers - connected to two networks?
• IP address specifies an interface, or network attachment point, not a computer
• Router has multiple IP addresses - one for each interface
[Figure: a router attached to a Token Ring (223.240.129.0), an Ethernet (131.108.0.0) and a WAN (76.0.0.0); interface addresses shown include 131.108.99.5, 223.240.129.2, 223.240.129.17 and 76.0.0.17]
29
IP – Internet Protocol
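Header layout, one 32-bit word per line (bit positions in parentheses):
Version (0-3) | IHL (4-7) | Service type (8-15) | Total length (16-31)
Identification (0-15) | Flags (16-18) | Fragment offset (19-31)
Time to live (0-7) | Protocol (8-15) | Header checksum (16-31)
Source address (0-31)
Destination address (0-31)
Options (variable length)
Data: up to 65,515 octets (with a 20-octet header)
Maximum packet size: 65,535 octets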
30
IP protocol fields
• Definition: RFC 791, plus subsequent additions
• Version: version number of protocol (currently 4; version 6 also standardized)
• Internet Header Length (IHL): number of 32-bit words in header
– Minimum value: 5 (which indicates no options)
– Larger values used when options are present.
31
IP protocol fields
• Type of service:
– Specifies precedence (bits 0-2), delay (bit 3), throughput (bit 4), and reliability (bit 5) parameters
– 0 bit = normal, 1 bit = exceptional
• Total length: length of packet in octets
• Identification: identifies the datagram; all fragments of the same datagram carry the same value
• Flags (3):
– More: indicates packet is a fragment, with more to come
– Don’t fragment: prohibits fragmentation
– (Reserved for future use)
32
IP Protocol Fields
• Fragment offset: indicates where this fragment’s data belongs in the original datagram, measured in 64-bit (8-octet) units
– Note that this requires fragmentation to happen at 64-bit boundaries (except for the last fragment)
• Time to live: specifies, in seconds, the time remaining before this packet expires
– Every router must decrease this value by at least one.
• Protocol: indicates the protocol at the next higher level
– Current list: http://www.iana.org/assignments/protocol-numbers
– Examples:
– 1: ICMP Internet Control Message Protocol
– 6: TCP Transmission Control Protocol
– 17: UDP User Datagram Protocol
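To make the field layout concrete, here is a sketch of decoding the fixed 20-octet header from raw bytes. `struct ipv4_fields` and `parse_ipv4` are hypothetical names; options are not decoded and the checksum is not verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Selected IPv4 header fields, converted to host byte order. */
struct ipv4_fields {
    unsigned version;        /* 4 */
    unsigned ihl;            /* header length in 32-bit words */
    unsigned tos;            /* service type */
    unsigned total_length;   /* octets, header plus data */
    unsigned id;             /* identification */
    unsigned dont_fragment;  /* flag bit 1 */
    unsigned more_fragments; /* flag bit 2 */
    unsigned frag_offset;    /* in 8-octet units */
    unsigned ttl;
    unsigned protocol;       /* 1 = ICMP, 6 = TCP, 17 = UDP */
};

/* Returns 0 on success, -1 if the buffer is short or IHL is invalid. */
int parse_ipv4(const uint8_t *p, size_t len, struct ipv4_fields *f)
{
    if (len < 20)
        return -1;
    f->version = p[0] >> 4;
    f->ihl = p[0] & 0x0F;
    f->tos = p[1];
    f->total_length = ((unsigned)p[2] << 8) | p[3];
    f->id = ((unsigned)p[4] << 8) | p[5];
    /* byte 6: reserved bit, DF, MF, then the top 5 bits of the offset */
    f->dont_fragment = (p[6] >> 6) & 1;
    f->more_fragments = (p[6] >> 5) & 1;
    f->frag_offset = (((unsigned)p[6] & 0x1F) << 8) | p[7];
    f->ttl = p[8];
    f->protocol = p[9];
    if (f->ihl < 5 || (size_t)f->ihl * 4 > len)
        return -1;
    return 0;
}
```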
33
IP Protocol Fields
• Header checksum:
– 16-bit ones’ complement of the ones’ complement sum of all 16-bit words in the header
– Checksum field is set to zero before computation
– Re-computed at each router, because some fields, such as time-to-live, change as the message travels through the network
• Source address: 32 bit IP address
• Destination address: 32 bit IP address
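The checksum computation can be sketched as follows, in the style described in RFC 1071. `inet_checksum` is a hypothetical name. A receiver can run the same function over the header with the checksum in place; a correct header gives 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement of the ones' complement sum of 16-bit big-endian
 * words. The 32-bit accumulator cannot overflow for any IP header. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                   /* odd final byte is padded with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)              /* fold carries back into low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For a header beginning 45 00 00 73 with the checksum field zeroed, as in the test below, the result is 0xB861.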
34
IP options
• Defined in RFC 791 and others
• Examples:
– Secure packet
– Routing information provided
– Record route
– Record time stamps
– Stream identifier
35
IP upper level interface
• Two service primitives: send and receive (recv)
Result = SEND(src,dst,prot,TOS,TTL,BufPTR,len,Id,DF,opt)
Result = RECV(BufPTR,prot,&src,&dst,&TOS,&len,&opt)
where:
– src = source address
– dst = destination address
– prot = protocol
– TOS = type of service
– TTL = time to live
– BufPTR = buffer pointer
– len = length of buffer
– Id = Identifier
– DF = Don't Fragment
– opt = option data
36
Internet Control Message Protocol (ICMP)
• Defined in RFC 792, plus updates
• Required for internet compliance
• Carried in IP packets
• ICMP messages are often sent in response to an IP packet
Message layout (bit positions in parentheses):
Type (0-7) | Code (8-15) | Checksum (16-31)
Parameters (32 bits)
Message content: variable length
37
ICMP message types
8: Echo
0: Echo reply
– Asks for return of this message for testing
– Parameters: identifier, sequence number
3: Destination unreachable
– Code indicates particular condition:
0: net unreachable
1: host unreachable
2: protocol unreachable
3: port unreachable
4: fragmentation required; don’t fragment flag set
5: source route failure
– Data: original IP header, plus first 64 bits of data
38
ICMP message types
4: Source quench
– Request to slow sending rate of IP packets
– Data: as in destination unreachable
5: Redirect
– Used to indicate a shorter routing path
– Parameters: IP address of suggested router
11: Time exceeded
– Time to live counter of IP packet reached zero
– Data: as in destination unreachable
12: Parameter problem
– Indicates problems with an IP message (usually bad option format)
– Data: as in destination unreachable
39
ICMP message types
13: Timestamp
– Sends message that records sending time, and asks for reply
– Data: sending time, reception time (to be filled in), reply sending time (to be filled in)
14: Timestamp reply
– Reply to timestamp request
– Data: values filled in from ICMP 13 message
17: Address mask request
– Host asks router on LAN for CIDR address mask (usually at reboot)
18: Address mask reply
– Reply to address mask request
– Data: the address mask
40
Network administration functions that use ICMP
• Ping: test if a host will respond
– Sends an ICMP echo message to designated host
– Host sends ICMP echo reply
– Used to test connectivity
– Many organizations have disabled ping to prevent denial-of-service attacks
• Traceroute: find route from source to destination
– Sends IP packet with time-to-live of 1
– First router will discard packet and send ICMP time exceeded message
– Next message sent has time-to-live of 2, and so on until destination is reached
– Each router en route will have sent an ICMP message
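The echo message that ping sends can be sketched as follows, assuming the ICMP layout given earlier (type, code, checksum, then identifier and sequence number as the parameters). `build_echo_request` and `icmp_checksum` are hypothetical names. Only the message is built; transmitting it needs a raw socket, which usually requires administrator privileges.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones' complement checksum over the whole ICMP message. */
static uint16_t icmp_checksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Build an echo request (type 8, code 0) followed by plen payload
 * octets. Returns the message length, or 0 if buf is too small. */
size_t build_echo_request(uint8_t *buf, size_t buflen, uint16_t id,
                          uint16_t seq, const uint8_t *payload, size_t plen)
{
    if (buflen < 8 + plen)
        return 0;
    buf[0] = 8;                          /* type: echo */
    buf[1] = 0;                          /* code */
    buf[2] = buf[3] = 0;                 /* checksum zeroed before computing */
    buf[4] = (uint8_t)(id >> 8);
    buf[5] = (uint8_t)(id & 0xFF);       /* identifier */
    buf[6] = (uint8_t)(seq >> 8);
    buf[7] = (uint8_t)(seq & 0xFF);      /* sequence number */
    if (plen > 0)
        memcpy(buf + 8, payload, plen);
    uint16_t c = icmp_checksum(buf, 8 + plen);
    buf[2] = (uint8_t)(c >> 8);
    buf[3] = (uint8_t)(c & 0xFF);
    return 8 + plen;
}
```

The echo reply (type 0) carries back the same identifier, sequence number and data, which lets ping match replies to requests.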
41
Mapping IP addresses
• Problem: How to map IP addresses onto hardware?
– Address resolution
• Where this takes place: router attached to physical network.
• Three methods used to resolve addresses:
– Table lookup
– “Computation”
– Message exchange
42
Resolution using Table Lookup
• Router keeps table.
• The following could be a table for network 197.15.3.0 / 24
• To save space and time, only the host value of the IP address would be stored.
IP address (32 bits) Hardware address (48 bits)
197.15.3.2 0A:07:4B:12:82:36
197.15.3.3 0A:9C:28:71:32:8D
197.15.3.4 0A:11:C3:68:01:99
197.15.3.5 0A:74:59:32:CC:1F
197.15.3.6 0A:04:BC:00:03:28
197.15.3.7 0A:77:81:0E:52:FA
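A sketch of such a table in C, assuming the /24 network above so that the host octet alone can index the table. `hw_table` and `lookup_hw` are hypothetical names, and an all-zero entry is taken to mean "no entry".

```c
#include <stddef.h>
#include <stdint.h>

#define NET_PREFIX 0xC50F0300u   /* 197.15.3.0 */

/* Indexed by the host octet only; unlisted entries are all zeros. */
static const uint8_t hw_table[256][6] = {
    [2] = {0x0A, 0x07, 0x4B, 0x12, 0x82, 0x36},
    [3] = {0x0A, 0x9C, 0x28, 0x71, 0x32, 0x8D},
    [4] = {0x0A, 0x11, 0xC3, 0x68, 0x01, 0x99},
    [5] = {0x0A, 0x74, 0x59, 0x32, 0xCC, 0x1F},
    [6] = {0x0A, 0x04, 0xBC, 0x00, 0x03, 0x28},
    [7] = {0x0A, 0x77, 0x81, 0x0E, 0x52, 0xFA},
};

/* Return the 48-bit hardware address for a host-order IPv4 address,
 * or NULL if it is off this network or has no entry. */
const uint8_t *lookup_hw(uint32_t ip)
{
    if ((ip & 0xFFFFFF00u) != NET_PREFIX)
        return NULL;
    const uint8_t *hw = hw_table[ip & 0xFF];   /* index by host part */
    for (int i = 0; i < 6; i++)
        if (hw[i] != 0)
            return hw;                         /* entry present */
    return NULL;                               /* all zeros: no entry */
}
```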
43
Resolution using Computation
• If hardware addresses are configurable, they can be assigned to correspond with the host part of their IP address
– Example:
– host with IP address 229.123.1.1 is assigned hardware address 1;
– host with IP address 229.123.1.2 is assigned hardware address 2;
– … and so on.
• Computation: logical AND with value 000000FF:
hardware_address = ip_address & 0xff
44
Resolution using Message Exchange
• Example: Ethernet Address Resolution Protocol (ARP)
– See RFC 826
• Router sends broadcast ARP message to LAN to query hosts as to who matches the IP address
– Only the host with the matching IP address replies directly to router
– Router then has hardware address
45
ARP message format
• There is a generic format in RFC 826
• The following is specific for Ethernet: 32 bit protocol (P) addresses and 48 bit hardware (H) addresses
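Layout, one 32-bit word per line (bit positions in parentheses; 28 octets in total):
Hardware address type: 0001 (0-15) | Protocol address type: 0800 (16-31)
H. addr. length (0-7) | P. addr. length (8-15) | Operation (16-31)
Sender’s hardware address, part 1 (0-31)
Sender’s H. address pt. 2 (0-15) | Sender’s P. address pt. 1 (16-31)
Sender’s P. address pt. 2 (0-15) | Target H. address pt. 1 (16-31)
Target hardware address, part 2 (0-31)
Target protocol address (0-31)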
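Filling in this layout for an Ethernet/IPv4 request can be sketched as follows. `build_arp_request` is a hypothetical name; the target hardware address is the unknown being asked for, so it is left as zeros here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a 28-octet ARP request in buf, fields in the order shown above. */
size_t build_arp_request(uint8_t buf[28], const uint8_t sender_hw[6],
                         const uint8_t sender_ip[4], const uint8_t target_ip[4])
{
    buf[0] = 0x00; buf[1] = 0x01;   /* hardware address type: Ethernet */
    buf[2] = 0x08; buf[3] = 0x00;   /* protocol address type: IPv4 */
    buf[4] = 6;                     /* H. addr. length (octets) */
    buf[5] = 4;                     /* P. addr. length (octets) */
    buf[6] = 0x00; buf[7] = 0x01;   /* operation: 1 = request, 2 = reply */
    memcpy(buf + 8, sender_hw, 6);  /* sender's hardware address */
    memcpy(buf + 14, sender_ip, 4); /* sender's protocol address */
    memset(buf + 18, 0, 6);         /* target hardware address: unknown */
    memcpy(buf + 24, target_ip, 4); /* target protocol address */
    return 28;
}
```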
46
Transmission of ARP messages
Ethernet frame (field sizes in octets):
Preamble (7) | SFD (1) | Dest. addr. (6) | Source addr. (6) | Frame type: 0806 (2) | Data: ARP packet (28) + padding (18), 46-1500 | CRC (4)
47
IP Fragmentation and Reassembly
• Construction of an IP packet requires obeying maximum frame sizes at each data link layer
– MTU: maximum transmission unit
– Example: an IP packet carried inside an Ethernet frame (see next slide) can have, at most, 1480 octets of user data + 20 octets of IP header = 1500 octets
• RFC 791 says every part of the internet must have an MTU of at least 68 octets
– Any host must be able to receive 576 octets (possibly in fragments)
• If the IP “don’t fragment” flag is set, and there is more data than the MTU allows, a router will discard the IP packet and send an ICMP destination unreachable message (code 4: fragmentation required).
• Otherwise, router has to separate user data into fragments of allowable size.
• Fragmentation can be done at any router; reassembly is only done at final destination.
48
Example of MTU: Ethernet frames
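[Figure: Ethernet frame: preamble (7), SFD (1), dest. addr. (6), source addr. (6), frame type 0800 (2), data (46-1500), CRC (4) octets. The data field carries an IP packet of at most 1500 octets (= MTU): 12 octets of header fields, source addr. (4), dest. addr. (4), then layer 4 data of up to 1480 octets]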
49
Example of Fragmented Data
User data: 2276 octets
With an MTU of 1500, this could be sent as:
– Fragment 1: IP header (20 octets), TL=1500, FO=0, more=1; user data: 1480 octets
– Fragment 2: IP header (20 octets), TL=816, FO=185, more=0; user data: 796 octets
TL = total length, FO = fragment offset (in 8-octet/64-bit units)
50
IP Fragmentation
• The fragment offset is used instead of a “fragment sequence number” because this allows for further fragmentation at a subsequent router
• Example: the two fragments above (TL=1500, FO=0, more=1, 1480 octets; TL=816, FO=185, more=0, 796 octets) reach a network with MTU = 820:
– TL=820, FO=0, more=1, 800 octets
– TL=700, FO=100, more=1, 680 octets
– TL=816, FO=185, more=0, 796 octets (already fits, so unchanged)
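Both examples follow one rule: every fragment except the last carries the largest multiple of 8 octets that fits the MTU. A sketch under that assumption, with a fixed header length and no options copied; `struct fragment` and `fragment_data` are hypothetical names.

```c
#include <stddef.h>

struct fragment {
    unsigned tl;        /* total length: header + data, octets */
    unsigned fo;        /* fragment offset, 8-octet units */
    unsigned more;      /* "more fragments" flag */
    unsigned data_len;  /* user data carried, octets */
};

/* Split data_len octets, which start at offset start_fo and carry the
 * "more" flag orig_more, into fragments fitting mtu. Returns the count
 * written to out (at most max), or 0 if the MTU cannot carry data. */
size_t fragment_data(unsigned data_len, unsigned start_fo, unsigned orig_more,
                     unsigned mtu, unsigned hdr_len,
                     struct fragment *out, size_t max)
{
    if (mtu < hdr_len + 8)
        return 0;
    unsigned per_frag = ((mtu - hdr_len) / 8) * 8;  /* multiple of 8 */
    unsigned done = 0;
    size_t n = 0;
    while (done < data_len && n < max) {
        unsigned len = data_len - done;
        int last = len <= per_frag;
        if (!last)
            len = per_frag;
        out[n].data_len = len;
        out[n].tl = hdr_len + len;
        out[n].fo = start_fo + done / 8;
        out[n].more = last ? orig_more : 1;  /* last piece keeps old flag */
        done += len;
        n++;
    }
    return n;
}
```

Passing the original offset and "more" flag is what lets a router re-fragment a fragment: the 1480-octet piece at MTU 820 becomes FO=0 and FO=100, both with more=1.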
51
Reassembly
• Reassembly is only done at the destination
– i.e. host with IP address in destination field
• Fragments are reassembled based on matching source address, destination address, identification field, and protocol
• A reassembly timer is often used as the holding time for resources while waiting for all fragments
– Timer started when first fragment arrives.
– Timer cancelled when contiguous data from fragment offset 0 to a fragment whose ‘more’ flag is 0 has arrived.
– If timer expires, buffer is released and fragments are trashed (and ICMP “time exceeded” message returned).
• Alternative: use ‘Time to live’ field of first fragment
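The completion test that cancels the timer can be sketched as follows, assuming fragments may arrive in any order and without overlap handling. `struct frag` and `reassembly_complete` are hypothetical names; the repeated scan is O(n²), chosen for clarity over a sorted list.

```c
#include <stddef.h>

struct frag {
    unsigned fo;        /* fragment offset, 8-octet units */
    unsigned data_len;  /* user data in this fragment, octets */
    unsigned more;      /* "more fragments" flag */
};

/* 1 if the fragments cover offset 0 contiguously up to the end of the
 * fragment whose "more" flag is 0; otherwise 0. */
int reassembly_complete(const struct frag *f, size_t n)
{
    unsigned total = 0;
    int have_last = 0;
    for (size_t i = 0; i < n; i++)
        if (!f[i].more) {                  /* last fragment fixes the size */
            total = f[i].fo * 8 + f[i].data_len;
            have_last = 1;
        }
    if (!have_last)
        return 0;
    unsigned covered = 0;                  /* contiguous octets from 0 */
    int progress = 1;
    while (progress && covered < total) {
        progress = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned start = f[i].fo * 8;
            unsigned end = start + f[i].data_len;
            if (start <= covered && end > covered) {
                covered = end;             /* extend the contiguous run */
                progress = 1;
            }
        }
    }
    return covered >= total;
}
```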
52
IP Version 6 (IPv6)
• Defined in RFC 2460 and others
• Enhancements:
– 128 bit addresses
– Revised (incompatible) base header format
– Extension headers used for additional information
– Support for Quality of Service specification
– Extensibility
– Modifications to accommodate faster routing
53
IPv6 addresses
• IPv4 addresses have first 96 bits as 0 in IPv6
• New shorthand notation: colon hexadecimal
105.220.136.100.255.255.255.255.0.0.18.128.140.10.255.255
becomes
69DC:8864:FFFF:FFFF:0:1280:8C0A:FFFF
FF0C:0:0:0:0:0:0:B1
becomes
FF0C::B1
• In IPv6, an IP address is assigned to an interface, not a node
– One device can have 2 or more IPv6 addresses on the same network
– Intended to speed routing of packets
– Example: one address could be the “higher priority” interface.
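On a POSIX system the colon hexadecimal conversions are available as `inet_ntop` and `inet_pton`. A sketch, assuming those standard calls; `ipv6_format` and `ipv6_parse` are hypothetical wrappers. `inet_ntop` prints lowercase hex and applies the "::" shorthand to the longest run of zero groups, so the wrapper uppercases the result to match the notation used here.

```c
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Format 16 address bytes as colon hex in buf (at least
 * INET6_ADDRSTRLEN chars). Returns the text length, or 0 on error. */
size_t ipv6_format(const unsigned char addr[16], char *buf, size_t buflen)
{
    if (inet_ntop(AF_INET6, addr, buf, (socklen_t)buflen) == NULL)
        return 0;
    for (char *p = buf; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return strlen(buf);
}

/* Parse colon hex text, with or without "::", into 16 bytes; 1 on success. */
int ipv6_parse(const char *text, unsigned char addr[16])
{
    return inet_pton(AF_INET6, text, addr) == 1;
}
```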
54
IPv6 multiple headers
• Each extension header will identify its own length, as well as the type of extension header (“next header”) or data that follows.
[Figure: IPv6 base header (40 octets), followed by optional extension headers 1 to N, then data]
55
IPv6 Base Header
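Header layout, one 32-bit word per line except the addresses (bit positions in parentheses):
Version (0-3) | Traffic class (4-11) | Flow label (12-31)
Payload length (0-15) | Next header (16-23) | Hop limit (24-31)
Source address (128 bits)
Destination address (128 bits)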
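A sketch of decoding this 40-octet base header from raw bytes. `struct ipv6_base` and `parse_ipv6_base` are hypothetical names; note that the traffic class and flow label straddle byte boundaries.

```c
#include <stddef.h>
#include <stdint.h>

struct ipv6_base {
    unsigned version;         /* 6 */
    unsigned traffic_class;   /* 8 bits */
    uint32_t flow_label;      /* 20 bits */
    unsigned payload_length;  /* octets after the base header */
    unsigned next_header;     /* e.g. 0 = hop-by-hop, 6 = TCP, 17 = UDP */
    unsigned hop_limit;
    const uint8_t *src, *dst; /* point into the packet, 16 octets each */
};

/* Returns 0 on success, -1 if too short or the version is not 6. */
int parse_ipv6_base(const uint8_t *p, size_t len, struct ipv6_base *h)
{
    if (len < 40)
        return -1;
    h->version = p[0] >> 4;
    /* traffic class: low nibble of byte 0, high nibble of byte 1 */
    h->traffic_class = ((unsigned)(p[0] & 0x0F) << 4) | (p[1] >> 4);
    /* flow label: low nibble of byte 1 plus bytes 2 and 3 */
    h->flow_label = ((uint32_t)(p[1] & 0x0F) << 16) |
                    ((uint32_t)p[2] << 8) | p[3];
    h->payload_length = ((unsigned)p[4] << 8) | p[5];
    h->next_header = p[6];
    h->hop_limit = p[7];
    h->src = p + 8;
    h->dst = p + 24;
    return h->version == 6 ? 0 : -1;
}
```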
56
IPv6 base header fields (1)
• Version: 6
• Traffic class:
– Available for establishing classes or priorities for packet handling
– First 6 bits: differentiated services field
– Last 2 bits: explicit congestion notification (ECN, RFC 3168)
• Flow label: identifier for a sequence of packets from a single source, and with similar transmission requirements
– Example: one flow could identify a specific video transmission
57
IPv6 base header fields (2)
• Payload length (in octets):
– Length of all extension headers plus upper layer data
– Does not include the fixed header.
• Next header: identifies type of header following this header
– Could indicate upper level protocol, or IPv6 extension header
– Values are the protocol numbers defined in: http://www.iana.org/assignments/protocol-numbers
58
IPv6 base header fields (3)
• Hop limit: after visiting this many routers, packet will be discarded.
• Source, destination addresses
– Destination address may not be packet’s ultimate destination
– Available modes:
– Unicast: single destination
– Anycast: delivered to one member (typically the nearest) of a group of destinations
– Multicast: specific group of destinations
– IPv6 has no broadcast; multicast is used instead
59
Extension headers
• Recommended order of appearance:
– IPv6 base (required)
– Hop-by-hop options (next header = 0)
– Destination options (next header = 60): to be processed by first destination in IPv6 header, plus destinations in routing header
– Routing header (next header = 43)
– Fragment header (next header = 44)
– Authentication (next header = 51)
– Encapsulating security payload (next header = 50)
– Destination options (next header = 60): for packet’s final destination
– Upper layer protocol (next header = 6 for TCP, 17 for UDP, 58 for ICMPv6, 41 for IPv6 inside IPv6)
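Since each header names the one after it, a receiver finds the upper layer protocol by walking the chain. A simplified sketch, assuming only the header types listed above; `ipv6_upper_layer` is a hypothetical name. Options, routing and fragment headers use an 8-octet-unit length that excludes the first 8 octets, the fragment header is always 8 octets, and the authentication header counts 4-octet units minus 2. Anything past ESP is encrypted, so the walk stops there.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the upper layer protocol number and sets *offset to where its
 * data starts, or -1 if the chain is truncated or reaches ESP (50). */
int ipv6_upper_layer(const uint8_t *pkt, size_t len, size_t *offset)
{
    if (len < 40)
        return -1;
    int next = pkt[6];        /* next header field of the base header */
    size_t off = 40;          /* base header is 40 octets */
    for (;;) {
        size_t ext_len;
        switch (next) {
        case 0:               /* hop-by-hop options */
        case 43:              /* routing */
        case 60:              /* destination options */
            if (off + 2 > len)
                return -1;
            ext_len = ((size_t)pkt[off + 1] + 1) * 8;
            break;
        case 44:              /* fragment header: fixed size */
            ext_len = 8;
            break;
        case 51:              /* authentication header */
            if (off + 2 > len)
                return -1;
            ext_len = ((size_t)pkt[off + 1] + 2) * 4;
            break;
        case 50:              /* ESP: contents are encrypted */
            return -1;
        default:              /* not an extension header: upper layer */
            *offset = off;
            return next;
        }
        if (off + ext_len > len)
            return -1;
        next = pkt[off];      /* first octet of each extension: next header */
        off += ext_len;
    }
}
```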
60
Hop-by-Hop Options
• “Jumbo payload”: packet is larger than 65,535 octets
– Payload length in fixed header must be zero
– No fragment header
• “Router alert”: information should be examined by each router along the way
– Example: using a protocol such as the Resource reSerVation Protocol (RSVP) to set up quality of service parameters.
61
Fragmentation in IPv6
• An extension header, the “fragment header” contains the fragmentation information not contained in the base header
• All fragmentation in IPv6 must be done by original sender
– This means that the sender has to discover the minimum MTU for the entire path.
– Path MTU discovery: send packets sized for the first link; any router that cannot forward one returns an ICMPv6 “packet too big” message giving its MTU, and the sender retries with that smaller size
– IPv6 has the rule that networks must have an MTU of at least 1280 octets
62
Authentication Codes
• Message Authentication Code (MAC):
– carried in authentication header.
• Assume that sender A and receiver B have a shared secret key, KAB.
• MAC = f(KAB, M), where f is a mutually-agreed encryption function
• Receiving the correct MAC means:
– receiver knows that message is not altered.
– message is from correct sender
– sequence of message is correct
63
Congestion
• Congestion occurs when the number of packets being transmitted through the network approaches the packet handling capacity of the network
• Congestion control aims to keep number of packets below level at which performance falls off dramatically
• Data network is a network of queues
• Generally 80% utilization is critical
• Finite queues mean data may be lost
65
Router Packet Handling
• Packets arriving are stored at input buffers
• Routing decision made
• Packet moves to output buffer
• Packets queued for output transmitted as fast as possible
• If packets arrive too fast to be routed, or to be output, buffers will fill and overflow.
– Can discard packets
– Can use flow control
– Can propagate congestion through network
66
Congestion Principles
• Usually occurs at a point of transition to reduced throughput.
• Occurs when the higher capacity part of a system is currently carrying more traffic than the lower capacity part can handle.
• Difference from flow control:
– Flow control is one sender agreeing not to overflow one receiver at the endpoints of a transmission
– Congestion is usually caused by multiple senders, and occurs at an intermediate point in the network
– This makes congestion more difficult to detect, and to alleviate.
67
Implicit Congestion Detection
• What are the signs of congestion?
– Increased transmission time
– Packets spend more time in queues that are longer: delay increases
– Disappearance of packets
– On a fibre-based network (or ones with data link error control), disappearance of packets can be interpreted as a sign of congestion.
– Sending timers (at transport layer) start expiring.
69
Idealized Performance
• Network can accept load up to its capacity
• Additional load will be delivered at capacity throughput rates.
– Packets are queued up at intermediate points
70
Idealized Performance: Throughput
[Figure: normalized throughput (0 to 1.2) plotted against normalized load (0 to 2)]
73
Practical Performance
• Ideal assumes infinite buffers and no overhead
• Buffers are finite
• Overheads occur in exchanging congestion control messages
74
The Congestion Control Paradox
• When congestion occurs, the problem is that there are too many packets in the network
• If packets are trashed, senders will likely resend them, along with new packets.
– Result: increased congestion
• If one node sends out messages to announce it is congested, then it increases the number of extra overhead packets in the network.
– Result: increased congestion
• If one node asks its neighbours to slow down, then the output queues of the neighbouring nodes will start filling up.
– Result: increased congestion
75
Congestion Control
• Implicit
– No action taken
– It is assumed senders will notice evidence of congestion and deal with it themselves.
– What can senders do?
– Slow rate of packet sending
– Increase timeout length for sent packets
• Explicit
– Various mechanisms to announce or alleviate congestion, taken by intermediate network nodes.
76
Implicit Congestion Signaling
• Transmission delay may increase with congestion
• Packet may be discarded
• Source can detect these as implicit indications of congestion
• Useful on connectionless (datagram) networks
– Example: IP leaves congestion (and flow) control to upper layer (normally TCP).
• Used in frame relay LAPF
77
Explicit Congestion Signaling
• Network alerts end systems of increasing congestion
• End systems take steps to reduce offered load
• Backwards
– Congestion avoidance in opposite direction to packet required
• Forwards
– Congestion avoidance in same direction as packet required
78
Backpressure
• If node becomes congested it can slow down or halt flow of packets from other nodes
• May mean that other nodes have to apply control on incoming packet rates
• Propagates back to source
• Can restrict to logical connections generating most traffic
• Used in connection-oriented networks that allow hop-by-hop congestion control (e.g. X.25)
• Not used in ATM nor frame relay
• Only recently developed for IP
79
Choke Packet
• Control packet
– Generated at congested node
– Sent to source node
– e.g. ICMP source quench
– From router or destination
– Source cuts back until no more source quench message
– Sent for every discarded packet, or anticipated
• Rather crude mechanism
80
Categories of Explicit Signaling
• Binary
– A bit set in a packet indicates congestion
• Credit based
– Indicates how many packets source may send
– Common for end to end flow control
• Rate based
– Supply explicit data rate limit
– e.g. ATM
81
TCP Slow Start
[Figure: TCP slow start: congestion window (bytes, 0 to 45056) plotted against transmission number, with Threshold 1, a timeout, and Threshold 2 marked]
82
Rate-based Congestion Control
• Regulate rate at which sender can inject packets into network:
• A packet must match up with (and remove) a token before entering network.
• Tokens added to bucket at rate r.
• At most b tokens can accumulate in bucket; tokens overflow and are lost after that
– Bucket size b controls “burstiness”
• Max. number of packets entering network in [ t, t + δ ] is b + δr
[Figure: tokens arrive at a fixed rate into a “bucket” with storage for up to b tokens; packets in a waiting area each take a token before passing to the network]
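A minimal token bucket sketch, assuming a continuous clock in arbitrary time units and a bucket that starts full; `struct token_bucket`, `tb_init` and `tb_try_send` are hypothetical names. Refilling lazily on each attempt is equivalent to adding tokens at rate r, capped at b.

```c
#include <stdbool.h>

struct token_bucket {
    double rate;      /* r: tokens added per time unit */
    double capacity;  /* b: maximum tokens the bucket holds */
    double tokens;    /* tokens currently available */
    double last;      /* time of the last update */
};

void tb_init(struct token_bucket *tb, double r, double b, double now)
{
    tb->rate = r;
    tb->capacity = b;
    tb->tokens = b;   /* assumption: start with a full bucket */
    tb->last = now;
}

/* Try to let one packet enter the network at time now. */
bool tb_try_send(struct token_bucket *tb, double now)
{
    tb->tokens += (now - tb->last) * tb->rate;
    if (tb->tokens > tb->capacity)
        tb->tokens = tb->capacity;   /* excess tokens overflow and are lost */
    tb->last = now;
    if (tb->tokens >= 1.0) {
        tb->tokens -= 1.0;           /* packet takes a token and enters */
        return true;
    }
    return false;                    /* packet stays in the waiting area */
}
```

With r = 1 and b = 3, three packets can go at once (the burst b), after which one more gets through per time unit, consistent with the b + δr bound.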
83
Congestion Control in Packet Switched Networks
• Send control packet to some or all source nodes
– Requires additional traffic during congestion
• Rely on routing information
– May react too quickly
• End to end probe packets
– Adds to overhead
• Add congestion info to packets as they cross nodes
– Either backwards or forwards
84
Traffic Management
• Fairness
• Quality of service
– May want different treatment for different connections
– What is more critical: delay or loss?
• Reservations
– e.g. ATM (Asynchronous Transfer Mode)
– Traffic contract between user and network
85
Case Study: ATM Traffic Management
• ATM standards specify several service categories
• Network traffic is managed to achieve Quality of Service (QoS) goals
• For each of the service categories (on subsequent slides):
– What is the highest priority for QoS?
– Delay
– Loss
– What would be a congestion control / avoidance strategy?
86
ATM Service Categories
• Real time
– Constant bit rate (CBR)
– Real time variable bit rate (rt-VBR)
• Non-real time
– Non-real time variable bit rate (nrt-VBR)
– Available bit rate (ABR)
– Unspecified bit rate (UBR)
88
CBR: Constant Bit Rate
• Fixed data rate continuously available
• Tight upper bound on delay
• Uncompressed audio and video
– Video conferencing
– Interactive audio
– Audio / video distribution and retrieval
89
rt-VBR: Real-time Variable Bit Rate
• Time sensitive application
– Tightly constrained delay and delay variation
• rt-VBR applications transmit at a rate that varies with time
• Example: compressed video
– Produces varying sized image frames
– Original (uncompressed) frame rate constant
– So compressed data rate varies
• Can statistically multiplex connections
90
nrt-VBR: Non-real-time Variable Bit Rate
• May be able to characterize expected traffic flow
• Improve Quality of Service (QoS) in loss and delay
• End system specifies:
– Peak cell rate
– Sustainable or average rate
– Measure of how bursty traffic is
• e.g. Airline reservations, banking transactions
91
UBR: Unspecified Bit Rate
• May be additional capacity over and above that used by CBR and VBR traffic
– Not all resources dedicated
– Bursty nature of VBR
• For application that can tolerate some cell loss or variable delays
– e.g. TCP based traffic
• Cells forwarded on FIFO basis
• Best-effort service
92
ABR: Available Bit Rate
• Application specifies peak cell rate (PCR) and minimum cell rate (MCR)
• Resources allocated to give at least MCR
• Spare capacity shared among all ABR sources
• e.g. LAN interconnection
93
Asynchronous Transfer Mode (ATM)
• Properties of ATM:
– Small, fixed-sized packets, called “cells”
– ATM networks are connection-oriented: a connection must be set up at the start of a call
– Set up a “virtual channel” (VC) within a “virtual path” (VP)
– Subsequent cells will follow the same route to destination
– Control signaling on separate channel from user data
– Cell delivery is not guaranteed, but cell order is preserved
– Traffic management is taken into account when setting up a connection.
– High speed: data rates up to 622.08 Mbits / s
94
ATM Reference Model
[Figure: ATM reference model: plane management and layer management span the control plane and the user plane; each plane has upper layers over the ATM adaptation layer, the ATM layer, and the physical layer]
• ATM layer is approximately equivalent to the OSI network layer
95
Reference Model Layers
• Physical layer:
– Handles equivalent of OSI physical and data link layers
• ATM layer
– Deals with cells, and cell transport
– Defines cell layout, and header fields
– Establishment and release of virtual circuits
– Congestion control
• AAL: ATM adaptation layer
– Provides for transmission of packets larger than a cell.
– Various AAL protocols deal with different ATM service categories (CBR, etc.)
96
Reference Model Planes
• User plane
– Provides for user information transfer
• Control plane
– Call and connection control
• Management plane
– Plane management: whole-system functions
– Layer management: resources and parameters in protocol entities
97
ATM Connection Setup
• Performed in control plane: VP0, VC5
• ITU protocol Q.2931
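• Message sequence (calling host, network, called host):
– Setup: calling host sends Setup; network answers with Call proceeding and forwards Setup to the called host, which also answers with Call proceeding
– Connect: called host sends Connect; network forwards Connect to the calling host
– Connect ack: calling host acknowledges with Connect ack; network sends Connect ack to the called host
– Release: the side ending the call sends Release, which is forwarded; each Release is answered with Release complete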
98
ATM Cells
• Fixed size: 53 octets
– 5 octet header
– 48 octet information field
• Small cells reduce queuing delay for high priority cells
• Small cells can be switched more efficiently
• Easier to implement switching of small cells in hardware
99
ATM Cell Format
• Ordered transmission of 53 octet cells
• 5 octet header identifies the virtual path and virtual channel, which together form a “connection identifier”
VPI: virtual path identifier - used for routing
VCI: virtual channel identifier - identifies a transmission within the virtual path
PTI: payload type
CLP: cell loss priority
HEC: header error check
Cell layout, in transmission order:
VPI (12 bits) | VCI (16 bits) | PTI (3 bits) | CLP (1 bit) | HEC (8 bits) | upper level data (384 bits = 48 octets)
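The 5-octet header layout above can be packed and checked with a few shifts. This is a minimal sketch with hypothetical function names; it assumes the HEC is computed as in ITU-T I.432 (a CRC-8 with generator x^8 + x^2 + x + 1 over the first four header octets, XORed with the pattern 01010101), a detail the slide does not give.

```python
def hec(first_four: bytes) -> int:
    """HEC octet: CRC-8 (generator x^8 + x^2 + x + 1) XORed with 0x55 (assumed, per I.432)."""
    crc = 0
    for byte in first_four:
        crc ^= byte
        for _ in range(8):
            # Shift left; when a 1 falls off the top, XOR in the generator 0x07.
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55


def pack_nni_header(vpi: int, vci: int, pti: int, clp: int) -> bytes:
    """Build a 5-octet header: VPI (12) | VCI (16) | PTI (3) | CLP (1) | HEC (8)."""
    if not (0 <= vpi < 1 << 12 and 0 <= vci < 1 << 16 and 0 <= pti < 8 and clp in (0, 1)):
        raise ValueError("field out of range")
    word = (vpi << 20) | (vci << 4) | (pti << 1) | clp   # first 32 header bits
    first_four = word.to_bytes(4, "big")
    return first_four + bytes([hec(first_four)])


def unpack_nni_header(header: bytes) -> dict:
    """Split a 5-octet header back into its fields and verify the HEC."""
    word = int.from_bytes(header[:4], "big")
    return {
        "vpi": word >> 20,
        "vci": (word >> 4) & 0xFFFF,
        "pti": (word >> 1) & 0x7,
        "clp": word & 0x1,
        "hec_ok": hec(header[:4]) == header[4],
    }


print(pack_nni_header(vpi=0, vci=0, pti=0, clp=1).hex())  # idle-cell header: 0000000152
```

For the UNI format, the top 4 bits would hold GFC instead and the VPI would shrink to 8 bits; only the shift amounts change.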
100
User – Network Interface (UNI) cell
• First 4 bits of the header (the high-order VPI bits in the network-to-network format) are used as a generic flow control field for a cell entering the network
• GFC is overwritten by the first switch
GFC: generic flow control
GFC (4 bits) | VPI (8 bits) | VCI (16 bits) | PTI (3 bits) | CLP (1 bit) | HEC (8 bits) | upper level data (384 bits = 48 octets)
101
ATM payload type field
• Three bits:
0 0 0: User data cell type 0, no congestion
0 0 1: User data cell type 1, no congestion
0 1 0: User data cell type 0, congestion
0 1 1: User data cell type 1, congestion
1 0 0: Operation / administration / maintenance (OAM) message, this hop
1 0 1: OAM message, end to end
1 1 0: Resource management cell
1 1 1: Reserved for future use
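Because the first bit separates user data cells from management cells, and for user data the middle bit carries the congestion indication, the field decodes with two bit tests. A small sketch; the table is copied from the list above and the function name is hypothetical.

```python
# The eight payload type codes, as listed above.
PTI_MEANING = {
    0b000: "User data cell type 0, no congestion",
    0b001: "User data cell type 1, no congestion",
    0b010: "User data cell type 0, congestion",
    0b011: "User data cell type 1, congestion",
    0b100: "OAM message, this hop",
    0b101: "OAM message, end to end",
    0b110: "Resource management cell",
    0b111: "Reserved for future use",
}


def decode_pti(pti: int) -> dict:
    """Decode a 3-bit PTI value into its meaning and the bits it carries."""
    if pti not in PTI_MEANING:
        raise ValueError("PTI must be a 3-bit value")
    user_data = (pti & 0b100) == 0        # leading 0 marks a user data cell
    return {
        "meaning": PTI_MEANING[pti],
        "user_data": user_data,
        # For user data cells, the middle bit signals congestion on the path.
        "congestion": user_data and bool(pti & 0b010),
        # For user data cells, the last bit gives the cell type (0 or 1).
        "cell_type": pti & 0b001 if user_data else None,
    }


print(decode_pti(0b011))
```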
102
ATM Traffic Management
• High speed, small cell size, limited overhead bits
• Still evolving
• Requirements
– Majority of traffic not amenable to flow control
– Feedback slow due to reduced transmission time compared with propagation delay
– Wide range of application demands
– Different traffic patterns
– Different network services
– High speed switching and transmission increases volatility
103
Latency/Speed Effects
• ATM 622.08 Mbps
• $\approx 6.8 \times 10^{-7}$ seconds to insert a single 424-bit cell
• Time to traverse network depends on propagation delay and switching delay
• Assume propagation at two-thirds the speed of light
• If source and destination are on opposite sides of Canada, propagation time $\approx 2.75 \times 10^{-2}$ seconds
• Given implicit congestion control, the source keeps sending while news of a dropped cell travels back: in one propagation time alone, $622.08 \times 10^{6} \times 2.75 \times 10^{-2} \approx 1.7 \times 10^{7}$ bits have been transmitted
• So, this is not a good strategy for ATM
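The figures on this slide can be rechecked in a few lines. The rate and delay come from the slide; the speed of light value of 3 × 10^8 m/s is an assumed round number used only to recover the implied distance.

```python
# Figures from the slide; the speed of light (3e8 m/s) is an assumed round value.
CELL_BITS = 53 * 8          # one cell, header included: 424 bits
LINK_RATE = 622.08e6        # bits per second
ONE_WAY_DELAY = 2.75e-2     # seconds, coast to coast

insertion_time = CELL_BITS / LINK_RATE             # time to put one cell on the link
bits_in_flight = LINK_RATE * ONE_WAY_DELAY         # sent while a signal crosses once
cells_in_flight = bits_in_flight / CELL_BITS
distance_km = (2 / 3) * 3e8 * ONE_WAY_DELAY / 1000  # distance implied by 2/3 c

print(f"insertion time: {insertion_time:.2e} s")
print(f"bits sent during one-way delay: {bits_in_flight:.2e} (~{cells_in_flight:,.0f} cells)")
print(f"implied distance: {distance_km:,.0f} km")
```

The bits figure is about 40,000 cells; a feedback loop that needs a full round trip would double it.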
104
Cell Delay Variation
• For ATM voice/video, data is a stream of cells
• Delay across network must be short
• Rate of delivery must be constant
• There will always be some variation in transit
• Delay cell delivery to application so that constant bit rate can be maintained to application
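Delaying delivery in this way amounts to a playout buffer: each cell is released on a fixed schedule, a constant offset after the first arrival. A minimal sketch, assuming the receiver anchors the schedule to the first cell and knows the source's cell period; the function name and the offset value are illustrative, not from the slide.

```python
def playout_schedule(arrivals, period, offset):
    """Schedule constant-rate playout for a stream of cells.

    arrivals: arrival time of each cell at the receiver, in generation order
    period:   spacing at which the source generated the cells
    offset:   extra buffering delay added before the first cell is played
    Returns (playout_times, indices of cells that arrived too late to play).
    """
    start = arrivals[0] + offset
    playout, late = [], []
    for i, arrived in enumerate(arrivals):
        due = start + i * period      # equal spacing restores the source's constant rate
        playout.append(due)
        if arrived > due:
            late.append(i)            # missed its slot despite the buffer
    return playout, late


# Cells generated every 10 ms, arriving with variable network delay.
arrivals_ms = [0, 13, 21, 39, 42]
print(playout_schedule(arrivals_ms, period=10, offset=5))
print(playout_schedule(arrivals_ms, period=10, offset=10))
```

A larger offset absorbs more delay variation at the cost of added end-to-end delay, which is why the network's own contribution to variation matters.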
105
Network Contribution to Cell Delay Variation
• Packet switched networks in general
– Queuing delays
– Routing decision time
• ATM
– ATM protocol designed to minimize processing overheads at switches
– ATM switches have very high throughput
– Only noticeable delay is from congestion
– Must not accept load that causes congestion
106
Cell Delay Variation At The User-Network Interface
• Application produces data at fixed rate
• Processing at three layers of ATM causes delay
– Interleaving cells from different connections
– Operation and maintenance cell interleaving
– If using synchronous digital hierarchy frames, these are inserted at physical layer
– Cannot predict these delays
107
Traffic and Congestion Control Framework
• ATM layer traffic and congestion control should support QoS classes for all foreseeable network services
• Should not rely on AAL protocols that are network specific, nor higher level application specific protocols
• Should minimize network and end to end system complexity
108
Timings Considered
• Cell insertion time
• Round trip propagation time
• Connection duration
• Long term
• Determine whether a given new connection can be accommodated
• Agree performance parameters with subscriber
109
Traffic Management and Congestion Control Techniques
• Resource management using virtual paths
• Connection admission control
• Usage parameter control
• Selective cell discard
• Traffic shaping
– Use the token bucket scheme for rate-based congestion control.
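The slide names the token bucket scheme without detail. A minimal shaping sketch, assuming one token per cell, a bucket that starts full, and arrival times in sorted order: a burst of up to bucket_size cells leaves at once, after which cells are spaced 1/rate apart. The function and parameter names are illustrative.

```python
def shape(arrivals, rate, bucket_size):
    """Return departure times for cells passed through a token bucket shaper.

    Tokens accrue at `rate` per second up to `bucket_size`; each cell needs one.
    Cells that find no token are delayed (shaped), not dropped.
    """
    tokens = float(bucket_size)   # bucket starts full
    clock = 0.0                   # time at which `tokens` was last updated
    departures = []
    for arrival in arrivals:
        # FIFO: a cell cannot leave before it arrives or before the previous cell leaves.
        t = max(arrival, departures[-1]) if departures else arrival
        tokens = min(bucket_size, tokens + (t - clock) * rate)
        if tokens < 1:
            t += (1 - tokens) / rate  # wait until one whole token has accumulated
            tokens = 1.0
        tokens -= 1
        clock = t
        departures.append(t)
    return departures


print(shape([0, 0, 0, 0], rate=1, bucket_size=2))
```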
110
Resource Management Using Virtual Paths
• Separate traffic flow according to service characteristics
• User to user application
• User to network application
• Network to network application
• Concern with:
– Cell loss ratio
– Cell transfer delay
– Cell delay variation
111
Connection Admission Control
• First line of defense
• User specifies traffic characteristics for new connection by selecting a QoS
• Network accepts connection only if it can meet the demand
• Traffic contract
– Peak cell rate
– Cell delay variation
– Sustainable cell rate
– Burst tolerance
112
Usage Parameter Control
• Protection of network resources from overload by one connection
• Monitor connection to ensure traffic conforms to contract
– Monitor peak cell rate
– Measure cell delay variation
– Determine average cell rate
– Track burst sizes
• Discard cells that do not conform to traffic contract
– Called traffic policing
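The slide lists what usage parameter control monitors but not how. One standard answer, named here as a swapped-in technique, is the Generic Cell Rate Algorithm (GCRA) from the ATM Forum traffic management specification (also in ITU-T I.371). The sketch below is its “virtual scheduling” form; the class and parameter names are hypothetical.

```python
class GCRA:
    """Generic Cell Rate Algorithm GCRA(I, L), virtual scheduling form.

    increment (I): expected time between cells, i.e. 1 / contracted cell rate
    limit (L):     tolerance: how much earlier than expected a cell may arrive
    """

    def __init__(self, increment: float, limit: float):
        self.increment = increment
        self.limit = limit
        self.tat = None  # theoretical arrival time of the next cell

    def conforms(self, arrival: float) -> bool:
        if self.tat is None or self.tat < arrival:
            self.tat = arrival  # source was quiet: schedule restarts from now
        if self.tat > arrival + self.limit:
            return False        # too early: police it (discard, or tag CLP = 1); TAT unchanged
        self.tat += self.increment
        return True


policer = GCRA(increment=10, limit=5)
print([policer.conforms(t) for t in [0, 10, 12, 15, 40]])  # [True, True, False, True, True]
```

The cell at time 12 is rejected because it arrives 8 time units ahead of schedule, more than the tolerance of 5; a quiet period (as before time 40) lets the schedule restart.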
113
ATM-ABR Traffic Management
• Some applications (Web, file transfer) do not have well defined traffic characteristics
• Best efforts
– Allow these applications to share unused capacity
– If congestion builds, cells are dropped
• Closed loop control
– ABR connections share available capacity
– Share varies between minimum cell rate (MCR) and peak cell rate (PCR)
– ABR flow limited to available capacity by feedback
– Buffers absorb excess traffic during feedback delay
– Low cell loss
114
Feedback Mechanisms
• Transmission rate characteristics:
– Allowed cell rate
– Minimum cell rate
– Peak cell rate
– Initial cell rate
• Start with ACR=ICR
• Adjust ACR based on feedback from network, carried in resource management (RM) cells:
– Congestion indication bit
– No increase bit
– Explicit cell rate field
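How the three feedback fields might combine is sketched below. This is a simplified version of the source behaviour in the ATM Forum Traffic Management 4.0 specification as we understand it: congestion cuts ACR by a fraction RDF, otherwise (unless the no increase bit is set) ACR grows by RIF × PCR, and the explicit rate, PCR, and MCR bound the result. RIF, RDF, the function name, and the numbers in the usage lines are assumptions, not from the slide.

```python
def adjust_acr(acr, mcr, pcr, er, ci, ni, rif, rdf):
    """New allowed cell rate after a backward RM cell arrives.

    ci, ni:   congestion indication and no increase bits from the RM cell
    er:       explicit cell rate field from the RM cell
    rif, rdf: rate increase / decrease factors (assumed negotiated fractions)
    """
    if ci:
        acr -= acr * rdf          # congestion: multiplicative decrease
    elif not ni:
        acr += rif * pcr          # increase allowed: additive step, a fraction of PCR
    acr = min(acr, er, pcr)       # never exceed explicit rate or peak cell rate
    return max(acr, mcr)          # never drop below minimum cell rate


acr = 50.0                        # hypothetical ICR: start with ACR = ICR
for ci, ni in [(False, False), (False, False), (True, False), (False, True)]:
    acr = adjust_acr(acr, mcr=10, pcr=200, er=200, ci=ci, ni=ni, rif=0.125, rdf=0.5)
    print(acr)
```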
115
Routers
• The main function of a router is to decide how best to forward packets, based on their destination network address.
• Action: look up identifier in a routing table, and forward packets to appropriate outgoing link, or to upper layer if applicable.
116
Properties Desired for Routing
• Correctness: send packet “closer” to destination
• Simplicity: less error-prone, faster
• Robustness: ability to react to changes
• Stability: routing algorithms should converge to a stable state
• Fairness: guarantee that packets are not held up indefinitely
• Performance: speed, throughput
• Scalability: can deal with ever-increasing number of network nodes
• Security: filtering of malicious activity
117
Performance Criteria
• Used for selection of route
• Criterion is used to measure the “least cost” route
• Cost could be…
– Number of hops
– $ price of link
– Delay time
– Suitability for QoS requirements
119
Routing Decision Time and Place
• Time
– Datagram service: on arrival of each packet
– Virtual circuit service: at connection setup
• Place
– Distributed: made by each node
– Centralized
– Source: initial sender specifies route (e.g. IP option)
120
Network Information Source and Update Timing
• Routing decisions usually (but not always!) based on knowledge of network
• Distributed routing
– Nodes use local knowledge
– May collect information from adjacent nodes
– May collect information from all nodes on a potential route
• Central routing
– Collect information from all nodes
• Update timing
– When is network info held by nodes updated?
– Fixed routing: requires human intervention
– Adaptive: regular updates
122
Fixed Routing
• Single permanent route for each source to destination pair
• Determine routes using a least cost algorithm
• Route fixed, at least until a change in network topology
124
Central Routing Table (each entry is the next node on the route)
From \ To 1 2 3 4 5 6
1 – 2 3 4 4 4
2 1 – 3 4 4 4
3 1 5 – 5 5 5
4 2 2 5 – 5 5
5 4 2 3 4 – 6
6 5 5 5 5 5 –
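Because each entry names only the next node, a complete fixed route is recovered by following the table hop by hop. The sketch below copies the table above; the function name is hypothetical.

```python
# The central routing table above: NEXT[from_node][to_node] = next node on the route.
NEXT = {
    1: {2: 2, 3: 3, 4: 4, 5: 4, 6: 4},
    2: {1: 1, 3: 3, 4: 4, 5: 4, 6: 4},
    3: {1: 1, 2: 5, 4: 5, 5: 5, 6: 5},
    4: {1: 2, 2: 2, 3: 5, 5: 5, 6: 5},
    5: {1: 4, 2: 2, 3: 3, 4: 4, 6: 6},
    6: {1: 5, 2: 5, 3: 5, 4: 5, 5: 5},
}


def fixed_route(source, destination):
    """Follow next-node entries hop by hop until the destination is reached."""
    path = [source]
    while path[-1] != destination:
        path.append(NEXT[path[-1]][destination])
        if len(path) > len(NEXT):
            raise ValueError("routing loop in table")
    return path


print(fixed_route(1, 6), fixed_route(6, 1))
```

The two routes printed, [1, 4, 5, 6] and [6, 5, 4, 2, 1], show that a route and its reverse direction need not use the same nodes.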
125
Local Routing Tables
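Each node keeps its own row of the central table (destination → next node):
– Node 1: 2→2, 3→3, 4→4, 5→4, 6→4
– Node 2: 1→1, 3→3, 4→4, 5→4, 6→4
– Node 3: 1→1, 2→5, 4→5, 5→5, 6→5
– Node 4: 1→2, 2→2, 3→5, 5→5, 6→5
– Node 5: 1→4, 2→2, 3→3, 4→4, 6→6
– Node 6: 1→5, 2→5, 3→5, 4→5, 5→5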
126
Flooding
• No network info required
• Packet sent by node to every neighbor
• Incoming packets retransmitted on every link except incoming link
• Eventually a number of copies will arrive at destination
• Each packet is uniquely numbered so duplicates can be discarded
• Nodes can remember packets already forwarded to keep network load in bounds
• Can include a hop count in packets
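The rules above (send on every link except the incoming one, discard duplicates, stop at a hop limit) can be simulated with a breadth-first queue. A minimal sketch on a hypothetical six-node topology; processing copies in hop order assumes every hop takes the same time, so the first copy to reach each node has followed a minimum-hop route.

```python
from collections import deque

# Hypothetical six-node network: node -> list of neighbours.
ADJ = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1, 2, 5, 6],
       4: [1, 2, 5], 5: [3, 4, 6], 6: [3, 5]}


def flood(adjacency, source, max_hops):
    """Flood one packet from source.

    Returns ({node: hops taken by the first copy to arrive}, total link transmissions).
    Each node forwards only the first copy it sees (later copies are duplicates and
    are discarded) and never forwards a copy whose hop count has reached max_hops.
    """
    first_seen = {source: 0}
    transmissions = 0
    queue = deque([(source, None, 0)])       # (node holding a copy, node it came from, hops)
    while queue:
        node, came_from, hops = queue.popleft()
        if hops == max_hops:
            continue                          # hop limit reached: stop here
        for neighbour in adjacency[node]:
            if neighbour == came_from:
                continue                      # never send back on the incoming link
            transmissions += 1
            if neighbour not in first_seen:   # first copy: record it and forward later
                first_seen[neighbour] = hops + 1
                queue.append((neighbour, node, hops + 1))
    return first_seen, transmissions


print(flood(ADJ, source=1, max_hops=6))
```

The transmission count includes the duplicate copies that receivers discard, which is the load cost of flooding.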
127
Flooding Example
[Figure: a packet flooded from node 1 through a six-node network (nodes 1 to 6), with the copies sent on each link labelled]
128
• Once more, but with routing tables…
– Assume packets carry a hop count for each node.
• Note: due to space limitations, the routing table for node 4 will not appear.
129
[Figure: the same flooding example, annotated with each node's routing table (destination, next hop, hop count); node 4's table is omitted]
130
Properties of Flooding
• All possible routes are tried
– Very robust
• At least one packet will have taken minimum hop count route
– Can be used to set up virtual circuit
• All nodes are visited
– Useful to distribute information
131
Random Routing
• Node selects one outgoing path for retransmission of incoming packet
• Selection can be random or round robin
• Can select outgoing path based on probability calculation
• No network info needed
• Route is typically neither least cost nor minimum hop
132
Adaptive Routing
• Used by almost all packet switching networks
• Routing decisions change as conditions on the network change
– Failure
– Congestion
• Requires info about network
• Decisions more complex
• Tradeoff between quality of network info and overhead
– Reacting too quickly can cause oscillation
– Reacting too slowly means decisions are based on out-of-date information
133
Adaptive Routing
• Two factors used to make decision:
– Sending the packet in “generally” the right direction.
– Minimizing congestion
• Instead of having one entry in routing table for a destination, keep a list of alternative links.
• Each alternative has a bias factor Bi that indicates the preference for correct routing.
– Lowest bias factor implies “shortest” route to destination.
• Route packets based on the combination of the current outgoing queue length Qi for a particular link, and the bias factor.
– That is, minimize Qi + Bi over the set of alternatives.
134
Classification
• Based on information sources
– Local (isolated)
– Route to outgoing link with shortest queue
– Can include bias for each destination
– Rarely used - does not take advantage of easily available information about other nodes.
– Adjacent nodes
– All nodes
135
Local Adaptive Routing Example
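Node with outgoing links to nodes 1, 2, 3 and 5:
Link:                    To 1   To 2   To 3   To 5
Bias for destination 6:   9      6      3      0
[Figure: current queue length on each outgoing link]
Result: Choose link to 3, since the sum of bias and queue length is 4, the lowest of the alternatives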
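The decision rule (minimize Qi + Bi) fits in one line. In this sketch the biases come from the example, but the queue lengths are hypothetical: only a queue of 1 on the link to 3 is implied by the example's sum of 4. The function name is also hypothetical.

```python
# Bias B_i toward destination 6 for each outgoing link, from the example.
BIAS_TO_6 = {1: 9, 2: 6, 3: 3, 5: 0}
# Hypothetical queue lengths Q_i; Q = 1 on the link to 3 matches the example's sum of 4.
QUEUE = {1: 2, 2: 3, 3: 1, 5: 5}


def choose_link(queue_lengths, bias):
    """Pick the outgoing link that minimizes Q_i + B_i."""
    return min(bias, key=lambda link: queue_lengths[link] + bias[link])


print(choose_link(QUEUE, BIAS_TO_6))
```

If the queue on the link to 5 drained to zero, that link (bias 0) would win instead, which is how the rule trades route length against congestion.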
136
ARPANET Routing Strategies (1)
• First Generation (1969)
– Distributed adaptive
– Estimated delay as performance criterion (“cost”)
– Use modified Bellman-Ford algorithm (1962)
– Node exchanges delay vector with neighbors every 128 ms
– Update routing table based on incoming info
– Does not consider link speed, just queue length
– Queue length not a good measurement of delay
– Responds slowly to congestion
137
Bellman-Ford Algorithm
• Determines shortest paths from a source node s to all other nodes.
• For all nodes, keep the current best known shortest path
– Initialize to 0 for the source and +∞ for all other nodes
• Algorithm proceeds by hop count from source node
– Start with hop count of 0.
• Keep a set of edges E which have been examined.
– Start with an empty set
• Repeat until E includes all edges:
– Add one to current hop count
– Add all edges that can be reached in this hop count to E.
– For each edge added, if the cost of reaching its end node through that edge (best known cost to the edge's start node plus the edge cost) is lower than the current minimum, replace the current minimum.
– Update the current best known shortest paths to all nodes, based on inclusion of this edge.
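The hop-by-hop procedure can be sketched as rounds: in round h, every edge is tried against the costs known after round h − 1, so the result is the least cost using at most h hops. The six-node graph and its link costs are hypothetical, and the function name is illustrative.

```python
import math

# Hypothetical six-node network: (node, node, cost) for each bidirectional link.
LINKS = [(1, 2, 2), (1, 3, 5), (1, 4, 1), (2, 3, 3), (2, 4, 2),
         (3, 4, 3), (3, 5, 1), (3, 6, 5), (4, 5, 1), (5, 6, 2)]
EDGES = LINKS + [(v, u, c) for u, v, c in LINKS]   # each link usable in both directions


def bellman_ford(nodes, edges, source, max_hops=None):
    """Least cost from source to every node using paths of at most max_hops hops."""
    best = {n: math.inf for n in nodes}
    best[source] = 0
    rounds = len(nodes) - 1 if max_hops is None else max_hops
    for _ in range(rounds):               # one more hop allowed per round
        previous = dict(best)             # costs achievable with one hop fewer
        for u, v, cost in edges:
            if previous[u] + cost < best[v]:
                best[v] = previous[u] + cost
    return best


nodes = [1, 2, 3, 4, 5, 6]
for h in range(1, 4):
    print(h, bellman_ford(nodes, EDGES, source=1, max_hops=h))
```

Printing successive rounds shows costs shrinking as longer but cheaper paths become allowed (node 3 goes from 5 to 4 to 3).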
138
Example: Bellman-Ford Algorithm
[Figure: Bellman-Ford worked on a six-node weighted network from source node 1, showing the best known path cost to each of nodes 2 to 6 after each hop count]
140
Distance (Cost) Vector Routing
• Localized version of Bellman-Ford algorithm
• Router receives information from neighbours, and chooses the best option from information received.
• Updates correspond to stages in global algorithm:
– As router finds out about more destinations, new entries added.
Example: R2 has neighbours R1 and R3 (each link has cost 1):
– Vector received from R1 (destination - cost): A - 1, B - 2, C - 2, D - 6
– Vector received from R3: A - 3, B - 1, E - 1, F - 4
– Resulting table at R2: A - 2 via R1, B - 2 via R3, C - 3 via R1, D - 7 via R1, E - 2 via R3, F - 5 via R3
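R2's table in the example can be recomputed mechanically: for each destination, add the link cost to each neighbour's advertised cost and keep the smallest. A minimal sketch; the function name is hypothetical, and the link cost of 1 from R2 to each neighbour is an assumption (it is what the example's numbers imply).

```python
def distance_vector_update(neighbour_vectors, link_cost):
    """Build a table {destination: (cost, next hop)} from neighbours' distance vectors.

    neighbour_vectors: {neighbour: {destination: cost advertised by that neighbour}}
    link_cost:         {neighbour: cost of our link to that neighbour}
    """
    table = {}
    for neighbour, vector in neighbour_vectors.items():
        for destination, cost in vector.items():
            total = link_cost[neighbour] + cost   # reach destination via this neighbour
            if destination not in table or total < table[destination][0]:
                table[destination] = (total, neighbour)
    return table


# Vectors from the example; the link cost of 1 to each neighbour is assumed.
received = {
    "R1": {"A": 1, "B": 2, "C": 2, "D": 6},
    "R3": {"A": 3, "B": 1, "E": 1, "F": 4},
}
for dest, (cost, via) in sorted(distance_vector_update(received, {"R1": 1, "R3": 1}).items()):
    print(f"{dest} - {cost} via {via}")
```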
141
ARPANET Routing Strategies (2)
• Second Generation (1979)
– Uses delay as performance criterion
– Delay measured directly
– Computed every 10 s by time-stamping packets.
– Significant changes passed on via flooding
– Uses Dijkstra’s algorithm (1959)
– Good under light and medium loads
– Under heavy loads, little correlation between reported delays and those experienced
– Why? Routers all recompute routing tables at same time, and could all switch from a heavily loaded link to a lightly loaded link – which just moves congestion elsewhere.
142
Dijkstra’s Algorithm
• Determines shortest paths from a source node s to all other nodes.
• For all nodes, keep the current best known shortest path
– Initialize to 0 for the source and +∞ for all other nodes
• Keep a set of nodes N for which the shortest path is known.
– Initialize this set to {s}.
• Repeat until N includes all nodes:
– For each node not in N, what would be the shortest path from s to the node by taking, as the last hop, an edge from a node in N?
– Whichever node results in the minimum shortest path, add that node to N.
– Update the current best known shortest paths to all nodes, based on inclusion of the new node.
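A compact sketch of the algorithm, on a hypothetical six-node graph. A priority queue replaces the explicit “which node gives the minimum” scan, and the function also records the first hop of each path, which is what a router would install as the next hop. Names and link costs are illustrative.

```python
import heapq
import math

# Hypothetical six-node network: (node, node, cost) for each bidirectional link.
LINKS = [(1, 2, 2), (1, 3, 5), (1, 4, 1), (2, 3, 3), (2, 4, 2),
         (3, 4, 3), (3, 5, 1), (3, 6, 5), (4, 5, 1), (5, 6, 2)]
ADJ = {n: [] for n in range(1, 7)}
for u, v, c in LINKS:
    ADJ[u].append((v, c))
    ADJ[v].append((u, c))


def dijkstra(adjacency, source):
    """Return {node: (least cost from source, first hop on that path)}."""
    best = {source: 0}
    first_hop = {source: None}
    done = set()                            # the set N of nodes whose shortest path is known
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)    # closest node not yet in N
        if node in done:
            continue                        # stale heap entry
        done.add(node)
        for neighbour, weight in adjacency[node]:
            new_cost = cost + weight
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                # Leaving the source, the first hop is the neighbour itself.
                first_hop[neighbour] = neighbour if node == source else first_hop[node]
                heapq.heappush(heap, (new_cost, neighbour))
    return {n: (best[n], first_hop[n]) for n in best}


for dest, (cost, hop) in sorted(dijkstra(ADJ, 1).items()):
    print(dest, cost, hop)
```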
143
Example: Dijkstra’s Algorithm
[Figure: Dijkstra's algorithm worked on a six-node weighted network from source node 1, showing the best known path cost to each node as nodes are added to N]
146
ARPANET Routing Strategies (3)
• Third Generation (1987)
– Link cost calculations changed
– Measure average delay over last 10 seconds
– Convert to utilization ($0 \le U \le 1$):
$U = \frac{2(T_s - T)}{T_s - 2T}$
where $T_s$ is the “service time” and $T$ is the measured delay.
– Service time is average packet size (600 bits often used) divided by the speed of the data link.
– Normalize average utilization $AU$ based on current value $U$ and previous average:
$AU' = 0.5\,AU + 0.5\,U$
147
ARPANET Routing Strategies (3)
– Cost =
1, if AU ≤ 0.5
1 + 4(AU – 0.5), if AU > 0.5
– Special cost for satellite link =
2, if AU ≤ 0.75
2 + 4(AU – 0.75), if AU > 0.75
– Cost is in range 1 to 3.
– Maximum penalty for avoiding a congested link or node is 2 extra hops.
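The third-generation metric chains three steps: delay to utilization, smoothing, and utilization to cost. A sketch with hypothetical function names; the 56 kbit/s line speed and the delay in the usage lines are assumptions chosen only for illustration.

```python
def utilization(measured_delay, service_time):
    """U = 2(Ts - T) / (Ts - 2T), with T the measured delay and Ts the service time."""
    if measured_delay <= service_time:
        return 0.0                      # no queuing observed
    u = 2 * (service_time - measured_delay) / (service_time - 2 * measured_delay)
    return min(u, 1.0)


def smooth(previous_au, u):
    """AU' = 0.5 AU + 0.5 U."""
    return 0.5 * previous_au + 0.5 * u


def link_cost(au, satellite=False):
    """Cost in the range 1 to 3: flat up to a threshold, then rising with slope 4."""
    threshold, base = (0.75, 2) if satellite else (0.5, 1)
    return base if au <= threshold else base + 4 * (au - threshold)


ts = 600 / 56_000                 # service time: 600-bit packet on a hypothetical 56 kbit/s line
au = smooth(0.4, utilization(0.030, ts))
print(round(au, 3), link_cost(au), link_cost(au, satellite=True))
```

At full utilization both formulas give 3, so a congested link costs at most two extra hops compared with an idle terrestrial link, matching the slide.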
148
Routing Protocols
• Two types:
– Interior: used within an “autonomous system” (AS)
– Exterior: used between differing autonomous systems.
• An “autonomous system” (RFC 1930) consists of routers (and networks) that:
– Use a common routing protocol
– Are managed by the same organization
– Are connected (except when failures occur)
• Autonomous systems are identified by AS numbers
– Assigned by IANA (Internet Assigned Numbers Authority) (www.iana.org)
– In North America, IANA delegates to the American Registry for Internet Numbers (ARIN) (www.arin.net)
149
Internetworking of Autonomous Systems
[Figure: autonomous systems AS 1 (networks N1.1 to N1.4, routers R1 to R4) and AS 2 (networks N2.1 to N2.4, routers R5 to R8) joined by physical links; OSPF is used within each AS and BGP between them]
150
Interior versus Exterior Routing
• Interior routing
– Typical situation: corporate network, ISP
– Usual protocol: Open Shortest Path First (OSPF) version 2 [RFC 2328]
– Needs detailed picture of network
– Least cost is the important factor
• Exterior routing
– Typical situation: connections between ISPs
– Usual protocol: Border Gateway Protocol (BGP) version 4 [RFC 1771, since updated by RFC 4271]
– Less detailed information exchanged
– Reachability is the important factor
151
Exterior Routing with BGP
• Messages sent via TCP connection (BGP inside TCP inside IP)
• Procedures:
1. Neighbour acquisition
– A neighbour is another router on the same (physical) network but is part of a different autonomous system
– Routers agree to regular exchange of information.
2. Neighbour reachability
– Maintaining the relationship with status updates
3. Network reachability
– Keeping a data base of networks that can be reached, and the preferred route to reach each network.
152
BGP Messages
• Open
– Begin a neighbour relationship with a new router
• Update
– Announce a new single route, or the deletion of one or more routes
• Keepalive
– Sent periodically to confirm router is still active and maintains the neighbour relationship
– Also acknowledges an Open message
– If keepalive messages do not arrive on time, the connection is assumed to be broken.
• Notification
– Announces an error condition
153
Routing Tables for a BGP router
• RIB: routing information base
• Conceptually, 3 separate tables could be maintained
– Separate implementations are not required
1. Adjacent RIB inward
• Contains information learned from incoming BGP update messages
2. Local RIB
• Contains routing decisions made after applying local decision-making policies
• “The” routing table for this node
3. Adjacent RIB outward
• Contains information the router is willing to advertise via BGP
154
BGP message format
• Marker (16 octets): authentication information, akin to a connection identifier
• Length (2 octets): number of octets in message
• Type (1 octet): {Open, Update, Keepalive, Notification}
• Message-specific information (variable number of octets; not used for keepalive message)
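The fixed 19-octet header can be built with Python's struct module. A minimal sketch: the type codes (1 Open, 2 Update, 3 Notification, 4 Keepalive) and the 4096-octet maximum come from the BGP-4 specification; the all-ones marker assumes no authentication is carried in it (RFC 4271 always requires all ones, while RFC 1771 allowed authentication data there). Function names are hypothetical.

```python
import struct

# Message type codes (RFC 4271).
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
HEADER = struct.Struct("!16sHB")   # marker (16 octets), length (2), type (1)
MAX_LENGTH = 4096                  # largest BGP message allowed


def build_message(msg_type: int, body: bytes = b"") -> bytes:
    length = HEADER.size + len(body)
    if length > MAX_LENGTH:
        raise ValueError("BGP message too long")
    marker = b"\xff" * 16          # all ones: assumes no authentication in the marker
    return HEADER.pack(marker, length, msg_type) + body


def parse_header(data: bytes):
    marker, length, msg_type = HEADER.unpack_from(data)
    return marker, length, msg_type


keepalive = build_message(KEEPALIVE)   # a keepalive is the bare header
print(len(keepalive), parse_header(keepalive)[1:])
```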
155
BGP Open, Notification
• Open message has fields for (not a complete list)
– BGP protocol version (4)
– Identification of AS to which router belongs
– Hold time (maximum time allowed between successive Keepalive or Update messages)
– IP address of router
– Information to authenticate an authorized router
• Notification message indicates the following conditions:
– BGP message error
– BGP procedure error
– Hold timer expired
– Close BGP connection
156
BGP Update (1)
• Two possible functions within one update message:
– Withdraw route set, listed by IP address / prefix
– Add new single route
• Information about a single new route:
– Origin: BGP (external), OSPF (internal), Unknown
– Autonomous system path: a list of AS traversed for this route
– Allows routers to implement policy decisions, e.g. use of preferred networks, avoidance of specific networks
157
BGP Update (2)
• Information about a single new route (continued):
– Next hop: IP address of border router to be used as next hop for the IP address(es) listed below.
– Could be distinct from the BGP router, if more than one router in the AS has external connections but only one handles BGP information (example: R2 in the Internetworking of Autonomous Systems figure)
– Network layer reachability information (NLRI): a list of IP addresses to which this route applies; could be address prefixes.
• Updates are passed on via flooding
158
Example BGP update
[Figure: AS 1 (networks 1.1 to 1.4, routers R1 to R4) and AS 2 (networks 2.1 to 2.4, routers R5 to R8); an update is advertised from AS 1 to AS 2]
NLRI: 1.1, 1.3, 1.4
AS Path: AS1
Next hop: R1
159
BGP update propagation
[Figure: AS 2 (networks 2.1 to 2.4, routers R5 to R8) passes the update on to AS 3 (networks 3.1, …, router R9)]
NLRI: 1.1, 1.3, 1.4
AS Path: AS2, AS1
Next hop: R7
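The AS path grows by one AS each time the update crosses into a new autonomous system, as the two example updates show. A minimal sketch of that step, plus the loop check BGP routers perform with the path (rejecting a route whose AS path already contains their own AS, standard BGP-4 behaviour not stated on the slides). Routes are plain dictionaries and the function names are hypothetical.

```python
def advertise(route: dict, my_as: str, my_border_router: str) -> dict:
    """Copy of a route as passed to a neighbouring AS: our AS is prepended and we become the next hop."""
    return {
        "nlri": list(route["nlri"]),
        "as_path": [my_as] + route["as_path"],
        "next_hop": my_border_router,
    }


def acceptable(route: dict, my_as: str) -> bool:
    """Loop prevention: reject a route whose AS path already contains our own AS."""
    return my_as not in route["as_path"]


from_as1 = {"nlri": ["1.1", "1.3", "1.4"], "as_path": ["AS1"], "next_hop": "R1"}
to_as3 = advertise(from_as1, "AS2", "R7")
print(to_as3)
print(acceptable(to_as3, "AS3"), acceptable(to_as3, "AS1"))
```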
160
Interior Routing with OSPF
• OSPF: Open Shortest Path First protocol
• Version 2 specified in RFC 2328
• Computes least cost route based on configurable metric (“cost”)
• Each router keeps track of network topology of which it is aware, including:
– Routers
– Transit networks: can carry data that neither originates nor terminates within the network
– Stub networks: data must originate or terminate within that network
161
OSPF Graph Information
• Network topology stored as a directed graph, with 4 types of nodes and 2 types of edges
• Node types:
– Router
– Transit network
– Stub network
– Host connected directly to router
• Edge types:
– Point to point link joining routers: bi-directional
– Router to network connection
162
Example of Autonomous System
[Figure: example autonomous system with routers, transit networks, stub networks, a host attached to a router, and external network connections]
164
Routing Information Base
[Table: link state database for the example autonomous system, giving the cost of each directed edge from routers R1 to R12 and networks N3, N6, N8, N9 to routers, networks N1 to N11, and host H1]
165
SPF Tree for R6
[Figure: shortest-path tree rooted at R6 for the example autonomous system, with the cost of each edge; R5 and R7 lead to external networks N12 to N15]
166
Routing Table for R6
Destination   Next hop   Distance
N1            R3         10
N2            R3         10
N3            R3         7
N4            R3         8
N6            R10        8
N7            R10        12
N8            R10        10
N9            R10        11
N10           R10        13
N11           R10        14
H1            R10        21
R5            R5         6
R7            R10        8
(R5 and R7 are the routers with external connections)
167
OSPF Messages
• Five types of messages
1. Hello: Protocol to discover new routers
– This is the only type of message exchanged between non-adjacent nodes.
2. Database description: Summarize the link state database when two routers first synchronize
3. Link state request: Request the link state records found to be missing or out of date
4. Link state update: Announce new information, or supply records asked for in a link state request
5. Link state acknowledgement: Confirm receipt of update
• Messages sent in IP packets
– Acknowledgements add reliability to IP
• Routers are expected to treat OSPF messages with higher priority than regular data
168
Performance of Routing Algorithms
• Algorithms can be judged on:
– Speed.
– Computational complexity.
– Scalability.
– Speed of convergence after topological change.
– Ability to react to current traffic situation.
– Susceptibility to routing loops.
– Ability to include line characteristics in computing the cost.
169
Advanced Routing Features
• Type of service routing:
– Allows choice of path that takes into account link quality, data rate, etc.
• Load balancing:
– If there are multiple routes of equivalent cost to the destination, traffic can be distributed among different routes.
• Area routing:
– A large routing domain can be partitioned into areas to reduce the amount of routing information kept in each router.
• Authentication:
– Each router will only accept routing information from trusted routers, identified through authentication.
170
Integrated Services Architecture (1)
• Acronym: ISA
• Standards currently under development by IETF
– Base document in RFC 1633
• Categories of traffic:
– Inelastic: constraints on throughput, delay, jitter, and packet loss
– Elastic: can adjust to changes in network conditions
– Varying tolerances for changes in above factors
– E-mail: sensitive to loss, but not delay
– FTP file transfer: sensitive to throughput, but not jitter
171
ISA Services
• Guaranteed service
– Assured data rate
– Upper bound on queuing delay
– No queuing losses
• Controlled load
– Similar to guaranteed service, except that constraints are only expected to be met for a “high percentage” of packets instead of all packets.
• Best effort
– No quality of service parameters applied to traffic.
172
Elements of ISA
• Routing algorithm:
– As an alternative to delay, quality of service can be used to weight graph edges for OSPF
• Admission control
– For any service other than best effort, a reservation must be made using the RSVP protocol (RFC 2205)
• Queuing Discipline:
– Multiple output queues with fair selection for transmission
– Each flow of inelastic traffic can be queued separately
• Discard Policy
– Policy for which packets to discard when a queue is full.
173
ISA Router Architecture
[Figure: ISA router architecture, with routing protocols, routing database, reservation protocol, admission control, management agent, and traffic database, plus a forwarding path of classification and route selection, a packet scheduler, QoS queues, and a best-effort queue]
174
Protocol Configuration
• A software vendor wants to sell an identical copy of protocol software to all customers.
• Each system running a protocol will have different parameters:
– IP address
– Hardware address
– Location of local router
– Location of local servers for Domain Name Service, printing, time of day, …
• The problem:
– How to “discover” the local custom values when system is initialized?
175
Protocol Configuration Initialization
• Example: plugging your laptop into a data port in the SITE cafeteria tables
• You do not want to have to configure your system; you want to start using the Internet right away
• Problem:
– What address do you use to find an address?
176
Types of Address Discovery
• Fixed:
– Host is assigned a permanent set of addresses for IP, hardware, etc.
– Protocol software needs to find these parameters during initialization, either locally or from a server.
– Required for “well-known” locations (e.g. web server)
• Dynamic
– Host uses a temporary IP address obtained from a server for a specified period of time.
– Addresses are allocated from an available pool
– Examples: ISP dial-up connection, cafeteria data ports
177
Protocol Initialization
• Local, fixed option: manual configuration of IP address.
• Reverse Address Resolution Protocol (RARP)
– ARP: Given IP address, find hardware address
– RARP: Given hardware address, obtain IP address
– Needs fixed hardware address in network interface card (e.g. Ethernet)
• RARP request for IP address is broadcast over network.
• After obtaining an IP address, the next step is to find a router.
– To do this, we need the subnet mask of the network, so that we can find a router on the same network.
– Broadcast ICMP “Address Mask Request” message
– Reply contains IP mask
– Broadcast ICMP “Gateway discovery” message
178
Dynamic Address Allocation
• Each host obtains a “lease” for an IP address assigned from a pool.
– Provisioning challenge: how large should the pool of IP addresses be for customer base?
• Lease has expiry time
– Lease can be renewed before expiry
– On expiry, IP address is returned to the available pool.
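The lease life cycle (grant from the pool, renew before expiry, reclaim on expiry) can be sketched as a small class. The addresses, lease length, and client names are hypothetical, and expired leases are reclaimed lazily whenever a request arrives, which is one simple design choice among several.

```python
class LeasePool:
    """Hand out IP addresses from a fixed pool as time-limited leases."""

    def __init__(self, addresses, lease_time):
        self.free = list(addresses)      # addresses not currently leased
        self.lease_time = lease_time     # lease length, in the caller's time units
        self.leases = {}                 # client id -> (address, expiry time)

    def _reclaim_expired(self, now):
        for client, (address, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[client]
                self.free.append(address)   # expired lease: address returns to the pool

    def request(self, client, now):
        """Grant or renew a lease; return the address, or None if the pool is empty."""
        self._reclaim_expired(now)
        if client in self.leases:
            address, _ = self.leases[client]    # renewal before expiry keeps the address
        elif self.free:
            address = self.free.pop(0)
        else:
            return None                          # pool too small for current demand
        self.leases[client] = (address, now + self.lease_time)
        return address

    def release(self, client):
        """Client gives its address back before the lease expires."""
        address, _ = self.leases.pop(client)
        self.free.append(address)


pool = LeasePool(["10.0.0.10", "10.0.0.11"], lease_time=100)
print(pool.request("laptop-a", now=0), pool.request("laptop-b", now=0), pool.request("laptop-c", now=50))
```

The third request fails because both addresses are leased, which is the provisioning question from the slide in miniature.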
179
DHCP: Dynamic Host Configuration Protocol
• Defined in RFC 2131
• Protocol to automatically:
– Assign an IP address from a pool of available addresses
– Assignment can be permanent or temporary
– Temporary assignment (a “lease”) will have an expiry time.
– Locate a server
– Locate a router
– Get the name of a server
• Relies on special IP addresses:
– IP address 0.0.0.0: used to send messages while obtaining IP address
– IP address 255.255.255.255: local network broadcast
180
DHCP Message Format
Bits: 0–7 | 8–15 | 16–23 | 24–31
Message type | HW addr. type | HW addr. length | Hops to server
Transaction ID
Seconds elapsed | Broadcast flag and 15 zeros
Client IP address (if renewing)
“Your new” IP address
Server IP address
Router IP address
Client hardware address (16 octets)
Server host name (64 octets)
Reboot file name (128 octets)
Options (variable)
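The fixed part of the message packs into 236 octets with Python's struct module, in the field order of the diagram above. A minimal sketch of a Discover from a client with no address yet; it relies on details from RFC 2131/2132 that the slide does not show: the options field starts with the “magic cookie” 99.130.83.99, and the Discover/Offer/Request kind is carried in option 53, since the first octet only says request or reply. The function name and the MAC address in the test are hypothetical.

```python
import struct

# Fixed part of a DHCP message, in the field order of the diagram above.
FIXED = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")
MAGIC_COOKIE = bytes([99, 130, 83, 99])   # marks the start of the options field


def build_discover(transaction_id: int, mac: bytes) -> bytes:
    """Sketch of a DHCP Discover from a client with no IP address yet."""
    fixed = FIXED.pack(
        1,                       # message type (op): 1 = request from client
        1,                       # hardware address type: 1 = Ethernet
        len(mac),                # hardware address length: 6 octets for Ethernet
        0,                       # hops
        transaction_id,          # transaction ID, echoed back by the server
        0,                       # seconds elapsed
        0x8000,                  # flags: broadcast bit set, other 15 bits zero
        bytes(4),                # client IP address: 0.0.0.0, none yet
        bytes(4),                # "your new" IP address: filled in by the server
        bytes(4),                # server IP address
        bytes(4),                # router IP address (relay agent)
        mac.ljust(16, b"\x00"),  # client hardware address, padded to 16 octets
        bytes(64),               # server host name
        bytes(128),              # boot file name
    )
    # Options: cookie, option 53 (DHCP message type) = 1 (Discover), then end (255).
    return fixed + MAGIC_COOKIE + bytes([53, 1, 1, 255])


print(FIXED.size, len(build_discover(0x3903F326, bytes.fromhex("00163e2a7b01"))))
```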
181
DHCP Message Types
• (not a complete list)
• Discover: request from client to find servers (broadcast)
• Offer: server reply to discover, with offer of configuration parameters (broadcast, possibly by more than one server)
• Request: confirmation of offer, sent from client to specific server
• Acknowledgement: configuration parameters issued by server to client
• Release: client returns allocations to server and cancels lease