
Upload: allyson-morrison

Post on 25-Dec-2015


1

The Network Layer

• Services:

– Deliver packets between any two hosts, reliably or unreliably.

• A network-wide concern:

– Transport layer (above): between two end hosts.

– Data link layer (below): between two physically connected hosts.

– Network layer: involves each and every host, router, and gateway in the network.

2

Architectural Approaches

• Connectionless - similar to postal system; endpoint puts data to send into a packet and hands to network for delivery

• Connection-oriented - similar to telephone system; endpoints establish and maintain a connection as long as they have data to exchange

3

Connectionless (Datagram) Service

• No connection established

• Source of data adds destination information to data and delivers to network

• Network delivers each data item individually

• No routes set up at connection establishment time - each packet may follow different route to destination (but typically won’t).

• No guarantee of reliable, or in-order delivery (although data link layer may still do link-by-link error control).

• Advantages:

– Robust with respect to node / link failures.

– Recovery at end to end (transport) level.

• Examples: IP

4

Connection-oriented Service

• One endpoint requests connection from network

• Other endpoint agrees to connection

• Computers exchange data through connection

• Typically uses a “stream” interface

• Source delivers stream of data to network

• Network breaks into packets for delivery

• Data transmission not necessarily continuous; like telephone, connection remains in place while no data transmitted

• One endpoint requests network to break connection when transmission is complete

• Examples: Asynchronous Transfer Mode (ATM), X.25

5

Connection duration and persistence

• Connections can be made on-demand or set up permanently

– Switched connection or switched virtual circuit

– Permanent connection or provisioned virtual circuit

• Permanent connections

– Originally hard-wired

– Now configured at system initialization

• Switched connections

– Computer maintains permanent connection to network

– Network makes connection on demand

6

Virtual circuits

• Virtual: acts like a circuit, but isn’t really one.

• “Reliable” delivery of packets between end hosts.

• All packets within connection follow the same route.

[Figure: network of nodes A to F; two VCs share link B-C]

7

Virtual circuits (2)

• At connection establishment time:

– Connection setup packet flows from sender to receiver.

– Routing tables updated at intermediate nodes to reflect new virtual circuit (VC).

– Fits well with quality of service (QoS) guarantees: reject call on path if QoS can’t be guaranteed.

– Potential difficulty: recovery from link or router failure.

8

Address and Connection Identifiers

• Asynchronous Transfer Mode (ATM) - 160-bit address, 28-bit connection identifier

– Connection identifier includes:

– 12-bit virtual path identifier (VPI)

– 16-bit virtual circuit identifier (VCI)

– Connection identifier local to each computer

– May be different in different parts of the ATM switch

• Address is a complete, unique identifier

• Connectionless delivery requires address on each packet

• Connection-oriented delivery can use a shorthand that identifies the connection rather than the destination
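
To make the shorthand idea concrete, the following is a minimal sketch of how a 28-bit connection identifier could be built from a 12-bit VPI and a 16-bit VCI. The function names are illustrative only, and placing the VPI in the high-order bits is an assumption made for this sketch.

```python
VPI_BITS = 12  # virtual path identifier width, from the slide
VCI_BITS = 16  # virtual circuit identifier width, from the slide


def pack_connection_id(vpi: int, vci: int) -> int:
    """Combine VPI and VCI into one 28-bit value (VPI in the high bits, by assumption)."""
    if not 0 <= vpi < (1 << VPI_BITS):
        raise ValueError("VPI must fit in 12 bits")
    if not 0 <= vci < (1 << VCI_BITS):
        raise ValueError("VCI must fit in 16 bits")
    return (vpi << VCI_BITS) | vci


def unpack_connection_id(connection_id: int) -> tuple:
    """Split a 28-bit connection identifier back into (VPI, VCI)."""
    return connection_id >> VCI_BITS, connection_id & ((1 << VCI_BITS) - 1)
```

The identifier is far shorter than the 160-bit address, which is why a connection-oriented network can afford to carry it in every cell.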

9

Internetworking

• In the real world, computers are connected by many different technologies

• Internetworking is a scheme for interconnecting multiple networks of dissimilar technologies

• Uses both hardware and software

• Extra hardware positioned between networks

• Software on each attached computer

• System of interconnected networks is called an “internetwork” or an internet

10

Routers

• A router is a hardware component used to interconnect networks

• The router is the main layer 3 building block for large internets.

• A router has interfaces on multiple networks

• Networks can use different technologies

• Router forwards packets between networks

• Transforms packets as necessary to meet standards for each network

11

Internet Architecture

• An internetwork is composed of arbitrarily many networks interconnected by routers

• Routers can have more than two interfaces

12

A virtual network

[Figure: hosts attached to Net 1, Net 2, and Net 3, presented as a single virtual network]

• Internetworking software builds a single, seamless virtual network out of multiple physical networks

• Universal addressing scheme

• Universal service

• All details of physical networks hidden from users and application programs

13

A virtual network

[Figure: the same internet seen physically: Net 1, Net 2, and Net 3 are separate physical networks joined by routers]


14

Internetworking Protocols

• TCP/IP is the most widely used internetworking protocol suite

– First internetworking protocol suite

– Initially funded through ARPA

– Picked up by NSF

• Others include IPX, VINES, AppleTalk

• TCP/IP is by far the most widely used

– Vendor and platform independent

15

Internet addresses

• One key aspect of virtual network is single, uniform address format

• Cannot use hardware addresses because different technologies have different address formats

• Address format must be independent of any particular hardware address format

• Sending host puts destination internet address in packet

• Destination address can be interpreted by any intermediate router

• Routers examine address and forward packet on to the destination

16

IP addresses

• Addressing in TCP/IP is specified by the Internet Protocol (IP)

• Each host is assigned a 32-bit number

• Called the IP address or Internet address

• Unique across entire Internet

• Each IP address is divided into a prefix and a suffix

• Prefix identifies network to which computer is attached

• Suffix identifies computer within that network

• Address format makes routing efficient

17

Network and Host Numbers

• Every network in a TCP/IP internet is assigned a network number.

• Each host on a specific network is assigned a host number or host address that is unique within that network.

• Host's IP address is the combination of the network number (prefix) and host address (suffix)

• Network numbers must be unique.

• Host addresses may be reused on different networks; combination of network number prefix and host address suffix will be unique.

• Assignment of network numbers must be coordinated globally; assignment of host addresses can be managed locally.

18

IP address format

• IP designers chose 32-bit addresses (see RFC 790)

• Allocate some bits for prefix, some for suffix

– Large prefix, small suffix - many networks, few hosts per network

– Small prefix, large suffix - few networks, many hosts per network

• Because of variety of technologies, need to allow for both large and small networks

• Designers chose a compromise - multiple address formats that allow both large and small prefixes

• Each format is called an address class

• Class of an address is identified by its leading bits (at most the first four)

19

Dotted Decimal Notation

• 32 bits divided into 4 octets

• Each octet is converted to decimal value

• Dots used to separate the 4 decimal values

• Examples:

32 bit binary number Dotted decimal

10000001 00110100 00000110 00000000 129.52.6.0

11000000 00000101 00110000 00000011 192.5.48.3

10000000 10000000 11111111 00000000 128.128.255.0
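
The conversion in the examples above can be sketched in a few lines. The function name `to_dotted_decimal` is made up for this illustration; it assumes the input is written as four space-separated groups of 8 bits, as on the slide.

```python
def to_dotted_decimal(bits: str) -> str:
    """Convert '10000001 00110100 00000110 00000000' style input to dotted decimal."""
    octets = bits.split()
    if len(octets) != 4 or any(len(octet) != 8 for octet in octets):
        raise ValueError("expected four groups of 8 bits")
    # Each octet is read as a base-2 number and printed in decimal.
    return ".".join(str(int(octet, 2)) for octet in octets)
```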

20

IP addresses in C/C++

From /usr/include/netinet/in.h

/*
 * Internet address
 * This definition contains obsolete fields for
 * compatibility with SunOS 3.x and 4.2bsd. The
 * presence of subnets renders divisions into fixed
 * fields misleading at best. New code should use
 * only the s_addr field.
 */
struct in_addr {
    union {
        struct { u_char s_b1, s_b2, s_b3, s_b4; } S_un_b;
        struct { u_short s_w1, s_w2; } S_un_w;
        u_long S_addr;
    } S_un;
#define s_addr S_un.S_addr /* should be used for all code */
};

21

Useful function calls

unsigned long inet_addr( char* cp )

– Converts string with dotted address to 32 bit value

– Example: inet_addr(“129.0.0.1”)

socketAddress.sin_addr.s_addr = inet_addr( charIPAddress );

char* inet_ntoa(struct in_addr in)

– Converts 32 bit value of IP address to a string in dotted decimal format.
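
For comparison, the same two conversions are available in Python's standard `socket` module. This is only a sketch of the idea, not a replacement for the C calls above; the wrapper names `dotted_to_int` and `int_to_dotted` are invented here, and the integer is interpreted in network (big-endian) byte order.

```python
import socket
import struct


def dotted_to_int(text: str) -> int:
    """Dotted decimal string to a 32-bit integer (network byte order interpretation)."""
    packed = socket.inet_aton(text)        # 4 raw octets
    return struct.unpack("!I", packed)[0]  # read them as one unsigned 32-bit value


def int_to_dotted(value: int) -> str:
    """32-bit integer back to a dotted decimal string."""
    return socket.inet_ntoa(struct.pack("!I", value))
```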

22

IP Addresses in Java

• Class java.net.InetAddress

static InetAddress getByName(String host)

– Creates new instance of InetAddress based on a string address

– String can either be a dotted decimal IP address (e.g. “129.0.0.1”), or a host name

static InetAddress getByAddress(byte[] address)

– Creates new instance of InetAddress based on bytes containing the 4 values for the IP address

String getHostAddress( )

– Returns the IP address as a dotted decimal string

byte[] getAddress( )

– Returns the raw IP address as an array of bytes

23

IP Address Classes

Class  Leading bits  Rest of address                       Address range
A      0             prefix in octet 1, suffix in 2-4      1.0.0.1 to 126.255.255.254
B      10            prefix in octets 1-2, suffix in 3-4   128.0.0.1 to 191.255.255.254
C      110           prefix in octets 1-3, suffix in 4     192.0.0.1 to 223.255.255.254
D      1110          multicast                             224.0.0.0 to 239.255.255.255
E      1111          reserved for future use               240.0.0.0 to 254.255.255.255
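
The class of an address can be read directly from the leading bits of its first octet. A minimal sketch, with the hypothetical helper name `address_class`:

```python
def address_class(dotted: str) -> str:
    """Return 'A' to 'E' according to the leading bits of the first octet."""
    first_octet = int(dotted.split(".")[0])
    if first_octet & 0b10000000 == 0b00000000:   # 0...
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # 10..
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # 110.
        return "C"
    if first_octet & 0b11110000 == 0b11100000:   # 1110
        return "D"
    return "E"                                   # 1111
```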

24

Special IP addresses

Prefix    Suffix   Type of address      Purpose
All 0s    All 0s   This computer        Used during rebooting
Network   All 0s   Network              Identifies a network
Network   All 1s   Directed broadcast   Broadcast on specified net
All 1s    All 1s   Limited broadcast    Broadcast on local net
127       Any      Loopback             Testing

25

Allocation of IP address classes

Class  Bits in prefix  Maximum number of networks  Bits in suffix  Maximum number of hosts / network
A      7               128                         24              16777216
B      14              16384                       16              65536
C      21              2097152                     8               256

26

CIDR addresses

• CIDR = Classless Inter-Domain Routing

• Created to allow more flexibility in subnet sizes; in particular, different values between 256 and 65536

• Notation: IP address / # bits in prefix

• Usage:

– Set up 32 bit mask with indicated number of 1 bits followed by 0 bits

– Logical AND with mask and IP address to get network prefix

27

CIDR Example

• Example: allocate 2 sub-networks that can hold 14 hosts each

• Prefix calculated by logical AND:

• Network 1: 128.211.0.16 / 28 ← 28 bits in prefix

• Network 2: 128.211.0.32 / 28

• Mask is: 11111111 11111111 11111111 11110000

• Net 1: 10000000 11010011 00000000 0001––––

– Allows IP addresses 128.211.0.17 through 128.211.0.30, since suffix cannot be all 0s or all 1s.

• Net 2: 10000000 11010011 00000000 0010––––
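
The mask-and-AND procedure from the CIDR slides can be sketched as follows. All helper names are illustrative; addresses are handled as plain 32-bit integers so that every step is visible.

```python
def to_int(dotted: str) -> int:
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value: int) -> str:
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def cidr_mask(prefix_len: int) -> int:
    """32-bit mask: prefix_len one bits followed by zero bits."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def network_prefix(dotted: str, prefix_len: int) -> str:
    """Logical AND of the address with the mask."""
    return to_dotted(to_int(dotted) & cidr_mask(prefix_len))


def host_range(dotted: str, prefix_len: int) -> tuple:
    """First and last usable host address (suffix neither all 0s nor all 1s)."""
    mask = cidr_mask(prefix_len)
    network = to_int(dotted) & mask
    all_ones = network | (~mask & 0xFFFFFFFF)
    return to_dotted(network + 1), to_dotted(all_ones - 1)
```

With a 28-bit prefix, 4 suffix bits remain, giving 16 addresses of which 14 are usable, which matches the 14 hosts per sub-network in the example.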

28

Routers and IP addressing

• IP address depends on network address

• What about routers - connected to two networks?

• IP address specifies an interface, or network attachment point, not a computer

• Router has multiple IP addresses - one for each interface

[Figure: three networks (Ethernet 131.108.0.0, Token Ring 223.240.129.0, WAN 76.0.0.0) joined by routers; the router interfaces are labelled 131.108.99.5, 223.240.129.2, 223.240.129.17, and 76.0.0.17]

29

IP – Internet Protocol

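Header layout (32 bits per row, bit positions 0 to 31):

Bits 0-3: Version | Bits 4-7: IHL | Bits 8-15: Service type | Bits 16-31: Total length

Bits 0-15: Identification | Bits 16-18: Flags | Bits 19-31: Fragment offset

Bits 0-7: Time to live | Bits 8-15: Protocol | Bits 16-31: Header checksum

Source address

Destination address

Options

Data: up to 65,515 octets

Maximum packet size: 65,535 octets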

30

IP protocol fields

• Definition: RFC 791, plus subsequent additions

• Version: version number of protocol (currently 4; version 6 also standardized)

• Internet Header Length (IHL): number of 32-bit words in header

– Minimum value: 5 (which indicates no options)

– Larger values used when options are present.

31

IP protocol fields

• Type of service:

– Specifies precedence (bits 0-2), delay (bit 3), throughput (bit 4), and reliability (bit 5) parameters

– 0 bit = normal, 1 bit = exceptional

• Total length: length of packet in octets

• Identification: sequence number

• Flags (3):

– More: indicates packet is a fragment, with more to come

– Don’t fragment: prohibits fragmentation

– (Reserved for future use)

32

IP Protocol Fields

• Fragment offset: indicates where in the original datagram this fragment belongs, measured in 64-bit (8-octet) units

– Note that this requires fragmentation to happen at 64-bit boundaries (except for the last fragment)

• Time to live: specifies, in seconds, time remaining before this packet expires

– Every router must decrease this value by at least one, so in practice it acts as a hop count.

• Protocol: indicates protocol at next higher level

– Current list: http://www.iana.org/assignments/protocol-numbers

– Examples:

– 1: ICMP Internet Control Message Protocol

– 6: TCP Transmission Control Protocol

– 17: UDP User Datagram Protocol
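
As an illustration of the fields described so far, the sketch below decodes the fixed 20-octet part of an IPv4 header with Python's `struct` module. The function name, the dictionary keys, and the `SAMPLE_HEADER` bytes are all made up for this example; options are not decoded.

```python
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-octet part of an IPv4 header (options are ignored)."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 octets")
    (version_ihl, service_type, total_length, identification,
     flags_offset, ttl, protocol, checksum, source, destination) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    return {
        "version": version_ihl >> 4,             # high 4 bits
        "ihl_words": version_ihl & 0x0F,         # header length in 32-bit words
        "service_type": service_type,
        "total_length": total_length,            # octets, header included
        "identification": identification,
        "dont_fragment": bool(flags_offset & 0x4000),
        "more_fragments": bool(flags_offset & 0x2000),
        "fragment_offset": flags_offset & 0x1FFF,  # in 8-octet units
        "time_to_live": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": ".".join(str(octet) for octet in source),
        "destination": ".".join(str(octet) for octet in destination),
    }


# Hypothetical sample: first fragment of a TCP segment, checksum left at zero.
SAMPLE_HEADER = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1500, 7, 0x2000, 64, 6, 0,
    bytes([129, 52, 6, 0]), bytes([192, 5, 48, 3]))
```

Calling `parse_ipv4_header(SAMPLE_HEADER)` reports version 4, an IHL of 5 words (no options), and the "more" flag set.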

33

IP Protocol Fields

• Header checksum:

– 16-bit ones-complement of the ones-complement sum of all 16-bit words in the header

– Checksum field is set to zero before computation

– Re-computed at each router, because some fields, such as time-to-live, change as the message travels through the network

• Source address: 32 bit IP address

• Destination address: 32 bit IP address
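
The header checksum can be sketched as below: add the header's 16-bit words with end-around carry, then take the ones-complement. Function names are illustrative; the code assumes the checksum occupies octets 10 and 11 of the header, as in the header layout shown earlier.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words, folding any carry back into the low 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero octet
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def header_checksum(header: bytes) -> int:
    """Compute the checksum with the checksum field treated as zero."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    return ~ones_complement_sum(zeroed) & 0xFFFF


def checksum_ok(header: bytes) -> bool:
    """A header with a correct checksum sums to all ones."""
    return ones_complement_sum(header) == 0xFFFF
```

Because a router decrements time-to-live, the sum changes at every hop, which is why the checksum has to be recomputed there.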

34

IP options

• Defined in RFC 791 and others

• Examples:

– Secure packet

– Routing information provided

– Record route

– Record time stamps

– Stream identifier

35

IP upper level interface

• Two service primitives: send and receive (recv)

Result = SEND(src,dst,prot,TOS,TTL,BufPTR,len,Id,DF,opt)

Result = RECV(BufPTR,prot,&src,&dst,&TOS,&len,&opt)

where:

– src = source address

– dst = destination address

– prot = protocol

– TOS = type of service

– TTL = time to live

– BufPTR = buffer pointer

– len = length of buffer

– Id = Identifier

– DF = Don't Fragment

– opt = option data

36

Internet Control Message Protocol (ICMP)

• Defined in RFC 792, plus updates

• Required for internet compliance

• Carried in IP packets

• ICMP messages often sent as a reply to IP packet

ICMP message layout (bit positions 0 to 31):

Bits 0-7: Type | Bits 8-15: Code | Bits 16-31: Checksum

Parameters

Message content: variable length

37

ICMP message types

8: Echo

0: Echo reply

– Echo asks for return of this message for testing

– Parameters: identifier, sequence number

3: Destination unreachable

– Code indicates particular condition:

0: net unreachable
1: host unreachable
2: protocol unreachable
3: port unreachable
4: fragmentation required; don’t fragment flag set
5: source route failure

– Data: original IP header, plus first 64 bits of data

38

ICMP message types

4: Source quench

– Request to slow sending rate of IP packets

– Data: as in destination unreachable

5: Redirect

– Used to indicate a shorter routing path

– Parameters: IP address of suggested router

11: Time exceeded

– Time to live counter of IP packet reached zero

– Data: as in destination unreachable

12: Parameter problem

– Indicates problems with an IP message (usually bad option format)

– Data: as in destination unreachable

39

ICMP message types

13: Timestamp

– Sends message that records sending time, and asks for reply

– Data: sending time, reception time (to be filled in), reply sending time (to be filled in)

14: Timestamp reply

– Reply to timestamp request

– Data: values filled in from ICMP 13 message

17: Address mask request

– Host asks router on LAN for CIDR address mask (usually at reboot)

18: Address mask reply

– Reply to address mask request

– Data: the address mask
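
To tie the ICMP header layout to a concrete message type, here is a rough sketch that builds and decodes an echo message (type 8). It assumes the ICMP checksum is the same ones-complement checksum used for the IP header, computed over the whole ICMP message; the function names are illustrative.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones-complement of the ones-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Type 8, code 0; parameters are the identifier and sequence number."""
    unchecked = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence) + payload
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


def parse_echo(message: bytes) -> dict:
    """Decode an echo or echo reply message."""
    msg_type, code, checksum, identifier, sequence = struct.unpack("!BBHHH", message[:8])
    return {"type": msg_type, "code": code, "checksum": checksum,
            "identifier": identifier, "sequence": sequence, "data": message[8:]}
```

Actually sending such a message requires a raw socket and usually administrator rights, so the sketch stops at constructing the bytes.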

40

Network administration functions that use ICMP

• Ping: test if a host will respond

– Sends an ICMP echo message to designated host

– Host sends ICMP echo reply

– Used to test connectivity

– Many organizations have disabled ping to prevent denial-of-service attacks

• Traceroute: find route from source to destination

– Sends IP packet with time-to-live of 1

– First router will discard packet and send ICMP time exceeded message

– Next message sent has time-to-live of 2, and so on until destination is reached

– Each router en route will have sent an ICMP message

41

Mapping IP addresses

• Problem: How to map IP addresses onto hardware?

– Address resolution

• Where this takes place: router attached to physical network.

• Three methods used to resolve addresses:

– Table lookup

– “Computation”

– Message exchange

42

Resolution using Table Lookup

• Router keeps table.

• The following could be a table for network 197.15.3.0 / 24

• To save space and time, only the host value of the IP address would be stored.

IP address (32 bits) Hardware address (48 bits)

197.15.3.2 0A:07:4B:12:82:36

197.15.3.3 0A:9C:28:71:32:8D

197.15.3.4 0A:11:C3:68:01:99

197.15.3.5 0A:74:59:32:CC:1F

197.15.3.6 0A:04:BC:00:03:28

197.15.3.7 0A:77:81:0E:52:FA

43

Resolution using Computation

• If hardware addresses are configurable, they can be assigned to correspond with the host part of their IP address

– Example:

– host with IP address 229.123.1.1 is assigned hardware address 1;

– host with IP address 229.123.1.2 is assigned hardware address 2;

– … and so on.

• Computation: logical AND with value 000000FF.

hardware_address = ip_address & 0xff
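
The first two resolution methods can be contrasted in a short sketch. The table holds two entries copied from the table lookup slide, keyed by host value only; the function names and the assumption of an 8-bit host part are illustrative.

```python
# Hypothetical excerpt of the router's table for 197.15.3.0 / 24,
# keyed by the host value only, as the slide suggests.
HARDWARE_TABLE = {
    2: "0A:07:4B:12:82:36",
    3: "0A:9C:28:71:32:8D",
}


def host_value(dotted: str) -> int:
    """Low-order 8 bits of the address: the host part of a /24."""
    return int(dotted.split(".")[3]) & 0xFF


def resolve_by_table(dotted: str):
    """Table lookup: returns None when the host is not in the table."""
    return HARDWARE_TABLE.get(host_value(dotted))


def resolve_by_computation(dotted: str) -> int:
    """Computation: the hardware address is the IP address ANDed with 0xFF."""
    return host_value(dotted)
```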

44

Resolution using Message Exchange

• Example: Ethernet Address Resolution Protocol (ARP)

– See RFC 826

• Router sends broadcast ARP message to LAN to query hosts as to who matches the IP address

– Only the host with the matching IP address replies directly to router

– Router then has hardware address

45

ARP message format

• There is a generic format in RFC 826

• The following is specific for Ethernet: 32 bit protocol (P) addresses and 48 bit hardware (H) addresses

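ARP message layout for Ethernet (bit positions 0 to 31):

Bits 0-15: Hardware address type: 0001 | Bits 16-31: Protocol address type: 0800

Bits 0-7: H. addr. length | Bits 8-15: P. addr. length | Bits 16-31: Operation

Sender’s hardware address, part 1

Bits 0-15: Sender’s H. address pt. 2 | Bits 16-31: Sender’s P. address pt. 1

Bits 0-15: Sender’s P. address pt. 2 | Bits 16-31: Target H. address pt. 1

Target hardware address, part 2

Target protocol address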
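
The Ethernet-specific ARP layout maps onto a 28-octet structure. The sketch below packs and unpacks it with `struct`; the names are illustrative, and the operation code 1 for a request follows RFC 826.

```python
import struct

# hardware type, protocol type, H. addr. length, P. addr. length, operation,
# sender H. address, sender P. address, target H. address, target P. address
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_FIELDS = ("hardware_type", "protocol_type", "hardware_len", "protocol_len",
              "operation", "sender_hw", "sender_ip", "target_hw", "target_ip")


def build_arp_request(sender_hw: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build a request; the target hardware address is unknown, so it is zeroed."""
    return struct.pack(ARP_FORMAT, 0x0001, 0x0800, 6, 4, 1,
                       sender_hw, sender_ip, b"\x00" * 6, target_ip)


def parse_arp(packet: bytes) -> dict:
    """Decode the first 28 octets of an ARP packet into named fields."""
    return dict(zip(ARP_FIELDS, struct.unpack(ARP_FORMAT, packet[:28])))
```

The 28 octets are smaller than the 46-octet minimum Ethernet payload, so padding is needed when the packet is placed in a frame.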

46

Transmission of ARP messages

Ethernet frame carrying an ARP packet:

Field          Octets      Contents
Preamble       7
SFD            1
Dest. addr.    6
Source addr.   6
Frame type     2           0806
Data           46 – 1500   ARP packet (28 octets) + padding (18 octets)
CRC            4

47

IP Fragmentation and Reassembly

• Construction of an IP packet requires obeying maximum frame sizes at each data link layer

– MTU: maximum transmission unit

– Example: IP packet carried inside an Ethernet frame (see next slide) can have, at most, 1480 octets of user data + 20 octets of IP header = 1500

• RFC 791 says any part of the internet must have an MTU of at least 68 octets

– Any host must be able to receive 576 octets (possibly in fragments)

• If the IP “don’t fragment” flag is set, and there is more data than the MTU allows, a router will trash the IP packet and send an ICMP message (more on this later).

• Otherwise, router has to separate user data into fragments of allowable size.

• Fragmentation can be done at any router; reassembly is only done at final destination.

48

Example of MTU: Ethernet frames

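Ethernet frame carrying an IP packet:

Field          Octets      Contents
Preamble       7
SFD            1
Dest. addr.    6
Source addr.   6
Frame type     2           0800
Data           46 – 1500   IP packet; 1500 octets = MTU
CRC            4

IP packet inside the data field:

Field                     Octets
Other IP header fields    12
Source addr.              4
Dest. addr.               4
Layer 4 data              up to 1480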

49

Example of Fragmented Data

User data: 2276 octets

With an MTU of 1500, this could be sent as:

Fragment 1: IP header (20 octets) with TL=1500, FO=0, more=1; user data: 1480 octets

Fragment 2: IP header (20 octets) with TL=816, FO=185, more=0; user data: 796 octets

TL = total length, FO = fragment offset (in 8-octet/64-bit units)

50

IP Fragmentation

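• The fragment offset is used instead of a “fragment sequence number” because this allows for further fragmentation at a subsequent router

Fragments as sent with MTU = 1500:

TL=1500, FO=0, more=1; 1480 octets of data

TL=816, FO=185, more=0; 796 octets of data

After a subsequent router with MTU = 820:

TL=820, FO=0, more=1; 800 octets of data

TL=700, FO=100, more=1; 680 octets of data

TL=816, FO=185, more=0; 796 octets of data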
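
The arithmetic behind these numbers can be checked with a short sketch. The function `fragment` and its parameters `first_offset` and `more_after` are hypothetical names introduced here; a 20-octet header without options is assumed.

```python
def fragment(data_len: int, mtu: int, header_len: int = 20,
             first_offset: int = 0, more_after: bool = False) -> list:
    """Return (total_length, fragment_offset, more_flag, data_octets) per fragment.

    first_offset and more_after describe the piece being split, so the same
    function also covers re-fragmentation at a later router.
    """
    # Data per fragment must be a multiple of 8 octets (except in the last one).
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    sent = 0
    while sent < data_len:
        chunk = min(max_data, data_len - sent)
        is_last = sent + chunk == data_len
        more = 0 if (is_last and not more_after) else 1
        fragments.append((chunk + header_len, first_offset + sent // 8, more, chunk))
        sent += chunk
    return fragments
```

Splitting 2276 octets at an MTU of 1500 gives the two fragments of the original example; splitting the first of those again at an MTU of 820, with `more_after=True`, gives the 800-octet and 680-octet pieces.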

51

Reassembly

• Reassembly is only done at the destination

– i.e. host with IP address in destination field

• Fragments are reassembled based on matching source address, destination address, identification field (sequence number), and protocol

• A reassembly timer is often used as the holding time for resources while waiting for all fragments

– Timer started when first fragment arrives.

– Timer cancelled when contiguous data from fragment offset 0 up to a fragment where the ‘more’ flag is 0 has arrived.

– If timer expires, buffer is released and fragments are trashed (and ICMP “time exceeded” message returned).

• Alternative: use ‘Time to live’ field of first fragment

52

IP Version 6 (IPv6)

• Defined in RFC 2460 and others

• Enhancements:

– 128 bit addresses

– Revised (incompatible) base header format

– Extension headers used for additional information

– Support for Quality of Service specification

– Extensibility

– Modifications to accommodate faster routing

53

IPv6 addresses

• IPv4 addresses have first 96 bits as 0 in IPv6

• New shorthand notation: colon hexadecimal

105.220.136.100.255.255.255.255.0.0.18.128.140.10.255.255

becomes

69DC:8864:FFFF:FFFF:0:1280:8C0A:FFFF

FF0C:0:0:0:0:0:0:B1

becomes

FF0C::B1

• In IPv6, an IP address is assigned to an interface, not a node

– One device can have 2 or more IPv6 addresses on the same network

– Intended to speed routing of packets

– Example: one address could be the “higher priority” interface.
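
The colon hexadecimal notation can be reproduced with Python's standard `ipaddress` module. The helper name `dotted16_to_colon_hex` is made up for this sketch; note that the module only abbreviates runs of two or more zero groups, so a single 0 group is printed as is.

```python
import ipaddress


def dotted16_to_colon_hex(dotted: str) -> str:
    """Convert 16 dot-separated decimal octets to compressed colon hexadecimal."""
    octets = bytes(int(part) for part in dotted.split("."))
    if len(octets) != 16:
        raise ValueError("an IPv6 address has 16 octets")
    # str() of an IPv6Address yields the compressed lowercase form.
    return str(ipaddress.IPv6Address(octets)).upper()


def compress(colon_hex: str) -> str:
    """Apply zero compression ('::') to an address already in colon hex."""
    return str(ipaddress.IPv6Address(colon_hex)).upper()
```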

54

IPv6 multiple headers

• Each extension header will identify its own length, as well as the type of extension header (“next header”) or data that follows.

[Figure: IPv6 base header (40 octets), followed by optional extension headers 1 … N, followed by the data]

55

IPv6 Base Header

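Header layout (32 bits per row, bit positions 0 to 31):

Bits 0-3: Version | Bits 4-11: Traffic class | Bits 12-31: Flow label

Bits 0-15: Payload length | Bits 16-23: Next header | Bits 24-31: Hop limit

Source address

Destination address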

56

IPv6 base header fields (1)

• Version: 6

• Traffic class:

– Available for establishing classes or priorities for packet handling

– First 6 bits: differentiated services field

– Last 2 bits: reserved for congestion notification (not yet standardized)

• Flow label: identifier for a sequence of packets from a single source, and with similar transmission requirements

– Example: one flow could identify a specific video transmission

57

IPv6 base header fields (2)

• Payload length (in octets):

– Length of all extension headers plus upper layer data

– Does not include the fixed header.

• Next header: identifies type of header following this header

– Could indicate upper level protocol, or IPv6 extension header

– Values are the protocol numbers defined in: http://www.iana.org/assignments/protocol-numbers

58

IPv6 base header fields (3)

• Hop limit: after visiting this many routers, packet will be discarded.

• Source, destination addresses

– Destination address may not be packet’s ultimate destination

– Available modes:

– Unicast: single destination

– Anycast: choose one destination from a list

– Multicast: specific group of destinations

– No broadcast mode: IPv6 reaches every node on a link through a multicast group instead
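
The base header fields from the last three slides fit in a fixed 40 octets, which makes decoding straightforward. The sketch below is illustrative only; the function name, dictionary keys, and `SAMPLE_BASE_HEADER` values (documentation-prefix addresses) are assumptions for the example.

```python
import ipaddress
import struct


def parse_ipv6_base_header(packet: bytes) -> dict:
    """Decode the fixed 40-octet IPv6 base header."""
    if len(packet) < 40:
        raise ValueError("the IPv6 base header is 40 octets")
    first_word, payload_length, next_header, hop_limit = struct.unpack(
        "!IHBB", packet[:8])
    return {
        "version": first_word >> 28,                 # 4 bits
        "traffic_class": (first_word >> 20) & 0xFF,  # 8 bits
        "flow_label": first_word & 0xFFFFF,          # 20 bits
        "payload_length": payload_length,            # excludes the base header
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": str(ipaddress.IPv6Address(packet[8:24])),
        "destination": str(ipaddress.IPv6Address(packet[24:40])),
    }


# Hypothetical sample header: TCP payload of 64 octets, hop limit 64.
SAMPLE_BASE_HEADER = (
    struct.pack("!IHBB", (6 << 28) | (0xB8 << 20) | 0x12345, 64, 6, 64)
    + ipaddress.IPv6Address("2001:db8::1").packed
    + ipaddress.IPv6Address("2001:db8::2").packed)
```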

59

Extension headers

• Recommended order of appearance:

– IPv6 base (required)

– Hop-by-hop options (next header = 0)

– Destination options (next header = 60): to be processed by first destination in IPv6 header, plus destinations in routing header

– Routing header (next header = 43)

– Fragmentation header (next header = 44)

– Authentication (next header = 51)

– Security / Encapsulation (next header = 50)

– Destination options (next header = 60): for packet’s final destination

– Upper layer protocol (next header = 6 for TCP, 17 for UDP, 58 for ICMPv6, 41 for IPv6 inside IPv6)

60

Hop-by-Hop Options

• “Jumbo payload”: packet is larger than 65,535 octets

– Payload length in fixed header must be zero

– No fragment header

• “Router alert”: information should be examined by each router along the way

– Example: using a protocol such as the Resource reSerVation Protocol (RSVP) to set up quality of service parameters.

61

Fragmentation in IPv6

• An extension header, the “fragment header” contains the fragmentation information not contained in the base header

• All fragmentation in IPv6 must be done by original sender

– This means that the sender has to discover the minimum MTU for the entire transmission.

– Find MTU by sending progressively smaller ICMP “echo” messages with “don’t fragment” set, until an ICMP “echo reply” is returned instead of “destination unreachable”

– IPv6 has the rule that networks must have an MTU of at least 1280 octets

62

Authentication Codes

• Message Authentication Code (MAC):

– carried in authentication header.

• Assume that sender A and receiver B have a shared secret key, KAB.

• MAC = f(KAB, M), where f is a mutually-agreed encryption function

• Receiving the correct MAC means:

– receiver knows that message is not altered.

– message is from correct sender

– sequence of message is correct
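
A rough sketch of MAC = f(KAB, M) follows. It substitutes HMAC with SHA-256 from Python's standard library for the unspecified function f; this is a keyed hash rather than an encryption function, and the slide does not say which function the authentication header uses. The function names are illustrative.

```python
import hashlib
import hmac


def compute_mac(shared_key: bytes, message: bytes) -> bytes:
    """MAC = f(K_AB, M), with f chosen here as HMAC-SHA-256 (an assumption)."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify_mac(shared_key: bytes, message: bytes, received_mac: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(compute_mac(shared_key, message), received_mac)
```

Verification fails if either the message or the key differs, which is what lets the receiver conclude that the message is unaltered and comes from the holder of KAB. Detecting an incorrect sequence additionally requires a sequence number to be part of the message M.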

63

Congestion

• Congestion occurs when the number of packets being transmitted through the network approaches the packet handling capacity of the network

• Congestion control aims to keep number of packets below level at which performance falls off dramatically

• Data network is a network of queues

• Generally 80% utilization is critical

• Finite queues mean data may be lost

64

Queues at a Node

[Figure: a node with queues on its input (“in”) and output (“out”) links]

65

Router Packet Handling

• Packets arriving are stored at input buffers

• Routing decision made

• Packet moves to output buffer

• Packets queued for output transmitted as fast as possible

• If packets arrive too fast to be routed, or to be output, buffers will fill and overflow.

– Can discard packets

– Can use flow control

– Can propagate congestion through network

66

Congestion Principles

• Usually occurs at a point of transition to reduced throughput.

• Occurs when the higher capacity part of a system is currently carrying more traffic than the lower capacity part can handle.

• Difference from flow control:

– Flow control is one sender agreeing not to overflow one receiver at the endpoints of a transmission

– Congestion is usually caused by multiple senders, and occurs at an intermediate point in the network

– This makes congestion more difficult to detect, and to alleviate.

67

Implicit Congestion Detection

• What are the signs of congestion?

– Increased transmission time

– Packets spend more time in queues that are longer: delay increases

– Disappearance of packets

– On a fibre-based network (or ones with data link error control), disappearance of packets can be interpreted as a sign of congestion.

– Sending timers (at transport layer) start expiring.

68

Interaction of Queues

69

Idealized Performance

• Network can accept load up to its capacity

• Additional load will be delivered at capacity throughput rates.

– Packets are queued up at intermediate points

70

Idealized Performance: Throughput

[Figure: normalized throughput (0 to 1.2) versus normalized load (0 to 2)]

71

Idealized Performance: Delay

[Figure: delay versus normalized load (0 to 1.5)]

72

Practical Performance

[Figures: delay versus load, and normalized throughput versus load]

73

Practical Performance

• Ideal assumes infinite buffers and no overhead

• Buffers are finite

• Overheads occur in exchanging congestion control messages

74

The Congestion Control Paradox

• When congestion occurs, the problem is that there are too many packets in the network

• If packets are trashed, senders will likely resend them, along with new packets.

– Result: increased congestion

• If one node sends out messages to announce it is congested, then it increases the number of extra overhead packets in the network.

– Result: increased congestion

• If one node asks its neighbours to slow down, then the output queues of the neighbouring nodes will start filling up.

– Result: increased congestion

75

Congestion Control

• Implicit

– No action taken

– It is assumed senders will notice evidence of congestion and deal with it themselves.

– What can senders do?

– Slow rate of packet sending

– Increase timeout length for sent packets

• Explicit

– Various mechanisms to announce or alleviate congestion, taken by intermediate network nodes.

76

Implicit Congestion Signaling

• Transmission delay may increase with congestion

• Packet may be discarded

• Source can detect these as implicit indications of congestion

• Useful on connectionless (datagram) networks

– Example: IP leaves congestion (and flow) control to upper layer (normally TCP).

• Used in frame relay LAPF

77

Explicit Congestion Signaling

• Network alerts end systems of increasing congestion

• End systems take steps to reduce offered load

• Backwards

– Congestion avoidance required for traffic in the opposite direction to the packet carrying the signal

• Forwards

– Congestion avoidance required for traffic in the same direction as the packet carrying the signal

78

Backpressure

• If node becomes congested it can slow down or halt flow of packets from other nodes

• May mean that other nodes have to apply control on incoming packet rates

• Propagates back to source

• Can restrict to logical connections generating most traffic

• Used in connection-oriented networks that allow hop-by-hop congestion control (e.g. X.25)

• Not used in ATM nor frame relay

• Only recently developed for IP

79

Choke Packet

• Control packet

– Generated at congested node

– Sent to source node

– e.g. ICMP source quench

– From router or destination

– Source cuts back until no more source quench message

– Sent for every discarded packet, or anticipated

• Rather crude mechanism

80

Categories of Explicit Signaling

• Binary

– A bit set in a packet indicates congestion

• Credit based

– Indicates how many packets source may send

– Common for end to end flow control

• Rate based

– Supply explicit data rate limit

– e.g. ATM

81

TCP Slow Start

[Figure: congestion window (bytes, 0 to 45056 in steps of 4096) versus transmission number, with Threshold 1, Threshold 2, and a timeout marked]

82

Rate-based Congestion Control

• Regulate rate at which sender can inject packets into network:

• A packet must match up with (and remove) a token before entering network.

• Tokens added to bucket at rate r.

• At most b tokens can accumulate in bucket; tokens overflow and are lost after that

– Bucket size b controls “burstiness”

• Max. number of packets entering network in [ t, t + δ ] is b + δr

[Figure: tokens arrive at a fixed rate into a “bucket” with storage for up to b tokens; packets wait in a packet waiting area until a token is available, then pass to the network]
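
The token bucket rules above can be sketched as a small class. The class name, the choice to start with a full bucket, and the convention of one token per packet are assumptions made for this illustration.

```python
class TokenBucket:
    """Tokens arrive at `rate` per second; at most `capacity` (b) are stored."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # assumption: the bucket starts full
        self.last_time = 0.0

    def _refill(self, now: float) -> None:
        # Tokens beyond the bucket size overflow and are lost.
        earned = (now - self.last_time) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last_time = now

    def try_send(self, now: float) -> bool:
        """A packet enters the network only if it can remove one token."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With r = 2 and b = 5, a sender that always has packets waiting gets 5 packets out immediately and 2 more per second after that, so over 3 seconds it sends 5 + 3 × 2 = 11 packets, in line with the b + δr bound.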

83

Congestion Control in Packet Switched Networks

• Send control packet to some or all source nodes

– Requires additional traffic during congestion

• Rely on routing information

– May react too quickly

• End to end probe packets

– Adds to overhead

• Add congestion info to packets as they cross nodes

– Either backwards or forwards

84

Traffic Management

• Fairness

• Quality of service

– May want different treatment for different connections

– What is more critical: delay or loss?

• Reservations

– e.g. ATM (Asynchronous Transfer Mode)

– Traffic contract between user and network

85

Case Study: ATM Traffic Management

• ATM standards specify several service categories

• Network traffic is managed to achieve Quality of Service (QoS) goals

• For each of the service categories (on subsequent slides):

– What is the highest priority for QoS?

– Delay

– Loss

– What would be a congestion control / avoidance strategy?

86

ATM Service Categories

• Real time

– Constant bit rate (CBR)

– Real time variable bit rate (rt-VBR)

• Non-real time

– Non-real time variable bit rate (nrt-VBR)

– Available bit rate (ABR)

– Unspecified bit rate (UBR)

87

Real Time Services

• QoS parameters:

– Amount of delay

– Variation of delay (jitter)

88

CBR: Constant Bit Rate

• Fixed data rate continuously available

• Tight upper bound on delay

• Uncompressed audio and video

– Video conferencing

– Interactive audio

– Audio / video distribution and retrieval

89

rt-VBR: Real-time Variable Bit Rate

• Time sensitive application

– Tightly constrained delay and delay variation

• rt-VBR applications transmit at a rate that varies with time

• Example: compressed video

– Produces varying sized image frames

– Original (uncompressed) frame rate constant

– So compressed data rate varies

• Can statistically multiplex connections

90

nrt-VBR: Non-real-time Variable Bit Rate

• May be able to characterize expected traffic flow

• Improve Quality of Service (QoS) in loss and delay

• End system specifies:

– Peak cell rate

– Sustainable or average rate

– Measure of how bursty traffic is

• e.g. Airline reservations, banking transactions

91

UBR: Unspecified Bit Rate

• May be additional capacity over and above that used by CBR and VBR traffic

– Not all resources dedicated

– Bursty nature of VBR

• For application that can tolerate some cell loss or variable delays

– e.g. TCP based traffic

• Cells forwarded on FIFO basis

• Best-effort service

92

ABR: Available Bit Rate

• Application specifies peak cell rate (PCR) and minimum cell rate (MCR)

• Resources allocated to give at least MCR

• Spare capacity shared among all ABR sources

• e.g. LAN interconnection

93

Asynchronous Transfer Mode (ATM)

• Properties of ATM:

– Small, fixed-sized packets, called “cells”

– ATM networks are connection-oriented: a connection must be set up at the start of a call

– Set up a “virtual channel” (VC) within a “virtual path” (VP)

– Subsequent cells will follow the same route to destination

– Control signaling on separate channel from user data

– Cell delivery is not guaranteed, but cell order is preserved

– Traffic management is taken into account when setting up a connection.

– High speed: data rates up to 622.08 Mbits / s

94

ATM Reference Model

[Figure: ATM reference model. Layers, top to bottom: upper layers, ATM adaptation layer, ATM layer, physical layer. The control plane and user plane share these layers, with layer management and plane management alongside.]

• ATM layer is approximately equivalent to the OSI network layer

95

Reference Model Layers

• Physical layer:

– Handles equivalent of OSI physical and data link layers

• ATM layer

– Deals with cells, and cell transport

– Defines cell layout, and header fields

– Establishment and release of virtual circuits

– Congestion control

• AAL: ATM adaptation layer

– Provides for transmission of packets larger than a cell.

– Various AAL protocols deal with different ATM service categories (CBR, etc.)

96

Reference Model Planes

• User plane

– Provides for user information transfer

• Control plane

– Call and connection control

• Management plane

– Plane management

– Whole system functions

– Layer management

– Resources and parameters in protocol entities

97

ATM Connection Setup

• Performed in control plane: VP0, VC5

• ITU protocol Q.2931

[Figure: message sequence for connection setup and release, relayed hop by hop between source, switches and destination: setup, call proceeding, connect, connect ack, release, release complete.]

98

ATM Cells

• Fixed size: 53 octets

– 5 octet header

– 48 octet information field

• Small cells reduce queuing delay for high priority cells

• Small cells can be switched more efficiently

• Easier to implement switching of small cells in hardware

99

ATM Cell Format

• Ordered transmission of 53 octet cells

• 5 octet header identifies virtual path and virtual channel, which together comprise a “connection identifier”

VPI: virtual path identifier - used for routing

VCI: virtual channel identifier - identifies transmissions within

PTI: payload type

CLP: cell loss priority

HEC: header error check

Cell layout (field widths in bits):

VPI 12 | VCI 16 | PTI 3 | CLP 1 | HEC 8 | upper level data 384 (= 48 octets)
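The 5 octet header can be packed and unpacked with plain bit shifts. The sketch below follows the field widths listed on this slide (VPI 12, VCI 16, PTI 3, CLP 1, HEC 8); the function names are hypothetical, and the HEC is passed in as a value rather than computed.

```python
def pack_cell_header(vpi, vci, pti, clp, hec):
    """Pack the five header fields into 5 octets (40 bits), VPI first."""
    if not (0 <= vpi < 1 << 12 and 0 <= vci < 1 << 16 and 0 <= pti < 8
            and clp in (0, 1) and 0 <= hec < 256):
        raise ValueError("field out of range")
    word = (vpi << 28) | (vci << 12) | (pti << 9) | (clp << 8) | hec
    return word.to_bytes(5, "big")


def unpack_cell_header(header):
    """Split a 5 octet header back into its fields."""
    if len(header) != 5:
        raise ValueError("header must be exactly 5 octets")
    word = int.from_bytes(header, "big")
    return {
        "vpi": (word >> 28) & 0xFFF,
        "vci": (word >> 12) & 0xFFFF,
        "pti": (word >> 9) & 0x7,
        "clp": (word >> 8) & 0x1,
        "hec": word & 0xFF,
    }


# Example: the signalling channel mentioned earlier (VP 0, VC 5), HEC left as 0.
signalling = pack_cell_header(vpi=0, vci=5, pti=0, clp=0, hec=0)
```

A full cell is this header followed by the 48 octet information field, for 53 octets in total.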

100

User – Network Interface (UNI) cell

• First 4 bits of virtual path identifier used as a flow control field for a cell entering the network

• Will be overwritten by first router

GFC: generic flow control

Cell layout (field widths in bits):

GFC 4 | VPI 8 | VCI 16 | PTI 3 | CLP 1 | HEC 8 | upper level data 384 (= 48 octets)

101

ATM payload type field

• Three bits:

0 0 0: User data cell type 0, no congestion

0 0 1: User data cell type 1, no congestion

0 1 0: User data cell type 0, congestion

0 1 1: User data cell type 1, congestion

1 0 0: Operation / administration / maintenance (OAM) message, this hop

1 0 1: OAM message, end to end

1 1 0: Resource management cell

1 1 1: Reserved for future use
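The table above can be decoded mechanically: for user data cells the middle bit flags congestion and the last bit carries the cell type. A minimal sketch, with a hypothetical function name:

```python
def describe_pti(pti):
    """Return a text description of a 3-bit payload type value."""
    if not 0 <= pti <= 7:
        raise ValueError("PTI is a 3-bit field")
    if pti & 0b100 == 0:
        # First bit 0: user data cell.
        cell_type = pti & 0b001
        congested = bool(pti & 0b010)
        state = "congestion" if congested else "no congestion"
        return f"user data cell type {cell_type}, {state}"
    # First bit 1: management and reserved values.
    return {
        0b100: "OAM message, this hop",
        0b101: "OAM message, end to end",
        0b110: "resource management cell",
        0b111: "reserved for future use",
    }[pti]
```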

102

ATM Traffic Management

• High speed, small cell size, limited overhead bits

• Still evolving

• Requirements

– Majority of traffic not amenable to flow control

– Feedback slow due to reduced transmission time compared with propagation delay

– Wide range of application demands

– Different traffic patterns

– Different network services

– High speed switching and transmission increases volatility

103

Latency/Speed Effects

• ATM 622.08 Mbps

• ~6.8 × 10^-7 seconds to insert single cell

• Time to traverse network depends on propagation delay, switching delay

• Assume propagation at two-thirds speed of light

• If source and destination on opposite sides of Canada, propagation time ~ 2.75 × 10^-2 seconds

• Given implicit congestion control, by the time dropped cell notification has reached source, 1.7 × 10^7 bits have been transmitted

• So, this is not a good strategy for ATM
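The figures on this slide can be checked with a few lines of arithmetic. The distance of 5,500 km is an assumption chosen to match the stated propagation time; it does not appear on the slide.

```python
LINK_RATE_BPS = 622.08e6       # ATM link rate from the slide
CELL_BITS = 53 * 8             # one 53 octet cell
PROPAGATION_SPEED = 2.0e8      # m/s, about two-thirds of the speed of light
DISTANCE_M = 5.5e6             # assumed coast-to-coast distance (5,500 km)

cell_insertion_time = CELL_BITS / LINK_RATE_BPS     # about 6.8e-7 s
propagation_time = DISTANCE_M / PROPAGATION_SPEED   # about 2.75e-2 s
bits_in_flight = LINK_RATE_BPS * propagation_time   # about 1.7e7 bits
cells_in_flight = bits_in_flight / CELL_BITS        # roughly 40,000 cells
```

So a source reacting to a loss notification has already committed tens of thousands of further cells to the network, which is why reactive, loss-driven control fits ATM poorly.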

104

Cell Delay Variation

• For ATM voice/video, data is a stream of cells

• Delay across network must be short

• Rate of delivery must be constant

• There will always be some variation in transit

• Delay cell delivery to application so that constant bit rate can be maintained to application

105

Network Contribution to Cell Delay Variation

• Packet switched networks in general

– Queuing delays

– Routing decision time

• ATM

– ATM protocol designed to minimize processing overheads at switches

– ATM switches have very high throughput

– Only noticeable delay is from congestion

– Must not accept load that causes congestion

106

Cell Delay Variation At The User-Network Interface

• Application produces data at fixed rate

• Processing at three layers of ATM causes delay

– Interleaving cells from different connections

– Operation and maintenance cell interleaving

– If using synchronous digital hierarchy frames, these are inserted at physical layer

– Can not predict these delays

107

Traffic and Congestion Control Framework

• ATM layer traffic and congestion control should support QoS classes for all foreseeable network services

• Should not rely on AAL protocols that are network specific, nor higher level application specific protocols

• Should minimize network and end to end system complexity

108

Timings Considered

• Cell insertion time

• Round trip propagation time

• Connection duration

• Long term

• Determine whether a given new connection can be accommodated

• Agree performance parameters with subscriber

109

Traffic Management and Congestion Control Techniques

• Resource management using virtual paths

• Connection admission control

• Usage parameter control

• Selective cell discard

• Traffic shaping

– Use the token bucket scheme for rate-based congestion control.

110

Resource Management Using Virtual Paths

• Separate traffic flow according to service characteristics

• User to user application

• User to network application

• Network to network application

• Concern with:

– Cell loss ratio

– Cell transfer delay

– Cell delay variation

111

Connection Admission Control

• First line of defense

• User specifies traffic characteristics for new connection by selecting a QoS

• Network accepts connection only if it can meet the demand

• Traffic contract

– Peak cell rate

– Cell delay variation

– Sustainable cell rate

– Burst tolerance

112

Usage Parameter Control

• Protection of network resources from overload by one connection

• Monitor connection to ensure traffic conforms to contract

– Monitor peak cell rate

– Measure cell delay variation

– Determine average cell rate

– Track burst sizes

• Discard cells that do not conform to traffic contract

– Called traffic policing

113

ATM-ABR Traffic Management

• Some applications (Web, file transfer) do not have well defined traffic characteristics

• Best effort

– Allow these applications to share unused capacity

– If congestion builds, cells are dropped

• Closed loop control

– ABR connections share available capacity

– Share varies between minimum cell rate (MCR) and peak cell rate (PCR)

– ABR flow limited to available capacity by feedback

– Buffers absorb excess traffic during feedback delay

– Low cell loss

114

Feedback Mechanisms

• Transmission rate characteristics:

– Allowed cell rate

– Minimum cell rate

– Peak cell rate

– Initial cell rate

• Start with ACR=ICR

• Adjust ACR based on feedback from network

– Resource management cells

– Congestion indication bit

– No increase bit

– Explicit cell rate field
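One way a source might adjust its allowed cell rate (ACR) when a resource management cell returns is sketched below. The increase and decrease factors (`rif`, `rdf`) and their default values are assumptions for illustration; the slide only names the feedback fields, not the exact rule.

```python
def update_acr(acr, mcr, pcr, ci, ni, er, rif=1 / 16, rdf=1 / 16):
    """Return the new allowed cell rate after one feedback cell.

    ci: congestion indication bit, ni: no increase bit,
    er: explicit cell rate field, rif / rdf: assumed rate factors.
    """
    if ci:
        acr -= acr * rdf           # congestion: back off multiplicatively
    elif not ni:
        acr += rif * pcr           # no congestion, increase allowed: step up
    acr = min(acr, er, pcr)        # never exceed the explicit rate or PCR
    return max(acr, mcr)           # never drop below the guaranteed MCR
```

Starting from ACR = ICR, repeated calls keep the rate between MCR and PCR while tracking the network's feedback.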

115

Routers

• The main function of a router is to decide how best to forward each packet, based on the packet's destination network address.

• Action: look up identifier in a routing table, and forward packets to appropriate outgoing link, or to upper layer if applicable.

[Figure: example network with nodes A, B, C and D.]

116

Properties Desired for Routing

• Correctness: send packet “closer” to destination

• Simplicity: less error-prone, faster

• Robustness: ability to react to changes

• Stability: routing algorithms should converge to a stable state

• Fairness: guarantee that packets are not held up indefinitely

• Performance: speed, throughput

• Scalability: can deal with ever-increasing number of network nodes

• Security: filtering of malicious activity

117

Performance Criteria

• Used for selection of route

• Criterion is used to measure the “least cost” route

• Cost could be…

– Number of hops

– $ price of link

– Delay time

– Suitability for QoS requirements

118

Costing of Routes

[Figure: example network of six nodes (1 to 6) with a cost on each link.]

119

Routing Decision Time and Place

• Time

– Datagram service: on arrival of each packet

– Virtual circuit service: at connection setup

• Place

– Distributed

– Made by each node

– Centralized

– Source

– Initial sender specifies route (e.g. IP option)

120

Network Information Source and Update Timing

• Routing decisions usually (but not always!) based on knowledge of network

• Distributed routing

– Nodes use local knowledge

– May collect information from adjacent nodes

– May collect information from all nodes on a potential route

• Central routing

– Collect information from all nodes

• Update timing

– When is network info held by nodes updated?

– Fixed routing: requires human intervention

– Adaptive: regular updates

121

Routing Strategies

• Fixed

• Flooding

• Random

• Adaptive

122

Fixed Routing

• Single permanent route for each source to destination pair

• Determine routes using a least cost algorithm

• Route fixed, at least until a change in network topology

123

Our example again…

[Figure: the same six-node example network (nodes 1 to 6) with link costs.]

124

Central Routing Table

Entries give the next node on the route.

From \ To   1   2   3   4   5   6
1           –   2   3   4   4   4
2           1   –   3   4   4   4
3           1   5   –   5   5   5
4           2   2   5   –   5   5
5           4   2   3   4   –   6
6           5   5   5   5   5   –
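With fixed routing, a packet's route is found by repeatedly looking up the next node for its destination. The sketch below copies the entries of the central routing table on this slide; the function name and the hop limit are illustrative assumptions.

```python
# NEXT_NODE[current][destination] = next node, copied from the table above.
NEXT_NODE = {
    1: {2: 2, 3: 3, 4: 4, 5: 4, 6: 4},
    2: {1: 1, 3: 3, 4: 4, 5: 4, 6: 4},
    3: {1: 1, 2: 5, 4: 5, 5: 5, 6: 5},
    4: {1: 2, 2: 2, 3: 5, 5: 5, 6: 5},
    5: {1: 4, 2: 2, 3: 3, 4: 4, 6: 6},
    6: {1: 5, 2: 5, 3: 5, 4: 5, 5: 5},
}


def trace_route(source, destination, max_hops=10):
    """Follow the table hop by hop and return the list of nodes visited."""
    path = [source]
    node = source
    while node != destination:
        if len(path) > max_hops:
            raise RuntimeError("routing loop suspected")
        node = NEXT_NODE[node][destination]
        path.append(node)
    return path
```

Each node only needs its own row of the table, which is what the local routing tables on the next slide contain.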

125

Local Routing Tables

[Figure: local routing tables for nodes 1 to 6. Each node holds one row of the central routing table: a list of destinations and the next node for each.]

126

Flooding

• No network info required

• Packet sent by node to every neighbor

• Incoming packets retransmitted on every link except incoming link

• Eventually a number of copies will arrive at destination

• Each packet is uniquely numbered so duplicates can be discarded

• Nodes can remember packets already forwarded to keep network load in bounds

• Can include a hop count in packets
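Flooding with duplicate suppression and a hop limit can be sketched as a breadth-first spread. The six-node topology below is hypothetical (the link list is an assumption, not read from the slides), and remembering which nodes have already forwarded the packet stands in for per-packet sequence numbers.

```python
from collections import deque

# Hypothetical six-node topology: node -> list of neighbours.
NEIGHBOURS = {
    1: [2, 3, 4],
    2: [1, 3, 4],
    3: [1, 2, 4, 5, 6],
    4: [1, 2, 3, 5],
    5: [3, 4, 6],
    6: [3, 5],
}


def flood(source, max_hops):
    """Flood one packet; return (first-arrival hop count per node, link transmissions)."""
    first_arrival = {source: 0}
    transmissions = 0
    queue = deque([(source, None, 0)])   # (node, neighbour it arrived from, hops so far)
    while queue:
        node, came_from, hops = queue.popleft()
        if hops >= max_hops:
            continue                     # hop count exhausted: do not forward
        for nxt in NEIGHBOURS[node]:
            if nxt == came_from:
                continue                 # never send back on the incoming link
            transmissions += 1
            if nxt in first_arrival:
                continue                 # duplicate: receiver discards it
            first_arrival[nxt] = hops + 1
            queue.append((nxt, node, hops + 1))
    return first_arrival, transmissions
```

The first copy to reach each node has travelled a minimum-hop route, while the transmission count shows the cost: every link carries the packet even though most copies are discarded.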

127

Flooding Example

[Figure: flooding example on the six-node network (nodes 1 to 6), showing packet copies spreading across the links, each labelled with the nodes it has visited.]

128

• Once more, but with routing tables…

– Assume packets carry a hop count for each node.

• Note: due to space limitations, the routing table for node 4 will not appear.

129

[Figure: the flooding example repeated on the six-node network, with the routing tables built at nodes 1, 2, 3, 5 and 6 from the hop counts carried by the packets.]

130

Properties of Flooding

• All possible routes are tried

– Very robust

• At least one packet will have taken minimum hop count route

– Can be used to set up virtual circuit

• All nodes are visited

– Useful to distribute information

131

Random Routing

• Node selects one outgoing path for retransmission of incoming packet

• Selection can be random or round robin

• Can select outgoing path based on probability calculation

• No network info needed

• Route is typically not least cost nor minimum hop

132

Adaptive Routing

• Used by almost all packet switching networks

• Routing decisions change as conditions on the network change

– Failure

– Congestion

• Requires info about network

• Decisions more complex

• Tradeoff between quality of network info and overhead

– Reacting too quickly can cause oscillation

– Reacting too slowly to be relevant

133

Adaptive Routing

• Two factors used to make decision:

– Sending the packet in “generally” the right direction.

– Minimizing congestion

• Instead of having one entry in routing table for a destination, keep a list of alternative links.

• Each alternative has a bias factor Bi that indicates the preference for correct routing.

– Lowest bias factor implies “shortest” route to destination.

• Route packets based on the combination of the current outgoing queue length Qi for a particular link, and the bias factor.

– That is, minimize Qi + Bi over the set of alternatives.

134

Classification

• Based on information sources

– Local (isolated)

– Route to outgoing link with shortest queue

– Can include bias for each destination

– Rarely used - does not take advantage of easily available information about other nodes.

– Adjacent nodes

– All nodes

135

Local Adaptive Routing Example

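[Figure: a node with outgoing links to nodes 1, 2, 3 and 5, each with its current queue length.]

Bias table for destination 6:

Next node: 1 2 3 5

Bias: 9 6 3 0

Result: Choose link to 3, since sum of bias and queue length is 4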
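The selection rule (minimize Qi + Bi) is a one-line minimum. The bias values below are the ones on the slide; the queue lengths were lost with the figure, so the values used here are assumptions chosen to be consistent with the stated result (a sum of 4 on the link to node 3).

```python
BIAS_TO_6 = {1: 9, 2: 6, 3: 3, 5: 0}      # bias per outgoing link, from the slide
QUEUE_LENGTH = {1: 3, 2: 2, 3: 1, 5: 5}   # assumed current queue lengths


def choose_link(bias, queues):
    """Pick the outgoing link with the smallest queue length plus bias."""
    return min(bias, key=lambda link: queues[link] + bias[link])


best = choose_link(BIAS_TO_6, QUEUE_LENGTH)
best_cost = QUEUE_LENGTH[best] + BIAS_TO_6[best]
```

Note that the link to node 5 has the lowest bias, but with these queue lengths its backlog makes the link to node 3 the better choice.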

136

ARPANET Routing Strategies (1)

• First Generation (1969)

– Distributed adaptive

– Estimated delay as performance criterion (“cost”)

– Use modified Bellman-Ford algorithm (1962)

– Node exchanges delay vector with neighbors every 128 ms

– Update routing table based on incoming info

– Does not consider link speed, just queue length

– Queue length not a good measurement of delay

– Responds slowly to congestion

137

Bellman-Ford Algorithm

• Determines shortest paths from a source node s to all other nodes.

• For all nodes, keep the current best known shortest path

– Initialize to 0 for the source and +∞ for all other nodes

• Algorithm proceeds by hop count from source node

– Start with hop count of 0.

• Keep a set of edges E which have been examined.

– Start with an empty set

• Repeat until E includes all edges:

– Add one to current hop count

– Add all edges that can be reached in this hop count to E.

– For each edge added, if cost of edge to node is lower than current minimum, replace current minimum.

– Update the current best known shortest paths to all nodes, based on inclusion of this edge.
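A compact way to express the same idea is the textbook edge-relaxation form of Bellman-Ford, where pass h settles every shortest path of at most h hops. This is a sketch of that standard formulation rather than a transcription of the edge-set procedure above, and the six-node graph with its costs is a hypothetical example.

```python
import math

# Hypothetical undirected graph: (node, node, cost).
EDGES = [
    (1, 2, 2), (1, 3, 5), (1, 4, 1), (2, 3, 3), (2, 4, 2),
    (3, 4, 3), (3, 5, 1), (3, 6, 5), (4, 5, 1), (5, 6, 2),
]


def bellman_ford(edges, source):
    """Return the least cost from source to every node."""
    nodes = {n for u, v, _ in edges for n in (u, v)}
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    # Treat each undirected link as two directed edges.
    directed = edges + [(v, u, c) for u, v, c in edges]
    for _ in range(len(nodes) - 1):       # paths need at most |nodes| - 1 hops
        changed = False
        for u, v, cost in directed:
            if dist[u] + cost < dist[v]:  # found a cheaper path ending in u -> v
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            break                         # converged early
    return dist
```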

138

Example: Bellman-Ford Algorithm

[Figure: Bellman-Ford example on the six-node network with link costs, showing the best known path cost to each of nodes 2 to 6 at successive hop counts, starting from ∞.]

139

The result

[Figure: resulting shortest-path tree from node 1, with the final path costs to nodes 2 to 6.]

140

Distance (Cost) Vector Routing

• Localized version of Bellman-Ford algorithm

• Router receives information from neighbours, and chooses the best option from information received.

• Updates correspond to stages in global algorithm:

– As router finds out about more destinations, new entries added.

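[Figure: routers R1, R2 and R3 in a line; R2 receives distance vectors from R1 and R3.]

Vector from R1 (destination - cost): A - 1, B - 2, C - 2, D - 6

Vector from R3 (destination - cost): A - 3, B - 1, E - 1, F - 4

Resulting table at R2 (destination - cost): A - 2 via R1, B - 2 via R3, C - 3 via R1, D - 7 via R1, E - 2 via R3, F - 5 via R3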
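The table at R2 can be reproduced by adding the cost of the link to each neighbour to that neighbour's advertised cost and keeping the cheapest option per destination. The link costs of 1 to both R1 and R3 are an assumption; they are the values that make the slide's numbers work out.

```python
def merge_vectors(link_cost, vectors):
    """Build a routing table {destination: (cost, via)} from neighbours' vectors."""
    table = {}
    for neighbour, vector in vectors.items():
        for destination, cost in vector.items():
            total = link_cost[neighbour] + cost
            if destination not in table or total < table[destination][0]:
                table[destination] = (total, neighbour)
    return table


LINK_COST = {"R1": 1, "R3": 1}            # assumed cost from R2 to each neighbour
VECTORS = {
    "R1": {"A": 1, "B": 2, "C": 2, "D": 6},
    "R3": {"A": 3, "B": 1, "E": 1, "F": 4},
}
r2_table = merge_vectors(LINK_COST, VECTORS)
```

Destinations known to only one neighbour (C, D, E, F) are simply added as new entries, which is how a router's table grows as it hears about more of the network.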

141

ARPANET Routing Strategies (2)

• Second Generation (1979)

– Uses delay as performance criterion

– Delay measured directly

– Computed every 10 s by time-stamping packets.

– Significant changes passed on via flooding

– Uses Dijkstra’s algorithm (1959)

– Good under light and medium loads

– Under heavy loads, little correlation between reported delays and those experienced

– Why? Routers all recompute routing tables at same time, and could all switch from a heavily loaded link to a lightly loaded link – which just moves congestion elsewhere.

142

Dijkstra’s Algorithm

• Determines shortest paths from a source node s to all other nodes.

• For all nodes, keep the current best known shortest path

– Initialize to 0 for the source and +∞ for all other nodes

• Keep a set of nodes N for which the shortest path is known.

– Initialize this set to {s}.

• Repeat until N includes all nodes:

– For each node not in N, what would be the shortest path from s to the node by taking, as the last hop, an edge from a node in N?

– Whichever node results in the minimum shortest path, add that node to N.

– Update the current best known shortest paths to all nodes, based on inclusion of the new node.
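The steps above map directly onto a priority-queue implementation: the heap always yields the node outside N with the smallest tentative cost. The graph and its costs below are a hypothetical six-node example, and the helper names are illustrative.

```python
import heapq
import math

# Hypothetical undirected graph: (node, node, cost).
EDGES = [
    (1, 2, 2), (1, 3, 5), (1, 4, 1), (2, 3, 3), (2, 4, 2),
    (3, 4, 3), (3, 5, 1), (3, 6, 5), (4, 5, 1), (5, 6, 2),
]


def build_adjacency(edges):
    adj = {}
    for u, v, cost in edges:
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def dijkstra(adj, source):
    """Return (least cost per node, predecessor per node) from source."""
    dist = {n: math.inf for n in adj}
    dist[source] = 0
    prev = {}
    done = set()                          # the set N of finished nodes
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)                       # u's shortest path is now known
        for v, cost in adj[u].items():
            if d + cost < dist[v]:        # cheaper path with u as the last hop
                dist[v] = d + cost
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the node sequence from source to target using predecessors."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

The predecessor map is what a router would use to fill in the next-hop column of its routing table.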

143

Example: Dijkstra’s Algorithm

[Figure: Dijkstra's algorithm example on the six-node network with link costs, showing the best known path cost to each of nodes 2 to 6 as nodes are added to N, starting from ∞.]

144

The result

[Figure: resulting shortest-path tree from node 1, with the final path costs to nodes 2 to 6.]

146

ARPANET Routing Strategies (3)

• Third Generation (1987)

– Link cost calculations changed

– Measure average delay over last 10 seconds

– Convert to utilization (0 ≤ U ≤ 1):

U = 2(Ts – T) / (Ts – 2T)

where Ts is the “service time” and T is the measured delay.

– Service time is average packet size (600 bits often used) divided by the speed of the data link.

– Normalize average utilization AU based on current value U and previous average:

AU′ = 0.5 AU + 0.5 U

147

ARPANET Routing Strategies (3)

– Cost =

1, if AU ≤ 0.5

1 + 4(AU – 0.5), if AU > 0.5

– Special cost for satellite link =

2, if AU ≤ 0.75

2 + 4(AU – 0.75), if AU > 0.75

– Cost is in range 1 to 3.

– Maximum penalty for avoiding a congested link or node is 2 extra hops.
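The third-generation cost calculation can be written out as three small functions. The utilization formula is taken here as U = 2(Ts – T) / (Ts – 2T), the single-server queueing form assumed in this sketch; the smoothing and cost rules follow the slides, and the function names are hypothetical.

```python
def utilization(service_time, delay):
    """Estimate link utilization from measured delay T and service time Ts."""
    u = 2 * (service_time - delay) / (service_time - 2 * delay)
    return min(max(u, 0.0), 1.0)          # keep the estimate within 0..1


def smooth(previous_average, current):
    """AU' = 0.5 AU + 0.5 U."""
    return 0.5 * previous_average + 0.5 * current


def link_cost(average_utilization, satellite=False):
    """Piecewise-linear cost: flat up to a threshold, then rising to 3."""
    base, threshold = (2, 0.75) if satellite else (1, 0.5)
    if average_utilization <= threshold:
        return base
    return base + 4 * (average_utilization - threshold)
```

Because the cost stays flat until the link is at least half loaded and never exceeds 3, a busy link can divert traffic by at most two extra hops, which damps the oscillation seen in the second generation.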

148

Routing Protocols

• Two types:

– Interior: used within an “autonomous system” (AS)

– Exterior: used between differing autonomous systems.

• An “autonomous system” (RFC 1930) consists of routers (and networks) that:

– Use a common routing protocol

– Are managed by the same organization

– Are connected (except when failures occur)

• Autonomous systems are identified by AS numbers

– Assigned by IANA (Internet Assigned Numbers Authority) (www.iana.org)

– In North America, IANA delegates to the American Registry for Internet Numbers (ARIN) (www.arin.net)

149

Internetworking of Autonomous Systems

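[Figure: two autonomous systems joined by a physical link. AS 1 contains routers R1 to R4 and networks N1.1 to N1.4; AS 2 contains routers R5 to R8 and networks N2.1 to N2.4. OSPF runs inside each AS and BGP runs between them.]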

150

Interior versus Exterior Routing

• Interior routing

– Typical situation: corporate network, ISP

– Usual protocol: Open Shortest Path First (OSPF) version 2 [RFC 2328]

– Needs detailed picture of network

– Least cost is the important factor

• Exterior routing

– Typical situation: connections between ISPs

– Usual protocol: Border Gateway Protocol (BGP) version 4 [RFC 1771]

– Less detailed information exchanged

– Reachability is the important factor

151

Exterior Routing with BGP

• Messages sent via TCP connection (BGP inside TCP inside IP)

• Procedures:

1. Neighbour acquisition

– A neighbour is another router on the same (physical) network but is part of a different autonomous system

– Routers agree to regular exchange of information.

2. Neighbour reachability

– Maintaining the relationship with status updates

3. Network reachability

– Keeping a data base of networks that can be reached, and the preferred route to reach each network.

152

BGP Messages

• Open

– Begin a neighbour relationship with a new router

• Update

– Announce a new single route, or the deletion of one or more routes

• Keepalive

– Sent periodically to confirm router is still active and maintains the neighbour relationship

– Also acknowledges an Open message

– If keepalive messages do not appear on time, connection is assumed to be broken.

• Notification

– Announces an error condition

153

Routing Tables for a BGP router

• RIB: routing information base

• Conceptually, 3 separate tables could be maintained

– Separate implementations are not required

1. Adjacent RIB inward

• Contains information learned from incoming BGP update messages

2. Local RIB

• Contains routing decisions made after applying local decision-making policies

• “The” routing table for this node

3. Adjacent RIB outward

• Contains information the router is willing to advertise via BGP

154

BGP message format

Field | Size (octets) | Contents

Marker | 16 | Authentication information, akin to a connection identifier

Length | 2 | Number of octets in message

Type | 1 | {Open, Update, Keepalive, Notification}

Message-specific information | variable | (not used for keepalive message)
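The fixed part of the message is 19 octets and can be parsed with Python's `struct` module. The type codes used below (1 Open, 2 Update, 3 Notification, 4 Keepalive) are the values from the BGP-4 specification; the function name is hypothetical and the sketch does no validation beyond length checks.

```python
import struct

BGP_HEADER = struct.Struct("!16sHB")      # marker, length, type (network byte order)
BGP_TYPES = {1: "Open", 2: "Update", 3: "Notification", 4: "Keepalive"}


def parse_bgp_message(data):
    """Split a BGP message into (type name, marker, message-specific body)."""
    if len(data) < BGP_HEADER.size:
        raise ValueError("shorter than the 19 octet header")
    marker, length, type_code = BGP_HEADER.unpack_from(data)
    if length < BGP_HEADER.size or length > len(data):
        raise ValueError("bad length field")
    body = data[BGP_HEADER.size:length]
    return BGP_TYPES.get(type_code, "Unknown"), marker, body


# A keepalive is just the header: marker, length 19, type 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
kind, marker, body = parse_bgp_message(keepalive)
```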

155

BGP Open, Notification

• Open message has fields for (not a complete list)

– BGP protocol version (4)

– Identification of AS to which router belongs

– Hold time (period for keepalive messages)

– IP address of router

– Information to authenticate an authorized router

• Notification message indicates the following conditions:

– BGP message error

– BGP procedure error

– Hold timer expired

– Close BGP connection

156

BGP Update (1)

• Two possible functions within one update message:

– Withdraw route set, listed by IP address / prefix

– Add new single route

• Information about a single new route:

– Origin: BGP (external), OSPF (internal), Unknown

– Autonomous system path: a list of AS traversed for this route

– Allows routers to implement policy decisions

– Use of preferred networks

– Avoidance of specific networks

157

BGP Update (2)

• Information about a single new route (continued):

– Next hop: IP address of border router to be used as next hop for IP address(es) listed below.

– Could be distinct from the BGP router, if more than one router in AS has external connections, but only one handles BGP information (example: R2 on slide 284)

– Network layer reachability information (NLRI)

– A list of IP addresses to which this route applies

– Could be address prefixes.

• Updates are passed on via flooding

158

Example BGP update

[Figure: AS 1 (routers R1 to R4, networks 1.1 to 1.4) sends an update to AS 2 (routers R5 to R8, networks 2.1 to 2.4).]

Update contents:

NLRI: 1.1, 1.3, 1.4

AS Path: AS1

Next hop: R1

159

BGP update propagation

[Figure: AS 2 (routers R5 to R8, networks 2.1 to 2.4) passes the update on to AS 3 (router R9, network 3.1, …).]

Update contents:

NLRI: 1.1, 1.3, 1.4

AS Path: AS2, AS1

Next hop: R7
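The change between the two updates can be sketched as a small function: the forwarding AS puts itself at the front of the AS path and substitutes its own border router as next hop. The dictionary layout and function name are assumptions for illustration, and the check that drops an update already containing the local AS is the usual loop-avoidance use of the AS path rather than something shown on the slide.

```python
def propagate(update, own_as, own_border_router):
    """Return the update as re-advertised by own_as, or None if it would loop."""
    if own_as in update["as_path"]:
        return None                        # our AS is already on the path: discard
    return {
        "nlri": list(update["nlri"]),      # reachable networks are unchanged
        "as_path": [own_as] + update["as_path"],
        "next_hop": own_border_router,
    }


from_as1 = {"nlri": ["1.1", "1.3", "1.4"], "as_path": ["AS1"], "next_hop": "R1"}
to_as3 = propagate(from_as1, own_as="AS2", own_border_router="R7")
```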

160

Interior Routing with OSPF

• OSPF: Open Shortest Path First protocol

• Version 2 specified in RFC 2328

• Computes least cost route based on configurable metric (“cost”)

• Each router keeps track of network topology of which it is aware, including:

– Routers

– Transit networks: can carry data that neither originates nor terminates within the network

– Stub networks: data must originate or terminate within that network

161

OSPF Graph Information

• Network topology stored as a directed graph, with 4 types of nodes and 2 types of edges

• Node types:

– Router

– Transit network

– Stub network

– Host connected directly to router

• Edge types:

– Point to point link joining routers: bi-directional

– Router to network connection

[Figure: example graph symbols for networks (N4, N8), a router (R2) and a host (H1).]

162

Example of Autonomous System

[Figure: example autonomous system. Legend: router, transit network, stub network, host attached to router, external network connections.]

163

AS as a Directed Graph

164

Routing Information Base

[Table: link costs for the example autonomous system. Columns (“From”): routers R1 to R12 and transit networks N3, N6, N8, N9. Rows (“To”): routers R1 to R12, networks N1 to N4 and N6 to N11, and host H1. Each entry is the cost of the edge from the column node to the row node.]

165

SPF Tree for R6

[Figure: shortest-path-first tree rooted at R6, reaching routers R1 to R12, networks N1 to N4 and N6 to N11, host H1, and the external networks N12 to N15, with the cost on each edge.]

166

Routing Table for R6

Destination   Next Hop   Distance
N1            R3         10
N2            R3         10
N3            R3         7
N4            R3         8
N6            R10        8
N7            R10        12
N8            R10        10
N9            R10        11
N10           R10        13
N11           R10        14
H1            R10        21
R5            R5         6     (external router)
R7            R10        8     (external router)

167

OSPF Messages

• Five types of messages

1. Hello: Protocol to discover new routers

– This is the only type of message exchanged between non-adjacent nodes.

2. Link state request: Request initial database

3. Database description: Reply to link state request

4. Link state update: Announce new information

5. Link state acknowledgement: Confirm receipt of update

• Messages sent in IP packets

– Acknowledgements add reliability to IP

• Routers are expected to treat OSPF messages with higher priority than regular data

168

Performance of Routing Algorithms

• Algorithms can be judged on:

– Speed.

– Computational complexity.

– Scalability.

– Speed of convergence after topological change.

– Ability to react to current traffic situation.

– Susceptibility to routing loops.

– Ability to include line characteristics in computing the cost.

169

Advanced Routing Features

• Type of service routing:

– Allows choice of path that takes into account link quality, data rate, etc.

• Load balancing:

– If there are multiple routes of equivalent cost to the destination, traffic can be distributed among different routes.

• Area routing:

– A large routing domain can be partitioned into areas to reduce the amount of routing information kept in each router.

• Authentication:

– Each router will only accept routing information from trusted routers, identified through authentication.

170

Integrated Services Architecture (1)

• Acronym: ISA

• Standards currently under development by IETF

– Base document in RFC 1633

• Categories of traffic:

– Inelastic: constraints on throughput, delay, jitter, and packet loss

– Elastic: can adjust to changes in network conditions

– Varying tolerances for changes in above factors

– E-mail: sensitive to loss, but not delay

– FTP file transfer: sensitive to throughput, but not jitter

171

ISA Services

• Guaranteed service

– Assured data rate

– Upper bound on queuing delay

– No queuing losses

• Controlled load

– Similar to guaranteed service, except that constraints are only expected to be met for a “high percentage” of packets instead of all packets.

• Best effort

– No quality of service parameters applied to traffic.

172

Elements of ISA

• Routing algorithm:

– As an alternative to delay, quality of service can be used to weight graph edges for OSPF

• Admission control

– For any service other than best effort, a reservation must be made using the RSVP protocol (RFC 2205)

• Queuing Discipline:

– Multiple output queues with fair selection for transmission

– Each flow of inelastic traffic can be queued separately

• Discard Policy

– Policy for which packets to discard when a queue is full.

173

ISA Router Architecture

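[Figure: ISA router architecture. Background functions: routing protocols (with routing database), reservation protocol, admission control, management agent, traffic database. Forwarding path: classification and route selection, then the packet scheduler serving the QoS queues and the best effort queue.]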

174

Protocol Configuration

• A software vendor wants to sell identical copy of protocol software to all customers.

• Each system running a protocol will have different parameters:

– IP address

– Hardware address

– Location of local router

– Location of local servers for Domain Name Service, printing, time of day, …

• The problem:

– How to “discover” the local custom values when system is initialized?

175

Protocol Configuration Initialization

• Example: plugging your laptop into a data port in the SITE cafeteria tables

• You do not want to have to configure your system; you want to start using the Internet right away

• Problem:

– What address do you use to find an address?

176

Types of Address Discovery

• Fixed:

– Host is assigned a permanent set of addresses for IP, hardware, etc.

– Protocol software needs to find these parameters during initialization, either locally or from a server.

– Required for “well-known” locations (e.g. web server)

• Dynamic

– Host uses a temporary IP address obtained from a server for a specified period of time.

– Addresses are allocated from an available pool

– Examples: ISP dial-up connection, cafeteria data ports

177

Protocol Initialization

• Local, fixed option: manual configuration of IP address.

• Reverse Address Resolution Protocol (RARP)

– ARP: Given IP address, find hardware address

– RARP: Given hardware address, obtain IP address

– Needs fixed hardware address in network interface card (e.g. Ethernet)

• RARP request for IP address is broadcast over network.

• After obtaining an IP address, the next step is to find a router.

– To do this, we need the subnet mask of the network, so that we can find a router on the same network.

– Broadcast ICMP “Address Mask Request” message

– Reply contains IP mask

– Broadcast ICMP “Gateway discovery” message

178

Dynamic Address Allocation

• Each host obtains a “lease” for an IP address assigned from a pool.

– Provisioning challenge: how large should the pool of IP addresses be for customer base?

• Lease has expiry time

– Lease can be renewed before expiry

– On expiry, IP address is returned to the available pool.
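The lease life cycle (allocate from a pool, renew before expiry, return on expiry) can be sketched as a small class. Everything here is illustrative: the class and method names, the explicit `now` argument used instead of a real clock, and the example addresses are all assumptions.

```python
class LeasePool:
    """A pool of addresses handed out as time-limited leases."""

    def __init__(self, addresses, lease_seconds):
        self.free = list(addresses)
        self.lease_seconds = lease_seconds
        self.leases = {}                   # address -> (client, expiry time)

    def _expire(self, now):
        # Return every expired address to the available pool.
        for address, (_, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[address]
                self.free.append(address)

    def request(self, client, now):
        """Lease an address to client, or return None if the pool is empty."""
        self._expire(now)
        if not self.free:
            return None
        address = self.free.pop(0)
        self.leases[address] = (client, now + self.lease_seconds)
        return address

    def renew(self, client, address, now):
        """Extend a lease that is still held by client; return success."""
        self._expire(now)
        holder = self.leases.get(address)
        if holder is None or holder[0] != client:
            return False
        self.leases[address] = (client, now + self.lease_seconds)
        return True
```

The provisioning question on the slide shows up directly: once `free` is empty, new clients are refused until an existing lease lapses.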

179

DHCP: Dynamic Host Configuration Protocol

• Defined in RFC 2131

• Protocol to automatically:

– Assign an IP address from a pool of available addresses

– Assignment can be permanent or temporary

– Temporary assignment (a “lease”) will have an expiry time.

– Locate a server

– Locate a router

– Get the name of a server

• Relies on special IP addresses:

– IP address 0.0.0.0: used to send messages while obtaining IP address

– IP address 255.255.255.255: local network broadcast

180

DHCP Message Format

Fields in order of transmission (each row is 32 bits, bit positions 0 to 31, unless a size is given):

Message type | HW addr. type | Header length | Hops to server
Transaction ID
Seconds elapsed | Broadcast flag and 15 zeros
Client IP address (if renewing)
“Your new” IP address
Reboot server IP address
Router IP address
Client hardware address (16 octets)
Server host name (64 octets)
Reboot file name (128 octets)
Options (variable)
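The fixed part of this format is 236 octets and can be built with Python's `struct` module. The sketch below assembles a minimal Discover message; the option encoding (magic cookie 99.130.83.99, option 53 for the message type, option 255 as end marker) follows the DHCP specification, while the function name and the example hardware address and transaction ID are made up for illustration.

```python
import socket
import struct

# Type, HW type, HW length, hops, transaction ID, seconds, flags,
# four IP addresses, client hardware address, server name, boot file name.
DHCP_FIXED = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")
MAGIC_COOKIE = bytes([99, 130, 83, 99])
ZERO_IP = socket.inet_aton("0.0.0.0")


def build_discover(mac, transaction_id):
    """Build a broadcast DHCP Discover for a 6 octet Ethernet address."""
    fixed = DHCP_FIXED.pack(
        1,                # message type: request from client
        1,                # hardware address type: Ethernet
        6,                # hardware address length
        0,                # hops
        transaction_id,
        0,                # seconds elapsed
        0x8000,           # broadcast flag followed by 15 zero bits
        ZERO_IP, ZERO_IP, ZERO_IP, ZERO_IP,
        mac,              # struct pads this to 16 octets with zeros
        b"", b"",         # server host name and boot file name left empty
    )
    options = MAGIC_COOKIE + bytes([53, 1, 1]) + bytes([255])  # type = Discover, end
    return fixed + options


discover = build_discover(bytes.fromhex("020000000001"), transaction_id=0x12345678)
```

Such a message would be sent from source address 0.0.0.0 to the broadcast address 255.255.255.255, matching the special addresses listed earlier.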

181

DHCP Message Types

• (not a complete list)

• Discover: request from client to find servers (broadcast)

• Offer: server reply to discover, with offer of configuration parameters (broadcast, possibly by more than one server)

• Request: confirmation of offer, sent from client to specific server

• Acknowledgement: configuration parameters issued by server to client

• Release: client returns allocations to server and cancels lease