ip networks

13PIT101 Multimedia Communication & Networks UNIT - I Dr.A.Kathirvel Professor & Head/IT - VCEW

Upload: ayyakathir

Post on 01-Nov-2014

DESCRIPTION

MULTIMEDIA COMMUNICATION & NETWORKS

TRANSCRIPT

Page 1: IP NETWORKS

13PIT101

Multimedia Communication & Networks

UNIT - I

Dr.A.Kathirvel

Professor & Head/IT - VCEW

Page 2: IP NETWORKS

Unit - I

Open Data Network Model – Narrow Waist Model of the

Internet - Success and Limitations of the Internet – Suggested

Improvements for IP and TCP – Significance of UDP in

modern Communication – Network level Solutions – End to

End Solutions – Best Effort service model – Scheduling and

Dropping policies for Best Effort Service model

Page 3: IP NETWORKS

Open Data Network Models

Page 4: IP NETWORKS

LAYERED TASKS

We use the concept of layers in our daily life. As an example, let us consider

two friends who communicate through postal mail. The process of sending a

letter to a friend would be complex if there were no services available from

the post office.

Topics discussed in this section:

Sender, Receiver, and Carrier

Hierarchy

Page 5: IP NETWORKS

Figure 1 Tasks involved in sending a letter

Page 6: IP NETWORKS

THE OSI MODEL

Established in 1947, the International Organization for Standardization (ISO) is a multinational body dedicated to worldwide agreement on international standards. An ISO standard that covers all aspects of network communications is the Open Systems Interconnection (OSI) model. It was first introduced in the late 1970s.

Topics discussed in this section:

Layered Architecture

Peer-to-Peer Processes

Encapsulation

Page 7: IP NETWORKS

ISO is the organization.

OSI is the model.

Note

Page 8: IP NETWORKS

Figure 2 Seven layers of the OSI model

Page 9: IP NETWORKS

Figure 3 The interaction between layers in the OSI model

Page 10: IP NETWORKS

Figure 4 An exchange using the OSI model

Page 11: IP NETWORKS

LAYERS IN THE OSI MODEL

In this section we briefly describe the functions of each layer in

the OSI model.

Topics discussed in this section:

Physical Layer

Data Link Layer

Network Layer

Transport Layer

Session Layer

Presentation Layer

Application Layer

Page 12: IP NETWORKS

Figure 5 Physical layer

Page 13: IP NETWORKS

The physical layer is responsible for movements of

individual bits from one hop (node) to the next.

Note

Page 14: IP NETWORKS

Figure 6 Data link layer

Page 15: IP NETWORKS

The data link layer is responsible for moving

frames from one hop (node) to the next.

Note

Page 16: IP NETWORKS

Figure 7 Hop-to-hop delivery

Page 17: IP NETWORKS

Figure 8 Network layer

Page 18: IP NETWORKS

The network layer is responsible for the

delivery of individual packets from

the source host to the destination host.

Note

Page 19: IP NETWORKS

Figure 9 Source-to-destination delivery

Page 20: IP NETWORKS

Figure 10 Transport layer

Page 21: IP NETWORKS

The transport layer is responsible for the delivery

of a message from one process to another.

Note

Page 22: IP NETWORKS

Figure 11 Reliable process-to-process delivery of a message

Page 23: IP NETWORKS

Figure 12 Session layer

Page 24: IP NETWORKS

The session layer is responsible for dialog

control and synchronization.

Note

Page 25: IP NETWORKS

Figure 13 Presentation layer

Page 26: IP NETWORKS

The presentation layer is responsible for translation,

compression, and encryption.

Note

Page 27: IP NETWORKS

Figure 14 Application layer

Page 28: IP NETWORKS

The application layer is responsible for

providing services to the user.

Note

Page 29: IP NETWORKS

Figure 15 Summary of layers

Page 30: IP NETWORKS

TCP/IP PROTOCOL SUITE

The layers in the TCP/IP protocol suite do not exactly match those

in the OSI model. The original TCP/IP protocol suite was defined

as having four layers: host-to-network, internet, transport, and

application. However, when TCP/IP is compared to OSI, we can

say that the TCP/IP protocol suite is made of five layers: physical,

data link, network, transport, and application.

Topics discussed in this section:

Physical and Data Link Layers

Network Layer

Transport Layer

Application Layer

Page 31: IP NETWORKS

Figure 16 TCP/IP and OSI model

Page 32: IP NETWORKS

ADDRESSING

Four levels of addresses are used in an internet employing the TCP/IP

protocols: physical, logical, port, and specific.

Topics discussed in this section:

Physical Addresses

Logical Addresses

Port Addresses

Specific Addresses

Page 33: IP NETWORKS

Figure 17 Addresses in TCP/IP

Page 34: IP NETWORKS

Figure 18 Relationship of layers and addresses in TCP/IP

Page 35: IP NETWORKS

In Figure 19 a node with physical address 10 sends a frame to a

node with physical address 87. The two nodes are connected by

a link (bus topology LAN). As the figure shows, the computer

with physical address 10 is the sender, and the computer with

physical address 87 is the receiver.

Example 1

Page 36: IP NETWORKS

Figure 19 Physical addresses

Page 37: IP NETWORKS

Most local-area networks use a 48-bit (6-byte) physical

address written as 12 hexadecimal digits; every byte (2

hexadecimal digits) is separated by a colon, as shown below:

Example 2

07:01:02:01:2C:4B

A 6-byte (12 hexadecimal digits) physical address.
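As a small illustration of this notation, the following Python sketch converts the colon-separated form into its six raw bytes and back. The helper names `parse_mac` and `format_mac` are made up for this example and are not part of any library.

```python
def parse_mac(text: str) -> bytes:
    """Convert 'XX:XX:XX:XX:XX:XX' into 6 raw bytes."""
    parts = text.split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a 48-bit colon-separated address: {text!r}")
    return bytes(int(p, 16) for p in parts)


def format_mac(raw: bytes) -> str:
    """Convert 6 raw bytes back into colon-separated uppercase hex."""
    if len(raw) != 6:
        raise ValueError("a 48-bit address has exactly 6 bytes")
    return ":".join(f"{b:02X}" for b in raw)


addr = parse_mac("07:01:02:01:2C:4B")
print(len(addr) * 8)     # number of bits in the address
print(format_mac(addr))  # round-trips to the original notation
```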

Page 38: IP NETWORKS

Figure 20 shows a part of an internet with two routers connecting

three LANs. Each device (computer or router) has a pair of

addresses (logical and physical) for each connection. In this case,

each computer is connected to only one link and therefore has

only one pair of addresses. Each router, however, is connected to

three networks (only two are shown in the figure). So each router

has three pairs of addresses, one for each connection.

Example 3

Page 39: IP NETWORKS

Figure 20 IP addresses

Page 40: IP NETWORKS

Figure 21 shows two computers communicating via the Internet.

The sending computer is running three processes at this time

with port addresses a, b, and c. The receiving computer is

running two processes at this time with port addresses j and k.

Process a in the sending computer needs to communicate with

process j in the receiving computer. Note that although physical

addresses change from hop to hop, logical and port addresses

remain the same from the source to destination.

Example 4

Page 41: IP NETWORKS

Figure 21 Port addresses

Page 42: IP NETWORKS

The physical addresses will change from hop to hop,

but the logical addresses usually remain the same.

Note

Page 43: IP NETWORKS

Example 5

A port address is a 16-bit address represented by one decimal

number as shown.

753

A 16-bit port address represented

as one single number.

Page 44: IP NETWORKS

Narrow waist Model of the

Internet

Page 45: IP NETWORKS

Fundamental Goal

• “technique for multiplexed utilization of existing

interconnected networks”

• Multiplexing (sharing)

– Shared use of a single communications channel

• Existing networks (interconnection)

Page 46: IP NETWORKS

Fundamental Goal: Sharing

• No connection setup

• Forwarding based on destination address in packet

• Efficient sharing of resources

Tradeoff: Resource management potentially more difficult.

Packet Switching

Page 47: IP NETWORKS

Type of Packet Switching: Datagrams

• Information for forwarding traffic is contained in destination address of packet

• No state established ahead of time (helps fate sharing)

• Basic building block

• Minimal assumption about network service

Alternatives

• Circuit Switching: Signaling protocol sets up entire path out-of-band (cf. the phone network).

• Virtual Circuits: Hybrid approach. Packets carry “tags” to indicate path, forwarding over IP

• Source routing: Complete route is contained in each data packet

Page 48: IP NETWORKS

An Age-Old Debate

Circuit Switching

• Resource control, accounting, ability to “pin” paths, etc.

Packet Switching

• Sharing of resources, soft state (good resilience properties), etc.

It is held that packet switching was one of the Internet’s greatest design choices.

Of course, there are constant attempts to shoehorn the best aspects of circuits into packet switching.

Examples: Capabilities, MPLS, ATM, IntServ QoS, etc.

Page 49: IP NETWORKS

Stopping Unwanted Traffic is Hard

[Figure: examples from February 2000 and March 2006]

Page 50: IP NETWORKS

Research: Stopping Unwanted Traffic

• Datagram networks: easy for anyone to send traffic to anyone

else…even if they don’t want it!

Possible Defenses

• Monitoring + Filtering: Detect DoS attack and install filters to

drop traffic.

• Capabilities: Only accept traffic that carries a “capability”

cnn.com

Page 51: IP NETWORKS

The Design Goals of Internet, v1

(Listed in decreasing priority)

• Interconnection/Multiplexing (packet switching)

• Resilience/Survivability (fate sharing)

• Heterogeneity

– Different types of services

– Different types of networks

• Distributed management

• Cost effectiveness

• Ease of attachment

• Accountability

“This set of goals might seem to be nothing more than a checklist of all the desirable network features. It is important to understand that these goals are in order of importance, and an entirely different network architecture would result if the order were changed.”

These goals were prioritized for a military network. Should priorities change as the network evolves?

Page 52: IP NETWORKS

Fundamental Goal: Interconnection

• Need to interconnect many existing networks

• Hide underlying technology from applications

• Decisions:

– Network provides minimal functionality

– “Narrow waist”

Tradeoff: No assumptions, no guarantees.

Technology

Applications

email WWW phone...

SMTP HTTP RTP...

TCP UDP…

IP

ethernet PPP…

CSMA async sonet...

copper fiber radio...

Page 53: IP NETWORKS

The Internet Protocol Suite: The Hourglass Model

Applications: FTP HTTP TFTP DNS

Transport: TCP UDP

Waist: IP

Data Link / Physical: Ethernet SONET 802.11

The waist facilitates interoperability

Page 54: IP NETWORKS

The “Curse of the Narrow Waist”

• IP over anything, anything over IP

– Has allowed for much innovation both above and below the

IP layer of the stack

– An IP stack gets a device on the Internet

• Drawback: very difficult to make changes to IP

– But…people are trying – NSF GENI project: http://www.geni.net/

Page 55: IP NETWORKS

Interconnection: “Gateways”

• Interconnect heterogeneous networks

• No state about ongoing connections

– Stateless packet switches

• Generally, router == gateway

• But, we can think of your home router/NAT as also performing the function

of a gateway

[Figure: hosts 192.168.1.51 and 192.168.1.52 on the home network appear on the Internet side of the NAT as 68.211.6.120:50878 and 68.211.6.120:50879]

Page 56: IP NETWORKS

Network Address Translation

• For outbound traffic, the gateway:

– Creates a table entry for the computer's local IP address and port number

– Replaces the sending computer's non-routable IP address with the gateway IP address

– Replaces the sending computer's source port

• For inbound traffic, the gateway:

– Checks the destination port on the packet

– Rewrites the destination address and destination port to those in the table and forwards traffic to the local machine
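The translation steps above can be sketched as a pair of lookup tables. This is a minimal illustration only: the class name `Nat`, the port-allocation strategy (hand out the next free port), and the private-side port 4000 are all assumptions made for the example, and real NATs also track protocol, timeouts, and the remote endpoint.

```python
import itertools


class Nat:
    """Toy port-translating NAT: one public IP, many private hosts."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._out = {}  # (local_ip, local_port) -> public_port
        self._in = {}   # public_port -> (local_ip, local_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet, creating an entry if needed."""
        key = (src_ip, src_port)
        if key not in self._out:
            public_port = next(self._next_port)
            self._out[key] = public_port
            self._in[public_port] = key
        return self.public_ip, self._out[key]

    def inbound(self, dst_port):
        """Look up the local destination for an incoming packet, or None."""
        return self._in.get(dst_port)


nat = Nat("68.211.6.120", first_port=50878)
first = nat.outbound("192.168.1.51", 4000)
second = nat.outbound("192.168.1.52", 4000)
print(first, second)
print(nat.inbound(50879))  # maps back to the second host
print(nat.inbound(12345))  # no table entry: nothing to forward to
```

The last lookup returning nothing is the reason hosts behind a NAT cannot be reached by unsolicited inbound traffic.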

Page 57: IP NETWORKS

NAT Traversal

• Problem: Machines behind a NAT are not globally addressable or routable, so outside hosts can’t initiate connections to them.

• One solution: Simple Traversal of UDP Through NATs (STUN)

– STUN client contacts STUN server

– STUN server tells client which IP/Port the NAT mapped it to

– STUN client uses that IP/Port for call establishment/incoming

messages

More next time.

Home

Network 1

Home

Network 2 Relay node

Page 58: IP NETWORKS

Goal #2: Survivability

• Network should continue to work, even if some devices fail, are

compromised, etc.

• Failures on the Abilene (Internet 2) backbone network over the course of 6

months

Thanks to Yiyi Huang

How well does the current Internet support

survivability?

Page 59: IP NETWORKS

Goal #2: Survivability

Two Options

• Replication

– Keep state at multiple places in the network, recover when nodes crash

• Fate-sharing

– Acceptable to lose state information for some entity if the entity itself is lost

Reasons for Fate Sharing

• Can support arbitrarily complex failure scenarios

• Engineering is easier

Some reversals of this trend: NAT, Routing Control Platform

Page 60: IP NETWORKS

Goal #3: Heterogeneous Services

• TCP/IP designed as a monolithic transport

– TCP for flow control, reliable delivery

– IP for forwarding

• Became clear that not every type of application would need reliable, in-order delivery

– Example: Voice and video over networks

– Example: DNS

– Why don’t these applications require reliable, in-order delivery?

– Narrow waist: allowed proliferation of transport protocols

Page 61: IP NETWORKS

Topic: Voice and Video over Networks

Loss in Anchor Frame (I-Frame) Propagates to Dependent Frames (P and B-Frames)

• Deadlines: Timeliness more important than 100% reliability.

• Propagation of errors: Some losses more devastating than others

Page 62: IP NETWORKS

Goal #3b: Heterogeneous Networks

• Build minimal functionality into the network

– No need to re-engineer for each type of network

• “Best effort” service model

– Lost packets

– Out-of-order packets

– No quality guarantees

– No information about failures, performance, etc.

Tradeoff: Network management more difficult

Page 63: IP NETWORKS

Research: Network Anomaly Detection

• Operators want to detect when a traffic flow from ingress to egress generates a “spike”.

• Problem: Today’s protocols don’t readily expose this information.

• Management/debuggability not initially a high priority!

Page 64: IP NETWORKS

Goal #4: Distributed Management

No single entity in charge. Many examples:

• Addressing (ARIN, RIPE, APNIC, etc.)

– Though this was recently threatened.

• Naming (DNS)

• Routing (BGP)

Allows for organic growth, scalable management.

Tradeoff: No one party has visibility/control.

Page 65: IP NETWORKS

No Owner, No Responsible Party

• Hard to figure out who/what’s causing a problem

• Worse yet, local actions have global effects…

“Some of the most significant problems with the Internet today relate to

lack of sufficient tools for distributed management, especially in the area of

routing.”

Page 66: IP NETWORKS

Local Actions, Global Consequences

“…a glitch at a small ISP… triggered a major outage in Internet access across

the country. The problem started when MAI Network Services...passed bad

router information from one of its customers onto Sprint.”

-- news.com, April 25, 1997

Page 67: IP NETWORKS

Goal #5: Cost Effectiveness

• Packet headers introduce high overhead

• End-to-end retransmission of lost packets

– Potentially wasteful of bandwidth by placing burden on the

edges of the network

Arguably a good tradeoff. Current trends are to exploit

redundancy even more.

Page 68: IP NETWORKS

Goal #6: Ease of Attachment

• IP is “plug and play”: anything with a working IP stack can connect to the Internet (hourglass model)

• A huge success!

– Lesson: Lower the barrier to innovation/entry and people

will get creative (e.g., Cerf and Kahn probably did not

think about IP stacks on phones, sensors, etc.)

• But….

Tradeoff: Burden on end systems/programmers.

Page 69: IP NETWORKS

Goal #7: Accountability

• Note: Accountability mentioned in early papers on TCP/IP, but

not prioritized

• Datagram networks make accounting tricky.

– The phone network has had an easier time figuring out

billing

– Payments/billing on the Internet is much less precise

Tradeoff: Broken payment models and incentives.

Page 70: IP NETWORKS

Success and Limitations of the Internet

• Success of the Internet

– E-commerce, Internet marketing, etc.

• The quality of information resources might not always be reliable and accurate.

• Searching for information can be very tedious.

• The Internet is definitely not 100% secure.

• Performance and speed are the main limitations of today's Internet.

Page 71: IP NETWORKS

Transport Protocols

• Provide logical communication between

application processes running on

different hosts

• Run on end hosts

– Sender: breaks application messages

into segments,

and passes to network layer

– Receiver: reassembles segments into

messages, passes to application layer

• Multiple transport protocols available to applications

– Internet: TCP and UDP

[Figure: the two end hosts implement all five layers (application, transport, network, data link, physical); the intermediate nodes implement only the network, data link, and physical layers]

Page 72: IP NETWORKS

Internet Transport Protocols

• Datagram messaging service (UDP)

– No-frills extension of “best-effort” IP

• Reliable, in-order delivery (TCP)

– Connection set-up

– Discarding of corrupted packets

– Retransmission of lost packets

– Flow control

– Congestion control (next lecture)

• Other services not available

– Delay guarantees

– Bandwidth guarantees

Page 73: IP NETWORKS

Multiplexing and Demultiplexing

• Host receives IP datagrams

– Each datagram has source and

destination IP address,

– Each datagram carries one

transport-layer segment

– Each segment has source and

destination port number

• Host uses IP addresses and port

numbers to direct the segment to

appropriate socket

TCP/UDP segment format (each row is 32 bits wide):

– source port # | dest port #

– other header fields

– application data (message)

Page 74: IP NETWORKS

Unreliable Message Delivery Service

• Lightweight communication between processes

– Avoid overhead and delays of ordered, reliable delivery

– Send messages to and receive them from a socket

• User Datagram Protocol (UDP)

– IP plus port numbers to support (de)multiplexing

– Optional error checking on the packet contents

UDP datagram layout:

SRC port | DST port

length | checksum

DATA
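To make the eight-byte UDP header concrete, here is a minimal sketch that packs and unpacks it with Python's standard `struct` module. The function names are invented for this example, and the checksum is simply left at zero, which in UDP over IPv4 signals that no checksum was computed.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with a UDP header (checksum left unset)."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    checksum = 0                             # 0 = "not computed" over IPv4
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes):
    """Split a datagram into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src, dst, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_udp_datagram(5000, 53, b"hello")
print(UDP_HEADER.size, len(datagram))
print(parse_udp_datagram(datagram))
```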

Page 75: IP NETWORKS

Why Would Anyone Use UDP?

• Finer control over what data is sent and when

– As soon as an application process writes into the socket

– … UDP will package the data and send the packet

• No delay for connection establishment

– UDP just blasts away without any formal preliminaries

– … which avoids introducing any unnecessary delays

• No connection state

– No allocation of buffers, parameters, sequence #s, etc.

– … making it easier to handle many active clients at once

• Small packet header overhead

– UDP header is only eight bytes long

Page 76: IP NETWORKS

Popular Applications That Use UDP

• Multimedia streaming

– Retransmitting lost/corrupted packets is not worthwhile

– By the time the packet is retransmitted, it’s too late

– E.g., telephone calls, video conferencing, gaming

• Simple query protocols like Domain Name System

– Overhead of connection establishment is overkill

– Easier to have application retransmit if needed

“Address for www.cnn.com?”

“12.3.4.15”

Page 77: IP NETWORKS

Transmission Control Protocol (TCP)

• Connection oriented

– Explicit set-up and tear-down of TCP session

• Stream-of-bytes service

– Sends and receives a stream of bytes, not messages

• Reliable, in-order delivery

– Checksums to detect corrupted data

– Acknowledgments & retransmissions for reliable delivery

– Sequence numbers to detect losses and reorder data

• Flow control

– Prevent overflow of the receiver’s buffer space

• Congestion control

– Adapt to network congestion for the greater good

Page 78: IP NETWORKS

An Analogy: Talking on a Cell Phone

• Alice and Bob on their cell phones

– Both Alice and Bob are talking

• What if Alice couldn’t understand Bob?

– Bob asks Alice to repeat what she said

• What if Bob hasn’t heard Alice for a while?

– Is Alice just being quiet?

– Or, have Bob and Alice lost reception?

– How long should Bob just keep on talking?

– Maybe Alice should periodically say “uh huh”

– … or Bob should ask “Can you hear me now?”

Page 79: IP NETWORKS

Some Take-Aways from the Example

• Acknowledgments from receiver

– Positive: “okay” or “ACK”

– Negative: “please repeat that” or “NACK”

• Timeout by the sender (“stop and wait”)

– Don’t wait indefinitely without receiving some response

– … whether a positive or a negative acknowledgment

• Retransmission by the sender

– After receiving a “NACK” from the receiver

– After receiving no feedback from the receiver

Page 80: IP NETWORKS

Challenges of Reliable Data Transfer

• Over a perfectly reliable channel

– All of the data arrives in order, just as it was sent

– Simple: sender sends data, and receiver receives data

• Over a channel with bit errors

– All of the data arrives in order, but some bits corrupted

– Receiver detects errors and says “please repeat that”

– Sender retransmits the data that were corrupted

• Over a lossy channel with bit errors

– Some data are missing, and some bits are corrupted

– Receiver detects errors but cannot always detect loss

– Sender must wait for acknowledgment (“ACK” or “OK”)

– … and retransmit data after some time if no ACK arrives

Page 81: IP NETWORKS

TCP Support for Reliable Delivery

• Checksum

– Used to detect corrupted data at the receiver

– … leading the receiver to drop the packet

• Sequence numbers

– Used to detect missing data

– ... and for putting the data back in order

• Retransmission

– Sender retransmits lost or corrupted data

– Timeout based on estimates of round-trip time

– Fast retransmit algorithm for rapid retransmission

Page 82: IP NETWORKS

TCP Segments

Page 83: IP NETWORKS

TCP “Stream of Bytes” Service

Host A

Host B

Page 84: IP NETWORKS

…Emulated Using TCP “Segments”

Host A

Host B

TCP Data

TCP Data

Segment sent when: 1. Segment full (Max Segment Size), 2. Not full, but times out, or 3. “Pushed” by application.

Page 85: IP NETWORKS

TCP Segment

• IP packet

– No bigger than Maximum Transmission Unit (MTU)

– E.g., up to 1500 bytes on an Ethernet

• TCP packet

– IP packet with a TCP header and data inside

– TCP header is typically 20 bytes long

• TCP segment

– No more than Maximum Segment Size (MSS) bytes

– E.g., up to 1460 consecutive bytes from the stream

IP Hdr IP Data

TCP Hdr TCP Data (segment)
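The 1460-byte figure is simply what is left of a 1500-byte Ethernet MTU once the headers are subtracted. The sketch below works through that arithmetic; it assumes a 20-byte IP header without options (the slide only states the TCP header size), and the function names are made up for illustration.

```python
ETHERNET_MTU = 1500  # bytes, from the slide
IP_HEADER = 20       # bytes, assuming an IPv4 header with no options
TCP_HEADER = 20      # bytes, the typical size given on the slide


def max_segment_size(mtu, ip_header=IP_HEADER, tcp_header=TCP_HEADER):
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - ip_header - tcp_header


def segments_needed(stream_bytes, mss):
    """Number of segments required to carry a byte stream (ceiling division)."""
    return -(-stream_bytes // mss)


mss = max_segment_size(ETHERNET_MTU)
print(mss)                           # payload bytes per full segment
print(segments_needed(10_000, mss))  # segments for a 10,000-byte stream
```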

Page 86: IP NETWORKS

Sequence Numbers

Host A

Host B

TCP Data

TCP Data

TCP HDR

TCP HDR

ISN (initial sequence number)

Sequence number = 1st byte

ACK sequence number = next expected byte

Page 87: IP NETWORKS

Initial Sequence Number (ISN)

• Sequence number for the very first byte

– E.g., Why not a de facto ISN of 0?

• Practical issue

– IP addresses and port #s uniquely identify a connection

– Eventually, though, these port #s do get used again

– … and there is a chance an old packet is still in flight

– … and might be associated with the new connection

• So, TCP requires changing the ISN over time

– Set from a 32-bit clock that ticks every 4 microseconds

– … which wraps around only once every 4.77 hours or so (2^32 ticks × 4 microseconds; the TCP specification quotes roughly 4.55 hours)

• But, this means the hosts need to exchange ISNs
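The wrap-around time follows directly from the counter width and the tick interval, so it is easy to check. The sketch below is plain arithmetic with constant names chosen for this example. Direct multiplication gives about 4.77 hours, slightly more than the roughly 4.55 hours usually quoted from the TCP specification.

```python
TICK_SECONDS = 4e-6  # the ISN clock advances once every 4 microseconds
COUNTER_BITS = 32    # width of the sequence-number space

wrap_seconds = (2 ** COUNTER_BITS) * TICK_SECONDS
wrap_hours = wrap_seconds / 3600

print(f"{wrap_seconds:.0f} s = {wrap_hours:.2f} h")
```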

Page 88: IP NETWORKS

TCP Three-Way Handshake

Page 89: IP NETWORKS

Establishing a TCP Connection

• Three-way handshake to establish connection

– Host A sends a SYN (open) to the host B

– Host B returns a SYN acknowledgment (SYN ACK)

– Host A sends an ACK to acknowledge the SYN ACK

A B

Each host tells its

ISN to the other

host.
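The exchange of ISNs can be traced with a few lines of code. This is only a sketch of the sequence and acknowledgment numbers carried by the three segments, not a protocol implementation; the `Segment` class and the example ISN values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass(frozen=True)
class Segment:
    flags: frozenset
    seq: int
    ack: Optional[int] = None


def three_way_handshake(isn_a: int, isn_b: int) -> list:
    """Return the SYN, SYN-ACK, and ACK segments for hosts A and B."""
    # A -> B: SYN carrying A's initial sequence number.
    syn = Segment(frozenset({"SYN"}), seq=isn_a)
    # B -> A: SYN-ACK carrying B's ISN and acknowledging A's ISN + 1.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=isn_b,
                      ack=(isn_a + 1) % SEQ_SPACE)
    # A -> B: ACK acknowledging B's ISN + 1.
    ack = Segment(frozenset({"ACK"}), seq=(isn_a + 1) % SEQ_SPACE,
                  ack=(isn_b + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(isn_a=1000, isn_b=5000):
    print(sorted(segment.flags), segment.seq, segment.ack)
```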

Page 90: IP NETWORKS

TCP Header

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags: SYN FIN RST PSH URG ACK

Page 91: IP NETWORKS

Step 1: A’s Initial SYN Packet

A’s port B’s port

A’s Initial Sequence Number

Acknowledgment

Advertised window 20 Flags 0

Checksum Urgent pointer

Options (variable)

Flags: SYN FIN RST PSH URG ACK

A tells B it wants to open a connection…

Page 92: IP NETWORKS

Step 2: B’s SYN-ACK Packet

B’s port A’s port

B’s Initial Sequence Number

A’s ISN plus 1

Advertised window 20 Flags 0

Checksum Urgent pointer

Options (variable)

Flags: SYN FIN RST PSH URG ACK

B tells A it accepts, and is ready to hear the next byte…

… upon receiving this packet, A can start sending data

Page 93: IP NETWORKS

Step 3: A’s ACK of the SYN-ACK

A’s port B’s port

Sequence number

B’s ISN plus 1

Advertised window 20 Flags 0

Checksum Urgent pointer

Options (variable)

Flags: SYN FIN RST PSH URG ACK

A tells B it is okay to start sending

… upon receiving this packet, B can start sending data

Page 94: IP NETWORKS

What if the SYN Packet Gets Lost?

• Suppose the SYN packet gets lost

– Packet is lost inside the network, or

– Server rejects the packet (e.g., listen queue is full)

• Eventually, no SYN-ACK arrives

– Sender sets a timer and waits for the SYN-ACK

– … and retransmits the SYN if needed

• How should the TCP sender set the timer?

– Sender has no idea how far away the receiver is

– Hard to guess a reasonable length of time to wait

– Some TCPs use a default of 3 or 6 seconds

Page 95: IP NETWORKS

SYN Loss and Web Downloads

• User clicks on a hypertext link

– Browser creates a socket and does a “connect”

– The “connect” triggers the OS to transmit a SYN

• If the SYN is lost…

– The 3-6 seconds of delay may be very long

– The user may get impatient

– … and click the hyperlink again, or click “reload”

• User triggers an “abort” of the “connect”

– Browser creates a new socket and does a “connect”

– Essentially, forces a faster send of a new SYN packet!

– Sometimes very effective, and the page comes fast

Page 96: IP NETWORKS

TCP Retransmissions

Page 97: IP NETWORKS

Automatic Repeat reQuest (ARQ)

• Automatic Repeat reQuest

– Receiver sends acknowledgment (ACK) when it receives packet

– Sender waits for ACK and times out if it does not arrive within some time period

• Simplest ARQ protocol

– Stop and wait

– Send a packet, stop and wait until ACK arrives

[Figure: timeline between Sender and Receiver, with the sender's Timeout interval marked along the Time axis]
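A stop-and-wait sender can be sketched as a loop that keeps resending the current packet until it is acknowledged. The sketch below replaces the real network and timer with a scripted list of outcomes so that it is deterministic; the function name and the `delivered` parameter are assumptions made for this illustration.

```python
def stop_and_wait(packets, delivered):
    """Send packets one at a time, retransmitting after each timeout.

    `delivered` holds one boolean per transmission attempt: True means the
    packet and its ACK both got through, False means the sender's timer
    expired without an ACK. Attempts beyond the end of the list succeed.
    Returns the list of (packet, attempt_number) transmissions.
    """
    outcomes = iter(delivered)
    log = []
    for packet in packets:
        attempt = 1
        while True:
            log.append((packet, attempt))
            if next(outcomes, True):
                break     # ACK arrived: move on to the next packet
            attempt += 1  # timeout: retransmit the same packet
    return log


# The second packet is lost once and then gets through.
print(stop_and_wait(["p0", "p1"], [True, False, True]))
```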

Page 98: IP NETWORKS

Reasons for Retransmission

[Figure: sender/receiver timelines with the sender's timeout marked, covering three cases: packet lost; ACK lost (duplicate packet); early timeout (duplicate packets)]

Page 99: IP NETWORKS

How Long Should Sender Wait?

• Sender sets a timeout to wait for an ACK

– Too short: wasted retransmissions

– Too long: excessive delays when packet lost

• TCP sets timeout as a function of the RTT

– Expect ACK to arrive after an RTT

– … plus a fudge factor to account for queuing

• But, how does the sender know the RTT?

– Can estimate the RTT by watching the ACKs

– Smooth estimate: keep a running average of the RTT

• EstimatedRTT = a * EstimatedRTT + (1 – a) * SampleRTT

– Compute timeout: TimeOut = 2 * EstimatedRTT
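The running average and the doubled timeout can be sketched in a few lines. The default weight of 0.875 and the choice to seed the estimate with the first sample are assumptions made for this example, since the slide gives no values; the function name is also made up.

```python
def estimate_rtt(samples, a=0.875, initial=None):
    """Return (EstimatedRTT, TimeOut) after each RTT sample.

    Implements EstimatedRTT = a * EstimatedRTT + (1 - a) * SampleRTT
    and TimeOut = 2 * EstimatedRTT.
    """
    estimated = samples[0] if initial is None else initial
    history = []
    for sample in samples:
        estimated = a * estimated + (1 - a) * sample
        history.append((estimated, 2 * estimated))
    return history


# With a = 0.5 the numbers are easy to follow by hand (values in ms).
for estimated, timeout in estimate_rtt([100, 200, 100], a=0.5):
    print(estimated, timeout)
```

A larger `a` makes the estimate smoother but slower to react to a genuine change in the path's RTT.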

Page 100: IP NETWORKS

Example RTT Estimation

[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT]

Page 101: IP NETWORKS

A Flaw in This Approach

• An ACK doesn’t really acknowledge a transmission

– Rather, it acknowledges receipt of the data

• Consider a retransmission of a lost packet

– If you assume the ACK goes with the 1st transmission

– … the SampleRTT comes out way too large

• Consider a duplicate packet

– If you assume the ACK goes with the 2nd transmission

– … the SampleRTT comes out way too small

• Simple solution in the Karn/Partridge algorithm

– Only collect samples for segments sent one single time

Page 102: IP NETWORKS

Yet Another Limitation…

• Doesn’t consider variance in the RTT

– If variance is small, the EstimatedRTT is pretty accurate

– … but, if variance is large, the estimate isn’t all that good

• Better to directly consider the variance

– Consider difference: SampleRTT – EstimatedRTT

– Boost the estimate based on the difference

• Jacobson/Karels algorithm

– See Section 5.2 of the Peterson/Davie book for details
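One widely used concrete form of a variance-aware timeout is the one standardized in RFC 6298, which descends from the Jacobson/Karels work. The sketch below follows that RFC's update rules with its recommended gains (1/8 and 1/4) and multiplier of 4; it leaves out the RFC's clock-granularity term and minimum-timeout clamp, and the function name is made up for illustration. Note that here `alpha` weights the new sample, whereas the simple running average applies its weight to the old estimate.

```python
def rto_estimator(samples, alpha=1 / 8, beta=1 / 4, k=4):
    """Return the retransmission timeout after each RTT sample."""
    srtt = None    # smoothed round-trip time
    rttvar = None  # smoothed mean deviation of the RTT
    timeouts = []
    for r in samples:
        if srtt is None:
            # First measurement: seed both estimators from it.
            srtt, rttvar = r, r / 2
        else:
            # Update the deviation first, using the old smoothed RTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        timeouts.append(srtt + k * rttvar)
    return timeouts


# A sudden jump in the sample raises the timeout through the deviation term.
print(rto_estimator([100, 100, 200]))
```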

Page 103: IP NETWORKS

TCP Sliding Window

Page 104: IP NETWORKS

Motivation for Sliding Window

• Stop-and-wait is inefficient

– Only one TCP segment is “in flight” at a time

– Especially bad when delay-bandwidth product is high

• Numerical example

– 1.5 Mbps link with a 45 msec round-trip time (RTT)

• Delay-bandwidth product is 67.5 Kbits (or 8 KBytes)

– But, sender can send at most one packet per RTT

• Assuming a segment size of 1 KB (8 Kbits)

• … leads to 8 Kbits/segment / 45 msec/segment ≈ 182 Kbps

• That’s just one-eighth of the 1.5 Mbps link capacity
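The numbers in this example can be reproduced directly. The sketch below assumes that 1 KB means 1024 bytes, which is what makes the result come out at 182 Kbps; the variable names are chosen for this illustration.

```python
LINK_BPS = 1.5e6          # link capacity: 1.5 Mbps
RTT_S = 0.045             # round-trip time: 45 ms
SEGMENT_BITS = 8 * 1024   # one 1 KB segment, taking 1 KB = 1024 bytes

# Bits that fit "in the pipe" during one round trip.
bdp_bits = LINK_BPS * RTT_S

# Stop-and-wait delivers one segment per round trip.
throughput_bps = SEGMENT_BITS / RTT_S
utilization = throughput_bps / LINK_BPS

print(f"delay-bandwidth product: {bdp_bits / 1000:.1f} Kbits")
print(f"stop-and-wait throughput: {throughput_bps / 1000:.0f} Kbps")
print(f"link utilization: {utilization:.1%}")
```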

Page 105: IP NETWORKS

Sliding Window

• Allow a larger amount of data in flight

– Allow sender to get ahead of the receiver

– … though not too far ahead

Page 106: IP NETWORKS

Receiver Buffering

• Window size

– Amount that can be sent without acknowledgment

– Receiver needs to be able to store this amount of data

• Receiver advertises the window to the sender

– Tells the sender the amount of free space left

– … and the sender agrees not to exceed this amount

Window Size

Outstanding Un-ack’d data

Data OK to send

Data not OK to send yet

Data ACK’d
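The sender's side of this agreement reduces to one subtraction: whatever part of the advertised window is not already occupied by unacknowledged data may still be sent. The function and parameter names below are hypothetical, chosen only to mirror the labels in the figure.

```python
def usable_window(last_byte_sent, last_byte_acked, advertised_window):
    """Bytes the sender may still transmit without exceeding the window."""
    in_flight = last_byte_sent - last_byte_acked  # outstanding un-ACKed data
    return max(0, advertised_window - in_flight)


print(usable_window(3000, 1000, 4096))  # some room left: data OK to send
print(usable_window(5096, 1000, 4096))  # window full: must wait for ACKs
```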

Page 107: IP NETWORKS

TCP Header for Receiver Buffering

Source port Destination port

Sequence number

Acknowledgment

Advertised window HdrLen Flags 0

Checksum Urgent pointer

Options (variable)

Data

Flags: SYN FIN RST PSH URG ACK

Page 108: IP NETWORKS

Fast Retransmission

Page 109: IP NETWORKS

Timeout is Inefficient

• Timeout-based retransmission

– Sender transmits a packet and waits until timer expires

– … and then retransmits from the lost packet onward

Page 110: IP NETWORKS

Fast Retransmission

• Better solution possible under sliding window

– Although packet n might have been lost

– … packets n+1, n+2, and so on might get through

• Idea: have the receiver send ACK packets

– ACK says that receiver is still awaiting nth packet

• And repeated ACKs suggest later packets have arrived

– Sender can view the “duplicate ACKs” as an early hint

• … that the nth packet must have been lost

• … and perform the retransmission early

• Fast retransmission

– Sender retransmits data after the triple duplicate ACK
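The triple-duplicate-ACK rule amounts to counting repeats of the same acknowledgment number. The sketch below scans a list of received ACK numbers and reports which ones would trigger a fast retransmission; the function name and the example ACK values are invented for illustration.

```python
def fast_retransmit_events(acks, threshold=3):
    """Return the ACK numbers whose duplicates trigger a fast retransmit."""
    last_ack = None
    duplicates = 0
    triggered = []
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                # Third duplicate: resend the segment starting at `ack`.
                triggered.append(ack)
        else:
            # A new ACK number resets the duplicate count.
            last_ack = ack
            duplicates = 0
    return triggered


# The segment starting at byte 2000 is lost; later segments keep arriving,
# so the receiver repeats ACK 2000 until the hole is filled.
print(fast_retransmit_events([1000, 2000, 2000, 2000, 2000, 6000]))
```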

Page 111: IP NETWORKS

Effectiveness of Fast Retransmit

• When does Fast Retransmit work best?

– Long data transfers

• High likelihood of many packets in flight

– High window size

• High likelihood of many packets in flight

– Low burstiness in packet losses

• Higher likelihood that later packets arrive successfully

• Implications for Web traffic

– Most Web transfers are short (e.g., 10 packets)

• Short HTML files or small images

– So, often there aren’t many packets in flight

– … making fast retransmit less likely to “kick in”

– Forcing users to click “reload” more often…

Page 112: IP NETWORKS

Tearing Down the Connection

Page 113: IP NETWORKS

Tearing Down the Connection

• Closing the connection

– Finish (FIN) to close and receive remaining bytes

– And other host sends a FIN ACK to acknowledge

– Reset (RST) to close and not receive remaining bytes

time A

B

Page 114: IP NETWORKS

Sending/Receiving the FIN Packet

• Sending a FIN: close()

– Process is done sending

data via the socket

– Process invokes

“close()” to close the socket

– Once TCP has sent all of

the outstanding bytes…

– … then TCP sends a FIN

• Receiving a FIN: EOF

– Process is reading data

from the socket

– Eventually, the attempt

to read returns an EOF

Page 115: IP NETWORKS

Suggested improvement for

IP and TCP

Page 116: IP NETWORKS

• The TCP/IP data path has improved pathlength and scalability, and it provides virtual storage constraint relief. Communications Server does the following:

• Reduces extended common storage area (ECSA) consumption for TCP/IP workloads

• Previously, Communications Server housed portions of inbound datagrams in ECSA, and in certain circumstances, system outages caused by ECSA usage spikes could occur. Communications Server no longer uses ECSA to hold inbound IP traffic.

• Reduces system pathlength for the TCP/IP data path. This results in more efficient TCP/IP communications (potentially lower utilization of the LPAR), and can lead to improved network response time if the z/OS image is currently MIPS-constrained.

• Improves scalability.

• The UDP layer is enhanced to enable more efficient processing of incoming datagrams when an application has multiple threads concurrently reading datagrams from the same datagram socket. With this enhancement, the UDP layer now wakes up only a single thread to process an incoming datagram, which reduces overhead by avoiding the unnecessary resumption and suspension of multiple threads for every incoming datagram.

Page 117: IP NETWORKS

2.117

Significance of UDP in modern communication

Page 118: IP NETWORKS

• In situations where you really want to get a simple answer from another server quickly, UDP works best. In general, you want the answer to fit in one response packet, and you are prepared to implement your own protocol for reliability or resends. DNS is the perfect example of this use case. The cost of connection setup is way too high (yet DNS does support a TCP mode as well).

• Another case is when you are delivering data that can be lost because newer data coming in will replace that previous data/state. Weather data, video streaming, a stock quotation service (not used for actual trading), or gaming data come to mind.

• Another case is when you are managing a tremendous amount of state and you want to avoid using TCP because the OS cannot handle that many sessions. This is a rare case today. In fact, there are now user-land TCP stacks that can be used so that the application writer may have finer-grained control over the resources needed for that TCP state. Prior to 2003, UDP was really the only game in town.

• One other case is for multicast traffic. UDP can be multicast to multiple hosts, whereas TCP cannot do this at all.
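The first case above (one request, one reply, with the application handling resends itself) might look like the following sketch. It is an illustration only: the function names `udp_query` and `demo`, the timeout and retry values, and the toy upper-casing server on the loopback interface are all assumptions, not part of any real protocol.

```python
import socket
import threading


def udp_query(server_addr, payload, timeout=0.5, retries=3):
    """Send one datagram and wait for one reply, resending on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _attempt in range(retries):
            sock.sendto(payload, server_addr)  # no connection setup needed
            try:
                reply, _sender = sock.recvfrom(2048)
                return reply
            except socket.timeout:
                continue  # application-level resend; UDP will not do it
    return None


def demo():
    """Run a one-shot toy server on localhost and query it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(2.0)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port

    def serve_once():
        data, client = server.recvfrom(2048)
        server.sendto(data.upper(), client)

    worker = threading.Thread(target=serve_once)
    worker.start()
    answer = udp_query(server.getsockname(), b"ping")
    worker.join()
    server.close()
    return answer


print(demo())
```

The whole exchange costs one datagram in each direction, whereas TCP would first spend a round trip on connection setup.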

Page 119: IP NETWORKS

Telecommunications

• Tele (Far) + Communications

• Early telecommunications

– smoke signals and drums

– visual telegraphy (or semaphore in 1792)

• Telegraph and telephone

– Telegraph (1839)

– Telephone (1876)

• Radio and television

• Telephony

– Voice and Data

Page 120: IP NETWORKS

Communications and Networks

• Data Communications

– Transmission of signals

• Encoding, interfacing, signal integrity, multiplexing

etc.

• Networking

– Topology & architecture used to interconnect devices

• Networks of communication systems

Page 121: IP NETWORKS

Network Trends (1980-Present)

Microcontroller Networking

Wireless

Voice, Image, Data, Video

Integrated Systems!

Microcontroller

Page 122: IP NETWORKS

Communication Systems

• Process describing transfer of information, data, instructions between one or more systems through some media

– Examples

• people, computers, cell phones, etc.

• Computer communication systems

• Signals passing through the communication channel can be digital or analog

– Analog signals: continuous electrical waves

– Digital signals: individual electrical pulses (bits)

• Receivers and transmitters: desktop computers, mainframe computers, etc.

[Figure: a transmitter (TX) connected to several receivers (RX) over a communication channel, showing the communication media and an amp/adaptor]

Page 123: IP NETWORKS

Communication Systems

Page 124: IP NETWORKS

Communications Components

• Basic components of a

communication system

– Communication technologies

– Communication devices

– Communication channels

– Communication software

Page 125: IP NETWORKS

A Communications Model

Page 126: IP NETWORKS

Communications Tasks

• Transmission system utilization

• Interfacing

• Signal generation

• Synchronization

• Exchange management

• Error detection and correction

• Flow control

• Addressing

• Routing

• Recovery

• Message formatting

• Security

• Network management

Page 127: IP NETWORKS

Data Communications Model

Page 128: IP NETWORKS

Communication Technology Applications

• voice mail

• Twitter

• e-mail

• instant messaging

• chat rooms

• newsgroups

• telephony

• videoconferencing

• collaboration

• groupware

• global positioning system (GPS)

Page 129: IP NETWORKS

Communication Technologies - Applications

• Different technologies allowing us to communicate

– Examples: Voice mail, fax, email, instant message, chat rooms,

news groups, telephony, GPS, and more

• Voice mail: Similar to answering machine but digitized

• Fax: Sending hardcopy of text or photographs between computers

using fax modem

• Email: electronic mail – sending text, files, images between different

computer networks - must have email software

– More than 1.3 billion people send 244 billion messages monthly!

• Chat rooms: Allows communications in real time when connected to

the Internet

Page 130: IP NETWORKS

Communication Technologies – Applications

(cont)

• Telephony: Talking to other people over the Internet (also called VoIP)

– Sends digitized audio signals over the Internet

– Requires Internet telephone software

• Groupware: Software application allowing a group of people to communicate with each other (exchange data)

– Address book, appointment book, schedules, etc.

• GPS: consists of receivers connected to satellite systems

– Determining the geographical location of the receiver

– Used for cars, advertising, hiking, tracking, etc.

Page 131: IP NETWORKS

Communication Devices

• Any type of hardware capable of transmitting data,

instructions, and information between devices

– Functioning as receiver, transmitter, adaptor, converter

– Basic characteristics: How fast, how far, how much data!

• Examples: Dial-up modems, ISDN, DSL modems, network

interface cards

Page 132: IP NETWORKS

Communication Devices(Cont)

– Dial-up modem: uses standard phone lines

• Converts digital information into analog

• Consists of a modulator and a demodulator

• Can be external, internal, wireless

– ISDN and DSL Modem: Allows digital communication between networks and computers

• Requires a digital modem

• Digital is better than analog – why?

– Cable modem: a modem that transmits and receives data over the cable television (CATV) network

• Also called broadband modem (carrying multiple signals)

• The incoming signal is split

• Requires a cable modem

– Network interface cards (NICs): Adaptor cards residing in the computer to transmit and receive data over the network

• Operate with different network technologies (e.g., Ethernet)

Page 133: IP NETWORKS

Communication Software

• Examples of applications (Layer 7) that take advantage of the transport (Layer 4) services of TCP and UDP

– Hypertext Transfer Protocol (HTTP): A client/server application that uses TCP for transport to retrieve HTML pages.

– Domain Name System (DNS): A name-to-address translation application that uses both TCP and UDP transport.

– Telnet: A virtual terminal application that uses TCP for transport.

– File Transfer Protocol (FTP): A file transfer application that uses TCP for transport.

– Trivial File Transfer Protocol (TFTP): A file transfer application that uses UDP for transport.

– Network Time Protocol (NTP): An application that synchronizes time with a time source and uses UDP for transport.

– Border Gateway Protocol (BGP): An exterior gateway routing protocol that uses TCP for transport. BGP is used to exchange routing information for the Internet and is the protocol used between service providers.

Page 134: IP NETWORKS

Communication Channels

• A channel is a path between two communication devices

• Channel capacity: How much data can be passed through the channel

(bit/sec)

– Also called channel bandwidth

– The smaller the pipe the slower data transfer!

• Consists of one or more transmission media

– Materials carrying the signal

– Two types:

• Physical: wire cable

• Wireless: Air

[Figure: a destination network server reached over T1 and T3 lines]

Page 135: IP NETWORKS

Physical Transmission Media

• A tangible media

– Examples: Twisted-pair cable, coaxial cable, Fiber-optics, etc.

• Twisted-pair cable:

– One or more twisted wires bundled together (why?)

– Made of copper

• Coax-Cable:

– Consists of single copper wire surrounded by three layers of insulating and metal materials

– Typically used for cable TV

• Fiber-optics:

– Strands of glass or plastic used to transmit light

– Very high capacity, low noise, small size, less susceptible to natural disturbances

Page 136: IP NETWORKS

Physical Transmission Media

[Figure: cable cross-sections. Coaxial cable: plastic outer coating, woven or braided metal, insulating material, copper wire. Twisted-pair cable: twisted-pair wire. Fiber-optic cable: protective coating, glass cladding, optical fiber core.]

Page 137: IP NETWORKS

Wireless Transmission Media

• Broadcast Radio

– Distribute signals through the air over long distance

– Uses an antenna

– Typically for stationary locations

– Can be short range

• Cellular Radio

– A form of broadcast radio used for mobile communication

– High frequency radio waves to transmit voice or data

– Utilizes frequency-reuse

Page 138: IP NETWORKS

Wireless Transmission Media

• Microwaves

– Radio waves providing high speed transmission

– They are point-to-point (can’t be obstructed)

– Used for satellite communication

• Infrared (IR)

– Wireless transmission media that send signals using infrared light waves (such as?)

Page 139: IP NETWORKS

Physical Transmission Media

100 Mbps is how many bits per sec?

Which is bigger:

10,000 Mbps, 0.01Tbps or 10Gbps?
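The two questions above come down to unit prefixes. The helper below is a small worked example; the name `to_bps` and the string-based interface are assumptions made for illustration, and decimal (power-of-ten) prefixes are used, as is conventional for data rates.

```python
from fractions import Fraction

# Decimal prefixes used for data rates (k = 10^3, M = 10^6, ...).
PREFIX = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_bps(value, unit):
    """Convert a rate such as ("0.01", "Tbps") to bits per second."""
    # Fraction avoids floating-point rounding for values like 0.01.
    return int(Fraction(value) * PREFIX[unit[0]])


print(to_bps("100", "Mbps"))    # 100000000 bits per second
print(to_bps("10000", "Mbps"))  # 10000000000
print(to_bps("0.01", "Tbps"))   # 10000000000
print(to_bps("10", "Gbps"))     # 10000000000
```

All three rates in the second question are the same: 10,000 Mbps = 0.01 Tbps = 10 Gbps = 10^10 bits per second.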

Wireless channel capacity:

Page 140: IP NETWORKS

Networks

• Collection of computers and devices connected together

• Used to transfer information or files, share resources, etc.

• What is the largest network?

• Characterized based on their geographical coverage, speed, capacities

• Networks are categorized based on the following characteristics:

– Network coverage: LAN, MAN, WAN

– Network topologies: how the computers are connected together

– Network technologies

– Network architecture

Page 141: IP NETWORKS

Network coverage

• Local Area Networks:

– Used for small networks (school, home, office)

– Examples and configurations:

• Wireless LAN or Switched LAN

• ATM LAN, Frame Ethernet LAN

• Peer-to-peer: connecting several computers together (<10)

• Client/Server: The server shares its resources between different clients

• Metropolitan Area Network

– Backbone network connecting all LANs

– Can cover a city or the entire country

• Wide Area Network

– Typically between cities and countries

– Technology:

• Circuit Switch, Packet Switch, Frame Relay, ATM

– Examples:

• Internet P2P: Networks with the same network software can be connected together (Napster)

Page 142: IP NETWORKS

LAN vs. WAN

LAN - Local Area Network: a group of computers connected within a building or a campus. For example, a LAN may consist of computers located on a single floor or a building, or it might link all the computers in a small company.

WAN - Wide Area Network: a network consisting of computers or LANs connected across a distance. A WAN can cover small to large distances, using different transmission media such as telephone lines, fiber optic cabling, satellite transmissions and microwave transmissions.

Page 143: IP NETWORKS

Network Topologies

• Configuration or physical arrangement in which devices are connected together

• BUS networks: a single central cable connects a number of devices

– Easy and cheap

– Popular for LANs

• RING networks: a number of computers are connected on a closed loop

– Covers large distances

– Primarily used for LANs and WANs

• STAR networks: connecting all devices to a central unit

– All computers are connected to a central device called hub

– All data must pass through the hub

– What is the problem with this?

– Susceptible to failure

Page 144: IP NETWORKS

Network Topologies

[Figure: example topologies connecting personal computers, a host computer, a printer, and a file server]

Page 145: IP NETWORKS

Network Architecture

• Refers to how the computers or devices are organized in a network

• Basic types:

– Centralized – using mainframes

– Peer-2-Peer:

• Each computer (peer) has equal responsibilities, capacities, sharing hardware, data, with the other computers on the peer-to-peer network

• Good for small businesses and home networks

• Simple and inexpensive

– Client/Server:

• All clients must request service from the server

• The server is also called a host

• Different servers perform different tasks: File server, network server, etc.

[Figure: three clients connected to a server and a laser printer]

Page 146: IP NETWORKS

P2P vs Client-Server

Peer-to-Peer Examples

Peers make a portion of their resources, such

as processing power, disk storage or network

bandwidth, directly available to other

network participants, without the need for

central coordination by servers or stable hosts

Page 147: IP NETWORKS

(Data) Network Technologies

• Vary depending on the type of devices we use for interconnecting computers and devices together

• Ethernet:

– LAN technology allowing computers to access the network

– Susceptible to collision

– Can be based on BUS or STAR topologies

– Operates at 10 Mbps or 100 Mbps (10/100)

– Fast Ethernet operates at 100 Mbps

– Gigabit Ethernet (1998 IEEE 802.3z)

– 10-Gigabit Ethernet (10GE or 10GbE or 10 GigE)

• 10GBASE-R/LR/SR (long range, short range, etc.)

• Physical layer

– Gigabit Ethernet using optical fiber, twisted pair cable, or balanced copper cable

Project Topic

Page 148: IP NETWORKS

(Data) Network Technologies

• Token Ring

– LAN technology

– Only the computer with the token can transmit

– No collision

– Typically 72-260 devices can be connected together

• TCP/IP and UDP

– Uses packet transmission

• 802.11

– Standard for wireless LAN

– Wi-Fi (wireless fidelity) is used to describe a device that belongs to the 802.11 family of standards

– Typically used for long range (300-1000 feet)

– Variations include: .11 (1-2 Mbps); .11a (up to 54 Mbps); .11b (up to 11 Mbps); .11g (54 Mbps and higher)

Project Topic

Page 149: IP NETWORKS

(Data) Network Technologies

• 802.11n

– Next generation wireless LAN technology

– Improving network throughput (600 Mbps compared to 450 Mbps) – thus potentially supporting a user throughput of 110 Mbit/s

• WiMAX

– Worldwide Interoperability for Microwave Access

– Provides wireless transmission of data from point-to-multipoint links to portable and fully mobile internet access (up to 3 Mbit/s)

– The intent is to deliver the last mile wireless broadband access as an alternative to cable and DSL

– Based on the IEEE 802.16(d/e) standard (also called Broadband Wireless Access)

http://www.broadcom.com/collateral/wp/802_11n-WP100-R.pdf

Project Topic

Page 150: IP NETWORKS

Network Technologies

• Personal area network (PAN)

– A low range computer network

– PANs can be used for communication among the personal devices themselves

– Wired with computer buses such as USB and FireWire.

• Wireless personal area network (WPAN)

– Uses network technologies such as IrDA, Bluetooth, UWB, Z-Wave and ZigBee

• Internet Mobile Protocols

– Supporting multimedia Internet traffic

– IGMP & MBONE for multicasting

– RTP, RTCP, & RSVP (used to handle multimedia on the Internet)

• VoIP

Project Topic RTP: Real-time Transport Protocol

Page 151: IP NETWORKS

Network Technologies

• Zigbee

– High level communication protocols using small, low-power digital radios based on the IEEE 802.15.4

– Wireless mesh networking proprietary standard

• Bluetooth

– Uses radio frequency

– Typically used for close distances (short range- 33 feet or so)

– Transmits at 1Mbps

– Used for handheld computers to communicate with the desktop

• IrDA

– Infrared (IR) light waves

– Transfers at a rate of 115 Kbps to 4 Mbps

– Requires line-of-sight transmission

• RFID

– Radio frequency identification

– Uses tags which are placed in items

– Examples: merchandise, toll tags, courtesy calls, sensors!

• WAP

– Wireless application protocol

– Data rate of 9.6-153 kbps depending on the service type

– Used for smart phones and PDAs to access the Internet (email, web, etc)

Project Topic

Page 152: IP NETWORKS

Network Examples

• IEEE 802.15.4

– Low-rate wireless personal area networks (LR-WPANs)

– Basis for the ZigBee, WirelessHART, and MiWi specifications

– Also used for 6LoWPAN and standard Internet protocols to build a Wireless Embedded Internet (WEI)

• Intranets

– Used for private networks

– May implement a firewall

• Hardware and software that restricts access to data and information on a network

• Home networks

– Ethernet

– Phone line

– HomeRF (radio frequency- waves)

– Intelligent home network

• Vehicle-to-Vehicle (car2Car) - http://www.car-to-car.org/

– A wireless LAN based communication system to guarantee European-wide inter-vehicle operability

Project Topic Car2Car Technology: http://www.youtube.com/watch?v=8tFUsN3ZgR4

Page 153: IP NETWORKS

Network Examples

• Interplanetary (Internet) Network

http://www.ece.gatech.edu/research/labs/bwn/deepspace/

Project Topic

Page 154: IP NETWORKS

Network Example: Telephone Networks

• Called the Public Switched Telephone Network (PSTN)

• World-wide and voice oriented (handles voice and data)

• Data/voice can be transferred within the PSTN using different technologies (data transfer rate bps)

• Dial-up lines:

– Analog signals passing through telephone lines

– Requires modems (56 kbps transfer rate)

• ISDN lines:

– Integrated Services Digital Network

– Digital transmission over the telephone lines

– Can carry (multiplex) several signals on a single line

• DSL

– Digital subscriber line

– ADSL (asymmetric DSL)

• receives at 8.4 Mbps, transmits at 640 kbps

• T-Carrier lines: carries several signals over a single line: T1,T3

• Frame Relay

• ATM:

– Asynchronous Transfer Mode

– Fast and high capacity transmitting technology

– Packet technology

Project Topic

Switching Technologies: • Circuit Switching • Packet Switching • Message Switching • Burst Switching

Page 155: IP NETWORKS

Network Example: Optical Networks

• Fiber-to-the-x

– Broadband network architecture that uses optical fiber to replace copper

– Used for last mile telecommunications

– Examples: Fiber-to-the-home (FTTH); Fiber-to-the-building (FTTB); Fiber-to-the premises (FTTP)

• Fiber Distribution Network (reaching different customers)

– Active optical networks (AONs)

– Passive optical networks (PONs)

Project Topic

Page 156: IP NETWORKS

Network Example

• Smart Grid

– Delivering electricity from suppliers

to consumers using digital

technology to save energy

• Storage Area Networks

• Computational Grid Networks

http://rekuwait.wordpress.com/2009/06/18/smart-electric-grid/

Project Topic

Page 157: IP NETWORKS

Network Example: Telephone Networks

Page 158: IP NETWORKS

Network Examples

Page 159: IP NETWORKS

Network Examples

[Figure: public telephone network access options: T-carrier dedicated lines, dial-up, DSL, ISDN, ATM]

What about Cable Internet Services?

Page 160: IP NETWORKS

2.160

solutions

Page 161: IP NETWORKS

Cluster-based Storage Systems

Client

Commodity Ethernet Switch

Servers

Ethernet: 1-10Gbps

Round Trip Time (RTT): 100-10us

Page 162: IP NETWORKS

Cluster-based Storage Systems

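[Figure: synchronized read. A client reaches the storage servers through a switch. Each data block is striped across the servers as Server Request Units (SRUs) 1 to 4, and the client requests (R) all of them together. After SRUs 1, 2, 3 and 4 have arrived, the client sends the next batch of requests.]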

Page 163: IP NETWORKS

Synchronized Read Setup

• Test on an Ethernet-based storage cluster

• Client performs synchronized reads

• Increase # of servers involved in transfer

– Data block size is fixed (FS read)

• TCP used as the data transfer protocol

Page 164: IP NETWORKS

TCP Throughput Collapse

Collapse!

Cluster Setup

1Gbps Ethernet

Unmodified TCP

S50 Switch

1MB Block Size

• TCP Incast

• Cause of throughput collapse:

coarse-grained TCP timeouts
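A back-of-the-envelope calculation shows why a single coarse timeout is so damaging at these speeds. The sketch below is a simplified model under stated assumptions: the function name `goodput_bps` is hypothetical, 1MB is taken as 10^6 bytes, protocol overhead is ignored, and the link is assumed to sit completely idle for the full 200ms of each timeout.

```python
def goodput_bps(block_bytes, link_bps, timeouts=0, rto_s=0.200):
    """Effective throughput for one block when `timeouts` RTOs stall the link."""
    block_bits = block_bytes * 8
    transfer_s = block_bits / link_bps       # time actually spent sending
    stalled_s = timeouts * rto_s             # time spent waiting for the RTO
    return block_bits / (transfer_s + stalled_s)


block = 1_000_000        # 1MB block (assumed to mean 10^6 bytes)
link = 1_000_000_000     # 1Gbps Ethernet

print(goodput_bps(block, link) / 1e6)              # about 1000 Mbps
print(goodput_bps(block, link, timeouts=1) / 1e6)  # about 38 Mbps
```

Sending the block takes only 8 ms, so one 200 ms stall stretches the transfer to 208 ms and cuts the effective rate by more than an order of magnitude.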

Page 165: IP NETWORKS

Solution: µsecond TCP + no minRTO

[Plot: throughput (Mbps) as more servers are added, comparing unmodified TCP with our solution]

High throughput for up to 47 servers

Simulation scales to thousands of servers

Page 166: IP NETWORKS

Overview

• Problem: Coarse-grained TCP timeouts (200ms) too expensive

for datacenter applications

• Solution: microsecond granularity timeouts

– Improves datacenter app throughput & latency

– Also safe for use in the wide-area (Internet)

Page 167: IP NETWORKS

Outline

• Overview

• Why are TCP timeouts expensive?

• How do coarse-grained timeouts affect apps?

• Solution: Microsecond TCP Retransmissions

• Is the solution safe?

Page 168: IP NETWORKS

TCP: data-driven loss recovery

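[Figure: sender/receiver sequence diagram (Seq #). The sender transmits packets 1 to 5 and the receiver returns Ack 1 four times. After 3 duplicate ACKs for 1 (packet 2 is probably lost), the sender retransmits packet 2 immediately and receives Ack 5.]

In datacenters, data-driven recovery completes in µsecs after loss.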

Page 169: IP NETWORKS

TCP: timeout-driven loss recovery

[Figure: sender/receiver sequence diagram (Seq #). The sender receives only Ack 1 and must wait for the retransmission timeout (RTO) to expire before it retransmits the packet.]

Timeouts are expensive (msecs to recover after loss)

Page 170: IP NETWORKS

TCP: Loss recovery comparison

[Figure: the two sequence diagrams side by side. Left: repeated Ack 1 triggers retransmission of packet 2, followed by Ack 5. Right: the sender waits for the retransmission timeout (RTO) before retransmitting.]

Timeout-driven recovery is slow (ms)

Data-driven recovery is super fast (µs) in datacenters

Page 171: IP NETWORKS

RTO Estimation and Minimum Bound

• Jacobson’s TCP RTO Estimator

– RTOEstimated = SRTT + (4 * RTTVAR)

• Actual RTO = max(minRTO, RTOEstimated)

• Minimum RTO bound (minRTO) = 200ms

– TCP timer granularity

– Safety (Allman99)

– minRTO (200ms) >> Datacenter RTT (100µs)

– 1 TCP timeout lasts 2000 datacenter RTTs (200ms / 100µs)!
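The estimator and the minimum bound can be sketched as follows. This is an illustration, not kernel code: the class name `RtoEstimator` is hypothetical, and the smoothing gains (1/8 for SRTT, 1/4 for RTTVAR) and the first-sample initialisation follow the standard TCP retransmission timer rules in RFC 6298, which the slide does not spell out.

```python
class RtoEstimator:
    """Jacobson-style RTO estimator with a minimum bound (times in seconds)."""

    def __init__(self, min_rto=0.200, alpha=1 / 8, beta=1 / 4):
        self.min_rto = min_rto
        self.alpha = alpha    # gain for the smoothed RTT
        self.beta = beta      # gain for the RTT variation
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        """Fold one RTT sample into SRTT and RTTVAR."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt

    def estimated(self):
        """RTOEstimated = SRTT + 4 * RTTVAR."""
        return self.srtt + 4 * self.rttvar

    def rto(self):
        """Actual RTO = max(minRTO, RTOEstimated)."""
        return max(self.min_rto, self.estimated())


est = RtoEstimator(min_rto=0.200)
for _ in range(20):
    est.update(100e-6)   # steady 100 µs datacenter RTT samples

print(est.estimated())   # close to 100 µs
print(est.rto())         # pinned at the 200 ms floor
```

With steady microsecond-scale samples the estimate itself settles near the measured RTT, so it is the `min_rto` floor, not the estimator, that determines the timeout.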

Page 172: IP NETWORKS

Outline

• Overview

• Why are TCP timeouts expensive?

• How do coarse-grained timeouts affect apps?

• Solution: Microsecond TCP Retransmissions

• Is the solution safe?

Page 173: IP NETWORKS

Single Flow TCP Request-Response

[Figure: request-response timeline between client, switch, and server. The request (R) is sent, the response is sent but dropped, and the response is resent 200ms later.]

Page 174: IP NETWORKS

Apps Sensitive to 200ms Timeouts

• Single flow request-response

– Latency-sensitive applications

• Barrier-Synchronized workloads

– Parallel Cluster File Systems

• Throughput-intensive

– Search: multi-server queries

• Latency-sensitive

Page 175: IP NETWORKS

Link Idle Time Due To Timeouts

[Figure: synchronized read of Server Request Units (SRUs) 1 to 4 through a switch, plotted over time. Requests are sent and responses follow; responses 1 to 3 are done, but response 4 is dropped. The link is idle until response 4 is resent.]

Page 176: IP NETWORKS

Client Link Utilization

200ms

Link Idle!

Page 177: IP NETWORKS

200ms timeouts Throughput Collapse

• [Nagle04] called this Incast

• Provided application level solutions

• Cause of throughput collapse: TCP timeouts

• [FAST08] Search for network level solutions to TCP Incast

Collapse!

Cluster Setup

1Gbps Ethernet

200ms minRTO

S50 Switch

1MB Block Size


Page 181: IP NETWORKS

Results from our previous work (FAST08)

Network Level Solutions: Results / Conclusions

• Increase Switch Buffer Size: delays throughput collapse; throughput collapse inevitable; expensive

• Alternate TCP Implementations (avoiding timeouts, aggressive data-driven recovery, disable slow start): throughput collapse inevitable because timeouts are inevitable (complete window loss a common case)

• Ethernet Flow Control: effective; limited effectiveness (works for simple topologies); head-of-line blocking

• Reducing minRTO (in simulation): very effective; implementation concerns (µs timers for OS, TCP); safety concerns

Page 182: IP NETWORKS

Outline

• Overview

• Why are TCP timeouts expensive?

• How do coarse-grained timeouts affect apps?

• Solution: Microsecond TCP Retransmissions

– and eliminate minRTO

• Is the solution safe?

Page 183: IP NETWORKS

µsecond Retransmission Timeouts (RTO)

RTO = max( minRTO, f(RTT) )

200ms

200µs?

0?

RTT tracked in

milliseconds

Track RTT in µsecond

Page 184: IP NETWORKS

Lowering minRTO to 1ms

• Lower minRTO to as low a value as possible without changing

timers/TCP impl.

• Simple one-line change to Linux

• Uses low-resolution 1ms kernel timers

Page 185: IP NETWORKS

Default minRTO: Throughput Collapse

Unmodified TCP (200ms minRTO)

Page 186: IP NETWORKS

Lowering minRTO to 1ms helps

Millisecond retransmissions are not enough

Unmodified TCP (200ms minRTO)

1ms minRTO

Page 187: IP NETWORKS

Requirements for µsecond RTO

• TCP must track RTT in microseconds

– Modify internal data structures

– Reuse timestamp option

• Efficient high-resolution kernel timers

– Use HPET for efficient interrupt signaling

Page 188: IP NETWORKS

Solution: µsecond TCP + no minRTO

Unmodified TCP (200ms minRTO) more servers

1ms minRTO

microsecond TCP + no minRTO

• High throughput for up to 47 servers

Page 189: IP NETWORKS

Simulation: Scaling to thousands

Block Size = 80MB, Buffer = 32KB, RTT = 20us

Page 190: IP NETWORKS

Synchronized Retransmissions At Scale

Simultaneous retransmissions cause successive timeouts

Successive RTO = RTO * 2^backoff

Page 191: IP NETWORKS

Simulation: Scaling to thousands

Desynchronize retransmissions to scale further

Successive RTO = (RTO + (rand(0.5) * RTO)) * 2^backoff

For use within datacenters only
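The two backoff rules can be compared side by side. The sketch below makes one interpretive assumption: `rand(0.5)` on the slide is read as a uniform random number between 0 and 0.5. The function name `successive_rto` and the optional `rng` argument are hypothetical.

```python
import random


def successive_rto(rto, backoff, rng=None):
    """RTO after `backoff` consecutive timeouts.

    Without `rng`: plain exponential backoff, RTO * 2^backoff.
    With `rng`: adds up to 50% random delay so that senders that timed out
    together do not all retransmit at the same instant.
    """
    if rng is None:
        return rto * 2 ** backoff
    jitter = rng.uniform(0.0, 0.5) * rto   # assumed meaning of rand(0.5) * RTO
    return (rto + jitter) * 2 ** backoff


rng = random.Random(42)   # fixed seed so the run is repeatable
for backoff in range(4):
    plain = successive_rto(0.001, backoff)
    spread = successive_rto(0.001, backoff, rng)
    print(backoff, plain, spread)
```

Each randomized value lies between 1 and 1.5 times the plain backoff value, which spreads retransmissions from many servers over time.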

Page 192: IP NETWORKS

Outline

• Overview

• Why are TCP timeouts expensive?

• The Incast Workload

• Solution: Microsecond TCP Retransmissions

• Is the solution safe?

– Interaction with Delayed-ACK within datacenters

– Performance in the wide-area

Page 193: IP NETWORKS

Delayed-ACK (for RTO > 40ms)

Delayed-Ack: Optimization to reduce #ACKs sent

Seq #

Sender Receiver

1

Ack 1

40ms

Sender Receiver

1

Ack 2

Seq #

2

Sender Receiver

1

Ack 0

Seq #

2

Page 194: IP NETWORKS

µsecond RTO and Delayed-ACK

Premature Timeout

RTO on sender triggers before Delayed-ACK on receiver

Sender Receiver

1

Ack 1

Seq #

1

RTO < 40ms

Timeout

Retransmit packet

Seq #

Sender Receiver

1

Ack 1

40ms

RTO > 40ms

Page 195: IP NETWORKS

Impact of Delayed-ACK

Page 196: IP NETWORKS

Is it safe for the wide-area?

• Stability: Could we cause congestion collapse?

– No: Wide-area RTOs are in 10s, 100s of ms

– No: Timeouts result in rediscovering link capacity (slow down the rate

of transfer)

• Performance: Do we timeout unnecessarily?

– [Allman99] Reducing minRTO increases the chance of premature

timeouts

• Premature timeouts slow transfer rate

– Today: detect and recover from premature timeouts

– Wide-area experiments to determine performance impact

Page 197: IP NETWORKS

Wide-area Experiment

Do microsecond timeouts harm wide-area throughput?

Microsecond TCP

+

No minRTO

Standard TCP

BitTorrent

Seeds

BitTorrent

Clients

Page 198: IP NETWORKS

Wide-area Experiment: Results

No noticeable difference in throughput

Page 199: IP NETWORKS

2.199

Best Effort Service Model –

scheduling and policy

Page 200: IP NETWORKS

200

Question to the Class?

• Flow A-D requires bandwidth, delay, and loss guarantees

• Cross traffic is unpredictable

• Can IP provide this?

• What modifications are necessary to accomplish this?

A B C D

Cross Traffic E F

5 Mbps 10 Mbps

Page 201: IP NETWORKS

201

Limitations of IP

• IP provides only best effort service

• IP does not participate in resource management

– Cannot provide service guarantees on a per flow basis

– Cannot provide service differentiation among traffic aggregates

• Early efforts

– Tenet group at Berkeley

– ATM

• IETF efforts

– Integrated services initiative

– Differentiated services initiative

Page 202: IP NETWORKS

202

So, what is required?

• Flow differentiation

– Simple FIFO scheduling will not work!

• Admission control

• Resource reservation

• Flow specification

Page 203: IP NETWORKS

203

Integrated Services Internet

• Enhance IP’s service model

– Old model: single best-effort service class

– New model: multiple service classes, including best-effort and QoS classes

• Create protocols and algorithms to support new service models

– Old model: no resource management at IP level

– New model: explicit resource management at IP level

• Key architecture difference

– Old model: stateless

– New model: per flow state maintained at routers

• used for admission control and scheduling

• set up by signaling protocol

Page 204: IP NETWORKS

204

Integrated Services Network

• Flow or session as QoS

abstractions

• Each flow has a fixed or

stable path

• Routers along the path

maintain the state of the

flow

Page 205: IP NETWORKS

205

Integrated Services Example

Sender Receiver

• Achieve per-flow bandwidth and delay guarantees

– Example: guarantee 1MBps and < 100 ms delay to a flow

Page 206: IP NETWORKS

206

Integrated Services Example

Sender Receiver

• Allocate resources - perform per-flow admission control

Page 207: IP NETWORKS

207

Integrated Services Example

Sender Receiver

• Install per-flow state

Page 208: IP NETWORKS

208

Integrated Services Example

Sender Receiver

• Install per-flow state

Page 209: IP NETWORKS

209

Integrated Services Example: Data Path

Sender Receiver

• Per-flow classification

Page 210: IP NETWORKS

210

Integrated Services Example: Data Path

Sender Receiver

• Per-flow buffer management

Page 211: IP NETWORKS

211

Integrated Services Example

Sender Receiver

• Per-flow scheduling

Page 212: IP NETWORKS

212

How Things Fit Together

[Figure: router architecture. Control plane: routing (routing messages), RSVP (RSVP messages), and admission control, with the forwarding table and the per-flow QoS table. Data plane: data in, route lookup, classifier, scheduler, data out.]

Page 213: IP NETWORKS

213

Service Classes

• Service can be viewed as a contract between network and

communication client

– end-to-end service

– other service scopes possible

• Three common services

– best-effort (“elastic” applications)

– hard real-time (“real-time” applications)

– soft real-time (“tolerant” applications)

Page 214: IP NETWORKS

214

Hard Real Time: Guaranteed Services

• Service contract

– network to client: guarantee a deterministic upper bound on

delay for each packet in a session

– client to network: the session does not send more than it

specifies

• Algorithm support

– admission control based on worst-case analysis

– per flow classification/scheduling at routers

Page 215: IP NETWORKS

215

Soft Real Time: Controlled Load Service

• Service contract:

– network to client: similar performance as an unloaded best-

effort network

– client to network: the session does not send more than it

specifies

• Algorithm Support

– admission control based on measurement of aggregates

– scheduling for aggregate possible

Page 216: IP NETWORKS

Improving QOS in IP Networks

Thus far: “making the best of best effort”

Future: next generation Internet with QoS guarantees

– RSVP: signaling for resource reservations

– Differentiated Services: differential guarantees

– Integrated Services: firm guarantees

• simple model for sharing and congestion studies

Page 217: IP NETWORKS

Principles for QoS Guarantees

• Example: 1 Mbps IP phone and FTP share a 1.5 Mbps link.

– bursts of FTP can congest router, cause audio loss

– want to give priority to audio over FTP

Principle 1: packet marking is needed for the router to distinguish between different classes, and a new router policy is needed to treat packets accordingly

Page 218: IP NETWORKS

Principles for QoS Guarantees (more)

• what if applications misbehave (e.g., audio sends at a higher rate than declared)?

– policing: force source adherence to bandwidth allocations

• marking and policing at network edge:

– similar to ATM UNI (User Network Interface)

Principle 2: provide protection (isolation) for one class from others

Page 219: IP NETWORKS

Principles for QoS Guarantees (more)

• Allocating fixed (non-sharable) bandwidth to a flow:

inefficient use of bandwidth if the flow doesn’t use its allocation

Principle 3: while providing isolation, it is desirable to use resources as efficiently as possible

Page 220: IP NETWORKS

Principles for QoS Guarantees (more)

• Basic fact of life: cannot support traffic demands beyond link capacity

Principle 4: call admission. A flow declares its needs, and the network may block the call (e.g., busy signal) if it cannot meet those needs
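
The call admission principle can be illustrated with a minimal sketch. It assumes that each flow declares a single rate in bits per second and that the network admits a flow only while the sum of declared rates stays within link capacity. The function name `admit` and this simple rate-sum test are illustrative assumptions, not something the slides prescribe.

```python
def admit(link_capacity_bps, admitted_rates_bps, requested_rate_bps):
    """Return True if the new flow fits within the remaining link capacity.

    Hypothetical rule: the declared rates of all admitted flows plus the
    new request must not exceed the capacity of the link.
    """
    return sum(admitted_rates_bps) + requested_rate_bps <= link_capacity_bps


LINK = 1_500_000   # the 1.5 Mbps link from the earlier example
PHONE = 1_000_000  # a 1 Mbps IP phone flow

first_call = admit(LINK, [], PHONE)        # link is empty, call is admitted
second_call = admit(LINK, [PHONE], PHONE)  # only 0.5 Mbps left, call is blocked
```

With these numbers the first call is admitted and the second is blocked, which corresponds to the busy signal mentioned in Principle 4.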

Page 221: IP NETWORKS

Summary of QoS Principles

Let’s next look at mechanisms for achieving this ….

Page 222: IP NETWORKS

Scheduling And Policing Mechanisms

• scheduling: choose next packet to send on link; allocate link capacity and

output queue buffers to each connection (or connections aggregated into

classes)

• FIFO (first in first out) scheduling: send in order of arrival to queue

– discard policy: if a packet arrives at a full queue, which packet should be discarded?

• Tail drop: drop arriving packet

• priority: drop/remove on priority basis

• random: drop/remove randomly
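
FIFO scheduling with the tail-drop discard policy can be sketched as follows. The class name `TailDropQueue` and its methods are illustrative assumptions; the sketch counts capacity in packets rather than bytes.

```python
from collections import deque


class TailDropQueue:
    """FIFO output queue with a fixed capacity and tail-drop discard."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of queued packets
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Queue the packet, or drop it if the queue is already full."""
        if len(self.packets) >= self.capacity:
            self.dropped += 1      # tail drop: the arriving packet is lost
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        """Send packets in order of arrival; None if the queue is empty."""
        return self.packets.popleft() if self.packets else None


q = TailDropQueue(capacity=3)
accepted = [q.enqueue(p) for p in ["p1", "p2", "p3", "p4"]]
sent = [q.dequeue() for _ in range(3)]
```

Here the fourth packet arrives at a full queue and is dropped, while the first three leave in arrival order. A priority or random discard policy would change only the branch that handles the full queue.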

Page 223: IP NETWORKS

Need for a Scheduling Discipline

• Why do we need a non-trivial scheduling discipline?

• Per-connection delay, bandwidth, and loss are determined by the

scheduling discipline

– The network element (NE) can allocate different mean delays to different connections by its choice of service order

– it can allocate different bandwidths to connections by serving at least a

certain number of packets from a particular connection in a given time

interval

– Finally, it can allocate different loss rates to connections by giving them

more or fewer buffers

Page 224: IP NETWORKS

FIFO Scheduling

• Disadvantage with strict FIFO scheduling is that the scheduler

cannot differentiate among connections -- it cannot explicitly

allocate some connections lower mean delays than others

• A more sophisticated scheduling discipline can achieve this

objective (but at a cost)

• The conservation law

– “the sum of the mean queueing delays received by the set of multiplexed connections, weighted by their fair share of

the link’s load, is independent of the scheduling discipline”
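
The conservation law quoted above is usually written as follows (Kleinrock's form, which applies to work-conserving disciplines). The symbols are introduced here for illustration: for N multiplexed connections, \rho_i is the share of the link's load due to connection i (its mean arrival rate \lambda_i times its mean service time x_i), and q_i is its mean queueing delay.

```latex
\sum_{i=1}^{N} \rho_i \, q_i = \text{constant},
\qquad \rho_i = \lambda_i x_i
```

Because the weighted sum is fixed, a scheduler can lower the mean delay of one connection only by raising the mean delay of another.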

Page 225: IP NETWORKS

Requirements

• A scheduling discipline must satisfy four requirements:

– Ease of implementation -- the scheduler must pick a packet every few microseconds, so it should take O(1) and not O(N) time, where N is the number of connections

– Fairness and Protection (for best-effort connections) -- FIFO does

not offer any protection because a misbehaving connection can

increase the mean delay of all other connections. Round-robin

scheduling?

– Performance bounds -- deterministic or statistical; common

performance parameters: bandwidth, delay (worst-case, average),

delay-jitter, loss

– Ease and efficiency of admission control -- to decide given the

current set of connections and the descriptor for a new connection,

whether it is possible to meet the new connection’s performance bounds without jeopardizing the performance of existing

connections

Page 226: IP NETWORKS

Schedulable Region

Page 227: IP NETWORKS

Designing a scheduling discipline

• Four principal degrees of freedom:

– the number of priority levels

– whether each level is work-conserving or non-work-conserving

– the degree of aggregation of connections within a level

– service order within a level

• Each feature comes at some cost

– for a small LAN switch -- a single priority FCFS scheduler or at most

2-priority scheduler may be sufficient

– for a heavily loaded wide-area public switch with possibly

noncooperative users, a more sophisticated scheduling discipline may

be required.

Page 228: IP NETWORKS

Work conserving and non-work conserving

disciplines

• A work-conserving scheduler is idle only when there is no packet awaiting

service

• A non-work-conserving scheduler may be idle even if it has packets to

serve

– makes the traffic arriving at downstream switches more predictable

– reduces buffer size necessary at output queues and the delay jitter

experienced by a connection

– allows the switch to send a packet only when the packet is eligible

– for example, if the (k+1)th packet on connection A becomes eligible for service only i seconds after the service of the kth packet, the downstream switch receives packets on A no faster than one every i seconds.

Page 229: IP NETWORKS

Eligibility times

• By choosing eligibility times carefully, the output from a switch can be made more predictable (so that bursts won’t build up in the network)

• Two approaches: rate-jitter and delay-jitter

• rate-jitter: peak rate guarantee for a connection

– E(1) = A(1); E(k+1) = max(E(k) + Xmin, A(k+1)), where A(k) is the arrival time of the kth packet, E(k) is its eligibility time, and Xmin is the time taken to serve a fixed-size packet at the peak rate

• delay-jitter: at every switch, the input arrival pattern is fully reconstructed

– E(0,k) = A(0,k); E(i+1,k) = E(i,k) + D + L, where E(i,k) is the eligibility time of packet k at switch i, D is the delay bound at the previous switch, and L is the largest possible delay on the link between switch i and switch i+1
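
The two eligibility-time recurrences above can be computed directly. This is a minimal sketch: the function names are illustrative, and the delay-jitter version assumes one delay bound and one link delay per hop, supplied as lists.

```python
def rate_jitter_eligibility(arrivals, x_min):
    """E(1) = A(1); E(k+1) = max(E(k) + x_min, A(k+1))."""
    eligible = []
    for arrival in arrivals:
        if not eligible:
            eligible.append(arrival)
        else:
            eligible.append(max(eligible[-1] + x_min, arrival))
    return eligible


def delay_jitter_eligibility(first_arrival, delay_bounds, link_delays):
    """E(0,k) = A(0,k); E(i+1,k) = E(i,k) + D + L for each hop."""
    times = [first_arrival]
    for d, l in zip(delay_bounds, link_delays):
        times.append(times[-1] + d + l)
    return times


# A burst of three packets followed by a late packet, peak spacing of 1.0 s.
burst = rate_jitter_eligibility([0.0, 0.1, 0.2, 5.0], x_min=1.0)
# burst == [0.0, 1.0, 2.0, 5.0]

# One packet crossing two hops with delay bounds 2 and 3 and link delays 1 and 1.
hops = delay_jitter_eligibility(10, delay_bounds=[2, 3], link_delays=[1, 1])
# hops == [10, 13, 17]
```

In the rate-jitter case the burst is spread out to one packet per Xmin, while the late packet is eligible on arrival. In the delay-jitter case each switch shifts the packet by a constant, so the original spacing between packets is preserved.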

Page 230: IP NETWORKS

Pros and Cons

• Reduces delay jitter. Con: jitter can instead be removed at the endpoints with an elasticity buffer. Pro: reduces the (expensive) buffers needed at the switches

• Increases mean delay. Is this a problem? Pro: for playback applications, which delay packets until the delay-jitter bound, increasing mean delay does not affect the perceived performance

• Wastes bandwidth. Is this a problem? Pro: the scheduler can serve best-effort packets when there are no eligible packets to serve

• Needs accurate source descriptors: no rebuttal from the non-work-conserving camp

Page 231: IP NETWORKS

Priority Scheduling

transmit highest priority queued packet

• multiple classes, with different priorities

– class may depend on marking or other header info, e.g. IP source/dest, port numbers, etc.

Page 232: IP NETWORKS

Priority Scheduling

• The scheduler serves a packet from priority level k only if

there are no packets awaiting service in levels k+1, k+2, …, n

• at least 3 levels of priority in an integrated services network?

• Starvation? Appropriate admission control and policing to

restrict service rates from all but the lowest priority level

• Simple implementation
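
A strict priority scheduler of the kind described above can be sketched in a few lines. The class name `PriorityScheduler` is an illustrative assumption. The sketch follows the slide's convention that a higher level number means higher priority, so level k is served only when all higher levels are empty.

```python
from collections import deque


class PriorityScheduler:
    """Strict priority scheduler with one FIFO queue per priority level."""

    def __init__(self, levels):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, level, packet):
        self.queues[level].append(packet)

    def dequeue(self):
        """Serve the highest non-empty level; None if all queues are empty."""
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None


s = PriorityScheduler(levels=3)
s.enqueue(0, "ftp1")
s.enqueue(2, "voice1")
s.enqueue(0, "ftp2")
s.enqueue(2, "voice2")
s.enqueue(1, "video1")
served = [s.dequeue() for _ in range(5)]
# served == ["voice1", "voice2", "video1", "ftp1", "ftp2"]
```

The example also shows the starvation risk: the level 0 packets are sent only after every higher level has drained, which is why admission control and policing are needed on the higher levels.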

Page 233: IP NETWORKS

Round Robin Scheduling

• multiple classes

• cyclically scan class queues, serving one from each class (if available)

• provides protection against misbehaving sources (also guarantees a

minimum bandwidth to every connection)
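
Round robin service can be sketched as a cyclic scan over the class queues. The function name `round_robin` is illustrative, and the sketch serves exactly one packet per non-empty class in each cycle.

```python
from collections import deque


def round_robin(class_queues):
    """Return the order in which packets are sent under round robin."""
    queues = [deque(q) for q in class_queues]
    order = []
    while any(queues):            # an empty deque is falsy
        for queue in queues:
            if queue:             # skip classes with nothing to send
                order.append(queue.popleft())
    return order


# Class A has a large backlog; classes B and C send only a little.
order = round_robin([["a1", "a2", "a3", "a4"], ["b1"], ["c1", "c2"]])
# order == ["a1", "b1", "c1", "a2", "c2", "a3", "a4"]
```

Even though class A has the largest backlog, B and C are served in the first cycle, which is the protection property noted above.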

Page 234: IP NETWORKS

Max-Min Fair Share

• Fair Resource allocation to best-effort connections?

• Fair share allocates a user with a “small” demand what it wants, and evenly distributes unused resources to the “big” users.

• Maximize the minimum share of a source whose demand is not fully

satisfied.

– Resources are allocated in order of increasing demand

– no source gets a resource share larger than its demand

– sources with unsatisfied demands get an equal share of the resource

• A Generalized Processor Sharing (GPS) server will implement max-min

fair share
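
The three allocation rules above translate into a short procedure. This sketch assumes a single shared resource and equal weights for all sources; the function name `max_min_fair_share` is illustrative.

```python
def max_min_fair_share(capacity, demands):
    """Allocate capacity to sources in order of increasing demand."""
    allocation = [0.0] * len(demands)
    remaining = capacity
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, i in enumerate(order):
        # Split what is left evenly among the sources not yet served.
        fair_share = remaining / (len(order) - position)
        # No source gets more than it asked for.
        allocation[i] = min(demands[i], fair_share)
        remaining -= allocation[i]
    return allocation


shares = max_min_fair_share(12, [1, 3, 5, 9])
# shares == [1, 3, 4.0, 4.0]
```

The two small demands are met in full, and the two sources whose demands cannot be met each receive an equal share of 4.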

Page 235: IP NETWORKS

Weighted Fair Queueing

• generalized Round Robin (offers differential service to

each connection/class)

• each class gets weighted amount of service in each cycle
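
The idea that each class gets a weighted amount of service per cycle can be sketched as weighted round robin. This is a simplification, not WFQ itself: it assumes equal-size packets and integer weights counted in packets per cycle, whereas WFQ implementations typically order packets by computed finish times. The function name is illustrative.

```python
from collections import deque


def weighted_round_robin(class_queues, weights):
    """Serve up to weights[i] packets from class i in each cycle."""
    queues = [deque(q) for q in class_queues]
    order = []
    while any(queues):
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                if not queue:
                    break         # class has nothing left this cycle
                order.append(queue.popleft())
    return order


wrr_order = weighted_round_robin(
    [["a1", "a2", "a3", "a4"], ["b1", "b2"]], weights=[2, 1]
)
# wrr_order == ["a1", "a2", "b1", "a3", "a4", "b2"]
```

With weights 2 and 1, class A receives two thirds of the service in each cycle while both classes are backlogged.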

Page 236: IP NETWORKS

Policing Mechanisms

Goal: limit traffic to not exceed declared parameters

Three commonly used criteria:

• (Long term) Average Rate: how many pkts can be sent per unit

time (in the long run)

– crucial question: what is the interval length: 100 packets

per sec or 6000 packets per min have same average!

• Peak Rate: e.g., 6000 pkts per min. (ppm) avg.; 1500 pkts per sec. (pps) peak rate

• (Max.) Burst Size: max. number of pkts sent consecutively

(with no intervening idle)

Page 237: IP NETWORKS

Traffic Regulators

• Leaky bucket controllers

• Token bucket controllers

Page 238: IP NETWORKS

Policing Mechanisms

Token Bucket: limit input to specified Burst Size and Average Rate.

• bucket can hold b tokens

• tokens generated at rate r tokens/sec unless bucket full

• over interval of length t: number of packets admitted less than or equal to

(r t + b).
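
A token bucket policer can be sketched as follows. The class name `TokenBucket` is illustrative, and the sketch assumes that each packet consumes one token, that the bucket starts full, and that the caller supplies the current time in seconds.

```python
class TokenBucket:
    """Token bucket with rate r (tokens/sec) and bucket size b (tokens)."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # assumption: the bucket starts full
        self.last = 0.0       # time of the previous update

    def allow(self, now):
        """Return True if a packet arriving at time `now` may be sent."""
        elapsed = now - self.last
        # Add the tokens generated since the last update, capped at b.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1.0, burst=3)
arrival_times = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
decisions = [bucket.allow(t) for t in arrival_times]
# decisions == [True, True, True, False, True, False]
```

Over the interval of length t = 1 second, four packets are admitted, which equals the bound r t + b = 1 + 3.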

Page 239: IP NETWORKS

Policing Mechanisms (more)

• token bucket, WFQ combine to provide guaranteed upper bound on

delay, i.e., QoS guarantee!

[Figure: arriving traffic passes through a token bucket (token rate r, bucket size b) into a WFQ scheduler that gives the flow a per-flow rate R; maximum delay D_max = b/R]
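
The delay bound follows from a worst-case argument: the token bucket limits a burst to b, and WFQ guarantees the flow a service rate of at least R, so the last bit of a full burst waits at most b/R (assuming the token rate r does not exceed R). The numbers in the worked example are hypothetical and chosen only for illustration.

```latex
D_{\max} = \frac{b}{R}
\qquad\text{e.g.}\qquad
b = 16\,000\ \text{bits},\quad R = 1\ \text{Mbps}
\;\Rightarrow\;
D_{\max} = \frac{16\,000}{1\,000\,000}\ \text{s} = 16\ \text{ms}
```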

Page 240: IP NETWORKS

Queries