2: application layer 1 cmpt 371 data communications and networking chapter 2 application layer - 2

CMPT 371: Data Communications and Networking

Chapter 2: Application Layer - 2


Chapter 2 outline
- 2.1 Principles of app layer protocols
- 2.2 Web and HTTP
- 2.3 FTP
- 2.4 Electronic Mail: SMTP, POP3, IMAP
- 2.5 DNS
- 2.6 Content distribution: network Web caching, content distribution networks, P2P file sharing


Content Distribution

Problem of a single server
- Bottleneck, single point of failure, …

Content distribution
- Distribute (replicate) content at different places
- Direct requests to appropriate places


Client-side Caching


Limit of Client-side Caching

Not shared !


Web caches (proxy server)

Goal: satisfy client request without involving origin server

- user sets browser: Web accesses via proxy
- browser sends all HTTP requests to proxy
  - object in cache: cache returns object
  - else cache requests object from origin server, then returns object to client
- Proxy: both client and server; typically installed by ISP (university, company, residential ISP)

[Figure: two clients exchange HTTP requests and responses with a proxy server, which in turn exchanges HTTP requests and responses with the origin servers]


More about Web caching

Why Web caching?
- Reduce response time for client request.
- Reduce traffic on an institution’s access link.
- An Internet dense with caches enables “poor” content providers to effectively deliver content.


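More about Web caching

Why is caching effective for the Web, even if cache space is quite limited? Zipf distribution:

$p_j = \dfrac{1/j}{\sum_{k=1}^{|S|} 1/k}, \quad j = 1, 2, \ldots, |S|$

where $|S|$ is the number of objects and $p_j$ is the access probability of the j-th most popular object.

[Figure: access probability (0 to 1) versus object ID (1 to 10)]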
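The effect of a Zipf-like popularity distribution on a small cache can be checked numerically. The sketch below assumes the plain form of the distribution (weight 1/j for the j-th most popular object) and an idealized cache that always holds the most popular objects; the function names are illustrative and not part of the course material.

```python
def zipf_probabilities(num_objects):
    """Access probability p_j of the j-th most popular object, j = 1..num_objects."""
    weights = [1.0 / j for j in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(num_objects, cache_size):
    """Hit ratio if the cache always holds the cache_size most popular objects."""
    return sum(zipf_probabilities(num_objects)[:cache_size])


if __name__ == "__main__":
    # With 10 objects, caching only a few of them already covers most requests.
    for size in (1, 3, 5):
        print(size, round(ideal_hit_ratio(10, size), 3))
```

Under these assumptions, a cache holding 3 of 10 objects already serves more than 60% of the requests, which is the intuition behind the slide.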


Caching example (1)

Assumptions
- average object size = 100,000 bits
- avg. request rate from institution’s browsers to origin servers = 15/sec
- delay from institutional router to any origin server and back to router = 2 sec

Consequences
- utilization on LAN = 15%
- utilization on access link = 100%!
- total delay = Internet delay + access delay + LAN delay = 2 sec + some minutes + some milliseconds

[Figure: institutional network (10 Mbps LAN, institutional cache) connected by a 1.5 Mbps access link to origin servers on the public Internet]
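The two utilization figures follow directly from the assumptions: offered load is request rate times object size, divided by link capacity. A minimal sketch (the function name is an illustrative choice):

```python
def utilization(requests_per_sec, object_bits, link_bps):
    """Fraction of link capacity consumed by the offered request traffic."""
    return requests_per_sec * object_bits / link_bps


if __name__ == "__main__":
    print("LAN:", utilization(15, 100_000, 10_000_000))         # 10 Mbps LAN
    print("access link:", utilization(15, 100_000, 1_500_000))  # 1.5 Mbps link
```

A utilization near 1 on the access link is what makes the queueing delay grow to the order of minutes.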


Caching example (2)

Possible solution
- increase bandwidth of access link to, say, 10 Mbps

Consequences (if 10 Mbps)
- utilization on LAN = 15%
- utilization on access link = 15%
- total delay = Internet delay + access delay + LAN delay = 2 sec + some msecs + some msecs
- often a costly upgrade

[Figure: institutional network and institutional cache connected to origin servers on the public Internet; access link upgraded from 1.5 to 10 Mbps]


Caching example (3)

Install cache
- suppose hit rate is 0.4

Consequences
- 40% of requests will be satisfied almost immediately
- 60% of requests satisfied by origin server
- utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec)
- total delay = Internet delay + access delay + LAN delay = 60% * 2 sec + 40% * 0.01 sec + some milliseconds < 1.3 sec

[Figure: institutional network and institutional cache connected by the 1.5 Mbps access link to origin servers on the public Internet]
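The arithmetic on this slide can be reproduced in a few lines. The sketch below uses the slide's numbers (hit rate 0.4, 2 sec for requests that go to the origin, 0.01 sec for hits) and ignores the "some milliseconds" of LAN delay; the function names are illustrative.

```python
def average_delay(hit_rate, origin_delay_s, hit_delay_s):
    """Expected delay when a fraction hit_rate of requests is served by the cache."""
    return (1 - hit_rate) * origin_delay_s + hit_rate * hit_delay_s


def access_link_utilization(hit_rate, requests_per_sec, object_bits, link_bps):
    """Only cache misses cross the access link."""
    return (1 - hit_rate) * requests_per_sec * object_bits / link_bps


if __name__ == "__main__":
    print(average_delay(0.4, 2.0, 0.01))                          # about 1.2 sec
    print(access_link_utilization(0.4, 15, 100_000, 1_500_000))   # about 0.6
```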


More about Web caching

Problems of Web caching
- Extra space/machine (proxy)
- Inconsistency (out-of-date objects…)


Consistency of Cached Objects
- Solution 1: no caching
- Solution 2: manually update


Conditional GET

Goal: don’t send object if client has up-to-date cached version
- client: specify date of cached copy in HTTP request
  If-modified-since: <date>
- server: response contains no object if cached copy is up-to-date:
  HTTP/1.0 304 Not Modified

[Figure: client-server exchange. Object not modified: the client sends an HTTP request with If-modified-since: <date> and the server replies HTTP/1.0 304 Not Modified. Object modified: the client sends the same request and the server replies HTTP/1.0 200 OK with <data>.]
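The server-side decision can be sketched as a pure function. This is an illustration only: `conditional_get` is a hypothetical helper, not a real server API, and it assumes the server knows the object's last-modified time as a timezone-aware datetime. The date parsing uses the standard library's `email.utils`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status_line, body). The body is None when no object is sent."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            # Cached copy is up to date: send the status line only.
            return "HTTP/1.0 304 Not Modified", None
    return "HTTP/1.0 200 OK", "<data>"


if __name__ == "__main__":
    header = format_datetime(datetime(2015, 9, 9, tzinfo=timezone.utc), usegmt=True)
    print(conditional_get(datetime(2015, 9, 1, tzinfo=timezone.utc), header))
    print(conditional_get(datetime(2015, 9, 10, tzinfo=timezone.utc), header))
```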


Hierarchical cache

How to measure a cache? Hit Ratio (HR)

Calculation
- HR of Proxy 1 = 90%
- HR of Proxy 2 = 95%
- Independent
- Joint HR = ? 99.5%

[Figure: clients connected through Proxy Server 1 and Proxy Server 2 to the origin server]
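The 99.5% figure comes from the independence assumption: a request misses the hierarchy only if it misses every proxy, so the joint miss probability is the product of the individual miss probabilities. A minimal sketch (the function name is illustrative):

```python
def joint_hit_ratio(hit_ratios):
    """Joint hit ratio of independent caches: 1 minus the product of miss ratios."""
    miss = 1.0
    for hr in hit_ratios:
        miss *= 1.0 - hr
    return 1.0 - miss


if __name__ == "__main__":
    print(joint_hit_ratio([0.90, 0.95]))  # 1 - 0.10 * 0.05
```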


Hierarchical cache (continued)

With the same hit ratios (HR of Proxy 1 = 90%, HR of Proxy 2 = 95%, independent): how about average delay?
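One way to approach the average-delay question is to weight the delay at each level by the probability that the request is served there. The slides give no delay values, so the numbers below (10 ms at Proxy 1, 100 ms at Proxy 2, 2 sec at the origin) are purely hypothetical, as is the assumption that each delay is the total response time when the request is served at that level.

```python
def hierarchical_average_delay(hr1, hr2, d1, d2, d_origin):
    """Expected delay for a two-level cache hierarchy with independent hit ratios.

    d1, d2 and d_origin are the (assumed) total delays when the request is
    served by Proxy 1, by Proxy 2, or by the origin server, respectively.
    """
    served_by_p1 = hr1
    served_by_p2 = (1 - hr1) * hr2
    served_by_origin = (1 - hr1) * (1 - hr2)
    return served_by_p1 * d1 + served_by_p2 * d2 + served_by_origin * d_origin


if __name__ == "__main__":
    # Hypothetical delays in seconds; only the hit ratios come from the slide.
    print(hierarchical_average_delay(0.90, 0.95, 0.01, 0.1, 2.0))
```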


Content distribution networks (CDNs)

Ping:
- www.Microsoft.com
- www.Netflix.com
- www.ibm.com
- www.apple.com

What did you see?

[Figure: origin server in North America, a CDN distribution node, and CDN servers in S. America, Europe, and Asia]


Content distribution networks (CDNs)

Content replication
- CDN company (e.g., Akamai) installs hundreds of CDN servers throughout the Internet
  - in lower-tier ISPs, close to users
- Content providers (e.g., Netflix) are the CDN company’s customers.
- CDN replicates its customers’ content in CDN servers. When a provider updates content, the CDN updates its servers.

[Figure: origin server in North America, a CDN distribution node, and CDN servers in S. America, Europe, and Asia]


CDN example

Origin server (www.foo.com)
- distributes HTML
- replaces http://www.foo.com/sports/ruth.gif with http://www.cdn.com/www.foo.com/sports/ruth.gif

CDN company (cdn.com)
- distributes gif files
- uses its authoritative DNS server to route (redirect) requests

Steps
1. HTTP request for www.foo.com/sports/sports.html (to the origin server)
2. DNS query for www.cdn.com (to the CDN’s authoritative DNS server)
3. HTTP request for www.cdn.com/www.foo.com/sports/ruth.gif (to a nearby CDN server)
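The URL replacement performed by the origin server amounts to prefixing the original host and path with the CDN's host name. A minimal sketch, using the example host names from this slide; `rewrite_for_cdn` is a hypothetical helper, not part of any CDN product.

```python
from urllib.parse import urlsplit


def rewrite_for_cdn(url, cdn_host="www.cdn.com"):
    """Rewrite an object URL so that the request goes to the CDN instead."""
    parts = urlsplit(url)
    # Keep the original host as the first path component on the CDN.
    return f"{parts.scheme}://{cdn_host}/{parts.netloc}{parts.path}"


if __name__ == "__main__":
    print(rewrite_for_cdn("http://www.foo.com/sports/ruth.gif"))
```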


More about CDNs

Routing requests
- CDN creates a “map”, indicating distances between leaf ISPs and CDN nodes
- when query arrives at authoritative DNS server:
  - server determines ISP from which query originates
  - uses “map” to determine best CDN server

Caching vs. CDN
- Caching: pull (passive)
- CDN: push (active)
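The "map" lookup can be pictured as a nested dictionary from ISP to per-server distance, where the DNS server returns the server with the smallest distance. The ISP names, server names, and distance values below are invented for illustration; how a real CDN measures distance is not specified in the slides.

```python
def best_cdn_server(distance_map, isp):
    """Pick the CDN server with the smallest recorded distance from the given ISP."""
    distances = distance_map[isp]
    return min(distances, key=distances.get)


if __name__ == "__main__":
    # Hypothetical map: distance from each leaf ISP to each CDN node.
    distance_map = {
        "isp-a": {"cdn-s-america": 90, "cdn-europe": 150, "cdn-asia": 120},
        "isp-b": {"cdn-s-america": 200, "cdn-europe": 30, "cdn-asia": 180},
    }
    print(best_cdn_server(distance_map, "isp-a"))
    print(best_cdn_server(distance_map, "isp-b"))
```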


Client-server architecture

server:
- always-on host
- permanent IP address
- server farms for scaling

clients:
- communicate with server
- may be intermittently connected
- may have dynamic IP addresses
- do not communicate directly with each other

[Figure: client/server]


Pure P2P architecture
- no always-on server
- arbitrary end systems directly communicate
- peers are intermittently connected and change IP addresses
- self scalability: new peers bring new resources

Three topics:
- File distribution
- Searching for information
- Case study: BitTorrent, Skype

[Figure: peer-peer]


P2P file sharing

Example
- Alice runs P2P client application on her notebook computer
- Intermittently connects to Internet
- Asks for “X.mp3”
- Application displays other peers that have a copy of X.mp3.
- Alice chooses one of the peers, Bob.
- File is copied from Bob’s PC to Alice’s notebook: HTTP
- While Alice downloads, other users are uploading from Alice.
- Alice’s peer is both a Web client and a transient Web server.

All peers are servers = highly scalable!


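File Distribution: Server-Client vs P2P

Question: How much time to distribute a file of size F from one server to N peers?

- $u_s$: server upload bandwidth
- $u_i$: peer i upload bandwidth
- $d_i$: peer i download bandwidth

[Figure: a server holding a file of size F with upload bandwidth $u_s$, and N peers with upload bandwidths $u_1, \ldots, u_N$ and download bandwidths $d_1, \ldots, d_N$, connected by a network with abundant bandwidth]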


File distribution time: server-client

- Server transmission: must sequentially send (upload) N copies: $NF/u_s$ time
- Client: each must download the file; client i takes $F/d_i$ time to download

Time to distribute F to N clients using client/server approach:

$D_{cs} = \max\left\{ \dfrac{NF}{u_s},\ \dfrac{F}{\min_i d_i} \right\}$

increases linearly in N (for large N)

[Figure: server with upload bandwidth $u_s$ and N clients with bandwidths $u_i$, $d_i$]


File distribution time: P2P

- Server transmission: must upload at least one copy: $F/u_s$ time
- Client: each must download a copy; client i takes $F/d_i$ time to download, but also shares (uploads)
- Clients (peers): as a whole must download NF bits; fastest possible overall download rate: $u_s + \sum_i u_i$

$D_{P2P} = \max\left\{ \dfrac{F}{u_s},\ \dfrac{F}{\min_i d_i},\ \dfrac{NF}{u_s + \sum_i u_i} \right\}$

increases linearly in N (for large N)?

[Figure: server with upload bandwidth $u_s$ and N peers with bandwidths $u_i$, $d_i$]


Server-client vs. P2P: example

Client upload rate = u, F/u = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$

[Figure: minimum distribution time (0 to 3.5) versus N (0 to 35), one curve for P2P and one for client-server]
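The two curves in this example can be recomputed from the client-server and P2P distribution-time bounds on the preceding slides. The sketch below normalizes F = 1 and u = 1 so that times come out in hours, and takes d_min equal to u_s (the slide only requires d_min to be at least u_s); the function names are illustrative.

```python
def d_client_server(n, f, u_s, d_min):
    """Lower bound on client-server distribution time."""
    return max(n * f / u_s, f / d_min)


def d_p2p(n, f, u_s, d_min, peer_uploads):
    """Lower bound on P2P distribution time."""
    return max(f / u_s, f / d_min, n * f / (u_s + sum(peer_uploads)))


if __name__ == "__main__":
    f, u = 1.0, 1.0          # F/u = 1 hour
    u_s = d_min = 10 * u     # u_s = 10u, d_min >= u_s
    for n in (5, 10, 20, 30):
        cs = d_client_server(n, f, u_s, d_min)
        p2p = d_p2p(n, f, u_s, d_min, [u] * n)
        print(n, round(cs, 2), round(p2p, 2))
```

With these parameters the client-server time grows as N/10 hours, while the P2P time is N/(10 + N) hours and therefore stays below one hour for every N.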


File distribution: BitTorrent

P2P file distribution
- tracker: tracks peers participating in torrent
- torrent: group of peers exchanging chunks of a file

Alice arrives … obtains list of peers from tracker … and begins exchanging file chunks with peers in torrent.

[Figure: a peer obtains the list of peers from the tracker and trades chunks with peers in the torrent]


BitTorrent (1)
- file divided into 256 KB chunks.
- peer joining torrent:
  - has no chunks, but will accumulate them over time
  - registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
- while downloading, peer uploads chunks to other peers.
- peers may come and go: churn
- once peer has entire file, it may (selfishly) leave or (altruistically) remain


BitTorrent (2)

Requesting chunks
- at any given time, different peers have different subsets of file chunks
- periodically, a peer (Alice) asks each neighbor for the list of chunks that they have.
- Alice sends requests for her missing chunks
  - rarest first

Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  - re-evaluate top 4 every 10 secs
- every 30 secs: randomly select another peer, start sending it chunks
  - newly chosen peer may join top 4
  - “optimistically unchoke”


BitTorrent: Tit-for-tat

(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers

With higher upload rate, can find better trading partners & get file faster!
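The two selection rules, rarest-first requests and tit-for-tat unchoking, can be sketched as small functions. This is a simplified model rather than the actual BitTorrent client logic: chunk sets and measured rates are passed in as plain dictionaries, timers (10 and 30 seconds) are left out, and all names are illustrative.

```python
import random
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order Alice's missing chunks so that the least replicated come first."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)
    missing = [c for c in counts if c not in my_chunks]
    # Ties are broken by chunk index to keep the order deterministic.
    return sorted(missing, key=lambda c: (counts[c], c))


def choose_unchoked(rates_to_me, top_n=4, rng=random):
    """Return (top senders, one optimistically unchoked peer or None)."""
    top = sorted(rates_to_me, key=rates_to_me.get, reverse=True)[:top_n]
    others = [peer for peer in rates_to_me if peer not in top]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


if __name__ == "__main__":
    neighbors = {"bob": {0, 1, 2}, "carol": {1, 3}, "dave": {1, 2}}
    print(rarest_first({0}, neighbors))
    rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 5}
    print(choose_unchoked(rates))
```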


P2P Case study: Skype
- inherently P2P: pairs of users communicate.
- proprietary application-layer protocol (inferred via reverse engineering)
- hierarchical overlay with supernodes (SNs)
- index maps usernames to IP addresses; distributed over SNs

[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]


Peers as relays

Problem: both Alice and Bob are behind “NATs”.
- NAT prevents an outside peer from initiating a call to an inside peer (see later)

Solution:
- Using Alice’s and Bob’s SNs, a relay is chosen
- Each peer initiates session with relay.
- Peers can now communicate through NATs via relay


P2P: searching for information

So many files. But where are they?

Index in P2P system: maps information to peer location (location = IP address & port number)


P2P: centralized directory

Original “Napster” design
1) when peer connects, it informs central server:
   - IP address
   - content
2) Alice queries for “X.mp3”
3) Alice requests file from Bob

[Figure: peers, including Alice and Bob, inform the centralized directory server (1); Alice queries the server (2) and requests the file from Bob (3)]


P2P: problems with centralized directory
- Single point of failure
- Performance bottleneck
- Copyright infringement

File transfer is decentralized, but locating content is highly centralized.


P2P: decentralized directory
- Each peer is either a group leader or assigned to a group leader.
- Group leader tracks the content in all its children.
- Peer queries group leader; group leader may query other group leaders.

[Figure: ordinary peers, group-leader peers, and neighboring relationships in the overlay network]


More about decentralized directory

Advantages of approach
- no centralized directory server
  - location service distributed over peers
  - more difficult to shut down

Disadvantages of approach
- bootstrap node needed
- group leaders can get overloaded


P2P: Query flooding
- Gnutella
- no hierarchy
- use bootstrap node to learn about others
- join message

- Send query to neighbors
- Neighbors forward query
- If queried peer has object, it sends message back to querying peer


P2P: more on query flooding

Pros
- peers have similar responsibilities: no group leaders
- highly decentralized
- no peer maintains directory info

Cons
- excessive query traffic
- query radius: may not find content even when it is present
- bootstrap node
- maintenance of overlay network
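Query flooding with a limited radius behaves like a breadth-first search that stops after a fixed number of hops. The sketch below models the overlay as an adjacency dictionary; the hop limit `ttl`, the peer names, and the function name are assumptions made for illustration and do not reproduce Gnutella's message format.

```python
from collections import deque


def flood_query(overlay, holders, start, ttl):
    """Return the peers holding the object that are reached within ttl hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    hits = set()
    while frontier:
        peer, hops = frontier.popleft()
        if peer in holders and peer != start:
            hits.add(peer)          # this peer would reply to the querying peer
        if hops == ttl:
            continue                # query radius reached: do not forward further
        for neighbor in overlay[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits


if __name__ == "__main__":
    # Hypothetical overlay: a chain A - B - C - D, only D has the object.
    overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(flood_query(overlay, {"D"}, "A", ttl=2))  # radius too small: no hit
    print(flood_query(overlay, {"D"}, "A", ttl=3))
```

The first call illustrates the query-radius problem: the object exists in the overlay but lies outside the flooded region.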


DHT: A New Story…

Motivation:
- Frustrated by popularity of all these “half-baked” P2P apps
- We can do better!
  - Guaranteed lookup success for files in system
  - Provable bounds on search time
  - Provable scalability to millions of nodes


P2P: Content Addressing (Hash Routing)

Hash routing
- Given an object identifier I, calculate its hash value H = hash(I), and (hopefully) find it (or its location info) in peer H

Not a new idea
- Load balancing: hash IP address, re-direct to different servers

[Figure: an application calls put(key, data) and get(key) → data on a hash table that is distributed over nodes]


Hash Routing

Two alternatives
- Node can cache each (existing) object that hashes within its range
- Pointer-based: level of indirection; node caches pointer to location(s) of object

What’s new in P2P?
- Dynamic overlay
  - peer join/leave
  - number of peers is not fixed
- Traditional hash function doesn’t work
  - SHA-1

[Figure: overlay nodes labeled with the hash ranges they cover: 0-999, 9500-9999, 1000-1999, 1500-4999, 9000-9500, 4500-6999, 8000-8999, 7000-8500]
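One reading of "traditional hash function doesn't work" is that the classic placement rule, hash(key) mod N, breaks down when N changes: almost every key lands on a different node after a single join. The sketch below illustrates that reading; the mod-N rule and the key names are assumptions for the example, not something stated on the slide.

```python
import hashlib


def node_for(key, num_nodes):
    """Classic placement: SHA-1 of the key, reduced modulo the number of nodes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def fraction_remapped(keys, old_n, new_n):
    """Fraction of keys whose responsible node changes when N goes old_n -> new_n."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"object-{i}" for i in range(1000)]
    # One peer joins a 10-node system: most keys have to move.
    print(fraction_remapped(keys, 10, 11))
```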


Distributed Hash Table (DHT)

Challenges
- For each object, node(s) whose range(s) cover that object must be reachable via a “short” path
- Number of neighbors for each node should scale well (e.g., should not be O(N))
- Fully distributed (no centralized bottleneck/single point of failure)
- DHT mechanism should gracefully handle nodes joining/leaving
  - need to repartition the range space over existing nodes
  - need to reorganize neighbor set
  - need bootstrap mechanism to connect new nodes into the existing DHT infrastructure


Case Studies

Structured overlay (P2P) systems: Consistent Hashing
- Chord
- CAN (Content Addressable Network)

Key questions
- Q1: How is hash space divided “evenly” among existing nodes?
- Q2: How is routing implemented that connects an arbitrary node to the node responsible for a given object?
- Q3: How is the hash space repartitioned when nodes join/leave?

Let N be the number of nodes in the overlay
Let H be the size of the range of the hash function (when applicable)


Chord (from MIT in 2001)

- Associate to each node and file a unique id in a one-dimensional space (a ring)
  - E.g., pick from the range $[0, 2^m - 1]$
  - Usually the hash of the file or IP address
- Properties:
  - Routing table size is O(log N), where N is the total number of nodes
  - Guarantees that a file is found in O(log N) hops


Consistent Hashing

A key is stored at its successor: the node with the next higher ID (key = hashed value of a file identifier)

[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; “K5” denotes key 5 and “N105” denotes node 105]
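The successor rule can be written as a binary search over the sorted node IDs, wrapping around to the smallest ID when the key is larger than every node. The sketch below uses the node and key IDs from the figure; treating a node whose ID equals the key as its successor is the usual Chord convention and is assumed here.

```python
from bisect import bisect_left


def successor(node_ids, key):
    """First node whose ID is equal to or follows key on the circular ID space."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key)      # first position with ring[idx] >= key
    return ring[idx % len(ring)]      # wrap around past the largest ID


if __name__ == "__main__":
    nodes = [32, 90, 105]
    for key in (5, 20, 80):
        print(f"K{key} is stored at N{successor(nodes, key)}")
```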


Chord Basic Lookup

[Figure: ring with nodes N10, N32, N60, N90, N105, N120 and key K80. The query “Where is key 80?” is passed around the ring; the answer is “N90 has K80”.]


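Chord “Finger Table”

Entry i in the finger table of node n is the first node that succeeds or equals $n + 2^i$

In other words, the ith finger points $1/2^{m-i}$ of the way around the ring, where m is the number of bits in an ID

[Figure: fingers of node N80 pointing 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]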
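A finger table can be built by applying the successor rule to n + 2^i for each i. The sketch below assumes a 7-bit ID space (IDs 0 to 127) and the node IDs N10, N32, N60, N90, N105, N120 from the basic-lookup figure; the choice of m = 7 is an assumption made so that those IDs fit, and the function names are illustrative.

```python
from bisect import bisect_left


def successor(ring, ident):
    """First node in the sorted list ring whose ID equals or follows ident."""
    idx = bisect_left(ring, ident)
    return ring[idx % len(ring)]


def finger_table(node_ids, n, m):
    """Entry i is the first node that succeeds or equals (n + 2^i) mod 2^m."""
    ring = sorted(node_ids)
    return [successor(ring, (n + 2 ** i) % 2 ** m) for i in range(m)]


if __name__ == "__main__":
    nodes = [10, 32, 60, 90, 105, 120]
    for i, finger in enumerate(finger_table(nodes, 32, 7)):
        print(i, 32 + 2 ** i, finger)
```

Most low-order fingers point to the same nearby node, while the last few entries reach a quarter and a half of the way around the ring.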


Chord Join

Assume a hash space [0..7]

Node n1 joins

Succ. table of n1:
i  id+2^i  succ
0  2       1
1  3       1
2  5       1

[Figure: ring with positions 0 to 7]


Chord Join

Node n2 joins

Succ. table of n1:
i  id+2^i  succ
0  2       2
1  3       1
2  5       1

Succ. table of n2:
i  id+2^i  succ
0  3       1
1  4       1
2  6       1

[Figure: ring with positions 0 to 7]


Chord Join

Nodes n0, n6 join

Succ. table of n1:
i  id+2^i  succ
0  2       2
1  3       6
2  5       6

Succ. table of n2:
i  id+2^i  succ
0  3       6
1  4       6
2  6       6

Succ. table of n0:
i  id+2^i  succ
0  1       1
1  2       2
2  4       6

Succ. table of n6:
i  id+2^i  succ
0  7       0
1  0       0
2  2       2

[Figure: ring with positions 0 to 7]


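Chord Join

Nodes: n1, n2, n0, n6

Keys: f7, f1 (each key is stored at its successor: f7 at n0, f1 at n1)

Succ. table of n1:
i  id+2^i  succ
0  2       2
1  3       6
2  5       6

Succ. table of n2:
i  id+2^i  succ
0  3       6
1  4       6
2  6       6

Succ. table of n0:
i  id+2^i  succ
0  1       1
1  2       2
2  4       6

Succ. table of n6:
i  id+2^i  succ
0  7       0
1  0       0
2  2       2

[Figure: ring with positions 0 to 7, showing key 7 and key 1 next to the nodes that store them]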


Chord Routing

- Upon receiving a query for a file id, a node first calculates the key (hash of the id)
- Checks whether it stores the key locally
- If not, forwards the query to the largest node in its successor table that does not exceed the key

[Figure: ring with positions 0 to 7, nodes n0, n1, n2, n6 with their successor tables, keys 7 and 1, and a query(7) being routed]
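The routing rule can be simulated on the 3-bit ring from the join example. Because IDs wrap around, "does not exceed the key" is implemented below as a clockwise-interval test rather than a plain numeric comparison; that interpretation, the choice of n1 as the node where query(7) starts, and the use of a global node list to decide whether a key is stored locally are assumptions of this sketch.

```python
M = 3                  # ID bits: hash space [0..7]
RING = 2 ** M
NODES = [0, 1, 2, 6]   # n0, n1, n2, n6


def successor(ident):
    """First node whose ID equals or follows ident on the ring."""
    ident %= RING
    candidates = [n for n in sorted(NODES) if n >= ident]
    return candidates[0] if candidates else min(NODES)


def succ_table(n):
    """Entry i is successor(n + 2^i)."""
    return [successor(n + 2 ** i) for i in range(M)]


def between(x, a, b, inclusive_right=False):
    """True if x lies on the clockwise arc from a (exclusive) to b."""
    x, a, b = x % RING, a % RING, b % RING
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start, key):
    """Return the nodes visited by the query; the last one stores the key."""
    path = [start]
    n = start
    while True:
        if successor(key) == n:            # key is stored locally
            return path
        table = succ_table(n)
        if between(key, n, table[0], inclusive_right=True):
            path.append(table[0])          # immediate successor stores the key
            return path
        # Forward to the farthest table entry that does not overshoot the key.
        n = next(f for f in reversed(table) if between(f, n, key))
        path.append(n)


if __name__ == "__main__":
    print(lookup(1, 7))   # query(7) issued at n1
```

Each hop jumps to the farthest known node that still precedes the key, which is why the number of hops is expected to grow only logarithmically with the number of nodes.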


Chord Summary

Routing table size?
- log N fingers

Routing time?
- Each hop is expected to halve the distance to the desired key => expect O(log N) hops.

Note: so far only the basic Chord; many practical issues remain (not covered in this course though …)


A few words about BitCoin (and other digital/virtual currency)

Two key issues for a currency
- Generation (where does it come from?)
- Distribution (how to use it, i.e., buy/sell transactions?)

BitCoin: open source P2P currency
- Mining (hashing)
- Verification


Chapter 2: Summary

Our study of network apps now complete!

- application service requirements: reliability, bandwidth, delay
- client-server paradigm
- Internet transport service model
  - connection-oriented, reliable: TCP
  - unreliable, datagrams: UDP
- specific protocols: HTTP, FTP, SMTP, POP, IMAP, DNS
- content distribution: caches, CDNs
- P2P


Chapter 2: Summary

More importantly: learned about protocols

- typical request/reply message exchange:
  - client requests info or service
  - server responds with data, status code
- message formats:
  - headers: fields giving info about data
  - data: info being communicated
- control vs. data msgs
  - in-band, out-of-band
- centralized vs. decentralized
- stateless vs. stateful
- reliable vs. unreliable msg transfer
- “complexity at network edge”: many protocols
- security: authentication
