2: Application Layer 1
CMPT 371: Data Communications and Networking
Chapter 2: Application Layer - 2
Chapter 2 outline
- 2.1 Principles of app layer protocols
- 2.2 Web and HTTP
- 2.3 FTP
- 2.4 Electronic Mail
  - SMTP, POP3, IMAP
- 2.5 DNS
- 2.6 Content distribution
  - Network Web caching
  - Content distribution networks
  - P2P file sharing
Content Distribution
Problem of a single server
- Bottleneck, single point of failure, …
Content distribution
- Distribute (replicate) content at different places
- Direct requests to appropriate places
Client-side Caching
Limit of Client-side Caching
Not shared!
Web caches (proxy server)
- user sets browser: Web accesses via proxy
- browser sends all HTTP requests to proxy
  - object in cache: cache returns object
  - else cache requests object from origin server, then returns object to client
- Proxy: both client and server; typically installed by ISP (university, company, residential ISP)
Goal: satisfy client request without involving origin server
[Figure: two clients exchange HTTP requests and responses with a proxy server, which exchanges HTTP requests and responses with two origin servers]
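The hit/miss behaviour of a Web cache can be sketched in a few lines. This is only an illustration, not a real proxy: `ProxyCache`, `fake_origin`, and the URL are hypothetical names, and expiry and consistency are ignored.

```python
class ProxyCache:
    """Toy Web cache: return the cached object on a hit, else ask the origin."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin is a hypothetical callable: url -> bytes
        self.fetch_from_origin = fetch_from_origin
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.store:            # object in cache: cache returns object
            self.hits += 1
            return self.store[url]
        self.misses += 1                 # else: request object from origin server
        obj = self.fetch_from_origin(url)
        self.store[url] = obj            # keep a copy, then return it to the client
        return obj


origin_requests = []                     # records which requests reached the origin


def fake_origin(url):
    origin_requests.append(url)
    return ("object at " + url).encode()


proxy = ProxyCache(fake_origin)
first = proxy.get("http://www.foo.com/a.gif")
second = proxy.get("http://www.foo.com/a.gif")
```

Only the first request reaches the (fake) origin server; the second is satisfied by the proxy alone.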
More about Web caching
Why Web caching?
- Reduce response time for client requests.
- Reduce traffic on an institution’s access link.
- An Internet dense with caches enables “poor” content providers to effectively deliver content.
More about Web caching
Why is caching effective for the Web, even if cache space is quite limited?
Zipf distribution:
$$p_j = \frac{1/j}{\sum_{i=1}^{|S|} 1/i}, \qquad j = 1, 2, \ldots, |S|$$
[Figure: access probability (0 to 1) versus object ID (1 to 10)]
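To see why a small cache helps under a Zipf distribution, the sketch below computes the access probabilities and the share of requests that go to the most popular objects. It assumes independent requests and a cache that always holds exactly the `cache_slots` most popular objects; both function names are illustrative.

```python
def zipf_probabilities(num_objects):
    """p_j proportional to 1/j for j = 1..num_objects, normalised to sum to 1."""
    weights = [1.0 / j for j in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def best_case_hit_ratio(num_objects, cache_slots):
    """Share of requests that hit if the cache holds the most popular objects."""
    return sum(zipf_probabilities(num_objects)[:cache_slots])


# Caching only 1% of 1000 objects already captures well over a third of requests.
ratio = best_case_hit_ratio(1000, 10)
```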
Caching example (1)
Assumptions
- average object size = 100,000 bits
- avg. request rate from institution’s browsers to origin servers = 15/sec
- delay from institutional router to any origin server and back to router = 2 sec
Consequences
- utilization on LAN = 15%
- utilization on access link = 100%!
- total delay = Internet delay + access delay + LAN delay = 2 sec + some minutes + some milliseconds
[Figure: origin servers, public Internet, 1.5 Mbps access link, institutional network with 10 Mbps LAN and institutional cache]
Caching example (2)
Possible solution
- increase bandwidth of access link to, say, 10 Mbps
Consequences (if 10 Mbps)
- utilization on LAN = 15%
- utilization on access link = 15%
- total delay = Internet delay + access delay + LAN delay = 2 sec + some msecs + some msecs
- often a costly upgrade
[Figure: origin servers, public Internet, access link upgraded from 1.5 to 10 Mbps, institutional network with 10 Mbps LAN and institutional cache]
Caching example (3)
Install cache
- suppose hit rate is 0.4
Consequence
- 40% of requests will be satisfied almost immediately
- 60% of requests satisfied by origin server
- utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec)
- total delay = Internet delay + access delay + LAN delay = 60% * 2 sec + 40% * 0.01 sec + some milliseconds < 1.3 sec
[Figure: origin servers, public Internet, 1.5 Mbps access link, institutional network with 10 Mbps LAN and institutional cache]
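The numbers in the three caching examples can be reproduced with a little arithmetic. The sketch below uses only the figures stated on the slides; the constant and function names are illustrative.

```python
OBJECT_BITS = 100_000        # average object size
REQUESTS_PER_SEC = 15        # average request rate
LAN_BPS = 10_000_000         # 10 Mbps LAN
INTERNET_DELAY_SEC = 2.0     # router -> origin server -> router


def utilization(link_bps, miss_fraction=1.0):
    """Offered load on a link divided by its capacity."""
    return miss_fraction * REQUESTS_PER_SEC * OBJECT_BITS / link_bps


lan = utilization(LAN_BPS)                                   # example (1): 15%
slow_access = utilization(1_500_000)                         # example (1): 100%
fast_access = utilization(10_000_000)                        # example (2): 15%
cached_access = utilization(1_500_000, miss_fraction=0.6)    # example (3): 60%

# Example (3): 60% of requests pay the Internet delay, 40% are served in ~10 msec.
avg_delay = 0.6 * INTERNET_DELAY_SEC + 0.4 * 0.01            # about 1.2 sec
```

The cache brings the access link from saturation down to 60% utilization without the cost of the bandwidth upgrade.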
More about Web caching
Problem of Web caching
- Extra space/machine (proxy)
- Inconsistency (out-of-date objects…)
Consistency of Cached Objects
Solution 1: no caching
Consistency of Cached Objects
Solution 2: manually update
Conditional GET
Goal: don’t send object if client has up-to-date cached version
- client: specify date of cached copy in HTTP request
    If-modified-since: <date>
- server: response contains no object if cached copy is up-to-date:
    HTTP/1.0 304 Not Modified
[Figure: client/server exchange in two cases. Object not modified: HTTP request msg with If-modified-since: <date>, HTTP response HTTP/1.0 304 Not Modified. Object modified: HTTP request msg with If-modified-since: <date>, HTTP response HTTP/1.0 200 OK <data>]
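The server-side decision behind a conditional GET might look like the sketch below. It is a simplification under stated assumptions: `respond` is a hypothetical helper, only the `If-modified-since` date is considered, and dates are parsed with Python's standard `email.utils` functions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(if_modified_since, last_modified, body):
    """Return (status_line, body) for a GET with an optional If-modified-since value."""
    if if_modified_since is not None:
        cached_at = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_at:
            # Cached copy is up to date: send no object.
            return "HTTP/1.0 304 Not Modified", b""
    return "HTTP/1.0 200 OK", body


last_modified = datetime(2015, 9, 9, 9, 23, 24, tzinfo=timezone.utc)
header = format_datetime(last_modified, usegmt=True)   # date of the client's cached copy

unchanged = respond(header, last_modified, b"<data>")
changed = respond(header, last_modified + timedelta(hours=1), b"<data>")
```

`unchanged` carries a 304 status and an empty body, while `changed` carries 200 OK and the data, matching the two cases on the slide.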
Hierarchical cache
How to measure a cache? Hit Ratio (HR)
Calculation
- HR of Proxy 1 = 90%
- HR of Proxy 2 = 95%
- Independent
- Joint HR = ? 99.5%
[Figure: clients, Proxy Server 1, Proxy Server 2, origin server]
Hierarchical cache
How to measure a cache? Hit Ratio (HR)
Calculation
- HR of Proxy 1 = 90%
- HR of Proxy 2 = 95%
- Independent
- Joint HR = ?
- How about average delay?
[Figure: clients, Proxy Server 1, Proxy Server 2, origin server]
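With independent proxies, a request misses the hierarchy only if it misses at every level, so the joint HR is 1 - (1 - 0.90)(1 - 0.95) = 99.5%. The sketch below computes this and an average delay. The per-level delays (0.01 s, 0.1 s, 2 s) are made-up values chosen only to illustrate the calculation.

```python
def joint_hit_ratio(hit_ratios):
    """Probability that at least one proxy in the chain has the object."""
    miss = 1.0
    for hr in hit_ratios:
        miss *= 1.0 - hr
    return 1.0 - miss


def average_delay(hit_ratios, level_delays, origin_delay):
    """Expected delay when each proxy is tried in order, then the origin server."""
    expected, reach = 0.0, 1.0          # reach = probability the request gets this far
    for hr, delay in zip(hit_ratios, level_delays):
        expected += reach * hr * delay
        reach *= 1.0 - hr
    return expected + reach * origin_delay


joint = joint_hit_ratio([0.90, 0.95])
delay = average_delay([0.90, 0.95], [0.01, 0.1], origin_delay=2.0)  # hypothetical delays
```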
Content distribution networks (CDNs)
Ping www.Microsoft.com, www.Netflix.com, www.ibm.com, www.apple.com
What did you see?
[Figure: origin server in North America; CDN distribution node; CDN servers in S. America, Europe, and Asia]
Content distribution networks (CDNs)
Content replication
- CDN company (e.g., Akamai) installs hundreds of CDN servers throughout the Internet
  - in lower-tier ISPs, close to users
- Content providers (e.g., Netflix) are the CDN company’s customers.
- CDN replicates its customers’ content in CDN servers. When a provider updates content, the CDN updates its servers.
[Figure: origin server in North America; CDN distribution node; CDN servers in S. America, Europe, and Asia]
CDN example
origin server (www.foo.com)
- distributes HTML
- replaces: http://www.foo.com/sports/ruth.gif
  with http://www.cdn.com/www.foo.com/sports/ruth.gif
CDN company (cdn.com)
- distributes gif files
- uses its authoritative DNS server to redirect requests
Steps
1. HTTP request for www.foo.com/sports/sports.html (to the origin server)
2. DNS query for www.cdn.com (to the CDN’s authoritative DNS server)
3. HTTP request for www.cdn.com/www.foo.com/sports/ruth.gif (to a nearby CDN server)
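The URL replacement done by the origin server can be sketched as a small rewriting function. The helper name `rewrite_for_cdn` is hypothetical; the host names are the ones used in the example.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_for_cdn(url, cdn_host="www.cdn.com"):
    """Prefix the original host to the path and point the URL at the CDN host."""
    parts = urlsplit(url)
    new_path = "/" + parts.netloc + parts.path
    return urlunsplit((parts.scheme, cdn_host, new_path, parts.query, parts.fragment))


rewritten = rewrite_for_cdn("http://www.foo.com/sports/ruth.gif")
```

Because the rewritten URL names www.cdn.com, the browser's DNS query goes to the CDN's authoritative DNS server, which can then pick a nearby CDN server.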
More about CDNs
Routing requests
- CDN creates a “map”, indicating distances between leaf ISPs and CDN nodes
- when a query arrives at the authoritative DNS server:
  - server determines ISP from which query originates
  - uses “map” to determine best CDN server
Caching vs. CDN
- Caching is pull: passive
- CDN is push: active
Client-server architecture
server:
- always-on host
- permanent IP address
- server farms for scaling
clients:
- communicate with server
- may be intermittently connected
- may have dynamic IP addresses
- do not communicate directly with each other
[Figure: client/server]
Pure P2P architecture
- no always-on server
- arbitrary end systems directly communicate
- peers are intermittently connected and change IP addresses
- self scalability: new peers bring new resources
Three topics:
- File distribution
- Searching for information
- Case study: BitTorrent, Skype
[Figure: peer-peer]
P2P file sharing
Example
- Alice runs P2P client application on her notebook computer
- Intermittently connects to Internet
- Asks for “X.mp3”
- Application displays other peers that have a copy of X.mp3.
- Alice chooses one of the peers, Bob.
- File is copied from Bob’s PC to Alice’s notebook: HTTP
- While Alice downloads, other users can be downloading from Alice.
- Alice’s peer is both a Web client and a transient Web server
All peers are servers = highly scalable!
File Distribution: Server-Client vs P2P
Question: How much time to distribute a file from one server to N peers?
- $u_s$: server upload bandwidth
- $u_i$: peer $i$ upload bandwidth
- $d_i$: peer $i$ download bandwidth
[Figure: server with upload rate $u_s$ and a file of size $F$; peers $1, \ldots, N$ with rates $u_i$ and $d_i$; network with abundant bandwidth]
File distribution time: server-client
- Server transmission: must sequentially send (upload) N copies: $NF/u_s$ time
- Client: each must download the file
  - client $i$ takes $F/d_i$ time to download
Time to distribute $F$ to $N$ clients using the client/server approach:
$$D_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$
increases linearly in N (for large N)
[Figure: server with upload rate $u_s$ and a file of size $F$; peers with rates $u_i$ and $d_i$; network with abundant bandwidth]
File distribution time: P2P
- Server transmission: must upload at least one copy: $F/u_s$ time
- Client: each must download a copy
  - client $i$ takes $F/d_i$ time to download
  - but also shares (uploads)
- Clients (peers): as a whole must download $NF$ bits
  - fastest possible overall download rate: $u_s + \sum_i u_i$
$$D_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_{i=1}^{N} u_i} \right\}$$
increases linearly in N (for large N)?
[Figure: server with upload rate $u_s$ and a file of size $F$; peers with rates $u_i$ and $d_i$; network with abundant bandwidth]
Server-client vs. P2P: example
Client upload rate = $u$, $F/u$ = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
[Figure: minimum distribution time (0 to 3.5) versus N (0 to 35), one curve for P2P and one for client-server]
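The two distribution-time bounds can be evaluated for this example. The sketch normalises $u = 1$ and $F = 1$ so that $F/u$ is one hour, and takes $d_{min} = u_s$ (the smallest value the example allows); the function names are illustrative.

```python
def d_client_server(n, f, u_s, d_min):
    """Lower bound on distribution time for the client/server approach."""
    return max(n * f / u_s, f / d_min)


def d_p2p(n, f, u_s, d_min, peer_uploads):
    """Lower bound on distribution time for the P2P approach."""
    return max(f / u_s, f / d_min, n * f / (u_s + sum(peer_uploads)))


U, F = 1.0, 1.0            # normalised so that F/u = 1 hour
U_S = 10 * U               # server upload rate
D_MIN = U_S                # assumption: slowest download rate equals u_s

for n in (5, 10, 20, 30):
    cs = d_client_server(n, F, U_S, D_MIN)
    p2p = d_p2p(n, F, U_S, D_MIN, [U] * n)
    print(f"N={n:2d}  client-server={cs:.2f} h  P2P={p2p:.2f} h")
```

Under these assumptions the client-server time grows as N/10 hours, while the P2P time is N/(10 + N) hours once N is large enough and therefore stays below one hour.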
File distribution: BitTorrent
P2P file distribution
- tracker: tracks peers participating in torrent
- torrent: group of peers exchanging chunks of a file
[Figure: Alice arrives, obtains list of peers from tracker, and begins exchanging file chunks with peers in torrent]
BitTorrent (1)
- file divided into 256KB chunks.
- peer joining torrent:
  - has no chunks, but will accumulate them over time
  - registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
- while downloading, peer uploads chunks to other peers.
- peers may come and go: churn
- once peer has entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent (2)
Requesting chunks
- at any given time, different peers have different subsets of file chunks
- periodically, a peer (Alice) asks each neighbor for the list of chunks that they have.
- Alice sends requests for her missing chunks
  - rarest first
Sending chunks: tit-for-tat
- Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  - re-evaluate top 4 every 10 secs
- every 30 secs: randomly select another peer, start sending chunks
  - newly chosen peer may join top 4
  - “optimistically unchoke”
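The two selection rules, rarest first and top-four unchoking, can be sketched as follows. This is a simplified illustration rather than the BitTorrent wire protocol: peer names, chunk numbers, and rates are made up, and timers and optimistic unchoking are left out.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks Alice is missing from rarest to most common among neighbors."""
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)      # only chunks Alice does not have yet
    return sorted(counts, key=lambda c: (counts[c], c))


def top_uploaders(rates, k=4):
    """Neighbors currently sending Alice chunks at the highest rates."""
    return sorted(rates, key=rates.get, reverse=True)[:k]


neighbors = {"bob": {0, 1, 2}, "carol": {1, 3}, "dave": {1, 2}}
request_order = rarest_first({0}, neighbors)

rates = {"bob": 50, "carol": 10, "dave": 30, "eve": 40, "frank": 20}
unchoked = top_uploaders(rates)
```

Chunk 3 is held by only one neighbor, so it is requested first; carol has the lowest rate and is the one neighbor left out of the top four.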
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With higher upload rate, can find better trading partners & get file faster!
P2P Case study: Skype
- inherently P2P: pairs of users communicate.
- proprietary application-layer protocol (inferred via reverse engineering)
- hierarchical overlay with supernodes (SNs)
- Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), Skype login server]
Peers as relays
- Problem when both Alice and Bob are behind “NATs”.
  - NAT prevents an outside peer from initiating a call to an inside peer (see later)
- Solution:
  - Using Alice’s and Bob’s SNs, a relay is chosen
  - Each peer initiates session with relay.
  - Peers can now communicate through NATs via relay
P2P: searching for information
So many files. But where are they?
Index in P2P system: maps information to peer location (location = IP address & port number)
P2P: centralized directory
original “Napster” design
1) when peer connects, it informs central server:
  - IP address
  - content
2) Alice queries for “X.mp3”
3) Alice requests file from Bob
[Figure: peers, including Alice and Bob, around a centralized directory server; arrows labeled 1, 2, and 3 for the three steps]
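A centralized directory is essentially an index from content names to peer locations. The sketch below illustrates steps 1 and 2; `DirectoryServer`, the addresses, and the port numbers are hypothetical, and nothing here reflects Napster's actual protocol.

```python
from collections import defaultdict


class DirectoryServer:
    """Toy central index: content name -> set of (ip, port) peer locations."""

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, ip, port, contents):
        # Step 1: a connecting peer reports its address and the content it holds.
        for name in contents:
            self.index[name].add((ip, port))

    def query(self, name):
        # Step 2: return the peers that hold the requested content.
        return sorted(self.index.get(name, set()))


directory = DirectoryServer()
directory.register("10.0.0.2", 6699, ["X.mp3", "Y.mp3"])   # Bob (made-up address)
directory.register("10.0.0.3", 6699, ["Y.mp3"])            # another peer
holders = directory.query("X.mp3")
```

Step 3, the file transfer itself, then happens directly between Alice and one of the returned peers.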
P2P: problems with centralized directory
- Single point of failure
- Performance bottleneck
- Copyright infringement
file transfer is decentralized, but locating content is highly centralized
P2P: decentralized directory
- Each peer is either a group leader or assigned to a group leader.
- Group leader tracks the content in all its children.
- Peer queries group leader; group leader may query other group leaders.
[Figure: ordinary peers, group-leader peers, and neighboring relationships in the overlay network]
More about decentralized directory
advantages of approach
- no centralized directory server
  - location service distributed over peers
  - more difficult to shut down
disadvantages of approach
- bootstrap node needed
- group leaders can get overloaded
P2P: Query flooding
- Gnutella
- no hierarchy
- use bootstrap node to learn about others
- join message
- Send query to neighbors
- Neighbors forward query
- If queried peer has object, it sends message back to querying peer
[Figure: join]
P2P: more on query flooding
Pros
- peers have similar responsibilities: no group leaders
- highly decentralized
- no peer maintains directory info
Cons
- excessive query traffic
- query radius: may not find content even when it is present
- bootstrap node
- maintenance of overlay network
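Query flooding with a limited radius can be sketched as a breadth-first search over the overlay. The overlay, peer names, and the `ttl` hop limit are illustrative assumptions; real Gnutella messages carry more state than this.

```python
from collections import deque


def flood_query(overlay, holders, start, ttl):
    """Return (peers holding the object within ttl hops, number of query messages sent)."""
    seen = {start}
    frontier = deque([(start, 0)])
    found, messages = [], 0
    while frontier:
        peer, hops = frontier.popleft()
        if peer in holders and peer != start:
            found.append(peer)             # this peer would answer the querying peer
        if hops == ttl:
            continue                       # query radius reached: do not forward
        for neighbor in overlay[peer]:
            messages += 1                  # every forwarded copy costs a message
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return found, messages


# A simple chain A - B - C - D where only D holds the object.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
near = flood_query(overlay, {"D"}, "A", ttl=2)
far = flood_query(overlay, {"D"}, "A", ttl=3)
```

With a radius of 2 the query never reaches D even though the object exists; a radius of 3 finds it at the cost of more messages, which is the trade-off listed under the cons.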
DHT: A New Story…
Motivation:
- Frustrated by popularity of all these “half-baked” P2P apps
- We can do better!
  - Guaranteed lookup success for files in system
  - Provable bounds on search time
  - Provable scalability to millions of nodes
P2P: Content Addressing (Hash Routing)
Hash routing
- Given an object identifier I, calculate its hash value H = hash(I), and (hopefully) find it (or its location info) in peer H
Not a new idea
- Load balancing: hash IP address, re-direct to different servers
[Figure: an application calls put(key, data) and get(key) -> data on a hash table spread over many nodes]
Hash Routing
Two alternatives
- Node can cache each (existing) object that hashes within its range
- Pointer-based: level of indirection; node caches pointer to location(s) of object
What’s new in P2P?
- Dynamic overlay
  - peer join/leave
  - number of peers is not fixed
- Traditional hash function doesn’t work
  - SHA-1
[Figure: overlay nodes covering hash ranges such as 0-999, 1000-1999, 1500-4999, 4500-6999, 7000-8500, 8000-8999, 9000-9500, 9500-9999]
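One way to see why a fixed hash-to-node mapping struggles when peers join and leave is to count how many keys change owner when the node count changes. The sketch assumes the simplest static scheme, SHA-1 of the key taken modulo the number of nodes; that scheme and the key names are illustrative assumptions, not something the slides specify.

```python
import hashlib


def bucket(key, num_nodes):
    """Static placement: SHA-1 of the key, reduced modulo the number of nodes."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % num_nodes


def fraction_moved(keys, old_n, new_n):
    """Share of keys whose responsible node changes when the node count changes."""
    moved = sum(1 for k in keys if bucket(k, old_n) != bucket(k, new_n))
    return moved / len(keys)


keys = ["file-%d" % i for i in range(2000)]
moved = fraction_moved(keys, 4, 5)      # one peer joins a 4-node system
```

A single join relocates the large majority of keys under this scheme, which is the kind of disruption consistent hashing is designed to avoid.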
Distributed Hash Table (DHT)
Challenges
- For each object, node(s) whose range(s) cover that object must be reachable via a “short” path
- # neighbors for each node should scale well (e.g., should not be O(N))
- Fully distributed (no centralized bottleneck/single point of failure)
- DHT mechanism should gracefully handle nodes joining/leaving
  - need to repartition the range space over existing nodes
  - need to reorganize neighbor set
  - need bootstrap mechanism to connect new nodes into the existing DHT infrastructure
Case Studies
Structured overlay (P2P) systems: Consistent Hashing
- Chord
- CAN (Content Addressable Network)
Key Questions
- Q1: How is hash space divided “evenly” among existing nodes?
- Q2: How is routing implemented that connects an arbitrary node to the node responsible for a given object?
- Q3: How is the hash space repartitioned when nodes join/leave?
Let N be the number of nodes in the overlay
Let H be the size of the range of the hash function (when applicable)
Chord (from MIT in 2001)
- Associate to each node and file a unique id in a one-dimensional space (a ring)
  - E.g., pick from the range $[0, 2^m - 1]$
  - Usually the hash of the file or IP address
- Properties:
  - Routing table size is O(log N), where N is the total number of nodes
  - Guarantees that a file is found in O(log N) hops
Consistent Hashing
A key is stored at its successor: node with next higher ID (key = hashed value of a file identifier)
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; legend: K5 = key 5, N105 = node 105]
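The successor rule can be sketched with a sorted list of node IDs and a binary search. The node and key IDs are the ones in the figure; the `successor` helper is illustrative and assumes a key equal to a node ID is stored at that node.

```python
from bisect import bisect_left


def successor(node_ids, key_id):
    """First node whose ID equals or follows key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]        # idx == len(ring) wraps to the smallest ID


nodes = [32, 90, 105]
placement = {key: successor(nodes, key) for key in (5, 20, 80)}

# If a node with ID 60 joins, only keys in (32, 60] would change owner.
after_join = {key: successor(nodes + [60], key) for key in (5, 20, 80)}
```

Keys 5 and 20 land on N32 and key 80 on N90; adding N60 moves none of these three keys.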
Chord Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120 and key K80; the query “Where is key 80?” travels around the ring and is answered with “N90 has K80”]
Chord “Finger Table”
- Entry $i$ in the finger table of node $n$ is the first node that succeeds or equals $n + 2^i$
- In other words, the $i$th finger points $1/2^{m-i}$ of the way around the ring, where $m$ is the number of bits in an ID
[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]
Chord Join
- Assume a hash space [0..7]
- Node n1 joins
Succ. Table of n1
i   id+2^i   succ
0   2        1
1   3        1
2   5        1
[Figure: ring with positions 0 to 7]
Chord Join
- Node n2 joins
Succ. Tables (each cell: id+2^i -> succ)
node   i=0      i=1      i=2
n1     2 -> 2   3 -> 1   5 -> 1
n2     3 -> 1   4 -> 1   6 -> 1
[Figure: ring with positions 0 to 7]
Chord Join
- Nodes n0, n6 join
Succ. Tables (each cell: id+2^i -> succ)
node   i=0      i=1      i=2
n0     1 -> 1   2 -> 2   4 -> 6
n1     2 -> 2   3 -> 6   5 -> 6
n2     3 -> 6   4 -> 6   6 -> 6
n6     7 -> 0   0 -> 0   2 -> 2
[Figure: ring with positions 0 to 7]
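The successor tables in the join example can be recomputed from the set of node IDs. The sketch below assumes every node already knows the full membership, which real Chord nodes do not; it only checks the table contents. Function names are illustrative.

```python
from bisect import bisect_left


def successor(nodes, ident, m):
    """First node whose ID equals or follows ident on the ring of size 2**m."""
    ring = sorted(nodes)
    idx = bisect_left(ring, ident % (2 ** m))
    return ring[idx % len(ring)]


def finger_table(nodes, n, m):
    """Rows (i, (n + 2**i) mod 2**m, succ) for node n."""
    rows = []
    for i in range(m):
        target = (n + 2 ** i) % (2 ** m)
        rows.append((i, target, successor(nodes, target, m)))
    return rows


M = 3                                     # hash space [0..7]
for node in (0, 1, 2, 6):
    print("n%d" % node, finger_table([0, 1, 2, 6], node, M))
```

Calling `finger_table` with node sets `[1]`, `[1, 2]`, and `[0, 1, 2, 6]` follows the three stages of the join example.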
Chord Join
- Nodes: n1, n2, n0, n6
- Keys: f7, f1
Succ. Tables (each cell: id+2^i -> succ)
node   i=0      i=1      i=2      keys stored
n0     1 -> 1   2 -> 2   4 -> 6   7
n1     2 -> 2   3 -> 6   5 -> 6   1
n2     3 -> 6   4 -> 6   6 -> 6
n6     7 -> 0   0 -> 0   2 -> 2
[Figure: ring with positions 0 to 7]
Chord Routing
- Upon receiving a query for a file id, a node first calculates the key (hash of the id)
- Checks whether it stores the key locally
- If not, forwards the query to the largest node in its successor table that does not exceed the key
Succ. Tables (each cell: id+2^i -> succ)
node   i=0      i=1      i=2      keys stored
n0     1 -> 1   2 -> 2   4 -> 6   7
n1     2 -> 2   3 -> 6   5 -> 6   1
n2     3 -> 6   4 -> 6   6 -> 6
n6     7 -> 0   0 -> 0   2 -> 2
[Figure: ring with positions 0 to 7; a query(7) is routed over the ring]
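A lookup over the four-node ring can be sketched as below. Two assumptions go beyond the slide: "does not exceed the key" is interpreted on the ring (modulo 8), following the usual closest-preceding-finger formulation of Chord, and each node is assumed to know its predecessor so it can tell whether it stores a key. The starting node for query(7) is also an assumption.

```python
from bisect import bisect_left

M = 3                        # ID bits, so the hash space is [0..7]
SIZE = 2 ** M
NODES = [0, 1, 2, 6]         # sorted node IDs from the join example


def successor(ident):
    idx = bisect_left(NODES, ident % SIZE)
    return NODES[idx % len(NODES)]


def fingers(n):
    """succ column of node n's table; entry 0 is n's immediate successor."""
    return [successor(n + 2 ** i) for i in range(M)]


def predecessor(n):
    return NODES[(NODES.index(n) - 1) % len(NODES)]


def between(x, a, b, closed_right):
    """x in ring interval (a, b) or (a, b]; a == b denotes the whole ring."""
    x, a, b = x % SIZE, a % SIZE, b % SIZE
    if closed_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start, key):
    """Return (node storing key, list of nodes visited)."""
    node, path = start, [start]
    while not between(key, predecessor(node), node, True):   # not stored locally
        table = fingers(node)
        nxt = table[0]                                        # default: successor
        if not between(key, node, table[0], True):
            for f in reversed(table):                         # farthest finger first
                if between(f, node, key, False):              # finger precedes key
                    nxt = f
                    break
        node = nxt
        path.append(node)
    return node, path


owner, path = lookup(1, 7)   # query(7) issued at n1 (assumed starting node)
```

Starting at n1, the query jumps to n6 (the farthest finger that precedes key 7) and then to n0, which stores f7.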
Chord Summary
- Routing table size? log N fingers
- Routing time? Each hop is expected to halve the distance to the desired key => expect O(log N) hops.
Note: so far only the basic Chord; many practical issues remain (not covered in this course though …)
A few words about BitCoin (and other digital/virtual currency)
Two key issues for a currency
- Generation (where does it come from?)
- Distribution (how to use it, i.e., buy/sell transactions?)
BitCoin: open source P2P currency
- Mining (hashing)
- Verification
Chapter 2: Summary
Our study of network apps now complete!
- application service requirements:
  - reliability, bandwidth, delay
- client-server paradigm
- Internet transport service model
  - connection-oriented, reliable: TCP
  - unreliable, datagrams: UDP
- specific protocols:
  - HTTP
  - FTP
  - SMTP, POP, IMAP
  - DNS
- content distribution
  - caches, CDNs
  - P2P
Chapter 2: Summary
More importantly: learned about protocols
- typical request/reply message exchange:
  - client requests info or service
  - server responds with data, status code
- message formats:
  - headers: fields giving info about data
  - data: info being communicated
- control vs. data msgs
  - in-band, out-of-band
- centralized vs. decentralized
- stateless vs. stateful
- reliable vs. unreliable msg transfer
- “complexity at network edge”: many protocols
- security: authentication