Introduction (Protocol Layering, Security) &
Application Layer (Principles, Web, FTP)
Computer Networks and Applications
Week 2
COMP 3331/COMP 9331
Reading Guide: Chapter 1, Sections 1.5 - 1.7 Chapter 2, Sections 2.1 – 2.3
Course Introduction
v Web: http://www.cse.unsw.edu.au/~cs3331
v Read course outline on the webpage
v Labs begin in Week 2
§ Please attend your allocated slot
§ Please read the Tools of the Trade introduction document
§ You get one week to work on your reports. Lab reports are due in Week 3 before your next lab, e.g., for students attending the Monday 12 noon lab, the report is due 11:59am, Monday.
§ Individual submissions please
RECAP
1. Introduction: roadmap
1.1 what is the Internet?
1.2 network edge
§ end systems, access networks, links
1.3 network core
§ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history
Three (networking) design steps
v Break down the problem into tasks
v Organize these tasks
v Decide who does what
Tasks in Networking
v What does it take to send packets across country?
v Simplistic decomposition:
§ Task 1: send along a single wire
§ Task 2: stitch these together to go across country
v This gives an idea of what is meant by decomposition
Tasks in Networking (bottom up)
v Bits on wire
v Packets on wire
v Deliver packets within local network
v Deliver packets across global network
v Ensure that packets get to the destination
v Do something with the data
Resulting Modules
v Bits on wire (Physical)
v Packets on wire (Physical)
v Deliver packets within local network (Datalink)
v Deliver packets across global network (Network)
v Ensure that packets get to the destination (Transport)
v Do something with the data (Application)
This is decomposition… Now, how do we organize these tasks?
Inspiration…
v CEO A writes letter to CEO B
§ Folds letter and hands it to administrative aide
v Aide:
§ Puts letter in envelope with CEO B’s full name
§ Takes to FedEx
v FedEx Office
§ Puts letter in larger envelope
§ Puts name and street address on FedEx envelope
§ Puts package on FedEx delivery truck
v FedEx delivers to other company
[Figure: the letter, reading “Dear John, Your days are numbered. --Pat”]
The Path of the Letter
[Figure: CEO, Aide, and FedEx on each side. CEO level: Letter (Semantic Content). Aide level: Envelope (Identity). FedEx level: Fedex Envelope (FE) (Location).]
v “Peers” on each side understand the same things
v No one else needs to (abstraction)
v Lowest level has most packaging
The Path Through FedEx
[Figure: the FedEx envelope (FE) passes through Truck, Sorting Office, and Airport stages; it is packed into a Crate, and later a New Crate, while in transit]
v Higher “Stack” at Ends
v Partial “Stack” During Transit
v Deepest Packaging (Envelope+FE+Crate) at the Lowest Level of Transport
v Highest Level of “Transit Stack” is Routing
In the context of the Internet
Applications
…built on…
Reliable (or unreliable) transport
…built on…
Best-effort global packet delivery
…built on…
Best-effort local packet delivery
…built on…
Physical transfer of bits
Internet protocol stack
v application: supporting network applications
§ FTP, SMTP, HTTP, Skype, ...
v transport: process-process data transfer
§ TCP, UDP
v network: routing of datagrams from source to destination
§ IP, routing protocols
v link: data transfer between neighboring network elements
§ Ethernet, 802.11 (WiFi), PPP
v physical: bits “on the wire”
Three Observations
v Each layer:
§ Depends on layer below
§ Supports layer above
§ Independent of others
v Multiple versions in layer
§ Interfaces differ somewhat
§ Components pick which lower-level protocol to use
v But only one IP layer
§ Unifying protocol
Quiz: What are the benefits of layering?
An Example: No Layering
v No layering: each new application has to be re-implemented for every network technology!
[Figure: each Application (Skype, ssh, HTTP) connects directly to each Transmission Media (Wireless, Ethernet, Fiber optic)]
An Example: Benefit of Layering
v Introducing an intermediate layer provides a common abstraction for various network technologies
[Figure: Application (Skype, ssh, HTTP) and Transmission Media (Wireless, Ethernet, Fiber optic) joined through an intermediate Transport & Network layer]
Is Layering Harmful?
v Layer N may duplicate lower level functionality
§ E.g., error recovery to retransmit lost data
v Information hiding may hurt performance
§ E.g., packet loss due to corruption vs. congestion
v Headers start to get really big
§ E.g., typically TCP + IP + Ethernet headers add up to 54 bytes
v Layer violations when the gains are too great to resist
§ E.g., TCP-over-wireless
v Layer violations when network doesn’t trust ends
§ E.g., Firewalls
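The 54-byte figure can be checked with a little arithmetic. The sketch below assumes minimum-size headers (20-byte TCP and 20-byte IPv4 headers without options, a 14-byte Ethernet header without preamble or frame check sequence); these sizes and the function name are assumptions for illustration, not taken from the slides.

```python
# Assumed minimum header sizes in bytes (no TCP/IP options, no VLAN tag;
# Ethernet preamble and frame check sequence are not counted).
TCP_HEADER = 20
IPV4_HEADER = 20
ETHERNET_HEADER = 14

total_headers = TCP_HEADER + IPV4_HEADER + ETHERNET_HEADER


def header_overhead(payload_bytes: int) -> float:
    """Fraction of the bytes sent that are headers rather than payload."""
    return total_headers / (total_headers + payload_bytes)


print(total_headers)                      # 54
print(round(header_overhead(1), 3))       # one-byte payload: mostly headers
print(round(header_overhead(1460), 3))    # full-size payload: small overhead
```

The overhead matters most for tiny payloads, which is one reason small-message applications feel the cost of layering more.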
Distributing Layers Across Network
v Layers are simple if only on a single machine
§ Just stack of modules interacting with those above/below
v But we need to implement layers across machines
§ Hosts
§ Routers
§ Switches
v What gets implemented where?
What Gets Implemented on Host?
v Bits arrive on wire, must make it up to application
v Therefore, all layers must exist at host!
What Gets Implemented on Router?
v Bits arrive on wire
§ Physical layer necessary
v Packets must be delivered to next-hop
§ Datalink layer necessary
v Routers participate in global delivery
§ Network layer necessary
v Routers don’t support reliable delivery
§ Transport layer (and above) not supported
Internet Layered Architecture
[Figure: two hosts, each running HTTP / TCP / IP / Ethernet interface, connected through two routers that run IP over Ethernet and SONET interfaces. An HTTP message and a TCP segment are exchanged between the hosts; an IP packet is carried on each of the three hops.]
Logical Communication
v Layers interact with the peer’s corresponding layer
[Figure: Host A and Host B each have Application, Transport, Network, Datalink, and Physical layers; the Router between them has Network, Datalink, and Physical layers]
Physical Communication
v Communication goes down to physical network
v Then from network peer to peer
v Then up to relevant layer
[Figure: Host A and Host B each have Application, Transport, Network, Datalink, and Physical layers; the Router between them has Network, Datalink, and Physical layers]
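Encapsulation
[Figure: source host (application, transport, network, link, physical), switch (link, physical), router (network, link, physical), destination host (application, transport, network, link, physical)]
v application: message M
v transport: segment = Ht + M
v network: datagram = Hn + Ht + M
v link: frame = Hl + Hn + Ht + M
v the switch handles the frame up to the link layer, the router up to the network layer; the destination removes Hl, Hn, and Ht in turn to recover M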
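The encapsulation slide can be mimicked with byte strings. This is only a toy sketch: the header contents (`Hl`, `Hn`, `Ht` as literal tags separated by `|`) and all function names are made up for illustration and do not correspond to real protocol formats.

```python
def encapsulate(message: bytes) -> bytes:
    """Wrap an application message in transport, network, and link headers."""
    segment = b"Ht|" + message      # transport layer adds Ht
    datagram = b"Hn|" + segment     # network layer adds Hn
    frame = b"Hl|" + datagram       # link layer adds Hl
    return frame


def strip_header(pdu: bytes, header: bytes) -> bytes:
    """Remove one expected header tag from the front of a PDU."""
    prefix = header + b"|"
    if not pdu.startswith(prefix):
        raise ValueError(f"expected header {header!r}")
    return pdu[len(prefix):]


def decapsulate(frame: bytes) -> bytes:
    """Destination host: strip Hl, Hn, Ht in turn to recover the message."""
    datagram = strip_header(frame, b"Hl")
    segment = strip_header(datagram, b"Hn")
    return strip_header(segment, b"Ht")


def router_forward(frame: bytes) -> bytes:
    """Router: strip the link header, look at Hn, then build a new frame."""
    datagram = strip_header(frame, b"Hl")
    if not datagram.startswith(b"Hn|"):
        raise ValueError("not a datagram")
    return b"Hl|" + datagram        # new link header for the next hop


print(encapsulate(b"M"))            # b'Hl|Hn|Ht|M'
```

Note that the router never touches `Ht` or `M`: it only works on the layers it implements.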
1. Introduction: roadmap
1.1 what is the Internet?
1.2 network edge
§ end systems, access networks, links
1.3 network core
§ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history
Self Study
Network security
v field of network security:
§ how bad guys can attack computer networks
§ how we can defend networks against attacks
§ how to design architectures that are immune to attacks
v Internet not originally designed with (much) security in mind
§ original vision: “a group of mutually trusting users attached to a transparent network” :-)
§ Internet protocol designers playing “catch-up”
§ security considerations in all layers!
Disclaimer: This is a high-level view, details will be covered later
Self Study
Bad guys: put malware into hosts via Internet
v malware can get in host from:
§ virus: self-replicating infection by receiving/executing object (e.g., e-mail attachment)
§ worm: self-replicating infection by passively receiving object that gets itself executed
v spyware malware can record keystrokes, web sites visited, upload info to collection site
v infected host can be enrolled in botnet, used for spam, DDoS attacks
Self Study
Bad guys: attack server, network infrastructure
Denial of Service (DoS): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus traffic
1. select target
2. break into hosts around the network (see botnet)
3. send packets to target from compromised hosts
Self Study
Bad guys can sniff packets
packet “sniffing”:
§ broadcast media (shared Ethernet, wireless)
§ promiscuous network interface reads/records all packets (e.g., including passwords!) passing by
[Figure: hosts A, B, and C; a packet with src:B dest:A payload]
v Wireshark software used for end-of-chapter labs is a (free) packet-sniffer
Self Study
Bad guys can use fake addresses
IP spoofing: send packet with false source address
[Figure: hosts A, B, and C; a packet with src:B dest:A payload]
… lots more on security (throughout, Chapter 8)
Self Study
Source: www.dilbert.com
1. Introduction: roadmap
1.1 what is the Internet?
1.2 network edge
§ end systems, access networks, links
1.3 network core
§ packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history
Self Study
Hobbes’ Internet Timeline: http://www.zakon.org/robert/internet/timeline/
Internet history
1961-1972: Early packet-switching principles
v 1961: Kleinrock - queueing theory shows effectiveness of packet-switching
v 1964: Baran - packet-switching in military nets
v 1967: ARPAnet conceived by Advanced Research Projects Agency
v 1969: first ARPAnet node operational
v 1972:
§ ARPAnet public demo
§ NCP (Network Control Protocol) first host-host protocol
§ first e-mail program
§ ARPAnet has 15 nodes
Self Study
Internet history
1972-1980: Internetworking, new and proprietary nets
v 1970: ALOHAnet satellite network in Hawaii
v 1974: Cerf and Kahn - architecture for interconnecting networks
v 1976: Ethernet at Xerox PARC
v late 70’s: proprietary architectures: DECnet, SNA, XNA
v late 70’s: switching fixed length packets (ATM precursor)
v 1979: ARPAnet has 200 nodes
Cerf and Kahn’s internetworking principles:
§ minimalism, autonomy - no internal changes required to interconnect networks
§ best effort service model
§ stateless routers
§ decentralized control
define today’s Internet architecture
Self Study
Internet history
1980-1990: new protocols, a proliferation of networks
v 1983: deployment of TCP/IP
v 1982: SMTP e-mail protocol defined
v 1983: DNS defined for name-to-IP-address translation
v 1985: FTP protocol defined
v 1988: TCP congestion control
v new national networks: Csnet, BITnet, NSFnet, Minitel
v 100,000 hosts connected to confederation of networks
Self Study
Internet history
1990, 2000’s: commercialization, the Web, new apps
v early 1990’s: ARPAnet decommissioned
v 1991: NSF lifts restrictions on commercial use of NSFnet (decommissioned, 1995)
v early 1990s: Web
§ hypertext [Bush 1945, Nelson 1960’s]
§ HTML, HTTP: Berners-Lee
§ 1994: Mosaic, later Netscape
§ late 1990’s: commercialization of the Web
late 1990’s – 2000’s:
v more killer apps: instant messaging, P2P file sharing
v network security to forefront
v est. 50 million hosts, 100 million+ users
v backbone links running at Gbps
Self Study
Internet history
2005-present
v ~750 million hosts
§ Smartphones and tablets
v Aggressive deployment of broadband access
v Increasing ubiquity of high-speed wireless access
v Emergence of online social networks:
§ Facebook: soon one billion users
v Service providers (Google, Microsoft) create their own networks
§ Bypass Internet, providing “instantaneous” access to search, email, etc.
v E-commerce, universities, enterprises running their services in “cloud” (e.g., Amazon EC2)
Self Study
Introduction: summary
covered a “ton” of material!
v Internet overview
v what’s a protocol?
v network edge, core, access network
§ packet-switching versus circuit-switching
§ Internet structure
v performance: loss, delay, throughput
v layering, service models
v security
v history
you now have:
v context, overview, “feel” of networking
v more depth, detail to follow!
2. Application Layer: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
§ SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP
2. Application layer
our goals:
v conceptual, implementation aspects of network application protocols
§ transport-layer service models
§ client-server paradigm
§ peer-to-peer paradigm
v learn about protocols by examining popular application-level protocols
§ HTTP
§ FTP
§ SMTP / POP3 / IMAP
§ DNS
v creating network applications
§ socket API
Some network apps
v e-mail
v web
v text messaging
v remote login
v P2P file sharing
v multi-user network games
v streaming stored video and audio (YouTube, NetFlix, Spotify)
v voice over IP (e.g., Skype)
v real-time video conferencing
v social networking
v search
v virtual reality
v …
Creating a network app
write programs that:
v run on (different) end systems
v communicate over network
v e.g., web server software communicates with browser software
Varying degrees of integration
v Loose: email, web browsing
v Medium: chat, Skype, remote file systems
v Tight: process migration, distributed file systems
no need to write software for network-core devices
v network-core devices do not run user applications
v applications on end systems allow for rapid app development, propagation
[Figure: three end systems, each with application, transport, network, data link, and physical layers]
Interprocess Communication (IPC)
v In order to cooperate, processes need to communicate
v Achieved via IPC (interprocess communication): the ability for a process to communicate with another
v On a single machine:
§ Shared memory
v Across machines:
§ We need other abstractions (message passing)
[Figure: processes P1 and P2, each with Text, Data, and Stack segments, plus a Shared Segment]
Sockets
v process sends/receives messages to/from its socket
v socket analogous to door
§ sending process shoves message out door
§ sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
v Application has a few options, OS handles the details
[Figure: two hosts connected through the Internet; on each, the process and its socket sit in the application layer (controlled by app developer), above the transport, network, link, and physical layers (controlled by OS)]
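The "door" analogy can be tried out on a single machine. The following is a minimal sketch, assuming a loopback TCP connection and an OS-chosen port; the function names and the upper-casing "service" are invented for illustration. Each process only writes to and reads from its own socket, and the OS transport layer does the rest.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Receiving process: take one message from its socket and reply."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())   # toy service: echo in upper case


def demo(message: bytes) -> bytes:
    """Sending process: push a message out of its socket, read the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener,))
        worker.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


print(demo(b"hello"))
```

The application code never deals with segments, datagrams, or frames; those stay on the OS side of the door.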
Addressing processes
v to receive messages, process must have identifier
v host device has unique 32-bit IP address
v Q: does IP address of host on which process runs suffice for identifying the process?
§ A: no, many processes can be running on same host
v identifier includes both IP address and port numbers associated with process on host.
v example port numbers:
§ HTTP server: 80
§ mail server: 25
v to send HTTP message to cse.unsw.edu.au web server:
§ IP address: 129.94.242.51
§ port number: 80
v more shortly…
Client-server architecture
server:
v Exports well-defined request/response interface
v long-lived process that waits for requests
v Upon receiving request, carries it out
clients:
v Short-lived process that makes requests
v “User-side” of application
v Initiates the communication
Client versus Server
v Server
§ Always-on host
§ Permanent IP address (rendezvous location)
§ Static port conventions (http: 80, email: 25, ssh: 22)
§ Data centres for scaling
§ May communicate with other servers to respond
v Client
§ May be intermittently connected
§ May have dynamic IP addresses
§ Do not communicate directly with each other
P2P architecture
v no always-on server
§ No permanent rendezvous involved
v arbitrary end systems (peers) directly communicate
v Symmetric responsibility (unlike client/server)
v Often used for:
§ File sharing (BitTorrent)
§ Games
§ Video distribution, video chat
§ In general: “distributed systems”
Quiz: Peer-to-Peer
v In P2P architecture are there clients and servers?
A. Yes
B. No
P2P architecture: Pros and Cons
+ peers request service from other peers, provide service in return to other peers
§ self scalability – new peers bring new service capacity, as well as new service demands
+ Speed: parallelism, less contention
+ Reliability: redundancy, fault tolerance
+ Geographic distribution
- Fundamental problems of decentralized control
§ State uncertainty: no shared memory or clock
§ Action uncertainty: mutually conflicting decisions
- Distributed algorithms are complex
App-layer protocol defines
v types of messages exchanged,
§ e.g., request, response
v message syntax:
§ what fields in messages & how fields are delineated
v message semantics
§ meaning of information in fields
v rules for when and how processes send & respond to messages
open protocols:
v defined in RFCs
v allows for interoperability
v e.g., HTTP, SMTP
proprietary protocols:
v e.g., Skype
What transport service does an app need?
data integrity
v some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
v other apps (e.g., audio) can tolerate some loss
timing
v some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
throughput
v some apps (e.g., multimedia) require minimum amount of throughput to be “effective”
v other apps (“elastic apps”) make use of whatever throughput they get
security
v encryption, data integrity, …
Self Study
Transport service requirements: common apps
application           | data loss     | throughput                               | time sensitive
file transfer         | no loss       | elastic                                  | no
e-mail                | no loss       | elastic                                  | no
Web documents         | no loss       | elastic                                  | no
real-time audio/video | loss-tolerant | audio: 5kbps-1Mbps, video: 10kbps-5Mbps  | yes, 100’s msec
stored audio/video    | loss-tolerant | same as above                            | yes, few secs
interactive games     | loss-tolerant | few kbps up                              | yes, 100’s msec
text messaging        | no loss       | elastic                                  | yes and no
Self Study
Internet transport protocols services
TCP service:
v reliable transport between sending and receiving process
v flow control: sender won’t overwhelm receiver
v congestion control: throttle sender when network overloaded
v does not provide: timing, minimum throughput guarantee, security
v connection-oriented: setup required between client and server processes
UDP service:
v unreliable data transfer between sending and receiving process
v does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup
Q: why bother? Why is there a UDP?
NOTE: More on transport in Weeks 4 and 5
Self Study
Internet apps: application, transport protocols
application            | application layer protocol              | underlying transport protocol
e-mail                 | SMTP [RFC 2821]                         | TCP
remote terminal access | Telnet [RFC 854]                        | TCP
Web                    | HTTP [RFC 2616]                         | TCP
file transfer          | FTP [RFC 959]                           | TCP
streaming multimedia   | HTTP (e.g., YouTube), RTP [RFC 1889]    | TCP or UDP
Internet telephony     | SIP, RTP, proprietary (e.g., Skype)     | TCP or UDP
Self Study
Securing TCP
TCP & UDP
v no encryption
v cleartext passwds sent into socket traverse Internet in cleartext
SSL
v provides encrypted TCP connection
v data integrity
v end-point authentication
SSL is at app layer
v Apps use SSL libraries, which “talk” to TCP
SSL socket API
v cleartext passwds sent into socket traverse Internet encrypted
v See Chapter 7
Self Study
2. Application Layer: outline
2.1 principles of network applications
§ app architectures
§ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
§ SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP
Note: Some of the material here, particularly the descriptive details of various protocols, is for self-study. Lectures will focus on design principles.
The Web – Precursor
v 1967, Ted Nelson, Xanadu:
§ A world-wide publishing network that would allow information to be stored not as separate files but as connected literature
§ Owners of documents would be automatically paid via electronic means for the virtual copying of their documents
v Coined the term “Hypertext”
[Photo: Ted Nelson]
The Web – History
v World Wide Web (WWW): a distributed database of “pages” linked through Hypertext Transfer Protocol (HTTP)
§ First HTTP implementation - 1990
• Tim Berners-Lee at CERN
§ HTTP/0.9 – 1991
• Simple GET command for the Web
§ HTTP/1.0 – 1992
• Client/Server information, simple caching
§ HTTP/1.1 - 1996
[Photo: Tim Berners-Lee]
http://info.cern.ch/hypertext/WWW/TheProject.html
Web and HTTP
First, a review…
v web page consists of objects
v object can be HTML file, JPEG image, Java applet, audio file,…
v web page consists of base HTML-file which includes several referenced objects
v each object is addressable by a URL, e.g., www.someschool.edu/someDept/pic.gif
§ host name: www.someschool.edu
§ path name: /someDept/pic.gif
Uniform Resource Locator (URL)
protocol://host-name[:port]/directory-path/resource
v protocol: http, ftp, https, smtp, rtsp, etc.
v host-name: DNS name, IP address
v port: defaults to protocol’s standard port; e.g., http: 80, https: 443
v directory path: hierarchical, reflecting file system
v resource: identifies the desired resource
v Extend the idea of hierarchical hostnames to include anything in a file system
§ http://www.cse.unsw.edu.au/~salilk/papers/journals/TMC2012.pdf
v Extend to program executions as well…
§ http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b
§ Server side processing can be incorporated in the name
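The URL template above can be taken apart with Python's standard `urllib.parse` module. This is a rough sketch: the `describe` helper and the `DEFAULT_PORTS` table are illustrative names, and only the two default ports mentioned on the slide are filled in.

```python
from urllib.parse import urlsplit

# Default ports named on the slide; other schemes are left out on purpose.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url: str) -> dict:
    """Split a URL into protocol, host, port, directory path, and resource."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    directory, _, resource = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "directory": directory + "/",
        "resource": resource,
        "query": parts.query,   # input to server-side processing, if any
    }


info = describe("http://www.cse.unsw.edu.au/~salilk/papers/journals/TMC2012.pdf")
for key, value in info.items():
    print(f"{key}: {value}")
```

When the URL omits the port, the helper falls back to the protocol's standard port, mirroring the "defaults to" rule on the slide.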
HTTP overview
HTTP: hypertext transfer protocol
v Web’s application layer protocol
v client/server model
§ client: browser that requests, receives (using HTTP protocol), and “displays” Web objects
§ server: Web server sends (using HTTP protocol) objects in response to requests
[Figure: a PC running Firefox browser and an iPhone running Safari browser, each talking to a server running Apache Web server]
HTTP overview (continued)
uses TCP:
v client initiates TCP connection (creates socket) to server, port 80
v server accepts TCP connection from client
v HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
v TCP connection closed
HTTP is “stateless”
v server maintains no information about past client requests
aside: protocols that maintain “state” are complex!
v past history (state) must be maintained
v if server/client crashes, their views of “state” may be inconsistent, must be reconciled
HTTP request message
v two types of HTTP messages: request, response
v HTTP request message:
§ ASCII (human-readable format)

GET /index.html HTTP/1.1\r\n
Host: www-net.cs.umass.edu\r\n
User-Agent: Firefox/3.6.10\r\n
Accept: text/html,application/xhtml+xml\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
\r\n

v first line: request line (GET, POST, HEAD commands)
v following lines: header lines
v \r is the carriage return character, \n is the line-feed character
v carriage return, line feed at start of line indicates end of header lines
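Because the request is plain ASCII with `\r\n` line endings, it is easy to build and take apart by hand. The sketch below is illustrative only: `build_request` and `parse_request` are made-up helper names, and the parser ignores details such as repeated or folded headers.

```python
def build_request(method: str, path: str, headers: dict) -> bytes:
    """Request line, then header lines, then an empty line (\\r\\n)."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_request(raw: bytes):
    """Split a request back into method, path, version, and headers."""
    head, _, _body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, path, version, headers


raw = build_request(
    "GET",
    "/index.html",
    {"Host": "www-net.cs.umass.edu", "Connection": "keep-alive"},
)
print(raw)
print(parse_request(raw))
```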
HTTP response message

HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
Server: Apache/2.0.52 (CentOS)\r\n
Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
ETag: "17dc6-a5c-bf716880"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 2652\r\n
Keep-Alive: timeout=10, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/html; charset=ISO-8859-1\r\n
\r\n
data data data data data ...

v first line: status line (protocol, status code, status phrase)
v following lines: header lines
v after the empty line: data, e.g., requested HTML file
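A response can be split the same way: status line, header lines, empty line, data. The following is a simplified sketch (the `parse_response` name and the tiny sample message are assumptions); it relies on `Content-Length` when present and does not handle chunked transfer encoding.

```python
def parse_response(raw: bytes):
    """Return (version, status code, status phrase, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The phrase may contain spaces ("Not Found"), so split at most twice.
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", len(body)))
    return version, int(code), phrase, headers, body[:length]


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"hello"
)
print(parse_response(sample))
```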
HTTP response status codes
v status code appears in 1st line in server-to-client response message.
v some sample codes:
200 OK
§ request succeeded, requested object later in this msg
301 Moved Permanently
§ requested object moved, new location specified later in this msg (Location:)
400 Bad Request
§ request msg not understood by server
404 Not Found
§ requested document not found on this server
505 HTTP Version Not Supported
451 Unavailable for Legal Reasons
429 Too Many Requests
418 I’m a Teapot
HTTP is all text
v Makes the protocol simple
§ Easy to delineate messages (\r\n)
§ (relatively) human-readable
§ No issues about encoding or formatting data
§ Variable length data
v Not the most efficient
§ Many protocols use binary fields
• Sending “12345678” as a string is 8 bytes
• As an integer, 12345678 needs only 4 bytes
§ Headers may come in any order
§ Requires string parsing/processing
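The 8-bytes-versus-4-bytes comparison can be reproduced with the standard `struct` module. The choice of an unsigned 32-bit integer in network byte order (`"!I"`) is an assumption made for this sketch; the slide does not name a specific binary encoding.

```python
import struct

as_text = b"12345678"                     # one byte per ASCII digit
as_binary = struct.pack("!I", 12345678)   # unsigned 32-bit, network byte order

print(len(as_text))     # 8
print(len(as_binary))   # 4

# The binary form still decodes to the same number.
(decoded,) = struct.unpack("!I", as_binary)
print(decoded)          # 12345678
```

The text form needs no agreement on byte order or field width, which is part of why it is simpler, at the cost of size and parsing work.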
Request Method types (“verbs”)
HTTP/1.0:
v GET
§ Request page
v POST
§ Uploads user response to a form
v HEAD
§ asks server to leave requested object out of response
HTTP/1.1:
v GET, POST, HEAD
v PUT
§ uploads file in entity body to path specified in URL field
v DELETE
§ deletes file specified in the URL field
v TRACE, OPTIONS, CONNECT, PATCH
§ For persistent connections
Uploading form input
POST method:
v web page often includes form input
v input is uploaded to server in entity body
GET (in-URL) method:
v uses GET method
v input is uploaded in URL field of request line:
www.somesite.com/animalsearch?monkeys&banana
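The two ways of uploading form input differ only in where the encoded fields travel. In this sketch the field names `animal` and `food` are hypothetical (the slide's example URL carries bare values without names), and the helper names are made up; `urlencode` is the standard-library encoder for form fields.

```python
from urllib.parse import urlencode


def get_request(host: str, path: str, form: dict) -> str:
    """GET: form input rides in the URL field of the request line."""
    target = f"{path}?{urlencode(form)}"
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def post_request(host: str, path: str, form: dict) -> str:
    """POST: form input rides in the entity body after the headers."""
    body = urlencode(form)   # ASCII only, so len(body) is also the byte count
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


form = {"animal": "monkeys", "food": "banana"}   # hypothetical field names
print(get_request("www.somesite.com", "/animalsearch", form))
print(post_request("www.somesite.com", "/animalsearch", form))
```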
GET vs. POST
v GET can be used for idempotent requests
§ Idempotence: an operation can be applied multiple times without changing the result (the final state is the same)
v POST should be used when...
§ A request changes the state of the session or server or DB
§ Sending a request twice would be harmful
• (Some) browsers warn about sending multiple POST requests
§ Users are inputting non-ASCII characters
§ Input may be very large
§ You want to hide how the form works/user input
Quiz: When might you use GET vs. POST
   GET                                     | POST
A. Forum post                              | Search terms, Pizza order
B. Search terms, Pizza order               | Forum post
C. Search terms                            | Forum post, Pizza order
D. Forum post, Search terms, Pizza order   | (none)
E. (none)                                  | Forum post, Search terms, Pizza order
Trying out HTTP (client side) for yourself
1. Telnet to your favorite Web server:
telnet cis.poly.edu 80
§ opens TCP connection to port 80 (default HTTP server port) at cis.poly.edu
§ anything typed in is sent to port 80 at cis.poly.edu
2. type in a GET HTTP request:
GET /~ross/ HTTP/1.1
Host: cis.poly.edu
§ by typing this in (hit carriage return twice), you send this minimal (but complete) GET request to HTTP server
3. look at response message sent by HTTP server! (or use Wireshark to look at captured HTTP request/response)
Web-based sniffer: http://web-sniffer.net/
Your 3rd lab
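The telnet exercise can also be done from a short program. So that it runs without Internet access, this sketch talks to a throwaway local server built with Python's `http.server` instead of cis.poly.edu; `HelloHandler`, `raw_get`, and `demo` are names invented for the example. The client side does exactly what telnet does: open a TCP connection, type a minimal GET request, read what comes back.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in Web server that answers every GET with 'hello'."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def raw_get(host: str, port: int, path: str) -> bytes:
    """Send a minimal GET request over a TCP socket, return the raw reply."""
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks one
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    try:
        return raw_get("127.0.0.1", server.server_port, "/")
    finally:
        worker.join(timeout=5)
        server.server_close()


print(demo().decode("iso-8859-1"))
```

Pointing `raw_get` at a real server on port 80 should behave like the telnet session above, although many sites now redirect plain HTTP requests to HTTPS.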
State(less)
(XKCD #869, “Server Attention Span”)
User-server state: cookies
many Web sites use cookies
four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site
example:
v Susan always accesses Internet from PC
v visits specific e-commerce site for first time
v when initial HTTP request arrives at site, site creates:
§ unique ID
§ entry in backend database for ID
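Cookies: keeping “state” (cont.)
[Figure: message exchange between client and Amazon server, with the client’s cookie file and the server’s backend database]
1. client’s cookie file initially holds: ebay 8734
2. client sends usual http request msg
3. Amazon server creates ID 1678 for user, creates entry in backend database
4. server sends usual http response with set-cookie: 1678; cookie file now holds: ebay 8734, amazon 1678
5. client sends usual http request msg with cookie: 1678; server accesses backend database, takes cookie-specific action, and sends usual http response msg
6. one week later: client sends usual http request msg with cookie: 1678; server again accesses backend database, takes cookie-specific action, and sends usual http response msg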
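The exchange in the figure can be sketched as server-side logic. Everything here is a simplification under stated assumptions: the `Site` class, the cookie name `id`, and the idea of logging visited paths as the "cookie-specific action" are invented for illustration; only `http.cookies.SimpleCookie` is a real standard-library class, used to parse the `Cookie` header.

```python
import itertools
from http.cookies import SimpleCookie


class Site:
    """Toy Web site that keeps per-user state keyed by a cookie ID."""

    def __init__(self, first_id: int = 1678):
        self._ids = itertools.count(first_id)
        self.backend = {}   # backend database: cookie ID -> visited paths

    def handle(self, request_headers: dict, path: str) -> dict:
        """Process one request; return the extra response headers."""
        cookie = SimpleCookie(request_headers.get("Cookie", ""))
        response_headers = {}
        if "id" in cookie and cookie["id"].value in self.backend:
            user_id = cookie["id"].value           # returning user
        else:
            user_id = str(next(self._ids))         # first visit: create ID
            self.backend[user_id] = []             # and a database entry
            response_headers["Set-Cookie"] = f"id={user_id}"
        self.backend[user_id].append(path)         # cookie-specific action
        return response_headers


site = Site()
first = site.handle({}, "/")                        # no cookie yet
print(first)                                        # {'Set-Cookie': 'id=1678'}
later = site.handle({"Cookie": "id=1678"}, "/cart") # browser sends it back
print(later, site.backend)
```

HTTP itself stays stateless in this picture: the state lives in the browser's cookie file and the site's database, and the messages merely carry the ID.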
Cookies (continued)
what cookies can be used for:
v authorization
v shopping carts
v recommendations
v user session state (Web e-mail)
aside: cookies and privacy:
v cookies permit sites to learn a lot about you
v you may supply name and e-mail to sites
how to keep “state”:
v protocol endpoints: maintain state at sender/receiver over multiple transactions
v cookies: http messages carry state
The Dark Side of Cookies
v Cookies permit sites to learn a lot about you
v You may supply name and e-mail to sites (and more)
v 3rd party cookies (from ad networks, etc.) can follow you across multiple sites
§ Ever visit a website, and the next day ALL your ads are from them?
• Check your browser’s cookie file (cookies.txt, cookies.plist)
• Do you see a website that you have never visited?
v You COULD turn them off
§ But good luck doing anything on the Internet!!
Third party cookies
[Figure: Website A and Website B each embed a banner served by the Doubleclick server. Fetching the Banner 1 url leads to “Create cookie for doubleclick: 3445”; fetching the Banner 2 url sends “Cookie: 3445”.]
For more, check the following link and follow the references: http://en.wikipedia.org/wiki/HTTP_cookie
Performance Goals
v User
§ fast downloads
§ high availability
v Content provider
§ happy users (hence, above)
§ cost-effective infrastructure
v Network (secondary)
§ avoid overload
Solutions?
v User
§ fast downloads
§ high availability
v Content provider
§ happy users (hence, above)
§ cost-effective delivery infrastructure
v Network (secondary)
§ avoid overload
Solutions:
v Improve HTTP to achieve faster downloads (compensate for TCP’s weak spots)
v Caching and Replication
v Exploit economies of scale (Webhosting, CDNs, datacenters)
HTTP Performance
v Most Web pages have multiple objects
§ e.g., HTML file and a bunch of embedded images
v How do you retrieve those objects (naively)?
§ One item at a time
v New TCP connection per (small) object!
non-persistent HTTP
v at most one object sent over TCP connection
§ connection then closed
v downloading multiple objects requires multiple connections
Non-persistent HTTP
suppose user enters URL: www.someSchool.edu/someDepartment/home.index (contains text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80. “accepts” connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects
Non-persistent HTTP: response time
RTT (definition): time for a small packet to travel from client to server and back
HTTP response time:
v one RTT to initiate TCP connection
v one RTT for HTTP request and first few bytes of HTTP response to return
v file transmission time
v non-persistent HTTP response time = 2RTT + file transmission time
[Figure: client-server timeline: initiate TCP connection (one RTT), request file (one RTT), time to transmit file, file received]
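The response-time formula is simple enough to evaluate directly. In this sketch the transmission time is taken as file size divided by link rate; the function name and the example numbers (100 ms RTT, 1 Mbit file, 10 Mbps link) are assumptions chosen only for illustration.

```python
def nonpersistent_response_time(rtt_s: float, file_bits: float, rate_bps: float) -> float:
    """2 RTT (TCP setup + request/first bytes) plus file transmission time."""
    transmission_time = file_bits / rate_bps
    return 2 * rtt_s + transmission_time


# Hypothetical numbers: 100 ms RTT, 1 Mbit object, 10 Mbps link.
t = nonpersistent_response_time(rtt_s=0.1, file_bits=1_000_000, rate_bps=10_000_000)
print(f"{t:.3f} s")   # 0.300 s: 0.2 s of round trips + 0.1 s of transmission
```

For small objects the transmission term shrinks and the two round trips dominate, which is the motivation for the improvements that follow.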
Improving HTTP Performance: Concurrent Requests & Responses
v Use multiple connections in parallel
v Does not necessarily maintain order of responses
• Client = :-)
• Content provider = :-)
• Network = :-( Why?
[Figure: timeline with labels R1, R2, R3 and T1, T2, T3]
Persistent HTTP
Nonpersistent HTTP issues:
v requires 2 RTTs per object
v OS must work and allocate host resources for each TCP connection
v but browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP:
v server leaves connection open after sending response
v subsequent HTTP messages between same client/server are sent over connection
v Allow TCP to learn more accurate RTT estimate (APPARENT LATER)
v Allow TCP congestion window to increase (APPARENT LATER)
v i.e., leverage previously discovered bandwidth (APPARENT LATER)
Persistent without pipelining:
v client issues new request only when previous response has been received
v one RTT for each referenced object
Persistent with pipelining:
v default in HTTP/1.1
v client sends requests as soon as it encounters a referenced object
v as little as one RTT for all the referenced objects
HTTP 1.1: response time
90
[Figure: client/server timeline across the Internet for a website with one index page and three embedded objects: initiate TCP connection (one RTT), request file (one RTT), time to transmit file, file received]
Scorecard: Getting n Small Objects
Time dominated by latency (i.e. RTT)
v One-at-a-time: ~2n RTT
v m concurrent: ~2⌈n/m⌉ RTT
v Persistent: ~(n+1) RTT
v Pipelined + Persistent: ~2 RTT
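The scorecard can be checked with a few lines of Python. This is only a sketch of the approximations on the slide: it counts RTTs, ignores transmission time, and the function and key names are made up for illustration.

```python
import math


def rtt_scorecard(n: int, m: int) -> dict:
    """Approximate RTT counts for fetching n small objects (latency only)."""
    return {
        # New TCP connection (1 RTT) plus request/response (1 RTT) per object.
        "one-at-a-time": 2 * n,
        # m parallel connections: objects fetched in ceil(n/m) rounds of 2 RTTs.
        "concurrent": 2 * math.ceil(n / m),
        # One connection setup, then one RTT per object.
        "persistent": n + 1,
        # One connection setup, then all requests sent back to back.
        "pipelined": 2,
    }


if __name__ == "__main__":
    for strategy, rtts in rtt_scorecard(n=10, m=5).items():
        print(f"{strategy}: ~{rtts} RTT")
```

For n = 10 objects and m = 5 connections this gives 20, 4, 11 and 2 RTTs respectively.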
91
Quiz: Persistent vs. Non-persistent HTTP
v Among the following, in which case would you get the greatest improvement in performance with persistent HTTP compared to non-persistent HTTP?
A. Low throughput network paths (irrespective of distance)
B. High throughput network paths (irrespective of distance)
C. Long-distance network paths (irrespective of throughput)
D. High throughput, short-distance network paths
E. High throughput, long-distance network paths
92
Quiz: Pipelining
v Pipelining allows the client to send multiple HTTP requests on a single TCP connection without waiting for the corresponding responses. What could be a potential bottleneck despite using pipelining?
93
Improving HTTP Performance: Caching
v Why does caching work?
§ Exploits locality of reference
v How well does caching work?
§ Very well, up to a limit
§ Large overlap in content
§ But many unique requests
94
Web caches (proxy server)
v user sets browser: Web accesses via cache
v browser sends all HTTP requests to cache
§ object in cache: cache returns object
§ else cache requests object from origin server, then returns object to client
goal: satisfy client request without involving origin server
[Figure: two clients connected through a proxy server to two origin servers]
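The hit/miss behaviour of a Web cache can be sketched as follows. This is a toy in-memory model, not a real proxy: the class name `WebCache`, the `fetch_from_origin` callback and the `origin_requests` counter are hypothetical names, and expiry and validation are ignored.

```python
from typing import Callable, Dict


class WebCache:
    """Toy proxy cache: serve from the local store, else ask the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how often the origin server was involved

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Object in cache: return it without contacting the origin.
            return self._store[url]
        # Miss: request the object from the origin, keep a copy, return it.
        self.origin_requests += 1
        obj = self._fetch_from_origin(url)
        self._store[url] = obj
        return obj


if __name__ == "__main__":
    cache = WebCache(lambda url: f"contents of {url}".encode())
    cache.get("www.someSchool.edu/someDepartment/home.index")
    cache.get("www.someSchool.edu/someDepartment/home.index")
    print(cache.origin_requests)
```

The second request for the same URL is answered from the cache, so the origin server is contacted only once.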
95
More about Web caching
v cache acts as both client and server
§ server for original requesting client
§ client to origin server
v typically cache is installed by ISP (university, company, residential ISP)
why Web caching?
v reduce response time for client request
v reduce traffic on an institution’s access link
v Internet dense with caches: enables “poor” content providers to effectively deliver content
96
Caching example:
assumptions:
v avg object size: 100K bits
v avg request rate from browsers to origin servers: 15/sec
v avg data rate to browsers: 1.50 Mbps
v RTT from institutional router to any origin server: 2 sec
v access link rate: 1.54 Mbps
consequences:
v LAN utilization: 0.15% (1.50 Mbps / 1 Gbps)
v access link utilization = 97% (1.50 Mbps / 1.54 Mbps)
v total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + usecs
problem!
[Figure: institutional network (1 Gbps LAN) connected over a 1.54 Mbps access link to origin servers on the public Internet]
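The utilization figures follow directly from the stated assumptions (offered load divided by link rate). A minimal sketch of that arithmetic, with illustrative variable names:

```python
def utilization(object_bits: float, requests_per_sec: float, link_bps: float) -> float:
    """Fraction of a link's capacity consumed by the offered load."""
    offered_load_bps = object_bits * requests_per_sec
    return offered_load_bps / link_bps


OBJECT_BITS = 100_000       # avg object size: 100K bits
REQUESTS_PER_SEC = 15       # avg request rate from browsers
ACCESS_LINK_BPS = 1.54e6    # 1.54 Mbps access link
LAN_BPS = 1e9               # 1 Gbps LAN

access_util = utilization(OBJECT_BITS, REQUESTS_PER_SEC, ACCESS_LINK_BPS)
lan_util = utilization(OBJECT_BITS, REQUESTS_PER_SEC, LAN_BPS)

if __name__ == "__main__":
    print(f"access link utilization: {access_util:.1%}")
    print(f"LAN utilization: {lan_util:.2%}")
```

With the access link this close to saturation, queueing delay on it grows very large, which is where the "minutes" term in the total delay comes from.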
97
Caching example: fatter access link
assumptions:
v avg object size: 100K bits
v avg request rate from browsers to origin servers: 15/sec
v avg data rate to browsers: 1.50 Mbps
v RTT from institutional router to any origin server: 2 sec
v access link rate: 154 Mbps (upgraded from 1.54 Mbps)
consequences:
v LAN utilization: 0.15%
v access link utilization = 0.97% (1.50 Mbps / 154 Mbps)
v total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + usecs
Cost: increased access link speed (not cheap!)
[Figure: institutional network (1 Gbps LAN) connected over a 154 Mbps access link to origin servers on the public Internet]
98
Caching example: install local cache
assumptions:
v avg object size: 100K bits
v avg request rate from browsers to origin servers: 15/sec
v avg data rate to browsers: 1.50 Mbps
v RTT from institutional router to any origin server: 2 sec
v access link rate: 1.54 Mbps
consequences:
v LAN utilization: ?
v access link utilization = ?
v total delay = ?
How to compute link utilization, delay?
Cost: web cache (cheap!)
[Figure: institutional network (1 Gbps LAN) with a local web cache, connected over a 1.54 Mbps access link to origin servers on the public Internet]
99
Caching example: install local cache
Calculating access link utilization, delay with cache:
v suppose cache hit rate is 0.4
§ 40% requests satisfied at cache, 60% requests satisfied at origin
v access link utilization:
§ 60% of requests use access link
v data rate to browsers over access link = 0.6 * 1.50 Mbps = 0.9 Mbps
§ utilization = 0.9/1.54 = 0.58
v total delay
§ = 0.6 * (delay from origin servers) + 0.4 * (delay when satisfied at cache)
§ = 0.6 (2.01) + 0.4 (~msecs)
§ = ~1.2 secs
§ less than with 154 Mbps link (and cheaper too!)
[Figure: institutional network (1 Gbps LAN) with a local web cache, connected over a 1.54 Mbps access link to origin servers on the public Internet]
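The same calculation can be written as a small function so other hit rates can be tried. This is a sketch under the slide's assumptions; the 10 ms cache delay is an assumed stand-in for "~msecs", and the function name is illustrative.

```python
def with_cache(hit_rate, demand_bps, access_link_bps, origin_delay_s, cache_delay_s):
    """Return (access link utilization, average total delay in seconds)."""
    miss_rate = 1.0 - hit_rate
    # Only misses cross the access link.
    link_utilization = miss_rate * demand_bps / access_link_bps
    # Weighted average of the origin-server delay and the local-cache delay.
    avg_delay_s = miss_rate * origin_delay_s + hit_rate * cache_delay_s
    return link_utilization, avg_delay_s


link_util, avg_delay = with_cache(
    hit_rate=0.4,
    demand_bps=1.50e6,
    access_link_bps=1.54e6,
    origin_delay_s=2.01,   # value used on the slide for a miss
    cache_delay_s=0.01,    # assumption: 10 ms when served from the cache
)

if __name__ == "__main__":
    print(f"utilization = {link_util:.2f}, total delay = {avg_delay:.2f} s")
```

With a hit rate of 0.4 the utilization drops to about 0.58 and the average delay to about 1.2 seconds, matching the figures above.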
100
But what is the likelihood of cache hits?
v Distribution of web object requests generally follows a Zipf-like distribution
v The probability that a document will be referenced k requests after it was last referenced is roughly proportional to 1/k. That is, web traces exhibit excellent temporal locality.
Paper – “Web Caching and Zipf-like Distributions: Evidence and Implications” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.8742&rep=rep1&type=pdf
Video content exhibits similar properties: the 10% most popular videos account for nearly 80% of views, while the remaining 90% of videos account for only 20% of requests. Paper – http://yongyeol.com/papers/cha-video-2009.pdf
101
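To get a feel for why a small cache can absorb a large share of requests, the sketch below computes the hit ratio of an idealised cache that holds the most popular documents when request popularity is Zipf-like (the document of rank i is requested with probability proportional to 1/i^alpha). The independent-request model, the parameter values and the function name are assumptions for illustration and are not taken from the cited papers.

```python
def zipf_hit_ratio(num_docs: int, cache_size: int, alpha: float = 1.0) -> float:
    """Share of requests served by a cache holding the cache_size most popular docs."""
    # Unnormalised Zipf-like popularity of the document at each rank.
    weights = [1.0 / rank ** alpha for rank in range(1, num_docs + 1)]
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    for fraction in (0.01, 0.10, 0.50):
        size = int(1000 * fraction)
        print(f"cache holds {fraction:.0%} of docs -> hit ratio {zipf_hit_ratio(1000, size):.2f}")
```

Under these assumptions, caching 10% of 1000 documents already serves well over half of the requests, while the long tail of rarely requested documents limits how far the hit ratio can climb.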
Conditional GET
v Goal: don’t send object if cache has up-to-date cached version
§ no object transmission delay
§ lower link utilization
v cache: specify date of cached copy in HTTP request
If-Modified-Since: <date>
v server: response contains no object if cached copy is up-to-date:
HTTP/1.0 304 Not Modified
[Figure: two client/server exchanges. Object not modified since <date>: HTTP request msg with If-Modified-Since: <date>, HTTP response HTTP/1.0 304 Not Modified (no object). Object modified after <date>: HTTP request msg with If-Modified-Since: <date>, HTTP response HTTP/1.0 200 OK with <data>.]
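The server-side decision can be sketched in a few lines. This is a simplified model rather than a real HTTP server: the function name and its arguments are hypothetical, and only the date comparison is shown.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def handle_conditional_get(last_modified: datetime,
                           if_modified_since: Optional[str],
                           body: bytes) -> Tuple[int, bytes]:
    """Return (status code, response body) for a GET that may be conditional."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= cached_copy_date:
            return 304, b""      # Not Modified: no object in the response
    return 200, body             # OK: send the (new) object


if __name__ == "__main__":
    modified = datetime(2015, 9, 1, 12, 0, tzinfo=timezone.utc)
    header = format_datetime(datetime(2015, 9, 9, tzinfo=timezone.utc), usegmt=True)
    print(header, handle_conditional_get(modified, header, b"<data>"))
```

A cached copy dated after the object's last modification yields 304 with an empty body; an older cached copy, or a request without the header, yields 200 with the object.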
102
Example Cache Check Request
103
Example Cache Check Response
104
Improving HTTP Performance: Replication
v Replicate popular Web site across many machines
§ Spreads load on servers
§ Places content closer to clients
§ Helps when content isn’t cacheable
v Problem:
§ Want to direct client to particular replica
• Balance load across server replicas
• Pair clients with nearby servers
§ Expensive
v Common solution:
§ DNS returns different addresses based on client’s geo location, server load, etc.
105
Improving HTTP Performance: CDN
v Caching and replication as a service
v Integrate forward and reverse caching functionality
v Large-scale distributed storage infrastructure (usually) administered by one entity
§ e.g., Akamai has servers in 20,000+ locations
v Combination of (pull) caching and (push) replication
§ Pull: Direct result of clients’ requests
§ Push: Expectation of high access rate
v Also do some processing
§ Handle dynamic web pages
§ Transcoding
§ Maybe do some security function – watermark IP
106
What about HTTPS?
v HTTP is insecure
v HTTP basic authentication: password sent using base64 encoding (can be readily converted to plaintext)
v HTTPS: HTTP over a connection encrypted by Transport Layer Security (TLS)
v Provides:
§ Authentication
§ Bidirectional encryption
v Widely used in place of plain vanilla HTTP
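To see why base64 offers no protection, the sketch below builds a Basic authentication header value and then recovers the credentials from it, as anyone observing unencrypted HTTP traffic could. The user name and password are made-up examples.

```python
import base64
from typing import Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP 'Authorization: Basic ...' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def recover_credentials(header_value: str) -> Tuple[str, str]:
    """Decode a Basic auth header value back into (user, password)."""
    _scheme, token = header_value.split(" ", 1)
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("alice", "secret")
    print(header)
    print(recover_credentials(header))
```

Base64 is an encoding, not encryption: no key is needed to reverse it, which is why Basic authentication is only reasonable over HTTPS.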
107
What’s on the horizon: HTTP/2
v Standardised in May 2015: RFC 7540
v Improvements
§ Servers can push content and thus reduce overhead of an additional request cycle
§ Fully multiplexed
• Requests and responses are sliced into smaller chunks called frames; frames are tagged with an ID that connects data to the request/response
• Overcomes head-of-line blocking in HTTP 1.1
§ Prioritisation of the order in which objects should be sent (e.g. CSS files may be given higher priority)
§ Data compression of HTTP headers
• Some headers such as cookies can be very long
• Repetitive information
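The idea behind multiplexing can be illustrated with a toy model: each response is cut into frames tagged with a stream ID, frames from different streams are interleaved on one connection, and the receiver reassembles them by ID. This is only a conceptual sketch; real HTTP/2 frames use a binary format with flow control and priorities that are omitted here, and all names are illustrative.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id: int, payload: bytes, frame_size: int) -> list:
    """Slice a payload into (stream_id, chunk) frames."""
    return [(stream_id, payload[i:i + frame_size])
            for i in range(0, len(payload), frame_size)]


def interleave(*streams: list) -> list:
    """Round-robin the frames of several streams onto one connection."""
    wire = []
    for group in zip_longest(*streams):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire: list) -> dict:
    """Rebuild each stream's payload from its tagged frames."""
    payloads = defaultdict(bytes)
    for stream_id, chunk in wire:
        payloads[stream_id] += chunk
    return dict(payloads)


if __name__ == "__main__":
    big = to_frames(1, b"A" * 8, frame_size=3)     # large response
    small = to_frames(3, b"BB", frame_size=3)      # small response
    wire = interleave(big, small)
    print(wire)
    print(reassemble(wire))
```

The small response's single frame is sent right after the first frame of the large one, so it is not stuck waiting for the large response to finish.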
108
More details: https://http2.github.io/faq/
Demo: https://http2.akamai.com/demo
2. Application Layer: outline
2.1 principles of network applications
§ app architectures
§ app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
§ SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP
Self Study
109
FTP: the file transfer protocol
v transfer file to/from remote host
v client/server model
§ client: side that initiates transfer (either to/from remote)
§ server: remote host
v ftp: RFC 959
v ftp server: port 21
[Figure: user at host with FTP user interface and FTP client (local file system); file transfer to and from FTP server (remote file system)]
Self Study
110
FTP: separate control, data connections
v FTP client contacts FTP server at port 21, using TCP
v client authorized over control connection
v client browses remote directory, sends commands over control connection
v when server receives file transfer command, server opens 2nd TCP data connection (for file) to client
v after transferring one file, server closes data connection
v server opens another TCP data connection to transfer another file
v control connection: “out of band”
v FTP server maintains “state”: current directory, earlier authentication
[Figure: FTP client and FTP server joined by a TCP control connection (server port 21) and a TCP data connection (server port 20)]
Self Study
111
FTP commands, responses
sample commands:
v sent as ASCII text over control channel
v USER username
v PASS password
v LIST return list of files in current directory
v RETR filename retrieves (gets) file
v STOR filename stores (puts) file onto remote host
sample return codes
v status code and phrase (as in HTTP)
v 331 Username OK, password required
v 125 data connection already open; transfer starting
v 425 Can’t open data connection
v 452 Error writing file
Self Study
112
Active FTP
v Client connects from port N (N>1023) to FTP server listening on port 21 (the control port)
v Sends a command “PORT N+1” to the FTP server
v Server sends back ACK
v FTP server’s port 20 opens a TCP connection with port N+1 on the client’s host
v Client sends back ACK
v Issues with firewalls - client’s Sys Admin may prevent incoming TCP connections
Self Study
113
Passive FTP
v Client initiates both connections - hence OK with firewalls
v Client connects from port N (N>1023) to FTP server listening on port 21 (the control port)
v Sends a command “PASV” to the FTP server
v FTP server opens a listening socket on some port X (not 20) and replies to the client with “X”
v Client connects to port X
v Server sends back ACK
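On the wire, the port numbers in PORT and PASV are not sent as a single number. RFC 959 encodes an address as six decimal fields h1,h2,h3,h4,p1,p2, where the port is p1*256 + p2. The sketch below builds a PORT command and parses a PASV reply; the IP address, port and reply text are made-up examples, and the helper names are illustrative.

```python
import re
from typing import Tuple


def port_command(host: str, port: int) -> str:
    """Build the active-mode PORT command for the given client address."""
    p1, p2 = divmod(port, 256)
    return "PORT {},{},{}\r\n".format(host.replace(".", ","), p1, p2)


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from a '227 Entering Passive Mode (...)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(field) for field in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


if __name__ == "__main__":
    print(repr(port_command("192.168.1.10", 5001)))
    print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,19,137)"))
```

In passive mode the client then opens its data connection to the parsed host and port, which is why no incoming connection to the client is needed.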
Self Study
114
Summary
v Completed Introduction (Chapter 1)
§ Solve Sample Problem Set
§ Check questions on website
v Application Layer (Chapter 2)
§ Principles of Network Applications
§ HTTP
§ FTP
v Next Week
§ Application Layer (contd.)
• E-mail
• DNS
• P2P
§ First Programming assignment will be released
115
Solve all sample problems
Reading Exercise Chapter 2: 2.4 – 2.7