Web Server Design Week 1 Old Dominion University Department of Computer Science CS 495/595 Spring 2010 Martin Klein <[email protected]> http://www.cs.odu.edu/~mklein/ 1/13/10

Web Server Design

Week 1

Old Dominion University
Department of Computer Science

CS 495/595 Spring 2010

Martin Klein <[email protected]>

http://www.cs.odu.edu/~mklein/

1/13/10


Goals

• We will write a web (HTTP) server from scratch
  – we will not use Apache, IIS, or other existing web servers
  – the point is to learn HTTP and have a working server
    • your server won’t be as “good” as Apache, and that’s ok…
• We will focus on the Hypertext Transfer Protocol (HTTP)
  – we will not focus on all the neat things you can do that are outside / on top of the protocol
    • modules, servlets, WebDAV/DASL, etc.


What I’m Not Going To Teach

• HTML, SMTP, etc.
  – CS 312 Internet Concepts
• System administration
  – CS 454/554 Network Management
• Writing web applications (e.g. PHP/MySQL)
  – CS 418/518 Web Programming
• Databases
  – CS 450/550 Database Concepts
  – CS 419/519 Internet Databases
  – and many others…
• Java
  – CS 695 Java & XML


Administrivia

• This is a programming class!
  – I assume you know how to:
    • do network (socket) programming
    • write a daemon
  – you can develop in any environment you want to…
    • a machine will be provided, deviate at your own risk
  – …but you will be graded only on the class machine
    • real programmers use unix
  – your grade will be determined solely by your server’s performance on 5 different checkpoints through the semester
• You will work in teams of one or two people
  – mixes (g/u, g/g, u/u) ok
  – assignments are the same regardless of group size


Administrivia 2

• Pick teams wisely
  – teams will exist by mutual consent only
  – at any time, teams can split up, but no new teams will be formed after the first assignment is due
    • no team member swaps
  – ex-team members will have access to their shared code base


Administrivia 3

• Important URLs
  – http://www.cs.odu.edu/~mklein/teaching/cs595-s10/
  – http://groups.google.com/group/cs595-s10
• Class homepage:
  – readings are listed under the day they are expected to be completed
  – assignments are listed under the day they will be demoed in class
  – each group will give a 3-4 minute status report the week before an assignment is due!


Grading

• 5 Assignments, 20 points each

• Dates of the in-class demos are posted

• Assignments lose 3 points for every 24 hours they are late


No WWW History

If you want to know more, read a book (irony intentional)


HTTP Developer’s Handbook

• Primary focus of this class will be reading & interpreting RFCs
  – RFCs are the technical documents that define how the web works
• But RFCs are not always the best resources to learn from
  – augment class slides + discussion with relevant sections from the class textbook


How To Read RFCs

(quoting from RFC 2119)

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

1. MUST This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.

2. MUST NOT This phrase, or the phrase "SHALL NOT", mean that the definition is an absolute prohibition of the specification.

3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

4. SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

5. MAY This word, or the adjective "OPTIONAL", mean that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)


Important Web Architecture Concepts (as defined by the Web Architecture)

remember:
• URIs identify Resources
• Representations represent Resources
• When URIs are dereferenced, they return representations (i.e., a resource is never returned)

taken from: http://www.w3.org/TR/webarch/


Uniform Resource Identifiers

URI (RFC 2396)
• URL (RFC 1738)
• URN (RFC 2141)


URI Schemes

  foo://username:password@example.com:8042/over/there/index.dtb;type=animal?name=ferret#nose
  \ /   \________________/\_________/ \__/            \___/ \_/ \_________/ \_________/ \__/
   |            |              |       |                |    |       |           |       |
   |        userinfo       hostname   port              |    |   parameter     query  fragment
   |    \_______________________________/ \_____________|____|____________/
scheme                  |                               | |  |
   |                authority                           |path|
   |                                                    |    |
   |            path                          interpretable as filename
   |   ___________|____________                              |
  / \ /                        \                             |
  urn:example:animal:ferret:nose                interpretable as extension

taken from: http://en.wikipedia.org/wiki/URI_scheme
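The component breakdown in the diagram can be reproduced with Python's standard `urllib.parse` module. This is only a sketch: the host `example.com` is assumed (it is obscured in the transcript), and the `components` dictionary is a name chosen here for illustration. Note that `urlsplit` leaves the `;type=animal` parameter attached to the path, so the sketch splits it off by hand.

```python
from urllib.parse import urlsplit

# Example URI from the diagram; the host "example.com" is an assumption.
uri = ("foo://username:password@example.com:8042"
       "/over/there/index.dtb;type=animal?name=ferret#nose")
parts = urlsplit(uri)

# urlsplit keeps ";type=animal" inside the path, so separate it manually.
path, _, parameter = parts.path.partition(";")

components = {
    "scheme": parts.scheme,
    "userinfo": f"{parts.username}:{parts.password}",
    "hostname": parts.hostname,
    "port": parts.port,
    "path": path,
    "parameter": parameter,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:10} {value}")
```

A server only ever sees the path, parameter, and query: the fragment is handled by the client and is not sent in the request.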


Terminology Highlights from RFC 2616 (section 1.3)

connection A transport layer virtual circuit established between two programs for the purpose of communication.

message The basic unit of HTTP communication, consisting of a structured sequence of octets matching the syntax defined in section 4 and transmitted via the connection.

request An HTTP request message, as defined in section 5.

response An HTTP response message, as defined in section 6.

resource A network data object or service that can be identified by a URI, as defined in section 3.2. Resources may be available in multiple representations (e.g. multiple languages, data formats, size, and resolutions) or vary in other ways.

entity The information transferred as the payload of a request or response. An entity consists of metainformation in the form of entity-header fields and content in the form of an entity-body, as described in section 7.

representation An entity included with a response that is subject to content negotiation, as described in section 12. There may exist multiple representations associated with a particular response status.


Terminology Highlights from RFC 2616 (section 1.3)

content negotiation The mechanism for selecting the appropriate representation when servicing a request, as described in section 12. The representation of entities in any response can be negotiated (including error responses).

variant A resource may have one, or more than one, representation(s) associated with it at any given instant. Each of these representations is termed a `variant'. Use of the term `variant' does not necessarily imply that the resource is subject to content negotiation.

client A program that establishes connections for the purpose of sending requests.

user agent The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.

server An application program that accepts connections in order to service requests by sending back responses. Any given program may be capable of being both a client and a server; our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.


Terminology Highlights from RFC 2616 (section 1.3)

origin server The server on which a given resource resides or is to be created.

validator A protocol element (e.g., an entity tag or a Last-Modified time) that is used to find out whether a cache entry is an equivalent copy of an entity.

upstream/downstream Upstream and downstream describe the flow of a message: all messages flow from upstream to downstream.


Intermediaries

• proxy
  – “a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI”
• gateway
  – “a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server's protocol”
• tunnel
  – “a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.”

definitions from section 1.4 of RFC 2616


No Intermediaries for Us

• For simplicity, we will ignore the possibility of intermediaries in our assignments
• No caching intermediaries
  – skip section 13 of RFC 2616
  – any caching activities will be on the part of the client


HTTP Operation

Client <-> Origin Server

request = (method, URI, version, “MIME-like” message)

response = (version, response code, “MIME-like” message)
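The request tuple above maps directly onto what a server has to pull out of the bytes it reads from the socket. The sketch below is a minimal illustration, not a complete parser: `parse_request` is a hypothetical helper name, it assumes the whole request is already in memory, and it ignores folded headers and repeated header names.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, uri, version, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method SP request-URI SP HTTP-version
    method, uri, version = lines[0].split(" ", 2)

    # "MIME-like" message headers: "Name: value", names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


example = (b"GET /~mklein/index.html HTTP/1.1\r\n"
           b"Connection: close\r\n"
           b"Host: www.cs.odu.edu\r\n"
           b"\r\n")
method, uri, version, headers, body = parse_request(example)
print(method, uri, version, headers)
```

A real server would also have to reject malformed request lines (a candidate for a 400 response) instead of letting the `split` raise.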


Talking to HTTP servers…

mk$ curl --head www.cs.odu.edu/~mklein/
HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 15:36:09 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Last-Modified: Mon, 11 Jan 2010 01:38:15 GMT
ETag: "640e2a-552-47cd9974d0fd9"
Accept-Ranges: bytes
Content-Length: 1362
Content-Type: text/html

mk$ curl --head www.google.com/
HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 15:43:10 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=93c27673a367c338:TM=1263397390:LM=1263397390:S=akzlDIbyLg9rjmww;expires=Fri, 13-Jan-2012 15:43:10 GMT; path=/; domain=.google.com
Server: gws
Transfer-Encoding: chunked

“curl” is convenient, but speaking raw HTTP is more fun…


GET

mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
GET /~mklein/index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu

HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 14:51:57 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Last-Modified: Mon, 11 Jan 2010 01:38:15 GMT
ETag: "640e2a-552-47cd9974d0fd9"
Accept-Ranges: bytes
Content-Length: 1362
Connection: close
Content-Type: text/html

<html>
<head>
<title>Martin Klein -- Old Dominion University</title>
</head>
<body>
…
[lots of html deleted]
…
</html>
Connection closed by foreign host.

Request (ends w/ CRLF)

Response
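The telnet session can also be scripted. The sketch below builds the same three-line request and, separately, sends it over a plain TCP socket. The function names `build_request` and `fetch` are made up for this example, and nothing here handles redirects, chunked encoding, or persistent connections. The detail to notice is that the request ends with an empty line, i.e. the last header's CRLF is followed by one more CRLF.

```python
import socket


def build_request(method: str, path: str, host: str) -> bytes:
    """Return a minimal HTTP/1.1 request as raw bytes."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line that terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, port: int = 80, method: str = "GET") -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(method, path, host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_request("GET", "/~mklein/index.html", "www.cs.odu.edu"))
```

Calling `fetch("www.cs.odu.edu", "/~mklein/index.html")` would return the status line, headers, and body as one byte string; because of `Connection: close`, end of stream marks the end of the response.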


HEAD

mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD /~mklein/index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu

HTTP/1.1 200 OK
Date: Wed, 13 Jan 2010 15:46:43 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Last-Modified: Mon, 11 Jan 2010 01:38:15 GMT
ETag: "640e2a-552-47cd9974d0fd9"
Accept-Ranges: bytes
Content-Length: 1362
Connection: close
Content-Type: text/html

Connection closed by foreign host.


OPTIONS

AIHT:~/Desktop/cs595-s06 mln$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
OPTIONS /~mln/index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu

HTTP/1.1 200 OK
Date: Mon, 09 Jan 2006 17:16:46 GMT
Server: Apache/1.3.26 (Unix) ApacheJServ/1.1.2 PHP/4.3.4
Content-Length: 0
Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE
Connection: close

Connection closed by foreign host.


Response Codes

- 1xx: Informational - Request received, continuing process

- 2xx: Success - The action was successfully received, understood, and accepted

- 3xx: Redirection - Further action must be taken in order to complete the request

- 4xx: Client Error - The request contains bad syntax or cannot be fulfilled

- 5xx: Server Error - The server failed to fulfill an apparently valid request

from section 6.1.1 of RFC 2616
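Because the class is carried entirely by the first digit, a client or a log analyzer can classify any status code with integer division. A small sketch (the names `STATUS_CLASSES` and `status_class` are illustrative only):

```python
# Class names as listed in section 6.1.1 of RFC 2616, keyed by first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the class name for a three-digit HTTP status code."""
    try:
        return STATUS_CLASSES[code // 100]
    except KeyError:
        raise ValueError(f"not an HTTP status code: {code}") from None


for code in (200, 301, 404, 501):
    print(code, status_class(code))
```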


Other Responses - ex. 501

mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
NOTAREALMETHOD /index.html HTTP/1.1
Connection: close
Host: www.cs.odu.edu

HTTP/1.1 501 Method Not Implemented
Date: Wed, 13 Jan 2010 14:59:57 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Allow: GET,HEAD,POST,OPTIONS,TRACE
Content-Length: 320
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>501 Method Not Implemented</title>
</head><body>
<h1>Method Not Implemented</h1>
<p>NOTAREALMETHOD to /index.html not supported.<br />
</p>
<hr>
<address>Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11 Server at www.cs.odu.edu Port 80</address>
</body></html>
Connection closed by foreign host.

come up with your own examples for:
• 400
• 403
• 404
• 505
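On the server side, producing a response like the 501 above comes down to a status line, a block of headers, an empty line, and an optional body. The sketch below shows one possible shape; `build_response`, `respond_to_method`, the `SUPPORTED` tuple, and the `Server` string are all assumptions made for illustration, and the reason phrases are the RFC 2616 ones (Apache's "Method Not Implemented" wording differs). The `include_body` flag shows how a HEAD response can reuse the same headers without the body.

```python
import html
from email.utils import formatdate

# RFC 2616 reason phrases for the codes mentioned on the slide.
REASONS = {
    200: "OK",
    400: "Bad Request",
    403: "Forbidden",
    404: "Not Found",
    501: "Not Implemented",
    505: "HTTP Version Not Supported",
}
SUPPORTED = ("GET", "HEAD", "OPTIONS")  # assumed method set for this sketch


def build_response(status, body=b"", extra_headers=None, include_body=True):
    """Assemble status line + headers + blank line (+ body) as bytes."""
    headers = {
        "Date": formatdate(usegmt=True),  # RFC 1123 format, e.g. "... GMT"
        "Server": "cs595-sketch/0.1",     # hypothetical server name
        "Content-Length": str(len(body)),
        "Connection": "close",
    }
    headers.update(extra_headers or {})
    head = [f"HTTP/1.1 {status} {REASONS[status]}"]
    head += [f"{name}: {value}" for name, value in headers.items()]
    raw = ("\r\n".join(head) + "\r\n\r\n").encode("iso-8859-1")
    return raw + (body if include_body else b"")


def respond_to_method(method: str) -> bytes:
    """Answer 501 with an Allow header for methods we do not implement."""
    if method not in SUPPORTED:
        text = f"<h1>Not Implemented</h1><p>{html.escape(method)} not supported.</p>"
        return build_response(
            501,
            text.encode("iso-8859-1", "replace"),
            {"Allow": ", ".join(SUPPORTED),
             "Content-Type": "text/html; charset=iso-8859-1"},
        )
    return build_response(200)


print(respond_to_method("NOTAREALMETHOD").decode("iso-8859-1"))
```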


Other Responses - ex. 301

mk$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
GET /~mklein HTTP/1.1
Connection: close
Host: www.cs.odu.edu

HTTP/1.1 301 Moved Permanently
Date: Wed, 13 Jan 2010 15:52:40 GMT
Server: Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11
Location: http://www.cs.odu.edu/~mklein/
Content-Length: 333
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.cs.odu.edu/~mklein/">here</a>.</p>
<hr>
<address>Apache/2.2.14 (Unix) DAV/2 PHP/5.2.11 Server at www.cs.odu.edu Port 80</address>
</body></html>
Connection closed by foreign host.


Date/Time Format

from 3.3.1 RFC 2616

Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format

…

HTTP/1.1 clients and servers that parse the date value MUST accept all three formats (for compatibility with HTTP/1.0), though they MUST only generate the RFC 1123 format for representing HTTP-date values in header fields. See section 19.3 for further information.

for simplicity, we’ll assume our clients will only generate RFC 1123 date/times
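For reference, accepting all three formats while generating only RFC 1123 is a few lines with `datetime.strptime`. This is a sketch: the helper names are invented here, and it assumes an English/C locale, since `%a`, `%A`, and `%b` are locale-dependent.

```python
from datetime import datetime

HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850, obsoleted by RFC 1036
    "%a %b %d %H:%M:%S %Y",       # ANSI C's asctime() format
)


def parse_http_date(value: str) -> datetime:
    """Try each accepted HTTP-date format in turn (all times are GMT)."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized HTTP-date: {value!r}")


def format_http_date(moment: datetime) -> str:
    """Always generate the RFC 1123 form."""
    return moment.strftime(HTTP_DATE_FORMATS[0])


for text in ("Sun, 06 Nov 1994 08:49:37 GMT",
             "Sunday, 06-Nov-94 08:49:37 GMT",
             "Sun Nov  6 08:49:37 1994"):
    print(format_http_date(parse_http_date(text)))
```

The parsed value is what a server would compare against a file's modification time when handling conditional requests.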


Things to Think About for Your Server

• Configuration files
  – should not have to recompile for trivial changes
• Logging
  – real HTTP servers log their events
  – you’ll need logging for debugging
    • consider concurrent logs with varying verbosity
• MIME types
  – most servers use a separate file (specified in your config file!) to map file extensions to MIME types
• Claim HTTP/1.1
  – even though we’ll not fully satisfy all requirements

• What does it mean to GET a directory?
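As one illustration of the MIME types point, the sketch below reads a mapping in the layout Apache's `mime.types` file uses (one type per line followed by its extensions, `#` for comments) and looks up a `Content-Type` by file extension. The sample entries, the function names, and the `application/octet-stream` fallback are assumptions for this example; in a real server the text would come from the file named in your config.

```python
import os


def load_mime_types(text: str) -> dict:
    """Parse 'type ext1 ext2 ...' lines into an extension -> type mapping."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


def content_type_for(path: str, mapping: dict,
                     default: str = "application/octet-stream") -> str:
    """Pick a Content-Type from the file extension, with a fallback."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return mapping.get(ext, default)


# Sample data for the sketch; normally read from the configured file.
SAMPLE = """\
# MIME type    extensions
text/html      html htm
text/plain     txt
image/jpeg     jpeg jpg
"""
MIME_TYPES = load_mime_types(SAMPLE)

for name in ("/index.html", "photo.JPG", "README"):
    print(name, content_type_for(name, MIME_TYPES))
```

Keeping the table in a file means a new extension can be supported without recompiling or redeploying the server.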


What We Will Learn This Semester

• Fundamental knowledge about how HTTP works
  – your future career is likely to involve web programming
• Working with others, explaining your results to colleagues
  – in real life, tasks are rarely performed in isolation
• How to read & interpret technical specifications and translate them into code
  – in real life, interesting problems are ambiguous & messy
• The importance of good, extensible design early in a software project
  – in real life, writing code from scratch is an uncommon luxury


Side Effect: You’ll Be Well Prepared for REST Programming

• REST == Representational State Transfer
  – http://en.wikipedia.org/wiki/Representational_State_Transfer
  – in contrast with RPC-style web applications:

RPC: foo.com/bigApp.jsp?verb=showThing&id=123
REST: foo.com/thing/123 (w/ GET method)

RPC: foo.com/bigApp.jsp?verb=editThing&id=123
REST: foo.com/thing/123 (w/ PUT method)

RPC: foo.com/bigApp.jsp?verb=newThing
REST: foo.com/thing/ (w/ POST method)
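The contrast can be made concrete with a tiny dispatcher: in the REST style the HTTP method plus the URI select the action, so no `verb` query parameter is needed. This is a hypothetical sketch; `route`, `ACTIONS`, and the `/thing/` pattern simply mirror the three pairs above and are not part of any framework.

```python
import re

# Matches "/thing/" (the collection) and "/thing/123" (one item).
THING_URI = re.compile(r"^/thing/(?P<id>\d*)$")

# (method, has_id) -> the RPC verb it replaces, taken from the pairs above.
ACTIONS = {
    ("GET", True): "showThing",
    ("PUT", True): "editThing",
    ("POST", False): "newThing",
}


def route(method: str, path: str):
    """Return (action, thing_id), or None if nothing matches."""
    match = THING_URI.match(path)
    if match is None:
        return None
    thing_id = match.group("id") or None
    action = ACTIONS.get((method, thing_id is not None))
    if action is None:
        return None
    return action, thing_id


print(route("GET", "/thing/123"))
print(route("PUT", "/thing/123"))
print(route("POST", "/thing/"))
```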


To Do for Next Time…

• Subscribe to the class email list
• Submit group info to class list
  – I’ll assign each group a unique port: 70XX
    • XX = group # in the class
    • your server will be accessible as: http://mln-web.cs.odu.edu:70XX/
    • Apache still available as: http://mln-web.cs.odu.edu/~mln/
• Submit preferred development language / environment for your group
  – 1 Unix development machine (mln-web.cs.odu.edu) will be made available; we’ll try to get whatever (Unix) languages you want