COS 109 Monday November 23 Housekeeping Lab 6 and Problem Set 7 due dates Lab 6 is due by midnight on Friday November 27 Problem Set 7 is due by 5 PM on Monday November 30 Because these deadlines have been extended, there will be no further extensions Final exam – January 18 (Monday) at 7:30PM Today’s class A few more words about the internet The World Wide Web

Upload: cori-hannah-malone

Post on 17-Jan-2016


COS 109 Monday November 23

• Housekeeping
  – Lab 6 and Problem Set 7 due dates
      Lab 6 is due by midnight on Friday November 27
      Problem Set 7 is due by 5 PM on Monday November 30
  – Because these deadlines have been extended, there will be no further extensions
  – Final exam: January 18 (Monday) at 7:30 PM
• Today's class
  – A few more words about the internet
  – The World Wide Web


Grades on Problem set 6

Average score 35.8; a few people did not complete the assignment


The geography of the internet


Internet Users WorldWide

• Internet Users (2014 Est.)
  – China 626M
  – European Union 398M
  – USA 276.6M
  – India 237.3M
  – Japan 109.3M
  – Brazil 108.2M
  – Russia 84.4M
  – Germany 70.3M
  – Nigeria 66.6M
  – Total worldwide 3.2B


The backbone of the internet

• http://upload.wikimedia.org/wikipedia/commons/d/d2/Internet_map_1024.jpg

• http://internet-map.net/


Let's register an internet domain

• http://www.directnic.com


Who manages this?

• Internet Corp. for Assigned Names and Numbers (ICANN)
  – formed in October 1998
  – non-profit, private-sector corporation
  – broad coalition of the Internet's business, technical, academic, and user communities
  – recognized by the U.S. and other governments as the global consensus entity to coordinate the technical management of the Internet's domain name system, the allocation of IP address space, the assignment of protocol parameters, and the management of the root server system
  – funded through the many registries and registrars that make up the global domain name and Internet addressing systems

• ICANN was formed in 1998. It is a not-for-profit public-benefit corporation with participants from all over the world dedicated to keeping the Internet secure, stable and interoperable. It promotes competition and develops policy on the Internet’s unique identifiers.*

• ICANN doesn’t control content on the Internet. It cannot stop spam and it doesn’t deal with access to the Internet. But through its coordination role of the Internet’s naming system, it does have an important impact on the expansion and evolution of the Internet.*

* From http://www.icann.org/en/about/


What does ICANN govern?

• DNS: domain name system
  – Relates names to numbers
• TLD: top level domains
  – Originally there were 7
      .com, .edu, .gov, .int, .mil, .net, .org
  – 200+ country code top level domains
  – 1000+ gTLD (generic top level domains)
  – .academy, .accountant, .apartments, .biz, .black, .cool, .dad, .money, .ooo, .sucks, .vodka, .xxx, .zone
  – More are here
• Management
  – One company (called a registry) is in charge of each TLD.
  – A large number of companies (called registrars) can sell (and manage) names within a TLD


How does ICANN govern?

• Draws up contracts with each registry
• Runs an accreditation system for registrars
• Oversees IP addresses (through companies)
• Oversees root servers
  – Root servers are 13 named addresses on the Internet that serve the root zone: the list of name servers for every TLD


What about the root servers?

• What do they do?
  – Ultimately resolve addresses
      With help from top level domains
      Example: cs.princeton.edu
        .edu TLD to find princeton
        princeton.edu to find cs.princeton.edu
  – But things change slowly, so
      There are intermediate name servers which cache addresses
      Very few address queries actually come to a root server.
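The lookup chain on this slide can be sketched as a toy simulation. This is only an illustration of the idea, not how real DNS software works: the three lookup tables, the server names in them, the `resolve` function, and the 192.0.2.10 address (taken from a range reserved for documentation) are all made up for this example.

```python
# Toy model of hierarchical name resolution with a caching resolver.
# All tables, server names, and addresses below are hypothetical.

ROOT = {"edu": "edu-tld-server"}  # the root knows who runs each TLD
TLD_SERVERS = {"edu-tld-server": {"princeton.edu": "princeton-name-server"}}
DOMAIN_SERVERS = {"princeton-name-server": {"cs.princeton.edu": "192.0.2.10"}}

cache = {}         # what an intermediate name server remembers
root_queries = 0   # how many lookups actually reached the root


def resolve(name):
    """Return the (made-up) address for name, consulting the cache first."""
    global root_queries
    if name in cache:
        return cache[name]
    labels = name.split(".")
    tld = labels[-1]                  # "edu"
    domain = ".".join(labels[-2:])    # "princeton.edu"
    root_queries += 1
    tld_server = ROOT[tld]                            # root: who handles .edu?
    domain_server = TLD_SERVERS[tld_server][domain]   # .edu: who handles princeton.edu?
    address = DOMAIN_SERVERS[domain_server][name]     # princeton.edu: where is cs?
    cache[name] = address
    return address


first = resolve("cs.princeton.edu")
second = resolve("cs.princeton.edu")  # answered from the cache
print(first, second, root_queries)
```

The second call never reaches the root table, which is the point of the caching bullet: repeated queries are absorbed by intermediate servers.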


List of root servers

Hostname IP Addresses Manager

a.root-servers.net 198.41.0.4, 2001:503:ba3e::2:30 VeriSign, Inc.

b.root-servers.net 192.228.79.201, 2001:500:84::b University of Southern California (ISI)

c.root-servers.net 192.33.4.12, 2001:500:2::c Cogent Communications

d.root-servers.net 199.7.91.13, 2001:500:2d::d University of Maryland

e.root-servers.net 192.203.230.10 NASA (Ames Research Center)

f.root-servers.net 192.5.5.241, 2001:500:2f::f Internet Systems Consortium, Inc.

g.root-servers.net 192.112.36.4 US Department of Defense (NIC)

h.root-servers.net 128.63.2.53, 2001:500:1::803f:235 US Army (Research Lab)

i.root-servers.net 192.36.148.17, 2001:7fe::53 Netnod

j.root-servers.net 192.58.128.30, 2001:503:c27::2:30 VeriSign, Inc.

k.root-servers.net 193.0.14.129, 2001:7fd::1 RIPE NCC

l.root-servers.net 199.7.83.42, 2001:500:3::42 ICANN

m.root-servers.net 202.12.27.33, 2001:dc3::35 WIDE Project


Root servers

• Some are fixed in location (unicast)
• Others are distributed (anycast)
  – Queries are routed to the topologically closest of a group of receivers all identified by the same destination address.
  – So, a decentralized service is provided.
  – Anycast servers can spread the load of a distributed denial of service (DDoS) attack and so reduce its impact.


And where are they?

Details at http://www.root-servers.org/


Peering points

• There are several hundred such points
• Largest is Deutscher Commercial Internet Exchange (DE-CIX) with 650+ members, a peak speed of 5000 Gbit/sec (average speed 3000 Gbit/sec) of connected capacity, an average throughput of 1061 Gbit/sec, and 100% up time since 1997


Summarizing internet Ideas

• packets versus circuits
  – different models (mail vs phone)
• names and addresses
  – what is a computer called, how to find it
• routing
  – how to get from here to there
• protocols and standards
  – Internet works because of IP as common mechanism
      higher level protocols all use IP
      specific hardware technologies carry IP packets
• layering
  – divide system into layers
      each of which provides services to next higher level
      while calling on service of next lower level
  – a way to organize and control complexity, hide details


Summarizing internet technical issues:

• privacy & security are hard
  – data passes through shared unregulated dispersed media and sites scattered over the whole world
  – it's hard to control access & protect information along the way
  – many network technologies (e.g., Ethernet, wireless) use broadcast
      encryption necessary to maintain privacy
  – many mechanisms are not robust against intentional misuse
  – it's easy to lie about who you are
• service guarantees are hard
  – no assurance of reliable delivery, let alone of bandwidth, delay or jitter
• some resources are running low
  – IPv4 addresses are pretty much all assigned
  – IPv6 (the next generation) uses 128-bit addresses
      acceptance growing, by necessity
• but it has handled exponential growth amazingly well
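To put the address-shortage bullet in numbers, here is a short calculation. It assumes the standard sizes (IPv4 addresses are 32 bits, IPv6 addresses are 128 bits, as stated above) and uses Python's standard `ipaddress` module only as a cross-check; the variable names are just for this sketch.

```python
import ipaddress

ipv4_total = 2 ** 32    # IPv4 addresses are 32 bits
ipv6_total = 2 ** 128   # IPv6 addresses are 128 bits

# Cross-check: the networks that contain every address have the same sizes.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:.3e} addresses")
print(f"IPv6 has 2**{128 - 32} times as many addresses as IPv4")
```

With roughly 3.2 billion users in the 2014 estimate and about 4.3 billion IPv4 addresses in total, it is easy to see why the IPv4 space is nearly exhausted.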


To summarize

• How the internet works
• And now that we've reached the end of the internet


Website of the day

• google trends


Moving above internet pipes -- information flows to apps


Higher level protocols

• SSH: secure login
• SMTP: mail transfer
• HTTP: hypertext transfer -> Web
• protocol layering:
  – a single protocol can't do everything
  – higher-level protocols build elaborate operations out of simpler ones
  – each layer uses only the services of the one directly below
  – and provides the services expected by the layer above
  – all communication is between peer levels: layer N destination receives exactly the object sent by layer N source

[Figure: protocol layers, top to bottom: application, reliable transport service, connectionless packet delivery service, physical layer]


Encapsulation

• each piece of data at one level is wrapped up with a header and sent as a packet at the next lower level
• lowest level is what moves across specific network

[Figure: data wrapped in turn with HTTP, TCP, IP, and ether headers]
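Encapsulation can be sketched in a few lines. This is a simplified illustration, assuming fake human-readable headers such as `[TCP]`; real headers are binary structures with many fields, and the `wrap` and `unwrap` helpers are hypothetical names invented for this example.

```python
# Each layer prepends its own (fake, human-readable) header to whatever
# the layer above handed it. Real headers are binary and much richer.

def wrap(header, payload):
    """Add one layer's header in front of the payload."""
    return f"[{header}]".encode() + payload


def unwrap(header, packet):
    """Strip one layer's header, checking that it is the expected one."""
    prefix = f"[{header}]".encode()
    if not packet.startswith(prefix):
        raise ValueError(f"expected {header} header")
    return packet[len(prefix):]


LAYERS = ["HTTP", "TCP", "IP", "ether"]  # top of the stack first

data = b"hello"
packet = data
for layer in LAYERS:              # going down the stack on the sender
    packet = wrap(layer, packet)

received = packet
for layer in reversed(LAYERS):    # going up the stack on the receiver
    received = unwrap(layer, received)

print(packet)
print(received)
```

Each receiving layer removes only its own header and hands the rest upward, so the application at the destination sees exactly the bytes the sending application produced.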


One particular app – the (World Wide) Web

• a way to connect computers that provide information (servers) with computers that ask for it (clients like you and me)
  – uses the Internet, but it's not the same as the Internet
• URL (uniform resource locator, e.g., http://www.amazon.com)
  – a way to specify what information to find, and where
• HTTP (hypertext transfer protocol)
  – a way to request specific information from a server and get it back
• HTML (hypertext markup language)
  – a language for describing information for display
• browser (Firefox, Safari, Internet Explorer, Opera, Chrome, …)
  – a program for making requests, and displaying results
• embellishments
  – pictures, sounds, movies, ...
  – loadable software
• the set of everything this provides


Web history

• 1989: Tim Berners-Lee at CERN
  – a way to make physics literature and research results accessible on the Internet
• 1991: first software distributions
• Feb 1993: Mosaic browser
  – Marc Andreessen at NCSA (Univ of Illinois)
• Mar 1994: Netscape
  – first commercial browser
• technical evolution managed by World Wide Web Consortium
  – non-profit organization at MIT, Berners-Lee is director
  – official definition of HTML and other web specifications
  – see www.w3.org


HTTP: Hypertext transfer protocol

• What happens when you click on a URL?
• client opens TCP/IP connection to host, sends request
      GET /filename HTTP/1.0
• server returns
  – header info
  – HTML
• since server returns the text, it can be created as needed
  – can contain encoded material of many different types (MIME)
• URL format
      service://hostname/filename?other_stuff
• filename?other_stuff part can encode
  – data values from client (forms)
  – request to run a program on server (cgi-bin)
  – anything else

[Figure: client sends "GET url" to server; server returns HTML]
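The URL format and the GET request can be tied together with a small sketch. It uses `urlsplit` from Python's standard library to pull a URL apart and then builds the request text a client might send; nothing is sent over the network. The example URL and the `build_get_request` helper are hypothetical, and the `Host:` line is a common addition rather than something the slide shows.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, request_text) for a simple HTTP/1.0 GET of url."""
    parts = urlsplit(url)
    path = parts.path or "/"           # an empty path means the top page
    if parts.query:
        path += "?" + parts.query      # the "other_stuff" after the "?"
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.hostname}\r\n\r\n"
    return parts.hostname, request


url = "http://www.example.com/cgi-bin/search?q=cookies"  # hypothetical URL
parts = urlsplit(url)
host, request = build_get_request(url)

# service://hostname/filename?other_stuff
print(parts.scheme, parts.hostname, parts.path, parts.query)
print(request)
```

A real client would next open a TCP connection to `host` (port 80 for plain HTTP), send `request`, and read back the header lines followed by the HTML.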


Embellishments

• original design of HTTP just returns text to be displayed
• now includes pictures, sound, video, ...
  – need helpers or plug-ins to display non-text content
      e.g., GIF, JPEG graphics; sound; movies
• forms filled in by user
  – need a program on the server to interpret the information (cgi-bin)
• cookies to remember information on client
  – HTTP is stateless: server doesn't save anything from one request to next
  – cookies are a way to remember information at the client
• active content: download code to run on the client
  – Javascript
  – Java applets
  – plug-ins
  – ActiveX


Forms and CGI programs

• "common gateway interface"
  – standard way to request the server to run a program
  – using information provided by the client via a form
• if the target file on server is an executable program
• and it has the right properties and permissions
  – e.g., in /cgi-bin directory and executable
• then run it on server to produce HTML to send back to client
  – using the contents of the form as input
  – output depends on client request: created on the fly, not just a file
• CGI programs can be written in any programming language
  – Perl, Python, PHP, Java, Ruby, …
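The core of a CGI-style program is small: read the form data, then print a header and some HTML. The following is a minimal sketch of that step only, assuming the form arrives as a URL-encoded string; the `handle_form` function and the sample values are hypothetical, and a real CGI program would read its input from the environment or standard input rather than from a function argument.

```python
from html import escape
from urllib.parse import parse_qs


def handle_form(body):
    """Turn a URL-encoded form submission into an HTTP-style reply."""
    fields = parse_qs(body)                        # e.g. {"name": ["Ann Lee"], ...}
    name = fields.get("name", ["anonymous"])[0]
    rate = fields.get("rate", ["(no rating)"])[0]
    # escape() keeps user-supplied text from being interpreted as HTML
    return (
        "Content-Type: text/html\r\n\r\n"
        f"<html><body>Thanks, {escape(name)}. "
        f"You rated this page: {escape(rate)}</body></html>"
    )


# Made-up submission with fields called name, address, and rate.
reply = handle_form("name=Ann+Lee&address=ann%40example.com&rate=good")
print(reply)
```

Because the reply is built from the submitted values, the page is created on the fly for each request, which is the difference from serving a fixed file.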


Example form in HTML (dpd.mycpanel2.princeton.edu/mailform.html)

<html>
<body>
<form METHOD="post"
      ACTION="http://dpd.mycpanel2.princeton.edu/zcgi-bin/mailform.cgi">
<input type="hidden" name="email" value="[email protected]">
Your name: <input type="text" name="name"><p>
Your email: <input type="text" name="address"><p>
Please rate this page:<p>
<input type="radio" name="rate" value="poor"> Poor
<input type="radio" name="rate" value="ok"> OK
<input type="radio" name="rate" value="good"> Good <p>
<input type="submit">
<input type="reset">
</form>
</body>
</html>
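When the user presses the submit button, the browser packs the form fields into one URL-encoded string and sends it as the body of the POST request. The sketch below shows that encoding step with Python's standard `urlencode`; the field names come from the form, but every value, including the hidden `email` value `owner@example.com`, is a placeholder invented for this example.

```python
from urllib.parse import urlencode

# Field names match the form; all values are made-up placeholders.
fields = {
    "email": "owner@example.com",   # hidden field (real value not shown here)
    "name": "Ann Lee",
    "address": "ann@example.com",
    "rate": "good",                 # the selected radio button
}

body = urlencode(fields)  # spaces become "+", "@" becomes "%40"
print(body)
print(len(body), "bytes would be sent as the POST body")
```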


Cookies

• HTTP is stateless: doesn't remember from one request to next
• cookies intended to deal with stateless nature of HTTP
  – remember preferences, manage "shopping cart", etc.
• cookie: one chunk of text sent by server to be stored on client
  – stored in browser while it is running (transient)
  – stored in client file system when browser terminates (persistent)
• when client reconnects to same domain, browser sends the cookie back to the server
  – sent back verbatim; nothing added
  – sent back only to the same domain that sent it originally
  – contains no information that didn't originate with the server
• in principle, pretty benign
• but heavily used to monitor browsing habits, for commercial purposes
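The round trip described above can be sketched with Python's standard `http.cookies` module. This is a simplified model, not a real browser: the `jar` dictionary, the `cookie_header_for` helper, the cookie name `cart`, and the domains `shop.example` and `other.example` are all hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: attach a cookie to a response.
outgoing = SimpleCookie()
outgoing["cart"] = "item42"                  # hypothetical name and value
outgoing["cart"]["domain"] = "shop.example"
set_cookie_header = outgoing.output()        # the header line the server sends

# Client side: the browser stores the cookie under the domain that set it.
jar = {"shop.example": {"cart": outgoing["cart"].value}}


def cookie_header_for(domain):
    """Return the Cookie header value the browser would send to domain."""
    stored = jar.get(domain, {})
    return "; ".join(f"{name}={value}" for name, value in stored.items())


# Server side again: parse what the browser sent back, unchanged.
incoming = SimpleCookie()
incoming.load(cookie_header_for("shop.example"))

print(set_cookie_header)
print(repr(cookie_header_for("shop.example")))
print(repr(cookie_header_for("other.example")))   # nothing for other domains
```

The value comes back verbatim and only to the domain that set it, matching the "pretty benign" description; the tracking concern arises from how widely a single domain's content is embedded.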


Cookie crumbs

• fetch a page from xyz.com
  – it contains <img src=http://doubleclick.com/advt.gif>
  – this causes a page to be fetched from DoubleClick.com
  – which now knows your IP address and what page you were looking at
• DoubleClick sends back a suitable advertisement
  – with a cookie that identifies "you" at DoubleClick
• next time you fetch any page that contains a DoubleClick.com image
  – the last DoubleClick cookie is sent back to DoubleClick
  – the set of sites and images that you are viewing is used to
      - update the record of where you have been and what you have looked at
      - send back targeted advertising (and a new cookie)
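The tracking loop can be modeled in a few lines. This is a toy sketch under stated assumptions: the `AdServer` class, its `fetch_image` method, the `id1`-style cookie values, and the page names are all invented for illustration and do not describe how any real ad network is implemented.

```python
import itertools


class AdServer:
    """Toy third-party ad server that recognizes browsers by a cookie."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.history = {}   # cookie id -> pages where its images were shown

    def fetch_image(self, cookie, referring_page):
        """Serve an ad image; return the cookie sent back with it."""
        if cookie is None:                     # first contact: issue an id
            cookie = f"id{next(self._ids)}"
        self.history.setdefault(cookie, []).append(referring_page)
        return cookie


ads = AdServer()
browser_cookie = None   # the browser's stored cookie for the ad domain

for page in ["xyz.com/news", "abc.example/shoes", "xyz.com/sports"]:
    # each page embeds an image hosted on the ad server's domain
    browser_cookie = ads.fetch_image(browser_cookie, page)

print(browser_cookie, ads.history[browser_cookie])
```

Although each cookie is only returned to the domain that set it, the ad server's images appear on many unrelated sites, so one identifier ends up linked to a browsing history across all of them.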


Advertising marketplace

• advertising exchanges
  – Yahoo Right Media, Doubleclick Ad Exchange, Facebook Atlas ...
• a person uses a browser to request a web page
• web page "publisher" notifies exchange that advertising space on that page is available
  – publishers are typically portals or entertainment and news sites
  – publisher provides information about the person: past online activity, viewing and shopping habits, geographic location, demographics
      probably not actual identity (?)
• advertisers bid on the ad space
  – amount depends on person's attributes and location, advertiser's budget, etc.
• winner's advertisement is inserted into the page
• elapsed time: 10-100 milliseconds
• this happens for multiple advertisements on one page
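The bidding step can be sketched as a simple highest-bid-wins auction. This is an assumption made for illustration: the slide does not say what pricing rule real exchanges use, and the `run_auction` function, the advertiser names, the bid amounts, and the person's attributes are all made up.

```python
def run_auction(person, advertisers):
    """Return (winner_name, bid) for the highest bid, or None if nobody bids."""
    bids = []
    for name, bid_fn in advertisers.items():
        amount = bid_fn(person)       # each advertiser prices this person
        if amount > 0:
            bids.append((amount, name))
    if not bids:
        return None
    amount, name = max(bids)          # highest amount wins
    return name, amount


# What the publisher tells the exchange about the person (hypothetical).
person = {"location": "NJ", "interests": {"shoes", "travel"}}

# Each advertiser is modeled as a function from person to bid.
advertisers = {
    "ShoeCo": lambda p: 2.50 if "shoes" in p["interests"] else 0.10,
    "AirlineCo": lambda p: 1.75 if "travel" in p["interests"] else 0.0,
    "LocalPizza": lambda p: 0.80 if p["location"] == "NJ" else 0.0,
}

winner = run_auction(person, advertisers)
print(winner)
```

In practice this whole exchange has to finish within the 10-100 millisecond budget mentioned above, once for every ad slot on the page.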