internet content html, mime, http

21
F2 1 Internet Content HTML, CSS, XML, XHTML, MIME, HTTP Cristian Bogdan 2D2052 / 2D1335

Upload: others

Post on 03-Feb-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

F2 1

Internet ContentHTML, CSS, XML, XHTML,

MIME, HTTPCristian Bogdan

2D2052 / 2D1335

F2 2

Objectives

• What HTML is, what are its origins, and where can one find information about it

• Modern forms of HTML: CSS, XHTML• Describing internet data: XML• Understanding how different types of

content are dealt with in the Internet (MIME)

• HTTP, internet protocols, proxies

F2 3

HTML and SGML• Hypertext Markup Language• Markup:

– Classic usage: editing <b>text marked up</b>– Correcting, data description and other usages

• Markup languages have a long history– Runoff, troff, nroff (unix man pages)– TeX, excellent to write books with (Knuth),

LaTeX– SGML, the origin of HTML, looks like today’s

XML. SGML describes data

F2 4

SGML/XML example• This describes just data• It does not specify how

the data will be presented(what color, whatalignment, etc)

• All text is inside a <tag>• There is a separate

document that specifieswhat tags are allowed in the SGML document, in what order are theyallowed, etc. Can be usedto validate the SGML

<EMail><sender><person>

<firstname> Karen </firstname><lastname> Lemone </lastname>

</person></sender><receiver><person>

<distributionList> cs525@cs </distributionList>

</person></receiver><contents>Don't you agree this is really ugly?

</contents></EMail>

F2 5

SGML validation document (DTD)<!doctype EMail [ <!element EMail (sender, receiver,contents)><!element sender (person) ><!element receiver (person)+ > <!element person (distributionlist) | (firstname, middlename?,

lastname )><!element (firstname,

middlename,lastname) (#PCDATA)>

<!element distributionlist(#PCDATA)>

<!element contents (#PCDATA)>]>

F2 6

HTML• Describes how webpages are visualised• A ”web browser” reads the description and

interprets it• http://www.w3.org/History.html

– Tim Bernes-Lee at CERN, but earlier versions existedin the ARPA project or in Douglas Engebart’s NLS project in 1965

• HTML is developped by W3C (WWW constium)– HTML, CSS, XML– Latest version is called XHTML 1.0– Previous version is called HTML 4.0

F2 7

<html><head><title>HTML</title></head><body><!-- the above may be missing --><H1>HTML</H1><p>This is a short presentation of <b><u>HTML</u></b>. Its

main points:</p><ul><li>Unlike SGML and XML, HTML describes how data is

<i>presented</i>, not what the data <i>is</i>. It is thusan editing markup, much like

<a href="http://www.tex.ac.uk/cgi-bin/texfaq2html">TeX</a><ul>

<li>There can be text outside any tag, though this won'tbe validated, it will "work"</li>

<li>You can't write a validator document</li><li>If a HTML document is invalid e.g. by not having

correct tag order, or missing tags, it will be presentedanyway. A closing tag is missing right here</ul></li>

<li>As in other markup languages, some tags can only appearinside other tags (e.g. &lt;li&gt; can only appear inside&lt;body&gt;)</li>

<li> A text fragment in a document can link to otherdocuments, or to a specific place in the document.</li>

</ul></body></html>

F2 8

HTMLThis is a short presentation of HTML. Its main points:• Unlike SGML and XML, HTML describes how data is presented, not

what the data is. It is thus an editing markup, much like TeX– There can be text outside any tag, though this won't be validated, it will

"work" – You can't write a validator document – If a HTML document is invalid e.g. by not having correct tag order, or

missing tags, it will be presented anyway. A closing tag is missing right here

• As in other markup languages, some tags can only appear inside other tags (e.g. <li> can only appear inside <body>)

• A text fragment in a document can link to other documents, or to a specific place in the document.

F2 9

HTML tags• HTML reference http://www.htmlhelp.com/reference/html40/• Also http://www.w3schools.com• Comments <!-- … --> and special characters &lt; • Parts of the HTML document

– Header, describes general information about the document, e.g. the title, author, generator, etc.

– Body, the document content OR– Frameset, describes a set of frames– If the header is missing, the whole content is interpreted as body

• We look at the organizational list of HTML tags http://www.htmlhelp.com/reference/html40/olist.html

• <a href > links can be absolute or relative. It is good practice to use relative links to pages in the same site, because then moving the site to anotherserver or directory is easy

– <a href=”docInSameDir.html”>…– <a href=”dir/docInAnotherDir.html”>…– <a href=”../..dir2/docInAnUpperDir.html”>…– BASE can be used to indicate the base for all relative links

F2 10

HTML tags cont’d• Other useful head tags: TITLE (most used),• We will understand later: META (used to simulate HTTP headers),

STYLE (to use cascading style sheets, CSS)• In the body: <P> describes a paragraph. If there’s no tag around a

text fragment, this is the choice. <BR> for a line break• Anchors: <A name=”here”> marks a place in the document, which

can be later accessed using documentName.html#placeName, like http://www.nada.kth.se/kurser/kth/2D2052/ingint05/index.html#Lectures

• H1… H6 heading levels, like in e.g. Word documents– Other styles can be defined similar with Word styles. See CSS

• If you want spaces and newlines to matter, use <PRE>• Text appearance can be changed like in text editors with <B>, <I>,

<FONT >, <BIG>, <SMALL>,<SUB> , <SUP>, <TT>, <S>• <TABLE>, table row: <TR>, table header <TH>, table cell <TD>

F2 11

HTML validator

• A program that checks if you HTML is correct

• There are many on the web, just give themyour URL and they will do the analysis

• http://www.htmlhelp.com/tools/validator/

F2 12

Cascading Style Sheets (CSS)<html><head>

<style type="text/css">h2 {text-decoration: overline}h4 {text-decoration: line-through}p {text-decoration: underline}a {text-decoration: none}</style>

</head>

• CSS can redefine the style of various HTML elements

• The style can be defined in the HTML file or in a separate CSS file, indicated in the <head><link rel="stylesheet"

type="text/css" href=“thestyle.css" />

• Or defined directly<p style="color: sienna;

margin-left: 20px"> This is a paragraph </p>

• http://www.w3schools.com/css/css_examples.asp

F2 13

CSS cont’d<body><h2>This is header 2</h2><h4>This is header 4</h4>

<p>This is some text in a paragraph</p>

<p><a href="http://www.w3schools.com">This is a link</a></p>

</body></html>

F2 14

XML• XML is a way of describing data, according to a DTD

(like SGML)• There are no predefined XML tags (like in HTML). One

has to describe their own using e.g. a DTD• XML doesn’t do anything in itself• However, data from one XML document can be

presented in an indefinite number of ways• Or XML can be used to exchange data• Or to express configuration of software in a rich,

hierarchical manner• Or even as program source code• An example of XML application: RSS (really simple

syndicating). See for examplehttp://www.mozilla.org/products/firefox/live-bookmarks.html

F2 15

XML example

<?xml version="1.0" encoding="ISO-8859-1"?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>

F2 16

XHTML• eXensible HTML• XHTML is designed to replace HTML• 1.0 is almost identical to HTML 4.1• A stricter and cleaner version

– There should be no text outside tags– All tags must be closed (e.g. <br> becomes <br/>)

and well-enclosed (<b><i>text</b></i> is illegal)• Basically XHTML is HTML defined as XML

application. See the DTD: http://www.w3.org/TR/xhtml1/dtds.html

F2 17

MIME: Multi Internet Mail Extension• An open standard describing how multimedia is sent via email

(initially) HTTP request, response, etc• Describes how the parts of the content are organized. A part can

contain other parts, etc. We look at an example• Describes a type for the data sent, for example

– text• plain• html

– news– postscript, pdf– zip– image

• jpeg, gif, tiff– audio– video

• mpeg• quicktime

F2 18

HTTP, Hypertext Transport Protocol

• Standard that describes how a web client(browser) and server communicate to exchangedata

• Uses MIME to encode data both from the browser to the server (request) and back (response)

• uses TCP/IP for data transfer• To get the index.html page the client sends a

request something likeGET /index.html HTTP/1.1

• See http://www.w3.org/Protocols

F2 19

HTTP structure• Command (GET, POST requests)GET /index.html HTTP/1.1• Headers (pairs name: value)Host: www.nada.kth.seAccept: */* (accept any MIME type)• Empty line• Content (in the declared content-type, nothing in

case of GET)• POST is a request that sends content

F2 20

Example of HTTP response• Command (Mostly OK)HTTP/1.1 200 OK• Headers (for example content-type, MIME)Date: Thu, 20 Jan 2005 17:31:29 GMTServer: Apache/1.3.26 (Unix) mod_jk/1.1.0 PHP/4.2.2 mod_perl/1.26 mod_ssl/2.8.10 OpenSSL/0.9.6d

Transfer-Encoding: chunkedContent-Type: text/html• Empty line• Content (the index.html file in our case)

F2 21

HTTP proxies• Some organizations wish to limit the HTTP accesses from

their network to other networks• To do that, all browsers will have to connect to a proxy

running on a host inside the local network. Otherconnections are restricted by a firewall

• The proxy ’looks’ like a normal HTTP server, but it doesn’tactually host any page. It is not limited by the firewall

• The proxy will connect to the desired host out in the Internet

• The proxy can decide to block connetion to some ”baned”host

• The proxy can also do caching: if a page is requested by a lot of people in the local network, it will be served faster

• The proxy adds special headers to the HTTP response• Some networks only require proxy for port 80. Links to

http://host:8080 may still work• Proxies can be defined for many other protocols