LIS901N: URI Thomas Krichel 2003-01-??

LIS901N: URI

Thomas Krichel

2003-01-??

URIs (background)

• URI: “uniform resource identifier”
• Originally, a generalization of:
– URL (uniform resource locator),
– URN (uniform resource name),
– URC (uniform resource citation),
– and potentially others,
• but mainly, URL and URN

The difference (in theory) between URL and URN:

• a URL is bound to a location
– when the resource moves, the URL changes
• a URN is a name
– thus location independent, and, in theory, persistent (whatever “persistent” means)

The Other View

• The distinction between URL and URN is artificial
• Both terms should be abolished and replaced by “URI”
• thus all identifier “schemes” would be URI schemes (even “http”) and no prefix would be necessary (URL, URN, or even URI).

Reasoning

• Original URI philosophy:
– URLs were a short-term solution and URNs long-term.
– URL would be a temporary identification mechanism until a location-independent, persistent identifier was developed, the URN.
• Now it seems:
– URNs won’t be any more persistent than URLs.
– persistence is a social problem, not a technical problem

URI vs URL

• The term ‘URL’ or “Uniform Resource Locator” is not used in standards anymore. It generally means a URI that contains a domain name, but the term is historical only.
• This presentation uses the term URI exclusively.
• The term ‘URL’ is still sufficient to convey the meaning but should not be used when precision is necessary.

What does a URI identify?

• A URI identifies a Resource.
• A URI only comes into existence when it is bound to a Resource.
• A Resource is defined as anything that is identified by a URI.
• A Resource only comes into existence when a URI is bound to it.
• A URI cannot exist without a Resource.
• A Resource cannot exist without a URI.

it all comes from Plato

• The “URI identifies an abstract Resource” formalism assumes the Platonic concept of “form”.
• A Resource, once bound to a URI and brought into existence, is only the abstract ‘essence’ of the ‘real world’ thing we perceive.
• Any physical or digital version of that Resource is only one of all possible physical representations of that Resource.
• For example, http://openlib.org/home/krichel is a URI for a homepage. Using language and content negotiation it is possible to request that page in many languages and formats. Which version is the Resource?
• Answer: none of them. Each is only a representation. It is possible to assign a URI even to the representations. But even then, each Resource is only the abstraction of the physical or digital thing, not the thing itself.
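The idea that one URI can stand behind several representations can be sketched in Python. This is only an illustration: it builds two HTTP requests for the same URI with different Accept and Accept-Language headers and sends nothing over the network. The helper name negotiated_request is hypothetical, and whether the server actually honours these headers is an assumption.

```python
from urllib.request import Request

URI = "http://openlib.org/home/krichel"

def negotiated_request(uri, language, media_type):
    """Build (but do not send) a request for one representation of a Resource."""
    return Request(uri, headers={
        "Accept-Language": language,  # preferred language of the representation
        "Accept": media_type,         # preferred format of the representation
    })

german_html = negotiated_request(URI, "de", "text/html")
english_text = negotiated_request(URI, "en", "text/plain")

# Same URI, hence the same Resource; only the requested representation differs.
print(german_html.full_url == english_text.full_url)
print(sorted(german_html.header_items()))
print(sorted(english_text.header_items()))
```

Both request objects carry the identical URI, which mirrors the slide's answer: neither response would be "the Resource", each would be one representation of it.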

What is ‘resolution’?

• ‘Resolution’ means accessing some representation of the Resource that a URI identifies.
– For ‘http://foo.com/’ it means accessing the homepage of ‘foo.com’.
– For ‘mailto:[email protected]’ it can mean sending an email message to that address.
• For URIs that contain network location information it is simply a matter of visiting that location and performing some function. For example, ‘foo.com’ is the exact network host that can give you the web page.
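A minimal sketch of how resolution depends on the scheme, using Python's standard urllib.parse. Nothing is fetched or sent; the function only describes what resolving would involve. The function name describe_resolution and the address someone@example.org are hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit

def describe_resolution(uri):
    """Say, in words, what resolving this URI would involve (nothing is fetched)."""
    parts = urlsplit(uri)
    if parts.scheme in ("http", "https"):
        path = parts.path or "/"
        return f"contact host {parts.hostname!r} and request path {path!r}"
    if parts.scheme == "mailto":
        return f"send an email message to {parts.path!r}"
    return f"no resolution rule known here for scheme {parts.scheme!r}"

print(describe_resolution("http://foo.com/"))
print(describe_resolution("mailto:someone@example.org"))  # hypothetical address
print(describe_resolution("urn:isbn:0451450523"))         # illustrative URN
```

The last case hints at why URIs without network location information are harder to resolve: the string itself does not say which host to visit.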

The history

• Tim Berners-Lee came to the IETF in 1992 to develop the WorldWideWeb standards. At the time URIs were known as Universal Resource Locators.

• RFC 1738 “Uniform Resource Locators (URL)” was published in 1994.

• RFC 1738 was updated by RFC 1808, RFC 2368, RFC 2396.

• RFC 2396 “Uniform Resource Identifiers (URI): Generic Syntax” is the current standard.

• RFC 2396 may be updated to reflect developments in internationalization, terminology updates, and registration procedures.

Confusion…

• Due to misunderstandings and the formation of the W3C separately from the IETF, there was a long-term disagreement on certain aspects of URIs, especially when it came to Uniform Resource Names (URNs).

• A joint IETF/W3C URI Interest Group was formed in 2000 to investigate work that needed to be done with URIs in general.

• That group published “URIs, URLs, and URNs: Clarifications and Recommendations”, a report from the joint W3C/IETF URI Planning Interest Group (draft-mealling-uri-ig-01.txt), which begins to clarify the problems and proposes solutions.

URN Uniform Resource Names

• Are defined by RFC 2141 as a particular URI scheme with these characteristics:
– Permanent: once a URN is assigned to some Resource it can never be re-assigned to something else.
– Location independent: the actual URN should not contain any network location information such as domain names, IP addresses, file path names, etc.
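As a rough illustration, a URN in the RFC 2141 shape "urn:" namespace identifier ":" namespace-specific string can be split with a regular expression. The pattern below is a simplified assumption (it does not check the characters allowed in the namespace-specific string), the function name split_urn is hypothetical, and the ISBN value is only an example.

```python
import re

# Simplified shape "urn:<NID>:<NSS>"; the leading "urn" is matched case-insensitively.
URN_PATTERN = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(.+)$", re.IGNORECASE)

def split_urn(urn):
    """Return (namespace identifier, namespace-specific string), or None."""
    match = URN_PATTERN.match(urn)
    if match is None:
        return None
    nid, nss = match.groups()
    return nid.lower(), nss  # namespace identifiers are compared without case

print(split_urn("urn:isbn:0451450523"))
print(split_urn("http://foo.com/"))
```

Note that neither part of the split URN names a host, an address, or a file path, which is the location independence the slide describes.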

RFC2396

• Berners-Lee, Tim, Roy T. Fielding and Larry Masinter (1998) “Uniform Resource Identifiers (URI): Generic Syntax”, RFC 2396

• A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource.

• URIs provide a simple and extensible means for identifying a resource.

operations on a URI

• There is a set of operations that can be applied to URIs. For a URL, for example, one such operation is accessing the resource.

• To understand if a given URI instance is valid, we have to study the operations applied to URIs.

benefits of uniformity

• It allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ
• it allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers
• it allows introduction of new types of resource identifiers without interfering with the way that existing identifiers are used
• it allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large, and widely-used set of resource identifiers.

Resources and Identity in the RFC

• A resource can be anything that has identity. Not all resources are network “retrievable”. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time.

• An identifier is an object that can act as a reference to something that has identity. In the case of URI, the object is a sequence of characters with a restricted syntax.

URI, URL, & URN in the RFC

• A URI can be further classified as a locator, a name, or both. The term “Uniform Resource Locator” (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network “location”), rather than identifying the resource by name or by some other attribute(s) of that resource.

• The term “Uniform Resource Name” (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.

URN in the RFC

• A URN differs from a URL in that its primary purpose is persistent labeling of a resource with an identifier. That identifier is drawn from one of a set of defined namespaces, each of which has its own set name structure and assignment procedures. The “urn” scheme has been reserved to establish the requirements for a standardized URN namespace, as defined in “URN Syntax” (RFC 2141) and its related specifications.

transcribability

• The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters. A URI may be represented in a variety of ways.

consequences of transcribability

• A URI is a sequence of characters, which is not always represented as a sequence of octets.

• A URI may be transcribed from a non-network source, and thus should consist of characters that are most likely to be able to be typed into a computer, within the constraints imposed by keyboards (and related input devices) across languages and locales.

• A URI often needs to be remembered by people, and it is easier for people to remember a URI when it consists of meaningful components.

URI characters

• URIs consist of a restricted set of characters, not a sequence of octets. The allowable characters were primarily chosen to aid transcribability and usability both in computer systems and in non-computer communications. Characters used conventionally as delimiters around URIs are excluded.

• In the simplest case, the original character sequence contains only characters that are defined in US-ASCII, and the two levels of mapping (original characters to octets, and octets to URI characters) are simple and easily invertible: each 'original character' is represented as the octet for the US-ASCII code for it, which is, in turn, represented as either the US-ASCII character or else the “%” escape sequence for that octet.
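The two levels of mapping can be sketched as a small Python function: characters are first turned into octets, and each octet is then written either as its US-ASCII character or as a % escape. The function name to_uri_characters is hypothetical, and the use of UTF-8 for characters outside US-ASCII is an assumption made for this sketch, not something the slides prescribe.

```python
import string

# Characters written as themselves in this sketch (the unreserved set of the slides).
UNRESERVED = string.ascii_letters + string.digits + "-_.!~*'()"

def to_uri_characters(text, encoding="utf-8"):
    """Map original characters -> octets -> URI characters (simplified)."""
    octets = text.encode(encoding)            # level 1: characters to octets
    pieces = []
    for octet in octets:                      # level 2: octets to URI characters
        char = chr(octet)
        pieces.append(char if char in UNRESERVED else f"%{octet:02X}")
    return "".join(pieces)

print(to_uri_characters("krichel"))  # US-ASCII only: both mappings are trivial
print(to_uri_characters("a b"))      # the blank is excluded, so it is escaped
print(to_uri_characters("é"))        # one character, two octets under UTF-8
```

For plain US-ASCII input the output equals the input, which is the "simple and easily invertible" case the slide describes.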

reserved characters

• Many URIs include components consisting of, or delimited by, certain special characters. These characters are called “reserved”, since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.
• They are ; / ? : @ & = + $ ,
• They are allowed within a URI, but may not be allowed within a particular component of the generic URI syntax.
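A short sketch of what "conflict with the reserved purpose" means in practice, using Python's standard urllib.parse. The parameter name q and the value a&b=c are hypothetical; the point is only that "&" and "=" inside the data would otherwise be read as query delimiters.

```python
from urllib.parse import parse_qs, quote

data = "a&b=c"  # hypothetical value that happens to contain two reserved characters

unescaped_query = "q=" + data                 # "&" and "=" keep their reserved meaning
escaped_query = "q=" + quote(data, safe="")   # conflicting data escaped first

print(parse_qs(unescaped_query))  # the value is split into two parameters
print(escaped_query)
print(parse_qs(escaped_query))    # the value survives intact
```

Passing safe="" asks quote to escape every reserved character in the value; which characters may stay unescaped depends on the URI component being built.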

unreserved & excluded characters

• Unreserved characters are the characters that are allowed and never take any special meaning. They are
– the upper and lowercase letters a to z and A to Z
– the decimal digits 0 to 9
– the following: - _ . ! ~ * ' ( )
• All characters that are neither reserved nor unreserved are excluded, including
– < > # % " { } | ^ [ ] `
– and the blank
• Excluded characters have to be escaped.
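The three character classes can be written down as a small Python lookup. This is a sketch that follows the lists given on these slides; the function name classify is hypothetical, and anything that is neither reserved nor unreserved is simply treated as excluded.

```python
import string

RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def classify(char):
    """Return the class of a single character according to the lists above."""
    if char in RESERVED:
        return "reserved"
    if char in UNRESERVED:
        return "unreserved"
    return "excluded"

for char in "/~ #":
    print(repr(char), classify(char))
```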

escaping

• When you want to use a character in a URI that is one of the excluded characters, you have to escape it. The way this is done is to write a construction of the form
• % hex hex
• where hex is a digit or one of the letters a to f (uppercase or lowercase). The two hex characters give the octet value of the character in hexadecimal (for a US-ASCII character, its US-ASCII code). For example, %7e is the character ~
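The “% hex hex” construction can be tried out in Python. The helper escape_octet is a hypothetical name for a hand-written version; quote and unquote are the standard library functions from urllib.parse.

```python
from urllib.parse import quote, unquote

def escape_octet(octet):
    """Write one octet (0 to 255) in the '% hex hex' form."""
    return f"%{octet:02x}"

print(escape_octet(ord("~")))            # the slide's example, built by hand
print(unquote("%7e"), unquote("%7E"))    # hex letters may be lower or upper case
print(quote(" "))                        # the blank is excluded, so it is escaped
```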

The Semantic Web

• The W3C has been developing a new architecture that applies knowledge representation technology to the WWW.

• Using the Resource Description Framework (RDF), Statements are made using a Subject, Predicate and Object (very similar to Lisp and other predicate-based languages).

• Each Subject, Predicate or Object is a Resource in the URI sense and is identified by a URI within an RDF Statement using XML Namespaces.

example

• This statement says that the Resource identified by the URI ‘http://openlib.org/home/krichel’ was created by the person ‘Thomas Krichel’:

<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>
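To make the Subject, Predicate, Object reading concrete, the statement can be pulled out of the XML with Python's standard xml.etree.ElementTree. This is a sketch, not an RDF parser: it only handles the simple Description/property shape of this example, the function name statements is hypothetical, and the embedded document uses the creator named in the sentence introducing the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOCUMENT = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Description about="http://openlib.org/home/krichel">
    <Creator xmlns="http://description.org/schema/">Thomas Krichel</Creator>
  </Description>
</RDF>"""

def statements(xml_text):
    """Yield (subject, predicate, object) for each property of each Description."""
    root = ET.fromstring(xml_text)
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get("about")
        for prop in description:
            # ElementTree writes qualified names as "{namespace}local".
            namespace, _, local_name = prop.tag[1:].partition("}")
            yield subject, namespace + local_name, prop.text

for subject, predicate, obj in statements(DOCUMENT):
    print(subject)
    print(predicate)
    print(obj)
```

The predicate comes out as the namespace URI joined with the element name, which shows how XML Namespaces let an element name stand for a URI.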

The Semantic Web

• The combination of Web Services and the Semantic Web should give the Web the ability to turn any existing Web Resource into a full node in a purposefully built knowledge representation system with a functional component that allows that knowledge to be acted on.

• And both are based on the simple Uniform Resource Identifier.

http://openlib.org/home/krichel

Thank you for your attention!