Uniform Resource Identifiers

Jacek Kopecký

WSML Working Group

June 2004

Overview

• History of URIs
• URI syntax
• URI references and their resolution
• Good practices for creating URIs
• Interesting issues

URI History

• Universal Resource Identifiers (RFC 1630, June 1994)
• Uniform Resource Locators and Names
• RFC 2396, August 1998
• 2396bis in development
• Originally “Universal”, later “Uniform” as a compromise
• “Universal” again preferred by TimBL

URLs and URNs

• Locators (addresses) vs. Names
• URNs not easily dereferencable
• URNs can be made dereferencable by infrastructure
• URLs perceived as less persistent
• URLs and URNs drifting towards middle ground
  • http://www.w3.org/DesignIssues/NameMyth.html
• No point in making the distinction any more

Uniform Resource Identifiers

• URIs “identify” “resources”
• Identification doesn’t imply interaction
• Resource is a sameness of characteristics over time
  • Latest blog rant
  • Latest blog rant on politics
  • Blog rant on politics from 2004-6-22
• Resource need not be accessible when URI is created
  • Pictures from my future trip to London will be at http://jacek.cz/photos/2004-08-london

URI Syntax

• According to 2396bis
  • http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
• Examples
  • http://www.ietf.org/rfc/rfc2396.txt
  • mailto:[email protected]
  • news:comp.infosystems.www.servers.unix
  • telnet://melvyl.ucop.edu/
• URI Syntax - simplified
  • scheme: [//authority] [/path] [?query] [#fragid]
  • Relative URI without “scheme:”
  • Dot path segments (‘.’ and ‘..’) treated specially
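
The simplified syntax above can be tried out with a small sketch. It assumes Python's standard urllib.parse module as the parser; the helper name uri_components and the last example URI are made up for illustration. Note that urlsplit does not distinguish an absent component from an empty one.

```python
from urllib.parse import urlsplit


def uri_components(uri):
    """Split a URI into the five parts of the simplified syntax.

    Hypothetical helper: it only renames the fields that urlsplit returns.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragid": parts.fragment,
    }


for example in (
    "http://www.ietf.org/rfc/rfc2396.txt",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
    "http://example.com/search?q=uri#results",  # made-up example
):
    print(example, uri_components(example))
```

The news: example has no authority at all, so everything after the scheme ends up in the path.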

URI Syntax cont’d

• Reserved characters (like /:?#@$&+* )
• Many allowed characters
• Rest of Unicode percent-encoded from UTF-8
  • http://google.com/search?q=kopeck%C3%BD
• Percent-encoding allowed characters creates equivalent URIs
• But namespaces compared char-by-char
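
A short sketch of the two points above, assuming Python's urllib.parse (quote and unquote). The example.com URIs are made up. It shows the UTF-8 percent-encoding of “ý”, and why two equivalent URIs still fail a char-by-char comparison.

```python
from urllib.parse import quote, unquote

# 'ý' is U+00FD; its UTF-8 encoding is the two bytes C3 BD.
encoded = quote("kopecký")
print(encoded)           # kopeck%C3%BD
print(unquote(encoded))  # kopecký

# '~' is an allowed character, so escaping it gives an equivalent URI...
plain = "http://example.com/~jacek"      # made-up URI
escaped = "http://example.com/%7Ejacek"  # same URI with '~' escaped
print(unquote(escaped) == plain)         # True

# ...but a char-by-char comparison, as used for namespaces, sees two names.
print(escaped == plain)                  # False
```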

URI Reference Resolution

• Resolving URI A against base URI B
• Going from the left, keep as much from B as is undefined in A
• First part of A replaces that part from B
• Path resolution special
  • If A has absolute path, that is taken
  • Relative path from A resolved against path from B, removing dot segments from result
• Everything after first part of A taken from A
• Fragment always taken from A
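
The path step is the least obvious part, so here is a minimal sketch of it. It follows the dot-segment removal procedure as published in RFC 3986 (the successor of 2396bis); the function names are made up, and only paths are handled, not whole URIs.

```python
def remove_dot_segments(path):
    """Remove '.' and '..' segments, following the RFC 3986 procedure."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../x" becomes "/x"
            if output:
                output.pop()         # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


def resolve_path(base_path, ref_path):
    """Resolve the path of reference A against the path of base B."""
    if not ref_path:
        return base_path             # empty path: keep the base path
    if ref_path.startswith("/"):
        merged = ref_path            # absolute path is taken as it is
    else:
        # Keep the base path up to its last "/" and append the reference.
        merged = base_path[: base_path.rfind("/") + 1] + ref_path
    return remove_dot_segments(merged)


print(resolve_path("/b/c/d", "../g"))  # /b/g
```

Extra “..” segments at the root are simply dropped, which is why a reference cannot climb above the authority.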

URI Ref. Resolution Examples

• Base URI: http://a/b/c/d?e#f

1. g = http://a/b/c/g
2. . = http://a/b/c/
3. ./ = http://a/b/c/
4. ./g = http://a/b/c/g
5. .. = http://a/b/
6. ../ = http://a/b/
7. ../g = http://a/b/g
8. ../../g = http://a/g
9. ../../../g = http://a/g

URI Ref. Resolution Examples

• Base URI: http://a/b/c/d?e#f

10. /./g = http://a/g
11. //g = http://g
12. #s = http://a/b/c/d?e#s
13. g#s = http://a/b/c/g#s
14. ?y = http://a/b/c/d?y
15. g?y = http://a/b/c/g?y
16. g?y#s = http://a/b/c/g?y#s
17. g:h = g:h
18. ./g:h = http://a/b/c/g:h
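
Resolution examples like these can be checked mechanically. The sketch below assumes Python 3.5 or later, whose urllib.parse.urljoin follows the newer resolution rules (older versions resolved the “../../../g” case differently). The table name EXPECTED is made up.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d?e#f"

# Reference -> expected result when resolved against BASE.
EXPECTED = {
    "g": "http://a/b/c/g",
    ".": "http://a/b/c/",
    "./g": "http://a/b/c/g",
    "..": "http://a/b/",
    "../g": "http://a/b/g",
    "../../../g": "http://a/g",
    "/./g": "http://a/g",
    "//g": "http://g",
    "#s": "http://a/b/c/d?e#s",
    "g#s": "http://a/b/c/g#s",
    "?y": "http://a/b/c/d?y",
    "g?y#s": "http://a/b/c/g?y#s",
    "g:h": "g:h",
    "./g:h": "http://a/b/c/g:h",
}

for reference, expected in EXPECTED.items():
    resolved = urljoin(BASE, reference)
    status = "ok" if resolved == expected else "MISMATCH"
    print(f"{reference:12} -> {resolved:24} {status}")
```

The “g:h” and “./g:h” pair shows why the leading “./” matters: without it, “g” is read as a scheme.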

Base URIs

• Necessary when resolving URI references

1. Explicit base URI embedded in content
   • <link xml:base="http://example.com/bar/" href="x.html" />
2. URI of the document
   • Usual in HTML files on the web
3. App-dependent base URI default
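
The order of precedence above can be sketched as follows, assuming Python's xml.etree.ElementTree and urllib.parse. The function name resolve_href and the example.org document URI are made up. As a simplification, the sketch only looks at xml:base on the element itself and treats it as an absolute URI; real xml:base processing also considers ancestor elements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_href(element, document_uri=None, app_default=None):
    """Resolve the element's href using the first base URI available."""
    base = element.get(XML_BASE) or document_uri or app_default
    if base is None:
        raise ValueError("no base URI available to resolve against")
    return urljoin(base, element.get("href"))


explicit = ET.fromstring(
    '<link xml:base="http://example.com/bar/" href="x.html" />'
)
implicit = ET.fromstring('<link href="x.html" />')

# 1. The embedded base wins over the document URI.
print(resolve_href(explicit, document_uri="http://example.org/docs/index.html"))
# 2. Without an embedded base, the document URI is used.
print(resolve_href(implicit, document_uri="http://example.org/docs/index.html"))
```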

URI Equivalence

• Do two URIs identify the same resource?
• Comparing without accessing the resources
• Various applications for URI comparison
  • Increasing cache efficiency
  • Comparing the namespaces of two symbols
• Algorithms must avoid false positives
• False negatives unavoidable
  • http://weather.example.com/innsbruck
  • http://jacek.cz/innsbruckweather redirects to the above
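
One conservative way to compare URIs without accessing them is to normalize only what is known to be insignificant. The sketch below is an assumption about what such an algorithm could look like, not a complete one: it lower-cases the scheme and host, decodes percent-encoded unreserved characters and upper-cases the remaining escapes. The function names and the example.com URIs are made up.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Characters that never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _normalize_escapes(text):
    """Decode escapes of unreserved characters, upper-case the others."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def normalize(uri):
    parts = urlsplit(uri)
    # Only the host is case-insensitive; any user info is left untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    return urlunsplit((
        parts.scheme.lower(),
        userinfo + at + hostport.lower(),
        _normalize_escapes(parts.path),
        _normalize_escapes(parts.query),
        _normalize_escapes(parts.fragment),
    ))


def probably_equivalent(a, b):
    """True only when equivalence follows from syntax alone."""
    return normalize(a) == normalize(b)


print(probably_equivalent("HTTP://Example.COM/%7ejacek",
                          "http://example.com/~jacek"))         # True
print(probably_equivalent("http://weather.example.com/innsbruck",
                          "http://jacek.cz/innsbruckweather"))  # False
```

The second comparison is a false negative of exactly the kind the slide calls unavoidable: the redirect is only visible by accessing the resource.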

Uses of URIs

• Addresses on the Web
• Namespaces in XML QNames
• Namespaces in QNames in other languages
• Identifiers of things and concepts (e.g. RDF)
• Unique keys (e.g. MIME message ID)

QName

• Introduced in XML Namespaces
• Name of an XML namespace-qualified element
• RDF uses QNames for brevity of URI notation
• XML Schema expanded use of QNames to further things (6 symbol spaces)
• Every following language uses QNames as identifiers
• Number of independent symbol spaces
• => Turning QNames into URIs is cumbersome
• Should have been as simple as in RDF (IMHO)
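
For comparison, the RDF way of turning a QName into a URI is plain concatenation of the namespace URI and the local name. A minimal sketch, where the function name, the prefix table and the “ex” namespace are made up for illustration (the rdf namespace URI is the real one):

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.com/vehicles#",  # made-up namespace
}


def qname_to_uri(qname, prefixes):
    """Expand prefix:local into namespace URI + local name, as RDF does."""
    prefix, colon, local = qname.partition(":")
    if not colon:
        raise ValueError(f"not a prefixed name: {qname!r}")
    return prefixes[prefix] + local


print(qname_to_uri("rdf:type", PREFIXES))
print(qname_to_uri("ex:car", PREFIXES))
```

This only works cleanly because the namespace URI ends with “#” (or “/”); with several independent symbol spaces per namespace, the local name alone no longer says which kind of thing is meant.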

Creating URIs for Web Resources

• Versioning approach for persistence
  • http://w3.org/TR/soap vs.
  • http://w3.org/TR/soap12 vs.
  • http://w3.org/TR/2003/REC-soap12-part1-20030624/
• Simple, memorable URIs
  • http://jacek.cz/blog
  • Scribbled on a napkin
  • Correcting spelling and case helps – mod_speling
  • Making the “www.” prefix optional (both ways) helps
• Content negotiation – drop .html (.php, .asp)
• URI changes harmful

Creating Example URIs

• http://example.com
• http://example.net
• http://example.org
• Reserved for precisely this purpose
• Or use own domain (deri.org, wsmo.org)
• http://foo.com not good

Creating URIs for Namespaces

• Dereferencable, ending with ‘/’ or ‘#’
• Canonical URIs – no unnecessary dot segments or percent-encoding
  • Namespaces compared char-by-char
• Namespace document
  • Preferably in the language that uses the namespace – enables automatic discovery
  • With human-oriented descriptions
  • To allow for the above, don’t share namespace URIs for schema and WSDL
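
The syntactic part of this advice can be checked automatically. The sketch below is a made-up lint helper, not an established tool: it reports a namespace URI that does not end with ‘/’ or ‘#’, contains dot segments, or percent-encodes characters that need no encoding. Whether the URI is dereferencable is not checked.

```python
import re
import string
from urllib.parse import urlsplit

# Characters that never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def namespace_uri_problems(uri):
    """Return a list of reasons why uri is a poor namespace name."""
    problems = []
    if not uri.endswith(("/", "#")):
        problems.append("does not end with '/' or '#'")
    if any(segment in (".", "..") for segment in urlsplit(uri).path.split("/")):
        problems.append("contains dot segments")
    for hex_pair in re.findall(r"%([0-9A-Fa-f]{2})", uri):
        if chr(int(hex_pair, 16)) in UNRESERVED:
            problems.append("percent-encodes an allowed character")
            break
    return problems


print(namespace_uri_problems("http://example.com/ns/cars#"))    # []
print(namespace_uri_problems("http://example.com/ns/../cars"))
print(namespace_uri_problems("http://example.com/%7Ejacek/ns/"))
```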

Creating URIs for Concepts

• Group concepts in a common, dereferencable namespace
• Each concept identified by its fragID
• In RDF/XML, namespace ends with ‘#’
• Namespace document describes the concepts
• Two problems
  • FragIDs depend on media types
  • Can http://example.com/#car identify a car?
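
A small sketch of how a concept URI relates to its namespace document, assuming Python's urllib.parse.urldefrag. The variable names are made up. The fragID is interpreted by the client after retrieval, so dereferencing a concept URI fetches the namespace document as a whole.

```python
from urllib.parse import urldefrag

concept = "http://example.com/#car"

# Split off the fragID: what is left is the document to retrieve.
namespace_document, fragid = urldefrag(concept)
print(namespace_document)  # http://example.com/
print(fragid)              # car

# In the RDF/XML style, the namespace is the document URI plus '#'.
namespace = namespace_document + "#"
print(namespace + "truck")  # another concept in the same namespace
```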

Fragment IDs in URIs

• Fragment ID identifies a secondary resource
• Interpretation of fragment IDs depends on media type
  • In HTML <a name="foo">
  • In XML <element xml:id="foo"/>
  • No meaning in JPEG
• xml:id in development
  • So far language-dependent (often DTD) solutions
• Fragment IDs should mean the same thing across media types with content negotiation
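
For the XML case, a fragID lookup could look like the sketch below. It assumes Python's xml.etree.ElementTree and a document that marks its targets with xml:id; the function name and the sample document are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_fragment(xml_text, fragid):
    """Return the element whose xml:id equals fragid, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get(XML_ID) == fragid:
            return element
    return None


document = '<doc><element xml:id="foo"/><other xml:id="bar"/></doc>'
print(find_fragment(document, "foo").tag)  # element
print(find_fragment(document, "nope"))     # None
```

An HTML document would need a different lookup (the name attribute of an anchor), which is the media-type dependence the slide points out.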

Range of HTTP URIs?

• Open W3C TAG issue
• Can http: URI identify a car?
• Can I say http://jacek.cz/dragstar/ is my motorbike?
• TimBL doesn’t seem to think so
• Is it necessary to distinguish between a thing and a description of that thing?

Other Interesting Issues

• data: URI scheme – the URI is the resource
  • RFC 2397
  • data:image/gif;base64,R0lGODdhMAAwAPAA…
• mailto: scheme a misnomer
  • URIs are identifiers; they don’t specify actions
• uuid: scheme for unique identifiers
  • Good for transient identification in closed systems
• Mismatches between perceived and intended meaning of a resource
  • http://w3.org/tr/soap
• Should URIs be human-readable?
  • http://www.bscw.semanticweb.org/bscw/bscw.cgi/0/21621
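
The first and third points can be illustrated with Python's standard base64 and uuid modules. The helper names and the payload are made up. Note one assumption: Python's uuid module writes identifiers in the “urn:uuid:” form rather than with a bare “uuid:” scheme.

```python
import base64
import uuid


def make_data_uri(payload, media_type):
    """Embed the bytes of a resource directly in a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def read_data_uri(uri):
    """Recover (media type, bytes) from a base64 data: URI."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("only base64 data: URIs are handled in this sketch")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(encoded)


uri = make_data_uri(b"hello", "text/plain")
print(uri)                 # data:text/plain;base64,aGVsbG8=
print(read_data_uri(uri))  # ('text/plain', b'hello')

# A fresh identifier that needs no naming authority and no network access.
print(uuid.uuid4().urn)
```

No retrieval is involved in either case: the data: URI carries its resource, and the UUID identifies something without saying where it is.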

Main Points

• Cool URIs don’t change
• URIs can be (and are) scribbled on napkins
• URIs don’t (necessarily) point to documents
• Dereferencable URIs also good as names
• URLs, URNs obsolete

References

• http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
• http://www.ietf.org/rfc/rfc2396.txt
• http://www.w3.org/Provider/Style/URI
• http://www.w3.org/DesignIssues/Architecture.html
• http://www.w3.org/DesignIssues/Axioms.html
• http://www.w3.org/DesignIssues/NameMyth.html

Hope it Helped

• Thanks for your attention
• Questions? Comments?
