SEMANTIC WEB
IMRAN IHSAN
ASSISTANT PROFESSOR, AIR UNIVERSITY, ISLAMABAD
WWW.IMRANIHSAN.COM
03 RDF DATA MODEL
RESOURCE DESCRIPTION FRAMEWORK
MOTIVATION
• How do you encode the piece of knowledge:
"The theory of relativity was discovered by Albert Einstein."
<theory>
<name>Theory of Relativity</name>
<discoverer>Albert Einstein</discoverer>
</theory>
• or
<person>
<name>Albert Einstein</name>
<discovered>Theory of Relativity</discovered>
</person>
• or
<person name="Albert Einstein">
<discovered>Theory of Relativity</discovered>
</person>
• There is no unique way (in XML) to represent knowledge.
• Information represented in such ways is not easy to integrate. (Why?)
• RDF helps to solve this problem.
GOALS
• Understand the RDF data model, including
• URI and IRI concepts
• Triples
• Resources
• Literals
• Blank nodes
• Lists
RDF OVERVIEW
• RDF = Resource Description Framework
• W3C Recommendation since 1999
• Version 1.1 since 2014
• RDF is a data model
• Originally used for metadata for web resources, then generalized
• Encodes structured information
• Universal, machine readable exchange format
• Data structured in graphs
• Vertices, edges
PARTS OF THE RDF GRAPH
• URIs
• Used to reference resources unambiguously
• Literals
• Describe data values that have no identity of their own, such as "100 km/h"
• Blank nodes
• Allow stating that an individual with certain properties exists without naming it (existential quantification)
EXAMPLE OF AN RDF GRAPH
RDF TRIPLE
COMPONENTS OF AN RDF TRIPLE
• Modeled on linguistic categories (subject, predicate, object), although these are not always applied consistently
• Allowed assignments:
• Subject: URI or blank node
• Predicate: URI (a.k.a. property)
• Object: URI, blank node or literal
• Node and edge labels should be unambiguous, so that the original graph can be reconstructed from the triple list
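The allowed assignments above can be sketched as a small check in Python. This is only an illustration under stated assumptions: the class names `IRI`, `BlankNode` and `Literal` and the function `is_valid_triple` are hypothetical and do not belong to any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def is_valid_triple(subject, predicate, obj):
    """Check the allowed assignments for the three positions of a triple."""
    return (
        isinstance(subject, (IRI, BlankNode))             # no literal as subject
        and isinstance(predicate, IRI)                    # edges are labeled with URIs only
        and isinstance(obj, (IRI, BlankNode, Literal))    # anything may be an object
    )


leipzig = IRI("http://dbpedia.org/resource/Leipzig")
label = IRI("http://www.w3.org/2000/01/rdf-schema#label")

# A graph is a set of triples: order and duplicates carry no meaning.
graph = {(leipzig, label, Literal("Leipzig"))}

valid = is_valid_triple(leipzig, label, Literal("Leipzig"))
literal_as_subject = is_valid_triple(Literal("Leipzig"), label, leipzig)
literal_as_predicate = is_valid_triple(leipzig, Literal("label"), leipzig)
print(valid, literal_as_subject, literal_as_predicate)
```

Because the terms are immutable and hashable, the set `graph` behaves like the triple list on the slide: the graph can be rebuilt from it as long as every label is unambiguous.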
URI
• URI = Uniform Resource Identifier
• Used to create globally unique names for resources
• Every object with a clear identity can be a resource
• Books, places, organizations ...
• In the book domain, the ISBN serves the same purpose
URI SYNTAX
• Extension of the URL concept
• Not every URI denotes a web document, but the URL is often used as URI for web documents
• Starts with the URI scheme, which is separated from the rest by ":"
• examples: http, ftp, mailto, file
• Typically hierarchical structure
• scheme:[//authority]path[?query][#fragment]
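As a quick illustration of this structure, Python's standard `urllib.parse.urlsplit` breaks a URI into the five components named above. The two URIs are made-up examples, not taken from the lecture.

```python
from urllib.parse import urlsplit

# Hypothetical hierarchical URI with all five components present.
parts = urlsplit("http://example.org/recipes/chutney?lang=de#ingredients")
print(parts.scheme)    # scheme
print(parts.netloc)    # authority
print(parts.path)      # path
print(parts.query)     # query
print(parts.fragment)  # fragment

# A non-hierarchical URI: there is a scheme and a path, but no authority.
mail = urlsplit("mailto:alice@example.org")
print(mail.scheme, repr(mail.netloc), mail.path)
```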
SELF-DEFINED URIS
• Necessary if a resource has no URI yet or its URI is not known
• Use HTTP URIs under your own website to avoid naming collisions
• This also makes it possible to publish documentation about the URI at that location
• Example: http://jens-lehmann.org/foaf.rdf#i
• Use separate URIs for
• a resource (a real-world thing)
• and its documentation (e.g. an HTML page)
• This separation is achieved with URI references (fragments attached with "#") or with content negotiation
• Example: URI for Shakespeare's "Othello":
• bad (why?): http://de.wikipedia.org/wiki/Othello
• good: http://de.wikipedia.org/wiki/Othello#URI
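The effect of the "#" fragment can be shown with a short Python sketch. An HTTP client removes the fragment before sending a request, so the full URI can name the play while the part before "#" still locates the page that documents it. `urldefrag` is part of the standard library; the variable names are chosen for illustration only.

```python
from urllib.parse import urldefrag

# URI that names the real-world thing (the play itself).
thing = "http://de.wikipedia.org/wiki/Othello#URI"

# Splitting off the fragment yields the URL of the describing document.
document, fragment = urldefrag(thing)
print(document)
print(fragment)

# Two distinct identifiers: one for the play, one for the page about it.
print(thing != document)
```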
IRI
• IRI = Internationalized Resource Identifier
• Generalization of URI concept
• IRI can contain Unicode
• Example:
• http://www.example.org/Wüste
• http://www.example.org/사막
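Where a system accepts only plain URIs, an IRI is usually mapped to a URI by percent-encoding the UTF-8 bytes of its non-ASCII characters. The following is a minimal sketch of that idea; `iri_to_uri` is a hypothetical helper, and it assumes the non-ASCII characters occur outside the host name (host names need a different encoding).

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8, leaving URI delimiters alone."""
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


german = iri_to_uri("http://www.example.org/Wüste")
korean = iri_to_uri("http://www.example.org/사막")
print(german)
print(korean)

# Decoding the percent-escapes gives back the original IRI.
print(unquote(german))
```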
LITERALS
• Used to model data values
• Representation as strings
• Interpretation through datatype
• Literals without datatype are treated as strings
• Literals may never be the origin of an edge in an RDF graph (that is, never the subject of a triple)
• Edges may never be labeled with literals
TURTLE SYNTAX
• Language to serialize RDF Triples to strings
• Turtle – Terse RDF Triple Language
• URIs in angle brackets: <http://dbpedia.org/resource/Leipzig>
• Literals in quotes
• "Leipzig"@de
• "51.333332"^^xsd:float
• Triples are subject-predicate-object sentences terminated with a dot.
<http://dbpedia.org/resource/Leipzig>
<http://www.w3.org/2000/01/rdf-schema#label>
"Leipzig"@de .
• Whitespace and line breaks are ignored outside of identifiers
• Status: W3C Recommendation, http://www.w3.org/TR/turtle/
TURTLE ABBREVIATIONS
• In Turtle one can use abbreviations
• Syntax: @prefix abbr ':' <URI> .
• E.g. @prefix dbr: <http://dbpedia.org/resource/> .
• One can transform
<http://dbpedia.org/resource/Leipzig>
<http://www.w3.org/2000/01/rdf-schema#label>
"Leipzig"@de .
• into
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
dbr:Leipzig rdfs:label "Leipzig"@de .
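Prefix expansion is plain string concatenation, which a few lines of Python make explicit. The function `expand` and the dictionary `PREFIXES` are hypothetical names used only for this sketch.

```python
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(prefixed_name: str, prefixes: dict) -> str:
    """Replace 'prefix:local' by the namespace URI followed by the local part."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


subject = expand("dbr:Leipzig", PREFIXES)
predicate = expand("rdfs:label", PREFIXES)
print(subject)
print(predicate)

# Nothing is inserted between namespace and local part, so a namespace
# URI without its trailing "#" or "/" silently yields a different URI.
broken = expand("rdfs:label", {"rdfs": "http://www.w3.org/2000/01/rdf-schema"})
print(broken)
```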
TURTLE ABBREVIATIONS
• Triples with the same subject can be grouped together
@prefix rdf:
...
@prefix geo:
dbr:Leipzig dbp:hasMayor dbr:Burkhard_Jung ;
rdfs:label "Leipzig"@de ;
geo:lat "51.333332"^^xsd:float ;
geo:long "12.383333"^^xsd:float .
• Even triples with the same subject and predicate can be grouped together
@prefix dbr: .
@prefix dbp: .
dbr:Leipzig dbp:locatedIn dbr:Saxony, dbr:Germany;
dbp:hasMayor dbr:Burkhard_Jung .
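The ";" and "," abbreviations are purely syntactic: each object still produces its own triple. A minimal Python sketch of the expansion, using the hypothetical helper `expand_groups` and keeping the prefixed names as plain strings:

```python
def expand_groups(subject, predicate_objects):
    """Expand 's p1 o1, o2 ; p2 o3 .' into one triple per object."""
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects
        for obj in objects
    ]


triples = expand_groups(
    "dbr:Leipzig",
    [
        ("dbp:locatedIn", ["dbr:Saxony", "dbr:Germany"]),  # grouped with ","
        ("dbp:hasMayor", ["dbr:Burkhard_Jung"]),            # appended with ";"
    ],
)
for triple in triples:
    print(triple)
```

The grouped statement above therefore stands for three separate triples.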
LITERALS II – DATATYPES
• Example: xsd:decimal
DATATYPES IN RDF
• So far, literals have been untyped and treated as strings, so that "02" < "100" < "11" < "2"
• Typing allows a better, that is, semantic interpretation of values
• Datatypes are identified by URIs and can be chosen freely
• Typically, the XML Schema datatypes (XSD) are used
• Syntax: "data value"^^<datatype-URI>
• rdf:HTML and rdf:XMLLiteral are the only predefined datatypes in RDF
• Used for HTML and XML fragments
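The difference between string and typed interpretation can be reproduced in Python. The lookup table `TO_VALUE` and the function `literal_value` are hypothetical and cover only three XSD datatypes; Python's `float` is used as an approximation of `xsd:float`.

```python
from decimal import Decimal

values = ["2", "11", "100", "02"]

# Untyped literals compare character by character, as strings.
as_strings = sorted(values)
print(as_strings)

# Interpreted as integers, the same lexical forms sort numerically.
as_integers = sorted(int(v) for v in values)
print(as_integers)

# Assumed mapping from a datatype name to a function that turns
# the lexical form into a value.
TO_VALUE = {"xsd:integer": int, "xsd:float": float, "xsd:decimal": Decimal}


def literal_value(lexical, datatype):
    """Interpret a lexical form according to its datatype."""
    return TO_VALUE[datatype](lexical)


# Different lexical forms, same value.
same_value = literal_value("02", "xsd:integer") == literal_value("2", "xsd:integer")
print(same_value)
```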
EXAMPLE
• Graph:
• Turtle:
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
dbr:Leipzig geo:lat "51.333332"^^xsd:float ;
geo:long "12.383333"^^xsd:float .
LANGUAGE DECLARATION
• A language tag can only be attached to untyped literals
• Example:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://dbpedia.org/resource/Leipzig>
rdfs:label "Leipzig"@de, "Лейпциг"@ru .
• In RDF 1.0 the following literals were all different, but implementations typically treated them the same.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbr: <http://dbpedia.org/resource/> .
dbr:Leipzig
rdfs:label "Leipzig", "Leipzig"@de, "Leipzig"^^xsd:string .
• As of RDF 1.1 "Leipzig" is a shorthand for "Leipzig"^^xsd:string.
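The RDF 1.1 rule can be sketched by normalising every literal to a (lexical form, datatype, language tag) tuple and comparing the tuples. The function `literal_term` is a hypothetical helper; the two datatype URIs are the standard ones for plain and language-tagged strings.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


def literal_term(lexical, datatype=None, lang=None):
    """Normalise a literal the way RDF 1.1 reads it."""
    if lang is not None:
        # Language-tagged strings carry the datatype rdf:langString.
        return (lexical, RDF_LANGSTRING, lang)
    # A literal without datatype is shorthand for xsd:string.
    return (lexical, datatype or XSD_STRING, None)


plain = literal_term("Leipzig")
typed = literal_term("Leipzig", datatype=XSD_STRING)
german = literal_term("Leipzig", lang="de")

print(plain == typed)   # same term in RDF 1.1
print(plain == german)  # still different terms
```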
N-ARY RELATIONS I
• Cooking with RDF
• "For the preparation of mango chutney you need 450g of green mango, a teaspoon of cayenne pepper ..."
• 1st attempt to model this recipe:
@prefix ex: <http://example.org/> .
ex:Chutney ex:hasIngredient "450g green mango", "1tsp Cayenne pepper" .
• Not satisfying:
• Ingredients and amounts are coded as strings
• Searching for recipes that contain green mango is not easily possible
N-ARY RELATIONS II
• Cooking with RDF
• "For the preparation of mango chutney you need 450g of green mango, a teaspoon of cayenne pepper ..."
• 2nd attempt to model this recipe:
@prefix ex: <http://example.org/> .
ex:Chutney
ex:ingredient ex:GreenMango;
ex:amount "450g" ;
ex:ingredient ex:CayennePepper;
ex:amount "1tsp" .
• Even worse:
• No unambiguous association between ingredient and amount is possible
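Why the association is lost can be seen by treating the graph as what it is, a set of triples. In the Python sketch below (prefixed names kept as plain strings for brevity), two different orderings of the second attempt's triples yield the same graph, so the order in the Turtle file cannot tell us which amount belongs to which ingredient.

```python
# The four triples of the second attempt, in the order written in Turtle.
order_a = [
    ("ex:Chutney", "ex:ingredient", "ex:GreenMango"),
    ("ex:Chutney", "ex:amount", "450g"),
    ("ex:Chutney", "ex:ingredient", "ex:CayennePepper"),
    ("ex:Chutney", "ex:amount", "1tsp"),
]

# The same triples with the two amounts swapped in position.
order_b = [order_a[0], order_a[3], order_a[2], order_a[1]]

# As sets of triples both orderings are one and the same graph.
same_graph = set(order_a) == set(order_b)
print(same_graph)
```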
N-ARY RELATIONS III
• Problem: this is a genuinely ternary (three-place) relation (compare a database table with three columns):
  Recipe          Ingredient       Amount
  Mango Chutney   Green mango      450g
  Mango Chutney   Cayenne pepper   1 tsp
• It cannot be expressed directly in RDF, because a triple relates only two things
• Solution: introduction of helper nodes
N-ARY RELATIONS IV
• Helper nodes in RDF:
• As graph:
• In Turtle Syntax:
@prefix ex: <http://example.org/> .
ex:Chutney ex:hasIngredient ex:ChutneyIngredient1.
ex:ChutneyIngredient1 ex:ingredient ex:GreenMango;
ex:amount "450g" .
BLANK NODES
• Blank nodes can be used for resources which don't need to be named
• Can be read as existential statements
• As graph:
• In Turtle Syntax:
@prefix ex: <http://example.org/> .
ex:Chutney ex:hasIngredient _:id1 .
_:id1 ex:ingredient ex:GreenMango;
ex:amount "450g" .
# can be shortened:
ex:Chutney ex:hasIngredient
[ ex:ingredient ex:GreenMango;
ex:amount "450g" ] .
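The helper-node pattern can be sketched in Python: every row of the ternary table of recipe, ingredient and amount gets its own blank node, which keeps ingredient and amount together. The functions `rows_to_triples` and `amount_of` are hypothetical illustrations, not part of any RDF library, and the labels `_:id1`, `_:id2` are generated here only for the example.

```python
from itertools import count

# Rows of the ternary relation: (recipe, ingredient, amount).
rows = [
    ("ex:Chutney", "ex:GreenMango", "450g"),
    ("ex:Chutney", "ex:CayennePepper", "1tsp"),
]


def rows_to_triples(rows):
    """Turn each row into three triples that share one fresh blank node."""
    ids = count(1)
    triples = set()
    for recipe, ingredient, amount in rows:
        node = f"_:id{next(ids)}"
        triples |= {
            (recipe, "ex:hasIngredient", node),
            (node, "ex:ingredient", ingredient),
            (node, "ex:amount", amount),
        }
    return triples


def amount_of(triples, recipe, ingredient):
    """Follow recipe -> helper node -> amount for the given ingredient."""
    for s, p, node in triples:
        if s == recipe and p == "ex:hasIngredient":
            if (node, "ex:ingredient", ingredient) in triples:
                return next(
                    o for s2, p2, o in triples if s2 == node and p2 == "ex:amount"
                )
    return None


graph = rows_to_triples(rows)
print(amount_of(graph, "ex:Chutney", "ex:GreenMango"))
print(amount_of(graph, "ex:Chutney", "ex:CayennePepper"))
```

Searching for recipes that contain green mango now only requires matching `ex:GreenMango` as a node, not parsing a string.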
LISTS
• General data structures for enumerating arbitrarily many resources
• Distinction between
• Containers: open lists to which new elements can be added; both ordered and unordered container types exist
• Collections: closed, ordered lists to which no new elements can be added
• Both can be modeled with the constructs presented so far, so they add no expressiveness
TYPES OF CONTAINER
• The list root node is assigned one of the following rdf:types:
• rdf:Seq
• Interpretation as ordered list, sequence
• rdf:Bag
• Interpretation as unordered collection (duplicates allowed)
• Order coded in RDF not relevant
• rdf:Alt
• Set of alternatives
• Usually only one list element relevant
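A container is written with ordinary triples: the root node gets its `rdf:type`, and the members are attached with the numbered membership properties `rdf:_1`, `rdf:_2`, and so on. The Python sketch below generates these triples; `container_triples` and the blank node label `_:leaders` are hypothetical names chosen for the example.

```python
def container_triples(node, container_type, members):
    """Type the root node and attach members via rdf:_1, rdf:_2, ..."""
    triples = [(node, "rdf:type", container_type)]
    for position, member in enumerate(members, start=1):
        triples.append((node, f"rdf:_{position}", member))
    return triples


seq = container_triples("_:leaders", "rdf:Seq", ["ex:Sören", "ex:Jens"])
for triple in seq:
    print(triple)

# A container is open: a further member is just one more triple.
seq.append(("_:leaders", "rdf:_3", "ex:Axel"))
print(len(seq))
```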
COLLECTIONS
• Idea: recursive partition of list into a head element and (possibly empty) rest list
• Turtle Syntax (shortened notation with parentheses)
@prefix ex: <http://example.org/> .
ex:AKSW ex:groupLeaders (ex:Sören ex:Jens ex:Axel) .
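The parenthesised notation stands for a linked list built from `rdf:first`, `rdf:rest` and the terminator `rdf:nil`. The following Python sketch expands a list into those triples and reads it back; the function names and the blank node labels `_:l1`, `_:l2`, ... are assumptions made for this illustration.

```python
def collection_triples(members):
    """Return (head, triples) for a list split recursively into head and rest."""
    triples = []
    head = "rdf:nil"  # the empty rest list
    # Build from the back so that each node can point to its rest list.
    for position in range(len(members), 0, -1):
        node = f"_:l{position}"
        triples.append((node, "rdf:first", members[position - 1]))
        triples.append((node, "rdf:rest", head))
        head = node
    return head, triples


def read_collection(head, triples):
    """Walk the rdf:first / rdf:rest chain until rdf:nil is reached."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    while head != "rdf:nil":
        items.append(lookup[(head, "rdf:first")])
        head = lookup[(head, "rdf:rest")]
    return items


head, triples = collection_triples(["ex:Sören", "ex:Jens", "ex:Axel"])
graph = [("ex:AKSW", "ex:groupLeaders", head)] + triples
print(len(graph))
print(read_collection(head, triples))
```

Because the last `rdf:rest` points to `rdf:nil`, the list is closed: appending an element would require changing an existing triple rather than adding one.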
SUMMARY
• RDF is a widely supported standard for storing and exchanging data
• It enables a largely syntax-independent representation of distributed information in a graph-based data model
• Plain RDF is strongly oriented towards individuals (instances)
• It offers almost no means of expressing schema knowledge
MINI PROJECT – II
1. Create a small knowledge base in Turtle describing a domain (your family)!
2. Write an RDF resource description describing yourself in Turtle with labels in two different languages, your birthday and age!
3. Draw an RDF graph representing a recipe for cupcakes!
4. Create an RDF list of European countries!