Semantic Web
Semantic Web Overview. The slides were presented to the School of Computing and Technology at Eastern Mediterranean University.
EMU–School of Computing and Technology
Karwan Jacksi
PhD Student
School of Computing and Technology
Eastern Mediterranean University
Link to the slides
www.KarwanJacksi.net/seminars
Guest Lecture
Text Mining - Fall 2014
11.11.2014
Semantic Web
Agenda
• Motivation
– Development of the Web
– Limitations of the current Web
• Technical Solution
– Introduction to Semantic Web
– Semantic Web Architecture and Languages
– Semantic Web Tools
• Summary
• References
11/18/2014Karwan Jacksi 2
Development of the Web
1. Internet
2. Web 1.0
3. Web 2.0
1-Internet
• “The Internet is a global system of interconnected computer
networks that use the standard Internet Protocol Suite
(TCP/IP) to serve billions of users worldwide.
• It is a network of networks that consists of millions of private
and public, academic, business, and government networks of
local to global scope that are linked by a broad array of
electronic and optical networking technologies.”
A brief summary of Internet evolution (1945 to 1995):
• 1945: Memex conceived
• 1948: “A Mathematical Theory of Communication”
• 1958: Silicon chip
• 1962: First vast computer network envisioned
• 1964: Packet switching invented
• 1965: Hypertext invented
• 1969: ARPANET
• 1972: TCP/IP created
• 1984: Internet named and goes TCP/IP
• 1989: WWW created
• 1993: Mosaic created
• 1995: Age of eCommerce begins
2-Web 1.0
• “The World Wide Web ("WWW" or simply the "Web") is a
system of interlinked, hypertext documents that runs over the
Internet.
• With a Web browser, a user views Web pages that may
contain text, images, and other multimedia and navigates
between them using hyperlinks”.
Web 1.0 principles
• The success of Web 1.0 is based on three simple principles:
1. A simple and uniform addressing scheme to identify information
chunks, i.e. Uniform Resource Identifiers (URIs)
2. A simple and uniform representation formalism to structure
information chunks so that browsers can render them, i.e.
Hyper Text Markup Language (HTML)
3. A simple and uniform protocol to access information chunks, i.e.
Hyper Text Transfer Protocol (HTTP)
Web 1.0 principles Cont’d
• 1. Uniform Resource Identifiers (URIs)
– Uniform Resource Identifiers (URIs) are used to name/identify
resources on the Web
– URIs are pointers to resources to which request methods can be
applied to generate potentially different responses
– Resources can reside anywhere on the Internet
– The most popular form of a URI is the Uniform Resource Locator
(URL)
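As a small illustration of how a URL, the most popular form of URI, breaks down into parts, the sketch below uses Python's standard urllib.parse module. The address itself is hypothetical and does not come from the slides.

```python
from urllib.parse import urlsplit

# Hypothetical URL, used only for illustration.
uri = "http://www.example.org/books/catalog?topic=web#chapter2"

parts = urlsplit(uri)

print(parts.scheme)    # access scheme (here: HTTP)
print(parts.netloc)    # authority: the host where the resource resides
print(parts.path)      # path to the resource on that host
print(parts.query)     # optional query string
print(parts.fragment)  # optional fragment inside the resource
```

The same identifier can then be handed to different request methods, which is the sense in which a URI is a pointer rather than the resource itself.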
Web 1.0 principles Cont’d
• 2. Hyper-Text Markup Language (HTML)
– Hyper-Text Markup Language:
• An application of Standard Generalized Markup Language (SGML)
• Facilitates a hyper-media environment
– Documents use elements to “mark up” or identify sections of text
for different purposes
– HTML markup consists of several types of entities, including
elements, attributes, data types and character references
– Markup elements are not seen by the user when the page is
displayed
– Documents are rendered by browsers
Web 1.0 principles Cont’d
• 3. Hyper-Text Transfer Protocol (HTTP)
– Protocol for client/server communication
• The heart of the Web
• Very simple request/response protocol
– Client sends request message, server replies with response message
• Provides a way to publish and retrieve HTML pages
• Relies on the URI naming mechanism
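To make the request/response exchange concrete, here is a minimal sketch that builds a GET request message by hand and parses a canned response. The host name, path, and response text are assumptions made up for the example; no network connection is opened, and real clients would normally rely on an HTTP library instead.

```python
def build_request(host: str, path: str) -> str:
    """Build a minimal HTTP/1.1 GET request message."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_response(raw: str):
    """Split a raw response message into status code, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body


# Canned response standing in for what a server might send back (hypothetical).
canned = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)

request = build_request("www.example.org", "/index.html")
status, headers, body = parse_response(canned)
print(request)
print(status, headers, body)
```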
3- Web 2.0
• “The term "Web 2.0" (2004–present) is commonly associated
with web applications that facilitate interactive information
sharing, interoperability, user-centered design, and
collaboration on the World Wide Web”
3- Web 2.0 Cont’d
• A Web 2.0 site may allow users to interact and collaborate
with each other in a social media dialogue as creators of user-
generated content in a virtual community, in contrast to Web
sites where people are limited to the passive viewing
of content.
– Examples of Web 2.0 include social networking sites, blogs,
wikis, video sharing sites, etc.
3- Web 2.0 Cont’d
• With Web 1.0 technology, significant software
skills and investment in software were necessary to publish
information.
– Web 2.0 technology changed this dramatically.
Web 2.0 major breakthroughs
• The four major breakthroughs of Web 2.0 are:
– Blurring the distinction between content consumers and content providers
• Wikis, blogs, and Twitter turned the publication of text into a mass phenomenon, as Flickr and YouTube did for multimedia
– Moving from media for individuals towards media for communities
• Social web sites such as Facebook and LinkedIn allow communities of users to smoothly interweave their information and activities
– Blurring the distinction between service consumers and service providers
• Mashups allow web users to easily integrate third-party services into their own web sites
– Integrating human and machine computing in a new and innovative way
• Amazon Mechanical Turk gives access to human services through a web service interface, blurring the distinction between manually and automatically provided services
Limitations of the current Web
• The current Web has its limitations when it comes to:
1. finding relevant information
2. extracting relevant information
3. combining and reusing information
1-Finding relevant information
• Finding information on the current Web is based on keyword
search
• Keyword search has limited recall and precision due to:
– Synonyms:
• e.g. searching for information about “Cars” will ignore Web pages that
contain the word “Automobiles” even though the information on
these pages could be relevant
– Homonyms:
• e.g. searching for information about “Jaguar” will bring up pages
containing information about both “Jaguar” (the car brand) and
“Jaguar” (the animal) even though the user is interested in only one
of them
1-Finding relevant information Cont’d
• Keyword search also has limited recall and precision due to:
– Spelling variants:
• e.g. “organize” in American English vs. “organise” in British English
– Spelling mistakes
– Multiple languages
• i.e. information about the same topics is published on the Web in
different languages (English, German, Italian, …)
• Current search engines provide no means to specify the
relation between a resource and a term
– e.g. sell / buy
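The effect of synonyms and homonyms on recall and precision can be sketched with a toy keyword search. The four documents and the relevance judgments below are invented for illustration; precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved.

```python
def keyword_search(docs, keyword):
    """Return the ids of documents that contain the keyword as a whole word."""
    return {doc_id for doc_id, text in docs.items()
            if keyword.lower() in text.lower().split()}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical mini-collection of Web pages.
docs = {
    1: "used cars for sale",
    2: "automobiles and their engines",
    3: "jaguar dealers sell luxury cars",
    4: "the jaguar hunts at night",
}

# Synonyms: the user wants pages about vehicles, but page 2 says "automobiles".
retrieved_cars = keyword_search(docs, "cars")
p_cars, r_cars = precision_recall(retrieved_cars, relevant={1, 2, 3})

# Homonyms: the user wants the car brand, but page 4 is about the animal.
retrieved_jaguar = keyword_search(docs, "jaguar")
p_jaguar, r_jaguar = precision_recall(retrieved_jaguar, relevant={3})

print(retrieved_cars, p_cars, r_cars)
print(retrieved_jaguar, p_jaguar, r_jaguar)
```

In this toy setting the synonym lowers recall (a relevant page is missed) while the homonym lowers precision (an irrelevant page is returned).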
2-Extracting relevant information
• A one-size-fits-all automatic solution for extracting information from
Web pages is not possible due to differing formats and
syntaxes
• Even from a single Web page it is difficult to extract the relevant
information
– e.g. Which book is about the Web? What is the price of the book?
2-Extracting relevant information Cont’d
• Extracting information from current web sites can be done
using wrappers
[Figure: a wrapper extracts, annotates, and structures content, turning Web HTML pages (layout) into structured data, databases, or XML (structure)]
2-Extracting relevant information Cont’d
• The actual extraction of information from web sites is
specified using standards such as XSL Transformation (XSLT)
• Extracted information can be stored as structured data in XML
format or databases.
• However, wrappers do not really scale, because the
actual extraction of information again depends on the web site
format and layout
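Why wrappers depend on layout can be seen in a small sketch. The slides mention XSLT for the extraction step; the example below swaps in Python's standard html.parser instead, and both HTML snippets, the class name "price", and the PriceWrapper helper are hypothetical.

```python
from html.parser import HTMLParser


class PriceWrapper(HTMLParser):
    """Toy wrapper: collect the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # The extraction rule is tied to one specific markup convention.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())


def extract_prices(html):
    wrapper = PriceWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.prices


# Two hypothetical sites publishing the same fact with different layouts.
site_a = '<div><h2>Web Basics</h2><span class="price">$25</span></div>'
site_b = '<table><tr><td>Web Basics</td><td><b>$25</b></td></tr></table>'

prices_a = extract_prices(site_a)
prices_b = extract_prices(site_b)
print(prices_a, prices_b)
```

The wrapper works on the first layout and silently finds nothing on the second, so every new site or redesign needs its own extraction rule.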
3-Combining and reusing information
• Tasks often require combining data on the Web
1. Searching for the same information in different digital libraries
2. Information may come from different web sites and needs to be
combined
3-Combining and reusing information
1. Searches for the same information in different digital libraries
Example: I want to travel from Innsbruck to Rome.
3-Combining and reusing information
2. Information may come from different web sites and needs to
be combined
Example: I want to travel from Innsbruck to Rome where I want
to stay in a hotel and visit the city
How to improve the current Web?
• Increasing automatic linking among data
• Increasing recall and precision in search
• Increasing automation in data integration
• Increasing automation in the service life cycle
• Adding semantics to data and services is the solution!
Introduction to semantic web
• The Vision
– Static WWW (URI, HTML, HTTP): more than 2 billion users, more than 50 billion pages
Introduction to semantic web cont’d
• WWW (URI, HTML, HTTP): static
– Serious problems in information finding, information extracting, information representing, information interpreting, and information maintaining
• Semantic Web (RDF, RDF(S), OWL)
What is the Semantic Web?
• “The Semantic Web is an extension of the current web in
which information is given well-defined meaning, better
enabling computers and people to work in cooperation.”
T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic
Web”, Scientific American, May 2001
What is the Semantic Web? Cont’d
• The next generation of the WWW
• Information has machine-processable and machine-
understandable semantics
• Not a separate Web but an augmentation of the current one
• Ontologies are the backbone of the Semantic Web
Ontology definition
• Ontologies are the modeling foundation of the Semantic Web
– They provide the well-defined meaning for information
• An ontology is a “formal, explicit specification of a shared conceptualization”
– formal: machine-readability with computational semantics
– explicit: unambiguous terminology definitions
– shared: commonly accepted understanding
– conceptualization: conceptual model of a domain (ontological theory)
Ontology example
• Concept
– conceptual entity of the domain
• Property
– attribute describing a concept
• Relation
– relationship between concepts or
properties
• Axiom
– coherency description between Concepts
/ Properties / Relations via logical
expressions
[Figure: example ontology. Person (name, email) is the parent of Student (matr.-nr.) and Professor (research field) in an isA hierarchy (taxonomy); Lecture has topic and lecture nr.; Student attends Lecture and Professor holds Lecture.]
holds(Professor, Lecture) =>
Lecture.topic = Professor.researchField
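The concepts, properties, relation, and axiom of this example can be mirrored in a short Python sketch. This is only an analogy, since Python classes are not an ontology language; the instance data and the helper name satisfies_axiom are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:                 # concept
    name: str                 # property
    email: str                # property


@dataclass
class Professor(Person):      # isA hierarchy: Professor is a Person
    research_field: str


@dataclass
class Lecture:                # concept
    lecture_nr: int
    topic: str


def satisfies_axiom(professor: Professor, lecture: Lecture) -> bool:
    """Axiom: holds(Professor, Lecture) => Lecture.topic = Professor.researchField."""
    return lecture.topic == professor.research_field


# Hypothetical instances; the 'holds' relation is a list of (professor, lecture) pairs.
prof = Professor(name="A. Example", email="a@example.org", research_field="Text Mining")
holds = [
    (prof, Lecture(lecture_nr=1, topic="Text Mining")),
    (prof, Lecture(lecture_nr=2, topic="Compilers")),
]

violations = [lec.lecture_nr for p, lec in holds if not satisfies_axiom(p, lec)]
print(violations)
```

Here the axiom acts as a consistency check: the second lecture is flagged because its topic does not match the professor's research field.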
Semantic Web of Data
1. Web Data Annotation
2. Data Linking on the Web (Web of Data)
3. Data Integration over the Web
1-Web Data Annotation
– Connecting (syntactic) Web objects, like text chunks, images, … to
their semantic notion
• e.g., this image is about Famagusta, Nazife Dimililer is a professor
2-Data Linking on the Web (Web of Data)
– Global networking of knowledge through URI, RDF, and SPARQL
• e.g., connecting my calendar with my RSS feeds, my pictures, ...
[Figure: diagram of linked data sets, including LinkedCT]
2-Data Linking on the Web (Web of Data) Cont’d
• Linked Open Data statistics:
– data sets: 123
– total number of triples: 19.562.409.691
– total number of links between data sets: 142.605.717
• Statistics available at: (These pages were last modified on 23 September 2010!)
– http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics
– http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/LinkStatistics
3-Data Integration over the Web
• Data integration involves combining data residing in different
sources and providing users with a unified view of these data
• Data integration over the Web can be implemented as follows:
1. Export the data sets to be integrated as RDF graphs
2. Merge identical resources (i.e. resources having the same URI)
from different data sets
3. Start making queries on the integrated data, queries that were
not possible on the individual data sets.
4. Example (next slide)
3-Data Integration over the Web Cont’d
1. Export the first data set as an RDF graph
For example, the following RDF graph contains information about
the book “The Glass Palace” by Amitav Ghosh
3-Data Integration over the Web Cont’d
1. Export the second data set as an RDF graph
Information about the same book, this time in French, is
modeled in the RDF graph below
3-Data Integration over the Web Cont’d
2. Merge identical resources (i.e. resources having the same
URI) from different data sets
Same URI = Same resource
3-Data Integration over the Web Cont’d
3. Start making queries on the integrated data
– A user of the second dataset may ask queries like: “give me the title
of the original book”
– This information is not in the second dataset!
– However, this information can be retrieved from the integrated
dataset, in which the second dataset was connected with the first
dataset
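The three integration steps can be sketched with plain Python sets of (subject, predicate, object) triples. The URIs, the property names a:title, a:author, f:original and f:langue, and the helper functions are hypothetical stand-ins for the graphs shown on the slides; a real application would more likely use an RDF library.

```python
# Hypothetical URIs for the original book and its French translation.
BOOK = "http://example.org/isbn/original-edition"
FR_BOOK = "http://example.org/isbn/french-edition"

# Step 1: each data set exported as a set of triples.
dataset_1 = {
    (BOOK, "a:title", "The Glass Palace"),
    (BOOK, "a:author", "Amitav Ghosh"),
}
dataset_2 = {
    (FR_BOOK, "f:original", BOOK),   # points at the same URI as dataset_1
    (FR_BOOK, "f:langue", "fr"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def original_title(graph, translation):
    """Follow f:original from the translation, then read a:title."""
    titles = set()
    for original in objects(graph, translation, "f:original"):
        titles |= objects(graph, original, "a:title")
    return titles


# Step 2: merging is a set union; identical URIs become the same node.
merged = dataset_1 | dataset_2

# Step 3: the query fails on the second data set alone but works on the merge.
print(original_title(dataset_2, FR_BOOK))
print(original_title(merged, FR_BOOK))
```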
Semantic Web Architecture and Languages
• Requirements
– Extensibility
• Each layer should extend the previous one(s)
– Support for data interchange
• Using data from one source in other applications
– Support for ontology description with different complexity
• Including rules
– Support for data query
– Support for data provenance and trust evaluation
Semantic Web Stack
UNICODE, URI and XML
• UNICODE is the standard international character set
– E.g. used to encode the data in the repository
• ASCII: 7 bit, 128 characters (a-z, A-Z, 0-9, punctuation)
• Extension code pages: 128 chars (ß, Ä, ñ, ø, Š, etc.)
– Different systems, many different code pages
– ISO Latin 1, CP1252: Western languages (197 = Å)
– ISO Latin 2, CP1250: East Europe (197 = Ĺ)
• A code page is an interpretation, not a property of text
– Thus if we do not interpret the code page correctly, the visualized result
will not be the expected one
• Uniform Resource Identifiers (URIs) identify things and
concepts
– E.g. used to identify resources on the Web and in the repository
• URI syntax: scheme:[//authority][/path][?query][#fragid]
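Returning to the code page point on this slide, the short sketch below decodes the single byte value 197 under the four code pages named above, using codec names from Python's standard library. It also shows the UTF-8 encoding of the same character as one example of a Unicode encoding.

```python
raw = bytes([197])  # the byte value 197 used as the example on the slide

# The same byte, interpreted under different code pages.
as_latin1 = raw.decode("latin-1")    # ISO Latin 1
as_cp1252 = raw.decode("cp1252")     # Windows, Western languages
as_latin2 = raw.decode("iso8859-2")  # ISO Latin 2
as_cp1250 = raw.decode("cp1250")     # Windows, East Europe

print(as_latin1, as_cp1252, as_latin2, as_cp1250)

# With Unicode the character has one code point, whatever the system.
code_point = ord(as_latin1)
utf8_bytes = as_latin1.encode("utf-8")
print(hex(code_point), utf8_bytes)
```

The byte never changes; only the interpretation does, which is why text exchanged without a declared encoding can display incorrectly.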
UNICODE, URI and XML Cont’d
• eXtensible Markup Language (XML) is a markup language
used for data exchange
– Language for creating languages
– “Meta-language”
– XHTML is a language: HTML expressed in XML
• E.g. a format that can be wrapped into RDF and imported into the
repository
– Examples:
• <temperature unit="F">64</temperature>
• <swearword language='fr'>con</swearword>
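The two XML examples above can be read back with Python's standard xml.etree.ElementTree module. This is a minimal sketch showing that element names, attributes, and text are recoverable by any XML parser, while their meaning (for instance that "F" stands for Fahrenheit) is still left to the application.

```python
import xml.etree.ElementTree as ET

# The two example elements from the slide.
reading = ET.fromstring('<temperature unit="F">64</temperature>')
word = ET.fromstring("<swearword language='fr'>con</swearword>")

# Tag, attribute, and text are all the parser gives us.
unit = reading.get("unit")
value = float(reading.text)

print(reading.tag, unit, value)
print(word.tag, word.get("language"), word.text)
```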
RDF, RDFS and OWL
• Resource Description Framework (RDF) is the HTML of the Semantic Web
– Simple way to describe resources on the Web
– Based on triples <subject, predicate, object>
• Various serializations (RDF/XML, Turtle, N3, etc.), including one based on XML
• Example:
– <ex:john, ex:father-of, ex:bill>
– <#john, rdf:type, #Student>
– What is a “#Student”?
• RDF does not define a vocabulary for the statements; it only provides a way to express statements
• We know that “#Student” identifies a category (a concept or a class), but this is only implicitly defined in RDF
• We need a language for defining RDF types:
– Define classes:
• “#Student is a class”
– Relationships between classes:
• “#Student is a sub-class of #Person”
– Properties of classes:
• “#Person has a property hasName”
• RDF Schema (RDFS) is such a language
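What an RDFS vocabulary adds on top of bare triples can be sketched in plain Python. The triples below restate the examples from this slide as tuples; the helper functions and the use of rdfs:domain for hasName are illustrative assumptions, and only the sub-class rule is implemented here, not the full set of RDFS entailment rules.

```python
# Triples as (subject, predicate, object) tuples.
triples = {
    ("#john", "rdf:type", "#Student"),
    ("#Student", "rdf:type", "rdfs:Class"),        # "#Student is a class"
    ("#Person", "rdf:type", "rdfs:Class"),
    ("#Student", "rdfs:subClassOf", "#Person"),    # "#Student is a sub-class of #Person"
    ("#hasName", "rdfs:domain", "#Person"),        # "#Person has a property hasName"
}


def superclasses(graph, cls):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(graph, resource):
    """Asserted types plus the types implied by the class hierarchy."""
    direct = {o for s, p, o in graph if s == resource and p == "rdf:type"}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(graph, cls)
    return direct | inferred


john_types = types_of(triples, "#john")
print(john_types)
```

With only the RDF triple, #john is a #Student; with the schema triples, a program can also conclude that #john is a #Person.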
[Figure: example ontology with Person, Student, Professor and Lecture, their properties, and the isA hierarchy (taxonomy)]
RDF, RDFS and OWL Cont’d
• Web Ontology Language (OWL) is a more complex ontology language than RDFS
– Layered language based on Description Logic (DL)
– Overcomes some RDF(S) limitations
SPARQL and Rule Languages
• SPARQL
– Query language for RDF triples
– A protocol for querying RDF data over the Web
– A language used to query the repository from the user interface
• E.g. What are all the country capitals in Africa?
• Rule languages (esp. Rule Interchange Format RIF)
– Extend ontology languages with proprietary axioms
– Based on different types of logics
• Description Logic
• Logic Programming
– E.g. used to enable reasoning over data to infer new knowledge
PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
?x abc:cityname ?capital ;
abc:isCapitalOf ?y .
?y abc:countryname ?country ;
abc:isInContinent abc:Africa . }
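The SPARQL query above is a join over four triple patterns that share the variables ?x and ?y. The sketch below evaluates the same patterns by hand over a tiny in-memory set of triples; the data and the resource names abc:city1, abc:country1 and so on are invented for the example, and a real system would hand the query to a SPARQL engine instead.

```python
# Hypothetical data using the property names from the query.
triples = {
    ("abc:city1", "abc:cityname", "Nairobi"),
    ("abc:city1", "abc:isCapitalOf", "abc:country1"),
    ("abc:country1", "abc:countryname", "Kenya"),
    ("abc:country1", "abc:isInContinent", "abc:Africa"),
    ("abc:city2", "abc:cityname", "Paris"),
    ("abc:city2", "abc:isCapitalOf", "abc:country2"),
    ("abc:country2", "abc:countryname", "France"),
    ("abc:country2", "abc:isInContinent", "abc:Europe"),
}


def capitals_in(graph, continent):
    """Hand-written equivalent of the four triple patterns in the query."""
    results = []
    for x, p1, capital in graph:                       # ?x abc:cityname ?capital
        if p1 != "abc:cityname":
            continue
        for x2, p2, y in graph:                        # ?x abc:isCapitalOf ?y
            if x2 != x or p2 != "abc:isCapitalOf":
                continue
            if (y, "abc:isInContinent", continent) not in graph:
                continue                               # ?y abc:isInContinent abc:Africa
            for y2, p3, country in graph:              # ?y abc:countryname ?country
                if y2 == y and p3 == "abc:countryname":
                    results.append((capital, country))
    return sorted(results)


african_capitals = capitals_in(triples, "abc:Africa")
print(african_capitals)
```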
holds(Professor, Lecture) =>
Lecture.topic = Professor.researchField
Logics, Proof and Trust
• Unifying logic
– Bring together the various ontology and rule languages
– Common inferences, meaning of data
• Proof
– Explanation of inference results, data provenance
• Trust
– Trust that the system performs correctly
– Trust that the system can explain what it is doing
– Network of trust for data sources and services
– Technology and user interface
Tools for semantic web
• Semantic Web tools can be grouped according to their
purpose:
1. Ontology editors, used for data modeling
• Visualization and editing of ontologies are provided as
extensions of ontology editors.
2. Reasoners, which provide a generalized inference
mechanism.
3. Plugins and APIs, which set up the Semantic Web
development environment.
Ontology Editor
• Ontology editors are used for creating a data model for
the underlying Semantic Web application
– They also support accessing and manipulating existing ontologies.
• Ontologies are constructed using standard ontology
languages.
• Ontology languages are formal languages that rely
on reasoning rules for framing these ontologies
• Ontology languages are classified either by syntax or by
structure.
– Syntax-oriented ontology languages such as OWL, RDF and
OIL are the more prevalent ones.
– Ontology editors can be classified based on the type of
ontology language used.
Ontology Editor Cont’d
• Protege (today) http://protege.stanford.edu
• Neon Toolkit: www.neon-toolkit.org
• myOntology: www.myontology.org
• Semantic Media Wiki
– HALO extension http://www.mediawiki.org/wiki/Extension:Halo_Extension
– Ontology editor extension http://smw-active.sti-innsbruck.at
• DOGMA Modeler http://starlab.vub.ac.be/website/node/47
• OntoStudio http://www.ontoprise.de/
• TopBraid Composer http://www.topbraidcomposer.com/
Reasoner
• The inference mechanism of a reasoner is the
capability to infer logical consequences from axioms.
• Logical consequences are relationships
between statements that are true, where axioms are
statements accepted as true without controversy.
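As a rough sketch of what inferring logical consequences from axioms means, the snippet below applies simple if-then rules repeatedly until nothing new can be derived (forward chaining). The statements and rules are invented for illustration, and the reasoners listed on the next slide use more sophisticated procedures than this.

```python
def forward_chain(axioms, rules):
    """Apply (premises, conclusion) rules until no new statement is derived."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical axioms: statements accepted as true.
axioms = {"john is a Student"}

# Hypothetical rules: if all premises are known, the conclusion follows.
rules = [
    (["john is a Person"], "john has a name"),
    (["john is a Student"], "john is a Person"),
]

consequences = forward_chain(axioms, rules) - axioms
print(consequences)
```

Note that the second rule only fires after the first pass has derived its premise, which is why the loop repeats until a fixed point is reached.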
Reasoner Cont’d
• AllegroGraph http://agraph.franz.com/
• Fact http://www.cs.man.ac.uk/%7Ehorrocks/FaCT/
• Pellet http://clarkparsia.com/pellet
• Racer http://www.racer-systems.com/
• IRIS http://www.sti-innsbruck.at/
• OWLIM http://ontotext.com/owlim/
• KAON http://kaon2.semanticweb.org/
Plugin and API
• An Application Programming Interface (API) is a set of
protocols and tools for building software applications.
– It eases the development of a program by providing the
necessary building blocks; the remaining work is to assemble
them.
• APIs are more useful for users than for programmers,
– since they provide a common API with a similar interface for all
programs.
• This makes learning easier, even for naïve users.
Plugin and API Cont’d
• The plug-ins are a set of software components used for
adding specific abilities to a larger software application.
– For Example, Pellet Reasoner Plugins are added in Eclipse IDE
for inferring the relationships.
Plugin and API Cont’d
• Jena API
– Jena is a well-known Java framework for building
Semantic Web applications.
– It includes a wide range of Java libraries that help developers
write code more easily.
– Jena was first developed by researchers at HP Labs
in 2000.
– Jena is an open-source project, and has been used in a wide
variety of semantic web applications.
– In 2010, Jena was adopted by the Apache Software Foundation
Summary
• Semantic Web is not a replacement of the current Web, it’s
an evolution of it
• Semantic Web is about:
– annotation of data on the Web
– data linking on the Web
– data Integration over the Web
• Semantic Web aims at automating tasks currently carried out
by humans
• Semantic Web is becoming real!
Good Videos
• Evolution Web 1.0, Web 2.0 to Web 3.0
– https://www.youtube.com/watch?v=bsNcjya56v8
• Web today - Web 2.0
– https://www.youtube.com/watch?v=6gmP4nk0EOE
• The Future Internet: Service Web 3.0
– https://www.youtube.com/watch?v=off08As3siM
• Web 3.0 - The Internet of Things!
– https://www.youtube.com/watch?v=F_nbUizGeEY
• Web 3.0
– http://www.sti-innsbruck.at/results/movies/web-30
References
• A Survey on Tools Essential for Semantic Web Research
– International Journal of Computer Applications (0975 – 8887) Volume 62– No.9, January 2013
• A Review on Semantic-Based Web Mining and its Applications
– Sivakumar J et al. / International Journal of Engineering and Technology (IJET)
• Wikipedia
– https://en.wikipedia.org/wiki/Internet
– https://en.wikipedia.org/wiki/Memex
– https://en.wikipedia.org/wiki/Integrated_circuit
– https://en.wikipedia.org/wiki/Mosaic_(web_browser)
– https://en.wikipedia.org/wiki/World_Wide_Web
– https://en.wikipedia.org/wiki/Web_2.0
– http://en.wikipedia.org/wiki/Semantic_Web
– https://en.wikipedia.org/wiki/Amazon_Mechanical_Turk
– http://en.wikipedia.org/wiki/Resource_Description_Framework
– http://en.wikipedia.org/wiki/Linked_Data
• Other Links
– http://www.w3.org/TR/xslt
– http://www.ontoprise.de
– http://linkeddata.org
– http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/Slides.pdf
– http://sti-innsbruck.at/sites/default/files/courses/01_SW-Introduction.pdf
– http://facweb.cs.depaul.edu/mobasher/classes/it130/internet-www/index.html
THE END
Your Questions?