semantic web

EMU–School of Computing and Technology

Karwan Jacksi

PhD Student

School of Computing and Technology

Eastern Mediterranean University

Link to the slides

www.KarwanJacksi.net/seminars

Guest Lecture

Text Mining, Fall 2014

11.11.2014

Semantic Web


Agenda

• Motivation

– Development of the Web

– Limitations of the current Web

• Technical Solution

– Introduction to Semantic Web

– Semantic Web Architecture and Languages

– Semantic Web Tools

• Summary

• References

11/18/2014 Karwan Jacksi


Development of the Web

1. Internet

2. Web 1.0

3. Web 2.0



1-Internet

• “The Internet is a global system of interconnected computer

networks that use the standard Internet Protocol Suite

(TCP/IP) to serve billions of users worldwide.

• It is a network of networks that consists of millions of private

and public, academic, business, and government networks of

local to global scope that are linked by a broad array of

electronic and optical networking technologies.”



1-Internet Cont’d



A brief summary of Internet evolution (1945 to 1995):
• 1945: Memex conceived
• 1948: A Mathematical Theory of Communication
• 1958: Silicon chip
• 1962: First vast computer network envisioned
• 1964: Packet switching invented
• 1965: Hypertext invented
• 1969: ARPANET
• 1972: TCP/IP created
• 1984: Internet named and goes TCP/IP
• 1989: WWW created
• 1993: Mosaic created
• 1995: Age of eCommerce begins


2-Web 1.0

• “The World Wide Web ("WWW" or simply the "Web") is a

system of interlinked, hypertext documents that runs over the

Internet.

• With a Web browser, a user views Web pages that may

contain text, images, and other multimedia and navigates

between them using hyperlinks”.



Web 1.0 principles

• The success of Web 1.0 is based on three simple principles:

1. A simple and uniform addressing schema to identify information

chunks i.e. Uniform Resource Identifiers (URIs)

2. A simple and uniform representation formalism to structure

information chunks allowing browsers to render them i.e.

Hyper Text Markup Language (HTML)

3. A simple and uniform protocol to access information chunks i.e.

Hyper Text Transfer Protocol (HTTP)



Web 1.0 principles Cont’d

• 1. Uniform Resource Identifiers (URIs)

– Uniform Resource Identifiers (URIs) are used to name/identify

resources on the Web

– URIs are pointers to resources to which request methods can be

applied to generate potentially different responses

– Resource can reside anywhere on the Internet

– Most popular form of a URI is the Uniform Resource Locator

(URL)




• 2. Hyper-Text Markup Language (HTML)

– Hyper-Text Markup Language:
  • An application of the Standard Generalized Markup Language (SGML)
  • Facilitates a hyper-media environment

– Documents use elements to “mark up” or identify sections of text

for different purposes

– HTML markup consists of several types of entities, including:

elements, attributes, data types and character references

– Markup elements are not seen by the user when page is

displayed

– Documents are rendered by browsers
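The point that markup is consumed by the browser and not shown to the reader can be illustrated with a small sketch. It uses Python's standard `html.parser` module; the sample page and the class name `VisibleTextAndLinks` are assumptions made up for this illustration, not part of the lecture.

```python
from html.parser import HTMLParser


class VisibleTextAndLinks(HTMLParser):
    """Collects the text a reader would see and the hyperlink targets."""

    def __init__(self):
        super().__init__()
        self.text = []   # character data between tags
        self.links = []  # values of href attributes on <a> elements

    def handle_starttag(self, tag, attrs):
        # Elements and attributes are markup: they steer rendering and linking.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Only character data ends up on the screen.
        if data.strip():
            self.text.append(data.strip())


# Hypothetical page used only for this example.
page = (
    "<html><body><h1>Semantic Web</h1>"
    '<p>See the <a href="http://www.w3.org/">W3C</a> site.</p>'
    "</body></html>"
)

parser = VisibleTextAndLinks()
parser.feed(page)
visible_text = " ".join(parser.text)
print(visible_text)
print(parser.links)
```

The element names (`h1`, `p`, `a`) never appear in `visible_text`; they only tell the renderer how to present the text and where the hyperlink points.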




• 3. Hyper-Text Transfer Protocol (HTTP)

– Protocol for client/server communication
  • The heart of the Web
  • Very simple request/response protocol
    – Client sends request message, server replies with response message
  • Provides a way to publish and retrieve HTML pages
  • Relies on URI naming mechanism



3- Web 2.0

• “The term "Web 2.0" (2004–present) is commonly associated

with web applications that facilitate interactive information

sharing, interoperability, user-centered design, and

collaboration on the World Wide Web”



3- Web 2.0 Cont’d

• A Web 2.0 site may allow users to interact and collaborate

with each other in a social media dialogue as creators of user-

generated content in a virtual community, in contrast to Web

sites where people are limited to the passive viewing

of content.

– Examples of Web 2.0 include social networking sites, blogs,

wikis, video sharing sites…etc.



3- Web 2.0 Cont’d

• With Web 1.0 technology a significant amount of software

skills and investment in software was necessary to publish

information.

– Web 2.0 technology changed this dramatically.



Web 2.0 major breakthroughs

• The four major breakthroughs of Web 2.0 are:

– Blurring the distinction between content consumers and content providers
  • Wikis, blogs, and Twitter turned the publication of text into a mass phenomenon, as Flickr and YouTube did for multimedia
– Moving from media for individuals towards media for communities
  • Social web sites such as Facebook, LinkedIn, etc. allow communities of users to smoothly interweave their information and activities
– Blurring the distinction between service consumers and service providers
  • Mashups allow web users to easily integrate services implemented by third parties into their own web sites
– Integrating human and machine computing in a new and innovative way
  • Amazon Mechanical Turk gives access to human services through a web service interface, blurring the distinction between manually and automatically provided services



Limitations of the current Web

• The current Web has its limitations when it comes to:

1. finding relevant information

2. extracting relevant information

3. combining and reusing information



1-Finding relevant information



1-Finding relevant information Cont’d

• Finding information on the current Web is based on keyword

search

• Keyword search has a limited recall and precision due to:

– Synonyms:
  • e.g. searching for information about “Cars” will ignore Web pages that contain the word “Automobiles”, even though the information on these pages could be relevant
– Homonyms:
  • e.g. searching for information about “Jaguar” will bring up pages containing information about both “Jaguar” (the car brand) and “Jaguar” (the animal), even though the user is interested in only one of them
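A minimal sketch of why plain keyword matching suffers from both problems. The four page texts and the function `keyword_search` are invented for this illustration; real search engines are far more elaborate, but the underlying weakness is the same.

```python
import re

# Hypothetical mini-corpus, made up for this example.
pages = {
    "page1": "Used cars for sale",
    "page2": "Automobiles: buying guide and prices",
    "page3": "Jaguar unveils a new sports car",
    "page4": "The jaguar is a big cat native to the Americas",
}


def keyword_search(query, docs):
    """Return the names of documents that contain the query as a whole word."""
    q = query.lower()
    return sorted(
        name
        for name, text in docs.items()
        if q in re.findall(r"[a-z]+", text.lower())
    )


# Synonyms hurt recall: the page about "Automobiles" is relevant but missed.
cars_hits = keyword_search("cars", pages)

# Homonyms hurt precision: the car brand and the animal are both returned.
jaguar_hits = keyword_search("Jaguar", pages)

print(cars_hits)
print(jaguar_hits)
```

The search has no notion of meaning: it cannot tell that "cars" and "automobiles" denote the same concept, nor that the two "jaguar" pages are about different things.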



1-Finding relevant information Cont’d

• Keyword search also has limited recall and precision due to:
– Spelling variants:
  • e.g. “organize” in American English vs. “organise” in British English
– Spelling mistakes
– Multiple languages
  • i.e. information about the same topics is published on the Web in different languages (English, German, Italian, …)

• Current search engines provide no means to specify the

relation between a resource and a term

– e.g. sell / buy



2-Extracting relevant information

• A one-size-fits-all automatic solution for extracting information from Web pages is not possible, due to different formats and different syntaxes
• Even from a single Web page, it is difficult to extract the relevant information


[Figure: example Web page, annotated with the questions “Which book is about the Web?” and “What is the price of the book?”]


2-Extracting relevant information Cont’d

• Extracting information from current web sites can be done

using wrappers


[Figure: a wrapper extracts, annotates, and structures content from the Web (HTML pages, layout) into structured data (databases, XML structure)]


2-Extracting relevant information Cont’d

• The actual extraction of information from web sites is

specified using standards such as XSL Transformation (XSLT)

• Extracted information can be stored as structured data in XML

format or databases.

• However, wrappers do not really scale because the

actual extraction of information depends again on the web site

format and layout
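The scaling problem can be seen in a toy wrapper. This sketch uses a regular expression instead of XSLT purely to stay within the Python standard library; the two HTML fragments, the book title, the price, and the function `extract_price` are all hypothetical.

```python
import re

# A wrapper written against the layout of one particular (hypothetical) site.
PRICE_PATTERN = re.compile(r'<span class="price">\s*\$([0-9]+\.[0-9]{2})\s*</span>')


def extract_price(html):
    """Return the price as a float, or None if the expected layout is absent."""
    match = PRICE_PATTERN.search(html)
    return float(match.group(1)) if match else None


# Same information, two different layouts (both invented for this example).
site_a = '<div class="book"><h2>A Book about the Web</h2><span class="price">$15.99</span></div>'
site_b = "<tr><td>A Book about the Web</td><td>Price: 15.99 USD</td></tr>"

price_a = extract_price(site_a)  # layout matches the wrapper
price_b = extract_price(site_b)  # layout differs, so the wrapper finds nothing

print(price_a, price_b)
```

Every new site, and every redesign of an existing site, needs its own wrapper, which is why the approach does not scale.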



3-Combining and reusing information

• Tasks often require combining data on the Web

1. Searching for the same information in different digital libraries

2. Information may come from different web sites and needs to be

combined




1. Searches for the same information in different digital libraries


Example: I want to travel from Innsbruck to Rome.



2. Information may come from different web sites and needs to

be combined


Example: I want to travel from Innsbruck to Rome where I want

to stay in a hotel and visit the city


How to improve the current Web?

• Increasing automatic linking among data

• Increasing recall and precision in search

• Increasing automation in data integration

• Increasing automation in the service life cycle

• Adding semantics to data and services is the solution!



Introduction to semantic web

• The Vision


[Figure: the static WWW (URI, HTML, HTTP), with more than 2 billion users and more than 50 billion pages]


Introduction to semantic web cont’d


[Figure: from the WWW to the Semantic Web]
• WWW (URI, HTML, HTTP): static
• Serious problems in:
– information finding,
– information extracting,
– information representing,
– information interpreting,
– and information maintaining.
• Semantic Web (RDF, RDF(S), OWL)


What is the Semantic Web?

• “The Semantic Web is an extension of the current web in

which information is given well-defined meaning, better

enabling computers and people to work in cooperation.”

T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic

Web”, Scientific American, May 2001



What is the Semantic Web? Cont’d

• The next generation of the WWW

• Information has machine-processable and machine-

understandable semantics

• Not a separate Web but an augmentation of the current one

• Ontologies are the backbone of the Semantic Web



Ontology definition

• Ontologies are the modeling foundations of the Semantic Web

– They provide the well-defined meaning for information


formal, explicit specification of a shared conceptualization
• formal: machine-readability with computational semantics
• explicit specification: unambiguous terminology definitions
• shared: commonly accepted understanding
• conceptualization: conceptual model of a domain (ontological theory)


Ontology example

• Concept

– conceptual entity of the domain

• Property

– attribute describing a concept

• Relation

– relationship between concepts or

properties

• Axiom

– coherency description between Concepts

/ Properties / Relations via logical

expressions


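[Figure: example ontology]
• Concepts and their properties: Person (name, email); Student (matr.-nr.); Professor (research field); Lecture (topic, lecture nr.)
• isA hierarchy (taxonomy): Student isA Person; Professor isA Person
• Relations: Student attends Lecture; Professor holds Lecture
• Axiom: holds(Professor, Lecture) => Lecture.topic = Professor.researchField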
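The four building blocks can be mirrored in ordinary code to make them concrete. This is only a rough analogy under stated assumptions: Python classes stand in for concepts, fields for properties, inheritance for the isA hierarchy, and a plain function for the axiom. The sample instances (Alice, Bob, lectures L1 and L2) are invented.

```python
from dataclasses import dataclass


@dataclass
class Person:            # concept
    name: str            # property
    email: str           # property


@dataclass
class Student(Person):   # Student isA Person
    matr_nr: str


@dataclass
class Professor(Person): # Professor isA Person
    research_field: str


@dataclass
class Lecture:
    lecture_nr: str
    topic: str


def axiom_holds(professor, lecture):
    """holds(Professor, Lecture) => Lecture.topic = Professor.researchField"""
    return lecture.topic == professor.research_field


# Hypothetical instances.
prof = Professor(name="Alice", email="alice@example.org", research_field="Semantic Web")
student = Student(name="Bob", email="bob@example.org", matr_nr="0001")
held_lectures = [Lecture("L1", "Semantic Web"), Lecture("L2", "Compilers")]

# Lectures whose topic contradicts the axiom for this professor.
violations = [l.lecture_nr for l in held_lectures if not axiom_holds(prof, l)]
print(violations)
```

A real ontology language states the axiom declaratively, so that a reasoner can check it or draw conclusions from it, instead of hard-coding it in a function as done here.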


Semantic Web of Data

1. Web Data Annotation

2. Data Linking on the Web (Web of Data)

3. Data Integration over the Web



1-Web Data Annotation


– connecting (syntactic) Web objects, like text chunks, images, … to their semantic notion
  • e.g., this image is about Famagusta, Nazife Dimililer is a professor


2-Data Linking on the Web (Web of Data)

– global networking of knowledge through URI, RDF, and SPARQL
  • e.g., connecting my calendar with my rss feeds, my pictures, ...




2-Data Linking on the Web (Web of Data) Cont’d

• Linked Open Data statistics:

– data sets: 123

– total number of triples: 19.562.409.691

– total number of links between data sets: 142.605.717

• Statistics available at: (These pages were last modified on 23 September 2010!)

– http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/Statistics

– http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets/LinkStatistics




3-Data Integration over the Web

• Data integration involves combining data residing in different

sources and providing the user with a unified view of these data

• Data integration over the Web can be implemented as follows:

1. Export the data sets to be integrated as RDF graphs

2. Merge identical resources (i.e. resources having the same URI)

from different data sets

3. Start making queries on the integrated data, queries that were

not possible on the individual data sets.

4. Example (next slide)



3-Data Integration over the Web Cont’d

1. Export first data set as RDF graph

For example, the following RDF graph contains information about the book “The Glass Palace” by Amitav Ghosh




1. Export second data set as RDF graph

Information about the same book, this time in French, is modeled in the RDF graph below




2. Merge identical resources (i.e. resources having the same

URI) from different data sets


Same URI = Same resource







3. Start making queries on the integrated data

– A user of the second dataset may ask queries like: “give me the title

of the original book”

– This information is not in the second dataset!

– This information can, however, be retrieved from the integrated

dataset, in which the second dataset was connected with the first

dataset
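The three steps can be sketched with RDF graphs modeled as plain Python sets of (subject, predicate, object) tuples. Everything below is an assumption made for illustration: the `example.org` URIs, the `ex:` property names, and the helper functions are placeholders, not the vocabulary used in the original graphs.

```python
# Hypothetical URIs standing in for the book and its French translation.
BOOK = "http://example.org/book/the-glass-palace"
BOOK_FR = "http://example.org/book/the-glass-palace-fr"

# Step 1: each data set exported as a set of triples.
dataset_1 = {
    (BOOK, "ex:title", "The Glass Palace"),
    (BOOK, "ex:author", "Amitav Ghosh"),
}
dataset_2 = {
    (BOOK_FR, "ex:language", "fr"),
    (BOOK_FR, "ex:original", BOOK),  # refers to the same URI as dataset_1
}

# Step 2: merging is a set union; resources with the same URI coincide.
integrated = dataset_1 | dataset_2


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def title_of_original(graph, translation):
    """Step 3: follow ex:original, then read ex:title of the target."""
    return [
        title
        for original in objects(graph, translation, "ex:original")
        for title in objects(graph, original, "ex:title")
    ]


alone = title_of_original(dataset_2, BOOK_FR)    # not answerable
merged = title_of_original(integrated, BOOK_FR)  # answerable after merging
print(alone, merged)
```

The merge needs no schema mapping at all: because both data sets use the same URI for the book, the union of the triples already connects them.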



Semantic Web Architecture and Languages

• Requirements

– Extensibility
  • Each layer should extend the previous one(s)
– Support for data interchange
  • Using data from one source in other applications
– Support for ontology description with different complexity
  • Including rules

– Support for data query

– Support for data provenance and trust evaluation



Semantic Web Stack



UNICODE, URI and XML

• UNICODE is the standard international character set
– E.g. used to encode the data in the repository
  • ASCII: 7 bit, 128 characters (a-z, A-Z, 0-9, punctuation)
  • Extension code pages: 128 chars (ß, Ä, ñ, ø, Š, etc.)
    – Different systems, many different code pages
    – ISO Latin 1, CP1252: Western languages (197 = Å)
    – ISO Latin 2, CP1250: East Europe (197 = Ĺ)
  • A code page is an interpretation, not a property of text
    – Thus, if we do not interpret the code page correctly, the displayed result will not be the expected one
• Uniform Resource Identifiers (URIs) identify things and concepts
– E.g. used to identify resources on the Web and in the repository
  • URI syntax: scheme:[//authority][/path][?query][#fragid]
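Both points on this slide can be checked with the Python standard library. The sketch decodes the single byte 197 under the code pages named above and splits a URI into the five syntax components; the URI itself is a made-up `example.org` address.

```python
from urllib.parse import urlsplit

# One byte, several interpretations: the code page is not stored in the text.
raw = bytes([197])
as_latin1 = raw.decode("latin-1")    # ISO Latin 1
as_cp1252 = raw.decode("cp1252")     # Windows, Western languages
as_latin2 = raw.decode("iso8859-2")  # ISO Latin 2
as_cp1250 = raw.decode("cp1250")     # Windows, East Europe
print(as_latin1, as_cp1252, as_latin2, as_cp1250)

# scheme:[//authority][/path][?query][#fragid], on a hypothetical URI.
parts = urlsplit("http://example.org/books/list?lang=fr#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
```

Unicode removes the first ambiguity by giving every character its own code point, so that Å and Ĺ can no longer be confused once the text is decoded.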



UNICODE, URI and XML Cont’d

• eXtensible Markup Language (XML) is a markup language

used for data exchange

– Language for creating languages

– “Meta-language”

– XHTML is a language: HTML expressed in XML
  • E.g. a format that can be wrapped into RDF and imported into the repository
– Examples:
  • <temperature unit="F">64</temperature>
  • <swearword language='fr'>con</swearword>
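A short sketch of how a program might consume the two example elements, using Python's standard `xml.etree.ElementTree`. The enclosing `<examples>` root element is an assumption added here, because an XML document needs a single root; the Fahrenheit to Celsius conversion is only there to show that the `unit` attribute makes the number interpretable.

```python
import xml.etree.ElementTree as ET

# The two elements from the slide, wrapped in a hypothetical root element.
doc = ET.fromstring(
    "<examples>"
    '<temperature unit="F">64</temperature>'
    "<swearword language='fr'>con</swearword>"
    "</examples>"
)

temperature = doc.find("temperature")
unit = temperature.get("unit")       # attribute value
fahrenheit = float(temperature.text) # element content
celsius = round((fahrenheit - 32) * 5 / 9, 1)

swearword = doc.find("swearword")
language = swearword.get("language")

print(unit, fahrenheit, celsius, language, swearword.text)
```

XML fixes the syntax, but the meaning of `temperature` or `unit` is still a private agreement between the parties exchanging the data.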



RDF, RDFS and OWL

• Resource Description Framework (RDF) is the HTML of the Semantic Web

– Simple way to describe resources on the Web

– Based on triples <subject, predicate, object>

• Various serializations (RDF/XML, Turtle, N3, etc.), including one based on XML (RDF/XML)

• Example:

– <ex:john, ex:father-of, ex:bill>

– <#john, rdf:type, #Student>

– What is a “#Student”?

• RDF does not define a vocabulary for the statements; it only provides a way to express statements
• We know that “#Student” identifies a category (a concept or a class), but this is only implicitly defined in RDF
• We need a language for defining RDF types:
– Define classes:
  • “#Student is a class”
– Relationships between classes:
  • “#Student is a sub-class of #Person”
– Properties of classes:
  • “#Person has a property hasName”

• RDF Schema (RDFS) is such a language
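What an RDFS-aware system gains from such declarations can be sketched with triples held as Python tuples. The sketch applies a single RDFS entailment rule (if x has type C and C is a sub-class of D, then x has type D) until nothing new is derived. The shortened names such as `rdfs:Class` are written as plain strings for readability, and the function `infer_types` is a hypothetical helper, not an API of any RDF library.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    ("#john", RDF_TYPE, "#Student"),         # plain RDF statement
    ("#Student", RDF_TYPE, "rdfs:Class"),    # "#Student is a class"
    ("#Person", RDF_TYPE, "rdfs:Class"),
    ("#Student", SUBCLASS_OF, "#Person"),    # "#Student is a sub-class of #Person"
    ("#hasName", "rdfs:domain", "#Person"),  # "#Person has a property hasName"
}


def infer_types(graph):
    """Add (x rdf:type D) whenever (x rdf:type C) and (C rdfs:subClassOf D) hold."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for x, p1, c in list(inferred):
            if p1 != RDF_TYPE:
                continue
            for c2, p2, d in list(inferred):
                if p2 == SUBCLASS_OF and c2 == c and (x, RDF_TYPE, d) not in inferred:
                    inferred.add((x, RDF_TYPE, d))
                    changed = True
    return inferred


closure = infer_types(triples)
new_facts = closure - triples
print(new_facts)
```

With RDF alone, `#Student` is just an opaque name; with the RDFS declarations, the system can conclude that `#john` is also a `#Person`, a statement nobody wrote down explicitly.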




RDF, RDFS and OWL Cont’d

• Web Ontology Language (OWL) is a more complex ontology language than RDFS
– Layered language based on Description Logic (DL)

– Overcomes some RDF(S) limitations



SPARQL and Rule Languages

• SPARQL

– Query language for RDF triples

– A protocol for querying RDF data over the Web

– A language used to query the repository from the user interface
  • E.g. What are all the country capitals in Africa?

• Rule languages (esp. Rule Interchange Format RIF)

– Extend ontology languages with proprietary axioms

– Based on different types of logics
  • Description Logic
  • Logic Programming

– E.g. used to enable reasoning over data to infer new knowledge


Example SPARQL query:

PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}

Example rule:

holds(Professor, Lecture) =>
  Lecture.topic = Professor.researchField
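To show what a SPARQL engine does with the capitals query, here is a deliberately naive sketch that evaluates its four triple patterns against a tiny in-memory graph. It is not a SPARQL implementation: the functions `match` and `query`, the `ex:` identifiers, and the sample data are all assumptions made for this illustration (it needs Python 3.8 or later for the `:=` operator).

```python
# Tiny hypothetical data set using the property names from the query.
graph = [
    ("ex:city1", "abc:cityname", "Nairobi"),
    ("ex:city1", "abc:isCapitalOf", "ex:country1"),
    ("ex:country1", "abc:countryname", "Kenya"),
    ("ex:country1", "abc:isInContinent", "abc:Africa"),
    ("ex:city2", "abc:cityname", "Rome"),
    ("ex:city2", "abc:isCapitalOf", "ex:country2"),
    ("ex:country2", "abc:countryname", "Italy"),
    ("ex:country2", "abc:isInContinent", "abc:Europe"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Join the triple patterns one after another (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# The WHERE clause of the query, one tuple per triple pattern.
where = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]

results = [(b["?capital"], b["?country"]) for b in query(where, graph)]
print(results)
```

The shared variables `?x` and `?y` are what join the patterns together; a production engine performs the same joins, only with indexes and query optimization.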


Logics, Proof and Trust

• Unifying logic

– Bring together the various ontology and rule languages

– Common inferences, meaning of data

• Proof

– Explanation of inference results, data provenance

• Trust

– Trust that the system performs correctly

– Trust that the system can explain what it is doing

– Network of trust for data sources and services

– Technology and user interface



Tools for semantic web

• Semantic Web tools can be grouped according to their purpose:
1. Ontology editors, used for data modeling
  • Visualization and editing of ontologies are provided as extensions of ontology editors.
2. Reasoners, which provide a generalized inference mechanism.
3. Plug-ins and APIs, which set up the Semantic Web development environment.



Ontology Editor

• Ontology editors are used to create the data model for the underlying Semantic Web application
– They also support accessing and manipulating existing ontologies.
• Ontologies are constructed using standard ontology languages.
• Ontology languages are formal languages that rely on reasoning rules for framing these ontologies
• Ontology languages are classified either by syntax or by structure.
– Syntax-oriented ontology languages such as OWL, RDF and OIL are the more prevalent ones.
– Ontology editors can be classified by the type of ontology language they use.



Ontology Editor Cont’d

• Protege (today) http://protege.stanford.edu

• Neon Toolkit: www.neon-toolkit.org

• myOntology: www.myontology.org

• Semantic Media Wiki

– HALO extension http://www.mediawiki.org/wiki/Extension:Halo_Extension

– Ontology editor extension http://smw-active.sti-innsbruck.at

• DOGMA Modeler http://starlab.vub.ac.be/website/node/47

• OntoStudio http://www.ontoprise.de/

• TopBraid Composer http://www.topbraidcomposer.com/



Reasoner

• The inference mechanism of a reasoner is the capability to infer logical consequences from axioms.
• Logical consequences are relationships between statements that are true, whereas axioms are statements accepted as true without controversy.



Reasoner Cont’d

• AllegroGraph http://agraph.franz.com/

• FaCT http://www.cs.man.ac.uk/~horrocks/FaCT/

• Pellet http://clarkparsia.com/pellet

• Racer http://www.racer-systems.com/

• IRIS http://www.sti-innsbruck.at/

• OWLIM http://ontotext.com/owlim/

• KAON http://kaon2.semanticweb.org/




Plugin and API

• An Application Programming Interface (API) is a set of protocols and tools for building software applications.
– It eases program development by providing the necessary building blocks; the remaining work is to assemble them.
• APIs are more useful for users than for programmers,
– since programs built on a common API offer a similar interface.
• This makes learning easier even for novice users.



Plugin and API Cont’d

• Plug-ins are software components that add specific abilities to a larger software application.
– For example, Pellet reasoner plug-ins can be added to the Eclipse IDE to infer relationships.



Plugin and API Cont’d

• Jena API

– Jena is a well-known Java framework for building Semantic Web applications.
– It includes a wide range of Java libraries that make it easier for developers to write code.
– Jena was first developed by researchers at HP Labs in 2000.
– Jena is an open-source project and has been used in a wide variety of Semantic Web applications.
– In 2010, Jena was adopted by the Apache Software Foundation



Summary

• The Semantic Web is not a replacement for the current Web; it is an evolution of it

• Semantic Web is about:

– annotation of data on the Web

– data linking on the Web

– data Integration over the Web

• Semantic Web aims at automating tasks currently carried out

by humans

• Semantic Web is becoming real!



Good Videos

• Evolution Web 1.0, Web 2.0 to Web 3.0
– https://www.youtube.com/watch?v=bsNcjya56v8
• Web today - Web 2.0
– https://www.youtube.com/watch?v=6gmP4nk0EOE
• The Future Internet: Service Web 3.0
– https://www.youtube.com/watch?v=off08As3siM
• Web 3.0 - The Internet of Things!
– https://www.youtube.com/watch?v=F_nbUizGeEY
• Web 3.0
– http://www.sti-innsbruck.at/results/movies/web-30




References

• A Survey on Tools Essential for Semantic Web Research

– International Journal of Computer Applications (0975 – 8887) Volume 62– No.9, January 2013

• A Review on Semantic-Based Web Mining and its Applications

– Sivakumar J et al. / International Journal of Engineering and Technology (IJET)

• Wikipedia

– https://en.wikipedia.org/wiki/Internet

– https://en.wikipedia.org/wiki/Memex

– https://en.wikipedia.org/wiki/Integrated_circuit

– https://en.wikipedia.org/wiki/Mosaic_(web_browser)

– https://en.wikipedia.org/wiki/World_Wide_Web

– https://en.wikipedia.org/wiki/Web_2.0

– http://en.wikipedia.org/wiki/Semantic_Web

– https://en.wikipedia.org/wiki/Amazon_Mechanical_Turk

– http://en.wikipedia.org/wiki/Resource_Description_Framework

– http://en.wikipedia.org/wiki/Linked_Data




References

• Other Links

– http://www.w3.org/TR/xslt

– http://www.ontoprise.de

– http://linkeddata.org

– http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/Slides.pdf
– http://sti-innsbruck.at/sites/default/files/courses/01_SW-Introduction.pdf
– http://facweb.cs.depaul.edu/mobasher/classes/it130/internet-www/index.html




THE END

Your Questions?