Invited Talk, University of Athens, October 21, 2008

Page 1: Invited Talk  University of Athens October 21, 2008

Invited Talk University of Athens

October 21, 2008

Towards Data Mashups and Pipes

Dr. Mustafa Jarrar

HPCLab, University of Cyprus

MashQL

Reading: Mustafa Jarrar and Marios D. Dikaiakos: MashQL: A Query-by-Diagram Topping SPARQL - Towards Semantic Data Mashups. In ONISW'08 workshop, part of the CIKM'08 conference, ACM. 2008 http://www.jarrar.info/publications/JD08.pdf

Page 2: Invited Talk  University of Athens October 21, 2008

Imagine we are in 3008.

The internet is a database

Information about every little thing

Structured, granular data

Semantics, linked data

How will we yahoo/google this knowledge? (oracle?)

Page 3: Invited Talk  University of Athens October 21, 2008

• The Data Web and the role of Mashups

• Mashup Challenges

• MashQL (A new Mashup Language)

• Conclusions and Discussion

Outline

Jarrar-University of Cyprus

Page 4: Invited Talk  University of Athens October 21, 2008


API

Web 2.0 and the phenomena of APIs

Page 5: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API: Wikipedia in RDF

Page 6: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API

Page 7: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API

Page 8: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API

Also supports microformats/RDFa

Page 9: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API

Page 10: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API

Page 11: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

API

And many, many other APIs

Page 12: Invited Talk  University of Athens October 21, 2008

Web 2.0 and the phenomena of APIs

Moving to the Data Web, in parallel to the web of documents.


Page 13: Invited Talk  University of Athens October 21, 2008

An application that combines data from multiple sources (APIs).

Mashups

Athens Tourism Portal

SOA

Page 14: Invited Talk  University of Athens October 21, 2008

An application that combines data from multiple sources (APIs).

Mashups

(API1 + API2) + API3 = money

(A puzzle of APIs)


Page 15: Invited Talk  University of Athens October 21, 2008

Mashups (Example)

Combines Google maps with real-estate databases

Google Maps

Real-estate

Page 16: Invited Talk  University of Athens October 21, 2008

Mashups (Example)

A unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health

Google News

ProMED

World Health Organization

Page 17: Invited Talk  University of Athens October 21, 2008

How can I build a mashup?

What do you want to do?

Which data do you need? Are APIs/RSS feeds available? How are your programming skills?

Start coding

Use mashup editors

Start configuring

Semi-Technical Skills

Geek

Mashup editors:
• Microsoft Popfly
• Yahoo! Pipes
• QEDWiki by IBM
• Google Mashup Editor (Coming)
• Serena Business Mashups
• Dapper
• JackBe Presto Wires

Sign up for a developer token:
• http://aws.amazon.com/
• http://www.google.com/apis/maps/
• http://api.search.yahoo.com/webservices/re

Page 18: Invited Talk  University of Athens October 21, 2008

Mashup Editors

Page 19: Invited Talk  University of Athens October 21, 2008

Mashup Editors

Page 20: Invited Talk  University of Athens October 21, 2008

Mashup Editors

Page 21: Invited Talk  University of Athens October 21, 2008

Mashup Editors

Page 22: Invited Talk  University of Athens October 21, 2008

Mashup Editors

Page 23: Invited Talk  University of Athens October 21, 2008

Mashup Editors

Limitations

• Focus only on providing encapsulated access to (some) public APIs and feeds (rather than querying data sources).

• Still require programming skills.

• Cannot play the role of general-purpose data retrieval, as mashups are sophisticated applications.

• Lack a formal framework for pipelining mashups.

Page 24: Invited Talk  University of Athens October 21, 2008

Vision and Challenges

Instead of accessing a method in an API in a programmatic style, can these APIs act as query end-points over http (i.e. a URL is a query)?

Regard the internet as a database, where a data source is seen as a table, and a mashup is a query.

A Mashup can be a simple inquiry (e.g., Hacker’s articles after 2000).

In short, allow casual users to search and consume the Data Web intuitively, like we use search engines (or at least the "advanced search" in search engines).

But the problem then is: users need to know the schema and technical details of the data sources they want to query.
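The idea that "a URL is a query" can be sketched with nothing more than URL encoding. The snippet below is a minimal illustration, not something taken from the talk: the endpoint `http://example.org/sparql`, the `query` parameter name (borrowed from the usual SPARQL-over-HTTP convention) and the sample query are all assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def query_url(endpoint: str, sparql: str) -> str:
    """Encode a query string into a GET URL so that the URL itself is the query."""
    return endpoint + "?" + urlencode({"query": sparql})


# Hypothetical endpoint and query, for illustration only.
SAMPLE_QUERY = "SELECT ?t WHERE { ?a <http://example.org/title> ?t }"
url = query_url("http://example.org/sparql", SAMPLE_QUERY)

# Decoding the URL gives back exactly the query that was encoded.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

Because the whole query lives in the URL, it can be bookmarked, shared, or used as the input of another query.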

Page 25: Invited Talk  University of Athens October 21, 2008

How can a user query a source without knowing its schema, structure, and vocabulary?

SELECT S.Title FROM Google.Scholar S
WHERE (S.Author = 'Hacker')
UNION
SELECT P.PatentTitle FROM Google.Patent P
WHERE (P.Inventor = 'Hacker')
UNION
SELECT A.Title FROM Citeseer A
WHERE (A.Author = 'Hacker')

Data Sources

Vision and Challenges


Page 27: Invited Talk  University of Athens October 21, 2008

Vision and Challenges

http://www.site1.com/rdf

<:a1> <:Title> "Web 2.0"
<:a1> <:Author> "Hacker B."
<:a1> <:Year> 2007
<:a1> <:Publisher> "Springer"
<:a2> <:Title> "Web 3.0"
<:a2> <:Author> "Smith B."
<:a2> <:Cites> <:a1>

http://www.site2.com/rdf

<:4> <:Title> "Semantic Web"
<:4> <:Author> "Tom Lara"
<:4> <:PubYear> 2005
<:5> <:Title> "Web Services"
<:5> <:Author> "Bob Hacker"

PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site2.com/rdf>
SELECT ?ArticleTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
  {?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}
  {?X S1:Author ?X1} UNION {?X S2:Author ?X1}
  {?X S1:Year ?X2} UNION {?X S2:PubYear ?X2}
  FILTER regex(?X1, "^Hacker")
  FILTER (?X2 > 2000)
}

Some data sources may come without a schema at all, as:

Hacker’s articles after 2000

Programmers usually explore such sources by eye, and remember the vocabulary and structure! (Casual users?)

Page 28: Invited Talk  University of Athens October 21, 2008

MashQL

Page 29: Invited Talk  University of Athens October 21, 2008

MashQL

A simple query language for the Data Web, in a mashup style.

MashQL allows querying dataspace(s) without any prior knowledge about their schema, vocabulary or technical details (a source may not have a schema at all). Explore unknown graphs.

Does not assume any knowledge about RDF, SPARQL, XML, or any technology, to get started.

Users only use drop-lists to formulate queries. (query-by-diagram/interaction).


Page 30: Invited Talk  University of Athens October 21, 2008

MashQL Example 1

http://www.site1.com/rdf

<:a1> <:Title> "Web 2.0"
<:a1> <:Author> "Hacker B."
<:a1> <:Year> 2007
<:a1> <:Publisher> "Springer"
<:a2> <:Title> "Web 3.0"
<:a2> <:Author> "Smith B."
<:a2> <:Cites> <:a1>

http://www.site2.com/rdf

<:4> <:Title> "Semantic Web"
<:4> <:Author> "Tom Lara"
<:4> <:PubYear> 2005
<:5> <:Title> "Web Services"
<:5> <:Author> "Bob Hacker"

Hacker’s Articles after 2000?

MashQL

From:

RDF Input

http://www.site1.com/rdf

Everything

Title ArticleTitle

Author “^Hacker”

Year\PubYear > 2000

http://www.site2.com/rdf


Page 31: Invited Talk  University of Athens October 21, 2008

MashQL Example 1


Hacker’s Articles after 2000?

MashQL

From:

RDF Input

http://www.site1.com/rdf

http://www.site2.com/rdf

Everything

[Drop-down list: Instances, Types]

[Drop-down list: a1, a2, 4, 5]

Everything

Interactive query formulation


Page 32: Invited Talk  University of Athens October 21, 2008

MashQL Example 1


Hacker’s Articles after 2000?

MashQL

From:

RDF Input

http://www.site1.com/rdf

http://www.site2.com/rdf

Everything

Title ArticleTitle

[Drop-down list: Author, Cites, Publisher, PubYear, Title, Year]

Page 33: Invited Talk  University of Athens October 21, 2008

MashQL Example 1


Hacker’s Articles after 2000?

MashQL

From:

RDF Input

http://www.site1.com/rdf

http://www.site2.com/rdf

Everything

Title ArticleTitle

Author Con

[Drop-down list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan]

Hacker

[Drop-down list: Author, Cites, Publisher, PubYear, Title, Year]

Page 34: Invited Talk  University of Athens October 21, 2008

MashQL Example 1


Hacker’s Articles after 2000?

MashQL

From:

RDF Input

http://www.site1.com/rdf

http://www.site2.com/rdf

Everything

Title ArticleTitle

Author "^Hacker"

Year mor

[Drop-down list: OneOf, Not, Between, LessThan, MoreThan]

2000

\PubYe

[Drop-down list: Publisher, PubYear, Title, Year]

Page 35: Invited Talk  University of Athens October 21, 2008

MashQL Example 1


Hacker’s Articles after 2000?

MashQL

From:

RDF Input

http://www.site1.com/rdf

http://www.site2.com/rdf

Everything

Title ArticleTitle

Author "^Hacker"

Year\PubYear > 2000

PREFIX S1: <http://site1.com/rdf>
PREFIX S2: <http://site2.com/rdf>
SELECT ?ArticleTitle
FROM <http://site1.com/rdf>
FROM <http://site2.com/rdf>
WHERE {
  {?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}
  {?X S1:Author ?X1} UNION {?X S2:Author ?X1}
  {?X S1:Year ?X2} UNION {?X S2:PubYear ?X2}
  FILTER regex(?X1, "^Hacker")
  FILTER (?X2 > 2000)
}
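To see what this query returns on the two sample sources, the same restrictions can be evaluated over in-memory triples. This is only a sketch: it uses plain Python tuples rather than an RDF store, and the helper names `values` and `hacker_articles_after_2000` are made up for the illustration.

```python
import re

# The sample triples from the two sources shown on the slide.
SITE1 = [
    ("a1", "Title", "Web 2.0"), ("a1", "Author", "Hacker B."),
    ("a1", "Year", 2007), ("a1", "Publisher", "Springer"),
    ("a2", "Title", "Web 3.0"), ("a2", "Author", "Smith B."),
    ("a2", "Cites", "a1"),
]
SITE2 = [
    ("4", "Title", "Semantic Web"), ("4", "Author", "Tom Lara"),
    ("4", "PubYear", 2005),
    ("5", "Title", "Web Services"), ("5", "Author", "Bob Hacker"),
]


def values(triples, subject, predicates):
    """Objects of `subject` for any predicate in `predicates` (a predicate union)."""
    return [o for s, p, o in triples if s == subject and p in predicates]


def hacker_articles_after_2000(triples):
    titles = []
    for subject in sorted({s for s, _, _ in triples}):
        authors = values(triples, subject, {"Author"})
        years = values(triples, subject, {"Year", "PubYear"})  # Year\PubYear
        if any(re.search(r"^Hacker", a) for a in authors) and any(y > 2000 for y in years):
            titles.extend(values(triples, subject, {"Title"}))
    return titles


result = hacker_articles_after_2000(SITE1 + SITE2)
print(result)
```

On this data only "Web 2.0" qualifies. "Web Services" is excluded because its author string does not start with "Hacker" and because the source gives no publication year for it.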

Page 36: Invited Talk  University of Athens October 21, 2008

Retrieve every article that has a title and was written by an author who has an address whose country is Cyprus, where the article was published after 2008.

MashQL Example 2

The recent articles from Cyprus

MashQL

Article

Title ArticleTitle

Author Address

Country “Cyprus”

Year > 2008

URL:

RDF Input

http://www4.wiwiss.fu-berlin.de/dblp/


Page 37: Invited Talk  University of Athens October 21, 2008

The Intuition of MashQL

A query is a tree

• The root is called the query subject.

• Each branch is a restriction.

• Branches can be expanded (information path).

• Object value filters.

Def. A Query Q with a subject S, denoted by Q(S), is a set of restrictions on S. Q(S) = R1 AND … AND Rn.

Def. A Subject S ∈ (I ∪ V), where I is an identifier and V is a variable.

Def. A Restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without), P is a predicate (P ∈ (I ∪ V)), and Of is an object filter.

MashQL

Article

Title ArticleTitle

Author Address

Country “Cyprus”

Year > 2008

URL:

RDF Input

http://www4.wiwiss.fu-berlin.de/dblp/

Article
  Title ?ArticleTitle
  Author ?X1
    Address ?X11
      Country ?X111 = "Cyprus"
  Year ?X2 > 2008
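The "query is a tree" intuition can be made concrete with a small data structure. The sketch below is an assumption about how one might model it, not the MashQL implementation: the class names `Query` and `Restriction`, their fields, and the `paths` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Restriction:
    predicate: str
    obj: str                          # object variable or identifier
    prefix: Optional[str] = None      # None (required), "maybe" or "without"
    filter: Optional[str] = None      # e.g. '> 2008'
    sub: List["Restriction"] = field(default_factory=list)  # expanded branch


@dataclass
class Query:
    subject: str                      # the root of the tree
    restrictions: List[Restriction]


def paths(query: Query) -> List[str]:
    """List every information path from the query subject down to a leaf."""
    out: List[str] = []

    def walk(prefix: List[str], r: Restriction) -> None:
        here = prefix + [r.predicate]
        if r.sub:
            for child in r.sub:
                walk(here, child)
        else:
            out.append("/".join(here))

    for r in query.restrictions:
        walk([query.subject], r)
    return out


# Example 2 from the slides: recent articles from Cyprus.
country = Restriction("Country", "?X111", filter='= "Cyprus"')
address = Restriction("Address", "?X11", sub=[country])
example2 = Query("Article", [
    Restriction("Title", "?ArticleTitle"),
    Restriction("Author", "?X1", sub=[address]),
    Restriction("Year", "?X2", filter="> 2008"),
])
print(paths(example2))
```

Each branch is one restriction, and the nested `sub` list is what the slides call expanding a branch into an information path.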

Page 38: Invited Talk  University of Athens October 21, 2008

The Intuition of MashQL

MashQL

Article

Title ArticleTitle

Author Address

Country “Cyprus”

Year > 2008

URL:

RDF Input

http://www4.wiwiss.fu-berlin.de/dblp/

An object filter is one of:
• Equals
• Contains
• MoreThan
• LessThan
• Between
• OneOf
• Not(f)
• Information Path (sub query)

Def. An object filter Of = <O, f>, where O is an object and f is a filtering function, is one of:
• Of = <O>, where O is an object, O ∈ (V ∪ I).
• Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag.
• Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag.
• Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, T is a datatype.
• Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, T is a datatype identifier.
• Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, T is a datatype identifier.
• Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ... , vn}, vi is a variable or constant.
• Of = <O, Not(f)>, where f is one of the functions defined above.
• Of = <O, Qi(O)>, where O is an object (O ∈ (V ∪ I)), and Qi(O) is a sub-query with O being the query subject.

Page 39: Invited Talk  University of Athens October 21, 2008

More MashQL Constructs

Restriction Operators {Required, Maybe, or Without}

All restrictions are required (i.e. AND), unless they are prefixed with "maybe" or "without".

SELECT ?PersonName ?University
WHERE {
  ?Person :Name ?PersonName.
  ?Person :WorkFor :Yahoo.
  OPTIONAL {?Person :StudyAt ?University}
  OPTIONAL {?Person :Salary ?X1}
  FILTER (!bound(?X1))
}
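The three restriction prefixes can be read as three small rewriting cases. The function below is a hedged sketch of that reading, following the pattern of the SPARQL example on this slide (OPTIONAL for "maybe", OPTIONAL plus a `!bound` filter for "without"); the name `map_restriction` and its signature are assumptions.

```python
def map_restriction(subject, predicate, obj, prefix=None):
    """Render one restriction as a SPARQL fragment, depending on its prefix."""
    triple = f"{subject} {predicate} {obj}."
    if prefix is None:                       # required restriction (AND)
        return triple
    if prefix == "maybe":                    # optional restriction
        return f"OPTIONAL{{{triple}}}"
    if prefix == "without":                  # the restriction must not hold
        return f"OPTIONAL{{{triple}}} FILTER(!bound({obj}))"
    raise ValueError(f"unknown restriction prefix: {prefix}")


print(map_restriction("?Person", ":Name", "?PersonName"))
print(map_restriction("?Person", ":StudyAt", "?University", "maybe"))
print(map_restriction("?Person", ":Salary", "?X1", "without"))
```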


Page 40: Invited Talk  University of Athens October 21, 2008

More MashQL Constructs

Union operator (denoted as "\") between Objects, Predicates, Subjects and Queries

SELECT ?Person
WHERE {
  {?Person :WorkFor :Google}
  UNION
  {?Person :WorkFor :Yahoo}
}

SELECT ?FName
WHERE {
  {?Person :Surname ?FName}
  UNION
  {?Person :Firstname ?FName}
}

SELECT ?AgentName ?AgentPhone
WHERE {
  {?Person rdf:type :Person. ?Person :Name ?AgentName. ?Person :Phone ?AgentPhone}
  UNION
  {?Company rdf:type :Company. ?Company :Name ?AgentName. ?Company :Phone ?AgentPhone}
}

SELECT ?CustName
WHERE {
  {?Person :Name ?CustName}
  UNION
  {?Company :Title ?CustName. ?Company :City ?X1. FILTER regex(?X1, "Paris")}
}
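A union between predicates, written with "\" as in `Year\PubYear`, expands into one triple pattern per alternative joined by UNION. The helper below sketches that expansion; the function name and the choice to prefix predicates with ":" are assumptions made for the example.

```python
def expand_predicate_union(subject, predicates, obj):
    """Expand a '\\'-separated predicate union into SPARQL UNION patterns."""
    names = predicates.split("\\")
    patterns = [f"{{{subject} :{name} {obj}}}" for name in names]
    if len(patterns) == 1:
        return patterns[0]               # no union needed
    return "{" + " UNION ".join(patterns) + "}"


print(expand_predicate_union("?X", "Year\\PubYear", "?X2"))
print(expand_predicate_union("?X", "Title", "?ArticleTitle"))
```

The same splitting idea applies to unions between objects or subjects; only the position of the alternative in the triple pattern changes.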

Page 41: Invited Talk  University of Athens October 21, 2008

More MashQL Constructs

And several other constructs, including:
• Types and Reverse Predicates
• Datatypes and Language Tags
• …

Page 42: Invited Talk  University of Athens October 21, 2008

Formal Syntax and Semantics

Def.1 (Dataset): A dataset D is a set of triples; each triple t is formed as <S, P, O>, where S ∈ I, P ∈ I, and O ∈ I ∪ L.

Def.2 (Typed Literals): Every object literal must have a datatype D: if O ∈ L then O ∈ D.

Def.3 (Language Tags): An object literal (O ∈ L) may have a language tag Lt.

Def.4 (Query): A Query Q with a subject S, denoted by Q(S), is a set of restrictions on S. Q(S) = R1 AND … AND Rn.

Def.5 (Subject): A subject S ∈ (I ∪ V), where I is an identifier and V is a variable.

Def.6 (Restriction): A restriction R = <Rx, P, Of>, where Rx is an optional restriction prefix that can be (maybe | without), P is a predicate (P ∈ (I ∪ V)), and Of is an object filter.

Def.7 (Object Filter): An object filter Of = <O, f>, where O is an object and f is a filtering function. An object filter can have one of the following nine forms:

1. Of = <O>, where O is an object, O ∈ (V ∪ I). This is the simplest object filter, i.e., it does not add any restriction on the object value of the retrieved triples.

2. Of = <O, Equals(X, T, Lt)>, where X can be a variable or a constant, T is a datatype, and Lt is a language tag. This filter restricts the retrieved results such that the object value O should be equal to X, with datatype T, and with language Lt.

3. Of = <O, Contains(X, T, Lt)>, where O is an object variable, X is a regex literal, T is a datatype, and Lt is a language tag. This filter restricts the retrieved results such that the object value O should match regex(X), with datatype T, and with language Lt. A regex literal is a literal that contains a regular expression matching pattern.

4. Of = <O, MoreThan(X, T)>, where O is an object variable, X is a variable or a constant, T is a datatype. This filter restricts the retrieved results such that the object value O should be more than X and with datatype T.

5. Of = <O, LessThan(X, T)>, where O is an object variable, X is a variable or a constant, T is a datatype identifier. This filter restricts the retrieved results such that the object value O should be less than X and with datatype T (see rule-9).

6. Of = <O, Between(X, Y, T)>, where X and Y are variables or constants, T is a datatype identifier. This filter restricts the retrieved results such that the object value O should be more than or equal to X, less than or equal to Y, and with datatype T.

7. Of = <O, OneOf(V)>, where O is an object variable, and V is a set of values {v1, ... , vn}, vi is a variable or constant. This filter restricts the retrieved results such that the object value O should be equal to one of the values in V.

8. Of = <O, Not(f)>, where f is one of the functions defined above. This filter extends all of the above functions with simple negation. For example, Not applied to Equals is the same as the Equals filter but with negation, i.e., Not Equal.

9. Of = <O, Qi(O)>, where O is an object (O ∈ (V ∪ I)), and Qi(O) is a sub-query with O being the query subject. The restrictions defined in the sub-query Qi(O) should be satisfied as well. Notice that this definition is recursive; however, this does not mean the query itself is recursive.

Def.8 (Types): A subject (S ∈ I) or an object (O ∈ I) can be prefixed with "a" or "an" to mean the instances of this subject/object type, instead of the subject/object itself.

Def.9 (Union): A union can be declared between objects, predicates, subjects and/or queries, in the following forms:

1. On = <O1\O2\ . . . \On>, to indicate unions between objects, where Oi ∈ I.

2. Pn = <P1\P2\ . . . \Pn>, to indicate unions between predicates, where Pi ∈ I.

3. Sn = <S1\S2\ . . . \Sn>, to indicate unions between subjects, where Si ∈ I.

4. Qn = <Q1\Q2\ . . . \Qn>, to indicate unions between queries.

Def.10 (Reverse): <~P> indicates the reverse of the predicate P. Let R1 be a restriction on S such that <S P O>, and R2 be <O ~P S>; R1 and R2 have the same meaning.
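The filtering functions of Def.7 can be read as predicates over an object value. The sketch below illustrates that reading in plain Python; it deliberately ignores the datatype T and the language tag Lt, and the function names are chosen for the example rather than taken from MashQL.

```python
import re


def equals(x):
    return lambda o: o == x


def contains(pattern):
    # X is a regex literal, so the test is a regular-expression match.
    return lambda o: re.search(pattern, str(o)) is not None


def more_than(x):
    return lambda o: o > x


def less_than(x):
    return lambda o: o < x


def between(x, y):
    # Inclusive on both ends, as in form 6 of Def.7.
    return lambda o: x <= o <= y


def one_of(allowed):
    return lambda o: o in allowed


def negate(f):
    # Not(f): simple negation of any of the filters above.
    return lambda o: not f(o)


salary_ok = between(75000, 120000)
not_in_south = negate(one_of({"Italy", "Spain", "Greece", "Cyprus"}))
print(salary_ok(75000), not_in_south("Norway"), contains("^Hacker")("Hacker B."))
```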

Page 43: Invited Talk  University of Athens October 21, 2008

MashQL Queries

In the background, MashQL queries are translated into and executed as SPARQL queries.

At the moment, we focus on RDF (/RDFa) as a data format, and SPARQL (/Oracle's SPARQL) as a backend query language. However, MashQL can easily be mapped to other query languages.

MashQL is not merely a user interface, but also a query language with its own intuition (it focuses on path patterns, rather than triple patterns).

Page 44: Invited Talk  University of Athens October 21, 2008

Rule 1. The symbol before a variable means that it will be returned in the results; i.e., included in the SELECT part of the SPARQL query. If the output of the query is input to another, use "CONSTRUCT *".

Rule 2. In any of the following rules, if a subject, predicate, or object is italicized, it is seen as a SPARQL variable, i.e. prefixed with "?".

Rule 3. If S is a subject and R = < , P, Of>, the mapping is: {S P O}.

Rule 4. If S is a subject and R = <maybe, P, Of>, the mapping is: {OPTIONAL{S P O}}.

Rule 5. If S is a subject and R = <without, P, Of>, the mapping is: {OPTIONAL{S P O} FILTER (!bound(?O))}.

Rule 6. If Of = <O, Equals(X, T, Lt)>: Append the mapping with: FILTER(?O = X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T). If Lt ≠ Null: Append the mapping with: FILTER(lang(?O) = Lt).

Rule 7. If Of = <O, Contains(X, T, Lt)>: Append the mapping with: FILTER regex(?O, X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T). If Lt ≠ Null: Append the mapping with: FILTER(lang(?O) = Lt).

Rule 8. If Of = <O, MoreThan(X, T)>: Append the mapping with: FILTER(?O > X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T).

Rule 9. If Of = <O, LessThan(X, T)>: Append the mapping with: FILTER(?O < X). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T).

Rule 10. If Of = <O, Between(X, Y, T)>: Append the mapping with: FILTER(?O >= X && ?O <= Y). If T ≠ Null: Append the mapping with: FILTER(datatype(?O) = T).

Rule 11. If Of = <O, OneOf(V)>: Append the mapping with: FILTER(?O = V1 || . . . || ?O = Vn). If Vi is a regex-ed literal, the ith condition above should be replaced with: regex(?O, Vi).

Rule 12. If Of = <O, Not(f)>: The f filter will be generated as above, but with a negation.

Rule 13. If Of = <O, Qi(O)>: Repeat all mapping rules to generate Qi(O).

Rule 14. If a subject S is prefixed with "a" or "an": Append the mapping with: {?S rdf:type :S}.

Rule 15. If an object O is prefixed with "a" or "an": Append the mapping with: {?O rdf:type :O}.

Rule 16. Given On, if n > 1 and Oi ∈ I: The mapping in rules 3-4 will be: {{S P :O1} UNION . . . UNION {S P :On}}.

Rule 17. Given Pn, if n > 1 and Pi ∈ I: The mapping in rules 3-4 will be: {{S :P1 O} UNION . . . UNION {S :Pn O}}.

Rule 18. Given Sn, if n > 1 and Si ∈ I: Regenerate the query n times, each time with Si as a root, and with a UNION between the queries.

Rule 19. Given Qn, if n > 1: Add UNION between the n queries.

Rule 20. If S is a subject and R = <~P, O>, the mapping is: {O P S}.

MashQL-SPARQL Mapping Rules

Also mapped into SQL and Oracle’s SPARQL
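The filter-related mapping rules (Rules 6 to 12) are mechanical enough to sketch as a small translator. The code below is an illustration under assumptions, not the MashQL compiler: it skips the datatype and language-tag checks, takes constants already written in SPARQL syntax, and the names `filter_expr` and `map_filter` are hypothetical.

```python
def filter_expr(var, op, *args):
    """Build the boolean expression for one object filter on variable `var`."""
    if op == "Equals":
        return f"{var} = {args[0]}"
    if op == "Contains":
        return f"regex({var}, {args[0]})"
    if op == "MoreThan":
        return f"{var} > {args[0]}"
    if op == "LessThan":
        return f"{var} < {args[0]}"
    if op == "Between":
        return f"{var} >= {args[0]} && {var} <= {args[1]}"
    if op == "OneOf":
        return " || ".join(f"{var} = {v}" for v in args[0])
    if op == "Not":
        # args[0] is the inner filter name, args[1:] its own arguments.
        return "!(" + filter_expr(var, args[0], *args[1:]) + ")"
    raise ValueError(f"unknown filter: {op}")


def map_filter(var, op, *args):
    """Wrap the expression in a single SPARQL FILTER clause."""
    return "FILTER(" + filter_expr(var, op, *args) + ")"


print(map_filter("?X2", "MoreThan", 2000))
print(map_filter("?X4", "Between", 75000, 120000))
print(map_filter("?X1", "OneOf", ['"Education"', '"HealthCare"']))
print(map_filter("?O", "Not", "Equals", 5))
```

Keeping the expression builder separate from the FILTER wrapper lets Not reuse any other filter, and keeps Between and OneOf inside one FILTER clause.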


Page 45: Invited Talk  University of Athens October 21, 2008

MashQL Markup: an XML Schema to represent pipes in XML.

The reference grammar (Technical specification).

MashQL Compilation


Page 46: Invited Talk  University of Athens October 21, 2008

MashQL Compilation

Depending on the pipeline structure, MashQL generates either SELECT or CONSTRUCT queries:

• SELECT returns the results in a tabular form (e.g. ArticleTitle, Author)

• CONSTRUCT returns the results in a triple form (e.g. Subject, Predicate, Object). …

…
CONSTRUCT *
WHERE {
  ?Job :JobIndustry ?X1.
  ?Job :Type ?X2.
  ?Job :Currency ?X3.
  ?Job :Salary ?X4.
  FILTER(?X1 = "Education" || ?X1 = "HealthCare")
  FILTER(?X2 = "Full-Time" || ?X2 = "Fulltime" || ?X2 = "Contract")
  FILTER(regex(?X3, "^Euro") || regex(?X3, "^€"))
  FILTER(?X4 >= 75000 && ?X4 <= 120000)
}

…
SELECT ?Job ?Firm
WHERE {
  ?Job :Location ?X1.
  ?X1 :Country ?X2.
  FILTER(?X2 = "Italy" || ?X2 = "Spain" || ?X2 = "Greece" || ?X2 = "Cyprus")
  OPTIONAL {{?Job :Organization ?Firm} UNION {?Job :Employer ?Firm}}
}
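The choice between the two query forms follows from a stage's position in the pipeline: a stage whose output feeds another query produces triples, and the last stage produces a table. The sketch below assumes a pipeline is just an ordered list of returned-variable lists; `query_form` and `forms_for_pipeline` are hypothetical names, and "CONSTRUCT *" is the notation used on the slides.

```python
def query_form(returned_vars, feeds_another_query):
    """Pick the head of the generated query for one pipeline stage."""
    if feeds_another_query:
        return "CONSTRUCT *"          # triples, so the next stage can query them
    return "SELECT " + " ".join(returned_vars)   # tabular result for the user


def forms_for_pipeline(stages):
    """`stages` is an ordered list of returned-variable lists, one per stage."""
    last = len(stages) - 1
    return [query_form(vars_, i < last) for i, vars_ in enumerate(stages)]


print(forms_for_pipeline([["?Job"], ["?Job"], ["?Job", "?Firm"]]))
```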


Page 47: Invited Talk  University of Athens October 21, 2008

System Model (Online Mashup Editor)

[Figure: system model diagram. Components: Client, Query Loader, Results Render, Mashup Server, Site1, Site2, Site3. Labeled interactions: Download (http), Bulk-load, B.Query (AJAX), RunQuery (http), DataSources (AJAX), Results (http).]

Example (Wikipedia Titles, 28 MB zip, 316 MB nt, 2.7 M triples): download (37 s, 600 KB/s), bulk-load into Oracle RDF (70 s, 40K triples per second), query (one to a few seconds).

The output of a mashup can be an input to another (enabling people to collaborate and innovate, building on each other's results).

Page 48: Invited Talk  University of Athens October 21, 2008

MashQL Editor


Under Construction

Page 49: Invited Talk  University of Athens October 21, 2008

MashQL Firefox Add-On (Light-mashups @ your browser)

Page 50: Invited Talk  University of Athens October 21, 2008

Use Case: Job Seeking

A mashup of job vacancies based on Google Base and on Jobs.ac.uk.

…
CONSTRUCT *
WHERE {
  {{?Job :Category :Health} UNION {?Job :Category :Medicine}}
  ?Job :Role ?X1.
  ?Job :Salary ?X2.
  ?X2 :Currency :UPK.
  ?X2 :Minimun ?X3.
  FILTER(?X1 = "Research" || ?X1 = "Academic")
  FILTER(?X3 > 50000)
}

…
CONSTRUCT *
WHERE {
  ?Job :JobIndustry ?X1.
  ?Job :Type ?X2.
  ?Job :Currency ?X3.
  ?Job :Salary ?X4.
  FILTER(?X1 = "Education" || ?X1 = "HealthCare")
  FILTER(?X2 = "Full-Time" || ?X2 = "Fulltime" || ?X2 = "Contract")
  FILTER(regex(?X3, "^Euro") || regex(?X3, "^€"))
  FILTER(?X4 >= 75000 && ?X4 <= 120000)
}

…
SELECT ?Job ?Firm
WHERE {
  ?Job :Location ?X1.
  ?X1 :Country ?X2.
  FILTER(?X2 = "Italy" || ?X2 = "Spain" || ?X2 = "Greece" || ?X2 = "Cyprus")
  OPTIONAL {{?Job :Organization ?Firm} UNION {?Job :Employer ?Firm}}
}

Page 51: Invited Talk  University of Athens October 21, 2008

Use Case: My Citations

A mashup of Hacker's cited articles (but no self-citations), over Scholar and Citeseer

Page 52: Invited Talk  University of Athens October 21, 2008

Use Case: eHealth Research

A mashup based on an eHealth database to find what causes prostate cancer

Add/remove restrictions until you retrieve all and only the people with prostate cancer

(the restrictions are the symptoms)

Page 53: Invited Talk  University of Athens October 21, 2008

Use Case: Retailers

A retailer mashup of three RDF data sources with a user input of some barcode numbers.

When scanning a product, retrieve its English and French titles directly from the manufacturer's online catalog.

Page 54: Invited Talk  University of Athens October 21, 2008

Use Case: Car Rental business Auditing

A government connects to the databases of car rental companies to audit whether they are in compliance with the local regulations.

(Each query is a business rule; if the results are not empty, there is a violation.)

Vehicles were rented without being insured.

Rentals to people without licenses

Rentals to people without proper licenses
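The auditing idea, where each query is a business rule and a non-empty result signals a violation, can be sketched over a toy in-memory table. Everything in this example is invented for illustration: the rental records, the field names and the two rule predicates do not come from the talk.

```python
# Toy rental records; all values are made up for the illustration.
RENTALS = [
    {"vehicle": "V1", "insured": True, "license": "B"},
    {"vehicle": "V2", "insured": False, "license": "B"},
    {"vehicle": "V3", "insured": True, "license": None},
]

# Each rule is a query: it selects the records that violate a regulation.
RULES = {
    "rented without insurance": lambda r: not r["insured"],
    "rented without a license": lambda r: r["license"] is None,
}


def audit(rentals, rules):
    """Run every rule; a rule with a non-empty result is reported as violated."""
    report = {}
    for name, violates in rules.items():
        hits = [r["vehicle"] for r in rentals if violates(r)]
        if hits:
            report[name] = hits
    return report


print(audit(RENTALS, RULES))
```

A compliant company would produce an empty report, since every rule query would return no rows.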


Page 55: Invited Talk  University of Athens October 21, 2008

Evaluation

First: Query Execution :

• The performance of executing a MashQL query is bound by the performance of executing its backend language (i.e. SPARQL/SQL).

• A query of medium complexity takes one or a few seconds (Oracle's SPARQL, [Chong et al 2007]).

Page 56: Invited Talk  University of Athens October 21, 2008

Evaluation

Second: Background Queries:

• These are the queries that the MashQL editor performs in the background (to generate drop-down lists), while a user formulates his/her query.

• Executing background queries should be fast enough to allow efficient query formulation.

• Experiments over:
  • DBLP data (12 million triples, 700 MB)
  • DBPedia data (25 million triples, 2.x GB)
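What a background query computes can be sketched over a handful of in-memory triples: one lookup for the list of types, and one for the predicates available on instances of a chosen type. The triples and the helper names `types` and `predicates_of` are assumptions for this example; the experiments in the talk ran such queries inside Oracle, not in Python.

```python
# Toy triples; identifiers and values are placeholders.
TRIPLES = [
    ("p1", "rdf:type", "Article"), ("p1", "Title", "T1"),
    ("p1", "Year", 1999), ("p1", "Creator", "c1"),
    ("c1", "rdf:type", "Person"), ("c1", "Name", "N1"),
    ("b1", "rdf:type", "Book"), ("b1", "Publisher", "P1"),
]


def types(triples):
    """Drop-down list of types: the distinct objects of rdf:type, sorted."""
    return sorted({o for _, p, o in triples if p == "rdf:type"})


def predicates_of(triples, type_name):
    """Drop-down list of predicates used by instances of `type_name`."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == type_name}
    return sorted({p for s, p, _ in triples if s in instances and p != "rdf:type"})


print(types(TRIPLES))
print(predicates_of(TRIPLES, "Article"))
```

Both lookups are distinct-and-sort operations, which matches the "Group by ... Order by ..." shape of the background queries shown in the evaluation slides.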


Page 57: Invited Talk  University of Athens October 21, 2008

Evaluation

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Article

Title ArticleTitle

Creator

Name “^Berners-Lee^”

Year > 1993


Page 58: Invited Talk  University of Athens October 21, 2008

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Evaluation of the Background Queries

Article [00.00]

[Drop-down list: Instances, Types]

[Drop-down list: Article, Book, Incollection, Inproceedings, Masterthesis, Person, Phdthesis, Proceedings, www]

Everything

Select O FROM …(?S <rdf:type> ?O)… Group by O Order by O;

Page 59: Invited Talk  University of Athens October 21, 2008

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Evaluation of the Background Queries

Article [00.00]

Title [00.03] ArticleTitle

[Drop-down list: BookTitle, CDRom, Cite, Creator, Date, Editor, Journal, Month, Number, Pages, Publisher, Title, Volume, Year]

Select P FROM …(?S <rdf:type> ?O)(?O ?P ?O1)… Group by P Order by P;

Page 60: Invited Talk  University of Athens October 21, 2008

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Evaluation of the Background Queries

Article [00.00]

Title [00.03] ArticleTitle

Creator [00.03]

[Drop-down list: BookTitle, CDRom, Cite, Creator, Date, Editor, Journal, Month, Number, Pages, Publisher, Title, Volume, Year]

Select P FROM …(?S <rdf:type> ?O)(?O ?P ?O1)… Group by P Order by P;

Page 61: Invited Talk  University of Athens October 21, 2008

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Evaluation of the Background Queries

Article [00.00]

Title [00.03] ArticleTitle

Creator [00.03]

Name [00.43] Cont Berners-Lee

[Drop-down list: Name, Type]

[Drop-down list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan]

Select P FROM …(?S <rdf:type> ?O)(?O <:Creator> ?O1)(?O1 ?P ?O2)… Group by P Order by P;

Page 62: Invited Talk  University of Athens October 21, 2008

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Evaluation of the Background Queries

Article [00.00]

Title [00.03] ArticleTitle

Creator [00.03]

Name [00.43] "^Berners-Lee^"

Year [00.03] More 1994

[Drop-down list: BookTitle, CDRom, Cite, Creator, Date, Editor, Journal, Month, Number, Pages, Publisher, Title, Volume, Year]

[Drop-down list: Equals, Contains, OneOf, Not, Between, LessThan, MoreThan]

Select P FROM …(?S <rdf:type> ?O)(?O ?P ?O1)… Group by P Order by P;

Page 63: Invited Talk  University of Athens October 21, 2008

MashQL

From:

RDF Input

http://www.informatik.uni-trier.de/~ley 12 Million Triples

Evaluation of the Background Queries

Article [00.00]

Title [00.03] ArticleTitle

Creator [00.03]

Name [00.43] “^Berners-Lee^”

Year [00.03] > 1993


Page 64: Invited Talk  University of Athens October 21, 2008

Evaluation of the Background Queries

B.Query   12 M triples   6 M triples   3 M triples   1.5 M triples
Q1        <00.00         <00.00        <00.00        <00.00
Q2        <00.03         <00.01        <00.01        <00.00
Q3        <00.03         <00.01        <00.01        <00.00
Q4        <00.43         <00.20        <00.13        00.08
Q5        <00.03         <00.01        <00.01        <00.00

Summary

Our goal is not to benchmark whether Oracle is fast and scalable, but to know whether Oracle's speed is sufficient for MashQL interactivity. Yes.

Page 65: Invited Talk  University of Athens October 21, 2008

Conclusions

• A formal yet simple query language for the Data Web, in a mashup and declarative style.

• Allows people to discover and navigate unknown data spaces (/graphs) without prior knowledge about the schema or technical details.

• Can be used for general-purpose data retrieval and filtering (rather than only sophisticated mashups).

• Query Cursors: to cache history information paths.

• Formal framework for query pipelines: caching, materialization.

• Query distribution and scheduling.

Page 66: Invited Talk  University of Athens October 21, 2008

Question

Page 67: Invited Talk  University of Athens October 21, 2008

Thank You