Semantic Search on Heterogeneous Wiki Systems - WikiSym 2010

Copyright 2009 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute www.deri.ie Semantic Search on Heterogeneous Wiki Systems Fabrizio Orlandi, Alexandre Passant DERI – Galway WikiSym 2010 Gdansk – 8th July 2010

Upload: fabrizio-orlandi

Post on 09-May-2015

DESCRIPTION

by Fabrizio Orlandi at WikiSym 2010, Gdansk, Poland

TRANSCRIPT

Page 1: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Semantic Search on Heterogeneous Wiki Systems

Fabrizio Orlandi, Alexandre Passant

DERI – Galway

WikiSym 2010, Gdansk – 8th July 2010

Page 2: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Interlinking wikis

Wikis share a wide body of common knowledge, spread across many different wiki platforms.

All with different structures, platform dependent, all disconnected...

MoinMoin, TWiki, DokuWiki

Widely used even in the workplace...

Atlassian Confluence, TracWiki, XWiki

Page 3: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Many isolated communities of users and their data

* Source: Pidgin Technologies, www.pidgintech.com

Wikis are also disconnected from other social media websites

Page 4: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Interlinking wikis

We propose a new approach based on Linked Data principles to solve such issues and to enable semantic search across heterogeneous wiki systems.

Page 5: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Wiki Models

Several semantic models have been implemented and used within specific semantic wiki platforms, e.g.: Semantic MediaWiki

as well as efforts to create generic ontology models:

• WikiOnt ontology (DERI)

• WIF (Wiki Interchange Format) ontology (Völkel, Oren - 1st Workshop on Semantic Wikis - 2006)

But they are all specific to wikis and not open to other social websites

Page 6: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

SIOC: Semantically-Interlinked Online Communities

http://sioc-project.org

• A project developed by DERI to semantically describe the content and structure of community sites

• In particular the SIOC ontology is not specific to wikis and is widely used on the Web

• It aims to create new connections between online discussion posts and items, forums, blogs... and wikis.

• Adopted in a framework of more than 50 applications (including Drupal 7 and Yahoo! SearchMonkey), deployed on over 400 sites

Page 7: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Extending the SIOC ontology

We decided to extend the SIOC ontology to make it compliant with wikis and make wikis interoperable and linkable to other social objects.

First we considered the typical and relevant features of wikis in terms of structure and social interactions.

Modeling these features using SIOC has other advantages:

• Integration with existing SIOC data, as well as interlinking with other RDF data for advanced querying purposes;

• Ability to run the same SPARQL query to find items on a particular wiki site or on a weblog or a forum.

Page 8: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Relevant wiki features

• Multi-authoring: multiple users edit the same content collaboratively.

This feature is modeled using the class sioc:UserAccount (a subclass of foaf:OnlineAccount), which describes a user account in an online community site, as the object of sioc:has_creator.

In this way a foaf:Person can be linked to several sioc:UserAccount instances belonging to different wiki sites.

Page 10: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Relevant wiki features

• Categories: sets of articles on related topics which are hierarchically organized.

A solution is provided by the SKOS vocabulary, as it offers a way to model hierarchical structures between various categories, as instances of skos:Concept [Miles, Bechhofer – W3C Recommendation - 2009]

Hence we defined the sioct:Category class as a subclass of skos:Concept.

Page 11: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Relevant wiki features

• Social Tagging: non-organized but dynamic organization process.

The properties sioc:topic (using URIs) and dc:subject (using keywords) can be used to represent tags related to a particular wiki page.

[Figure: the page http://wiki.../The_Clash is linked via sioc:topic to the URI http://wiki.../punk_rock and via dc:subject to the keyword "Punk rock"; a tag:hasTag label also appears in the diagram.]

Page 12: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Relevant wiki features

• Discussions: pages where people can discuss the subject of the article.

We added a new sioc:has_discussion property, with domain sioc:Item and open range (to make this property reusable).

Page 13: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Relevant wiki features

• Backlinks: (or "what links here") wiki internal links pointing to the same wiki article.

We modeled this feature using the already existing sioc:links_to property (subproperty of dcterms:references).

Page 14: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Relevant wiki features

• Page versioning: each page has an associated page history.

In order to define an essential and lightweight model we:

– Added a sioc:latest_version property;

– Added 2 transitive (OWL) properties: sioc:earlier_version and sioc:later_version;

– Defined sioc:later_version as inverse property of sioc:earlier_version;

– Defined sioc:next_version as subproperty of sioc:later_version, and sioc:previous_version as subproperty of sioc:earlier_version.
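
The versioning properties above can be illustrated with a small sketch. The Python below is not part of the SIOC tooling; it only mimics, under stated assumptions, what an OWL reasoner could infer from the model. The revision names (`Page_v1` and so on) and the helper functions are hypothetical. Starting from direct sioc:previous_version links, it derives the transitive sioc:earlier_version relation and its inverse, sioc:later_version.

```python
# Illustrative sketch only: revision names and helper names are hypothetical.
# previous_version[x] = y stands for "x sioc:previous_version y",
# i.e. y is the revision immediately before x.
previous_version = {
    "Page_v3": "Page_v2",
    "Page_v2": "Page_v1",
}


def earlier_versions(page, previous):
    """Follow previous_version links to collect every earlier revision.

    This mirrors two statements of the model: previous_version is a
    subproperty of earlier_version, and earlier_version is transitive.
    """
    found = []
    seen = {page}
    current = previous.get(page)
    while current is not None and current not in seen:  # guard against cycles
        found.append(current)
        seen.add(current)
        current = previous.get(current)
    return found


def later_versions(page, previous):
    """later_version is the inverse of earlier_version."""
    return [p for p in sorted(previous) if page in earlier_versions(p, previous)]


print(earlier_versions("Page_v3", previous_version))
print(later_versions("Page_v1", previous_version))
```

In a real deployment this inference would normally be left to the triple-store's reasoner rather than written by hand; the sketch only shows why storing the direct links is enough.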

Page 15: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

SIOC-MediaWiki Exporter

An exporter from a popular wiki platform to expose data in RDF using our proposed model.

A webservice, written in PHP, that exports a MediaWiki article in RDF, publicly available at:

http://ws.sioc-project.org/mediawiki/

Page 17: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Browsing the generated data

RDF data extracted from a wiki page is browsable with tools such as The Tabulator

To offer a better browsing experience and ease the process of crawling SIOC exports of MediaWiki instances, the webservice automatically produces rdfs:seeAlso links between wiki pages, following the Linked Data principles;

Link to the corresponding DBpedia resource added automatically, if the article is from the Wikipedia [English] (with foaf:primaryTopic)

An RDF crawler can easily follow all the seeAlso links found on every document and continue to crawl, so it is possible to crawl an entire wiki site starting from a single URI.
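
The crawling idea described on this slide can be sketched as a breadth-first traversal. This is a minimal illustration, not the crawler used by the authors: the `SEE_ALSO` dictionary is a hypothetical stand-in for the Web (a real crawler would fetch each RDF document and extract its rdfs:seeAlso targets), and all `wiki.example.org` URIs are invented.

```python
from collections import deque

# Hypothetical stand-in for the Web: each document URI maps to the
# rdfs:seeAlso targets found in its RDF export. A real crawler would
# fetch and parse each document instead of reading this dictionary.
SEE_ALSO = {
    "http://wiki.example.org/Main_Page": [
        "http://wiki.example.org/Page_A",
        "http://wiki.example.org/Page_B",
    ],
    "http://wiki.example.org/Page_A": ["http://wiki.example.org/Page_B"],
    "http://wiki.example.org/Page_B": ["http://wiki.example.org/Main_Page"],
}


def crawl(start_uri, see_also):
    """Breadth-first traversal over seeAlso links, visiting each URI once."""
    visited = []
    seen = {start_uri}
    queue = deque([start_uri])
    while queue:
        uri = queue.popleft()
        visited.append(uri)
        for target in see_also.get(uri, []):
            if target not in seen:  # skip documents already queued or visited
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("http://wiki.example.org/Main_Page", SEE_ALSO))
```

The `seen` set is what keeps the crawl finite when pages link back to each other, which is the normal case on a wiki.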

Page 19: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

The DokuSIOC plugin

A plugin for DokuWiki that exports RDF data using popular lightweight ontologies (originally developed by M. Haschke, a SIOC contributor).

We modified and extended this plug-in in order to be compliant with our proposed model and to export all the needed wiki features.

It takes information from the metadata stored in the wiki system about pages, users, links, etc. and provides it as raw RDF/XML serialized data (instead of the usual HTML page).

Developed in PHP and easy to install in every DokuWiki system.

It uses the SIOC PHP API.

Page 20: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

The DokuSIOC plugin

Page 21: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Collecting Data

To evaluate our proposal, we exported and crawled different MediaWiki and DokuWiki instances: 5 wikis have been crawled, collecting more than 1GB of RDF data.

More than 3000 wiki articles and 700 users.

RDF data loaded in a triple-store: Sesame + OWLIM

Using the SPARQL endpoint it is possible to run advanced and cross-site queries on top of the data collected by combining FOAF and SIOC, e.g.:

SELECT DISTINCT ?content
WHERE {
  <http://example.org/js#me> foaf:account ?account .
  ?account rdf:type sioc:UserAccount .
  ?content sioc:has_creator ?account .
}
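
To make the join in this query concrete, here is a minimal sketch that evaluates the same three triple patterns over an in-memory list of triples. It is not how the triple-store works internally, and every URI except `http://example.org/js#me` is an invented example: one person holds an account on two hypothetical wikis, and the function returns the content created through either account.

```python
# Hypothetical data: one person with accounts on two different wikis.
ME = "http://example.org/js#me"
TRIPLES = [
    (ME, "foaf:account", "http://wiki-a.example.org/user/js"),
    (ME, "foaf:account", "http://wiki-b.example.org/user/jsmith"),
    ("http://wiki-a.example.org/user/js", "rdf:type", "sioc:UserAccount"),
    ("http://wiki-b.example.org/user/jsmith", "rdf:type", "sioc:UserAccount"),
    ("http://wiki-a.example.org/Page_1", "sioc:has_creator",
     "http://wiki-a.example.org/user/js"),
    ("http://wiki-b.example.org/Page_2", "sioc:has_creator",
     "http://wiki-b.example.org/user/jsmith"),
    ("http://wiki-b.example.org/Page_3", "sioc:has_creator",
     "http://wiki-b.example.org/user/other"),
]


def content_created_by(person, triples):
    """Evaluate the three triple patterns of the query, one after another."""
    # <person> foaf:account ?account
    accounts = {o for s, p, o in triples if s == person and p == "foaf:account"}
    # ?account rdf:type sioc:UserAccount
    user_accounts = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "sioc:UserAccount" and s in accounts
    }
    # ?content sioc:has_creator ?account  (the set gives DISTINCT results)
    return sorted(
        {s for s, p, o in triples if p == "sioc:has_creator" and o in user_accounts}
    )


print(content_created_by(ME, TRIPLES))
```

Because the accounts live on different sites but hang off the same foaf:Person, a single query spans both wikis.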

Page 23: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Building the application

The data acquisition module is a PHP script that:

• queries the triple-store

• collects and parses the results

• translates the data into the correct format (JSON) for the visualization layer

The visualization layer has been built with the Exhibit framework by the MIT SIMILE Project

It is a set of JavaScript files configurable directly in the HTML code of the page to display

It allows for faceted browsing capabilities
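
The translation step of the data acquisition module can be sketched as follows. The original module is a PHP script whose code is not shown in the slides; this Python version is a substitute written for illustration. It assumes the endpoint returns the standard SPARQL JSON results layout (`results` / `bindings`) and that the visualization layer accepts a JSON object with an `items` array whose entries carry `label` and `type` fields. The function name, the sample data and the `Article` type are hypothetical.

```python
import json


def bindings_to_exhibit(sparql_json, label_var, item_type):
    """Flatten SPARQL JSON bindings into a list of items for faceted browsing.

    label_var names the query variable used as each item's label;
    item_type is an arbitrary type name chosen by the caller.
    """
    items = []
    for binding in sparql_json["results"]["bindings"]:
        # Keep only the plain value of each bound variable.
        item = {var: cell["value"] for var, cell in binding.items()}
        item["label"] = item[label_var]
        item["type"] = item_type
        items.append(item)
    return {"items": items}


# Hypothetical result set with one row.
sample = {
    "head": {"vars": ["wiki", "title", "category"]},
    "results": {
        "bindings": [
            {
                "wiki": {"type": "uri", "value": "http://wiki.example.org/"},
                "title": {"type": "literal", "value": "The Clash"},
                "category": {"type": "uri", "value": "http://wiki.example.org/punk_rock"},
            }
        ]
    },
}

print(json.dumps(bindings_to_exhibit(sample, "title", "Article"), indent=2))
```

Each query variable becomes a property of the item, so variables such as `wiki` and `category` are the natural candidates for facets.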

Page 24: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Page 25: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

The underlying queries

The first part shows co-authors of the requested user and their articles in common.

SELECT DISTINCT ?wiki ?title ?coauthor
WHERE {
  ?pag1 dc:contributor ?me .
  FILTER regex(?me, "UserX", "i") .
  ?pag1 dc:title ?title ;
        sioc:has_container ?wiki .
  ?pag2 dc:title ?title2 .
  FILTER regex(str(?title), str(?title2)) .
  ?pag2 dc:contributor ?coauthor .
  FILTER ((?coauthor) != (?me)) .
}

The second part shows all the articles, and the related categories, contributed by the requested user on different wikis.

SELECT DISTINCT ?wiki ?title ?category
WHERE {
  ?pag1 dc:contributor ?me .
  FILTER regex(?me, "UserX", "i") .
  ?pag1 dc:title ?title ;
        sioc:has_container ?wiki ;
        sioc:topic ?category .
}

Page 26: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Conclusions and Future Work

Presented how the SIOC ontology and lightweight semantics can be used and extended to represent the structure of wikis in a unified way;

Demonstrated an overall benefit of applying SemWeb technologies to wikis:

– enabling end-users to access the information generated in a simple and transparent way,

– showing potentialities that cannot be obtained using the traditional Web 2.0 instruments;

The presented work goes in the direction of creating a collective knowledge system on the Web following the best Linked Data principles.

Future work:

– To provide more details about the content of wiki articles

– To add to the system architecture a real-time search functionality

– To standardise and spread plugins and exporters

Page 27: Semantic Search on Heterogeneous Wiki Systems - wikisym2010

Thank you!

Any questions?