Post on 21-Dec-2015


Research Problems in Semantic Web Search

Varish Mulwad


Agenda

• Introduction

• Swoogle

• Swoogle’s Competition –
  • Sindice
  • Semantic Web Search Engine (SWSE)
  • Watson
  • Falcon

• Research Problems and Issues with Swoogle

• References


Introduction

[Figure: diagram with the labels “Web”, “Dr. Finin’s FOAF Profile” and “Your Agent”]

This is possible because the data is in a machine-understandable form such as RDF or OWL.

But how will the agent find all this data? Search engines?


Introduction


[Figure: two panels titled “Traditional Search Engine Results” and “Semantic Web Search Engine Results”]


Swoogle

• Swoogle is a crawler-based indexing and retrieval system for the Semantic Web

• Swoogle crawls and discovers documents written in RDF and OWL

• Swoogle classifies a Semantic Web Document (SWD) as –
  • Semantic Web Ontology (SWO) – Defines new terms
  • Semantic Web Database (SWDB) – Makes assertions about individuals
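
The SWO/SWDB distinction can be sketched as a simple ratio test over the typed resources in a document. This is an illustration only, not Swoogle's actual classifier: the `classify_swd` name, the tuple representation of triples, the `TERM_TYPES` set and the 0.8 threshold are all assumptions.

```python
RDF_TYPE = "rdf:type"
# Classes whose instances count as term definitions (classes and properties).
TERM_TYPES = {"rdfs:Class", "owl:Class", "rdf:Property",
              "owl:ObjectProperty", "owl:DatatypeProperty"}

def classify_swd(triples, threshold=0.8):
    """Label a document "SWO" if most typed resources define terms, else "SWDB"."""
    typed = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    if not typed:
        return "SWDB"  # hypothetical default: no definitions means plain data
    defined = sum(1 for _, o in typed if o in TERM_TYPES)
    return "SWO" if defined / len(typed) >= threshold else "SWDB"

# Hypothetical sample documents, written as (subject, predicate, object) tuples.
ontology = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
]
instance_data = [
    ("ex:me", "rdf:type", "ex:Person"),
    ("ex:me", "foaf:name", "Dr. Finin"),
]
print(classify_swd(ontology))       # SWO
print(classify_swd(instance_data))  # SWDB
```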


Swoogle

SWOOGLE DEMO


Swoogle Architecture

Swoogle Architecture

SWD Discovery Component

• Google crawler using the Google web service
  • File types with extensions “.rdf”, “.owl”, “.n3”
  • Google returns at most 1000 results per query

• A focused crawler
  • Crawls documents within a given website
  • Extension and focus constraints

• A Swoogle crawler
  • Jena-based crawler
  • Explores semantic links between SWDs
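
The extension constraint can be illustrated with a small URL filter. This is only a sketch: `looks_like_swd` and the example URLs are hypothetical, and the extension list is the one given on the slide.

```python
from urllib.parse import urlparse

SWD_EXTENSIONS = (".rdf", ".owl", ".n3")  # extensions listed on the slide

def looks_like_swd(url):
    """Return True if the URL path ends with a known Semantic Web file extension."""
    path = urlparse(url).path.lower()  # ignore the query string and letter case
    return path.endswith(SWD_EXTENSIONS)

# Hypothetical candidate URLs, e.g. as returned by a web search.
candidates = [
    "http://example.org/people/foaf.rdf",
    "http://example.org/index.html",
    "http://example.org/onto/wine.OWL?version=2",
]
swd_urls = [u for u in candidates if looks_like_swd(u)]
print(swd_urls)
```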


Swoogle Architecture

Metadata Creation

• Basic Metadata
  • Encoding – “RDF/XML”, “N-Triple”, “N3”
  • Language – RDF, RDFS, OWL, DAML+OIL
  • OWL Species – OWL-LITE, OWL-DL, OWL-FULL

• Relations among SWDs
  • Reference relationships among SWDs
  • Inter-ontology relationships


Swoogle Architecture

Data analysis component
• Classification of SWD as SWO or SWDB
• Compute rank of SWD
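
The slide does not say how the rank is computed. As a rough illustration only, the sketch below runs a plain PageRank-style power iteration over reference links between SWDs. This is a stand-in technique, not Swoogle's ranking algorithm; the `rank_swds` name, the damping factor and the toy link graph are assumptions.

```python
def rank_swds(links, damping=0.85, iterations=50):
    """PageRank-style scores for a dict mapping each document to the documents it references."""
    docs = set(links) | {d for targets in links.values() for d in targets}
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for doc in docs:
            targets = links.get(doc, [])
            if targets:
                # Split this document's rank evenly among the documents it references.
                share = damping * rank[doc] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling document: spread its rank evenly over all documents.
                for d in docs:
                    new_rank[d] += damping * rank[doc] / n
        rank = new_rank
    return rank

# Hypothetical reference graph: two data documents point at one ontology.
links = {
    "foaf-profile.rdf": ["foaf-ontology.owl"],
    "publications.rdf": ["foaf-ontology.owl", "foaf-profile.rdf"],
    "foaf-ontology.owl": [],
}
scores = rank_swds(links)
best = max(scores, key=scores.get)
print(best)  # foaf-ontology.owl
```

The ontology ends up with the highest score because both data documents reference it, which matches the intuition that widely used ontologies should rank above the documents that merely use them.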

Web based interface
• Human User Interface – http://swoogle.umbc.edu
• Web Services using REST interface
• Agent Service


Sindice

• Created at the Digital Enterprise Research Institute (DERI)

• Key features of Sindice include –
  • Sindice collects SWDs and indexes them on resource URIs, Inverse Functional Properties (IFPs) and keywords
  • Sindice uses the Hadoop parallel architecture


Sindice

Inverse Functional Property (IFP) – An OWL property whose value uniquely identifies its subject (in effect, a cardinality restriction on the inverse of the property)

Sindice uses three indexes –

• URI index
• IFP index
• Keyword index

Benefit – Faster retrieval of data
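
The three indexes can be pictured as three inverted maps from a lookup key to the documents that mention it. The following is a minimal in-memory sketch, not Sindice's implementation; the `TripleIndex` class, the choice of `foaf:mbox` as the only IFP and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

IFPS = {"foaf:mbox"}  # assumed set of inverse functional properties

class TripleIndex:
    """Toy index over (subject, predicate, object) string triples."""

    def __init__(self):
        self.by_uri = defaultdict(set)      # resource URI -> document URLs
        self.by_ifp = defaultdict(set)      # (IFP, value) -> document URLs
        self.by_keyword = defaultdict(set)  # lowercase word -> document URLs

    def add_document(self, doc_url, triples):
        for s, p, o in triples:
            self.by_uri[s].add(doc_url)
            if p in IFPS:
                self.by_ifp[(p, o)].add(doc_url)
            if not o.startswith(("http://", "mailto:")):
                # Treat non-URI objects as literals and index their words.
                for word in o.lower().split():
                    self.by_keyword[word].add(doc_url)

index = TripleIndex()
index.add_document("http://example.org/a.rdf", [
    ("http://example.org/a#me", "foaf:name", "Dr. Finin"),
    ("http://example.org/a#me", "foaf:mbox", "mailto:finin@example.org"),
])
index.add_document("http://example.org/b.rdf", [
    ("http://example.org/b#author", "foaf:mbox", "mailto:finin@example.org"),
])

same_person_docs = index.by_ifp[("foaf:mbox", "mailto:finin@example.org")]
print(sorted(same_person_docs))
print(sorted(index.by_keyword["finin"]))
```

Because both documents share the same `foaf:mbox` value, the IFP lookup returns both of them even though they use different URIs for the person.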


Sindice

Hadoop architecture is used in the following manner –

• Sindice employs Hadoop/Nutch to distribute the crawling job across multiple machines

• Collected data is stored in HBase, a distributed column-based store

• Large datasets are handled efficiently across the cluster using a MapReduce implementation
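
How a MapReduce job might build a URI index from crawled documents can be sketched in plain Python. This is a single-process imitation of the map, shuffle and reduce phases, not Hadoop code; the function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_url, triples):
    """Emit a (resource URI, document URL) pair for every triple subject."""
    return [(s, doc_url) for s, _, _ in triples]

def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all document URLs for one resource into a sorted, de-duplicated list."""
    return key, sorted(set(values))

# Hypothetical crawl output: document URL -> triples found in it.
crawled = {
    "http://example.org/a.rdf": [("ex:me", "foaf:name", "Dr. Finin"),
                                 ("ex:me", "foaf:knows", "ex:colleague")],
    "http://example.org/b.rdf": [("ex:me", "foaf:mbox", "mailto:finin@example.org")],
}

mapped = chain.from_iterable(map_phase(url, t) for url, t in crawled.items())
uri_index = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(uri_index)
```

In a real cluster the map calls would run on different machines and the shuffle would move data between them; here everything runs in one process so the data flow is easy to follow.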


Sindice

SINDICE DEMO


SWSE

• Semantic Web Search Engine (SWSE) is another Semantic Web search engine created at the Digital Enterprise Research Institute (DERI)

• SWSE uses a “Multicrawler” – a pipelined architecture for crawling


Watson

• Created at the Knowledge Media Institute (KMi) at the UK Open University

• Major Design Principles –
  • Considers explicit and implicit relations between ontologies
  • Ranks ontologies with a focus on quality over popularity


Watson

WATSON DEMO


Falcon

• Falcon is a Semantic Web search engine created at the Institute of Web Science in China

• Falcon allows keyword-based queries on:
  • Objects
  • Concepts
  • Documents

• Falcon performs class subsumption reasoning
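
Class subsumption reasoning means that a query for a class also matches instances of its subclasses. A minimal sketch follows, assuming a hand-written `rdfs:subClassOf` table; the class names, instance data and function names are hypothetical, and this is not Falcon's implementation.

```python
SUBCLASS_OF = {            # child class -> direct parent classes
    "ex:Professor": ["ex:Faculty"],
    "ex:Faculty": ["ex:Person"],
    "ex:Student": ["ex:Person"],
}
INSTANCE_TYPES = {         # object -> asserted class
    "ex:bob": "ex:Professor",
    "ex:alice": "ex:Student",
    "ex:swoogle": "ex:SearchEngine",
}

def superclasses(cls):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen

def instances_of(cls):
    """Objects whose asserted class is cls or any subclass of cls."""
    return sorted(obj for obj, t in INSTANCE_TYPES.items() if cls in superclasses(t))

print(instances_of("ex:Person"))   # ['ex:alice', 'ex:bob']
print(instances_of("ex:Faculty"))  # ['ex:bob']
```

Neither `ex:alice` nor `ex:bob` is asserted to be an `ex:Person` directly; both are found only because the subclass chain is followed.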


Falcon

FALCON DEMO


Summary

Swoogle
• Keyword based search
• Searches Ontologies and Instance Data

Others

Sindice
• Indexes on URI, IFP, keywords
• Use of Hadoop Architecture

SWSE
• Pipelined Architecture for Crawling

Watson
• Implicit relations between SWDs

Falcon
• Class Subsumption Reasoning


Issues

Crawling
• Swoogle’s crawler runs as a single thread on one machine
• This limits the number of SWDs discovered and revisited

Possible Solutions
• Use of Hadoop Architecture
• Use of Grub
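
The effect of moving from a single thread to parallel workers can be illustrated on one machine with a thread pool. This is a stand-in for the distributed options listed above, not Hadoop or Grub; `FAKE_WEB`, `fetch_links` and `crawl` are hypothetical, and the in-memory link table replaces real HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor

FAKE_WEB = {  # URL -> URLs of SWDs it links to (stands in for real HTTP fetches)
    "http://example.org/seed.rdf": ["http://example.org/a.rdf", "http://example.org/b.owl"],
    "http://example.org/a.rdf": ["http://example.org/b.owl", "http://example.org/c.n3"],
    "http://example.org/b.owl": [],
    "http://example.org/c.n3": ["http://example.org/seed.rdf"],
}

def fetch_links(url):
    """Pretend to download a document and return its outgoing links."""
    return FAKE_WEB.get(url, [])

def crawl(seed, workers=4):
    """Breadth-first crawl that fetches each frontier level in parallel."""
    seen, frontier = {seed}, [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # pool.map fetches the whole frontier concurrently, preserving order.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen

discovered = crawl("http://example.org/seed.rdf")
print(len(discovered))  # 4
```

The set of visited URLs is only updated in the main thread, so the workers need no locking; a distributed crawler would have to share that state across machines instead.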


Other Issues

Crawling large structured datasets like DBpedia

More reasoning

More services


References

• Li Ding et al., “Swoogle: A Search and Metadata Engine for the Semantic Web”, Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.

• P. Mika, G. Tummarello, “Web Semantics in the Clouds”, IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.

• E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, “Sindice.com: A document-oriented lookup index for open linked data”, International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.

• Mathieu d’Aquin et al., “Watson: A Gateway for the Semantic Web”, Poster session of the European Semantic Web Conference, ESWC 2007.

• Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, “Searching Semantic Web Objects Based on Class Hierarchies”, WWW 2008 Workshop on Linked Data on the Web, 2008.


Questions?
