Post on 21-Dec-2015
Research Problems in Semantic Web Search
Varish Mulwad
____________________________
1
Agenda
• Introduction
• Swoogle
• Swoogle’s Competition –
  • Sindice
  • Semantic Web Search Engine (SWSE)
  • Watson
  • Falcon
• Research Problems and Issues with Swoogle
• References
____________________________
2
Introduction
____________________________
3
Web
Dr. Finin’s FOAF Profile
Your Agent
Possible because the data is in a machine-understandable form such as RDF or OWL
But how will the agent find all this data? Search engines?
Introduction
4
____________________________
Figure: Traditional Search Engine Results vs. Semantic Web Search Engine Results
Swoogle
• Swoogle is a crawler-based indexing and retrieval system for the Semantic Web
• Swoogle crawls and discovers documents written in RDF and OWL
• Swoogle classifies a Semantic Web Document (SWD) as –
  • Semantic Web Ontology (SWO) – defines new terms
  • Semantic Web Database (SWDB) – makes assertions about individuals
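The SWO/SWDB distinction can be illustrated with a small sketch. This is a hypothetical heuristic, not Swoogle's actual classifier: it assumes a document is a list of (subject, predicate, object) triples with prefixed names, and it calls a document an ontology when most of its typed resources are class or property definitions. The 0.5 threshold and the `DEFINING_TYPES` set are assumptions made for illustration.

```python
# Hypothetical heuristic for illustration; not Swoogle's actual classifier.
RDF_TYPE = "rdf:type"

# Using one of these as the object of rdf:type signals that a new term is defined.
DEFINING_TYPES = {
    "owl:Class", "rdfs:Class", "rdf:Property",
    "owl:ObjectProperty", "owl:DatatypeProperty",
}


def classify_swd(triples, threshold=0.5):
    """Return 'SWO' if most typed resources are term definitions, else 'SWDB'."""
    typed = [(s, o) for s, p, o in triples if p == RDF_TYPE]
    if not typed:
        return "SWDB"
    defined = sum(1 for _, o in typed if o in DEFINING_TYPES)
    return "SWO" if defined / len(typed) > threshold else "SWDB"


ontology = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:knows", "rdf:type", "owl:ObjectProperty"),
]
instance_data = [
    ("ex:finin", "rdf:type", "ex:Person"),
    ("ex:finin", "ex:name", "Dr. Finin"),
]

print(classify_swd(ontology))       # SWO
print(classify_swd(instance_data))  # SWDB
```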
____________________________
5
Swoogle
SWOOGLE DEMO
____________________________
6
Swoogle Architecture
____________________________
7
Swoogle Architecture
SWD Discovery Component
• Google crawler using the Google web service
  • File types with extensions “.rdf”, “.owl”, “.n3”
  • Google returns at most 1000 results per query
• A focused crawler
  • Crawls documents within a given website
  • Extension and focus constraints
• A Swoogle crawler
  • Jena-based crawler
  • Explores semantic links between SWDs
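The extension and focus constraints of the focused crawler might look roughly like the sketch below. It is an illustration under assumptions: the function name `passes_constraints` and the example URLs are hypothetical, and the focus constraint is reduced to "same host as the website being crawled".

```python
from urllib.parse import urlparse

# File extensions named on the slide for the Google-based discovery step.
SWD_EXTENSIONS = (".rdf", ".owl", ".n3")


def passes_constraints(url, site_host):
    """Extension constraint: known SWD file type. Focus constraint: same website."""
    parsed = urlparse(url)
    same_site = parsed.netloc.lower() == site_host.lower()
    known_type = parsed.path.lower().endswith(SWD_EXTENSIONS)
    return same_site and known_type


candidates = [
    "http://example.org/people/foaf.rdf",
    "http://example.org/index.html",
    "http://other.example.com/onto.owl",
]
kept = [u for u in candidates if passes_constraints(u, "example.org")]
print(kept)  # ['http://example.org/people/foaf.rdf']
```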
____________________________
8
Swoogle Architecture
Metadata Creation
• Basic metadata
  • Encoding – “RDF/XML”, “N-Triples”, “N3”
  • Language – RDF, RDFS, OWL, DAML+OIL
  • OWL species – OWL Lite, OWL DL, OWL Full
• Relations among SWDs
  • Reference relationships among SWDs
  • Inter-ontology relationships
____________________________
9
Swoogle Architecture
Data analysis component
• Classification of SWD as SWO or SWDB
• Compute rank of SWD
Web-based interface
• Human user interface – http://swoogle.umbc.edu
• Web services using a REST interface
• Agent service
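To give a feel for what "compute rank of SWD" could involve, here is a plain PageRank-style power iteration over reference links between documents. This is a stand-in chosen for illustration, not Swoogle's own ranking algorithm; the damping factor, iteration count and document names are all assumptions.

```python
def rank_documents(links, damping=0.85, iterations=50):
    """PageRank-style ranking; links maps each document to the documents it references."""
    docs = set(links) | {t for targets in links.values() for t in targets}
    rank = {d: 1.0 / len(docs) for d in docs}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / len(docs) for d in docs}
        for d in docs:
            targets = links.get(d, [])
            if targets:
                # Split this document's rank evenly among the documents it references.
                share = damping * rank[d] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling document: spread its rank evenly over all documents.
                for t in docs:
                    new_rank[t] += damping * rank[d] / len(docs)
        rank = new_rank
    return rank


# Two instance documents reference the same ontology.
links = {"a.rdf": ["foaf.owl"], "b.rdf": ["foaf.owl"], "foaf.owl": []}
rank = rank_documents(links)
best = max(rank, key=rank.get)
print(best)  # foaf.owl
```

The widely referenced ontology ends up with the highest score, which is the intuition behind ranking documents by how they are linked.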
____________________________
10
Sindice
• Created at the Digital Enterprise Research Institute (DERI)
• Key features of Sindice include –
  • Sindice collects SWDs and indexes them on resource URIs, Inverse Functional Properties (IFPs) and keywords
  • Sindice uses the Hadoop parallel architecture
____________________________
11
Sindice
Inverse Functional Property (IFP) – an OWL property whose value uniquely identifies its subject, e.g. an e-mail address
Sindice uses three indexes –
• URI index
• IFP index
• Keyword index
Benefit – faster retrieval of data
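The three indexes can be pictured as three inverted maps from a lookup key to the documents that mention it. The sketch below is a toy in-memory version, not Sindice's implementation: the `TripleIndex` class, the choice of `foaf:mbox` as the only IFP, and the rule that objects starting with `http://` are URIs are all assumptions.

```python
from collections import defaultdict

# Assumed set of inverse functional properties for this example.
IFPS = {"foaf:mbox"}


class TripleIndex:
    """Toy version of the three lookup indexes: URI, IFP and keyword."""

    def __init__(self):
        self.uri_index = defaultdict(set)      # resource URI -> documents
        self.ifp_index = defaultdict(set)      # (property, value) -> documents
        self.keyword_index = defaultdict(set)  # lowercased word -> documents

    def add(self, doc, triples):
        for s, p, o in triples:
            self.uri_index[s].add(doc)
            if p in IFPS:
                self.ifp_index[(p, o)].add(doc)
            elif o.startswith("http://"):
                self.uri_index[o].add(doc)
            else:
                # Anything else is treated as a literal and split into keywords.
                for word in o.lower().split():
                    self.keyword_index[word].add(doc)


index = TripleIndex()
index.add("doc1", [
    ("http://example.org/a", "foaf:mbox", "mailto:a@example.org"),
    ("http://example.org/a", "foaf:name", "Alice Example"),
])
index.add("doc2", [
    ("http://example.org/b", "foaf:mbox", "mailto:a@example.org"),
])

# Both documents describe a resource with the same mailbox, so an IFP lookup finds both.
same_person_docs = sorted(index.ifp_index[("foaf:mbox", "mailto:a@example.org")])
print(same_person_docs)  # ['doc1', 'doc2']
```

Each lookup is a single dictionary access, which is where the faster retrieval comes from.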
____________________________
12
Sindice
Hadoop architecture is used in the following manner –
• Sindice employs Hadoop/Nutch to distribute the crawling job across multiple machines
• Collected data is stored in HBase, a distributed column-based store
• Large datasets are handled efficiently across the cluster using a MapReduce implementation
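The MapReduce idea can be simulated on one machine to show the shape of such a job. This is a sketch in plain Python, not Hadoop code: `map_phase`, `reduce_phase` and `run_job` are hypothetical names, and the in-memory dictionary stands in for the shuffle step that the framework performs across the cluster.

```python
from collections import defaultdict


def map_phase(doc_id, triples):
    """Emit (resource URI, document) pairs for one crawled document."""
    for subject, _predicate, _obj in triples:
        yield subject, doc_id


def reduce_phase(uri, doc_ids):
    """Collapse all documents that mention a URI into one sorted index entry."""
    return uri, sorted(set(doc_ids))


def run_job(corpus):
    grouped = defaultdict(list)  # stands in for the framework's shuffle step
    for doc_id, triples in corpus.items():
        for key, value in map_phase(doc_id, triples):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


corpus = {
    "doc1": [("ex:a", "ex:knows", "ex:b")],
    "doc2": [("ex:a", "ex:name", "A"), ("ex:b", "ex:name", "B")],
}
uri_index = run_job(corpus)
print(uri_index)  # {'ex:a': ['doc1', 'doc2'], 'ex:b': ['doc2']}
```

Because each map call looks at one document and each reduce call looks at one key, both phases can be spread over many machines.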
____________________________
13
Sindice
SINDICE DEMO
____________________________
14
SWSE
• SWSE (Semantic Web Search Engine) was also created at the Digital Enterprise Research Institute (DERI)
• SWSE uses a “Multicrawler” – a pipelined architecture for crawling
____________________________
15
Watson
• Created at the Knowledge Media Institute (KMi) at the Open University, UK
• Major design principles –
  • Considers explicit and implicit relations between ontologies
  • Ranking of ontologies with a focus on quality over popularity
____________________________
16
Watson
WATSON DEMO
____________________________
17
Falcon
• Falcon is a Semantic Web search engine created at the Institute of Web Science in China
• Falcon allows keyword-based queries on:
  • Objects
  • Concepts
  • Documents
• Falcon performs class subsumption reasoning
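Class subsumption reasoning means that a query for a class also matches objects typed with any of its subclasses. A minimal sketch, assuming a hand-written class hierarchy and one type per object (the names `subclasses_of`, `objects_of_class` and the example data are hypothetical, and this is not Falcon's implementation):

```python
def subclasses_of(cls, subclass_of):
    """Return cls plus every class that is transitively a subclass of it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for child, parents in subclass_of.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def objects_of_class(cls, subclass_of, instance_types):
    """Find objects whose declared type is cls or any subclass of cls."""
    wanted = subclasses_of(cls, subclass_of)
    return sorted(obj for obj, t in instance_types.items() if t in wanted)


# child class -> set of direct parent classes
subclass_of = {"Student": {"Person"}, "PhDStudent": {"Student"}, "Robot": {"Agent"}}
instance_types = {"alice": "PhDStudent", "bob": "Person", "r2": "Robot"}

people = objects_of_class("Person", subclass_of, instance_types)
print(people)  # ['alice', 'bob']
```

Without the subsumption step, a search restricted to Person would miss alice, who is only declared as a PhDStudent.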
____________________________
18
Falcon
FALCON DEMO
____________________________
19
Summary
Swoogle
• Keyword-based search
• Searches ontologies and instance data
Others
Sindice
• Indexes on URI, IFP, keywords
• Use of Hadoop architecture
SWSE
• Pipelined architecture for crawling
Watson
• Implicit relations between SWDs
Falcon
• Class subsumption reasoning
____________________________
20
Issues
Crawling
• Swoogle’s crawler runs as a single thread on one machine
• This limits the number of SWDs discovered and revisited
Possible Solutions
• Use of Hadoop architecture
• Use of Grub
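The effect of moving away from a single crawling thread can be sketched with a thread pool on one machine. Note the swap: threads here stand in for the distribution across machines that Hadoop or Grub would provide. The `fetch` function is a placeholder that returns links from a small made-up web instead of downloading anything.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Placeholder for a real download and parse step; returns discovered links."""
    fake_web = {
        "http://example.org/a.rdf": ["http://example.org/b.rdf", "http://example.org/c.owl"],
        "http://example.org/b.rdf": ["http://example.org/c.owl"],
        "http://example.org/c.owl": [],
    }
    return fake_web.get(url, [])


def crawl(seeds, workers=4):
    """Breadth-first crawl that fetches each frontier level concurrently."""
    seen = set(seeds)
    frontier = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch the whole frontier in parallel, then collect the new links.
            results = list(pool.map(fetch, frontier))
            frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return seen


discovered = crawl(["http://example.org/a.rdf"])
print(len(discovered))  # 3
```

The `seen` set is only updated in the main thread, so the workers never touch shared state and no locking is needed in this sketch.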
____________________________
21
Other Issues
Crawling large structured datasets like DBpedia
More reasoning
More services
____________________________
22
References
• Li Ding et al., “Swoogle: A Search and Metadata Engine for the Semantic Web”, Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.
• P. Mika, G. Tummarello, “Web Semantics in the Clouds”, IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.
• E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, “Sindice.com: A Document-Oriented Lookup Index for Open Linked Data”, International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
• Mathieu d’Aquin et al., “Watson: A Gateway for the Semantic Web”, Poster session of the European Semantic Web Conference (ESWC), 2007.
• Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, “Searching Semantic Web Objects Based on Class Hierarchies”, WWW 2008 Workshop on Linked Data on the Web, 2008.
____________________________
23
Questions?
____________________________
24