search engines for semantic web knowledge

Search Engines for Semantic Web Knowledge
Tim Finin, University of Maryland, Baltimore County (UMBC, an Honors University in Maryland)
Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.


Post on 31-Dec-2015


Page 1: Search Engines  for Semantic Web Knowledge

UMBC, an Honors University in Maryland

Search Engines for Semantic Web Knowledge

Tim Finin
University of Maryland, Baltimore County

Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi

http://creativecommons.org/licenses/by-nc-sa/2.0/

This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.

Page 2: Search Engines  for Semantic Web Knowledge

This talk

• Motivation
• Semantic web 101
• Swoogle Semantic Web search engine
• Use cases and applications
• State of the Semantic Web
• Conclusions

Page 3: Search Engines  for Semantic Web Knowledge

Google has made us smarter

Page 4: Search Engines  for Semantic Web Knowledge

But what about our agents?

[Figure: agents, with message labels "tell" and "register"]

Agents still have a very minimal understanding of text and images.

Page 6: Search Engines  for Semantic Web Knowledge

XML helps

“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.”

-- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.

Page 7: Search Engines  for Semantic Web Knowledge

Semantic Web adds semantics

“The Semantic Web will globalize KR, just as the WWW globalized hypertext”

-- Tim Berners-Lee

Page 8: Search Engines  for Semantic Web Knowledge

Semantic Web 101

    <?xml version="1.0" encoding="utf-8"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
      <uni:Student>
        <foaf:name>Li Ding</foaf:name>
        <foaf:mbox rdf:resource="mailto:[email protected]"/>
      </uni:Student>
    </rdf:RDF>

• RDF/XML
• rdf:RDF tag
• namespaces ontologies
• Semantic graph, URIs as nodes & links
• triples

[Figure: semantic graph of the example; the student node has a foaf:name edge to "Li Ding" and an rdf:type edge to uni:Student]
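To make the step from RDF/XML to triples concrete, here is a minimal sketch in Python using only the standard library. It is not a general RDF parser: it assumes the document contains only typed node elements whose children are simple property elements, as in the slide's example. The mailbox address is a hypothetical placeholder because the original address is obscured in this transcript, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's example; the mailbox below is a placeholder, not the real address.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/">
  <uni:Student>
    <foaf:name>Li Ding</foaf:name>
    <foaf:mbox rdf:resource="mailto:someone@example.org"/>
  </uni:Student>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' form into the URI namespace + local."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def triples(rdf_xml):
    """Extract (subject, predicate, object) triples from simple typed-node RDF/XML."""
    root = ET.fromstring(rdf_xml.encode("utf-8"))
    result = []
    for i, node in enumerate(root):
        # Nodes without rdf:about are blank nodes; give each a local label.
        subject = node.get(f"{{{RDF_NS}}}about", f"_:b{i}")
        # A typed node element <uni:Student> implies an rdf:type triple.
        result.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "")
            result.append((subject, uri(prop.tag), obj))
    return result


for triple in triples(RDF_XML):
    print(triple)
```

The example yields three triples for one blank node: its rdf:type, its foaf:name and its foaf:mbox. Each predicate is a full URI formed from the namespace declared on rdf:RDF plus the local name.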

Page 9: Search Engines  for Semantic Web Knowledge

But what about our agents?

A Google for knowledge on the Semantic Web is needed by software agents and programs

[Figure: agents surrounded by Swoogle labels, with message labels "tell" and "register"]

Page 11: Search Engines  for Semantic Web Knowledge

• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.4M RDF documents, 250M RDF triples, 10K ontologies

Page 12: Search Engines  for Semantic Web Knowledge

Swoogle Architecture

[Figure: Swoogle architecture diagram. Layer labels: Discovery, Analysis, Index, Search Services. Component labels: Bounded Web Crawler, Google Crawler, SwoogleBot, Candidate URLs, SWD classifier, Ranking, document cache, IR Indexer, SWD Indexer, Semantic Web metadata, Web Server, Web Service. Inputs: html from the Web and rdf/xml from the Semantic Web. Consumers: human and machine. Legend: information flow, Swoogle's web interface.]

Page 13: Search Engines  for Semantic Web Knowledge

A Hybrid Harvesting Framework

[Figure: harvesting framework diagram. Labels: Manual submission, RDF crawling, Bounded HTML crawling, Meta crawling, Seeds M, Seeds H, Seeds R, Swoogle Sample Dataset, Inductive learner, the Web, Google API call, crawl.]

Page 15: Search Engines  for Semantic Web Knowledge

Applications and use cases

• Supporting Semantic Web developers
  – Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
• Searching specialized collections
  – Spire: aggregating observations and data from biologists
  – InferenceWeb: searching over and enhancing proofs
  – SemNews: Text Meaning of news stories
• Supporting SW tools
  – Triple shop: finding data for SPARQL queries

Page 17: Search Engines  for Semantic Web Knowledge

By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.

80 ontologies were found that had these three terms

Let’s look at this one

Page 18: Search Engines  for Semantic Web Knowledge

Basic Metadata

hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00

Page 21: Search Engines  for Semantic Web Knowledge

These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.

All of this is available in RDF form for the agents among us.

Page 22: Search Engines  for Semantic Web Knowledge

Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.

Page 23: Search Engines  for Semantic Web Knowledge

We can also search for terms (classes, properties) like terms for “person”.

Page 24: Search Engines  for Semantic Web Knowledge

10K terms associated with “person”! Ordered by use.

Let’s look at foaf:Person’s metadata

Page 31: Search Engines  for Semantic Web Knowledge

UMBC Triple Shop

• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing based on HP’s Jena and Joseki with several interesting features
• Selectable level of inference over model
• Automatically finds SWDs for given queries using Swoogle backend database
  – Provide dataset creation wizard
  – Dataset can be stored on our server or downloaded
  – Tag, share and search over saved datasets

Page 32: Search Engines  for Semantic Web Knowledge

Web-scale semantic web data access

[Figure: interaction among an agent, the data access service and the Web]

• Service: index RDF data from the Web
• Agent: ask (“person”)
• Service: search vocabulary (search URIrefs in SW vocabulary), then inform (“foaf:Person”)
• Agent: compose query, then ask (“?x rdf:type foaf:Person”)
• Service: search URLs in SWD index, then inform (doc URLs)
• Agent: fetch docs, populate RDF database, query local RDF database
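The exchange in this figure can be sketched as a small Python program. This is only an illustration under stated assumptions: `DataAccessService`, `find_instances`, the `WEB` dictionary and the example.org documents are all hypothetical stand-ins, the indexes are in-memory dictionaries, and terms are written as prefixed names rather than full URIs. It does not reflect Swoogle's actual API.

```python
RDF_TYPE = "rdf:type"

# Hypothetical stand-in for the Web: document URL -> triples found in that document.
WEB = {
    "http://example.org/a.rdf": [("ex:alice", RDF_TYPE, "foaf:Person"),
                                 ("ex:alice", "foaf:name", "Alice")],
    "http://example.org/b.rdf": [("ex:bob", RDF_TYPE, "foaf:Person")],
    "http://example.org/c.rdf": [("ex:doc", RDF_TYPE, "foaf:Document")],
}


class DataAccessService:
    """Toy search service that indexes vocabulary terms and the documents using them."""

    def __init__(self, web):
        self.term_index = {}  # term -> set of document URLs that use it
        for url, doc_triples in web.items():      # "Index RDF data"
            for _, p, o in doc_triples:
                # Properties are always terms; classes appear as objects of rdf:type.
                for term in ((p, o) if p == RDF_TYPE else (p,)):
                    self.term_index.setdefault(term, set()).add(url)

    def search_vocabulary(self, keyword):
        """ask("person") -> inform("foaf:Person")"""
        return sorted(t for t in self.term_index if keyword.lower() in t.lower())

    def search_documents(self, term):
        """ask("?x rdf:type foaf:Person") -> inform(doc URLs)"""
        return sorted(self.term_index.get(term, ()))


def find_instances(service, web, keyword):
    """Agent side: find a class by keyword, fetch relevant documents, query them locally."""
    terms = service.search_vocabulary(keyword)
    if not terms:
        return []
    cls = terms[0]                                   # compose query around the found term
    urls = service.search_documents(cls)
    local_db = [t for url in urls for t in web[url]]  # fetch docs, populate local database
    return sorted({s for s, p, o in local_db if p == RDF_TYPE and o == cls})


service = DataAccessService(WEB)
print(find_instances(service, WEB, "person"))
```

The point of the design is that the service only returns terms and document URLs. The agent does the fetching and runs the final query against its own local store.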

Page 33: Search Engines  for Semantic Web Knowledge

Who knows Anupam Joshi? Show me their names, email addresses and pictures

Page 34: Search Engines  for Semantic Web Knowledge

The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles

Page 35: Search Engines  for Semantic Web Knowledge

No FROM clause!

Constraints on where the data comes from

Page 36: Search Engines  for Semantic Web Knowledge

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?p2name ?p2mbox ?p2pix
    WHERE {
      ?p1 foaf:name "Anupam Joshi" .
      ?p1 foaf:mbox ?p1mbox .
      ?p2 foaf:knows ?p3 .
      ?p3 foaf:mbox ?p1mbox .
      ?p2 foaf:name ?p2name .
      ?p2 foaf:mbox ?p2mbox .
      OPTIONAL { ?p2 foaf:depiction ?p2pix } .
    }
    ORDER BY ?p2name
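The query joins ?p1 and ?p3 through a shared foaf:mbox value, not through a shared node. This matters when the data comes from many independent FOAF files, because each file usually describes the same person with its own blank node. The Python sketch below evaluates the same patterns with a naive nested-loop join over an in-memory triple list. The people, mailboxes and picture URL are hypothetical, and `match`, `solve` and `who_knows` are illustrative names rather than part of any SPARQL engine.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on a conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):              # variable: bind it or check the existing value
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                       # constant: must match exactly
            return None
    return new


def solve(patterns, triples, binding=None):
    """Yield every binding that satisfies all patterns (naive nested-loop join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(patterns[1:], triples, extended)


def who_knows(name, triples):
    required = [
        ("?p1", FOAF + "name", name),
        ("?p1", FOAF + "mbox", "?p1mbox"),
        ("?p2", FOAF + "knows", "?p3"),
        ("?p3", FOAF + "mbox", "?p1mbox"),   # same mailbox => same person
        ("?p2", FOAF + "name", "?p2name"),
        ("?p2", FOAF + "mbox", "?p2mbox"),
    ]
    optional = [("?p2", FOAF + "depiction", "?p2pix")]
    rows = set()                             # a set gives DISTINCT
    for b in solve(required, triples):
        # OPTIONAL: keep the row even when no picture is found.
        for e in list(solve(optional, triples, b)) or [b]:
            rows.add((e["?p2name"], e["?p2mbox"], e.get("?p2pix")))
    return sorted(rows, key=lambda row: row[0])   # ORDER BY ?p2name


# Hypothetical data merged from several FOAF files; blank nodes differ per file.
DATA = [
    ("_:a", FOAF + "name", "Anupam Joshi"),
    ("_:a", FOAF + "mbox", "mailto:joshi@example.org"),
    ("_:b", FOAF + "name", "Alice Example"),
    ("_:b", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b", FOAF + "knows", "_:c"),
    ("_:c", FOAF + "mbox", "mailto:joshi@example.org"),
    ("_:d", FOAF + "name", "Bob Example"),
    ("_:d", FOAF + "mbox", "mailto:bob@example.org"),
    ("_:d", FOAF + "knows", "_:e"),
    ("_:e", FOAF + "mbox", "mailto:other@example.org"),
    ("_:f", FOAF + "name", "Carol Example"),
    ("_:f", FOAF + "mbox", "mailto:carol@example.org"),
    ("_:f", FOAF + "depiction", "http://example.org/carol.jpg"),
    ("_:f", FOAF + "knows", "_:g"),
    ("_:g", FOAF + "mbox", "mailto:joshi@example.org"),
]

for row in who_knows("Anupam Joshi", DATA):
    print(row)
```

In this toy data Alice and Carol are returned and Bob is not, since the person Bob knows has a different mailbox. Alice has no foaf:depiction, so her picture column is empty but her row is still kept.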

Page 38: Search Engines  for Semantic Web Knowledge

Swoogle found 292 RDF data files that appear relevant to answering our query

Page 39: Search Engines  for Semantic Web Knowledge

Let’s save the dataset before we use it

Page 41: Search Engines  for Semantic Web Knowledge

And tag it so we and others can find it more easily.

Page 42: Search Engines  for Semantic Web Knowledge

Here we are using it to get an answer to “Who knows Anupam Joshi”

Page 43: Search Engines  for Semantic Web Knowledge

He has many friends!

Page 46: Search Engines  for Semantic Web Knowledge

Will it Scale? How?

Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling

System/date   Terms      Documents   Individuals   Triples     Bytes
Swoogle2      1.5x10^5   3.5x10^5    7x10^6        5x10^7      7x10^9
Swoogle3      2x10^5     7x10^5      1.5x10^7      7.5x10^7    1x10^10
2006          1x10^6     5x10^7      5x10^7        5x10^9      5x10^11
2008          5x10^6     5x10^9      5x10^9        5x10^11     5x10^13

We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
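A few derived ratios make the table easier to read. The short Python sketch below copies the table's rows as given and computes triples per document, bytes per triple and the growth in triples between two rows. The function names are illustrative, and the figures are the slide's rough estimates, not measurements.

```python
# Rows copied from the table: (system/date, terms, documents, individuals, triples, bytes).
ESTIMATES = [
    ("Swoogle2", 1.5e5, 3.5e5, 7e6, 5e7, 7e9),
    ("Swoogle3", 2e5, 7e5, 1.5e7, 7.5e7, 1e10),
    ("2006", 1e6, 5e7, 5e7, 5e9, 5e11),
    ("2008", 5e6, 5e9, 5e9, 5e11, 5e13),
]


def ratios(row):
    """Return (triples per document, bytes per triple) for one table row."""
    _, _, documents, _, triples, size = row
    return triples / documents, size / triples


def growth(earlier, later):
    """Return the factor by which the triple count grows between two rows."""
    return later[4] / earlier[4]


for row in ESTIMATES:
    per_doc, per_triple = ratios(row)
    print(f"{row[0]:<9} {per_doc:8.1f} triples/doc {per_triple:8.1f} bytes/triple")

print("2006 -> 2008 triple growth factor:", growth(ESTIMATES[2], ESTIMATES[3]))
```

By this arithmetic the 2006 row works out to 100 triples per document and 100 bytes per triple, and the 2008 row assumes a hundredfold increase in triples over 2006.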

Page 47: Search Engines  for Semantic Web Knowledge

How much reasoning?

• SwoogleN (N<=3) does limited reasoning
  – It’s expensive
  – It’s not clear how much should be done
• More reasoning would benefit many use cases
  – e.g., type hierarchy
• Recognizing specialized metadata
  – e.g., that ontology A maps some terms from B to C
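As an illustration of the type-hierarchy case, the Python sketch below follows rdfs:subClassOf links so that a search for a class also finds instances typed with its subclasses. foaf:Person is a subclass of foaf:Agent in the FOAF vocabulary. The uni:Student to foaf:Person link and the instance data are assumptions made for this example, and none of this describes how Swoogle itself reasons.

```python
# Class -> direct superclasses. The uni:Student entry is a hypothetical assumption.
SUBCLASS_OF = {
    "uni:Student": ["foaf:Person"],
    "foaf:Person": ["foaf:Agent"],
}

# (instance, asserted class) pairs, i.e. rdf:type triples; all hypothetical.
TYPES = [
    ("ex:li", "uni:Student"),
    ("ex:bot", "foaf:Agent"),
    ("ex:page", "foaf:Document"),
]


def superclasses(cls, subclass_of):
    """Return cls plus every class reachable by following subclass links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:      # the seen set also guards against cycles
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def instances_of(target, types, subclass_of):
    """Instances whose asserted class is `target` or any subclass of it."""
    return sorted(s for s, c in types if target in superclasses(c, subclass_of))


def instances_exact(target, types):
    """Instances found without reasoning: the asserted class must match exactly."""
    return sorted(s for s, c in types if c == target)


print(instances_exact("foaf:Person", TYPES))
print(instances_of("foaf:Person", TYPES, SUBCLASS_OF))
print(instances_of("foaf:Agent", TYPES, SUBCLASS_OF))
```

Without the closure, a search for foaf:Person misses ex:li, which is only asserted to be a uni:Student. The cost is that every lookup walks the hierarchy, which hints at why this kind of reasoning gets expensive at index scale.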

Page 49: Search Engines  for Semantic Web Knowledge

Conclusion

• The web will contain the world’s knowledge in forms accessible to people and computers
  – We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
  – So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
  – Swoogle is for Semantic Web 1.0
  – Semantic Web 2.0 will make different demands

Page 50: Search Engines  for Semantic Web Knowledge

For more information

http://ebiquity.umbc.edu/ (annotated in OWL)
