Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

Prateek Jain, Pascal Hitzler, Amit Sheth (Kno.e.sis Center, Wright State University, Dayton, OH); Peter Z. Yeh, Kunal Verma (Accenture Technology Labs, San Jose, CA). May 2012, GE Global Research; 23rd ACM HT Conference 2012.

Upload: prateek-jain

Post on 17-May-2015


DESCRIPTION

The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life science, geography, politics, and more, the LOD Cloud has the potential to support a variety of applications ranging from open domain question answering to drug discovery. Despite its significant size (approx. 30 billion triples), the data is relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to enable the answering of complex questions and to support applications. In this work, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains, and show that our approach performs well on detecting partonomic properties between LOD Cloud data.

TRANSCRIPT

Page 1: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

May 2012 – GE Global Research – Prateek Jain; 23rd ACM HT Conference 2012 – Prateek Jain

Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

Prateek Jain, Pascal Hitzler, Amit Sheth
Kno.e.sis Center, Wright State University, Dayton, OH

Peter Z. Yeh, Kunal Verma
Accenture Technology Labs, San Jose, CA

Page 2: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Outline

• Introduction - Linked Open Data

• Challenges

• PLATO – Partonomic Relationship detection

• Conclusion & Future Work

Page 3: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Tim Berners-Lee 2006

• from http://www.w3.org/DesignIssues/LinkedData.html

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.

Page 4: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Linked Open Data 2011

Page 5: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Linked Open Data

Number of Datasets

2011-09-19  295
2010-09-22  203
2009-07-14   95
2008-09-18   45
2007-10-08   25
2007-05-01   12

Number of triples (Sept 2011): 31,634,213,770, with 503,998,829 out-links
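To put the two figures above side by side, a quick calculation of the share of triples that are out-links may help. This is only arithmetic on the numbers quoted on the slide; the variable names are chosen here for illustration.

```python
# Figures quoted on the slide (state of the LOD Cloud, Sept 2011).
triples = 31_634_213_770
out_links = 503_998_829

# Share of all triples that link out to another dataset.
link_ratio = out_links / triples
print(f"{link_ratio:.2%} of triples are out-links")
```

The ratio comes out below 2%, which is consistent with the description's point that the cloud is sparsely interlinked relative to its size.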

From http://www4.wiwiss.fu-berlin.de/lodcloud/state/

Page 6: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Page 7: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Page 8: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data

May 2012 –IBM TJ Watson Center– Prateek Jain

Mainstream Semantic Web?

Page 9: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Is it really mainstream Semantic Web?

• What is the relationship between the models whose instances are being linked?

• How can LOD be queried without knowing the individual datasets?

• How can schema-level reasoning be performed over the LOD Cloud?

• A fundamental and important conceptual relationship, namely “PART OF”, is largely absent from LOD

Page 10: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


PLATO Approach

Page 11: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Our Approach

Use knowledge contributed by users

• Detection of relationships within and across datasets

LOD Cloud

Page 12: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


PLATO Approach

• PLATO generates all possible partonomically linked pairs between the entities in the dataset.
  – Utilize “strongly” associated entities

• Identify the type of each entity in the pair using WordNet.
  – Use class names
  – Gives the lexicographer files for the synsets corresponding to these entities

• Use this information to determine the applicable OWL partonomy properties.
  – Using Winston’s taxonomy
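The first step can be sketched roughly as follows. This is a minimal illustration, not PLATO's implementation: the rule table `WINSTON_RULES`, the function names, and the example lexicographer labels are all assumptions made here. The lexicographer file names themselves (`noun.substance`, `noun.object`, and so on) are real WordNet names, and the relation names are Winston's six part-whole types, but which pairs map to which relation is guessed for illustration. In practice the labels would come from a WordNet lookup (for example `Synset.lexname()` in NLTK), which this sketch does not perform.

```python
from itertools import permutations

# HYPOTHETICAL rule table: (lexicographer file of the part, of the whole)
# -> Winston relation. The pairings are illustrative assumptions only.
WINSTON_RULES = {
    ("noun.substance", "noun.object"): "stuff-object",
    ("noun.substance", "noun.artifact"): "stuff-object",
    ("noun.artifact", "noun.artifact"): "component-integral object",
    ("noun.person", "noun.group"): "member-collection",
    ("noun.location", "noun.location"): "place-area",
    ("noun.act", "noun.act"): "feature-activity",
}


def candidate_pairs(entities):
    """All ordered (part, whole) pairs; direction is not yet known."""
    return list(permutations(entities, 2))


def applicable_relations(part_lexname, whole_lexname):
    """Winston relations allowed for this pair of entity types (may be empty)."""
    relation = WINSTON_RULES.get((part_lexname, whole_lexname))
    return [relation] if relation else []


# Placeholder type labels supplied by hand, NOT looked up in WordNet.
lexnames = {"Cellulose": "noun.substance", "Cell Wall": "noun.object"}

for part, whole in candidate_pairs(list(lexnames)):
    relations = applicable_relations(lexnames[part], lexnames[whole])
    print(part, "->", whole, relations)
```

Generating ordered pairs keeps both directions alive at this stage, so that the later evidence-based step can decide which entity is the part and which is the whole.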

Page 13: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Winston’s Taxonomy

Page 14: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


PLATO Approach – Step 2

• PLATO generates linguistic patterns for each applicable property based on linguistic cues suggested by Winston.
  – Cell Wall is made of Cellulose
  – Cellulose is made of Cell Wall
  – Cell Wall is partly Cellulose

• Tests the lexical patterns for each entity pair in a corpus-driven manner.
  – Using the Web as a corpus

• PLATO counts the total number of web pages that contain the pattern
  – Parse the page and identify the occurrence of the pattern.
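Step 2 can be illustrated with a small sketch. The three pattern templates are the ones shown on the slide; everything else is an assumption made here for illustration: the function names, the case-insensitive substring match, and the tiny in-memory list of page texts that stands in for the Web. PLATO queries the Web and parses the returned pages, which this sketch does not attempt.

```python
def generate_patterns(a, b):
    """Lexical patterns for an entity pair, covering both directions."""
    return [
        f"{a} is made of {b}",
        f"{b} is made of {a}",
        f"{a} is partly {b}",
    ]


def count_pages(pattern, pages):
    """Number of pages whose text contains the pattern (case-insensitive)."""
    needle = pattern.lower()
    return sum(1 for text in pages if needle in text.lower())


# Stand-in corpus written for this example; not real search results.
pages = [
    "In plants, the cell wall is made of cellulose and other polymers.",
    "The plant cell wall is partly cellulose.",
    "Cotton fibre is almost pure cellulose.",
]

counts = {
    pattern: count_pages(pattern, pages)
    for pattern in generate_patterns("Cell Wall", "Cellulose")
}
for pattern, n in counts.items():
    print(f"{pattern}: {n}")
```

Counting pages rather than raw occurrences means a single page that repeats a phrase many times contributes only once to the evidence.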

Page 15: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


PLATO Approach – Step 3

• Asserts the partonomy property with strongest supporting evidence
  – Cell Wall is made of Cellulose, 48
  – Cellulose is made of Cell Wall, 10

• PLATO also enriches the schema by generalizing from the instance level assertions.
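A rough sketch of Step 3, using the two page counts from the slide. Picking the pattern with the highest count follows the slide directly. The schema-level part is an assumption: the slides do not say how PLATO generalizes, so `generalize` and its `min_support` threshold are a hypothetical stand-in that promotes a class-level relation once enough instance-level assertions agree.

```python
from collections import Counter


def strongest(evidence):
    """Pattern with the highest page count, or None if there is no evidence."""
    if not evidence or max(evidence.values()) == 0:
        return None
    return max(evidence, key=evidence.get)


# Page counts as given on the slide.
evidence = {
    "Cell Wall is made of Cellulose": 48,
    "Cellulose is made of Cell Wall": 10,
}
winner = strongest(evidence)
print(winner)


def generalize(instance_assertions, min_support=2):
    """HYPOTHETICAL generalization rule (not taken from the slides).

    instance_assertions: list of (part_class, relation, whole_class) tuples,
    one per instance-level assertion. A tuple is promoted to the schema
    level when it occurs at least min_support times.
    """
    tally = Counter(instance_assertions)
    return [triple for triple, n in tally.items() if n >= min_support]
```

With the slide's numbers, the assertion "Cell Wall is made of Cellulose" wins over the reversed direction because 48 pages support it against 10.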

Page 16: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


PLATO Evaluation

Page 17: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Outreach

• Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P. Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25th-28th, 2012 (To Appear)

• Tool available for download at http://wiki.knoesis.org/index.php/PLATO

Page 18: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


End Product

Page 19: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Conclusions and Future Work

Page 20: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Conclusions

• PLATO is an approach for partonomic relationship detection

• The approach works for both instance-level and schema-level relationships

• Evaluation performed within and across large, prominent LOD datasets

• Results validate the use of knowledge on the Web to solve tough problems

Page 21: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Future Work

• Use incomplete knowledge for part-of relationship identification
  – Machine learning based techniques

• Release the schema mappings in the public domain

• Develop a better querying system for LOD using PLATO and BLOOMS
  – Work in progress with ALOQUS (Submitted to ODBASE 2012)

• Identify and incorporate user preferences

Page 22: Moving beyond sameAs with PLATO: Partonomy detection for Linked Data


Questions?

Prateek Jain
Kno.e.sis Center, Wright State University, Dayton, OH
http://wiki.knoesis.org/index.php/Prateek