Moving beyond sameAs with PLATO: Partonomy detection for Linked Data
DESCRIPTION
The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life science, geography, and politics, the LOD Cloud has the potential to support a variety of applications ranging from open domain question answering to drug discovery. Despite its significant size (approx. 30 billion triples), the data is relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to enable the answering of complex questions and to support such applications. In this work, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains, and show that our approach performs well on detecting partonomic properties between LOD Cloud data.
TRANSCRIPT
May 2012 – GE Global Research – Prateek Jain
23rd ACM HT Conference 2012 – Prateek Jain
Moving beyond sameAs with PLATO: Partonomy detection for Linked Data
Prateek Jain, Pascal Hitzler, Amit Sheth
Kno.e.sis Center, Wright State University, Dayton, OH
Peter Z. Yeh, Kunal Verma
Accenture Technology Labs, San Jose, CA
Outline
• Introduction - Linked Open Data
• Challenges
• PLATO – Partonomic Relationship detection
• Conclusion & Future Work
Tim Berners-Lee 2006
• from http://www.w3.org/DesignIssues/LinkedData.html
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
Linked Open Data 2011
Linked Open Data
Number of Datasets
Date          Datasets
2011-09-19    295
2010-09-22    203
2009-07-14    95
2008-09-18    45
2007-10-08    25
2007-05-01    12
Number of triples (Sept 2011)
31,634,213,770
with 503,998,829 out-links
From http://www4.wiwiss.fu-berlin.de/lodcloud/state/
May 2012 –IBM TJ Watson Center– Prateek Jain
Mainstream Semantic Web?
Is it really mainstream Semantic Web?
• What is the relationship between the models whose instances are being linked?
• How can LOD be queried without knowing the individual datasets?
• How can schema-level reasoning be performed over the LOD cloud?
• A fundamental conceptual relationship, “PART OF”, is almost entirely absent from LOD
PLATO Approach
Our Approach
Use knowledge contributed by users
• Detection of relationships within and across datasets
LOD Cloud
PLATO Approach
• PLATO generates all possible partonomically linked pairs between the entities in the dataset.
  – Utilize “strongly” associated entities
• Identify the type of each entity in the pair using WordNet.
  – Use class names
  – Gives the lexicographer files for the synsets corresponding to these entities
• Use this information to determine the applicable OWL partonomy properties.
  – Using Winston’s taxonomy
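The step from entity types to applicable properties can be sketched as a table lookup. This is only an illustrative sketch, not PLATO's actual rules: the relation names follow Winston's taxonomy and the keys are WordNet lexicographer file names, but the table `CANDIDATE_RELATIONS`, its entries, and the function `applicable_relations` are assumptions made for this example.

```python
# Hypothetical lookup: (lexfile of candidate part, lexfile of candidate whole)
# -> Winston relation types worth testing. The entries are illustrative only.
CANDIDATE_RELATIONS = {
    ("noun.substance", "noun.object"): ["stuff-object", "portion-mass"],
    ("noun.artifact", "noun.artifact"): ["component-integral object"],
    ("noun.person", "noun.group"): ["member-collection"],
    ("noun.location", "noun.location"): ["place-area"],
}


def applicable_relations(lexfile_a, lexfile_b):
    """Return (part_lexfile, whole_lexfile, relation) triples for both orderings."""
    found = []
    for part, whole in ((lexfile_a, lexfile_b), (lexfile_b, lexfile_a)):
        for relation in CANDIDATE_RELATIONS.get((part, whole), []):
            triple = (part, whole, relation)
            if triple not in found:  # same-type pairs yield identical orderings
                found.append(triple)
    return found


if __name__ == "__main__":
    # The pair is unordered, so both directions are tried.
    print(applicable_relations("noun.group", "noun.person"))
```

Trying both orderings matters because the pair comes without a direction at this stage; which entity is the part and which is the whole is only settled later by the evidence.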
Winston’s Taxonomy
PLATO Approach – Step 2
• PLATO generates linguistic patterns for each applicable property based on linguistic cues suggested by Winston.
  – Cell Wall is made of Cellulose
  – Cellulose is made of Cell Wall
  – Cell Wall is partly Cellulose
• Tests the lexical patterns for each entity pair in a corpus-driven manner.
  – Using the Web as a corpus
• PLATO counts the total number of web pages that contain the pattern.
  – Parse the page and identify the occurrence of the pattern.
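Pattern generation and counting can be sketched as follows. This is a minimal sketch under stated assumptions: the two templates are taken from the examples on this slide, a small in-memory list of page texts stands in for the Web, and the names `TEMPLATES`, `generate_patterns`, and `count_pages` are hypothetical rather than part of the PLATO tool.

```python
import re

# Templates mirroring the slide's examples; a real system would hold one
# set of cues per partonomy property.
TEMPLATES = ["{whole} is made of {part}", "{whole} is partly {part}"]


def generate_patterns(entity_a, entity_b):
    """Instantiate every template for both part/whole orderings of the pair."""
    patterns = []
    for whole, part in ((entity_a, entity_b), (entity_b, entity_a)):
        for template in TEMPLATES:
            patterns.append(template.format(whole=whole, part=part))
    return patterns


def count_pages(pattern, pages):
    """Count the pages whose text contains the pattern, ignoring case."""
    regex = re.compile(re.escape(pattern), re.IGNORECASE)
    return sum(1 for page in pages if regex.search(page))


if __name__ == "__main__":
    # Stand-in corpus; in the slides the Web plays this role.
    pages = [
        "The cell wall is made of cellulose fibres.",
        "Cellulose is made of glucose units.",
        "A plant cell wall is partly cellulose.",
    ]
    for pattern in generate_patterns("Cell Wall", "Cellulose"):
        print(pattern, count_pages(pattern, pages))
```

Each page is counted once per pattern regardless of how often the phrase occurs in it, which matches the slide's description of counting pages rather than occurrences.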
PLATO Approach – Step 3
• Asserts the partonomy property with the strongest supporting evidence
  – Cell Wall is made of Cellulose, 48
  – Cellulose is made of Cell Wall, 10
• PLATO also enriches the schema by generalizing from the instance level assertions.
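The selection and generalization steps can be sketched as below, using the page counts from this slide (48 and 10). The slides do not say how the schema-level generalization is computed, so lifting each instance assertion to the classes of its subject and object and counting the support is an assumption of this sketch. The property name `madeOf`, the class names, and the functions `strongest` and `generalize` are likewise hypothetical.

```python
from collections import Counter


def strongest(evidence):
    """Pick the (subject, property, object) triple with the highest page count.

    Returns None when there is no evidence or every count is zero.
    """
    if not evidence:
        return None
    best = max(evidence, key=evidence.get)
    return best if evidence[best] > 0 else None


def generalize(assertions, entity_class):
    """Lift instance-level triples to class level and count their support."""
    support = Counter()
    for subject, prop, obj in assertions:
        support[(entity_class[subject], prop, entity_class[obj])] += 1
    return support


if __name__ == "__main__":
    evidence = {
        ("Cell Wall", "madeOf", "Cellulose"): 48,
        ("Cellulose", "madeOf", "Cell Wall"): 10,
    }
    winner = strongest(evidence)
    print(winner)

    # Hypothetical class assignments for the two entities.
    entity_class = {"Cell Wall": "CellComponent", "Cellulose": "Polysaccharide"}
    print(generalize([winner], entity_class))
```

A threshold on the class-level support would normally decide which schema-level statements are asserted; that choice is left open here because the slides give no value for it.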
PLATO Evaluation
Outreach
• Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P. Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25th-28th, 2012 (To Appear)
• Tool available for download at http://wiki.knoesis.org/index.php/PLATO
End Product
Conclusions and Future Work
Conclusions
• PLATO is an approach for partonomic relationship detection
• Approach works for both instances and schema level relationships
• Evaluation performed both within and across large, prominent LOD datasets
• Results validate the use of knowledge on the Web to solve tough problems
Future Work
• Use incomplete knowledge for part-of relationship identification
  – Machine learning based techniques
• Release the schema mappings in public domain
• Develop a better querying system for LOD using PLATO and BLOOMS
  – Work in progress with ALOQUS (submitted to ODBASE 2012)
• Identify and incorporate user preferences
Questions?
Prateek Jain
Kno.e.sis Center, Wright State University, Dayton, OH
http://wiki.knoesis.org/index.php/Prateek