thesis presentation
Creating Knowledge out of Interlinked Data
LOD2 Presentation . 02.09.2010 . Page
http://lod2.eu
AKSW, Universität Leipzig
Sebastian Hellmann
A Transparent Formalization of Text for Machines
http://nlp2rdf.org
Start: Jan 2009
Tentative End: Summer 2012
Introduction to the areas touched upon
Scientific Core
Evaluation
Plan
Overview
The Semantic Gap
Most problems occurred at the bottom
Data integration is difficult if the pivots are not well defined
Questions (in order):
What structure to use?
What URIs to use?
What is a String?
How can we teach machines to understand Strings (Knowledge Representation)?
How can we formalize text in a way that is:
Transparent for machines
Efficient for NLP Use Cases
Consistent with the Web architecture
Main question
Areas
Preliminary definition
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
This definition is still limited to RDF and NLP and targets software integration via a common exchange format
Scientific core
Not transparent for machines
Scientific core
The city Berlin is the capital of Germany.
URI: http://example.org/sample#offset_0_42
The universe of discourse is defined as the set of words over the alphabet of Unicode characters (Unicode Normalization Form C), often called Σ*.
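The offset-based URIs on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: offsets count Unicode code points of the NFC-normalized text, the begin offset is inclusive and the end offset exclusive (which matches #offset_0_42 covering the whole 42-character sentence), and the fragment is appended directly to the document URI. The helper names `offset_uri` and `resolve` are hypothetical.

```python
import re
import unicodedata

# Hypothetical pattern for fragment identifiers such as "#offset_34_41".
OFFSET_FRAGMENT = re.compile(r"#offset_(\d+)_(\d+)$")


def offset_uri(base: str, begin: int, end: int) -> str:
    """Mint a URI for the substring [begin, end) of the text at `base`."""
    return f"{base}#offset_{begin}_{end}"


def resolve(uri: str, text: str) -> str:
    """Return the substring that an offset URI denotes.

    Assumption: offsets are code-point positions in the NFC form of `text`.
    """
    match = OFFSET_FRAGMENT.search(uri)
    if match is None:
        raise ValueError(f"not an offset URI: {uri}")
    begin, end = int(match.group(1)), int(match.group(2))
    normalized = unicodedata.normalize("NFC", text)
    if not 0 <= begin <= end <= len(normalized):
        raise ValueError(f"offsets out of range for a text of length {len(normalized)}")
    return normalized[begin:end]


if __name__ == "__main__":
    base = "http://example.org/sample"
    text = "The city Berlin is the capital of Germany."
    print(offset_uri(base, 0, len(text)))            # http://example.org/sample#offset_0_42
    print(resolve(offset_uri(base, 34, 41), text))  # Germany
```

Normalizing to NFC before counting matters because one visible string can have several code-point sequences (a precomposed "ä" versus "a" followed by a combining diaeresis); without a fixed normal form, the same offsets could denote different characters.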
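http://example.org/sample#offset_34_41 isString "Germany"
http://example.org/sample#offset_34_41 referenceContext http://example.org/sample#offset_0_42 (the context)
http://example.org/sample#offset_0_42 isString "The city Berlin is the capital of Germany."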
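The statements above can be written down as RDF triples. The sketch below prints them in N-Triples syntax; the namespace http://example.org/ontology# used for isString, referenceContext and the class Context is a placeholder assumption, not the actual NIF vocabulary URI, and `annotate` is a hypothetical helper. Offsets are assumed to be code-point positions in an already NFC-normalized text.

```python
# Placeholder namespace (assumption); NOT the published NIF vocabulary URI.
ONT = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the required characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def annotate(base: str, text: str, begin: int, end: int) -> list[str]:
    """Return N-Triples lines linking the substring [begin, end) to its context."""
    context = f"<{base}#offset_0_{len(text)}>"
    substring = f"<{base}#offset_{begin}_{end}>"
    return [
        f"{context} <{RDF_TYPE}> <{ONT}Context> .",
        f"{context} <{ONT}isString> {literal(text)} .",
        f"{substring} <{ONT}isString> {literal(text[begin:end])} .",
        f"{substring} <{ONT}referenceContext> {context} .",
    ]


if __name__ == "__main__":
    sentence = "The city Berlin is the capital of Germany."
    for line in annotate("http://example.org/sample", sentence, 34, 41):
        print(line)
```

Because both the context and the substring are identified purely by offsets into the same text, any tool that produces the same tokenization of the same string mints the same URIs, which is what makes the annotations mergeable.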
Scientific core
Define the notion of Context and formalize it in OWL:
Context is similar to the German word Betrachtungshorizont (roughly, "horizon of consideration")
In English, perhaps "inside context", i.e. the text itself, which serves as the reference context for all substrings it contains.
Definitely disjoint from groupings such as Document, because such groupings require a wider context.
Example follows...
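Treating the context as the text itself means every substring can be checked against its reference context. The sketch below is a plain-Python illustration of that idea, not an OWL formalization: the class and field names are hypothetical, and offsets are assumed to be inclusive begin and exclusive end code-point positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """The text itself; it is the reference context for all its substrings."""
    uri: str
    is_string: str


@dataclass(frozen=True)
class Substring:
    uri: str
    begin: int  # inclusive code-point offset (assumption)
    end: int    # exclusive code-point offset (assumption)
    is_string: str
    reference_context: Context


def check(sub: Substring) -> list[str]:
    """Return a list of problems; an empty list means the substring is consistent."""
    problems = []
    ctx = sub.reference_context
    if not 0 <= sub.begin <= sub.end <= len(ctx.is_string):
        problems.append(f"{sub.uri}: offsets fall outside its reference context")
    elif ctx.is_string[sub.begin:sub.end] != sub.is_string:
        problems.append(f"{sub.uri}: isString does not match the context at these offsets")
    return problems


if __name__ == "__main__":
    ctx = Context("http://example.org/sample#offset_0_42",
                  "The city Berlin is the capital of Germany.")
    good = Substring("http://example.org/sample#offset_34_41", 34, 41, "Germany", ctx)
    print(check(good))  # []
```

A Document, by contrast, could group several texts, so no such check against a single string is possible for it; this is one way to read why the two notions are kept disjoint.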
Scientific Core
Goal: research some of the implications (though I might not be able to complete this).
In scope:
Property contextString is inverse-functional: two contexts with the same string are inferred to be identical (owl:sameAs), so machines can automatically infer that the same context occurs in different documents.
Show consistency with ambiguity
Define metrics that compare contexts
Formalize the interpretation function
Show interoperability with internal models of all major NLP frameworks
(Partial) compatibility with the WWW and the Giant Global Graph (GGG)
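The inverse-functional property in the first in-scope item licenses one specific inference: if (a P v) and (b P v) hold for an inverse-functional property P, then a owl:sameAs b. The sketch below applies only this rule, in plain Python rather than with an OWL reasoner. The full URI for contextString is a placeholder assumption, the document URIs are invented examples, and `infer_same_as` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder URI for the contextString property (assumption).
CONTEXT_STRING = "http://example.org/ontology#contextString"

Triple = tuple[str, str, str]


def infer_same_as(triples: list[Triple], prop: str) -> set[frozenset[str]]:
    """Apply the inverse-functional-property rule for a single property.

    If two different subjects share the same value for `prop`, they are
    inferred to be the same individual (owl:sameAs). Pairs are returned as
    frozensets because owl:sameAs is symmetric.
    """
    subjects_by_value: dict[str, set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_value[o].add(s)
    same_as = set()
    for subjects in subjects_by_value.values():
        # Every pair of subjects sharing a value is identical.
        for a, b in combinations(sorted(subjects), 2):
            same_as.add(frozenset((a, b)))
    return same_as


if __name__ == "__main__":
    sentence = "The city Berlin is the capital of Germany."
    triples = [
        ("http://example.org/doc1#offset_0_42", CONTEXT_STRING, sentence),
        ("http://example.org/doc2#offset_0_42", CONTEXT_STRING, sentence),
        ("http://example.org/doc3#offset_0_9", CONTEXT_STRING, "Hello NLP"),
    ]
    for pair in infer_same_as(triples, CONTEXT_STRING):
        a, b = sorted(pair)
        print(f"<{a}> owl:sameAs <{b}>")
```

The inference is only as sound as the modelling decision behind it: declaring the property inverse-functional asserts that equal strings denote the same context, regardless of which document they appear in.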
Scientific Core
Out of scope:
Transition between contexts: do statements from a smaller context hold in a broader context?
Incorporating all layers of the NLP stack (the work is limited to POS tags and entity recognition)
Filling in all the question marks in the Venn diagram
Areas
Linguistic Linked Open Data Cloud
Developer study
Areas
Evaluation
Compare with other models in NLP: size (RDF vs. XML), performance, expressivity
Is NIF easy to understand and implement? Developer study: the release of the specification had quite an impact; people started to create extensions and use the format, and 50 people are on the mailing list.
How can Web Service integration or consistency with the Web architecture be evaluated? If the way strings are represented is transparent and formalized, is an experimental evaluation needed to show the benefits?
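The size comparison (RDF vs. XML) could be set up by serializing the same annotations both ways and counting UTF-8 bytes. Both serializations below are toy formats invented for this illustration: the XML element names, the placeholder namespace, the posTag property and the tag assignments are assumptions, so any byte counts this prints say nothing about NIF or about any real NLP framework's XML format.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace and property names (assumptions, not the NIF vocabulary).
ONT = "http://example.org/ontology#"

SENTENCE = "The city Berlin is the capital of Germany."
# Penn Treebank-style tags, one per token, chosen for illustration only.
TAGS = ["DT", "NN", "NNP", "VBZ", "DT", "NN", "IN", "NNP", "."]


def tokenize(text: str, tags: list[str]) -> list[tuple[int, int, str]]:
    """Split into words and punctuation, pairing each token's offsets with a tag."""
    spans = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    if len(spans) != len(tags):
        raise ValueError("need exactly one tag per token")
    return [(b, e, tag) for (b, e), tag in zip(spans, tags)]


def to_xml(text: str, tokens: list[tuple[int, int, str]]) -> bytes:
    """Toy stand-off XML: the text once, then one element per token."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = text
    for begin, end, tag in tokens:
        ET.SubElement(root, "token", begin=str(begin), end=str(end), pos=tag)
    return ET.tostring(root, encoding="utf-8")


def escape(value: str) -> str:
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(base: str, text: str, tokens: list[tuple[int, int, str]]) -> bytes:
    """Toy N-Triples: one context triple, then two triples per token."""
    context = f"<{base}#offset_0_{len(text)}>"
    lines = [f'{context} <{ONT}isString> "{escape(text)}" .']
    for begin, end, tag in tokens:
        token = f"<{base}#offset_{begin}_{end}>"
        lines.append(f"{token} <{ONT}referenceContext> {context} .")
        lines.append(f'{token} <{ONT}posTag> "{escape(tag)}" .')
    return ("\n".join(lines) + "\n").encode("utf-8")


if __name__ == "__main__":
    tokens = tokenize(SENTENCE, TAGS)
    xml_size = len(to_xml(SENTENCE, tokens))
    nt_size = len(to_ntriples("http://example.org/sample", SENTENCE, tokens))
    print(f"XML: {xml_size} bytes, N-Triples: {nt_size} bytes")
```

A meaningful comparison would instead serialize the same corpus in NIF and in the actual exchange formats of the frameworks being compared, with each format's own vocabulary.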
Q & A
Thank you for your attention
Standing on the shoulders of giants
BIS 2012/03/01 Leipzig Page
ISSLOD 2011/09/15 Page
Address
University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany
Thanks for your attention!
Contact