MultilingualWeb – 2012/06/11 Dublin – Page 1 http://lod2.eu
MultilingualWeb – Creating Knowledge out of Interlinked Data
AKSW, Universität Leipzig
Sebastian Hellmann
Linked Data in Linguistics for NLP and Web Annotation
http://nlp2rdf.org
http://lod2.eu
The Semantic Gap
Turning Walled Gardens into Park Networks of Semantic Linguistic Data
How can we leverage the Data Web for natural language processing?
1. Use the Data Web as background knowledge for NLP
2. Use Data Web technologies for integrating NLP tools & approaches
3. Make the output of NLP tools available on the Data Web
• On the Web, the value of information increases through sharing and copying
• 50 billion facts covering all kinds of domains are readily available
• Leverage the wisdom of the crowds
• RDF is all about semantic interoperability
1. Use the Data Web as background knowledge for NLP
Linguistic Data currently filed under “cross-domain”
Three communities with three resources:
• Working Group for Open Linguistics Data (OWLG)
  -> http://linguistics.okfn.org
• DBpedia Internationalization Committee
  -> http://wiki.dbpedia.org/Internationalization
• Wiktionary2RDF Wrappers
  -> http://dbpedia.org/Wiktionary
All communities are open, please join!
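As a rough illustration of using the Data Web as background knowledge, the sketch below builds a SPARQL query that asks for the labels of a resource in several languages, plus the request URL for the public DBpedia endpoint. The function names, the example resource and the `format` parameter are assumptions made for this sketch, not part of the talk; nothing is sent over the network.

```python
from typing import List
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint; this sketch only builds the URL, it does not call it.
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"


def label_query(resource_uri: str, languages: List[str]) -> str:
    """Build a SPARQL query for the rdfs:label values of a resource
    restricted to the given language tags (hypothetical helper)."""
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        f"  FILTER (lang(?label) IN ({lang_list}))\n"
        "}"
    )


def request_url(query: str) -> str:
    """Encode the query as a GET request URL. The 'format' parameter is an
    assumption about the endpoint; 'query' is the standard SPARQL protocol name."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = label_query("http://dbpedia.org/resource/Dublin", ["en", "de"])
url = request_url(query)
```

The returned labels could then serve, for example, as a multilingual gazetteer for an NLP tool.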
The Linguistic Linked Open Data Cloud
Main question
Wiktionary2RDF – Mediator Wrapper
http://dbpedia.org/Wiktionary
Mediator, Lemon
2. Use Data Web Technologies for Integrating NLP Tools and Approaches
Image from http://pbmo.wordpress.com/2011/09/29/maslows-hammer/
Golden Hammer Anti-pattern
The question is not whether to use RDF and Linked Data, but when to use...
• Ontologies provide (formal) documentation (UML, ERD)
• Structure is easy to understand
• Wide range of RDF tools can be used, e.g. LOD2 Stack
• Indexing and querying across the big picture become possible
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
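To illustrate why a shared way of identifying strings helps tools interoperate, here is a minimal sketch in which two hypothetical tools annotate the same text and their outputs are merged under common string identifiers. The `#char=begin,end` fragment follows the RFC 5147 style and is an assumption of this sketch, as are the tool outputs, property names and the example document URI; it is not presented as the identifier scheme or vocabulary that NIF itself defines.

```python
from collections import defaultdict


def string_uri(doc_uri, begin, end):
    """Mint an identifier for the substring [begin, end) of a document,
    using an RFC 5147-style character range fragment (assumed scheme)."""
    return f"{doc_uri}#char={begin},{end}"


def merge(doc_uri, *tool_outputs):
    """Combine annotations from several tools. Each tool output maps
    (begin, end) offsets to a dict of properties; annotations on the
    same offsets end up under the same identifier."""
    merged = defaultdict(dict)
    for output in tool_outputs:
        for (begin, end), props in output.items():
            merged[string_uri(doc_uri, begin, end)].update(props)
    return dict(merged)


text = "Dublin is in Ireland"

# Hypothetical outputs of a part-of-speech tagger and an entity linker.
tagger = {(0, 6): {"pos": "NNP"}, (13, 20): {"pos": "NNP"}}
linker = {(0, 6): {"entity": "http://dbpedia.org/resource/Dublin"}}

merged = merge("http://example.org/doc1", tagger, linker)
```

Because both tools address the string "Dublin" by the same offsets, their annotations land on one identifier without any tool-specific glue code.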
• Road map
• Bootstrapped by LOD2, but a community project
• First release in September 2011
• Strong response
– Over 50 people joined the mailing list:
http://lists.okfn.org/mailman/listinfo/open-linguistics
– First third-party implementations and contributions
– Several projects are discussing usage
• Currently setting up an advisory board; next draft in July
S. Auer and S. Hellmann: The Web of Data: Decentralized, collaborative, interlinked and interoperable. LREC 2012, http://www.lrec-conf.org/proceedings/lrec2012/keynotes/LREC%202012.Keynote%20Speech%201.Soeren%20Auer.pdf
3. Make the Output of NLP Tools available on the Web
Currently there is no standard mechanism to transparently combine the WWW, GGG and NLP
GGG = Giant Global Graph (basically the Web of Data)
see: http://dig.csail.mit.edu/breadcrumbs/node/215
http://dbpedia.org/spotlight
P. Mendes et al.: DBpedia Spotlight: Shedding light on the web of documents. In: I-Semantics, 2011
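One way to picture publishing such entity annotations on the Data Web is to serialise each one as RDF triples that link a text span to a DBpedia resource. The sketch below writes N-Triples by hand; the `http://example.org/vocab#` properties, the document URI and the function name are hypothetical placeholders, and the code does not reproduce the actual output format of DBpedia Spotlight.

```python
# Hypothetical vocabulary namespace used only for this illustration.
EX = "http://example.org/vocab#"


def annotation_triples(doc_uri, text, begin, end, entity_uri):
    """Return two N-Triples lines for one entity annotation: the surface
    string of the span and a link from the span to the entity."""
    subject = f"<{doc_uri}#char={begin},{end}>"
    # Escape backslashes and double quotes for the N-Triples literal.
    surface = text[begin:end].replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'{subject} <{EX}anchorOf> "{surface}" .',
        f"{subject} <{EX}refersTo> <{entity_uri}> .",
    ]


triples = annotation_triples(
    "http://example.org/doc1",
    "Dublin is in Ireland",
    0,
    6,
    "http://dbpedia.org/resource/Dublin",
)
for line in triples:
    print(line)
```

Once annotations are expressed this way, they can be loaded into any RDF store and queried together with the rest of the Web of Data.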
http://annotateit.org
http://sourceforge.net/projects/fragmentlinks/
NLP Interchange Format (NIF): join the mailing list at http://nlp2rdf.org
Hellmann et al.: Towards an Ontology for Representing Strings. In: EKAW 2012, http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf
Address
University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany
Thanks for your attention!
Contact
Project: http://lod2.eu
Organisation: http://uni-leipzig.de, http://aksw.org
Presenter: http://bis.informatik.uni-leipzig.de/SebastianHellmann
NLP2RDF page: http://nlp2rdf.org
Acknowledgement: some slides are taken from the keynote of Sören Auer at LREC 2012
CC-BY-SA unless otherwise stated