MultilingualWeb, Dublin, 2012/06/11

Linked Data in Linguistics for NLP and Web Annotation

Sebastian Hellmann, AKSW, Universität Leipzig

LOD2 (http://lod2.eu): Creating Knowledge out of Interlinked Data

http://nlp2rdf.org
http://lod2.eu




The Semantic Gap


Turning Walled Gardens into Park Networks of Semantic Linguistic Data

How can we leverage the Data Web for natural language processing?

1. Use the Data Web as background knowledge for NLP
2. Use Data Web technologies for integrating NLP tools & approaches
3. Make the output of NLP tools available on the Data Web

• 50 billion facts covering all kinds of domains are readily available
• Leverage the wisdom of the crowds
• RDF is all about semantic interoperability
• On the Web, the value of information increases through sharing and copying


1. Use the Data Web as background knowledge for NLP

Linguistic Data currently filed under “cross-domain”
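
As a hedged illustration of point 1, the sketch below builds (but does not send) a SPARQL protocol request that asks DBpedia for the labels of a resource in every available language, the kind of multilingual background knowledge an NLP component might look up. The endpoint http://dbpedia.org/sparql is DBpedia's public one; the `format` parameter is an assumption about that endpoint's server software rather than part of the SPARQL protocol, and the function names and the example resource are hypothetical.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"


def build_label_query(resource_uri: str, limit: int = 50) -> str:
    """Return a SPARQL query for all rdfs:label values of one resource."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label (LANG(?label) AS ?lang) WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL; 'query' is the SPARQL protocol parameter."""
    # 'format' is an assumption about the server, not part of the protocol.
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


query = build_label_query("http://dbpedia.org/resource/Dublin")
url = build_request_url(query)
print(url)
```

Fetching the printed URL with any HTTP client should return one result row per label and language tag, assuming the endpoint is reachable and still accepts these parameters.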


1. Use the Data Web as background knowledge for NLP

Three communities with three resources:

• Open Linguistics Working Group (OWLG)
–> http://linguistics.okfn.org
• DBpedia Internationalization Committee
–> http://wiki.dbpedia.org/Internationalization
• Wiktionary2RDF Wrappers
–> http://dbpedia.org/Wiktionary

All communities are open, please join!


The Linguistic Linked Open Data Cloud


Main question


Wiktionary2RDF – Mediator Wrapper

http://dbpedia.org/Wiktionary


Wiktionary2RDF – Mediator Wrapper

http://dbpedia.org/Wiktionary

[Diagram labels: Mediator, Lemon]


2. Use Data Web Technologies for Integrating NLP Tools and Approaches

Image from http://pbmo.wordpress.com/2011/09/29/maslows-hammer/

Golden Hammer Anti-pattern

The question is not whether to use RDF and Linked Data, but when to use...


2. Use Data Web Technologies for Integrating NLP Tools and Approaches

• Ontologies provide (formal) documentation (UML, ERD)
• Structure is easy to understand
• Wide range of RDF tools can be used, e.g. the LOD2 Stack
• Indexing and querying over the big picture becomes possible
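
To make the integration argument concrete, here is a minimal sketch in plain Python, assuming two tools that both describe the same piece of text with subject-predicate-object triples. The tool outputs, the `ex:` property names and the token identifier are all hypothetical; the point is only that shared identifiers turn integration into a set union and querying into pattern matching.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifier for the first six characters of a document.
TOKEN = "http://example.org/doc1#char=0,6"

# Hypothetical output of two independent NLP tools about the same string.
pos_tagger_output = {
    (TOKEN, "ex:posTag", "NNP"),
}
entity_linker_output = {
    (TOKEN, "ex:refersTo", "http://dbpedia.org/resource/Dublin"),
}


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


# Because both tools use the same subject URI, merging is a plain set union.
merged = pos_tagger_output | entity_linker_output

# Everything known about the token, regardless of which tool produced it.
facts = sorted(match(merged, s=TOKEN))
for fact in facts:
    print(fact)
```

A real setup would load such triples into an RDF store and query them with SPARQL; the in-memory set here only stands in for that.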


2. Use Data Web Technologies for Integrating NLP Tools and Approaches

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

• Road map
• Bootstrapped by LOD2, but a community project
• First release in September 2011
• Strong response
– Over 50 people joined the mailing list: http://lists.okfn.org/mailman/listinfo/open-linguistics
– First third-party implementations and contributions
– Several projects are discussing usage
• Currently setting up an advisory board; next draft in July
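
The following sketch shows, under stated assumptions, what a NIF-style annotation of a substring could look like when serialized as N-Triples. It uses property names from the NIF 2.0 Core vocabulary (`nif:Context`, `nif:isString`, `nif:referenceContext`, `nif:beginIndex`, `nif:endIndex`, `nif:anchorOf`) and RFC 5147 `#char=` fragment identifiers. That vocabulary postdates this talk, so treat it as an assumption rather than the format as presented here. The document URI, the helper names and the example sentence are hypothetical.

```python
# Namespace of the NIF 2.0 Core ontology (assumed; later than the NIF release in the talk).
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def literal(value, datatype=None):
    """Serialize a value as an N-Triples literal with basic escaping."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def string_uri(doc_uri, begin, end):
    """Identify a character range of a document with an RFC 5147 style fragment."""
    return f"{doc_uri}#char={begin},{end}"


def annotate(doc_uri, text, begin, end):
    """Return N-Triples lines describing the whole text and one substring of it."""
    context = string_uri(doc_uri, 0, len(text))
    span = string_uri(doc_uri, begin, end)
    triples = [
        (context, RDF_TYPE, f"<{NIF}Context>"),
        (context, NIF + "isString", literal(text)),
        (span, RDF_TYPE, f"<{NIF}String>"),
        (span, NIF + "referenceContext", f"<{context}>"),
        (span, NIF + "beginIndex", literal(begin, XSD_NNI)),
        (span, NIF + "endIndex", literal(end, XSD_NNI)),
        (span, NIF + "anchorOf", literal(text[begin:end])),
    ]
    return [f"<{s}> <{p}> {o} ." for s, p, o in triples]


text = "Dublin is the capital of Ireland."
for line in annotate("http://example.org/doc1", text, 0, 6):
    print(line)
```

Any tool that emits triples about the same offset-based URI can attach further annotations (part of speech, entity links) to the same string without coordinating with other tools.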


S. Auer and S. Hellmann: The Web of Data: Decentralized, collaborative, interlinked and interoperable. LREC 2012. http://www.lrec-conf.org/proceedings/lrec2012/keynotes/LREC%202012.Keynote%20Speech%201.Soeren%20Auer.pdf


3. Make the Output of NLP Tools available on the Web

Currently there is no standard mechanism to transparently combine the WWW, GGG and NLP

GGG = Giant Global Graph (basically the Web of Data)

see: http://dig.csail.mit.edu/breadcrumbs/node/215


http://dbpedia.org/spotlight

P. Mendes et al.: DBpedia Spotlight: Shedding light on the web of documents. In: I-Semantics, 2011


http://annotateit.org
http://sourceforge.net/projects/fragmentlinks/


NLP Interchange Format (NIF): join the mailing list at http://nlp2rdf.org

Hellmann et al.: Towards an Ontology for Representing Strings. In: EKAW 2012. http://svn.aksw.org/papers/2012/WWW_NIF/public/string_ontology.pdf


Address

University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany

Thanks for your attention!

Contact

Project: http://lod2.eu
Organisation: http://uni-leipzig.de, http://aksw.org
Presenter: http://bis.informatik.uni-leipzig.de/SebastianHellmann
NLP2RDF page: http://nlp2rdf.org

Acknowledgement: some slides are taken from the keynote of Sören Auer at LREC 2012.

CC-BY-SA unless otherwise stated