

HeidelPlace: An Extensible Framework for Geoparsing

Ludwig Richter, Johanna Geiß, Andreas Spitz and Michael Gertz

Database Systems Research Group, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany

What is Geoparsing?

Motivation: Geoparsing is a key task in text processing and central to subsequent spatial analyses. It describes the process of identifying place mentions (toponyms) and linking them to unambiguous spatial references. For example, geoparsing “Heidelberg was founded in 1196 AD” may reveal a reference to the German town Heidelberg.

Problem: Several geoparsers are available, each with its own gazetteer and its own toponym recognition and resolution approaches. However, they often lack extensibility, their implementations are not accessible, or they are fixed to a particular gazetteer. This makes adjustments to other application domains difficult and prevents easy experimental setup.

Why HeidelPlace?

HeidelPlace provides:
• A generic gazetteer model supporting integration of place information from heterogeneous knowledge bases
• A pipeline approach enabling implementation and combination of modules for specific geoparsing applications
• GUIs for gazetteer browsing and for testing developed modules

This makes HeidelPlace a unique and valuable tool for experimenting with new geoparsing approaches.

Generic Gazetteer Data Model

[Figure: generic gazetteer data model]
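The idea of a generic gazetteer model that integrates place information from heterogeneous knowledge bases can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `GenericPlace` record, the `merge` helper, the source names `source_a` and `source_b`, and the sample values are hypothetical and do not reproduce HeidelPlace's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenericPlace:
    """Hypothetical source-independent place record (not the HeidelPlace schema)."""
    place_id: int
    names: set[str] = field(default_factory=set)
    types: set[str] = field(default_factory=set)
    coordinates: Optional[tuple[float, float]] = None  # (latitude, longitude)
    # Maps a knowledge-base name to the identifier the place has there.
    source_ids: dict[str, str] = field(default_factory=dict)


def merge(place: GenericPlace, source: str, record: dict) -> GenericPlace:
    """Fold one source-specific record into the generic place record."""
    place.names.update(record.get("names", []))
    place.types.update(record.get("types", []))
    # Keep the first coordinates seen; later sources do not overwrite them.
    if place.coordinates is None and "lat" in record and "lon" in record:
        place.coordinates = (record["lat"], record["lon"])
    place.source_ids[source] = record["id"]
    return place


# Illustrative records from two made-up sources describing the same place.
heidelberg = GenericPlace(place_id=1)
merge(heidelberg, "source_a",
      {"id": "A-17", "names": ["Heidelberg"], "types": ["city"],
       "lat": 49.41, "lon": 8.71})
merge(heidelberg, "source_b",
      {"id": "B-42", "names": ["Heidelberg", "Heidelberg am Neckar"]})
```

Keeping the per-source identifiers alongside the merged attributes is one way to make the origin of each record traceable after integration.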

The Architecture of HeidelPlace

The architecture of HeidelPlace consists of three major components:

An Annotation Pipeline executes the geoparsing process on input documents. It utilizes the Stanford CoreNLP toolkit [1]. An annotation object contains metadata for a processed document. This enables passing information between annotators, which represent processing tasks that operate on a document. An annotation pipeline iteratively executes the tasks.
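The annotation-passing pattern can be sketched in a few lines. This is a plain-Python illustration of the idea only, not the Stanford CoreNLP API and not HeidelPlace code: the annotation object is a dictionary, the annotators are ordinary functions, and `TOY_GAZETTEER`, `recognize_toponyms`, `resolve_toponyms`, and `run_pipeline` are hypothetical names.

```python
import re

# Hypothetical two-entry gazetteer used only for this illustration.
TOY_GAZETTEER = {
    "Heidelberg": {"type": "city", "country": "Germany"},
    "Copenhagen": {"type": "city", "country": "Denmark"},
}


def recognize_toponyms(annotation: dict) -> None:
    """Annotator 1: keep the tokens that occur in the toy gazetteer."""
    tokens = re.findall(r"\w+", annotation["text"])
    annotation["toponyms"] = [t for t in tokens if t in TOY_GAZETTEER]


def resolve_toponyms(annotation: dict) -> None:
    """Annotator 2: link each recognized toponym to its gazetteer entry."""
    annotation["resolved"] = {t: TOY_GAZETTEER[t] for t in annotation["toponyms"]}


def run_pipeline(text: str, annotators) -> dict:
    """Run the annotators in order on one shared annotation object."""
    annotation = {"text": text}
    for annotator in annotators:
        annotator(annotation)  # each annotator reads and extends the annotation
    return annotation


result = run_pipeline("Heidelberg was founded in 1196 AD",
                      [recognize_toponyms, resolve_toponyms])
print(result["resolved"])
# prints {'Heidelberg': {'type': 'city', 'country': 'Germany'}}
```

Because the second annotator only reads what the first one wrote into the annotation, either step can be swapped for another implementation that writes the same keys.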

Geoparsing Modules unify the geoparsing process. For each step in the process, a module interface is defined that specifies the expected input and output and decides when the step can be executed.
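A module interface of this kind might look roughly as follows. The sketch assumes that inputs and outputs are declared as sets of annotation keys and that a step is executable once all of its inputs are present; the class names, the `requires`/`provides` attributes, and the identifier `place:heidelberg-de` are hypothetical and are not taken from HeidelPlace.

```python
from abc import ABC, abstractmethod


class GeoparsingModule(ABC):
    """Hypothetical module interface; all names are illustrative only."""
    requires = frozenset()  # annotation keys expected as input
    provides = frozenset()  # annotation keys produced as output

    def can_execute(self, annotation: dict) -> bool:
        """The step can run once every required key is in the annotation."""
        return self.requires.issubset(annotation)

    @abstractmethod
    def process(self, annotation: dict) -> None:
        """Read the required keys and write the provided ones."""


class ToponymResolver(GeoparsingModule):
    requires = frozenset({"toponyms"})
    provides = frozenset({"resolved"})

    def __init__(self, gazetteer: dict):
        self.gazetteer = gazetteer

    def process(self, annotation: dict) -> None:
        # Unknown names resolve to None instead of raising an error.
        annotation["resolved"] = {
            name: self.gazetteer.get(name) for name in annotation["toponyms"]
        }


resolver = ToponymResolver({"Heidelberg": "place:heidelberg-de"})
annotation = {"text": "Heidelberg was founded in 1196 AD"}
ready_before = resolver.can_execute(annotation)  # no "toponyms" key yet
annotation["toponyms"] = ["Heidelberg"]          # stands in for a recognition step
ready_after = resolver.can_execute(annotation)
if ready_after:
    resolver.process(annotation)
```

Declaring the expected input and output on the module itself lets a pipeline check a configuration before running it, instead of failing halfway through a document.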

A Gazetteer serves as a knowledge base for the geoparsing modules. It employs a generic gazetteer data model, which provides a wide spectrum of place information. A flexible query system allows efficient searching of the gazetteer for places with certain characteristics.
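Searching a gazetteer for places with certain characteristics can be illustrated with a small in-memory filter. This is only a sketch under stated assumptions: the `Place` fields, the `query` function, and its keyword-filter convention are hypothetical, the population figures are rough illustrative values, and a linear scan like this says nothing about how HeidelPlace achieves efficient lookups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    place_type: str
    country: str
    population: int


# Hypothetical entries; population figures are rough illustrative values.
PLACES = [
    Place("Heidelberg", "city", "Germany", 160_000),
    Place("Heidelberg", "town", "South Africa", 35_000),
    Place("Copenhagen", "city", "Denmark", 600_000),
]


def query(places, **filters):
    """Return the places that satisfy every filter.

    A filter value is either compared for equality or, if it is callable,
    applied to the attribute as a predicate.
    """
    def matches(place):
        for attr, wanted in filters.items():
            value = getattr(place, attr)
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True

    return [p for p in places if matches(p)]


candidates = query(PLACES, name="Heidelberg")
large = query(PLACES, name="Heidelberg", population=lambda n: n > 100_000)
```

Here `candidates` holds both places named Heidelberg, which is exactly the ambiguity a toponym resolution module has to settle, while `large` narrows the result to the German city.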

This architecture forms a framework for geoparsing that can be easily extended and customized.

Outlook

Getting ready for practical use: Include more modules for state-of-the-art methods, e.g., co-occurrence-based toponym disambiguation [2].

Further extensions of HeidelPlace:
• Quantitative evaluation framework
• Gazetteer web service
• UIMA component
• Gazetteer data editor

Gazetteer Browsing

Desktop App Gazetteer Viewer:
• Selection of search filters
• Detailed view of place information
• Intuitive navigation and visualization

Performing & Analyzing Geoparsing

Desktop App Geoparser Viewer:
• Text input provided by user
• Selection of (pre-configured) modules
• Step-by-step geoparsing
• Output visualization for each step
• Interactive exploration of geoparsing

Comparing Geoparsing Methods

Desktop App Geoparser Viewer:
• Run multiple configurations at once
• Demonstrate handling of corner cases
• Understand differences of methods
• Experiment with module combinations

Contact Information: Ludwig Richter, [email protected], http://event.ifi.uni-heidelberg.de

References

[1] C. D. Manning et al.: The Stanford CoreNLP Natural Language Processing Toolkit. ACL’14, 2014

[2] A. Spitz, J. Geiß, and M. Gertz: So Far Away and Yet so Close: Augmenting Toponym Disambiguation and Similarity With Text-Based Networks. GeoRich’16, 2016

This work was presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP ’17), Sept. 7-11, 2017, Copenhagen, Denmark.
