heidelplace: an extensible framework for geoparsing · ludwig richter, johanna geiß, andreas spitz...

1
HeidelPlace: An Extensible Framework for Geoparsing Ludwig Richter, Johanna Geiß, Andreas Spitz and Michael Gertz Database Systems Research Group, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany What is Geoparsing? Why HeidelPlace? Generic Gazetteer Data Model Motivation: Geoparsing is a key task in text processing and central to sub- sequent spatial analyses. It describes the process of identifying place mentions (topoynms) and linking them to unam- biguous spatial references. For example, geoparsing “Heidelberg was founded in 1196 AD” may reveal a reference to the German town Heidelberg. Problem: Several geoparsers are avail- able, each with their own gazetteer and toponym recognition and resolution ap- proaches. However, they often lack ex- tensibility, implementations are not ac- cessible, or they are fixed to a particu- lar gazetteer. This makes adjustments to other application domains difficult and prevents easy experimental setup. HeidelPlace provides: A generic gazetteer model supporting integration of place information from heterogeneous knowledge bases A pipeline approach enabling implemen- tation and combination of modules for specific geoparsing applications GUIs for gazetteer browsing and testing developed modules This makes HeidelPlace a unique and valuable tool for experimenting with new geoparsing approaches. The Architecture of HeidelPlace Outlook The architecture of HeidelPlace consists of three major components: An Annotation Pipeline executes the geoparsing process on input documents. It utilizes the Stanford CoreNLP toolkit [1] . An annotation object contains metadata for a processed document. This enables passing information between annotators, which represent processing tasks that operate on a document. An annotation pipeline iteratively executes the tasks. Geoparsing Modules unify the geopars- ing process. For each step in the pro- cess, a module interface is defined that specifies expected in- and output and de- cides when the step can be executed. A Gazetteer serves as a knowledge base for the geoparsing modules. It employs a generic gazetteer data model, which pro- vides a wide spectrum of place informa- tion. A flexible query system allows to efficiently search the gazetteer for places with certain characteristics. This architecture forms a framework for geoparsing that can be easily extended and customized. Getting ready for practical use: Include more modules for state-of-the-art methods, e.g., co-occurrence based to- ponym disambiguation [2] . Further extensions of HeidelPlace: Quantitative evaluation framework Gazetteer web service UIMA component Gazetteer data editor Gazetteer Browsing Desktop App Gazetteer Viewer: Selection of search filters Detailed view of place information Intuitive navigation and visualization Performing & Analyzing Geoparsing Desktop App Geoparser Viewer: Text input provided by user Selection of (pre-configured) modules Step-by-step geoparsing Output visualization for each step Interactive exploration of geoparsing Comparing Geoparsing Methods Desktop App Geoparser Viewer: Run multiple configurations at once Demonstrate handling of corner cases Understand differences of methods Experiment with module combinations Contact Information: Ludwig Richter [email protected] http://event.ifi.uni-heidelberg.de References [1] C. D. Manning, et al.: The Stanford CoreNLP Natural Language Processing Toolkit. ACL’14, 2014 [2 ] A. Spitz, J. Geiß, and M. Gertz: So Far Away and Yet so Close: Augmenting Toponym Disam- biguation and Similarity With Text-Based Networks. GeoRich’16, 2016 This work was presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP ’17), Sept. 7-11, 2017, Copenhagen, Denmark.

Upload: others

Post on 26-Sep-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: HeidelPlace: An Extensible Framework for Geoparsing · Ludwig Richter, Johanna Geiß, Andreas Spitz and Michael Gertz Database Systems Research Group, Heidelberg University, Im Neuenheimer

HeidelPlace: An Extensible Framework for GeoparsingLudwig Richter, Johanna Geiß, Andreas Spitz and Michael Gertz

Database Systems Research Group, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany

What is Geoparsing? Why HeidelPlace? Generic Gazetteer Data Model

Motivation: Geoparsing is a key taskin text processing and central to sub-sequent spatial analyses. It describesthe process of identifying place mentions(topoynms) and linking them to unam-biguous spatial references. For example,geoparsing “Heidelberg was founded in1196 AD” may reveal a reference to theGerman town Heidelberg.

Problem: Several geoparsers are avail-able, each with their own gazetteer andtoponym recognition and resolution ap-proaches. However, they often lack ex-tensibility, implementations are not ac-cessible, or they are fixed to a particu-lar gazetteer. This makes adjustmentsto other application domains difficult andprevents easy experimental setup.

HeidelPlace provides:•A generic gazetteer model supporting

integration of place information fromheterogeneous knowledge bases

•A pipeline approach enabling implemen-tation and combination of modules forspecific geoparsing applications

•GUIs for gazetteer browsing and testingdeveloped modules

This makes HeidelPlace a unique andvaluable tool for experimenting with newgeoparsing approaches.

The Architecture of HeidelPlace Outlook

The architecture of HeidelPlace consistsof three major components:

An Annotation Pipeline executes thegeoparsing process on input documents.It utilizes the Stanford CoreNLP toolkit[1].An annotation object contains metadatafor a processed document. This enablespassing information between annotators,which represent processing tasks thatoperate on a document. An annotationpipeline iteratively executes the tasks.

Geoparsing Modules unify the geopars-ing process. For each step in the pro-cess, a module interface is defined thatspecifies expected in- and output and de-cides when the step can be executed.

A Gazetteer serves as a knowledge basefor the geoparsing modules. It employs ageneric gazetteer data model, which pro-vides a wide spectrum of place informa-tion. A flexible query system allows toefficiently search the gazetteer for placeswith certain characteristics.

This architecture forms a framework forgeoparsing that can be easily extendedand customized.

Getting ready for practical use:Include more modules for state-of-the-artmethods, e.g., co-occurrence based to-ponym disambiguation[2].

Further extensions of HeidelPlace:•Quantitative evaluation framework•Gazetteer web service•UIMA component•Gazetteer data editor

Gazetteer Browsing

Desktop App Gazetteer Viewer:•Selection of search filters•Detailed view of place information• Intuitive navigation and visualization

Performing & Analyzing Geoparsing

Desktop App Geoparser Viewer:•Text input provided by user•Selection of (pre-configured) modules•Step-by-step geoparsing•Output visualization for each step• Interactive exploration of geoparsing

Comparing Geoparsing Methods

Desktop App Geoparser Viewer:•Run multiple configurations at once•Demonstrate handling of corner cases•Understand differences of methods•Experiment with module combinations

Contact Information:Ludwig [email protected]://event.ifi.uni-heidelberg.de

References[1 ] C. D. Manning, et al.: The Stanford CoreNLP Natural Language Processing Toolkit. ACL’14, 2014[2 ] A. Spitz, J. Geiß, and M. Gertz: So Far Away and Yet so Close: Augmenting Toponym Disam-

biguation and Similarity With Text-Based Networks. GeoRich’16, 2016

This work was presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP ’17), Sept. 7-11, 2017, Copenhagen, Denmark.