Volunteering assistance to online geocoding services through a distributed knowledge solution




RICH-VGI: enRICHment of volunteered geographic information (VGI) - AGILE 2015

José Pablo Gómez-Barrón
Universidad Politécnica de Madrid
Camino de la Arboleda s/n, Km 7 de la Carretera de Valencia, 28031 Madrid, Spain
[email protected]

Miguel Ángel Manso
Universidad Politécnica de Madrid
Camino de la Arboleda s/n, Km 7 de la Carretera de Valencia, 28031 Madrid, Spain
[email protected]

Ramón Alcarria
Universidad Politécnica de Madrid
Camino de la Arboleda s/n, Km 7 de la Carretera de Valencia, 28031 Madrid, Spain
[email protected]

Abstract

Geocoding unstructured or poor-quality addresses requires human supervision in order to obtain valuable data. The current availability of geocoding web-service technologies that enable the deployment of collaborative applications, together with the existence of volunteer communities, has motivated the proposal of a platform that generates collaborative geocoding tasks relying on the available solutions in order to get accurate results in the short term. In this work we present the design and development of a tool that facilitates the geocoding process for volunteers. We implement some strategies to give volunteers elaborated information about web-service geocoding results and the capacity to propose positions different from those suggested. All the information is registered in a database model to enable later analysis or accuracy studies.

Keywords: Geocoding task, web services, VGI, crowdsourcing, collaborative platform.

1 Introduction

In recent years there has been a growing impulse in areas such as business, marketing, and public management and services to position geospatial technologies as an efficient way to integrate, visualize and analyse spatial data, to answer questions from a location perspective and to obtain knowledge. The geocoding process consists of assigning a geographic coordinate pair to a particular place by comparing its descriptive location elements with those in a reference database [1, 2, 3]. The geocoder then searches the reference data, assigns a score to each potential candidate, filters the candidates by a minimum match score and delivers the best match [3].

In recent years, the main companies that offer digital mapping services, like Google, Yahoo or Here Maps, and open-data mapping platforms like OpenStreetMap have been improving their web service technologies and APIs (Application Programming Interfaces) to tackle geocoding complexity and to make it transparent to end users. End users must analyse the quality of the geocoded results of each service to choose the best option for their applications [4] and data characteristics. Service providers are responsible for maintaining the reference matching data and for improving predefined algorithms, so users cannot customize the geocoder settings or rules to shape the response according to their needs or specific input data.

While geocoding services give immediate output with high match rates, require only basic user knowledge and have low or no cost, their characteristics sometimes produce low-quality results, mainly when the input is ambiguous. Most of these online services report indicators of result quality, like the calculation method used or the entity type obtained. These values can be helpful as a guideline to understand the output and for data quality assessment, complementing common data quality metrics like completeness, positional accuracy, repeatability [3] and similarity [4].

When the input consists of unstructured named places (addresses), the variety of responses from online geocoding services can be an advantage. The platform presented in this paper proposes to combine and analyse the outputs of different geocoders as options for incomplete or imprecise data. Based on crowdsourced geospatial data [5, 6] and Volunteered Geographic Information [7] approaches, the platform facilitates the online assistance of users to analyse the quality and geographic precision of geocoding results. Users identify and save the best candidate or geocode the address manually, relying on the cognitive abilities and local knowledge of the collaborators. An address or location represents the entry of a geocoding task and is completed by distributed users; therefore a comparative evaluation can be made using the platform database.

The rest of the article is organized as follows. Section 2 describes the platform components and development; Section 3 presents the data management and flow within the platform and the user interaction. Finally, conclusions and future work are presented in Section 4.

2 Platform Development

The main motivation for this platform is the difficulty of geolocating places and addresses, stored in a relational database, that are ambiguous or contain partial information. An address that lacks some of the normalized descriptive location elements may cause a geocoder algorithm to misinterpret individual values and fail to find the correct geographic location. For example, a street number could be taken for a postal code, or a street name could be interpreted as a neighborhood or a locality. As a result, specific elements of a query to an online geocoding service will be incomplete and geocoded outputs may differ. To guarantee the quality of the final geographic layer, human interaction to check and analyze the result is required.

To deal with these obstacles, the platform uses a crowdsourcing approach that facilitates distributed online participation by volunteer users, who process an address and thereby assist the online geocoding services. Supported by a web mapping client for browsing and exploring contextual data and reference geographic layers, the user compares the output locations and quality attributes provided by the services to choose the best option. Finally, the platform saves the user-selected response as the geocode of the address, together with the coordinates and quality information of each geocoder, with the aim of building a data model that enables further evaluation of geocoder quality using the redundant answers of the collaborators.

Based on the identified problem and presented objectives the platform implements two main components with some secondary or support elements. Figure 1 shows the model that corresponds with the platform capabilities.

The first main component is a management and user profile area where users can create a geocoding task or register in an existing task to participate. The second component is where the geocoding task takes place. Its user interface relies on a short random list of locations or addresses as input, and a map that enables the user to examine the outputs of the different geocoders.

The remaining support components are the landing page, where the project is described with its objectives, utility and instructions for use, and the processes to register and sign in to the platform, since all tasks, actions and displayed data are linked to a particular user.

The platform is primarily built with the Python and JavaScript languages. We use Django, a high-level Python web framework, to facilitate a clean design and a more organized application structure, with server-side Python models and functions defined to process the POST and AJAX requests made by the HTML/JavaScript client. Moreover, the Django template language allows variables inside template tags, which makes it easy to pass output values and context data and render them on the web client. Django also simplifies platform security: authentication, registration and account management of the platform users are integrated by installing the “django-allauth” application within the project.

All data on users, tasks and task rewards are managed using Django models, Python classes that provide an automatically generated database-access API. Each model contains the fields and behaviours of its data and maps to a single database table implemented in PostgreSQL.

The server-side function that processes an address to be geocoded imports and uses the “Python Geocoder API” library. This Python wrapper client supports the most popular geocoding web services and converts their different responses into a consistent, unified JSON response. In our platform, the Google, Bing, Here and OSM (Nominatim) providers are enabled and can be requested through this library to send the geocoders' output to the client.
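The server-side step described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the function names `collect_candidates` and `default_providers` and the `FIELDS` tuple are hypothetical, and the providers are injected as plain callables so that the logic can be exercised without network access. The sketch assumes that each result object exposes `ok`, `latlng`, `address`, `accuracy`, `quality` and `confidence` attributes, as the geocoder library's result objects do.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical field set: accuracy, quality and confidence are the three
# quality attributes the platform stores next to each location.
FIELDS = ("latlng", "address", "accuracy", "quality", "confidence")


def collect_candidates(
    address: str, providers: Dict[str, Callable[[str], Any]]
) -> Dict[str, Optional[dict]]:
    """Query every provider and return one uniform record per provider.

    A provider that raises or reports a failed match yields None, so the
    client can still display the remaining candidates.
    """
    candidates: Dict[str, Optional[dict]] = {}
    for name, geocode in providers.items():
        try:
            result = geocode(address)
        except Exception:
            candidates[name] = None
            continue
        if not getattr(result, "ok", False):
            candidates[name] = None
            continue
        candidates[name] = {field: getattr(result, field, None) for field in FIELDS}
    return candidates


def default_providers() -> Dict[str, Callable[[str], Any]]:
    """Wire in the real library (needs network access and provider API keys)."""
    import geocoder  # imported lazily so the sketch loads without the package

    return {
        "google": geocoder.google,
        "bing": geocoder.bing,
        "here": geocoder.here,
        "osm": geocoder.osm,
    }
```

With the library installed and credentials configured, `collect_candidates("Calle Mayor 1, Madrid", default_providers())` would return one record per enabled service, ready to be serialized as JSON for the map client.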

Figure 1: Crowd-geocoding platform component model


For querying and storing addresses and task results, the platform uses the CartoDB geospatial database, which is internally based on PostgreSQL and PostGIS, to manage the geographic outputs. The client uses the SQL JavaScript API to obtain random addresses, with the constraint that a user can process each address only once, so the SELECT query takes the logged-in user id into account. To insert the user-selected option and each geocoder output into the CartoDB results table, the Python client API for the CartoDB SQL service is used. This library supports the OAuth 2.0 open standard, and the platform server-side scripts use it in a way that is transparent to the end user.
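The "only once per user" constraint on the random address list could be expressed with a query along the following lines. The table and column names are hypothetical, since the actual schema is not given here, and an in-memory SQLite database stands in for CartoDB/PostgreSQL so that the example is self-contained. The same `NOT EXISTS` and `ORDER BY RANDOM()` pattern is valid in PostgreSQL, although the parameter placeholder syntax depends on the client used.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not specified.
SCHEMA = """
CREATE TABLE addresses (id INTEGER PRIMARY KEY, task_id INTEGER, text TEXT);
CREATE TABLE results (address_id INTEGER, user_id INTEGER, chosen TEXT);
"""

# NOT EXISTS drops every address this user has already answered, and
# ORDER BY RANDOM() shuffles what is left before LIMIT keeps a short list.
PENDING_SQL = """
SELECT a.id, a.text
FROM addresses AS a
WHERE a.task_id = ?
  AND NOT EXISTS (
      SELECT 1 FROM results AS r
      WHERE r.address_id = a.id AND r.user_id = ?
  )
ORDER BY RANDOM()
LIMIT ?
"""


def pending_addresses(conn, task_id, user_id, limit=5):
    """Return up to `limit` random addresses of a task not yet answered by the user."""
    return conn.execute(PENDING_SQL, (task_id, user_id, limit)).fetchall()
```

Binding the user id as a query parameter, rather than concatenating it into the SQL text, keeps the query safe when it is built from request data.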

Finally, the web client uses libraries such as jQuery and Bootstrap for easier web development, and Leaflet.js and Mapbox.js to provide interactive web maps. These maps are composed of reference base layers and overlay data generated from the geographic coordinates returned by the geocoder services.

3 Data Management and User Interaction

3.1 Platform-User Interaction

User interaction with the platform components is described in Figure 2 and the geocoding task process is related with the user interface elements presented in Figure 3.

Figure 2: Flowchart with main processes.

After registration and login, users are directed to the user profile and task management area. Inside this area users can manage their information as task creators or collaborators and begin tasks. Some of the actions enabled in this area are: create, explore and select tasks; upload and download address data; explore user and task progress and statistics; and manage rewards and communications with collaborators. A geocoding task can also be edited or customized with specific settings related to the particular needs of a project, such as the selection of geocoder providers, predefined area filters for geocoder results, the number of task answers needed for an address to be considered completed, or the number of null task answers after which an address is considered invalid.
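The two answer thresholds mentioned above could be applied as in the following sketch. The function name, the status labels and the rule that only non-null answers count towards completion are assumptions made for illustration; the precedence between the two thresholds is likewise an assumption.

```python
from typing import Optional, Sequence


def address_status(
    answers: Sequence[Optional[str]], answers_needed: int, null_limit: int
) -> str:
    """Classify an address from the answers collected so far.

    Each answer is the name of the option a collaborator selected, or None
    when the collaborator could not geocode the address (a null answer).
    """
    nulls = sum(1 for answer in answers if answer is None)
    valid = len(answers) - nulls
    # Assumption: the invalid check takes precedence over completion.
    if nulls >= null_limit:
        return "invalid"
    if valid >= answers_needed:
        return "completed"
    return "open"
```

For example, with `answers_needed=3` and `null_limit=2`, an address with answers `["google", "osm", "manual"]` is completed, while one with `["google", None, None]` is flagged as invalid.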

To begin the assisted geocoding process, the user selects an address or location from the options listed in the left panel of Figure 3 and makes a request to the server to geocode the selected input. Then, in the center and right panels (Figure 3), respectively, the user explores and interacts with the locations on the map and compares the data quality attributes provided by the geocoding library. Based on these data and on contextual information obtained by toggling the base map layers, the user chooses the geocoding service considered most accurate or suggests a new point on the map as the geocoding output.

3.2 Data Management

In addition to the geocoded location, three parameters are stored in the database: accuracy, quality and confidence. The accuracy value describes the method used to calculate the location: whether the result was approximated, interpolated or is the geometric center of a line or polygon area, or whether it corresponds to a precise match such as a postal address. The quality value represents the match level or granularity of the output, related to a location entity type such as street address or house number, route or street, intersection, neighbourhood, postal code, etc.

The third parameter used to evaluate the geocoder output quality, confidence, follows the OpenCage API calculation method, which uses the bounding box returned by each API to derive a confidence value between 0 and 10. The confidence is calculated by measuring the distance in kilometres between the South West and North East corners of the bounding box associated with each result; smaller distances represent a higher confidence, while larger distances represent a lower confidence. In the geocoding platform this parameter can help users select the best output, since it is a normalized score that is comparable between the different geocoding services.
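A bounding-box confidence of this kind might be computed as sketched below. The distance thresholds are an assumption based on the table published in the OpenCage documentation and should be checked against the current version; the haversine great-circle formula is also an assumption, since the distance measure actually used is not stated. A score of 0 is returned when no bounding box is available.

```python
import math
from typing import Optional, Tuple

LatLng = Tuple[float, float]

# (upper distance limit in km, score); assumed from the OpenCage documentation.
THRESHOLDS_KM = [
    (0.25, 10), (0.5, 9), (1.0, 8), (5.0, 7), (7.5, 6),
    (10.0, 5), (15.0, 4), (20.0, 3), (25.0, 2),
]


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def bbox_confidence(southwest: Optional[LatLng], northeast: Optional[LatLng]) -> int:
    """Map the diagonal of a result's bounding box to a 0-10 confidence score."""
    if southwest is None or northeast is None:
        return 0  # confidence cannot be determined without a bounding box
    distance = haversine_km(southwest, northeast)
    for limit, score in THRESHOLDS_KM:
        if distance < limit:
            return score
    return 1  # diagonal of 25 km or more
```

Because the score depends only on the size of the bounding box, it can be computed in the same way for every provider, which is what makes it comparable across services.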


The platform benefits from the user evaluation tasks to compare the performance of the geocoding services, storing all their information. The data model designed for this purpose is presented in Figure 4.

4 Conclusions and Future Work

We present a crowdsourced collaborative approach to deal with a current problem in the use of online geocoding services. Our approach facilitates user interaction to control and evaluate the accuracy of geocoded outputs, relying on the number of collaborators reviewing results and on the combination of diverse reference sources to increase data availability.

With the task results database, users and researchers can generate descriptive statistics and make a comparative evaluation or data quality assessment using common geocoder metrics like completeness, positional accuracy against baseline data and similarity between services.
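As one illustration of such an assessment, the mean positional error of each geocoder against baseline locations could be derived from the stored results roughly as follows. The row layout and the function names are hypothetical, and the haversine distance is an assumed choice of metric.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

LatLng = Tuple[float, float]


def haversine_km(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in kilometres between two (lat, lng) points."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2
    )
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def mean_error_km(
    rows: Iterable[Tuple[str, Optional[LatLng], LatLng]]
) -> Dict[str, float]:
    """Average distance to the baseline per geocoder.

    Each row is (geocoder name, geocoded point or None, baseline point);
    rows without a geocoded point are skipped here and would instead count
    against the completeness metric.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for name, geocoded, baseline in rows:
        if geocoded is None:
            continue
        totals[name] += haversine_km(geocoded, baseline)
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in counts}
```

The same distance function applied to pairs of geocoder outputs for one address would give a simple similarity measure between services.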

As future work, we will allow users to correct or modify the input address text to reduce the ambiguity of the entry to the geocoding process, and to iterate the process with the accumulated text edits. We will also enable users, at task creation, to indicate the entity or geographic feature that corresponds to the input address to geocode (e.g. swimming pools), so that the user can identify the most accurate geocoder with respect to the searched physical object in the base map.

Figure 4: General data model.

Figure 3: Crowd-geocoding platform user interface


References

[1] H. A. Karimi, M. Durcik, and W. Rasdorf, “Evaluation of uncertainties associated with geocoding techniques,” Comput. Civ. Infrastruct. Eng., vol. 19, no. 3, pp. 170–185, 2004.

[2] D. W. Goldberg, J. P. Wilson, and C. A. Knoblock, “From Text to Geographic Coordinates: The Current State of Geocoding,” URISA J., vol. 19, pp. 33–46, 2007.

[3] P. A. Zandbergen, “A comparison of address point, parcel and street geocoding techniques,” Comput. Environ. Urban Syst., vol. 32, no. 3, pp. 214–232, 2008.

[4] D. Roongpiboonsopit and H. A. Karimi, “Comparative evaluation and analysis of online geocoding services,” Int. J. Geogr. Inf. Sci., vol. 24, no. 7, pp. 1081–1100, 2010.

[5] A. Hudson-Smith, M. Batty, A. Crooks, and R. Milton, “Mapping for the masses: accessing web 2.0 through crowdsourcing,” Soc. Sci. Comput. Rev., vol. 27, no. 4, pp. 524–538, 2009.

[6] C. Heipke, “Crowdsourcing geospatial data,” ISPRS J. Photogramm. Remote Sens., vol. 65, no. 6, pp. 550–557, Nov. 2010.

[7] M. F. Goodchild, “Citizens as sensors: the world of volunteered geography,” GeoJournal, vol. 69, no. 4, pp. 211–221, Nov. 2007.