qualità dei dati openstreetmap: sperimentazioni sulla città di milano e risultati

Post on 12-Jan-2017

265 Views

Category:

Technology

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Qualità dei dati OpenStreetMap:sperimentazioni sulla città di Milano e risultati

Marco Minghini & Monia Elisa Molinari

Politecnico di Milano | GEOlab

OSMit 2016Milano, 20 maggio 2016

Outline✔ We will present some ongoing research works on OpenStreetMap:

➔ An automated procedure to compare OSM and authoritative road network datasets

➔ An automated methodology for converting OSM data into a Land Use/Cover map

➔ Positional accuracy assessment of the OSM buildings in Milan

➔ Open geodata quality in Milan

GEOlab, Como Campus

An automated procedure to compare OSM and authoritative road network datasets

Politecnico di Milano, Como Campus, DICA, via Valleggio 11, 22100 Como (Italy)

Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli

2

✔ Increasing popularity of OpenStreetMap (OSM) as today's most notable Volunteered Geographic Information (VGI) project on the Internet

Motivation of the work – VGI & OSM quality

GEOlab, Politecnico di Milano – Como Campus

✔ Increasing concern on VGI (and OSM) data quality:➔ spatial accuracy➔ completeness➔ temporal accuracy➔ semantic accuracy➔ up-to-dateness

✔ Increasing availability of open data from NMAs and CSC that can be used as a source of comparison for VGI (and OSM) data:

➔ comparing two spatial datasets against each other is a challenging geocomputation problem!

3

✔ Literature provides plenty of works assessing or comparing OSM quality against that of authoritative datasets:

Motivation of the work – OSM comparisons

➔ strongly focused on road network➔ OSM compared to data from NMA (UK Ordnance Survey, French

NMA, USGS TNM/TIGER, etc.) and CSC (Navteq, TeleAtlas, etc.)➔ semi- or fully-automated

✔ Comparison techniques are very strong and fit for purpose, but mostly application and dataset specific:

➔ hard to replicate➔ difficult to extend to other dataset comparisons

GEOlab, Politecnico di Milano – Como Campus

4

✔ Novel methodology to compare OSM and authoritative road datasets:

Our methodology

➔ fully automated➔ focused on spatial accuracy and completeness➔ flexible, i.e. not developed for a specific dataset

➔ built with FOSS4G (Free and Open Source Software for Geospatial)✗ reusable and extensible in case of need

GEOlab, Politecnico di Milano – Como Campus

5

➔ 1. Preliminary comparison of the datasets and computation of global statistics

➔ 2. Geometric preprocessing of the OSM dataset to extract a subset which is fully comparable with the reference dataset

➔ 3. Evaluation of OSM spatial accuracy using a grid-based approach

✔ Currently developed as 3 GRASS GIS modules:

Our methodology – Overview

➔ written in Python➔ available with a Graphical User Interface (GUI)

✔ Comparison between OSM and reference road network datasets composed of 3 consecutive steps:

GEOlab, Politecnico di Milano – Como Campus

6

Case study: Paris

GEOlab, Politecnico di Milano – Como Campus

data © IGN and © OpenStreetMap contributors

7

✔ Compute the total length of OSM and IGN datasets and their length difference, both in map units and percentage

Step 1: Preliminary comparison of the datasets

➔ output values are returned in a text file

GEOlab, Politecnico di Milano – Como Campus

✗ ≅450 km more in OSM than IGN dataset!

8

✔ Compute the length of OSM and IGN datasets contained in a user-specified buffer around the IGN and OSM dataset, respectively

Step 1: Preliminary comparison of the datasets

➔ output values are returned in a text file

GEOlab, Politecnico di Milano – Como Campus

✗ ≅450 km more in OSM than IGN dataset!

9

✔ Cleaning of OSM dataset to make it comparable with IGN dataset

Step 2: preprocessing of the OSM dataset

GEOlab, Politecnico di Milano – Como Campus

✔ Apply a buffer of user-specified width around each IGN segment

➔ suitable buffer width derived from Step 1➔ delete all the OSM roads falling outside the buffer

10

Step 2: preprocessing of the OSM dataset

➔ compute the angular coefficient of each IGN segment and all the OSM segments included in the buffer around it

➔ compare the difference between IGN and OSM angular coefficients with a user-specified threshold

✔ Further clean the OSM dataset:

GEOlab, Politecnico di Milano – Como Campus

11

Step 2: preprocessing of the OSM dataset

✔ Further clean the OSM dataset:

GEOlab, Politecnico di Milano – Como Campus

12

✔ Outputs from Step 2 are saved and can be used for further analysis:

Step 2: preprocessing of the OSM dataset

➔ Area 2: buffer = 11 m, angular coefficient threshold = 30°

GEOlab, Politecnico di Milano – Como Campus

✗ preprocessed OSM has 50 km less than original OSM≅✗ preprocessed OSM has still 50 km more than IGN≅

13

✔ Use a grid to take into account OSM heterogeneous nature:

Step 3: grid-based evaluation of OSM accuracy

➔ import a vector layer to be used as grid

GEOlab, Politecnico di Milano – Como Campus

14

✔ For each grid cell, find the OSM maximum deviation from IGN:

Step 3: grid-based evaluation of OSM accuracy

GEOlab, Politecnico di Milano – Como Campus

➔ Area 2: perc=98%

5 - 6 m

6 - 7 m

7 - 8 m

8 - 9 m

9 - 10 m

10 - 11 m

15

✔ For each grid cell, evaluate OSM accuracy against one or more threshold values of OSM deviation from IGN:

Step 3: grid-based evaluation of OSM accuracy

➔ length percentage of OSM roads included in the threshold buffer➔ Area 2: tol_eval = 6 m

GEOlab, Politecnico di Milano – Como Campus

85 - 90%90 - 95%95 - 100%

16

✔ Work in progress, currently available just for Step 1

Transposition of the procedure as a WPS

GEOlab, Politecnico di Milano – Como Campus

➔ available at http://131.175.143.84/WPS

17

Future work

➔ reduce computational time (especially for Step 2)➔ increase usability through a WPS implementation (also for Steps 2 & 3)➔ extend the procedure to also compare attributes

GEOlab, Politecnico di Milano – Como Campus

18

Links & publications

GEOlab, Politecnico di Milano – Como Campus

✔ Related publications:➔ Brovelli M. A., Minghini M., Molinari M. and Mooney P (in press).

Towards an automated comparison of OpenStreetMap with authoritative road datasets. Transactions in GIS.

➔ Antunes F., Fonte C. C., Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) Assessing OSM Road Positional Quality with Authoritative Data. Proceedings of the VIII Conferência Nacional de Cartografia e Geodesia, Lisbon (Portugal), October 29-30, 2015.

➔ Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) A FOSS4G-based procedure to compare OpenStreetMap and authoritative road network datasets. Geomatics Workbooks 12, 235-238, ISSN 1591-092X.

✔ Links:

➔ source code: https://github.com/MoniaMolinari/OSM-roads-comparison ➔ WPS client: http://131.175.143.84/WPS

An automated methodology for convertingOSM data into a Land Use/Cover map

Marco Minghini, Cidália Fonte, Vyron Antoniou, Linda See, Joaquim Patriarca, Maria Antonia Brovelli, Grega Milcinski

EU COST Actions on VGI✔ COST (Cooperation in Science and Technology) is an EU framework to

provide networking of nationally funded research activities.✔ We are currently involved in 2 EU COST Actions on Volunteered

Geographic Information (VGI):

➔ COST Action TD1202 - “Mapping and the Citizen Sensor” (http://www.citizensensor-cost.eu)

➔ COST Action IC1203 - “European Network Exploring Research into Geospatial Information Crowdsourcing: software and methodologies for harnessing geographic information from the crowd (ENERGIC)” (http://vgibox.eu)

Land Use/Land Cover (LULC) maps✔ LULC maps are crucial products for multiple areas of application:

➔ the creation and updating process is long, costly, and time-consuming – insufficient to describe rapidly-changing environments

➔ level of detail and spatial coverage inadequate for many applications

✔ LULC maps are created through the classification of satellite imagery

and validated using reference data:

➔ modeling climate and biochemistry of the Earth

➔ natural resources managem

➔ planning/urban studies

➔ many others

OSM as a source of LULC maps✔ Exploiting OSM as a source for LULC maps has a number of advantages:

✔ Exploiting OSM as a source for LULC maps has some disadvantages:

➔ OSM full spatial coverage in the world

➔ OSM richness

➔ OSM non-stop updating

➔ OSM open license

➔ OSM uneven spatial coverage

➔ OSM positional accuracy & geometrical inconsistencies

➔ OSM semantic inconsistencies

✔ Purpose: creating an automated procedure which converts OSM data in

a specific area into a LULC map

OSM as a source of LULC maps

Level 1 Level 2 Level 3

1. Artificial Surfaces 1.1 Urban Fabric1.1.1 Continuous urban fabric1.1.2 Discontinuous urban fabric1.1.3 Isolated Structures

1.2 Industrial, commercial, public, military, private and transport units

1.2.1 Industrial, commercial, public, military and private units

1.2.2 Road and rail network and associated land1.2.3 Port areas1.2.4 Airports

1.3 Mine, dump and construction sites

1.3.1 Mineral extraction and dump sites1.3.3 Construction sites1.3.4 Land without current use

1.4 Artificial non-agricultural vegetated areas

1.4.1 Green urban areas1.4.2 Sports and leisure facilities

2. Agricultural, semi-natural areas, wetlands

3. Forests

5. Water

✔ The nomenclature chosen for LULC classes is the one of EU Urban Atlas:

Procedure to convert OSM to LULC maps✔ The procedure consists of transforming available OSM features into

LULC corresponding to Urban Atlas (UA) levels 1 and 2

✔ At this initial stage of the procedure, the only keys considered are:

➔ OSM ways (polylines)

➔ this is done considering any OSM data that might be associated with classes in all three levels of the UA nomenclature

✗ highway✗ railway✗ waterway

➔ OSM ways (polygons)

✗ building✗ landuse✗ natural

✔ The procedure is implemented in Python and makes use of GRASS GIS,

GDAL/OGR Python bindings and PostgreSQL/PostGIS.

Procedure to convert OSM to LULC maps✔ The main steps of the procedure are:

➔ identification of the keys available in the OSM data to be processed

➔ conversion of the linear features into areas using spatial analysis, and merging them with areas in the polygon features that have values of predefined keys corresponding to the themes of the linear features

➔ conversion of the polygon features to the LUCC according to themes

➔ application of priority rules to solve remaining inconsistencies

➔ conversion of the map into the appropriate Minimum Mapping Unit (MMU), merging small features with their neighboring features

Example: roads (highway=*) UA class 1.2→✔ The main steps of the conversion are:

➔ choose the values of “highway” key to be considered for the conversion

➔ identify a maximum and typical width for each road type in the region of interest (typically larger for primary roads and smaller for other less important roads); predefined default values may also be considered

➔ compute the distance between each road segment and the buildings within the maximum defined width of roads; store the minimum value

➔ generate an area feature for each road segment where the distance to buildings is larger than zero, using the width to generate a buffer; for those segments where the distance to buildings was not obtained, the typical width chosen above is applied

➔ merge and dissolve the created buffers

Procedure to convert OSM to LULC maps✔ Priority rules are defined to solve remaining inconsistencies:

➔ some classes prevail on other according to a hierarchy of importance

Level of priority

UA Class (level 2)

Class Description

1 1.2Industrial, commercial, public,

military, private and transport units

2 5.0 Water

3 1.4Artificial non-agricultural

vegetated areas

4 1.3 Mine, dump and construction sites

5 1.1 Urban Fabric

6 2.0Agricultural, semi-natural areas,

wetlands

7 3.0 Forests

Case studies✔ Two squared areas of 10 km on one side for a total area of 100 km2:

➔ 60 km NW of Paris – agriculture and forest region, low urban density

➔ Milan city centre – very dense urban area, small agricultural and rural areas, parks and a cemetery

Results✔ Paris

Results✔ Milan

Conclusions✔ Results are generally satisfactory:

➔ very good correspondence with UA classes

➔ OSM advantages: up-to-dateness, higher resolution data

✔ There are a number of problems:➔ empty areas in the final LULC map

✗ no OSM data at all✗ use of GeoFabrik data, where some data are missing✗ many tags not (yet) taken into account

➔ values chosen for buffers can be highly dependent on specific regions

✔ Next/future developments:➔ switch to the Planet OSM file

➔ increase/detail the number of OSM tags considered

➔ implementation of a Web service to expose the procedure on the Web

Reference✔ Fonte C.C., Minghini M., Antoniou V., See L., Patriarca J., Brovelli M.A. &

Milcinski, G. (in press) “An automated methodology for converting OSM

data into a Land Use/Cover map”. Proceedings of the 6th International

Conference on Cartography & GIS, Albena, Bulgaria, 13-17 June 2016.

Positional accuracy assessment of the OSM buildings in Milan

Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli, Giorgio Zamboni

Politecnico di Milano | GEOlab

Introduction✔ Earliest works on OSM quality assessment were all focused on

streets, that were the initial mapping target of the OSM project.

✔ In recent years, researches on other objects of OSM database began to

appear:✔ point of interest✔ land use features✔ building footprints

Objective✔ The purpose of this study is to contribute to the quality assessment of

OSM building data quality in Milan.

✔ The quality assessment has been performed by comparing the OSM

data (downloaded in January 2016) with the building layer of the

official vector cartography of Milan Municipality (produced in 2012)

✔ Two different analyses have been performed:

✔ Completeness evaluation based on methods suggested by

literature

✔ Positional accuracy evaluation by means of a novel approach

developed at GEOlab - PoliMI

Completeness assessment✔ The completeness analysis has been performed through the area ratio

unit-based method proposed by Hecht et al. (2013)

C = AOSM

/AREF

C = completeness

AREF

= total area of reference buildings

AOSM

= total area of OSM buildings

The completeness parameter should be calculated within a

defined spatial unit (e.g. administrative or geometrical) in order

to take into account the heterogeneity of OSM data.

Completeness assessment● The area ratio method can introduce an overestimation of C due to

exceeding data available in OSM. For this reason the computation of three additional rates is recommended:

● True Positive (TP): the areas of agreement between the datasets

● False Positive (FP): the OSM building areas which do not exist in the REF dataset

● False Negative (FN): the REF building areas which do not exist in the OSM dataset.

Completeness analysis results✔ Spatial distribution of C rate in Milan area

The completeness of the OSM dataset is very high in the city center and gradually decreases when moving towards the periphery.

C VALUES %

> 100% 28.9%

80% < C < 100% 27.7%

60% < C < 80% 18.5%

40% < C < 60% 8.0%

C < 40% 16.9%

TP analysis results✔ Spatial distribution of TP rate in Milan area

Results largely confirm the trend observed for C: OSM completeness is high in the city center and gradually lower in the peripheral areas

TP VALUES %

60% < C < 100% 63.9%

60% < C < 40% 16.0%

C < 40% 20.1%

Positional accuracy assessment✔ An advanced algorithm implemented in an application (Brovelli and

Zamboni, 2004) allowed the:✔ Quasi-automated detection of homologous points between REF

and OSM by means of geometric, topological and semantic

analyses.

Positional accuracy assessment✔ An advanced algorithm implemented in an application (Brovelli and

Zamboni, 2004) allowed the:✔ Quasi-automated detection of homologous points between REF

and OSM by means of geometric, topological and semantic

analyses.✔ Application of a set of warping transformations to the OSM dataset

in order to optimize its match with the REF dataset

Positional accuracy analysis results✔ The number of homologous pairs detected is approximately 100000

Cell Points Trasf.ΔY

μ [m]ΔX

μ [m]d

μ [m]

0 19135 None 0.45 0.46 0.81

1-2 16480 None 0.35 0.46 0.77

2 18732 None 0.44 0.43 0.79

3-1 4318 None 0.28 0.41 0.71

The positional accuracy is the same in

both Milan center and periphery

Positional accuracy analysis results✔ Using the homologous points detected it is possible to estimate the

parameters of an affine or spline MR transformation to remove the

systematic translation and reduce the mean distance

Cell Points Trasf.ΔY

μ [m]ΔX

μ [m]d

μ [m]

0 19135

None 0.45 0.46 0.81

Affine 0.00 0.00 0.56

Spline MR 0.00 0.00 0.50

1-2 16480

None 0.35 0.46 0.77

Affine 0.00 0.00 0.57

Spline MR 0.00 0.00 0.50

2 18732

None 0.44 0.43 0.79

Affine 0.00 0.00 0.55

Spline MR 0.00 0.00 0.49

3-1 4318

None 0.28 0.41 0.71

Affine 0.00 0.00 0.55

Spline MR 0.00 0.00 0.48

References✔ Brovelli M.A., Minghini M., Molinari M.E. & Zamboni G. (in press)

“Positional accuracy assessment of the OpenStreetMap buildings layer

through automatic homologous pair detection: the method and a case

study”. International Archives of the Photogrammetry, Remote Sensing

and Spatial Information Sciences. ✔ Brovelli M.A., Zamboni G., 2004. A step towards geographic

interoperability: the automatic detection of maps homologous pairs. In:

Proceedings of Urban Data Management Society Conference (UDMS

’04), Chioggia, Italy, 27-29 October 2004.✔ Hecht, C., Kunze, C., Hahmann, S., 2013. Measuring Completeness of

Building Footprints in OpenStreetMap over Space and Time. ISPRS

International Journal of Geo-Information, 2(4), pp. 1066-1091.

Open geodata quality in MilanMarco Minghini, Monia Elisa Molinari, Miriam

Molteni, Maria Antonia Brovelli

Politecnico di Milano | GEOlab

Open (geo)data✔ Open data (http://opendefinition.org):

✔ On a total of 13 domains of open governance data, the geospatial one

has been recognized as the one with the highest commercial value

➔ permissions: use, modification, separation, redistribution, compilation, non-discrimination, propagation, application to any purpose, no charge

➔ conditions: attribution, integrity, share-alike, notice, source, technical restriction prohibition, non-aggression

(Carrara, W., Chan, W. S., Fischer, S., van Steenbergen, E., 2015. Creating Value through Open Data: Study

on the Impact of Re-use of Public Data Resources. European Commission, European Data Portal, http://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf

✔ Which is the quality of open (geo)data?

➔ focus on Milan Municipality (only data published by Italian institutions)

➔ quality check performed using ISO guidelines

Open geodata in Milan Municipality✔ Classification according to the provider:

Open geodata in Milan Municipality✔ Classification according to the content:

Open geodata in Milan Municipality✔ Classification according to the format:

Open geodata in Milan Municipality✔ Classification according to the scale:

Open geodata in Milan Municipality✔ Classification according to the license:

Open geodata in Milan Municipality✔ Classification according to the year of publication:

Open geodata in Milan Municipality✔ Classification according to the content vs. the provider:

Example of quality evaluation✔ Example of quality (positional accuracy) evaluation:

➔ dataset: orthophoto

➔ provider: Italian Environmental Ministry

➔ producer: AGEA (Agenzia per le Erogazioni in Agricoltura)

➔ content: airborne observations

➔ format: Web Map Service (WMS)

➔ scale: national

➔ license: CC-BY-SA

➔ year of publication: 2012

➔ declared planimetric accuracy: 4 m (3 m, according to AGEA)

✔ Ground truth dataset for quality check:

➔ buildings DBTR Milan Municipality (2012): scale 1:1000, accuracy 20 cm

Example of quality evaluation✔ Procedure:

➔ extract DBTR buildings according to a random stratified sampling on a hexagonal grid (ISO guidelines)

➔ identification of the corresponding buildings on the orthophoto and manual digitization of 3 corners of the roof of each building

➔ computation of accuracy measures on the homologous points

✗ statistics on planimetric errors✗ measures of error confidence

Example of quality evaluation✔ Statistics on planimetric errors (number of homologous pairs = 1450):

Index eX

eY e

μ 1.66 m 3.65 m 4.32 m

Me 1.04 m 2.19 m 3.03 m

σ 2.49 m 5.32 m 4.24 m

min 0.00 m 0.00 m 0.04 m

max 13.03 m 29.70 m 29.94 m

n 245 615 758

Example of quality evaluation✔ Measures of error confidence:

Index Value

CE39.4 4.15 m

CE50 4.89 m

CE90 8.91 m

CE95 10.16 m

CE99.8 14.53 m

Example of quality evaluation✔ Error visualization:

Example of quality evaluation✔ Error visualization:

Reference✔ Brovelli M.A., Minghini M., Molinari M.E. & Molteni M. (in press) “Do

open geodata actually have the quality they declare? The case study of

Milan, Italy”. International Archives of the Photogrammetry, Remote

Sensing and Spatial Information Sciences.

Politecnico di Milano, GEOlab – Como Campus

Via Valleggio 11, 22100 Como (Italy)

marco.minghini@polimi.it, moniaelisa.molinari@polimi.it

@MarcoMinghini, @MoniaMolinari

Marco Minghini & Monia Elisa Molinari

top related