OpenStreetMap data quality: experiments on the city of Milan and results
Marco Minghini & Monia Elisa Molinari
Politecnico di Milano | GEOlab
OSMit 2016, Milan, 20 May 2016
Outline
✔ We will present some ongoing research on OpenStreetMap:
➔ An automated procedure to compare OSM and authoritative road network datasets
➔ An automated methodology for converting OSM data into a Land Use/Cover map
➔ Positional accuracy assessment of the OSM buildings in Milan
➔ Open geodata quality in Milan
GEOlab, Como Campus
An automated procedure to compare OSM and authoritative road network datasets
Politecnico di Milano, Como Campus, DICA, via Valleggio 11, 22100 Como (Italy)
Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli
Motivation of the work – VGI & OSM quality
✔ Increasing popularity of OpenStreetMap (OSM) as today's most notable Volunteered Geographic Information (VGI) project on the Internet
✔ Increasing concern about VGI (and OSM) data quality:
➔ spatial accuracy
➔ completeness
➔ temporal accuracy
➔ semantic accuracy
➔ up-to-dateness
✔ Increasing availability of open data from NMAs and CSC that can be used as a source of comparison for VGI (and OSM) data:
➔ comparing two spatial datasets against each other is a challenging geocomputation problem!
GEOlab, Politecnico di Milano – Como Campus
Motivation of the work – OSM comparisons
✔ Literature provides plenty of works assessing or comparing OSM quality against that of authoritative datasets:
➔ strongly focused on road network
➔ OSM compared to data from NMA (UK Ordnance Survey, French NMA, USGS TNM/TIGER, etc.) and CSC (Navteq, TeleAtlas, etc.)
➔ semi- or fully-automated
✔ Comparison techniques are very strong and fit for purpose, but mostly application and dataset specific:
➔ hard to replicate
➔ difficult to extend to other dataset comparisons
Our methodology
✔ Novel methodology to compare OSM and authoritative road datasets:
➔ fully automated
➔ focused on spatial accuracy and completeness
➔ flexible, i.e. not developed for a specific dataset
➔ built with FOSS4G (Free and Open Source Software for Geospatial)
✗ reusable and extensible in case of need
Our methodology – Overview
✔ Comparison between OSM and reference road network datasets composed of 3 consecutive steps:
➔ 1. Preliminary comparison of the datasets and computation of global statistics
➔ 2. Geometric preprocessing of the OSM dataset to extract a subset which is fully comparable with the reference dataset
➔ 3. Evaluation of OSM spatial accuracy using a grid-based approach
✔ Currently developed as 3 GRASS GIS modules:
➔ written in Python
➔ available with a Graphical User Interface (GUI)
Case study: Paris
data © IGN and © OpenStreetMap contributors
Step 1: Preliminary comparison of the datasets
✔ Compute the total length of OSM and IGN datasets and their length difference, both in map units and percentage
➔ output values are returned in a text file
✗ ≅450 km more in OSM than IGN dataset!
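A minimal sketch of the Step 1 global statistics, assuming each road is a list of (x, y) vertices in projected map units. The function names and the toy coordinates are illustrative and are not taken from the GRASS GIS modules.

```python
import math


def polyline_length(coords):
    """Length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def compare_total_length(osm_lines, ref_lines):
    """Total lengths, their difference, and the difference as a
    percentage of the reference length."""
    osm_len = sum(polyline_length(line) for line in osm_lines)
    ref_len = sum(polyline_length(line) for line in ref_lines)
    diff = osm_len - ref_len
    return {
        "osm": osm_len,
        "ref": ref_len,
        "diff": diff,
        "diff_pct": 100.0 * diff / ref_len,
    }


# Toy data (hypothetical): two OSM roads and one reference road.
osm = [[(0, 0), (3, 4)], [(0, 0), (0, 10)]]
ref = [[(0, 0), (0, 12)]]
stats = compare_total_length(osm, ref)
print(stats)
```

A positive `diff` means OSM is longer than the reference, as in the Paris case study.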
Step 1: Preliminary comparison of the datasets
✔ Compute the length of OSM and IGN datasets contained in a user-specified buffer around the IGN and OSM dataset, respectively
➔ output values are returned in a text file
✗ ≅450 km more in OSM than IGN dataset!
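The buffer-based length can be approximated without a GIS library. The sketch below is an assumption-laden stand-in for a true buffer-and-clip operation: it splits every segment into pieces no longer than `step` and counts a piece when its midpoint lies within `buffer_width` of the other dataset. All names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Projection parameter of p on the segment, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_lines(p, lines):
    """Distance from p to the nearest segment of any polyline."""
    return min(
        point_segment_distance(p, a, b)
        for line in lines
        for a, b in zip(line, line[1:])
    )


def length_within_buffer(lines, other_lines, buffer_width, step=1.0):
    """Approximate length of `lines` lying within `buffer_width`
    of `other_lines` (midpoint test on pieces of at most `step`)."""
    total = 0.0
    for line in lines:
        for a, b in zip(line, line[1:]):
            seg_len = math.dist(a, b)
            if seg_len == 0.0:
                continue
            n = max(1, math.ceil(seg_len / step))
            for i in range(n):
                t = (i + 0.5) / n
                mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
                if distance_to_lines(mid, other_lines) <= buffer_width:
                    total += seg_len / n
    return total


# Toy data (hypothetical): one OSM road 3 m from the reference, one 50 m away.
ref = [[(0, 0), (100, 0)]]
osm = [[(0, 3), (100, 3)], [(0, 50), (100, 50)]]
osm_in_ref_buffer = length_within_buffer(osm, ref, buffer_width=5.0)
ref_in_osm_buffer = length_within_buffer(ref, osm, buffer_width=5.0)
```

Trying several buffer widths and watching where the included length stops growing is one way to pick the width used in Step 2.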
Step 2: preprocessing of the OSM dataset
✔ Cleaning of OSM dataset to make it comparable with IGN dataset
✔ Apply a buffer of user-specified width around each IGN segment
➔ suitable buffer width derived from Step 1
➔ delete all the OSM roads falling outside the buffer
Step 2: preprocessing of the OSM dataset
✔ Further clean the OSM dataset:
➔ compute the angular coefficient of each IGN segment and all the OSM segments included in the buffer around it
➔ compare the difference between IGN and OSM angular coefficients with a user-specified threshold
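The angular check can be sketched as follows, assuming the threshold is an angle in degrees (as in "30°" for Area 2) and that OSM segments whose orientation differs from the IGN segment by more than the threshold are discarded. The slides do not spell out that rule, and the function names are hypothetical.

```python
import math


def orientation_deg(a, b):
    """Orientation of segment a-b in degrees, folded into [0, 180)
    so that the drawing direction does not matter."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 180.0


def angle_difference_deg(seg1, seg2):
    """Smallest angle between the two segment orientations, in [0, 90]."""
    d = abs(orientation_deg(*seg1) - orientation_deg(*seg2))
    return min(d, 180.0 - d)


def keep_similar_orientation(ref_segment, osm_segments, threshold_deg):
    """OSM segments whose orientation is within the threshold of the reference."""
    return [
        s for s in osm_segments
        if angle_difference_deg(ref_segment, s) <= threshold_deg
    ]


# Toy data (hypothetical): a horizontal IGN segment and three OSM candidates.
ign = ((0, 0), (10, 0))
candidates = [
    ((0, 1), (10, 2)),   # nearly parallel: kept
    ((5, -3), (5, 3)),   # perpendicular crossing road: dropped
    ((10, 1), (0, 1)),   # parallel but digitized in reverse: kept
]
kept = keep_similar_orientation(ign, candidates, threshold_deg=30.0)
```

Working with angles rather than raw slopes avoids the infinite angular coefficient of vertical segments.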
Step 2: preprocessing of the OSM dataset
✔ Outputs from Step 2 are saved and can be used for further analysis:
➔ Area 2: buffer = 11 m, angular coefficient threshold = 30°
✗ preprocessed OSM has ≅50 km less than original OSM
✗ preprocessed OSM still has ≅50 km more than IGN
Step 3: grid-based evaluation of OSM accuracy
✔ Use a grid to take into account OSM heterogeneous nature:
➔ import a vector layer to be used as grid
Step 3: grid-based evaluation of OSM accuracy
✔ For each grid cell, find the OSM maximum deviation from IGN:
➔ Area 2: perc=98%
[Map legend, maximum deviation classes: 5-6 m, 6-7 m, 7-8 m, 8-9 m, 9-10 m, 10-11 m]
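One possible reading of the `perc` parameter is that the maximum deviation of a cell is the smallest distance from IGN within which `perc` percent of the OSM road length falls. The sketch below follows that assumption and takes as input OSM road pieces already measured against IGN, given as hypothetical (distance, length) pairs per grid cell.

```python
def deviation_at_percentage(pieces, perc):
    """Smallest distance d such that the pieces with distance <= d
    cover at least `perc` percent of the total length.

    pieces: list of (distance_to_reference, piece_length) tuples.
    Returns None for an empty cell.
    """
    if not pieces:
        return None
    total = sum(length for _, length in pieces)
    target = total * perc / 100.0
    covered = 0.0
    ordered = sorted(pieces)
    for distance, length in ordered:
        covered += length
        if covered >= target:
            return distance
    # Guard against floating-point shortfall when perc is 100.
    return ordered[-1][0]


# Toy data (hypothetical): metres from IGN and metres of OSM road per piece.
cells = {
    "A": [(1.0, 50.0), (4.0, 30.0), (9.0, 20.0)],
    "B": [(2.0, 99.0), (20.0, 1.0)],
}
max_deviation = {cell: deviation_at_percentage(p, 98) for cell, p in cells.items()}
```

Using a percentage below 100 keeps a single stray piece, such as the 20 m outlier in cell B, from dominating the result.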
Step 3: grid-based evaluation of OSM accuracy
✔ For each grid cell, evaluate OSM accuracy against one or more threshold values of OSM deviation from IGN:
➔ length percentage of OSM roads included in the threshold buffer
➔ Area 2: tol_eval = 6 m
[Map legend, length percentage classes: 85-90%, 90-95%, 95-100%]
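The per-cell length percentage can be sketched in a few lines, again assuming that the OSM roads of each cell have been split into pieces with a known distance from IGN. The (distance, length) input format and the names are hypothetical; `tol_eval` mirrors the parameter named on the slide.

```python
def percent_within_tolerance(pieces, tol_eval):
    """Percentage of road length whose distance from the reference
    does not exceed tol_eval. Returns None for an empty cell."""
    total = sum(length for _, length in pieces)
    if total == 0:
        return None
    inside = sum(length for distance, length in pieces if distance <= tol_eval)
    return 100.0 * inside / total


# Toy data (hypothetical): (metres from IGN, metres of OSM road) per piece.
cells = {
    "A": [(1.0, 50.0), (4.0, 30.0), (9.0, 20.0)],
    "B": [(2.0, 99.0), (20.0, 1.0)],
}
accuracy = {cell: percent_within_tolerance(p, tol_eval=6.0) for cell, p in cells.items()}
```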
Transposition of the procedure as a WPS
✔ Work in progress, currently available just for Step 1
➔ available at http://131.175.143.84/WPS
Future work
➔ reduce computational time (especially for Step 2)
➔ increase usability through a WPS implementation (also for Steps 2 & 3)
➔ extend the procedure to also compare attributes
Links & publications
✔ Related publications:
➔ Brovelli M. A., Minghini M., Molinari M. and Mooney P. (in press) Towards an automated comparison of OpenStreetMap with authoritative road datasets. Transactions in GIS.
➔ Antunes F., Fonte C. C., Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) Assessing OSM Road Positional Quality with Authoritative Data. Proceedings of the VIII Conferência Nacional de Cartografia e Geodesia, Lisbon (Portugal), October 29-30, 2015.
➔ Brovelli M. A., Minghini M., Molinari M. and Mooney P. (2015) A FOSS4G-based procedure to compare OpenStreetMap and authoritative road network datasets. Geomatics Workbooks 12, 235-238, ISSN 1591-092X.
✔ Links:
➔ source code: https://github.com/MoniaMolinari/OSM-roads-comparison
➔ WPS client: http://131.175.143.84/WPS
An automated methodology for converting OSM data into a Land Use/Cover map
Marco Minghini, Cidália Fonte, Vyron Antoniou, Linda See, Joaquim Patriarca, Maria Antonia Brovelli, Grega Milcinski
EU COST Actions on VGI
✔ COST (Cooperation in Science and Technology) is an EU framework to provide networking of nationally funded research activities.
✔ We are currently involved in 2 EU COST Actions on Volunteered Geographic Information (VGI):
➔ COST Action TD1202 - “Mapping and the Citizen Sensor” (http://www.citizensensor-cost.eu)
➔ COST Action IC1203 - “European Network Exploring Research into Geospatial Information Crowdsourcing: software and methodologies for harnessing geographic information from the crowd (ENERGIC)” (http://vgibox.eu)
Land Use/Land Cover (LULC) maps
✔ LULC maps are crucial products for multiple areas of application:
➔ modeling climate and biochemistry of the Earth
➔ natural resources management
➔ planning/urban studies
➔ many others
✔ LULC maps are created through the classification of satellite imagery and validated using reference data:
➔ the creation and updating process is long, costly, and time-consuming, hence insufficient to describe rapidly-changing environments
➔ level of detail and spatial coverage inadequate for many applications
OSM as a source of LULC maps
✔ Exploiting OSM as a source for LULC maps has a number of advantages:
➔ OSM full spatial coverage in the world
➔ OSM richness
➔ OSM non-stop updating
➔ OSM open license
✔ Exploiting OSM as a source for LULC maps has some disadvantages:
➔ OSM uneven spatial coverage
➔ OSM positional accuracy & geometrical inconsistencies
➔ OSM semantic inconsistencies
✔ Purpose: creating an automated procedure which converts OSM data in a specific area into a LULC map
OSM as a source of LULC maps
✔ The nomenclature chosen for LULC classes is that of the EU Urban Atlas:
Level 1 | Level 2 | Level 3
1. Artificial Surfaces | 1.1 Urban Fabric | 1.1.1 Continuous urban fabric; 1.1.2 Discontinuous urban fabric; 1.1.3 Isolated Structures
1. Artificial Surfaces | 1.2 Industrial, commercial, public, military, private and transport units | 1.2.1 Industrial, commercial, public, military and private units; 1.2.2 Road and rail network and associated land; 1.2.3 Port areas; 1.2.4 Airports
1. Artificial Surfaces | 1.3 Mine, dump and construction sites | 1.3.1 Mineral extraction and dump sites; 1.3.3 Construction sites; 1.3.4 Land without current use
1. Artificial Surfaces | 1.4 Artificial non-agricultural vegetated areas | 1.4.1 Green urban areas; 1.4.2 Sports and leisure facilities
2. Agricultural, semi-natural areas, wetlands | |
3. Forests | |
5. Water | |
Procedure to convert OSM to LULC maps
✔ The procedure consists of transforming available OSM features into LULC corresponding to Urban Atlas (UA) levels 1 and 2
➔ this is done considering any OSM data that might be associated with classes in all three levels of the UA nomenclature
✔ At this initial stage of the procedure, the only keys considered are:
➔ OSM ways (polylines)
✗ highway
✗ railway
✗ waterway
➔ OSM ways (polygons)
✗ building
✗ landuse
✗ natural
✔ The procedure is implemented in Python and makes use of GRASS GIS, GDAL/OGR Python bindings and PostgreSQL/PostGIS.
Procedure to convert OSM to LULC maps
✔ The main steps of the procedure are:
➔ identification of the keys available in the OSM data to be processed
➔ conversion of the linear features into areas using spatial analysis, and merging them with areas in the polygon features that have values of predefined keys corresponding to the themes of the linear features
➔ conversion of the polygon features to LULC classes according to themes
➔ application of priority rules to solve remaining inconsistencies
➔ conversion of the map into the appropriate Minimum Mapping Unit (MMU), merging small features with their neighboring features
Example: roads (highway=*) → UA class 1.2
✔ The main steps of the conversion are:
➔ choose the values of “highway” key to be considered for the conversion
➔ identify a maximum and typical width for each road type in the region of interest (typically larger for primary roads and smaller for other less important roads); predefined default values may also be considered
➔ compute the distance between each road segment and the buildings within the maximum defined width of roads; store the minimum value
➔ generate an area feature for each road segment where the distance to buildings is larger than zero, using the width to generate a buffer; for those segments where the distance to buildings was not obtained, the typical width chosen above is applied
➔ merge and dissolve the created buffers
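The width rule in the steps above can be sketched as a small function. This is one reading of the slide: the buffer distance is the minimum positive distance to a building found within the maximum width, and the typical width is the fallback when no such building exists. The width values and names below are hypothetical placeholders, not the defaults of the actual procedure.

```python
# Hypothetical maximum and typical widths in metres per highway value.
ROAD_WIDTHS = {
    "primary": {"max": 20.0, "typical": 12.0},
    "residential": {"max": 10.0, "typical": 6.0},
}


def buffer_distance(highway, building_distances):
    """Buffer distance for one road segment.

    building_distances: distances (m) from the segment to nearby buildings.
    Only buildings closer than the maximum width are considered; if none
    qualifies, the typical width for the road type is returned.
    """
    widths = ROAD_WIDTHS[highway]
    nearby = [d for d in building_distances if 0 < d <= widths["max"]]
    return min(nearby) if nearby else widths["typical"]


constrained = buffer_distance("primary", [25.0, 8.0, 15.0])
fallback = buffer_distance("residential", [30.0])
```

The resulting distances would then feed the buffer, merge and dissolve operations in the GIS.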
Procedure to convert OSM to LULC maps
✔ Priority rules are defined to solve remaining inconsistencies:
➔ some classes prevail over others according to a hierarchy of importance
Level of priority | UA Class (level 2) | Class Description
1 | 1.2 | Industrial, commercial, public, military, private and transport units
2 | 5.0 | Water
3 | 1.4 | Artificial non-agricultural vegetated areas
4 | 1.3 | Mine, dump and construction sites
5 | 1.1 | Urban Fabric
6 | 2.0 | Agricultural, semi-natural areas, wetlands
7 | 3.0 | Forests
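A minimal sketch of how the priority table could resolve an overlap, assuming that priority level 1 prevails over level 7. The dictionary copies the table; the function name is hypothetical.

```python
# UA level-2 class -> level of priority, copied from the table above.
PRIORITY = {
    "1.2": 1,
    "5.0": 2,
    "1.4": 3,
    "1.3": 4,
    "1.1": 5,
    "2.0": 6,
    "3.0": 7,
}


def resolve_overlap(classes):
    """Return the class that prevails where several classes overlap,
    i.e. the one with the lowest priority level number."""
    return min(classes, key=PRIORITY.__getitem__)


# A spot mapped as urban fabric, water and forest at the same time.
winner = resolve_overlap(["1.1", "5.0", "3.0"])
```

With this ordering a road (class 1.2) crossing a forest polygon stays a road in the final map.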
Case studies
✔ Two square areas with 10 km sides, i.e. 100 km² each:
➔ 60 km NW of Paris – agriculture and forest region, low urban density
➔ Milan city centre – very dense urban area, small agricultural and rural areas, parks and a cemetery
Results
✔ Paris
Results
✔ Milan
Conclusions
✔ Results are generally satisfactory:
➔ very good correspondence with UA classes
➔ OSM advantages: up-to-dateness, higher resolution data
✔ There are a number of problems:
➔ empty areas in the final LULC map
✗ no OSM data at all
✗ use of GeoFabrik data, where some data are missing
✗ many tags not (yet) taken into account
➔ values chosen for buffers can be highly dependent on specific regions
✔ Next/future developments:
➔ switch to the Planet OSM file
➔ increase/detail the number of OSM tags considered
➔ implementation of a Web service to expose the procedure on the Web
Reference
✔ Fonte C.C., Minghini M., Antoniou V., See L., Patriarca J., Brovelli M.A. & Milcinski G. (in press) “An automated methodology for converting OSM data into a Land Use/Cover map”. Proceedings of the 6th International Conference on Cartography & GIS, Albena, Bulgaria, 13-17 June 2016.
Positional accuracy assessment of the OSM buildings in Milan
Monia Elisa Molinari, Marco Minghini, Maria Antonia Brovelli, Giorgio Zamboni
Politecnico di Milano | GEOlab
Introduction
✔ The earliest works on OSM quality assessment all focused on streets, which were the initial mapping target of the OSM project.
✔ In recent years, research on other objects in the OSM database has begun to appear:
➔ points of interest
➔ land use features
➔ building footprints
Objective
✔ The purpose of this study is to contribute to the quality assessment of OSM building data in Milan.
✔ The quality assessment has been performed by comparing the OSM data (downloaded in January 2016) with the building layer of the official vector cartography of Milan Municipality (produced in 2012)
✔ Two different analyses have been performed:
➔ completeness evaluation based on methods suggested by the literature
➔ positional accuracy evaluation by means of a novel approach developed at GEOlab - PoliMI
Completeness assessment
✔ The completeness analysis has been performed through the area ratio unit-based method proposed by Hecht et al. (2013):
C = A_{OSM} / A_{REF}
where C = completeness, A_{REF} = total area of reference buildings, A_{OSM} = total area of OSM buildings.
The completeness parameter should be calculated within a defined spatial unit (e.g. administrative or geometrical) in order to take into account the heterogeneity of OSM data.
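A minimal sketch of the area ratio per spatial unit, assuming building footprints are simple polygons given as lists of (x, y) vertices and already assigned to a unit. The unit names, the data layout and the toy footprints are hypothetical.

```python
def polygon_area(ring):
    """Area of a simple polygon (shoelace formula); the ring is a list
    of (x, y) vertices without a repeated closing vertex."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def completeness(osm_buildings, ref_buildings):
    """Area ratio C = A_OSM / A_REF, expressed as a percentage."""
    a_osm = sum(polygon_area(b) for b in osm_buildings)
    a_ref = sum(polygon_area(b) for b in ref_buildings)
    return 100.0 * a_osm / a_ref


square = [(0, 0), (10, 0), (10, 10), (0, 10)]        # 100 m2
other_square = [(20, 0), (30, 0), (30, 10), (20, 10)]  # 100 m2
wide = [(0, 0), (12, 0), (12, 10), (0, 10)]          # 120 m2

# Toy data (hypothetical): buildings grouped by spatial unit.
units = {
    "unit_1": {"osm": [square], "ref": [square, other_square]},
    "unit_2": {"osm": [wide], "ref": [square]},
}
c_values = {name: completeness(u["osm"], u["ref"]) for name, u in units.items()}
```

Unit 2 shows how C can exceed 100% when OSM footprints are larger than the reference ones.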
Completeness assessment
● The area ratio method can overestimate C when OSM contains building area that is absent from the reference dataset. For this reason the computation of three additional rates is recommended:
● True Positive (TP): the areas of agreement between the datasets
● False Positive (FP): the OSM building areas which do not exist in the REF dataset
● False Negative (FN): the REF building areas which do not exist in the OSM dataset.
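The three rates can be illustrated on rasterized footprints, where each building is a set of unit cells. Normalizing every rate by the reference area is an assumption made here for illustration (the slides do not give the denominators); with that choice C equals TP + FP, which shows how surplus OSM area inflates C.

```python
def rectangle_cells(x0, y0, x1, y1):
    """Set of unit cells (col, row) covered by an axis-aligned rectangle."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def agreement_rates(osm_cells, ref_cells):
    """C, TP, FP and FN as percentages of the reference area."""
    ref_area = len(ref_cells)
    return {
        "C": 100.0 * len(osm_cells) / ref_area,
        "TP": 100.0 * len(osm_cells & ref_cells) / ref_area,
        "FP": 100.0 * len(osm_cells - ref_cells) / ref_area,
        "FN": 100.0 * len(ref_cells - osm_cells) / ref_area,
    }


# Toy data (hypothetical): the OSM footprint is shifted by half its width.
ref = rectangle_cells(0, 0, 10, 10)
osm = rectangle_cells(5, 0, 15, 10)
rates = agreement_rates(osm, ref)
```

Here C is 100% although only half of the reference area is actually matched, which is exactly the overestimation the additional rates are meant to expose.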
Completeness analysis results
✔ Spatial distribution of C rate in Milan area
The completeness of the OSM dataset is very high in the city center and gradually decreases when moving towards the periphery.
C values | %
C > 100% | 28.9%
80% < C < 100% | 27.7%
60% < C < 80% | 18.5%
40% < C < 60% | 8.0%
C < 40% | 16.9%
TP analysis results
✔ Spatial distribution of TP rate in Milan area
Results largely confirm the trend observed for C: OSM completeness is high in the city center and gradually lower in the peripheral areas
TP values | %
60% < TP < 100% | 63.9%
40% < TP < 60% | 16.0%
TP < 40% | 20.1%
Positional accuracy assessment
✔ An advanced algorithm implemented in an application (Brovelli and Zamboni, 2004) allowed the:
➔ quasi-automated detection of homologous points between REF and OSM by means of geometric, topological and semantic analyses
➔ application of a set of warping transformations to the OSM dataset in order to optimize its match with the REF dataset
Positional accuracy analysis results
✔ The number of homologous pairs detected is approximately 100000
Cell | Points | Transf. | ΔY μ [m] | ΔX μ [m] | d μ [m]
0 | 19135 | None | 0.45 | 0.46 | 0.81
1-2 | 16480 | None | 0.35 | 0.46 | 0.77
2 | 18732 | None | 0.44 | 0.43 | 0.79
3-1 | 4318 | None | 0.28 | 0.41 | 0.71
The positional accuracy is the same in both Milan center and periphery
Positional accuracy analysis results
✔ Using the homologous points detected it is possible to estimate the parameters of an affine or spline MR transformation to remove the systematic translation and reduce the mean distance
Cell | Points | Transf. | ΔY μ [m] | ΔX μ [m] | d μ [m]
0 | 19135 | None | 0.45 | 0.46 | 0.81
0 | 19135 | Affine | 0.00 | 0.00 | 0.56
0 | 19135 | Spline MR | 0.00 | 0.00 | 0.50
1-2 | 16480 | None | 0.35 | 0.46 | 0.77
1-2 | 16480 | Affine | 0.00 | 0.00 | 0.57
1-2 | 16480 | Spline MR | 0.00 | 0.00 | 0.50
2 | 18732 | None | 0.44 | 0.43 | 0.79
2 | 18732 | Affine | 0.00 | 0.00 | 0.55
2 | 18732 | Spline MR | 0.00 | 0.00 | 0.49
3-1 | 4318 | None | 0.28 | 0.41 | 0.71
3-1 | 4318 | Affine | 0.00 | 0.00 | 0.55
3-1 | 4318 | Spline MR | 0.00 | 0.00 | 0.48
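The effect of removing a systematic shift can be illustrated with a pure translation, which is deliberately simpler than the affine and spline MR transformations reported in the table. The sketch assumes homologous pairs are given as ((x_ref, y_ref), (x_osm, y_osm)) tuples; names and coordinates are hypothetical.

```python
import math


def shift_statistics(pairs):
    """Mean signed shifts in X and Y and mean distance between
    homologous points."""
    dx = [osm[0] - ref[0] for ref, osm in pairs]
    dy = [osm[1] - ref[1] for ref, osm in pairs]
    n = len(pairs)
    return {
        "mean_dx": sum(dx) / n,
        "mean_dy": sum(dy) / n,
        "mean_d": sum(math.hypot(x, y) for x, y in zip(dx, dy)) / n,
    }


def remove_mean_translation(pairs):
    """Shift the OSM points by the mean offset (translation only)."""
    stats = shift_statistics(pairs)
    return [
        (ref, (osm[0] - stats["mean_dx"], osm[1] - stats["mean_dy"]))
        for ref, osm in pairs
    ]


# Toy data (hypothetical): two homologous pairs with a common offset.
pairs = [((0.0, 0.0), (1.0, 0.0)), ((10.0, 0.0), (11.0, 2.0))]
before = shift_statistics(pairs)
after = shift_statistics(remove_mean_translation(pairs))
```

As in the table, the mean shifts drop to zero while a residual mean distance remains, because a global transformation cannot absorb the point-to-point scatter.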
References
✔ Brovelli M.A., Minghini M., Molinari M.E. & Zamboni G. (in press) “Positional accuracy assessment of the OpenStreetMap buildings layer through automatic homologous pair detection: the method and a case study”. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
✔ Brovelli M.A., Zamboni G., 2004. A step towards geographic interoperability: the automatic detection of maps homologous pairs. In: Proceedings of Urban Data Management Society Conference (UDMS ’04), Chioggia, Italy, 27-29 October 2004.
✔ Hecht, C., Kunze, C., Hahmann, S., 2013. Measuring Completeness of Building Footprints in OpenStreetMap over Space and Time. ISPRS International Journal of Geo-Information, 2(4), pp. 1066-1091.
Open geodata quality in Milan
Marco Minghini, Monia Elisa Molinari, Miriam Molteni, Maria Antonia Brovelli
Politecnico di Milano | GEOlab
Open (geo)data
✔ Open data (http://opendefinition.org):
➔ permissions: use, modification, separation, redistribution, compilation, non-discrimination, propagation, application to any purpose, no charge
➔ conditions: attribution, integrity, share-alike, notice, source, technical restriction prohibition, non-aggression
✔ On a total of 13 domains of open governance data, the geospatial one has been recognized as the one with the highest commercial value
(Carrara, W., Chan, W. S., Fischer, S., van Steenbergen, E., 2015. Creating Value through Open Data: Study on the Impact of Re-use of Public Data Resources. European Commission, European Data Portal, http://www.europeandataportal.eu/sites/default/files/edp_creating_value_through_open_data_0.pdf)
✔ What is the quality of open (geo)data?
➔ focus on Milan Municipality (only data published by Italian institutions)
➔ quality check performed using ISO guidelines
Open geodata in Milan Municipality
✔ Classification according to the provider
✔ Classification according to the content
✔ Classification according to the format
✔ Classification according to the scale
✔ Classification according to the license
✔ Classification according to the year of publication
✔ Classification according to the content vs. the provider
Example of quality evaluation
✔ Example of quality (positional accuracy) evaluation:
➔ dataset: orthophoto
➔ provider: Italian Environmental Ministry
➔ producer: AGEA (Agenzia per le Erogazioni in Agricoltura)
➔ content: airborne observations
➔ format: Web Map Service (WMS)
➔ scale: national
➔ license: CC-BY-SA
➔ year of publication: 2012
➔ declared planimetric accuracy: 4 m (3 m, according to AGEA)
✔ Ground truth dataset for quality check:
➔ buildings DBTR Milan Municipality (2012): scale 1:1000, accuracy 20 cm
Example of quality evaluation
✔ Procedure:
➔ extract DBTR buildings according to a random stratified sampling on a hexagonal grid (ISO guidelines)
➔ identification of the corresponding buildings on the orthophoto and manual digitization of 3 corners of the roof of each building
➔ computation of accuracy measures on the homologous points
✗ statistics on planimetric errors
✗ measures of error confidence
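The sampling step can be sketched as follows, assuming each building already carries the identifier of the hexagonal cell it falls in (building the hexagonal grid itself is left to the GIS). The record layout, the number of buildings per cell and the fixed seed are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(buildings, per_cell, seed=0):
    """Draw up to `per_cell` buildings at random from every grid cell.

    buildings: list of dicts with at least a "cell" key (the stratum).
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_cell = defaultdict(list)
    for building in buildings:
        by_cell[building["cell"]].append(building)
    sample = []
    for cell in sorted(by_cell):
        members = by_cell[cell]
        sample.extend(rng.sample(members, min(per_cell, len(members))))
    return sample


# Toy data (hypothetical): 12 buildings spread over 3 hexagonal cells.
buildings = [{"id": i, "cell": "h%d" % (i % 3)} for i in range(12)]
sample = stratified_sample(buildings, per_cell=2)
```

Sampling per cell rather than over the whole city keeps sparsely built areas represented in the accuracy check.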
Example of quality evaluation
✔ Statistics on planimetric errors (number of homologous pairs = 1450):
Index | eX | eY | e
μ | 1.66 m | 3.65 m | 4.32 m
Me | 1.04 m | 2.19 m | 3.03 m
σ | 2.49 m | 5.32 m | 4.24 m
min | 0.00 m | 0.00 m | 0.04 m
max | 13.03 m | 29.70 m | 29.94 m
n | 245 | 615 | 758
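The indices in the table can be computed with the Python standard library. The sketch assumes eX and eY are absolute coordinate differences between homologous points and e is their Euclidean norm, and it uses the sample standard deviation; the slides do not state which estimator was used. The meaning of the n row is not given in the slides, so it is not reproduced here. Names and coordinates are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Mean, median, sample standard deviation, minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def planimetric_errors(pairs):
    """Summary of eX, eY and e for pairs of (reference, test) points."""
    ex = [abs(test[0] - ref[0]) for ref, test in pairs]
    ey = [abs(test[1] - ref[1]) for ref, test in pairs]
    e = [math.hypot(a, b) for a, b in zip(ex, ey)]
    return {"eX": summarize(ex), "eY": summarize(ey), "e": summarize(e)}


# Toy data (hypothetical): three homologous pairs in metres.
pairs = [
    ((0.0, 0.0), (3.0, 4.0)),
    ((0.0, 0.0), (0.0, 1.0)),
    ((0.0, 0.0), (-6.0, 8.0)),
]
summary = planimetric_errors(pairs)
```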
Example of quality evaluation
✔ Measures of error confidence:
Index | Value
CE39.4 | 4.15 m
CE50 | 4.89 m
CE90 | 8.91 m
CE95 | 10.16 m
CE99.8 | 14.53 m
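The circular error (CE) values in the table are consistent with scaling CE39.4 by fixed factors. A minimal sketch, assuming circular normal errors so that the factor for probability p is sqrt(-2 ln(1 - p)); the slides do not state which model was used, and the function names are hypothetical. This reproduces CE50, CE90 and CE95 to two decimals, while the tabulated CE99.8 equals 3.5 times CE39.4, slightly below the factor of about 3.53 given by this formula.

```python
import math


def ce_factor(p):
    """Multiplier of the circular standard error for probability p
    (0 < p < 1), assuming a circular normal error distribution."""
    return math.sqrt(-2.0 * math.log(1.0 - p))


def circular_error(sigma_c, p):
    """Radius expected to contain a fraction p of the planimetric errors."""
    return sigma_c * ce_factor(p)


SIGMA_C = 4.15  # CE39.4 from the table, in metres
levels = {"CE50": 0.50, "CE90": 0.90, "CE95": 0.95}
estimates = {name: round(circular_error(SIGMA_C, p), 2) for name, p in levels.items()}
print(estimates)
```

Since the orthophoto declares a planimetric accuracy of 4 m, comparing that figure with the CE values shows how far the measured errors are from the declared quality.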
Example of quality evaluation
✔ Error visualization:
Reference
✔ Brovelli M.A., Minghini M., Molinari M.E. & Molteni M. (in press) “Do open geodata actually have the quality they declare? The case study of Milan, Italy”. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
Politecnico di Milano, GEOlab – Como Campus
Via Valleggio 11, 22100 Como (Italy)
@MarcoMinghini, @MoniaMolinari
Marco Minghini & Monia Elisa Molinari