
DeepVGI: Deep Learning with Volunteered Geographic Information

Jiaoyan Chen, Heidelberg University, Germany ([email protected])

Alexander Zipf, Heidelberg University, Germany ([email protected])

ABSTRACT

Recently, deep learning has been widely studied for recognizing ground objects in satellite imagery. However, finding ground truth, especially for developing and rural areas, is quite hard, and manually labeling a large set of training data is costly. In this work, we present ongoing research named DeepVGI, which aims to learn deeply from satellite imagery under the supervision of Volunteered Geographic Information (VGI). We utilize VGI data from OpenStreetMap (OSM) and from MapSwipe, a crowdsourcing mobile application that lets volunteers label images containing buildings or roads for humanitarian aid. Meanwhile, an active learning framework with deep neural networks is developed that incorporates both VGI sources for more complete supervision knowledge. Our experiments show that DeepVGI achieves high building detection performance for humanitarian mapping in rural African areas.

Keywords

Deep Learning; OpenStreetMap; VGI

1. INTRODUCTION

Recently, machine learning algorithms, especially deep learning algorithms like Convolutional Neural Networks (CNNs), have been widely applied in satellite image classification for building detection, scene understanding, etc. However, such learning methods usually rely on a large set of labeled samples for supervision or semi-supervision. Finding training samples or manually labeling images for a specific domain like humanitarian mapping is costly.

With the development of the World Wide Web and crowdsourcing, Volunteered Geographic Information (VGI), a special case of the larger Web phenomenon known as user-generated content, has started to harvest big geographic data provided voluntarily by individuals [1]. VGI platforms like OpenStreetMap (OSM) provide free labels for satellite image classification. Mnih et al. [2] proposed to extract vector data from OSM for supervised learning with deep networks. The study defined the missing error (i.e., cases where an object that appears in a satellite image does not appear in the map) and the registration error (i.e., cases where the location of an object in the map is inaccurate) of the map labels, and developed two loss functions for training that reduce the negative effect of such noise.

In humanitarian mapping, prediction targets are often located in rural or undeveloped areas (e.g., cottages in Africa) and not labeled on OSM (cf. (3) in Figure 1), so the missing error becomes so large that only part of the domain's supervision knowledge is covered by OSM. In this work, we propose an ongoing study named DeepVGI, which aims to learn deeply from VGI and satellite images for humanitarian mapping. It incorporates free satellite image labels from both OSM and MapSwipe (cf. (1) in Figure 1) through an active deep learning framework based on deep neural networks and a cost-sensitive active sampling strategy, which enriches the supervision knowledge for the domain of building detection.

Figure 1: (1) Interface of the MapSwipe application, where volunteers tap the images that contain buildings or roads for humanitarian aid, and (2)(3) two level-18 satellite images used in MapSwipe, one labeled on OSM (2) and one not (3).

2. PROBLEM AND FRAMEWORK

In MapSwipe, the humanitarian mapping task is to judge whether a fixed-size satellite image (256×256 pixels) contains target objects (e.g., buildings). In DeepVGI, this task is implemented by first sliding a window (e.g., 32×32 pixels) over an image and then classifying the image tiles generated by the sliding window. Therefore, the technical problem of DeepVGI includes (1) predicting the label y of each small tile x and (2) deciding the label l of a MapSwipe image s.
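To make the tiling concrete, a minimal Python sketch of the sliding-window decision is given below; it is not the authors' code, and the stride, the handling of the threshold, and the `classify_tile` stand-in for the trained network are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code): slide a 32x32
# window over a 256x256 MapSwipe image, classify each tile, and mark the
# image positive if any tile is predicted to contain a building.
import numpy as np

TILE = 32     # sliding-window size from the paper
STRIDE = 32   # assumed non-overlapping stride; the paper does not state it

def classify_tile(tile: np.ndarray) -> float:
    """Hypothetical stand-in returning P(building | tile) from the ANN."""
    raise NotImplementedError

def label_image(image: np.ndarray, alpha: float = 0.5) -> bool:
    """Decide the image-level label l of a 256x256 MapSwipe image s."""
    h, w = image.shape[:2]
    for top in range(0, h - TILE + 1, STRIDE):
        for left in range(0, w - TILE + 1, STRIDE):
            tile = image[top:top + TILE, left:left + TILE]
            if classify_tile(tile) >= alpha:   # tile-level label y
                return True                    # at least one building tile
    return False
```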

The workflow of DeepVGI, which includes four steps, is shown in Figure 2. The first step (S1) trains a multilayer artificial neural network (ANN-S1) with (1) positive tiles (x, y+) whose centers are determined by the geographic locations of the buildings on OSM and (2) negative tiles (x, y−) that are extracted from empty MapSwipe images.
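The tile construction of S1 could look roughly as follows; this sketch assumes OSM building locations have already been projected into pixel coordinates, and all function and variable names are our own, not the paper's.

```python
# Sketch of S1's training data construction (illustrative assumptions):
# positive 32x32 tiles centered on OSM building locations and negative
# tiles cut at random from MapSwipe images labeled as empty.
import numpy as np

def positive_tiles(image, building_pixels, size=32):
    """Cut (x, y+) tiles centered on OSM building locations (row, col)."""
    half = size // 2
    tiles = []
    for row, col in building_pixels:
        if half <= row <= image.shape[0] - half and \
           half <= col <= image.shape[1] - half:
            tiles.append(image[row - half:row + half, col - half:col + half])
    return tiles

def negative_tiles(empty_image, size=32, n=10, seed=0):
    """Cut (x, y-) tiles at random positions from an empty MapSwipe image."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, empty_image.shape[0] - size + 1, n)
    cols = rng.integers(0, empty_image.shape[1] - size + 1, n)
    return [empty_image[r:r + size, c:c + size] for r, c in zip(rows, cols)]
```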


The second step (S2) applies an active sampling strategy which (1) slides a window over each training MapSwipe image, (2) predicts the label of each tile using ANN-S1 with a probability threshold α and determines whether the image contains any buildings, and (3) divides the training images into MapSwipe-consistent (sC) and MapSwipe-inconsistent (sI) sets by comparing the predicted label with the MapSwipe volunteers' label. The third step (S3) manually labels a limited number of positive tiles (x_I, y+) randomly selected from the MapSwipe-inconsistent images with Type II errors. The fourth step (S4) integrates the new training tiles from S3 with the old training tiles from S1 and retrains a multilayer artificial neural network (ANN-S4).
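Steps S2 and S3 can be condensed into a short sketch under our own reading of the workflow; `predict_image` (ANN-S1 with the sliding window) and `oracle` (the human labeler of S3) are passed in as callables and are not names from the paper.

```python
# Sketch of one VGI-based active learning round (S2 + S3);
# names and structure are illustrative assumptions.
def active_learning_round(train_images, volunteer_labels,
                          predict_image, oracle, budget):
    consistent, inconsistent = [], []
    for s, l_vol in zip(train_images, volunteer_labels):
        l_pred = predict_image(s)          # S2: ANN-S1 + sliding window
        (consistent if l_pred == l_vol else inconsistent).append((s, l_vol))

    # Type II error images: volunteers reported buildings, ANN-S1 found none.
    type2_images = [s for s, l_vol in inconsistent if l_vol]

    # S3: manually label a limited number of positive tiles (x_I, y+)
    # randomly selected from these images; the oracle stands in for a human.
    return oracle(type2_images, budget)    # merged with S1 tiles to retrain ANN-S4
```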

In the above workflow, we call the resampling procedure the VGI-based active learning strategy, where the MapSwipe volunteers' labels are used to guide the selection of candidate images for manual labeling. The positive supervision knowledge from OSM is compared against that from MapSwipe. Training MapSwipe images whose supervision knowledge is consistent with OSM are denoted as sC, while inconsistent ones are denoted as sI; from the latter, new positive tiles are sampled to bridge the supervision knowledge gap between the OSM labels and the whole humanitarian mapping domain.

Figure 2: The workflow of DeepVGI. S1: train ANN-S1 on satellite tiles (x, y+/y−) with positive labels from OSM and negative labels from MapSwipe; S2: predict with a sliding window and split training MapSwipe images into MapSwipe-consistent (sC) and MapSwipe-inconsistent (sI) sets; S3: manually label positive tiles (x_I, y+) from Type II error images; S4: retrain to obtain ANN-S4.

3. EVALUATION AND DISCUSSION

In the evaluation of this poster, 1,910 MapSwipe remote sensing images from the Southern Malawi districts in Africa are used. They are randomly divided into 1,590 training images and 320 testing images. Two classic CNNs, LeNet and AlexNet, as well as a multilayer perceptron, are tested. We (1) analyze the performance of the VGI-based active learning strategy in comparison with a typical uncertainty-based strategy, which selects the most uncertain samples for labeling, and (2) compare the overall testing performance of DeepVGI with Deep-OSM (which uses only OSM labels for supervised deep learning, i.e., ANN-S1 in Figure 2, following the basic idea of https://github.com/trailbehind/DeepOSM and https://github.com/geometalab/OSMDeepOD) and with MapSwipe volunteers. All code, data, and a tutorial are openly available at https://gitlab.com/giscience/DeepVGI.
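As one plausible setup, since the paper does not name a framework, the random 1,590/320 split and a LeNet-style tile classifier could be written in Keras as below; the framework choice and layer sizes follow the classic LeNet, not necessarily the authors' exact configuration.

```python
# Plausible evaluation setup (framework and layer sizes are assumptions):
# a random 1590/320 image split and a LeNet-style CNN for 32x32 RGB tiles.
import numpy as np
from tensorflow.keras import layers, models

rng = np.random.default_rng(42)
idx = rng.permutation(1910)
train_idx, test_idx = idx[:1590], idx[1590:]   # 1590 training / 320 testing

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(6, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(120, activation="relu"),
    layers.Dense(84, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # P(building | tile)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```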

Figure 3 presents the average testing performance of the VGI-based and the uncertainty-based active learning strategies, where multiple batches of active samples (of size 5 or 10) are randomly selected for testing. The VGI-based strategy achieves its best performance with 10 actively sampled tiles and outperforms the uncertainty-based strategy in most cases.

Figure 3: Testing performance of the VGI-based and uncertainty-based active learning strategies against the number of actively sampled tiles (5 and 10): (a) F1 score and (b) accuracy.

Table 1 shows the overall testing performance of DeepVGI, Deep-OSM, and MapSwipe. DeepVGI's F1 score and accuracy are (1) significantly (p-value < 0.05) higher than Deep-OSM's (2) but still lower than the MapSwipe volunteers'. The first result indicates that adding actively sampled positive tiles does bring additional supervision knowledge and increases the generalization performance of the ANN model. The second result means that DeepVGI is still weaker than the MapSwipe volunteers, where each image is voted on by three volunteers. This may be caused by two factors: (1) noise (e.g., big rocks on bare land) and (2) the small size of the image set used in this evaluation.

           Precision   Recall   F1 Score   Accuracy
DeepVGI    0.775       0.737    0.756      0.841
Deep-OSM   0.632       0.875    0.734      0.788
MapSwipe   0.738       0.938    0.826      0.868

Table 1: Testing performance of DeepVGI (10 actively sampled tiles), Deep-OSM, and MapSwipe (volunteers)
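The four reported measures follow their standard definitions; the small helper below (a generic sketch, not the authors' evaluation script) makes the relationship between them explicit.

```python
# Standard definitions of the measures in Table 1, computed from
# image-level confusion counts (generic sketch).
def metrics(tp: int, fp: int, fn: int, tn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```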

To further improve accuracy, we will on the one hand adopt much more training data for higher generalization performance, with a scalable implementation on a Spark cluster. On the other hand, refined OSM supervision knowledge (e.g., building contours) as well as other VGI will be utilized with a more carefully designed objective function. In addition, we will implement the deep learning algorithm in [2] to extend the evaluation.

4. ACKNOWLEDGMENTS

This work has been supported by the Klaus Tschira Foundation (KTS) Heidelberg. The authors thank Benjamin Herfort and Melanie Eckle from MapSwipe and Missing Maps.

5. REFERENCES

[1] M. F. Goodchild. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4):211-221, 2007.

[2] V. Mnih and G. E. Hinton. Learning to label aerial images from noisy data. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 567-574, 2012.
