Proximate Sensing: Inferring What-Is-Where From Georeferenced Photo Collections. Daniel Leung and Shawn Newsam, Electrical Engineering & Computer Science, University of California at Merced. CVPR 2010, June 17th, 2010.

1

Proximate Sensing: Inferring What-Is-Where From Georeferenced Photo Collections

Daniel Leung and Shawn Newsam

Electrical Engineering & Computer Science

University of California at Merced

CVPR 2010

June 17th, 2010

2

Remote sensing: using overhead images of distant scenes to derive geographic information.

[Figures: satellite image (Google Maps); National Land Cover Database (USGS)]

3

Proximate sensing: use ground-level images of close-by objects and scenes.

[Figure: Land Cover Map 2000 (UK Centre for Ecology & Hydrology); "?"; community-contributed photos (Geograph Britain and Ireland project)]

Study area: 100x100 km region in southeastern UK (region TQ in the National Grid).

4

Proximate sensing: use ground-level images of close-by objects and scenes.

[Figure: community-contributed photos (Geograph Britain and Ireland project)]

5

Proximate Sensing

• We conjecture that the visual content of georeferenced images can be used to derive maps of what-is-where on the surface of the earth.

• Motivation:
  – Such collections are becoming increasingly available, e.g. Flickr (100+ million geotagged images), Panoramio, Picasa, Geograph, TrekEarth.
  – Derive geographic information not possible through other means, e.g. land-use classification.
  – Exciting new application of CV that not only provides another context to apply/revisit standard techniques but also stands to motivate novel problems.

6

Proximate Sensing: Context

• Volunteered Geographic Information (Wikipedia):
  – VGI is the harnessing of tools to create, assemble, and disseminate geographic data provided voluntarily by individuals (Goodchild, 2007).
  – Goodchild, M. 2007. Citizens as Sensors: The World of Volunteered Geography.

[Diagram labels: proximate sensing; citizen science; volunteered geographic information]

7

VGI: Flickr

• 103,679,986 geotagged items

• 2.8 million things geotagged this month

8

VGI: Geograph

• “The Geograph Britain and Ireland project aims to collect geographically representative photographs and information for every square kilometre of Great Britain and Ireland, and you can be part of it.”

• 9,973 users have contributed 1,897,042 images covering 255,904 grid squares, or 77.1% of the total.

[Example photo caption: “Railway bridge crossing R. Rother. This is now a dismantled railway, further east it becomes the Kent & East Sussex Railway.”]

9

Objective

• Eventual goal is to use the visual content of georeferenced photos to produce land use/cover maps.

• Initial focus on simpler problem of binary classification into developed and undeveloped regions.

10

Related Work

• Other researchers have leveraged location information in georeferenced photo collections:
  – To annotate novel images [Quack et al., CIVR 2008; Moxley et al., MIR 2008].
  – To geolocate novel images [Hays and Efros, CVPR 2008].
  – To organize the collections themselves [Crandall et al., WWW 2009].

• However, ours is the first work (to the best of our knowledge) to use the collections to infer what-is-where on the surface of the earth on a large scale.

11

Overview

[Pipeline diagram]

Training: training images → label images → feature extraction → train classifier

Mapping: target images → feature extraction → classify target images → aggregate labels in 1x1 km tiles → fraction developed map → binary classification map

12

Ground Truth (1)

• Land Cover Map 2000 (UK Centre for Ecology & Hydrology)

[Map legend, aggregate classes (AC):]

LCM AC 1: Broad-leaved / mixed woodland

LCM AC 2: Coniferous woodland

LCM AC 3: Arable and horticulture

LCM AC 4: Improved grassland

LCM AC 5: Semi-natural grass

LCM AC 6: Mountain, heath, bog

LCM AC 7: Built up areas and gardens

LCM AC 8: Standing open water

LCM AC 9: Coastal

LCM AC 10: Oceanic Seas

13

Ground Truth (2)

• Aggregate 10 land cover classes into 2 superclasses:
  – Developed: LCM AC 7 (Built up areas and gardens)
  – Undeveloped: other 9 classes

• Derive 2 ground truth maps:
  – Fraction map: percent developed for each 1x1 km tile.
  – Binary classification map: apply 50% threshold to fraction map.

[Figure: Ground truth fraction map indicating percent developed for each 1x1 km tile.]

[Figure: Ground truth binary classification map indicating tiles labelled as developed (white) or undeveloped (black).]
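The derivation of the two ground truth maps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the land cover raster is taken to be a 2-D list of aggregate class codes, `pixels_per_tile` (the number of raster pixels along one side of a 1x1 km tile) is a hypothetical parameter, and treating a tile at exactly 50% as developed is our choice, since the slides do not say how ties are handled.

```python
DEVELOPED_CLASS = 7  # LCM AC 7: Built up areas and gardens


def ground_truth_maps(land_cover, pixels_per_tile, threshold=0.5):
    """Return (fraction_map, binary_map) as nested lists, one entry per tile.

    land_cover is a 2-D list of aggregate class codes (1-10); its height and
    width are assumed to be multiples of pixels_per_tile.
    """
    rows, cols = len(land_cover), len(land_cover[0])
    if rows % pixels_per_tile or cols % pixels_per_tile:
        raise ValueError("raster size must be a multiple of pixels_per_tile")

    fraction_map, binary_map = [], []
    for top in range(0, rows, pixels_per_tile):
        fraction_row = []
        for left in range(0, cols, pixels_per_tile):
            # Count the developed pixels that fall inside this tile.
            developed = sum(
                land_cover[r][c] == DEVELOPED_CLASS
                for r in range(top, top + pixels_per_tile)
                for c in range(left, left + pixels_per_tile)
            )
            fraction_row.append(developed / pixels_per_tile ** 2)
        fraction_map.append(fraction_row)
        # Assumption: a tile at exactly the threshold counts as developed.
        binary_map.append([f >= threshold for f in fraction_row])
    return fraction_map, binary_map
```

For a toy 4x4 raster with 2x2-pixel tiles, `ground_truth_maps(raster, 2)` returns one fraction and one boolean label per tile.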

14

Datasets (1)

• Downloaded 920K Flickr images for the TQ region.

• Distribution for 1x1 km tiles shown to left (log10 scale).

• 5,420 tiles contain no Flickr images.

• 4,580 tiles contain average of 200, median of 10, and maximum of 53,840 images.

Flickr

15

Datasets (2)

• Downloaded 120K images from the Geograph Britain and Ireland project.

• Distribution for 1x1 km tiles shown to left (log10 scale).

• Only 614 tiles without images.

• 9,386 tiles contain average of 13, median of 5, and maximum of 1,458 images.

Geograph
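Per-tile statistics like those quoted for the two datasets can be computed by binning each photo's coordinates into 1x1 km tiles. The sketch below is an illustration, not the authors' code: it assumes each photo location has already been converted to an (easting, northing) offset in metres from the south-west corner of the 100x100 km region, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def count_images_per_tile(coords_m, tile_size_m=1_000, region_size_m=100_000):
    """Count images per tile.

    coords_m holds (easting, northing) offsets in metres from the
    south-west corner of the region; points outside it are ignored.
    """
    tiles_per_side = region_size_m // tile_size_m
    counts = Counter()
    for easting, northing in coords_m:
        col, row = int(easting // tile_size_m), int(northing // tile_size_m)
        if 0 <= col < tiles_per_side and 0 <= row < tiles_per_side:
            counts[(row, col)] += 1
    return counts


def occupancy_summary(counts, total_tiles=10_000):
    """Summarise tile occupancy; expects at least one occupied tile."""
    occupied = list(counts.values())
    return {
        "empty_tiles": total_tiles - len(occupied),
        "occupied_tiles": len(occupied),
        "mean": mean(occupied),
        "median": median(occupied),
        "max": max(occupied),
    }
```

The mean, median, and maximum are taken over occupied tiles only, which matches how the slides report them.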

16

Image Features

• Extract simple five-dimensional edge histogram features for each image.

• Motivated by the observation that images of developed scenes typically have a higher proportion of horizontal and vertical edges than images of undeveloped scenes.
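One plausible form of a five-bin edge histogram is sketched below. The slides do not specify the five bins, so this is an assumption: the sketch uses MPEG-7-style 2x2 filters for vertical, horizontal, 45-degree, 135-degree, and non-directional edges, with each non-overlapping 2x2 pixel block voting for its strongest response. The threshold value is arbitrary and the function name is hypothetical.

```python
import math

SQRT2 = math.sqrt(2)
# 2x2 filters, flattened as (top-left, top-right, bottom-left, bottom-right).
# Assumption: MPEG-7-style edge types; the slides do not name the five bins.
EDGE_FILTERS = [
    (1, -1, 1, -1),          # vertical edge
    (1, 1, -1, -1),          # horizontal edge
    (SQRT2, 0, 0, -SQRT2),   # 45-degree diagonal edge
    (0, SQRT2, -SQRT2, 0),   # 135-degree diagonal edge
    (2, -2, -2, 2),          # non-directional edge
]


def edge_histogram(gray, threshold=11.0):
    """Return a normalised 5-bin histogram of dominant edge types.

    gray is a 2-D list of intensities. Each non-overlapping 2x2 block votes
    for the filter with the strongest absolute response, provided that
    response reaches the threshold; other blocks are treated as edge-free.
    """
    hist = [0.0] * len(EDGE_FILTERS)
    for r in range(0, len(gray) - 1, 2):
        for c in range(0, len(gray[0]) - 1, 2):
            block = (gray[r][c], gray[r][c + 1], gray[r + 1][c], gray[r + 1][c + 1])
            responses = [abs(sum(w * v for w, v in zip(f, block))) for f in EDGE_FILTERS]
            strongest = max(range(len(responses)), key=responses.__getitem__)
            if responses[strongest] >= threshold:
                hist[strongest] += 1
    total = sum(hist)
    # An image with no detected edges yields an all-zero histogram.
    return [h / total for h in hist] if total else hist
```

Under this formulation, an image dominated by built structures would be expected to put more mass in the first two bins, which is the intuition stated in the slide.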

17

Image Classification

• Perform image-level binary classification:
  – Developed.
  – Undeveloped.

• SVM classifier with Gaussian RBF kernel, five-fold cross validation, and grid search for optimal parameter selection.
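The classifier setup described above can be approximated with scikit-learn, as in the sketch below. The slides do not say which SVM library or parameter grid was used, so the use of `GridSearchCV` and `SVC`, the grid values, and the synthetic five-bin histograms are all assumptions made for illustration.

```python
import random

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def train_rbf_svm(features, labels):
    """Grid-search C and gamma for an RBF-kernel SVM with five-fold CV."""
    # Illustrative grid only; the grid used in the original work is not given.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(features, labels)
    return search


def synthetic_histogram(developed, rng):
    """Made-up 5-bin histogram; developed scenes get more mass in bins 0 and 1."""
    base = [3, 3, 1, 1, 1] if developed else [1, 1, 2, 2, 2]
    noisy = [b + rng.uniform(0, 0.5) for b in base]
    total = sum(noisy)
    return [v / total for v in noisy]


rng = random.Random(0)
labels = [i % 2 for i in range(80)]  # 1 = developed, 0 = undeveloped
features = [synthetic_histogram(label == 1, rng) for label in labels]
search = train_rbf_svm(features, labels)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is the SVM refit on all training data with the selected parameters, and `search.predict` can be applied to target-image features.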

18

Experiments (1)

[Pipeline diagram]

Training: training images → label images → feature extraction → train classifier

Mapping: target images → feature extraction → classify target images → aggregate labels in 1x1 km tiles → fraction developed map → binary classification map

19

Experiments (2)

• Fraction developed map: the fraction of images classified as developed in each tile.

• Binary classification map: threshold applied to fraction map.

• Explore two types of thresholds:
  – Fixed at 0.5.
  – Adaptive so that 38.9% of the tiles are labelled as developed (this represents prior knowledge on the distribution of developed vs. undeveloped regions).
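The aggregation and the two thresholding schemes can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes per-image labels of 1 (developed) and 0 (undeveloped), treats a fraction of exactly 0.5 as developed, and implements the adaptive threshold by ranking tiles and labelling the top share as developed, with ties broken arbitrarily. The slides do not specify these details.

```python
from collections import defaultdict


def fraction_developed(image_tiles, image_labels):
    """Map each tile id to the fraction of its images classified as developed."""
    totals, developed = defaultdict(int), defaultdict(int)
    for tile, label in zip(image_tiles, image_labels):
        totals[tile] += 1
        developed[tile] += label  # label is 1 (developed) or 0 (undeveloped)
    return {tile: developed[tile] / totals[tile] for tile in totals}


def fixed_threshold_map(fractions, threshold=0.5):
    """Label a tile developed when its fraction reaches the fixed threshold."""
    return {tile: f >= threshold for tile, f in fractions.items()}


def adaptive_threshold_map(fractions, developed_share=0.389):
    """Label the top developed_share of tiles (ranked by fraction) as developed."""
    n_developed = round(developed_share * len(fractions))
    ranked = sorted(fractions, key=fractions.get, reverse=True)
    chosen = set(ranked[:n_developed])
    return {tile: tile in chosen for tile in fractions}
```

The adaptive variant fixes the number of developed tiles rather than the cut-off value, so the implied threshold moves with the classifier's bias.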

20

Experiments (3)

• Results are qualitatively evaluated by visually comparing predicted maps with ground truth maps.

• Results are quantitatively evaluated using ground truth:
  – Binary classification: number of tiles with same label.
  – Fraction developed: correlation coefficient over tiles. Also, mean absolute difference (MAD) and root mean squared difference (RMSD).

• Quantitative results computed over 4,553 tiles for which there are both Flickr and Geograph images.
  – 38.9% of these tiles are developed in the ground truth, so a chance binary classification rate of 61.1% is achievable by labelling all tiles as undeveloped.
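The quantitative measures listed above can be written out directly. The sketch below is an illustration with hypothetical function names; it assumes the correlation coefficient is Pearson's, which the slides do not state explicitly, and that predictions and ground truth are given as equal-length lists over the same tiles.

```python
import math


def overall_rate(predicted, truth):
    """Fraction of tiles whose predicted binary label matches the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def pearson(x, y):
    """Pearson correlation coefficient (assumed to be the one meant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mad(x, y):
    """Mean absolute difference between predicted and true fractions."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def rmsd(x, y):
    """Root mean squared difference between predicted and true fractions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Chance baseline from the slide: label every tile undeveloped.
chance_rate = 1 - 0.389
```

Here `chance_rate` evaluates to about 0.611, the 61.1% baseline quoted above.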

21

Experiments (4)

• Manual vs. weakly-supervised labelling of training set.

• Effect of photographer intent.

• Relative importance of training vs. target set.

• Filtering out non-informative images.

• Training set size.

• Training set quality.

22

Results—Manually Labelled Training Set (1)

• Training set contains 2,740 Flickr images which have been manually labelled as depicting a scene that is developed or undeveloped.

• "Developed" means containing constructed materials such as those used in houses, buildings, etc.

23

Results—Manually Labelled Training Set (2)

[Figure panels: Ground Truth Maps; Maps Generated Using Flickr Images]

24

 

                                                       Binary Maps                               Fraction Maps
                                                       Overall Class. Rate  Avg. Class. Rate
Training Set     Target Set         Training Set Size  Fixed %  Adaptive %  Fixed %  Adaptive %  Corr.  MAD    RMSD
Manual (Flickr)  Flickr             2740 (0.51)        66.4     64.9        68.8     63          0.374  0.287  0.383

Values in parentheses after the training set size give the fraction of images labelled as developed in the training set.

• Performance is better than chance (61.1%).

Results—Manually Labelled Training Set (4)

25

• Labelled training set constructed in fully automated fashion:
  – Select 2 images at random from tiles with 4 or more images.
  – Label them with the majority label of the tile in the ground truth map.

Results—Weakly-Supervised Training (1)
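The weak labelling procedure on this slide can be sketched as below. This is an illustration with a hypothetical function name and made-up tile ids; it assumes images are grouped by tile id and that the ground truth binary map is available as a dictionary from tile id to a developed/undeveloped flag. The fixed seed only makes the example repeatable.

```python
import random


def weakly_label(images_by_tile, tile_is_developed, per_tile=2, min_images=4, seed=0):
    """Build a weakly labelled training set of (image, label) pairs.

    From every tile with at least min_images images, draw per_tile images at
    random and give them the tile's ground truth label. Tiles missing from
    the ground truth map are skipped.
    """
    rng = random.Random(seed)
    training = []
    for tile, images in sorted(images_by_tile.items()):
        if len(images) < min_images or tile not in tile_is_developed:
            continue
        for image in rng.sample(images, per_tile):
            training.append((image, tile_is_developed[tile]))
    return training
```

No image is inspected by a person in this procedure, so individual labels can be wrong, for example an indoor photo taken in a tile that is mostly countryside.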

26

Results—Weakly-Supervised Training (2)

 

                                                       Binary Maps                               Fraction Maps
                                                       Overall Class. Rate  Avg. Class. Rate
Training Set     Target Set         Training Set Size  Fixed %  Adaptive %  Fixed %  Adaptive %  Corr.  MAD    RMSD
Manual (Flickr)  Flickr             2740 (0.51)        66.4     64.9        68.8     63          0.374  0.287  0.383
Weakly (Flickr)  Flickr             5872 (0.52)        67.2     66.9        68.7     65.2        0.380  0.279  0.373

• Weakly-labelled training set outperforms manually-labelled one.
  – Suggests training sets can be generated from regions for which maps exist and then used to train classifiers for mapping unmapped regions.

27

Results—Photographer Intent (1)

• Compare Flickr vs. Geograph results.

28

[Figure panels: Ground Truth Maps; Maps Generated Using Flickr Images; Maps Generated Using Geograph Images]

Results—Photographer Intent (2)

29

Results—Photographer Intent (4)

 

                                                       Binary Maps                               Fraction Maps
                                                       Overall Class. Rate  Avg. Class. Rate
Training Set     Target Set         Training Set Size  Fixed %  Adaptive %  Fixed %  Adaptive %  Corr.  MAD    RMSD
Flickr           Flickr             5872 (0.52)        67.2     66.9        68.7     65.2        0.380  0.279  0.373
Geograph         Geograph           10576 (0.26)       68.2     74.0        60.8     72.6        0.520  0.271  0.358

• Photographer intent is a significant factor.

30

Results—Importance of Training vs. Target Set (1)

• Geograph training+target set outperforms Flickr training+target set.

• Investigate whether improvement is due to training or target set.

• Training and target sets from different collections.

31

 

                                                       Binary Maps                               Fraction Maps
                                                       Overall Class. Rate  Avg. Class. Rate
Training Set     Target Set         Training Set Size  Fixed %  Adaptive %  Fixed %  Adaptive %  Corr.  MAD    RMSD
Flickr_good      Flickr             5070 (0.49)        67.0     68.1        67.4     66.6        0.329  0.285  0.374
Geograph_good    Flickr             5603 (0.47)        60.7     68.3        53.8     66.6        0.330  0.294  0.381
Geograph_good    Geograph           5603 (0.47)        74.2     74.6        71.5     73.1        0.551  0.231  0.308
Flickr_good      Geograph           5070 (0.49)        69.9     73.1        71.5     71.7        0.496  0.254  0.331

• Photographer intent is more important for target than training set.

Results—Importance of Training vs. Target Set (2)

32

Results—Filtering Out Non-informative Images (1)

• Investigate whether removing images with faces improves results.

• Motivation: photographs of people are less likely to be geographically informative, especially close-in portraits.

33

Results—Filtering Out Non-informative Images (2)

 

                                                       Binary Maps                               Fraction Maps
                                                       Overall Class. Rate  Avg. Class. Rate
Training Set     Target Set         Training Set Size  Fixed %  Adaptive %  Fixed %  Adaptive %  Corr.  MAD    RMSD
Flickr           Flickr             5872 (0.52)        67.2     66.9        68.7     65.2        0.380  0.279  0.373
Flickr           Flickr no faces    5872 (0.52)        66.8     66.7        66.8     64.2        0.367  0.301  0.414
Geograph         Flickr             5603 (0.47)        60.7     68.3        53.8     66.6        0.330  0.294  0.381
Geograph         Flickr no faces    5603 (0.47)        59.9     68.0        52.0     65.2        0.312  0.321  0.428

• Filtering out images with faces from the target set does not result in improved performance.

34

Discussion (1)

• Demonstrated that georeferenced community-contributed photo collections can be considered as a form of VGI.

• Maps of developed/undeveloped regions automatically generated using Flickr and Geograph images shown to be similar to ground truth maps.
  – Despite simple image features.

35

Discussion (2)

• Weakly-labelled training set outperforms manually-labelled training set.
  – Clear benefits for training classifiers.

• Photographer intent is significant, especially for target set.
  – Restricts what can be used as target sets.
  – Poses interesting research challenges such as how to use the Geograph dataset to filter the “noisy” Flickr dataset.

• Initial results on filtering out images with faces inconclusive.

36

Extensions

• Improved image features.
  – Gist.

• Integrate textual annotations.
  – Flickr tags.
  – Geograph descriptive text.

• Additional land-cover/use classes.

• Spatial models:
  – Tobler’s first law of geography: all things are related, but nearby things are more related than distant things.

37

Come to our poster this afternoon

38

Thank you! Questions?

Acknowledgements:

• This work was funded in part by the following grants:
  – DOE Early Career Scientist and Engineer Award/PECASE
  – NSF 0917069: IIS Core

• Thanks to Nathan Graves for implementing the edge histogram descriptors.