Discovering Persistent Change Windows in Big Spatiotemporal Datasets: A Summary of Results

Xun Zhou, Shashi Shekhar, Dev Oliver

2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (BigSpatial 2013)

Nov. 5, 2013



Outline

• Motivation
• Problem Formulation
• Challenges
• Our Contribution
• Novelty
• Validation


Motivation (1)

• Understanding climate and environmental changes
  – A global challenge: where, when, how, why?
  – Deforestation: forest is logged at a certain rate
  – Desertification: grassland turns into desert
  – Urban changes: city sprawl, irrigation (vegetation increase)
• Detecting changes: an essential step
  – Where and when

[Images: desertification, deforestation, urban sprawl]


Motivation (2)

• Big data for climate and earth science
  – Land cover data at various resolutions: MODIS, Landsat, etc.
  – Helps domain scientists find potential regions of interest: desertification, deforestation, urban sprawl…
  – Google time lapse: Amazon deforestation [1]
• Our goal:
  – Find a spatial window and a time period where the data value (e.g., vegetation cover) changes at a sufficiently high rate

[Figure: time-lapse frames from 1984, 1998, and 2012]

[1] Google Earth Engine, https://earthengine.google.org/#intro/


Problem Formulation: Basic Concepts

• Spatiotemporal (ST) windows
  – A spatial field S, where each location $s_i$ has a time series of length $|T|$
  – Spatial window: a rectangular area in S
  – ST window: a pair <spatial window $S_j$, time interval $T_j$>
• Spatially aggregated time series
  – For a spatial window $S_j$: $TS_j = \{\sum_{s_i \in S_j} x(s_i, 1),\ \sum_{s_i \in S_j} x(s_i, 2),\ \ldots,\ \sum_{s_i \in S_j} x(s_i, |T|)\}$
  – $x(s_i, 1), x(s_i, 2), \ldots$ are the values at location $s_i$ at times $1, 2, \ldots, |T|$
  – SUM can be replaced by AVG, etc.
• Average change rate (ACR)
  – For an ST window with $T_j = [t_1, t_n]$: $\mathrm{ACR}(S_j, T_j) = \dfrac{TS_j(t_1) - TS_j(t_n)}{TS_j(t_1)\,(t_n - t_1)}$
• Persistent Change Window (PCW): an ST window with ACR ≥ threshold
• Example: “Total (average) vegetation cover in an area changes at an average rate of … in a few years”
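The definitions above can be sketched in a few lines of Python. This is only an illustration: the array layout `(M, N, T)`, the 0-based inclusive window bounds, and the function names `aggregated_series` and `acr` are assumptions, not part of the slides. The sign convention follows the ACR formula as written on the slide, so a decline in the aggregated value yields a positive rate.

```python
import numpy as np

def aggregated_series(data, x1, x2, y1, y2):
    """Spatially aggregated time series (SUM) of a rectangular window.

    `data` is assumed to have shape (M, N, T); bounds are 0-based and inclusive.
    """
    return data[x1:x2 + 1, y1:y2 + 1, :].sum(axis=(0, 1))

def acr(ts, t1, tn):
    """Average change rate over [t1, tn], as written on the slide:
    (TS(t1) - TS(tn)) / TS(t1) / (tn - t1)."""
    return (ts[t1] - ts[tn]) / ts[t1] / (tn - t1)

# Toy field: 2 x 2 locations, 3 time steps, every location goes 10 -> 8 -> 6.
data = np.tile(np.array([10.0, 8.0, 6.0]), (2, 2, 1))
ts = aggregated_series(data, 0, 1, 0, 1)   # sums over all four locations
rate = acr(ts, 0, 2)                       # (40 - 24) / 40 / 2
```

Swapping `sum` for `mean` in `aggregated_series` gives the AVG variant mentioned on the slide.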


Problem statement

• Given:
  – A spatial time series with |S| = M x N locations and |T| time steps
  – A threshold r on the average change rate (ACR)
  – Minimum window size Smin and minimum time length Tmin
• Find:
  – All ST persistent change windows <Si, Ti> where ACR(Si, Ti) ≥ r
• Objective:
  – Reduce computational cost
• Constraints:
  – |Si| ≥ Smin and |Ti| ≥ Tmin
  – <Si, Ti> is not a subset of any other reported window <S’, T’>, i.e., no reported window has Si ⊆ S’ and Ti ⊆ T’
  – Completeness and correctness


Examples

• Threshold: 15%, Smin = 6, Tmin = 2
• Red box (3x3) for T = [1, 4]: ACR = 16.5%
• Yellow box (2x4) for T = [3, 4]: ACR = 14.5%
• Output: <Red-box, [1, 4]>


Challenges

• Large number of candidates (big combinatorics)
  – $M^2 \times N^2 \times T^2$ candidate patterns (M x N locations, T time steps)
• Patterns lack monotonicity
  – A temporal pattern may contain non-interesting parts
  – Sub-regions in a window may be non-interesting
• Large dataset
  – 250m MODIS tile: 4800 by 4800 pixels and 250 snapshots
  – Hundreds of such tiles in the dataset
  – Terabyte data volume
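To get a feel for the combinatorics, the number of ST windows can be counted exactly. The sketch below assumes a window is any axis-aligned rectangle combined with any interval [t1, tn] with t1 < tn; the function name `count_st_windows` is hypothetical. The count grows as M² x N² x T², matching the estimate on the slide.

```python
def count_st_windows(m, n, t):
    """Number of <rectangle, interval> pairs in an m x n x t dataset.

    Rectangles: choose two (possibly equal) row bounds and two column bounds.
    Intervals: choose t1 < tn, since the ACR divides by (tn - t1).
    """
    rectangles = (m * (m + 1) // 2) * (n * (n + 1) // 2)
    intervals = t * (t - 1) // 2
    return rectangles * intervals

small = count_st_windows(3, 3, 3)          # 6 * 6 * 3 windows
one_tile = count_st_windows(4800, 4800, 250)
print(f"{one_tile:.2e} candidate windows in a single MODIS tile")
```

Under these assumptions a single 4800 x 4800 x 250 tile already has on the order of 10^18 candidate windows, which is why exhaustive enumeration is impractical.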


Contributions

• Formulate the Persistent Change Window (PCW) discovery problem
• An ST window enumeration and pruning (SWEP) approach
• Theoretical analysis: correctness, completeness, and space/time complexity
• Case study on MODIS NDVI data
• Experiments: scalability w.r.t. data volume and input parameters


Related work

Spatiotemporal change pattern discovery:

• Time point (CUSUM [2]) or interval [7] in a single time series
• Local/zonal change across a few snapshots (image differencing, object-based change detection [5, 6])
• Zonal change at a time point (ST scan statistics [3, 4])
• Persistent (arbitrarily long interval in a long time series) zonal change (our work)
• Other footprints


Baseline solution: Naïve approach

• Two-step framework (N, M = sides of the spatial field, T = number of time steps)
  – Step 1:
    • Enumerate all pairs of {spatial window, time interval} and generate the aggregated time series for each window
    • Find interesting intervals for each spatial window and add them to the candidate set
    • $O(N^3 \times M^3 \times T^3)$
  – Step 2:
    • For each window-interval pair (S, T) in the candidate set, prune all pairs that are dominated by it
    • $O(k^2)$, where k is the total number of candidates from Step 1
    • $k = O(N^2 \times M^2 \times T^2)$ in the worst case
  – Time complexity:
    • $O((M \times N \times T)^4)$ in the worst case
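The two-step baseline can be sketched as a brute-force search followed by a pairwise dominance check. This is a minimal illustration, not the authors' implementation: the `(M, N, T)` array layout, 0-based inclusive bounds, the interpretation of |T| as the number of time steps, and the names `naive_pcw` and `dominated` are all assumptions. Values are assumed to be positive so that the ACR is well defined.

```python
import itertools
import numpy as np

def dominated(a, b):
    """True if window a is a proper sub-window of b in space and time."""
    (ax1, ax2, ay1, ay2), (at1, atn) = a
    (bx1, bx2, by1, by2), (bt1, btn) = b
    return (a != b and bx1 <= ax1 and ax2 <= bx2 and by1 <= ay1
            and ay2 <= by2 and bt1 <= at1 and atn <= btn)

def naive_pcw(data, r, s_min=1, t_min=2):
    m, n, t = data.shape
    spans = lambda size: itertools.combinations_with_replacement(range(size), 2)
    candidates = []
    # Step 1: enumerate every <spatial window, time interval> pair.
    for (x1, x2), (y1, y2) in itertools.product(spans(m), spans(n)):
        if (x2 - x1 + 1) * (y2 - y1 + 1) < s_min:
            continue
        ts = data[x1:x2 + 1, y1:y2 + 1, :].sum(axis=(0, 1))
        for t1, tn in itertools.combinations(range(t), 2):
            rate = (ts[t1] - ts[tn]) / ts[t1] / (tn - t1)
            if tn - t1 + 1 >= t_min and rate >= r:
                candidates.append(((x1, x2, y1, y2), (t1, tn)))
    # Step 2: drop every candidate dominated by another candidate, O(k^2).
    return [a for a in candidates
            if not any(dominated(a, b) for b in candidates)]

# One row, two locations: the first declines 10 -> 8 -> 6, the second is flat.
data = np.array([[[10.0, 8.0, 6.0], [10.0, 10.0, 10.0]]])
result = naive_pcw(data, r=0.15)
```

In this toy input only the declining location qualifies, and its shorter intervals are pruned in Step 2 because the full interval dominates them.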


An ST Window Enumeration-n-Pruning approach (SWEP)

– Step 0:
  • Scan all windows with left-top corner (1,1) and build a lookup table for all spatial windows
  • O(M x N x T) time cost, O(M x N x T) memory cost
– Step 1:
  • Two-level BFS enumeration of all ST windows
  • Outer loop: enumerate all left-bottom-near (LBN) locations starting from (1,1,1)
    – Find the enumeration space for the current LBN using the record
    – Inner loop: enumerate all “valid” right-top-far (RTF) locations for each LBN
    – Record all PCWs found in this iteration
– Step 2:
  • No refinement step is needed: no dominated ST window is generated.


Step 0: Window sum lookup table

[Figure: regions A, B, C, and D measured from (1,1), and the target area]

SUM(Target area) = D – B – C + A

Lookup table at T = 4:

x\y    1    2    3    4
1      3    9   15   24
2      7   19   31   50
3     13   33   53   82
4     23   53   82  121
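A lookup table of this kind is commonly built as a 2-D prefix sum (summed-area table) per time step. The sketch below assumes that construction; the names `build_lookup` and `window_series`, the `(M, N, T)` layout, and the extra zero row and column are illustrative choices rather than details given on the slide.

```python
import numpy as np

def build_lookup(data):
    """Prefix sums per time step: table[i, j, t] = sum of data[:i, :j, t].

    A leading zero row/column avoids special cases at the border.
    Time and memory are O(M x N x T).
    """
    m, n, t = data.shape
    table = np.zeros((m + 1, n + 1, t), dtype=data.dtype)
    table[1:, 1:, :] = data.cumsum(axis=0).cumsum(axis=1)
    return table

def window_series(table, x1, x2, y1, y2):
    """Aggregated series of a window (0-based, inclusive) as D - B - C + A."""
    d = table[x2 + 1, y2 + 1]
    b = table[x1, y2 + 1]
    c = table[x2 + 1, y1]
    a = table[x1, y1]
    return d - b - c + a

data = np.arange(24).reshape(2, 3, 4)
table = build_lookup(data)
series = window_series(table, 1, 1, 1, 2)   # row 1, columns 1..2, all time steps
```

After the one-time scan, each window sum costs four table reads per time step instead of a pass over every location in the window.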


Step 1: Two-level Enumeration (1)

• Enumerate 3-D ST windows in the dataset using two corner locations
• BFS on the Left-bottom-near (LBN) and Right-top-far (RTF) locations
• Avoid visiting dominated ST windows
• Challenge: record discovered PCWs for later pruning
  – For each LBN, record the discovered PCWs
  – For later LBNs, skip RTF locations inside these PCWs
  – Example: W1 = <LBN1, RTF1> is a PCW. For LBN2, we don’t need to test RTFs inside W1.

[Figures: LBN and RTF representation of a window; enumeration of LBN locations; enumeration of RTF for each LBN; the PCW W1]
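The enumeration order described above can be illustrated with a simplified sketch. It is not the SWEP implementation: it visits LBNs level by level from the near corner and RTFs level by level from the far corner, and it replaces the recorded per-LBN spaces with a plain scan over the PCWs found so far. The names `enumerate_pcws`, `corners_by_level`, and `contains`, the 0-based `(x, y, t)` coordinates, and the direct window sums are assumptions. Because any window that dominates another is visited first in this order, a window that survives the containment check and passes the ACR test is not dominated.

```python
import itertools
import numpy as np

def corners_by_level(shape, reverse=False):
    """Grid locations ordered by coordinate sum (BFS levels from a corner)."""
    points = itertools.product(*(range(d) for d in shape))
    return sorted(points, key=sum, reverse=reverse)

def contains(outer, inner):
    """True if ST window `inner` lies inside ST window `outer`."""
    (o_lbn, o_rtf), (i_lbn, i_rtf) = outer, inner
    return (all(a <= b for a, b in zip(o_lbn, i_lbn))
            and all(a >= b for a, b in zip(o_rtf, i_rtf)))

def enumerate_pcws(data, r, s_min=1, t_min=2):
    found = []
    for lbn in corners_by_level(data.shape):                     # outer loop
        for rtf in corners_by_level(data.shape, reverse=True):   # inner loop
            (x1, y1, t1), (x2, y2, tn) = lbn, rtf
            if x2 < x1 or y2 < y1 or tn <= t1:
                continue                      # not a valid window
            if (x2 - x1 + 1) * (y2 - y1 + 1) < s_min or tn - t1 + 1 < t_min:
                continue
            window = (lbn, rtf)
            if any(contains(w, window) for w in found):
                continue                      # inside a known PCW: skip the test
            first = data[x1:x2 + 1, y1:y2 + 1, t1].sum()
            last = data[x1:x2 + 1, y1:y2 + 1, tn].sum()
            if (first - last) / first / (tn - t1) >= r:
                found.append(window)
    return found

# One row, two locations: the first declines 10 -> 8 -> 6, the second is flat.
data = np.array([[[10.0, 8.0, 6.0], [10.0, 10.0, 10.0]]])
pcws = enumerate_pcws(data, r=0.15)
```

No separate refinement pass is needed here, which mirrors the claim on the SWEP overview slide.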


Step 1: Two-level enumeration (2)

• A six-dimensional enumeration space

[Figure: lattice of ST windows <LBN, RTF> for a 3 x 3 x 3 example. The root is <(1,1,1), (3,3,3)>. Its children shrink the window by one step in one dimension: <(1,1,1), (3,3,2)>, <(1,1,1), (3,2,3)>, <(1,1,1), (2,3,3)>, <(1,1,2), (3,3,3)>, <(1,2,1), (3,3,3)>, <(2,1,1), (3,3,3)>. Deeper levels continue in the same way, e.g., <(1,1,1), (3,3,1)>, <(1,1,2), (3,3,2)>, <(1,1,3), (3,3,3)>.]
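The parent-child relation in this lattice can be written down directly: a child is obtained by moving one coordinate of the RTF corner inward or one coordinate of the LBN corner inward. The sketch below uses the 1-based coordinates of the slide; the function name `children` is hypothetical.

```python
def children(window):
    """Windows one step smaller than `window` = (lbn, rtf) in a single dimension."""
    lbn, rtf = window
    result = []
    for d in range(3):                     # shrink from the far corner
        if rtf[d] - 1 >= lbn[d]:
            shrunk = list(rtf)
            shrunk[d] -= 1
            result.append((lbn, tuple(shrunk)))
    for d in range(3):                     # shrink from the near corner
        if lbn[d] + 1 <= rtf[d]:
            moved = list(lbn)
            moved[d] += 1
            result.append((tuple(moved), rtf))
    return result

root = ((1, 1, 1), (3, 3, 3))
first_level = children(root)               # the six children listed in the figure
```

Applying `children` repeatedly and removing duplicates reproduces the level-by-level (BFS) order over the six coordinates of <LBN, RTF>.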


Evaluations

• Theoretical
  – Correct
  – Complete
  – Time & space complexity
• Case study
  – Land cover data: MODIS 250m NDVI data
• Experimental evaluation
  – Vary data volume (with fixed time length)
  – Vary data volume (with fixed number of locations)
  – Vary the location of the pattern in the search space


Theoretical analysis

• The SWEP algorithm is correct
• The SWEP algorithm is complete
• Space/time complexity (k = M x N x T)

[Figure: SWEP and Naive placed on a complexity scale from $O(k)$ to $O(k^4)$, once for the best scenario and once for the worst scenario]


Case study

• Initial results
• MODIS 250m NDVI data (16 days)
• Time: 2000-2012. Annual: July 27/28 of each year.

[Figure: study area; irrigation in Saudi Arabia, shown by Google time lapse [1], for 2001, 2006, and 2012]

[Figure: results of the proposed algorithm with average change rate >= 10% (outlined window), for 2001, 2006, and 2012]

[Figure: average NDVI in the outlined window; an annual increase of 11.5%, 2001-2012]


Experiments

• Questions:
  – What is the impact of the data volume on run-time?
  – What is the impact of the pattern size on run-time?
• Synthetic data
  – Data volume (area size, time length)
  – Pattern size (pattern volume ratio, PVR)
    • PVR = max pattern volume / ST data volume
• Settings:
  – Matlab 2013 under Linux
  – HP ProLiant BL280c G6 blade servers, with a quad-core 2.8 GHz Intel Xeon X5560 processor and 24 GB shared memory


Impact of Varying Dataset Size

[Figure: four panels plotted against data volume (# values); axis ticks run from 2000 to 50000 in two panels and from 25000 to 125000 in the other two]

(1) Fixed PVR = 0.1 (worst case), varying data volume with fixed T = 20

(2) Fixed PVR = 0.95 (best case), varying data volume with fixed T = 20

(3) Fixed PVR = 0.1 (worst case), varying data volume with fixed |S| = 2500

(4) Fixed PVR = 0.95 (best case), varying data volume with fixed |S| = 2500


Impact of Varying Pattern Size

(1) Fixed M = N = T, 125000 total data points, varying PVR from 0.1 (worst case) to 1 (best case)

Summary: SWEP is orders of magnitude faster than the naïve algorithm across (1) data volumes and (2) pattern sizes.


Conclusion and Future work

• The PCW discovery problem is defined
• A space-time window enumeration and pruning (SWEP) approach is proposed to mine PCW patterns
• Correct, complete, and faster than the naïve approach
• The case study gives a preliminary indication of usefulness
• Future work
  – Accelerate the approach using parallel computing (e.g., CUDA)
  – Improve the SWEP algorithm (e.g., multi-resolution enumeration)
  – More case studies on remote sensing datasets (e.g., Amazon deforestation) to compare with known results (e.g., Google time lapse)


Acknowledgements & References

• Acknowledgements
  – NSF, USDOD for funding projects
  – Minnesota Supercomputing Institute (MSI)
  – Spatial DB & DM group @ UMN

References

[1] Google Earth Engine (https://earthengine.google.org/#intro/)

[2] Basseville, Michele, and Igor V. Nikiforov. "Detection of abrupt changes: theory and applications." Journal of the Royal Statistical Society-Series A Statistics in Society 158.1 (1995): 185.

[3] Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics-Theory and Methods, 26(6), 1481-1496.

[4] M. Kulldorff. Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society: Series A (Statistics in Society), 164(1):61-72, 2001.

[5] Coppin, Pol, et al. "Review Article Digital change detection methods in ecosystem monitoring: a review." International Journal of Remote Sensing 25.9 (2004): 1565-1596.

[6] A. Singh. Review article digital change detection techniques using remotely-sensed data. International Journal of Remote Sensing, 10(6):989-1003, 1989.

[7] X. Zhou, S. Shekhar, P. Mohan, S. Liess, and P. K. Snyder. Discovering interesting sub-paths in spatiotemporal datasets: A summary of results. In 19th ACM SIGSPATIAL GIS, pages 44-53. ACM, 2011.


Step 1: Two-level enumeration (2)

• Find the space to enumerate in each round
  – Skip any location that
    • Falls into the union of existing PCWs
  – “Covered space” of an LBN
    • The minimum set of RTF locations to traverse for each LBN
    • The “covered space” of an LBN is a subset of the “covered space” of its predecessors
  – The space to traverse for each LBN is
    • The intersection of the covered spaces of all its direct parents

[Figure: enumeration space from (1,1,1) to (3,3,3), with a PCW and the location (2,2,3) marked]
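One way to picture the covered space is as a 3-D Boolean map over RTF locations, where True means "still has to be traversed". The sketch below is an assumption-laden illustration: the names `mark_pcw` and `traversal_space`, the 0-based indexing, and the rule that a PCW found at a parent removes every RTF at or below its own RTF are this sketch's reading of the slide, not a statement of the authors' data structure.

```python
import numpy as np

def mark_pcw(traversal_map, rtf):
    """Return a copy of the map with all RTFs inside a PCW ending at `rtf` removed."""
    x, y, t = rtf
    updated = traversal_map.copy()
    updated[:x + 1, :y + 1, :t + 1] = False
    return updated

def traversal_space(parent_maps):
    """Space to traverse for an LBN: intersection of its direct parents' maps."""
    return np.logical_and.reduce(parent_maps)

shape = (3, 3, 3)
parent_a = mark_pcw(np.ones(shape, dtype=bool), (1, 1, 2))  # PCW up to (2,2,3) in 1-based terms
parent_b = np.ones(shape, dtype=bool)                       # this parent found no PCW
child_space = traversal_space([parent_a, parent_b])
```

Because the intersection can only clear entries, the space of a child is always a subset of the space of each parent, matching the subset property stated above.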


Step 1: Two-level enumeration (3)

• Record the traversal space of each LBN location
  – Intersection of the covered spaces of all the parents
  – Store all “covered spaces” as a list of “3-D Boolean maps”
  – Use a pointer array to link each LBN with a “map”
  – Merge duplicate “maps”

[Figure: list of covered spaces (Map1, Map2, Map3, …, Map k) and a pointer array linking LBN1 to LBN4 to maps such as Map1 and Map2]
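The pointer-array idea can be sketched as follows: identical Boolean maps are stored once, and each LBN keeps only an index into the list of unique maps. The function name `link_maps` and the use of the raw bytes of the array as a dictionary key are assumptions made for this illustration.

```python
import numpy as np

def link_maps(maps_per_lbn):
    """Deduplicate Boolean maps; return (unique_maps, pointers).

    pointers[i] is the index in unique_maps of the map used by the i-th LBN.
    """
    unique_maps, pointers, index_of = [], [], {}
    for m in maps_per_lbn:
        key = (m.shape, m.tobytes())        # identical content gives an identical key
        if key not in index_of:
            index_of[key] = len(unique_maps)
            unique_maps.append(m)
        pointers.append(index_of[key])
    return unique_maps, pointers

full = np.ones((3, 3, 3), dtype=bool)
pruned = full.copy()
pruned[:2, :2, :3] = False                  # RTFs removed by one PCW
unique_maps, pointers = link_maps([full, pruned, pruned.copy()])
```

In this example the second and third LBN share one stored map, so three LBNs need only two maps in memory.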


Theoretical analysis

• The SWEP algorithm is correct
• The SWEP algorithm is complete
• Space/time complexity

                     Naïve              SWEP
Time complexity
  Best case          $O(M^3 N^3 T^2)$   $O(MNT)$
  Worst case         $O(M^4 N^4 T^4)$   $O(M^2 N^2 T^2)$
Space complexity
  Best case          $O(M^3 N^3 T^2)$   $O(MNT)$
  Worst case         $O(M^4 N^4 T^4)$   $O(M^2 N^2 T^2)$