
Gaussian Process Models of Spatial Aggregation Algorithms

Naren Ramakrishnan, Virginia Tech Computer Science

http://people.cs.vt.edu/~ramakris/

Chris Bailey-Kellogg, Purdue Computer Sciences

http://www.cs.purdue.edu/homes/cbk/

Big Picture

Spatial Aggregation: generic mechanism for spatial data mining, parameterized by domain knowledge.

[Figure: Spatial Aggregation pipeline. Labels: Input Field, Lower-Level Objects, Higher-Level Objects, Abstract Description, Spatial objects, N-graph, Equivalence classes, Ambiguities. Operations: Aggregate, Classify, Redescribe, Localize, Sample, Interpolate.]

Gaussian Processes: generic framework for spatial statistical modeling, parameterized by covariance structure.

SA+GP: model the mining mechanism for meta-level reasoning, e.g. targeting samples and characterizing sensitivity to parameters and inputs.

Example: Wireless System Configuration

Optimize performance (e.g. signal-to-noise, bit error probability) of wireless system configuration (e.g. distance between antennae).

Simulate across range of configurations (hours to days per simulation).

[Figure: configuration space, SNR1 (dB) vs. SNR2 (dB), both axes from 10 to 40.]

Aggregate structures in configuration space.

In shaded region, 99% confidence that average error is acceptable.

Analyze structures to characterize performance.

Configs in upper right less sensitive to power imbalance (region width).

General Features

Problem: scarce spatial data mining in physical domains

• Expensive data collection. Much implicit but little explicit data.

• Control over data collection.

• Available physical knowledge: continuity, locality, symmetry, etc.

Approach: multi-level qualitative analysis

• Exploit domain knowledge to uncover qualitative structures in data.

• Sample optimization driven by model selection: maximize expected information gain, minimize expense, ...

• Decisions explainable in terms of problem structures & physical knowledge.

Mining Mechanism: Spatial Aggregation (SA)

Local operations for finding multi-level structures in spatial data.

• Input: numerical field.
Ex: weather maps, numerical simulation output.

• Output: high-level description of structure, behavior, and design.
Ex: fronts, stability regions in dynamical systems.

• Bridge quantitative ↔ qualitative via increasingly abstract structural descriptions.

• Key domain knowledge: locality in domain, similarity in feature.

[Figure: Spatial Aggregation pipeline diagram (Input Field, Aggregate, Classify, Redescribe, Localize, Sample, Interpolate).]

Spatial Aggregation Example

Goal: find flows in vector field (e.g. wind velocity, temp. gradient).

(a) Input

(b) Localize (distance < r)

(c) Test similarity (angle < θ)

(d) Select succ (d · distance + angle)

(e) Select pred (d · distance + angle)

(f) Redescribe (points ↦ curve)

(g) Bundle curves by higher-level locality, similarity
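Steps (b) through (d) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAL implementation: the function names are hypothetical, "angle" is assumed to mean the angle in radians between the vectors at the two points, and successors are assumed to lie downstream of the current point.

```python
import numpy as np

def angle_between(u, v):
    """Unsigned angle in radians between two nonzero vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_successors(points, vectors, r, theta, d):
    """Return succ[i] = index of the chosen successor of point i, or -1 if none."""
    n = len(points)
    succ = np.full(n, -1, dtype=int)
    for i in range(n):
        best_score = np.inf
        for j in range(n):
            if j == i:
                continue
            step = points[j] - points[i]
            dist = np.linalg.norm(step)
            if dist >= r:                       # (b) localize: distance < r
                continue
            if np.dot(step, vectors[i]) <= 0:   # assumption: successor lies downstream
                continue
            ang = angle_between(vectors[i], vectors[j])
            if ang >= theta:                    # (c) similarity: angle < theta
                continue
            score = d * dist + ang              # (d) score: d * distance + angle
            if score < best_score:
                best_score, succ[i] = score, j
    return succ

# Three collinear samples, all flowing in the +x direction.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
vectors = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
succ = select_successors(points, vectors, r=1.5, theta=0.5, d=0.05)
```

Following `succ` links from a point with no predecessor traces out one flow curve, which is the input to the redescribe step (f).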

Reasoning About SA Applications

• Sensitivity to input?

• Sensitivity to parameters (locality, similarity metrics)?

• Optimization of additional samples?

Approach: probabilistic model of spatial relationships, in terms of Gaussian Processes.

[Figure: Spatial Aggregation pipeline diagram ↔ probabilistic (Gaussian Process) model.]

Gaussian Processes: Intuition

• 1D version of vector flow analysis:

[Figure: gradient vs. x, for x from 0 to 20; gradient axis from −3 to 3.]

Qualitative structure: same-direction flow.

• Regression: given angles at some sample points, predict at new, unobserved points.

[Figure: vector angle (values or distributions), in radians, vs. x, for x from 0 to 20; angle axis from −3 to 3.]

Gaussian conditional distribution; covariance structure captures locality.

• Classification: apply logistic (higher-D: softmax) function to estimate latent variable representing class:

[Figure: three panels. Gradient vs. x (x from 0 to 20) ⊗ logistic function on [−10, 10] with values from 0 to 1 ↦ curve over x from 0 to 20 with values between 0 and 1.]
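The squashing step in the classification bullet can be illustrated as follows. This is a minimal sketch with hypothetical function names; it only shows how a real-valued latent variable is mapped to class probabilities, not how the latent variable itself is inferred.

```python
import numpy as np

def logistic(z):
    """Map a real-valued latent variable to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(Z):
    """Multi-class analogue: each row of latent values becomes a probability vector."""
    Z = Z - Z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(Z)
    return e / e.sum(axis=-1, keepdims=True)

latent = np.linspace(-10.0, 10.0, 5)
two_class = softmax(np.stack([latent, np.zeros_like(latent)], axis=-1))
```

With two classes and the second latent value fixed at 0, the first softmax column coincides with the logistic function, which is why the 1D case needs only the logistic.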

GP as Spatial Interpolation (Kriging)

• Given set of observations $\{(x_1, y_1), \ldots, (x_k, y_k)\}$ (vector angles at positions), want to model $y = f(x)$.

• Possible form $f(x) = \alpha + Z(x)$.

• Model Z with Gaussian: mean 0, covariance $\sigma^2 R$.

• Key: structure of R captures neighborhood relationships among samples.

Ex: $R(x_i, x_j) = e^{-\rho |x_i - x_j|^2}$

[Figure: two panels of the correlation function on [−10, 10], with values from 0 to 1, for ρ = 0.1 and ρ = 1.]

Note: exact interpolation at data points.

• Optimize parameters given observations, to estimate $f'$.
Ex: minimize mean squared error $E\{(f' - f)^2\}$:

$$\max_{\rho} \left( -\frac{k}{2} \left( \ln \sigma^2 + \ln |R| \right) \right)$$

where R is the k × k symmetric correlation matrix with entries $R(x_i, x_j)$.

• One-D optimization straightforward; higher-D requires MCMC.

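• Once optimized, prediction for $x_{k+1}$ is easy, based on correlation to samples:

$$f'(x_{k+1}) = \hat{\alpha} + r^T(x_{k+1}) R^{-1} (y - \hat{\alpha} I_k)$$

r is correlation vector for $x_{k+1}$ vs. sample points.
$\hat{\alpha}$ estimates α: $\hat{\alpha} = (I_k^T R^{-1} I_k)^{-1} I_k^T R^{-1} y$.

Then the estimate’s variance is

$$\hat{\sigma}^2 = \frac{(y - \hat{\alpha} I_k)^T R^{-1} (y - \hat{\alpha} I_k)}{k}$$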
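The kriging formulas on this slide translate almost directly into NumPy. The sketch below makes several assumptions: inputs are 1-D, $I_k$ is read as the k-vector of ones, the estimated variance is plugged in for $\sigma^2$ in the objective, and ρ is chosen by a coarse grid search rather than any particular optimizer. The function names and the sample data are illustrative only.

```python
import numpy as np

def corr(xa, xb, rho):
    """Gaussian correlation exp(-rho * |x_i - x_j|^2) between two sets of 1-D points."""
    return np.exp(-rho * (xa[:, None] - xb[None, :]) ** 2)

def fit(x, y, rho):
    """Return (alpha_hat, sigma2_hat, objective) for a fixed rho."""
    k = len(x)
    R = corr(x, x, rho)
    ones = np.ones(k)                     # assumption: I_k is the k-vector of ones
    alpha = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    resid = y - alpha * ones
    sigma2 = resid @ np.linalg.solve(R, resid) / k
    _, logdet = np.linalg.slogdet(R)      # ln |R|, computed stably
    objective = -(k / 2.0) * (np.log(sigma2) + logdet)
    return alpha, sigma2, objective

def predict(x, y, rho, x_new):
    """Evaluate alpha_hat + r^T R^{-1} (y - alpha_hat * I_k) at each point of x_new."""
    alpha, _, _ = fit(x, y, rho)
    r = corr(np.atleast_1d(x_new), x, rho)          # one row of correlations per new point
    return alpha + r @ np.linalg.solve(corr(x, x, rho), y - alpha)

# Illustrative data only: a smooth signal sampled at six positions.
x = np.array([0.0, 2.0, 5.0, 9.0, 14.0, 20.0])
y = np.sin(x / 3.0)
rho_grid = [0.05, 0.1, 0.5, 1.0]
best_rho = max(rho_grid, key=lambda rho: fit(x, y, rho)[2])
at_samples = predict(x, y, 0.1, x)                  # reproduces y (exact interpolation)
```

Because r equals a row of R when the new point is a sample point, the predictor returns the observed value there, matching the slide's note on exact interpolation.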

Gaussian Processes in General


Keys:

• Bayesian modeling, with prior directly on function space.

• Generalize Gaussian distribution over finite vectors to one over functions, using mean and covariance functions.

• Fully specified by distributions on finite sample sets, so still only perform nice matrix operations.
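As a concrete illustration of the last point, drawing functions from a GP prior at a finite set of points needs only a covariance matrix and a Cholesky factorization. This sketch assumes a zero mean function, the squared-exponential covariance used elsewhere in these slides, and a small diagonal jitter added for numerical stability; the function name and defaults are hypothetical.

```python
import numpy as np

def sample_gp_prior(x, rho, n_draws, seed=0, jitter=1e-6):
    """Draw n_draws zero-mean GP sample paths evaluated at the 1-D points x."""
    rng = np.random.default_rng(seed)
    K = np.exp(-rho * (x[:, None] - x[None, :]) ** 2)   # covariance on the finite set
    L = np.linalg.cholesky(K + jitter * np.eye(len(x))) # jitter keeps K positive definite
    z = rng.standard_normal((len(x), n_draws))
    return (L @ z).T                                    # each row is one sampled function

x_grid = np.linspace(-3.0, 3.0, 25)
draws = sample_gp_prior(x_grid, rho=1.0, n_draws=3)
```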

Related Work:

• Rasmussen: unifying framework for multivariate regression.

• Williams and Barber: classification.

• MacKay: pattern recognition.

• Neal: model for neural networks.

• Sacks: model deterministic computer experiments with stochastic processes.

Multi-Layer GP

• SAL programs repeatedly aggregate/classify/redescribe, up an abstraction hierarchy. ↦ Sequence of GP models, each with covariance; superpose for composite.

• Input data field: interpolated surrogate for sparse samples.

• Locality (neighborhood graph: “close enough”) modeled by

$$R(x^{(k)}, x^{(l)}) = \zeta \prod_{i=1}^{n} e^{-\rho_i \left| x_i^{(k)} - x_i^{(l)} \right|^{\eta}}$$

• Similarity in feature (equivalence predicate: “good-direction flow”) only applicable when combined with locality.
⇒ Combined hyperparameters for position and direction.
Hierarchical prior allows for determination of relative importance.
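The product-form correlation above, with one $\rho_i$ per input dimension, can be computed for all pairs of points at once. This is a minimal sketch; the function name, the default values of ζ and η, and the sample inputs are illustrative assumptions, and the hierarchical prior over the hyperparameters is not modeled here.

```python
import numpy as np

def product_corr(X, rho, zeta=1.0, eta=2.0):
    """Pairwise zeta * prod_i exp(-rho_i * |x_i^(k) - x_i^(l)|^eta) for rows of X."""
    diff = np.abs(X[:, None, :] - X[None, :, :])       # shape (m, m, n)
    # A product of exponentials equals the exponential of the summed exponents.
    return zeta * np.exp(-(rho * diff ** eta).sum(axis=-1))

# Two points in 2-D with a different length scale per dimension.
X = np.array([[0.0, 0.0], [1.0, 2.0]])
rho = np.array([0.5, 0.25])
R = product_corr(X, rho)
```

A larger $\rho_i$ makes correlation fall off faster along dimension i, so the fitted values indicate the relative importance of each dimension.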

Case Study: Pocket Identification

Abstract wireless problem with de Boor “pocket” function.

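$$\alpha(X) = \cos\left( \sum_{i=1}^{n} 2^i \left( 1 + \frac{x_i}{|x_i|} \right) \right) - 2$$

$$\delta(X) = \| X - 0.5 I \|$$

$$p(X) = \alpha(X) \left( 1 - \delta^2(X) \left( 3 - 2\delta(X) \right) \right) + 1$$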

[Figure: surface plot of the pocket function over [−1, 1] × [−1, 1]; vertical axis from −2 to 1.]

Goal: identify number & locations of pockets (not func. approx.), with minimal # samples.
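For experimentation, the pocket function can be evaluated with a literal transcription of the three formulas on this slide. The sketch assumes the sum uses powers $2^i$, that $0.5I$ denotes the vector with every component 0.5, and that $x_i / |x_i|$ is taken as +1 at $x_i = 0$; the function name is hypothetical.

```python
import numpy as np

def pocket(X):
    """Evaluate p(X) for points whose coordinates lie along the last axis of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[-1]
    i = np.arange(1, n + 1)
    sign = np.where(X >= 0, 1.0, -1.0)          # assumption: x_i / |x_i| = +1 at x_i = 0
    alpha = np.cos(np.sum(2.0 ** i * (1.0 + sign), axis=-1)) - 2.0
    delta = np.linalg.norm(X - 0.5, axis=-1)    # distance from the point (0.5, ..., 0.5)
    return alpha * (1.0 - delta ** 2 * (3.0 - 2.0 * delta)) + 1.0

# Evaluate on a coarse 2-D grid over [-1, 1] x [-1, 1].
axis = np.linspace(-1.0, 1.0, 5)
grid = np.stack(np.meshgrid(axis, axis), axis=-1)
values = pocket(grid)
```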

SAL Pocket Finding

[Figure: test field over [−1, 1] × [−1, 1].]

Vary parameters (close-enough wrt r, similar-enough angle wrt θ, weight d for combining distance and angle):

r ∈ {1, √2, 1.5, √3, 2}

θ ∈ {0.7, 0.8, 0.85, 0.9, 0.95}

d ∈ {0.01, 0.02, 0.03, 0.04, 0.05}
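The sweep over these three parameter sets can be enumerated as a full grid. This is only a sketch of the bookkeeping, assuming every combination is run; the variable names are illustrative.

```python
import itertools
import math

r_values = [1.0, math.sqrt(2), 1.5, math.sqrt(3), 2.0]
theta_values = [0.7, 0.8, 0.85, 0.9, 0.95]
d_values = [0.01, 0.02, 0.03, 0.04, 0.05]

# Every (r, theta, d) combination: 5 * 5 * 5 settings in total.
param_grid = list(itertools.product(r_values, theta_values, d_values))
```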

Construct GP (i.e. estimate covariance terms) for flow classes using Neal’s software, hybrid MC.

Number of Pockets

• d had little effect in this field, due to symmetry.

• Averaged over d, at varying (r, θ):

[Figure: # pockets (0 to 140) vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one series per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]

• Abrupt jump at θ = 0.95 — stringent vector similarity.

Covariance Contributions

[Figure: two panels, covariance contribution ρx and covariance contribution ρy (each 0 to 5) vs. r ∈ {1, 1.414, 1.5, 1.732, 2}, one series per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]

• Basically symmetric.

• Increase quadratically with # pockets: can’t stray “too far” for prediction.

• Characteristic length, 1/ρ, decreases with # pockets: identified pockets occupy less of the space.

Discussion

• Model qualitative spatial data mining with stochastic process framework, summarizing transformation from input to high-level abstractions.

• Probabilistic basis allows sample optimization, studies of parameter sensitivity, reasoning about algorithm applicability.

• Next steps: combined modeling of sensitivity to input and parameters.

• Thanks to Feng Zhao (PARC), Layne T. Watson (Va. Tech).

• Funding: NR (NSF EIA-9974956, EIA-9984317, and EIA-0103660) and CBK (NSF IIS-0237654).
