Gaussian Process Models of Spatial Aggregation Algorithms
Naren Ramakrishnan, Virginia Tech Computer Science
http://people.cs.vt.edu/~ramakris/
Chris Bailey-Kellogg, Purdue Computer Sciences
http://www.cs.purdue.edu/homes/cbk/
Big Picture
Spatial Aggregation: generic mechanism for spatial data mining, parameterized by domain knowledge.
[Figure: Spatial Aggregation diagram. Labels: Input Field, Sample, Interpolate, Lower-Level Objects, Aggregate, N-graph, Spatial objects, Classify, Equivalence classes, Ambiguities, Localize, Redescribe, Higher-Level Objects, Abstract Description.]
Gaussian Processes: generic framework for spatial statistical modeling, parameterized by covariance structure.
SA+GP: model the mining mechanism for meta-level reasoning, e.g. targeting samples and characterizing sensitivity to parameters and inputs.
Example: Wireless System Configuration
Optimize performance (e.g. signal-to-noise, bit error probability) of wireless system configuration (e.g. distance between antennae).
Simulate across range of configurations (hours to days per simulation).
[Figure: configuration-space plot. Axes: SNR1 (dB) and SNR2 (dB), each from 10 to 40.]
Aggregate structures in configuration space.
In shaded region, 99% confidence that average error is acceptable.
Analyze structures to characterize performance.
Configs in upper right less sensitive to power imbalance (region width).
General Features
Problem: spatial data mining with scarce data in physical domains
• Expensive data collection. Much implicit but little explicit data.
• Control over data collection.
• Available physical knowledge: continuity, locality, symmetry, etc.
Approach: multi-level qualitative analysis
• Exploit domain knowledge to uncover qualitative structures in data.
• Sample optimization driven by model selection: maximize expected information gain, minimize expense, ...
• Decisions explainable in terms of problem structures & physical knowledge.
Mining Mechanism: Spatial Aggregation (SA)
Local operations for finding multi-level structures in spatial data.
• Input: numerical field.
Ex: weather maps, numerical simulation output.
• Output: high-level description of structure, behavior, and design.
Ex: fronts, stability regions in dynamical systems.
• Bridge quantitative ↔ qualitative via increasingly abstract structural descriptions.
• Key domain knowledge: locality in domain, similarity in feature.
[Figure: Spatial Aggregation diagram, repeated from the Big Picture slide.]
Spatial Aggregation Example
Goal: find flows in vector field (e.g. wind velocity, temp. gradient).
(a) Input
(b) Localize (distance < r)
(c) Test similarity (angle < θ)
(d) Select succ (d · distance + angle)
(e) Select pred (d · distance + angle)
(f) Redescribe (points ↦ curve)
(g) Bundle curves by higher-level locality, similarity
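Steps (b) through (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAL implementation: locality is Euclidean distance, "angle" is taken to be the angle in radians between the vectors at two points, a successor is assumed to lie ahead of the point along its vector, and the lowest value of d · distance + angle wins. The function names and the tiny data set are hypothetical.

```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two nonzero 2D vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def select_successors(points, vectors, r, theta, d):
    """Return succ, where succ[i] is the chosen successor of point i, or -1."""
    n = len(points)
    succ = np.full(n, -1, dtype=int)
    for i in range(n):
        best = np.inf
        for j in range(n):
            if i == j:
                continue
            step = points[j] - points[i]
            dist = np.linalg.norm(step)
            if dist >= r:                      # (b) localize: distance < r
                continue
            ang = angle_between(vectors[i], vectors[j])
            if ang >= theta:                   # (c) similarity: angle < theta
                continue
            if np.dot(step, vectors[i]) <= 0:  # assumption: successor lies ahead
                continue
            score = d * dist + ang             # (d) combined score, lower is better
            if score < best:
                best, succ[i] = score, j
    return succ

# Hypothetical field: three points flowing right, one unrelated point flowing up.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
vectors = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
succ = select_successors(points, vectors, r=1.5, theta=0.5, d=0.1)
print(succ.tolist())
```

Predecessor selection (e) would mirror this with the "ahead" test reversed; following successor links then yields the point chains that step (f) redescribes as curves.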
Reasoning About SA Applications
• Sensitivity to input?
• Sensitivity to parameters (locality, similarity metrics)?
• Optimization of additional samples?
Approach: probabilistic model of spatial relationships, in terms of Gaussian Processes.
[Figure: Spatial Aggregation diagram, repeated from the Big Picture slide, ↔ Gaussian Process model.]
Gaussian Processes: Intuition
• 1D version of vector flow analysis:
[Figure: gradient vs. x, for x from 0 to 20 and gradient from −3 to 3.]
Qualitative structure: same-direction flow.
• Regression: given angles at some sample points, predict at new, unobserved points.
[Figure: vector angle (values or distributions, in radians) vs. x, for x from 0 to 20.]
Gaussian conditional distribution; covariance structure captures locality.
• Classification: apply logistic (higher-D: softmax) function to estimate latent variable representing class:
[Figure: gradient vs. x (0 to 20) ⊗ logistic function (−10 to 10, range 0 to 1) ↦ values between 0 and 1 vs. x (0 to 20).]
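The classification picture can be sketched as follows: a real-valued latent function is squashed through the logistic function into values between 0 and 1, which are then thresholded into two classes. The latent curve here is a made-up sine wave standing in for a GP draw, and the 0.5 threshold is an assumption for illustration.

```python
import numpy as np

def logistic(z):
    """Logistic squashing function, mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(0.0, 20.0, 201)
latent = 3.0 * np.sin(x / 3.0)   # hypothetical latent values, not fitted to data
prob = logistic(latent)          # values in (0, 1), read as class probabilities
labels = prob > 0.5              # assumed threshold: positive latent -> class 1
```

With more than two classes, the softmax function would take the place of `logistic`, with one latent function per class.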
GP as Spatial Interpolation (Kriging)
• Given set of observations $\{(x_1, y_1), \ldots, (x_k, y_k)\}$ (vector angles at positions), want to model $y = f(x)$.
• Possible form $f(x) = \alpha + Z(x)$.
• Model $Z$ with Gaussian: mean 0, covariance $\sigma^2 R$.
• Key: structure of $R$ captures neighborhood relationships among samples.
Ex: $R(x_i, x_j) = e^{-\rho |x_i - x_j|^2}$
[Figure: correlation function on −10 to 10, range 0 to 1. Panels: ρ = 0.1 and ρ = 1.]
Note: exact interpolation at data points.
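• Optimize parameters given observations, to estimate $f'$.
Ex: minimize mean squared error $E\{(f' - f)^2\}$:
$$\max_{\rho} \left( -\frac{k}{2} \left( \ln \sigma^2 + \ln |\mathbf{R}| \right) \right)$$
where $\mathbf{R}$ is the $k \times k$ symmetric correlation matrix built from $R$.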
• One-D optimization straightforward; higher-D requires MCMC.
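• Once optimized, prediction for $x_{k+1}$ is easy, based on correlation to samples:
$$f'(x_{k+1}) = \hat{\alpha} + r^T(x_{k+1}) \mathbf{R}^{-1} (y - \hat{\alpha} I_k)$$
$r$ is correlation vector for $x_{k+1}$ vs. sample points.
$\hat{\alpha}$ estimates $\alpha$: $\hat{\alpha} = (I_k^T \mathbf{R}^{-1} I_k)^{-1} I_k^T \mathbf{R}^{-1} y$.
Then the estimate’s variance is
$$\hat{\sigma}^2 = \frac{(y - \hat{\alpha} I_k)^T \mathbf{R}^{-1} (y - \hat{\alpha} I_k)}{k}$$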
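The estimator and predictor above translate directly into NumPy. The sketch below assumes 1D inputs, the Gaussian correlation from the earlier example, a hand-picked ρ = 0.1 (no optimization is attempted), and that I_k denotes the vector of k ones, which the dimensions of the formulas suggest. The data values and function names are hypothetical.

```python
import numpy as np

def corr(a, b, rho):
    """Gaussian correlation exp(-rho * |a_i - b_j|^2) for 1D inputs."""
    return np.exp(-rho * (a[:, None] - b[None, :]) ** 2)

def fit(x, y, rho):
    """Return alpha_hat, sigma2_hat, and the weights R^{-1} (y - alpha_hat * 1)."""
    k = len(x)
    R = corr(x, x, rho)
    ones = np.ones(k)                      # assumption: I_k is the all-ones vector
    alpha = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))
    resid = y - alpha * ones
    weights = np.linalg.solve(R, resid)    # solve instead of forming R^{-1}
    sigma2 = (resid @ weights) / k
    return alpha, sigma2, weights

def predict(x_new, x, rho, alpha, weights):
    """Kriging prediction alpha_hat + r^T R^{-1} (y - alpha_hat * 1)."""
    r = corr(np.atleast_1d(np.asarray(x_new, dtype=float)), x, rho)
    return alpha + r @ weights

# Hypothetical observations: vector angles (radians) at six positions.
x = np.array([0.0, 2.0, 5.0, 9.0, 14.0, 20.0])
y = np.array([0.3, 1.1, -0.4, -1.2, 0.8, 0.1])
rho = 0.1                                  # hand-picked, not optimized

alpha, sigma2, weights = fit(x, y, rho)
y_at_samples = predict(x, x, rho, alpha, weights)   # reproduces y
y_new = predict(7.0, x, rho, alpha, weights)        # prediction at a new point
```

Predicting at the sample positions returns the observed values, which is the exact-interpolation property noted earlier: at a sample point, r is a row of the correlation matrix, so the matrix product cancels.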
Gaussian Processes in General
Keys:
• Bayesian modeling, with prior directly on function space.
• Generalize Gaussian distribution over finite vectors to one over functions, using mean and covariance functions.
• Fully specified by distributions on finite sample sets, so still only perform nice matrix operations.
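The last point can be made concrete: drawing "functions" from a GP prior only requires a multivariate Gaussian draw at a finite set of inputs. The sketch below assumes a zero mean function and a squared-exponential covariance with unit length scale; both are illustrative choices, and the small diagonal jitter is a standard numerical safeguard rather than part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite set of inputs at which the random function is evaluated.
x = np.linspace(-3.0, 3.0, 50)

# Covariance matrix from an assumed squared-exponential covariance function.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
K += 1e-6 * np.eye(len(x))   # jitter so the Cholesky factorization succeeds

# If z ~ N(0, I), then L z ~ N(0, K) where K = L L^T.
L = np.linalg.cholesky(K)
samples = L @ rng.standard_normal((len(x), 3))   # three draws, one per column
```

Each column of `samples` is one draw of the function values at `x`; changing the covariance function changes how smooth or wiggly the draws look.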
Related Work:
• Rasmussen: unifying framework for multivariate regression.
• Williams and Barber: classification.
• MacKay: pattern recognition.
• Neal: model for neural networks.
• Sacks: model deterministic computer experiments with stochastic processes.
Multi-Layer GP
• SAL programs repeatedly aggregate/classify/redescribe, up an abstraction hierarchy. ↦ sequence of GP models, each with covariance; superpose for composite.
• Input data field: interpolated surrogate for sparse samples.
• Locality (neighborhood graph: “close enough”) modeled by
$$R(x^{(k)}, x^{(l)}) = \zeta \prod_{i=1}^{n} e^{-\rho_i |x_i^{(k)} - x_i^{(l)}|^{\eta}}$$
• Similarity in feature (equivalence predicate: “good-direction flow”) only applicable when combined with locality.
⇒ Combined hyperparameters for position and direction.
Hierarchical prior allows for determination of relative importance.
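A minimal sketch of the product correlation above, with one ρ per input dimension. The three-component input (two position coordinates plus a direction) and the hyperparameter values are hypothetical, chosen only to show how position and direction share one covariance with separate weights.

```python
import numpy as np

def product_corr(xk, xl, rho, eta=2.0, zeta=1.0):
    """zeta * prod_i exp(-rho_i * |xk_i - xl_i|^eta), computed as exp of a sum."""
    xk, xl, rho = (np.asarray(v, dtype=float) for v in (xk, xl, rho))
    return zeta * np.exp(-np.sum(rho * np.abs(xk - xl) ** eta))

# Hypothetical inputs: (x position, y position, direction in radians).
a = np.array([0.0, 0.0, 0.10])
b = np.array([0.5, 0.2, 0.30])
rho = np.array([1.0, 1.0, 4.0])   # assumed values; larger rho_i = shorter range

same = product_corr(a, a, rho)    # identical inputs give zeta
near = product_corr(a, b, rho)
explicit = np.prod(np.exp(-rho * np.abs(a - b) ** 2.0))
```

The relative sizes of the fitted ρ_i are what indicate how much each dimension, position or direction, matters to the aggregation.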
Case Study: Pocket Identification
Abstract wireless problem with de Boor “pocket” function.
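$$\alpha(X) = \cos\left( \sum_{i=1}^{n} 2^i \left( 1 + \frac{x_i}{|x_i|} \right) \right) - 2$$
$$\delta(X) = \| X - 0.5 I \|$$
$$p(X) = \alpha(X) \left( 1 - \delta^2(X) (3 - 2\delta(X)) \right) + 1$$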
[Figure: surface plot of p(X) over [−1, 1] × [−1, 1]; vertical axis from −2 to 1.]
Goal: identify number & locations of pockets (not func. approx.), with minimal # samples.
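For reference, the pocket function defined above can be evaluated as in the sketch below. It follows the three formulas literally, reading the summand as 2^i and I as the all-ones vector (both assumptions about the slide's notation). The function is undefined where any coordinate is exactly 0, because of the x_i / |x_i| term.

```python
import numpy as np

def pocket(X):
    """Evaluate p(X) for a point X (last axis indexes the n coordinates)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[-1]
    i = np.arange(1, n + 1)
    # alpha(X) = cos(sum_i 2^i * (1 + x_i / |x_i|)) - 2; undefined if any x_i == 0
    alpha = np.cos(np.sum(2.0 ** i * (1.0 + X / np.abs(X)), axis=-1)) - 2.0
    # delta(X) = ||X - 0.5 * 1||
    delta = np.linalg.norm(X - 0.5, axis=-1)
    return alpha * (1.0 - delta ** 2 * (3.0 - 2.0 * delta)) + 1.0

center = pocket([0.5, 0.5])   # delta = 0, so p = alpha + 1
rim = pocket([0.5, 1.5])      # delta = 1, so the alpha term vanishes and p = 1
```

Because the last axis is treated as the coordinate axis, the same function also accepts an array of points, for example a flattened sampling grid.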
SAL Pocket Finding
[Figure: SAL pocket finding on [−1, 1] × [−1, 1].]
Test
Vary parameters (close-enough wrt r, similar-enough angle wrt θ, weight d for combining distance and angle):
r ∈ {1, √2, 1.5, √3, 2}
θ ∈ {0.7, 0.8, 0.85, 0.9, 0.95}
d ∈ {0.01, 0.02, 0.03, 0.04, 0.05}
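The three parameter sets define a full factorial sweep. A sketch of enumerating it, with hypothetical variable names:

```python
import itertools
import math

r_values = [1.0, math.sqrt(2), 1.5, math.sqrt(3), 2.0]
theta_values = [0.7, 0.8, 0.85, 0.9, 0.95]
d_values = [0.01, 0.02, 0.03, 0.04, 0.05]

# Every (r, theta, d) combination; each would drive one SAL run.
grid = list(itertools.product(r_values, theta_values, d_values))
n_runs = len(grid)
```

Grouping the runs by (r, θ) gives five values of d per group to average over.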
Construct GP (i.e. estimate covariance terms) for flow classes using Neal’s software, hybrid MC.
Number of Pockets
• d had little effect in this field, due to symmetry.
• Averaged over d, at varying (r, θ):
[Figure: # pockets (0 to 140) vs. r ∈ {1, 1.414, 1.5, 1.732, 2}; one series per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]
• Abrupt jump at θ = 0.95 — stringent vector similarity.
Covariance Contributions
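[Figure: covariance contribution ρ_x (0 to 5) vs. r ∈ {1, 1.414, 1.5, 1.732, 2}; one series per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]
[Figure: covariance contribution ρ_y (0 to 5) vs. r ∈ {1, 1.414, 1.5, 1.732, 2}; one series per θ ∈ {0.70, 0.80, 0.85, 0.90, 0.95}.]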
• Basically symmetric.
• Increase quadratically with # pockets: can’t stray “too far” for prediction.
• Characteristic length, 1/ρ, decreases with # pockets —identified pockets occupy less of the space.
Discussion
• Model qualitative spatial data mining with stochastic process framework, summarizing transformation from input to high-level abstractions.
• Probabilistic basis allows sample optimization, studies of parameter sensitivity, reasoning about algorithm applicability.
• Next steps: combined modeling of sensitivity to input and parameters.
• Thanks to Feng Zhao (PARC), Layne T. Watson (Va. Tech).
• Funding: NR (NSF EIA-9974956, EIA-9984317, and EIA-0103660) and CBK (NSF IIS-0237654).