
Approximating Sensor Network Queries Using In-Network Summaries

Alexandra Meliou
Carlos Guestrin
Joseph Hellerstein

Approximate Answer Queries

Approximate representation of the world:
• discrete locations
• lossy communication
• noisy measurements

Applications do not expect exact values (tolerance to noise).

Example: return the temperature at all locations ±1°C, with 95% confidence.

Query satisfaction: in expectation, the requested fraction of sensor values lies within the error range.

In-network Decisions

Use in-network models to make routing decisions.
No centralized planning.

In-network Summaries

Spanning tree T(V, E')
+
Models M_v for all nodes v

M_v represents the whole subtree rooted at v.

Model Complexity

Need for compression.

Gaussian distributions at the leaves:
• good for modeling individual node measurements

Talk “outline”

Compression
Traversal
Construction
In-network summaries

Collapsing Gaussian Mixtures

Compress an m-size mixture to a k-size mixture.
Look at the simple case (k = 1).
Minimize KL-divergence?
Problem: “fake” mass
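For the k = 1 case, the single Gaussian that minimizes KL(mixture ‖ Gaussian) is the moment-matched one. A minimal sketch (weights assumed normalized; parameter values are illustrative):

```python
def collapse_mixture(weights, means, variances):
    """Collapse a Gaussian mixture to the single Gaussian minimizing
    KL(mixture || N(mu, var)): match the first two moments."""
    mu = sum(w * m for w, m in zip(weights, means))
    # total variance = within-component variance + between-component spread
    var = sum(w * (v + (m - mu) ** 2)
              for w, m, v in zip(weights, means, variances))
    return mu, var

# two equally weighted unit-variance components centered at 0 and 2
mu, var = collapse_mixture([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])  # mu=1.0, var=2.0
```

The between-component term is what spreads the single Gaussian's mass over regions where the mixture has little: the “fake” mass the slide warns about.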

Quality of Compression

Depends on the query workload:
• query with acceptable error window W
• query with acceptable error window W' < W

Compression:
• accurate mass inside the interval
• no guarantee on the tails

Compression preserves the mass inside the window:

max_z ∫_{z−w}^{z+w} f(x) dx

∫_{μ−w}^{μ+w} N(μ, σ²) dx = Σ_i ∫_{μ−w}^{μ+w} N_i(μ_i, σ_i²) dx
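These window masses have a closed form through the Gaussian CDF (error function); a minimal sketch, with illustrative parameters:

```python
import math

def normal_mass(mu, var, lo, hi):
    """P(lo <= X <= hi) for X ~ N(mu, var), via the error function."""
    sd = math.sqrt(var)
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

def mixture_mass(weights, means, variances, lo, hi):
    """Mass of a Gaussian mixture in [lo, hi]: weighted sum of component masses."""
    return sum(w * normal_mass(m, v, lo, hi)
               for w, m, v in zip(weights, means, variances))

w = 1.0
mass = normal_mass(0.0, 1.0, -w, w)  # one-sigma window, ~0.683
```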

Talk “outline”

Compression
Traversal
Construction
In-network summaries

Query Satisfaction

A response R = {r_1 … r_n} satisfies query Q(w, δ) if, in expectation, the values of at least δn nodes lie within [r_i − w, r_i + w]:

Σ_i ∫_{r_i−w}^{r_i+w} f_i(x) dx ≥ δn
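Checking this condition against a set of models reduces to summing interval masses. A minimal sketch, assuming each node i is modeled as a Gaussian N(mu_i, var_i) and the response r_i is its mean (all values illustrative):

```python
import math

def gauss_mass(mu, var, lo, hi):
    """P(lo <= X <= hi) for X ~ N(mu, var)."""
    sd = math.sqrt(var)
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

def satisfies(models, responses, w, delta):
    """Q(w, delta) holds if, in expectation, at least delta*n node
    values fall within [r_i - w, r_i + w]."""
    expected_hits = sum(gauss_mass(mu, var, r - w, r + w)
                        for (mu, var), r in zip(models, responses))
    return expected_hits >= delta * len(models)

models = [(20.0, 0.25), (21.0, 0.25), (19.5, 0.25)]
ok = satisfies(models, [m for m, _ in models], w=1.0, delta=0.95)
```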

(Figure: a query Q posed over the in-network summary yields a response R = [r1, r2, r3, r4, r5, r6, r7, r8, r9, r10], with values within the error bounds.)

Optimal Traversal

Given: tree T = G(V, E) and models M_v
Find: subtree G(V', E'), E' ⊆ E, such that

Σ_{leaves} Mass(M_v, w) ≥ δn

Can be computed with dynamic programming.
Response: [μ_leaves]
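One way such a dynamic program can be sketched: each subtree keeps a table mapping a response count to the best achievable mass, merged knapsack-style across children. This is a simplified illustrative formulation (the `Node` structure and `mass` field are assumptions, not the paper's exact algorithm):

```python
class Node:
    """Illustrative tree node: `mass` = Mass(M_v, w), the expected number
    of subtree values within the window when answering with this model."""
    def __init__(self, mass, children=()):
        self.mass, self.children = mass, list(children)

def best_mass(node):
    """Map: number of responses -> max achievable total mass for the subtree."""
    combined = {0: 0.0}
    for child in node.children:
        child_tab, merged = best_mass(child), {}
        for c1, m1 in combined.items():
            for c2, m2 in child_tab.items():
                c, m = c1 + c2, m1 + m2
                if m > merged.get(c, -1.0):
                    merged[c] = m
        combined = merged
    # option: stop here and answer the whole subtree with this node's model
    if node.mass > combined.get(1, -1.0):
        combined[1] = node.mass
    return combined

def min_responses(root, delta, n):
    """Smallest response count whose expected mass reaches delta*n, or None."""
    table = best_mass(root)
    feasible = [c for c, m in table.items() if m >= delta * n]
    return min(feasible) if feasible else None

# root model too coarse (mass 1.8 over 3 nodes); leaves each capture 0.95
root = Node(1.8, [Node(0.95), Node(0.95), Node(0.95)])
ans = min_responses(root, delta=0.9, n=3)
```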

Greedy Traversal

If the local model satisfies

∫_{μ−w}^{μ+w} f(x) dx ≥ δ

return μ; else descend to the child nodes.

A more conservative solution: it enforces query satisfiability on every subtree instead of the whole tree.
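The greedy traversal can be sketched as a recursive descent; the `Node` structure and Gaussian local models are illustrative assumptions:

```python
import math

def gauss_mass(mu, var, lo, hi):
    sd = math.sqrt(var)
    cdf = lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

class Node:
    def __init__(self, mu, var, children=()):
        self.mu, self.var, self.children = mu, var, list(children)

def greedy_traverse(node, w, delta):
    """If the local model puts at least delta mass in [mu-w, mu+w],
    answer for the whole subtree with mu; otherwise descend."""
    if gauss_mass(node.mu, node.var, node.mu - w, node.mu + w) >= delta:
        return [node.mu]
    if not node.children:        # failing leaf: report its mean regardless
        return [node.mu]
    return [r for c in node.children for r in greedy_traverse(c, w, delta)]

# root model too spread out (var=4); child models are tight (var=0.25)
root = Node(20.0, 4.0, [Node(18.0, 0.25), Node(22.0, 0.25)])
resp = greedy_traverse(root, w=1.0, delta=0.9)  # descends once: [18.0, 22.0]
```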

Traversal Evaluation

Talk “outline”

Compression

TraversalConstruction

In-network summaries

Optimal Tree Construction

Given a structure, we know how to build the models

But how do we pick the structure?

Traversal = cut

Theorem: In a fixed-fanout tree, the cost of the traversal is

(F / (F − 1)) · (|C| − 1)

where |C| is the size of the cut and F the fanout.

Intuition: minimize the cut size.
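The formula is a one-liner; the example values are illustrative:

```python
def traversal_cost(cut_size, fanout):
    """Nodes visited by a traversal whose frontier is a cut of size |C|
    in a complete tree of fixed fanout F: F/(F-1) * (|C| - 1)."""
    return fanout / (fanout - 1) * (cut_size - 1)

cost = traversal_cost(cut_size=4, fanout=2)  # binary tree: 2/1 * 3 = 6.0
```

Note that cost grows linearly in |C|, which is why construction aims to minimize the cut size.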

Group nodes into the minimum number of groups that satisfy the query constraints: a clustering problem.

Optimal Clustering

Given a query Q(w, δ), optimal clustering is NP-hard (related to the Group Steiner Tree problem).

Greedy algorithm with a factor log(n) approximation: greedily pick the maximum-size cluster.
Issue: it does not enforce connectivity of clusters.

Greedy Clustering

Include extra nodes to enforce connectivity.
Augment clusters only with accessible nodes (losing the log(n) guarantee).
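The greedy selection step can be sketched in set-cover style. This is an illustrative assumption of the setup (candidate clusters given as precomputed node sets; the connectivity augmentation is not shown):

```python
def greedy_cluster(nodes, candidate_clusters):
    """Greedy set-cover style clustering: repeatedly pick the candidate
    cluster covering the most still-uncovered nodes (log n approximation)."""
    uncovered = set(nodes)
    chosen = []
    while uncovered:
        best = max(candidate_clusters, key=lambda c: len(c & uncovered))
        if not best & uncovered:
            break                    # remaining nodes cannot be covered
        chosen.append(best)
        uncovered -= best
    return chosen

clusters = greedy_cluster(
    range(6),
    [frozenset({0, 1, 2, 3}), frozenset({3, 4}), frozenset({4, 5})])
```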

Clustering Comparison

Two distributed clustering algorithms are compared to the centralized greedy clustering.

Talk “outline”

Compression
Traversal
Construction
In-network summaries
Enriched models

Enriched Models

Support more complex models (SGM = Single Gaussian Model):
• k-mixtures: compress to a k-size mixture instead of an SGM
• virtual nodes: every component of the k-size mixture is stored as a separate “virtual node”
• SGMs on multiple windows: maintain additional SGMs for different window sizes

Cost: more space, more expensive model updates.

Evaluation of Enriched Models

The SGM is surprisingly effective at representing the underlying data.

Sensitivity analysis

Talk “outline”

Compression
Traversal
Construction
In-network summaries

Tree Construction Parameters and Effect on Performance

Confidence: performance for workloads of a different confidence than the hierarchy was designed for.
Error window: broader vs. narrower ranges of window sizes; assignment of windows across tree levels.
Temporal changes: how often should the models be updated?

Confidence

Workload of 0.95 confidence

Design confidence does not have a big impact on performance

Error windows

A wide range is not always better, because it forces the traversal of more levels

Model Updates

Sensitivity analysis

Conclusions

Analyzed compression schemes for in-network summaries.
Evaluated summary traversal.
Studied optimal hierarchy construction.
Studied increased-complexity models; showed that simple SGMs are sufficient.
Analyzed the effect of various parameters on efficiency.

Compression
Traversal
Construction
In-network summaries
Enriched models
