Challenges in Sensor Network Query Processing Sam Madden NEST Retreat January 15, 2002


Page 1: Challenges in Sensor Network Query Processing

Challenges in Sensor Network Query Processing

Sam Madden

NEST Retreat

January 15, 2002

Page 2: Challenges in Sensor Network Query Processing

Outline
  Background
  Server Side Solutions
    Fjords, Sensor Proxies, CACQ
  Sensor Side Solutions
    Catalog Management
    Aggregation
  Future Work

Page 3: Challenges in Sensor Network Query Processing

Background: Query Processors

Page 4: Challenges in Sensor Network Query Processing

What is a Query?
  Declarative statement requesting a subset of data
    Possibly transforming or computing statistics about that data
  Data independent
    Query can apply to any data

Page 5: Challenges in Sensor Network Query Processing

What is a Query Processor?
  Converts declarative queries into a flow of data operators, a query plan
  Relational Operators:
    Project, Select, Join
    ‘Scans’ read data from base relations, indices, etc.
  Traditional Flows:
    Pull based, ‘iterator model’
    Higher level operators call ‘getNext()’ to extract data from lower level operators
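A minimal sketch of the pull-based iterator model described above, written in Python for illustration. The class names (Scan, Select, Project), the get_next method, and the use of None as an end-of-data marker are assumptions made for this example and are not taken from any particular query processor.

```python
class Scan:
    """Leaf operator: reads rows from a base relation (here, a list)."""
    def __init__(self, rows):
        self._rows = iter(rows)

    def get_next(self):
        # None signals that the input is exhausted.
        return next(self._rows, None)


class Select:
    """Passes through only the rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def get_next(self):
        # Pull from the lower level operator until a row qualifies.
        while (row := self.child.get_next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Keeps only the named columns of each row."""
    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def get_next(self):
        row = self.child.get_next()
        return None if row is None else {c: row[c] for c in self.columns}


def run(plan):
    """Drive the plan from the top by repeatedly pulling rows."""
    out = []
    while (row := plan.get_next()) is not None:
        out.append(row)
    return out


rows = [{"id": 1, "temp": 18}, {"id": 2, "temp": 25}]
plan = Project(Select(Scan(rows), lambda r: r["temp"] > 20), ["id"])
result = run(plan)
```

Every call starts at the top of the plan, so a Scan that waits on a slow source stalls the whole plan. That is the limitation the later slides address.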

Page 6: Challenges in Sensor Network Query Processing

Query Optimizer
  Given a declarative query, build the ‘best’ query plan
    Choose which operators to run
    What order to run them in
    Where to run them
      In distributed databases

Page 7: Challenges in Sensor Network Query Processing

Why Databases and Sensors?
  All applications depend on data processing
  Declarative query language over sensors attractive
    Application specific solutions difficult to build and deploy
  Want “to combine and aggregate data streaming from motes.”
    Sounds like a database…

Page 8: Challenges in Sensor Network Query Processing

New Problems In Sensor Databases

  Sensors unreliable
    Come on and offline, variable bandwidth
  Sensors push data
  Sensors stream data
  Sensors have limited memory, power, bandwidth
    Communication very expensive
  Sensors have processors
  Sensors very numerous

Page 9: Challenges in Sensor Network Query Processing

Components of A Sensor Database

  Server Side
    Query Parser
    Catalog
    Query Optimizer
    Query Executor
    Query Processor
  Sensor Side
    Catalog ‘Advertisements’
    Query Processor
    Network Management

Page 10: Challenges in Sensor Network Query Processing

Outline
  Background
  Server Side Solutions
    Fjords, Sensor Proxies, CACQ
  Sensor Side Solutions
    Catalog Management
    Aggregation
  Future Work

Page 11: Challenges in Sensor Network Query Processing

Fjords

  Query Plan Abstraction to handle lack of reliability and streaming, push based data
  Combine push and pull in arbitrary combinations
  Use connectors between operators to isolate them from flow direction
    “Bracket Model” – Graefe ‘93

Page 12: Challenges in Sensor Network Query Processing

Fjords (Continued)
  Operators assume non-blocking queue interface between each other.
  Queues implement push vs. pull
    Pull from A to B: Suspend A, schedule B until it produces data. A cannot go forward until B produces data.
    Push from B to A: A polls, scheduler thread invokes B until it produces data. A can process other inputs while waiting for B.
  Supports parallelism between operators via queues, state machines, and OS (e.g. NIC buffers, DMA) in operator transparent way.
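The queue interface can be illustrated with a small Python sketch. It is not the Fjords implementation: PushQueue, PullQueue, and merge_operator are hypothetical names, and None is assumed to mean "no data right now". The point is that the consuming operator calls the same get() on both connector types, so it does not need to know the flow direction.

```python
from collections import deque


class PushQueue:
    """Push connector: the producer enqueues; the consumer polls without blocking."""
    def __init__(self):
        self._items = deque()

    def put(self, item):
        self._items.append(item)

    def get(self):
        return self._items.popleft() if self._items else None


class PullQueue:
    """Pull connector: each get() runs the producer until it yields one item."""
    def __init__(self, producer):
        self._producer = producer  # any Python iterator

    def get(self):
        return next(self._producer, None)


def merge_operator(inputs, max_polls):
    """Poll every input in turn; an empty input never stalls the others."""
    out = []
    for _ in range(max_polls):
        for queue in inputs:
            item = queue.get()
            if item is not None:
                out.append(item)
    return out


sensor = PushQueue()
sensor.put(("sensor", 1))  # a sensor pushed one reading
table = PullQueue(iter([("table", 1), ("table", 2)]))
result = merge_operator([sensor, table], max_polls=3)
```

When the sensor queue is empty, the operator simply moves on to the table input, which mirrors "A can process other inputs while waiting for B".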

Page 13: Challenges in Sensor Network Query Processing

Fjords Example

[Figure: example Fjord query plan with connectors labeled Push, Push, and Pull]

Samuel Madden, Michael J. Franklin. Fjording The Stream: An Architecture For Queries Over Streaming Sensor Data. International Conference on Data Engineering, 2002. To Appear, February 2002.

Page 22: Challenges in Sensor Network Query Processing

Fjords Applications
  Combine traffic streams with web-based accident reports

Francis Li, Sam Madden, Megan Thomas. Traffic Visualization. http://www.cs.berkeley.edu/~mct/infovis/project/traffic.html

Page 23: Challenges in Sensor Network Query Processing

Operators for Streaming Data
  Need special operators for dealing with streams (See P. Seshadri, et al. The design and implementation of a sequence database system. VLDB ’96)
  In particular, streams can’t be joined or sorted in the traditional sense
  Solution: Use windows – e.g. “Zipper Join”
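To make the window idea concrete, here is a sketch of a generic sliding-window join in Python. It is not the “Zipper Join” named on the slide, whose exact definition is not given here. The function name window_join, the (timestamp, value) tuple layout, and the window parameter are all assumptions for illustration. Both inputs are assumed to arrive in timestamp order.

```python
import heapq
from collections import deque


def window_join(left, right, window):
    """Pair up tuples from two streams whose timestamps differ by at most `window`.

    Only tuples inside the window are buffered, so memory stays bounded
    even though the streams themselves never end.
    """
    buffers = {"L": deque(), "R": deque()}
    # Interleave the two streams in timestamp order.
    tagged = heapq.merge(
        ((ts, "L", value) for ts, value in left),
        ((ts, "R", value) for ts, value in right),
    )
    for ts, side, value in tagged:
        # Evict tuples that have fallen out of the window.
        for buf in buffers.values():
            while buf and buf[0][0] < ts - window:
                buf.popleft()
        other = "R" if side == "L" else "L"
        for _, other_value in buffers[other]:
            yield (value, other_value) if side == "L" else (other_value, value)
        buffers[side].append((ts, value))


left = [(1, "a"), (5, "b")]
right = [(2, "x"), (9, "y")]
pairs = list(window_join(left, right, window=2))
```

Here only ("a", "x") is produced, because every other pair of tuples is more than two time units apart.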

Page 24: Challenges in Sensor Network Query Processing

Sensor Proxy
  Energy-sensitive database operator
  Buffer sensor tuples and route to multiple user queries to hide query load from sensors
  Push aggregation operators into sensors to reduce communications load
  Dynamically adjust sample rate based on user demand
  Push results into Fjords so that other operators don’t block waiting on slow or dead sensors

Page 25: Challenges in Sensor Network Query Processing

Some Results
  Pushing predicates into sensors can vastly reduce costs:

[Figure: Power Drain (W) vs. Sample Method. Power (W), axis from 0 to 0.008, for two sampling methods: Every Sample and Every Vehicle. Atmel Simulator, 100 samples / sec, 5 vehicles / sec. Annotation: 7x power savings.]

Page 26: Challenges in Sensor Network Query Processing

CACQ
  Expect hundreds to thousands of queries over same sensor sources
  Continuously Adaptive Continuous Queries
  Continuous Queries: Long running queries which combine selections and joins to improve efficiency (See Chen, NiagaraCQ, SIGMOD 2000)

[Figure: Query 1 and Query 2 over a Stock Quotes stream, with selections Stocks.symbol = ‘MSFT’ and Stocks.symbol = ‘APPL’; labels: Query 1, Query 2, Stock Quotes, ‘MSFT’, ‘APPL’]
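One way to picture combined selections is a single index over the constants that the queries compare against, so each tuple is examined once rather than once per query. The Python sketch below is an illustration under that assumption. SharedSelection, register, and route are hypothetical names, not the CACQ or NiagaraCQ API.

```python
from collections import defaultdict


class SharedSelection:
    """Indexes the equality predicates of many queries over one attribute."""
    def __init__(self, attribute):
        self.attribute = attribute
        self._queries_by_value = defaultdict(list)

    def register(self, query_id, value):
        """Record that query_id wants tuples where attribute == value."""
        self._queries_by_value[value].append(query_id)

    def route(self, tup):
        """Return the queries interested in this tuple with one lookup."""
        return self._queries_by_value.get(tup[self.attribute], [])


selection = SharedSelection("symbol")
selection.register("Query 1", "MSFT")
selection.register("Query 2", "APPL")
matched = selection.route({"symbol": "MSFT", "price": 60.0})
unmatched = selection.route({"symbol": "IBM", "price": 110.0})
```

The cost of routing a tuple is one dictionary lookup regardless of how many queries are registered, which is what matters with hundreds to thousands of queries.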

Page 27: Challenges in Sensor Network Query Processing

CACQ (Cont.)
  Continuous Adaptivity From Eddies
    Route tuples differently, depending on selectivity and cost estimates of operators

[Figure: static dataflow vs. eddy]

Diagrams Courtesy Joe Hellerstein
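The routing idea can be sketched as follows. This is a deliberate simplification and not the eddy algorithm itself. It orders filter operators by their observed pass rate only, ignores operator cost, and uses hypothetical names (Eddy, process). The optimistic starting pass rate of 1.0 is also an assumption.

```python
class Eddy:
    """Routes each tuple through filters, most selective (so far) first."""
    def __init__(self, predicates):
        self.predicates = predicates  # name -> function(tuple) -> bool
        self.seen = {name: 0 for name in predicates}
        self.passed = {name: 0 for name in predicates}

    def _pass_rate(self, name):
        # Assume an operator passes everything until it has been observed.
        if self.seen[name] == 0:
            return 1.0
        return self.passed[name] / self.seen[name]

    def process(self, tup):
        """Return True if the tuple satisfies every predicate."""
        # The order is recomputed per tuple, so it adapts as statistics change.
        for name in sorted(self.predicates, key=self._pass_rate):
            self.seen[name] += 1
            if not self.predicates[name](tup):
                return False  # dropped early; later operators never run
            self.passed[name] += 1
        return True


eddy = Eddy({
    "hot": lambda t: t["temp"] > 90,     # rarely true
    "lit": lambda t: t["light"] >= 0,    # almost always true
})
results = [eddy.process({"temp": temp, "light": 1}) for temp in (10, 20, 95, 30)]
```

Because "hot" rejects most tuples, it stays first in the order and "lit" is evaluated only for the one tuple that survives.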

Page 28: Challenges in Sensor Network Query Processing

CACQ (cont.)
  Combining CA with CQ is a win:
    CQ increases number of simultaneous queries
    Adaptivity well suited to long running queries
    Eddies allow us to avoid ugly query-optimization phase in traditional CQ
    Eddies + Streams == few copies, unlike traditional CQ

Page 29: Challenges in Sensor Network Query Processing

CACQ (cont)

Look for a paper in SIGMOD 2002 (fingers crossed!)

Page 30: Challenges in Sensor Network Query Processing

Outline
  Background
  Server Side Solutions
    Fjords, Sensor Proxies, CACQ
  Sensor Side Solutions
    Catalog Management
    Aggregation
  Future Work

Page 31: Challenges in Sensor Network Query Processing

Sensor Side Solutions
  CACQ + Fjords provides interface + performance on QP, but sensors still need help:
    Locate / identify sensors
    Reduce power consumption
      Take advantage of processors?
    Improve responsiveness

Page 32: Challenges in Sensor Network Query Processing

Cataloging Sensors
  To query sensors, need a way to locate, identify properties, extract values
  Goal: Drop a bunch of sensors around the DBMS, allow them to be queried without manual effort
  Idea: Add a layer to each sensor which advertises its capabilities

Page 33: Challenges in Sensor Network Query Processing

Catalog (Continued)

#temperature sensor
field {
  name : "temp" #optional
  type : int
  units : celsius
  min : -20
  max : 100
  bits : 8
  sample_cost : 10.0 J #optional -- for use in costing
  sample_time : 10.0 ms #optional -- for use in costing
  input : adc2 #optional : read from adc channel 1
  sends : ondemand
  accessorEvent : GET_TEMPERATURE_DATA
  responseEvent : TEMPERATURE_DATA_READY
}

Compiled in 27 bytes of memory

Layer to register with Query Processor

Can be “push” or “pull”

Page 34: Challenges in Sensor Network Query Processing

Aggregating Over Sensors
  Sensor Proxy combines user queries, pushes down aggregates
  Goal: Save energy, increase efficiency
  Idea: Take advantage of the routing hierarchy

Page 35: Challenges in Sensor Network Query Processing

Why bother with aggregation
  Individual sensor readings are of limited use
    Interest in higher level properties, e.g. what vehicles drove through, what is the spread of temperatures in the building
  We have a processor & network on board, let's use it
  We cannot survive without aggregation
    Delivering a message to all nodes much easier than delivering a message from each node to a central point
    Delivering a large amount of data from every node harder still, vide connectivity experiment
    Forwarding raw information too expensive
      Scarce energy
      Scarce bandwidth
      Multihop performance penalty

Page 36: Challenges in Sensor Network Query Processing

Aggregation challenges
  Inherently unreliable environment, certain information unavailable or expensive to obtain
    how many nodes are present?
    how many nodes are supposed to respond?
    what is the error distribution (in particular, what about malicious nodes?)
  Trying to build an infrastructure to remove all uncertainty from the application may not be feasible – do we want to build distributed transactions?
  Information trickles in one message at a time
    Never have complete and up-to-date information about the neighborhood
  What type of information should we expect from aggregation
    Streams
    Robust estimates

Page 37: Challenges in Sensor Network Query Processing

What does it mean to aggregate (The DB Perspective)
  General purpose solution: apply standard aggregation operators like COUNT, MIN, MAX, AVERAGE, and SUM to any set of sensors.
    Existing solutions are application specific
    In sensors, operators may be arbitrary signal processing functions
    By assuming a standard interface, many optimizations are possible
      Example: TopN queries via hypothesis testing
  Provide grouping semantics: e.g. ‘select avg(temp) group by trunc(light/10)’
    In sensor networks, groups may be random samples

[Figure: values t1 to t9 arranged in three rows of three]
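The grouped query ‘select avg(temp) group by trunc(light/10)’ can be evaluated in pieces if each node keeps a partial state per group and parents merge the states of their children. The Python sketch below assumes a (sum, count) pair as the partial state for an average. The function names and the dictionary layout of a reading are hypothetical.

```python
import math


def local_partial(readings):
    """Build the per-group partial state for one node: group -> (sum, count)."""
    state = {}
    for reading in readings:
        group = math.trunc(reading["light"] / 10)
        total, count = state.get(group, (0.0, 0))
        state[group] = (total + reading["temp"], count + 1)
    return state


def merge(a, b):
    """Combine two partial states; this is what a parent does for each child."""
    merged = dict(a)
    for group, (total, count) in b.items():
        merged_total, merged_count = merged.get(group, (0.0, 0))
        merged[group] = (merged_total + total, merged_count + count)
    return merged


def finalize(state):
    """Turn (sum, count) pairs into averages at the root."""
    return {group: total / count for group, (total, count) in state.items()}


node_a = [{"temp": 20, "light": 5}, {"temp": 22, "light": 15}]
node_b = [{"temp": 24, "light": 7}]
averages = finalize(merge(local_partial(node_a), local_partial(node_b)))
```

Only one (sum, count) pair per group travels up the routing tree, rather than every raw reading. Averaging the averages of the children would give the wrong answer when the groups have different sizes, which is why the count is carried along.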

Page 38: Challenges in Sensor Network Query Processing

Outline
  Background
  Server Side Solutions
    Fjords, Sensor Proxies, CACQ
  Sensor Side Solutions
    Catalog Management
    Aggregation
  Future Work

Page 39: Challenges in Sensor Network Query Processing

Future Work
  DBMS Side
    Efficient Catalog Management
    Moving Object Databases
    Query Optimization Techniques
  Sensor Side
    Efficient Grouping
    Joins over Network Topology
    Non Standard Aggregate Functions
  Somewhere In Between
    Histograms and other Correlations
    Sampling and Compression for Streams
    Real Query Language / API
    Demonstration Apps (SIGMOD Demo)

Page 40: Challenges in Sensor Network Query Processing

Questions?

Page 41: Challenges in Sensor Network Query Processing

Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4, 5]

Page 42: Challenges in Sensor Network Query Processing

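Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4, 5]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -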

Page 43: Challenges in Sensor Network Query Processing

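Scenario: Count

[Figure: network of sensor nodes 1, 2, 3]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  1        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -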

Page 44: Challenges in Sensor Network Query Processing

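Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  1        -        -        -        -
  1        1        1        -        -
  1 + 2    1        1        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -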

Page 45: Challenges in Sensor Network Query Processing

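Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4, 5]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  1        -        -        -        -
  1        1        1        -        -
  1 + 2    1        1        1        -
  1 + 2    1 + ½    1 + ½    1        -
  -        -        -        -        -
  -        -        -        -        -
  -        -        -        -        -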

Page 46: Challenges in Sensor Network Query Processing

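Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4, 5]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  1        -        -        -        -
  1        1        1        -        -
  1 + 2    1        1        1        -
  1 + 2    1 + ½    1 + ½    1        1
  1+3      1 + ½    1 + ½    1+1      1
  -        -        -        -        -
  -        -        -        -        -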

Page 47: Challenges in Sensor Network Query Processing

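Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4, 5]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  1        -        -        -        -
  1        1        1        -        -
  1 + 2    1        1        1        -
  1 + 2    1 + ½    1 + ½    1        1
  1+3      1 + ½    1 + ½    1+1      1
  1+3      1+2/2    1+2/2    1+1      1
  -        -        -        -        -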

Page 48: Challenges in Sensor Network Query Processing

Scenario: Count

[Figure: network of sensor nodes 1, 2, 3, 4, 5]

Goal: Count the number of nodes in the network.

Number of children is unknown.

Sensor # (columns) vs. Time (rows, top to bottom):

  1        2        3        4        5
  ------------------------------------------
  1        -        -        -        -
  1        1        1        -        -
  1 + 2    1        1        1        -
  1 + 2    1 + ½    1 + ½    1        1
  1+3      1 + ½    1 + ½    1+1      1
  1+3      1+2/2    1+2/2    1+1      1
  1+4      1+2/2    1+2/2    1+1      1

Page 49: Challenges in Sensor Network Query Processing

Counting Lessons
  Take advantage of redundancy to improve accuracy (reply to all parents, not just one)
  Use broadcast to reduce number of messages
  Result is a stream of values: much more robust to failures, movement, or collision than a single value.
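The Count scenario can be replayed with a short Python simulation. Several things here are assumptions rather than facts from the slides. The topology (node 1 is the root, nodes 2 and 3 are its children, node 4 has both 2 and 3 as parents, node 5 is a child of node 4) is inferred from the values in the scenario table. Each node is assumed to split its current count evenly among its parents, and a node at depth d is assumed to start reporting at step d + 1. PARENTS, depth, and simulate_count are hypothetical names.

```python
from fractions import Fraction

# Assumed topology, inferred from the scenario table: node -> its parents.
PARENTS = {1: [], 2: [1], 3: [1], 4: [2, 3], 5: [4]}


def depth(node):
    """Number of hops from the root (node 1) along the shortest path."""
    if not PARENTS[node]:
        return 0
    return 1 + min(depth(parent) for parent in PARENTS[node])


def simulate_count(steps):
    """Return one dict per step mapping node -> its current count estimate."""
    children = {n: [c for c, ps in PARENTS.items() if n in ps] for n in PARENTS}
    reported = {n: Fraction(0) for n in PARENTS}
    history = []
    for step in range(1, steps + 1):
        current = {}
        for node in PARENTS:
            if depth(node) >= step:
                current[node] = Fraction(0)  # has not heard the query yet
                continue
            # Count itself plus each child's previous report, where a child
            # with k parents contributes 1/k of its count to each of them.
            heard = sum(
                (reported[c] / len(PARENTS[c]) for c in children[node]),
                Fraction(0),
            )
            current[node] = Fraction(1) + heard
        reported = current
        history.append(current)
    return history


history = simulate_count(7)
root_estimates = [step[1] for step in history]
```

Under these assumptions the root's estimate goes 1, 1, 3, 3, 4, 4, 5, which corresponds to the sensor 1 column 1, 1, 1 + 2, 1 + 2, 1+3, 1+3, 1+4. Splitting a count among parents keeps node 4 from being counted twice even though it replies to both of them.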

Page 50: Challenges in Sensor Network Query Processing

Aggregation in network programming
  Network programming problem
    Reliable delivery of a large number of messages to all nodes in range, while exploiting the broadcast nature of the medium
  Basic setup
    Broadcast a known number of idempotent program fragments
    Each node keeps a bitmap of fragments received (1=packet received)
    Two stages of the problem: single hop, and multihop
  Solutions
    Single hop, dense cell
      Broadcasting the program – trivial, the central node broadcasts
      Feedback from nodes – broadcast a request from the central node: Is anyone missing packets in this packet range?
      Convergence: no replies to the request

Page 51: Challenges in Sensor Network Query Processing

Aggregation in multihop network programming
  Broadcasting the program – use flooding
    Remember the last 8 packets forwarded, use that cache to decide whether to forward or not
  Feedback from nodes
    Distribute requests for feedback using flooding
    After some delay, respond if any packets are missing locally
    Responses from children: AND with the local bitmap, store the result locally, forward the request
    Suboptimal because there are no local fixups
  Convergence
    No replies to the request
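The bitmap step can be sketched in Python with integers as bitmaps. The representation (bit i set means fragment i was received) follows the "1=packet received" convention above. The function names and the use of Python integers are assumptions for illustration.

```python
def combine_bitmaps(local, child_responses):
    """AND the local bitmap with each child's bitmap.

    A bit stays set only if this node and every child below it
    received that fragment.
    """
    combined = local
    for bitmap in child_responses:
        combined &= bitmap
    return combined


def missing_fragments(bitmap, total):
    """List the fragment numbers whose bit is clear."""
    return [i for i in range(total) if not (bitmap >> i) & 1]


local = 0b1111                    # this node has fragments 0 to 3
children = [0b1011, 0b1101]       # one child lacks 2, the other lacks 1
combined = combine_bitmaps(local, children)
to_resend = missing_fragments(combined, total=4)
```

The combined bitmap is the same size as a single bitmap, so the feedback sent upstream does not grow with the number of nodes in the subtree.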

Page 52: Challenges in Sensor Network Query Processing

Aggregation over streams
  Inherent uncertainty of the system
    Can nodes communicate, do they have enough power, have they moved?
    Computing a complete single answer can be very expensive, and may not be possible
    Partial estimates have their own value
  Aggregation over streams
    Values reflect the current best estimates
    Self stabilizing: in the absence of changes converges to a desired value within N steps

Page 53: Challenges in Sensor Network Query Processing

Identifying Groups
  Need a way to identify groups
    Idea: set of membership criteria pushed down
    Nodes determine their membership set based on those criteria
    Nodes can be in multiple but not unlimited groups
    E.g. “Group 1 : 0 <= t < 10, Group 2 : 10 <= t < 20, …”
  Need a way to evaluate aggregation predicates by group
  May want to allow grouping and aggregation predicates to be expressed together to take advantage of broadcast effects
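A small Python sketch of pushed-down membership criteria, using the range predicates from the example above. The function name membership, the (low, high) encoding of a criterion, and the max_groups cap (one possible reading of "multiple but not unlimited") are assumptions.

```python
def membership(reading, criteria, max_groups):
    """Return the ids of the groups whose range accepts this reading.

    criteria maps a group id to a (low, high) pair meaning low <= t < high.
    At most max_groups ids are returned.
    """
    groups = [
        group_id
        for group_id, (low, high) in criteria.items()
        if low <= reading < high
    ]
    return groups[:max_groups]


criteria = {1: (0, 10), 2: (10, 20)}
in_group_two = membership(12, criteria, max_groups=2)
in_no_group = membership(25, criteria, max_groups=2)
```

Each node evaluates the criteria against its own reading, so the only thing that needs to be sent down the network is the criteria table itself.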

Page 54: Challenges in Sensor Network Query Processing

Local Query Rewrite
  Intermediate nodes may determine that it's faster to evaluate an aggregate by asking children a different question.
    Example 1: MAX(t). Once we have a guess T for MAX, ask children to report iff t > T, rather than asking all children to compute a local maximum.
    Example 2: Network programming. Rather than asking nodes what packets they have, ask them to report iff packets missing.
  Is this a general technique?
    Maybe: Inform child of guess at aggregate, ask it to refute.
    Works for average (within error bound), not count.
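Example 1 can be written out as a short Python sketch. It models a single round against a fixed guess T. The function name and the returned reply count are assumptions added to show the message saving.

```python
def max_with_suppression(guess, child_readings):
    """Ask children to reply only if their reading refutes the guess.

    Returns the resulting maximum and the number of replies received.
    """
    replies = [t for t in child_readings if t > guess]
    return max(replies, default=guess), len(replies)


maximum, replies = max_with_suppression(20, [12, 25, 18, 31])
```

With a guess of 20, only two of the four children need to transmit, and the answer is still the true maximum whenever at least one child exceeds the guess. If no child replies, the guess stands.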

Page 55: Challenges in Sensor Network Query Processing

Wins and pitfalls of aggregation
  Aggregation over natural network topology
    Aggregation over an arbitrary subset of the network may be a loss
  Really dense cells
    Aggregation does not help with the starvation problem
    Use the message suppression via query rewrite technique
  Still beneficial in a multihop scenario

Page 56: Challenges in Sensor Network Query Processing

Advanced Aggregation Tricks
  Break the Network Protocol Boundary
  Use analog reading from channel over time to determine aggregates.
  Simple example:

[Figure: channel readings over Time, with their Sum]

    Reading = 11 = 110100
    Reading = 21 = 101010
    Reading = 32 = 2 + 2 + 4 + 8 + 16
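The arithmetic in the example works out if two assumptions are made. First, each reading is sent least significant bit first, so 11 becomes 110100 and 21 becomes 101010. Second, the analog level observed in a time slot equals the number of nodes sending a 1 in that slot. Under those assumptions, the following Python sketch reproduces 2 + 2 + 4 + 8 + 16 = 32. The function name analog_sum is hypothetical.

```python
def analog_sum(bit_strings):
    """Sum values from superimposed bit streams, least significant bit first.

    In time slot k the observed level is the number of senders
    transmitting a 1, and that level is weighted by 2**k.
    """
    total = 0
    for slot, bits in enumerate(zip(*bit_strings)):
        level = sum(int(bit) for bit in bits)  # superimposed signal
        total += level * (1 << slot)
    return total


total = analog_sum(["110100", "101010"])
```

Slot by slot the levels are 2, 1, 1, 1, 1, 0, which contribute 2, 2, 4, 8, 16, and 0.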