

Aggregate Query Processing in Ad-Hoc Sensor Networks

Yong Yao
Database lunch, Apr. 15th

Outline

Motivating Example
Sensor and Sensor Network
Query Model
In-Network Aggregates
Routing and Aggregation
Summary

Motivating Example

A several-hundred-node ad-hoc network of sensors (Cougar) is deployed in Rhodes Hall and Upson Hall

The network is shared by all occupants

The network is dynamic: people can add and remove sensors, and sensors frequently run out of power or crash

Motivating Example

People extract information from the environment by querying the network
  What is the temperature of my office?
  How many people are in the system lab?
  What’s the quietest conference room?
  Where is Johannes?

Next generation sensors

Data source: Sensors respond to physical stimuli (heat, light, or motion) and produce events

Computation Ability: Sensors are active, full-fledged computers

Communication Ability: Wirelessly connected over a broadcast channel, self-organized into a multi-hop network topology

Limitation: Energy constrained and prone to crashing

Today’s Hardware - Motes

Assembled from off-the-shelf components

4 MHz, 8-bit MCU (ATMEL)
512 bytes RAM, 8K ROM

900 MHz Radio (RF Monolithics)
10-100 ft. range

Temperature Sensor & Light Sensor

LED outputs
Serial Port

1.5” x 1.5”

Sensor Network

Consists of a number of sensors and gateway nodes (sinks)

As an ad-hoc network
  Static or quasi-static
  Dynamically changing
  Large scale

As a distributed database system with in-network query processing

Query Model

SELECT {agg(attr), attrs} FROM sensors
WHERE {spatial constraint}
GROUP BY {attrs}
HAVING {havingPreds}
DURATION {time}
EVERY {period}

Example: What is the temperature of my office?

SELECT AVG(temperature)
FROM TemperatureSensor s
WHERE s IN MY_OFFICE
DURATION 1h
EVERY 10s

Open Problem: What’s the best model of general queries?

Aggregate Operator

Agg is implemented via three functions:
  Merging function f: <z> = f(<x>, <y>)
    <x> and <y> are multi-valued partial state records. For AVG, a record is a two-tuple <SUM, COUNT>.
  Initializer i specifies how to instantiate a state record for a single sensor value.
  Evaluator e takes a partial state record and computes the actual value of the aggregate.

AVG:
  f(<S1,C1>, <S2,C2>) = <S1+S2, C1+C2>
  i(x) = <x, 1>
  e(<S,C>) = S/C
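The three AVG functions above can be sketched in a few lines of Python. This is only an illustration: the names init, merge, and evaluate and the sample readings are assumptions, not part of the original slides. It shows that merging partial state records from two groups of sensors gives the same average as processing all readings in one place.

```python
from functools import reduce

def init(x):
    """Initializer i: turn one sensor reading into a <SUM, COUNT> record."""
    return (x, 1)

def merge(a, b):
    """Merging function f: combine two <SUM, COUNT> partial state records."""
    return (a[0] + b[0], a[1] + b[1])

def evaluate(state):
    """Evaluator e: compute the average from a <SUM, COUNT> record."""
    total, count = state
    return total / count

# Hypothetical readings, split across two groups of sensors.
readings = [20.0, 22.0, 24.0, 26.0]
left = reduce(merge, map(init, readings[:2]))    # partial record of group 1
right = reduce(merge, map(init, readings[2:]))   # partial record of group 2
average = evaluate(merge(left, right))
```

Only the two-element record travels between groups, regardless of how many readings each group has absorbed.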

Aggregate operator classification

                           MAX, MIN      COUNT, SUM    AVERAGE    MEDIAN    COUNT DISTINCT
Partial State              Distributive  Distributive  Algebraic  Holistic  Unique
Duplicate Sensitive        No            Yes           Yes        Yes       No
Exemplary (E)/Summary (S)  E             S             S          E         S
Monotonic                  Yes           Yes           No         No        Yes
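The "Duplicate Sensitive" row can be checked with a tiny experiment. The sketch below assumes a hypothetical list of readings in which one value is delivered twice (for example, over two routes); it is an illustration, not something taken from the slides.

```python
# Hypothetical readings collected at the sink.
readings = [3, 7, 5]
# The same readings, with the value 7 delivered a second time.
duplicated = readings + [7]

# MAX is not duplicate sensitive: the duplicate does not change the result.
max_unchanged = max(readings) == max(duplicated)

# SUM is duplicate sensitive: the duplicate is counted twice.
sum_unchanged = sum(readings) == sum(duplicated)
```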

In-Network Aggregation

Traditional Sensor Network (Fjord Architecture)
  Centralized server-based approach: All data are sent back to the server. Sensors are unaware of the content of user queries.
  Example: What’s the temperature of my office?
    Tuple: <SensorID, Sensor Type, Value, Position, Time Stamp>
  Problems:
    Not scalable
    Energy inefficient

Improvement
  Install a filter on each sensor
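A minimal sketch of what a per-sensor filter might look like, assuming the tuple layout given above. The slides do not say how the filter is specified; the make_filter helper, the room names, and the sample tuples are all hypothetical.

```python
from collections import namedtuple

# Tuple layout from the slide: <SensorID, Sensor Type, Value, Position, Time Stamp>
SensorTuple = namedtuple(
    "SensorTuple", ["sensor_id", "sensor_type", "value", "position", "timestamp"]
)

def make_filter(sensor_type, region):
    """Build a predicate installed on a sensor (hypothetical helper).

    Only tuples of the requested type, produced inside the query region,
    are worth transmitting.
    """
    def keep(t):
        return t.sensor_type == sensor_type and t.position in region
    return keep

MY_OFFICE = {"room-A"}  # hypothetical query region
keep = make_filter("temperature", MY_OFFICE)

produced = [
    SensorTuple(1, "temperature", 21.5, "room-A", 0),
    SensorTuple(1, "light", 300.0, "room-A", 0),
    SensorTuple(2, "temperature", 19.0, "room-B", 0),
]
# Tuples that fail the filter are dropped at the sensor and never sent.
sent = [t for t in produced if keep(t)]
```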

In-Network Aggregation

<z> = f(<x>, <y>)
Computation Plan: How to divide sensors into partitions
Communication Plan: How to determine the next hop
Key Problem: Match the computation plan to the communication plan
Example: What’s the temperature of the fourth floor in Upson Hall?
  Plan: Compute the temperature of each office first, and then compute the final result.

In-Network Aggregation

Two algorithms
  Cluster based algorithm
    Divide and conquer: Divide the whole query region into smaller clusters, and execute the query in each cluster. Repeat the process until the cluster size is small enough.

In-Network Aggregation

Cluster based algorithm
  Assumes that sensors that are close geographically are usually close in hops
  The assumption is not always true
  Requires cluster leader election and maintenance

In-Network Aggregation

Tree based algorithm
  Create a spanning tree over the query region
  Aggregate children's data at the parent node
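The tree-based idea can be sketched as a recursive walk over a spanning tree in which each parent merges the partial records of its children with its own reading. The tree shape, node names, and readings below are assumptions chosen for illustration, and AVG with a (SUM, COUNT) record is used as the aggregate.

```python
def aggregate_up(tree, readings, node):
    """Return the (SUM, COUNT) record for the subtree rooted at `node`.

    `tree` maps a node to its list of children; leaves may be absent.
    Each node sends exactly one record to its parent.
    """
    total, count = readings[node], 1
    for child in tree.get(node, []):
        child_total, child_count = aggregate_up(tree, readings, child)
        total += child_total
        count += child_count
    return (total, count)

# Hypothetical spanning tree over the query region.
tree = {"root": ["a", "b"], "a": ["c", "d"]}
readings = {"root": 20.0, "a": 22.0, "b": 24.0, "c": 26.0, "d": 28.0}

total, count = aggregate_up(tree, readings, "root")
average = total / count
```

With this tree, four records travel over four links, instead of every raw reading being relayed hop by hop to the root.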

In-Network Aggregation

Pipelined Aggregation (TAG)
  Two phases:
    Flooding phase: the routing tree is built and aggregate queries are pushed down into the sensor network
    Aggregate phase: the aggregate values are continually routed up from children to parents
  Epoch: the smallest time unit. It must be longer than the transition time of a packet

In-Network Aggregation

An Example

In-Network Aggregation

Problems with the pipelined approach
  Epoch = ?
    Delay = Epoch * Depth of the tree
    Interval = Epoch
  Fault tolerance
    Each link and node is a single point of failure
    If a link close to the root is down, then …
  If the query region only occupies a small part of the network, it is wasteful to create and maintain a global spanning tree
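As a worked example of the two formulas, the sketch below computes the depth of a small routing tree and the resulting delay. The tree and the 2-second epoch are assumptions; depth is counted here as the number of hops from the deepest node to the root.

```python
def depth(tree, node):
    """Number of hops from `node` down to its deepest descendant."""
    children = tree.get(node, [])
    if not children:
        return 0
    return 1 + max(depth(tree, child) for child in children)

# Hypothetical routing tree: root -> a -> c -> d, plus a leaf b under root.
tree = {"root": ["a", "b"], "a": ["c"], "c": ["d"]}
epoch_s = 2.0  # assumed epoch length in seconds

tree_depth = depth(tree, "root")
delay_s = epoch_s * tree_depth   # Delay = Epoch * Depth of the tree
interval_s = epoch_s             # Interval = Epoch
```

A longer epoch makes each hop safer but increases both the delay before the first result and the interval between results, which is why choosing the epoch is listed as a problem.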

In-Network Aggregation

Solution:
  Local repair: Find a new route to the tree
  Do aggregation when all data from children are received
Requirements:
  Monitor the network continuously
  React quickly to network topology changes
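One possible reading of local repair is sketched below: when a node fails, each of its children picks a neighbor that still has a path to the root as its new parent. The slides do not specify the repair procedure, so the function names, the data layout, and the first-fit choice of neighbor are all assumptions.

```python
def reaches_root(parent, node, root):
    """Follow parent pointers from `node`; True if the walk ends at `root`."""
    seen = set()
    while node is not None and node not in seen:
        if node == root:
            return True
        seen.add(node)
        node = parent.get(node)
    return False

def local_repair(parent, neighbors, failed, root):
    """Return a new parent map after `failed` crashes (hypothetical scheme).

    `parent` maps node -> parent, `neighbors` maps node -> radio neighbors.
    Orphans that find no usable neighbor keep a parent of None.
    """
    repaired = {n: (None if p == failed else p)
                for n, p in parent.items() if n != failed}
    orphans = [n for n, p in parent.items() if p == failed]
    for orphan in orphans:
        for candidate in neighbors.get(orphan, []):
            # Skip the dead node and anything cut off from the root,
            # which also rules out the orphan's own descendants.
            if candidate != failed and reaches_root(repaired, candidate, root):
                repaired[orphan] = candidate
                break
    return repaired

parent = {"a": "root", "b": "root", "c": "a", "d": "c"}
neighbors = {"c": ["a", "d", "b"]}
repaired = local_repair(parent, neighbors, failed="a", root="root")
```

Here node c loses its parent a, rejects its own child d as a new parent, and reattaches through b, so only the affected part of the tree changes.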

In-Network Aggregation

Go deep into the protocol stack
Sensor network is task specific

Application Layer

Routing Layer

Link Layer

MAC Layer

Routing and Aggregation

Many existing ad-hoc routing algorithms: AODV, DSDV, DSR, ZRP, Directed Diffusion, etc.

Classified into two main categories:
  Table Driven: DSDV, WRP
  Source-initiated On-Demand Driven: AODV, DSR, TORA, SSR

Two main tasks:
  Route discovery
  Route maintenance

Routing and Aggregation

Can we use any existing ad-hoc routing protocol directly?
  Centralized algorithm and Cluster algorithm
  Tree based algorithm

Different communication pattern
  Ad-hoc network: Randomly selected source and destination pair
  Sensor network: Query dissemination, data collection

Predictable traffic workload

Routing and Aggregation

New Routing Algorithm
  Route Discovery: Similar to Table Driven algorithms, the route information propagates from the destination to the source
  Route Maintenance: Similar to Source-initiated On-Demand Driven algorithms, supports local repair and cooperative repair. Periodically recreates all routes.

New Interface
  Send (Packet* p)
  Receive (Packet* p)
  Filter (Packet* p)
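The slide gives only the three call names, so the sketch below is a guess at how such an interface could support in-network aggregation: filter is assumed to intercept a packet passing through a node so that its partial state can be absorbed instead of forwarded. It is written in Python rather than the C-style signatures of the slide, and the Packet and AggregatingNode classes, their fields, and the flush method are all hypothetical.

```python
class Packet:
    """Hypothetical packet carrying a (SUM, COUNT) partial state record."""
    def __init__(self, dest, state):
        self.dest = dest
        self.state = state

class AggregatingNode:
    """Hypothetical node exposing send / receive / filter."""
    def __init__(self, name, reading):
        self.name = name
        self.pending = (reading, 1)  # own reading as an initial record
        self.outbox = []             # packets handed to the routing layer
        self.delivered = []          # packets addressed to this node

    def send(self, p):
        """Hand a packet to the routing layer (modeled as a list)."""
        self.outbox.append(p)

    def receive(self, p):
        """Accept a packet whose destination is this node."""
        self.delivered.append(p)

    def filter(self, p):
        """Intercept a packet in transit and absorb its partial state.

        Returning None means the packet is not forwarded as-is.
        """
        total, count = p.state
        self.pending = (self.pending[0] + total, self.pending[1] + count)
        return None

    def flush(self, dest):
        """Send one merged record toward `dest`."""
        self.send(Packet(dest, self.pending))

node = AggregatingNode("n1", 21.0)
node.filter(Packet("sink", (40.0, 2)))  # a child's record passes through n1
node.flush("sink")
merged = node.outbox[0].state
```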

Ongoing Research

Query language and data model
High level query processing algorithm
Low level routing algorithm

Multiple query optimization
Heterogeneous sensor network
Approximate query processing

Summary

A sensor network is a large scale distributed database system. Each sensor is an independent data source.

Cluster vs. Tree based algorithm
  Performance
  Fault tolerance
  Applications
    How many people are in the system lab?

Interaction between in-network query processing and routing