tinydb tutorial

Implementation and Research Issues in Query Processing for Wireless Sensor Networks. Wei Hong (Intel Research, Berkeley, whong@intel-research.net) and Sam Madden (MIT). ICDE 2004.

Upload: flashdomain

Post on 20-Jan-2015

Page 1: TinyDB Tutorial

1

Implementation and Research Issues in Query Processing for Wireless Sensor Networks

Wei Hong, Intel Research, Berkeley
whong@intel-research.net

Sam Madden, MIT

ICDE 2004

Page 2: TinyDB Tutorial

2

Motivation

• Sensor networks (aka sensor webs, emnets) are here
  – Several widely deployed HW/SW platforms
    • Low power radio, small processor, RAM/Flash
  – Variety of (novel) applications: scientific, industrial, commercial
  – Great platform for mobile + ubicomp experimentation
• Real, hard research problems to be solved
  – Networking, systems, languages, databases
• We will summarize:
  – The state of the art
  – Our experiences building TinyDB
  – Current and future research directions

[Figure: Berkeley Mote]

Page 3: TinyDB Tutorial

3

Sensor Network Apps

Traditional monitoring apparatus.

Earthquake monitoring in shake-test sites.

Vehicle detection: sensors along a road, collect data about passing vehicles.

Habitat Monitoring: Storm petrels on Great Duck Island, microclimates on James Reserve.

Page 4: TinyDB Tutorial

4

Declarative Queries

• Programming apps is hard
  – Limited power budget
  – Lossy, low bandwidth communication
  – Require long-lived, zero admin deployments
  – Distributed algorithms
  – Limited tools, debugging interfaces
• Queries abstract away much of the complexity
  – Burden on the database developers
  – Users get:
    • Safe, optimizable programs
    • Freedom to think about apps instead of details

Page 5: TinyDB Tutorial

5

TinyDB: Prototype declarative query processor

• Platform: Berkeley Motes + TinyOS
• Continuous variant of SQL: TinySQL
• Power and data-acquisition based in-network optimization framework
• Extensible interface for aggregates, new types of sensors

Page 6: TinyDB Tutorial

6

Agenda

• Part 1: Sensor Networks (50 minutes)
  – TinyOS
  – NesC
• Short break
• Part 2: TinyDB (1 hour)
  – Data model and query language
  – Software architecture
• Long break + hands-on
• Part 3: Sensor Network Database Research Directions (1 hour, 10 minutes)

Page 7: TinyDB Tutorial

7

Part 1

• Sensornet Background
• Motes + Mote Hardware
  – TinyOS
  – Programming Model + NesC
• TinyOS Architecture
  – Major Software Subsystems
  – Networking Services

Page 8: TinyDB Tutorial

8

A Brief History of Sensornets

• People have used sensors for a long time
• Recent CS history:
  – (1998) Pottie + Kaiser: Radio based networks of sensors
  – (1998) Pister et al: Smart Dust
    • Initial focus on optical communication
    • By 1999, radio based networks, COTS Dust, “Motes”
  – (1999) Estrin + Govindan
    • Ad-hoc networks of sensors
  – (2000) Culler/Hill et al: TinyOS + Motes
  – (2002) Hill / Dust: SPEC, mm^3 scale computing
• UCLA / USC / Berkeley continue to lead research
  • Many other players now
  • TinyOS/Motes as most common platform
• Emerging commercial space:
  • Crossbow, Ember, Dust, Sensicast, Moteiv, Intel

Page 9: TinyDB Tutorial

9

Why Now?

• Commoditization of radio hardware
  – Cellular and cordless phones, wireless communication

• Low cost -> many/tiny -> new applications!

• Real application for ad-hoc network research from the late 90’s

• Coming together of EE + CS communities

Page 10: TinyDB Tutorial

10

Motes

4 MHz, 8-bit Atmel RISC microprocessor

40 kbit/s radio

4 KB RAM, 128 KB program flash, 512 KB data flash

AA battery pack

Based on TinyOS

[Figures: Mica Mote; Mica2Dot]

Page 11: TinyDB Tutorial

11

History of Motes

• Initial research goal wasn’t hardware
  – Has since become more of a priority with emerging hardware needs, e.g.:
    • Power consumption
    • (Ultrasonic) ranging + localization
      – MIT Cricket, NEST Project
    • Connectivity with diverse sensors
      – UCLA sensor board
  – Even so, now on the 5th generation of devices
    • Costs down to ~$50/node (Moteiv, Dust)
    • Greatly improved radio quality
    • Multitude of interfaces: USB, Ethernet, CF, etc.
    • Variety of form factors, packages

Page 12: TinyDB Tutorial

12

Motes vs. Traditional Computing

• Lossy, ad hoc radio communication
• Sensing hardware
• Severe power constraints

Page 13: TinyDB Tutorial

13

Radio Communication

• Low bandwidth shared radio channel
  – ~40 kbit/s on motes
  – Much less in practice
    • Encoding, contention for media access (MAC)
• Very lossy: 30% base loss rate
  – Argues against TCP-like end-to-end retransmission
    • And for link-layer retries

• Generally, not well behaved

From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013
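The case for link-layer retries over end-to-end retransmission can be checked with a little arithmetic. The following is a minimal sketch, assuming every hop loses packets independently at the 30% base rate quoted above and ignoring acknowledgement traffic; real links are, as the slide notes, not this well behaved. The function names are illustrative and not part of TinyOS.

```python
def end_to_end_success(loss: float, hops: int) -> float:
    """Probability that a packet crosses every hop with no retries."""
    return (1.0 - loss) ** hops


def expected_tx_end_to_end(loss: float, hops: int) -> float:
    """Expected hop transmissions when the source resends over the whole
    path until one attempt gets through."""
    p = 1.0 - loss
    # Within one attempt, hop i is transmitted only if hops before it succeeded.
    per_attempt = sum(p ** i for i in range(hops))
    # Attempts are independent, so the expected number of attempts is 1 / success.
    return per_attempt / end_to_end_success(loss, hops)


def expected_tx_link_layer(loss: float, hops: int) -> float:
    """Expected hop transmissions when each hop retries until it succeeds."""
    return hops / (1.0 - loss)
```

Under these assumptions, a 5-hop path at 30% loss delivers only about 17% of packets without retries, and per-hop retries need fewer than half the transmissions of end-to-end resends.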

Page 14: TinyDB Tutorial

14

Types of Sensors

• Sensors attach via daughtercard

• Weather
  – Temperature
  – Light x 2 (high intensity PAR, low intensity, full spectrum)
  – Air pressure
  – Humidity
• Vibration
  – 2 or 3 axis accelerometers
• Tracking
  – Microphone (for ranging and acoustic signatures)
  – Magnetometer
• GPS

Page 15: TinyDB Tutorial

15

Power Consumption and Lifetime

• Power typically supplied by a small battery
  – 1000-2000 mAh
  – 1 mAh = 1 milliamp of current for 1 hour
    • Typically at optimum voltage, current drain rates
  – Power = Watts (W) = Amps (A) * Volts (V)
  – Energy = Joules (J) = W * time
• Lifetime, power consumption varies by application
  – Processor: 5 mA active, 1 mA idle, 5 uA sleeping
  – Radio: 5 mA listen, 10 mA xmit/receive, ~20 ms / packet
  – Sensors: 1 uA -> 100’s mA, 1 us -> 1 s / sample
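To see how these figures turn into a lifetime estimate, here is a rough sketch. The duty cycle is an assumption chosen for illustration: the mote is awake 2 s out of every 10 s drawing 10 mA (5 mA processor plus 5 mA radio listening) and sleeps the rest of the time at 5 uA. Sensor and transmit currents, battery self-discharge, and voltage droop are ignored.

```python
def average_current_ma(phases):
    """Time-weighted average current over one duty cycle.

    phases: list of (duration_seconds, current_mA) pairs.
    """
    total_time = sum(duration for duration, _ in phases)
    charge = sum(duration * current for duration, current in phases)
    return charge / total_time


def lifetime_days(capacity_mah, avg_current_ma):
    """Idealized battery lifetime in days for a given average current."""
    return capacity_mah / avg_current_ma / 24.0


# Assumed cycle: 2 s awake at 10 mA, then 8 s asleep at 0.005 mA.
cycle = [(2.0, 10.0), (8.0, 0.005)]
avg = average_current_ma(cycle)      # about 2 mA
days = lifetime_days(2000.0, avg)    # 2000 mAh battery
```

With these assumed numbers the average draw is about 2 mA, which gives roughly six weeks on a 2000 mAh battery; the awake fraction dominates, so shortening it is the main lever.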

Page 16: TinyDB Tutorial

16

Energy Usage in a Typical Data Collection Scenario

• Each mote collects 1 sample of (light, humidity) data every 10 seconds, forwards it
• Each mote can “hear” 10 other motes
• Process:
  – Wake up, collect samples (~1 second)
  – Listen to radio for messages to forward (~1 second)
  – Forward data

[Figure: Power Consumption Breakdown. Percentage of total power by hardware element (radio, sensors, processor).]

[Figure: Processor Energy Breakdown. Percentage of total energy by processing phase (idle, waiting for radio, waiting for sensors, sending).]

Page 17: TinyDB Tutorial

17

Sensors: Slow, Power Hungry, Noisy

[Figure: Time of Day vs. Light. Light (lux) against time of day, 20:09 to 1:26, for the chamber sensor and sensor 69.]

[Figure: Time of Day vs. Light. The same plot with sensor 69 shown as the median of its last 10 readings.]
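The second plot smooths a noisy sensor by reporting the median of its last 10 readings. A minimal sketch of such a filter follows; the class name and interface are illustrative and are not taken from TinyDB.

```python
from collections import deque
from statistics import median


class MedianFilter:
    """Sliding-window median over the most recent readings."""

    def __init__(self, window=10):
        # deque with maxlen drops the oldest reading automatically.
        self.readings = deque(maxlen=window)

    def add(self, value):
        """Record a new reading and return the median of the current window."""
        self.readings.append(value)
        return median(self.readings)
```

A median is a reasonable choice for this kind of data because a single spurious spike changes the output far less than it would change a moving average.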

Page 18: TinyDB Tutorial

18

Programming Sensornets: TinyOS

• Component Based Programming Model

• Suite of software components
  – Timers, clocks, clock synchronization
  – Single and multi-hop networking
  – Power management
  – Non-volatile storage management

Page 19: TinyDB Tutorial

19

Programming Philosophy

• Component Based
  – “Wiring” components together via interfaces, configurations
• Split-Phased
  – Nothing blocks, ever.
  – Instead, completion events are signaled.
• Highly Concurrent
  – Single thread of “tasks”, posted and scheduled FIFO
  – Events “fired” asynchronously in response to interrupts.

Page 20: TinyDB Tutorial

20

NesC

• C-like programming language with component model support
  – Compiles into GCC-compatible C
• 3 types of files:
  – Interfaces
    • Set of function prototypes; no implementations or variables
  – Modules
    • Provide (implement) zero or more interfaces
    • Require zero or more interfaces
    • May define module variables, scoped to functions in module
  – Configurations
    • Wire (connect) modules according to requires/provides relationship

Page 21: TinyDB Tutorial

21

Component Example: Leds

module LedsC {
  provides interface Leds;
}
implementation {
  uint8_t ledsOn;

  enum {
    RED_BIT = 1,
    GREEN_BIT = 2,
    YELLOW_BIT = 4
  };

  ….
  async command result_t Leds.redOn() {
    dbg(DBG_LED, "LEDS: Red on.\n");
    atomic {
      TOSH_CLR_RED_LED_PIN();
      ledsOn |= RED_BIT;
    }
    return SUCCESS;
  }
  ….
}

Page 22: TinyDB Tutorial

22

Configuration Example

configuration CntToLedsAndRfm {
}
implementation {
  components Main, Counter, IntToLeds, IntToRfm, TimerC;

  Main.StdControl -> Counter.StdControl;
  Main.StdControl -> IntToLeds.StdControl;
  Main.StdControl -> IntToRfm.StdControl;
  Main.StdControl -> TimerC.StdControl;
  Counter.Timer -> TimerC.Timer[unique("Timer")];
  IntToLeds <- Counter.IntOutput;
  Counter.IntOutput -> IntToRfm;
}

Page 23: TinyDB Tutorial

23

Split Phase Example

module IntToRfmM { … }
implementation {
  …
  // Phase 1: request the send; the call returns without waiting for the radio.
  command result_t IntOutput.output(uint16_t value) {
    IntMsg *message = (IntMsg *)data.data;
    if (!pending) {
      pending = TRUE;
      message->val = value;
      atomic {
        message->src = TOS_LOCAL_ADDRESS;
      }
      if (call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data))
        return SUCCESS;
      pending = FALSE;
    }
    return FAIL;
  }

  // Phase 2: completion event, signaled once the send has finished.
  event result_t Send.sendDone(TOS_MsgPtr msg, result_t success) {
    if (pending && msg == &data) {
      pending = FALSE;
      signal IntOutput.outputComplete(success);
    }
    return SUCCESS;
  }
}

Page 24: TinyDB Tutorial

24

Major Components

• Timers: Clock, TimerC, LogicalTime

• Networking: Send, GenericComm, AMStandard, lib/Route

• Power Management: HPLPowerManagement

• Storage Management: EEPROM, MatchBox

Page 25: TinyDB Tutorial

25

Timers

• Clock: Basic abstraction over hardware timers; periodic events, single frequency.

• LogicalTime: Fire an event some number of H:M:S:ms in the future.

• TimerC: Multiplex multiple periodic timers on top of LogicalTime.

Page 26: TinyDB Tutorial

26

Radio Stack

• Interfaces:
  – Send
    • Broadcast, or to a specific ID
    • Split phase
  – Receive
    • Asynchronous signal
• Implementations:
  – AMStandard
    • Application specific messages
    • Id-based dispatch
  – GenericComm
    • AMStandard + Serial IO
  – Lib/Route
    • Multihop

Sending:

  IntMsg *message = (IntMsg *)data.data;
  …
  message->val = value;
  atomic {
    message->src = TOS_LOCAL_ADDRESS;
  }
  call Send.send(TOS_BCAST_ADDR, sizeof(IntMsg), &data);

Receiving:

  event TOS_MsgPtr ReceiveIntMsg.receive(TOS_MsgPtr m) {
    IntMsg *message = (IntMsg *)m->data;
    call IntOutput.output(message->val);
    return m;
  }

Wiring to equate IntMsg to ReceiveIntMsg

Page 27: TinyDB Tutorial

27

Multihop Networking

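• Standard implementation “tree based routing”

[Figure: tree-based routing among nodes A, B, C, D, E, F, with messages labeled B exchanged between nodes and result messages R:{…}.]

Problems:
• Parent selection
• Asymmetric links
• Adaptation vs. stability

Neighbor tables shown in the figure:

Node D
Neigh  Qual
B      .75
C      .66
E      .45
F      .82

Node C
Neigh  Qual
A      .5
B      .44
D      .53
F      .35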
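The tension between adaptation and stability in parent selection can be illustrated with a small sketch. This is not the TinyOS routing code: it assumes every neighbor in the table is a valid parent candidate (a real tree protocol must also avoid choosing a descendant), and the switching margin is a made-up hysteresis parameter.

```python
def choose_parent(neighbors, current=None, margin=0.1):
    """Pick a parent from a neighbor table of link-quality estimates.

    neighbors: dict mapping neighbor id -> estimated link quality in [0, 1].
    current:   the present parent, if any.
    margin:    hypothetical hysteresis; only switch when the best neighbor
               beats the current parent by at least this much.
    """
    best = max(neighbors, key=neighbors.get)
    if current in neighbors and neighbors[best] - neighbors[current] < margin:
        return current  # improvement too small to justify a switch
    return best


# Node D's neighbor table from the slide.
node_d = {"B": 0.75, "C": 0.66, "E": 0.45, "F": 0.82}
```

With this table a node with no parent picks F, a node already using B stays with B because F is only slightly better, and a node using E switches to F. A larger margin favors stability; a smaller one favors adaptation.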

Page 28: TinyDB Tutorial

28

Geographic Routing

• Any-to-any routing via geographic coordinates
  – See “GPSR”, MOBICOM 2000, Karp + Kung.
• Requires coordinate system*
• Requires endpoint coordinates
• Hard to route around local minima (“holes”)

[Figure: route from node A to node B.]

*Could be virtual, as in Rao et al “Geographic Routing Without Location Information.” MOBICOM 2003

Page 29: TinyDB Tutorial

29

Power Management

• HPLPowerManagement
  – TinyOS sleeps processor when possible
  – Observes the radio, sensor, and timer state
• Application managed, for the most part
  – App. must turn off subsystems when not in use
  – Helper utility: ServiceScheduler
    • Periodically calls the “start” and “stop” methods of an app
  – More on power management in TinyDB later
  – Approach works because:
    • Single application
    • No interactivity requirements

Page 30: TinyDB Tutorial

30

Non-Volatile Storage

• EEPROM
  – 512K off chip, 32K on chip
  – Writes at disk speeds, reads at RAM speeds
  – Interface: random access, read/write 256 byte pages
  – Maximum throughput ~10 Kbytes / second
• MatchBox Filing System
  – Provides a Unix-like file I/O interface
  – Single, flat directory
  – Only one file being read/written at a time

Page 31: TinyDB Tutorial

31

TinyOS: Getting Started

• The TinyOS home page:
  – http://webs.cs.berkeley.edu/tinyos
  – Start with the tutorials!
• The CVS repository
  – http://sf.net/projects/tinyos
• The NesC Project Page
  – http://sf.net/projects/nescc
• Crossbow motes (hardware):
  – http://www.xbow.com
• Intel Imote
  – www.intel.com/research/exploratory/motes.htm

Page 32: TinyDB Tutorial

32

Part 2

The Design and Implementation of TinyDB

Page 33: TinyDB Tutorial

33

Part 2 Outline

• TinyDB Overview
• Data Model and Query Language
• TinyDB Java API and Scripting
• Demo with TinyDB GUI
• TinyDB Internals
• Extending TinyDB
• TinyDB Status and Roadmap

Page 34: TinyDB Tutorial

34

TinyDB Revisited

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

• High level abstraction:
  – Data centric programming
  – Interact with sensor network as a whole
  – Extensible framework
• Under the hood:
  – Intelligent query processing: query optimization, power efficient execution
  – Fault mitigation: automatically introduce redundancy, avoid problem areas

[Figure: an app sends queries and triggers to TinyDB, which runs over the sensor network and returns data.]

Page 35: TinyDB Tutorial

35

Feature Overview

• Declarative SQL-like query interface
• Metadata catalog management
• Multiple concurrent queries
• Network monitoring (via queries)
• In-network, distributed query processing
• Extensible framework for attributes, commands and aggregates
• In-network, persistent storage

Page 36: TinyDB Tutorial

36

Architecture

[Figure: on the PC side, the TinyDB GUI and the TinyDB client API, with a DBMS reached over JDBC; on the mote side, the TinyDB query processor running on the nodes of the sensor network (numbered 0 to 8).]

Page 37: TinyDB Tutorial

37

Data Model

• Entire sensor network as one single, infinitely-long logical table: sensors
• Columns consist of all the attributes defined in the network
• Typical attributes:
  – Sensor readings
  – Meta-data: node id, location, etc.
  – Internal states: routing tree parent, timestamp, queue length, etc.
• Nodes return NULL for unknown attributes
• On server, all attributes are defined in catalog.xml
• Discussion: other alternative data models?

Page 38: TinyDB Tutorial

38

Query Language (TinySQL)

SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]

Page 39: TinyDB Tutorial

39

Comparison with SQL

• Single table in FROM clause
• Only conjunctive comparison predicates in WHERE and HAVING
• No subqueries
• No column alias in SELECT clause
• Arithmetic expressions limited to column op constant
• Only fundamental difference: SAMPLE PERIOD clause

Page 40: TinyDB Tutorial

40

TinySQL Examples

Query 1: “Find the sensors in bright nests.”

SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors
Epoch  Nodeid  nestNo  Light
0      1       17      455
0      2       25      389
1      1       17      422
1      2       25      405

Page 41: TinyDB Tutorial

41

TinySQL Examples (cont.)

Query 2:

SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

Query 3: “Count the number of occupied nests in each loud region of the island.”

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Regions w/ AVG(sound) > 200
Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520

Page 42: TinyDB Tutorial

42

Event-based Queries

• ON event SELECT …
• Run query only when an interesting event happens
• Event examples
  – Button pushed
  – Message arrival
  – Bird enters nest
• Analogous to triggers but events are user-defined

Page 43: TinyDB Tutorial

43

Query over Stored Data

• Named buffers in Flash memory
• Store query results in buffers
• Query over named buffers
• Analogous to materialized views
• Example:
  – CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
  – SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
  – SELECT field1, field2, … FROM name SAMPLE PERIOD d

Page 44: TinyDB Tutorial

44

Using the Java API

• SensorQueryer
  – translateQuery() converts TinySQL string into TinyDBQuery object
  – Static query optimization
• TinyDBNetwork
  – sendQuery() injects query into network
  – abortQuery() stops a running query
  – addResultListener() adds a ResultListener that is invoked for every QueryResult received
  – removeResultListener()
• QueryResult
  – A complete result tuple, or
  – A partial aggregate result; call mergeQueryResult() to combine partial results
• Key difference from JDBC: push vs. pull

Page 45: TinyDB Tutorial

45

Writing Scripts with TinyDB

• TinyDB’s text interface
  – java net.tinyos.tinydb.TinyDBMain -run “select …”
  – Query results printed out to the console
  – All motes get reset each time a new query is posed
• Handy for writing scripts with shell, perl, etc.

Page 46: TinyDB Tutorial

46

Using the GUI Tools

• Demo time

Page 47: TinyDB Tutorial

47

Inside TinyDB

[Figure: TinyDB on top of TinyOS, comprising a schema, a query processor, and the multihop network. A query (SELECT AVG(temp) WHERE light > 400) arrives over the network; the query processor obtains samples via get(‘temp’), applies the filter light > 400 and the aggregate avg(temp), and sends back results (T:1, AVG: 225; T:2, AVG: 250).]

Schema entry example:
Name: temp
Time to sample: 50 us
Cost to sample: 90 uJ
Calibration Table: 3
Units: Deg. F
Error: ± 5 Deg F
Get f: getTempFunc()
…

TinyDB size:
• ~10,000 lines embedded C code
• ~5,000 lines (PC-side) Java
• ~3200 bytes RAM (w/ 768 byte heap)
• ~58 kB compiled code (3x larger than 2nd largest TinyOS program)

Page 48: TinyDB Tutorial

48

Tree-based Routing

• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
    • In-network aggregation
  – Relationship to indexing?

[Figure: a query (Q: SELECT …) propagating as messages labeled Q among nodes A, B, C, D, E, F, with result messages R:{…} returned.]

Page 49: TinyDB Tutorial

49

Power Management Approach

Coarse-grained app-controlled communication scheduling

[Figure: timeline for mote IDs 1 to 5. Each epoch (10s to 100s of seconds) contains a shared 2-4 s waking period; the motes sleep (… zzz …) for the rest of the epoch.]

Page 50: TinyDB Tutorial

50

Time Synchronization

• All messages include a 5 byte time stamp indicating system time in ms
  – Synchronize (e.g. set system time to timestamp) with
    • Any message from parent
    • Any new query message (even if not from parent)
  – Punt on multiple queries
  – Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system time % epoch dur = 0)
  – And lasts for WAKING_PERIOD ms
• Adjustment of clock happens by changing duration of sleep cycle, not wake cycle.
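The wake/sleep rule above can be sketched in a few lines. This is an illustrative model only, not TinyDB code: the function names are hypothetical, times are plain integers in milliseconds, and synchronization is reduced to adopting the received timestamp.

```python
def is_awake(system_time_ms, epoch_ms, waking_period_ms):
    """A node is awake for the first waking_period_ms of every epoch."""
    return system_time_ms % epoch_ms < waking_period_ms


def sleep_duration_ms(system_time_ms, epoch_ms):
    """How long to sleep so that the node wakes at the next epoch boundary.

    Python's % always returns a non-negative result, so this yields 0 when
    the clock is already exactly on a boundary.
    """
    return (-system_time_ms) % epoch_ms


def synchronize(local_time_ms, timestamp_ms):
    """Adopt the timestamp carried by a parent or new-query message."""
    return timestamp_ms
```

For example, with a 10 s epoch a node whose clock reads 13000 ms would sleep 7000 ms. If a message then shows the true time is 13200 ms, the node adopts it and sleeps 6800 ms instead: the correction is absorbed by the sleep interval, and the waking period keeps its full length.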

Page 51: TinyDB Tutorial

51

Extending TinyDB

• Why extend TinyDB?
  – New sensor attributes
  – New control/actuation commands
  – New data processing logic (aggregates)
  – New events
• Analogous to concepts in object-relational databases

Page 52: TinyDB Tutorial

52

Adding Attributes

• Types of attributes
  – Sensor attributes: raw or cooked sensor readings
  – Introspective attributes: parent, voltage, ram usage, etc.
  – Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.

Page 53: TinyDB Tutorial

53

Adding Attributes (cont)

• Interfaces provided by Attr component
  – StdControl: init, start, stop
  – AttrRegister
    • command registerAttr(name, type, len)
    • event getAttr(name, resultBuf, errorPtr)
    • event setAttr(name, val)
    • command getAttrDone(name, resultBuf, error)
  – AttrUse
    • command startAttr(attr)
    • event startAttrDone(attr)
    • command getAttrValue(name, resultBuf, errorPtr)
    • event getAttrDone(name, resultBuf, error)
    • command setAttrValue(name, val)

Page 54: TinyDB Tutorial

54

Adding Attributes (cont)

• Steps to adding attributes to TinyDB
  1) Create attribute nesC components
  2) Wire new attribute components to TinyDBAttr configuration
  3) Reprogram TinyDB motes
  4) Add new attribute entries to catalog.xml
• Constant attributes can be added on the fly through TinyDB GUI

Page 55: TinyDB Tutorial

55

Adding Aggregates

• Step 1: wire new nesC components

Page 56: TinyDB Tutorial

56

Adding Aggregates (cont)

• Step 2: add entry to catalog.xml

<aggregate>
  <name>AVG</name>
  <id>5</id>
  <temporal>false</temporal>
  <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
</aggregate>

• Step 3 (optional): implement reader class in Java
  – A reader class interprets and finalizes aggregate state received from the mote network, and returns the final result as a string for display.

Page 57: TinyDB Tutorial

57

TinyDB Status

• Latest released with TinyOS 1.1 (9/03)
  – Install the task-tinydb package in TinyOS 1.1 distribution
  – First release in TinyOS 1.0 (9/02)
  – Widely used by research groups as well as industry pilot projects
• Successful deployments in Intel Berkeley Lab and redwood trees at UC Botanical Garden
  – Largest deployment: ~80 weather station nodes
  – Network longevity: 4-5 months

Page 58: TinyDB Tutorial

58

The Redwood Tree Deployment

• Redwood Grove in UC Botanical Garden, Berkeley

• Collect dense sensor readings to monitor climatic variations across
  – altitudes,
  – angles,
  – time,
  – forest locations, etc.

• Versus sporadic monitoring points with 30lb loggers!

• Current focus: study how dense sensor data affect predictions of conventional tree-growth models

Page 59: TinyDB Tutorial

59

Data from Redwoods

[Figure: Humidity vs. Time. Relative humidity (%) for nodes 101, 104, 109, 110, 111.]

[Figure: Temperature vs. Time. Temperature (C) by date, 7/7/03 9:40 to 7/9/03 11:03.]

[Figure: node placement by height on a scale up to 36m. 33m: 111; 32m: 110; 30m: 109, 108, 107; 20m: 106, 105, 104; 10m: 103, 102, 101.]

Page 60: TinyDB Tutorial

60

TinyDB Roadmap (near term)

• Support for high frequency sampling
  – Equipment vibration monitoring, structural monitoring, etc.
  – Store and forward
  – Bulk reliable data transfer
  – Scheduling of communications
• Port to Intel Mote
• Deployment in Intel Fab equipment monitoring application and the Golden Gate Bridge monitoring application

Page 61: TinyDB Tutorial

61

For more information

• http://berkeley.intel-research.net/tinydb or http://triplerock.cs.berkeley.edu/tinydb

Page 62: TinyDB Tutorial

62

Part 3

Database Research Issues in Sensor Networks

Page 63: TinyDB Tutorial

63

Sensor Network Research

• Very active research area
  – Can’t summarize it all
• Focus: database-relevant research topics
  – Some outside of Berkeley
  – Other topics that are itching to be scratched
  – But, some bias towards work that we find compelling

Page 64: TinyDB Tutorial

64

Topics

• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries


Page 66: TinyDB Tutorial

66

Tiny Aggregation (TAG)

• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in parallel programming
  – Communication reducing
    • Operator dependent benefit
  – Across nodes during same epoch
• Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.

Page 67: TinyDB Tutorial

67

Basic Aggregation

• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR) from
    • local readings
    • readings from children
  – Outputs PSR during assigned comm. interval
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
• Extras:
  – Predicate-based partitioning via GROUP BY

[Figure: routing tree over nodes 1 to 5.]
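The per-epoch procedure above can be sketched for a COUNT query as follows. The tree topology here is made up for illustration (the slide's figure is not recoverable from the transcript), and communication intervals are not modeled: the recursion simply evaluates children before their parent, which is the order the interval schedule enforces.

```python
def count_epoch(children, root):
    """Run one epoch of in-network COUNT.

    children: dict mapping node -> list of child nodes.
    Returns (count reported at the root, number of PSR messages sent).
    """
    messages = 0

    def psr(node):
        nonlocal messages
        total = 1  # the node's own local reading
        for child in children.get(node, []):
            total += psr(child)  # merge the child's partial count
            messages += 1        # one PSR message per child-to-parent link
        return total

    return psr(root), messages


def messages_without_aggregation(children, node, depth=0):
    """Messages needed if every reading is forwarded hop by hop to the root."""
    return depth + sum(
        messages_without_aggregation(children, child, depth + 1)
        for child in children.get(node, [])
    )


# Hypothetical 5-node tree rooted at node 1.
tree = {1: [2, 3], 3: [4], 4: [5]}
```

On this tree, aggregation delivers the count of 5 with 4 messages (one per link), whereas forwarding every raw reading to the root takes 7. The gap widens as trees get deeper.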

Page 68: TinyDB Tutorial

68

Illustration: Aggregation

SELECT COUNT(*) FROM sensors

[Figure, animated over pages 68 to 72: a grid of sensor # (1 to 5) against interval # (4 down to 1) within an epoch, next to the routing tree over nodes 1 to 5. The partial counts transmitted in each interval are:]

Interval 4: 1
Interval 3: 2
Interval 2: 1 and 3
Interval 1: 5 (the count for the whole network)
Interval 4 (next epoch): 1

Page 73: TinyDB Tutorial

73

Aggregation Framework

• As in extensible databases, TinyDB supports any aggregation function conforming to:

Agg_n = {f_init, f_merge, f_evaluate}

f_init {a0} -> <a0>
f_merge {<a1>, <a2>} -> <a12>
f_evaluate {<a1>} -> aggregate value

Each <a> is a partial state record (PSR).

Example: Average
AVG_init {v} -> <v, 1>
AVG_merge {<S1, C1>, <S2, C2>} -> <S1 + S2, C1 + C2>
AVG_evaluate {<S, C>} -> S/C

Restriction: merge must be associative and commutative
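The three-function decomposition for AVERAGE translates directly into code. The sketch below mirrors the definitions above, with a partial state record represented as a (sum, count) tuple; the Python function names are illustrative and are not TinyDB's nesC interface.

```python
from functools import reduce


def avg_init(v):
    """Turn one sensor reading into a partial state record <sum, count>."""
    return (v, 1)


def avg_merge(a, b):
    """Combine two partial state records; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def avg_evaluate(psr):
    """Compute the final aggregate value from a partial state record."""
    return psr[0] / psr[1]


readings = [10, 20, 30, 40]

# Merge everything in one pass, as a single node would.
flat = reduce(avg_merge, map(avg_init, readings))

# Merge in two subtrees first, as a routing tree would.
left = avg_init(readings[0])
right = reduce(avg_merge, map(avg_init, readings[1:]))
tree = avg_merge(right, left)
```

Both merge orders give the same record and an average of 25. Averaging the two subtree averages directly (10 and 30) would give 20, which is why the partial state carries the count along with the sum.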

Page 74: TinyDB Tutorial

74

Taxonomy of Aggregates

• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied

Property               | Examples                                    | Affects
Partial State          | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Monotonicity           | COUNT: monotonic, AVG: non-monotonic        | Hypothesis testing, snooping
Exemplary vs. Summary  | MAX: exemplary, COUNT: summary              | Applicability of sampling, effect of loss
Duplicate Sensitivity  | MIN: dup. insensitive, AVG: dup. sensitive  | Routing redundancy

Drives an API!

Page 75: TinyDB Tutorial

75

Use Multiple Parents

• Use graph structure
  – Increase delivery probability with no communication overhead
• For duplicate insensitive aggregates, or
• Aggs expressible as sum of parts
  – Send (part of) aggregate to all parents
    • In just one message, via multicast
  – Assuming independence, decreases variance

SELECT COUNT(*)

Single parent (A sends its count c through one parent to root R):
P(link xmit successful) = p
P(success from A->R) = p^2
E(cnt) = c * p^2
Var(cnt) = c^2 * p^2 * (1 – p^2) ≡ V

# of parents = n (A sends c/n to each parent; the figure shows n = 2 with parents B and C):
E(cnt) = n * (c/n * p^2)
Var(cnt) = n * (c/n)^2 * p^2 * (1 – p^2) = V/n
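The variance claim can be checked numerically. This sketch evaluates the closed forms above and, as a cross-check, enumerates every delivery outcome for the split case. It relies on the same independence assumption as the slide, with each two-hop path succeeding with probability p^2; the function names are illustrative.

```python
from itertools import product


def single_parent(c, p):
    """Mean and variance of the delivered count with one parent."""
    q = p ** 2  # both hops must succeed
    return c * q, c ** 2 * q * (1 - q)


def split_parents(c, p, n):
    """Closed-form mean and variance when c is split evenly over n parents."""
    q = p ** 2
    part = c / n
    return n * part * q, n * part ** 2 * q * (1 - q)


def enumerate_split(c, p, n):
    """Same quantities by brute force over all 2**n delivery outcomes."""
    q = p ** 2
    mean = second_moment = 0.0
    for outcome in product([0, 1], repeat=n):
        prob = 1.0
        for delivered in outcome:
            prob *= q if delivered else (1 - q)
        cnt = sum(outcome) * c / n
        mean += prob * cnt
        second_moment += prob * cnt ** 2
    return mean, second_moment - mean ** 2
```

For c = 10, p = 0.9 and n = 2, the expected count is 8.1 in both schemes, while the variance drops from 15.39 to about 7.7, matching V/n.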

Page 76: TinyDB Tutorial

76

Multiple Parents Results

• Better than previous analysis expected!
• Losses aren’t independent!
• Insight: spreads data over many links

[Figure: Benefit of Result Splitting (COUNT query). Avg. COUNT (axis 0 to 1400) with and without splitting; 2500 nodes, lossy radio model, 6 parents per node. Accompanying diagrams contrast the no-splitting case, annotated “Critical Link!”, with the splitting case.]

Page 77: TinyDB Tutorial

77

Acquisitional Query Processing (ACQP)

• TinyDB acquires AND processes data
  – Could generate an infinite number of samples
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected!
• Versus traditional systems where data is provided a priori

Madden, Franklin, Hellerstein, and Hong. The Design of An Acquisitional Query Processor. SIGMOD, 2003.

Page 78: TinyDB Tutorial

78

ACQP: What’s Different?

• How should the query be processed?
  – Sampling as a first class operation
• How does the user control acquisition?
  – Rates or lifetimes
  – Event-based triggers
• Which nodes have relevant data?
  – Index-like data structures
• Which samples should be transmitted?
  – Prioritization, summary, and rate control

Page 79: TinyDB Tutorial

79

Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ

[Figure: three query plans. Traditional DBMS: sample mag and light together, then apply pred1 and pred2. ACQP, costly: sample mag, apply pred1, then sample light, apply pred2. ACQP, cheap: sample light, apply pred2, then sample mag, apply pred1.]

Correct ordering (unless pred1 is very selective and pred2 is not): the cheap plan, which samples light first.

At 1 sample / sec, total power savings could be as much as 3.5 mW. Comparable to processor!
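The ordering argument comes down to expected energy per tuple: a later sample is only paid for when every earlier predicate passed. Here is a small sketch using the sampling costs from the slide; the predicate pass probabilities are invented for illustration and the function name is hypothetical.

```python
def expected_energy_uj(plan):
    """Expected sampling energy per tuple for an interleaved plan.

    plan: ordered list of (sample_cost_uJ, pass_probability) pairs, where
    pass_probability is the chance the predicate on that sample is true.
    """
    energy, reach = 0.0, 1.0
    for cost, pass_prob in plan:
        energy += reach * cost  # pay only if execution gets this far
        reach *= pass_prob
    return energy


MAG_COST_UJ, LIGHT_COST_UJ = 1500.0, 90.0

# Assumed case 1: each predicate passes half the time.
light_first = expected_energy_uj([(LIGHT_COST_UJ, 0.5), (MAG_COST_UJ, 0.5)])
mag_first = expected_energy_uj([(MAG_COST_UJ, 0.5), (LIGHT_COST_UJ, 0.5)])

# Assumed case 2: pred1(mag) passes 1% of the time, pred2(light) 99%.
light_first_skewed = expected_energy_uj([(LIGHT_COST_UJ, 0.99), (MAG_COST_UJ, 0.01)])
mag_first_skewed = expected_energy_uj([(MAG_COST_UJ, 0.01), (LIGHT_COST_UJ, 0.99)])
```

With equal pass rates, sampling light first costs 840 uJ per tuple against 1545 uJ for mag first. In the skewed case the order flips (1575 uJ vs. about 1501 uJ), which is the exception the slide calls out.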

Page 80: TinyDB Tutorial

80

Exemplary Aggregate Pushdown

SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Novel, general pushdown technique
• Mag sampling is the most expensive operation!

[Figure: two query plans. Traditional DBMS: sample mag and light, apply (mag > x), then WINMAX. ACQP: sample light, apply (light > MAX), then sample mag, apply (mag > x), then WINMAX.]
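The pushdown idea is that a light reading that cannot raise the current window maximum makes the expensive mag sample pointless. A minimal sketch for a single window follows. Each tuple pairs a light value with a callable standing in for the costly mag sensor read; this representation and the function name are illustrative assumptions, not TinyDB's implementation.

```python
def winmax_pushdown(window, x):
    """Window max of light over tuples with mag > x, sampling mag lazily.

    window: list of (light, read_mag) pairs, where read_mag() performs the
            expensive magnetometer sample.
    Returns (window maximum or None, number of mag samples actually taken).
    """
    best = None
    mag_samples = 0
    for light, read_mag in window:
        if best is not None and light <= best:
            continue  # cannot change the max, so skip the costly sample
        mag_samples += 1
        if read_mag() > x:
            best = light
    return best, mag_samples


window = [
    (100, lambda: 5),
    (80, lambda: 9),
    (120, lambda: 1),
    (150, lambda: 7),
]
```

On this window with x = 3, the result is 150 after only 3 mag samples instead of 4. The saving grows when light values rarely exceed the running maximum.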

Page 81: TinyDB Tutorial

81

Topics

• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries

Page 82: TinyDB Tutorial

82

Heterogeneous Sensor Networks

• Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
• Still must be transparent and ad-hoc
• Key to scalability of sensor networks
• Interesting heterogeneities
  – Energy: battery vs. outlet power
  – Link bandwidth: Chipcon vs. 802.11x
  – Computing and storage: ATMega128 vs. Xscale
  – Pre-computed results
  – Sensing nodes vs. QP nodes

Page 83: TinyDB Tutorial

83

Computing Heterogeneity with TinyDB

• Separate query processing from sensing
  – Provide query processing on a small number of nodes
  – Attract packets to query processors based on “service value”
• Compare the total energy consumption of the network
  • No aggregation
  • All aggregation
  • Opportunistic aggregation
  • HSN proactive aggregation

Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.

Page 84: TinyDB Tutorial

84

5x7 TinyDB/HSN Mica2 Testbed

Page 85: TinyDB Tutorial

85

Data Packet Saving

• How many aggregators are desired?
• Does placement matter?

[Figure: Data Packet Saving. % change in data packet count (0% to -50%) against number of aggregators (1 to 6, and all 35).]

[Figure: Data Packet Saving - Aggregator Placement. % change in data packet count (0% to -50%) against aggregator location (25, 27, 29, 31, and all 35).]

• 11% aggregators achieve 72% of max data reduction
• Optimal placement 2/3 distance from sink.

Page 86: TinyDB Tutorial

86

Occasionally Connected Sensornets

[Figure: several sensor networks, each running the TinyDB QP, connected through gateways (GTWY) and mobile gateways (Mobile GTWY) over the internet to a TinyDB server.]

Page 87: TinyDB Tutorial

87

Occasionally Connected Sensornets Challenges

• Networking support
  – Tradeoff between reliability, power consumption and delay
  – Data custody transfer: duplicates?
  – Load shedding
  – Routing of mobile gateways
• Query processing
  – Operation placement: in-network vs. on mobile gateways
  – Proactive pre-computation and data movement
• Tight interaction between networking and QP

Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.

Page 88: TinyDB Tutorial

88

Distributed In-network Storage

• Collectively, sensornets have large amounts of in-network storage
• Good for in-network consumption or caching
• Challenges
  – Distributed indexing for fast query dissemination
  – Resilience to node or link failures
  – Graceful adaptation to data skews
  – Minimizing index insertion/maintenance cost

Page 89: TinyDB Tutorial

89

Example: DIM

• Functionality
  – Efficient range query for multidimensional data.
• Approaches
  – Divide sensor field into bins.
  – Locality preserving mapping from m-d space to geographic locations.
  – Use geographic routing such as GPSR.
• Assumptions
  – Nodes know their locations and network boundary
  – No node mobility

[Figure: sensor field divided into bins, with events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> and range query Q1 = <.5-.7, .5-1>.]

Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.

Page 90: TinyDB Tutorial

90

Statistical Techniques

• Approximations, summaries, and sampling based on statistics and statistical models
• Applications:
  – Limited bandwidth and large number of nodes -> data reduction
  – Lossiness -> predictive modeling
  – Uncertainty -> tracking correlations and changes over time
  – Physical models -> improved query answering

Page 91: TinyDB Tutorial

91

Correlated Attributes

• Data in sensor networks is correlated; e.g.,
  – Temperature and voltage
  – Temperature and light
  – Temperature and humidity
  – Temperature and time of day
  – etc.

Page 92: TinyDB Tutorial

92

IDSQ

• Idea: task sensors in order of best improvement to estimate of some value:
  – Choose leader(s)
    • Suppress subordinates
    • Task subordinates, one at a time
  – Until some measure of goodness (error bound) is met
    » E.g. “Mahalanobis Distance”: accounts for correlations in axes, tends to favor reducing error along the principal axis

See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001.

Page 93: TinyDB Tutorial

93

Graphical Representation

Model location estimate as a point with 2-dimensional Gaussian uncertainty.

[Figure: uncertainty ellipse with its principal axis and two candidate sensors, S1 and S2. S1 (Residual 1) is preferred because it reduces error along the principal axis; the areas of Residual 1 and Residual 2 are equal.]
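A small sketch of why a Mahalanobis measure favors the sensor along the principal axis. It assumes a 2-D Gaussian location estimate with a known 2x2 covariance and treats the sensor with the smallest Mahalanobis distance to the current estimate as the next one to task. The function names and the numbers are illustrative assumptions, not taken from the IDSQ report.

```python
import math


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of `point` from `mean` under a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)


def next_sensor(candidates, mean, cov):
    """Name of the candidate sensor closest to the estimate in Mahalanobis terms."""
    return min(candidates, key=lambda name: mahalanobis_2d(candidates[name], mean, cov))


estimate = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))            # principal axis lies along x
sensors = {"S1": (2.0, 0.0), "S2": (0.0, 2.0)}
chosen = next_sensor(sensors, estimate, cov)
```

Both sensors are the same Euclidean distance from the estimate, but S1 lies along the long axis of the uncertainty ellipse, so its Mahalanobis distance is 1 versus 2 for S2, and it is tasked first.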

Page 94: TinyDB Tutorial

94

MQSN: Model-based Probabilistic Querying over Sensor Networks

[Figure: a query processor with a model, and a network of sensor nodes numbered 1 to 9.]

Joint work with Amol Deshpande, Carlos Guestrin, and Joe Hellerstein

Page 95: TinyDB Tutorial

95

MQSN: Model-based Probabilistic Querying over Sensor Networks

[Figure: a query processor with a model, and a network of sensor nodes numbered 1 to 9.]

Probabilistic Query:
  select NodeID, Temp ± 0.1C
  where NodeID in [1..9]
  with conf(0.95)

Consult Model

Observation Plan: [Temp, 3], [Temp, 9]

Page 96: TinyDB Tutorial

96

MQSN: Model-based Probabilistic Querying over Sensor Networks

[Figure: a query processor with a model, and a network of sensor nodes numbered 1 to 9.]

Observation Plan: [Temp, 3], [Temp, 9]

Probabilistic Query:
  select NodeID, Temp ± 0.1C
  where NodeID in [1..9]
  with conf(0.95)

Consult Model

Page 97: TinyDB Tutorial

97

MQSN: Model-based Probabilistic Querying over Sensor Networks

[Figure: a query processor with a model, and a network of sensor nodes numbered 1 to 9.]

Data: [Temp, 3] = …, [Temp, 9] = …

Update Model

Query Results

[Figure: chart of Temperature (axis 10 to 30) against Node ID (1 to 4).]

Page 98: TinyDB Tutorial

98

Challenges

• What kind of models to use?
• Optimization problem:
  – Given a model and a query, find the best set of attributes to observe
  – Cost not easy to measure
    • Non-uniform network communication costs
    • Changing network topologies
  – Large plan space
    • Might be cheaper to observe attributes not in query
      – e.g. Voltage instead of Temperature
• Conditional Plans:
  – Change the observation plan based on observed values

Page 99: TinyDB Tutorial

99

MQSN: Current Prototype

• Multi-variate Gaussian Models
  – Kalman Filters to capture correlations across time

• Handles:
  – Range predicate queries
    • sensor value within [x,y], w/ confidence
  – Value queries
    • sensor value = x, w/in epsilon, w/ confidence
  – Simple aggregate queries
    • AVG(sensor value) n, w/in epsilon, w/confidence

• Uses a greedy algorithm to choose the observation plan
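One way a greedy observation planner over a multivariate Gaussian model could look is sketched below. This is a simplified illustration, not the MQSN implementation: it assumes noise-free observations, ignores time (no Kalman filtering), and uses a hypothetical score of variance reduction per unit acquisition cost. The covariance matrix, costs, and the names `condition_on`, `confidence`, and `greedy_plan` are all assumptions made for the example.

```python
import math


def condition_on(cov, j):
    """Covariance of all variables after observing variable j exactly."""
    pivot = cov[j][j]
    if pivot <= 1e-12:                    # already known; nothing changes
        return [row[:] for row in cov]
    n = len(cov)
    return [[cov[r][c] - cov[r][j] * cov[j][c] / pivot for c in range(n)]
            for r in range(n)]


def confidence(variance, eps):
    """P(|X - mean| <= eps) for a Gaussian with the given variance."""
    if variance <= 1e-12:
        return 1.0
    return math.erf(eps / math.sqrt(2.0 * variance))


def greedy_plan(cov, cost, eps, target):
    """Pick observations until every variable meets the confidence target."""
    n = len(cov)
    current = [row[:] for row in cov]
    plan = []
    while min(confidence(current[i][i], eps) for i in range(n)) < target:
        trace = sum(current[i][i] for i in range(n))
        best_j, best_score = None, -1.0
        for j in range(n):
            if j in plan:
                continue
            after = condition_on(current, j)
            # Score: total variance removed per unit of acquisition cost.
            score = (trace - sum(after[i][i] for i in range(n))) / cost[j]
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break
        plan.append(best_j)
        current = condition_on(current, best_j)
    return plan


# Nodes 0 and 1 are strongly correlated; node 2 is independent of both.
cov = [[1.0, 0.99, 0.0],
       [0.99, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
plan = greedy_plan(cov, cost=[1.0, 2.0, 1.0], eps=0.5, target=0.95)
```

In this example the plan observes nodes 0 and 2 only. Node 1 is answered from the model, because observing node 0 shrinks its variance enough to meet the 95% target without acquiring a reading from it.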

Page 100: TinyDB Tutorial

100

In-Net Regression

• Linear regression: simple way to predict future values, identify outliers

• Regression can be across local or remote values, multiple dimensions, or with high degree polynomials
  – E.g., node A readings vs. node B’s
  – Or, location (X,Y), versus temperature
    E.g., over many nodes

[Figure: “X vs Y w/ Curve Fit” plot with fitted line y = 0.9703x - 0.0067, R² = 0.947.]

Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data.” Under submission.
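A simple linear fit can be computed in-network by merging small summaries instead of shipping raw readings. The sketch below is not the kernel-based method of the paper cited above; it is a plainer technique, assumed here for illustration, in which each node keeps the sufficient statistics for least squares and parents merge their children's states on the way up the routing tree. The class name `RegressionState` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegressionState:
    """Sufficient statistics for fitting y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x, y):
        """Fold one local (x, y) reading into the state."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other):
        """Combine two partial states, e.g. at a parent in the routing tree."""
        return RegressionState(self.n + other.n, self.sx + other.sx,
                               self.sy + other.sy, self.sxx + other.sxx,
                               self.sxy + other.sxy)

    def fit(self):
        """Least-squares (slope, intercept) from the accumulated statistics."""
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


# Two nodes each summarize their own readings of y = 2x + 1.
node_a, node_b = RegressionState(), RegressionState()
for x in (0, 1, 2):
    node_a.add(x, 2 * x + 1)
for x in (3, 4, 5):
    node_b.add(x, 2 * x + 1)
slope, intercept = node_a.merge(node_b).fit()
```

Each state is five numbers regardless of how many readings it summarizes, so the message size stays constant as the network grows. A reading whose residual from the fitted line is large can then be flagged as a possible outlier.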

Page 101: TinyDB Tutorial

101

In-Net Regression (Continued)

• Problem: may require data from all sensors to build model

• Solution: partition sensors into overlapping “kernels” that influence each other
  – Run regression in each kernel
    • Requiring just local communication
  – Blend data between kernels
  – Requires some clever matrix manipulation

• End result: regressed model at every node
  – Useful in failure detection, missing value estimation

Page 102: TinyDB Tutorial

102

Exploiting Correlations in Query Processing

• Simple idea:
  – Given predicate P(A) over expensive attribute A
  – Replace it with P’ over cheap attribute A’ such that P’ evaluates to the same result as P
  – Problem: unless A and A’ are perfectly correlated, P’ will not agree with P at all times
    • So we could incorrectly accept or reject some readings

• Alternative: use correlations to improve selectivity estimates in query optimization
  – Construct conditional plans that vary predicate order based on prior observations

Page 103: TinyDB Tutorial

103

Exploiting Correlations (Cont.)

• Insight: by observing a (cheap and correlated) variable not involved in the query, it may be possible to improve query performance
  – Improves estimates of selectivities

• Use conditional plans
• Example

[Figure: two predicates, Light > 100 Lux and Temp < 20° C, each with Cost = 100. Without conditioning, both have Selectivity = .5 and either ordering has Expected Cost = 150. A conditional plan first branches on Time in [6pm, 6am] (T/F); within each branch the two predicates have Selectivities .1 and .9, and Expected Cost = 110.]
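The expected costs in the example follow from a standard rule for ordered predicates: a predicate's acquisition cost is paid only if every earlier predicate passed. The sketch below assumes the predicates are independent within a branch and that the conditioning attribute (time of day) is free to observe; the function names are hypothetical.

```python
from itertools import permutations


def expected_cost(predicates):
    """Expected cost of evaluating (cost, selectivity) pairs in the given order."""
    total = 0.0
    reach = 1.0                           # probability that evaluation gets this far
    for cost, selectivity in predicates:
        total += reach * cost
        reach *= selectivity
    return total


def best_order(predicates):
    """Cheapest evaluation order, by brute force over all permutations."""
    return min(permutations(predicates), key=expected_cost)


# Without conditioning: both predicates pass half the time.
unconditional = expected_cost([(100, 0.5), (100, 0.5)])   # 100 + 0.5 * 100

# Inside one branch of the conditional plan the selectivities are skewed,
# so running the selective predicate first usually avoids the second reading.
selective_first = expected_cost([(100, 0.1), (100, 0.9)])  # 100 + 0.1 * 100
selective_last = expected_cost([(100, 0.9), (100, 0.1)])   # 100 + 0.9 * 100
```

The first two values reproduce the 150 and 110 shown in the example. The third shows the cost of the wrong order in a branch (190), which is why the plan picks a different predicate order per branch.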

Page 104: TinyDB Tutorial

104

In-Network Join Strategies

• Types of joins:
  – non-sensor -> sensor
  – sensor -> sensor

• Optimization questions:
  – Should the join be pushed down?
  – If so, where should it be placed?
  – What if a join table exceeds the memory available on one node?

Page 105: TinyDB Tutorial

105

Choosing Where to Place Operators

• Idea: choose a “join node” to run the operator

• Over time, explore other candidate placements
  – Nodes advertise data rates to their neighbors
  – Neighbors compute expected cost of running the join based on these rates
  – Neighbors advertise costs
  – Current join node selects a new, lower cost node

Bonfils and Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing, IPSN 2003.
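The exploration loop above can be sketched with a toy cost model. The model here is an assumption made for illustration, not the one from the cited paper: cost is the data rate times the hop count for each input stream, plus the join's output rate times the hop count to the root. The names `placement_cost` and `next_join_node` and the line topology are hypothetical.

```python
def placement_cost(node, source_rates, output_rate, root, hop_count):
    """Estimated traffic (rate x hops) if the join runs at `node`."""
    inbound = sum(rate * hop_count(src, node) for src, rate in source_rates.items())
    outbound = output_rate * hop_count(node, root)
    return inbound + outbound


def next_join_node(current, neighbors, source_rates, output_rate, root, hop_count):
    """Keep the current join node unless a neighbor advertises a lower cost."""
    candidates = [current] + list(neighbors)
    return min(candidates,
               key=lambda n: placement_cost(n, source_rates, output_rate, root, hop_count))


# Toy topology: nodes 0-1-2-3 in a line, with the root at node 0.
def line_hops(a, b):
    return abs(a - b)


rates = {3: 10.0, 1: 2.0}                 # source node -> tuples per second
first = next_join_node(1, [0, 2], rates, 1.0, 0, line_hops)
second = next_join_node(first, [1, 3], rates, 1.0, 0, line_hops)
third = next_join_node(second, [2], rates, 1.0, 0, line_hops)
```

Starting at node 1, the join migrates one hop at a time toward the high-rate source at node 3 and then stays there, since no neighbor offers a lower cost.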

Page 106: TinyDB Tutorial

106

Topics

• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries

Page 107: TinyDB Tutorial

107

Adaptivity In Sensor Networks

• Queries are long running
• Selectivities change
  – E.g. night vs day
• Network load and available energy vary
• All suggest that some adaptivity is needed
  – Of data rates or granularity of aggregation when optimizing for lifetimes
  – Of operator orderings or placements when selectivities change (cf. conditional plans for correlations)

• As far as we know, this is an open problem!

Page 108: TinyDB Tutorial

108

Multiple Queries and Work Sharing

• As sensornets evolve, users will run many queries simultaneously
  – E.g., traffic monitoring

• Likely that queries will be similar
  – But have different end points, parameters, etc.

• Would like to share processing, routing as much as possible

• But how? Again, an open problem.

Page 109: TinyDB Tutorial

109

Concluding Remarks

• Sensor networks are an exciting emerging technology, with a wide variety of applications

• Many research challenges in all areas of computer science
  – Database community included
  – Some agreement that a declarative interface is right

• TinyDB and other early work are an important first step

• But there’s lots more to be done!