[ieee 2009 fifth international conference on mobile ad-hoc and sensor networks - fujian province, wu...



Page 1: [IEEE 2009 Fifth International Conference on Mobile Ad-hoc and Sensor Networks - Fujian Province, Wu Yi Mountain, China (2009.12.14-2009.12.16)] 2009 Fifth International Conference

Processing Area Queries in Wireless Sensor Networks

Chunyu Ai∗†, Longjiang Guo†∗, Zhipeng Cai‡ and Yingshu Li∗

∗Department of Computer Science, Georgia State University, Atlanta, Georgia 30303, USA

†Department of Computer Science, Heilongjiang University, Harbin, China

‡Department of Basic Science, Mississippi State University, Mississippi 39762, USA

Email: [email protected], [email protected], [email protected] and [email protected]

Abstract: Area query processing is significant for various applications of wireless sensor networks. No previous study has specifically addressed this issue. A naive method is to send all data to Base Station for centralized processing. However, this method wastes a large amount of energy on reporting useless data. This motivates us to propose an energy-efficient in-network area query processing scheme. In our scheme, the whole monitored area is partitioned into grids, and a gray code is used to represent a Grid ID (GID), which is a smart way to describe an area. Furthermore, a reporting tree is constructed to process area merging and aggregations. Based on the properties of GIDs, useless data can be dropped and areas can be merged as early as possible. Incremental update is used to continuously generate query results. In essence, all of these strategies are pivotal to conserving energy. A thorough simulation study shows that our scheme is energy-efficient.

Keywords-wireless sensor network; area query; energy-efficiency; in-network processing;

I. INTRODUCTION

Sensor networks are being used in many significant applications such as environmental monitoring, habitat monitoring, industrial process control, and object tracking. Sensor-based applications extract different kinds of data by collecting data, processing in-network aggregations, and detecting complicated events. Since sensor nodes have limited resources, novel data management techniques are needed to satisfy application requirements while accounting for the characteristics of wireless sensor networks, especially their limited resources.

Most existing data collection systems are query-based. Traditional query processing techniques for wireless sensor networks mainly deal with retrieving sensor node locations and sensing values and with aggregating the sensed values. However, in many applications, users expect information about areas of interest. For instance, workers in a coal mine want to find an area with a high oxygen density to take a break, and this area must be big enough to accommodate several workers. In Figure 1, all sensing values of the sensors in regions R1 through R5 reach the expected oxygen density. However, regions R3 and R5 are not big enough for several workers, so R3 and R5 are not the areas users expect. For this application, existing methods may only return the locations and sensing values of the sensors, which is meaningless and unhelpful. Here, users want to find areas instead of several locations, since the size of an area should also be a filtering condition.

[Figure 1 shows five regions labeled R1, R2, R3, R4, and R5.]

Figure 1. Area Queries

Another interesting scenario is an air pollution monitoring application within a city. Users expect this monitoring system to detect regions whose pollution levels reach given thresholds. In reality, the pollution thresholds might differ from area to area. For example, the threshold of an industrial area is 80, and that of a residential area is 50. Traditional methods usually apply the same filtering condition to the whole network, so they cannot be used for this application. This application concerns specific areas, and a different filtering condition should be applied to each area.

We define an Area Query as a request for area information, including area locations, sizes of areas, and collected and aggregated data of areas. Sensor nodes with expected readings and adjacent sensing coverage are divided into the same group. The total coverage area of the sensor nodes in the same group is a possible result area. Compared to traditional queries of wireless sensor networks, an Area Query can retrieve not only sensing values but also specific geographical information. Area Query is more practicable and useful than traditional queries for applications that require geographical information. A common property of Area Query applications is that the results of these queries are areas and aggregation values of sensors in these areas. The size of an area can also be a query condition. Furthermore, queries might be run for either the whole network or sub-areas, and different querying conditions can be used for different areas. Area Query is a new kind of query in wireless sensor networks.

2009 Fifth International Conference on Mobile Ad-hoc and Sensor Networks

978-0-7695-3935-5/09 $26.00 © 2009 IEEE

DOI 10.1109/MSN.2009.72

Intuitively, Base Station can collect data from all sensor nodes and answer area queries. However, sensor nodes will drain their energy quickly due to frequent reporting. Existing in-network query processing techniques suppress and aggregate sensing values to save energy. However, these methods cannot merge adjacent sensor coverage areas in a distributed manner, which is the key to area query processing. Therefore, a new in-network area query processing technique is necessary.

The first challenge of in-network area query processing is that it is hard to suppress useless data as early as possible. For instance, suppose an area satisfies all the conditions except the size requirement. Obviously, this area is not the one expected by users. However, we cannot drop this area right away, since it is difficult to decide the boundary of a result area locally. How to describe an area inside the network is another challenge. Because areas are also part of query results, an ideal area description can reduce the size of transmitted data and the complexity of area size computation, which are two primary factors for energy conservation.

In this paper, we propose an energy-efficient in-network area query processing scheme. The whole sensor network is partitioned into grids, and a gray code is used to represent each grid. We construct a reporting tree to process area merging and aggregations. To conserve energy, incremental update techniques are used to process continuous queries. Our contributions are summarized as follows:

1) A new area query is studied in this paper.

2) A smart area description, GID lists, is used to reduce the size of query results.

3) An energy-efficient in-network area query processing scheme is proposed. By using GIDs and merging strategies, partial results can be merged to reduce the size of results, and useless data can be dropped as early as possible to reduce the number of messages.

4) An incremental update method is addressed to reduce the size of reports for continuous queries.

5) A distributive reporting tree updating strategy is used to switch roles of sensor nodes in the reporting tree periodically, thus balancing the energy consumption of sensor nodes.

The rest of this paper is organized as follows. In Section II, an overview of related work is presented. The definition of an area query is provided in Section III. In Section IV, we introduce our energy-efficient in-network area query processing scheme. Section V shows our simulation results, and Section VI concludes this paper.

II. RELATED WORK

Area query processing is a special kind of data processing in wireless sensor networks. Data processing in wireless sensor networks has been well studied. It mainly involves three aspects: data collection, data aggregation, and query processing. Resource limitations and unreliable communication links are the main concerns.

Data collection is widely used by applications that collect all sensed data continuously (SELECT ∗ query). In [1], an approximate data collection scheme was proposed. Sensed data is reported only if it differs significantly from the prediction. In [2], a data-driven approach was presented. Model-based suppression is used to provide continuous data without continuous reporting. A mobile filtering approach for error-bounded data collection was proposed in [3]. Data collection techniques achieve ideal data reduction when a bounded error is acceptable to users. Data collection may be used to process area queries. That is, Base Station computes results after collecting data from all sensor nodes. Because errors are not tolerable for area queries, the methods that provide results with errors are not suitable for area query processing. If methods that provide accurate results are employed, energy consumption will be large, since most of the data cannot be suppressed and must be sent to Base Station.

In-network data aggregation is an efficient way to reduce energy consumption since data items are compressed on their way to the destination. TAG [4] aggregates sensed data as early as possible through tree-based routing. Aggregated values from multiple sensor nodes are routed to one destination (Base Station). In [5], continuous spatial aggregation was studied. Users specify a set of spatial regions, and data aggregation is then performed in each region. Existing data aggregation methods either consider the sensor nodes in the whole network as belonging to one group or divide sensor nodes into static groups, as in [5]. Area query processing requires dynamic sensor grouping based on the readings of each sensor. Therefore, the above-mentioned methods cannot be used. In [6], events are abstracted into spatial-temporal patterns. In this scheme, event detection is carried out by matching contour maps of sensed data to event patterns. To the best of our knowledge, area query processing has not been investigated. This is the first work to address area queries specifically.

In this paper, we propose an area query processing scheme, which can handle not only normal queries such as selection, aggregation, and threshold-based event detection but also area queries. More importantly, this scheme is energy-efficient since it processes area queries in an in-network manner. Compared with existing techniques, the remarkable difference is that this scheme supports dynamic sensor grouping, which is an indispensable step for area query processing. For dynamic sensor grouping, we consider not only geographical correlations of sensors' sensing coverage but also sensed data correlations of sensors to derive expected results.

III. AREA QUERY IN WIRELESS SENSOR NETWORKS

In the monitored areas of a sensor network, sensor nodes might be equipped with several sensing components to monitor the environment or detect events. Users can specify a query that describes an event or specific areas of interest.

An area query is defined as follows:

SELECT areas and/or aggregation functions
FROM entire sensor network or subareas
[WHERE predicates]
[GROUP BY Adjacency]
[HAVING predicates]
[DURATION time-span EVERY time-interval]

SELECT presents query results. It can be an area or some aggregation function of the sensing values of an area. FROM specifies the queried areas. Users can write complicated query conditions on sensing attributes with WHERE. GROUP BY is used to divide sensor nodes with expected readings and adjacent sensing coverage into groups. HAVING presents conditions on aggregation functions. DURATION specifies the lifetime of a continuous query. EVERY defines an execution interval, which means the query is continuously executed and results are returned every time-interval seconds. If a query does not specify DURATION and EVERY, it is not a continuous query, and it will be executed just once.

The coal mine example mentioned in Section I can be presented as the following area query.

SELECT Area, Area.avg(sensors.oxygen)
FROM sensors
WHERE sensors.oxygen > 80
GROUP BY Adjacency
HAVING Area.size > 50 m²
DURATION 24 Hours EVERY 600 Seconds

It states that an area qualifies when the oxygen density reported by its sensors is greater than 80 and the size of the area is greater than 50 m². This query runs continuously for 24 hours, and results are reported to users every 600 seconds. Continuous reporting is needed to ensure the safety of workers. This query also reports the average oxygen density of each qualified area to users. Most previous aggregation methods provide the aggregated values of some particular properties, but not for areas.
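To make the meaning of the clauses concrete, the following is a minimal centralized sketch in Python of what the coal mine query asks for. It is only a reference for the query semantics, not the in-network scheme of this paper. The layout of `readings` (one value per grid cell keyed by column and row), the 4-neighbour notion of adjacency, and the parameter names `threshold`, `min_size`, and `cell_area` are all assumptions made for illustration.

```python
from collections import deque

def area_query(readings, threshold, min_size, cell_area):
    """Group qualifying cells by adjacency and keep the groups that are large enough.

    readings:  dict mapping a (col, row) cell to its sensing value (assumed layout)
    threshold: WHERE bound, a cell qualifies when its value > threshold
    min_size:  HAVING bound on the total size of a group
    cell_area: size of one cell, in the same unit as min_size
    """
    qualifying = {cell for cell, value in readings.items() if value > threshold}
    seen, results = set(), []
    for start in sorted(qualifying):
        if start in seen:
            continue
        # Breadth-first search collects one group of adjacent qualifying cells.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            col, row = queue.popleft()
            group.append((col, row))
            for nxt in ((col + 1, row), (col - 1, row), (col, row + 1), (col, row - 1)):
                if nxt in qualifying and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        size = len(group) * cell_area
        if size > min_size:  # HAVING Area.size > min_size
            avg = sum(readings[cell] for cell in group) / len(group)
            results.append({"cells": sorted(group), "size": size, "avg": avg})
    return results

# Three adjacent cells above the threshold form one area; the isolated cell is too small.
readings = {(0, 0): 85, (1, 0): 90, (0, 1): 82, (3, 3): 95, (2, 0): 70}
areas = area_query(readings, threshold=80, min_size=50, cell_area=20)
```

This corresponds to the naive approach in which Base Station sees every reading; the rest of the paper is about obtaining the same result without shipping all readings to one place.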

A scheme that efficiently processes area queries is needed. Because the computation and energy resources of sensor networks are limited, energy efficiency is the main optimization goal here, and reducing the complexity of in-network computation should also be a concern.

IV. IN-NETWORK AREA QUERY PROCESSING SCHEME

In this section, an in-network area query processing scheme is presented. Our approach is to construct an initial set of query results and incrementally update the query results along a reporting tree in a bottom-up manner, rather than collecting all sensor reports and transmitting them to Base Station to process queries centrally. Energy is the most limited resource for current battery-powered sensor nodes, and communication cost is the dominating factor of energy consumption. Consequently, in-network area query processing is more energy-efficient than a simple centralized approach.

We first need to partition the whole monitored area into grids and assign a unique ID to each grid, which is presented in Section IV-A. Then a reporting tree is constructed for the whole network, which is described in Section IV-B. Based on the reporting tree, we show in Section IV-C how to obtain an initial set of query results. The procedure to incrementally update the previous results throughout the query lifetime is presented in Section IV-D.

A. Area Partition

[Figure 2 shows the monitored area partitioned into grids, with the reporting tree drawn over it. The X and Y axes are labeled with the 3-bit gray codes 000, 001, 011, 010, 110, 111, 101, 100; the legend distinguishes sensor nodes from leader nodes.]

Figure 2. Working Scenario.

We assume that any point in the monitored area is covered by at least one sensor node. In addition, the sensor network is static, and the locations (2-D Cartesian coordinates) of sensor nodes are known to Base Station. First, the whole monitored area is vertically partitioned into two even subregions. Then, these two subregions are further horizontally partitioned into four subregions. For each subregion, we recursively partition it vertically and horizontally. The partition process stops when either there is only one sensor node in the subregion or the size of the subregion is not greater than the partitioning threshold Pt. Pt is the minimum size of a grid defined by users. If there are enough sensor nodes, we suggest that the diagonal length of a grid be no larger than a sensor's sensing range, so that each sensor can cover its resident grid. The accuracy is therefore guaranteed. In this way we divide the whole network into grids as shown in Figure 2.

The grid at the top-right corner has two sensor nodes. Since the size of this grid is less than Pt, we can stop the partition even though there is more than one sensor node in that grid. The grid at the bottom-right corner cannot be further divided because there is only one sensor node in it. After partition, there is at least one sensor node in each grid. The average sensing value of all sensor nodes in each grid is treated as the sensing value of that grid.

Algorithm 1: Partition(Xl, Xr, Yl, Yr, Xb, Yb, D, CX, CY)
Input: Partition area. {Initially, Partition(Xleft, Xright, Yleft, Yright, null, null, X, 0, 0) is called.}
Output: Grids with GIDs.

if area((Xl, Xr), (Yl, Yr)) has no intersection with the monitored area then
    return;
end if
if √((Xr − Xl)² + (Yr − Yl)²) ≤ Pt then
    Output the Grid ((Xl, Yl), (Xr, Yr)) with GID: (Xb, Yb).
    return.
end if
{The condition, at least one sensor node in each grid, cannot be guaranteed.}
if it cannot be partitioned in direction D then
    if D == X (i.e. verticality) then
        Partition(Xl, Xr, Yl, Yr, Xb, Yb, Y, CX, CY) {If it cannot be partitioned in vertical direction, try to partition it in horizontal direction.}
    else
        Output the Grid ((Xl, Yl), (Xr, Yr)) with GID: (Xb, Yb).
    end if
end if
if D == X then
    Partition(Xl, (Xl + Xr)/2, Yl, Yr, Shiftleft(Xb) + CX, Yb, Y, 0, CY).
    Partition((Xl + Xr)/2, Xr, Yl, Yr, Shiftleft(Xb) + CX, Yb, Y, 1, CY).
else
    Partition(Xl, Xr, Yl, (Yl + Yr)/2, Xb, Shiftleft(Yb) + CY, X, CX, 0).
    Partition(Xl, Xr, (Yl + Yr)/2, Yr, Xb, Shiftleft(Yb) + CY, X, CX, 1).
end if

As Base Station is in charge of partitioning, it stores the geographical location information of all grids. We use a unique Grid ID (GID) to represent each grid so that Base Station can easily look up location information from a GID. Gray code [7] is a binary numeral system where two successive numbers differ in only one bit. We adopt gray codes as GIDs. We use X to denote a gray code in the horizontal direction, and Y to denote a gray code in the vertical direction. A GID is formatted as (X, Y). As shown in Figure 2, the grid at the top-right corner is (100, 000). The grid at the bottom-right corner is (100, 10x), where x means either 0 or 1, since it is the union of (100, 101) and (100, 100). The length of a GID is the number of partition iterations in both vertical and horizontal directions. Furthermore, we can use one GID to express the union of multiple adjacent grids by replacing the union of 0 and 1 with x. For instance, (10x, 00x) is the union of the four grids at the top-right corner. This scheme can significantly reduce the amount of maintained information.
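The two properties used above, successive gray codes differing in one bit and two GIDs collapsing into one by writing x, can be checked with a short Python sketch. The helper names `gray_codes`, `union_code`, and `union_gid` are hypothetical, and GIDs are assumed to be pairs of strings over the characters 0, 1, and x. The helpers only check the one-bit difference; they do not verify that the two grids are physically adjacent.

```python
def gray_codes(bits):
    """Binary-reflected gray codes of the given width, in sequence order."""
    return [format(i ^ (i >> 1), "0{}b".format(bits)) for i in range(2 ** bits)]

def union_code(a, b):
    """Merge two codes of equal length that differ in exactly one position.

    Returns the merged code with 'x' at that position, or None when the two
    codes cannot be written as a single code.
    """
    if len(a) != len(b):
        return None
    diff = [k for k, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) != 1:
        return None
    k = diff[0]
    if {a[k], b[k]} != {"0", "1"}:
        return None
    return a[:k] + "x" + a[k + 1:]

def union_gid(g1, g2):
    """Union of two GIDs (X, Y) that agree on one axis and differ by one bit on the other."""
    (x1, y1), (x2, y2) = g1, g2
    if x1 == x2:
        y = union_code(y1, y2)
        return (x1, y) if y else None
    if y1 == y2:
        x = union_code(x1, x2)
        return (x, y1) if x else None
    return None

codes = gray_codes(3)
bottom_right = union_gid(("100", "101"), ("100", "100"))
```

With the 3-bit codes of Figure 2, `union_gid(("100", "101"), ("100", "100"))` reproduces the bottom-right grid (100, 10x) from the text.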

The recursive partition algorithm is described in Algorithm 1. First, the whole monitored area is enclosed in a rectangle. We use two vertices on the same diagonal to determine a rectangle's location. Base Station needs to record the location information of each GID. Base Station runs Algorithm 1 to divide this rectangle into grids. Since the monitored area might have an irregular shape, the rectangle may contain non-monitored areas, as shown by the bottom-left corner grids in Figure 2. During partitioning, once a non-monitored subregion is detected, it is ignored (Algorithm 1, Lines 1-3). The rectangle is recursively partitioned in vertical and horizontal directions alternately until the return condition is satisfied. For each generated grid, the location information, the GID (Xb, Yb), and the sensor nodes in this grid are recorded. In Algorithm 1, (Xl, Xr) and (Yl, Yr) are the coordinates of the two vertices representing the current partition area. D is the partition direction, and it can be either X (vertical) or Y (horizontal). CX and CY are parameters used to generate gray codes. Function Shiftleft shifts a binary number one bit to the left. After partition, Base Station sends a unique GID to every sensor node.
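The recursion can be sketched in Python as follows. This is a simplified variant under stated assumptions, not a transcription of Algorithm 1: the monitored area is taken to be the whole rectangle, a split counts as impossible when one half would contain no sensor (in which case the other direction is tried, whichever direction came first), and the half taken at each split is recorded as a plain binary prefix that is turned into a gray code afterwards with the standard conversion n XOR (n >> 1), instead of the Shiftleft bookkeeping with CX and CY. Which half receives bit 0 on each axis is also an assumption. All function names are hypothetical.

```python
import math

def gray_bits(prefix):
    """Binary-reflected gray code of a bit string, keeping its length."""
    if not prefix:
        return ""
    n = int(prefix, 2)
    return format(n ^ (n >> 1), "0{}b".format(len(prefix)))

def halves(rect, direction):
    """Split rect = (xl, xr, yl, yr) into two even halves along X or Y."""
    xl, xr, yl, yr = rect
    if direction == "X":
        xm = (xl + xr) / 2
        return (xl, xm, yl, yr), (xm, xr, yl, yr)
    ym = (yl + yr) / 2
    return (xl, xr, yl, ym), (xl, xr, ym, yr)

def inside(rect, sensors):
    """Sensors located in rect (lower bounds inclusive, upper bounds exclusive)."""
    xl, xr, yl, yr = rect
    return [(x, y) for x, y in sensors if xl <= x < xr and yl <= y < yr]

def partition(rect, sensors, pt, direction="X", xb="", yb=""):
    """Return a list of grids (rect, xb, yb, sensors) with binary prefixes xb, yb."""
    xl, xr, yl, yr = rect
    # Stop when the diagonal is at most pt or at most one sensor remains.
    if math.hypot(xr - xl, yr - yl) <= pt or len(sensors) <= 1:
        return [(rect, xb, yb, sensors)]
    other = "Y" if direction == "X" else "X"
    for d in (direction, other):
        lo, hi = halves(rect, d)
        s_lo, s_hi = inside(lo, sensors), inside(hi, sensors)
        if s_lo and s_hi:  # both halves keep at least one sensor
            nxt = "Y" if d == "X" else "X"
            if d == "X":
                return (partition(lo, s_lo, pt, nxt, xb + "0", yb)
                        + partition(hi, s_hi, pt, nxt, xb + "1", yb))
            return (partition(lo, s_lo, pt, nxt, xb, yb + "0")
                    + partition(hi, s_hi, pt, nxt, xb, yb + "1"))
    return [(rect, xb, yb, sensors)]  # cannot be split in either direction

def to_gid(xb, yb, depth):
    """Gray-code each prefix and pad with 'x' up to the full code length."""
    def pad(prefix):
        return gray_bits(prefix) + "x" * (depth - len(prefix))
    return pad(xb), pad(yb)

sensors = [(1, 1), (1, 5), (5, 1), (7, 1), (5, 5)]
grids = partition((0, 8, 0, 8), sensors, pt=1.0)
depth = max(max(len(xb), len(yb)) for _, xb, yb, _ in grids)
gids = [to_gid(xb, yb, depth) for _, xb, yb, _ in grids]
```

Padding with x works because the leading bits of a gray code depend only on the leading bits of the binary index, so a grid that stopped splitting early covers exactly the codes that share its gray-coded prefix.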

B. Reporting Tree Construction

For in-network query processing, a reporting tree is constructed by Base Station. Figure 2 shows an example of a reporting tree. This tree is established from the leaf nodes to the root. After deploying sensor nodes, Base Station knows the initial energy and GID of all sensor nodes. We group together the grids whose GIDs differ only in the last bits of the X and Y codes (i.e. (%x, %x), where % can represent any binary string of any length). In each group, a sensor node is chosen to be the leader of this group (triangular nodes shown in Figure 2). The other sensor nodes in the group report data to the leader. In other words, the leader is the parent node of the other sensor nodes in the same group. Then, among the leaders in the groups of grids (%xx, %xx), one sensor node is chosen to be the parent node of the other leaders generated in the last step. In the next step, we consider groups with grids (%xxx, %xxx), and so on. The process is repeated until the root of this tree is identified. That is, finally we consider all the grids as an entire group. The height of this reporting tree is Max(|X|, |Y|) + 1, where |X| and |Y| are the lengths of gray codes X and Y respectively. Base Station sends children and parent information to each sensor node. In our scheme, the reporting tree does not serve as the routing tree. The routing can be generated by any routing protocol such as DSDV in [8].
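The level-by-level grouping amounts to masking the trailing bits of both codes. A small Python sketch, assuming GIDs are pairs of bit strings and using the hypothetical helpers `group_key`, `groups_at_step`, and `tree_height`, illustrates this; how a leader is picked inside each group is left out because the text does not fix a rule for it.

```python
from collections import defaultdict

def group_key(gid, step):
    """Replace the last `step` bits of each code with 'x' (capped at the code length)."""
    def mask(code):
        k = min(step, len(code))
        return code[:len(code) - k] + "x" * k
    x, y = gid
    return mask(x), mask(y)

def groups_at_step(gids, step):
    """Map each group pattern, e.g. ('01x', '11x'), to the GIDs that fall into it."""
    groups = defaultdict(list)
    for gid in gids:
        groups[group_key(gid, step)].append(gid)
    return dict(groups)

def tree_height(gid):
    """Height of the reporting tree, Max(|X|, |Y|) + 1."""
    x, y = gid
    return max(len(x), len(y)) + 1

# Groups that contain the grid (010, 110) at steps 1, 2 and 3.
chain = [group_key(("010", "110"), step) for step in (1, 2, 3)]
```

For the grid (010, 110) this yields the patterns (01x, 11x), (0xx, 1xx), and (xxx, xxx), one per level of the tree above the leaves.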

Since leaders (the non-leaf nodes) have to process the reports from their children and report to their parent, they consume much more energy than leaf nodes. To prolong the network lifetime, we should reconstruct the reporting tree periodically and let sensor nodes with more residual energy serve as leaders. However, to construct a new tree, Base Station has to collect energy information from all sensor nodes, and after generating the new tree, Base Station has to broadcast the tree structure to the whole network. This consumes a large amount of energy. Therefore, we generate a new reporting tree by switching child and parent roles in the tree in a distributed way. This switching process starts from the bottom of the tree and ends at the root. The child node with the maximum remaining energy takes over the parent role and serves as the new parent, the old parent becomes a child of this new parent, and the new parent's siblings become its children.
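For a single parent and its children, the role switch can be sketched in Python as below. The function name `switch_roles`, the node identifiers, and the `energy` mapping are hypothetical, and the sketch covers only one group; it does not model how the switch propagates from the bottom of the tree to the root.

```python
def switch_roles(parent, children, energy):
    """Return (new_parent, new_children) for one group of the reporting tree.

    parent:   identifier of the current parent node
    children: list of identifiers of its children
    energy:   dict mapping a node identifier to its remaining energy
    """
    # The child with the maximum remaining energy takes over the parent role.
    new_parent = max(children, key=lambda node: energy[node])
    # Its former siblings and the old parent become its children.
    new_children = [node for node in children if node != new_parent] + [parent]
    return new_parent, new_children

new_parent, new_children = switch_roles("a", ["b", "c", "d"], {"a": 1, "b": 5, "c": 9, "d": 3})
```

Only the nodes of one group need to exchange their energy levels for this decision, which is why no network-wide collection or broadcast by Base Station is involved.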

C. In-network Area Query Processing

Algorithm 2: Merging(i, GID lists from its children)
Input: Step i and subareas from children.
Output: Merged results.

for each grid in the reporting messages do
    if the last (i − 1)th bit of GID on X is 1 and last (i − 2) bits are 0 then
        Add this GID into merging list L0 if the ith bit is 0, otherwise add it to L1.
    end if
end for
for all pairs of GID1 in L0 and GID2 in L1 do
    if GID1.Y ⊕ GID2.Y == 1 then
        if (GID1.Y == GID2.Y) AND (GID1.X == GID2.X except the last ith bit) then
            Merge GID1 and GID2 into one GID by replacing 0 and 1 on the last ith bit with x.
        end if
        Merge two GID lists (to which GID1 and GID2 belong) into one and calculate aggregations.
    end if
end for
Merge subareas on Y coordinate using the similar method from line 1 to 13.
for each subarea A do
    if GIDs of this Area list have no intersection with ((% x 0···0, %) ∪ (%, % x 0···0)), where each run 0···0 has (i − 1) bits, then
        if Size of A does not satisfy the condition then
            Delete A.
        else
            Send this result A to Base Station and delete A.
        end if
    end if
end for
Send merged subareas to its parent.

The motivation for in-network processing is that the number of transmitted messages can be reduced by merging multiple reports into one and filtering useless data as early as possible, thereby prolonging network lifetime.

When Base Station receives an area query request, it registers this query in the system. Then, the query is decomposed into query conditions, merging conditions, query lifetime, and executing interval. Query conditions, lifetime, and executing interval are sent to every sensor node in the queried area. Merging conditions are sent to the non-leaf nodes of the reporting tree. If a query is a continuous query, it is processed once per interval.

In essence, query conditions are the filtering conditions on sensing attributes, such as temperature, moisture, and oxygen density, defined by the WHERE clause. Query lifetime and executing interval are specified in the DURATION and EVERY clauses respectively. A noteworthy point is that sensor nodes in different query areas might receive different query conditions, since users might define different conditions for different areas, as in the air pollution monitoring example mentioned in Section I. Within the lifetime of a query, a sensor node sends a report to its parent every interval if its readings satisfy the query conditions. The reporting message is formatted as (SID, GID, RID, V1, V2, ..., VM), where SID is the sensor node ID, GID is the grid ID, RID is the area ID that indicates different query areas, such as the industrial area and the residential area in the example mentioned in Section I, and Vi is the sensing value of attribute i.

Merging conditions include the aggregation operations and the filtering conditions specified by the HAVING clause. The number of merging steps S is Max(|X|, |Y|). We start from step 1, involving grids (%x, %x), and stop at step S, involving all grids. In other words, the merging process follows the reporting tree in bottom-up order. In the reporting tree, every non-leaf node is responsible for at least one merging step. If a non-leaf node is the root for grids (%x···x, %x···x), it is the merging node for these grids. The merging step i is the number of x's in X or Y. Some non-leaf nodes might participate in several merging steps, and the root node participates in all merging steps. For instance, in Figure 2 the root node (in the grid (010, 110)) processes the step 1, 2, and 3 merging for grids (01x, 11x), (0xx, 1xx), and (xxx, xxx) one by one. In the process of merging, a leader merges adjacent subareas from its children into one and filters out areas which cannot be merged by the upper-level leader. Moreover, aggregations are calculated after merging. Then, the leader generates a report and sends it to its parent. In particular, the root sends its report to Base Station. The report is formatted as (RID, Subarea_1, GIDlist, datalist, RID, Subarea_2, GIDlist, datalist, ···, RID, Subarea_n, GIDlist, datalist). A GID list is used to describe an area. For instance, the shaded area at the top-left corner in Figure 2 is denoted as ((00x, 001), (0x1, 01x)) instead of 6 GIDs. Consequently, GID lists reduce the size of the area description since a single GID can represent multiple grids.
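To make the compactness of a GID list concrete, the following Python sketch expands a GID list back into the individual grids it covers. The string representation of GIDs and the function names are assumptions made for illustration only.

```python
from itertools import product


def expand_bits(pattern):
    """Expand one coordinate pattern: every 'x' stands for both 0 and 1."""
    choices = [("0", "1") if c == "x" else (c,) for c in pattern]
    return ["".join(bits) for bits in product(*choices)]


def expand_gid(gid):
    """Return all concrete (X, Y) grids covered by a single GID."""
    x, y = gid
    return [(gx, gy) for gx in expand_bits(x) for gy in expand_bits(y)]


def expand_gid_list(gid_list):
    """Return the set of concrete grids covered by a GID list."""
    grids = set()
    for gid in gid_list:
        grids.update(expand_gid(gid))
    return grids


if __name__ == "__main__":
    # The top-left shaded area of Figure 2, as written in the text.
    area = [("00x", "001"), ("0x1", "01x")]
    for grid in sorted(expand_gid_list(area)):
        print(grid)
```

For the two-GID list used in the text, the expansion yields six distinct grids, which matches the statement that the list stands in for 6 GIDs.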

There are two main difficulties in merging areas. First, how can a node judge whether two subareas are adjacent, given that location information is not maintained by sensor nodes? Second, how can it judge whether an isolated area has been identified? A non-leaf node cannot decide whether a subarea is adjacent to subareas of its siblings since it only has information about its children. However, using gray codes as GIDs provides an effective way to solve these two problems.

For the example in Figure 2, suppose the shaded grids are eligible grids which satisfy the requirements on sensing attributes. Suppose the acreage requirement is that the returned area should not be smaller than 2A, where A is the acreage of the smallest grid. At step 1, grids in each group (%x, %x) are merged. For example, the result of the group (00x, 00x) is (00x, 001), which means the two lower grids are involved, and the result of the group (11x, 11x) is (11x, 11x), which means all the grids in this group are involved. At step 2, grids in each group (%xx, %xx) are merged. For example, the result of the group (1xx, 0xx) is (1x1, 0x1), and this is an isolated area since it is not adjacent to the border of the group (1xx, 0xx). That is, only the grids adjacent to the border of a group have a chance to be merged with grids in other groups at the next step. We can infer the following rule. At step i, grids whose last i − 1 bits are all 0's on X or Y are on the border of a group. Once an isolated area is found, it is sent to Base Station instead of its parent if it is large enough; otherwise it is dropped. As a result, the size of the data transmitted in the reporting tree can be reduced significantly. Thus, (1x1, 0x1) is sent to Base Station since its size is 4A, and (011, 111) is dropped since its size is less than 2A. The other two shaded subareas, ((00x, 001), (0x1, 01x)) and ((11x, 11x), (101, 111), (1x1, 101)), are sent to the leader in the grid (010, 110), which is the root of the tree, because they have a chance to be merged. Now we give another rule for merging. At step i, only GIDs whose last (i − 1)th bit is 1 and whose last i − 2 bits are all 0's on X (or Y) might be merged on X (or Y). For example, at step 2, grids in (%1, %) and (%, %1) might be merged; at step 3, grids in (%10, %) and (%, %10) might be merged. This rule is important for reducing the complexity of the merging computation. The properties of gray codes and these two rules are crucial for efficiently solving the two difficulties mentioned above.
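The border rule and the handling of isolated areas can be sketched as follows. This is a minimal illustration that follows the rule as stated in the text (last i − 1 bits all 0 on X or Y); the function names, the string form of GIDs, and the three return labels are assumptions, and acreage is measured in units of A.

```python
def touches_border(gid, step):
    """True if the GID covers a grid whose last (step - 1) bits are all 0 on X or Y.

    An 'x' in the tail counts, because it can take the value 0.
    """
    k = step - 1

    def tail_may_be_zero(code):
        tail = code[len(code) - k:]
        return all(c in "0x" for c in tail)

    x, y = gid
    return tail_may_be_zero(x) or tail_may_be_zero(y)


def acreage(gid_list):
    """Acreage in units of A: each GID covers 2**n grids, n = number of x's."""
    return sum(2 ** (x.count("x") + y.count("x")) for x, y in gid_list)


def route_subarea(gid_list, step, min_acreage):
    """Decide what a merging node does with a subarea after a merging step."""
    if any(touches_border(g, step) for g in gid_list):
        return "parent"        # may still be merged at the next step
    if acreage(gid_list) >= min_acreage:
        return "base_station"  # isolated and large enough: final result
    return "drop"              # isolated but too small


if __name__ == "__main__":
    # Step 2 of the Figure 2 example, with a minimum acreage of 2A.
    print(route_subarea([("1x1", "0x1")], 2, 2))
    print(route_subarea([("011", "111")], 2, 2))
    print(route_subarea([("00x", "001"), ("0x1", "01x")], 2, 2))
```

On the Figure 2 example, the sketch reproduces the outcomes described in the text: (1x1, 0x1) goes to Base Station, (011, 111) is dropped, and the top-left subarea is forwarded to the parent.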

The merging algorithm is described in Algorithm 2. After receiving reports from its children, a non-leaf node invokes this algorithm to merge subareas with the same RID and sends the merged results to its parent. In this algorithm, if $L_0$ and $L_1$ are ordered by Y-GID in gray code order when we try to merge on X, the complexity can be reduced since we are looking for GIDs that intersect on Y. XNOR (the inverse of the exclusive OR operation), denoted as $\oplus$, is used to judge whether two GIDs intersect on Y or X. We define $x \oplus \omega = 1$, where $\omega$ is 0, 1, or x. The acreage of a GID is $2^n \cdot A$, where n is the number of x's in the GID. The acreage of an area or subarea is the sum of the acreage of all GIDs in its GID list. Acreage calculation is thus also simplified by using GIDs. The aggregation functions sum and avg of an area are calculated by

$$\frac{\sum_{i=1}^{n} V_i \cdot A_i}{\sum_{i=1}^{n} A_i} \qquad \text{and} \qquad \frac{\sum_{i=1}^{n} V_i \cdot A_i}{n \cdot \sum_{i=1}^{n} A_i}$$

respectively, where $V_i$ and $A_i$ are the sensing value and acreage of grid i respectively.
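The XNOR-based intersection test can be illustrated with a short Python sketch. It assumes GID coordinates are strings over '0', '1', and 'x' and applies the definition above bit by bit, with x XNOR ω = 1 for any ω; the function names are hypothetical.

```python
def xnor_bit(a, b):
    """Ternary XNOR on one bit: 'x' matches anything, otherwise bits must be equal."""
    if a == "x" or b == "x":
        return 1
    return 1 if a == b else 0


def codes_intersect(code1, code2):
    """Two coordinate patterns intersect if every bit position XNORs to 1."""
    return len(code1) == len(code2) and all(
        xnor_bit(a, b) == 1 for a, b in zip(code1, code2)
    )


def gids_intersect_on_y(gid1, gid2):
    """Check the Y parts only, as done before attempting a merge on X."""
    return codes_intersect(gid1[1], gid2[1])


if __name__ == "__main__":
    print(codes_intersect("0x1", "001"))
    print(codes_intersect("0x1", "010"))
    print(gids_intersect_on_y(("00x", "001"), ("0x1", "0x1")))
```

Because the test is a per-bit comparison, two sorted GID lists can be scanned for intersecting entries without any coordinate arithmetic.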

D. Incremental Result Update

In most applications, sensing values change only slightly or remain mostly unchanged over a long time. Therefore, consecutive merged results are extremely similar for a continuous query, and incremental updates can conserve a large amount of energy. By comparison, regenerating query results at every interval using Algorithm 2 would quickly deplete energy because of the heavy communication cost.

We now present an incremental updating scheme to maintain merged results on non-leaf nodes of the reporting tree after the initial results are calculated. Each sensor node stores its previous report, and non-leaf nodes store previous merged results. We classify changed subareas into three classes: only value, positive, and negative. A subarea in the previous report for which only sensing values or aggregation values have changed is an only value subarea; a subarea not in the previous report but in the current report is a positive subarea; a subarea in the previous report but not in the current report is a negative subarea. The incremental updating process updates merged results along the reporting tree step by step. For a negative subarea, a merging node removes the dropped grids. A subarea might be split if some grids are dropped. A positive subarea is merged into the previous results using a method similar to that in Section IV-C. For only value subareas, aggregation values are updated.

A leaf node does not send a report if its readings have not changed or if neither its previous nor its current readings satisfy the WHERE conditions. A non-leaf node generates an update report by comparing the new merged results with the previous results and sends it to its parent. An update report includes the positive and negative subareas, and the new aggregation values for other subareas if these values have changed. If the update report is larger than the merged results themselves, the merged results are sent instead.
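The three-way classification of changed subareas can be sketched as a comparison of two reports. This sketch simplifies a report to a dictionary from a subarea key to its aggregation value; that representation and the function name are assumptions for illustration and ignore the splitting and re-merging of subareas described above.

```python
def classify_changes(previous, current):
    """Compare two reports and return (positive, negative, only_value).

    previous, current: dict mapping a hashable subarea key (for example a
    tuple of GIDs) to its aggregation value.
    """
    positive = {k: v for k, v in current.items() if k not in previous}
    negative = [k for k in previous if k not in current]
    only_value = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return positive, negative, only_value


if __name__ == "__main__":
    prev = {
        (("1x1", "0x1"),): 20.0,
        (("00x", "001"), ("0x1", "01x")): 21.5,
        (("011", "111"),): 19.0,
    }
    cur = {
        (("1x1", "0x1"),): 20.0,                 # unchanged: not reported
        (("00x", "001"), ("0x1", "01x")): 22.0,  # only value
        (("11x", "11x"),): 25.0,                 # positive
    }
    positive, negative, only_value = classify_changes(prev, cur)
    print("positive:", positive)
    print("negative:", negative)
    print("only value:", only_value)
```

Unchanged subareas appear in none of the three outputs, which is what keeps the update report small when readings are stable.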

E. Advantages of Using Gray Codes

Communication cost is a significant criterion for evaluating the efficiency of an algorithm in wireless sensor networks. Gray codes are used to represent grids in our scheme. If we use 14-bit gray codes, i.e., 7 bits for X and 7 bits for Y, 16384 (that is, $2^7 \times 2^7$) grids can be represented. However, we cannot use a binary representation since x (either 0 or 1) is also used in GIDs. Hence, a 14-digit ternary (base 3) representation is used to represent GIDs. Since $2^{24}$ is greater than $3^{14}$, 3 bytes (i.e., 24 bits) are enough to express a GID after converting it to a binary representation. Another method for representing a grid is to use two vertices. Each vertex is stored as a pair of x-y coordinates. If we use 2 bytes each for the x and y coordinates, 8 bytes are needed to represent a grid. 8 bytes are quite long compared to the 3 bytes of a GID. Furthermore, the GID description method does not lose location information, although GIDs do not maintain this information themselves; location information can be retrieved from Base Station. So the GID is the most effective approach for reducing the size of the grid denotation.
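The 3-byte bound can be checked with a small encoder. The sketch below packs a 14-digit ternary GID into 3 bytes and back; the digit assignment ('0' → 0, '1' → 1, 'x' → 2), the byte order, and the function names are assumptions, since the paper does not specify the encoding.

```python
TRITS = "01x"  # assumed digit assignment: '0' -> 0, '1' -> 1, 'x' -> 2


def encode_gid(x, y, bits_per_axis=7):
    """Pack a GID (two ternary strings) into 3 bytes, big-endian."""
    code = x + y
    if len(x) != bits_per_axis or len(y) != bits_per_axis:
        raise ValueError("each coordinate must have exactly bits_per_axis digits")
    value = 0
    for c in code:
        value = value * 3 + TRITS.index(c)
    return value.to_bytes(3, "big")


def decode_gid(data, bits_per_axis=7):
    """Inverse of encode_gid: recover the (X, Y) ternary strings."""
    value = int.from_bytes(data, "big")
    digits = []
    for _ in range(2 * bits_per_axis):
        value, r = divmod(value, 3)
        digits.append(TRITS[r])
    code = "".join(reversed(digits))
    return code[:bits_per_axis], code[bits_per_axis:]


if __name__ == "__main__":
    print(3 ** 14, "<", 2 ** 24, ":", 3 ** 14 < 2 ** 24)
    packed = encode_gid("01x0011", "1x00xx0")
    print(len(packed), decode_gid(packed))
```

The largest possible value, with all 14 digits equal to x, is $3^{14} - 1 = 4782968$, which is below $2^{24} = 16777216$, so every GID fits in 3 bytes.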

In our merging algorithm, thanks to the properties of gray codes and the two rules we found, an isolated area can be detected as early as possible. Consequently, useless data can be dropped early, and results can be sent to Base Station without further merging. Moreover, adjacent subareas can be merged to reduce the size of the GID list. In summary, all these strategies are effective in reducing both the number and the size of messages. By contrast, it is very difficult to achieve these benefits with other methods such as vertex denotation.

V. SIMULATION RESULTS

In this section, we evaluate the performance of our proposed in-network area query processing scheme.

A. Simulation Setup

We simulate the area query of the coal mine mentioned in Section III. The payload size limit of a packet is 29 bytes, which is the standard payload size provided by TinyOS [9]. The transmission range of sensor nodes is 30 m, and the sensing range is 15 m. Pt is 15 m. DSDV [8] is used as the routing protocol.

Since network traffic greatly affects energy efficiency, we use it as the metric for performance evaluation. The network traffic is defined as the total number of messages transmitted (sent and forwarded) by all nodes in the network during the execution of a query.

We compare the performance of our scheme, IPGID for short, with two alternative approaches: (1) Centralized Processing (CP), and (2) In-network Processing based on Vertex description (IPV). For a fair comparison, both CP and IPV use conditions on attributes to filter useless readings. That is, a sensor node does not send a report if its readings cannot pass the WHERE conditions. IPV is similar to our scheme except that it uses an area's vertices instead of a GID list to represent the area. In-network merging techniques are also used in IPV.

We varied several system parameters when comparing the performance of the three approaches. These parameters are listed as follows:

1) Length of a GID (L). This parameter affects the network diameter and the number of sensor nodes deployed. A longer L means a larger network with more sensor nodes deployed.

2) Selective Rate on Attributes (S). It is the percentage of sensors' sensing values which can pass the WHERE conditions.

3) Filtered Rate of Area (F). It is the percentage of areas which satisfy the HAVING conditions.

4) Changed Rate of Sensing Values (C). This parameter is the percentage of changed results compared to the previous set of results.

B. Efficiency of Our Scheme

As shown in Figures 3(a), 3(b), 3(c), 3(d), and 3(e), IPGID always performs best, and IPV is better than CP. The reason is that IPGID and IPV use incremental updating techniques to maintain results, so only the difference between the current and previous sets of results is reported, whereas CP reports qualified data every interval. IPGID is better than IPV because the size of a GID list is much smaller than that of the vertex information needed to describe an area.

Figure 3(a) shows the network traffic of the first 20 intervals. At the first interval, there is only a marginal difference among the three approaches since both IPV and IPGID construct the initial query results at the first interval. Later on, a clear difference between CP and IPV (or IPGID) emerges. The reason is that the number of reported messages is reduced after the first interval since IPV and IPGID use incremental update techniques. At the 20th interval, IPGID has 83% fewer transmitted messages than CP, and IPV has 71% fewer transmitted messages than CP.

As shown in Figure 3(b), as L increases, the advantage of IPGID becomes more obvious. Although a longer GID incurs more messages, merging multiple reports and filtering useless data in the reporting tree as early as possible, which is the method used by IPV and IPGID, reduces the number of messages. However, the reporting paths of CP become longer as the network size grows. On average, IPGID has 84% fewer transmitted messages than CP and 37% fewer transmitted messages than IPV.

Figure 3(c) shows the network traffic for varying S. The network traffic increases with S because a larger S causes more sensor nodes to report readings. CP is affected by S more noticeably than IPV and IPGID. The reason is that, for CP, each additional sensor node involved causes a message to be transmitted along a long path to Base Station, whereas for IPV and IPGID it only causes a message to the node's parent. The network traffic for varying F is shown in Figure 3(d). Since a smaller F implies that more areas can be dropped in the reporting tree according to the HAVING conditions, the smaller F is, the less network traffic IPGID generates. IPV and CP are not affected by this parameter since they cannot filter the useless areas. The network traffic for varying C is presented in Figure 3(e). As C increases, the task of incremental updating becomes much heavier. As a result, network traffic increases with C. When C is close to 1, IPGID and IPV almost have to reconstruct the merged results every interval. IPV is worse than IPGID since IPV uses more messages than IPGID to describe a big area.

In summary, our in-network area query processing scheme is energy-efficient in various circumstances.

VI. CONCLUSIONS

In this paper, we have proposed an energy-efficient in-network area query processing scheme. Our approach is the first work to study area query processing in wireless sensor networks. We define area queries and partition the network into grids. Gray codes are employed to express the IDs of grids. Initial query results are generated by merging partial results along the reporting tree. To conserve energy, an incremental updating strategy is used to generate continuous query results. The size of results can be reduced by using binary GIDs to describe areas. In-network processing can drop useless data as early as possible. Furthermore, incremental updating avoids unnecessary reports. All these strategies contribute to energy efficiency, which our simulation results confirm.

Figure 3. Simulation Results. Each panel plots Network Traffic for CP, IPV, and IPGID (panel (d) shows IPV and IPGID only): (a) Network Traffic of execution intervals (L=10 bits, S=50%, F=50%, C=20%); (b) Network Traffic of variant L (S=50%, F=50%, C=20%, 100 intervals); (c) Network Traffic of variant S (L=10 bits, F=50%, C=20%, 100 intervals); (d) Network Traffic of variant F (L=10 bits, S=50%, C=20%, 100 intervals); (e) Network Traffic of variant C (L=10 bits, S=50%, F=50%, 100 intervals).

ACKNOWLEDGMENT

This work is supported by the NSF under grant No. CCF-0545667 and CCF-0844829. It is partly supported by the National Natural Science Foundation of China for Young Scholar under grant No. 60803015, the China Postdoctoral Science Foundation under grant No. 20080430902, the Science and Technology Innovation Research Project of Harbin for Young Scholar under grant No. 2008RFQXG107, the Heilongjiang Postdoctoral Science Foundation under grant No. LRB08-021, the Science and Technology Key Research of Heilongjiang Educational Committee under grant No. 1154Z1001, and the General Program about Science and Technology research of Heilongjiang Educational Committee under grant No. 11541263.

REFERENCES

[1] D. Chu, A. Deshpande, J. M. Hellerstein, and W. Hong, “Approximate data collection in sensor networks using probabilistic models,” in ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society, 2006, p. 48.

[2] A. Silberstein, R. Braynard, G. Filpus, G. Puggioni, A. Gelfand, K. Munagala, and J. Yang, “Data-driven processing in sensor networks,” in CIDR ’07: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research. Asilomar, California, USA: IEEE Computer Society, 2007.

[3] D. Wang, J. Xu, J. Liu, and F. Wang, “Mobile filter: Exploring migration of filters for error-bounded data collection in sensor networks,” Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pp. 1483–1485, April 2008.

[4] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, “Tag: a tiny aggregation service for ad-hoc sensor networks,” SIGOPS Oper. Syst. Rev., vol. 36, no. SI, pp. 131–146, 2002.

[5] D. Goldin, “Faster in-network evaluation of spatial aggregation in sensor networks,” Data Engineering, 2006. ICDE ’06. Proceedings of the 22nd International Conference on, pp. 148–148, April 2006.

[6] W. Xue, Q. Luo, L. Chen, and Y. Liu, “Contour map matching for event detection in sensor networks,” in SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data. New York, NY, USA: ACM, 2006, pp. 145–156.

[7] G. M., “The binary gray code,” in Ch. 2 of Knotted Doughnuts and Other Mathematical Entertainments, New York: W. H., 1986.

[8] C. E. Perkins and P. Bhagwat, “Highly dynamic destination-sequenced distance-vector routing (dsdv) for mobile computers,” SIGCOMM Comput. Commun. Rev., vol. 24, no. 4, pp. 234–244, 1994.

[9] “Tinyos faq [online],” Available: http://www.tinyos.net/faq.html. [Accessed Oct 28, 2008].
