
Data Gathering and Aggregation in Wireless Sensor Networks

ELG7178F “Ad Hoc Networking”

Albert Wahba – March 11, 2010

Introduction & Problem Statement

[Figure: End User(s) and Sink(s) exchanging Queries, Reply, and Data with the Sensors]

How can data be effectively gathered and aggregated from the sensors and delivered to the end users?

Outline

Data Storage Location

* [1] Wei-Peng Chen and Jennifer C. Hou, 2005

External, Local, Data-Centric


Distributed Index for Multidimensional Data (DIM)

DIM builds an in-network distributed data structure to efficiently answer multidimensional range queries.

Assumptions:
• All nodes are aware of the network's geographic boundaries.
• Each sensor node is aware of its geographic location.
• Data values are normalized to be between 0 and 1.

* [3] Xin Li, Young Jin Kim, Ramesh Govindan, and Wei Hong, 2003

DIM Zone Assignment

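[Figure: DIM zone assignment for two attributes, A1 (horizontal axis) and A2 (vertical axis). Each split appends one bit to the zone code: 0 for the lower half of the range, 1 for the upper half. The first-level splits are at 0.5; the second-level splits are at 0.25 and 0.75.]

               A1<0.25   0.25<A1<0.5   0.5<A1<0.75   0.75<A1<1
0.75<A2<1      0101      0111          1101          1111
0.5<A2<0.75    0100      0110          1100          1110
0.25<A2<0.5    0001      0011          1001          1011
A2<0.25        0000      0010          1000          1010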

* [3] Xin Li, Young Jin Kim, Ramesh Govindan, and Wei Hong, 2003

Routing an Event to its Owner Example

[Figure: example network on the same A1/A2 grid, with zones of different code lengths (for example 010, 110, 0111, 1111, 1110). The event E1 = (0.8, 0.7) is routed toward its owner; the code labels along the route read 111 and then 1110, and the node that owns zone 1110 stores E1 ("Store E1").]

DIM’s Zone Tree

Enhancing DIM Performance Using k-d Tree

• Divide the deployment field into cells.
• Cells are used as the storage unit.
• An index node covers one or more cells.
• All cells that belong to the same index node store the same data.
• Dynamically control the depth of DIM’s zone tree.
• Solves the scalability problem of DIM.
• Better energy efficiency.

* [4] Lei Xie, Lijun Chen, Daoxu Chen, Li Xie, 2009


Flat Network Architecture

• Two-Phase Pull Diffusion:
– Sinks search by flooding, sources reply by flooding, then sinks choose the best route.
– Suited to many sources and only a few sinks.
• One-Phase Pull Diffusion:
– Replies are sent to the neighbors that first sent the query.
– Suited to a large number of events being queried.
• Push Diffusion:
– Sources flood the collected data; sinks subscribe to events of interest.
– Suited to many sinks and only a few sources, e.g. target tracking.


Directed Diffusion (Two-Phase Pull)

• Consists of three phases:
– Interest Propagation
– Data Propagation
– Reinforcement

* [5] C. Intanagonwiwat, R. Govindan and D. Estrin, 2000


Sensor Protocols for Information via Negotiation (SPIN) (Push Diffusion)

• Data sources initiate the data-sending activities.
• Consists of a three-stage handshake:
– Advertisement (metadata).
– Request for data.
– Data message.

* [7] Joanna Kulik, Wendi Heinzelman and Hari Balakrishnan, 2002
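The three-stage handshake can be sketched in a few lines of Python. This is a minimal, single-threaded sketch under stated assumptions, not the protocol as specified in [7]: the `Node` class, its method names, the shared `messages` log, and the three-node line topology a - b - c are all hypothetical, and SPIN's energy-aware behavior is left out.

```python
# Hypothetical sketch of a SPIN-style ADV / REQ / DATA handshake.
messages = []  # (type, sender, receiver) tuples, in the order they are sent


class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}      # metadata key -> data held by this node
        self.neighbors = []

    def advertise(self, key):
        # Stage 1: advertise the metadata only, never the data itself.
        for n in self.neighbors:
            messages.append(("ADV", self.name, n.name))
            n.on_adv(self, key)

    def on_adv(self, sender, key):
        # Stage 2: request the data only if it is not already held.
        if key not in self.store:
            messages.append(("REQ", self.name, sender.name))
            sender.on_req(self, key)

    def on_req(self, requester, key):
        # Stage 3: send the actual data to the requester.
        messages.append(("DATA", self.name, requester.name))
        requester.on_data(key, self.store[key])

    def on_data(self, key, data):
        self.store[key] = data
        self.advertise(key)  # pass the advertisement on to own neighbors


def link(a, b):
    a.neighbors.append(b)
    b.neighbors.append(a)


a, b, c = Node("a"), Node("b"), Node("c")
link(a, b)
link(b, c)
a.store["temp"] = 21   # node a is the data source
a.advertise("temp")
print(c.store)
print(messages)
```

Because a node that already holds the data ignores the advertisement, the data message crosses each link only once in this topology.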


A Novel Real-Time Routing Protocol

Assumptions:
• The network is data-centric.
• Each sensor knows its energy level.
• Sensors have IDs.

• Real-time route tree.
• Alternate suboptimal routes, slower.
• Route monitoring and reporting algorithm.
• None of the routes is used all the time.

* [6] Li-Ming He, Xi’an , 2009


Minimum-Latency Aggregation Protocols

Assumptions:
• Interference radius (p) = 1.
• The communication topology is rooted at the sink.
• Synchronous time-slot communication.
• A node transmits at most one packet of a fixed size in each time slot.
• Child nodes must transmit before their parents can transmit.

•[18] Peng-Jun Wan, Scott C.-H. Huang, Lixin Wang, Zhiyuan Wan, Xiaohua Jia 2009

[Figure: sink s with communication radius 1 and interference radius p]

Minimum-Latency Aggregation Protocols Development History

• Minimum latency for p = 1:

Latency bound          Year   Algorithm
(Δ – 1)R               2005
23R + Δ – 18           2007
15R + Δ – 4            2009   SAS
2R + O(log R) + Δ      2009   PAS

p: interference radius
Δ: maximum degree of the communication topology
R: radius of the communication topology (maximum hop distance)

* [18] Peng-Jun Wan, Scott C.-H. Huang, Lixin Wang, Zhiyuan Wan, Xiaohua Jia, 2009
* [19] www.wikipedia.org (Graphs Only)

Unit-Disk Graph (UDG)

Connected Dominating Sets (CDS) Construction

Phase One: Construct the dominating set U
• Maximal Independent Set (MIS)

Phase Two: Select the connectors W
• There is an edge between two dominators iff they have a common neighbor.

U ∪ W is a CDS

* [18] Peng-Jun Wan, Scott C.-H. Huang, Lixin Wang, Zhiyuan Wan, Xiaohua Jia, 2009
* [19] www.wikipedia.org (Graphs Only)

http://upload.wikimedia.org/wikipedia/commons/b/b6/Cube-maximal-independence.svg
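The two phases can be illustrated with a short Python sketch. It is one possible reading, not the construction from [18]: the function names are hypothetical, the MIS is built first-fit in breadth-first order from the sink (an assumption; the slide only says "MIS"), and each later dominator is joined to an earlier one through a single common neighbor.

```python
from collections import deque


def bfs_order(adj, root):
    """Return the nodes in breadth-first order, starting at root."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in sorted(adj[v]):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order


def build_cds(adj, root):
    # Phase one: first-fit maximal independent set in BFS order (dominators U).
    dominators = []
    for v in bfs_order(adj, root):
        if not adj[v] & set(dominators):
            dominators.append(v)
    # Phase two: for each later dominator, pick one neighbor that is also
    # adjacent to an earlier dominator (connectors W).
    connectors = set()
    for idx in range(1, len(dominators)):
        earlier = set(dominators[:idx])
        for p in sorted(adj[dominators[idx]]):
            if adj[p] & earlier:
                connectors.add(p)
                break
    return dominators, connectors


# Hypothetical 5-node path 0 - 1 - 2 - 3 - 4 with the sink at node 0.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
U, W = build_cds(adj, root=0)
print(U, W)  # U and W together form the CDS
```

On this path the dominators are every second node and the connectors are the nodes in between, so U ∪ W covers the whole path.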

Iterative Minimal Covering (IMC)

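[Figure: bipartite graph with Y = { y1, y2, y3, y4, y5 } on top and X = { x1, x2, x3, x4, x5, x6, x7 } below; the link labels are added one iteration at a time.]

X = { x1, x2, x3, x4, x5, x6, x7 }

Y = { y1, y2, y3, y4, y5 }

A starts as { } and grows in three iterations:

Iteration 1: ( x1 , y2 ), ( x4 , y3 ), ( x6 , y5 ) with ℓ( x1 , y2 ) = ℓ( x4 , y3 ) = ℓ( x6 , y5 ) = 1

Iteration 2: ( x2 , y2 ), ( x5 , y5 ) with ℓ( x2 , y2 ) = ℓ( x5 , y5 ) = 2

Iteration 3: ( x3 , y2 ), ( x7 , y5 ) with ℓ( x3 , y2 ) = ℓ( x7 , y5 ) = 3

A = { ( x1 , y2 ) ( x4 , y3 ) ( x6 , y5 ) ( x2 , y2 ) ( x5 , y5 ) ( x3 , y2 ) ( x7 , y5 ) }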

* [18] Peng-Jun Wan, Scott C.-H. Huang, Lixin Wang, Zhiyuan Wan, Xiaohua Jia 2009

Canonical Breadth-First-Search (CBFS)

Parent Rank Assignment:

If v has no children, Rank(v) = 0

If v has only one child, Rank(v) = r

If v has more than one child, Rank(v) = r + 1

r: the maximum rank among the children of v

[Figure: CBFS tree with levels v0 through v6 (R’). Each node is annotated with its rank (the root has rank 3) and each link with its link label.]
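The rank rules can be written as a short recursive function. This is a minimal sketch: the tree, the node names, and the `rank` helper are hypothetical and are not taken from the figure.

```python
def rank(tree, v):
    """Rank of node v, following the three rules on the slide."""
    children = tree.get(v, [])
    if not children:
        return 0                      # no children
    r = max(rank(tree, c) for c in children)
    return r if len(children) == 1 else r + 1


# Hypothetical tree: parent -> list of children, rooted at the sink "s".
tree = {"s": ["a", "b", "c"], "a": ["d", "e"], "b": ["f"]}
ranks = {v: rank(tree, v) for v in ["s", "a", "b", "c", "d", "e", "f"]}
print(ranks)
```

Here node a has two leaf children and gets rank 1, node b has a single leaf child and keeps rank 0, and the sink ends up with rank 2.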


Pipelined Aggregation Scheduling (PAS)

[Figure: the same CBFS tree, levels v0 through v6 (R’), with node ranks and link labels, now annotated with the time slot assigned to each link.]

Link Time Slot = (R’ – i) + 44j + 4(ℓ – 1)

Where:
i = radius, 0 ≤ i ≤ R’
j = node rank(s), 0 ≤ j ≤ r
ℓ = link label

Examples with R’ = 6:

(6 – 6) + 44(0) + 4(1 – 1) = 0

(6 – 4) + 44(2) + 4(1 – 1) = 90
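As a quick check of the arithmetic, the equation can be evaluated directly. The function and parameter names below are hypothetical; the constants 44 and 4 are taken from the slide as given.

```python
def link_time_slot(r_prime, i, j, label):
    """Evaluate (R' - i) + 44j + 4(l - 1) as written on the slide."""
    return (r_prime - i) + 44 * j + 4 * (label - 1)


# The two worked examples from the slide, with R' = 6.
print(link_time_slot(6, i=6, j=0, label=1))  # 0
print(link_time_slot(6, i=4, j=2, label=1))  # 90
```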

Conclusion

• Data gathering and aggregation in wireless sensor networks can be classified based on:
– Data Storage: External, Local, and Data-Centric.
– Network Architecture:
  • Flat: Two-Phase Pull Diffusion, One-Phase Pull Diffusion, and Push Diffusion.
  • Hierarchical: Tree, Grid, Cluster, and Chain.
– Resources: Maximum Lifetime, Data Reliability, and Minimum Latency.
• Several algorithms can deliver optimal performance for a given application.
• There is no ONE algorithm that will work for all applications.
• Data gathering and aggregation algorithms have advanced in recent years as a result of major improvements in electronic design.

Questions?

Q1: Mapping an event to a DIM zone.

The Distributed Index for Multidimensional Data (DIM) algorithm builds an in-network distributed data structure to efficiently store multi-attribute events and to efficiently answer multidimensional range queries. The algorithm divides the area of interest into several zones, and then uses a hash function to map a multi-attribute event to a geographic zone. The hashing scheme assigns a k-bit zone code to an event as follows:

For i between 1 and m (m is the total number of attributes), if Ai < 0.5, the i-th bit of the zone code is assigned 0, else 1. For i between m + 1 and 2m, if A(i−m) < 0.25 or A(i−m) ∈ [0.5, 0.75), the i-th bit of the zone code is assigned 0, else 1, because the next-level divisions are at 0.25 and 0.75, which divide the ranges into [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1). We repeat this procedure until all k bits have been assigned.

Using the DIM algorithm explained in the lecture, show where the following event will be stored in the DIM zones.

Temperature = 0.9 and Humidity = 0.4

The event was initiated from the node located at zone 000. What would be the answer if the event passed through a node located at zone 1001?

Q1 Answer

Event: < 0.9, 0.4 >

Bit 1: 0.9 > 0.5? Yes → 1

Bit 2: 0.4 > 0.5? No → 0

Bit 3: 0.9 > 0.75? Yes → 1

Bit 4: 0.4 > 0.25? Yes → 1

Answer (from the node at zone 000, 3-bit code): 101

Answer (through the node at zone 1001, 4-bit code): 1011
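The hashing scheme from the question can be sketched in Python. This is a minimal sketch, assuming the attributes take turns and each attribute's range is halved every time it contributes a bit; the function name `zone_code` is hypothetical.

```python
def zone_code(values, k):
    """Return the k-bit zone code of an event with normalized attribute values."""
    m = len(values)
    lo = [0.0] * m   # current lower bound of each attribute's range
    hi = [1.0] * m   # current upper bound of each attribute's range
    bits = []
    for i in range(k):
        a = i % m                    # attributes take turns, one bit each
        mid = (lo[a] + hi[a]) / 2
        if values[a] < mid:
            bits.append("0")         # lower half of the current range
            hi[a] = mid
        else:
            bits.append("1")         # upper half of the current range
            lo[a] = mid
    return "".join(bits)


print(zone_code((0.9, 0.4), 3))  # 101
print(zone_code((0.9, 0.4), 4))  # 1011
```

The same function maps the lecture's example event E1 = (0.8, 0.7) to the 4-bit code 1110.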

Q2: Applying the IMC Algorithm

The Iterative Minimal Covering (IMC) algorithm is used to construct a spanning inward s-arborescence (a tree directed toward the sink s), which is associated with a link labeling.

The IMC algorithm takes as input a pair (X, Y) of disjoint subsets X and Y, such that X is covered by Y, and outputs a single-hop (X, Y)-aggregation schedule.

Using the IMC algorithm explained in the lecture (algorithm outline is provided below [18]), provide the minimum covering set of Y with the associated link labels.

[Figure: bipartite graph with Y = { y1, y2, y3, y4, y5 } on top and X = { x1, x2, x3, x4, x5 } below.]

Q2 Answer

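[Figure: the same bipartite graph, with the selected links annotated with their labels 1, 2, and 3.]

X = { x1, x2, x3, x4, x5 }

Y = { y1, y2, y3, y4, y5 }

A starts as { } and grows in three iterations:

Iteration 1: ( x1 , y1 ), ( x3 , y3 ) with ℓ( x1 , y1 ) = ℓ( x3 , y3 ) = 1

Iteration 2: ( x2 , y1 ), ( x4 , y3 ) with ℓ( x2 , y1 ) = ℓ( x4 , y3 ) = 2

Iteration 3: ( x5 , y1 ) with ℓ( x5 , y1 ) = 3

A = { ( x1 , y1 ) ( x3 , y3 ) ( x2 , y1 ) ( x4 , y3 ) ( x5 , y1 ) }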
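The iterations can be reproduced with a short Python sketch. It is one reading of the procedure, not the algorithm text from [18]: in each round it prunes Y to a minimal cover of the remaining X nodes, then pairs every cover node with one neighbor that no other cover node has. The function names are hypothetical, and so is the adjacency below, since the edges of the figure are not recoverable; it was chosen so that the output matches the labels listed in the answer.

```python
def minimal_cover(targets, candidates, adj):
    """Prune candidates to a minimal subset that still covers all targets."""
    cover = [y for y in candidates if adj[y] & targets]
    for y in list(cover):
        rest = [c for c in cover if c != y]
        if targets <= set().union(*(adj[c] for c in rest)):
            cover = rest             # y is redundant, drop it
    return cover


def imc(X, Y, adj):
    """Return {(x, y): label} for a pair (X, Y) in which Y covers X."""
    remaining, cover, labels, label = set(X), list(Y), {}, 0
    if not remaining <= set().union(*(adj[y] for y in Y)):
        raise ValueError("X is not covered by Y")
    while remaining:
        label += 1
        cover = minimal_cover(remaining, cover, adj)
        chosen = set()
        for y in cover:
            others = set().union(*(adj[c] for c in cover if c != y))
            # In a minimal cover, y always has a neighbor no other member has.
            x = min((adj[y] & remaining) - others)
            labels[(x, y)] = label
            chosen.add(x)
        remaining -= chosen
    return labels


# Hypothetical adjacency: each y maps to the set of x nodes it can hear.
adj = {"y1": {"x1", "x2", "x5"}, "y2": {"x2"}, "y3": {"x3", "x4"},
       "y4": {"x4"}, "y5": {"x5"}}
X = ["x1", "x2", "x3", "x4", "x5"]
Y = ["y1", "y2", "y3", "y4", "y5"]
schedule = imc(X, Y, adj)
print(schedule)
```

Links that share a label can be scheduled together, because each chosen sender is heard by only one of the receivers active in that round.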

Q3: Applying the PAS Algorithm

a) In the Pipelined Aggregation Scheduling (PAS) protocol, each sensor node is assigned a specific time slot based on its node rank, communication radius, and link label. The link labels indicated on the following graph have been calculated using the IMC algorithm. Use the PAS algorithm to calculate the rank of each node using the following set of rules:

If v has no children, Rank(v) = 0
If v has only one child, Rank(v) = r
If v has more than one child, Rank(v) = r + 1
Where r: the maximum rank among the children of v

b) Then use the following equation to assign a time slot to each sensor node.

Link Time Slot = (R’ – i) + 44j + 4(ℓ – 1)

Where:
– i = radius, 0 ≤ i ≤ R’
– j = node rank(s), 0 ≤ j ≤ r
– ℓ = link label
– R: radius of the communication topology

c) Based on your answer for part (b), what are the advantages and disadvantages of the PAS algorithm?

[Figure: tree with levels v0 through v3; each link is annotated with its IMC link label (1 or 2).]

Q3 Answer

[Figure: the same tree, levels v0 through v3, with the node rank inside each node (red; the root has rank 2) and the time slot on each link (green).]

a) By applying the set of rules mentioned in the question, the node ranks can be easily found. The red number inside each node represents the node rank.

b) Using the node ranks from part (a) and the equation mentioned in the question with R’ = 3, all time slots can be calculated, as represented by the green numbers in the graph.

c) Although the number of sensor nodes is very small, the total number of time slots needed to complete data aggregation is 50, which indicates that the PAS algorithm is not suitable for a network with a small communication radius.

The advantage of using the PAS algorithm is that the sink node starts receiving data after only 2 time slots, because the pipelining increases the network throughput.

References

1. Book Chapter “Data Gathering and Fusion in Sensor Networks” by Wei-Peng Chen and Jennifer C. Hou, 2005

2. Presentation “Data Gathering and Aggregation in Wireless Sensor Networks” by Ivan Stojmenovic

3. Technical Paper “Multi-Dimensional Range Queries in Sensor Networks” by Xin Li, Young Jin Kim, Ramesh Govindan, and Wei Hong, 2003

4. Technical Paper “A Decentralized Storage Scheme for Multi-Dimensional Range Queries Over Sensor Networks” by Lei Xie, Lijun Chen, Daoxu Chen, Li Xie, 2009

5. Technical Paper “Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks” by C. Intanagonwiwat, R. Govindan and D. Estrin, 2000

6. Technical Paper “A Novel Real-Time Routing Protocol for Wireless Sensor Networks” by Li-Ming He, Xi’an, 2009

7. Technical Paper “Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks” by Joanna Kulik, Wendi Heinzelman and Hari Balakrishnan, 2002

8. Technical Paper “Minimum-Energy Asynchronous Dissemination to Mobile Sinks in Wireless Sensor Networks” by Hyung Seok Kim, Tarek F. Abdelzaher, Wook Hyun Kwon, 2003

9. Technical Paper “A Two-Tier Data Dissemination Model for Large-scale Wireless Sensor Networks” by Fan Ye, Haiyun Luo, Jerry Cheng, Songwu Lu, Lixia Zhang, 2002

10. Technical Paper “Spiral Grid Routing for Load Balance in Wireless Sensor Networks” by Chiu-Kuo Liang and Chih-Shiuan Li, 2009

11. Technical Paper “Energy-Efficient Communication Protocol for Wireless Microsensor Networks” by Wendi Rabiner Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan, 2000

12. Technical Paper “Adaptable Protocol for Time Critical Information Dissemination via Negotiation in Large Scale Wireless Sensor Networks” by M. Tabibzadeh, M. Sarram, M. Ghasemzadeh, 2009

13. Technical Paper “Data Gathering Algorithms in Sensor Networks Using Energy Metrics” by S. Lindsey, C. Raghavendra, and K. M. Sivalingam, 2002

14. Technical Paper “TAG: A Tiny Aggregation Service for Ad-Hoc Sensor Networks” by Samuel Madden, Michael J. Franklin, Joseph Hellerstein, and Wei Hong, 2002

15. Technical Paper “Energy-Efficient Wake-Up Scheduling for Data Collection and Aggregation” by Yanwei Wu, Xiang-Yang Li, YunHao Liu, Wei Lou, 2010

16. Technical Paper “An Evaluation of Overhearing-based Data Transmission Reduction in Wireless Sensor Networks” by Yuuki Iima, Akimitsu Kanzaki, Takahiro Hara, and Shojiro Nishio, 2009

17. Technical Paper “AIDA: Adaptive Application-Independent Data Aggregation in Wireless Sensor Networks” by Tian He, Brian M. Blum, John A. Stankovic and Tarek Abdelzaher, 2004

18. Technical Paper “Minimum-Latency Aggregation Scheduling in Multihop Wireless Networks” by Peng-Jun Wan, Scott C.-H. Huang, Lixin Wang, Zhiyuan Wan, Xiaohua Jia, 2009

19. www.wikipedia.org (Graphs Only), http://www.wikipedia.org/
